New product discovery is an established activity within brick-and-mortar grocery stores, but is still ripe for experimentation within an online setting. In this talk, we discuss a customer-level product recommendation system we developed for the Kroger Company, using Python, Apache Spark, and Apache Airflow.
We will briefly introduce the Kroger Company and its digital properties, along with its current recommendation systems and the need for a new one. We will then take a deep dive into the system we developed, covering the Python API for Spark, a large-scale data processing tool, and the underlying Hadoop Distributed File System (HDFS), with a focus on how we used each in our implementation. We’ll also discuss process scheduling and coordination via Apache Airflow, along with its Python API and use of Python eggs. Finally, we will show the recommendation system in action and discuss plans for testing and improvement.
The talk will be organized as follows:
What is 84.51?
Landscape: Kroger’s digital properties
Phil Anderson is a Lead Data Scientist at 84.51, a Cincinnati-based analytics services provider and wholly owned subsidiary of the Kroger Company. He is a member of 84.51’s Digital Personalization team, which builds large-scale recommendation systems for Kroger’s digital assets, including Kroger.com and Kroger’s mobile applications. He has worked for 84.51/dunnhumbyUSA for 5 years in analytics roles related to ad targeting and measurement, CRM program deployment, and the development of enterprise applications for price/promotion evaluation. He holds a bachelor’s degree in economics from the University of Notre Dame and is working on a master’s degree in statistics from Texas A&M University (expected completion 2018).