From Zero to Beam

Moving from in-house streaming code to a flexible and portable solution with Apache Beam. Long gone are the days when we used to consume data with Apache Spark Streaming, with an overly complicated, cloud-dependent infrastructure that was non-performant when load increased dramatically. Follow us on a journey of stack simplification,

Frequent Pattern Mining

TL;DR The field of Frequent Pattern Mining (FPM) encompasses a series of techniques for finding patterns within a dataset. This article will cover some of those techniques and how they can be used to extract behavioral patterns from anonymous interactions, in the context of an ecommerce site. Terms and

Creating Component Tests for Spark Applications

One of the main engineering challenges faced by the Data Team is creating robust tests for our Spark applications. Since these applications, like any application, are constantly evolving, we needed a way to ensure changes wouldn’t break the code; a guarantee that the output from our

APIs for Search Experience Experiments

Consider the Basics: To create a memorable search experience for a certain set of products, one of the most important elements is the data to be shown. The information that shoppers receive regarding the different products available greatly impacts their decision whether or not to make a purchase. Of course,

Session Contextualization

TL;DR Yes, there is a way to help shoppers find what they need, without tracking or collecting personal information. It’s called session contextualization and, here at, it is how we interpret shoppers’ interactions in an ecommerce store, in order to provide a delightful user experience. You

ElasticSearch Data Migration in Kubernetes

Managing stateful applications in Kubernetes has a reputation for being difficult and tricky, but it doesn't have to be! Take a look at the options we tried first-hand at for migrating data with ElasticSearch.

Success story: From AWS EMR to Kubernetes

Motivation: This article is an overview of the path we followed to migrate Spark workloads to Kubernetes and avoid a dependency on EMR. EMR was an important support tool at [] to orchestrate Spark workloads, but once the workloads became more complex, the use of EMR also