Data
From Zero to Beam
Moving from in-house streaming code to a flexible and portable solution with Apache Beam
Long gone are the days when we used to consume data with Apache Spark Streaming, with an overly complicated, cloud-dependent infrastructure that was non-performant when load increased dramatically. Follow us on a journey of stack simplification,
Frequent Pattern Mining
TL;DR
The field of Frequent Pattern Mining (FPM) encompasses a series of techniques for finding patterns within a dataset.
This article will cover some of those techniques and how they can be used to extract behavioral patterns from anonymous interactions, in the context of an ecommerce site.
Terms and
Creating Component Tests for Spark Applications
One of the main engineering challenges faced by the Empathy.co Data Team is creating robust tests for our Spark applications. Since these applications are constantly evolving, like any application, we needed a way to ensure changes wouldn’t break the code: a guarantee that the output from our
APIs for Search Experience Experiments
Consider the Basics
To create a memorable search experience for a certain set of products, one of
the most important elements is the data to be shown. The information that
shoppers receive regarding the different products available greatly impacts
their decision whether or not to make a purchase.
Of course,
Session Contextualization
TL;DR
Yes, there is a way to help shoppers find what they need, without tracking or
collecting personal information.
It’s called session contextualization and, here at Empathy.co, it is how we
interpret shoppers’ interactions in an ecommerce store, in order to provide a
delightful user experience.
You
ElasticSearch Data Migration in Kubernetes
Managing stateful applications in Kubernetes has a reputation for being difficult, but it doesn't have to be! Take a look at the options we tried first-hand at Empathy.co for migrating ElasticSearch data.
Success story: From AWS EMR to Kubernetes
Motivation
This article is an overview of the path we followed to migrate Spark workloads
to Kubernetes and remove our dependency on EMR. EMR was an important support tool at
Empathy.co [https://empathy.co/] to orchestrate Spark workloads, but once the
workloads became more complex, the use of EMR also