268 Billion Events with Snowplow Analytics and Snowflake at CarGurus

Two years ago we set up Open-Source Snowplow at CarGurus to fulfill a need for self-managed client-side instrumentation. Since that time it has become incredibly impactful for the entire company and has scaled significantly beyond what was originally envisioned. The following is an overview of why we set up the ...

Continue reading

How to Get Started with Snowflake On Demand

Snowflake has taken the data world by storm... And rightfully so! Snowflake is an incredibly scalable columnar database that simply gets out of a developer's way. Scaling it with zero downtime is straightforward, managing and scaling it without significantly increasing the size of the team is doable, security is built-in, and ...

Continue reading

How to install and configure SnowSQL

SnowSQL is the command-line interface for accessing your Snowflake instance. The following is a quick "how to" guide for setting it up. Installation: After you log in to your Snowflake web interface, the SnowSQL installer is available via Help -> Download. You'll need to select the appropriate version for your machine: ..and ...

Continue reading

GDPR for Engineers - What You Need To Know

GDPR was approved by the EU Parliament on April 14, 2016, went into effect May 25, 2018, and impacts any business handling any personal data of any EU resident. At a high level, GDPR is a regulation on the protection of personal data and can be scoped twofold. First, the law ...

Continue reading

GDPR for Engineers - What Is Personal Data?

We all know that GDPR (also known as RGPD in France) has brought data policy into the spotlight for many technical organizations. As of May 25, 2018, if your systems (both automated and otherwise!) handle PII of individuals residing in the EU, you must comply with the regulation. While this enforcement ...

Continue reading

Client-side instrumentation for under $1 per month. No servers necessary.

In a world where the importance of data is steadily increasing yet the cost of computing power is steadily decreasing, there are fewer and fewer excuses to not have control of your own data. To explore that point I instrumented this site as inexpensively as I possibly could, without sacrificing ...

Continue reading

Built to Scale: Running highly concurrent ETL with Apache Airflow

Apache Airflow has seemingly taken the data engineering world by storm. It was originally created and maintained by Airbnb, and has been part of the Apache Software Foundation for several years now. After heavily leveraging it for a couple of years (over 2 million tasks) and seeing its full potential (but numerous ...

Continue reading