Hello, Learning.

Data engineering is a journey that takes time and effort, but the following is a starting point for your personal learning.

Don't see what you were hoping for? Leave us a quick note at the bottom. Your feedback will be valuable for future additions.

What is covered below:

  • Customer service
  • Design thinking
  • Systems thinking
  • Data thinking
  • Database systems
  • Data warehousing
  • Streaming data
  • Data modeling
  • Relevant architectures
  • What we watch
  • What we read
  • Events and conferences worth going to

Customer Service

You may be confused. Perhaps feel tricked.

Isn't this page supposed to be about data engineering?!

But I promise you, customer service is critically important when doing data engineering work. Many systems you build or maintain will have a profound impact on all parts of the organization, but said systems will rarely be seen. Therefore, the face and representation of your work will be how well you serve your customer. "Customer" may mean other software systems, it may mean a team of analysts, or it may mean people on your site. Customer. Service. Matters.

When it comes to learning, it's best to look at other industries where customer service makes or breaks a company. Think Disney's Imagineering. Or Starbucks' obsession with the customer experience. The books below have been crucial to our learning.

Be Our Guest (A Disney Institute Book)

The Experience: The 5 Principles of Disney Service and Relationship Excellence

The Starbucks Experience: 5 Principles for Turning Ordinary into Extraordinary

Delivering Happiness: A Path to Profits, Passion, and Purpose

Design Thinking

Like a focus on your customer, design thinking is extremely important when doing data engineering work. People (or customers!) expect things to "just work", and 98% of day-to-day data engineering work is simply managing expectations, juggling complexity, and building systems so they... "just work".


According to Clarke's Third Law, "Any sufficiently advanced technology is indistinguishable from magic".


But how do you build technology and data systems so they seem like magic? Design thinking. Disney and Don Norman come to mind.

The Design of Everyday Things: Revised and Expanded Edition

The Psychology of Everyday Things

Emotional Design: Why We Love (or Hate) Everyday Things

The Design of Future Things

Designing Disney: Imagineering and the Art of the Show (A Walt Disney Imagineering Book)

Data Pipeline Design Considerations


Systems Thinking

Like design thinking, systems thinking is extremely important when building complex data-oriented systems. The biggest difference between the two? Design thinking is about the stakeholders while systems thinking is about the actual system (or "the mechanism used to deliver magic to stakeholders").

There are countless resources available online but two books have been transformative to how we think about data systems:

The Systems Bible: The Beginner's Guide to Systems Large and Small (See also Systemantics: How Systems Work and Especially How They Fail)

Thinking In Systems: A Primer


Data Thinking

Seeing the world through the lens of data is a remarkably powerful thing. But before doing so, it's almost more important to see your data through the lens of the world. Just like "thinking in systems", "thinking in data" is incredibly valuable for data engineering work.

Data and Reality: A Timeless Perspective on Perceiving and Managing Information in Our Imprecise World, 3rd Edition

Accuracy and Precision



Database Systems

When it comes to data engineering, there's really no way around it: you will need to know databases. I have yet to work at a company or be involved in a single data warehouse build-out where centralizing upstream application databases was not a core aspect of the project. If you are able to go deeper than just extracting data and can introduce or push best practices upstream, it will benefit you immensely.

Documentation

There's no substitute for knowing documentation. Understanding "how to database" and "how to learn to database" is very valuable.

Postgres Docs

MySQL Docs

MongoDB Docs

SQL Server Docs


Newsletters

Weekly newsletters are an excellent way to keep a finger on the pulse of the database world. Cooperpress puts out some great newsletters:

Dbweekly

Postgres Weekly

MongoDB Memo

Data Warehousing

When people say "data engineering" it commonly means "centralizing a company's data assets into a data warehouse or data lake and making these assets useful". Having in-depth knowledge of at least one of the following systems will be beneficial for you:

BigQuery

AWS Redshift

Snowflake

For the sake of learning, it's highly recommended that you set one (or all) of them up, create a set of tables, load data into said tables (from a local CSV or publicly-accessible dataset), and query your data.
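If you want to rehearse the create, load, and query loop before provisioning a cloud warehouse, here is a minimal local sketch. It is only a stand-in: it uses SQLite (from Python's standard library) rather than BigQuery, Redshift, or Snowflake, and the `orders` table, its columns, and the sample CSV rows are all made up for illustration. Each real warehouse has its own loading tools and SQL dialect, so treat this as practice for the workflow, not as the way any of those systems is used.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a local CSV file.
SAMPLE_CSV = """order_id,customer,amount
1,alice,30.50
2,bob,12.00
3,alice,7.25
"""


def load_orders(conn, csv_file):
    """Create the (made-up) orders table and load rows from a CSV file object."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, "
        "customer TEXT NOT NULL, "
        "amount REAL NOT NULL)"
    )
    reader = csv.DictReader(csv_file)
    rows = [
        (int(r["order_id"]), r["customer"], float(r["amount"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def total_by_customer(conn):
    """Query the loaded data: total order amount per customer."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()


# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
loaded = load_orders(conn, io.StringIO(SAMPLE_CSV))
totals = total_by_customer(conn)
print(loaded, totals)
```

To load a real file, pass `open("your_file.csv", newline="")` in place of the `io.StringIO` object (the file name here is a placeholder).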

Other

What is a Data Lake and How To Create One for Your Business

Streaming Data

Moving data on streams is becoming more and more commonplace. There are many options/alternatives, but each one comes with tradeoffs. There are numerous resources for implementing and managing these systems, but I suggest the following as starting points:

Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale

Designing Event-Driven Systems

Delivering Real-time Streaming Data to Amazon S3 Using Kinesis Data Firehose

An Introduction to Snowplow

Snowplow Open Source

Data Modeling

Industry Data Models

The single most beneficial resource for learning how to build good database models has been databaseanswers.org, specifically its list of industry data models. By reading through various ways of modeling real-world problems and implementing them in a database of your own (Postgres, MySQL, etc.), you'll be far ahead.
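As a rough illustration of what "implementing a model in a database of your own" can look like, here is a small sketch of a customer/order model. The tables and columns are hypothetical and are not taken from databaseanswers.org; SQLite is used only because it ships with Python, and the same idea carries over to Postgres or MySQL with their own DDL syntax.

```python
import sqlite3

# A made-up, minimal order model: customers place orders,
# and each order has one line per product.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    unit_price REAL NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
"""


def build_model():
    """Create the schema in an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = build_model()
conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp')")
conn.execute("INSERT INTO product VALUES (10, 'Widget', 2.5)")
conn.execute("INSERT INTO customer_order VALUES (100, 1)")
conn.execute("INSERT INTO order_line VALUES (100, 10, 4)")

# Answer a question the model was built for: what is order 100 worth?
order_total = conn.execute(
    "SELECT SUM(l.quantity * p.unit_price) "
    "FROM order_line l JOIN product p ON p.product_id = l.product_id "
    "WHERE l.order_id = 100"
).fetchone()[0]

# The model should reject an order that points at a customer who does not exist.
try:
    conn.execute("INSERT INTO customer_order VALUES (101, 999)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(order_total, fk_enforced)
```

Trying to break your own model with bad rows, as in the last step, is a quick way to find out which real-world rules the schema actually captures.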

Data Warehousing and Business Intelligence

When it comes to data warehousing and modeling for business intelligence, Kimball wrote the book (or books) on how to do it when databases were severely resource-constrained. Since the only way to know where the world is going is knowing where it's been, I highly recommend these books:

The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence

Kimball's Data Warehouse Toolkit Classics, 3 Volume Set

Data Vault

Data Vault is another popular methodology for data warehousing, and is well worth knowing.

Data Architecture

This section will grow over time, but there are a couple of high-level architectures that are very good to know:

Lambda Architecture

Kappa Architecture


What We Watch

The Data Council YouTube channel is a wealth of knowledge about how various organizations think about data.


What We Read

All the sources mentioned above :)

The dbt blog is a great source of information about data warehousing and analytics engineering.

Netflix does some very neat data things, and writes about it on their blog.

The Hashmap blog is fantastic, and discusses many facets of data and analytics engineering. I highly recommend a look.

Spotify's engineering blog is a great resource for learning how data drives the product and engineering efforts there.

Events and Conferences Worth Going To

Data Council

Kafka Summit

Is there something you would like to see added here? Please let us know!

