Built to Scale: Running Highly Concurrent ETL with Apache Airflow

Apache Airflow has seemingly taken the data engineering world by storm. It was originally created and maintained by Airbnb, and has been part of the Apache Software Foundation for several years now. After heavily leveraging it for a couple of years (over 2 million tasks) and seeing its full potential (but numerous …

Continue reading

Why Your Company Should Own Its Own Data

When considering software and related infrastructure, the business of today is caught in a never-ending cycle of "build vs. buy". Many third-party companies solve serious challenges such as managing sales pipelines, accounting automation, payment processing, and internal communication. These alternatives to "building it yourself" empower companies to operate faster or …

Continue reading

Data Pipeline Design Considerations

There are many factors to consider when designing data pipelines, including disparate data sources, dependency management, interprocess monitoring, quality control, maintainability, and timeliness. Toolset choices for each step are incredibly important, and early decisions have tremendous implications for future success. The following post is meant to be a reference …

Continue reading