Lars Albertsson has worked with data-intensive and scalable applications throughout his corporate career - at Google, natural language processing startup Recorded Future, Spotify, and Schibsted. He spent three years as an independent consultant, helping companies build data processing solutions. He is now a founder of Scling, who provide data value extraction as a service.
Eventually, time will kill your data processing
- You will learn a multitude of ways that time causes problems in data processing.
- You will also learn how to avoid data loss caused by timing issues in data collection.
- You will learn how to mitigate time-related issues in batch processing pipelines with the aid of workflow orchestration.
- You will also learn the tradeoffs involved in handling timing issues in stream processing.
Race conditions and intermittent failures, daylight saving time, time zones, leap seconds, overload conditions - time is a factor in many of the most annoying problems in computer systems. Data engineering is not exempt from these problems, and it also has a slew of unique ones. In this presentation, we will enumerate the time-related problems that we have seen cause trouble in data processing system components, including data collection, batch processing, workflow orchestration, and stream processing. We will also provide a handful of tools and tricks to avoid timing issues in data processing systems.
Engineering data quality
- We will describe the different ways in which technical systems can cause data quality to deteriorate.
- You will learn how to test and monitor data quality in production.
- We will also describe the tradeoffs involved in data quality, delivery latency, and availability.
- You will learn architectural patterns and data engineering methods for improving data quality.
Garbage in, garbage out - we have all heard about the importance of data quality. High-quality data is essential for all types of use cases, whether reporting, anomaly detection, or avoiding bias in machine learning applications. But where does high-quality data come from? How can one assess data quality, improve it if necessary, and prevent bad data from slipping in? Obtaining good data quality involves several engineering challenges. In this presentation, we will go through tools and strategies that help us measure, monitor, and improve data quality. We will enumerate the factors in data collection and data processing that can cause data quality issues, and we will show how to use engineering to detect and mitigate them.