Engineering data quality
Key takeaways
- We will describe the different ways in which technical systems can cause data quality to deteriorate.
- You will learn how to test and monitor data quality in production.
- We will also describe the tradeoffs between data quality, delivery latency, and availability.
- You will learn architectural patterns and data engineering methods for improving data quality.
Garbage in, garbage out: we have all heard about the importance of data quality. High-quality data is essential for all types of use cases, whether reporting, anomaly detection, or avoiding bias in machine learning applications. But where does high-quality data come from? How can one assess data quality, improve it if necessary, and prevent bad data from slipping in? Obtaining good data quality involves several engineering challenges. In this presentation, we will go through tools and strategies that help us measure, monitor, and improve data quality. We will enumerate the factors in data collection and data processing that can cause data quality issues, and we will show how to use engineering to detect and mitigate those problems.
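
As a rough illustration of what measuring data quality can look like, the sketch below computes one simple metric, field completeness, for a batch of records and reports fields that fall below a threshold. It is not taken from the presentation: the function names (completeness, check_quality), the record layout, and the 0.95 default threshold are all assumptions chosen for the example.

```python
from typing import Any, Iterable, List, Mapping


def completeness(records: Iterable[Mapping[str, Any]], field: str) -> float:
    """Fraction of records in which `field` is present and not None."""
    batch = list(records)
    if not batch:
        # Assumption: an empty batch is treated as vacuously complete.
        return 1.0
    present = sum(1 for record in batch if record.get(field) is not None)
    return present / len(batch)


def check_quality(
    records: Iterable[Mapping[str, Any]],
    required_fields: List[str],
    min_completeness: float = 0.95,  # hypothetical threshold
) -> List[str]:
    """Return human-readable violations; an empty list means the batch passed."""
    batch = list(records)
    violations = []
    for field in required_fields:
        ratio = completeness(batch, field)
        if ratio < min_completeness:
            violations.append(
                f"{field}: completeness {ratio:.2f} below {min_completeness:.2f}"
            )
    return violations


if __name__ == "__main__":
    # Hypothetical batch: two of four records lack a usable country value.
    sample = [
        {"user_id": 1, "country": "SE"},
        {"user_id": 2, "country": None},
        {"user_id": 3, "country": "NO"},
        {"user_id": 4},
    ]
    for problem in check_quality(sample, ["user_id", "country"], 0.9):
        print(problem)
```

A check like this could run as a test before a dataset is published, or be emitted as a metric for monitoring; which of the two fits depends on the latency and availability tradeoffs mentioned in the takeaways.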