Eventually, time will kill your data processing
Key takeaways
- You will learn the many ways in which time causes problems in data processing.
- You will learn how to avoid data loss caused by timing issues in data collection.
- You will learn how to mitigate time-related issues in batch processing pipelines with the aid of workflow orchestration.
- You will learn the tradeoffs involved in handling timing issues in stream processing.
Race conditions and intermittent failures, daylight saving time, time zones, leap seconds, overload conditions: time is a factor in many of the most annoying problems in computer systems. Data engineering is not exempt from these problems, and it adds a slew of unique ones of its own. In this presentation, we will enumerate the time-related problems that we have seen cause trouble in the components of data processing systems, including data collection, batch processing, workflow orchestration, and stream processing. We will also provide a handful of tools and tricks for avoiding timing issues in these systems.