Today, application developers devote roughly 80% of their code to persisting roughly 20% of the total data flowing through their applications. That means two things:
* The remaining 80% of the data flowing through our applications is, at best, lost in rolling log files and, at worst, never collected; either way, it is never analyzed or accounted for.
* Application-level database programming, licensing, storage, administration, and ETL processing have maxed out IT budgets and prevented app development teams from keeping pace with the rate of change in the business.
That other 80% is "Event Data", and it can no longer be ignored if you want to stay competitive. Many changes to application state are already recorded as a sequence of events in application and middleware logs. Yet because this data has historically held value for no one but the developer, a lot of potentially valuable information is never collected at all. With Hadoop, we can:
* store and query these events (transaction tracing);
* use the event log to reconstruct the application domain at any point in time (ETL);
* use the same event log to construct new domains we haven't planned for (ELT); and
* automatically adjust our data domains to cope with retroactive changes (???).
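The capabilities listed above all rest on one idea: keep the append-only event log as the source of truth and derive state from it by replaying events. The sketch below illustrates that idea in plain, in-memory Python rather than on Hadoop. The `Event` type, the deposit/withdrawal domain, and the functions `balances_as_of` and `activity_counts` are hypothetical, chosen only for illustration. It shows state reconstructed as of a chosen time, a new view built from the same log, and a late-arriving event with an earlier timestamp that is picked up simply by replaying.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    account: str   # hypothetical entity key
    kind: str      # hypothetical event types: "deposit" or "withdrawal"
    amount: int


def balances_as_of(log, as_of):
    """Rebuild account balances by replaying events with timestamp <= as_of."""
    balances = defaultdict(int)
    for e in sorted(log, key=lambda ev: ev.timestamp):
        if e.timestamp > as_of:
            break  # events are sorted, so everything after this is later
        if e.kind == "deposit":
            balances[e.account] += e.amount
        elif e.kind == "withdrawal":
            balances[e.account] -= e.amount
    return dict(balances)


def activity_counts(log):
    """A new, unplanned view derived from the same log: events per account."""
    counts = defaultdict(int)
    for e in log:
        counts[e.account] += 1
    return dict(counts)


log = [
    Event(datetime(2013, 1, 1, 9), "alice", "deposit", 100),
    Event(datetime(2013, 1, 2, 9), "alice", "withdrawal", 30),
    Event(datetime(2013, 1, 3, 9), "bob", "deposit", 50),
]

print(balances_as_of(log, datetime(2013, 1, 2, 12)))  # {'alice': 70}
print(activity_counts(log))                           # {'alice': 2, 'bob': 1}

# A late-arriving event with an earlier timestamp: the stored log is only
# appended to, and replaying it yields the corrected state.
log.append(Event(datetime(2013, 1, 1, 12), "alice", "withdrawal", 10))
print(balances_as_of(log, datetime(2013, 1, 2, 12)))  # {'alice': 60}
```

Because derived state is never the source of truth in this sketch, a new view or a retroactive correction costs a replay of the log rather than a schema migration. At Hadoop scale the same replay would run as a batch job over the stored event files instead of a Python loop.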
In this talk, we will demonstrate how capturing all event data could dramatically simplify data collection and management within the enterprise.