IT processes, especially those involving data management and analytics, have historically been premeditated and methodical. As business needs arise, IT specialists define the requirements, identify the data sources, and establish the schema and Extract, Transform, and Load (ETL) processes—all in painstaking fashion. Only after those steps have been completed will most IT teams build an analytics system to support the business need.
“Traditional data warehouses were so rigid and expensive, they forced everyone to be extremely deliberate,” says Nik Rouda, a senior analyst at ESG focused on big data. “But the economics and possibilities have shifted with Hadoop.”
- The traditional “Schema On Write” approach of data management requires a lot of forethought and upfront IT involvement.
- Conversely, Hadoop’s “Schema on Read” approach empowers users to quickly store data in any format and apply structure in a more flexible and agile way, as needed.
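The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not how Hadoop itself is implemented; the record fields and function names are invented for the example.

```python
import json

# Hypothetical event records arriving in varied shapes.
raw_events = [
    '{"user": "ana", "action": "login"}',
    '{"user": "ben", "action": "purchase", "amount": 42.5}',
    '{"user": "cho", "action": "login", "device": "mobile"}',
]

# Schema on write: a fixed schema is enforced before storage,
# so any field outside it must be rejected or modeled up front.
WRITE_SCHEMA = {"user", "action"}

def store_schema_on_write(line):
    record = json.loads(line)
    unknown = set(record) - WRITE_SCHEMA
    if unknown:
        raise ValueError(f"fields not in schema: {unknown}")
    return record

# Schema on read: raw data lands untouched, and structure is applied
# only when a question is asked, so new fields cost nothing to keep.
landed = list(raw_events)  # stored verbatim, no validation

def purchases_over(threshold):
    for line in landed:
        record = json.loads(line)  # structure applied at read time
        if record.get("action") == "purchase" and record.get("amount", 0) > threshold:
            yield record["user"]

print(list(purchases_over(10)))  # ['ben']
```

Note that the schema-on-write path would have rejected two of the three events for carrying extra fields, while the schema-on-read path kept everything and still answered the question.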
“To maximize the potential and value of analytics, companies need to rethink how they capture, collect, and curate their data,” says Scott Gnau, CTO of Hortonworks, a leading provider of enterprise-ready data platforms and applications. “The traditional waterfall approach is too slow.”
For data analytics purposes, Gnau recommends flipping standard IT processes on their head.
“With big data, you don’t always know what you’re looking for or what you’ll find, so you can’t meticulously define everything upfront,” he explains. “You need to think about data and data-related projects in reverse order, where requirements are determined at the very end—after the data has been landed and analyzed. It’s not hard, but it’s very different.”
Empowerment with control
Most legacy technologies can’t operate without schema, but Hadoop-based technologies allow any type of data—structured and unstructured, at rest and in motion—to be stored and explored. According to Gnau, ad hoc analyses often lead to the greatest insights.
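A small sketch of what such ad hoc exploration looks like, assuming a landing zone that mixes free-text log lines with JSON events (the data and function below are illustrative, not from any particular platform):

```python
import json

# Mixed, unmodeled data: free-text logs alongside structured JSON events.
data_lake = [
    "2017-03-01 ERROR payment gateway timeout",
    '{"event": "signup", "region": "emea"}',
    "2017-03-01 INFO cache warmed",
    '{"event": "signup", "region": "apac"}',
]

def explore(records):
    """Ad hoc pass: classify each record on the fly, no upfront schema."""
    errors, signups = 0, {}
    for item in records:
        try:
            structured = json.loads(item)
        except ValueError:
            # Unstructured path: treat it as plain text.
            if "ERROR" in item:
                errors += 1
            continue
        # Structured path: pull out fields as needed.
        if structured.get("event") == "signup":
            region = structured.get("region", "unknown")
            signups[region] = signups.get(region, 0) + 1
    return errors, signups

print(explore(data_lake))  # (1, {'emea': 1, 'apac': 1})
```

The point is that the question ("how many errors, and signups by region?") shaped the structure at read time, rather than the other way around.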
“Invest in data scientists and arm them with data,” he recommends. “The key is providing access to the data and facilitating experimentation in ways that don’t hinder security or introduce risk.”
This often requires new governance processes, and a big data platform that complements legacy systems and data sources.
“There needs to be a balance between the two worlds, a balance between business empowerment and IT control,” Rouda says. “Big data doesn’t get a pass because it’s new. It still has to meet enterprise requirements for security and governance.”
Fortunately, front-end tools are getting better, he notes, especially those that:
- Span new data lakes and legacy business intelligence systems
- Allow data to be queried without predefined requirements or a fixed schema
- Empower users without turning an enterprise IT environment into the Wild West
“At first blush, doing IT in reverse seems like a horrible idea,” Rouda chuckles. “But the more you think about it, the more it makes good sense.”