Gartner recently revealed that data lake interest is “becoming quite widespread.” In a world where organizations are confronted daily with new and different technologies, tools and platforms, data lakes offer something of an oasis: a one-stop hub that makes big data more manageable and valuable. But what will data lakes bring to the table in 2016?
Here are four ways data lakes will influence the big data landscape in the new year:
- Analytic Expansion: Deploying a data lake makes it possible to place an organization’s data assets on an RDF graph that describes the relationships between data elements. Making those relationships explicit helps overcome the “dark data” phenomenon, in which data goes unused by the enterprise. Understanding the context and meaning of data before analysis shapes the type, degree and nature of the analytics performed, which considerably refines their results and use.
- Semantics at Scale: A smart data lake can be paired with a graph engine optimized for analytics through in-memory, massively parallel processing of semantically tagged data. Such an engine, when combined with the smart data lake’s RDF graph and ontological models of business meaning, incorporates all relevant enterprise data for comprehensive results at a speed that semantic technology advancements have only recently been able to produce.
- Democratization of Stewardship: The data availability that data lakes provide aligns with the self-service movement and the democratization of big data that supports it. Data lakes will extend these trends by democratizing data stewardship, a more pervasive form of governance than the conventional model enforced by only a few dedicated data stewards. With regulatory mandates increasing, this enterprise-wide stewardship will prove invaluable to organizations.
- Automating IT and Data Science: The alignment of smart data lakes with the self-service movement will also automate some of the more mundane but highly necessary aspects of data science and IT work, freeing these professionals to concentrate on more substantial ways to improve data-driven processes.
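For readers who want a concrete picture of what placing data assets on a graph can look like, here is a minimal sketch in plain Python. It is not a real RDF store and not any vendor's product: the triples, the `ex:` names, and the helper functions are all hypothetical, chosen only to show how explicit subject-predicate-object relationships let you ask which datasets are never used (a rough stand-in for dark data) and which datasets describe the same business concept.

```python
# Hypothetical (subject, predicate, object) triples. Every name here is
# invented for illustration; a real deployment would use an RDF store.
TRIPLES = {
    ("ex:crm_customers", "rdf:type", "ex:Dataset"),
    ("ex:web_logs", "rdf:type", "ex:Dataset"),
    ("ex:sensor_archive", "rdf:type", "ex:Dataset"),
    ("ex:churn_report", "rdf:type", "ex:Report"),
    ("ex:churn_report", "ex:uses", "ex:crm_customers"),
    ("ex:churn_report", "ex:uses", "ex:web_logs"),
    ("ex:crm_customers", "ex:describes", "ex:Customer"),
    ("ex:web_logs", "ex:describes", "ex:Customer"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def unused_datasets(triples):
    """Datasets that no other resource points to with ex:uses."""
    datasets = {s for s, _, _ in match(triples, p="rdf:type", o="ex:Dataset")}
    used = {o for _, _, o in match(triples, p="ex:uses")}
    return sorted(datasets - used)


def datasets_describing(triples, concept):
    """Datasets linked to a business concept via ex:describes."""
    return sorted(s for s, _, _ in match(triples, p="ex:describes", o=concept))


if __name__ == "__main__":
    print(unused_datasets(TRIPLES))
    print(datasets_describing(TRIPLES, "ex:Customer"))
```

In this toy data, the sensor archive surfaces as unused because nothing references it, while the CRM and web log datasets are both tied to the same customer concept. A production system would express the same questions as queries against an actual graph store rather than Python set operations.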
Sean Martin is Chief Technical Officer at Cambridge Semantics. Prior to founding Cambridge Semantics, he spent 15 years with IBM where he was a founder and the technology visionary for the IBM Advanced Internet Technology group.