Making Data Lakehouse Real on Azure

Data Lakehouse Conceptual Architecture

  1. Data ingestion services: The components to get the data in.
  2. Data Lake Stores: The components that store the data effectively.
  3. Serving Data Stores: The components that serve the data effectively.
  4. Data processing services: The components that process the data.
  5. Data cataloging and curation services: The components that prevent the data lake from becoming a data swamp.
  6. Data security services: The components that secure the data in the Lakehouse.
  7. Analytics services: The components that help to transform data into insights.

Data Lakehouse on Azure

1. Data ingestion services

2. Data Lake Stores

3. Serving Data Stores

4. Data processing services

5. Analytics services

6. Data cataloging services

  1. Any data that is ingested or landed back in the data Lakehouse needs to be cataloged. The scope of cataloging includes data lakes, serving layers, transformation pipelines, and reports.
  2. Any data source that publishes the data into the data Lakehouse needs to be cataloged.
  3. All artifacts that store data need to have their technical and business metadata cataloged. This metadata may include various attributes, such as sensitivity and data classifications.
  4. Data transformation lineage should be cataloged and should depict how data is transformed on its way from the source to downstream consumers.
  5. The cataloging information should be easily searchable and accessible to the right stakeholders.
  6. Azure Purview is the governance tool on Azure that enables active cataloging of data. Azure Purview is a software-as-a-service (SaaS) offering with rich data governance features, including automated data discovery, visual mapping of data assets, semantic search, and tracking of data location and movement across the data landscape.
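
To make the cataloging requirements above more concrete, here is a minimal sketch of a catalog that stores technical and business metadata, supports search, and walks transformation lineage. It is only an in-memory illustration in Python. It is not the Azure Purview API, and every name in it (`CatalogEntry`, `Catalog`, `register`, `search`, `lineage`, and the sample artifacts) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata for one data artifact (hypothetical schema, not Purview's)."""
    name: str
    layer: str            # e.g. "source", "data lake", "serving layer", "report"
    owner: str            # business metadata
    sensitivity: str      # e.g. "internal", "confidential"
    classifications: list[str] = field(default_factory=list)
    upstream: list[str] = field(default_factory=list)  # direct sources


class Catalog:
    """Toy in-memory catalog; a real catalog would persist and secure this."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[str]:
        """Return names of entries whose name or classification matches."""
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if term in e.name.lower()
            or any(term in c.lower() for c in e.classifications)
        )

    def lineage(self, name: str) -> list[str]:
        """Return every upstream artifact of `name`, nearest first."""
        seen: list[str] = []
        queue = list(self._entries[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            if current in self._entries:
                queue.extend(self._entries[current].upstream)
        return seen


catalog = Catalog()
catalog.register(CatalogEntry("crm_source", "source", "sales-ops",
                              "confidential", ["PII"]))
catalog.register(CatalogEntry("raw_customers", "data lake", "data-eng",
                              "confidential", ["PII"], ["crm_source"]))
catalog.register(CatalogEntry("customer_mart", "serving layer", "analytics",
                              "internal", [], ["raw_customers"]))
catalog.register(CatalogEntry("churn_report", "report", "analytics",
                              "internal", [], ["customer_mart"]))

print(catalog.lineage("churn_report"))
# ['customer_mart', 'raw_customers', 'crm_source']
print(catalog.search("pii"))
# ['crm_source', 'raw_customers']
```

The sample entries cover each scope named in the list: a source, a data lake store, a serving layer, and a report. The lineage walk traces the report back to its original source, and the search shows how a classification such as PII can make sensitive artifacts easy to find.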

7. Data security services

Key Takeaways

1. The Data Lakehouse paradigm is an evolving pattern.

2. Organizations adopting this pattern must be disciplined at the core and flexible at the edges.

3. Cloud computing provides the scalable and cost-effective services that can bring the Data Lakehouse pattern to fruition.

Pradeep Menon

Creating impact through Technology | #CTO at #Microsoft| Data & AI Strategy | Cloud Computing | Design Thinking | Blogger | Public Speaker | Published Author