Is it time for a Data Lakehouse?

A Data Lakehouse is an architecture that combines a Data Lake, which holds raw and unprocessed data in its staging and foundation layers, with a modern Data Warehouse architecture built on top of that Data Lake.

Is Big Data killing the Data Warehouse?

There has been a lot of discussion in the last few years about Big Data killing the Data Warehouse. Of course, that was never possible, because Big Data is a technology, while the Data Warehouse is a schema-on-write analytical architecture.

In the meantime, many new technologies and methodologies have emerged in the market and become mainstream. Many companies have invested in Data Lakes and collected enormous volumes of raw and unprocessed data, both structured and unstructured. All this data was supposed to be prepared, combined, and analyzed by super-smart data scientists using the schema-on-read approach. You can find many resources on the Internet that compare Data Warehouse and Data Lake architectures, and here we will use one of the most common images.

 

Data Warehouse vs. Data Lake

 

Eventually, investment in Data Lakes didn’t pay off in most cases. Sometimes Data Lakes lacked data quality and governance and turned into data swamps. In other cases, there was no business case that would create additional business value from the data collected in the Data Lake. But in the majority of cases, there was no data science talent available to dig the golden data nuggets out of those huge data repositories.

Meanwhile, Data Warehouses in most large organizations were somewhat put aside, with only necessary maintenance and minimal development investment. In some cases, their technology and architecture became obsolete: technologies chosen ten years ago are now too expensive to scale to increasing data volumes, and their performance is far lower than that of modern massively parallel processing (MPP) databases.

 

Save investments in Data Lake – build a Data Lakehouse!

Fortunately, there is a way to save the huge investments made in Data Lakes: build a Data Lakehouse. The definition is simple. A Data Lakehouse is an architecture that includes a Data Lake for raw and unprocessed data in staging and foundation layers, and a modern Data Warehouse architecture built on top of the Data Lake. With an architecture that combines the good characteristics of both, the company has all raw detailed data available for schema-on-read Data Science queries, and a consistent schema-on-write model for standard business reporting and analytics.
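To make the difference between the two approaches concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the sample events, the field names, and the functions `read_with_schema` and `write_with_schema` are hypothetical and do not come from any particular product.

```python
import json

# Hypothetical raw events, stored in the Data Lake exactly as they arrived.
RAW_EVENTS = [
    '{"customer_id": "42", "amount": "19.90", "channel": "web"}',
    '{"customer_id": "7", "amount": "5", "note": "no channel recorded"}',
]

# Hypothetical warehouse schema: field name -> type used to cast the value.
WAREHOUSE_SCHEMA = {"customer_id": int, "amount": float, "channel": str}


def read_with_schema(raw_lines, schema):
    """Schema-on-read: the raw data stays untouched; a schema chosen by the
    reader is applied only at query time. Missing fields become None."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append({
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        })
    return rows


def write_with_schema(table, record, schema=WAREHOUSE_SCHEMA):
    """Schema-on-write: a row is validated and cast before it is stored,
    and rejected if it does not fit the schema."""
    missing = [field for field in schema if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    table.append({field: cast(record[field]) for field, cast in schema.items()})


# A data scientist picks only the fields needed for one analysis.
ad_hoc = read_with_schema(RAW_EVENTS, {"customer_id": int, "amount": float})

# The warehouse table accepts only rows that match its schema.
sales_table = []
write_with_schema(sales_table, json.loads(RAW_EVENTS[0]))
try:
    write_with_schema(sales_table, json.loads(RAW_EVENTS[1]))
except ValueError as error:
    print("rejected:", error)
```

In this sketch the second event is still usable for ad hoc analysis, but it never reaches the warehouse table because it has no `channel` field. That is the trade-off the Lakehouse keeps on both sides: flexibility on the raw data and consistency on the reporting data.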

Data Lakehouse architecture and processes are shown in the following picture:

 

The extraction and ingestion layer brings structured and unstructured data into the Data Lake using streaming or batch processing. The staging layer is used for temporary data storage and processing, and the foundation layer keeps historical changes of loaded data. The manual entry area is used for reference data and for additional analytical attributes and hierarchies that don’t exist in source data. The Data Warehouse is based on a modern industry-standard data warehouse data model and consists of a detailed base layer and an aggregated performance and analytical layer. A common Process Control Framework manages all end-to-end data pipelines.
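The flow between these layers can be sketched in a few lines of Python. This is a simplified illustration, not a reference implementation: the customer records, the `segment` attribute, the `valid_from` column, and all function names are assumptions I made for the example, and a real foundation layer would track far more than one attribute.

```python
from datetime import date


def load_foundation(foundation, staging_rows, load_date):
    """Foundation layer: append a new version only when a staged row is new
    or its tracked attribute changed, so history is preserved."""
    latest = {row["customer_id"]: row for row in foundation}  # last version wins
    for staged in staging_rows:
        current = latest.get(staged["customer_id"])
        if current is None or current["segment"] != staged["segment"]:
            version = {**staged, "valid_from": load_date}
            foundation.append(version)
            latest[staged["customer_id"]] = version


def build_base(foundation):
    """Base layer: the latest known version of each customer."""
    latest = {row["customer_id"]: row for row in foundation}
    return list(latest.values())


def build_aggregate(base):
    """Aggregated layer: number of customers per segment."""
    counts = {}
    for row in base:
        counts[row["segment"]] = counts.get(row["segment"], 0) + 1
    return counts


foundation = []

# First batch lands in staging and is loaded into the foundation layer.
staging = [{"customer_id": 1, "segment": "retail"},
           {"customer_id": 2, "segment": "retail"}]
load_foundation(foundation, staging, date(2019, 10, 1))

# Second batch: customer 1 changed segment, customer 2 is unchanged.
staging = [{"customer_id": 1, "segment": "premium"},
           {"customer_id": 2, "segment": "retail"}]
load_foundation(foundation, staging, date(2019, 10, 2))

base = build_base(foundation)
summary = build_aggregate(base)
```

After the two loads, the foundation layer holds three rows (both versions of customer 1 and the single version of customer 2), while the base and aggregated layers show only the current state. A Process Control Framework would be the component that schedules and monitors these steps; it is left out of the sketch.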

Business users can access data in the Data Warehouse layers using standard BI and analytical platforms. Data Scientists can use raw data from the Data Lake and processed data from the Data Warehouse, and combine both inputs for more complex and powerful models, using either Data Preparation tools or a Data Virtualization layer.

Data Lakehouse architecture can be implemented on-premises, in the cloud, or using a hybrid approach. Technologies for each of these approaches are available and mature, and they don’t affect the architecture itself; the choice is only a matter of implementation.
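As a small illustration of combining the two sources, the Python sketch below enriches curated warehouse rows with a feature computed from raw Data Lake events. The clickstream lines, the customer table, and the `build_features` function are hypothetical and exist only for this example.

```python
import json
from collections import Counter

# Hypothetical raw clickstream lines, read directly from the Data Lake.
RAW_CLICKS = [
    '{"customer_id": 1, "page": "pricing"}',
    '{"customer_id": 1, "page": "support"}',
    '{"customer_id": 2, "page": "pricing"}',
]

# Hypothetical curated customer rows, read from the Data Warehouse.
WAREHOUSE_CUSTOMERS = [
    {"customer_id": 1, "segment": "premium"},
    {"customer_id": 2, "segment": "retail"},
    {"customer_id": 3, "segment": "retail"},
]


def build_features(raw_clicks, customers):
    """Join a count derived from raw events onto curated warehouse rows."""
    clicks = Counter(json.loads(line)["customer_id"] for line in raw_clicks)
    return [
        {**customer, "click_count": clicks.get(customer["customer_id"], 0)}
        for customer in customers
    ]


features = build_features(RAW_CLICKS, WAREHOUSE_CUSTOMERS)
```

Each resulting row carries the governed `segment` from the warehouse next to a `click_count` derived from raw events, which is the kind of combined input a model could then be trained on.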

I strongly believe that this kind of architecture is the foundation of future analytical systems.

First published on LinkedIn - Dražen Oreščanin, 25/10/2019
Collateral coverage

Already calculating collateral coverage? Here is how you can segment your portfolio and optimize your profitability

We have all dealt with the concept of collateral coverage, one way or another. For instance, I have to leave my ID card at a gas station if I accidentally forget my wallet at home. Believing that things will turn out OK is terrific, but it would be wise to ensure everybody meets their obligations and, in the end, it is essential that we all get what we wanted in the first place.

> Read More
Data Democratization starts with Data Governance

There is no consensus on where the data democratization process begins and where it ends. It is a complex strategy with many proposed steps for data democratization to drive businesses. Every organization is unique and should define its own democratization steps according to its own needs, challenges, and risks.

> Read More
Data democratization

Data democratization | Why is it Important for Your Business in 2021

In a world led by data, an enormous amount of data is generated every second. We know by now that a data warehouse adds significant value to an enterprise, helping to improve decision-making processes, but what about access to the raw data? By implementing data democratization, you enable everybody in the organization to take advantage of data-informed decisions.

> Read More
event driven economy

Data Analytics in Event-Driven Economy 

The world and business are changing rapidly, and analytics is changing too. I will not talk about the technical terms that are very popular now: Data Lakes, unstructured data, real-time analytics, Data Science, artificial intelligence, and other nice things everyone has been talking about in the last few years. Instead, I would like to emphasize the change from a process-driven economy to an event-driven economy and what it brings to the analytical landscape.

> Read More

Optimize your business

Contact us today and learn how our Data Warehouse Models help your business!

Data Warehouse Models © 2016 – 2021. All Rights Reserved.

Industry standard data warehouse solutions for telecommunication, banking, insurance and retail industries.
