Is it time for a Data Lakehouse?

A Data Lakehouse is an architecture that combines a Data Lake, which holds raw and unprocessed data in the staging and foundation layers, with a modern Data Warehouse built on top of that Data Lake.

Is Big Data killing the Data Warehouse?

In the last few years, there has been a lot of discussion about Big Data killing the Data Warehouse. Of course, that was never possible, because Big Data is a technology, while a Data Warehouse is a schema-on-write analytical architecture.

In the meantime, many new technologies and methodologies have emerged and become mainstream. Many companies have invested in Data Lakes and collected enormous volumes of raw and unprocessed data, both structured and unstructured. All this data was supposed to be prepared, combined, and analyzed by super-smart data scientists using the schema-on-read approach. You can find many resources on the Internet that compare Data Warehouse and Data Lake architectures; here we use one of the most common illustrations.

 

Data Warehouse vs. Data Lake

 

In most cases, the investment in Data Lakes eventually didn’t pay off. Sometimes Data Lakes lacked data quality and governance and turned into data swamps. In other cases, there was no business case that would create additional business value from the data collected in the Data Lake. But in the majority of cases, there simply wasn’t enough data science talent available to dig the golden nuggets out of those huge data repositories.

Meanwhile, in most large organizations, Data Warehouses were somewhat put aside, receiving only necessary maintenance and minimal development investment. In some cases, their technology and architecture became obsolete: technologies used ten years ago are now too expensive to handle growing data volumes, and their performance is far lower than that of modern massively parallel processing (MPP) databases.

 

Save your Data Lake investment – build a Data Lakehouse!

Fortunately, there is a way to save huge investments in Data Lakes: build a Data Lakehouse. The definition is very simple. A Data Lakehouse is an architecture that includes a Data Lake for raw and unprocessed data in the staging and foundation layers, and a modern Data Warehouse built on top of that Data Lake. Because this architecture combines the good characteristics of both the Data Lake and the Data Warehouse, the company has all raw, detailed data available for schema-on-read Data Science queries, as well as consistent schema-on-write data for standard business reporting and analytics.
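To make the schema-on-read versus schema-on-write distinction more concrete, here is a minimal Python sketch. The record layout, the `SaleFact` table, and the functions `load_schema_on_write` and `query_schema_on_read` are hypothetical illustrations and are not part of any product. The warehouse-style load validates every record against a fixed schema before storing it and rejects records that do not fit. The lake-style query keeps records raw and interprets them only when a query asks for a field.

```python
import json
from dataclasses import dataclass
from datetime import date

# Hypothetical raw records as they might land in the Data Lake: stored as-is,
# with no schema enforced at load time (fields may be missing or extra).
RAW_LAKE_RECORDS = [
    '{"customer_id": 1, "amount": "120.50", "sale_date": "2019-10-01"}',
    '{"customer_id": 2, "amount": "80.00", "sale_date": "2019-10-02", "channel": "web"}',
    '{"customer_id": 3, "sale_date": "2019-10-02"}',
]


@dataclass(frozen=True)
class SaleFact:
    """Hypothetical warehouse table row: the schema is fixed before loading."""
    customer_id: int
    amount: float
    sale_date: date


def load_schema_on_write(raw_records):
    """Schema-on-write: validate and convert each record before it is stored.

    Records that do not fit the target schema are set aside as rejects
    for data quality handling instead of being loaded.
    """
    loaded, rejected = [], []
    for line in raw_records:
        record = json.loads(line)
        try:
            loaded.append(SaleFact(
                customer_id=int(record["customer_id"]),
                amount=float(record["amount"]),
                sale_date=date.fromisoformat(record["sale_date"]),
            ))
        except (KeyError, ValueError):
            rejected.append(record)
    return loaded, rejected


def query_schema_on_read(raw_records, field_name):
    """Schema-on-read: interpret the raw records only at query time.

    Each query decides which fields it needs; missing fields become None.
    """
    return [json.loads(line).get(field_name) for line in raw_records]


if __name__ == "__main__":
    facts, rejects = load_schema_on_write(RAW_LAKE_RECORDS)
    print(len(facts), len(rejects))  # 2 1
    print(query_schema_on_read(RAW_LAKE_RECORDS, "channel"))  # [None, 'web', None]
```

The third record has no `amount`, so the schema-on-write load rejects it, while the schema-on-read query still sees all three records and can use the extra `channel` field that the warehouse schema ignores.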

Data Lakehouse architecture and processes are shown in the following picture:

 

The extraction and ingestion layer brings structured and unstructured data into the Data Lake using streaming or batch processing. The staging layer is used for temporary data storage and processing, and the foundation layer keeps the history of changes to the loaded data. The manual entry area holds reference data, as well as additional analytical attributes and hierarchies that don’t exist in the source data. The Data Warehouse is based on a modern, industry-standard data warehouse data model and consists of a detailed base layer and an aggregated performance and analytical layer. A common Process Control Framework manages all end-to-end data pipelines.
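The flow between these layers can be sketched in a few lines of Python. This is a simplified, in-memory illustration that rests on several assumptions. The `Lakehouse` class, the step functions, and the sample data are hypothetical, and lists and dictionaries stand in for real lake storage and databases. The foundation layer tracks history by appending a new version only when a record changes, and the Process Control Framework is reduced to running named steps in order and stopping at the first failure.

```python
from dataclasses import dataclass, field
from datetime import datetime


# Hypothetical in-memory stand-ins for the layers; a real system would use
# lake storage and a warehouse database instead of Python lists and dicts.
@dataclass
class Lakehouse:
    staging: list = field(default_factory=list)       # temporary, replaced on every load
    foundation: list = field(default_factory=list)    # append-only history of changes
    manual_entry: dict = field(default_factory=dict)  # reference data, e.g. city -> region
    base: dict = field(default_factory=dict)          # detailed layer: current state per id
    aggregated: dict = field(default_factory=dict)    # performance/analytical layer


def ingest(lh, batch):
    """Extraction and ingestion: land a new batch in staging, replacing the old one."""
    lh.staging = list(batch)


def load_foundation(lh, load_ts):
    """Foundation layer: append a new version only when a record's attributes changed."""
    latest = {}
    for row in lh.foundation:  # rows are in load order, so the last one wins
        latest[row["id"]] = row
    for rec in lh.staging:
        prev = latest.get(rec["id"])
        if prev is None or prev["attrs"] != rec["attrs"]:
            lh.foundation.append(
                {"id": rec["id"], "attrs": dict(rec["attrs"]), "valid_from": load_ts}
            )


def load_base(lh):
    """Base layer: latest version per id, enriched with manual-entry reference data."""
    for row in lh.foundation:  # later versions overwrite earlier ones
        attrs = dict(row["attrs"])
        attrs["region"] = lh.manual_entry.get(attrs["city"], "Unknown")
        lh.base[row["id"]] = attrs


def load_aggregated(lh):
    """Aggregated layer: pre-computed revenue per region for BI reporting."""
    totals = {}
    for attrs in lh.base.values():
        totals[attrs["region"]] = totals.get(attrs["region"], 0) + attrs["revenue"]
    lh.aggregated = totals


def run_pipeline(steps):
    """Minimal process control: run named steps in order, log status, stop on failure."""
    log = []
    for name, step in steps:
        try:
            step()
            log.append((name, "OK"))
        except Exception as exc:
            log.append((name, f"FAILED: {exc}"))
            break
    return log


def lakehouse_steps(lh, batch, load_ts):
    """The end-to-end pipeline for one batch, as an ordered list of named steps."""
    return [
        ("ingest", lambda: ingest(lh, batch)),
        ("foundation", lambda: load_foundation(lh, load_ts)),
        ("base", lambda: load_base(lh)),
        ("aggregated", lambda: load_aggregated(lh)),
    ]


if __name__ == "__main__":
    lh = Lakehouse(manual_entry={"Zagreb": "North", "Split": "South"})
    day1 = [{"id": 1, "attrs": {"city": "Zagreb", "revenue": 100}},
            {"id": 2, "attrs": {"city": "Split", "revenue": 50}}]
    day2 = [{"id": 1, "attrs": {"city": "Zagreb", "revenue": 120}},
            {"id": 2, "attrs": {"city": "Split", "revenue": 50}}]
    print(run_pipeline(lakehouse_steps(lh, day1, datetime(2019, 10, 24))))
    print(run_pipeline(lakehouse_steps(lh, day2, datetime(2019, 10, 25))))
    print(len(lh.foundation), lh.aggregated)  # 3 {'North': 120, 'South': 50}
```

The second batch changes only one record. As a result, the foundation layer gains one new version, while the base and aggregated layers reflect the latest state. A production Process Control Framework would typically also persist run status, support restarts, and manage dependencies between pipelines.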

Business users can access data in the Data Warehouse layers using standard BI and analytical platforms. Data Scientists can use raw data from the Data Lake together with processed data from the Data Warehouse, combining these inputs for more complex and powerful models through either Data Preparation tools or a Data Virtualization layer.
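As a rough illustration of combining both sources, the sketch below counts raw clickstream events from the Data Lake per customer and enriches them with governed customer attributes from the Data Warehouse, producing one feature row per customer. The event format, `CUSTOMER_DIM`, and `build_features` are hypothetical. In practice, this step would run in a Data Preparation tool or through a Data Virtualization layer rather than in hand-written code.

```python
import json
from collections import Counter

# Hypothetical raw clickstream events from the Data Lake (unprocessed JSON lines).
RAW_EVENTS = [
    '{"customer_id": 1, "event": "page_view"}',
    '{"customer_id": 1, "event": "add_to_cart"}',
    '{"customer_id": 2, "event": "page_view"}',
    '{"customer_id": 3, "event": "page_view"}',
]

# Hypothetical cleansed customer dimension from the Data Warehouse base layer.
CUSTOMER_DIM = {
    1: {"segment": "Premium", "region": "North"},
    2: {"segment": "Standard", "region": "South"},
}


def build_features(raw_events, customer_dim):
    """Combine raw lake events with warehouse attributes, one row per known customer."""
    counts = Counter()
    for line in raw_events:
        event = json.loads(line)  # schema applied at read time
        counts[(event["customer_id"], event["event"])] += 1

    features = []
    # Customers missing from the warehouse dimension (here id 3) are skipped.
    for customer_id, attrs in sorted(customer_dim.items()):
        features.append({
            "customer_id": customer_id,
            **attrs,  # consistent, governed attributes from the warehouse
            "page_views": counts[(customer_id, "page_view")],
            "cart_adds": counts[(customer_id, "add_to_cart")],
        })
    return features


if __name__ == "__main__":
    for row in build_features(RAW_EVENTS, CUSTOMER_DIM):
        print(row)
```

Driving the join from the warehouse dimension keeps the feature set limited to customers with governed attributes. Joining from the raw events instead would keep unknown customers too, at the cost of missing segment and region values.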

Data Lakehouse architecture can be implemented on-premises, in the cloud, or using a hybrid approach. Technologies for each of these approaches are available and mature, and they don’t affect the architecture itself; it is only a matter of implementation.

I strongly believe that this kind of architecture is the foundation of future analytical systems.

First published on LinkedIn – Dražen Oreščanin, 25/10/2019
