Building A MODERN Data Warehouse SOLUTION

Building A MODERN Data Warehouse SOLUTION

1. INTRODUCTION
1.1 Journal Overview

This Journal is mainly focused about building a modern data warehouse solution. This journal is a reflective write-up on what the webinar was about, what was learnt from it and how the knowledge gained from it can be applied to my future career.

 

1.2 List of Acronyms and Abbreviations

OLTP: Online Transactional Processing

                                   

2. BUILDING A MODERN DATA WAREHOUSE SOLUTION
2.1 Summary

A centralized store of business data for reporting and analysis that typically. Contains large volume of historical data is optimized for querying, as opposed to inserting or updating data, is incrementally loaded with new business data at regular intervals, provides the basis for enterprise BI solutions.

 

2.2 Learning Outcome

Data warehouses are centralized data storage systems that allow your business to integrate data from multiple applications and sources into one location. This provides an environment that is designed for decision support, analytics reporting, and data mining. When you isolate and optimize your data, you can manage it without impacting primary business processes.

In general, the benefits of data warehousing are all based on one central premise: warehousing solves the ongoing problem of analyzing separate data and converting it into actionable information you can use. Warehousing also allows you to process large amounts of complex data in an efficient way. When you successfully implement a data warehouse system, it’s possible to access the benefits associated with the practice— the very benefits that are making data warehousing a common practice for many businesses today.

 

2.2.1 What is a Data Warehouse?

A centralized store of business data for reporting and analysis that typically. Contains large volume of historical data is optimized for querying, as opposed to inserting or updating data, is incrementally loaded with new business data at regular intervals, provides the basis for enterprise BI solutions.

Data Warehouse is subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process(inmon,1993).

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise.

The data stored in the warehouse is uploaded from the operational systems (such as marketing or sales). The data may pass through an operational data store and may require data cleansing for additional operations to ensure data quality before it is used in the DW for reporting.

 

2.2.2 Data Warehouse vs. Database

It is easy to confuse both terms as a data warehouse and a database share some similarities. A database is a key component of a data warehouse and can be defined as a storage system where data can be quickly recorded and retrieved. A database collects data for transactional purposes, application support but also to enable reporting. Common databases that are commonly used in the enterprise include ERP, SQL databases, Customer Relationship Management (CRM) systems, business process management systems but also Excel spreadsheets.

In comparison, a data warehouse is designed to centralize and store large amounts of data from multiple databases and make them easier to analyse. A data warehouse uses an automated process called ETL and which stands for extracting, transforming, and loading data into a data warehouse and brings a substantial advantage when it comes analysing data without the technical expertise.

 
2.2.3 Data Warehouse vs. Data Lake

Data lakes and data warehouses are both used to store, manage, and analyze data. They complement each other and support different use cases even though they have some overlaps.

A data warehouse is a repository that stores structured, cleaned and organized data in order to serve a specific business purpose. In comparison, a data lake stores large volumes of structured, semi-structured, and unstructured data in its native format, and processes it later on-demand.

The need for a data warehouse becomes crucial when an organization has a high level of data diversity and analytical requirements and want to make better decisions in less time. In this scenario, the data warehouse will do the analytic work based on the best data available to ensure decisions can be made faster.

 

2.2.4 How we have been processing data

Data processing occurs when data is collected and translated into usable information. Usually performed by a data scientist or team of data scientists, it is important for data processing to be done correctly as not to negatively affect the end product, or data output.

Data processing starts with data in its raw form and converts it into a more readable format (graphs, documents, etc.), giving it the form and context necessary to be interpreted by computers and utilized by employees throughout an organization.

 
2.2.5 Concept – William Inman

The corporate information factory inmon publishes “Building the Data Warehouse” – 1990

Inmon updates book and defines architecture for collection of disparate sources into detailed, time variant data store.

The top-down approach focuses on the enterprise, Implementation is organizational.

  
2.2.6 Concept – Ralph Kimball

The data warehouse bus Kimball publishes “The Data Warehouse Toolkit” – 1996

Kimball updates book and defines multiple databases called data marts that are organized by business processes but use enterprise standard data bus.

The bottom-up approach focuses on data marts. Implementation is either process or departmental focus. Implementation is quicker.

  

2.2.7 Big Data Architectures

Big data architecture is the layout that underpins big data systems. It can refer to either its theoretical and/or physical makeup. Big data architecture is intended to be structured in such a way as to allow for the optimum ingestion, processing, and analysis of data.

 E.g.: Kappa, Lambda

 

2.2.8 Inmon’s DW Architecture

 



2.2.9 Bus—Matrix

The Enterprise Bus Matrix is a Data Warehouse planning tool and model created by Ralph Kimball and is part of the Data Warehouse Bus Architecture. The Matrix is the logical definition of one of the core concepts of Kimball’s approach to Dimensional Modelling – Conformed dimensions.

1.Record the business requirement

2.Identify dimensions with their attributes

3.Identify facts with their measures

4.Identify the Schema required: Star or Snowflake

5.Identify dimension tables required

6.Identify relationships between dimension tables

7.Identify facts tables required

8.Identify relationships between dimension and fact tables

 

2.2.10 Star and Snowflake Schema

Every business desire to grow, and data is a key component for business growth. Due to this, organizations are placing so much value on data. There are different sources of data, including data generated from the organizational departments as well as data obtained from external sources. In most cases, organizations end up gathering huge volumes of data, running up to petabytes in size. So, where do organizations store such data? They use Data Warehouses. A Data Warehouse can store huge volumes of structured and unstructured data from multiple data sources.

 

Just like with a regular database, the Schema is very important to a Data Warehouse since it gives the logical description of the data objects. It’s the Schema that describes how different objects in the Data Warehouse are related. Star and Snowflake Schema are the two types of Data Warehouse Schemas.

 
3. Conclusion

Modern Data Warehouse provides an integrated machine-learning (ML) solution that enables customer insights and business intelligence to make faster business decisions. It is easy to get started with self-service capabilities, and Oracle's data warehouse automation eliminates management complexity to simplify analysis.

 

Comments

Popular posts from this blog

Machine learning and artificial intelligence

Test Tools

SALESFORCE