Building A MODERN Data Warehouse SOLUTION
Building A MODERN Data Warehouse SOLUTION
1.
INTRODUCTION
1.1 Journal Overview
This Journal is mainly focused about building
a modern data warehouse solution. This journal is a reflective write-up on what
the webinar was about, what was learnt from it and how the knowledge gained
from it can be applied to my future career.
1.2 List of Acronyms and Abbreviations
OLTP: Online Transactional Processing
2. BUILDING A
MODERN DATA WAREHOUSE SOLUTION
2.1 Summary
A centralized store of business data for
reporting and analysis that typically. Contains large volume of historical data
is optimized for querying, as opposed to inserting or updating data, is
incrementally loaded with new business data at regular intervals, provides the
basis for enterprise BI solutions.
2.2 Learning Outcome
Data warehouses are centralized data storage
systems that allow your business to integrate data from multiple applications
and sources into one location. This provides an environment that is designed
for decision support, analytics reporting, and data mining. When you isolate
and optimize your data, you can manage it without impacting primary business
processes.
In general, the benefits of data warehousing are all based on one central premise: warehousing solves the ongoing problem of analyzing separate data and converting it into actionable information you can use. Warehousing also allows you to process large amounts of complex data in an efficient way. When you successfully implement a data warehouse system, it’s possible to access the benefits associated with the practice— the very benefits that are making data warehousing a common practice for many businesses today.
2.2.1
What is a Data Warehouse?
A centralized store of business data for
reporting and analysis that typically. Contains large volume of historical data
is optimized for querying, as opposed to inserting or updating data, is
incrementally loaded with new business data at regular intervals, provides the
basis for enterprise BI solutions.
Data Warehouse is subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process(inmon,1993).
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise.
The data stored in the warehouse is uploaded
from the operational systems (such as marketing or sales). The data may pass
through an operational data store and may require data cleansing for additional
operations to ensure data quality before it is used in the DW for reporting.
2.2.2 Data Warehouse vs. Database
It is easy to confuse both terms as a data
warehouse and a database share some similarities. A database is a key
component of a data warehouse and can be defined as a storage system where data
can be quickly recorded and retrieved. A database collects data for
transactional purposes, application support but also to enable reporting.
Common databases that are commonly used in the enterprise include ERP, SQL
databases, Customer Relationship Management (CRM) systems, business process
management systems but also Excel spreadsheets.
In comparison, a data warehouse is designed to
centralize and store large amounts of data from multiple databases and make
them easier to analyse. A data warehouse uses an automated process called ETL and
which stands for extracting, transforming, and loading data into a data
warehouse and brings a substantial advantage when it comes analysing data
without the technical expertise.
2.2.3 Data Warehouse vs. Data Lake
Data lakes and data warehouses are both used
to store, manage, and analyze data. They complement each other and support
different use cases even though they have some overlaps.
A data warehouse is a repository that stores
structured, cleaned and organized data in order to serve a specific business
purpose. In comparison, a data lake stores large volumes of structured,
semi-structured, and unstructured data in its native format, and processes it
later on-demand.
The need for a data warehouse becomes crucial when an organization has a high level of data diversity and analytical requirements and want to make better decisions in less time. In this scenario, the data warehouse will do the analytic work based on the best data available to ensure decisions can be made faster.
2.2.4 How we have been processing data
Data processing occurs when data is collected
and translated into usable information. Usually performed by a data scientist
or team of data scientists, it is important for data processing to be done correctly
as not to negatively affect the end product, or data output.
Data processing starts with data in its raw
form and converts it into a more readable format (graphs, documents, etc.),
giving it the form and context necessary to be interpreted by computers and
utilized by employees throughout an organization.
2.2.5 Concept – William Inman
The corporate information factory inmon
publishes “Building the Data Warehouse” – 1990
Inmon updates book and defines architecture
for collection of disparate sources into detailed, time variant data store.
The top-down approach focuses on the
enterprise, Implementation is organizational.
2.2.6 Concept – Ralph Kimball
The data warehouse bus Kimball publishes “The
Data Warehouse Toolkit” – 1996
Kimball updates book and defines multiple
databases called data marts that are organized by business processes but use
enterprise standard data bus.
The bottom-up approach focuses on data marts.
Implementation is either process or departmental focus. Implementation is quicker.
2.2.7 Big Data Architectures
Big data architecture is the layout that
underpins big data systems. It can refer to either its theoretical and/or
physical makeup. Big data architecture is intended to be structured in such a
way as to allow for the optimum ingestion, processing, and analysis of data.
2.2.8 Inmon’s DW Architecture
2.2.9 Bus—Matrix
The Enterprise Bus Matrix is a Data Warehouse planning tool and model created by Ralph Kimball and is part of the Data Warehouse Bus Architecture. The Matrix is the logical definition of one of the core concepts of Kimball’s approach to Dimensional Modelling – Conformed dimensions.
1.Record the business requirement
2.Identify dimensions with their attributes
3.Identify facts with their measures
4.Identify the Schema required: Star or
Snowflake
5.Identify dimension tables required
6.Identify relationships between dimension
tables
7.Identify facts tables required
8.Identify relationships between dimension and
fact tables
2.2.10 Star and Snowflake Schema
Just like with a regular database, the Schema
is very important to a Data Warehouse since it gives the logical description of
the data objects. It’s the Schema that describes how different objects in the
Data Warehouse are related. Star and Snowflake Schema are the two types of Data
Warehouse Schemas.
3. Conclusion
Modern Data Warehouse provides an
integrated machine-learning (ML) solution that enables customer insights and
business intelligence to make faster business decisions. It is easy to get
started with self-service capabilities, and Oracle's data
warehouse automation eliminates management complexity to simplify
analysis.
Comments
Post a Comment