Home

Data Warehousing: The What and Why

Introduction

In my opinion, and in the opinion of a lot of industry professionals, data literacy is important for every employee. With the rate at which data is being generated by companies and organizations, people who are able to leverage that data will be increasingly valuable to those companies and organizations. In order to be able to leverage the terabytes of data being generated daily, one must understand how that data is stored and accessed. Data generated by companies is in almost countless forms. The data may contain inconsistencies, biases, false positives, etc., and all of these need to be addressed prior to analysis. Garbage data leads to garbage insights which leads to garbage business decisions.

ETL (Extract, Transform, Load)

Data in data warehouses is drawn from a multitude of sources. As aforementioned, this data is not always in the format you need in the database. Therefore, in the step between databases and data warehouses, an ETL process has to be undergone. This can be done via a variety of open-source and for-profit software. The data is drawn from the database, manipulated or transformed by the software, and then loaded into the data warehouse. This process is vital to the foundation of the data warehouse.

Data Warehouses vs Relational Databases

Relational Databases:

Relational Databases are databases laid out in a relational format. They are broken into very, very normalized tables. These tables are designed to contain the most basic form of the data that an organization obtains. Any given company could have hundreds of these tables, each storing data such as inventory, personnel, and credit card information. All those tables would house data only related to those specific categories. This is what is known as "database normalization."

Data Warehouses:

Data Warehouses, conversely, seek to take data stored in normalized, relational database tables, and de-normalize them back into one central location. This now-structured data is kept for an extended period of time to give the business intelligence analysts the ability to perform historical analysis on the data stored here. Instead of having to query into a grundle of different tables, the BI user can just simply query the data warehouse, which is the central store of all the needed data, transactional or not.

Conclusion:

Data warehouses are not important to understand on a technical level unless you are considering a career in the creation and maintenance of data warehouses in organizations. However, knowing what one is and how it is useful can be very beneficial to the average employee who wants to a bit of analysis on their organization's data.