Data Architecture Guide: Introducing the Unified Star Schema

A few months after my prior blog post on the OneBigTableapproach I learned of a new way to design Data Warehouses that is somewhat similar. It is a new way to design star schemas and is known as the Unified Star Schema (USS). It is recommended by the grandfather of Data Warehousing Bill Inmon in a book that just came out in the past year. Once I heard about it I immediately went out and bought The Unified Star Schema: An Agile and Resilient Approach to Data Warehouse and Analytics Design, by Bill Inmon and Francesco Puppini and read & underlined all 288 pages. I agree this new approach is better than the Kimball multi-star approach. USS is the best approach for building a ROLAP data warehouse using row-oriented tables. My OneBigTable approach is the best way to build a data warehouse for a columnar share-nothing hyper-converged DBMS. (The book portrays the OBT in passing in Figure 14.15.)

The core of the approach is the construction of a single “Puppini Bridge” table that works like a single super-fact table of only keys and transforms all other tables to fact/dimensions that users can blend as they see fit. Actually, the distinction between fact and dimension is gone. The Puppini Bridge table has no attributes and no metrics, only primary & foreign keys. It joins to all other tables like a star schema with no snowflaking. It is built quickly by performing a UNION ALL of the primary key (PK) & foreign keys (FK) of every table. This transforms an OLTP normalized database into a performant OLAP star data warehouse by the addition of just one table and no other ETL.

USS works well with BI tools that use “associations” in relationships to solve the “Fan Trap”, such as Microsoft’s Power BI, QlikView, and Tableau (starting with version 2020.2), but not so well with BI tools that use SQL JOIN, like Tibco Spotfire and Business Objects. (A Fan trap is when metrics in a parent table gets duplicated against the child tables causing incorrect aggregations.)

The classic star schema approach causes a problem known as the “chasm trap” when two or more fact tables are used in a BI tool. The display of data from multiple fact tables results in a cartesian join duplication. The Puppini Bridge & OneBigTable overcomes this situation, letting the details show in a logical list of one fact followed by the other fact.

The book left out one point that I think is essential to building the Puppini Bridge: One should include not only each record’s PK & FKs but also the FKs to all ancestor tables (parent of parent, etc.).

I also learned an important point when building the OneBigTable: attributes should be duplicated down but metrics should only show on the record that provided the PK.

Most exciting, I believe USS allows the rapid generation of code that creates the solution if the referential constraints are all defined in the database. This also makes the solution more maintainable and robust. I plan to share this technique in a future post.

BTW, a company named Incorta pitches a data warehouse that they say is the “Death of the Star Schema.” The USS book mentions it in Figure 14.6 and it appears to have nothing in common with USS or OBT.

Here are some other reviews:

CONCLUSION: USS & OBT is better than classic star design in usability, maintenance and development time. Muti-star solutions should be replaced with USS in row-oriented databases and OBT should be considered for MPP columnar databases like AWS Redshift & Azure Synapse.

At this point no one has a valid reason to stick with classic star schemas.

Data Architecture Guide

Wednesday, May 11, 2022

Introducing the Unified Star Schema

No comments:

Post a Comment