Microsoft’s Power BI is undergoing a pivotal transformation. Christian Wade, Group Product Manager, announced in November 2023 that Power BI datasets will now be known as semantic models. This change reflects how Microsoft perceives the future of its business intelligence platform.
Power BI has been one of the premier business intelligence platforms since it was first released in 2015 as a standalone product, after evolving from the Power Pivot feature set of Microsoft Excel. However, the data landscape has changed dramatically during this short period of time, with the growing popularity of data science and the proliferation of data lake architectures.
We’ll break down what a Semantic Model is, and where we think Microsoft is going with their recent changes.
Let’s jump in!
What is a Data Model?
Prior to the name change, Power BI used Data Models. In the world of databases, particularly those structured using SQL (Structured Query Language), a data model is the blueprint that defines how data is stored, organized, and manipulated in a database.
A structured SQL database is like a well-organized filing cabinet. Each drawer in the cabinet is a table. In each table, data is stored in rows and columns, similar to a spreadsheet. Each row or record represents a single, unique piece of data, like the details of a customer or a product. The columns represent the different attributes of that data, such as a customer’s name, address, or a product’s price.
One of the strengths of a structured SQL database is the ability to maintain relationships between tables. For example, customer addresses can be stored in a different table than sales information. One benefit is that if a customer’s address changes, you don’t have to go back and update every individual row of sales data in the sales table.
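To make the address example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample values are hypothetical and only meant to illustrate the idea, not any particular production schema.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    CREATE TABLE sales (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada Smith', '12 Old Road')")
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(1, 40.0), (1, 25.5), (1, 10.0)],
)

# The customer moves: one row changes and no sales rows are touched.
conn.execute(
    "UPDATE customers SET address = '99 New Street' WHERE customer_id = 1"
)


def sales_with_addresses(connection):
    """Join each sale to the customer's current address."""
    return connection.execute("""
        SELECT s.sale_id, c.name, c.address, s.amount
        FROM sales AS s
        JOIN customers AS c ON c.customer_id = s.customer_id
        ORDER BY s.sale_id
    """).fetchall()


rows = sales_with_addresses(conn)
for row in rows:
    print(row)
```

Because the address lives only in the customers table, every joined sales row picks up the new address after a single update.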
In a SQL data model, data is often stored in the same location where it’s shaped, summarized, and structured.
What is a Semantic Model?
The idea of a semantic layer is a more modern interpretation of a data model, with one key difference: data storage can be separated from the actual structuring of the data. We see this most commonly in data lakes, where people are able to dump large amounts of structured or unstructured data into folders of cheap cloud storage. By itself, it’s kind of like saving all of your files on the desktop of your computer.
Then a second step comes in that interprets the files that are saved, bucketizes them, relates them, and allows you to interact with the data through machine learning models or SQL queries, even though there is no structured SQL database behind the scenes.
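The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a temporary folder stands in for cloud storage, the file names and fields are made up, and an in-memory SQLite engine stands in for the on-demand query service. The engine is created per query and thrown away, so the raw files remain the only stored copy of the data.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path

# Stand-in for a data lake folder: raw files dumped with no shared schema.
lake = Path(tempfile.mkdtemp())
(lake / "customers.csv").write_text(
    "customer_id,name\n1,Ada Smith\n2,Bo Chen\n", encoding="utf-8"
)
(lake / "sales.jsonl").write_text(
    '{"customer_id": 1, "amount": 40.0}\n'
    '{"customer_id": 2, "amount": 15.0}\n'
    '{"customer_id": 1, "amount": 25.5}\n',
    encoding="utf-8",
)


def read_customers(folder):
    """Interpret the CSV file, assigning column types at read time."""
    with open(folder / "customers.csv", newline="", encoding="utf-8") as f:
        return [(int(r["customer_id"]), r["name"]) for r in csv.DictReader(f)]


def read_sales(folder):
    """Interpret the JSON Lines file, one sale per line."""
    with open(folder / "sales.jsonl", encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [(rec["customer_id"], rec["amount"]) for rec in records]


def query_lake(folder, sql):
    """Structure the files on demand, run one query, discard the engine."""
    engine = sqlite3.connect(":memory:")  # temporary; nothing is persisted
    try:
        engine.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
        engine.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
        engine.executemany(
            "INSERT INTO customers VALUES (?, ?)", read_customers(folder)
        )
        engine.executemany("INSERT INTO sales VALUES (?, ?)", read_sales(folder))
        return engine.execute(sql).fetchall()
    finally:
        engine.close()


totals = query_lake(lake, """
    SELECT c.name, SUM(s.amount) AS total
    FROM sales AS s
    JOIN customers AS c ON c.customer_id = s.customer_id
    GROUP BY c.name
    ORDER BY c.name
""")
print(totals)  # [('Ada Smith', 65.5), ('Bo Chen', 15.0)]
```

The structure (types, table names, and the relationship between the two files) exists only in the reading step, which is the part a semantic layer plays in this picture.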
Where most people have probably come across the new term recently is in the Power BI Workspace, where datasets have been renamed to semantic models.
Even though the change can be somewhat confusing, there are some advantages to Semantic Models.
- Integration of Diverse Data Sources – Data can be combined from multiple sources, including both structured and unstructured data, providing a more comprehensive view for analysis.
- Scalability and Flexibility – Semantic layers are extremely scalable and flexible. They allow teams to quickly adjust to changing business needs without requiring a complete overhaul of the underlying data infrastructure.
- Reduced Complexity in Data Management – By abstracting the complexities of the data structure, a semantic layer simplifies data management and storage.
- Reduced Costs – Structured and unstructured data is often saved on inexpensive blob storage. As the amount of data being captured continues to grow, the costs of storing it become increasingly important to manage.
As these technologies continue to improve and become more accessible, including through the introduction of Microsoft Fabric, we expect that these benefits will become the norm.
Data Model vs Semantic Layer
For most Power BI developers, the impact of the name change from data model to semantic model is entirely cosmetic. Power BI connects to a semantic model the same way it would connect to a data model, and there’s no functional difference. The difference is mainly in the messaging of where the data comes from behind the scenes.
The renaming coincides with the forthcoming general availability of Microsoft Fabric, which expands the Power BI platform into an all-in-one place where developers and engineers can integrate data, build machine learning models, manage structured and unstructured data sources, and much more.
Fabric is becoming a one-stop location that consolidates a number of services that were previously only accessible by logging into different systems or the Azure portal.
Microsoft Fabric Explained
Microsoft Fabric is an all-in-one platform for companies to manage the entire data analysis cycle. It combines tools for data integration and engineering with machine learning tools and the Power BI platform. It’s a one-stop interface that takes technologies that were previously only accessible from the Azure Portal and brings them together into a single workspace.
While some of the capabilities are new, many of the services already existed but now they’re all available in one place.
To highlight the change, the following screenshot is from the Power BI Workspace with a Fabric trial enabled. When you click + New on the Power BI Service, the following options show up which range from Data Engineering and Data Science to traditional Power BI actions.
We ultimately think it’s a good thing to put all things data into a single location for ease of use. We imagine that the lines between data engineers, data scientists, and business intelligence analysts will continue to blur over the coming years as tools become easier to use and fewer specialized skill sets are required.
Our Opinion on Semantic Layers vs Data Models
Microsoft sees the future of data storage being based on data lake technology. In a data lake architecture, data is stored in cheap blob storage and structured on demand using serverless SQL pools. This is in contrast with traditional SQL databases, where data is stored in a structured database or data warehouse.
From a data science perspective, having unstructured or semi-structured data easily accessible in a single platform makes building AI and Machine Learning models much easier. The needs for data ingestion into a Machine Learning model can differ quite a bit from what an analyst might need for a Power BI report, but the connection to the data in a data lake could be the same.
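As a rough sketch of that last point, the snippet below takes one set of records and shapes it two ways: a monthly summary an analyst might chart, and per-record numeric features a model might train on. The records, field names, and the choice of one-hot encoding are all hypothetical and only meant to illustrate the contrast.

```python
from collections import defaultdict

# Hypothetical records, as they might arrive from one shared lake connection.
records = [
    {"month": "2023-10", "region": "East", "units": 3, "amount": 120.0},
    {"month": "2023-10", "region": "West", "units": 1, "amount": 35.0},
    {"month": "2023-11", "region": "East", "units": 2, "amount": 80.0},
]


def report_view(rows):
    """Analyst shape: one sales total per month, ready for a chart."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["month"]] += row["amount"]
    return dict(sorted(totals.items()))


def feature_view(rows):
    """Model shape: one numeric feature vector per record, plus a target."""
    regions = sorted({row["region"] for row in rows})
    features, targets = [], []
    for row in rows:
        # One-hot encode the region so every feature is numeric.
        one_hot = [1.0 if row["region"] == r else 0.0 for r in regions]
        features.append([float(row["units"])] + one_hot)
        targets.append(row["amount"])
    return features, targets


monthly = report_view(records)
X, y = feature_view(records)
print(monthly)
print(X, y)
```

Both functions read the same source; only the shaping step differs, which is why a shared connection to the lake can serve both audiences.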
By combining multiple services into Microsoft Fabric and expanding the scope of what business users can do on a single platform, the concept of a Semantic Layer over a data model makes a lot more sense.
Choosing the Right Infrastructure for Your Business
While technologies like Machine Learning and Databricks have gotten a lot of buzz in recent years, the reality is that many of these technologies are not well suited for deployment at small and mid-sized organizations. A number of platforms exist that make Machine Learning more approachable, including features within Power BI and Microsoft Fabric, but they’re still niche and require an understanding of statistics and models to interpret what the results are telling you.
Data storage and warehousing technologies such as Data Lakes have a number of advantages, such as less expensive data storage compared to more traditional technologies. However, it can be very expensive or difficult to find specialists who understand how to deploy these technologies. On the other hand, database administrators familiar with traditional SQL data models are relatively inexpensive and easy to find. These solutions also scale easily and should be adequate for most small to mid-sized business needs.
So even as Semantic Layers become more common terminology, they will not be terribly relevant for many businesses until the related technologies become more widely used and less expensive to deploy.
The change from data model to semantic layer terminology is mostly symbolic. In the near term there is no functional difference for a Power BI user that is interacting with data sources. Power BI still supports the same methods for connecting directly to, or importing data into the Power BI Service.
The name change does signal how Microsoft sees the future of data: increasingly based on different technologies with a broader range of use cases. Where data may have traditionally come primarily from a structured SQL database, it now more often comes from several different data sources and has applications in AI and Machine Learning that can differ greatly from more traditional reporting.