Chris Webb's BI Blog: Semantic Layers

I’ve been working with Microsoft BI tools for 28 years now and for all that time Microsoft has been consistent in its belief that semantic models are a good thing. Fashions have changed and at different times the wider BI industry has agreed and disagreed with this belief; right now, semantic models are cool again because everyone has realised how important they are for AI. As a result, some of Microsoft’s partners and competitors (and sometimes it’s not clear which is which) have invested in building their own semantic models and/or metrics stores, some of which don’t work at all with Power BI, some of which only work with significant limitations, and a very small number which are fully supported and work with only minor limitations. This naturally raises the question of whether Power BI will ever work properly with any or all of them. The answer is no, and in this blog post I’ll explain why.

The first thing to make clear is that the reasons why some semantic models work well with Power BI and others don’t are purely technical. It is not because Microsoft has some grand plan to stifle competing BI tools. If you look at Fabric as a whole, you’ll see that Microsoft works closely with Databricks, Snowflake, dbt and many other companies to ensure that it integrates well with them and gives customers the option to work with whichever other tools they want to use. In Power BI there are connectors to a wide range of data sources, not just Microsoft ones. Over the last year the Power BI team has spoken to all major vendors of third-party semantic models about integration with Power BI and it has been clear about what is and isn’t technically feasible. The door remains open for future collaboration and Microsoft respects the motives of these other vendors, in particular those who are developing open standards.

To understand the technical issues, let’s look at the architecture of a simple Power BI solution that uses an Import mode semantic model – as the vast majority of Power BI solutions do:

In this case the data from the data sources is copied into the Power BI semantic model, which also contains information on how the different tables of data should be joined to each other, measures (defined in the DAX language) describing how data should be aggregated and how more complex business calculations should be performed, which columns are visible and which ones are hidden, and a lot more. When the Power BI report is rendered it sends queries, again in the DAX language, to the semantic model to get the data it needs for each visual.

How could a third-party semantic model be used instead here? Power BI reports connect to Power BI semantic models using the XMLA protocol, and that means that Power BI reports can also connect to older Azure Analysis Services and SQL Server Analysis Services semantic models. Some vendors have come up with a solution whereby they implement support for XMLA and tell their customers to connect to their semantic models using the SQL Server Analysis Services connector. This works up to a point but, as you can imagine, using the SQL Server Analysis Services connector to connect to something that isn’t SQL Server Analysis Services is not supported and not wholly reliable.

It’s worth noting that using a third-party semantic model as a data source for an Import mode Power BI semantic model is not an option either: if Power BI imports already-calculated metrics such as percentage shares or time intelligence calculations, it cannot aggregate them further and still get the correct result. Most metrics need to be calculated after the base data has been aggregated to work properly.
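To make the aggregation problem concrete, here is a minimal Python sketch. Everything in it is made up for illustration (the sales rows, the function names, the idea that the upstream model exports "share of region total"); it is not how any particular vendor's product behaves. It only shows that a percentage share calculated at one granularity cannot be summed or averaged into the correct share at another.

```python
from collections import defaultdict

# Hypothetical base data: (region, product, sales amount)
sales = [
    ("North", "A", 60.0),
    ("North", "B", 40.0),
    ("South", "A", 30.0),
    ("South", "B", 270.0),
]

def precomputed_shares(rows):
    """Simulate importing a metric that was already calculated upstream:
    each product's share of its own region's total."""
    region_totals = defaultdict(float)
    for region, _product, amount in rows:
        region_totals[region] += amount
    return [
        (region, product, amount / region_totals[region])
        for region, product, amount in rows
    ]

def share_from_base_data(rows, product):
    """Aggregate the base data first, then calculate the share."""
    product_total = sum(a for _r, p, a in rows if p == product)
    grand_total = sum(a for _r, _p, a in rows)
    return product_total / grand_total

imported = precomputed_shares(sales)
shares_a = [share for _r, p, share in imported if p == "A"]

summed = sum(shares_a)                    # re-aggregating the imported metric with SUM
averaged = summed / len(shares_a)         # ...or with AVERAGE
correct = share_from_base_data(sales, "A")

print(f"sum of imported shares:     {summed:.3f}")
print(f"average of imported shares: {averaged:.3f}")
print(f"share from base data:       {correct:.3f}")
```

Neither the sum (about 0.700) nor the average (about 0.350) of the imported shares matches the share calculated from the base data (0.225), because the two regions have different totals and that information was lost when the ratio was calculated upstream.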

There are two other storage modes available for Power BI semantic models: Direct Lake and DirectQuery. Direct Lake only works with data stored in, or which can be reached via a shortcut from, Fabric OneLake so we don’t need to discuss it here. In DirectQuery mode the Power BI semantic model doesn’t store any data and instead, when it is queried, it generates SQL queries to get the data it needs from a data source on demand.
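As a rough illustration of the DirectQuery idea, the sketch below uses Python's built-in sqlite3 module as a stand-in data source. The table names, the build_query helper and the shape of the SQL are all assumptions made for this example; the SQL that Power BI really generates is more complex. The point is only that the model holds no data itself: it holds the join and aggregation logic and turns each request into a SQL query at query time.

```python
import sqlite3

# Stand-in data source with a tiny star schema (hypothetical tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Bikes'), (3, 'Helmets');
INSERT INTO fact_sales VALUES (1, 100), (2, 50), (3, 25), (3, 5);
""")

def build_query(group_by_column, measure_sql):
    """Hypothetical helper: the 'model' knows how the fact table joins to
    the dimension table and how the measure aggregates, and generates the
    SQL on demand instead of storing any data."""
    return (
        f"SELECT d.{group_by_column}, {measure_sql} AS value "
        "FROM fact_sales AS f "
        "JOIN dim_product AS d ON d.product_id = f.product_id "
        f"GROUP BY d.{group_by_column} "
        f"ORDER BY d.{group_by_column}"
    )

# A visual asking for "sales by category" becomes one generated query.
sql = build_query("category", "SUM(f.amount)")
result = conn.execute(sql).fetchall()

print(sql)
print(result)
```

Notice that the generated query both joins a dimension table to a fact table and applies the aggregation itself, which are exactly the two assumptions that cause trouble when the "data source" is another semantic model.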

Other vendors of third-party semantic models have taken the approach of suggesting the use of Power BI in DirectQuery mode and having it run SQL against their semantic model. Apart from the fact that DirectQuery mode is usually slower and less cost-effective than Import mode or Direct Lake mode, your first reaction to this would probably be that putting one semantic model on top of another semantic model doesn’t make any architectural sense and you’d be right. There are several serious problems that emerge when you try to use Power BI in this way.

For example, a Power BI semantic model assumes that you have your data modelled as a star schema and that it will be able to generate SQL that joins dimension tables to fact tables. Not all third-party semantic models support something as basic as this yet. What’s more, a Power BI semantic model assumes that it will be where all metrics are calculated, which means that despite some interesting workarounds by third-party vendors (such as making the SQL SUM() function not actually sum up values) you can never be sure that you’ll get the correct values for a metric defined in a third-party semantic model, for example for subtotals or grand totals. There are a lot of other, similar problems that the Power BI team have made these third-party semantic model vendors aware of. These problems are not specific to Power BI semantic models either: no other semantic model would work well with another semantic model as its source.

If you can’t use Power BI semantic models on top of third-party semantic models, is it an option to synchronise calculations defined in a third-party semantic model to a Power BI semantic model? Yes, that is certainly possible and supported, and some of our partners (such as our friends at Tabular Editor) have already started down this path. DAX is a very rich language for defining metrics and Microsoft has invested a lot recently in making changes to Power BI semantic models programmatically as easy as possible. Without a doubt any metrics defined in a third-party semantic model can be reproduced in DAX, although since DAX is a much better fit for defining metrics than SQL you’ll probably find that some of the metrics you need can only be defined in DAX. In which case, rather than defining some of your metrics in a third-party semantic model and some in a Power BI semantic model, why not define all of them in your Power BI semantic model?

The final point to make is that Power BI semantic models can be used with a wide range of BI tools, not just Power BI reports. Apart from Microsoft tools like Excel and Fabric Paginated Reports, Tableau and several other non-Microsoft tools that you might think of as competitors to Power BI can also be used as a front-end for Power BI semantic models and this is supported. There is nothing stopping other BI tools from implementing connectivity to Power BI semantic models in the future. In Fabric you can even query a Power BI semantic model in SQL and extract data into a Pandas Dataframe in Python using the Semantic Link library. Anyone arguing that Power BI semantic models are somehow not “open” is wrong.

I’ll be honest: I think a lot of the reason why organisations that already use Power BI extensively consider third-party semantic models is because some people – not the Power BI users themselves, often people from a data engineering or database background – think of Power BI as just a visualisation tool and don’t realise that it also has the most mature, capable, widely used semantic model available in the market today. It is designed for both self-service and enterprise BI scenarios. Microsoft has no plans to make Power BI’s front end work properly with anything other than its own semantic models because that would be a huge amount of work with few benefits to customers: these third-party semantic models all behave differently and are at different levels of maturity, so any changes made in Power BI to accommodate them would risk breaking existing functionality or limiting the use of advanced features. 35 million users view Power BI reports every month and those users query 20 million Power BI semantic models. Microsoft’s strategy is to continue to invest in and strengthen Power BI semantic models for those customers. So if Power BI is how you want your end users to consume data, then Power BI semantic models, not any third-party semantic model or metrics store, are the right place to store your metrics definitions and your business logic.

Last week it was announced that Power BI datasets have been renamed: they are now semantic models. You can read the announcement blog post here and see the change in the Fabric/Power BI UI already.

The name change proved to be surprisingly uncontroversial. Of course it’s very disruptive – trust me, I know, I have around 500 blog posts that I need to do a search-and-replace on at some point – so I have a lot of sympathy for people with books or training courses that need updating or who are getting calls from confused end users who are wondering where their datasets have gone. But there was a general consensus that the change was the right thing to do:

When Marco approves of a change the whole Fabric team breathes a sigh of relief. The term “dataset” is too generic and too confusing for new developers; “semantic model” is a lot more specific and descriptive. Kurt Buhler has just written a very detailed post on what semantic models are. What else is there to say?

A name is often not just a name, it’s a statement of intent. While I don’t want you to read too much into the name change (Christian Wade does a good job of explaining how and why the name “semantic model” was chosen at the start of this Ignite session) and it’s always a mistake to think that we at Microsoft have some elaborate secret master plan for our products’ future development, people are nevertheless asking what the name “semantic model” signifies:

…and when someone as senior as Amir Netz asks me to do something, it’s probably a good idea for me to oblige 😉:

Power BI as a semantic layer is certainly one of my favourite topics: I wrote a very popular post on it last year. Even if it isn’t immediately apparent, Power BI is a semantic layer, a semantic layer made up of one or more semantic models. A lot of things (not just names) have changed in the world of Microsoft BI since I wrote that post which, in my opinion, only strengthen my arguments.

However you define the term “semantic layer”, reusability of data and business logic is a key feature. We all know that Bad Things happen to companies like the one discussed here on Reddit which create one semantic model per report: source systems are overloaded by the number of refreshes, the burden of maintenance becomes overwhelming and there are multiple versions of the truth. Creating the minimum number of semantic models necessary and using them as the source for your reports has always been a best practice in Power BI and the new name will, I hope, prompt developers to think about doing this more.

Would Power BI be better if it forced all developers to build their semantic layer upfront? No, I don’t think so. I believe a good BI tool gives you the flexibility to use it however you like so long as it can be used in the right way if you want – where “right” will mean different things for different organisations. If Power BI was more prescriptive and made you do the “right” thing up front then I doubt the company discussed on Reddit in the link above would be more successful; instead it would add so many barriers to getting started that they probably wouldn’t be using Power BI in the first place, they would be using Excel or some other tool in an equally inefficient way. What’s more, if Power BI chose one “right” way of doing things it might exclude other “right” ways of doing things, which would alienate the adherents of those other ways and be commercially damaging.

Fabric provides several new opportunities for reuse, with shortcuts and Direct Lake mode as the most obvious examples. Think about the number of Import mode semantic models you have in your organisation: each one will have a Date dimension table for sure, and there will certainly be a lot of dimension tables and probably a few fact tables duplicated across them. How much time and CPU is spent refreshing each of these tables? How many different versions of these tables are there, each one refreshed at different times? In Fabric you can maintain a single physical copy of your shared dimension tables and fact tables in Delta format in a Lakehouse, load data into them once, then reuse them in as many semantic models as you want via shortcuts. With Direct Lake mode no further refresh is needed, so each semantic model reuses the same copy of each dimension table and fact table and shows exactly the same data, saving time and compute and making them all consistent with each other. You can even now sync the tables in your Import mode semantic models to OneLake, making this pattern easier to adopt for existing Power BI users.

Another cause of data duplication in the past has been the different toolsets used by BI professionals and data scientists. Data is modelled and loaded for Power BI reports and business logic coded in DAX by the BI professionals, while in parallel data scientists have taken their own copies of the raw data, modelled it differently and implemented business logic in their own way in languages like Python. As Sandeep Pawar points out here, Semantic Link in Fabric now allows data scientists to query semantic models in SQL or in code, again promoting reuse and consistency.

Finally, looking ahead, I think the new Power BI Desktop Developer mode, Git integration and Tabular Model Definition Language (TMDL) will provide new ways of sharing and reusing business logic such as measure definitions between multiple semantic models. Not all the features necessary to do this are in Power BI/Fabric yet but when they do appear I’m sure we’ll see the community coming up with new patterns (perhaps successors to Michael Kovalsky’s Master Model technique) and external tools to support them.

In conclusion, as Power BI evolves into a part of something bigger with Fabric, the new features I’ve mentioned here make it an even more mature semantic layer. Changing the name of datasets to semantic models is a way of highlighting this.

Category: Semantic Layers

Power BI And Support For Third Party Semantic Models

Thoughts On Power BI Datasets Being Renamed To Semantic Models
