Last week it was announced that Power BI datasets have been renamed: they are now semantic models. You can read the announcement blog post here and see the change in the Fabric/Power BI UI already.
The name change proved to be surprisingly uncontroversial. Of course it’s very disruptive – trust me, I know, I have around 500 blog posts that I need to do a search-and-replace on at some point – so I have a lot of sympathy for people with books or training courses that need updating or who are getting calls from confused end users who are wondering where their datasets have gone. But there was a general consensus that the change was the right thing to do:
When Marco approves of a change the whole Fabric team breathes a sigh of relief. The term “dataset” is too generic and too confusing for new developers; “semantic model” is a lot more specific and descriptive. Kurt Buhler has just written a very detailed post on what semantic models are. What else is there to say?
A name is often not just a name, it’s a statement of intent. While I don’t want you to read too much into the name change (Christian Wade does a good job of explaining how and why the name “semantic model” was chosen at the start of this Ignite session) and it’s always a mistake to think that we at Microsoft have some elaborate secret master plan for our products’ future development, people are nevertheless asking what the name “semantic model” signifies:
…and when someone as senior as Amir Netz asks me to do something, it’s probably a good idea for me to oblige 😉:
Power BI as a semantic layer is certainly one of my favourite topics: I wrote a very popular post on it last year. Even if it isn’t immediately apparent, Power BI is a semantic layer, a semantic layer made up of one or more semantic models. A lot of things (not just names) have changed in the world of Microsoft BI since I wrote that post which, in my opinion, only strengthen my arguments.
However you define the term “semantic layer”, reusability of data and business logic is a key feature. We all know that Bad Things happen to companies like the one discussed here on Reddit which create one semantic model per report: source systems are overloaded by the number of refreshes, the burden of maintenance becomes overwhelming and there are multiple versions of the truth. Creating the minimum number of semantic models necessary and using them as the source for your reports has always been a best practice in Power BI and the new name will, I hope, prompt developers to think about doing this more.
Would Power BI be better if it forced all developers to build their semantic layer upfront? No, I don’t think so. I believe a good BI tool gives you the flexibility to use it however you like so long as it can be used in the right way if you want – where “right” will mean different things for different organisations. If Power BI was more prescriptive and made you to do the “right” thing up front then I doubt the company discussed on Reddit in the link above would be more successful; instead it would add so many barriers to getting started they probably wouldn’t be using Power BI in the first place, they would be using Excel or some other tool in an equally inefficient way. What’s more if Power BI chose one “right” way of doing things it might exclude other “right” ways doing things, which would alienate the adherents of those other ways and be commercially damaging.
Fabric provides several new opportunities for reuse, with shortcuts and Direct Lake mode as the most obvious examples. Think about the number of Import mode semantic models you have in your organisation: each one will have a Date dimension table for sure, and there will certainly be a lot of dimension tables and probably a few fact tables duplicated across them. How much time and CPU is spent refreshing each of these tables? How many different versions of these tables are there, each one refreshed at different times? In Fabric you can maintain a single physical copy of your shared dimension tables and fact tables in Delta format in a Lakehouse, load data into them once, then reuse them in as many semantic models as you want via shortcuts. With Direct Lake mode no further refresh is needed, so each semantic model reuses the same copy of each dimension table and fact table and shows exactly the same data, saving time and compute and making them all consistent with each other. You can even now sync the tables in your Import mode semantic models to OneLake, making this pattern easier to adopt for existing Power BI users.
Another cause of data duplication in the past has been the different toolsets used by BI professionals and data scientists. Data is modelled and loaded for Power BI reports and business logic coded in DAX by the BI professionals, while in parallel data scientists have taken their own copies of the raw data, modelled it differently and implemented business logic in their own way in languages like Python. As Sandeep Pawar points out here, Semantic Link in Fabric now allows data scientists to query semantic models in SQL or in code, again promoting reuse and consistency.
Finally, looking ahead, I think the new Power BI Desktop Developer mode, Git integration and Tabular Model Definition Language (TMDL) will provide new ways of sharing and reusing business logic such as measure definitions between multiple semantic models. Not all the features necessary to do this are in Power BI/Fabric yet but when they do appear I’m sure we’ll see the community coming up with new patterns (perhaps successors to Michael Kovalsky’s Master Model technique) and external tools to support them.
In conclusion, as Power BI evolves into a part of something bigger with Fabric, then the new features I’ve mentioned here make it an even more mature semantic layer. Changing the name of datasets to semantic models is a way of highlighting this.