Different Ways To Replace Blanks With Zeros In DAX

My post from a few months ago about the dangers of DAX measures that never return blank attracted quite a lot of attention – this is a hot topic on the forums and adding zeros to measures is a common cause of memory errors in Power BI. In that post, though, I didn’t talk about the best way to replace blanks with zeros if you absolutely have no choice but to do so. One of the comments on that post mentioned that visual calculations are an option, something I hadn’t thought about before; now, after conversing with the Gods of DAX (no, not the Italians, I mean Akshai, Marius and Jeffrey!) and doing some testing I can reveal that Visual Calculations can be a good choice sometimes, while more traditional DAX approaches are fine at other times.

Let’s see some examples. I created the following model using the AdventureWorksDW 2017 sample data:

There’s a Product dimension, a Customer dimension and a Date dimension plus a fact table containing sales data. The most important thing to note is that individual customers only buy a few products on a few dates. I also created two measures with the following definitions:

Sales Amount = SUM('FactInternetSales'[SalesAmount])

Mountain-100 Black 38 Sales =
CALCULATE (
    [Sales Amount],
    'DimProduct'[EnglishProductName] = "Mountain-100 Black, 38"
)

The [Sales Amount] measure just sums up the values in the SalesAmount column on the fact table; the [Mountain-100 Black 38 Sales] returns the value of [Sales Amount] for just one product.

Now consider a table visual showing LastName from the DimCustomer table, FullDateAlternateKey from DimDate and the [Sales Amount] and [Mountain-100 Black 38 Sales] measures:

There are a lot of rows here because every combination of LastName and FullDateAlternateKey where there is a value for [Sales Amount] is shown. Connecting Profiler to Power BI Desktop and capturing the Execution Metrics trace event (DAX Studio also shows this now) shows that this query has a peak memory consumption of 2063KB:

{
	"timeStart": "2024-11-03T19:07:26.831Z",
	"timeEnd": "2024-11-03T19:07:26.844Z",

	"durationMs": 13,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 0,
	"totalCpuTimeMs": 0,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 2063,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 502
}

As you can see, the [Mountain-100 Black 38 Sales] column is mostly empty; let’s say you need to replace the blanks in this column with zeros.

Changing the measure definition to add zero to the result of the Calculate() call, as follows:

Mountain-100 Black 38 Sales =
CALCULATE (
    [Sales Amount],
    'DimProduct'[EnglishProductName] = "Mountain-100 Black, 38"
) + 0

…doesn’t do what you want, because now you get a row in the table for every combination of LastName and FullDateAlternateKey, which means the rows that have non-zero values are hard to find:

Instead, only adding zero when there is a value for [Sales Amount], something like this:

Mountain-100 Black 38 Sales =
IF (
    NOT ( ISBLANK ( [Sales Amount] ) ),
    CALCULATE (
        [Sales Amount],
        'DimProduct'[EnglishProductName] = "Mountain-100 Black, 38"
    ) + 0
)

…does the trick. What does memory usage look like? Here are the Execution Metrics:

{
	"timeStart": "2024-11-03T19:17:22.470Z",
	"timeEnd": "2024-11-03T19:17:22.500Z",

	"durationMs": 30,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 31,
	"totalCpuTimeMs": 31,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 3585,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 502
}

Memory usage has increased only slightly, to 3585KB.
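
Incidentally, if you prefer COALESCE to adding zero, the same logic can be written as follows; this is just a stylistic variant of the measure above and behaves identically:

Mountain-100 Black 38 Sales =
IF (
    NOT ( ISBLANK ( [Sales Amount] ) ),
    COALESCE (
        CALCULATE (
            [Sales Amount],
            'DimProduct'[EnglishProductName] = "Mountain-100 Black, 38"
        ),
        0
    )
)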

What about using Visual Calculations instead? Reverting to the original definition of the [Mountain-100 Black 38 Sales] measure and then creating a Visual Calculation like so:

No Blanks = [Mountain-100 Black 38 Sales]+0

…shows that this doesn’t solve the problem because again you get unwanted rows with no sales. Using:

No Blanks = 
IF (
    NOT ( ISBLANK ( [Sales Amount] ) ),
    [Mountain-100 Black 38 Sales] + 0
)

…does solve the problem and you can of course hide the original [Mountain-100 Black 38 Sales] measure column so it doesn’t appear in your table:

But Execution Metrics shows that memory usage is in fact a lot higher, at 11295KB, which is because the resultset now has one extra column in it and Visual Calculations make a copy of the original resultset in memory when they are calculated:

{
	"timeStart": "2024-11-03T19:31:22.858Z",
	"timeEnd": "2024-11-03T19:31:22.980Z",

	"durationMs": 122,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 109,
	"totalCpuTimeMs": 109,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 11295,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 502

Does this mean that Visual Calculations should never be used? No, not at all. Consider the following matrix visual which only contains the [Mountain-100 Black 38 Sales] measure and has LastName on rows and FullDateAlternateKey on columns:

Memory usage for this visual is 1091KB:

{
	"timeStart": "2024-11-03T19:51:02.966Z",
	"timeEnd": "2024-11-03T19:51:02.974Z",

	"durationMs": 8,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 16,
	"totalCpuTimeMs": 16,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 1091,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 139
}

The resultset returned by the DAX query used to populate this visual only contains one row for each combination of LastName and FullDateAlternateKey that has a value for [Mountain-100 Black 38 Sales], 139 rows in all, but because a matrix visual is used to display the results this introduces all the blanks you can see in the screenshot. You could try to replace these blanks with some very complex DAX but I’m not even going to try. Instead, Visual Calculations solve this problem very easily:

No Blanks Matrix = [Mountain-100 Black 38 Sales]+0

Here’s the matrix with [Mountain-100 Black 38 Sales] hidden and the Visual Calculation applied:

Execution Metrics reveal that peak memory consumption is just 2054KB and the number of rows returned is higher but still just 2070 rows:

{
	"timeStart": "2024-11-03T19:45:49.298Z",
	"timeEnd": "2024-11-03T19:45:49.337Z",

	"durationMs": 39,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 31,
	"totalCpuTimeMs": 31,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 2054,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 2070
}

Overall, both traditional DAX solutions and Visual Calculations are effective in different scenarios, so I suggest that you test the query performance and memory usage of different solutions yourself.

New Solution For Monitoring Power BI/Fabric Gateways

If you’re the admin for a large Power BI/Fabric tenant you’ll know how important it is to monitor your on-premises data gateways: if they aren’t working then no-one will be able to refresh any semantic models or dataflows connected to on-prem sources. A while ago Rui Romano published a gateway monitoring solution that proved to be very popular, but recently my Microsoft colleague Edgar Cotte built a new open-source solution that builds on Rui’s work and is even more powerful, using Fabric Real-Time Intelligence to let you analyse gateway log information in real time.

You can read his announcement on LinkedIn here and download the solution from https://aka.ms/FabricGatewayMonitoring.

This is an amazing piece of work that is already being used by several very large customers. Check it out!

Finding The Columns Touched By A DAX Query

Some time ago I wrote a post about how you can use the DISCOVER_CALC_DEPENDENCY DMV to get a list of all the tables, columns and relationships referenced by a DAX query. That’s really useful, and indeed the new INFO.CALCDEPENDENCY DAX function allows you to do the same thing without all the complexity of calling a DMV. However this doesn’t quite give you everything you need to solve one important problem: which columns are touched when a DAX query runs? The INFO.CALCDEPENDENCY function gives you a list of all the columns used directly in a DAX query and a list of all the columns needed by all the measures used in that query. For relationships, though, it only gives you their names – and relationships also rely on columns. In this post I’ll show you how to get a list of all the columns that are touched by a DAX query.

For illustration, consider the following model:

Note that the relationship between the Product table and the Sales table is inactive. The definitions of the three measures are as follows:

Units = SUM(Sales[SalesUnits])
Value = SUM(Sales[SalesValue])
Apples Value =
CALCULATE (
    [Value],
    KEEPFILTERS ( 'Product'[ProductName] = "Apples" ),
    USERELATIONSHIP ( 'Product'[ProductID], 'Sales'[ProductID] )
)

There is also a calculated column called Customer ID and Name defined as follows:

Customer ID and Name = 
"(" & 'Customer'[CustomerID] & ") " & Customer[CustomerName]

Now consider the following visual:

The DAX query for this is as follows:

// DAX Query
DEFINE
	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			ROLLUPADDISSUBTOTAL('Country'[CountryName], "IsGrandTotalRowTotal"),
			"Apples_Value", 'Sales'[Apples Value]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, [Apples_Value], 0, 'Country'[CountryName], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	[IsGrandTotalRowTotal] DESC, [Apples_Value] DESC, 'Country'[CountryName]

Passing this query to INFO.CALCDEPENDENCY like so:

EVALUATE
VAR Source_Query =
"
DEFINE
	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			ROLLUPADDISSUBTOTAL('Country'[CountryName], ""IsGrandTotalRowTotal""),
			""Apples_Value"", 'Sales'[Apples Value]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, [Apples_Value], 0, 'Country'[CountryName], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	[IsGrandTotalRowTotal] DESC, [Apples_Value] DESC, 'Country'[CountryName]
"
VAR All_Dependencies =
INFO.CALCDEPENDENCY(
	"QUERY",
	Source_Query
)
RETURN All_Dependencies

[Note that all double quotes in the query being passed in have been escaped]

…returns all the measures, tables, columns and relationships used:

It knows that the Apples Value measure needs the SalesValue column and the ProductName column, and it knows that the relationships between the Country table and the Sales table, and between the Product table and the Sales table, are also required – but it doesn’t tell you which columns those relationships in turn need. Luckily the new INFO.VIEW.RELATIONSHIPS() function can be used to get that extra information easily.
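
If you haven’t come across it before, INFO.VIEW.RELATIONSHIPS returns one row per relationship in the model, including the from/to table and column names that the query below relies on. Here’s a minimal sketch you can run in DAX Query View to see those columns (only a handful of the columns it returns are shown here):

EVALUATE
SELECTCOLUMNS (
    INFO.VIEW.RELATIONSHIPS (),
    "Name", [Name],
    "FromTable", [FromTable],
    "FromColumn", [FromColumn],
    "ToTable", [ToTable],
    "ToColumn", [ToColumn]
)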

Here’s a more complex DAX query that returns a list of all the columns touched by a DAX query, including those needed by all the measures used in that query and by all the relationships it relies on:

EVALUATE
VAR Source_Query =
"
DEFINE
	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			ROLLUPADDISSUBTOTAL('Country'[CountryName], ""IsGrandTotalRowTotal""),
			""Apples_Value"", 'Sales'[Apples Value]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, [Apples_Value], 0, 'Country'[CountryName], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	[IsGrandTotalRowTotal] DESC, [Apples_Value] DESC, 'Country'[CountryName]
"
VAR All_Dependencies =
INFO.CALCDEPENDENCY(
	"QUERY",
	Source_Query
)
VAR Referenced_Cols = 
SELECTCOLUMNS(
	FILTER(
		All_Dependencies, 
		[REFERENCED_OBJECT_TYPE] IN {"COLUMN", "CALC_COLUMN"} 
		),
	"FullyQualifiedName",
	"'" & [REFERENCED_TABLE] & "'[" & [REFERENCED_OBJECT] & "]"
)
VAR Referenced_Relationships =
SELECTCOLUMNS(
FILTER(
	All_Dependencies, 
	[REFERENCED_OBJECT_TYPE] IN {"ACTIVE_RELATIONSHIP", "RELATIONSHIP"} ),
"Name",
[REFERENCED_OBJECT])
VAR Relationship_Details =
FILTER(
	INFO.VIEW.RELATIONSHIPS(),
	[Name] IN Referenced_Relationships
)
VAR Relationship_Columns = 
UNION(
	SELECTCOLUMNS(
		Relationship_Details,
		"FullyQualifiedName",
		"'" & [FromTable] & "'[" & [FromColumn] & "]"
	),
		SELECTCOLUMNS(
		Relationship_Details,
		"FullyQualifiedName",
		"'" & [ToTable] & "'[" & [ToColumn] & "]"
	)
)
VAR All_Referenced_Columns = 
UNION(
	Referenced_Cols,
	Relationship_Columns
)
RETURN 
All_Referenced_Columns

Here’s what it returns:

As you can see, this returns a list of all the columns used directly by the query, by any measures referenced by the query, and by all the relationships involved too.

You can use this query very easily in DAX Query View: just paste it in, then capture the DAX query generated by your own visuals using Performance Analyzer, escape any double quotes and paste it into the Source_Query variable in the query above.

Here’s one more example of how it works, a visual that touches a lot more columns:

I won’t post the full DAX query of the visual or my query again, but here’s the list of columns returned by my query:

One thing to notice about this is that while it shows that the calculated column [Customer ID and Name] is used, it doesn’t break down this column’s dependencies and therefore doesn’t list the [CustomerName] column. This is a deliberate decision on my part.

Why is all this useful? The obvious answer is that it would allow you to see whether a query would break or return a different result if you deleted a column from your model. More interestingly – and this is something I need to research more – given that in Direct Lake mode columns are paged into memory when needed by a query, and given that it’s possible to use other DMVs/INFO functions to get the size of a column in memory (for example as DAX Studio’s Model Metrics feature does), it should be possible to use this approach to find the total memory used by all the columns needed by a DAX query on a Direct Lake model. This would be very helpful when diagnosing memory errors in Direct Lake mode.

Why Would You Use Power BI Copilot?

I’m a big fan of Kurt Buhler‘s work (indeed we like him so much at Microsoft that we get him to help out with our docs) and I have a lot of respect for his opinion. A few weeks ago, in a thread on Twitter/X, he asked an important question about Power BI Copilot: why would someone use it? When I saw this I realised it was something I hadn’t thought about properly but which deserved a detailed answer. Now, after ruminating for a while and talking to a few people (including Kurt), I will put my case for why someone might want to use Power BI Copilot.

[Just to be clear: in this post I am not going to discuss the related question of whether Copilot works well and delivers useful results – even though I think it is pretty good already – because that’s not what Kurt is asking in the thread]

Which Copilot are we talking about here?

Before I carry on I need to define which Copilot is being referred to in the question. There are a lot of Copilot experiences in Power BI: the one in DAX Query View, the one that can create new report pages, the one that allows you to summarise and ask questions about data that is visible on report pages, the one that allows you to ask questions about data that is in your semantic model but not necessarily visible in a report, as well as others. Since Kurt and I agree on the usefulness of Copilot in DAX Query View for intermediate developers, let’s focus on the last three – which can be summed up as the ability to answer data questions in natural language – which are what most people think about when they think about Copilot for Power BI anyway.

A solution looking for a problem?

To answer the question of why someone would use Power BI Copilot to answer data questions we need to look at who is asking for this functionality. But is anyone actually asking for it? I’ve heard a few people say that Copilot for Power BI is a solution looking for a problem.

Now I can see why you would think that because, let’s face it, we can all imagine some VP in Microsoft ordering every program manager to add a Copilot to their product whether they think it’s useful or not, because that’s what the market demands. However if you work in the Microsoft field or for a partner then I’m sure you’ll know how much genuine interest there is in Copilot from Power BI customers and users. In my opinion Power BI Copilot is actually the latest attempt to respond to a very real request that I have heard for the whole of my career in BI and which can be summed up as follows:

I want to ask questions in English (or whatever my native language is) about my data rather than have to look at a report someone else has built or build one myself

I’m old enough to remember SQL Server 2000’s English Query feature, which was an attempt to solve the same problem over twenty years ago, and there have been numerous other products and features from Microsoft (including Power BI’s own Q&A) and other BI vendors that have tried to do the same thing in the past. I am not saying that any of them succeeded in their aim because I don’t think any of them did. What I am saying is that they were attempts to do the same thing that Power BI Copilot is trying to do and that therefore Power BI Copilot is not a solution looking for a problem, it’s actually the latest attempt to solve a very real and very old problem.

So what is the problem?

I have never felt the need to ask questions about my data in natural language because I have always found it easy to:

  • Look at an existing report or dashboard and interpret what’s in it
  • Build my own report/dashboard/queries to get answers to questions that are not in existing reports or dashboards

That’s going to be true of anyone working as a data professional or a business analyst. So if the question is “why would I use it?” then the answer is I wouldn’t.

However I believe the people who are asking for natural language querying of data are people who:

  • Find it hard or impossible to interpret data provided in numeric or visual form in a report
  • Find it hard or impossible to build new reports or write queries to get answers that aren’t in the reports they do have access to
  • Have to ask other people to get the answers they need

It’s tempting to say “read what’s in the report” or “learn Power BI” or “learn SQL” when faced with these people and their requests. But, clearly, they keep asking for an alternative and we’re not going to change their minds.

Think back to maths class at school and it shouldn’t be a surprise that some people just don’t enjoy staring at pages of numbers and graphs, or learning database query languages, in the same way you or I do. What’s more, dyscalculia or "maths dyslexia" means that interpreting data – however well designed your visualisations are – doesn’t come easily to a significant number of people. While the imperative to be "data driven" is widespread in every industry and profession today, not being good with data or not enjoying using it is not a barrier to success in most careers. Power BI Copilot provides a different, and perhaps more approachable, way of getting answers to data questions for many people.

It seems to me that a lot of the excitement for Copilot for Power BI comes from C-level people who love the challenge of management, who need data to make decisions, but who don’t like working with data directly or don’t have the time to do so. Today they need to ask their data questions to a business analyst and this introduces latency: asking someone else to do something is always more time consuming than doing it yourself. What I suspect these C-level folks want is to cut the human out of the loop, get answers to their questions more quickly, while keeping the experience of interacting with a human. Power BI Copilot aims to be that virtual business analyst.

Why is Copilot better or more effective than non-AI approaches?

If the “non-AI” approaches mentioned here involve “reading the report” or “learning Power BI” I believe I’ve already dealt with them above. What about non-AI approaches to natural language querying? Clearly LLMs are vastly better at “understanding” natural language requests than any technology ever seen before but there’s more to say here.

As I already mentioned, while the problem of asking data questions in natural language has been around for a long time, no-one has solved it yet. Why? Thinking back to my experiences with Power BI Q&A the reason is that to get good answers you have to ask good questions. Half the challenge of learning programming or any database query language is understanding what you want to do before you even learn how to express that request in code. To get good results from Q&A you have to construct the English-language equivalent of a DAX or SQL query and that’s why it, like all its predecessors, failed. Users are bad at constructing clear, unambiguous questions.

One way that Copilot is an improvement on Q&A is that it can interpret a wider range of questions. For example, I have a demo with a table of books and their year of publication; I was pleasantly surprised that when I asked "Show me all 19th century literature" it gave me all the books published between 1800 and 1899. Q&A would never have been able to do this.

More importantly, Copilot is getting better at asking for clarifications. Going back to the previous question, when a C-level person asks their business analyst for the answer to a data question it will be something vague like "what was growth like for widgets in the UK last year?". Part of the job of the business analyst is to know exactly which product "widgets" refers to and which particular geographies make up the UK without having to ask. Those are things that can be defined in a good Power BI semantic model. However, another part of the job of the business analyst is to ask follow-up questions like "do you mean month-on-month growth or year-on-year growth?" to clarify the original request, and that’s exactly what you’ll see Copilot doing more of in the future.

The ability of LLMs to conduct conversations in near human-quality natural language and use this ability to turn a vague request into a data question that can be answered is, I think, why AI approaches to the problem we’re talking about are better than non-AI approaches.

Conclusion

I hope I have made a good argument for there being people who want to use Power BI Copilot, and the reason why is that they can’t use, don’t like, or don’t want to learn the existing methods of interacting with data. Will it succeed where so many others have failed? Let’s see where we are in a year or so’s time, but I’m confident that the remarkable advances in generative AI mean that it will.

Different Ways To Create Synonyms In Power BI Copilot

If you have been using Power BI Copilot to ask questions about the data in your model you will probably know that you should tune your Linguistic Schema to get the best results (something I’ve blogged about, for example here and here). Synonyms are one of the most important things you can add to your Linguistic Schema and in this post I will explain all the different ways synonyms can be created.

To illustrate this, consider a semantic model that contains a Customer dimension table with the following structure:

Manually Created Synonyms

The most effective synonyms are the ones you add to your Linguistic Schema yourself. You can do this by going to the Modelling tab on the ribbon in Power BI Desktop and clicking on the Q&A Setup button. This opens the Q&A Setup dialog, and if you then go to the Synonyms pane you will see a list of all the tables and columns in your model:

Next to each table or column name is a list of “Approved synonyms” which are the synonyms which actually get used by Copilot and a list of “Suggestions” which are synonyms you might consider adding.

Clicking the “Add” button in the “Approved synonyms” list allows you to manually add a synonym. For example you might want to add the synonym “Client” to the DimCustomer table:

Manually added synonyms are almost always the ones that improve Copilot’s results the most, and as a result Copilot will always try to match user input with them when possible. What’s more, the more manual synonyms you add, the better the suggested synonyms created using the methods below will be, because manual synonyms are taken into account when suggestions are generated.

You can also delete synonyms in the Synonyms pane. If a synonym is deleted in the Synonyms pane it still remains in the Linguistic Schema but is marked as deleted and the synonym will never be suggested again.

This type of manually-created synonym is known as “Authored” in the Linguistic Schema.

Heuristic Rules

Looking at the screenshots of the Synonyms pane above you can see that some other synonyms are automatically generated and added to the Linguistic Schema in the “Approved” section. These synonyms are created according to a series of heuristic rules, so for example:

  • Camel casing is replaced by spaces, so “DimCustomer” becomes “Dim Customer”
  • Obvious prefixes like “Dim” are removed and “DimCustomer” becomes just “Customer”
  • Underscores are replaced by spaces, so “Customer_Name” becomes “Customer Name”

…and so on. The benefit of this is that these synonyms are almost as high quality as authored synonyms and do not require any effort on the part of the model developer to create.

This type of synonym is known as “Generated” in the Linguistic Schema. Generated synonyms are not persistent and are deleted and recreated when the model changes or when the heuristic rules used change.

Thesaurus

At first, the Suggestions list mostly contains suggested synonyms created from a thesaurus search. They are not added to the Approved synonyms by default because they are often relatively low quality, but if you see one that is useful you can click the “+” button next to it to add it to the list of “Approved” synonyms that are actually used by Copilot.

This type of synonym is known as “Suggested” in the Linguistic Schema.

Organisational Sharing

Synonyms can be shared with other semantic model developers in your organisation and you can reuse the synonyms that other developers have created. This is only possible if the Synonym Sharing feature is turned on by your admin in the tenant settings:

If you promote a suggested synonym to the approved list then you’ll be prompted to decide if you want to share your synonyms with other developers in your organisation:

Synonyms created by other developers will appear in the “Suggested” list so you can decide to use them if you want. It’s unlikely many synonyms will be found this way unless you actively promote sharing to your developers, but the synonyms found are likely to be good quality.

Visual Renaming

If you edit the report in the same .pbix file as your semantic model, create a visual and then rename a column in that visual, that rename also results in an automatically-created synonym. So if you create a table visual in your report like this:

…and then rename the Customer_Name column in the visual (but not in the semantic model itself) to Person:

…this rename gets added to the list of Suggested synonyms:

Copilot Generated Synonyms

Lastly (and this feature is in preview at the time of writing) you can use Copilot to generate Suggested synonyms. This is very similar to using the thesaurus but the synonyms generated are higher quality because the LLM used has more domain knowledge. There’s a banner at the top of the Synonyms pane advertising this feature and a button to press; full docs are here. You will need Copilot enabled and it will add a (small) load on your capacity.

Copilot synonyms are not automatically added to the Suggested synonyms list because they are considered AI-generated content, which requires explicit user consent.

Suggestion Settings

You can control which of these methods are used to generate Suggested synonyms, as well as your sharing settings, from the Suggestion Settings dropdown at the top of the Synonyms pane.

[Thanks to Tu Phan for the information in this post]

Role-Playing Dimensions In Fabric Direct Lake Semantic Models

Did you know you can add the same physical table in OneLake multiple times to the same Direct Lake semantic model?

Let’s say you have two tables in a Fabric Lakehouse. One is a fact table called Sales:

…and the other is a dimension table called Date:

Note that the Sales fact table has two date columns, OrderDate and ShipDate.

If you create a Direct Lake semantic model using the Web Editor and add these two tables you could rename the Date table to Order Date and build a relationship between it and the OrderDate column on the Sales table:

What about analysing by Ship Date though? You could create a physical copy of the Date table in your Lakehouse and add that to the model, but there’s another option.

The other option is to connect to the model using a tool like Tabular Editor, duplicate the Order Date table and rename the new table Ship Date:

You then have two tables in your semantic model connected to the same physical table in your Lakehouse, and you can create a relationship between this new table and the Sales table:

…and then use the new dimension in your reports:

This is handy for handling role-playing dimensions, which are most often Date dimensions but may be other types of dimension too. Indeed, I’ve sometimes found the need to add the same fact table to a model more than once. The benefits of only having one physical copy are reduced refresh time, lower storage costs (although for most role-playing dimensions the savings will be negligible) and simpler ETL.
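
As an aside, the more traditional way of handling a role-playing Date dimension with a single Date table is to leave the second relationship inactive and activate it per measure with USERELATIONSHIP. A minimal sketch of that pattern is below; the measure name, the Date table’s key column and the SalesAmount column are assumptions for illustration, not taken from the model above:

Sales by Ship Date =
CALCULATE (
    SUM ( Sales[SalesAmount] ),
    USERELATIONSHIP ( 'Date'[Date], Sales[ShipDate] )
)

The duplicate-table approach avoids needing a separate measure for each role and lets report users slice by Order Date and Ship Date in the same visual.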

What Can You Do With Copilot In DAX Query View?

On my LinkedIn feed this morning I saw an amazing demo from Brian Julius showing how you can use AI to generate code for Python visuals in Power BI, immediately followed by this video from Marco Russo warning us not to believe all the hype about AI. I guess I, like most people, am waiting to see what turns out to be useful in the real world as a result of all of the current investment in AI. Certainly every Power BI customer I talk to these days is very interested in Copilot and the focus always seems to be using Copilot to understand the data shown on a report or to ask questions about the data in a semantic model. That’s understandable because these are the features that will benefit the largest number of users. However, I think the often-overlooked Copilot in DAX Query View is where you’ll get the most value today.

I have to admit that I’m a relatively late convert to DAX Query View, probably because I’ve been a hardcore DAX Studio user for so long. The ease with which you can write, test and update the definitions of measures in your model with DAX Query View is what lured me over and once I was there I started testing its Copilot features too. The documentation gives you the basics of what is possible with Copilot in DAX Query View but it turns out it’s capable of a lot more. So what are the practical benefits of Copilot in DAX Query View?

Writing measures

DAX is incredibly powerful but it’s also hard to learn. Up until recently if you were struggling to write a DAX measure you’d probably go to your favourite search engine, search for blog posts or questions on forums that seemed to describe what you wanted to achieve, copy some code from them and hope you can adapt it to your own needs. Nowadays more and more people are using LLMs to write DAX and while this doesn’t replace the need to learn the language (don’t throw away that copy of “The Definitive Guide to DAX” yet) it is a faster way to get working code than using search engines alone. Copilot in DAX Query View gives you this experience more conveniently right inside Power BI Desktop and with none of the security and privacy concerns about using a free public AI service. It also knows about the structure of your semantic model without you needing to explicitly pass that information in through your prompts, leading to better quality results.

Let’s see an example. Consider the following model:

In DAX Query View, the following Copilot prompt

Write a measure that returns Order Amount for the employee Gabi and show that measure in a query broken down by product

…returns the following measure definition and DAX query:

DEFINE
  MEASURE 'Orders'[Gabi's Order Amount] =
    CALCULATE(
      [Order Amount],
      'Employee'[Employee Name] = "Gabi"
    )

EVALUATE
  SUMMARIZECOLUMNS(
    'Product'[Product Name],
    "Gabi's Order Amount", [Gabi's Order Amount]
  )

As you can see, it’s successfully generated the measure definition I wanted and also written a DAX query that shows the results. I’ll admit it’s not the most complex measure in the world but all the same I know plenty of users find Calculate() hard to understand and would find this very helpful. I will also admit that the quality of the results is heavily dependent on whether you have followed best practices for Power BI semantic model design.

Explain DAX topics and apply them to your code

Looking at the measure Copilot just wrote you might think “should I be using the KeepFilters() function here?”. Entering the prompt:

Explain the DAX KeepFilters function

…gives you a nice explanation of what the KeepFilters() function does:

That’s great, but I could get that from a search engine. What is impressive is that the explanation includes a version of the measure above rewritten to use KeepFilters():

DEFINE
  MEASURE 'Orders'[Gabi's Order Amount] =
    CALCULATE(
      [Order Amount],
      KEEPFILTERS('Employee'[Employee Name] = "Gabi")
    )

You can then prompt:

Rewrite the "Gabi's Order Amount" measure so that it uses the DAX KeepFilters function

…and see the rewritten version side-by-side with the original:

…so you can see what exactly has changed and decide whether to accept the change or not.

Explain DAX code

Taking over the ownership of a semantic model that someone else built and understanding their code is tough. Copilot in DAX Query View can help here too. Let’s say you have the following measure in your model:

Avg Order Amount = 
 //Chris said to use the Divide() function
 //to avoid possible division by zero
 DIVIDE(
    [Order Amount],
    [Order Count]
 )

The prompt:

Explain the avg order amount measure

…gives you an explanation of what the measure, and all of the measures it references, does:

It takes code comments as well as the code itself into account, and although at the time of writing it doesn’t look at the description properties of your tables, columns or measures, it will soon.

Suggest improvements to your measures

Let’s say you have a measure with the following definition:

Order Amount Apples = 
CALCULATE(
    [Order Amount],
    FILTER('Product', 'Product'[Product Name]="Apples")
)

An experienced Power BI developer will read this and immediately hear an (Italian-accented) voice in their head saying "Filter columns not tables!" even if in this case the query optimiser will ensure this measure performs well. Most Power BI developers will not see a problem though, and while I’m fairly sure that Tabular Editor’s Best Practice Analyzer would also pick up the problem, not everyone knows about or is allowed to use Tabular Editor.

In DAX Query View you can enter the prompt:

Optimise the Order Amount Apples measure and explain the suggested changes

…and get a pretty good summary of how the measure could be improved to avoid the use of the Filter() function:
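
For reference, the kind of rewrite it suggests is along these lines, replacing the table filter with a simple column predicate; this is a sketch of the pattern rather than Copilot’s exact output:

Order Amount Apples =
CALCULATE (
    [Order Amount],
    'Product'[Product Name] = "Apples"
)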

Conclusion

Copilot in DAX Query View is not magic. It’s not perfect and it will hallucinate or make bad suggestions occasionally, although I didn’t cherry-pick any of its output while writing this post and it is getting better and better at an impressive rate. It will not replace all the other tools and techniques you use today to write DAX and it will not put Marco and Alberto out of a job. But there’s a saying inside Microsoft that “Copilot is not a pilot”, which sums up nicely how to think about it: Copilot won’t do your job for you but will help you do your job faster and better. As these examples demonstrate, Copilot in DAX Query View can provide a significant productivity boost, especially for less experienced Power BI developers. If you’re the Power BI admin for a large organisation with a lot of self-service usage then I think that productivity boost more than justifies the cost of the Fabric capacity required to run it.

[And yes, before you comment, I know that the fact Power BI Copilot is only available for F64/P1 and above capacities and has geographic restrictions isn’t ideal, but trust me we’re working on ways to make it more affordable and more widely available. We want as many people as possible to use it but we can’t give it away for free. If you’re comparing this with a free service then remember the old saying: if a product is free then you are the product. You may be ok with that but your employer’s lawyers may not be.]

Another Way To Get The SQL Generated By Power BI DirectQuery Mode

If you’re performance tuning a DirectQuery mode semantic model in Power BI, one of the first things you’ll want to do is look at the SQL that Power BI is generating. That’s easy if you have permissions to monitor your source database but if you don’t, it can be quite difficult to do so from Power BI. I explained the options for getting the SQL generated in DirectQuery mode and why it’s so complicated in a presentation here, but I’ve recently found a new way of doing this in Power BI Desktop (but not the Service) that works for some M-based connectors, for example Snowflake.

The trick is to use Profiler and the M Data Provider Events/Execute Source Query event. To demonstrate this I created a DirectQuery semantic model in Power BI Desktop using Snowflake as a source. I then connected SQL Server Profiler to Power BI Desktop (the easiest way to do this is to install it as an External Tool as described here, although you can connect by finding the diagnostics port as described here) and selected the following events:

I then added a card visual to my report that contained a simple SUM measure.
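
The measure itself was nothing special, something along these lines (the measure and table names here are assumptions; fare_amount is the column you can see in the generated SQL below):

Total Fare Amount = SUM ( 'NYCTAXIDATA'[fare_amount] )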

Here are the events generated in Profiler:

As you would expect there’s a single DirectQuery Begin/End event pair generated because a single SQL query is sent to Snowflake by the card visual. However, the TextData column shows the following SQL – which is definitely not Snowflake SQL:

SELECT SUM([t0].[fare_amount])
 AS [a0]
FROM [NYCTAXIDATA] AS [t0] 

This is because for some data sources, Power BI emits T-SQL and the connector used then converts this to the actual dialect of SQL used by the source database.

This is where the Execute Source Query event comes in. The TextData column for this event shows the actual SQL sent to Snowflake for this query, which is:

select sum("fare_amount") as "C1"
from "NYCTAXIDB"."PUBLIC"."NYCTAXIDATA"

Interestingly this event is generated after the Query End event for the whole query, but you can associate this event with its parent DAX query by matching the value in the RequestId column with the RequestId for the other events generated by the query.

Exploring Advanced Relationship Properties In Power BI Copilot

I’ve been playing around with Power BI Copilot a lot recently, and following on from my recent posts on tuning Copilot by editing the Linguistic Schema and setting the Row Label and Key Column properties, in this post I want to delve into some of the advanced properties of relationships. I won’t pretend to understand everything that’s going on here – I’m still learning all this – but I thought it would be useful to share what I’ve found out so far.

These properties are described in this YouTube video from the Q&A Modeling Bootcamp series that I referred to in my Linguistic Schema post. They are available to set in Power BI Desktop on the Relationships pane of the Q&A Setup dialog by clicking on the "Add to relationship" dropdown box when creating a new relationship:

For my testing I created a semantic model containing a single table called Journeys. Each row in this table represents a leg of a journey; the journeys are identified by the values in the Journey column and each leg is identified by the values in the Stop column. When a person starts a journey there is a row with Stop = 1 and Stop Type = Start; they may then call in on one or more towns, indicated by Stop Type = Call; and when the journey ends there is a row with Stop Type = End. The Mode column contains the mode of transport, the Date column contains the date of the leg of the journey, and there is a Booked column which tells you when the journey was booked.

For example, in this data Chris started a car journey in London on January 1st, called in on Oxford, then finished his journey in Birmingham on January 2nd.
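
To make the structure concrete, here’s a DATATABLE sketch of what those three rows might look like; the column names are taken from the description above and the values, including the dates, are illustrative rather than copied from my actual data:

EVALUATE
DATATABLE (
    "Person", STRING,
    "Journey", INTEGER,
    "Stop", INTEGER,
    "Stop Type", STRING,
    "Town", STRING,
    "Mode", STRING,
    "Date", DATETIME,
    {
        { "Chris", 1, 1, "Start", "London", "Car", "2025-01-01" },
        { "Chris", 1, 2, "Call", "Oxford", "Car", "2025-01-01" },
        { "Chris", 1, 3, "End", "Birmingham", "Car", "2025-01-02" }
    }
)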

There’s also a measure called Journey Count which does a distinct count on the journey column:

Journey Count = DISTINCTCOUNT(Journey[Journey])

Now Copilot does a pretty good job with this table without any tuning. For example, the prompt

Whose journeys started in London?

is correctly interpreted most of the time as “Show the person for journeys where Town is London and the Stop is 1”:

This is a great example of where the LLM adds extra smarts that aren’t present in the old Q&A feature. But some similar questions don’t get answered so well. For example:

Whose journeys ended in Glasgow?

…gives this result:

Not only is the number shown irrelevant, the answer is also wrong, because Helen also ended a journey in Glasgow. So some tuning is needed.

Here’s where I’m going to show what I did and how it fixed the problem, but I want to make it clear that there may be better ways of doing this. I’m sure this time next year I’ll look back at this post and cringe at my ignorance, but hey, we all have to start somewhere!

I added several relationships to the model, for example this Verb relationship to indicate that People end Journeys in Towns on Dates where Stop Type is “End”.

In this case the People column indicates the subject, the verb is "end" and the object is the Journeys table. There is a prepositional phrase "in" to indicate that journeys end in towns. Journeys end on the date given in the Date column. The fact that a journey has ended is indicated by a condition that the Stop Type column contains the value "End". The prepositional phrase, Time and Condition were added with the "Add to relationship" dropdown.

After this I got the following, correct response to the prompt

Whose journeys ended in Glasgow?

Here’s another example. Consider the prompt:

Who passed through Birmingham?

The original model gives the following, reasonable response:

But let’s say that when we write “passed through” we only mean Stop Type = Call, so journeys like the one Chris made that start or end in Birmingham don’t count.

With the following relationship:

…you not only get the correct result:

…you can ask questions like:

Who passed through Birmingham and on which dates?

As you can see, the “Add to relationship” dropdown box adds even more ways to configure relationships and even more ways to tune Copilot. Knowing your English grammar and parts of speech is going to help a lot here – and this is something that may not come easily to someone working in tech. I like this kind of challenge though. Who knows, maybe the ability to set up a Linguistic Schema will be as important as the ability to write DAX for Power BI developers soon?

Finding Power BI Semantic Model Refresh Operations In Gateway Logs

The logs for your on-premises data gateway contain a lot of information that is useful for troubleshooting and performance tuning. The contents of the logs are documented here and there are several great solutions out there for reporting on their contents: there’s a Microsoft-provided template and Rui Romano’s more sophisticated monitoring solution here (there’s also another, even better community solution being worked on that will be publicly available soon). To do any troubleshooting or performance tuning, though, you need to link the data generated by Power BI in Log Analytics or Profiler traces to the data in the gateway logs. How can you do this?

Let’s take the refresh of an Import mode semantic model that uses an on-premises SQL Server database as a source as an example. If you have Log Analytics connected to your workspace you can run a KQL query something like this to find the events generated by Power BI for your semantic model refresh:

PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1hr)
| where OperationName in ("CommandBegin", "CommandEnd", "ProgressReportBegin", "ProgressReportEnd")
//| where OperationName in ("ExecutionMetrics")
| project OperationName, OperationDetailName, EventText, TimeGenerated, XmlaRequestId 
| order by TimeGenerated asc

All the events associated with the refresh command will have the same value in the XmlaRequestId column. If you’re using a Profiler trace to capture the same information, the RequestID column will contain the same value.

If you then export the gateway logs and look in the Query Execution Report log file for example (I did it with Power Query – it’s just a CSV file) you can match the XmlaRequestId value from Log Analytics to the value in the RequestId column in the log file:

In this example my semantic model contains two tables sourced from an on-premises SQL Server database, so there are two entries in the Query Execution Report log file, one for each table. Once you’ve found the matching rows in the Query Execution Report log file you can find all kinds of useful information, such as how much time (if any) it took to read data from the source or to spool data to disk. You can do the same thing with the other gateway log files: for example the Open Connection Report will tell you how long it took to open any connections to the data source.

If you have a DirectQuery model connected to an on-premises data source then you do the same thing: all the events associated with a DAX query (such as the Query Begin/End events) will have the same XmlaRequestId, and again all you need to do is find that value in the RequestId column in the gateway logs.