Getting Report Visual IDs With Power BI Desktop Developer Mode

Back in 2021 I wrote a post showing how you can link a DAX query captured in Log Analytics back to the visual in a Power BI report that generated it. In that post I showed how you can get the unique IDs for each visual in a report using Performance Analyzer, an approach which works but has drawbacks. Then, in 2022, I wrote another post showing how you can get the same IDs using the Power BI Embedded Analytics Playground, which again works but isn’t perfect. This August my colleague Rui Romano pinged me on Teams to point out that the new Power BI Desktop Developer Mode and the .pbip file format provide the best and easiest way to get these IDs.

If you save a Power BI report in .pbip format, then look in the report folder and open the report.json file (where the report definition is stored) with a tool like Visual Studio Code, you can find the IDs for the visuals quite easily. The sections array lists all the pages in the report; the visualContainers array within that lists all the visuals on a page; and each visual’s ID can be found in the config property as its name:
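
Here’s a rough, made-up sketch of that structure – the page and visual names below are invented, and in the real file the config property holds a JSON-encoded string with many more properties than shown here:

{
  "sections": [
    {
      "name": "ReportSection1",
      "displayName": "Page 1",
      "visualContainers": [
        {
          "config": "{\"name\":\"a1b2c3d4e5f6a7b8c9d0\", ...}"
        }
      ]
    }
  ]
}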

As a result, if you’re looking in Log Analytics and you need to understand which visual a query was generated by, you can just copy the ID from the ApplicationContext column of a QueryEnd event and search for that ID in the report.json file. Even when you’ve found that ID, though, it can be hard to know which visual it relates to. This is where a tool I blogged about a few weeks ago, PBI Inspector, comes in handy: as part of its output it draws wireframes of each report page showing each visual, its type and its ID. Here’s the report page from the example above, with two visuals on it, and the associated PBI Inspector wireframe:

Yet another way in which Power BI Developer Mode makes our lives better!

VisOps For Power BI With PBI Inspector

This week, one of my colleagues at Microsoft, Nat Van Gulck, showed me a cool new open-source tool he’s been working on to make VisOps for Power BI much easier: PBI Inspector. What is VisOps? I’ll admit I didn’t really know either, so being lazy I asked Nat to write a few paragraphs describing the project and why it will be useful:

Great progress has been made over the years with software development CI/CD tooling and processes (aka DevOps), just not so much with BI report visualisations and charts, where we’ve come to accept only manual checks before publishing to production. PBI Inspector is a rules-based visual layer testing tool for Power BI which aims to fill this tooling gap. It runs either on the report author’s desktop or as part of a CI/CD pipeline. The latter follows naturally from the recent Power BI Desktop Developer Mode announcement, which marks a step change in providing Pro developers with much improved source control capabilities. PBI Inspector reads report files in the PBIP format (currently in Preview) although it also accepts PBIX files. Test results can be presented in several formats including HTML, JSON and Azure DevOps logging commands.

PBI Inspector’s rules combine Greg Dennis’s JsonLogic .NET implementation, which allows for the definition of expressive rule logic, with the querying abilities of JsonPath and JsonPointer libraries to select nodes from the Power BI report’s JSON layout definition for testing purposes. As an illustrative example, here’s a rule that tests whether the charts on each report page are wider than they are tall, and returns an array with the names of the visuals that fail the test:

{
    "name": "Charts wider than tall",
    "description": "Want to check that your charts are wider than tall?",
    "disabled": false,
    "logType": "warning",
    "forEachPath": "$.sections[*]",
    "forEachPathName": "$.name",
    "forEachPathDisplayName": "$.displayName",
    "path": "$.visualContainers[*].config",
    "pathErrorWhenNoMatch": false,
    "test": [
        {
            "map": [
                {
                    "filter": [
                        {
                            "var": "visualsConfigArray"
                        },
                        {
                            "<=": [
                                {
                                    "var": "layouts.0.position.width"
                                },
                                {
                                    "var": "layouts.0.position.height"
                                }
                            ]
                        }
                    ]
                },
                {
                    "var": "name"
                }
            ]
        },
        {
            "visualsConfigArray": "."
        },
        []
    ]
}

Here’s an example of the resulting wireframe depiction of a report page (provided as part of the HTML output), highlighting two visuals that failed the test because they are taller than they are wide:

For additional rule examples, see the Example rules.json file in the DocsExamples folder of the NatVanG/PBI-Inspector repository on github.com. For further details, see the NatVanG/PBI-Inspector repository itself: a rules-based Power BI Desktop file inspection or testing tool.

I think this is a great example of the kind of community innovation that Power BI Desktop Developer Mode allows (see also the recent announcement of PBI Explorer). A lot of organisations that use Power BI don’t, and will never, care about this kind of thing – but those who do have been very vocal about Power BI’s previous limitations in the area of DevOps and DataOps. Thanks to the work of people like Mathias Thierbach (of pbi-tools fame), John Kerski and Nat we can see how quickly Power BI is catching up.

Can You Develop For Power BI Using Only Your Browser?

One question I see asked frequently on Power BI forums is whether it’s possible to run Power BI Desktop on a Mac, or indeed on anything other than a Windows PC. There are already a lot of detailed blog posts and videos out there on this subject, such as this one from Guy In A Cube: the answer is no, you can’t run Power BI Desktop natively on a Mac or any other OS apart from Windows and there are no plans to port it over, so you need to either install Windows somehow (for example with Boot Camp) or use tools like Parallels or Turbo.Net to run Power BI Desktop. You can also spin up a Windows VM, for example in Azure, and run Power BI Desktop on that; Power BI Desktop is also now fully supported on Azure Virtual Desktop, although not on other virtual environments like Citrix.

Turning the question around, however, leads you to some aspects of it that haven’t been fully explored. Instead of asking “Can I run Power BI Desktop on my Mac?”, you can instead ask “Can I do all of my Power BI development using only a browser?”. At Microsoft our long-term goal is to make all Power BI development web-based, but how close are we to that goal?

The first point to make is that it has always been possible to build Power BI reports (as opposed to datasets) in the browser without needing Power BI Desktop. You can now build basic paginated reports in the browser too. Historically I’ve never been a fan of encouraging users to do this because developing in Power BI Desktop gives you the chance to roll back to a previous version of the report if you need to – assuming you have saved those previous versions of your .pbix file. What’s more, if two or more people try to edit the same report at the same time then the last person to save wins and overwrites the other person’s changes, which can be dangerous. Fabric’s Git integration, which does work for Power BI reports, has changed my attitude somewhat though. As Rui Romano discusses here, you can now safely make changes to reports in the Power BI Service, save them to source control and then roll back if you need to; this assumes your users are comfortable using Git, however, and it doesn’t solve the simultaneous development problem.

What about dataset development? Web editing for datasets has been in preview for a few months now and is getting better and better, although there are still several limitations and the focus up to now has been on modelling; connecting to data sources is on the public roadmap though. As a result Power BI Desktop is still needed for dataset development, at least for now.

Do datamarts change anything? Or Direct Lake mode in Fabric? Datamarts do solve the problem of being able to connect to and load data using just your browser and are available (if not GA yet) today. If you’re only using datamarts to avoid the need for a Windows PC to develop on, though, you’re paying a price: for a start, you’ll either be loading the data twice if you want to use Import mode for your dataset (once to load data into the datamart, once to load the same data into the dataset) or taking the query performance hit of using DirectQuery mode. There are also some other limitations to watch out for. Fabric Direct Lake mode datasets, for me, offer all the benefits of datamarts without so many of the limitations – Direct Lake mode means you only load the data once and still get near-Import mode performance, for example – and will be the obvious choice when Fabric GAs and once features like OneSecurity are available. With Fabric it will be possible for most Power BI developers to do all their work using only a browser, although for more complex projects (and to be clear, this is only a small minority of projects) it will still be necessary to use other tools such as Tabular Editor, DAX Studio, SQL Server Management Studio and SQL Server Profiler, which can only run on a Windows PC. I can imagine some of this more advanced developer functionality coming to the browser too in time, though.

In summary, while Power BI Desktop, and therefore Windows, is still needed for Power BI development today, the day when you can do most and maybe all of your development in the browser is in sight. All you Mac owners need to be patient just a little while longer!

The “Visual Has Exceeded The Available Resources” Error In Power BI Desktop

One of my most popular blog posts of the last few years is this one on the “Visual has exceeded available resources” error in the Power BI Service:

https://blog.crossjoin.co.uk/2020/01/20/visual-has-exceeded-the-available-resources-error-power-bi/

This error only used to appear in the Power BI Service, but the good news is – and trust me, this is good news – it may now appear in Power BI Desktop too following the May 2023 release.

First, a quick recap of what causes this error. The Power BI Service has finite resources, so we at Microsoft don’t allow you to run queries there that take hours to complete or consume vast amounts of memory; we impose limits on query duration and memory usage, and you’ll see this error in the Power BI Service when you hit these limits. The problem has been that, up until recently, these limits were not imposed in Power BI Desktop, so it was easy to develop inefficient reports and datasets that worked OK (if a little slowly) on the developer’s PC but then caused errors after they were published. What has changed is that these limits are now imposed in Power BI Desktop too, and they are also configurable there.

The announcement of this change is here:

https://powerbi.microsoft.com/en-us/blog/power-bi-may-2023-feature-summary/#post-23312-_Toc134425717

How do you know if you are running into these limits? You’ll see an error in your visual in Power BI Desktop that looks like this:

The error message is:

Visual has exceeded the available resources

If you click on the “See details” link you’ll see the following dialog:

Resources Exceeded
This visual has exceeded the available resources. Try filtering to decrease the amount of data displayed.

What should you do if you encounter this error? The most important thing is to understand why it’s happening. There are two possible causes:

  1. Your query is taking too long to run. In the Power BI Service the default query timeout is 225 seconds, although it is possible for an admin to reduce this; unless you’ve set a custom limit, or you’re not using a Power BI dataset as a source, it’s likely that 225 seconds is the longest that a query will run for in Power BI Desktop too.
  2. Your query is using too much memory. This is probably because you are doing an inefficient DAX calculation on a very large table (filtering an entire table rather than a single column is a classic anti-pattern, for example – see the sketch below).
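
To make that second point more concrete, here’s a sketch of the anti-pattern – the Sales table, its columns and the measure names are invented for illustration, but the pattern itself is a common one:

// Inefficient: FILTER iterates the entire (potentially very large) Sales table
Red Sales Slow =
CALCULATE (
    SUM ( Sales[Amount] ),
    FILTER ( Sales, Sales[Colour] = "Red" )
)

// Better: the filter condition only touches the Sales[Colour] column
Red Sales Fast =
CALCULATE (
    SUM ( Sales[Amount] ),
    Sales[Colour] = "Red"
)

The second version applies the filter to a single column rather than forcing the engine to materialise and scan the whole table, which is usually much cheaper in terms of both memory and time.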

As a result you need to do some tuning. “But Chris!”, you say, “my query has to run for longer than 225 seconds! It’s too difficult to tune!” To which I say – tough, you don’t have a choice. Slow, inefficient queries result in unhappy users so if you don’t fix these issues you have an even bigger problem. This video of Marco’s is a good place to start if you want to learn about performance tuning.

In order to do that tuning though (or if you just want to get stuff done before you do any tuning, or if you think the wrong limits are being imposed) you’ll need to turn off the limits so you can capture the DAX query for the offending visual using Performance Analyzer. To do this, go to File/Options and settings/Options to open the Options dialog, go to the Report settings pane and scroll to the bottom to see the Query limit simulations section:

You can either use the “Custom limits” option, as shown in the screenshot above, to set your own limits (enter 0 in one of these boxes for no limit to be imposed) or use the “No query limits” option to remove the limits entirely. You should only use these options temporarily, remember, otherwise you run the risk of getting this error in the Power BI Service later on!

It’s worth mentioning that the limits imposed in Power BI Desktop are only imposed on queries generated by Power BI Desktop itself, which means that they won’t affect external tools like DAX Studio that can also be used to query a dataset in Power BI Desktop. You can see how the limits are imposed by running a Profiler trace on Power BI Desktop, finding the Query Begin event for each query and looking for the Timeout and DbpropMsmdRequestMemoryLimit properties in the Property List shown under the query text:

Also, these settings are saved on a per-file basis, so if you create a new .pbix file it will have the default settings and not the settings you made in any other .pbix file.

Text Search Performance In Power BI

In the blog post for the October 2022 release of Power BI Desktop there’s a brief description of a new optimisation in Power BI for filters that perform searches in text columns. In this blog post I’m going to share more details about this optimisation and how you can make sure your reports can benefit from it.

First of all, a brief description of the problem. Let’s say you have a Power BI dataset containing just the following table:

Let’s also say you want to build a report where a user can search for certain terms in the Description field and display a filtered table in a report:

Note how the Description field has been dragged into the Filter pane, the filter type is set to “Advanced filtering”, the “Show items when the value” dropdown has “contains” selected and the search term is “citrus”, so the table on the left only shows the fruit where the description includes the text “citrus”. Some custom visuals such as the Text Filter visual have very similar functionality.

Here’s the DAX query generated for the table visual in this screenshot:

DEFINE
	VAR __DS0FilterTable = 
		FILTER(
			KEEPFILTERS(VALUES('Fruit'[Description])),
			SEARCH("citrus", 'Fruit'[Description], 1, 0) >= 1
		)
	VAR __DS0Core = 
		CALCULATETABLE(
			SUMMARIZE('Fruit', 'Fruit'[Fruit Name], 'Fruit'[Description]),
			KEEPFILTERS(__DS0FilterTable)
		)
	VAR __DS0PrimaryWindowed = 
		TOPN(501, __DS0Core, 'Fruit'[Fruit Name], 1, 'Fruit'[Description], 1)
EVALUATE
	__DS0PrimaryWindowed
ORDER BY
	'Fruit'[Fruit Name], 'Fruit'[Description]

As you can see, the filter in this query is accomplished using the DAX Search() function. This is a great example of the type of query that the optimisation I’m talking about can speed up.
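
Filters like this aren’t the only thing that can benefit: a measure that searches the same column using Search() or ContainsString() can take advantage of the optimisation too. For example, a measure like this – a made-up illustration using the Fruit table above, not something taken from the report – would also generate this kind of search:

Citrus Fruit Count =
CALCULATE (
    COUNTROWS ( 'Fruit' ),
    CONTAINSSTRING ( 'Fruit'[Description], "citrus" )
)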

Here are some more details about how this optimisation works:

  • This optimisation is now enabled and works in the same way in both Power BI Desktop and the Power BI Service.
  • The first time that any query or measure that uses either the Search() or ContainsString() DAX functions on a text column is evaluated, Power BI starts to build a special text index just for that column.
  • This index build will only be successful if two conditions are true:
    • The text column must only contain characters from the classic 128 character ASCII set.
    • The index build must take less than 25 seconds. If 25 seconds elapse then the build will timeout and Power BI will continue to run the query without the index being present.
  • If the index build succeeds for that column then the index can be used by all subsequent queries by all users, but it will be dropped when:
    • Power BI Desktop is restarted, if you’re in Power BI Desktop.
    • The dataset is refreshed, either in Power BI Desktop or the Power BI Service.
    • The dataset is evicted from memory in the Power BI Service or when the dataset is under memory pressure.
  • DAX queries that use the index will be a lot faster than queries that do not, although the difference will only be noticeable when you are searching on a table with thousands of rows and in a column with reasonably long text values.
  • Unfortunately there is no way for you to know whether an index has been built, whether the build has failed, or whether a DAX query uses an index. However, if you look at the duration of the DAX queries that do this kind of search (for example in Log Analytics or by running a Profiler trace) and you see that the first query after a refresh is relatively slow and subsequent queries are almost instant, then it’s likely that the index has been built successfully; on the other hand, if your queries are consistently slow then it’s likely the index has not been built. Before you all leave comments complaining, I know this is not ideal and I hope we’ll be able to make further improvements in the future.

How can you ensure that the index is built successfully? The only way to ensure that you stay under the 25 second timeout limit is to reduce the amount of text that needs to be indexed, either by reducing the number of rows in the table or by reducing the amount of text in the column. Reducing the amount of text in the column is probably the only feasible option: for example you may be able to remove words like “and” and “the” from your text since users are less likely to need to search for them. Making sure your text column only contains ASCII characters is much harder because you can’t just remove all non-ASCII characters (such as characters with diacritics) without making your text unsearchable; I have a good-enough solution in the form of an M custom function which removes diacritics from characters in text and ensures that only ASCII characters are left here.
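
As a first step it can be useful simply to find out whether your text contains any non-ASCII characters at all. Here’s a simple M function I’ve sketched for illustration (it isn’t the function linked to above – it just tests a single text value and returns true if every character is in the classic 128-character ASCII range), which you could call from a custom column to flag problem rows:

let
    // Returns true if the input text contains only characters from the
    // classic 128-character ASCII set, false otherwise
    IsAsciiOnly = (input as text) as logical =>
        List.AllTrue(
            List.Transform(
                Text.ToList(input),
                each Character.ToNumber(_) <= 127
            )
        )
in
    IsAsciiOnly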

[Thanks to Jeffrey Wang for the information in this post]

Speed Up Power BI Dataset Refresh Performance In Premium Or PPU By Changing The “Parallel Loading Of Tables” Setting

Do you have a large dataset in Power BI Premium or Premium Per User? Do you have more than six tables that take a significant amount of time to refresh? If so, you may be able to speed up your dataset’s refresh by increasing the number of tables that are refreshed in parallel, using a feature that was released in August 2022 but which you may have missed.

Some of you may be thinking: haven’t you blogged about refresh parallelism before? Yes: last year I wrote a post about setting the maxParallelism property when refreshing through TMSL scripts, and a few months ago I also showed how you could change the same property when refreshing using the Enhanced Refresh API. That was all fairly complex though and what I’m going to show you here is, in contrast, very easy to implement.

To illustrate this I created an Import dataset containing nine tables, each of which loaded data from a different CSV file. Each table contained a couple of million rows and took 20-30 seconds to refresh. I then published it to a PPU workspace in the Power BI Service and used the technique described in Phil Seamark’s “Visualise your Power BI refresh” blog post to capture what happened during refresh. The refresh took 44 seconds and, as you can see from this screenshot of Phil’s report, only six of the tables were refreshed in parallel at any one time:

[Ignore the y axis on this graph – I have no idea what’s going on with it]

This is as you would expect: the default number of objects that can be refreshed in parallel in Premium or Premium Per User is 6. This is also the maximum number of objects that can be refreshed in parallel in Shared capacity.

I then went to my original .pbix file, opened the Options dialog, went to the Current File/Data Load pane and changed the new “Parallel loading of tables” setting on my dataset from Default to Custom and entered the value 9:

This setting sets the maxParallelism property in the refresh commands generated by Power BI in the Power BI Service; it is fully documented here. Rerunning the refresh again in the Power BI Service I found that the overall time taken was down to 29 seconds and that all the tables were refreshed in parallel:
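
For reference, here’s a rough sketch of what a TMSL refresh command with maxParallelism set explicitly looks like – the database name is made up, and when you use the new setting you don’t need to write this yourself because the Power BI Service generates the command for you:

{
  "refresh": {
    "type": "full",
    "maxParallelism": 9,
    "objects": [
      {
        "database": "MyDataset"
      }
    ]
  }
}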

Here’s the Refresh command from the Profiler trace showing maxParallelism set to 9:

In summary: refresh performance improved by about a third as a result of this simple change, so if you’re using Power BI Premium and not already doing advanced things with TMSL scripts or the Enhanced Refresh API, this is definitely something you should experiment with. As I said last year, you can try setting this property to values up to 30 but there’s no guarantee you’ll get the amount of parallelism you ask for (it depends on the Premium capacity SKU and a few other factors). What’s more, setting this property too high could result in slower refresh because it might result in Power BI overloading your data source with queries.

[Thanks to Akshai Mirchandani for answering my questions about this subject]

What Is The Difference Between A Live Connection And DirectQuery Mode In Power BI?

It’s very easy to get confused between Live connections and DirectQuery mode in Power BI: the customers I work with frequently get them mixed up, partly because they are similar concepts and partly because other BI tools use the term “live connection” to refer to what Power BI calls DirectQuery mode. In this post I will explain the differences between these two terms.

Power BI Datasets, Azure Analysis Services and SQL Server Analysis Services

The first thing to understand is that when a Power BI report runs it can only get data from one of three places: a Power BI dataset; an Azure Analysis Services (AAS) database; or an on-premises instance of SQL Server Analysis Services (SSAS). There are some exceptions to this rule but we can ignore them for the purposes of this post. Each of these three sources runs a variation of the same engine, ie the Analysis Services engine, and can be queried in DAX. These sources may in turn get data from other sources when they are queried, but as far as the Power BI report is concerned the only sources it can talk to directly are Power BI datasets, AAS databases and SSAS databases.

Live connections

Normally, when you open up Power BI Desktop and start developing, your .pbix file contains the definition of a Power BI dataset and a report that connects to it. When you publish this .pbix file they are split up and you see separate dataset and report objects in your Power BI workspace.

However, in Power BI Desktop you may instead choose to connect to a Power BI dataset that has already been published to the Power BI Service, or an instance of AAS or SSAS, and use that as the source for the report. The main advantage of doing this is that different developers can work on the dataset/AAS database/SSAS database definition and the report at the same time. When you connect to an already-published dataset or an AAS or SSAS database from Power BI Desktop you are creating a Live connection.

To connect to a published Power BI dataset and create a Live connection choose the “Power BI datasets” option under the Get Data menu:

If you want to create a Live connection to AAS or SSAS choose the Analysis Services source and then make sure the “Connect live” option is selected:

After you’ve created the Live connection you’ll see a message telling you this in the bottom right-hand corner of the screen in Power BI Desktop:

The key thing to understand is that when you create a Live connection your .pbix file no longer contains a dataset definition – it only contains a report, and that report connects to a dataset, AAS database or SSAS database that already exists elsewhere.

The concept of a Live connection only really makes sense in Power BI Desktop: as I said, in the Power BI Service reports and datasets are separate objects anyway. A report can only have one Live connection because a report can only ever connect to a single dataset, AAS database or SSAS database.

If you publish a report with a Live connection to an already-published Power BI dataset, then the only new thing that will appear in the Power BI Service is the report. If you publish a report with a Live connection to AAS or SSAS then you will see a new dataset appear in the Power BI Service, but this isn’t really a dataset: it’s just a pointer to the AAS or SSAS database where the data is actually held.

DirectQuery mode

There are two fundamental ways a Power BI dataset, AAS database or SSAS Tabular database can work with data: Import mode and DirectQuery mode. In Import mode the data needed for your dataset is copied into the dataset and stored inside it. In DirectQuery mode, when a Power BI report requests data from the dataset by running a DAX query against it, the dataset in turn generates queries that are run on your data source to get the data it needs on demand. If your data source is a SQL Server database then Power BI will generate SQL queries to get the data it needs; for other sources it may generate queries in other query languages. A Power BI dataset is always a semantic layer, an easily consumable interface for your BI data: in Import mode it is a semantic layer and it also stores the data it needs; in DirectQuery mode it is just a semantic layer.

The only data sources that Power BI can use in DirectQuery mode are those that have some kind of query language that Power BI can generate. You can’t connect to a CSV file in DirectQuery mode, for example, because there’s no way of running queries against it unless you load the data in it into some other form of database. Other data sources may have rich query languages but Power BI will only be able to connect to them in DirectQuery mode if a suitable connector exists for that source. Most commonly-used relational databases can be used in DirectQuery mode. If your data source supports DirectQuery mode you’ll see that as an option on the dialog when you first connect:

The key thing to understand here is that the term “DirectQuery mode” describes where the data for a dataset is kept, ie outside the dataset as opposed to inside it in Import mode.

Composite models

Different tables in a single Power BI dataset can have different storage modes: some can be in Import mode and some can be in DirectQuery mode. Furthermore, different DirectQuery tables in the same Power BI dataset can connect to different data sources. When you have different tables using different storage modes, or different tables using different DirectQuery data sources, then you have what is called a “composite model”.

DirectQuery on Power BI datasets and Analysis Services

One final twist is that you can create Power BI datasets in DirectQuery mode and use other Power BI datasets, AAS databases or SSAS databases as a data source. This means you can have DirectQuery datasets that connect to other datasets that can potentially connect to other datasets. It’s useful because, for example, you can have a single, centrally-developed dataset containing all of your company’s sales data, and then other Power BI developers can create their own datasets that connect to this central dataset and extend it by adding extra tables of data and new measures, without ever duplicating the data in the central dataset.

Although the way things behave behind the scenes is somewhat different, a Power BI dataset in DirectQuery mode that connects to another Power BI dataset as its source is conceptually very similar to a Power BI dataset in DirectQuery mode that connects to a relational database like SQL Server: it just generates DAX queries to get the data it needs when it is queried rather than SQL queries.

Conclusion

Live connections and DirectQuery mode are two distinct concepts. Live connections occur when you create reports in Power BI Desktop that connect to already-published Power BI datasets, AAS databases or SSAS databases. DirectQuery mode is one of the two fundamental ways that a Power BI dataset can work with data (the other being Import mode) and is the mode where a dataset does not contain any data but instead queries other data sources to get the data it needs when it is itself queried.

[Thanks are due to my friend Peter Myers – this blog post expands on some explanations we developed last year in a session we co-presented at various Power BI user groups called “Demystifying Power BI Datasets”. Several recordings of this session exist on YouTube, for example here.]

Monitoring Power BI Desktop Activity Using Extended Events In SQL Server Management Studio

Something I do all the time when performance tuning Power BI is use SQL Server Profiler to monitor query and refresh activity. There’s a handy external tool that lets you open up Profiler with a connection to Power BI Desktop; if you’re using Power BI Premium you can also connect Profiler up to a published dataset using the XMLA Endpoint. Profiler is a bit old-school though and likely to lead to snarky comments from SQL Server DBAs (it’s not deprecated for Analysis Services though!) who’ll say you should be using Extended Events (aka xEvents) instead. And guess what – did you know you can use Analysis Services xEvents to monitor activity in Power BI Desktop?

Doing this is quite simple. The only challenge to overcome is how to connect SQL Server Management Studio to an instance of Power BI Desktop, and the easiest way to do this is to install DAX Studio as an external tool, open it connected to Power BI Desktop, and then look in the bottom right-hand corner of the screen for the address of the instance of the Analysis Services engine inside Power BI Desktop that DAX Studio is connected to:

You can then use this address in the connection dialog of SQL Server Management Studio:

Once you’ve connected, under the Management folder in the Object Explorer pane you’ll find there’s already an xEvent session running you can look at:

You can also create a new session by following the instructions here.

All very interesting, but what’s the point of doing this? I admit that I find Profiler much quicker and easier to use but xEvents have two slight advantages over Profiler:

  1. Frustratingly, all the timestamps in Profiler for Analysis Services events are rounded to the nearest second, which makes questions like “which event started or finished first?” much harder to answer. In contrast, all the timestamps for xEvents are very granular:
  2. There are some events that are only available in xEvents and not in Profiler traces. The ExecutionStatistics event shown in the screenshot above is a great example: its text column contains a lot of detailed information that I don’t know how to interpret (yet) and which isn’t documented, but which apparently can be used to determine the amount of time spent in the Formula Engine among other things. This is useful because the widely used method of doing this – add up the amount of time spent in the Storage Engine and subtract from the overall query duration – is unreliable and will become even more unreliable in the future.

I love this kind of detailed information and I’m sure there are a lot of practical problems it can be used to solve. The bad news is that you can’t use this technique against Power BI Premium, at least at the time of writing. Look out for more blog posts on this subject in the future!

Why I’m Excited About The New DAX EvaluateAndLog() Function

If you aren’t active on Twitter or LinkedIn (or you don’t spend a lot of time on them) you may not have heard about the new EvaluateAndLog() function. It’s not documented yet and not mentioned on any of the official blogs, but you can read about it in the last two posts on Jeffrey Wang’s blog here:

[If you’re wondering who Jeffrey is, he’s one of the resident geniuses that Microsoft employs on the Power BI development team – he’s one of the people Marco and Alberto go to when they need help 😀]

I think this is the most exciting thing to happen in DAX since the introduction of variables. Why? Well, we all know that writing DAX can be hard, and part of why it’s hard is that it’s hard to debug. Things like Tabular Editor 3’s DAX Debugger can be really useful, but the EvaluateAndLog() function is the first time the AS/Vertipaq engine has given us the ability to see the intermediate values used by a DAX expression to calculate its output, without lots of tedious commenting and uncommenting or the use of ConcatenateX(). Have you ever wanted a watch window to see what’s going on inside a measure? Have you ever wanted to debug your DAX by printing out values to a console or a log file? Well, that’s what EvaluateAndLog() allows you to do.
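
To give a flavour of what this looks like, here’s a minimal made-up sketch – the Sales table, its Quantity column and the measure are invented for illustration. Wrapping a sub-expression in EvaluateAndLog() leaves the measure’s result unchanged, but the intermediate value (along with the label passed as the second argument) is written to the engine’s debug output, which Jeffrey’s posts show you how to capture:

My Measure With Logging =
VAR TotalQuantity =
    EVALUATEANDLOG ( SUM ( Sales[Quantity] ), "Total quantity" )
RETURN
    TotalQuantity * 2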

It’s still early days: Jeffrey hasn’t blogged about all of its secrets yet and the functionality needs a bit of polishing. That said, if you’re passionate about DAX I strongly suggest you check it out right now because it will make your life a lot easier.

Measuring Power Query CPU Usage During Power BI Dataset Refresh

Some time ago I wrote a post about how optimising for CPU Time is almost as important as optimising for Duration in Power BI, especially if you’re working with Power BI Premium Gen2. This is fairly straightforward if you’re optimising DAX queries or optimising Analysis Services engine-related activity for refreshes. But what about Power Query-related activity? You may have a small dataset, but if you’re doing a lot of complex transformations in Power Query that could end up using a lot of CPU, even after the smoothing of CPU for background activity that happens with Premium Gen2 has been applied. How can you measure how expensive your Power Query queries are in terms of CPU? In this post I’ll show you how.

Let’s consider two Power Query queries that return a similar result and which are connected to two different tables in the same Power BI dataset. The first query returns a table with one column and one row, where the only value is a random number returned by the Number.Random M function:

#table(type table [A=number],{{Number.Random()}})

The second query also returns a table with a single value in it:

let
  // Generate a list of one million numbers
  InitialList = {1 .. 1000000},
  // Replace each item in the list with a random number
  RandomNumbers = List.Transform(
    InitialList,
    each Number.Random()
  ),
  // Find the smallest of the million random numbers
  FindMin = List.Min(RandomNumbers),
  // Return a table with a single column and row containing that value
  Output = #table(
    type table [A = number],
    {{FindMin}}
  )
in
  Output

This second query, however, generates one million random numbers, finds the minimum and returns that value – which of course is a lot slower and more expensive in terms of CPU.

If you run a SQL Server Profiler trace connected to Power BI Desktop and refresh each of the two tables in the dataset separately, the Command End event for the refresh will tell you the duration of the refresh and also the amount of CPU Time used by the Analysis Services engine for the refresh (there will be several Command End events visible in Profiler but only one with any significant activity, so it will be easy to spot the right one). In Desktop, however, the Command End event does not include any CPU used by the Power Query Engine. Here’s what the Command End event for the first Power Query query above looks like in Desktop:

As you would expect the values in both the Duration and CPU Time columns are low. Here is what the Command End event looks like for the second query above:

This time the refresh is much slower (the Duration value is much larger than before) but the CPU Time value is still low, because the Analysis Services engine is still only receiving a table with a single value in it. All the time taken by the refresh is taken in the Power Query engine.

If you publish a dataset containing these queries to a Premium workspace in the Power BI Service, connect Profiler to the XMLA Endpoint for the workspace, and then refresh the two tables again then for the first, fast query you won’t notice much difference:

[Note that in this screenshot I’ve chosen a comparable Command End event to the one I used in Desktop, although for some reason it doesn’t show the duration. The overall refresh duration, which includes some extra work to do a backup, is around 2 seconds]

However, for the second, slower query you can see that the CPU Time for the Command End event is much higher. This is because in the Power BI Service the event’s CPU Time includes all the Power Query-related activity as well as all Analysis Services engine activity:

This is a simple example where there is very little work being done in the Analysis Services engine, which means that pretty much all the CPU Time can be attributed to the Power Query engine. In the real world, when you’re working with large amounts of data, it will be harder to understand how much work is being done in the Analysis Services engine and how much is being done in the Power Query engine. This is where Power BI Desktop comes in, I think. In Desktop you know you are only seeing the CPU used by the Analysis Services engine, so I’ll bet that if there is a big difference in the ratio of CPU Time to Duration for your refresh in Power BI Desktop compared to the Power BI Service, it’s highly likely that the difference is due to Power Query engine activity and that’s where you should concentrate your optimisation efforts.

Of course the next question is how can you optimise Power Query queries so they use less CPU? I don’t know, I haven’t done it yet – but when I have something useful to share I’ll blog about it…