Keep The Existing Data In Your Power BI Dataset And Add New Data To It Using Incremental Refresh

[Update September 2023: now that Power BI is part of Microsoft Fabric, the new features of Fabric make it much easier to solve this problem as described here]

Power BI incremental refresh is a very powerful feature and now it’s available in Shared capacity (not just Premium) everyone can use it. It’s designed for scenarios where you have a data warehouse running on a relational database but with a little thought you can make it do all kinds of other interesting things; Miguel Escobar’s recent blog post on how to use incremental refresh for files in a folder is a great example of this. In this post I’m going to show you how to use incremental refresh to solve another very common problem – namely how to get Power BI to keep the data that’s already in your dataset and add new data to it.

I know what you’re thinking at this point: isn’t this what incremental refresh is meant to do anyway? Well, yes it is, but as I said it’s designed to work in scenarios where a relational data warehouse stores a copy of all the data that’s in your dataset and in some cases you don’t have that luxury. For example, you may want to:

  • Connect to a data source that only gives you a set of new sales transactions each day, and add these sales transactions to the ones you have already stored in a Power BI dataset
  • Take a snapshot of a data source, like an Excel workbook, that is changing all the time and store each of these daily snapshots of the contents of the workbook in a Power BI dataset

The current Power BI incremental refresh functionality doesn’t make it easy to do either of these things, and that’s why I’ve written this post.

Let me be clear though: in all these cases I recommend that you don’t use the technique I’m going to show you. If possible, you should stage a copy of each day’s data in a relational database (ie build that data warehouse) or even as text files in a folder (Power Automate may be useful to do this) and use that staged copy as the data source for Power BI. This will allow you to do a full refresh of the data in your dataset at any point in the future if you need to, or create a completely new dataset, even though it means you have to do a lot of extra work. If it isn’t possible to do this, or you’re too lazy or you’re just curious to see how my technique works, read on.

For the example I’m going to show in this post I’m going to use a web-based data source, an RSS feed from the BBC News website that returns a list of the current top stores on the site. You can find it here:

http://feeds.bbci.co.uk/news/rss.xml

RSS is based on XML and there’s no authentication needed to access this feed, so loading a table of these top stories into Power BI is very easy – so easy, I’m not going to bother explaining how to do it. Instead I’m going to show you how to use incremental refresh to store a copy of this list of top stories every day in the same table in a dataset.

First of all, for incremental refresh to work two Power Query parameters of data type date time need to be created called RangeStart and RangeEnd. More details about how to create them can be found in the docs here. Power BI expects them to be used to filter the rows loaded into a table; it also uses them to partition these tables in the dataset.

Next you need a Power Query query to load the news stories. Here’s the M code for the query:


let
//Connect to the BBC News Top Stories RSS feed
//and create a nicely formatted table
Source = Xml.Tables(Web.Contents("http://feeds.bbci.co.uk/news/rss.xml")),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Attribute:version", Int64.Type}}),
channel = #"Changed Type"{0}[channel],
#"Changed Type1" = Table.TransformColumnTypes(channel,{{"title", type text}, {"description", type text}, {"link", type text}, {"generator", type text}, {"lastBuildDate", type datetime}, {"copyright", type text}, {"language", type text}, {"ttl", Int64.Type}}),
item = #"Changed Type1"{0}[item],
#"Changed Type2" = Table.TransformColumnTypes(item,{{"title", type text}, {"description", type text}, {"link", type text}, {"pubDate", type datetime}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"guid"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"title", "Title"}, {"description", "Description"}, {"link", "Link"}, {"pubDate", "Publication Date"}}),
//Find the current date and time when this query runs
CurrentDateTime = DateTimeZone.FixedUtcNow(),
//Find yesterday's date
PreviousDay = Date.AddDays(DateTime.Date(CurrentDateTime),-1),
//Put the current date and time in a new column in the table
#"Added Custom" = Table.AddColumn(#"Renamed Columns", "UTC Data Load Date", each CurrentDateTime),
#"Changed Type3" = Table.TransformColumnTypes(#"Added Custom",{{"UTC Data Load Date", type datetimezone}}),
//Add the filter required for incremental refresh
//Only return rows in this table if:
//a) The RangeStart parameter equals yesterday's date, and
//b) RangeEnd is not null (which should never be true)
#"Filtered Rows" = Table.SelectRows(#"Changed Type3", each DateTime.Date(RangeStart)=PreviousDay and RangeEnd<>null)
in
#"Filtered Rows"

As I said, the first couple of steps aren’t interesting – they just connect to the RSS feed to get the list of top stories as a table. After that:

  • The CurrentDateTime step gets the current date and time at the point when the dataset refreshes. It’s important to note that I’ve used the DateTimeZone.FixedUtcNow function – this not only gives you the current UTC date and time, but it’s guaranteed to give you the same date and time every time you call it within a query. If you use DateTimeZone.UtcNow you may get a different date time returned every time the function is called within the query, which can make things very confusing.
  • The PreviousDay step gets yesterday’s date by extracting the date from the value found in the CurrentDateTime step and subtracting one day.
  • The #”Added Custom” step adds a new column to the table called “UTC Data Load Date” containing the value returned by the CurrentDateTime step. As the name suggests, this column shows when the data in any particular row was loaded.
  • The #”Filtered Rows” step is extremely important. It filters the rows in the table: it will return all the rows in the table if the RangeStart parameter returns yesterday’s date, otherwise it will return no rows at all. It also has an AND condition that checks whether the RangeEnd parameter is not null, which should always be true. Without this step that uses RangeStart and RangeEnd in some way you won’t be able to turn on incremental refresh; the significance of only returning data for yesterday’s date will become clear later.

At this point, unless you happen to have set the value of the RangeStart parameter to yesterday’s date, the table returned by the query will be empty.

Once you have loaded this table to your Power BI dataset (I called it “News”) you need to configure incremental refresh on it. You can do this by right-clicking on the table in the Fields pane in the main Power BI window and selecting Incremental refresh from the menu. Here’s how you will need to configure the settings:

IncrementalConfig

You can ignore the warning about query folding and the M query at the top. What you need to do here is:

  • Set the “Incremental refresh” slider to On
  • Under “Store rows in the last” choose Days from the dropdown and enter the number of days you want to store data for. You probably don’t need to store data indefinitely; if you did, your dataset might get very large. Data that is older than the number of days specified here will be deleted from the dataset.
  • Under “Refresh rows in the last” again select Days from the dropdown and enter 1.
  • Do not check the boxes for “Detect data changes” and “Only refresh complete day”.

With that done all you need to do is publish to the Power BI Service and set up scheduled refresh for once a day (and no more). When the report refreshes each day, all the stories listed in the RSS feed will be added to the existing stories in the dataset. To monitor this I created a measure called Story Count to count the number of rows in the News table, and then created a simple report that showed the total value of this measure and also showed it broken down by the UTC Data Load Date column. Here’s what it looked like yesterday (the 12th of April 2020), after it had already been refreshed for a few days:

ReportYesterday

and here’s what it looked like today (the 13th of April 2020), with 51 more stories having been loaded in this morning:

ReportToday

Job done!

The last thing to point out is that connecting to the XMLA endpoint (or should I say connecting via the “Analysis Services protocol”…?) for the workspace using SQL Server Management Studio, if it’s in Premium, makes it a lot easier to understand what’s happening behind the scenes here. If you right-click on your table in SQL Server Management Studio’s Object Explorer pane and select Partitions:

PartitionsSSMS

…You can see the names of the 51 partitions created to hold the data (remember, above we opted to store 50 days of historical data), the number of rows of data in each partition and the dates that these partitions were last refreshed (or “processed” in Analysis Services terminology):

PartitionRefresh

Notice that the partition for today’s date, April 13th, is empty and that both the partition for today and yesterday (April 12th) have been refreshed today; similarly, the partition for April 11th was last refreshed yesterday on April 12th. The two most recent partitions are always refreshed and this is is why the #”Filtered Rows” step in the M code above only returns rows when the RangeStart parameter holds yesterday’s date, to make sure that the stories for the current day are only stored once.

You can also script out the table to TMSL like so:

PartitionsScript

This allows you to see the definitions of each partition and, crucially, the start and end values that are passed to the RangeStart and RangeEnd parameters when they are refreshed:

TOM

You can download my example pbix file for this post here.

Understanding The “The key didn’t match any rows in the table” Error In Power Query In Power BI Or Excel

One of the most common errors you’ll see when working with Power Query in Power BI or Excel is this:

Expression.Error: The key didn’t match any rows in the table

KeyErrorScreenshot

It can occur with almost any data source and in a wide variety of different circumstances, and for new users of Power Query it can be very confusing. In this post I’ll explain what the error message means and when you’re likely to encounter it using a simple example.

TL;DR You’re probably getting this error because your Power Query query is trying to connect to a table or worksheet or something in your data source that has been deleted or been renamed.

Say you have an Excel workbook with three worksheets in called Sheet1, Sheet2 and Sheet3:

WorksheetsBefore

On Sheet1 there’s some data you want to load into Power BI or Excel using Power Query:

Data

It’s very easy to do this: after you click on the Excel data source and select your Excel workbook, you see the Navigator window with the contents of the workbook and select Sheet1:

Nav

…click OK and your query is ready:

PQQuery

What actually happens in this query though? If you click on each of the four steps in the Applied Steps pane on the right-hand side of the screen, you’ll see what each step does. In particular, notice that the first step in the query (called Source) returns a table with one row for each worksheet, table and named range in the workbook:

SourceStep

This is something that happens with many types of data source: after you have connected you have to choose what data inside the data source you want to connect to in the first step, and the list of things you can connect to is returned in a table. This table lists the same things displayed in the Navigator dialog shown above. In the same way, if you connect to a SQL Server database this first step returns a table with all the tables and views in the database; if you connect to an OData feed the first step shows all the resources available through the feed.

The other thing to notice in the table in the screenshot is the Data column, which contains nested table values. If you were to click on one of these nested tables you would navigate to the data in the worksheet named on that row and that is in fact what the second step in the query does.

Here’s the M code for the whole query:

let
Source =
Excel.Workbook(
File.Contents(
"C:\MyExcelFile.xlsx"),
null,
true),
Sheet1_Sheet =
Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Promoted Headers" =
Table.PromoteHeaders(
Sheet1_Sheet,
[PromoteAllScalars=true]
),
#"Changed Type" =
Table.TransformColumnTypes(
#"Promoted Headers",
{{"Month", type text},
{"Sales", Int64.Type}}
)
in
#"Changed Type"

If you want to understand how the code in the second step of this query gets the contents of the cell containing the nested table, I have a blog post here that goes into a lot more detail. Basically the second step (called Sheet1_Sheet in the code, but shown as Navigation in the Applied Steps pane) returns the nested table in the Data column from the row in the table where the Item column contains the value “Sheet1” and the Kind column contains the value “Sheet”. It’s the contents of the Item and Kind columns that are used to identify the row in the table that contains the data you want to see – in database terminology these columns are the key columns on the table.

What happens if you change the name of the worksheet that contains the data you want, from “Sheet1” to “Hello”?

WorksheetsAfter

The result is that the first step of the query now returns a table that looks like this:

AfterNav

Since there isn’t a row where the Item column contains “Sheet1” any more, the second step can no longer find the row it’s looking for – the key value it’s looking for no longer exists in the table – which is why you see the error message you do:

KeyErrorScreenshot

There are two ways to fix the error:

  1. Go to the Excel workbook and change the name of the sheet called “Hello” back to “Sheet1” or
  2. In either the Advanced Editor or the Formula Bar edit the M code and replace the reference to “Sheet1” with “Hello” in the second step of the query:
    FormulaBar

In summary, when you’re working with data sources in Power Query you need to be aware that if something in your data source is renamed or deleted then your query will break. You can of course write some clever M code to deal with situations like this but my recommendation is to try to fix the problem in the data source and not in your query.

Analysing Dataset Refresh In Power BI Premium Using SQL Server Profiler

A few weeks ago I showed how XMLA Endpoints allow you to connect SQL Server Profiler to Power BI Premium. As well as looking at query execution times this also means you can see in more detail what happens when a dataset is refreshed, for example so you can find out exactly how long a refresh took, understand which tables inside the dataset contribute most to refresh times or which calculated columns or calculated tables take the longest to create. Remember that refresh performance in Power BI Desktop may be different to refresh performance in the Power BI Service where you’re running on different hardware, may be going through an on-premises gateway to reach your data sources and where other operations may also be consuming resources while refresh is taking place.

To do this you need to create a trace in the way I describe in that previous post and include (at least) the following two pairs of trace events:

  • Command Begin/End
  • Progress Report Begin/End

Once the trace is running and you start a refresh, you’ll see a lot of events generated:

Trace

What does this all mean though? Generally speaking any books or blog posts that talk about processing in Analysis Services Tabular will also apply to refresh in Power BI since the two engines are very similar, although there are many options available in Analysis Services that are not (yet) available for Power BI. For example, my colleage Richard Tkachuk has just written a blog post on Analysis Services processing here that is a good introduction to the subject; chapters 11, 12 and 15 of Marco and Alberto’s book “Tabular modelling in Microsoft SQL Server Analysis Services” provide a lot more detail. If you’re interested in the performance of your Power Query/M queries this post of mine from last year is worth reading too.

It looks a lot more complex than it actually is though. The columns in the trace you should focus on are the ones shown in the screenshot above: EventClass, EventSubclass, TextData and Duration. In EventClass you’ll see pairs of Command Begin/End and Progress Report Begin/End events; EventSubclass and TextData give you more information about each event; and Duration will give you the amount of time taken for each event (it only appears for the “end” events in the pairs) in milliseconds. A refresh is represented by a single Command, so when you’re looking for the events in your trace that relate to a refresh you’ll need to look for a pair of Command Begin/End events with the word “Refresh” visible in amongst a lot of XML in the TextData column and with a lot of Progress Report Begin/End events in between:

Refresh

The Command End event of this pair will give you the time taken for the refresh in the Duration column:

CommandEnd

Finding the amount of time needed to refresh individual tables is not so straightforward because (as Richard Tkachuk describes in his blog post mentioned above) refreshing a table consists of a number of different jobs. The easiest thing to do is to look for a Progress Report End event with EventSubclass 59 where the TextData column starts with “Finished processing partition…” followed by the name of your table:

Table

The Duration column for this event will give you the amount of time taken for the majority of work needed to refresh a table.

There is, of course, a lot more that you can see in these traces and a lot more detail you can get by adding other events, but I’ll stop here for now. If you find anything interesting while you’re looking at traces of your own please let me know by leaving a comment!

Power BI, HTTP/2, Internet Explorer 11 And Older Versions Of Windows

A very quick performance tip: do you have users consuming Power BI reports with Internet Explorer 11 and an older Windows OS like Windows 8.1 or Windows 7? If so, their reports are likely to be slower because over the last few months some important performance optimisations were made in Power BI that rely on the HTTP/2 protocol, and while all modern browsers like Edge and Chrome have supported HTTP/2 for a long time now, Internet Explorer 11 only supports HTTP/2 on Windows 10. If you are using Internet Explorer 11 you can check whether HTTP/2 is enabled (and it should be by default) by going to Internet Options and the Advanced tab and seeing if the Use HTTP2 option is checked:

Internet Options

Making One Power BI Measure Appear In Multiple Folders

Back in 2018, when I wrote a detailed post on how to create nested display folders for measures in Power BI, I mentioned that unlike in Analysis Services it was not possible to make a Power BI measure appear in two or more folders simultaneously. The other day on Twitter Deepak Agrawal pointed out that at some point since I wrote that post the situation has changed, so here’s a quick post showing how it now works.

Say you have a simple Power BI dataset with a table, a column and a measure called My Measure:

Initial State

If you go to the Model pane in Power BI Desktop and click on the measure so that the Properties pane appears, and then enter a list of folder names separated by semi-colons such as:

First Folder;Second Folder

After State

…you’ll see that all the folders in your list are visible in the Fields pane (if they weren’t there already) and your measure appears inside each of the folders. There’s still only one measure – it has not been duplicated – it’s just that it appears in more than one folder. This can be very useful if you have lots of measures that need to be organised into folders in several different ways.

Bonus observation: I hadn’t noticed this before, but you can also drag and drop measures into different folders in the Fields pane. I suspect that’s been possible for ages but I never tried it before.

Visualising Power BI Premium And Azure Analysis Services Query Parallelism

In my last post I showed how to connect SQL Server Profiler up to a Power BI Premium dataset but I didn’t give you any examples of why this might be useful. In this post I’ll show you how you can use a Profiler trace to visualise all the queries run by a Power BI report, see when they start to run, see which ones run in parallel with each other and see what the overall time taken to run all the queries is.

Why is this important? When you’re tuning the performance of a Power BI report the first thing to do is to look at the performance of the individual DAX queries run and make them run as fast as possible. However when a Power BI report is rendered any one query is likely to be run at the same time as several other queries run for the same report, and this will have an impact on its performance. How much of an impact there is will depend on how many queries need to be run and the number of back-end v-cores available on your Premium capacity, or the number of QPUs available on your Azure Analysis Services instance if you’re using a Live connection to AAS. The more v-cores/QPUs you have available, the more of the work needed for a query that can be run in parallel; you can see a table listing the number of v-cores for each Premium SKU here, and the number of QPUs for each Azure Analysis Services SKU here. As a result of this if you have reports with a large number of visuals that generate slow DAX queries, scaling up your Power BI Premium capacity or AAS instance may improve overall report performance. Reducing the number of visuals on your report and/or reducing the number of visuals needed to display the same information will also reduce the number of queries that need to be run and therefore improve overall performance.

As I showed last week, SQL Server Profiler can be used to create a trace that logs all the queries run against a Power BI Premium dataset in the same way as it can be used with Azure Analysis Services. Assuming that you have a trace running that uses only the Query End event, this will give you a list of all the queries that are being run along with their start time, end time, duration and a lot of other interesting information. A table with all this data in can still be difficult to interpret though, so I built a Power BI template for a report that visualises all these queries and helps you understand the amount of parallelism that is taking place. You can download the template file here.

To use it, first you need a trace file. Make sure that no-one else is running reports on the Premium capacity you want to test (creating a Power BI Embedded capacity for testing purposes is a good idea) and then, when the trace is running, refresh your report using the technique I described in the “Use the network tab” section of this blog post. This will also allow you to correlate what you see in the trace with the information you see in the DevTools tab in the browser.

Then save the trace file you can created to XML by going to File/Save As/Trace XML File:

SaveToXML

Next, open the Power BI template file and when prompted, enter the full path of the trace XML file you just created:

TemplateOpening

A new Power BI report will then be created. If you want to point the report to a different trace XML file all you need to do is change the value of the TraceXMLFile Power Query parameter.

On the first page you’ll see the name of the trace XML file you connected to plus a bar chart showing each Query End event (with each query identified by a number) on the y axis and the duration of each query on the x axis:

Waterfall

It’s not quite a simple bar chart though. What I’ve done is:

  • Found the start time of the first query run
  • Calculated the start time of every other query in the file relative to this first start time (although, unfortunately, Profiler only gives you start times rounded to the nearest second which means you can’t know exactly when a query starts)
  • Created a stacked bar chart where the first value in the stack is this relative start time and the second value is the duration of the query in seconds
  • Made the colour of the relative start time transparent, so you only see the blue sections of the bar for the query durations. This gives you a waterfall-like effect and allows you to see which queries are run in parallel. This also makes it easy to see the total amount of time taken to run your queries, from the start of the first query to the end of the last query, which is just as useful to know as the duration of any single query.
  • There’s also a drillthrough page so you can right-click on a bar and see a table with the DAX query for the query you clicked on, as well as its start time and duration.

It’s a very basic report, I know, and I would be interested to know if you have any ideas about other ways of visualising this data. What’s more, a visual like this raises more questions than I know how to answer… yet. For example, one thing I want to investigate is the effect that query interleaving has on this graph and both perceived and actual report performance. So stay tuned for more blog posts on this subject!

 

 

 

Connecting SQL Server Profiler To Power BI Premium

Back in December when I wrote a series of posts on testing the performance of Power BI reports in the browser, I mentioned that it was important to test in the browser because some aspects of the performance of a report may be different there compared to in Power BI Desktop. Following on from this, if you’re testing performance of a report in the browser you are also going to want to take a closer look at the DAX queries generated by your report, even if it is just to check that what you see there is the same as what you see in Performance Analyzer in Power BI Desktop. If your report uses a Live Connection to Analysis Services this is easy to do using either SQL Server Profiler, Azure Analysis Services’s diagnostic logging feature or XEvents. If you’re using a dataset stored in Power BI we have a range of options for monitoring what’s going on including Usage Metrics and the Premium Capacity Metrics Apps and of course there’s also DAX Studio, but for an old-school guy like me, connecting to a Power BI Premium workspace using SQL Server Profiler is a great way to go to get detailed information about what’s going on when queries run. In this blog post I’ll show you how to connect Profiler to Power BI Premium.

First you’ll need to install the latest version of SQL Server Management Studio, which includes SQL Server Profiler – you can get it here. If you have an older version installed you’ll probably need to upgrade. And before anyone leaves a comment about Profiler being deprecated, let me point you to the note on this page:

Note

The feature that allows you to connect Profiler to a Power BI Premium workspace is XMLA Endpoints: it’s in preview right now but basically this allows you to connect any tool that works with Azure Analysis Services up to Power BI Premium. SQL Server Profiler wants to connect to an instance of Analysis Services; XMLA Endpoints mean that you can connect it to a dataset in a Power BI Premium workspace but for this to happen you need to know the url for Profiler to connect to. You can find this by going to your workspace, clicking on Settings:

Workspace Settings

and then going to the Premium tab and copying the Workspace Connection string:

Workspace Connection

You can then open up Profiler, go to the File menu and select New Trace and a connection dialog will appear:

ProfilerConnection1

In this dialog:

  • Set the Server type to Analysis Services
  • In Server name paste the Workspace Connection that you copied from Power BI earlier
  • In Authentication select Azure Active Directory – Universal with MFA and enter your username

Next click the Options button and go to the Connection Properties tab and on the Connect to database dropdown select <Browse server…>:

ProfilerConnection2

Click Yes on the dialog that appears and then choose the name of the dataset in your workspace that you want to connect to in the Connect to database dropdown. If you don’t do this you’ll get errors later on.

Next you’ll see the Properties dialog:

ProfilerProperties1

The default template is Blank, which means no events are selected to monitor; to select events go to the Events Selection tab:

ProfilerProperties2

…select the events you want, and click Run to start tracing.

Trace

Which events should you choose? That’s a big topic and not one that I have time to go into here, but the Query End event is perhaps the one I look at most – it’s fired every time a query finishes executing and gives you a lot of important information such as the start time, end time and duration of the query in milliseconds. Books such as “The Definitive Guide To DAX” have a lot of information on using Profiler with Analysis Services and a lot of that information is relevant to Power BI Premium too. In future blog posts I dare say I’ll show you some interesting things you can do using Profiler and Power BI too…

Adding Your Own Messages To Power Query Query Diagnostics

A quick point: while the Power Query Query Diagnostics functionality is relatively new, it’s based on Power BI/Power Query trace logging that has been around for a while. I’ve just realised that this means you can use the Diagnostics.Trace M function that I blogged about back in 2016 to add your own messages to the output of Query Diagnostics. Using the example query from that blog post, here’s what you’ll see in the detailed Query Diagnostics output query when that query is run (I’ve removed all but the important columns to make it easier to read):

TraceOutput

The amount of detail you get with Query Diagnostics can be overwhelming; inserting your own messages should make it a lot easier to work out what is happening, when and how often – especially in scenarios where the query name and step name aren’t shown.

Make Your Power BI Report Run Faster By Showing The Same Data In Fewer Visuals

Do you have a Power BI report that has a row of card visuals on it, something like this:

Cards

?

It’s a very common thing to do, but if your report is too slow to render you may find that a design like this is part of the problem. Now look at the following report which contains a single matrix that has been configured to look as un-matrix-like as possible but shows the same data:

Matrix

This is likely to perform faster, but why?

Before we go any further, I don’t want you to go and change your reports if you’re not going to get any benefit from doing so. Use Performance Analyzer (as shown here) to determine which visuals on your report are the cause of slow performance – there’s no point redesigning visuals that are fast anyway.

As a general rule the more visuals you put on a report page the slower it’s going to get. It’s logical if you think about it: the more visuals there are, the more queries have to be run against your dataset and the more work Power BI has to do to render the report. I know there is a tendency to try to pack as much information onto a page as possible and this often happens when someone else has designed the report you’re trying to build, but you should always try to resist this. Splitting a single large page into multiple smaller pages, using slicers or filters to reduce the amount of data shown at any one time and avoiding gigantic Excel-like tables are a good idea.

Once you’ve done that you need to see whether you can reduce the number of visuals needed to display the same amount of information: in the example above, the four separate cards show the same data as the single matrix. In part this is because the cards generate four separate DAX queries against the dataset to get the data they need whereas the matrix only generates one DAX query. There is a certain overhead to running a DAX query, so reducing the number of DAX queries needed to get the same amount of data is a good thing. More importantly, in this example Power BI can get the four values required for the single DAX query generated by the matrix much more efficiently than it can in the four separate DAX queries needed by the cards. This is because of something called “DAX Fusion”, which my colleague Phil Seamark blogged about in great detail here, and this can make a signficant difference to performance (Marco Russo’s post on why Analyze In Excel reports may be slower than Power BI reports here also contains some useful information on DAX Fusion).

This is only a simple example; if you want to see a really sophisticated demonstration of replacing several visuals with a matrix I suggest you watch this video by another colleague of mine, Miguel Myers. And of course it doesn’t just apply to cards: any time you can replace multiple visuals with a single visual (remember to be careful using custom visuals though) you’re likely to gain some performance.

An M Function To Help You Explore Power Query Diagnostics Data

Last week’s post showed an M function that took Power Query diagnostics data and formatted in a way that made it suitable for visualisation in a Power BI Decomposition Tree visual. This is great for understanding what’s going on at a high level, but by doing this you also lose a lot of detailed information from the diagnostics logs that could be useful for performance tuning. This week I have an M function for you that takes a different approach to solving the same problem that might be more appealing for the hardcore Power Query fans out there.

Here’s the code for the function:


(Input as table) as table =>
let
Source = Input,
#"Expanded Additional Info" = Table.ExpandRecordColumn(Source, "Additional Info", {"Message"}, {"Message"}),
#"Calculated Total Seconds" = Table.TransformColumns(#"Expanded Additional Info",{{"Exclusive Duration", Duration.TotalSeconds, type number}}),
#"Sorted Rows" = Table.Sort(#"Calculated Total Seconds",{{"Id", Order.Ascending},{"Start Time", Order.Ascending}}),
#"Removed Other Columns" = Table.SelectColumns(#"Sorted Rows",{"Id", "Query", "Category", "Operation", "Start Time", "End Time", "Exclusive Duration (%)", "Exclusive Duration", "Data Source Query", "Message", "Row Count", "Content Length", "Path", "Group Id"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Other Columns",{{"Message", type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type",null,"Missing",Replacer.ReplaceValue,{"Path"}),
BufferedTable = Table.Buffer(#"Replaced Value"),
GetAllChildRows = (CurrentId, CurrentPath) =>
Table.SelectRows(BufferedTable, each [Path]<>"Missing" and [Id]=CurrentId and Text.StartsWith([Path], CurrentPath)),
AddTotalED = Table.AddColumn(#"Replaced Value", "Exclusive Duration (Including Child Operations)", each List.Sum(GetAllChildRows([Id],[Path])[Exclusive Duration]), type number),
AddTotalEDPct = Table.AddColumn(AddTotalED, "Exclusive Duration (%) (Including Child Operations)", each List.Sum(GetAllChildRows([Id],[Path])[#"Exclusive Duration (%)"]), Percentage.Type),
#"Inserted Text Before Delimiter" = Table.AddColumn(AddTotalEDPct, "Parent Path", each Text.BeforeDelimiter([Path], "/", {0, RelativePosition.FromEnd}), type text),
#"Added Custom" = Table.AddColumn(#"Inserted Text Before Delimiter", "Child Operations", each
let
CurrentPath = [Path],
CurrentId = [Id],
ChildRows =
Table.SelectRows(
@#"Added Custom",
each
[Path]<>"Missing" and [Parent Path]=CurrentPath and [Id] = CurrentId
),
Output = if Table.RowCount(ChildRows)=0 then null else ChildRows
in
Output),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Path] = "0" or [Path]="Missing")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Parent Path"})
in
#"Removed Columns"

As before you need to pass a query created by the Power Query diagnostics functionality to the function:

InvokeFunction

[Note: I recommend calling the function on the “detailed” diagnostics query and not the summarised view, because I’ve found what might be a bug with the summarised view]

The function will then return a table with one row for each query that is evaluated while the diagnostics were running:

QueryList

[Note: You may find that some query evaluations have nulls instead of the name of the query; that’s an issue with the diagnostics data that will be addressed soon]

You could get the same thing simply by filtering the raw diagnostics data down to the rows where the Path column contains 0, but I’ve added two extra things that make the data much easier to understand.

First of all, the original data contains columns called “Exclusive Duration” and “Exclusive Duration (%)”, which tell you the amount of time each operation took in seconds and what percentage this was of the entire query evaluation. However, the data itself is in a parent/child structure so while a parent operation may have a very small duration it may have many slow descendant operations. Therefore my function adds two new columns called “Exclusive Duration (Including Child Operations)” and “Exclusive Duration (%)(Including Child Operations)”. The first of these new columns returns the total duration of each operation and all its child operations, while the second returns the percentage that the duration of each operation and all its child operations make up of the whole query duration. At the root level, this means the first of these new columns shows the total duration of each query in seconds and the second of these new columns should always show 100%.

Durations

Secondly, the function adds a new column called Child Rows that contains a nested table containing the child operations associated with each operation:

ChildRows

You can either click on the Table link in this column to navigate to a new table containing all the child operations, or use the Expand/Aggregate button in the top-right hand corner to keep the current operations and join the nested rows onto them (I strongly recommend you do the former). Experienced Power Query developers will be familiar with the concept of nested tables and I think this provides an easy way of navigating the parent/child operation structure that is similar to what you see with a Decomposition Tree but keeps all the detail columns that the diagnostics data provides you with and means you don’t have to leave the Power Query Editor.

I would love to hear your feedback about this and know if there are any bugs or other information that could be added. You can download a sample pbix file for this function here.