Troubleshooting Data Refresh Performance Issues With OData Data Sources In Power BI And Excel Using Fiddler

A lot of people have problems with the performance of OData data sources when loading data into Power BI and Excel. One possible cause of these problems is query folding not taking place – if this is the case then the Power Query engine will be requesting more data than is needed and applying any filters itself locally, rather than requesting filtered data from the data source. How do you know whether query folding is taking place or not though? The Power Query Editor UI doesn’t give you this information, unfortunately. Last week, at the Data and BI Summit in Dublin, Matt Masson demonstrated how to use Fiddler to check if query folding is taking place when loading data from an OData data source and he’s very kindly allowed me to write about what he showed here.

The following examples use the UK Parliament’s public OData API, which is documented here. The M query below, generated using the Power Query Editor in Power BI Desktop, returns all the rows from the GovernmentOrganisation table from the API:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
        Source{[
            Name="GovernmentOrganisation",
            Signature="table"
        ]}[Data]
in
    GovernmentOrganisation_table

image

To monitor communication between Power BI Desktop and the OData API you will need to install Fiddler, a free tool from Telerik. You can download it here:

https://www.telerik.com/fiddler

You’ll probably also need to configure Fiddler to intercept https traffic, which you can find out how to do here. It’s a very powerful tool and I am by no means an expert in using it, but even a basic understanding of its features can be very useful for a Power BI developer.

With Fiddler running you can add a filter so that it only shows traffic to certain hosts; in this case I’m filtering to only show traffic to api.parliament.uk:

image

Refreshing the query shown above in the Query Editor results in the following activity being shown in Fiddler:

image

There are several calls to the service root URL, but the important call is to get the top 1000 rows from the GovernmentOrganisation table:

https://api.parliament.uk/odata/GovernmentOrganisation?$top=1000

Notice the use of the $top query option to restrict the number of rows returned – the first example of the Power Query engine pushing a filter back to the data source. This only happens when you refresh the query in the Power Query Editor so you can see a sample of the data; when you click the Close and Apply button and load the query into your dataset you’ll see that this $top filter is not applied and all the rows from the table are requested.

Altering the query to filter the rows down to where the GroupName column equals “Cabinet Office” like so:

image

Results in the following M query:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
        Source{[
            Name="GovernmentOrganisation",
            Signature="table"
        ]}[Data],
    #"Filtered Rows" =
        Table.SelectRows(
            GovernmentOrganisation_table,
            each ([GroupName] = "Cabinet Office")
        )
in
    #"Filtered Rows"

Fiddler shows the following call that includes the filter:

image

https://api.parliament.uk/odata/GovernmentOrganisation?$filter=GroupName%20eq%20'Cabinet%20Office'&$top=1000

[The Power Query Editor does quite a lot of caching so you may need to click the Refresh button on the ribbon to make sure it actually does call the API]

It can be quite difficult to work out what’s going on in a call like this, so with the highlighted row in the screenshot above selected you can go to the Inspectors tab and then the WebForms sub-tab, and it will show the different query options used:

image

In this case you can see that the $filter query option has been used to do the filter, so query folding has taken place and the API is only returning the single row that the query needs:

image

However, it is all too easy to do something in your query that prevents query folding from happening. For example, if you add an index column to the table before filtering by GroupName, as shown in this query:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
        Source{[
            Name="GovernmentOrganisation",
            Signature="table"
        ]}[Data],
    #"Added Index" =
        Table.AddIndexColumn(
            GovernmentOrganisation_table,
            "Index",
            0,
            1
        ),
    #"Filtered Rows" =
        Table.SelectRows(
            #"Added Index",
            each ([GroupName] = "Cabinet Office")
        )
in
    #"Filtered Rows"

 

image

…then Fiddler shows that only the $top filter is being applied in the call to the API, not the $filter on GroupName, so query folding is no longer taking place for the filter step:

image

image

Not everything you do in your M queries can or will be folded back to an OData API, but in general you should aim to get query folding to take place on the steps in your query that reduce the amount of data returned by the API the most. Row filters like the one shown above are a prime example of the type of transformation that will need to be folded in order to get the best possible data refresh performance. You may need to experiment with reordering steps and applying transformations in different ways to get the Power Query engine to call the API in the way you want.
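For example, here’s a minimal sketch (using the same table as above) of the same logic with the steps reordered so that the filter on GroupName comes before the index column; this gives the filter a chance to be folded back to the API, although you should still confirm in Fiddler that it actually is:

let
    Source = OData.Feed("https://api.parliament.uk/odata"),
    GovernmentOrganisation_table =
        Source{[
            Name="GovernmentOrganisation",
            Signature="table"
        ]}[Data],
    //filter first, so the filter can be folded into a $filter query option
    #"Filtered Rows" =
        Table.SelectRows(
            GovernmentOrganisation_table,
            each ([GroupName] = "Cabinet Office")
        ),
    //the index column is then added locally, to the filtered rows only
    #"Added Index" =
        Table.AddIndexColumn(#"Filtered Rows", "Index", 0, 1)
in
    #"Added Index"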

Reordering Multiple Columns With ‘Remove Other Columns’ In The Power Query Editor

In the comments to my last blog post on how the order that you select columns can affect the output of certain calculations, both Bradley Sawler and John B. Thomas pointed out something very useful that I didn’t know about: that the order you select columns can also be used with the ‘Remove Other Columns’ functionality to reorder columns in bulk.

For example, imagine you have a table with columns called A, B, C, D, E and F. If you select the columns in the order F, E, D, A, B and C and then select ‘Remove Other Columns’, the columns are reordered in the order that you clicked them:

PQReorderCols

As you can see from the demo above, ‘Remove Other Columns’ uses the Table.SelectColumns M function behind the scenes and the order the columns are listed in that function is the order that you have clicked them in. A great trick for reordering a large number of columns quickly!
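For example, here’s a minimal sketch of the kind of step this generates (using a hypothetical table with the column names above); the list of column names passed to Table.SelectColumns is in the order the columns were clicked:

let
    //a hypothetical source table with columns A to F
    Source = #table({"A", "B", "C", "D", "E", "F"}, {{1, 2, 3, 4, 5, 6}}),
    //the columns come back in the order they appear in this list
    #"Removed Other Columns" =
        Table.SelectColumns(Source, {"F", "E", "D", "A", "B", "C"})
in
    #"Removed Other Columns"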

The Order You Select Columns In The Power Query Editor Can Affect The Output Of Some Transformations

Maybe this is obvious to more experienced Power Query users, but something I always point out when I’m training people up on Power Query is that the order that you select columns in the Power Query Editor window (both in Power BI Desktop and Excel) can affect the output of certain transformations. For example, say you have a table with two columns A and B that both contain numbers; if you select A first and then B, and then go to Add Column/Standard/Divide, you’ll get a new column that contains the value of the calculation A/B. However, if you select B first and then A and do the division you’ll get B/A:

PQDivide

The order that you select columns is also significant for some other types of calculation such as Percent Of and Power, and also when you do a Merge Columns.
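To illustrate, here’s a minimal sketch (with a hypothetical two-column table) of the kind of code the Add Column/Standard/Divide option generates for the two different selection orders – the column names used here are just examples:

let
    //a hypothetical table with two numeric columns
    Source = #table({"A", "B"}, {{10, 2}}),
    //selecting A then B divides A by B
    #"Inserted Division" =
        Table.AddColumn(Source, "Division", each [A] / [B], type number),
    //selecting B then A divides B by A instead
    #"Inserted Division1" =
        Table.AddColumn(#"Inserted Division", "Division.1", each [B] / [A], type number)
in
    #"Inserted Division1"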

Make Excel Reports Created With Analyze In Excel Work After Publishing To Power BI!

I think Power BI’s a great tool, but like most Power BI users I have a list of my own pet features that I would like to see implemented to make Power BI even greater. What I would most like to see addressed is the fact that, right now, Analyze In Excel only works with Excel on the desktop but not after you have published a workbook to Power BI. It seems crazy to me (and it’s very hard to explain to customers too) that Analyze In Excel lets you create reports in Excel using PivotTables and cube functions connected to a Power BI dataset, but when you publish your report to a Power BI workspace – so that the Excel workbook and the source data are both in Power BI – the reports stop working because Excel Online cannot connect back to Power BI.

There has been a post on the Power BI Ideas forum about this for some time, but recently I was talking to some guys from the Excel dev team and they told me that it’s actually something they, not the Power BI team, need to address. Therefore I created a post on the Excel UserVoice forum too:

https://excel.uservoice.com/forums/274580-excel-online/suggestions/33793252-pivottables-created-with-power-bi-using-analyze-in

Please vote for it! The more votes it gets, the more likely it is to be implemented quickly. Ken Puls managed to get a lot of votes for his idea to improve Power Query performance, and since that’s now in the process of being implemented it just goes to show that voting does influence what the Excel dev team works on.

Why is this important? In my opinion, Power BI is not a good ad-hoc data exploration tool and isn’t intended to be – its strengths lie elsewhere. However people do want to explore data stored in Power BI and an Excel PivotTable is the ideal way to do this (the Power BI matrix visual is very limited in comparison), and after you have found something interesting it’s only natural that you should want to share it. PivotTables aren’t the only thing Excel has to offer though: I’m a big fan of cube functions too, especially for creating financial reports, and Power BI has nothing remotely like them. Finally, don’t forget all those people who want to build reports in Excel because it’s Excel and that’s what they know. All in all getting this feature implemented would be a major boost for Power BI and broaden its range of capabilities.

Running M Queries In Visual Studio With The Power Query SDK

Writing M in the Advanced Editor in Excel or Power BI can be a frustrating experience unless you’re the kind of masochist who loves writing code in Notepad. There are some options for writing M code outside Excel and Power BI, for example Lars Schreiber’s M extension for Notepad++ (see here for details) or the M extension for Visual Studio Code (available from the Visual Studio Marketplace here; more details on Brett Powell’s blog here), but the trouble with them is that you have to copy the code back into Excel or Power BI to run it. What many people don’t realise, however, is that it is possible to write M code and have IntelliSense, formatting, keyword highlighting and also the ability to execute your own M queries, using the Power Query SDK in Visual Studio.

The Power Query SDK (which you can download here) supports Visual Studio 2015 and 2017 and is intended for people who are writing custom Data Connectors for Power BI. To let you test your Data Connector you can create a .pq file containing M code, and this in fact allows you to run any M query you want whether you’re building a Data Connector or not.

Here’s how. First, install the Power Query SDK and then open Visual Studio and create a new project. Find the Power Query template, select the PQ file option and give your file a name:

image

Then, in the .pq file that is created, you can enter an M query and then either press the Start button on the toolbar or hit F5 to run the query. The output of the query is shown on the Output tab in the M Query Output window:

image
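The query doesn’t have to have anything to do with a Data Connector. For example, a trivial M query like this (just a sketch) will run and display a small table in the Output tab:

let
    //any M query can go in a .pq file, not just Data Connector code
    Source = #table(
        type table [Number = Int64.Type, Square = Int64.Type],
        List.Transform({1..5}, each {_, _ * _})
    )
in
    Source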

Right-clicking on your project in the Solution Explorer pane and selecting Properties brings up a Properties dialog with various properties that control how your queries behave:

image

image

Many of the properties are self-explanatory, at least for anyone used to writing M in Power BI or Excel. FastCombine turns off data privacy checks. Allow Native Query lets you use M queries that contain ‘native’ queries (for example your own SQL queries if you’re using a SQL Server data source), as Cédric Charlier shows here. A few of them, such as Legacy Redirects, I have no idea about yet (I should really ask someone…). Turning on Show Engine Traces displays engine trace information in the Log tab of the M Query Output pane; turning on Show User Traces displays trace information generated by the use of the Diagnostics.Trace() function in your code in the Log tab. You can save the contents of the Log tab to a text file.

image
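For example, here’s a minimal sketch of a query that calls Diagnostics.Trace(); with Show User Traces turned on, the message should appear in the Log tab when the query runs:

let
    Source = {1..10},
    //write an informational trace message and return the traced value
    Output = Diagnostics.Trace(
        TraceLevel.Information,
        "About to return the sum of the list",
        List.Sum(Source)
    )
in
    Output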

Error messages are displayed on the Errors tab of the M Query Output pane:

image

When you have a query that connects to an external data source, the first time you try to run your query you will be prompted to set the credentials used to connect to that data source (as you would in Power BI Desktop), and the data privacy level for the data source, on the Errors pane:

image

The query won’t actually run this first time though; you’ll need to hit Start/F5 again to see the results. If you close the project and then reopen it you will need to enter credentials again; alternatively, on the Credentials tab you can save the credentials used for a data source to a .crd file which can then be reloaded when you reopen your project. You can also edit and delete credentials on the Credentials tab.

image

If I’m honest it’s all very basic but it does the job. The main thing that I miss from writing M code in Power BI is the Query Editor UI – when I write M code there I only write about 50% of it manually, the rest I generate by clicking buttons in the UI because it’s faster. Give me the Query Editor (or the ribbonless version of it that comes with SSDT, because Visual Studio doesn’t support ribbons apparently) and I’ll be happy. Even better, give me the improved code editing functionality in the Advanced Editor in Power BI Desktop and Excel that we’ve been promised!

Creating Excel “Data Dump” Reports From Power BI

We’ve all met them: the users whose idea of a ‘report’ is a large table of data dumped into Excel. However many beautiful data visualisations they see, however many times you try to convince them of the benefits of using Power BI to build reports, they just want to know where the “Export to Excel” button is so they can carry on analysing data in the same old way. Sometimes there’s a valid reason for doing this, sometimes not, but all too often internal politics means that you have to accommodate them.

If you’re using Power BI the obvious way to do this is to use the “Export Data” button in Power BI. However, as the documentation notes, there are some limits on the amount of data that can be exported:

  • The maximum number of rows that can be exported from Power BI Desktop and Power BI service to .csv is 30,000
  • The maximum number of rows that can be exported to .xlsx is 150,000

What’s more, clicking a button to export data and then copying it into an Excel report is a time-consuming, error-prone, manual process that no-one likes doing.

There is another way to get data from Power BI into Excel, though: the Analyze in Excel feature. Rather than using Analyze in Excel to create a PivotTable (which may suffer from the subtotals issue described here if you aren’t using the click-to-run version of Excel 2016, can be very slow with large amounts of data as a result, and is awkward to get data out of because you have to use Excel functions like GetPivotData()), in this post I’ll show you how to get data from Power BI into an Excel table instead.

First of all you need to create a PivotTable in Excel on your desktop that is linked to a dataset published to Power BI. You can do this either by clicking the Analyze in Excel option from inside the Power BI service in the browser (see here for how to do this), or by using the Connect to Data option in the Power BI Publisher for Excel add-in.

Next, drag any measure into your PivotTable (you will need to have at least one measure defined in your dataset to do this) like so:

image

Then double-click inside a cell containing values, such as the selected cell in the screenshot above, or right-click on the cell and select Show Details:

image

When you do this a new worksheet will appear with a table in it containing rows of data from a table in the underlying dataset, but the exact data is irrelevant here. The real point is that you now have an Excel table (not a PivotTable) with a connection back to your Power BI dataset and you can edit the query that it uses to return data from Power BI – a variation on an old trick known to SSAS and Power Pivot users. You can now delete the PivotTable you created because you will no longer need it.

On the new worksheet with the table on it, right-click in a cell and select Table then Edit Query:

image

A dialog will pop up, and you’ll see an MDX Drillthrough statement in the Command Text box as shown below:

image

You can replace this statement with any MDX or DAX query you want – I recommend using DAX queries here because they are likely to be easier to write if you already know how to write DAX calculations, and unlike MDX queries they return a tabular resultset. I have a fairly old series of posts on DAX queries here that will help get you going. It’s a good idea to install DAX Studio and write your queries there, then copy them into Excel when you are happy with them.
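For example, here’s a minimal sketch of the kind of DAX query you might use – the ‘Sales’ table and the SalesAmount column are hypothetical and you’ll need to replace them with names from your own dataset:

//return all the rows from the hypothetical 'Sales' table where SalesAmount is greater than 100
EVALUATE
FILTER(
    'Sales',
    'Sales'[SalesAmount] > 100
)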

Once you have a working DAX query in the Command Text box:

image

You can click OK, the query will run and you’ll see the output of the query in the Excel table:

image

One minor annoyance is that the column names will be in DAX ‘Table Name’[Column Name] format, and although you can use the SELECTCOLUMNS() DAX function to alias your column names and thereby remove the table names, the square brackets around the column names will always be there.
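For example, here’s a sketch of what that looks like, again using hypothetical table and column names; the output columns will be called [Product] and [Amount] rather than ‘Sales’[ProductName] and ‘Sales’[SalesAmount]:

//alias the columns so the table name no longer appears in the column headers
EVALUATE
SELECTCOLUMNS(
    'Sales',
    "Product", 'Sales'[ProductName],
    "Amount", 'Sales'[SalesAmount]
)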

There are several good things about this approach to getting data into Excel:

  • It allows you to exceed the 150,000 row limit of the native Power BI Export option mentioned above
  • The queries are very quick to run if you aren’t doing anything complex in them
  • It avoids manual exporting – you just use Excel’s native Refresh functionality to run the query whenever you want to download new data into Excel

There are some obvious downsides though:

  • There’s no easy way to pass parameters to the queries you use, and so allow the user to choose what data is retrieved from Power BI. I guess it would be possible with some VBA, though – I haven’t tried – and it may also be possible to connect Power Query/Get&Transform to the Power BI Service and then use some of the tricks I show in this video for building reporting solutions in Excel. You can of course use Excel’s own native table filtering functionality to filter the data that is downloaded.
  • If your DAX query is slow to run your users may get frustrated. To counter this you could create calculated tables in your Power BI dataset using the same DAX as your query, and because calculated tables are created when the Power BI dataset is refreshed rather than when your user refreshes their Excel worksheet this should speed things up. However it will increase the size of your Power BI dataset and make your dataset refresh take longer. In Excel your DAX query would then simply be something like the following (there’s a sketch of this approach after this list):
    EVALUATE 'My Calculated Table Name'
  • Writing DAX can be complex, whether it’s a DAX query or a calculated table, so another option would be to use the Power BI Query Editor to create the table you need for your report, load that into your dataset and then load the table into Excel.
  • You can’t publish the Excel workbook up to Power BI and have it refresh automatically, because of course Power BI does not support connections from published Excel workbooks back to Power BI datasets. Hopefully this will change in the future – it is marked as planned on the Ideas forum.
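As mentioned in the list above, here’s a sketch of what the calculated table approach could look like: a calculated table defined in Power BI Desktop (the table and column names are hypothetical) that you then query from Excel with a one-line EVALUATE statement.

//defined in Power BI Desktop; evaluated when the dataset is refreshed, not when Excel refreshes
My Calculated Table Name =
FILTER(
    'Sales',
    'Sales'[SalesAmount] > 100
)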

Of course doing this goes against the best practices that I and most other people recommend for Power BI, so you should probably only do this if you have no other choice. Sometimes you have to bend the rules a little to make sure your project succeeds…

Obscure MDX Month: Recreating The Star Ratings Measure In MDX Using Excel Functions

I still love MDX, but I’m aware that I blog about it less and less – which is a shame, I know. Therefore I’ve decided that for the next four weeks I’m going to write about some obscure MDX topics that hopefully will make all you SSAS MD diehards out there feel less neglected… even if they don’t have much practical use.

Let’s start off with recreating my ever-popular DAX star-ratings measure in MDX. Well, not exactly pure MDX, but did you know that in MDX you can call some Excel functions (in the same way you can call some VBA functions)? It’s a really, really bad thing to do from a query performance point of view, but it does allow you to do some useful calculations that might otherwise be impossible. Here’s a query on the Adventure Works cube that uses the Excel Rept() and Unichar() functions (functions that do not exist in MDX proper) to recreate my star-ratings measure:

WITH
MEMBER MEASURES.STARS AS
    REPT(
        UNICHAR(9733),
        CINT([Measures].[Internet Sales Amount]/10000))
    +
    REPT(
        UNICHAR(9734),
        10-CINT([Measures].[Internet Sales Amount]/10000))
SELECT
    {[Measures].[Internet Sales Amount], MEASURES.STARS}
    ON 0,
    ORDER(
        [Date].[Date].[Date].MEMBERS,
        [Measures].[Internet Sales Amount],
        BDESC)
    ON 1
FROM
    [Adventure Works]

 

image

Here’s the same measure used in a PivotTable:

image
