At the end of last year two new, fairly minor, improvements were made to Excel VBA’s support for Power Query: you can now refresh and delete individual Power Query queries. These methods are available now for everyone using the Insiders build of Excel. Not particularly interesting on its own (I wrote a post a few years ago about Excel VBA and Power Query if you’re interested) but it got me thinking about this subject again. Also, at the end of last year, I upgraded to Windows 11 which has Power Automate for desktop built in. Power Automate desktop makes it super easy to automate everyday tasks and it has great support for Excel – including the ability to run Excel VBA macros. So I wondered: can you use Power Automate for desktop to automatically refresh your Excel Power Query queries? Yes, you can!
Here’s a simple example. I created a Power Query query called GetTheDateAndTime that returns the current date and time in a table with one row and column. Here’s the M code for the query:
Next, I created a VBA macro called RunPQQuery to refresh this Power Query query using the new refresh method I mentioned earlier:
I then saved the Excel workbook as a .xlsm file.
Next I opened Power Automate for desktop and created a new desktop flow following the instructions here to open Excel, run the macro and close Excel again. I realised that if I closed Excel immediately after running the macro it would close Excel before the Power Query query had finished, so I added a delay of ten seconds after running the macro to give it time to finish. There are probably more sophisticated ways to solve this problem: for example you could read the value of a cell in the table returned by the query that you knew would change, then after running the query loop until the value you’ve read has changed. Here’s my desktop flow:
Finally I created a cloud flow to run this desktop flow:
And that’s it! A very simple example but very easy to implement.
My favourite – and it seems many other people’s favourite – new feature in the February 2022 Power BI Desktop release is support for more datasources (including SQL Server, Azure SQL DB and Synapse) with dynamic M parameters. In my opinion dynamic M parameters are extremely important for anyone planning to use DirectQuery: they give you a lot more control over the SQL that is generated by Power BI and therefore give you a lot more control over query performance.
Teo Lachev has already stolen my thunder and blogged about how the new functionality allows you to use a TSQL stored procedure as the source of a table in DirectQuery mode. In this post I’m going to show you something very similar – but instead of using a stored procedure, I’m going to show a simple example of how to use a TSQL table-valued function, which I think has a slight advantage in terms of ease-of-use.
To start off I created a table-valued function in the Adventure Works 2017 sample database on my local SQL Server which returns some filtered data from the DimDate table:
CREATE FUNCTION [dbo].[udfDates] (
FullDateAlternateKey, EnglishDayNameOfWeek, EnglishMonthName, CalendarYear
Here’s how it can be called in a SQL SELECT statement:
Next, I created a Power Query query in my dataset that called this function and returned a table in DirectQuery mode. The great thing about table-valued functions is that they appear in the Navigator pane when you connect to a SQL Server database from the Power Query Editor:
You can only hard-code the values you pass to the function’s parameters at this point but even if you do nothing here you can just return an empty table. After having done this I selected DirectQuery mode:
After this I created two Power Query text parameters, called DayName and MonthName, to hold the month and day names to be passed to the function:
I also created two Import mode queries called DayNames and MonthNames to hold all the valid values for the DayName and MonthName parameters:
The last thing to do in the Power Query Editor was to edit the query calling the function to pass the M parameters to it. Here’s the M code for the query after the modification made to the “Invoked Functiondbo_udfDates1” step:
At this point I closed the Power Query Editor and loaded the three tables to my dataset:
Next I bound the EnglishDayNameOfWeek column on the DayNames table to the DayName M parameter and the EnglishMonthName column on the MonthNames table to the MonthName M parameter:
Finally, I built a report with two slicers bound to the two dynamic M parameter columns and a table showing the output of the table-valued function:
Here’s an example of the TSQL generated by Power BI to populate the table in this report:
from [dbo].[udfDates]('Thursday', 'February') as [$Table]
[t0].[CalendarYear] = 2010
ORDER BY [t0].[FullDateAlternateKey]
I am by no means an expert in writing efficient TSQL so I can’t comment on the pros and cons of table-valued functions, stored procedures or using native SQL queries in Power BI (although the last of these has obvious maintainability issues). Hopefully, though, you can see the possibilities – and if you do get round to using this approach on a project, please let me know how you get on!
If your Power BI dataset needs to connect to an on-premises data source it will need to connect via an On-Premises Data Gateway; what’s more, if you have a Power Query query that combines data from cloud and on-premises data sources, then Power BI needs to connect to all data sources used (even if they are cloud sources) via an On-Premises Data Gateway. And when Power BI connects to a data source via a gateway all the transformation work done by the Power Query engine takes place on the machine where the gateway is installed.
As a result of all this the specification of the machine where the gateway is installed has an impact on the performance of any dataset refreshes that use it. So how powerful does the machine with the gateway installed on it need to be? That’s a tough question because, as you can probably guess, it depends on a lot of different factors: how many datasets get refreshed in parallel, how often, how complex the transformations used are, if you’re using Import mode or DirectQuery, and so on. There’s a great docs article describing how to go about sizing your gateway machine here. Unsurprisingly, the more memory and CPU cores you have available the better refresh performance is likely to be and the more refreshes can take place in parallel.
There is one important thing to point out that is not obvious though: increasing the amount of memory on your gateway machine can improve refresh performance even if it doesn’t look like the machine is under memory or CPU pressure. This is because the total amount of memory made available for a single refresh is calculated relative to the overall amount of memory available on the gateway machine. I’ve written about how the Power Query engine uses memory a few times: this post describes how each refresh can use a fixed maximum amount of memory and how performance suffers if your refresh needs to use more; this post shows how increasing the amount of memory Power Query can use for a single refresh can increase refresh performance dramatically. In short, the more memory on your gateway machine the more memory is available for each individual refresh and – if the Power Query engine needs it, for example if you’re sorting, merging, pivoting/unpivoting, buffering or doing group-bys on large tables and query folding is not taking place – the faster each refresh will be.
You do have the option of changing some properties (listed here) on the gateway to try to influence this behaviour. However since the algorithms involved are not documented and may change at any time, not all the relevant properties are documented, and working out what the optimal settings are yourself is very difficult, I don’t recommend doing this. It’s a lot easier just to increase the amount of memory and CPU on the gateway machine and let the gateway work out how these resources should be used. I’m not saying that you should blindly increase your memory and CPU as much as you can, though – you should test to see what the impact on refresh performance is (the gateway logs will be useful here) and whether that impact is worth the extra cost.
Bonus tip: another easy way to improve gateway refresh performance is to enable the StreamBeforeRequestCompletes property on the gateway. It’s documented here and a few people (see here and here for example) have already blogged about how much this has helped them.
Azure Data Explorer has a data type called dynamic which can be used to hold scalar values as well as arrays and property bags; you can read about it in the docs here. For example (following on from my recent series on DirectQuery on Log Analytics, starting here) the ApplicationContext column in the PowerBIDatasetsWorkspace table that holds the IDs of the dataset, report and visual that generated a DAX query (see this post for more background) is of type dynamic:
This is what the contents of the column look like:
Now you can easily extract the individual property values from this column in KQL, and indeed I did so in the KQL queries in this post, but the interesting thing is you can also extract these values in Power Query M very easily and – crucially – maintain query folding using the Record.FieldOrDefault M function in a custom column without needing to write any KQL yourself, in both Import mode and DirectQuery mode.
In order to do this, first of all you have to enter a table name or KQL query in the third parameter of AzureDataExplorer.Contents function. When you do this you can treat a dynamic column as a record even if the Power Query UI doesn’t display it as such. Here’s an example M query that shows all of this in action on the PowerBIDatasetsWorkspace table that contains Power BI data in Log Analytics:
I admit that this is a super-obscure tip but I think it’s fascinating nonetheless, especially given how nested structures are becoming more and more common in the world of big data. It would be great to have similar behaviour in other connectors…
Thanks to my colleague Itay Sagui (whose blog has several posts on Power BI/Azure Data Explorer integration) for this information.