Book Review: “Collect, Combine And Transform Data Using Power Query In Excel And Power BI” by Gil Raviv

I generally try to avoid writing book reviews here, but the fact that there are so few books available on Power Query and M means that I’m making an exception for “Collect, Combine and Transform Data using Power Query in Excel and Power BI” by Gil Raviv.

The first thing to say about this book is that it takes the approach of teaching through exercises and worked examples, rather than by explaining abstract concepts. If this suits your style of learning (and I know that it does for a lot of people) then you’re in luck; if you’re looking for a book that will explain what all the different join types for Merge operations do, for example, then you’ll be disappointed. This isn’t a criticism though – I don’t think it’s possible to write a book that will satisfy everyone – and Gil has done a good job of covering a lot of common data preparation scenarios. One important exception to this is the chapter on M which provides a very clear introduction to the language and the way it works. I suspect a lot of people will want to buy the book for this chapter alone.

The second thing to say about this book is that while it covers both Power Query in Excel and Power BI, in my opinion it’s aimed slightly more at Excel users. Again, this is not a criticism: although advanced Excel users and Power BI report designers have to solve many of the same problems, they also have some very different concerns too. What’s more, if you can assume your readers have good Excel skills and can explain Power Query concepts in Excel terms then you’ll serve that particular set of readers well, and probably do a better job for them than if you assume they are completely new to the area of data transformation and preparation have no existing skills in this area.

All in all it’s a good book that I can recommend to anyone who wants to learn Power Query and M, and also for intermediate users who want to deepen their knowledge. I still think there’s a need for a book completely devoted to M and covering topics such as custom connectors and dataflows; hopefully someone writes one soon!

Full disclosure: I received a free copy of the book from the author. I’m also the author of a Power Query book myself, but to be honest it’s several years old now and a bit out-of-date, so it’s hard to recommend it any more.

You can buy a copy of this book from Amazon UK here.

Thoughts On The New Power Query Source In SSIS

Yet more evidence that Power Query is taking over the world: there’s a new Power Query data source for SQL Server Integration Services in the latest version of SSDT. Here’s the blog post announcing it (the actual announcement is halfway through):

And here’s the documentation:

It’s a preview and as such, limited in scope. The main restriction, as the blog post says, is:

At present, to facilitate quick/frequent feedback-gathering and improvement cycles, it can only be used with SQL Server Data Tools (SSDT) and SSIS IR.

There’s also no user interface for generating queries: right now you can only paste in an M query or use an M query stored in an SSIS variable. I don’t think that’s much of a problem though – if you don’t want to write your own M code, you can always generate a query in Excel Power Query or Power BI Desktop and cut and paste from there. You don’t even have to open the Advanced Editor to do this, as I show here.

Beyond the normal excitement about cool new stuff, what are the benefits of having Power Query integrated into SSIS? I’m not an SSIS person so I’m probably not the right person to comment on the benefits for SSIS developers, but I would imagine that it will make it possible to connect to a wider range of data sources and also make it easier to work with certain others, such as Excel workbooks and web services. I’d be interested in hearing your thoughts on this, so please leave a comment.

From the point of view of the Excel, Power BI or SSAS developer, though, it’s clear this will make it much easier and quicker to take user-developed Power BI and Excel Power Query solutions and turn them into a managed, IT-department owned solutions – and this will be incredibly useful.

Finding Power Query Query Execution Times In Power BI And Excel Using SQL Server Profiler

Working out how long a Power Query M query takes to run – the first step to troubleshooting refresh performance problems in Power BI or Excel – is something I have blogged about several times (see here, here and here for example). However, there are problems with all of these methods such as the need to alter the M code of your query. Luckily there is another method that I have found, using a combination of DAX Studio and SQL Server Profiler.

How can you get hold of these tools? You can download DAX Studio for free here:

Any serious Power BI developer should have it installed already, but if you have an older version you may need to update it to get the features I talk about in this post.

SQL Server Profiler is part of SQL Server Management Studio, and you can download it for free from here:

You can generate SQL Server Profiler trace files for diagnostic purposes from Power BI Desktop, as Kasper shows here, but these traces don’t show any information about refreshes as far as I can see. You can also hook Profiler up to Power BI Desktop direct, as Sam Lester shows here, but that’s a bit fiddly to do. Luckily DAX Studio now has a preview feature which makes connecting Profiler to Power BI Desktop much easier.

First of all, open up your report in Power BI Desktop, then open up DAX Studio and connect it to the instance of Power BI Desktop you just opened; if you want to use this technique with Excel Power Query queries then you will need to launch DAX Studio from inside Excel. Then, in DAX Studio, open the Options dialog and enable the “Show External Tools” preview feature:


Once you have done this, on the Advanced tab on DAX Studio’s ribbon you’ll see an option to launch SQL Profiler and connect it to the Power BI report or Excel file that DAX Studio is currently connected to:


Clicking the button opens up Profiler with a trace running. You’ll see that a lot of trace events are selected – probably too many to make sense of – so I suggest that you stop the trace, click the Properties button on the toolbar and then, on the Events Selection tab of the Trace Properties dialog select just the following four trace events:


  • Command Begin
  • Command End
  • Progress Report Begin
  • Progress Report End

Restart the trace and then refresh your dataset in Power BI Desktop. You’ll still see a lot of events generated in Profiler, but look for the following:

  • A Command Begin event followed by a lot of Progress Report Begin/End events, which is the command that marks the start of the refresh.
  • After that there will be a series of Progress Report Begin events that have an  EventSubclass of “25 – ExecuteSQL” – one for each of the tables that are being loaded into your dataset and therefore one for each of the Power Query M queries associated with those tables. Don’t be confused by the name of the EventSubclass, these events will appear whatever data source you are using: it looks like the Vertipaq engine requests data from the Power Query engine using a basic form of SQL. For example, here’s a screenshot showing the three events I get when refreshing a dataset with three tables in it called “Property Type”, “Date” and “Price Paid 2017”:
  • Finally, below that there will be a series of Progress Report End events (one for each Progress Report Begin event) with the EventSubclass of “25 – ExecuteSQL”, and the Duration column here will tell you how long it took to get data from the Power Query M query associated with the table. For example, this screenshot shows that it took 5460ms (about 5.5 seconds) to get the data from the “Price Paid 2017” Power Query query:


And there you have it, exact timings for each of the Power Query M queries associated with each of the tables in your dataset. Remember that the time taken by each Power Query M query will include the time taken by any other queries that it references, and it does not seem to be possible to find out the amount of time taken by any individual referenced query in Profiler.

There is a lot more interesting information that can be found in this way: for example, dataset refresh performance is not just related to the performance of the Power Query M queries that are used to load data; time is also needed to build all of the structures inside the dataset by the Vertipaq engine once the data has been returned, and Profiler gives you a lot of information on these operations too. Discussion of that will have to wait until a future blog post though…

UPDATE April 2019:

Having used this technique quite a lot in the last few months, I’ve seen that the Progress Report End/25 – Execute SQL event doesn’t always give an accurate value for the amount of time taken to execute an M query – or rather it does, but things are more complicated than I initially understood. You may also need to look at the trace events that come immediately afterwards: Progress Report Begin/17 – Read Data and Progress Report Begin/17 – End Data. Akshai Mirchandani of the SSAS team gave me a great explanation of the significance of these events:

The SQL query traces indicate the begin/end of query execution against the data source – theoretically, once this is done, rows are ready to be pulled from the source.

That’s when you will see the “ReadData” begin event. Rows start getting pulled at that point, until all rows are read (and encoded/compressed) – then you see the end event.

It is not always predictable how each provider/source behaves. Some of them might defer the expensive query execution until the first “Read” operation, but typically the SQL execution step tries to ensure that the command is valid, prepare the command and (at the very least) start the execution of the command.

Sometimes, the command execution might compute all the results and buffer them too.

In effect, its up to the provider/source exactly how this actually behaves…

It looks like the Power Query engine can behave in both of the ways Akshai describes depending on what you’re doing in the query.

Excel.Workbook() And The delayTypes Option In Power Query/Power BI

A while ago I found myself wondering – as I often do about this kind of thing – about what the undocumented third parameter of the Excel.Workbook() M function (called delayTypes) actually does. I found a forums post from Guy Hunkin of the Excel Power Query team in 2017 here, which gives the following summary:

Originally, Excel.Workbook used to read the entire workbook data to accurately assign types to the columns. This was EXTREMELY slow on large workbooks.

Having this in mind, we added this flag to delay the behavior. When set to “true”, we don’t infer any column types. Instead, the UI uses its normal inference algorithm to generate a Table.TransformColumnTypes step with the inferred types and if it inferred wrongly, the user can update the transformation explicitly.

I also posted a question on the Power Query MSDN forum and got some more details from Colin Banfield, one of the many M experts who hang out there, as well as doing some investigations of my own. This post summarises what I found out.

First of all, a simple demonstration of what it does. Consider the following Excel table:


Connecting to this table in the December 2018 release of Power BI Desktop generates the following M query:

    Source = 
    SourceData_Table = 
    #"Changed Type" = 
            {"TextColumn", type text}, 
            {"IntegerColumn", Int64.Type}, 
            {"DecimalColumn", type number}, 
            {"DateColumn", type date}
    #"Changed Type"

Notice that the delayTypes option is now automatically set to true and that there is a “Changed Type” step. The output of this query, as seen in the Power Query Editor, is what you would expect:


Compare this with the output of the following query which has delayTypes set to false and has no “Changed Type” step:

    Source = 
    SourceData_Table = 


There are two things to notice:

  • With delayTypes set to false no “Changed Type” step is necessary, the data types on the columns are set by Excel.Workbook(). With delayTypes set to true all the columns returned by Excel.Workbook() have their data type set to Any and a subsequent “Changed Type” step is necessary to set data types to something useful.
  • With delayTypes set to true the IntegerColumn and the DecimalColumn columns have different data types assigned to them by the “ChangedType” step: IntegerColumn becomes Int64.Type whereas DecimalColumn becomes type number. With delayTypes set to false both columns are set to number. So it looks like Excel.Workbook with delayTypes set to false will only convert columns to primitive data types.

The second question is: if it is, primarily, a performance optimisation how much of an impact does it have? I built four queries that read all the data from an 80MB xlsx file to test the performance of all of the variations of delayTypes true/false and having a “Changed Type” step present or not (just in case there was an overhead to having a “Changed Type” step). Here are the results – the main timings are in minutes and seconds, and in brackets is the length of the “Evaluating” phase of query execution:

  delayTypes=false delayTypes=true
no “Changed Type” step 2:08
“Changed Type” present 2:19

I wouldn’t read too much into these exact timings because in my experience timings for the same query can fluctuate quite a lot, but one thing is clear: setting delayTypes to true results in much better performance. Indeed with delayTypes set to false nearly half the time was spent in the “Evaluating” phase whereas with delayTypes set to true there was no “Evaluating” phase at all. As a result I can’t think of a reason not to set the delayTypes option to true.

It’s worth noting that delayTypes is set to false by default if you don’t specify this option, so be aware of this if you are writing your own M code.

Power BI Roadmap Announcements In The April 2019 Release Notes

The latest version of the “release notes” document that details the roadmap for Dynamics 365 and the whole Power Platform – which includes Power BI – has just been released. You can view it online here:

There are lots of very exciting announcements for Power BI. Here are a few of the features that are promised for later this year that caught my eye, that haven’t already been announced elsewhere (although some of them were discussed in public sessions at the last PASS Summit):

  • Expression-based formatting in Power BI reports – the ability to use a DAX expression to control properties such as colour and line style within a visual. One of those features that advanced Power BI users will love, and which will allow all kinds of interesting customisations.
  • Performance analyzer for reports – a tool in Power BI Desktop to tell you how long each visual on a page takes to refresh, making it easier to understand where your performance bottlenecks are. I wish we could have something similar for Power Query queries…
  • Customised navigation through apps
  • Shared and certified datasets – this is going to be incredibly useful! As the docs say “With shared datasets in Power BI, we are allowing a single dataset to be used by multiple reports, across workspaces”. I can’t wait. Lots of other cool, related features too here.
  • Deeper integration of paginated reports in Power BI – including the ability to create paginated reports on top of Power BI datasets.
  • Large-scale compute engine for dataflows – this is somewhat cryptic. The docs say “We are enhancing the dataflow calculation engine in Power BI Premium to enable high-performing large-scale data transformations (TBs of data).”. Are these enhancements to the Power Query engine, or is it being replaced with something else entirely (eg to make it more like Azure Data Factory?). I don’t know.
  • New dataflow viewer to visualize dataflow dependencies – I guess something similar to what we have in the “Query Dependencies” view in the Power Query Editor?
  • More new Q&A features: auto-suggestions and did you mean. I’m glad Microsoft is continuing to invest in Q&A, I think it’s going to be one of those features that suddenly goes from being a demo feature that few people actually use to something that will be very powerful indeed. In fact, I would argue that it’s already a lot more powerful than most people realise.
  • Python support in the Power BI service
  • Power BI Template Apps – the ability to create multiple apps in different workspaces from a single template. This was something that used to be possible with content packs (remember them?) so I guess anyone still using them should be able to finally migrate away. Useful for end-user customisation of apps.
  • Better integration between Flow and Power BI, for example integrated alert creation, and the ability to query a dataset and trigger refresh of a dataset from Flow. This last feature will be really important – there are lots of scenarios where you only want to refresh a dataset when you know the data is ready.

Power BI Data Privacy Levels And Cloud /Web-Based Data Sources Or Dataflows

UPDATE: it is now possible to set privacy levels for cloud data sources in the Power BI portal. See

I have already blogged in great detail many times about Power BI/Power Query data privacy settings (see this series for example) but there’s always something new to learn. Recently I was asked a question by Ian Eckert about how Power BI handles data privacy for cloud or web-based data sources after a dataset has been published, and it prompted yet more revelations…

Consider the following M query:

    Source = Xml.Tables(
    channel = Source{0}[channel],
    language = channel{0}[language],
    out = Json.Document(
    result = out[result],
    results = result[results],
    #"Converted to Table" = Table.FromList(
    Column1 = #"Converted to Table"{0}[Column1],
    #"Converted to Table1" = Record.ToTable(Column1)
    #"Converted to Table1"

It doesn’t do anything particularly interesting, but it does take data from one web-based data source (the RSS feed for this blog) and sends it to another (the UK government’s open data metadata search web service). As a result, in Power BI Desktop, if you set the data privacy settings for both data sources to Public then the query runs, but if you set the data privacy settings for both data sources to Private:



…As expected, you get the following error:


Formula.Firewall: Query ‘Test’ (step ‘Converted to Table1’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.

Now the strange thing is that, when you publish the dataset that contains this M query, refresh always works. Why? What’s more, other datasets that do something similar will always fail when refreshed.

It turns out that when you publish a dataset that uses cloud or web-based data sources like the two used here, the Power BI service does not use the data privacy settings you have set in Power BI Desktop but instead it automatically assigns data privacy levels as follows:

  • Data sources, like the ones used here, that use Anonymous authentication are automatically given the privacy level Public
  • All other data sources are given the privacy level Private.

Interestingly, Power BI dataflows also count as cloud-based data sources and because they do not use Anonymous authentication they default to Private too, so if you are combining data from a dataflow with another data source in your dataset then you need to be careful of this.

What’s more there is at the time of writing no way to change these data privacy levels in the Power BI web-based portal. Hopefully this will change soon.

There are some workarounds though!

First of all, you can force refresh to take place through a gateway. This might sound strange because in theory, if you’re only using cloud or web-based data sources, a gateway should not be necessary. However there are already similar scenarios where a gateway is needed, for example if you are scraping data from a web page you need to use a gateway, and if you are combining data from a cloud-based data source with an on-premises data source you also need to use a gateway. If you add your cloud/web-based data sources as data sources in your gateway (unfortunately it does not seem to be possible to add a dataflow as a data source in a gateway, though) you can set their data privacy levels in the Advanced Settings section in the Manage Gateways screen:


You will also need to set the “Use a data gateway” option to On in the Settings dialog for your dataset after it has been published:


The other workaround is to copy the M code for your query and paste it into a new blank M query in an entity in a dataflow, as Matthew Roche shows here. While it does not seem to be possible to set data privacy levels for individual data sources when creating an entity, it is possible to turn off data privacy checks for an entity completely. If you create a query that sends data from one data source to another (regardless, as far as I can see, of the authentication mechanism used), you will see the following message in the Power Query Online query editor:


The evaluation was cancelled because combining data from multiple sources may reveal data from one source to another. Click Continue if the possibility of revealing data is okay.

If you click Continue, data privacy checks are turned off and the query runs; you can also click the Options button on the ribbon and check the “Allow combining data from multiple sources” option:


If one of your data sources is already itself a dataflow you may need to do some editing of the M query to make things work, but as Matthew Roche shows here it is possible to have an entity in a dataflow refer to another entity without using a computed entity (which is a Premium-only feature).

[Thanks to Arthi Ramasubramanian Iyer from Microsoft for providing background information for this post]

Power BI/Power Query/M Pre-cons At SQLKonferenz, SQLBits and SQL Saturday Iceland

If you’re learning Power BI and you want to deepen your understanding of its data transformation and loading functionality, the Power Query engine and the M language, I’m doing one-day pre-conference seminars on this subject at three conferences over the next few months:

You’ll learn about all the functionality in the Power Query Editor window plus concepts such as merging, appending, parameters, functions, data privacy and M. I hope to see you at one of them!

%d bloggers like this: