Refreshing Power Pivot Excel Data Models That Use The Current Workbook As A Data Source In Excel Online

Something that got lost in all the excitement around Excel reports connected to Power BI datasets working in Excel Online is the fact that Excel reports connected to the Excel Data Model – aka Power Pivot – now work too (although I did mention it here). Right now it’s not possible to refresh the data stored in the Excel Data Model if you are connected to external data sources. However, today I noticed something nice: if you are using an Excel table in the same workbook as a source for a table in the Excel Data Model, and you make a change to the table in the worksheet, it is possible to refresh the data in the Excel Data Model.

Here’s an example. In Excel on the Desktop I created a new workbook and added a simple table to a worksheet:

I then moved to the Power Pivot tab on the ribbon and clicked the “Add to Data Model” button to add this table to the Excel Data Model:

I then created a PivotTable connected to the Excel Data Model:

After saving the workbook to OneDrive for Business, I closed it in Excel Desktop and reopened it in Excel Online in the browser:

Finally I was able to change a value in the source table, click the Refresh Selected Connection button on the Data tab in the ribbon, and see the change reflected in the PivotTable connected to the Excel Data Model:

Of course it would be better if you could refresh external data sources too but I still think this could be useful, for example if you had multiple users updating forecasts or budgets in an Excel table in the browser and were using the Excel Data Model for reporting on this data.

Handling “Select All” For Slicers Bound To Dynamic M Parameters In Power BI

Last year I blogged about handling multi-select scenarios for dynamic M parameters. Since then support for “Select All” has been added (see here), but because a regular slicer with no items selected behaves the same as one with all items selected, some extra logic is needed to make a slicer bound to a dynamic M parameter behave like a regular slicer.

To illustrate this, I created a simple dataset with two tables in it:

  1. A DirectQuery table linked to the AdventureWorksDW DimDate table (a simple date dimension table), stored in Snowflake
  2. An Import mode table called Day Name that contains one column and seven rows containing the names of the days of the week

I then created an M parameter called DayParam and bound it to the DayName column of the Day Name table and enabled the Multi-select and Select all options:

Here’s the M code that shows how to use the DayParam parameter to filter the DimDate table on the EnglishDayNameOfWeek column, and handle the scenario where “Select All” has been selected as well as the scenario where no item has been selected:

let
  Source = Snowflake.Databases(
    "xyz.snowflakecomputing.com",
    "DEMO_WH"
  ),
  AWORKS_Database = Source
    {
      [
        Name = "AWORKS",
        Kind = "Database"
      ]
    }
    [Data],
  PUBLIC_Schema = AWORKS_Database
    {[Name = "PUBLIC", Kind = "Schema"]}
    [Data],
  DIMDATE_Table = PUBLIC_Schema
    {[Name = "DIMDATE", Kind = "Table"]}
    [Data],
  #"Filtered Rows" =
    if DayParam = null then
      DIMDATE_Table
    else if Type.Is(
      Value.Type(DayParam),
      List.Type
    )
    then
      if DayParam = {"__SelectAll__"} then
        DIMDATE_Table
      else
        Table.SelectRows(
          DIMDATE_Table,
          each List.Contains(
            DayParam,
            [ENGLISHDAYNAMEOFWEEK]
          )
        )
    else
      Table.SelectRows(
        DIMDATE_Table,
        each [ENGLISHDAYNAMEOFWEEK]
          = DayParam
      )
in
  #"Filtered Rows"

The #"Filtered Rows" step does the following:

  • If nothing is selected in the slicer then the DayParam parameter will contain a null value, and in this case no filter will be applied to the DimDate table
  • If the DayParam parameter is of type list (when there is a multi-select in the slicer) then
    • If it’s a list containing a single text value of “__SelectAll__”, the default value returned when the slicer has “Select all” selected, then no filter will be applied to the DimDate table
    • Otherwise List.Contains will be used to filter the DimDate table by all the selected values
  • Otherwise the DayParam parameter will contain a single scalar value (because a single value has been selected in the slicer) and the DimDate table will be filtered by that value

Here’s how it works:

Multi-Value Parameters In Power Query Online

There’s a nice new feature in Power Query Online (the version of Power Query used in Dataflows): parameters of type List. You can see this as a new option in the Type dropdown in the “Manage parameters” dialog:

Why is this interesting? In the past, Power Query parameters were always single values like a date or a string; now a parameter can contain multiple values.

There’s one other new feature in Power Query Online that goes along with this: In and Not In filters, which can use these new List parameters.
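To give you an idea of what this combination looks like in M, here’s a minimal sketch. The table and parameter here are made up purely for illustration (in Power Query Online the parameter would be defined in the “Manage parameters” dialog rather than inline): an In filter boils down to a List.Contains call over the List parameter.

let
  // hypothetical List parameter and source table, defined inline
  // so the example is self-contained
  ProductsParam = {"Apples", "Oranges"},
  Source = #table(
    type table [Product = text, Sales = number],
    {{"Apples", 10}, {"Pears", 20}, {"Oranges", 30}}
  ),
  // an In filter keeps the rows whose Product value appears in the
  // list; a Not In filter would wrap the List.Contains call in "not"
  #"Filtered rows" = Table.SelectRows(
    Source,
    each List.Contains(ProductsParam, [Product])
  )
in
  #"Filtered rows"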

I’m sure there are other cool things you can do with this but I’ll leave them to future blog posts.

Power BI Report Accessibility: Emulate Vision Deficiencies Using Edge DevTools

I’m not an expert on Power BI report accessibility like Meagan Longoria but I do know how important an issue accessibility is. I also know how difficult it can be to remember to check for accessibility issues when building reports, which is why, when I was watching this video on new features in Edge DevTools, I was pleased to see that Edge now makes it easy to see what your report looks like when viewed by someone with vision deficiencies.

Full documentation is here but since this is a feature aimed at web developers rather than Power BI developers, here’s a quick summary. All you need to do is open your report in Edge, hit Ctrl-Shift-I to open DevTools and go to the Rendering pane (you can find it by clicking on the chevron icon in the top menu):

The Edge DevTools menu and how to find the Rendering pane

…and then go to the “Emulate vision deficiencies” menu:

The "Emulate vision deficiencies" menu

Here’s a demo of what it does:

Animated gif showing how the "Emulate vision deficiencies" menu affects a Power BI report

Of course this isn’t the only thing you should be checking for regarding accessibility (see here for a more comprehensive list, for example) but having this feature built into Edge does remove a lot of the friction around making these checks.

Three New Power BI/Power Query Books

I decided to stop writing book reviews here on my blog a long time ago: it’s a lot of work to read a book and write a proper, detailed review, and what’s more I don’t like the idea of writing a bad review and upsetting someone who has gone to all the effort of writing a book. That said, from time to time I get given free copies of books (which I’m always happy to receive – I like to see how other people go about explaining Power BI concepts and functionality) and in return I give the authors some free publicity here. Recently I received copies of two new books from people that I know:

Expert data modeling with Power BI, by Soheil Bakhshi (Buy it here on Amazon UK)

Soheil is an MVP whose blog I have read and admired for some time so I’m pleased to see he has written a book. It’s an important subject too: good data modelling is key to success with Power BI, and the problems of many customers I work with stem from not taking the time to learn how data should be modelled for Power BI. This book introduces you to concepts like dimensional modelling and star schemas and shows you how to build datasets that follow best practices. It also covers topics such as calculation groups and object-level security that won’t be in older books.

Power Query cookbook, by Andrea Janicijevic (Buy it here on Amazon UK)

Andrea is a colleague of mine at Microsoft and of course Power Query is a technology close to my heart. This book follows the cookbook format which teaches through a series of worked examples and easy-to-follow steps; anyone learning Power Query will find it useful to follow these recipes to get practice creating queries. I liked the inclusion of Power BI Dataflows as well as Power Query in Power BI Desktop, and again this book has the advantage of being new – it covers recently-added features such as Schema View and Diagram View in Dataflows and Query Diagnostics in Power BI Desktop that won’t be covered in other books.

There’s another book I was curious about and was lucky enough to be able to read via Microsoft’s online library for employees:

Pro Power BI theme creation, by Adam Aspin (Buy it here on Amazon UK)

When I heard that someone had written a book about Power BI theme files I couldn’t believe it, but Adam is an experienced writer and he has pulled it off. As you might expect it’s everything you ever wanted to know about Power BI themes and as such, if themes are something you’re interested in, you should read this book. It explains how theme files are structured, how to edit them and how the various attributes are applied to different visuals.

Why You Should Optimise Your Power BI Premium Reports And Refreshes For CPU Time As Well As Duration

When you tune a Power BI report or dataset refresh your first job is to make it run faster – users hate slow reports and late data. However, if you are using Power BI Premium you also need to think about reducing the amount of CPU used by a query or refresh as well, and in this post I’ll explain why.

First of all, what are the metrics I’m talking about? For query performance tuning, if you connect SQL Server Profiler to Power BI Desktop or the Power BI Service (as described here for example) and look at the Query End event you can see all the DAX queries executed when a report is rendered. The TextData column gives the query text, the Duration column shows the amount of time the query took to run in milliseconds and the CPUTime column is a representation of the amount of CPU resources used by the query:

The Command End event gives you similar information for refreshes:

The same data is available in Log Analytics, albeit with slightly different column names: the EventText column holds the query text, DurationMs has the duration and CpuTimeMs has the CPU time:

As I said, it’s the duration that, traditionally, you would have wanted to reduce. In fact, in the past, CPU Time didn’t even return accurate values so it wasn’t useful – although that changed recently.

So why is CPU Time so important? In Power BI Premium gen1 you have a fixed set of resources that all your queries and refreshes compete for; if you try to use more CPU than you have available then performance suffers. Power BI Premium gen2 works differently (as explained in detail here and here): you no longer have a fixed set of resources available, but if you use more CPU than your capacity allows you will either pay extra (if you enable autoscale) or get throttled. The Premium Capacity Metrics app gives you a lot of information about CPU usage across your Premium capacities. Power BI Premium Per User is based on the Power BI Premium gen2 platform and some of the same considerations exist for it as for Premium gen2, although they aren’t fully documented or implemented yet and I can’t go into details here.

As a result it’s important for the overall cost and performance of a Premium capacity that all the queries and refreshes that run on it use as little CPU as possible.

One last point: while Duration and CPU Time are linked, the relationship is not necessarily linear. For example, you could tune a Storage Engine-heavy DAX query and achieve a big reduction in CPU Time but a smaller reduction in Duration, because the Storage Engine can do a lot of work in parallel across multiple cores: a query that keeps eight cores busy for one second clocks up around 8000ms of CPU Time but only 1000ms of Duration. On the other hand, tuning a Formula Engine-heavy DAX query – the Formula Engine is single-threaded – might result in an equal reduction in both Duration and CPU Time. Similarly, because the overall amount of time taken to refresh a dataset is often determined by the amount of time taken to refresh the biggest table, you might be able to significantly reduce the CPU Time for a refresh without changing the duration much. The CPU used by background operations like refresh is smoothed over a 24-hour period in Premium gen2, so there is no need to try to slow down a refresh to avoid a spike in CPU usage that might lead to throttling or autoscaling, but it is still true that reducing the amount of CPU used by a refresh at any given point in time will mean more resources are available for any queries that are running on the same node at the same time.

[Thanks to Akshai Mirchandani and Giri Nair for the information in this post]

Preserving Data Types With SQL Queries In Power Query And Power BI

My post earlier this year on enabling query folding when using SQL queries as a data source in Power Query provoked a lot of interest. This post adds one more useful detail: how to preserve the original data types of the columns in your query when using this technique with SQL Server-related sources.

Consider the DimDate table in the AdventureWorksDW2017 sample database for SQL Server:

Notice that the FullDateAlternateKey column has the data type Date.

If you connect to this table in the normal way, by selecting it in the Navigation pane when you connect to your SQL Server instance, the M code for your Power Query query will look something like this:

let
  Source = Sql.Databases("localhost"), 
  AdventureWorksDW2017 = Source
    {[Name = "AdventureWorksDW2017"]}
    [Data], 
  dbo_DimDate = AdventureWorksDW2017
    {[Schema = "dbo", Item = "DimDate"]}
    [Data]
in
  dbo_DimDate

Unsurprisingly, the FullDateAlternateKey column in the Power Query query also has a data type of Date, as indicated by the calendar icon on the left side of the column header in the Power Query Editor:

However, if you use Value.NativeQuery to run a SQL query to get the same data and set EnableFolding=true, like so:

let
  Source = Sql.Databases("localhost"), 
  AdventureWorksDW2017 = Source
    {[Name = "AdventureWorksDW2017"]}
    [Data], 
  Q = Value.NativeQuery(
    AdventureWorksDW2017, 
    "Select * From DimDate", 
    null, 
    [EnableFolding = true]
  )
in
  Q

…you’ll see that the FullDateAlternateKey column comes through as a DateTime type instead:

The same thing would happen with a column of type Time too, i.e. it would come through as a DateTime.

If you want the types in the output of Value.NativeQuery to match the types in the output of the first Power Query query above there’s an extra option you need to add: PreserveTypes=true.

let
  Source = Sql.Databases("localhost"), 
  AdventureWorksDW2017 = Source
    {[Name = "AdventureWorksDW2017"]}
    [Data], 
  Q = Value.NativeQuery(
    AdventureWorksDW2017, 
    "Select * From DimDate", 
    null, 
    [
      PreserveTypes = true, 
      EnableFolding = true
    ]
  )
in
  Q

In the output of this query, FullDateAlternateKey has the data type Date again:

This option is only available for the SQL Server connector (and connectors related to it) at the time of writing.
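If you need the original types when using a connector that doesn’t support PreserveTypes, one possible workaround – not something from the original tip, just a general M approach – is to reapply the types explicitly with Table.TransformColumnTypes after Value.NativeQuery (whether this extra step folds will depend on the connector, so it’s worth checking):

let
  Source = Sql.Databases("localhost"), 
  AdventureWorksDW2017 = Source
    {[Name = "AdventureWorksDW2017"]}
    [Data], 
  Q = Value.NativeQuery(
    AdventureWorksDW2017, 
    "Select * From DimDate", 
    null, 
    [EnableFolding = true]
  ),
  // explicitly set FullDateAlternateKey back to the Date type
  // that Value.NativeQuery loses
  #"Changed Type" = Table.TransformColumnTypes(
    Q, 
    {{"FullDateAlternateKey", type date}}
  )
in
  #"Changed Type"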

[Thanks to Curt Hagenlocher for this information]

Matching DAX Queries To Individual Visuals In A Published Power BI Report

The integration between Azure Log Analytics and Power BI opens up a lot of new possibilities for Power BI administrators. It’s still in preview (there’s a lot more cool stuff to come) but the Analysis Services events that are available at the time of writing are already very useful: they give you pretty much everything you had in Profiler plus some new stuff too. In this post I’ll show you how you can use this data to work out which visual in a published report generated a particular DAX query.

In Log Analytics you can get information about all the queries run against a dataset using a simple KQL query like this one (I haven’t included an explicit time filter in this query, although you would want to do this in real life):

PowerBIDatasetsWorkspace | 
where ArtifactName == "InsertWorkspaceNameHere" 
and OperationName == 'QueryEnd' 

The QueryEnd event is raised every time a query finishes running – it’s an event you may already be familiar with from Profiler, xEvents or the Log Analytics integration with AAS – and it tells you useful things like the time the query finished, the duration, the text of the DAX query, the user running the query and so on. However, what is new for Power BI is the contents of the ApplicationContext column. My colleague Kasper blogged about how you can use this column to get the DatasetId and ReportId when using AAS here; it now also contains a GUID that identifies the exact visual on a report that generated the DAX query.

Taking a closer look at this column shows that it contains a JSON document:

Here’s a more sophisticated KQL query that parses that JSON document, extracts the contents into separate columns and returns other columns you might need for troubleshooting like the DAX query text and query duration:

PowerBIDatasetsWorkspace | 
where ArtifactName == "InsertWorkspaceNameHere" 
and OperationName == 'QueryEnd' |
extend a = todynamic(ApplicationContext)|
extend VisualId = a.Sources[0].VisualId, ReportId = a.Sources[0].ReportId, 
DatasetName = ArtifactName, DAXQuery = EventText |
project TimeGenerated, WorkspaceName, DatasetName, ReportId, VisualId, DurationMs, DAXQuery |
order by TimeGenerated desc 

The ID of the dataset isn’t that interesting because you can get it, and the name of the dataset, in other columns. The ReportId is useful because it tells you which report generated the query and there are a number of ways you can find out which report this relates to. For example, you can just open the report in the browser and look at the url: the Report ID is the GUID between “/reports/” and “/ReportSection” as detailed here.
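If you have a lot of report URLs to check, the GUID is easy to pull out in M – a quick sketch, using a made-up URL:

let
  // a made-up report URL, purely for illustration
  url = "https://app.powerbi.com/groups/me/reports/12345678-abcd-4321-abcd-1234567890ab/ReportSection",
  // the Report ID is the GUID between "/reports/" and "/ReportSection"
  ReportId = Text.BetweenDelimiters(url, "/reports/", "/ReportSection")
in
  ReportId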

How can you work out which visual the VisualId relates to though? It’s not obvious, but it is possible. First you have to open your report in Power BI Desktop, open Performance Analyzer, refresh the report page and export the data from Performance Analyzer as a JSON file as detailed here. The data you need is hidden in this JSON file in the Visual Container Lifecycle event; here’s the M code for a Power Query query to get it:

let
  Source = Json.Document(
    File.Contents(
      "C:\InsertFileNameHere.json"
    )
  ), 
  events = Source[events], 
  #"Converted to Table" = Table.FromList(
    events, 
    Splitter.SplitByNothing(), 
    null, 
    null, 
    ExtraValues.Error
  ), 
  #"Expanded Column1"
    = Table.ExpandRecordColumn(
    #"Converted to Table", 
    "Column1", 
    {
      "name", 
      "component", 
      "start", 
      "id", 
      "metrics", 
      "end", 
      "parentId"
    }, 
    {
      "name", 
      "component", 
      "start", 
      "id", 
      "metrics", 
      "end", 
      "parentId"
    }
  ), 
  #"Filtered Rows" = Table.SelectRows(
    #"Expanded Column1", 
    each (
      [name]
        = "Visual Container Lifecycle"
    )
  ), 
  #"Expanded metrics"
    = Table.ExpandRecordColumn(
    #"Filtered Rows", 
    "metrics", 
    {
      "status", 
      "visualTitle", 
      "visualId", 
      "visualType", 
      "initialLoad"
    }, 
    {
      "status", 
      "visualTitle", 
      "visualId", 
      "visualType", 
      "initialLoad"
    }
  ), 
  #"Removed Other Columns"
    = Table.SelectColumns(
    #"Expanded metrics", 
    {
      "visualTitle", 
      "visualId", 
      "visualType"
    }
  )
in
  #"Removed Other Columns"

Here’s an example of the output of this Power Query query:

This shows that, on the report in question, there were three visuals: a card visual called “My Card”, a column chart called “My Column Chart” and a table called “My Table”. It also shows the IDs of these visuals which you can match up with the VisualId values in the ApplicationContext column in Log Analytics.

The last problem is matching these names up with the actual visuals in the report and this will only be easy if the report designer has followed the best practice of giving each visual a meaningful name; you can now do this by double-clicking on the visual in the Selection pane or by editing the Title Text property as Reid Havens shows here.
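Once you have both sets of data, you can automate the matching by merging them in Power Query – a sketch, with stand-in tables where in reality you’d use the Log Analytics results and the Performance Analyzer output above:

let
  // stand-in data: QueryLog represents the KQL query results and
  // Visuals the output of the Power Query query above
  QueryLog = #table(
    type table [VisualId = text, DurationMs = number],
    {{"abc123", 250}}
  ),
  Visuals = #table(
    type table [visualId = text, visualTitle = text, visualType = text],
    {{"abc123", "My Card", "card"}}
  ),
  // left outer join on the visual ID, so every query gets labelled
  // with the title and type of the visual that generated it
  Merged = Table.NestedJoin(
    QueryLog, {"VisualId"},
    Visuals, {"visualId"},
    "VisualDetails", JoinKind.LeftOuter
  ),
  #"Expanded VisualDetails" = Table.ExpandTableColumn(
    Merged,
    "VisualDetails",
    {"visualTitle", "visualType"}
  )
in
  #"Expanded VisualDetails"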

Connecting To REST APIs With OAuth2 Authentication In Power Query/Power BI

There are a lot of articles and blog posts out there on how to handle OAuth2 authentication when connecting to REST APIs from Power Query in Power BI. However there is also a lot of confusion and contradictory information too so in this post I want to give you the definitive, Microsoft-endorsed answer to this question, which is:

If you want to connect from Power BI to a REST API that uses OAuth2 authentication then you need to build a custom connector. You can find documentation on how to implement an OAuth2 flow in a custom connector here.

The only exception is that you can connect to some APIs that use AAD authentication using the built-in web or OData connectors, as documented here.
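To give you a flavour of what’s involved, here’s a heavily simplified sketch of the OAuth-related parts of a custom connector’s .pq file. The connector name and handler stubs are made up, and the documentation linked above shows how to implement StartLogin, FinishLogin and Refresh properly against your API’s authorize and token endpoints:

section MyConnector;

// declaring OAuth in the data source kind definition is what makes
// the Power Query runtime drive the browser-based login flow and
// store the tokens securely, instead of you hard-coding secrets in M
MyConnector = [
    Authentication = [
        OAuth = [
            StartLogin = StartLogin,
            FinishLogin = FinishLogin,
            Refresh = Refresh
        ]
    ]
];

// handler stubs: "..." is M's not-implemented expression
StartLogin = (resourceUrl, state, display) => ...;
FinishLogin = (context, callbackUri, state) => ...;
Refresh = (resourceUrl, refreshToken) => ...;

[DataSource.Kind = "MyConnector"]
shared MyConnector.Contents = (url as text) =>
    // the runtime supplies the current OAuth token via
    // Extension.CurrentCredential()
    Web.Contents(
        url,
        [
            Headers = [
                Authorization = "Bearer "
                    & Extension.CurrentCredential()[access_token]
            ]
        ]
    );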

A quick web search will turn up several examples of how to implement an OAuth2 credential flow in regular Power Query queries without needing a custom connector. This is not recommended: it’s not secure and it’s not reliable. In particular, hard-coding usernames/passwords or client ids/client secrets in your M code is a really bad idea. What’s more, requesting a new token every time a query runs isn’t great either.

Unfortunately Excel Power Query doesn’t support custom connectors at the time of writing. Also, if you use a custom connector in the Power BI Service then you’ll need to use an on-premises gateway. Finally, there’s an article here explaining why it isn’t easy to connect Power BI to the Microsoft Graph API.

[Thanks to Curt Hagenlocher and Matt Masson for the information in this post]

How Query Folding And The New Power BI Dataflows Connector Can Help Dataset Refresh Performance

You may have noticed that a new dataflows connector was announced in the August 2021 release of Power BI Desktop, and that it now supports query folding between a dataset and a dataflow – which, you may be surprised to learn, was not possible before. In this post I thought I’d take a look at how much of an improvement this can make to dataset refresh performance.

For my tests I created a new PPU workspace and a dataflow, and made sure the Enhanced Compute Engine was turned on for the dataflow on the Settings page:

Query folding will only happen if the Enhanced Compute Engine is set to “On”, and won’t happen with the “Optimized” setting. The Enhanced Compute Engine is only available with PPU and Premium.

For my data source I used a CSV file with a million rows in and seven integer columns. I then created two tables in my dataflow like so:

The Source table simply connects to the CSV file, uses the first row as the headers, then sets the data type on each column. The second table, called Output – which contains no transformations at all – is needed for the data to be stored in the Enhanced Compute Engine; the lightning icon in the top-left corner of the table in the diagram shows this is the case.
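Here’s roughly what the M for these two tables might look like – a sketch, with a made-up file path and the integer columns assumed to be called A to G; the important part is that Output simply references Source, which makes it a computed table:

// the Source table: connect to the CSV, promote headers, set types
let
  Source = Csv.Document(File.Contents("C:\demo\data.csv")),
  #"Promoted Headers" = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Promoted Headers",
    {{"A", Int64.Type}, {"B", Int64.Type}, {"C", Int64.Type},
     {"D", Int64.Type}, {"E", Int64.Type}, {"F", Int64.Type},
     {"G", Int64.Type}}
  )
in
  #"Changed Type"

// the Output table: just a reference to the Source table above
let
  Output = Source
in
  Output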

Next, in Power BI Desktop, I created a Power Query query that used the old Power BI dataflows connector:

If you have any existing datasets that connect to dataflows, this is the connector you will have used – it is based on the PowerBI.Dataflows function. My query connected to the Output table and filtered it down to the rows where column A is less than 100. Here’s the M code, slightly edited to remove all the ugly GUIDs:

let
    Source = PowerBI.Dataflows(null),
    ws = Source{[workspaceId="xxxx"]}[Data],
    df = ws{[dataflowId="yyyy"]}[Data],
    Output1 = df{[entity="Output"]}[Data],
    #"Filtered Rows" = Table.SelectRows(Output1, each [A] < 100)
in
    #"Filtered Rows"

Remember, this connector does not support query folding. Using this technique to measure how long the query took to run when the dataset was refreshed, I could see it took almost 12.5 seconds to get the data for this query:

In fact the performance in Power BI Desktop was even worse than this number suggests: while the refresh was taking place, I could see Power BI downloading 108MB of data even though the original source file is only 54MB.

Why is the data downloaded twice? I strongly suspect it’s because of this issue – because, of course, no query folding is happening.

I then created the same query with the new dataflows connector:

This connector uses the PowerPlatform.Dataflows function; it’s not new, but what is new is that you can now access Power BI dataflows using it.

Here’s the M code, again cleaned up to remove GUIDS:

let
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id="Workspaces"]}[Data],
    ws = Workspaces{[workspaceId="xxxx"]}[Data],
    df = ws{[dataflowId="yyyy"]}[Data],
    Output_ = df{[entity="Output",version=""]}[Data],
    #"Filtered Rows" = Table.SelectRows(Output_, each [A] < 100)
in
    #"Filtered Rows"

When this query was loaded into the dataset, it only took 4 seconds:

This is a lot faster, and Power BI Desktop was a lot more responsive during development too.

It’s reasonable to assume that query folding is happening in this query and that the filter on [A]<100 is now taking place inside the Enhanced Compute Engine rather than in Power BI Desktop. But how can you be sure? The “View Native Query” option is greyed out, but of course this does not mean that query folding is not happening. However, if you use Query Diagnostics you can see, hidden away in the Data Source Query column of the detailed diagnostics query, a SQL query with the WHERE clause you would expect:

In conclusion, you can see that the new dataflows connector can give you some big improvements for dataset refresh performance and a much better development experience in Power BI Desktop. Query folding support also means that you can now use dataset incremental refresh when using a dataflow as a source. However, you will need to use Premium or PPU, you may also need to make some changes to your dataflow to make sure it can take advantage of the Enhanced Compute Engine, and you will also need to update any existing Power Query queries to use the new connector. I think the potential performance gains are worth making these changes though. If you do make these changes in your dataflows and find that it helps, please leave a comment!
