Exploring Power BI Run-Length Encoding With DMVs

Recently I was involved in an interesting discussion on X (formerly Twitter) about how partitioning a table in a Power BI semantic model can affect compression and therefore its size and query performance. You can read the thread here. It got me thinking: is there a way to get more detail on how well compression, and in particular run-length encoding (RLE from hereon), is working for a column when you’re using Import mode or Direct Lake? After a bit of research I found out there is, so let’s see some examples that illustrate what I learned.

First of all, consider the following M query that returns a table with one numeric column called MyNumbers:

let
    Source = {1..DistinctValues},
    Repeat = List.Transform(Source, each List.Repeat({_}, TotalRows/DistinctValues)),
    Combine = List.Combine(Repeat),
    #"Converted to Table" = Table.FromList(Combine, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "MyNumbers"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"MyNumbers", Int64.Type}})
in
    #"Changed Type"

It references two M parameters: TotalRows, which determines the total number of rows in the table, and DistinctValues, which determines the number of distinct numeric values in the MyNumbers column. With TotalRows=9 and DistinctValues=3, it returns the following output:

Note that in this case it returns 3 rows with the value 1 repeated, 3 rows with the value 2 repeated and 3 rows with the value 3 repeated; three sets of repeated values in all. It’s fair to assume that repeated sets of values like this are a good candidate for RLE.
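
To see why, here’s a toy illustration in Python (purely for intuition, and nothing like how VertiPaq actually stores data) that collapses a sequence of values into value/run-length pairs:

from itertools import groupby

def run_length_encode(values):
    # Collapse a sequence into (value, run length) pairs
    return [(value, len(list(group))) for value, group in groupby(values)]

# The TotalRows=9, DistinctValues=3 example above: three sets of three repeated values
print(run_length_encode([1, 1, 1, 2, 2, 2, 3, 3, 3]))
# [(1, 3), (2, 3), (3, 3)] - nine values collapse into just three runs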

I created a semantic model in Import mode containing only this table and published it to the Power BI Service. Initially TotalRows was set to 1,000,000 and DistinctValues was set to 100, so the table consisted of just 100 sets of 10,000 repeated values. I chose 1,000,000 rows because that’s the size of a segment in the Power BI Service with the “small semantic model format” setting, and compression always takes place within a segment.

When the Analysis Services engine inside Power BI compresses data it looks for sequences of repeated values to see if RLE can be used. Where it finds them, these sequences are stored as “pure” RLE runs; where it doesn’t, the values end up in “impure” runs and are stored using bitpack compression. Pure runs are generally a good thing, impure runs generally a bad thing. You can see how many pure and impure runs there are using the TMSCHEMA_COLUMN_STORAGES DMV, for example with the following DMV query:

select 
[Name], Statistics_DistinctStates, Statistics_RowCount, 
Statistics_RLERuns, Statistics_OthersRLERuns 
from $SYSTEM.TMSCHEMA_COLUMN_STORAGES

Running this query in DAX Studio on my published semantic model returned the following table:

[You can ignore all the rows except the one for the MyNumbers column in this table]

The Statistics_RLERuns column shows the number of pure RLE runs; the Statistics_OthersRLERuns column shows the number of impure RLE runs. In this case you can see, for the MyNumbers column, there were 100 pure RLE runs and no impure runs, so as expected RLE is working well.

Here’s what Vertipaq Analyzer showed for this table:

Unsurprisingly the size of the MyNumbers column is very small.

Then I changed DistinctValues to 100,000 (keeping TotalRows at 1,000,000), giving me 100,000 sets of 10 values, and refreshed the dataset. Here’s what the DMV query on TMSCHEMA_COLUMN_STORAGES returned:

And here’s what Vertipaq Analyzer showed:

As you can see, the column was a lot larger than before; there were no pure RLE runs and one impure RLE run. In this case the large number of distinct values in the column prevented RLE from taking place and this had a negative impact on the size of the column.

These are two extreme cases. What about a scenario that’s somewhere in between? I modified my M query as follows:

let
    RepeatedNumbers =
        let
            Source = {1..DistinctValues},
            Repeat = List.Transform(Source, each List.Repeat({_}, ((TotalRows/2)/DistinctValues))),
            Combine = List.Combine(Repeat),
            #"Converted to Table" = Table.FromList(Combine, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
            #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "MyNumbers"}}),
            #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"MyNumbers", Int64.Type}})
        in
            #"Changed Type",

    RandomNumbers =
        let
            Source = {1..TotalRows/2},
            #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
            #"Added Custom" = Table.AddColumn(#"Converted to Table", "MyNumbers", each Number.Round(Number.RandomBetween(TotalRows+1, TotalRows*2))),
            #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"MyNumbers"}),
            #"Changed Type" = Table.TransformColumnTypes(#"Removed Other Columns",{{"MyNumbers", Int64.Type}})
        in
            #"Changed Type",

    Output = Table.Combine({RepeatedNumbers, RandomNumbers})
in
    Output

What this version of the code does is return a table where the first 50% of the rows are repeated numbers and the second 50% are random numbers. With TotalRows set to 12 and DistinctValues set to 2 it produces the following output:

With this version published to the Power BI Service I set TotalRows to 1,000,000 again and set DistinctValues to 2000, resulting in a table with 2000 sets of 250 repeating values followed by 500,000 random values. Here’s what the DMV query against TMSCHEMA_COLUMN_STORAGES returned:

As you can see there are now 2000 pure runs (I assume for the first 50% of rows with repeated values) and 1 impure run (I assume for the second 50% of rows with random values).

Here’s the output of Vertipaq Analyzer:

The column is now almost as large as in the second scenario above.

You can get a bit more detail about what’s happening in the impure runs with the DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS DMV. Running the following query against the latest version of the table:

select 
column_ID, partition_name, segment_number,
records_count, bits_count, used_size
from $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS

…returns the following values:

To get a rough idea of the number of rows in the impure runs you can use the following formula:

(used_size * 8)/bits_count

In this case for the MyNumbers column (1349720 * 8)/21 = 514,179, which makes sense since my code returns 500,000 rows of random numbers. The records_count column in this query returns the total number of rows in the segment, so the higher the number of rows in impure runs relative to the total, the worse compression you’re getting.
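
For what it’s worth, here’s the same estimate written as a trivial Python function, using the used_size and bits_count values returned by the DMV query above:

def estimate_impure_rows(used_size_bytes, bits_count):
    # used_size is reported in bytes and bits_count is the number of bits used per value,
    # so (used_size * 8) / bits_count approximates the number of rows stored in impure runs
    return (used_size_bytes * 8) // bits_count

print(estimate_impure_rows(1349720, 21))  # roughly 514,000 rows in impure runs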

What practical use is this information? Probably not as much as you might think, interesting as it is. It can tell you how well RLE is working for a column but it doesn’t tell you much about how to optimise it, whether it’s possible to optimise it, or whether optimising it is even a good idea – that’s a subject for another blog post.

[Thanks to Marius Dumitru and Akshai Mirchandani for the information in this post]

New HostProperties Data In The ApplicationContext Column In Power BI Log Analytics

If you’re a fan of using Log Analytics for monitoring Power BI activity, then you may have noticed there’s some new data in the ApplicationContext column of the PowerBIDatasetsWorkspace table. Up until recently the ApplicationContext column only contained IDs that identify the report and the visual that generated a DAX query (something I blogged about here); it now contains additional information on the type of Power BI report that generated the query and an ID for the user session.

Here’s an example of the payload with the new data in bold:

[
  {
    "ReportId": "2beeb311-56c8-471a-83a3-6d7523d40dc7",
    "VisualId": "477b5dc44249fe897411",
    "HostProperties": {
      "ConsumptionMethod": "Power BI Web App",
      "UserSession": "a3d941bd-c374-4e0e-b911-5086310cb345"
    }
  }
]

Here’s an example KQL query that returns the new ConsumptionMethod and UserSession data from HostProperties:

PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1d)
| where OperationName == 'QueryEnd' 
| where ApplicationContext!=""
| extend hp = parse_json(ApplicationContext)
| extend ConsumptionMethod = hp.Sources[0].HostProperties.ConsumptionMethod, 
UserSession = hp.Sources[0].HostProperties.UserSession
| project TimeGenerated, EventText, ApplicationName, ConsumptionMethod, UserSession
| order by TimeGenerated desc
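
If you ever need to pull the same values out of the raw ApplicationContext JSON outside of KQL, for example in a script or notebook, a few lines of Python will do it. This is only a rough sketch; it assumes the Sources wrapper that the KQL query above references around the payload shown earlier:

import json

# Example ApplicationContext value as stored in the PowerBIDatasetsWorkspace table;
# the KQL query above reads the same properties via hp.Sources[0].HostProperties
application_context = """
{
  "Sources": [
    {
      "ReportId": "2beeb311-56c8-471a-83a3-6d7523d40dc7",
      "VisualId": "477b5dc44249fe897411",
      "HostProperties": {
        "ConsumptionMethod": "Power BI Web App",
        "UserSession": "a3d941bd-c374-4e0e-b911-5086310cb345"
      }
    }
  ]
}
"""

host_properties = json.loads(application_context)["Sources"][0].get("HostProperties", {})
print(host_properties.get("ConsumptionMethod"))  # "Power BI Web App"
print(host_properties.get("UserSession"))        # "a3d941bd-c374-4e0e-b911-5086310cb345"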

ConsumptionMethod is less useful than it might first appear: at the time of writing it only returns data for Power BI reports (and not other types of report such as Paginated reports), although it will allow you to differentiate between different methods of viewing a Power BI report such as viewing via a browser or viewing via Teams. It should be used in combination with the ApplicationName column to get a fuller picture of the way reports are being consumed.

UserSession is something that I need to explore in more detail. Grouping user activity into sessions is something I blogged about here, but this is the user session ID used internally and therefore a lot more reliable. I don’t know the rules which govern how activity is grouped into sessions though, so I will only blog about this when I find out more.

[Thanks to my colleague Andreea Sandu for this information]

Getting Report Visual IDs With Power BI Desktop Developer Mode

Back in 2021 I wrote a post showing how you can link a DAX query generated by a Power BI report in Log Analytics to a visual in a Power BI report. In that post I showed how you can get the unique IDs for each visual in a report using Performance Analyzer, an approach which works but has drawbacks. Then, in 2022, I wrote another post showing how you can get the same IDs using the Power BI Embedded Analytics Playground, which again works but isn’t perfect. After that, this August, my colleague Rui Romano pinged me on Teams to point out that the new Power BI Desktop Developer Mode and the .pbip file format provide the best and easiest way to get these IDs.

If you save a Power BI report in .pbip format, then look in the report folder and open the report.json file (where the report definition is stored) with a tool like Visual Studio Code, you can find the IDs for visuals quite easily. The sections array lists all the pages in the report; the visualContainers array within each section lists all the visuals on a page; and each visual’s ID can be found in its config property as its name:
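
If you’d rather not scroll through the file by hand, here’s a short Python sketch that lists every visual ID in a report.json file. It assumes the structure described above, with each visual’s config property stored as a JSON string containing its name; the file path is just a hypothetical example:

import json

# Hypothetical path to the report.json file inside a .pbip report folder
REPORT_JSON_PATH = r"C:\MyReport\MyReport.Report\report.json"

# utf-8-sig handles files saved with or without a byte order mark
with open(REPORT_JSON_PATH, encoding="utf-8-sig") as f:
    report = json.load(f)

# Each entry in the sections array is a report page;
# each page has a visualContainers array, one entry per visual
for page in report.get("sections", []):
    print(f"Page: {page.get('displayName')}")
    for container in page.get("visualContainers", []):
        # The visual's config property is a JSON string; its "name" is the visual ID
        config = json.loads(container.get("config", "{}"))
        print(f"  Visual ID: {config.get('name')}")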

As a result, if you’re looking in Log Analytics and you need to understand which visual a query was generated by, you can just copy the ID from the ApplicationContext column of a QueryEnd event and search for that ID in the report.json file. Even when you’ve found that ID, though, it can be hard to know which visual it relates to. This is where a tool I blogged about a few weeks ago, PBI Inspector, comes in handy: as part of its output it draws wireframes of each report page showing each visual, its type and its ID. Here’s the report page from the example above, with two visuals on it, and the associated PBI Inspector wireframe:

Yet another way in which Power BI Developer Mode makes our lives better!

Analyse Power BI Data In Excel With Python

In the Power BI/Fabric community everyone is excited about the recent release of Semantic Link: the ability to analyse Power BI data easily using Python in Fabric notebooks. Sandeep Pawar has an excellent blog post here explaining what this is and why it’s so cool. Meanwhile in the Excel community, everyone is excited about the new integration of Python into Excel. But can you analyse Power BI data in Excel using Python? Yes you can – so as my teenage daughter would say, it’s time for a crossover episode.

Let’s see a simple example. The main problem to solve is how to get data from Power BI into Excel in a format that Python in Excel can consume easily, ie a table rather than a PivotTable. The easiest way to do this is using the new Excel Connected Table feature, described here, which creates a live connection back to Power BI so when the data in the dataset changes the data in Excel is updated too. I have a Power BI dataset published to the Service that contains the UK government’s Land Registry Price Paid data, which has details of all the property transactions in England and Wales; I found this dataset in Excel and clicked Insert Table:

I then defined a query that found the number of property transactions and average price paid broken down by county:

This gave me a table, connected to the dataset in the Power BI Service using a DAX query, in my worksheet:

I renamed this table to “Sales”:

Unfortunately you can’t change the ugly column names without rewriting the DAX query behind the table, which makes life more difficult later on.

Then, on a new worksheet, I created a Python code cell using the PY function and entered the following Python code:

df=xl("Sales[#All]", headers=True)
s = plt.scatter(df.iloc[:,1], df.iloc[:,2], marker=11)
s.axes.xaxis.set_label_text("Count Of Sales")
s.axes.yaxis.set_label_text("Average Price Paid")

[I’ve only just started learning Python so please excuse any newbie bad practices/mistakes here! The main point is to visualise the data returned from Power BI]

This gave me a scatterplot with each county as a marker, the count of sales measure on the x axis and the average price paid measure on the y axis:

A few comments:

  • The xl function allows you to reference Excel cells, ranges and tables in your code; the reference to Sales[#All] gets the whole of the Sales table, including headers; adding headers=True means the table headers are recognised as such
  • Dealing with those ugly column names in Python is such a pain that I copped out and referenced the columns by position (though see the sketch after this list for a workaround)
  • After entering the code and committing it, you also need to tell Excel to treat the output as an Excel Value rather than a Python object to see the scatterplot; you also need to resize the cell
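
As an alternative to referencing columns by position, you could rename the DataFrame columns once and use the friendlier names from then on. Here’s a rough sketch, assuming the same Sales table (with the three columns shown above) and the Python in Excel environment used earlier, where xl and plt are already available:

# Same connected table as before, assuming it has the three columns shown above
df = xl("Sales[#All]", headers=True)
# Rename the DAX-generated column headers to something friendlier to reference
df.columns = ["County", "CountOfSales", "AveragePricePaid"]
s = plt.scatter(df.CountOfSales, df.AveragePricePaid, marker=11)
s.axes.xaxis.set_label_text("Count Of Sales")
s.axes.yaxis.set_label_text("Average Price Paid")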

A second way of getting data into Excel from Power BI is to export the data from a published Power BI report. If you’re going to do that, you should export as a connected table so again the data stays connected to the source Power BI dataset.

There’s also a third, slightly different way of getting data from Power BI into Excel, which is possible if you have Premium and which is a bit more complex but also more flexible: you can use Power Query, but maybe not in the way you would expect. The xl function can reference the output of a Power Query query even if that query is not loaded to a worksheet or the Excel Data Model – which I think is a nice touch and important if you’re working with larger data volumes.

To get data from Power BI into Excel using Power Query you need to use Power Query’s Analysis Services connector to connect to your workspace’s XMLA Endpoint. Go to the Data tab in Excel, click the Get Data button then From Database/From SQL Server Analysis Services Database (Import):

On the connection dialog the XMLA Endpoint goes into the Server box, the name of the dataset goes into the Database box and you can paste a DAX query into the MDX or DAX query box:

There are several benefits to using this approach:

  • You can use your own DAX query rather than have one generated for you
  • You can easily edit the DAX query after you have created the Power Query query
  • You can rename the query as well as all those ugly column names, making them easier to work with in Python – I named my query SalesByCounty and renamed my columns to County, CountOfSales and AveragePricePaid

I then closed the Power Query Editor without loading the output of the query anywhere.

You can read more about how to use Power Query queries in Python in Excel here.

Finally, here’s the modified version of the Python code to create the scatterplot shown above:

df=xl("SalesByCounty")
s = plt.scatter(df.CountOfSales, df.AveragePricePaid, marker=11)
s.axes.xaxis.set_label_text("Count Of Sales")
s.axes.yaxis.set_label_text("Average Price Paid")

Note how, in the first line, I can reference the Power Query query by name in the xl function and how, in the second line, renaming the columns in Power Query makes writing the Python code much easier.

Is this actually going to be useful to anyone? Well, if Python in Excel is going to be used, it will be used by data analysts who love both Excel and Python – and who are also likely to use Power BI too. As Sandeep argues in the blog post about Semantic Link referenced above, there are several reasons why these data analysts should use a Power BI dataset as a data source for their work rather than going back to the raw data: for example, they can be sure they are using exactly the same data that is being used in their reports, and they can use measures defined in the dataset rather than having to recreate the same calculations in their own code. While Semantic Link in Fabric is much more powerful than anything you can do in Excel with Power BI data, it’s only available in Fabric notebooks and that needs a Fabric or Premium capacity; this technique is available to anyone who has Python in Excel and works with Power BI Pro as well as Premium. So yes, in conclusion, I think there are some practical uses for this.

VisOps For Power BI With PBI Inspector

This week, one of my colleagues at Microsoft, Nat Van Gulck, showed me a cool new open-source tool he’s been working on to make VisOps for Power BI much easier: PBI Inspector. What is VisOps? I’ll admit I didn’t really know either, so being lazy I asked Nat to write a few paragraphs describing the project and why it will be useful:

Great progress has been made over the years with software development CI\CD tooling and processes (aka DevOps), just not so much with BI report visualisations and charts, where we’ve come to accept only manual checks before publishing to production. PBI Inspector is a rules-based visual layer testing tool for Power BI which aims to fill this tooling gap. It runs on either the report author’s desktop or as part of a CI\CD pipeline. The latter follows naturally from the recent Power BI Desktop Developer Mode announcement, which marks a step change in providing Pro developers with much improved source control capabilities. PBI Inspector reads report files in the PBIP format (currently in Preview), although it also accepts PBIX files. Test results can be presented in several formats including HTML, JSON and Azure DevOps logging commands.

PBI Inspector’s rules combine Greg Dennis’s JsonLogic .NET implementation, which allows for the definition of expressive rule logic, with the querying abilities of JsonPath and JsonPointer libraries to select nodes from the Power BI report’s JSON layout definition for testing purposes.  As an illustrative example, here’s a rule that tests if charts are wider than tall in each report page and returns an array with the names of visuals that fail the test: 

{
    "name": "Charts wider than tall",
    "description": "Want to check that your charts are wider than tall?",
    "disabled": false,
    "logType": "warning",
    "forEachPath": "$.sections[*]",
    "forEachPathName": "$.name",
    "forEachPathDisplayName": "$.displayName",
    "path": "$.visualContainers[*].config",
    "pathErrorWhenNoMatch": false,
    "test": [
        {
            "map": [
                {
                    "filter": [
                        {
                            "var": "visualsConfigArray"
                        },
                        {
                            "<=": [
                                {
                                    "var": "layouts.0.position.width"
                                },
                                {
                                    "var": "layouts.0.position.height"
                                }
                            ]
                        }
                    ]
                },
                {
                    "var": "name"
                }
            ]
        },
        {
            "visualsConfigArray": "."
        },
        []
    ]
}

Here’s an example result wireframe depiction of a report page (provided as part of the HTML output) highlighting two visuals that failed the test because they are taller than wide: 

For additional rule examples, see the Example rules.json file in the DocsExamples folder of the NatVanG/PBI-Inspector repository on GitHub. For further details see the NatVanG/PBI-Inspector repository itself: a rules-based Power BI Desktop file inspection or testing tool.

I think this is a great example of the kind of community innovation that Power BI Desktop Developer Mode allows (see also the recent announcement of PBI Explorer). A lot of organisations that use Power BI don’t, and will never, care about this kind of thing – but those who do have been very vocal about Power BI’s previous limitations in the area of DevOps and DataOps. Thanks to the work of people like Mathias Thierbach (of pbi-tools fame), John Kerski and Nat we can see how quickly Power BI is catching up.

Keep Your Existing Power BI Data And Add New Data To It Using Fabric

One of the most popular posts on my blog in the last few years has been this one:

To be honest I’m slightly ashamed of this fact because, as I say in the post, the solution I describe is a bit of a hack – but at the same time, the post is popular because a lot of people have the problem of needing to add new data to the data that’s already there in their Power BI dataset and there’s no obvious way of doing that. As I also say in that post, the best solution is to stage the data in a relational database or some other store outside Power BI so you have a copy of all the data if you ever need to do a full refresh of your Power BI dataset.

Why revisit this subject? Well, with Fabric it’s now much easier for you as a Power BI developer to build that place to store a full copy of your data outside your Power BI dataset and solve this problem properly. For a start, you now have a choice of where to store your data: either in a Lakehouse or a Warehouse, depending on whether you feel comfortable with using Spark and notebooks or relational databases and SQL to manage your data. What’s more, with Dataflows Gen2, when you load data to a destination you now have the option to append new data to existing data as well as to replace it:

If you need more complex logic to make sure you only load new records and not ones that you’ve loaded before, there’s a published pattern for that.

“But I’m a Power BI developer, not a Fabric developer!” I hear you cry. Perhaps the most important point to make about Fabric is that Power BI is Fabric. If you have Power BI today, you will have Fabric soon if you don’t have the preview already – they are the same thing. One way of thinking about Fabric is that it’s just Power BI with a lot more stuff in it: databases, notebooks, Spark and pipelines as well as reports, datasets and dataflows. There are new skills to learn but solving this problem with the full range of Fabric workloads is a lot less complex than the pure Power BI approach I originally described.

“But won’t this be expensive? Won’t it need a capacity?” you say. It’s true that to do all this you will need to buy a Fabric capacity, but Fabric capacities start at a much cheaper price than Power BI Premium capacities: an F2 capacity costs $0.36 USD per hour or $262.80 USD per month, and OneLake storage costs $0.023 USD per GB per month (for more details see this blog post and the docs), so getting started is a lot more affordable than you might expect.

So, with Fabric, there’s no need for complex and hacky workarounds to solve this problem. Just spin up a Fabric capacity, create a Warehouse or Lakehouse to store your data, use Dataflows Gen2 to append new data to any existing data, then build your Power BI dataset on that.

Multiple Connections To The Same Data Source In The Power BI Service With Shareable Cloud Connections

A few weeks ago an important new feature for managing connections to data sources in the Power BI Service was released: Shareable Cloud Connections. You can read the blog post announcing them here. I won’t describe their functionality because the post already does that perfectly well; I want to focus on one thing in particular that is important for anyone using Power BI with Snowflake (and, I believe, BigQuery and probably several other non-Microsoft sources): Shareable Cloud Connections allow you to have multiple connections to the same data source in the Power BI Service, each using different credentials.

Some of you are going to read that last sentence and get very excited. Many of you will probably be surprised that Power BI didn’t already support this. To understand what’s going on here you first have to understand what Power BI considers a “data source”. The answer can be found on this page of the Power Query SDK docs:

The M engine identifies a data source using a combination of its Kind and Path […]

The Kind value comes from the Data Source Kind definition.

The Path value is derived from the required parameters of your data source function. Optional parameters aren’t factored into the data source path identifier.

In the case of the Snowflake connector, the “Kind” of the connector is Snowflake and the “Path” is determined by the two required parameters of the Snowflake connector, namely the Server and the Warehouse:

Before Shareable Cloud Connections, unless you used a gateway, you could only use one connection, with one set of credentials, for each data source used in the Power BI Service. For Snowflake this meant you could only use one set of credentials for all datasets that connected to the same Server and Warehouse, which led to a variety of problems: like this one, where different credentials were needed for different Snowflake databases, or like this one, where one user published a dataset and entered credentials that worked for them, then a second user published another dataset, entered different credentials for the same Server/Warehouse combination and broke refresh for the first dataset. With most other popular connectors these issues were rarer because their Paths are more specific and align more closely with how you’d want to use different credentials.

As I said, Shareable Cloud Connections solve all this by allowing the creation of multiple named connections to the same source, each of which can use different credentials. As a result I strongly recommend everyone using Snowflake with Power BI to create new Shareable Cloud Connections and use them in the Power BI Service.

Some Thoughts On Third-Party Tools For Power BI

While I was at the Data Scotland conference in Edinburgh on Friday (great event by the way) I stopped by the Tabular Editor stand and got the nice people there to give me a demo of their new tool, DAX Optimizer. It’s currently in private beta but if you’re curious to learn more, Nikola Ilic has already blogged about it in detail here.

Rather than blog about the tool itself – there’s no point repeating Nikola’s post – I thought it would be good to answer a question someone asked me later that day about Tabular Editor and which I’m definitely going to be asked about DAX Optimizer, namely:

This looks great, but it’s expensive and it’s hard for me to get sign-off to use third-party tools like this. Why doesn’t Microsoft give me something like this for free?

Before I carry on, let me make a few things clear:

  • I work for Microsoft but these are my personal opinions.
  • I have known many of the people involved in Tabular Editor and DAX Optimizer, including Marco and Alberto, for many years and have had business relationships with them in the past before working for Microsoft.
  • I don’t endorse any non-Microsoft Power BI-related commercial tools here on my blog but I do use many of them and mention them regularly, leaving readers to draw their own conclusions. This post is not an endorsement of Tabular Editor or DAX Optimizer.

With that out of the way let me address some of the different aspects of this question.

There’s a post on the Power BI blog from 2021 here co-written by Marco Russo and Amir Netz which covers Microsoft’s official position on community and third party Power BI development tools and which is still relevant. There’s also a companion article by Marco here that’s worth reading. In summary Microsoft’s long-term goal is to provide great tools for all Power BI developers, including enterprise developers, but in the meantime our priority is to build a solid platform that other people can build these tools on. I know many of you won’t believe me but here at Microsoft we have finite development resources and we need to make difficult decisions about what we invest in all the time. We can’t build every feature that everyone wants immediately and everyone wants different features.

As a result there will always be space for free and commercial third-party tools to innovate in the Power BI ecosystem. In the same way Tabular Editor serves the enterprise tools market, the vendors in the Power BI custom visuals marketplace extend Power BI with custom visuals. There are literally hundreds of other examples I could give in different areas such as planning and budgeting and admin and governance. Why doesn’t Microsoft buy some or all of these tools? We do buy tools vendors sometimes, but I feel these tools and companies tend to fare better outside Microsoft where they can compete with each other and move quickly, and when there’s a vibrant partner ecosystem around a product then the customer is better off too.

DAX Optimizer is slightly different to Tabular Editor and these other tools though. While the tool is very sophisticated the tool itself is not the whole point; it’s like a much, much more sophisticated version of Tabular Editor’s Best Practices Analyzer feature, a feature which is available in both the free and paid versions of Tabular Editor. The real value lies in the IP inside DAX Optimizer: these aren’t just any rules, these are Marco and Alberto’s rules for optimising DAX. Anyone could build the tool, but only Marco and Alberto could write these particular rules. I guess that’s why the Tabular Editor team had these stickers on their stand on Friday:

Doesn’t Microsoft have people who are this good at DAX who could write the same rules? We do have people who know more about DAX than Marco and Alberto (namely the people who create it, for example Jeffrey Wang) and we do have people who are extremely good at performance tuning DAX (for example my colleagues Michael Kovalsky or Phil Seamark). Indeed, back in 2021 Michael Kovalsky published a free set of rules here which you can use with Best Practices Analyzer in Tabular Editor and which represent the Power BI CAT team’s best practices recommendations on DAX and modelling, so you can argue that Microsoft already does offer a free solution to the problem that DAX Optimizer is trying to solve.

Marco and Alberto are Marco and Alberto though. They have a very strong brand. Consultancy is a famously hard business to scale and this is a very clever way for them to scale the business of DAX performance tuning. If you want their help in whatever form then you’ll need to pay for it. Couldn’t Microsoft just hire Marco and Alberto? I doubt they’d say yes if we asked, and in any case the situation is the same as with buying the tools I mentioned above: I think they add more value to the Power BI ecosystem outside Microsoft than they ever could inside it.

I’ve been lucky enough to get an invitation code to test DAX Optimizer and will be doing so this week, but I deliberately wrote this post before giving it a try. It’s important for me to stay up-to-date with everything happening in the world of Power BI because the customers I work with ask for my opinion. I wish the team behind it well in the same way I wish anyone who tries to build a business on top of Power BI well; the more successful they are, the more successful Power BI and Fabric are.

Understanding The “External table is not in the expected format” Error In Power Query In Power BI And Excel

Sometimes when you’re importing data from files using Power Query in either Power BI or Excel you may encounter the following error:

DataFormat.Error: External table is not in the expected format

What causes it? TL;DR: it’s because your query is trying to read one type of file, probably Excel (I don’t think you can get this error with any other source, but I’m not sure), while actually connecting to a different type of file.

Let’s see a simple example. Say you have a folder with two files: one is an Excel file called Date.xlsx and one is a CSV file called Date.csv.

Here’s the M code for a Power Query query that connects to the Excel file and reads the data from a table in it:

let
  Source     = Excel.Workbook(File.Contents("C:\MyFolder\Date.xlsx"), null, true), 
  Date_Table = Source{[Item = "Date", Kind = "Table"]}[Data]
in
  Date_Table

Now, if you change the file path in this query – and only the file path – to point at the CSV file instead like so:

let
  Source     = Excel.Workbook(File.Contents("C:\MyFolder\Date.csv"), null, true), 
  Date_Table = Source{[Item = "Date", Kind = "Table"]}[Data]
in
  Date_Table

…you will get the “external table is not in the expected format” error shown above. This is because your code is using the Excel.Workbook M function, which is used to import data from Excel workbooks, to connect to a file that is a CSV file and not an Excel workbook. The way to fix it is to use the appropriate function, in this case Csv.Document, to access the file like so:

let
  Source = Csv.Document(
    File.Contents("C:\MyFolder\Date.csv"), 
    [Delimiter = ",", Columns = 4, Encoding = 65001, QuoteStyle = QuoteStyle.None]
  ), 
  #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
  #"Promoted Headers"

To be honest, if making this change is beyond your Power Query skills and you’re sure you’re trying to connect to the right file, you’re better off creating a completely new query rather than editing the query you already have.

Another common scenario where you might encounter this error is when you’re importing data from all the files in a folder and one of the files isn’t in the correct format. For example, let’s say you have a folder with three Excel files in it and you use the Folder data source to import all the data from all three files:

Since all three files are Excel files the Folder option will work:

However, if you take a CSV file and drop it into the folder like so:

Then you’ll get the same error in Power Query:

In this case, apart from deleting the CSV file, there’s another option to solve the problem: filter the folder so you only try to get data from the .xlsx files and no other file type. To do this, click on the step called “Source”. When you do this you’ll see that the step returns a table containing all the files in the folder you’re pointing at:

You’ll see that the table in this step contains a column called Extension, which contains the file extension for each file. If you filter this table – which will insert a new step at this point in the query, which is fine – by clicking on the down arrow in the Extension column, deselecting the (Select All) option and selecting “.xlsx”, so the table only contains .xlsx files, then you can avoid this problem:

If, as in this example, the rogue file happens to be the first file in the folder and you’ve selected that first file to be your “sample” file when setting up the import, then you’ll also need to go to the query called Sample File in the Queries pane and make exactly the same change there (ie click on the Source step and filter to remove any non .xlsx files).

Log Out Of Power BI Automatically After A Period Of Inactivity

A common requirement from Power BI customers in highly-regulated industries is the need to log users out of Power BI if they have been inactive for a certain amount of time. If your Power BI reports contain extremely sensitive data you don’t want someone to open a report, leave their desk for lunch, forget to lock their PC and let everyone in the office see what’s on their screen, for obvious reasons. This has actually been possible for some time now with Power BI and is now supported for Fabric too, so I thought I’d write a blog post to raise awareness.

The feature that makes this possible is Microsoft 365’s Idle Session Timeout, which you can read about here:

https://learn.microsoft.com/en-us/microsoft-365/admin/manage/idle-session-timeout-web-apps?view=o365-worldwide

To turn it on, a Microsoft 365 admin has to go to the M365 admin centre and Org Settings/Security & Privacy and select Idle Session Timeout. There you can set the amount of time to wait before users are logged out:

Once that is set, anyone who has Power BI open in their browser but doesn’t interact with it will see the following message after the specified period of time:

Your session is about to expire

Your organization’s policy enforces automatic sign out after a period of inactivity on Microsoft 365 web applications.

Do you want to stay signed in?

There are a few things to point out about how this works (read this for the full details):

  • You can’t turn it on for just Power BI, you have to turn it on for all supported Microsoft 365 web apps. This includes Outlook and the other Office web apps
  • You can’t turn it on for specific users – it has to be for the whole organisation
  • Users won’t get signed out if they get single sign-on into the web app from the device-joined account, or select “Stay signed in” when they log in (an option that can be hidden), or if they’re on a managed device and using a supported browser like Edge or Chrome

You’ll need to be on friendly terms with your M365 admin if you want to use this, clearly, but if you need this functionality it makes sense to enforce activity-based timeout rules for more apps than just Power BI.