Is Power BI A Semantic Layer?

The short answer: yes. And it’s a metrics layer and a metrics store and it’s headless BI too for what it’s worth.

I’ve been meaning to blog about this question for a long time now, for two reasons:

  1. Customers migrating to Power BI often ask me if Microsoft also has a product or service that can act as a semantic layer, so I need to explain that Power BI is already a semantic layer.
  2. I read the deliberations of the cool kids of analytics (Benn Stancil, for example, whose writing I enjoy a lot) and get the feeling that Microsoft and Power BI inhabit a separate universe that is invisible to them. This exchange on Twitter is a prime example.

The reason I haven’t blogged about this question yet is that earlier this year Aurimas Račas wrote a truly outstanding blog post on this subject, which I strongly recommend you read:

https://aurimas.eu/blog/2022/08/metrics-layers-and-power-bi/

What more could I say? Well, Aurimas’s blog post has the virtue of being impartial, and I thought it would be useful to add a few thoughts from the perspective of a Microsoft insider. These opinions are my own, not official Microsoft opinions, but I don’t think any of my colleagues would disagree with them.

So is Power BI a semantic layer?

Back to the main question. Whether or not we do a good job of promoting Power BI as a semantic layer to customers, we certainly think of it as one and I see it referred to as one internally all the time. Indeed we always thought of Power BI’s ancestors, Azure Analysis Services and SQL Server Analysis Services, as semantic layers too all through their 20+ year history – who remembers the terms BI Semantic Model or UDM? One of the points Aurimas makes is that this is an awareness problem more than anything else: because Power BI can be used as a self-service BI tool as well as an enterprise BI tool, and because more people use it as a self-service tool, the perception of it as such prevents some people from seeing it as an enterprise BI tool. On the Power BI CAT team we certainly work with a lot of large customers that use Power BI as an enterprise BI tool and semantic layer successfully: Walmart’s finance team is a great example and their recent case study here (this older video is good too) explicitly mentions that they use Power BI as a “semantic model library” on billions of rows of data.

Preference for thin(ness)

Another great point that Aurimas makes is that the current preference in BI tools is for them to be thin layers that “delegate the computational workloads to the (cloud) databases/warehouses where the data is stored”. Back when I first started in BI the debate was between MOLAP and ROLAP, and while the pendulum has swung in different directions over the years we’re still arguing over the same points with Import mode versus DirectQuery. My personal opinion is the currently unfashionable one: Import mode and the Vertipaq engine will always outperform an approach that involves generating SQL against an external database, however fast and scalable that database claims to be, for anything more than basic BI requirements (I’m watching Google Malloy with great interest though, along with whatever SQL additions Julian Hyde is working on). The official Microsoft guidance is that Import mode should be your default choice and at present, as this video by Alberto Ferrari shows, the performance differences between Import mode and DirectQuery mode are significant. As the Walmart case study referenced above mentions, you can always mix Import mode and DirectQuery mode in composite models and build aggregations if you’re working with data volumes that are too large for Import mode alone. We are continuing to invest in improvements to DirectQuery such as Horizontal Fusion and I think that will close the gap between Import and DirectQuery a lot.

DAX or SQL?

In the same way the MOLAP vs ROLAP debate has dragged on for twenty-plus years, people have always argued over whether SQL is the right language for expressing BI queries and calculations or whether another language – MDX when I started, DAX today – is necessary. To be honest I think if SQL were the right choice the argument would be settled by now and we’d already have a whole ecosystem of BI products allowing you to define complex measures in SQL in a way that developers found easy to understand. Even if you accept that another language is necessary (and the people working on Google Malloy agree on that point) then there’s the question of whether DAX is a good solution to the problem or whether a different approach would be better. DAX is certainly hard to learn but that’s more because of the concepts involved than the syntax itself; Marco Russo’s post here is a great explanation of why DAX is simple but not easy. Since the concepts are the issue, I strongly suspect that any other language that was as powerful as DAX would be just as difficult to learn. What’s more, we’re working on making DAX debugging easier, for example with the EvaluateAndLog function, and on making writing calculations easier with the upcoming visual calcs feature, and there are a lot of other similar improvements we could and should implement in the future.
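As a quick illustration of the debugging improvement mentioned above, here’s a minimal sketch of a measure that uses EvaluateAndLog – the Sales table and its columns are hypothetical, and the logged intermediate result can then be inspected via the trace events the function generates:

Sales Amount Debugged =
// EVALUATEANDLOG returns the value of its first argument and also logs it,
// together with the optional label, so you can see the intermediate result
EVALUATEANDLOG (
    SUMX ( Sales, Sales[Quantity] * Sales[Price] ),
    "Sales Amount intermediate result"
)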

Conclusion

Will these points change anyone’s mind? Probably not, especially since these questions are religious more than anything else: you’re either a Microsoft fan or someone who would never, ever consider using a Microsoft product; you’re either a SQL die-hard or you aren’t. Does this matter? I’ve seen Power BI’s all-conquering usage figures and I’m not sure it does. I love theoretical questions about semantic layers as much as anyone else, but what really matters is whether a tool is actually used and provides value to businesses.

Sharing Excel Reports Connected To Power BI Datasets Or The Excel Data Model/Power Pivot Using OneDrive For Business

I love Excel as an alternative to Power BI reports, especially now that Excel Online allows you to interact with browser-based Excel reports that use Power BI datasets or the Excel Data Model (aka Power Pivot) as a source. After all, why try to recreate a PivotTable in a Power BI report when you can give your users an actual PivotTable? You can publish Excel reports to Power BI and view them there; however, if you want to share these reports using OneDrive for Business, outside Power BI, you’ll need to think carefully about how you do so before you allow other people to view the report.

To illustrate the problem I used Power Pivot to load some data into the Excel Data Model, built a simple report with a PivotTable and PivotChart, saved it to OneDrive for Business and then opened it in the browser with Excel Online. As you can see it’s possible to interact with the report in the browser, changing slicers and moving fields around in the PivotTable and PivotChart field lists:

So far so good. However, an Excel file saved to OneDrive for Business is a shared document – any changes that one user makes are immediately visible to all other users who have access with Edit permissions. This means that if you change a slicer while someone else is viewing the report at the same time, they will see the slicer change too in real time. This may be useful in scenarios where you are collaborating with someone on the same report and want them to see your changes, but in most cases it is not desirable behaviour for a report. Here’s an example with the same Excel report open side by side in Chrome and Edge:

How can you stop this happening? One option is to use View mode. When an Excel workbook is in View mode a user can interact with slicers and move fields around in the PivotTable without these changes affecting other users and without these changes being saved; they cannot make most other changes such as entering values in cells. If users do have full Edit permissions they can always switch to View mode using the dropdown menu in the top right corner of the screen:

You can also use the Protect Workbook option to make sure users open the workbook in View mode by default. However, there is always the danger that a user with Edit permissions will switch to Edit mode and make unwanted changes, so it’s better to share the workbook without Edit permissions using the “Can View” option described here so they can only open the workbook in View mode:

There is another option: using an Embed link. To get this, click on the File menu in Excel Online and then on the backstage view select Share and then Embed:

As the UI suggests, this option is intended for scenarios where you want to embed your workbook in another application; it’s similar to View mode but a bit more restrictive. In the Embed dialog that opens you can choose which parts of the workbook to share, whether grid lines are shown and other useful options:

Here you need to select the “Let people sort and filter” option, otherwise your users won’t be able to use the slicers. Next, copy the code from the “Embed code” box and paste it into your favourite text editor so you can extract the URL in the src attribute – this is the URL you should give to your users. When a user opens an Excel report using a link generated in this way they won’t see the Excel ribbon at all; they can interact with slicers but can’t get access to the PivotTable fields pane and so won’t be able to move fields around, for example changing what is on rows or columns. I think this gives the best experience for end users who just want to view an Excel report in the browser.

Note that this does not replace normal OneDrive for Business sharing and permissions though: you’ll still need to share the report with your users, ideally with “Can View” permissions as described above. If your Excel report connects back to a Power BI dataset you also need to ensure that your users have the appropriate permissions on that dataset, which means they’ll either need to be at least a Member in the workspace the dataset is stored in or have Build permissions.

What’s New In Analysis Services 2022?

There was a time when a new release of SQL Server – and therefore a new release of SQL Server Analysis Services – was the most exciting thing in the world for me. New functionality! New things to blog about! Not so now that my focus, and Microsoft’s, is on Power BI and we get cool new functionality there every month. All the same there are still a lot of people running SSAS on-premises and SQL Server 2022 has just been released, so what’s new and is it worth upgrading?

There’s nothing about Analysis Services in the SQL Server 2022 GA announcement blog post, but you can find a list of what’s new here:

https://learn.microsoft.com/en-us/analysis-services/what-s-new-in-sql-server-analysis-services?view=asallproducts-allversions

Most of the items listed here are performance optimisations, many of which have been available in Power BI and Azure Analysis Services for some time now (although we haven’t got parallel execution plans for DirectQuery in Power BI just yet 😉). Probably the most important in my opinion is MDX Fusion, the main effect of which is to improve the performance of Excel PivotTables and cube-function-based reports connected to SSAS Tabular – I saw some cases where MDX queries ran a lot faster when this rolled out for Power BI. All the features are applicable to SSAS Tabular and some are applicable to SSAS Multidimensional too; there are also a few other minor optimisations that aren’t listed. The new cloud-billing model announced here is only applicable to the SQL core engine and not to SSAS, SSRS or SSIS.

There are no deprecated features, but Multidimensional’s data mining features and PowerPivot for SharePoint are now officially discontinued (which means they are no longer supported – see the definition of “discontinued” here).

As a Microsoft employee, obviously I’m going to say you should upgrade to SQL Server 2022. As a member of the Power BI product group I would add that you should also consider migrating all your on-prem SSAS Tabular models to Power BI Premium if you can: Power BI Premium is the strategic direction for enterprise BI as well as self-service BI and that’s where all the investment is going from now on. Don’t think about migrating to Azure Analysis Services instead – we’re already encouraging people to migrate from AAS to Premium! My colleague Dan English just posted a great walkthrough of the new AAS to Premium migration experience here, which is worth checking out.

Migration from SSAS Multidimensional to Power BI is a much more difficult task. You’ll need to rebuild your existing cubes and calculations from scratch manually in Power BI (there are no tools to automate migration because it isn’t possible to build them). Simple cubes should be easy to rebuild; more complex cubes, for example those with parent/child hierarchies, custom rollups or SCOPE statements, will be much more difficult to migrate and you may need to accept that you can’t reproduce some functionality exactly. You can always run SSAS Multidimensional in a virtual machine in Azure if you need to move to the cloud and there are VM images to make that easy.

Web.Contents, Text Concatenation And Dataset Refresh Errors In Power BI/Power Query

One of the most popular topics I’ve ever written about here is the dataset refresh errors you get with dynamic data sources and the Web.Contents function in Power BI (see here and here for example). Recently I learned about an interesting new detail regarding this problem that I thought was worth blogging about, and which may help someone trying to troubleshoot it.

Consider the following M query, used as the source of a table in a Power BI dataset:

let
  Term = "apples",
  Source = Json.Document(
    Web.Contents(
      Text.Combine(
        {
          "https://data.gov.uk/api/3/action/package_search?q=",
          Term
        }
      )
    )
  ),
  CallSucceeded = #table(
    type table [CallSucceeded = logical],
    {{Source[success]}}
  )
in
  CallSucceeded

[The API used here is public and doesn’t require authentication so you can run this query yourself]

This query refreshes successfully in Power BI Desktop but when you publish to the Power BI Service and go to the dataset Settings page you see the following error:

This dataset includes a dynamic data source. Since dynamic data sources aren’t refreshed in the Power BI service, this dataset won’t be refreshed. Learn more: https://aka.ms/dynamic-data-sources. Data source for Query1

Why is this a case of a dynamic data source? Unlike the examples in my previous post there is only one, known, value being passed to the API, but nevertheless the use of the Text.Combine M function is enough to prevent the Power Query engine from doing the static analysis it needs to do to allow refresh to take place. As documented here, the Power Query engine needs to be able to work out the URL it’s going to call before the query is run and in this case it can’t.

One way of solving this problem, the way I knew would work, is to use the Query option with Web.Contents like so:

let
  Term = "apples",
  Source = Json.Document(
    Web.Contents(
      "https://data.gov.uk/api/3/action/package_search",
      [Query = [q = Term]]
    )
  ),
  CallSucceeded = #table(
    type table [CallSucceeded = logical],
    {{Source[success]}}
  )
in
  CallSucceeded

What I didn’t know – or rather I did, but didn’t really understand – is that if all you want to do is concatenate text to build your URL you can use the & operator rather than Text.Combine and it will work because Power Query is smart enough to understand that during static analysis. So for example:

let
  Term = "apples",
  Source = Json.Document(
    Web.Contents(
      "https://data.gov.uk/api/3/action/package_search?q="
        & Term
    )
  ),
  CallSucceeded = #table(
    type table [CallSucceeded = logical],
    {{Source[success]}}
  )
in
  CallSucceeded

…will also successfully refresh in the Power BI Service because it uses the & operator. This is the way I would naturally write the code and I’m sure I’ve done this hundreds of times in the past without realising that concatenating text any other way will lead to errors.

[Thanks to Curt Hagenlocher for this information]

How Much Data Can You Load Into Power BI?

Power BI can handle large data volumes, but just how much data can you load into Power BI? Can Power BI handle big data? How big is “big” anyway? These are questions you may have when you’re starting out on a new Power BI project and the answers can be hard to find. Indeed the answer in most cases is “it depends”, which isn’t very helpful. In this post I will try to explain the various limits on Power BI dataset size and how you can know if you’re likely to hit them.

Power BI storage modes

First of all, what does it even mean to “load data into” Power BI? There are two fundamental ways that Power BI can work with data: Import mode and DirectQuery mode. Different tables in your Power BI dataset can have different storage modes; there’s also an option called Dual mode where a table can switch between Import mode and DirectQuery mode depending on the circumstances.

In DirectQuery mode no data is stored in your Power BI dataset: when your report renders and queries are sent to the dataset, the dataset in turn sends queries to the data source to get the data it needs on demand. This means the only limits on the amount of data you can work with are the limits set by your data source.

In Import mode Power BI stores a copy of your data inside the dataset in its own internal database engine, known as either the Vertipaq engine or the Analysis Services engine. Import mode is the default storage mode and for good reason – it will almost always give you the best performance for your reports and it allows you to use all Power BI functionality. The “how much data?” question is only relevant to Import mode because when you use it Power BI imposes various limits on the amount of data that can be stored inside a dataset.

How big is your dataset?

In order to answer the question of “how much data can you load into Power BI?” you need to know how to measure the size of your dataset. There are various different ways of doing this but the best way is to install a free, community-developed tool called DAX Studio: its Model Metrics feature shows you the total size of your dataset on the Summary tab:

This figure is good enough for most purposes. It’s not quite that simple though, and this video by Marco Russo and Patrick LeBlanc has a great explanation of all the different ways of measuring dataset size and what the figures really represent.

Can you predict how large your dataset will be if you know how much data you have in your data source?

You have to import all your data into your dataset to find out how large your dataset is. But can you work out how large it will be before you do this? The answer is no. You can be pretty sure that the size of your dataset will be smaller than the size of your data in your data source, but it could be only a bit smaller or it could be a lot smaller, maybe even just 10-20% of the original size. This is because Power BI compresses your data very efficiently when you import it. What’s more, how you model your data can have a big impact on how well the compression works and making a few changes can result in a much smaller dataset as, for example, this post by Nikola Ilic shows.

How much data can you load into Power BI Desktop?

The only practical limit on the amount of data you can load into Power BI Desktop is the amount of memory you have on your PC; you’ll need at least 16GB of RAM, ideally 32GB, to get the best experience. However Power BI Desktop is just a development tool – you’ll need to publish to the Power BI Service for other people to view your report and that’s where the limits take effect. What’s more there’s a 10GB limit on the size of a dataset you can publish to the Power BI Service from Power BI Desktop, although you can have datasets much larger than that in Power BI Premium.

To be honest you should never be working with anything like 10GB of data in Power BI Desktop anyway: the file size will be huge, saving will be slow and you’ll spend a long time waiting for data to import while you’re developing. What you should do is work with a small subset of your data in Desktop and only load the full volume after you’ve published. You can do this in a few different ways, for example using incremental refresh or by using M parameters to limit the amount of data you load in Desktop and then changing the parameter after publishing, as I showed here.
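For example, here’s a minimal sketch of the M parameter approach – the server, database and table names are hypothetical, and RowLimit is assumed to be an M parameter you’ve already created, set to a small number while developing and increased (or the Table.FirstN step removed) after publishing:

let
    // connect to a hypothetical SQL Server source
    Source = Sql.Database("myserver", "mydatabase"),
    // navigate to the dbo.Sales table
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // only load the first RowLimit rows; RowLimit is an M parameter
    Limited = Table.FirstN(Sales, RowLimit)
in
    Limited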

How much data can you load in the Power BI Service if you’re using Shared capacity (aka “Power BI Pro”)?

If you are not using Power BI Premium or Premium Per User, when you publish to the Power BI Service you are using Shared capacity (often called “Power BI Pro” by Power BI users because you only need a Power BI Pro licence to use it). The maximum size of a dataset in Shared capacity is 1GB; if you hit that limit you’ll get the “Unable to save the changes” error I blogged about here. There is one exception though: as mentioned here, if you use the rather obscure option in the Power BI Service to upload an Excel workbook with data in the Excel Data Model/Power Pivot, then the resulting dataset is limited to 250MB.

How much data can you load in the Power BI Service if you’re using Power BI Premium or Premium Per User (PPU)?

The default maximum dataset size in a Power BI Premium capacity or PPU is 10GB, but if you turn on the Large Dataset storage format you can have datasets larger than that and the maximum size depends on the amount of memory available in your Premium capacity. The “Max memory per dataset” column in the table here shows the maximum amount of memory available in each Premium or Power BI Embedded capacity SKU to an individual dataset; the maximum amount of memory available per dataset in PPU is 100GB. However the maximum amount of memory available to a dataset is not the maximum size of a dataset: apart from the memory used to store the data for a dataset, more memory will be needed when the dataset is queried or refreshed. If you do a full refresh of a dataset it may require almost double the amount of memory needed to store the dataset itself; incremental refresh may require less memory. You can use the Premium Capacity Metrics App to see how your dataset’s memory usage changes over time.

Other Import mode limits

There are a few other limitations of Import mode that should be mentioned. As documented here you can only have 1,999,999,997 distinct values in a single column and there is a limit of 16,000 on the combined total number of tables and columns in those tables – but if you have that many tables and columns you have definitely made a mistake with how you model your data. Also, some DAX functions such as Median() only work on tables with less than 2 billion rows in them, as I blogged here.

Power BI can load all my data, but will my reports be fast enough to use?

Loading all your data into Power BI is one thing but what really matters is whether the reports you build are fast enough to use. In my experience performance problems relating to data volume are not as common as performance problems caused by inefficient DAX in measures or poor modelling decisions, although they do occur – for example distinct counts on columns with a very large number of unique values can be slow. If you’re following best practices you’re unlikely to encounter them unless you’re working with the kind of data volumes that require Premium. DAX Studio’s Server Timings feature can help you understand why your queries are slow and whether data volumes are an issue.

This is a lot to take in – what does it all mean?

Let me finish up by making a few broad generalisations about how much data you can load into Power BI Import mode. Assuming you have followed all the best practices around modelling data you should be able to work with tables with up to a few million rows, probably tens of millions of rows, in Shared capacity and tables with up to a few billion rows in Premium. If your data source is Excel then Power BI can definitely handle the amount of data you have; if you are working with a relational database like SQL Server then it’s still very likely Import mode will work; even if you’re working with hundreds of gigabytes of data or more in a source like Azure Synapse, Snowflake or BigQuery then Import mode may still be ok. You will need to know Power BI very well to get good performance in Import mode with the largest data volumes but it is certainly possible and I know of several customers that are doing it.

Calling The Power BI REST API From TSQL In Azure SQL DB

Something cool has just been announced for Azure SQL DB: the ability to call a limited number of REST APIs directly from TSQL. The announcement is here:

https://devblogs.microsoft.com/azure-sql/azure-sql-database-external-rest-endpoints-integration-public-preview/

There are code samples here:

https://github.com/Azure-Samples/azure-sql-db-invoke-external-rest-endpoints

And guess what, the Power BI REST API is one of the APIs you can call! This means, for example, you can run DAX queries via the Execute Queries endpoint (bearing in mind all the limitations) from TSQL:

https://github.com/Azure-Samples/azure-sql-db-invoke-external-rest-endpoints/blob/main/power-bi.ipynb
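As a taster, here’s a rough sketch of what a call might look like – the dataset ID is a placeholder, the DAX query is trivial and assumes the dataset has a Date table, and I’m assuming you’ve already set up authentication (for example a database scoped credential for https://api.powerbi.com) as described in the announcement and samples above:

DECLARE @response NVARCHAR(MAX);

-- call the Power BI Execute Queries endpoint for a (placeholder) dataset ID
EXEC sp_invoke_external_rest_endpoint
    @url = N'https://api.powerbi.com/v1.0/myorg/datasets/00000000-0000-0000-0000-000000000000/executeQueries',
    @method = N'POST',
    @payload = N'{"queries":[{"query":"EVALUATE VALUES(''Date''[Date])"}]}',
    @credential = [https://api.powerbi.com],
    @response = @response OUTPUT;

-- the JSON returned by the API is in @response
SELECT @response AS QueryResult;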

I think this opens up a lot of possibilities. What do you think? Leave a comment!

Why You Should Defragment Your Fact Tables If You’re Using Incremental Refresh In Power BI Premium

A lot of people are building large datasets in Power BI Premium nowadays, and if you’re building a large dataset you’re very likely to be using Power BI’s incremental refresh functionality with your fact tables or manually creating and refreshing partitions in them. If so, you should occasionally do a special kind of refresh to shrink the dictionaries on your fact tables.

To illustrate why you need to do this I created a Power BI dataset with one table containing one column, where each value was a GUID (ie a unique text value). Here’s the M I used to generate it:

let
    Source = {1..10000000},
    Output = #table(type table [MyGUID=text], 
        List.Transform(
            Source, 
            each {Text.NewGuid()})
    )
in
    Output

After publishing the dataset I copied the original partition so there were two partitions in the table, each of which contained ten million rows. When you use incremental refresh in a table Power BI automatically generates and manages partitions for you in the background – and there isn’t much difference between doing this and creating partitions manually like I’m doing here.
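In case you want to reproduce this, here’s a rough sketch of the kind of TMSL create command you could run against the workspace’s XMLA Endpoint to add the second partition – the partition name is arbitrary and the M expression is the same one shown above:

{
  "create": {
    "parentObject": {
      "database": "TableWithManyDistinctValues",
      "table": "MyTable"
    },
    "partition": {
      "name": "MyTable - Copy",
      "source": {
        "type": "m",
        "expression": "let Source = {1..10000000}, Output = #table(type table [MyGUID=text], List.Transform(Source, each {Text.NewGuid()})) in Output"
      }
    }
  }
}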

Here’s what DAX Studio’s Model Metrics functionality showed when I first refreshed the dataset:

The Dictionary column is the column to focus on in this table. Dictionaries are the data structures created by Power BI’s storage engine, the Vertipaq engine, to hold all the distinct values in a column – the more distinct values there are, the larger the dictionary. As Marco points out here, the DMVs that DAX Studio uses to measure dictionary size don’t always return consistent or accurate results, but the figures they return are good enough for the purposes of this test.

I then did several Full refreshes on just one of the two partitions. Here’s what DAX Studio showed at this point:

As you can see, the size of the dictionary for the MyGUID column (and therefore the overall size of the dataset) has grown a lot – about 1GB – even though the cardinality of the column has stayed the same. This is because dictionaries are held at the table level, not at the partition level, so by refreshing just one partition I was adding 10 million new text values to the dictionary without deleting any of the old values that had been added to the dictionary by previous refreshes and which were no longer present in the table.

Finally, I ran the following Refresh command against the workspace’s XMLA Endpoint from SQL Server Management Studio:

{
  "refresh": {
    "type": "defragment",
    "objects": [
      {
        "database": "TableWithManyDistinctValues",
        "table": "MyTable"
      }
    ]
  }
}

[Note that XMLA Endpoints are only available if you’re using Premium or PPU]

This refresh is of type defragment and it is documented here. This is the explanation of what it does from the docs:

Defragment the data in the specified table. As data is added to or removed from a table, the dictionaries of each column can become polluted with values that no longer exist in the actual column values. The defragment option will clean up the values in the dictionaries that are no longer used.

After that you can see that the dictionary for the table, and therefore the size of the dataset, was back to around its original size:

This is an extreme example of course, but I can imagine something nearly this bad happening in the real world if you use incremental refresh on a table with a text column containing addresses or user comments and you don’t keep much history. This isn’t new behaviour or functionality: it happens in Analysis Services Tabular models too and is discussed in books like Marco and Alberto’s “Tabular Modelling in Microsoft SQL Server Analysis Services”, but not many people are aware it still applies to Power BI datasets. This type of refresh can be quite slow, but it is likely to be faster than the alternative – running a full refresh on the dataset – which is also the only option if you’re not using Premium.

So why should you run a refresh of type defragment on your dataset? There are two reasons:

  1. If you’re getting close to the size limits imposed by the Premium capacity SKU you’re using, and you want to make sure that you don’t hit them.
  2. Large datasets generally result in slower query performance, so reducing the size of your dataset may result in faster queries.

How often should you do this? In most cases not very often; it depends on how much your dataset shrinks when you defragment it. Once a year may be sufficient and I doubt anyone would need to do it more than once a month. One way to automate this would be to use Power Automate to call the Enhanced Refresh API, as I showed in this series of posts.
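For what it’s worth, the Enhanced Refresh API lets you request a defragment directly too; a minimal sketch of the request (with placeholder group and dataset IDs, and the same table name as above) might look like this:

POST https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes

{
  "type": "defragment",
  "objects": [
    {
      "table": "MyTable"
    }
  ]
}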

Performance Tuning Table Visuals With Filters Applied In Power BI

Do you have a Power BI report with a table or a matrix on it, where there is a filter on the rows of the table? It’s a very common scenario indeed. Is the table or matrix slow to render? If so, this post is for you!

Consider the following table visual:

There are four columns: Date, Town and two measures. One measure called [Fast Measure] is, as the name suggests, very quick to execute; the other measure, called [Slow Measure], is very complex and slow. The definitions are irrelevant here. Notice that there is a filter on this table visual so only the rows where [Fast Measure] is greater than 1 are shown.

If I measure the amount of time to render this table in Performance Analyzer, it takes around 17.5 seconds to run. However, if I remove the filter on [Fast Measure], the table only takes 8 seconds to run. Why? The filter is on the fast measure and surely more rows are returned without the filter, so wouldn’t the slow measure be evaluated more?

It turns out that the DAX generated for tables with filters applied isn’t as well-optimised as it could be. Yes, I work on the Power BI team and yes this is something that should be addressed, but before that happens I thought it would be good to make as many people aware of this as possible so they can tune their reports. I have worked with a lot of customers recently who are running into this problem without realising it.

Here’s the DAX query for the table without the filter:

DEFINE
  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      'Date'[Date],
      'Property Transactions'[Town],
      "Fast_Measure", 'Property Transactions'[Fast Measure],
      "Slow_Measure", 'Property Transactions'[Slow Measure]
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(501, __DS0Core, 'Date'[Date], 1, 'Property Transactions'[Town], 1)

EVALUATE
  __DS0PrimaryWindowed

ORDER BY
  'Date'[Date], 'Property Transactions'[Town]

Here’s the DAX query for the table with the filter:

DEFINE
  VAR __ValueFilterDM0 = 
    FILTER(
      KEEPFILTERS(
        SUMMARIZECOLUMNS(
          'Date'[Date],
          'Property Transactions'[Town],
          "Fast_Measure", 'Property Transactions'[Fast Measure],
          "Slow_Measure", 'Property Transactions'[Slow Measure]
        )
      ),
      [Fast_Measure] > 1
    )

  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      'Date'[Date],
      'Property Transactions'[Town],
      __ValueFilterDM0,
      "Fast_Measure", 'Property Transactions'[Fast Measure],
      "Slow_Measure", 'Property Transactions'[Slow Measure]
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(501, __DS0Core, 'Date'[Date], 1, 'Property Transactions'[Town], 1)

EVALUATE
  __DS0PrimaryWindowed

ORDER BY
  'Date'[Date], 'Property Transactions'[Town]

Notice how, in this second query, the filter is applied in an additional variable called __ValueFilterDM0. The side effect of implementing the filter in this way is that all the measures in the table, including [Slow Measure], are evaluated twice. This explains why, in this example, the filtered table is twice as slow as the unfiltered table. The performance of your table may be different: it depends on the number of measures, their definition, the nature of the filter and many other factors. Measuring the performance of your table with and without the filter applied will tell you how much of an impact this issue is having in your report.

What can you do about this? One solution is to apply the filter inside the measures rather than on the visual. In this case that means creating two new measures with the following definitions:

Fast Measure Filtered =
VAR f = [Fast Measure] 
RETURN IF ( f > 1, f )

Slow Measure Filtered =
IF ( [Fast Measure] > 1, [Slow Measure] )

…and then using these measures as the columns of the table instead of the original measures, removing the visual-level filter, like so:

In my example this results in the table rendering in 8 seconds again, but again your mileage may vary. What’s more, I’ve turned off totals in the table and therefore avoided the problem of making these measures return the same totals as the totals in the original filtered table. There may be other solutions (I suspect calculation groups would be worth investigating) that work better in your dataset and report.
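For what it’s worth, if you do investigate the calculation group route, the expression for a calculation item that applies the same filter to whichever measure is in the visual might look something like the sketch below – I haven’t tested whether it avoids the double evaluation, so treat it as an idea rather than a recommendation:

-- hypothetical calculation item expression, applied to whatever measure is in the visual
IF ( [Fast Measure] > 1, SELECTEDMEASURE () )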

Are you affected by this problem? How much impact does it have? Do you have a different solution? Let me know in the comments…

Understanding The “Unable to save the changes since the new dataset of size” Error In Power BI

If you’re refreshing a Power BI dataset that contains a large amount of data you may encounter the following error:

Data source error Unable to save the changes since the new dataset of size 1120 MB exceeds the limit of 1024 MB. If the dataset is hosted on a PowerBI Premium capacity, you may also reach out to your capacity administrator to see if the limit can be adjusted.

You’ll get this error at the end of a refresh if the size of your dataset is larger than the size allowed by Power BI. The maximum size of a dataset varies depending on whether you are using Shared capacity (commonly known as “Pro”) or Premium capacity:

  • In Shared/Pro the maximum size of a dataset is 1GB (ie the 1024MB from the error message shown above), as documented here.
  • In Premium:
    • The default maximum size of a dataset is 10GB if the “Large dataset storage format” option (documented here and here) is turned off for your dataset.
    • If the “Large dataset storage format” option is turned on, the maximum size of your dataset depends on the amount of memory available in the SKU of the Power BI Premium capacity you’re using. There’s a table in the documentation that tells you how much memory is available for each SKU: look in the “RAM per dataset” column and also note what it says in the footnotes about the maximum size of a dataset being slightly less than the amount of memory listed.

What can you do if you run into this error? You have two choices:

  1. If you’re using a Power BI Pro licence, buy Premium Per User licences or buy Premium capacity and move your dataset into a workspace on PPU or Premium. If you’re already on Premium, upgrade your capacity and make sure that the “Large dataset storage format” option is turned on for your dataset. This will increase the dataset size limits.
  2. Reduce the size of your dataset so it fits within the limits.

Option #2 is likely to be the preferred option in most cases. There are a lot of detailed blog posts and articles out there on how to reduce the size of your dataset (including this one in the docs) but basically you should use DAX Studio’s Model Metrics feature to see what’s taking up the most space in your dataset, then delete unnecessary tables and columns, try to reduce the number of distinct values in each column, and reduce the number of rows you’re loading into each table (perhaps by reducing the amount of historic data). Quite often a few small changes can significantly shrink the size of your dataset and also help refresh times and query performance.

New Web.Headers M Function

[This post was originally published on the official Power Query blog, which has now been taken down. I’m republishing all my posts there to this blog to ensure the content remains available.]

You may have noticed a new M function was added to Power Query recently: Web.Headers. This function allows you to make HEAD requests to web services – the existing Web.Contents function, which it resembles closely, only allows you to make GET and POST requests. You can use the Web.Headers function to return the HTTP headers that would be returned if the URL passed to it was instead used to make a GET request.

For example, the following M query returns a record with the HTTP headers returned by a HEAD request to the URL “https://bing.com/search?q=Power+Query”:

let
    searchText = "Power Query",
    output = Web.Headers(
                "https://www.bing.com",
                [
                    RelativePath = "search",
                    Query = [q = searchText]
                ]
            )
in 
    output

The metadata of the record returned contains the HTTP response status code for the request. For example:

let
    searchText = "Power Query",
    output = Web.Headers(
                "https://www.bing.com",
                [
                    RelativePath = "search",
                    Query = [q = searchText]
                ]
            )
in 
    Value.Metadata(output)

…returns a record containing the field Response.Status: