Limit The Amount Of Data You Work With In Power BI Desktop Using Parameters And Deployment Pipelines

If you’re working with large amounts of data in Power BI you may find that you have problems because:

  • Your pbix file is very large
  • You spend a long time waiting for refreshes to finish in Power BI Desktop – and if you’re developing, you may need to refresh your dataset frequently
  • It takes a long time to publish your dataset to the Power BI Service

Wouldn’t it be great if there was a way to work with a small subset of your data in Power BI Desktop and then, after you publish, load all the data when you refresh? The good news is that this is now possible with the new deployment pipelines feature in Power BI!

Assuming that you know the basics of how deployment pipelines work (the documentation is very detailed), here’s a simple example of how to do this. Let’s say that you want to use data from the FactInternetSales table in the Adventure Works DW 2017 SQL Server sample database in your dataset. When you import the data from this table and open the Advanced Editor to look at the M code for the query, here’s what you’ll see:

let
  Source = Sql.Databases("MyServerName"),
  AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data],
  dbo_FactInternetSales =
    AdventureWorksDW2017{[Schema = "dbo", Item = "FactInternetSales"]}[Data]
in
  dbo_FactInternetSales

This query, of course, imports all the data from this table. To cut it down to a smaller size, the first thing to do is to create a new Power Query parameter (called FilterRows here) of data type Decimal Number:

[Screenshot: the FilterRows parameter, of type Decimal Number, with Current Value set to 5]
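
Behind the scenes a parameter like this is just another M query. As a point of reference, here's a minimal sketch of the kind of definition Power BI Desktop generates for FilterRows (treat the exact metadata fields as illustrative):

// the body of the FilterRows parameter query, as created by Manage Parameters
5 meta [
  IsParameterQuery = true,
  Type = "Number",
  IsParameterQueryRequired = true
]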

Notice that the Current Value property is set to 5. The purpose of this parameter is to control the number of rows from FactInternetSales that are loaded into the dataset. Here’s an updated version of the Power Query query above that uses this parameter:

let
  Source = Sql.Databases("MyServerName"),
  AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data],
  dbo_FactInternetSales =
    AdventureWorksDW2017{[Schema = "dbo", Item = "FactInternetSales"]}[Data],
  FilterLogic =
    if FilterRows <= 0 then
      dbo_FactInternetSales
    else
      Table.FirstN(dbo_FactInternetSales, FilterRows)
in
  FilterLogic

A new step called FilterLogic has been added at the end of this query that implements the following logic:

  • If the FilterRows parameter is less than or equal to 0, return all the rows in the FactInternetSales table
  • Otherwise, return the top FilterRows rows from the table

Given that the FilterRows parameter is set to 5, this means that the query now returns only the top 5 rows from FactInternetSales:

[Screenshot: the query preview showing only the top 5 rows of FactInternetSales]

It’s important to point out that a filter like this will only make your refreshes faster if the Power Query engine can apply the filter without reading all the data in the table itself. In this case it can: with a SQL Server data source, query folding ensures that the SQL query generated for the refresh only returns the top 5 rows from the FactInternetSales table:

[Screenshot: the native SQL query generated for the refresh, selecting only the top 5 rows from FactInternetSales]
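
For contrast, here's a minimal counter-example (a sketch, not from the original post): if a step that prevents folding, such as Table.Buffer, is inserted before the filter, the engine has to read the whole table and then discard rows locally, so the refresh gets no faster:

let
  Source = Sql.Databases("MyServerName"),
  AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data],
  dbo_FactInternetSales =
    AdventureWorksDW2017{[Schema = "dbo", Item = "FactInternetSales"]}[Data],
  // Table.Buffer loads the whole table into memory, so no step after it can fold
  Buffered = Table.Buffer(dbo_FactInternetSales),
  FilterLogic = Table.FirstN(Buffered, FilterRows)
in
  FilterLogic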

Here’s a simple report with a card that shows the number of rows loaded into the table:

[Screenshot: a report with a card showing 5 rows loaded into the table]

At this point you have your cut-down dataset for development in Power BI Desktop.

Next, publish this dataset and report to a workspace that is assigned to the Development slot in a deployment pipeline and then deploy them to the Test workspace:

[Screenshot: the deployment pipeline, with the content deployed from the Development to the Test workspace and the dataset rules button highlighted]

Then click the button highlighted in the screenshot above to create a new dataset rule that changes the value of the FilterRows parameter to 0 when the dataset is deployed to the Test workspace:

[Screenshot: the dataset rule that sets the FilterRows parameter to 0 in the Test workspace]

With this rule in place, when the dataset in the Test workspace is refreshed, the logic in the query above now ensures that all the data from the FactInternetSales table is loaded into the dataset. Instead of just 5 rows, the report now shows that the full 60,000 rows of data have been loaded:

[Screenshot: the report card showing the full row count after refresh]

19 thoughts on “Limit The Amount Of Data You Work With In Power BI Desktop Using Parameters And Deployment Pipelines”

  1. I also found it useful to apply a shuffling technique, so as to get a smaller but still meaningful dataset. The first 5 or 10,000 rows might not be representative.

  2. Really helpful, but when I click “Close & Apply” in the Power Query Editor it still refreshes all the records; of course the count we then see is based on the parameter defined, so I’m not able to save that refresh/publishing time in Power BI Desktop. Also, I haven’t been successful in driving this parameter’s value from a SQL table (using parameter values from a query), as it always takes the Current Value, and a manual change to the parameter value is needed in the Service as well. Is it possible to handle this from a backend table, so that when I refresh the dataset it filters the rows (using the parameter) based on the value configured in a SQL table?
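
     A minimal sketch of one way to drive the limit from a table (assuming a hypothetical dbo.PbiConfig table with a single row and a FilterRowsValue column) is to define FilterRows as an ordinary query rather than a parameter; note that deployment pipeline rules can only be attached to real parameters, so this swaps one mechanism for the other:

     let
       Source = Sql.Databases("MyServerName"),
       Db = Source{[Name = "AdventureWorksDW2017"]}[Data],
       // hypothetical single-row configuration table
       Config = Db{[Schema = "dbo", Item = "PbiConfig"]}[Data],
       // hypothetical column holding the row limit
       FilterRows = Config{0}[FilterRowsValue]
     in
       FilterRows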

  3. Hi Chris

    I was just watching your London PUG session on privacy levels and found it typically excellent.

    I have a question that came to me after the session.

    The performance impacts you mentioned apply to data refresh times rather than to using the report, at least in Import mode.

    But if using DirectQuery, do (or could) these privacy-level performance impacts affect report response times for users?

    Cheers and best wishes
    Oliver

  4. Hello Chris,

    This is a good trick. However, in reality it is not as simple as explained, because with Power BI we are always dealing with relational data, and picking TOPN rows from a fact table might not always be sufficient. We might have to create separate TOPN parameters for the different fact tables in the dataset, and doing this can be a challenge: in order to generate the report and test it thoroughly, we would need different combinations of the data, which we might not always get by putting a TOPN filter across the different tables.

    The problem I have described will mostly occur when we have master-child tables, e.g. Order and Order Details tables. If we are just picking the TOP 5 Orders, picking TOPN Order Details may bring in some orphan records as well. I hope you get my point here.

    This blog certainly gives a direction to think in, but there is more to getting it working in practice.

    1. I think for a master table like Order it’s best not to apply the filter. This method might require a bit of consideration about where to use it.
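
       A sketch of one way to avoid orphan child rows, assuming hypothetical FilteredOrders and OrderDetails queries that share an OrderID key: keep only the child rows whose key survives the master filter. Note that List.Contains over a buffered key list will generally not fold back to the source, so the saving is smaller than with a folded TOP N:

       let
         // the keys kept by the TOP N filter on the master Orders query
         KeptOrderIDs = List.Buffer(FilteredOrders[OrderID]),
         // keep only child rows whose key appears in the filtered master table
         FilteredOrderDetails =
           Table.SelectRows(OrderDetails, each List.Contains(KeptOrderIDs, [OrderID]))
       in
         FilteredOrderDetails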

  5. Not sure if anyone else has experienced this. I applied the TopN technique today and, after refresh, some of the data tables had all of their columns defaulted to the Text data type. I couldn’t find any answer on this online so far.

  6. Hi Chris,
    what if you’ve made changes to the dataset (which won’t break query folding), such as renaming columns after importing the query? How could I implement this filter logic in that case?

  7. Very useful instructions. Thanks a lot, it really helped me with importing from an Oracle database.

  8. Thank you for this article. How can I apply this to a date range rather than a TopN, e.g. if I want to load YTD entries plus the prior two years? The data is pulled from multiple Dynamics 365 CRM tables. I am looking to apply the filter to the ‘Year’ column in the ‘Donations’ table so that only data back to 2020 is included rather than all-time data.
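
     A minimal sketch of one way to do this, assuming a Donations query with a numeric Year column and a FilterYears parameter playing the same role as FilterRows above (0 or less meaning “load everything”); whether the filter folds depends on the Dynamics 365 connector:

     let
       CurrentYear = Date.Year(DateTime.LocalNow()),
       FilterLogic =
         if FilterYears <= 0 then
           Donations
         else
           // keep the current year plus the prior FilterYears years
           Table.SelectRows(Donations, each [Year] >= CurrentYear - FilterYears)
     in
       FilterLogic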

  9. Hi Chris.
    What about limiting the amount of data downloaded from Power BI dataflows? Let’s say we use dataflows as a data source. Dataflows can contain extremely large amounts of data, so what is the way to limit the number of records retrieved from a dataflow?
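
     The same parameter trick can be applied to the query built on a dataflow entity. A minimal sketch, with the navigation steps shown roughly as the dataflows connector generates them (the record fields and names here are illustrative placeholders); note that a filter like this may not fold against a dataflow, so the saving can be smaller than with SQL Server:

     let
       Source = PowerBI.Dataflows(null),
       // navigation key fields may differ; the IDs here are placeholders
       Workspace = Source{[workspaceId = "WorkspaceGuid"]}[Data],
       Dataflow = Workspace{[dataflowId = "DataflowGuid"]}[Data],
       Entity = Dataflow{[entity = "MyEntity"]}[Data],
       // reuse the FilterRows parameter from the post: 0 or less loads everything
       FilterLogic = if FilterRows <= 0 then Entity else Table.FirstN(Entity, FilterRows)
     in
       FilterLogic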
