Limit The Amount Of Data You Work With In Power BI Desktop Using Parameters And Deployment Pipelines

If you’re working with large amounts of data in Power BI you may find that you have problems because:

  • Your pbix file is very large
  • You spend a long time waiting for refreshes to finish in Power BI Desktop – and if you’re developing, you may need to refresh your dataset frequently
  • It takes a long time to publish your dataset to the Power BI Service

Wouldn’t it be great if there was a way to work with a small subset of your data in Power BI Desktop and then, after you publish, load all the data when you refresh? The good news is that this is now possible with the new deployment pipelines feature in Power BI!

Assuming that you know the basics of how deployment pipelines work (the documentation is very detailed), here’s a simple example of how to do this. Let’s say that you want to use data from the FactInternetSales table in the Adventure Works DW 2017 SQL Server sample database in your dataset. When you import the data from this table and open the Advanced Editor to look at the M code for the query, here’s what you’ll see:

let
  Source = Sql.Databases("MyServerName"),
  AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data],
  dbo_FactInternetSales =
    AdventureWorksDW2017{[Schema = "dbo", Item = "FactInternetSales"]}[Data]
in
  dbo_FactInternetSales

This query, of course, imports all the data from this table. To cut it down to a smaller size, the first thing to do is to create a new Power Query parameter (called FilterRows here) of data type Decimal Number:

[Screenshot: the FilterRows parameter, of type Decimal Number, with Current Value set to 5]
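
Behind the scenes a parameter like this is just another M query. As a point of reference, here's a minimal sketch of the kind of definition Power BI Desktop generates for FilterRows (treat the exact metadata fields as illustrative):

// the body of the FilterRows parameter query, as created by Manage Parameters
5 meta [
  IsParameterQuery = true,
  Type = "Number",
  IsParameterQueryRequired = true
]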

Notice that the Current Value property is set to 5. The purpose of this parameter is to control the number of rows from FactInternetSales that are loaded into the dataset. Here’s an updated version of the Power Query query above that uses this parameter:

let
  Source = Sql.Databases("MyServerName"),
  AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data],
  dbo_FactInternetSales =
    AdventureWorksDW2017{[Schema = "dbo", Item = "FactInternetSales"]}[Data],
  FilterLogic =
    if FilterRows <= 0 then
      dbo_FactInternetSales
    else
      Table.FirstN(dbo_FactInternetSales, FilterRows)
in
  FilterLogic

A new step called FilterLogic has been added at the end of this query that implements the following logic:

  • If the FilterRows parameter is less than or equal to 0, return all the rows in the FactInternetSales table
  • Otherwise, return the top FilterRows rows from the table

Given that the FilterRows parameter is set to 5, this means that the query now returns only the top 5 rows from FactInternetSales:

[Screenshot: the query preview showing only the top 5 rows of FactInternetSales]

It’s important to point out that a filter like this will only make your refreshes faster if the Power Query engine can apply the filter without reading all the data in the table itself. In this case it can: with a SQL Server data source, query folding ensures that the SQL query generated for the refresh only returns the top 5 rows from the FactInternetSales table:

[Screenshot: the native SQL query generated for the refresh, selecting only the top 5 rows from FactInternetSales]
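
For contrast, here's a minimal counter-example (a sketch, not from the original post): if a step that prevents folding, such as Table.Buffer, is inserted before the filter, the engine has to read the whole table and then discard rows locally, so the refresh gets no faster:

let
  Source = Sql.Databases("MyServerName"),
  AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data],
  dbo_FactInternetSales =
    AdventureWorksDW2017{[Schema = "dbo", Item = "FactInternetSales"]}[Data],
  // Table.Buffer loads the whole table into memory, so no step after it can fold
  Buffered = Table.Buffer(dbo_FactInternetSales),
  FilterLogic = Table.FirstN(Buffered, FilterRows)
in
  FilterLogic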

Here’s a simple report with a card that shows the number of rows loaded into the table:

[Screenshot: a report with a card showing 5 rows loaded into the table]

At this point you have your cut-down dataset for development in Power BI Desktop.

Next, publish this dataset and report to a workspace that is assigned to the Development slot in a deployment pipeline and then deploy them to the Test workspace:

[Screenshot: the deployment pipeline, with the content deployed from the Development to the Test workspace and the dataset rules button highlighted]

Then click the button highlighted in the screenshot above to create a new dataset rule that changes the value of the FilterRows parameter to 0 when the dataset is deployed to the Test workspace:

[Screenshot: the dataset rule that sets the FilterRows parameter to 0 in the Test workspace]

With this rule in place, when the dataset in the Test workspace is refreshed, the logic in the query above now ensures that all the data from the FactInternetSales table is loaded into the dataset. Instead of just 5 rows, the report now shows that the full 60,000 rows of data have been loaded:

[Screenshot: the report card showing the full row count after refresh]

19 thoughts on “Limit The Amount Of Data You Work With In Power BI Desktop Using Parameters And Deployment Pipelines”

  1. I also found it useful to apply a shuffling technique, so as to get a smaller but still meaningful dataset. The first 5 or 10,000 rows might not be representative.

  2. Really helpful, but when I click “Close & Apply” in the Power Query Editor it still refreshes all the records; of course the count we then see is based on the parameter defined, so I’m not able to save that refresh/publishing time in Power BI Desktop. Also, I haven’t been successful in driving this parameter’s value from a SQL table (using parameter values from a query), as it always takes the Current Value, and a manual change to the parameter value is needed in the Service as well. Is it possible to handle this from a backend table, so that when I refresh the dataset it filters the rows (using the parameter) based on the value configured in a SQL table?
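
     A minimal sketch of one way to drive the limit from a table (assuming a hypothetical dbo.PbiConfig table with a single row and a FilterRowsValue column) is to define FilterRows as an ordinary query rather than a parameter; note that deployment pipeline rules can only be attached to real parameters, so this swaps one mechanism for the other:

     let
       Source = Sql.Databases("MyServerName"),
       Db = Source{[Name = "AdventureWorksDW2017"]}[Data],
       // hypothetical single-row configuration table
       Config = Db{[Schema = "dbo", Item = "PbiConfig"]}[Data],
       // hypothetical column holding the row limit
       FilterRows = Config{0}[FilterRowsValue]
     in
       FilterRows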

  3. Hi Chris

    I was just watching your London PUG session on privacy levels and found it typically excellent.

    I have a question that came to me after the session.

    The performance impacts you mentioned apply to data refresh times rather than to using the report, at least in Import mode.

    But if using DirectQuery, do (or could) these privacy-level performance impacts affect report response times for users?

    Cheers and best wishes
    Oliver

  4. Hello Chris,

    This is a good trick. However, in reality it is not as simple as explained, because with Power BI we are always dealing with relational data, and picking TOPN rows from a fact table might not always be sufficient. We might have to create separate TOPN parameters for the different fact tables in the dataset, and doing this can be a challenge: in order to generate the report and test it thoroughly, we would need different combinations of the data, which we might not always get by putting a TOPN filter across the different tables.

    The problem I have described will mostly occur when we have master-child tables, e.g. Order and Order Details tables. If we are just picking the TOP 5 Orders, picking TOPN Order Details may bring in some orphan records as well. I hope you get my point here.

    This blog certainly gives a direction to think in, but there is more to getting it working in practice.

    1. I think for a master table like Order it’s best not to apply the filter. This method might require a bit of consideration about where to use it.
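
       A sketch of one way to avoid orphan child rows, assuming hypothetical FilteredOrders and OrderDetails queries that share an OrderID key: keep only the child rows whose key survives the master filter. Note that List.Contains over a buffered key list will generally not fold back to the source, so the saving is smaller than with a folded TOP N:

       let
         // the keys kept by the TOP N filter on the master Orders query
         KeptOrderIDs = List.Buffer(FilteredOrders[OrderID]),
         // keep only child rows whose key appears in the filtered master table
         FilteredOrderDetails =
           Table.SelectRows(OrderDetails, each List.Contains(KeptOrderIDs, [OrderID]))
       in
         FilteredOrderDetails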

  5. Not sure if anyone else has experienced this. I applied the TopN technique today and, after refresh, some of the data tables had all of their columns defaulted to the Text data type. I couldn’t find any answer on this online so far.

  6. Hi Chris,
    what if you’ve made changes to the dataset (which won’t break query folding), such as renaming columns after importing the query? How could I implement this filter logic in that case?

  7. Very useful instructions. Thanks a lot, it really helped me with importing from an Oracle database.

  8. Thank you for this article. How can I apply this to a date range rather than a TopN, e.g. if I want to load YTD entries plus the prior two years? The data is pulled from multiple Dynamics 365 CRM tables. I am looking to apply the filter to the ‘Year’ column in the ‘Donations’ table so that only data back to 2020 is included rather than all-time data.
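
     A minimal sketch of one way to do this, assuming a Donations query with a numeric Year column and a FilterYears parameter playing the same role as FilterRows above (0 or less meaning “load everything”); whether the filter folds depends on the Dynamics 365 connector:

     let
       CurrentYear = Date.Year(DateTime.LocalNow()),
       FilterLogic =
         if FilterYears <= 0 then
           Donations
         else
           // keep the current year plus the prior FilterYears years
           Table.SelectRows(Donations, each [Year] >= CurrentYear - FilterYears)
     in
       FilterLogic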

  9. Hi Chris.
    What about limiting the amount of data downloaded from Power BI dataflows? Let’s say we use dataflows as a data source. Dataflows can contain extremely large amounts of data, so what is the way to limit the number of records retrieved from a dataflow?
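
     The same parameter trick can be applied to the query built on a dataflow entity. A minimal sketch, with the navigation steps shown roughly as the dataflows connector generates them (the record fields and names here are illustrative placeholders); note that a filter like this may not fold against a dataflow, so the saving can be smaller than with SQL Server:

     let
       Source = PowerBI.Dataflows(null),
       // navigation key fields may differ; the IDs here are placeholders
       Workspace = Source{[workspaceId = "WorkspaceGuid"]}[Data],
       Dataflow = Workspace{[dataflowId = "DataflowGuid"]}[Data],
       Entity = Dataflow{[entity = "MyEntity"]}[Data],
       // reuse the FilterRows parameter from the post: 0 or less loads everything
       FilterLogic = if FilterRows <= 0 then Entity else Table.FirstN(Entity, FilterRows)
     in
       FilterLogic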
