Refreshing Hidden Tables In Power BI Semantic Models With Fabric Data Pipelines

Following on from my recent post about refreshing semantic models with Fabric Data Pipelines and the semantic model refresh activity, a few people asked me how to refresh hidden tables because they are not displayed in the Pipeline configuration UI. I got the answer from my colleague Alex Powers (aka reddit celebrity u/itsnotaboutthecell) who kindly allowed me to blog about it.

To demonstrate how to do this, I created a semantic model with two tables: one visible, called VisibleTable, and one hidden, called HiddenTable.

I then published the semantic model, created a Data Pipeline and added a semantic model refresh activity; selected the connection, workspace and semantic model; waited for the Table(s) dropdown to populate (yes I know it’s slow, we’re working on it):

…and then, when it loaded, noted that only the visible table was shown in the dropdown:

I didn’t select anything here; instead I clicked “Add dynamic content” so I could use an expression to select the table:

Then in the Pipeline expression builder I entered the following:

@json('
    [
        {
            "table":"HiddenTable"
        }
    ]
')

Having done this I ran the Pipeline and just the hidden table was refreshed. Easy!

The expression needs to be a JSON array of table and partition names. Here’s an example showing how to refresh the table called HiddenTable and the sole partition of the table called VisibleTable (which also happens to be called VisibleTable) in the same refresh:

@json('
    [
        {
            "table":"HiddenTable"
        },
        {
            "table": "VisibleTable",
            "partition": "VisibleTable"
        }
    ]
')

It’s useful to know how to construct the expression even if you don’t need to refresh hidden tables – for example, you might want to dynamically generate the list of tables or partitions to refresh with an expression.
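
As a minimal sketch of what that might look like: assuming you have defined a string pipeline variable called TableToRefresh (a hypothetical name) that holds the name of the table you want to refresh, you could build the same JSON with an expression like this instead of hard-coding it:

@json(
    concat(
        '[{"table":"',
        variables('TableToRefresh'),
        '"}]'
    )
)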

Monthly Power BI Semantic Model Refreshes With Fabric Data Pipelines

I’m sure you already know how to configure scheduled refresh for your semantic models in Power BI. While the options you have for controlling when refresh takes place are generally good enough – you can configure daily or weekly refreshes and set up to eight times a day for refreshes to take place – there are some scenarios it doesn’t work for, such as monthly refreshes. Up to now the workaround has been to use Power Automate to trigger refreshes (see here for an example) or to call the refresh API from another application. Now, with Fabric, you have a much better option for scheduling refreshes: Data Pipelines.
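
As an aside, here’s roughly what that last workaround – calling the refresh API from another application – looks like. This is a minimal Python sketch using the Power BI REST API’s Refresh Dataset In Group endpoint; the workspace ID, dataset ID and access token below are placeholders you’d need to supply yourself (for example via a service principal):

import requests

# Placeholders - supply your own workspace (group) ID, dataset ID and Azure AD access token
workspace_id = "insert-workspace-id-here"
dataset_id = "insert-dataset-id-here"
access_token = "insert-access-token-here"

# Power BI REST API: Datasets - Refresh Dataset In Group
url = f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}/datasets/{dataset_id}/refreshes"

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {access_token}"},
    json={"notifyOption": "NoNotification"}
)
# A 202 Accepted response means the refresh has been queued
response.raise_for_status()

With Fabric Data Pipelines, though, none of this extra plumbing is needed.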

The semantic model refresh activity in Fabric Data Pipelines was released last year, and at the time all the focus was on the extra control it gives you over what happens in a refresh: for example, it allows you to refresh individual tables or partitions and to control the amount of parallelism, and it also allows you to schedule your refresh to run after other ETL operations (which in Fabric will probably also be orchestrated with Pipelines) have completed. What I want to draw your attention to here is that Fabric Data Pipelines use the new Fabric scheduler, which offers more flexibility for controlling when they run.

There’s some documentation here on how to schedule a Data Pipeline run but it’s very straightforward to use. I created a Fabric Pipeline consisting of a single semantic model refresh activity like so:

…then hit the Schedule button on the toolbar, selected “Monthly” on the Repeat dropdown and configured it to run on the first Sunday of the month:

Apart from the option to run on the Nth instance of a given day of the week, you can also run the Data Pipeline on a given day number of the month; you can also run it every N months, specify up to 10 times per day for it to run, and set a start and end date.

There are two other options for scheduling that aren’t available in the scheduler for semantic models: the ability to run the Data Pipeline every N hours or minutes.

Be warned though: refreshing your semantic model every few minutes is dangerous because it can result in excessive CU consumption on your capacity and maybe even throttling if you’re not careful.

The same options to run a Data Pipeline daily or weekly that exist in the scheduler for semantic models are also available, with one notable limitation: the semantic model scheduler allows you to specify up to 48 times to refresh every day for models stored on a Premium/Fabric capacity, whereas the Fabric scheduler used by Pipelines only allows you to specify 10 times per day.

Of course you need a capacity to be able to use Fabric Data Pipelines, but orchestration activities only cost 0.0056 CU hours per activity run, so using Pipelines to refresh a semantic model in this way will only use a tiny fraction of even the smallest capacity’s resources.
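
To put that number in context, here’s a rough back-of-the-envelope calculation, assuming the smallest Fabric capacity (an F2, which provides 2 CUs) and counting only the orchestration cost: over a 30-day month an F2 provides 2 × 24 × 30 = 1,440 CU hours, whereas a Data Pipeline containing a single refresh activity that runs once a day consumes roughly 30 × 0.0056 = 0.168 CU hours of orchestration – around 0.01% of the capacity. The refresh operation itself consumes CUs on whichever capacity hosts the model, of course, but it would do that however you triggered it.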

Even if you don’t think you’re interested in anything that Fabric offers beyond Power BI, it’s features like this that, in my opinion, still make it worthwhile to flip the switch to enable Fabric and make your life as a Power BI developer easier.

Reading Delta Metadata In Power Query

There’s a new M function rolling out now that allows you to read metadata from Delta tables (at the time of writing it’s available in Dataflows Gen2 and will be available soon in Desktop). It builds on the DeltaLake.Table M function that allows you to read data from Delta tables and is similar to the Parquet.Metadata function that was released last year. Here’s an example of how to use it to get metadata from a Delta table in OneLake:

let
  Source = AzureStorage.DataLake(
    "https://onelake.dfs.fabric.microsoft.com/insertworkspaceidhere/insertlakehouseidhere/Tables/inserttablenamehere", 
    [HierarchicalNavigation = true]
  ), 
  ToDelta = DeltaLake.Metadata(
    DeltaLake.Table(Source)
  )
in
  ToDelta

The function returns a table of records containing the metadata from the Delta table such as the schema, how the table is partitioned, and whether the table is V-Ordered or not:

Changing The Source Lakehouse Of Power BI Direct Lake Models In Deployment Pipelines

If you’re using deployment pipelines with Direct Lake semantic models in Power BI you’ll have found that, when you deploy your model from one stage to another, by default the model still points to the Lakehouse it was originally bound to. So, for example, if you deploy your model from your Development stage to your Test stage, the model in the Test stage still points to the Lakehouse in the Development stage. The good news is that you can use the deployment rules feature of deployment pipelines to make sure the model in the Test stage points to a Lakehouse in the Test stage, and in this post I’ll show you how.

To illustrate how to do this I created a workspace to represent a dev environment with a Lakehouse containing one table, called MyTable, containing the following data:

I then created a second workspace to represent a test environment with a Lakehouse containing a table with the same name and the same schema but with different data:

Then, back in the dev workspace, I created a custom semantic model pointing to the MyTable table and built a report on it to show the data:

Here’s what the dev workspace looked like:

I then created a deployment pipeline, assigned the two workspaces to the Development and Test stages of it, and configured it so that the semantic model and report would be deployed from Development to Test:

I then did a deployment and, as mentioned, although the report and the semantic model were moved to the Test stage workspace, the model still pointed to the Lakehouse in the Development stage workspace so the report showed the same data as before:

To fix this I created a deployment rule using the “Data source rules” option. This is where things get tricky. When you create the rule you can see that the custom model on the From side has a Database and a Server property that identify which Lakehouse it is pointing to:

This means that on the To side you need to choose the “Other” option in the dropdown box and enter values for the Database and Server:

What do these properties mean?

  • The Database property is the ID of the SQL Endpoint of the Lakehouse the model is pointing to and you can find it by opening the SQL Endpoint UI of the Lakehouse in the browser and copying it from the part of the URL that immediately follows “lakehouses”:
  • The Server property is the SQL connection string of the SQL Endpoint, which can be found by clicking the Gear icon on the menu in the SQL Endpoint UI to open the Settings pane:
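
To make that a little more concrete, the two values look something like this (these are made-up, illustrative values – yours will be different, although the Server value typically ends in datawarehouse.fabric.microsoft.com):

Database: 11aa22bb-33cc-44dd-9aee-66ff77889900
Server: abc123def456ghi789.datawarehouse.fabric.microsoft.com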

I changed the settings in the data source rule to the values from the Lakehouse in the Test workspace and then redeployed. After doing that, the report in my Test workspace showed the data from the Lakehouse in the Test workspace (note: I had to manually refresh the report first to flush out stale data from the cache):

Send Data From Power Automate To Fabric Using Eventstreams

Fabric’s Real-Time Intelligence features are, for me, the most interesting things to learn about in the platform. I’m not going to pretend to be an expert in them – far from it – but they are quite easy to use and they open up some interesting possibilities for low-code/no-code people like me. The other day I was wondering if it was possible to send events and data from Power Automate to Fabric using Eventstreams and it turns out it is quite easy to do.

Here’s a really simple illustration. I created a new Eventstream in a Fabric workspace, then clicked the “Use custom endpoint” option and published the Eventstream so it was ready to send data to:

Then, in Power Automate, I built a super-simple flow with a “Manually trigger a flow” trigger and then the “Send event” action from the Event Hubs connector:

This works because you can send data from Power Automate to the Eventstream custom endpoint using the Event Hub protocol. To set up the connection to the Eventstream in Power Automate I entered a name for the connection, chose the “Access Key” authentication type, copied the “Connection string-primary key” value from the Eventstream custom endpoint shown in the first screenshot above (you need to click the eye icon next to it to copy it to the clipboard), and pasted it into the “Connection String” property in the Power Automate connector:

Then I configured the “Send event” action like so:

I had to manually paste the Event Hub Name, shown in the first screenshot above, as a custom value because I got an error when I tried to use Power Automate’s dropdown box to find the name. I put the Timestamp from the “Manually trigger a flow” trigger into the Content property.

After saving the flow I was then able to trigger it from my browser or the Power Automate mobile app, and every time the flow ran it sent the time of the run to my Eventstream:

Very basic, but it proves the point. What could you use this for? Power Automate has a huge number of connectors for Microsoft and third-party services: for example, you can trigger a flow when a file stored in OneDrive for Business is modified or when new content is published to an RSS feed. Sending these events and their related data on to a Fabric Eventstream not only means you can capture them for analysis in OneLake or an EventHouse, it also means you can send them on to Fabric Activator, which can apply sophisticated rules so that when something happens and specific conditions are met you can be alerted in Teams or via email, run Fabric items such as notebooks, or even trigger Power Automate flows. I’ll explore some of these scenarios in more detail in future blog posts.

Power BI Copilot: Where Are The CUs Consumed?

If you’ve looked into using Copilot with Power BI, you’ll know that you need a capacity to use it: a P1/F64 or greater. Something I learned recently, though, is that the current Power BI Copilot capabilities are only accessible via a side pane in a report, and it’s the report that needs to be stored in a workspace on a P1/F64+ capacity – that’s where the CUs for Copilot are consumed. If the report has a live connection to a semantic model stored in a workspace not on a capacity (which we refer to at Microsoft as “Shared”, but which is widely known as “Pro”) then Copilot still works!

For example, I published a semantic model to a workspace stored on Shared/Pro:

I then built a report with a Live Connection to that semantic model and saved it to a workspace on a P1 capacity:

I then opened this report, used Power BI Copilot, and saw that a) it worked and b) the CUs were attributed to the workspace where the report was stored when I looked in the Capacity Metrics App:

This is particularly interesting if you have a large number of semantic models that are stored in Shared/Pro workspaces today but still want to use Power BI Copilot. It means you don’t have to move your semantic models into a workspace on a capacity, which in turn means that refreshing or querying those models won’t consume CUs on a capacity, and that you’ll be able to support more users for Power BI Copilot on your capacities.

In the future you’ll have even more flexibility. In the sessions “Microsoft Fabric: What’s New And What’s Next” and “Boost productivity with Microsoft Fabric, Copilot, and AI” at Ignite last week we announced a number of new Power BI Copilot capabilities coming soon, such as integration with AI Skills and a new “immersive” Copilot experience. We also announced Fabric AI Capacities, coming early 2025, which will allow you to direct all Copilot activity to certain capacities regardless of whether your reports or models are stored on Shared/Pro or Premium, or which capacity they are stored on (see more details from the 16:00 mark in the recording of the “Boost productivity with Microsoft Fabric, Copilot and AI” session).

Finding The Size Of All Of The Columns Touched By A DAX Query

I had meant to follow up my recent post on how to find the columns touched by a DAX query by writing one on how to use this technique to find the size of these columns in memory, so you can find the total size of the columns that need to be paged into memory when a DAX query runs on a Direct Lake semantic model. Before I could do that, though, my colleague Michael Kovalsky messaged me to say that not only had he taken the query from that first post and incorporated it in Semantic Link Labs, he’d done the work to get column sizes too. All that’s left for me to do, then, is give you some simple examples of how to use it.

To use Semantic Link Labs you just need to create a new Fabric notebook and install the library:

%pip install semantic-link-labs

After that you can use sempy_labs.get_dax_query_dependencies to get the columns touched by any DAX query, for example:

import sempy_labs as labs
labs.get_dax_query_dependencies(
        dataset = 'InsertSemanticModelName',
        workspace = 'InsertWorkspaceName',
        dax_string = "InsertDAXQuery",
    )

This returns a dataframe with one row for each column touched by the query, plus various statistics about the size of each column in memory.

If you’re working with a Direct Lake semantic model, though, in order to get the correct sizes of each column in memory the query itself will need to have been run beforehand; you can ensure that this happens by setting the optional parameter put_in_memory to True:

import sempy_labs as labs
labs.get_dax_query_dependencies(
        dataset = 'InsertSemanticModelName',
        workspace = 'InsertWorkspaceName',
        dax_string = "InsertDAXQuery",
        put_in_memory = True
    )

Last of all, if you don’t want a dataframe but just want a single number representing the total memory needed by all columns touched by a query, you can use sempy_labs.get_dax_query_memory_size, for example like this:

import sempy_labs as labs
labs.get_dax_query_memory_size(
        dataset = 'InsertSemanticModelName',
        workspace = 'InsertWorkspaceName',
        dax_string = "InsertDAXQuery"
    )

Yet more evidence that, for any Power BI user, Semantic Link and Semantic Link Labs are the best reasons for you to flip the switch to enable Fabric. To find out more about what they are capable of, check out this user group presentation.

New Solution For Monitoring Power BI/Fabric Gateways

If you’re the admin for a large Power BI/Fabric tenant you’ll know how important it is to monitor your on-premises data gateways: if they aren’t working then no-one will be able to refresh any semantic models or dataflows connected to on-prem sources. A while ago Rui Romano published a gateway monitoring solution that proved to be very popular, but recently my colleague at Microsoft, Edgar Cotte, built a new open-source solution that builds on Rui’s work and is even more powerful, using Fabric Real-Time Intelligence to let you analyse gateway log information in real time.

You can read his announcement on LinkedIn here and download the solution from https://aka.ms/FabricGatewayMonitoring.

This is an amazing piece of work that is already being used by several very large customers. Check it out!

Role-Playing Dimensions In Fabric Direct Lake Semantic Models

Did you know you can add the same physical table in OneLake multiple times to the same Direct Lake semantic model?

Let’s say you have two tables in a Fabric Lakehouse. One is a fact table called Sales:

…and the other is a dimension table called Date:

Note that the Sales fact table has two date columns, OrderDate and ShipDate.

If you create a Direct Lake semantic model using the Web Editor and add these two tables, you could rename the Date table to Order Date and build a relationship between it and the OrderDate column on the Sales table:

What about analysing by Ship Date though? You could create a physical copy of the Date table in your Lakehouse and add that to the model, but there’s another option.

If you connect to the model using a tool like Tabular Editor, you can duplicate the Order Date table and rename the new table Ship Date:

You then have two tables in your semantic model connected to the same physical table in your Lakehouse, and you can create a relationship between this new table and the Sales table:

…and then use the new dimension in your reports:

This is handy for handling role-playing dimensions, which are most often Date dimensions but may be other types of dimension too. Indeed, I’ve sometimes found the need to add the same fact table to a model more than once. The benefits of only having one physical copy are reduced refresh time, lower storage costs (although for most role-playing dimensions the savings will be negligible) and simpler ETL.

UPDATE April 2026: if you’re trying this technique and getting errors, see the workaround in this blog post https://blog.crossjoin.co.uk/2026/04/05/role-playing-dimensions-in-fabric-direct-lake-semantic-models-revisited/

Which Columns Are Loaded Into Memory When In Power BI Direct Lake Mode?

As you probably know, in Power BI Direct Lake mode column data is only loaded into memory when it is needed by a query. I gave a few examples of this – and how to monitor it using DMVs – in this blog post from last year. But which columns are loaded into memory in which circumstances? I was thinking about this recently and realised I didn’t know for sure, so I decided to do some tests. Some of the results were obvious, some were a surprise.

Test semantic model and methodology

For my tests I loaded the following tables of data into a Fabric Lakehouse:

…and created a Direct Lake custom semantic model that looked like this:

Before every test I refreshed the model to make sure there was no data in memory (be aware that in the future it may be that refreshing a model does not purge all column data from memory). Then, for each test, I ran a single DAX query from DAX Studio and afterwards ran the following query to see whether the dictionaries for each column were resident in memory (again, see my previous post for background):

EVALUATE
SELECTCOLUMNS(
    INFO.STORAGETABLECOLUMNS(),
    "TABLE_ID", [TABLE_ID],
    "COLUMN_ID", [COLUMN_ID],
    "DICTIONARY_ISRESIDENT", [DICTIONARY_ISRESIDENT]
)

Note that I’m using one of the new DAX Info functions, INFO.STORAGETABLECOLUMNS(), instead of the older DMV syntax I was using last year; I think the DAX Info functions are a lot more convenient to use.
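
Incidentally, if you’d rather run this check from a Fabric notebook than from DAX Studio, a minimal sketch using Semantic Link’s evaluate_dax function would look something like this (the semantic model and workspace names below are placeholders):

import sempy.fabric as fabric

# Run the same INFO.STORAGETABLECOLUMNS query against the model and return the results as a dataframe
df = fabric.evaluate_dax(
    dataset="InsertSemanticModelName",
    workspace="InsertWorkspaceName",
    dax_string="""
    EVALUATE
    SELECTCOLUMNS(
        INFO.STORAGETABLECOLUMNS(),
        "TABLE_ID", [TABLE_ID],
        "COLUMN_ID", [COLUMN_ID],
        "DICTIONARY_ISRESIDENT", [DICTIONARY_ISRESIDENT]
    )
    """
)
df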

[I also looked at whether column segments were paged into memory using the INFO.STORAGETABLECOLUMNSEGMENTS() function, but found that its results were consistent with INFO.STORAGETABLECOLUMNS(), so I didn’t include them in this post; I assumed that if a column’s dictionary was resident in memory, so were the associated column segments. It’s also worth mentioning that the latest versions of DAX Studio have some great new functionality in the Model Metrics for telling you which columns are resident in memory in a Direct Lake model.]

Here’s what the query above returned immediately after a refresh, before any other queries had been run:

The DICTIONARY_ISRESIDENT column tells you whether a column’s dictionary is resident in memory. Each table in a Direct Lake model (as in an Import mode model) has a hidden column called RowNumber, and for a Direct Lake model this column is always resident in memory. As you can see, all other columns are not resident in memory at this point.

Counting rows in a table

For my first test I created a measure that counted the rows in the Sales table:

Order Count = COUNTROWS('Sales')

I then ran a query that returned just the result of this measure:

EVALUATE
CALCULATETABLE(
    ROW(
    "Order Count", [Order Count]
    )
)

I then ran my query to see what had been loaded into memory, and…

…nothing had changed! No new column data had been loaded into memory at all. I assume this is because Power BI can resolve this query using the RowNumber column from the Sales table.

I then added the Order_Status column from the Sales table to the query:

EVALUATE
SUMMARIZECOLUMNS(
    Sales[Order_Status],
    "Order Count", [Order Count]
)
ORDER BY 
    Sales[Order_Status] ASC

After this query, the Order_Status column was resident in memory as you might expect, but no other column was:

Relationships

Instead of using the Order_Status column, I then looked at the impact of using a column from a different table. I created a query that showed Order Count by Customer Name:

EVALUATE
SUMMARIZECOLUMNS(
    Customer[Customer_Name],
    "Order Count", [Order Count]
)
ORDER BY 
    Customer[Customer_Name] ASC

After this, the Customer_Name column was in memory along with the two CustomerID columns used in the relationship between the Customer and Sales table:

Any time your query references columns in different tables, the columns used in the relationships between those tables must also be resident in memory.

Measures

I then defined another measure:

Sales Amount = SUM(Sales[Sales_Amount])

And ran the following query:

EVALUATE
SUMMARIZECOLUMNS(
    Sales[Order_Status],
    "Sales Amount", [Sales Amount]
)
ORDER BY 
    Sales[Order_Status] ASC

After this, the Order_Status and Sales_Amount columns were resident in memory:

No surprises here: as you would expect, if a column is referenced by a measure then it needs to be resident in memory.

Measures that use relationships also work as expected. I created the following measure:

Order Count Citrus = 
CALCULATE([Order Count], 'Product'[Category]="Citrus")

…and the following query:

EVALUATE
SUMMARIZECOLUMNS(
    Customer[Customer_Name],
    "Order Count Citrus", [Order Count Citrus]
)
ORDER BY 
    Customer[Customer_Name] ASC

After this query, all the columns used in the measure and all the relationships needed by the query and the measure were resident in memory:

I was curious, though, to know what effect efficient versus inefficient DAX in measures might have on which columns were loaded into memory. I created the following two measures to see the impact of the antipattern of filtering on an entire table in the second parameter of CALCULATE():

Order Count Status X V1 = 
CALCULATE([Order Count], 'Sales'[Order_Status]="X")

Order Count Status X V2 = 
CALCULATE([Order Count], FILTER('Sales', 'Sales'[Order_Status]="X"))

My first query used the first of these measures, the efficient version:

EVALUATE
SUMMARIZECOLUMNS(
    Customer[Customer_Name],
    "Order Count Status X V1", [Order Count Status X V1]
)
ORDER BY 
    Customer[Customer_Name] ASC

After this query the Customer_Name column, the two Customer_ID columns and the Order_Status column were resident in memory:

But what about the second, inefficient version of the measure?

EVALUATE
SUMMARIZECOLUMNS(
    Customer[Customer_Name],
    "Order Count Status X V2", [Order Count Status X V2]
)
ORDER BY 
    Customer[Customer_Name] ASC

The same columns were resident in memory:

I don’t think this means that this version of the measure is any less efficient in Direct Lake mode than it is in Import mode; it just shows that both versions of the measure need the same columns to be resident in memory in order to run.

Conclusion

Knowing which columns your query needs to have resident in memory is important for two reasons in Direct Lake mode: it helps you understand query performance, because loading column data into memory takes time (Teo Lachev published some test results in a blog post a few days ago which show this clearly), and it helps you understand model memory usage and where you stand relative to the memory limits for the capacity SKU you’re using. As these results show, it’s better to test to see which columns your Power BI reports need to have resident in memory than to rely on guesswork.

It is also possible to use the DISCOVER_CALC_DEPENDENCIES DMV (or the INFO.CALCDEPENDENCY function) to see which tables, columns, measures and relationships are referenced by a DAX query as I described here; I’ve been playing around with this for Direct Lake too, and will report my findings in a future blog post.