Handling Different Granularities In Power BI Using Slicer Groups

Reading Prathy Kamasani’s excellent blog post on Sync Slicers the other week, I had a revelation: in the post Prathy shows how it is possible to sync selections between two slicers that are based on different columns from different tables in your dataset, something I had not known was possible. As soon as I read this, I realised this technique could be used to solve a number of common problems, including handling data at different granularities.

This problem has been written about extensively over the years by Marco Russo and Alberto Ferrari in the following articles, which are all worth reading for background:

https://www.daxpatterns.com/handling-different-granularities/

https://www.sqlbi.com/articles/physical-and-virtual-relationships-in-dax/

https://www.sqlbi.com/articles/propagate-filters-using-treatas-in-dax/

Let me be clear: the solution I’m showing in this post is not the “best”, fastest or most elegant solution to this problem. It does, however, have the advantage of not requiring any changes to relationships or to the DAX of your measures – and this could be important in scenarios where making these changes would be time-consuming or have other knock-on effects.

Now, consider the following dataset:

[Screenshot: the example dataset, showing the Date, Daily Sales and Monthly Budget tables]

There is a Date dimension table with a relationship to the Daily Sales table, which contains sales information at the date granularity. There is also a Monthly Budget table that contains data at the month granularity, with no relationship to the Date dimension table. The requirement, however, is to be able to select a month from the Date dimension table and have that selection filter both the Daily Sales table and the Monthly Budget table.
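
One thing to check before going any further: as you’ll see below, this technique relies on the two tables having Month columns whose values match exactly. If one of your tables only has dates at the daily grain, you can add a suitable Month column in Power Query. Here’s a minimal sketch of how you might do that – the table and column names here are hypothetical:

let
    // A hypothetical daily-grain table
    Source = #table(
        type table [SalesDate = date, Sales = number],
        {
            {#date(2019, 12, 1), 100},
            {#date(2019, 12, 15), 200}
        }
    ),
    // Format each date as "MonthName Year" (e.g. "December 2019") so that
    // the values match the Month column in the other table exactly
    AddMonth = Table.AddColumn(
        Source,
        "Month",
        each Date.ToText([SalesDate], "MMMM yyyy"),
        type text
    )
in
    AddMonth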

Now look at the following report. With two slicers built from the Month columns of the Date and Monthly Budget tables, selecting a month from the slicer built on the Month column of the Date table filters the Sales value but not the Budget value, as you would expect:

[Animation: selecting a month in the Date table’s slicer filters Sales but not Budget]

The key thing is that the two Month columns contain month names in exactly the same format, “MonthName Year”, which means that a selection on one can be transferred over to the other. Next, for both slicers on the report, go to the Sync Slicers pane, expand the Advanced options section and enter the same group name for both. Make sure the “Sync field changes to other slicers” option is selected too.

[Screenshot: the Sync Slicers pane with the Advanced options section expanded]

Once you’ve done this you’ll find that the selection on the slicer built on the Date table is transferred to the slicer built on the Monthly Budget table, even though there is no relationship between them and no clever DAX is used; this in turn means that both the Budget and the Sales values shown are affected:

[Animation: with the slicers in the same group, the month selection filters both Sales and Budget]

The last step is to hide the slicer for the Month column on the Monthly Budget table, so the user only sees the slicer for the Month column on the Date table. You can do this either in the Selection pane or by deselecting the “eye” option in the Sync Slicers pane:

[Screenshot: hiding the Monthly Budget slicer in the Sync Slicers pane]

The end result is this:

[Animation: the finished report with only the Date table’s slicer visible]

One last thing to mention: if you look at the DAX queries generated behind the scenes, it’s all done using the TREATAS function. Here’s an example of a query that shows this:

DEFINE
  VAR __DS0FilterTable =
    TREATAS({"December 2019"}, 'Monthly Budget'[Month])

  VAR __DS0FilterTable2 =
    TREATAS({"December 2019"}, 'Date'[Month])

EVALUATE
  SUMMARIZECOLUMNS(
    __DS0FilterTable,
    __DS0FilterTable2,
    "SumBudget", IGNORE(CALCULATE(SUM('Monthly Budget'[Budget])))
  )

You can download the sample pbix file for this post here.

Thoughts On The New Power Query Source In SSIS

Yet more evidence that Power Query is taking over the world: there’s a new Power Query data source for SQL Server Integration Services in the latest version of SSDT. Here’s the blog post announcing it (the actual announcement is halfway through):

https://blogs.msdn.microsoft.com/ssis/2019/02/15/deeper-integration-and-new-connector-for-ssis-in-adf/

And here’s the documentation:

https://docs.microsoft.com/en-us/sql/integration-services/data-flow/power-query-source?view=sql-server-2017

It’s a preview and, as such, limited in scope. The main restriction, as the blog post says, is:

At present, to facilitate quick/frequent feedback-gathering and improvement cycles, it can only be used with SQL Server Data Tools (SSDT) and SSIS IR.

There’s also no user interface for generating queries: right now you can only paste in an M query or use an M query stored in an SSIS variable. I don’t think that’s much of a problem though – if you don’t want to write your own M code, you can always generate a query in Excel Power Query or Power BI Desktop and cut and paste from there. You don’t even have to open the Advanced Editor to do this, as I show here.
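
To give you an idea of what that looks like, here’s the kind of small, self-contained M query you might paste into the Power Query Source – a minimal sketch, and the URL is purely illustrative:

let
    // Read a CSV file from a (hypothetical) URL and promote the first
    // row to column headers
    Source = Csv.Document(Web.Contents("https://example.com/sales.csv")),
    PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    PromotedHeaders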

Beyond the normal excitement about cool new stuff, what are the benefits of having Power Query integrated into SSIS? I’m not an SSIS person so I’m probably not the right person to comment on the benefits for SSIS developers, but I would imagine that it will make it possible to connect to a wider range of data sources and also make it easier to work with certain others, such as Excel workbooks and web services. I’d be interested in hearing your thoughts on this, so please leave a comment.

From the point of view of the Excel, Power BI or SSAS developer, though, it’s clear this will make it much easier and quicker to take user-developed Power BI and Excel Power Query solutions and turn them into managed, IT-department-owned solutions – and this will be incredibly useful.

Splitting Text By Character Transition In Power BI And Excel Power Query

The February 2019 release of Power BI Desktop includes some new functionality not mentioned in the summary blog post, functionality that is already in the latest Excel 365 builds (thanks to Ed Hansberry for pointing this out) and will be extremely useful. It takes the form of four new options under the Split Column button in the Query Editor:

[Screenshot: the four new Split Column options in the Query Editor]

The four options allow you to split text by:

  • Lowercase to uppercase
  • Uppercase to lowercase
  • Digit to non-digit
  • Non-digit to digit

They are intended to solve problems like the one Gil Raviv blogged about here, where he shows how to split camel-case text in M.

Here’s an example of the lowercase-to-uppercase split. Given the following input table:

[Screenshot: the input table]

The output is:

[Screenshot: the output table]

Here’s the M code behind it:

let
    // A sample table with a single text column containing camel-case values
    Source = #table(
        {"MyData"},
        {
            {"CamelCase"},
            {"DimDate"}
        }
    ),
    // Split each value at every transition from a lowercase letter to an
    // uppercase letter
    #"Split Column by Character Transition" = Table.SplitColumn(
        Source,
        "MyData",
        Splitter.SplitTextByCharacterTransition({"a".."z"}, {"A".."Z"}),
        {"MyData.1", "MyData.2"}
    )
in
    #"Split Column by Character Transition"

It turns out that the new Splitter function that’s behind this, Splitter.SplitTextByCharacterTransition, has been in Power BI for at least a month already (if you want some background on Splitter functions, see my blog post here), but I don’t think anyone noticed. This new function is very flexible, and I can see it will be useful for a lot more scenarios than just these new four options in the Query Editor: it allows you to split text when there is a transition from any one character or characters to any other character or characters.
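
To illustrate that flexibility: as well as lists of characters, the before and after arguments of Splitter.SplitTextByCharacterTransition can, as far as I can tell, also be functions that take a single character and return true or false. Here’s a minimal sketch that splits at every transition from a digit to a non-digit, using a function for the second argument:

let
    // The second argument is a function that returns true for any
    // character that is not a digit
    SplitDigitToNonDigit = Splitter.SplitTextByCharacterTransition(
        {"0".."9"},
        (c) => not List.Contains({"0".."9"}, c)
    ),
    // Returns {"123", "abc456", "def"}
    Result = SplitDigitToNonDigit("123abc456def")
in
    Result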

Finding Power Query Query Execution Times In Power BI And Excel Using SQL Server Profiler

Working out how long a Power Query M query takes to run – the first step in troubleshooting refresh performance problems in Power BI or Excel – is something I have blogged about several times (see here, here and here for example). However, there are problems with all of these methods, such as the need to alter the M code of your query. Luckily there is another method I have found, using a combination of DAX Studio and SQL Server Profiler.

How can you get hold of these tools? You can download DAX Studio for free here:

https://daxstudio.org/

Any serious Power BI developer should have it installed already, but if you have an older version you may need to update it to get the features I talk about in this post.

SQL Server Profiler is part of SQL Server Management Studio, and you can download it for free from here:

https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017

You can generate SQL Server Profiler trace files for diagnostic purposes from Power BI Desktop, as Kasper shows here, but these traces don’t show any information about refreshes as far as I can see. You can also hook Profiler up to Power BI Desktop directly, as Sam Lester shows here, but that’s a bit fiddly to do. Luckily, DAX Studio now has a preview feature that makes connecting Profiler to Power BI Desktop much easier.

First of all, open up your report in Power BI Desktop, then open up DAX Studio and connect it to the instance of Power BI Desktop you just opened; if you want to use this technique with Excel Power Query queries then you will need to launch DAX Studio from inside Excel. Then, in DAX Studio, open the Options dialog and enable the “Show External Tools” preview feature:

[Screenshot: the “Show External Tools” option in DAX Studio’s Options dialog]

Once you have done this, on the Advanced tab on DAX Studio’s ribbon you’ll see an option to launch SQL Profiler and connect it to the Power BI report or Excel file that DAX Studio is currently connected to:

[Screenshot: the SQL Profiler button on DAX Studio’s Advanced tab]

Clicking the button opens up Profiler with a trace running. You’ll see that a lot of trace events are selected – probably too many to make sense of – so I suggest that you stop the trace, click the Properties button on the toolbar and then, on the Events Selection tab of the Trace Properties dialog, select just the following four trace events:

  • Command Begin
  • Command End
  • Progress Report Begin
  • Progress Report End

Restart the trace and then refresh your dataset in Power BI Desktop. You’ll still see a lot of events generated in Profiler, but look for the following:

  • A Command Begin event followed by a lot of Progress Report Begin/End events, which is the command that marks the start of the refresh.
  • After that there will be a series of Progress Report Begin events that have an EventSubclass of “25 – ExecuteSQL” – one for each of the tables being loaded into your dataset, and therefore one for each of the Power Query M queries associated with those tables. Don’t be confused by the name of the EventSubclass: these events will appear whatever data source you are using; it looks like the Vertipaq engine requests data from the Power Query engine using a basic form of SQL. For example, here’s a screenshot showing the three events I get when refreshing a dataset with three tables in it called “Property Type”, “Date” and “Price Paid 2017”:
    [Screenshot: the three Progress Report Begin events, one per table]
  • Finally, below that there will be a series of Progress Report End events (one for each Progress Report Begin event) with the EventSubclass of “25 – ExecuteSQL”, and the Duration column here will tell you how long it took to get data from the Power Query M query associated with the table. For example, this screenshot shows that it took 5460ms (about 5.5 seconds) to get the data from the “Price Paid 2017” Power Query query:
    [Screenshot: the Progress Report End event showing a Duration of 5460ms for “Price Paid 2017”]

And there you have it: exact timings for each of the Power Query M queries associated with each of the tables in your dataset. Remember that the time taken by each Power Query M query will include the time taken by any other queries that it references, and it does not seem to be possible to find out the amount of time taken by any individual referenced query in Profiler.
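
To make “referenced query” concrete, here’s a minimal sketch of two hypothetical queries as they would appear in the Power Query Editor (the file path and column name are made up). If only the Sales query is loaded into the dataset, the ExecuteSQL duration you see for it in Profiler includes the time spent evaluating RawSales too:

// Query "RawSales" – referenced by Sales, not loaded into the dataset itself
let
    Source = Csv.Document(File.Contents("C:\RawSales.csv")),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    Promoted

// Query "Sales" – loaded into the dataset; Profiler only reports the
// combined duration of Sales plus the RawSales work it triggers
let
    Source = RawSales,
    Filtered = Table.SelectRows(Source, each [Price] <> null)
in
    Filtered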

There is a lot more interesting information that can be found in this way: for example, dataset refresh performance is not just related to the performance of the Power Query M queries that are used to load data; time is also needed to build all of the structures inside the dataset by the Vertipaq engine once the data has been returned, and Profiler gives you a lot of information on these operations too. Discussion of that will have to wait until a future blog post though…

Excel.Workbook() And The delayTypes Option In Power Query/Power BI

A while ago I found myself wondering – as I often do about this kind of thing – what the undocumented third parameter of the Excel.Workbook() M function (called delayTypes) actually does. I found a forum post from Guy Hunkin of the Excel Power Query team in 2017 here, which gives the following summary:

Originally, Excel.Workbook used to read the entire workbook data to accurately assign types to the columns. This was EXTREMELY slow on large workbooks.

Having this in mind, we added this flag to delay the behavior. When set to “true”, we don’t infer any column types. Instead, the UI uses its normal inference algorithm to generate a Table.TransformColumnTypes step with the inferred types and if it inferred wrongly, the user can update the transformation explicitly.

I also posted a question on the Power Query MSDN forum and got some more details from Colin Banfield, one of the many M experts who hang out there, as well as doing some investigations of my own. This post summarises what I found out.

First of all, a simple demonstration of what it does. Consider the following Excel table:

[Screenshot: the source Excel table]

Connecting to this table in the December 2018 release of Power BI Desktop generates the following M query:

let
    Source = Excel.Workbook(
        File.Contents("C:\DelayTypesTest.xlsx"),
        null,
        true  // the third argument is delayTypes
    ),
    SourceData_Table = Source{[Item = "SourceData", Kind = "Table"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(
        SourceData_Table,
        {
            {"TextColumn", type text},
            {"IntegerColumn", Int64.Type},
            {"DecimalColumn", type number},
            {"DateColumn", type date}
        }
    )
in
    #"Changed Type"

Notice that the delayTypes option is now automatically set to true and that there is a “Changed Type” step. The output of this query, as seen in the Power Query Editor, is what you would expect:

[Screenshot: the query output with the expected data types]

Compare this with the output of the following query which has delayTypes set to false and has no “Changed Type” step:

let
    Source = Excel.Workbook(
        File.Contents("C:\DelayTypesTest.xlsx"),
        null,
        false  // delayTypes is false this time
    ),
    SourceData_Table = Source{[Item = "SourceData", Kind = "Table"]}[Data]
in
    SourceData_Table

[Screenshot: the query output with data types assigned by Excel.Workbook()]

There are two things to notice:

  • With delayTypes set to false, no “Changed Type” step is necessary: the data types on the columns are set by Excel.Workbook(). With delayTypes set to true, all the columns returned by Excel.Workbook() have their data type set to Any, and a subsequent “Changed Type” step is necessary to set the data types to something useful (you can confirm this for yourself with Table.Schema, as the sketch after this list shows).
  • With delayTypes set to true, the IntegerColumn and DecimalColumn columns are given different data types by the “Changed Type” step: IntegerColumn becomes Int64.Type whereas DecimalColumn becomes type number. With delayTypes set to false, both columns are set to type number. So it looks like Excel.Workbook() with delayTypes set to false will only convert columns to primitive data types.
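
Incidentally, if you want to inspect the types assigned to the columns yourself, Table.Schema is an easy way to do it. A minimal sketch, reusing the same hypothetical file as above:

let
    Source = Excel.Workbook(
        File.Contents("C:\DelayTypesTest.xlsx"),
        null,
        true
    ),
    SourceData_Table = Source{[Item = "SourceData", Kind = "Table"]}[Data],
    // Table.Schema returns one row per column; with delayTypes set to true
    // expect a TypeName of "Any.Type" for every column, and with it set to
    // false expect primitive types like "Text.Type" and "Number.Type"
    Schema = Table.Schema(SourceData_Table)
in
    Schema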

The second question is: if this is primarily a performance optimisation, how much of an impact does it have? I built four queries that read all the data from an 80MB xlsx file to test the performance of every combination of delayTypes set to true or false and a “Changed Type” step present or not (just in case there was an overhead to having a “Changed Type” step). Here are the results – the main timings are in minutes and seconds, with the length of the “Evaluating” phase of query execution in brackets:

                              delayTypes=false    delayTypes=true
No “Changed Type” step        2:08 (1:21)         1:06 (0:00)
“Changed Type” step present   2:19 (1:30)         1:08 (0:00)

I wouldn’t read too much into these exact timings because, in my experience, timings for the same query can fluctuate quite a lot, but one thing is clear: setting delayTypes to true results in much better performance. Indeed, with delayTypes set to false nearly half the time was spent in the “Evaluating” phase, whereas with delayTypes set to true there was no “Evaluating” phase at all. As a result I can’t think of a reason not to set the delayTypes option to true.

It’s worth noting that delayTypes is set to false by default if you don’t specify this option, so be aware of this if you are writing your own M code.
