Creating Tables In Power BI/Power Query M Code Using #table()

After my post earlier this week on creating current day/week/month/year reports in Power BI a few people asked me for a more detailed explanation of the way I was creating tables without using a data source in my M code. This is something I find myself doing quite a lot when I’m loading data with Power BI and Power Query, and while there are several ways of doing this I find that using the #table() intrinsic function is the most elegant option.

Let’s look at some examples. The following query returns a table with two columns (called “First Column” and “Second Column”) and two rows containing the values from 1 to 4:

#table({"First Column", "Second Column"}, {{1,2},{3,4}})
image

No data source is needed – this is a way of defining a table value in pure M code. The first parameter of the function takes a list of column names as text values; the second parameter is a list of lists, where each inner list contains the values for one row of the table.

In the last example the columns in the table were of the data type Any (the ABC123 icon in each column header tells you this), which means that they can contain values of any data type including numbers, text, dates or even other tables. Here’s an example of this:

#table(
    {"First Column", "Second Column"},
    {
        {1, "Hello"},
        {#date(2016,1,1), 3}
    }
)
image

While this is flexible, it's not exactly practical: in almost all cases the Any data type is a bad choice for loading data, and you need to set the data type for each column explicitly. You can set data types for columns quite easily as a separate step, but it is also possible to set column data types using #table():

#table(
    type table
    [
        #"Number Column"=number,
        #"Text Column"=text,
        #"Date Column"=date
    ],
    {
        {1, "Hello", #date(2016,1,1)},
        {2, "World", #date(2017,12,12)}
    }
)
image

In this example the first parameter is no longer a list of column names but a declaration of a table type that contains not only column names but also column types. You can see from the icons in the column headers in the screenshot above that the column called “Number Column” has a data type of number, “Text Column” has a data type of text, and “Date Column” has a data type of date.
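
For comparison, here's a sketch (not taken from the original post) of the “separate step” approach mentioned earlier, where an untyped table is created first and Table.TransformColumnTypes() is then used to set the column types:

[sourcecode language=”text”]
let
    //create a table whose columns have the Any data type
    UntypedTable = #table(
        {"Number Column", "Text Column"},
        {{1, "Hello"}, {2, "World"}}),
    //set the data type of each column as a separate step
    TypedTable = Table.TransformColumnTypes(
        UntypedTable,
        {{"Number Column", type number}, {"Text Column", type text}})
in
    TypedTable
[/sourcecode]

Both approaches should return the same typed table; declaring the types inside #table() simply saves a step.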

Of course, if you need a fixed table value in Power BI you could use the “Enter Data” button or, if you're using Excel and Power Query, you could create an Excel table and then use the Excel.CurrentWorkbook() function to load its contents; if you or your end users need to edit the values in your table easily then you should use one of these two options. On the other hand, if you don't want users to be able to edit the values in the table or, more likely, you are generating the contents of your table using functions that return lists (as in my previous post), then #table() is the way to go.
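
To illustrate that last scenario, here's a sketch (the column name and date range are just examples) that uses List.Dates() and List.Transform() to generate the rows passed to #table():

[sourcecode language=”text”]
let
    //a list of seven consecutive dates starting on 1 January 2016
    DateList = List.Dates(#date(2016,1,1), 7, #duration(1,0,0,0)),
    //turn each date into a single-item list representing one row
    Rows = List.Transform(DateList, each {_}),
    //build a one-column typed table from the list of rows
    Output = #table(type table [Date=date], Rows)
in
    Output
[/sourcecode]

This returns a single-column table containing the dates from 1 January 2016 to 7 January 2016.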

Creating Current Day, Week, Month And Year Reports In Power BI Using Bidirectional Cross-Filtering And M

One very common requirement when creating a Power BI report is the ability to apply a filter for the current day, week, month, quarter or year. There are several ways of implementing this: you could add relative date columns to your Date table as I show here (I used DAX calculated columns, but you could do this in M quite easily too); you could also build the filter into the DAX for your measures, although that could mean you end up with a lot of quite complex measures.

Last week I received an email asking for help with an interesting variation on this problem: how can you create a report with a single slicer that allows you to switch between showing data for the current day, week, month or year? The requirement to have a single slicer is important here: if you created new columns on the date table, each column would give you a slicer for selecting the current or any relative day, or week, or month, or year on its own, but no single column would allow you to select weeks, months and years together in the same slicer.

Here are some screenshots showing what we want to achieve: a report with two measures and a single slicer that allows the user to switch between displaying data for a variety of relative time periods:

image

image

image

The way to achieve this is not all that different from the calculated column approach, but it requires a separate table to model the many-to-many relationship between all the required relative period selections and the dates in them, as well as the use of bidirectional cross-filtering between tables (which I blogged about here). The data model I used for this report looks like this:

image

The Sales table just contains sales data; the Date table is a normal Power BI date table. The Period table is the interesting table here: it contains one row for each combination of relative time period (eg “Today”, “Current Week To Date”, “Rolling Month”) and date:

image

It’s the Period column on this table that is used to create the slicer in the screenshots above. The Sort column is used along with Power BI’s Sort By Column functionality to make the values in the Period column appear in a sensible order in the report.

Notice also that on the relationship between the Period table and the Date table the Cross filter direction property is set to Both:

image

This means that a selection on the Period table travels up the relationship to the Date table and then back down the relationship from Date to Sales. For example, selecting “Current Week” in the Period table will select the dates in the current week on the Date table, which in turn selects the rows for those dates on the Sales table.

The challenge, though, is to write the query to populate the Period dimension. I’ve done this in two parts. First, here’s a function called CreatePeriodTable that returns a table for a single time period selection. It takes the name of the time period and a start date and end date, and will return a table with one row for each date in the date range:

[sourcecode language=”text” padlinenumbers=”true”]
(
    PeriodName as text,
    StartDate as date,
    EndDate as date,
    SortOrder as number
) as table =>
let
    DayCount = Duration.Days(EndDate - StartDate) + 1,
    DateList = List.Dates(StartDate, DayCount, #duration(1,0,0,0)),
    AddPeriodName = List.Transform(DateList,
        each {PeriodName, _, SortOrder}),
    CreateTable = #table(
        type table [Period=text, Date=date, Sort=number],
        AddPeriodName)
in
    CreateTable
[/sourcecode]

For example, calling this function like so:

[sourcecode language=”text”]
CreatePeriodTable("Demo", #date(2016,1,1), #date(2016,1,5),1)
[/sourcecode]

Returns the following table (with the dates shown in dd/mm/yyyy format):

image

Second, here’s a query that calls this function once for each of the time periods you want to be able to select by and creates a single table that contains all of the rows returned by each of the function calls:

[sourcecode language=”text”]
let
    TodaysDate = Date.From(DateTimeZone.FixedUtcNow()),
    Ranges = {
        {"Today",
            TodaysDate, TodaysDate, 1},
        {"Current Week To Date",
            Date.From(Date.StartOfWeek(TodaysDate)), TodaysDate, 2},
        {"Current Month To Date",
            Date.From(Date.StartOfMonth(TodaysDate)), TodaysDate, 3},
        {"Current Year To Date",
            Date.From(Date.StartOfYear(TodaysDate)), TodaysDate, 4},
        {"Rolling Week",
            Date.AddWeeks(TodaysDate,-1) + #duration(1,0,0,0), TodaysDate, 5},
        {"Rolling Month",
            Date.AddMonths(TodaysDate,-1) + #duration(1,0,0,0), TodaysDate, 6},
        {"Rolling Year",
            Date.AddYears(TodaysDate,-1) + #duration(1,0,0,0), TodaysDate, 7}
    },
    GetTables = List.Transform(Ranges,
        each CreatePeriodTable(_{0}, _{1}, _{2}, _{3})),
    Output = Table.Combine(GetTables)
in
    Output
[/sourcecode]

In this query the Ranges step contains a list of lists, where each list in the list represents a time period with its start and end dates in the same order that you'd pass these values as parameters to the CreatePeriodTable() function. I've deliberately structured the code to make it easy to add new time periods to the list. Hopefully the example time periods in the query above give you a good idea of what's possible in M and all the functions it gives you for calculating different dates. The GetTables step loops over this list and calls the CreatePeriodTable() function for each list in the list, and the Output step combines all the data into a single table.
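
For example, if you also wanted a quarter-to-date selection – not included in the original query, and the sort value of 8 here is just illustrative – you could add one more list to the Ranges step:

[sourcecode language=”text”]
{"Current Quarter To Date",
    Date.From(Date.StartOfQuarter(TodaysDate)), TodaysDate, 8}
[/sourcecode]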

All of the date ranges here end with today's date, as returned by the DateTimeZone.FixedUtcNow() function, but you may want to check out Ken's post here on handling time zones in M depending on your exact requirements. Because this happens in M when the data is loaded, the value of today's date will be fixed at the point in time that the data refresh took place.
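
If, say, you wanted today's date in a fixed offset from UTC rather than UTC itself, one option (a sketch only – see Ken's post for a fuller treatment; the offset of +10 hours is just an example) is to use the DateTimeZone.SwitchZone() function before converting to a date:

[sourcecode language=”text”]
Date.From(DateTimeZone.SwitchZone(DateTimeZone.FixedUtcNow(), 10))
[/sourcecode]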

You can download the example workbook for this post here.

Understanding Let Expressions In M For Power BI And Power Query

When you start writing M code for loading data in Power Query or Power BI, one of the first things you’ll do is open up the Advanced Editor for a query you’ve already built using the UI. When you do that you’ll see a very scary chunk of code (and at the time of writing there’s no intellisense or colour coding in the Advanced Editor, making it even more scary) and you’ll wonder how to make sense of it. The first step to doing so is to understand how let expressions work in M.

Each query that you create in Power BI Desktop or Power Query is a single expression that, when evaluated, returns a single value – and that single value is usually, but not always, a table that then gets loaded into the data model. To illustrate this, open up Power BI Desktop (the workflow is almost the same in Power Query), click the Edit Queries button to open the Query Editor window and then click New Source/Blank Query to create a new query.

image

Next, go to the View tab and click on the Advanced Editor button to open the Advanced Editor dialog:

image

You’ll notice that this doesn’t actually create a blank query at all, because there is some code visible in the Advanced Editor when you open it. Delete everything there and replace it with the following M expression:

[sourcecode language=”text” padlinenumbers=”true”]
"Hello " & "World"
[/sourcecode]

image

Hit the Done button and the expression will be evaluated, and you’ll see that the query returns the text value “Hello World”:

image

Notice how the ABC icon next to the name of the Query – Query1 – indicates that the query returns a text value. Congratulations, you have written the infamous “Hello World” program in M!

You might now be wondering how the scary chunk of code you see in the Advanced Editor window for your real-world query could possibly be a single expression – but in fact it is. This is where let expressions come in: they allow you to break a single expression down into multiple parts. Open up the Advanced Editor again and enter the following expression:

[sourcecode language=”text”]
let
    step1 = 3,
    step2 = 7,
    step3 = step1 * step2
in
    step3
[/sourcecode]

image

Without knowing anything about M it’s not hard to guess that this bit of code returns the numeric value 21 (notice again that the 123 icon next to the name of the query indicates the data type of the value the query returns):

image

In the M language a let expression consists of two sections. After the let comes a list of variables, each of which has a name and an expression associated with it. In the previous example there are three variables: step1, step2 and step3. Variables can refer to other variables; here, step3 refers to both step1 and step2. Variables can be used to store values of any type: numbers, text, dates, or even more complex types like records, lists or tables; here, all three variables return numbers. The Query Editor is usually clever enough to display these variables as steps in your query, and so shows them in the Applied Steps pane on the right-hand side of the screen:

image

The value that the let expression returns is given in the in clause. In this example the in clause returns the value of the variable step3, which is 21.

It’s important to understand that the in clause can reference any or none of the variables in the variable list. It’s also important to understand that, while the variable list might look like procedural code, it isn’t: it’s just a list of variables that can appear in any order. The UI will always generate code where each variable/step builds on the value returned by the previous variable/step, but when you’re writing your own code the variables can be in whatever order suits you. For example, the following query also returns the value 21:

[sourcecode language=”text”]
let
    step3 = step1 * step2,
    step2 = 7,
    step1 = 3
in
    step3
[/sourcecode]

image

The in clause returns the value of the variable step3, which in order to be evaluated needs the variables step2 and step1 to be evaluated; the order of the variables in the list is irrelevant (although it does mean the Applied Steps no longer displays each variable name). What is important is the chain of dependencies that can be followed back from the in clause.

To give another example, the following query returns the numeric value 7:

[sourcecode language=”text”]
let
    step3 = step1 * step2,
    step2 = 7,
    step1 = 3
in
    step2
[/sourcecode]

image

In this case, step2 is the only variable that needs to be evaluated for the entire let expression to return its value. Similarly, the query

[sourcecode language=”text”]
let
    step3 = step1 * step2,
    step2 = 7,
    step1 = 3
in
    "Hello" & " World"
[/sourcecode]

image

…returns the text value “Hello World” and doesn’t need to evaluate any of the variables step1, step2 or step3 to do this.
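
In fact, because M evaluates the variables in a let expression lazily, variables that the in clause doesn't depend on are never evaluated at all. You can see this with a quick experiment (paste it into the Advanced Editor): the unused variable below would raise an error if it were ever evaluated, yet the query still returns the value 21:

[sourcecode language=”text”]
let
    //this variable is never referenced, so its error is never raised
    unused = error "this variable is never evaluated",
    step1 = 3,
    step2 = 7,
    step3 = step1 * step2
in
    step3
[/sourcecode]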

The last thing to point out is that if the names of the variables contain spaces, then those names need to be enclosed in double quotes and have a hash # symbol in front. For example here’s a query that returns the value 21 where all the variables have names that contain spaces:

[sourcecode language=”text”]
let
#"this is step 1" = 3,
#"this is step 2" = 7,
#"this is step 3" = #"this is step 1" * #"this is step 2"
in
#"this is step 3"
[/sourcecode]

image

How does all this translate to queries generated by the UI? Here’s the M code for a query generated by the UI that connects to SQL Server and gets filtered data from the DimDate table in the Adventure Works DW database:

[sourcecode language=”text”]
let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    #"Filtered Rows" = Table.SelectRows(dbo_DimDate,
        each ([DayNumberOfWeek] = 1))
in
    #"Filtered Rows"
[/sourcecode]

Regardless of what the query actually does, you can now see that there are three variables declared here, #”Filtered Rows”, dbo_DimDate and Source, and the query returns the value of the #”Filtered Rows” variable. You can also see that in order to evaluate the #”Filtered Rows” variable the dbo_DimDate variable must be evaluated, and in order to evaluate the dbo_DimDate variable the Source variable must be evaluated. The Source variable connects to the Adventure Works DW database in SQL Server; dbo_DimDate gets the data from the DimDate table in that database, and #”Filtered Rows” takes the table returned by dbo_DimDate and filters it so that you only get the rows where the DayNumberOfWeek column contains the value 1.

image

That’s really all there is to know about let expressions. It explains why you can do the kind of conditional branching that Avi Singh describes here; and also why, when I first tried to come up with a way to time how long a query takes to execute, I had to bend over backwards to ensure that all the variables in my let expression were executed in the correct order (though it turns out there’s an easier way of doing this). I hope you find this useful when writing your own M code.

Creating M Functions From Parameterised Queries In Power BI

Query parameters are, in my opinion, one of the most important features added to Power BI recently. The official blog post describing how to use them is great (read it if you haven’t done so already) but it misses out one other cool new feature that I only discovered by accident the other day: it’s now super-easy to create M functions from parameterised queries.

Why is this important? In Power BI (and indeed in Power Query), M functions are the key to combining data from multiple data sources that have the same structure. For example, if you have a folder of Excel workbooks and you want to read the data from Sheet1 in each of them to create a single table for loading into Power BI, functions are the key. Here are some blog posts with examples:

Matt Masson on iterating over multiple web pages:
http://www.mattmasson.com/2014/11/iterating-over-multiple-pages-of-web-data-using-power-query/

Ken Puls on combining data from multiple Excel workbooks:
http://www.excelguru.ca/blog/2015/02/25/combine-multiple-excel-workbooks-in-power-query/

My M function showing a generic function for combining any data from Excel:
https://blog.crossjoin.co.uk/2014/11/20/combining-data-from-multiple-excel-workbooks-with-power-querythe-easy-way/

All of these examples involve writing M code manually. The big change in the latest version of Power BI Desktop is that you can do the same thing using just the UI.

Let’s take the classic example of combining data from multiple Excel workbooks and update it to show how things work now.

Say you have a folder containing three Excel workbooks containing sales data for January, February and March and you want to load data from all three into a single table into Power BI. The first thing to do is to create a new parameter in Power BI Desktop that returns the filename, including path, of one of the Excel files. Call it ExcelFilePath and configure it as shown here:

image

Next, you need to create a query that connects to the Excel file whose filename is used in the parameter and load the data you want from it. In this case let’s say you want to load the data from Sheet1:

image

This is all very straightforward; here’s the query that you’ll get:

image

Unfortunately, at the time of writing, the Excel source doesn’t support using parameters when creating a new query, so you have to create a query and then edit it to use the filename that the parameter returns. You can do this by clicking on the gear icon next to the Source step in the query:

image

In the dialog that appears, click the icon next to the File Path property and choose Parameter, then from the dropdown menu choose the name of the parameter you created earlier:

image

Now here comes the good bit. In the Queries pane on the left-hand side of the screen, right-click on the name of the query you just created and select Create Function:

image

This option allows you to take any parameterised query and create a function from it. When you do this, you’ll see a dialog asking for the name of the new function to be created (I’ve called my function GetSheet1) and allowing you to change the name of the parameters:

image

Here’s the original M code for the query with the parameterised Source step highlighted:

[sourcecode language=”text” padlinenumbers=”true” highlight=”2″]
let
    Source = Excel.Workbook(File.Contents(ExcelFilePath), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",
        {{"Product", type text}, {"Month", type text},
         {"Units", Int64.Type}, {"Value", Int64.Type}})
in
    #"Changed Type"
[/sourcecode]

Here’s the M code for the new query created after Create Function has been selected:

[sourcecode language=”text” highlight=”2,3″]
let
    Source = (ExcelFilePath as text) => let
        Source = Excel.Workbook(File.Contents(ExcelFilePath), null, true),
        Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
        #"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet),
        #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",
            {{"Product", type text}, {"Month", type text},
             {"Units", Int64.Type}, {"Value", Int64.Type}})
    in
        #"Changed Type"
in
    Source
[/sourcecode]

Where the original query points to the workbook whose path is returned by the ExcelFilePath query parameter, the new function takes a parameter (also called ExcelFilePath) to which any other query can pass any text value.
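
You can check that the function works by creating another blank query and invoking it with a file path of your choice (the path below is made up for illustration); this should return the table from Sheet1 of that one workbook:

[sourcecode language=”text”]
GetSheet1("C:\Demo\Sales January.xlsx")
[/sourcecode]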

Now that you have your function, the final step is to call it on a table containing the names of all of the Excel files in your folder. Create a new query using the From Folder source:

image

…then point Power Query to the folder containing the names of all of the Excel files:

image

Remove all the columns in the table except Folder Path and Name, drag the Folder Path column before the Name column, then select both remaining columns, right-click and select Merge Columns to create a single column (called FullPath here) containing the full path of all the Excel files:

image

image

Next you need to click the Custom Column button and call the GetSheet1 function for the text in the FullPath column for each row in the table. Here’s the expression to use:

[sourcecode language=”text”]
GetSheet1([FullPath])
[/sourcecode]

image

Last of all, click on the Expand icon in the right-hand corner of the new column:

image

…and you have a table that contains all of the data from Sheet1 on all of the files in the folder:

image

Now for the bad news: queries that use functions like this can’t be refreshed after they have been published to PowerBI.com (see also this thread for more details). This could be why the functionality wasn’t publicised in the post on the Power BI blog. Hopefully this will change soon though…?

So, to sum up, it’s the early stages of an important and powerful new piece of functionality. In the past, a lot of the time when I found myself writing M code it was to create parameterised queries and functions; in the future I’m going to be writing a lot less M code, which is great news. I can’t wait to see how this develops over the next few months and I hope it turns up in Power Query too.

First Look at Pyramid’s On-Premises Power BI Integration

Attendees at SQLBits last week were given a sneak peek at the long-promised on-premises Power BI solution from Pyramid Analytics. Details are still scarce, but here’s a link to the five-minute-long video that was shown:

https://pyramidanalytics.wistia.com/medias/jkyyn15yy5

Here’s a screenshot from the video showing a Power BI report, a Reporting Services report and native Pyramid content blended together into a single dashboard:

image

I have also managed to confirm one very important point: at least initially, Pyramid’s solution will only work for Power BI reports that use live connections to on-premises Analysis Services data sources (this is contrary to what I originally understood and what I said to a few people on Twitter last week – sorry). That said, if you are using Power BI as a front-end to Analysis Services, and a lot of people are, this looks like it will be pretty cool.

The M Code Behind Power BI Parameters

For me the most exciting new feature in Power BI in a long while is the appearance of Query Parameters for data loading. We have been promised an official blog post explaining how they work (although they are very easy to use) and in fact Soheil Bakhshi has already written two very good, detailed posts on them here and here. What I want to do in this post, however, is look at the M code that is generated for them and see how it works.

Consider the following parameter built in Power BI Desktop that has, as its possible values, the names of all of the days of the week:

image

The first thing to notice is that parameters are shown as a special type of query in the Queries Pane, but they are still a query:

image

This means that you can open up the Advanced Editor and look at the M code for the query. Here’s the code for the query shown above:

[sourcecode language=”text” padlinenumbers=”true”]
"Monday"
meta
[
    IsParameterQuery=true,
    List={"Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday", "Sunday"},
    DefaultValue="Monday",
    Type="Text",
    IsParameterQueryRequired=true
]
[/sourcecode]

From this you can see that the value returned by the parameter query is just a single piece of text: the value “Monday” that is set as the Current Value, that’s to say the value returned by the parameter itself. The interesting stuff is all in the metadata record associated with the value. I blogged about metadata here, so you may want to read that post before going any further; it’s pretty clear that the fields in the metadata record correspond to the values set in the UI. All of the fields in the metadata record can be edited in the Advanced Editor if you want.
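
As an aside, you can inspect this metadata record from another query using the Value.Metadata() function that I talked about in that post. Assuming the parameter query is called Day, something like this should return the list of available values:

[sourcecode language=”text”]
Value.Metadata(Day)[List]
[/sourcecode]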

When the parameter is used in another query it is referenced like any other query value. For example, if you load the DimDate table from the Adventure Works DW sample database and use the parameter above to filter the EnglishDayNameOfWeek column then the code generated in the UI looks like this:

[sourcecode language=”text” highlight=”10″]
let
    Source =
        Sql.Databases("localhost"),
    #"Adventure Works DW" =
        Source{[Name="Adventure Works DW"]}[Data],
    dbo_DimDate =
        #"Adventure Works DW"{[Schema="dbo",Item="DimDate"]}[Data],
    #"Filtered Rows" =
        Table.SelectRows(dbo_DimDate,
            each [EnglishDayNameOfWeek] = Day)
in
    #"Filtered Rows"
[/sourcecode]

The filtering takes place in the #”Filtered Rows” step and you can see where the name of the parameter – Day – is used in the Table.SelectRows() function to filter the EnglishDayNameOfWeek column. This is nothing new in terms of the language itself because you have always been able to return values of any data type from a query, not just tables, and you have always been able to reference queries in other queries like this – in fact you can see me write the same kind of code manually in this video. What is new is that there is now a UI to do this and there’s no need to write any code.

Personally, I think the Power BI team have done a great job here in terms of usability and clearly a lot of thought has gone into this feature. It doesn’t do everything I would want yet though: the ability to bind the list of available values to the output of another query and the ability to select multiple parameter values at the same time are obvious missing features (and ones that would be needed to match the parameter functionality in SSRS). However I would not be surprised to see them appear in a future version of Power BI.

After seeing the code, I wondered whether I could edit the code in the parameter query to make it do more interesting things. For example, even if the UI doesn’t support data-driven lists of available values for a parameter, it looks as though it should be possible to replace the hard-coded list with a list of values returned by another query. Unfortunately this does not work: any changes I tried to the parameter query code were either ignored or removed completely. A bit of a disappointment but again, hopefully this will be possible in a future version.

Thoughts On SandDance And Power BI

After SandDance was announced at the Microsoft Data Insights Summit a few weeks ago I had a quick play with it, thought to myself that it looked like it would provide a few more cool data visualisation options, and then almost forgot about it. More recently I spent some time looking at SandDance in more detail and it got me thinking some more about what its uses today are and what its future might be. There has been a lot of hype surrounding SandDance but not a lot of clarity about where it is positioned in the Power BI story; to be honest I’m still not quite sure where it fits myself and I wouldn’t be surprised if Microsoft doesn’t know either, or at least is keeping its options open.

One thing that is worth pointing out is that it comes from Microsoft Research and is released through Microsoft Garage which is, and I quote, an “outlet for experimental projects”. This suggests that it isn’t a polished product but more of a work-in-progress or an experimental platform. This certainly matches my impressions of the tool and those of Ruth Pozuelo and Alon Brody, who have blogged about it already: in many respects it’s very sophisticated but in others it is quite limited. Will it ever become an ‘official’ product? Other tools have followed this path: you may remember Power Query was originally an experimental project called Data Explorer and released through a site called Azure Labs, a predecessor to the Microsoft Garage site, so it is possible.

Another aspect of the SandDance story that deserves discussion is whether it’s just another custom visualisation or something more. This post on the Power BI blog talks about it as though it’s the former, and I guess you could see it just as a way of accessing a lot of new chart types (such as small multiples) for your reports. The charts it creates are certainly eye-catching, as are the animated transitions, and the importance of that – especially for sales demos – should not be underestimated.

image

However, it seems clear to me that SandDance is really an interactive visual data exploration tool, and indeed this is what the SandDance website suggests:
“SandDance is a web-based application that enables you to more easily explore, identify, and communicate insights about data.”
Microsoft doesn’t currently have any other products that compete in this sector: Power BI reports and dashboards are for publishing pre-defined, semi-static insights rather than true ad-hoc analysis, and while Excel PivotTables are great for starting with a blank sheet and exploring your data, they are certainly not visual; I don’t think Excel PivotCharts are a true visual exploration tool either, more of a visual representation of data in a PivotTable. Does Microsoft need a product in this area? I think it does if it wants to compete directly with Tableau, the gold standard in visual data exploration. Adding SandDance to Power BI makes Power BI a much more rounded product.

A third question is this: why is there a standalone version of SandDance and a Power BI custom visual? This blog post contains an interesting statement from Steven Drucker, principal researcher on the SandDance team:
“Using the Microsoft Garage as the release platform gives us the freedom to run experiments with the more accessible standalone version, and as we learn what you like and what works, we can add the right parts to the Power BI visual,”
This strongly suggests that the standalone version is really just a place for testing new functionality and that the Power BI custom visual is the main focus. Does this contradict the point I made above, and is it just the standalone version that is the ‘experimental’ tool? I’m not sure, because at the moment there don’t seem to be many differences in functionality between the two versions. We’ll have to see how things develop. This statement also suggests that if SandDance does grow up to be a real product, it will be as part of Power BI. This makes commercial sense – every new Microsoft BI product should be integrated with Power BI in my opinion. What’s more, many of SandDance’s current limitations (for example around loading and refreshing data) are solved by using the capabilities of the Power BI platform.

However I’m not sure integrating SandDance into Power BI as a custom visualisation, or rather only as a custom visualisation, is a good idea. At the moment the SandDance custom visualisation feels a bit awkward to use: it’s one tool embedded inside another with two inconsistent and often overlapping UIs. I would prefer to see it as a separate tool launched from the PowerBI.com portal, similar to how the original Power View is/was launched from SharePoint, a third way to interact with data stored in Power BI alongside regular Power BI reports and Excel reports. Users should be able to launch it in the same way as Analyze in Excel and use it to explore a data set directly without having to create a report first, and if they find something interesting they should be able to pin what they have created as a visual to a dashboard, or save it for use in a regular Power BI report. Doing this would require a lot more time and effort on the part of Microsoft than just building a custom visual, but at the moment there seems to be no shortage of resources available to the Power BI team. SandDance is undoubtedly a great first step but with some more investment from Microsoft it could be a much more important part of the Power BI story.

Dynamic Chart Titles In Power BI

UPDATE April 2019: It is now possible to use DAX expressions such as the ones described in this post directly in the Title property of a visual. See https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-april-2019-feature-summary/#dynamicTitles

As you probably know, charts (and lots of other visualisations) in Power BI have titles that can be set to any piece of static text. You can do this by selecting the chart, going to the Format tab in the Visualizations pane, then changing the properties in the Title section as shown below (full documentation here):

image

But what if you want the chart title to change depending on what is selected? For example, you might be using slicers or filters to allow a user to choose which days of the week they want to see data for. In that situation you might want to add a title that shows which days of the week have actually been selected; this would be particularly important if the report uses filters, or if the report is going to be printed. Unfortunately the built-in Title Text property can’t be used to display dynamic values, but in this blog post I’ll show you how to solve this problem using DAX.

Here’s a simple example of a report that contains a dynamic chart title:

image

Using data from the Adventure Works DW database I’ve created a simple data model containing a Date dimension table called DimDate and a fact table called FactInternetSales; the DimDate table contains a field called EnglishDayNameOfWeek that contains the names of the days of the week, and the report contains a column chart that shows a Sales measure broken down by day of week. There’s also a slicer where the user can select one or more days, and at the top there’s a title that lists the day names selected in the slicer and displayed in the chart.

There are two parts to the solution. The first part is to create a measure that will return the text needed for the chart title, and this relies on the DAX ConcatenateX() function that I blogged about here. Here’s the DAX for the measure:

Title =
"Sales Amount for "
    & CONCATENATEX (
        VALUES ( 'DimDate'[EnglishDayNameOfWeek] ),
        'DimDate'[EnglishDayNameOfWeek],
        ", "
    )

Here, the Values() function is used to return a table containing all of the selected days of the week, and this is then passed to ConcatenateX() to get a text value containing a comma delimited list of the day names.
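If you’re not familiar with these DAX functions, the logic can be sketched in Python (purely as an illustration of what VALUES() and CONCATENATEX() are doing here; the report itself uses the DAX above):

```python
# Illustrative Python analogy (not DAX) for the Title measure:
# VALUES() returns the distinct selected day names, and
# CONCATENATEX() joins them with a delimiter.
def title(selected_days):
    # dict.fromkeys removes duplicates while preserving order,
    # similar to VALUES() returning distinct values
    distinct = list(dict.fromkeys(selected_days))
    return "Sales Amount for " + ", ".join(distinct)

print(title(["Monday", "Tuesday", "Friday"]))
# prints: Sales Amount for Monday, Tuesday, Friday
```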

The second part of the solution deals with how to display the value returned by the measure. In the report above I used a Card visualisation, dropped the measure above into the Field area and then turned off the Category Label on the Format tab so that only the value returned by the measure, and not the name of the measure itself, is displayed:

image
image

And this is all you need to do to recreate the report above.

We can make this better though! Instead of a simple comma delimited list of day names it would be better if we could change the last comma in the list to an “and”:

image

Also, if all the day names were displayed, it would be good not to display a long list of day names but show some default text instead:

image

Here’s the DAX for a measure that does all this:

Title2 =
VAR SelectedDays =
    VALUES ( 'DimDate'[EnglishDayNameOfWeek] )
VAR NumberOfSelectedDays =
    COUNTROWS ( SelectedDays )
VAR NumberOfPossibleDays =
    COUNTROWS ( ALL ( 'DimDate'[EnglishDayNameOfWeek] ) )
VAR AllButLastSelectedDay =
    TOPN ( NumberOfSelectedDays - 1, SelectedDays )
VAR LastSelectedDay =
    EXCEPT ( SelectedDays, AllButLastSelectedDay )
RETURN
    "Sales Amount "
        & IF (
            NumberOfSelectedDays = NumberOfPossibleDays,
            "By Day Of Week",
            "For "
                & IF (
                    NumberOfSelectedDays = 1,
                    "",
                    CONCATENATEX (
                        AllButLastSelectedDay,
                        'DimDate'[EnglishDayNameOfWeek],
                        ", "
                    )
                        & " And "
                )
                & LastSelectedDay
        )

Using a series of DAX variables to make the code more readable, here’s what this measure does:

  • If the number of days selected is the same as the total number of possible days, return the title text “By Day Of Week”, otherwise
    • If two or more days have been selected, then return a comma delimited list containing all but the last selected day (I used TopN() to get that table of all but the last selected day) plus a trailing “ And “. If only one day has been selected, return an empty string. Then
    • Concatenate the last selected day to the text returned by the previous step. I’ve used the Except() function to find the day that was excluded by the TOPN() function in the previous step.
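To make the branching clearer, here’s a rough Python sketch of the same logic (an illustration only: the hard-coded day names are an assumption, and unlike DAX’s TOPN() the sketch preserves selection order; the real measure is the DAX above):

```python
# Python sketch (not DAX) of the Title2 logic described above,
# assuming the seven English day names as the set of possible values
ALL_DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def title2(selected_days):
    distinct = list(dict.fromkeys(selected_days))  # like VALUES()
    if len(distinct) == len(ALL_DAYS):
        # everything is selected, so show the default text
        return "Sales Amount By Day Of Week"
    if len(distinct) == 1:
        return "Sales Amount For " + distinct[0]
    # comma-delimited list of all but the last day, then " And " + last day
    return ("Sales Amount For "
            + ", ".join(distinct[:-1])
            + " And " + distinct[-1])

print(title2(["Monday", "Tuesday", "Friday"]))
# prints: Sales Amount For Monday, Tuesday And Friday
```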

You can download a .pbix file containing all the code from this post here and I’ve published the report here.

Profiler, Extended Events And Analysis Services

Last week one of the attendees on my SSAS cube design and performance tuning course in London told me that he had been prevented from running a Profiler trace on his SSAS Multidimensional instance by a DBA because “he should be using Extended Events”. He wasn’t too pleased, and I can understand why. This, plus the discussion about Profiler and Extended Events for the SQL Server relational engine provoked by Erin Stellato’s recent blog post on the subject, made me think it was worth writing a few words on this subject myself.

Microsoft is clear that in the long term, Extended Events (also commonly known as XEvents) are the replacement for Profiler. This page shows Profiler listed as functionality that will not be supported in a future version of SQL Server – although, importantly, it is not deprecated yet. Profiler is still officially supported for SSAS and the SQL Server relational engine in SQL Server 2016. What’s more, in my opinion it will be a long time before anyone doing serious performance tuning work with SSAS will be able to forget about Profiler and use Extended Events exclusively. Let me explain why.

First of all there is the fact that support for Extended Events in SSAS was only introduced with SSAS 2012. If you are using an earlier version you can’t use them. Even if you have some instances of 2012 or 2014 and some on earlier versions the desire to have a consistent set of tools for performance analysis and monitoring means you probably won’t want to use Extended Events yet.

Then there is the fact that in both SSAS 2012 and 2014 there is no user interface for working with Extended Events; instead, to create, start and stop a trace session you have to use XMLA commands. There are plenty of blog posts out there explaining how to do this, but it’s still incredibly time-consuming and fiddly. Even in SSAS 2016, where there is a user interface for working with Extended Events and viewing their output, it’s pretty awful and nowhere near as good as Profiler (which is far from perfect itself, but at least usable). Perhaps at some point someone in the community will create a user-friendly tool for working with Extended Events, in the same way that the community created DAX Studio to make up for the shocking fact that even in SQL Server 2016 there is no proper support for running DAX queries in SQL Server Management Studio. I would prefer it if Microsoft did the job itself, though, and started taking tooling for BI seriously.

Thirdly, if you want to do anything useful with the .xel files beyond opening them up in SQL Server Management Studio, you’re going to need to use some TSQL and functions like sys.fn_xe_file_target_read_file. What happens if you don’t have an instance of the SQL Server relational engine handy, though? Most SSAS shops use SQL Server as their data source, but not all – some use Oracle, or Teradata, or other relational databases, and for them installing an instance of SQL Server somewhere just to work with .xel files may not be an option.

Ah, you say, but on the other hand Extended Events have many advantages over Profiler traces: for example, they are much more lightweight! As Adam Saxton says here:

[Extended Events] won’t have the same impact on performance that a traditional Profiler Trace has. For example, it is reported that 20,000 events/sec on a 2ghz CPU with 1GB of RAM takes less than 2% of the CPU.

If I was building a monitoring application (something like, say, SQL Sentry’s excellent Performance Advisor for Analysis Services) then this might be relevant. But 99% of the time I’m not running a Profiler trace on a Production server, I’m working on a dev or test server, and I always try to prevent other people doing stuff on a server while I’m doing tuning work too, so this is irrelevant. It’s a mistake to assume that Analysis Services users use Profiler for the same kinds of thing that SQL Server relational engine users do. For me, I use Profiler to get roughly the same kind of information that a SQL Server developer gets from a query plan: I use it to find out what’s going on in the engine when I run a single query, so the performance overhead is not something I care about.

That said, it certainly seems to be the case that Extended Events will provide more information than I can get from a Profiler trace and allow me to do more things with that data than I can with Profiler. In SSAS 2016 there are several events that are only available via Extended Events and not via Profiler, although I have no idea what they do; I’m sure, with a bit of research, I can find out. Will any of them be useful? I have no idea yet but I suspect a few will be.

Don’t get me wrong, I think Extended Events are a great technology and something all SSAS developers and administrators should learn. There’s still a lot of UI work to do by Microsoft before they are in a position to replace Profiler, but as I said earlier Microsoft hasn’t deprecated Profiler yet so it has given itself a lot of time to do this work. My only problem is with people like the aforementioned DBA who go around telling people they should be using Extended Events with SSAS right now because that’s what they’ve heard is the best practice and that’s what the recommendation is for the SQL Server relational engine. I’ll still need to use Profiler for a few more years yet.

Monitoring SSAS Multidimensional Non Empty Filtering Using Profiler, Part 3

In Part 1 of this series I introduced the different types of non empty filtering that occur in Analysis Services Multidimensional, and in Part 2 I showed how you can monitor this activity using Profiler. In this, the final part of the series, I’m going to show some examples of how you can use this information while tuning MDX queries.

Let’s start by looking at the following query:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])


It returns 19004 rows – all of the combinations of Customer and Subcategory that have a value in the year 2003:

image

Here’s what you can see in Profiler:

image

There are two Non Empty operations here: the ProgressTotal column shows that the first is the NON EMPTY statement on the rows axis; the second we can ignore because it’s the evaluation of the WHERE clause. The Duration column shows that the first Non Empty operation takes just 54ms, while the query as a whole takes 1021ms.

Now, let’s make things a bit more complicated by adding an extra filter so we only see the Customer/Subcategory combinations where Internet Sales Amount is less than $10:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
FILTER(
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
,
[Measures].[Internet Sales Amount]<10)
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

image

Here’s what Profiler shows:

image

The query now takes 2512ms. But why is it slower? The obvious assumption to make is that it’s the Filter() that has slowed things down, but it looks like the Filter() and the NON EMPTY are now being evaluated as a single operation because the first Non Empty operation in the trace is now taking 2408ms – the majority of the query duration.

Removing the NON EMPTY statement from the rows axis and putting the logic to filter out the customers with no sales into the Filter() function, like so:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
FILTER(
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
,
[Measures].[Internet Sales Amount]<10
AND
(NOT ISEMPTY([Measures].[Internet Sales Amount])))
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])


…only makes things worse, increasing query duration to 4139ms. This confirms our suspicion that Filter() is the problem here and that NON EMPTY can remove the empty customers faster than Filter() can.

The problem with the last query but one is that the NON EMPTY statement is being applied after the Filter(). Wouldn’t it be faster to remove the empty customers first and then filter out the ones where Internet Sales Amount is less than $10, so the slower Filter() can be applied over a smaller set?

There are two ways we can do this. First of all, we can use the NonEmpty() function instead of the NON EMPTY statement to remove the empty customers. NonEmpty() is not faster than the NON EMPTY statement per se, but it does allow us to change the order that the different types of filtering are applied here, and that can make all the difference to performance. Here’s a new version of the query:
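The general principle here – do the cheap non empty reduction first so the expensive predicate only runs over the rows that survive – can be illustrated outside MDX with a little Python (the data and names below are made up purely for illustration):

```python
# Made-up sales data: only the combinations present in this dict
# are "non empty"
sales = {("Alice", "Bikes"): 25.0, ("Bob", "Helmets"): 5.0}

customers = ["Alice", "Bob", "Carol"]
subcategories = ["Bikes", "Helmets", "Socks"]
# the crossjoin of the two sets, like the * operator in MDX
combos = [(c, s) for c in customers for s in subcategories]

# Slower order: evaluate the full predicate for every combination
slow = [k for k in combos if sales.get(k, 0) != 0 and sales[k] < 10]

# Faster order: a NonEmpty-style reduction first, so the (notionally
# expensive) value test only runs over the surviving combinations
non_empty = [k for k in combos if k in sales]   # cheap membership check
fast = [k for k in non_empty if sales[k] < 10]  # predicate on fewer rows

assert slow == fast == [("Bob", "Helmets")]
```

Both orders return the same result; the point is that the second evaluates the expensive condition far fewer times, which is exactly what reordering NonEmpty() and Filter() achieves in the query below.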

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
FILTER(
NONEMPTY(
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS,
[Measures].[Internet Sales Amount]),
[Measures].[Internet Sales Amount]<10)
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

image

Query duration is now down to 217ms and the first Non Empty operation is only 57ms.

There’s another way of doing this. For MDX geek-points you could use the ultra-obscure HAVING clause in your query to do the filtering after the NON EMPTY, like so:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
HAVING
[Measures].[Internet Sales Amount]<10
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

From what I can see, the HAVING clause performs a few milliseconds faster than the previous query – measurable but not something a user would notice. I also tested a variation on Mosha’s classic calculated measure approach for Count/Filter optimisation but that performed worse than the two previous queries.