Thoughts On SandDance And Power BI

After SandDance was announced at the Microsoft Data Insights Summit a few weeks ago I had a quick play with it, thought to myself that it looked like it it would provide a few more cool data visualisation options, and then almost forgot about it. More recently I spent some time looking at SandDance in more detail and it got me thinking some more about what its uses today are and what what its future might be. There has been a lot of hype surrounding SandDance but not a lot of clarity about where it is positioned in the Power BI story; to be honest I’m still not quite sure where it fits myself and I wouldn’t be surprised if Microsoft doesn’t know either, or at least is keeping its options open.

One thing that is worth pointing out is that it comes from Microsoft Research and is released through Microsoft Garage which is, and I quote, an “outlet for experimental projects”. This suggests that it isn’t a polished product but more of a work-in-progress or an experimental platform. This certainly matches my impressions of the tool and those of Ruth Pozuelo and Alon Brody, who have blogged about it already: in many respects it’s very sophisticated but in others it is quite limited. Will it ever become an ‘official’ product? Other tools have followed this path: you may remember Power Query was originally an experimental project called Data Explorer and released though a site called Azure Labs, a predecessor to the Microsoft Garage site, so it is possible.

Another aspect of the SandDance story that deserves discussion is whether it’s just another custom visualisation or something more. This post on the Power BI blog talks about is as though it’s the former and I guess you could see it just as a way of accessing a lot of new chart types (such as small multiples) for your reports. The charts its creates are certainly eye-catching, as are the animated transitions, and the importance of that  – especially for sales demos – should not be underestimated.

image

However, it seems clear to me that SandDance is really an interactive visual data exploration tool, and indeed this is what the SandDance website suggests:
”SandDance is a web-based application that enables you to more easily explore, identify, and communicate insights about data.”
Microsoft doesn’t currently have any other products that compete in this sector: Power BI reports and dashboards are for publishing pre-defined, semi-static insights rather than true ad-hoc analysis, and while Excel PivotTables are great for starting with a blank sheet and exploring your data, they are certainly not visual; I don’t think Excel PivotCharts are a true visual exploration tool either, more of a visual representation of data in a PivotTable. Does Microsoft need a product in this area? I think it does if it wants to compete directly with Tableau, the gold standard in visual data exploration. Adding SandDance to Power BI makes Power BI a much more rounded product.

A third question is this: why is there a standalone version of SandDance and a Power BI custom visual? This blog post contains an interesting statement from Steven Drucker, principal researcher on the SandDance team:
“Using the Microsoft Garage as the release platform gives us the freedom to run experiments with the more accessible standalone version, and as we learn what you like and what works, we can add the right parts to the Power BI visual,”
This strongly suggests that the standalone version is really just a place for testing new functionality and that the Power BI custom visual is the main focus. Does this contradict the point I made above, and is it just the standalone version that is the ‘experimental’ tool? I’m not sure, because at the moment there don’t seem to be many differences in functionality between the two versions. We’ll have to see how things develop. This statement also suggests that if SandDance does grow up to be a real product, it will be as part of Power BI. This makes commercial sense – every new Microsoft BI product should be integrated with Power BI in my opinion. What’s more, many of SandDance’s current limitations (for example around loading and refreshing data) are solved by using the capabilities of the Power BI platform.

However I’m not sure integrating SandDance into Power BI as a custom visualisation, or rather only as a custom visualisation, is a good idea. At the moment the SandDance custom visualisation feels a bit awkward to use: it’s one tool embedded inside another with two inconsistent and often overlapping UIs. I would prefer to see it as a separate tool launched from the PowerBI.com portal, similar to how the original Power View is/was launched from SharePoint, a third way to interact with data stored in Power BI alongside regular Power BI reports and Excel reports. Users should be able to launch it in the same way as Analyze in Excel and use it to explore a data set directly without having to create a report first, and if they find something interesting they should be able to pin what they have created as a visual to a dashboard, or save it for use in a regular Power BI report. Doing this would require a lot more time and effort on the part of Microsoft than just building a custom visual, but at the moment there seems to be no shortage of resources available to the Power BI team. SandDance is undoubtedly a great first step but with some more investment from Microsoft it could be a much more important part of the Power BI story.

Dynamic Chart Titles In Power BI

UPDATE April 2019: It is now possible to use DAX expressions such as the ones described in this post directly in the Title property of a visual. See https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-april-2019-feature-summary/#dynamicTitles

As you probably know, charts (and lots of other visualisations) in Power BI have titles that can be set to any piece of static text. You can do this by selecting the chart, going to the Format tab in the Visualizations pane, then changing the properties in the Title section as shown below (full documentation here):

image

But what about if you want the chart title to change depending on what is selected? For example, you might be using slicers or filters to allow a user to choose which days of the week they want to see data for. In that situation you might want to add a title that shows which days of the week have actually been selected; this would be particularly important if the report uses filters, or if the report is going to be printed. Unfortunately the built in Title Text property can’t be used to display dynamic values but in this blog post I’ll show you how to solve this problem using DAX.

Here’s a simple example of a report that contains a dynamic chart title:

image

Using data from the Adventure Works DW database I’ve created a simple data model containing a Date dimension table called DimDate and a fact table called FactInternetSales; the DimDate table contains a field called EnglishDayNameOfWeek that contains the names of the days of the week, and the report contains a column chart that shows a Sales measure broken down by day of week. There’s also a slicer where the user can select one or more day and at the top there’s a title that lists the day names selected in the slicer and displayed in the chart.

There are two parts to the solution. The first part is to create a measure that will return the text needed for the chart title, and this relies on the DAX ConcatenateX() function that I blogged about here. Here’s the DAX for the measure:

Title =

"Sales Amount for "
& CONCATENATEX (
VALUES ( 'DimDate'[EnglishDayNameOfWeek] ),
'DimDate'[EnglishDayNameOfWeek],
", "
)

Here, the Values() function is used to return a table containing all of the selected days of the week, and this is then passed to ConcatenateX() to get a text value containing a comma delimited list of the day names.

The second part of the solution deals with how to display the value returned by the measure. In the report above I used a Card visualisation, dropped the measure above into the Field area and then turned off the Category Label on the Format tab so that only the value returned by the measure, and not the name of the measure itself, is displayed:

image
image

And this is all you need to do to recreate the report above.

We can make this better though! Instead of a simple comma delimited list of day names it would be better if we could change the last comma in the list to an “and”:

image

Also, if all the day names were displayed, it would be good not to display a long list of day names but show some default text instead:

image

Here’s the DAX for a measure that does all this:

Title2 =

VAR SelectedDays =
VALUES ( 'DimDate'[EnglishDayNameOfWeek] )
VAR NumberOfSelectedDays =
COUNTROWS ( SelectedDays )
VAR NumberOfPossibleDays =
COUNTROWS ( ALL ( 'DimDate'[EnglishDayNameOfWeek] ) )
VAR AllButLastSelectedDay =
TOPN ( NumberOfSelectedDays - 1, SelectedDays )
VAR LastSelectedDay =
EXCEPT ( SelectedDays, AllButLastSelectedDay )
RETURN
"Sales Amount "
& IF (
NumberOfSelectedDays = NumberOfPossibleDays,
"By Day Of Week",
"For "
& IF (
NumberOfSelectedDays = 1,
"",
CONCATENATEX (
AllButLastSelectedDay,
'DimDate'[EnglishDayNameOfWeek],
", " )
& " And "
)
& LastSelectedDay
)

Using a series of DAX variables to make the code more readable, here’s what this measure does:

  • If the number of days selected is the same as the total number of possible days, return the title text “By Day Of Week”, otherwise
    • If two or more days have been selected, then return a comma delimited list containing all but the last selected day (I used TopN() to get that table of all but the last selected day) plus a trailing “ And “. If only one day has been selected, return an empty string. Then
    • Concatenate the last selected day to the text returned by the previous step. I’ve used the Except() function to find the day that was excluded by the TOPN() function in the previous step.

You can download a .pbix file containing all the code from this post here and I’ve published the report here.

Profiler, Extended Events And Analysis Services

Last week one of the attendees on my SSAS cube design and performance tuning course in London told me that he had been prevented from running a Profiler trace on his SSAS Multidimensional instance by a DBA because “he should be using Extended Events”. He wasn’t too pleased, and I can understand why. This, plus the recent discussion about Profiler and Extended Events for the SQL Server relational engine provoked by Erin Stellato’s recent blog post on the subject, made me think it was worth writing a few words on this subject myself.

Microsoft is clear that in the long term, Extended Events (also commonly known as XEvents) are the replacement for Profiler. This page shows Profiler listed as functionality that will not be supported in a future version of SQL Server – although, importantly, it is not deprecated yet. Profiler is still officially supported for SSAS and the SQL Server relational engine in SQL Server 2016. What’s more, in my opinion it will be a long time before anyone doing serious performance tuning work with SSAS will be able to forget about Profiler and use Extended Events exclusively. Let me explain why.

First of all there is the fact that support for Extended Events in SSAS was only introduced with SSAS 2012. If you are using an earlier version you can’t use them. Even if you have some instances of 2012 or 2014 and some on earlier versions the desire to have a consistent set of tools for performance analysis and monitoring means you probably won’t want to use Extended Events yet.

Then there is the fact that in both SSAS 2012 and 2014 there is no user interface for working with Extended Events– instead, to create, start and stop a trace session you have to use XMLA commands. There are plenty of blog posts out there explaining how to do this but it’s still incredibly time-consuming and fiddly to do. Even in SSAS 2016, where there is a user interface for working with Extended Events and viewing their output, it’s pretty awful and nowhere near as good as Profiler (which is far from perfect itself, but at least useable). Perhaps at some point someone in the community will create a user-friendly tool for working with Extended Events, in the same way that the community created DAX Studio to make up for the shocking fact that even in SQL Server 2016 there is no proper support for running DAX queries in SQL Server Management Studio. I would prefer it if Microsoft did the job itself, though, and started taking tooling for BI seriously.

Thirdly, if you want to do anything useful with the .xel files beyond open them up in SQL Server Management Studio, you’re going to need to use some TSQL and functions like sys.fn_xe_file_target_read_file. What happens if you don’t have an instance of the SQL Server relational engine handy though? Most SSAS shops use SQL Server as their data source, but not all – some use Oracle, or Teradata, or other relational databases, and for them installing an instance of SQL Server somewhere just to work with .xel files many not be an option.

Ah, you say, but on the other hand Extended Events have many advantages over Profiler traces: for example, they are much more lightweight! As Adam Saxton says here:

[Extended Events] won’t have the same impact on performance that a traditional Profiler Trace has. For example, it is reported that 20,000 events/sec on a 2ghz CPU with 1GB of RAM takes less than 2% of the CPU.

If I was building a monitoring application (something like, say, SQL Sentry’s excellent Performance Advisor for Analysis Services) then this might be relevant. But 99% of the time I’m not running a Profiler trace on a Production server, I’m working on a dev or test server, and I always try to prevent other people doing stuff on a server while I’m doing tuning work too, so this is irrelevant. It’s a mistake to assume that Analysis Services users use Profiler for the same kinds of thing that SQL Server relational engine users do. For me, I use Profiler to get roughly the same kind of information that a SQL Server developer gets from a query plan: I use it to find out what’s going on in the engine when I run a single query, so the performance overhead is not something I care about.

That said, it certainly seems to be the case that Extended Events will provide more information than I can get from a Profiler trace and allow me to do more things with that data than I can with Profiler. In SSAS 2016 there are several events that are only available via Extended Events and not via Profiler, although I have no idea what they do; I’m sure, with a bit of research, I can find out. Will any of them be useful? I have no idea yet but I suspect a few will be.

Don’t get me wrong, I think Extended Events are a great technology and something all SSAS developers and administrators should learn. There’s still a lot of UI work to do by Microsoft before they are in a position to replace Profiler, but as I said earlier Microsoft hasn’t deprecated Profiler yet so it has given itself a lot of time to do this work. My only problem is with people like the aforementioned DBA who go around telling people they should be using Extended Events with SSAS right now because that’s what they’ve heard is the best practice and that’s what the recommendation is for the SQL Server relational engine. I’ll still need to use Profiler for a few more years yet.

Monitoring SSAS Multidimensional Non Empty Filtering Using Profiler, Part 3

In Part 1 of this series I introduced the different types of non empty filtering that occur in Analysis Services Multidimensional and in Part 2 I showed how you can use monitor this activity using Profiler. In this, the final part of the series, I’m going to show some examples of how you can use this information while tuning MDX queries.

Let’s start by looking at the following query:

[sourcecode language=”text” padlinenumbers=”true”]
SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])
[/sourcecode]

 

It returns 19004 rows – all of the combinations of Customer and Subcategory that have a value in the year 2003:

image

Here’s what you can see in Profiler:

image

There are two Non Empty operations here: the ProgressTotal column shows that first is the NON EMPTY statement on the rows axis; the second we can ignore because it’s the evaluation of the WHERE clause. The Duration column shows that the first Non Empty operation takes just 54ms and the query as a whole takes 1021ms.

Now, let’s make things a bit more complicated by adding an extra filter so we only see the Customer/Subcategory combinations where Internet Sales Amount is less than $10:

[sourcecode language=”text” highlight=”5,10″]
SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
FILTER(
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
,
[Measures].[Internet Sales Amount]<10)
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])
[/sourcecode]

image

Here’s what Profiler shows:

image

The query now takes 2512ms. But why is it slower? The obvious assumption to make is that it’s the Filter() that has slowed things down, but it looks like the Filter() and the NON EMPTY are now being evaluated as a single operation because the first Non Empty operation in the trace is now taking 2408ms – the majority of the query duration.

Removing the NON EMPTY statement from the rows axis and putting the logic to filter out the customers with no sales into the Filter() function, like so:

[sourcecode language=”text”]
SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
FILTER(
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
,
[Measures].[Internet Sales Amount]<10
AND
(NOT ISEMPTY([Measures].[Internet Sales Amount])))
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])
[/sourcecode]

 

…only makes things worse, increasing query duration to 4139ms. This confirms our suspicion that Filter() is the problem here and that NON EMPTY can remove the empty customers faster than Filter() can.

The problem with the last query but one is that the NON EMPTY statement is being applied after the Filter(). Wouldn’t it be faster to remove the empty customers first and then filter out the ones where Internet Sales Amount is less than $10, so the slower Filter() can be applied over a smaller set?

There are two ways we can do this. First of all, we can use the NonEmpty() function instead of the NON EMPTY statement to remove the empty customers. NonEmpty() is not faster than the NON EMPTY statement per se, but it does allow us to change the order that the different types of filtering are applied here, and that can make all the difference to performance. Here’s a new version of the query:

[sourcecode language=”text”]
SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
FILTER(
NONEMPTY(
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS,
[Measures].[Internet Sales Amount]),
[Measures].[Internet Sales Amount]<10)
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])
[/sourcecode]

image

Query duration is now down to 217ms and the first Non Empty operation is only 57ms.

There’s another way of doing this. For MDX geek-points you could use the ultra-obscure HAVING clause in your query to do the filtering after the NON EMPTY, like so:

[sourcecode language=”text” highlight=”8,9″]
SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
HAVING
[Measures].[Internet Sales Amount]<1000
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])
[/sourcecode]

From what I can see, the HAVING clause performs a few milliseconds faster than the previous query – measurable but not something a user would notice. I also tested a variation on Mosha’s classic calculated measure approach for Count/Filter optimisation but that performed worse than the two previous queries.

Timing M Query Execution In Power Query And Power BI (Revisited)

Some time ago I wrote the following post on how to work out how long your M query takes to execute in Power Query:

http://blog.crossjoin.co.uk/2014/11/17/timing-power-query-queries/

While it’s still relevant for Power Query and now of course Power BI (and it also illustrates some interesting facts about query execution), recently I had an interesting discussion with Colin Banfield in the comments section of that post that led to us finding an easier way of measuring query execution times.

In M, the DateTime.LocalNow() function returns the current system date and time at the time the function is called – which means that you could call it multiple times in the same query and get different results. There’s also the DateTime.FixedLocalNow() function which also returns the system date and time; the difference between the two is, as the documentation says:

This value is fixed and will not change with successive calls, unlike DateTime.LocalNow, which may return different values over the course of execution of an expression.

The question here is, though, what time does DateTimeFixedLocalNow() actually return? I asked on the Power Query forum here and Ehren from the Power Query dev team revealed that it returns the system date and time at the point when the query begins.

This means that it can be used to simply the original method I used to find query execution. Here’s a query that uses Function.InvokeAfter() to create a delay of 5 seconds during query execution and returns the difference between the values returned by DateTime.LocalNow() and DateTime.FixedLocalNow():

[sourcecode language=”text” padlinenumbers=”true”]
let
SomeFunction = ()=> DateTime.LocalNow(),
InvokeAfter5Secs =
Function.InvokeAfter(SomeFunction, #duration(0,0,0,5)),
Output =
Duration.ToText(InvokeAfter5Secs – DateTime.FixedLocalNow())
in
Output
[/sourcecode]

image

If you’re using this on a real query I strongly recommend you read my original post carefully and make sure that all of the steps in your query are executed, but does make things a little bit easier.

Monitoring SSAS Multidimensional Non Empty Filtering Using Profiler, Part 2

In part 1 of this series I introduced all of the different types of non empty filtering that can occur in Analysis Services and MDX. In this post I’ll show you how you can monitor each of these types of non empty filtering using Profiler using the queries from part 1 as examples.

Profiler events

The three Profiler events we’re going to be most concerned with here are:

  • Calculate Non Empty Begin, which is raised when the engine starts a non empty filter
  • Calculate Non Empty Current, which will be raised one or more times when the non empty filter is in progress and which will give us more detail about what’s happening
  • Calculated Non Empty End, which is raised when the engine finishes evaluating a non empty filter

I’ve also included the Query Begin/End events in my traces just to show when query execution starts and finishes. All the queries in this post will be run on a warm cache; although the values used for non empty filtering can be cached, SSAS can’t cache the output of a non empty filter (unless you do something like this) so for our purposes whether the cache is warm or cold is irrelevant.

image

NON EMPTY and NONEMPTY()

In the simplest cases SSAS treats the NON EMPTY statement and the NONEMPTY() function the same, so let’s look again at the following query from my previous post:

[sourcecode language=”text” padlinenumbers=”true”]
SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
NON EMPTY
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]
[/sourcecode]

image

Here’s what our Profiler trace shows us when this query is run:

image

There are several things to notice here:

  • The Duration column tells us how long, in ms, both the query took to run (on the Query End line) and the non empty filter took to evaluate (on the Calculate Non Empty End line).
  • The EventSubclass column, for the Calculate Non Empty Current events, shows the different stages of evaluation. The possible values here are:
    • 1 – cell values are evaluated for the filtering
    • 2 – calculated members are evaluated (this is no longer used from SSAS 2008 on)
    • 3 – the tuples are rendered in the resultset
  • The IntegerData column shows what type of non empty filtering is taking place. The possible values here are:
    • 1 – All non empty filtering using NON EMPTY and NONEMPTY() using a fast algorithm (as shown in the screenshot above)
    • 2 – EXISTING operations using a fast algorithm.  This is a bit misleading though, because due to an internal bug this event will not fire for most uses of the EXISTING statement but will fire for some operations related to regular autoexists.
    • 3 – Autoexists operations using a fast algorithm
    • 11 – All non empty filtering using NON EMPTY and NONEMPTY() using a slower algorithm, for when complex calculations need to be evaluated
    • 12 – as (2) but with the slower algorithm (theoretically possible but should never occur in practice)
    • 13 – as (3) but with the slower algorithm (again, theoretically possible but should never occur in practice)
  • The ProgressTotal column for the Calculate Non Empty Current (as with the Serialize Results Current event) and EventSubclass 1 should return the number of tuples evaluated for the non empty filter. If the number of tuples is greater than 1000 you’ll see multiple events, one for each 1000 tuples, with the last event showing the overall total. Unfortunately there’s a bug here that means it returns one less than the actual number of tuples evaluated. What’s more, in most cases, the non empty filter operations in the SSAS engine are difficult to match to whatever’s going on in the query, so it’s only really useful as a relative measure of how expensive the operation is.

Putting this all together, for this query we can see that there was one non empty filter operation, it was using the fast NON EMPTY algorithm, the ProgressTotal column shows 3 tuples were returned (ie the three Categories on rows) and it took 1ms.

Autoexists

Now let’s look at the autoexists query from the previous post in this series:

[sourcecode language=”text” padlinenumbers=”true”]
SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
[Product].[Category].[Category].MEMBERS
*
[Product].[Color].[Color].MEMBERS
ON 1
FROM
[Adventure Works]
[/sourcecode]

image

Here’s what Profiler shows:

image

Even with what looks like a single autoexists operation there are three sets of Non Empty events here, two of which have IntegerData 3 (for autoexists) and one with IntegerData 2 (for EXISTING which, as noted above, is to be expected here with autoexists). The ProgressTotal column returns 26 for a query that returns 24 rows which I guess is near enough correct to be useful.

EXISTS() And Existing

As you might expect, the EXISTS() function produces a result very similar to a straightforward autoexists. Taking the EXISTS() query from my previous post:

[sourcecode language=”text” padlinenumbers=”true”]
SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
EXISTS(
[Product].[Category].[Category].MEMBERS
, {[Product].[Color].&[White]})
ON 1
FROM
[Adventure Works]
[/sourcecode]

image

You can see there’s just one Non Empty Begin/End event pair generated, with IntegerData 3, and ProgressTotal returns 0 which means that one tuple is returned:

image

Whereas the use of EXISTING inside a calculated measure, like so:

[sourcecode language=”text”]
WITH
MEMBER MEASURES.COLOURCOUNT AS
COUNT(EXISTING [Product].[Color].[Color].MEMBERS)
SELECT
{MEASURES.COLOURCOUNT}
ON 0,
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]
[/sourcecode]

image

Causes multiple Non Empty Begin/End pairs, very probably one for each member on the hierarchy based on the number of events and the values in the ProgressTotal column (the ProgressTotal values for the second, third, fourth and fifth non empty filters tally with the four values returned by the calculated measure; I don’t know what the first non empty filter is doing) :

image

The WHERE Clause And Subselects

Weirdly enough, the presence of a WHERE clause or a subselect in a query also triggers Non Empty Begin/End events [Subselects are something that a lot of SSAS developers get confused by, especially when it comes to the differences between a subselect and the WHERE clause – I recommend watching this short video if you’re unsure of what they do].

The following query with a WHERE clause generates a single Non Empty Begin/End event pair:

[sourcecode language=”text”]
SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
[Product].[Color].[Color].MEMBERS
ON 1
FROM
[Adventure Works]
WHERE([Customer].[Total Children].&[5])
[/sourcecode]

image

image

Probably the only value in knowing that this happens is that you can ignore it when you see it.

A query with a subselect instead of a WHERE clause, while it returns exactly the same results in this case, produces different activity in Profiler:

[sourcecode language=”text”]
SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
[Product].[Color].[Color].MEMBERS
ON 1
FROM
(SELECT {[Customer].[Total Children].&[5]} ON 0
FROM
[Adventure Works])
[/sourcecode]

image

image

Again, if you have a subselect in your query (they are very common in MDX generated by SSRS and Excel) you should ignore this activity. In both cases you’ll note that the IntegerData column shows 3 for autoexists and ProgressTotal shows 0.

Summary

What’s clear from these examples is that trying to relate what’s going on in the query to what you see in Profiler is quite tricky even for seemingly simple queries; for most real-world queries it would be almost impossible to do so with total confidence. That said, when I’m tuning queries I usually comment out large parts of the code to try to isolate problems, thus creating much simpler queries, and I hope the value of this post will lie in you being able to spot similar patterns in Profiler to the ones I show here when you do the same thing. In part 3 of this series I’ll show you some practical examples of how all this information can help you tune your own queries.

Thoughts On The Power BI Announcements At The MS Data Insights Summit

I’m not at the Microsoft Data Insights Summit, unfortunately, but I watched the live stream of the keynote today and was duly overwhelmed by the new features announced. I’m not going to summarise what was said because there’s already a post on the official Power BI blog that does that here:

https://powerbi.microsoft.com/en-us/blog//over-5-million-subscribers-are-embracing-power-bi-for-modern-business-intelligence/

…but I wanted to pick out a two topics for more in-depth discussion.

Query Power BI Datasets from Excel

For me, by far, the biggest announcement made today was that by the end of this month we’ll be able to connect an Excel PivotTable up to a Power BI dataset and query it as if it was an Analysis Services cube or a local Power Pivot Data Model. In my opinion this a true killer feature. For the last few years my customers have asked me when MS was going to release SSAS in the cloud and I’ve always replied that Power BI is SSAS in the cloud – it’s just tightly coupled with a front-end right now. That makes things simple in some ways but it also causes problems:

  • You have to build your data model and reports all in the same file, which is a pain because when you want to update one you need to deploy both
  • Other report authors who want to use your model for their reports need to get hold of your .pbix file and edit it
  • Everyone has to use Power BI to build reports on data stored in Power BI

This move solves these problems. It makes manageability much easier: you can build a Data Model, publish it, then you and other developers can build reports on it separately. What’s more it means that all those users who prefer to use Excel, either because it’s the tool they are most comfortable with or because it’s the best option for building the reports they want to build (Excel has a lot of great functionality that Power BI reports don’t have right now and will probably never have), to build reports can do so without having to copy the data into Excel.

Even better, as Amir hinted, this isn’t going to just work with just Excel. If this works the way I assume it works you’ll be able to use all the other MDX-capable front-end tools out there, like SSRS or Pyramid, to query your Power BI data, meaning that you have even more choices for consuming data held in Power BI.

SandDance: A symbol of how MS is finally exploiting all of its assets

SandDance is undoubtedly a cool data visualisation tool and will be great for creating ‘wow’ demos. It’s also not, as the hype is suggesting, something new – I saw it on the Microsoft Research site at least two years ago. What is really important about SandDance is that it shows off another area that Microsoft has got right with Power BI: it has created a platform that other developers both inside and outside Microsoft has build on top of. In this case Microsoft Research had some eye-catching data visualisation technology but not something that made sense as a standalone tool (yes, I know it is actually available as a standalone tool but let’s face it, it wouldn’t make sense as a commercial product). Integrate this data visualisation technology with Power BI, though, and suddenly you’re part of a much richer product that is commercially viable. SandDance gets the chance to be used for real-world purposes by a lot more users; Power BI gets another great bit of functionality; it’s a win-win.

The ‘build a platform’ strategy is classic Microsoft. It’s the story of Windows. It has some disadvantages in that the different pieces might not always fit together as smoothly as it should (compare and contrast with Tableau, which could be seen as the Apple of the data visualisation world) but it gets useful, specialised features out there very quickly. Microsoft has a wealth of great technology scattered across its different divisions that, historically, it has struggled to bring together coherently, but it looks like Power BI is managing to buck the trend. The integration with Azure Stream Analytics and Azure SQL DW that exist today, the integration Power Apps that was demoed in the keynote, the use of machine learning in Quick Insights and the integration with Azure ML that I also saw mentioned on Twitter today, are other examples of this happening.

Microsoft also has a strong partner community in the BI space that did good business building solutions on the SQL Server BI stack; Power BI v1 did not offer the same opportunities and was neglected by partners as a result but Power BI today offers partners a lot more opportunities and so the they are back out there building and selling on Microsoft’s behalf again. It’s happening in the Custom Visuals Gallery, it’s happening with the Power BI API and apps like Power BI Tiles,  and it’s going to happen in a lot of other ways in the future too.

Monitoring SSAS Multidimensional Non Empty Filtering Using Profiler, Part 1

Filtering out empty values is something that SSAS does a lot of during query execution, and it is also a common cause of performance problems. In this series of posts (similar to my series earlier this year on results serialisation) I’ll look at the different types of non empty filtering that can occur in an MDX query, how they can be monitored using Profiler and what you can do to improve their performance.

Some of this information has come from an old white paper, but I’ve found that some of what that paper says is now out of date and I’m extremely grateful to Akshai Mirchandani of Microsoft for answering my questions on this subject. Even with the long-suffering Akshai’s help a lot of the information here is based on my own research and therefore potentially incomplete/inaccurate, potentially different for different versions of SSAS (I’m using SSAS 2014 for this series) and could potentially change again in the future, so take due care!

The first question to ask is: what counts as non empty filtering? There are actually several different operations that the engine treats as a non empty filter, a few of which I was surprised by; here are the ones I know about.

1) The NON EMPTY statement

Most MDX queries generated by client tools include a NON EMPTY statement on the rows and columns axis. For example, take a look at the results returned by this MDX query on the Adventure Works cube:

[sourcecode language=”text” padlinenumbers=”true”]
SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]
[/sourcecode]

image

As you can see, the Product Category Components has no data and returns a null. One way to remove the row for Components would be to add a NON EMPTY statement to the rows axis:

[sourcecode language=”text” highlight=”4″]
SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
NON EMPTY
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]
[/sourcecode]

image

2) The NONEMPTY() function

Often confused with the NON EMPTY statement, but not the same thing: the NON EMPTY statement can only be used on an axis in a SELECT statement, whereas the NONEMPTY() function can be used in any MDX expression. Continuing our example, here’s how to use it to remove the Component category:

[sourcecode language=”text” highlight=”4,6″ padlinenumbers=”true”]
SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
NONEMPTY(
[Product].[Category].[Category].MEMBERS,
{[Measures].[Internet Order Quantity]})
ON 1
FROM
[Adventure Works]
[/sourcecode]

The NONEMPTY() function is much more flexible than the NON EMPTY statement but essentially does the same thing – it isn’t any faster in what it does, but it does allow you to make certain assumptions about your data that can improve query performance (more of that later). One thing to remember is to always set the second parameter, because if you don’t you may get unexpected results.

There is also a NONEMPTYCROSSJOIN() function but it is deprecated and you should not be using it – everything that it does can be done more reliably with other functions.

3) Autoexists

Autoexists is not a feature of the MDX language but rather something that SSAS does automatically to remove tuples from a set that it knows must always be null. It’s described in great detail here, but it’s quite easy to illustrate. We already know from the queries above which categories have data; similarly the following query shows there is data for all colours except Grey and Silver/Black:

[sourcecode language=”text”]
SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
[Product].[Color].[Color].MEMBERS
ON 1
FROM
[Adventure Works]
[/sourcecode]

image

However if you crossjoin every category and every colour on the rows axis, you don’t see every combination of category and colour returned:

[sourcecode language=”text” padlinenumbers=”true”]
SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
[Product].[Category].[Category].MEMBERS
*
[Product].[Color].[Color].MEMBERS
ON 1
FROM
[Adventure Works]
[/sourcecode]

image

There is no row for the category Bikes and the colour White, for example. This is because the SSAS engine knows from the data in the dimension that no product exists that is both the colour White and in the category Bikes, so it doesn’t return that combination from the crossjoin – that particular tuple could never contain any data so there’s no point returning it. Notice that there are combinations, such as Components/Black, that exist in the dimension and are present on the rows axis but still return null because there is no value for Internet Order Quantity.

It’s important to remember that autoexists only takes place when you are working with sets of members from different hierarchies on the same dimension, never with sets of members from different dimensions.

4) The EXISTS() function and the EXISTING keyword

The EXISTS() function and the EXISTING keyword allow you to take advantage of autoexists for filtering inside your own expressions without having to actually do a crossjoin (there’s another variant of EXISTS() with a third parameter that behaves more like NONEMPTY() but it’s very rarely used so I’m going to ignore it).

For example, here’s how you can use the EXISTS() function to return all the categories that have a product that is White:

[sourcecode language=”text”]
SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
EXISTS(
[Product].[Category].[Category].MEMBERS
, {[Product].[Color].&[White]})
ON 1
FROM
[Adventure Works]
[/sourcecode]

image

The EXISTING keyword is used within calculations to apply autoexists filtering to a set based on the other hierarchies from the same dimension. The following query contains a calculated measure that counts the number of members on the Color level of the Color hierarchy, and unsurprisingly returns the same value each time it’s called:

[sourcecode language=”text”]
WITH
MEMBER MEASURES.COLOURCOUNT AS
COUNT([Product].[Color].[Color].MEMBERS)
SELECT
{MEASURES.COLOURCOUNT}
ON 0,
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]
[/sourcecode]

image

However, if you add the EXISTING keyword just before the set in the calculated measure definition, like so:

[sourcecode language=”text” highlight=”3″]
WITH
MEMBER MEASURES.COLOURCOUNT AS
COUNT(EXISTING [Product].[Color].[Color].MEMBERS)
SELECT
{MEASURES.COLOURCOUNT}
ON 0,
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]
[/sourcecode]

Then you’ll see that the calculation now returns the number of members on the Color level of the Color hierarchy after autoexists filtering has been applied; so for example the first line shows there are five distinct colours associated with the Category Bikes:

image

Summary

OK, after that somewhat lengthy introduction, in part 2 I’ll show you how to use Profiler to monitor what’s going on inside SSAS when you do all of these different types of non empty filtering.

Generating Fixed-Width Text Files In Excel With Power Query

While it’s fairly common to need to load fixed-width files using Power Query or Power Query (and there’s a nice walkthrough of how to do this here), occasionally you might want to use Power Query and Excel to create a fixed-width output for another system, or maybe to create some test data. You might not want to do it often but I can imagine that when/if Power Query is integrated into SSIS this will be a slightly less obscure requirement; at the very least, this post should show you how to use a couple of M functions that are under-documented.

Before we begin, a quick tip: when working with any fixed width file always check the Monospaced box on the View tab in the Query Editor window, to make it easier to just the widths. You’ll notice I’ve done this in all the screenshots in this post.

image

For source data, let’s take the following expression which returns a table:

[sourcecode language=”text” padlinenumbers=”true”]
#table(
type table[Month=text,Product=text,Sales=text,Comments=text],
{
{"Jan","Apples","1","Not good"},
{"Feb","Pears","2","Better"},
{"March","Oranges","50000","Great!"}
})
[/sourcecode]

 

image

Of course this could be any table of data that you choose, it’s just that using an expression like this makes the examples easier to follow. The only thing to notice here is that all the columns have a data type of text, even the Sales column: if your table doesn’t, I suggest you convert all the data types to text before you start – it will save you having to do it later.

The next thing you’ll need to do is to create a custom column that returns a list containing all the values from each of the columns from the current row. The expression to use here is:

[sourcecode language=”text”]
Record.FieldValues(_)
[/sourcecode]

For more information about what the underscore means here, see my previous post; basically it refers to the values in the current row.

Finally, you need to concatenate all the values in this list (which should all be text already, of course) into a single piece of fixed-width text. To do this we’ll need to use a Combiner function – a function that returns a function that, in turn, combines multiple pieces of text together in a specific way. I’ve already blogged about the Combiner.CombineTextByDelimiter() function here but in this case we’ll be using Combiner.CombineTextByRanges() which is a little more complex. You can’t use Combiner.CombineTextByRanges() to combine text directly, you have to call it to get a function that combines text in the way you want and then use this function on your data. To do this you need an expression like:

[sourcecode language=”text”]
Combiner.CombineTextByRanges({{0,3},{9,10},{19,10},{29,255}})
[/sourcecode]

How does this work? For each line, remember, we now have a value of type list that contains all of the values from the four columns in our table. The expression above returns a function that takes a list and constructs a single line of text from it. The list passed to the first parameter consists of a series of lists of two integers, and each value pair gives the position to place each value on the line and the number of characters to allow. So, in this example, the first value in the input list is put at position 0 on the line and is given a space of 3 characters; the second value in the input list is put at position 9 and given a space of 10 characters; the third value in the input list is put at position 19 and given a space of 10 characters; and the fourth value in the input list is put at position 29 and given a space of 255 characters.

This function can now be called in a second custom column to get the  combined text for each line. Here’s the full M code for the query:

[sourcecode language=”text”]
let
Source = #table(
type table
[Month=text,Product=text,Sales=text,Comments=text],
{
{"Jan","Apples","1","Not good"},
{"Feb","Pears","2","Better"},
{"March","Oranges","50000","Great!"}
}),
ListOfValues = Table.AddColumn(
Source,
"List Of Values",
each Record.FieldValues(_)),
MyCombinerFunction = Combiner.CombineTextByRanges({
{0,3},
{9,10},
{19,10},
{29,255}
}),
OutputColumn = Table.AddColumn(
ListOfValues,
"Output Column",
each MyCombinerFunction([List Of Values]))
in
OutputColumn
[/sourcecode]

And here’s the output:

image

One thing to notice: on the third line, the month name March has been truncated to Mar because we only specified a space of three characters for the month name.

image

There’s an optional second parameter to Combiner.CombineTextByRanges() that also needs some explanation. By default, the function that Combiner.CombineTextByRanges() returns will place your text on an otherwise blank line. However you can get it to place your text on a line that contains other characters apart from blank space. For example, if you alter the MyCombinerFunction step in the query shown above to be:

[sourcecode language=”text” padlinenumbers=”true” highlight=”7″]
MyCombinerFunction = Combiner.CombineTextByRanges({
{0,3},
{9,10},
{19,10},
{29,255}
}
,Text.Repeat("#",285)
),
[/sourcecode]

You get:

image

The Text.Repeat() function, as used in the expression above, returns a piece of text with the # character repeated 285 times, and that piece of text is used as the ‘background’ on which the values are superimposed.

Of course, now you’ve got your desired output all you need to do to create a fixed width file is to remove all other columns apart from Output Column in your query, load your query to a table in the Excel worksheet, turn off the header row for the table by unchecking the Header Row option on the ribbon:

image

…and then save the workbook in a .txt file format.

You can download the sample workbook for this query here.

Dynamic Column Selection In Custom Columns In Power BI/Power Query/M

When you’re writing an M expression for a custom column in Power BI or Power Query it’s easy to reference the values in other columns. A slightly more difficult question, however, is how can you reference column values dynamically? For example, given the following input table:

image

How can you use the values in the “Column To Select” column to dynamically select values from either Column 1, Column 2 or Column 3? For example, on the first line of the table the “Column To Select” column contains the value 2, so the calculated column should contain the value from “Column 2”; on the second line of the table the “Column To Select” column contains the value 1, so the calculated column should contain the value from “Column 1” and so on:

image

There are a number of different ways to achieve this. You could, for instance, write a nested if or do some other kind of fancy conditional logic in M, but this could result in a lot of code. You could also unpivot all the columns except “Column To Select”, do some filtering, then pivot the data again but that would also be quite complicated. Probably the easiest way of doing this is with the Record.Field() function, which allows you to get the value of any given field in a record by passing the name of that field as a text value.

Here’s an example query that generates the table shown in the first screenshot above in its first step, sets some data types on the columns in the second step, then creates the custom column shown in the second screenshot in the final step:

[sourcecode language=”text” padlinenumbers=”true” highlight=”18,19,20″]
let
Source = #table(
{"Column 1", "Column 2",
"Column 3", "Column To Select"},
{{10,20,30,2},
{30,40,50,1},
{50,60,70,3}}),
#"Changed Type" = Table.TransformColumnTypes(
Source,
{{"Column 1", Int64.Type},
{"Column 2", Int64.Type},
{"Column 3", Int64.Type},
{"Column To Select", Int64.Type}}),
#"Added Custom" = Table.AddColumn(
#"Changed Type",
"Selected Column Value",
each
Record.Field(
_,
"Column " & Text.From([Column To Select])))
in
#"Added Custom"
[/sourcecode]

Here’s the expression for the custom column isolated, in the form that you would use in the Add Custom Column dialog:

[sourcecode language=”text”]
Record.Field(_, "Column " & Text.From([Column To Select]))
[/sourcecode]

image

Understanding how to use the Record.Field() function here leads us to an interesting side discussion about custom columns, functions, records and each expressions. The full M expression for the third step in the query is:

[sourcecode language=”text”]
Table.AddColumn(
//the table returned by the previous step
#"Changed Type",
//the name of the new custom column
"Selected Column Value",
//a function to call for each row of the table
//returning the value to display in the custom column
each Record.Field(_, "Column " & Text.From([Column To Select])))
[/sourcecode]

The first two parameters of the Table.AddColumn() function are straightforward but if you read the documentation you’ll see that the third parameter requires a value of type function. This function is called for every row in the table and returns the values in the custom column. It doesn’t look like it but there is a new function being defined here. In M an each expression can be used to define a function that has no name and that takes one parameter whose name is _ (ie the underscore character). The Table.AddColumn() function passes this new function a value of type record representing all of the values in the current row of the table, and that means Record.Field() can access this record by using _. What’s more, when referring to fields in this record you don’t even have to say _[Column To Select], you can drop the underscore and just say [Column To Select], as in the code example above. All this means that the expression

[sourcecode language=”text”]
each Record.Field(_, "Column " & Text.From([Column To Select]))
[/sourcecode]

is basically the same as

[sourcecode language=”text”]
(_)=> Record.Field(_, "Column " & Text.From(_[Column To Select]))
[/sourcecode]

…which, if you know a bit of M, makes a lot more sense but for the average user is probably more intimidating.