Running M Queries In Visual Studio With The Power Query SDK

Writing M in the Advanced Editor in Excel or Power BI can be a frustrating experience unless you’re the kind of masochist who loves writing code in Notepad. There are some options for writing M code outside Excel and Power BI, for example Lars Schreiber’s M extension for Notepad++ (see here for details) or the M extension for Visual Studio Code (available from the Visual Studio Marketplace here; more details on Brett Powell’s blog here), but the trouble with them is that you have to copy the code back into Excel or Power BI to run it. What many people don’t realise, however, is that the Power Query SDK for Visual Studio lets you write M code with IntelliSense, formatting and keyword highlighting – and also execute your own M queries.

The Power Query SDK (which you can download here) supports Visual Studio 2015 and 2017 and is intended for people who are writing custom Data Connectors for Power BI. To let you test your Data Connector you can create a .pq file containing M code, and this in fact allows you to run any M query you want whether you’re building a Data Connector or not.

Here’s how. First, install the Power Query SDK and then open Visual Studio and create a new project. Find the Power Query template, select the PQ file option and give your file a name:

image

Then, in the .pq file that is created, you can enter an M query and then either press the Start button on the toolbar or hit F5 to run the query. The output of the query is shown on the Output tab in the M Query Output window:

image
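For example, a trivial query like the following – my own illustration rather than anything from the SDK samples – can be pasted into the .pq file and run, since it builds its data in memory and needs no data source or credentials:

[sourcecode language="text"]
let
    // build a small table in memory so no external data source is needed
    Source = #table(
        type table [Fruit = text, Sales = number],
        {{"Apples", 10}, {"Oranges", 20}, {"Pears", 15}}
    ),
    // return the total of the Sales column
    Output = List.Sum(Source[Sales])
in
    Output
[/sourcecode]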

Right-clicking on your project in the Solution Explorer pane and selecting Properties brings up a Properties dialog with various properties that control how your queries behave:

image

image

Many of the properties are self-explanatory, at least for anyone used to writing M in Power BI or Excel. FastCombine turns off data privacy checks. Allow Native Query lets you use M queries that contain ‘native’ queries (for example your own SQL queries if you’re using a SQL Server data source), as Cédric Charlier shows here. A few of them, such as Legacy Redirects, I have no idea about yet (I should really ask someone…). Turning on Show Engine Traces displays engine trace information in the Log tab of the M Query Output pane; turning on Show User Traces displays trace information generated by calls to the Diagnostics.Trace() function in your code, also in the Log tab. You can save the contents of the Log tab to a text file.
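As a rough sketch of what a query using Diagnostics.Trace() might look like (my own example – check the documentation for the exact parameters), the following writes a message to the Log tab when Show User Traces is turned on:

[sourcecode language="text"]
let
    Source = {1..100},
    // wrap the real work in Diagnostics.Trace() so that a message is written
    // to the Log tab when Show User Traces is enabled
    Output = Diagnostics.Trace(
        TraceLevel.Information,
        "Summing the list now",
        () => List.Sum(Source),
        true // delayed = true, so the third argument is a function to be evaluated
    )
in
    Output
[/sourcecode]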

image

Error messages are displayed on the Errors tab of the M Query Output pane:

image

When you have a query that connects to an external data source, the first time you try to run your query you will be prompted to set the credentials used to connect to that data source (as you would in Power BI Desktop), and the data privacy level for the data source, on the Errors pane:

image

The query won’t actually run this first time though; you’ll need to hit Start/F5 again to see the results. If you close the project and then reopen it you will need to enter credentials again; alternatively, on the Credentials tab you can save the credentials used for a data source to a .crd file which can then be reloaded when you reopen your project. You can also edit and delete credentials on the Credentials tab.

image

If I’m honest it’s all very basic but it does the job. The main thing that I miss from writing M code in Power BI is the Query Editor UI – when I write M code there I only write about 50% of it manually, the rest I generate by clicking buttons in the UI because it’s faster. Give me the Query Editor (or the ribbonless version of it that comes with SSDT, because Visual Studio doesn’t support ribbons apparently) and I’ll be happy. Even better, give me the improved code editing functionality in the Advanced Editor in Power BI Desktop and Excel that we’ve been promised!

Creating Excel “Data Dump” Reports From Power BI

We’ve all met them: the users whose idea of a ‘report’ is a large table of data dumped into Excel. However many beautiful data visualisations they see, however many times you try to convince them of the benefits of using Power BI to build reports, they just want to know where the “Export to Excel” button is so they can carry on analysing data in the same old way. Sometimes there’s a valid reason for doing this, sometimes not, but all too often internal politics means that you have to accommodate them.

If you’re using Power BI the obvious way to do this is to use the “Export Data” button in Power BI. However, as the documentation notes, there are some limits on the amount of data that can be exported:

  • The maximum number of rows that can be exported from Power BI Desktop and Power BI service to .csv is 30,000
  • The maximum number of rows that can be exported to .xlsx is 150,000

What’s more, clicking a button to export data and then copying it into an Excel report is a time-consuming, error-prone, manual process that no-one likes doing.

There is another way to get data from Power BI into Excel though: using the Analyze in Excel feature. You could use Analyze in Excel to create a PivotTable, but a PivotTable may suffer from the subtotals issue described here if you aren’t using the click-to-run version of Excel 2016, and as a result may be very slow when dealing with large amounts of data; it’s also awkward to get data out of one because you have to use Excel functions like GetPivotData(). So in this post I’ll show you how to get data from Power BI into an Excel table instead.

First of all you need to create a PivotTable in Excel on your desktop that is linked to a dataset published to Power BI. You can do this either by clicking the Analyze in Excel option from inside the Power BI service in the browser (see here for how to do this), or by using the Connect to Data option in the Power BI Publisher for Excel add-in.

Next, drag any measure into your PivotTable (you will need to have at least one measure defined in your dataset to do this) like so:

image

Then double-click inside a cell containing values, such as the selected cell in the screenshot above, or right-click on the cell and select Show Details:

image

When you do this a new worksheet will appear with a table in it containing rows of data from a table in the underlying dataset, but the exact data is irrelevant here. The real point is that you now have an Excel table (not a PivotTable) with a connection back to your Power BI dataset and you can edit the query that it uses to return data from Power BI – a variation on an old trick known to SSAS and Power Pivot users. You can now delete the PivotTable you created because you will no longer need it.

On the new worksheet with the table on it, right-click in a cell and select Table then Edit Query:

image

A dialog will pop up, and you’ll see an MDX Drillthrough statement in the Command Text box as shown below:

image

You can replace this statement with any MDX or DAX query you want – I recommend using DAX queries here because they are likely to be easier to write if you already know how to write DAX calculations, and unlike MDX queries they return a tabular resultset. I have a fairly old series of posts on DAX queries here that will help get you going. It’s a good idea to install DAX Studio and write your queries there, then copy them into Excel when you are happy with them.

Once you have a working DAX query in the Command Text box:

image

You can click OK, the query will run and you’ll see the output of the query in the Excel table:

image

One minor annoyance is that the column names will be in DAX ‘Table Name’[Column Name] format, and although you can use the SELECTCOLUMNS() DAX function to alias your column names and thereby remove the table names, the square brackets around the column names will always be there.

There are several good things about this approach to getting data into Excel:

  • It allows you to exceed the 150,000 row limit of the native Power BI Export option mentioned above
  • The queries are very quick to run if you aren’t doing anything complex in them
  • It avoids manual exporting – you just use Excel’s native Refresh functionality to run the query whenever you want to download new data into Excel

There are some obvious downsides though:

  • There’s no easy way to pass parameters to the queries you use, and so allow the user to choose what data is retrieved from Power BI. I guess it would be possible with some VBA, though – I haven’t tried – and it may also be possible to connect Power Query/Get&Transform to the Power BI Service and then use some of the tricks I show in this video for building reporting solutions in Excel. You can of course use Excel’s own native table filtering functionality to filter the data that is downloaded.
  • If your DAX query is slow to run your users may get frustrated. To counter this you could create calculated tables in your Power BI dataset using the same DAX as your query, and because calculated tables are created when the Power BI dataset is refreshed rather than when your user refreshes their Excel worksheet this should speed things up. However it will increase the size of your Power BI dataset and make your dataset refresh take longer. In Excel your DAX query would simply be something like:
    EVALUATE 'My Calculated Table Name'
  • Writing DAX can be complex, whether it’s a DAX query or a calculated table, so another option would be to use the Power BI Query Editor to create the table you need for your report, load that into your dataset and then load the table into Excel.
  • You can’t publish the Excel workbook up to Power BI and have it refresh automatically, because of course Power BI does not support connections from published Excel workbooks back to Power BI datasets. Hopefully this will change in the future – it’s marked as planned on the Ideas forum.

Of course doing this goes against the best practices that I and most other people recommend for Power BI, so you should probably only do this if you have no other choice. Sometimes you have to bend the rules a little to make sure your project succeeds…

Upcoming Power BI and Azure SSAS training in the UK, Israel and the Netherlands

If you have some training budget to spare why not come along to one of the pre-conference seminars/training courses I have coming up in the next few months? Specifically:

  • Introduction to Azure Analysis Services – a pre-conference seminar on Thursday 22nd February at SQLBits 2018 in London. You’ll learn about what Azure Analysis Services is, why you should use it and how to build Tabular models for it. More details and registration here.
  • Power BI for analysts and developers – a three-day course covering the whole Power BI development lifecycle in Herzliya, Israel, on March 12th-14th. Full agenda and registration here.
  • Loading and transforming data in Power BI and Power Query – a pre-conference seminar on Friday 23rd March before the Dutch Power BI User Day in Utrecht. It will cover everything you need to know about the Query Editor in Power BI and Excel, going from the basics to more advanced topics such as parameters, functions and the M language. Full agenda here.

Incidentally, if you’re in London for SQLBits don’t forget to come to the London Power BI User Group meeting on Wednesday 21st February: I’m doing a panel discussion with various other Power BI folks. I’m also going to be speaking at the Dublin Data and BI Summit (run by the Power BI User Group) at the end of April, and at several other user groups later in the year; follow me on Twitter (I’m now @cwebb_bi) to find out the details when they’re available. I hope to see you at one of these events!

Using SSAS Multidimensional As A Data Source For Power BI (Video)

The nice people at PASS have made a video of my session on “Using SSAS MD as a data source for Power BI” available to view for free on YouTube:

I’m honoured that it’s listed as one of their “Best of PASS Summit 2017” sessions, and there are lots of other great videos on the same page including Alberto Ferrari’s session on DAX optimisation.

Some of the tips in this video include a few things I’ve been meaning to blog about for a while, including how important it is to set the ValueColumn property on your dimension attributes in SSAS MD – it lets you use lots of functionality in Power BI that isn’t otherwise available, including date slicers.

Setting SQL Server Connection String Properties In Power BI and SSAS Tabular Modern Data Sources

It may not be immediately obvious, but you cannot set your own connection string properties when connecting to SQL Server using the built-in SQL Server connector from either Power BI or a modern data source in Azure SSAS/SSAS Tabular 2017:

image

All you can do is configure the options that are available in the UI, which in the current version of SSDT looks like this:

image

…and which correspond to the options of the Sql.Databases() M function documented here.
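For illustration, here’s a rough sketch of what using a couple of those documented options looks like in M – the server name, option names and values are just examples, so check the documentation linked above for the full list:

[sourcecode language="text"]
let
    // connect to a SQL Server instance, setting a couple of the documented
    // options rather than arbitrary connection string properties
    Source = Sql.Databases(
        "localhost",
        [
            CommandTimeout = #duration(0, 0, 10, 0),
            CreateNavigationProperties = false
        ]
    ),
    // then pick one database out of the navigation table that is returned
    AdventureWorksDW = Source{[Name = "Adventure Works DW"]}[Data]
in
    AdventureWorksDW
[/sourcecode]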

It turns out that the restriction on using your own connection string properties in the built-in SQL Server connector is a deliberate design decision on the part of the Power Query team because, behind the scenes, they use different providers in different circumstances to optimise performance, and because allowing arbitrary connection string properties might make maintaining backwards compatibility difficult in the future.

While your average Power BI user is unlikely to even notice this, for SSAS Tabular developers it could be a big problem: complete control over the connection string is often necessary in enterprise BI scenarios. What are the alternatives then? Well you can use the OLE DB and ODBC connectors instead:

image

Both of these connectors do allow you to set your own connection string properties. For example here’s the UI for a new ODBC connection in SSDT:

image 

The documentation for the Odbc.DataSource and OleDb.DataSource M functions has more detail on how these connectors can be used and how connection string properties can be set. Remember also that the OLE DB Provider for SQL Server was un-deprecated in October 2017.
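Here’s a rough sketch, in M, of how these two connectors can be called with custom connection string properties – my own example, so treat the driver name, the connection string properties and the options record as illustrative rather than a recommendation:

[sourcecode language="text"]
let
    // ODBC: anything the ODBC driver understands can go into the connection string
    OdbcSource = Odbc.DataSource(
        "driver={ODBC Driver 13 for SQL Server};" &
        "server=localhost;database=Adventure Works DW;app=MyTabularModel",
        [HierarchicalNavigation = true]
    ),
    // OLE DB: likewise, the whole connection string is under your control
    OleDbSource = OleDb.DataSource(
        "Provider=SQLOLEDB.1;Data Source=localhost;" &
        "Initial Catalog=Adventure Works DW;Packet Size=32767"
    )
in
    OdbcSource
[/sourcecode]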

However, apart from possible performance differences between the two (which you should test yourself – Henk van der Valk wrote a good post on this for SSAS MD and most of what he said is relevant for Tabular) there’s one less-than-obvious difference between these two options: the OLE DB connector does not appear to support query folding right now whereas the ODBC connector does. Of course this isn’t an issue if you’re writing your own SQL queries to import data, but if you do want to use M functions for partitioning (as I show here) you’re likely to get very poor performance with the OLE DB connector.

Using Your Own SQL Queries For Tables With Modern Data Sources In SSAS 2016+ And Azure Analysis Services

While the integration of the Power Query engine into Analysis Services Tabular 2017 and Azure Analysis Services with modern data sources will certainly bring a lot of benefits, I think it’s fair to say that the implementation has not been entirely painless. One problem is that it is no longer obvious how to specify your own SQL query to populate a table or partition in your Tabular model – and while the Query Editor is great, there are a lot of cases where this is necessary. In this post I’ll show you how to do this.

If you’re used to using the Power Query UI in Excel or Power BI Desktop, you’ll notice that when you connect to a SQL Server database using the SQL Server connector in SSDT:

image

…there is no option to enter your own SQL query when you do so:

image

This is deliberate. In Analysis Services, unlike Power BI and Excel, there is a distinction made between data sources and the M queries that return data from those data sources, one that makes a lot of sense in my opinion. While it is possible to enter your own SQL for other data source types, such as OLE DB connections, a data source object is really intended just to define a connection to a data source and not to define what data you want from that data source.

[You may also notice that there’s a “SQL statement” property on a SQL Server data source visible in the Visual Studio properties pane, but I don’t recommend you use it – it doesn’t seem to work well with the rest of the SSDT/Power Query UI]

To import a table or view in your database all you have to do is right-click on your data source and choose Import New Tables; my blog post from September last year describes how to do this, and how to use M functions for creating partitions.

To use your own SQL queries though you need to write some M code. First, import a table – any table, but preferably a small one – and get to the Query Editor UI. In this case I’ve imported the DimDate table from the Adventure Works DW database:

image

Next, select your query in the Queries pane on the left-hand side of the screen and open the Advanced Editor either by clicking on the relevant button in the toolbar (shown above) or by right-clicking on the query name in the Queries pane. You’ll see the following dialog:

image

The M code will be something like this:

[sourcecode language=”text” padlinenumbers=”true”]
let
    Source = #"SQL/localhost;Adventure Works DW",
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate
[/sourcecode]

In this example the Source step creates a reference to the data source you have already created, and the dbo_DimDate step gets the contents of the DimDate table from this data source.

You can modify this code to use your own SQL by using the Value.NativeQuery() function (which I have blogged about here):

[sourcecode language=”text”]
let
    Source = #"SQL/localhost;Adventure Works DW",
    MyQuery =
        Value.NativeQuery(
            #"SQL/localhost;Adventure Works DW",
            "SELECT DISTINCT FiscalYear FROM DimDate"
        )
in
    MyQuery
[/sourcecode]

Here what I’ve done is replaced the dbo_DimDate step in the previous query with a step called MyQuery that uses Value.NativeQuery() to run my own SQL.

image

Now all you need to do is click Import and you have the output of the query loaded into SSAS. It would be nice if there was UI support for using your own SQL queries when importing data in the future. Note that, as soon as you use this method, any other steps or queries further downstream will not be able to perform query folding, so you should make sure that you do as much of your filtering and transformation in the SQL as possible otherwise you may encounter performance problems.

The documentation describes a similar – but not identical – workflow for achieving the same result here. Personally I think it’s counter-intuitive that you should click on Expressions to create a Table object! Expressions are used for functions and other M code that is shared by the M queries used by Tables.

An alternative to doing all this is to go back to the old way of doing things and use a legacy data source rather than a modern data source in SSDT. You lose the ability to use the Query Editor and M if you do this, but in a lot of cases you probably won’t care. The 17.4 release of SSDT for Visual Studio 2015, released in December 2017, has exposed a property that allows you to create legacy data sources again easily. In Visual Studio, go to the Tools menu and select Options, and in the Options dialog go to Analysis Services Tabular/Data Import and check “Enable legacy data sources”:

image

When you do this, you’ll notice two new options when you right-click on Data Sources in the Tabular Model Explorer pane: Import From Data Source (Legacy) and Existing Connections (Legacy).

image

This gives you access to the Table Import wizard that was available in previous versions of Analysis Services Tabular, which not only allows you to enter your own SQL but also creates a legacy data source that in turn makes it easy to use your own SQL when creating partitions.

“In the Previous” Date Filters In Power BI/Get&Transform/Power Query

The Query Editor in Power BI/Excel Get&Transform/Power Query has a number of built-in ways to filter data in date columns relative to the current date, such as the “In the Previous” option. However these filters behave in a way I find non-intuitive (and I’m not alone) and it’s not obvious how to get the behaviour I think most people would expect. In this post I’ll show you what the built-in relative date filters actually do and how you can change them to do something more useful.

Let me give you a simple example. Imagine you’re using the following table of dates (in DD/MM/YYYY format) in an Excel table as a data source:

image

Now, let’s also assume that today’s date is January 8th 2018 and you only want to load data from the last six months. If you load the data into Power BI in a new query:

image

…and then click on the dropdown menu in the top right-hand corner of the Date column (highlighted), you can select Date Filters/In the Previous:

image

…and then set up the filter for “Keep rows where ‘Date’ is in the previous 6 months” as shown here:

image

…you get the following table back:

image

Six out of the seven dates in the original table are returned, but not the six I would expect. Remember that today’s date is January 8th 2018, and notice that January 1st 2018 is not present in the filtered table and July 1st 2017 is present! I don’t know about you, but I would say that January 1st 2018 should be considered as being “in the previous 6 months” and July 1st 2017 should not be.

The reason this is happening is that the M code generated by the UI uses the Date.IsInPreviousNMonths function, so as a result the filter is getting all the dates that are present in the six months before the month that today’s date is in. Hmmmmm.
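To see this in M, the query the UI generates looks something like the following – a reconstruction based on the steps described here rather than a copy of the generated code, so your step names may differ:

[sourcecode language="text"]
let
    Source = Excel.Workbook(File.Contents("C:\SixMonths.xlsx"), null, true),
    Source_Table = Source{[Item="Source",Kind="Table"]}[Data],
    ChangedType = Table.TransformColumnTypes(Source_Table, {{"Date", type date}}),
    // this is the step the "In the Previous 6 months" filter creates: it keeps
    // dates whose month falls in the six calendar months before the current month
    FilteredRows = Table.SelectRows(
        ChangedType,
        each Date.IsInPreviousNMonths([Date], 6)
    )
in
    FilteredRows
[/sourcecode]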

In many cases you can get a “last six months” filter of the type I would expect quite easily, by altering the filter dialog box shown above to filter by the last 5 months and including an Or condition that also filters by the current month, like so:

image
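In M terms, the amended dialog generates a condition along these lines – again a sketch, where only the final FilteredRows step differs from the query shown above:

[sourcecode language="text"]
let
    Source = Excel.Workbook(File.Contents("C:\SixMonths.xlsx"), null, true),
    Source_Table = Source{[Item="Source",Kind="Table"]}[Data],
    ChangedType = Table.TransformColumnTypes(Source_Table, {{"Date", type date}}),
    // keep dates in the previous five calendar months OR in the current month
    FilteredRows = Table.SelectRows(
        ChangedType,
        each Date.IsInPreviousNMonths([Date], 5) or Date.IsInCurrentMonth([Date])
    )
in
    FilteredRows
[/sourcecode]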

This returns the following table:

image

You’ll see now that January 1st 2018 is present and July 1st 2017 is not present. However you will need to be careful with this: if your source data contains dates that are after today’s date but still in the current month, these dates will now also be included! For example, if the source data is changed to include a new row for January 31st 2018:

image

This new filter will include January 31st 2018 because it is in the same month as today’s date:

image

What if you want to exclude dates that are after today but in the current month? This is where things get tricky, and where you’ll need to write some M code. Let’s imagine that you want to get all the dates in the range from July 9th 2017 (the day after the date that is six months before today) to January 8th 2018 (today). You can do this by editing the original query as follows:

let
    Source =
        Excel.Workbook(
            File.Contents("C:\SixMonths.xlsx"),
            null,
            true
        ),
    Source_Table =
        Source{[Item="Source",Kind="Table"]}[Data],
    ChangedType =
        Table.TransformColumnTypes(
            Source_Table,
            {{"Date", type date}}
        ),
    EndDate =
        Date.From(DateTime.FixedLocalNow()),
    StartDate =
        Date.AddDays(Date.AddMonths(EndDate,-6),1),
    FilteredRows =
        Table.SelectRows(
            ChangedType,
            each [Date]>=StartDate and [Date]<=EndDate
        )
in
    FilteredRows

In this query, the EndDate step returns today’s date using DateTime.FixedLocalNow(), the StartDate step returns the day after the date that is six months before today’s date, and the FilteredRows step filters the dates so that only those that occur between StartDate and EndDate are returned. And yes, I checked, if you do this with a SQL Server data source then query folding does occur.

With this query, you finally get the dates you’d expect from your filter:

image

To be honest, though, I don’t think it should be this hard!

Using the Resource Usage Profiler Event For SSAS Multidimensional Query Performance Tuning, Part 3: Aggregations And Indexes

Building aggregations in your SSAS Multidimensional cube will make your queries faster, right? While that’s true, they will only make a noticeable difference to performance if your query has Storage Engine-related problems rather than Formula Engine-related problems. What’s more, even when you do have Storage Engine-related problems there are some cases where you may find that aggregations don’t give you the kind of performance boost that you expect. In this post I’ll explain why this can happen, how you can use the Resource Usage Profiler event (as described in parts 1 and 2 of this series) to find out when this is happening, and how you can deal with the problem.

Aggregations make queries go faster for two reasons:

  1. Most importantly they contain pre-aggregated data. For example your fact table might contain data at the Day granularity but an aggregation might contain fact data aggregated up to the Year granularity. This means that when a query needs to get data at the Year granularity, SSAS does not need to read the fact table-level data stored in the partition; it can read the data it needs directly from the aggregation without needing to aggregate any data at query time.
  2. Secondly, because the size of an aggregation is usually a lot smaller than the size of the fact table-level data stored in a partition, it is much faster to read data from an aggregation.

However, regarding this second point, there’s a catch: you’ll know if you’ve read the previous posts in this series that SSAS builds indexes on fact data so it can scan that data very quickly, but most builds of SSAS do not build indexes on aggregations. I say ‘most’ because there were a few builds of SSAS that did build indexes on aggregations, but as this article explains this feature was turned off soon after it was introduced because it was found that the time spent building those indexes was not usually worth any gain in query performance that resulted.

The Resource Usage Profiler event can be used to monitor the number of rows read and rows scanned when SSAS reads data from an aggregation. Taking the original test query and cube from the previous posts in this series:

[sourcecode language=”text” padlinenumbers=”true”]
SELECT
{[Measures].[My Measure]}
ON 0
FROM
[MyCube]
WHERE([ID].[ID].&[1])
[/sourcecode]

…the Resource Usage event shows that when run on a cold cache and no aggregations are built on the cube, SSAS scans 256 out of 5000 fact rows of data – a single page – and returns one row:

image

However if this cube has an aggregation built on the ID attribute of the ID dimension, the same granularity as the fact table:

image

…when the query runs and SSAS reads data from this aggregation rather than the fact data, the Resource Usage event returns the following:

image

Notice that the ROWS_SCANNED value is now 5000. This is because the aggregation has no indexes built on it so SSAS has to scan all the rows in the aggregation. Reading data from an aggregation at the same granularity as the fact data is therefore a lot less efficient than reading data from the original fact data.

Of course, since most aggregations are usually much smaller than the original fact data, the lack of indexes is not so important because scanning all the data is going to be very quick anyway. However, on very large cubes you may need to build some very large aggregations and find that even when your queries hit these aggregations, performance is still bad because of this lack of indexes. If you see this happening, and can see the ROWS_SCANNED value in the Resource Usage event reporting very high values, then it might be a good idea to enable the building of indexes on aggregations.

You can do this by changing the values of the AggIndexBuildEnabled and AggIndexBuildThreshold server properties in the msmdsrv.ini file. Setting AggIndexBuildEnabled to 1 allows SSAS to build indexes on aggregations. It’s not necessary to build indexes on all aggregations though: you can specify that only aggregations larger than a certain number of rows have indexes built using the AggIndexBuildThreshold property. The only public documentation for these properties is given in two articles here and here, and I strongly recommend you read these articles so that you understand the implications of doing this on your processing times. You should only consider changing these properties if you are a very experienced SSAS developer and you monitor the effects carefully – I’m not even sure if changing these properties is supported by Microsoft.

Thirteenth Blog Birthday

Every year, on the anniversary of the first-ever post on this blog, I write a post summarising my thoughts on what’s happening in the world of Microsoft BI and what I’m up to professionally. Here I am thirteen years and 1232 posts on from where I started…

First of all, an announcement. For several years now I’ve owned and operated two companies: Crossjoin Consulting, which I use for my own SSAS and Power BI consultancy and private training activities, and from which I make most of my income; and the much smaller Technitrain, which I use for running public training courses in London taught by me and various other notable SQL Server and Microsoft BI trainers. After some deliberation I’ve decided to shut down Technitrain, stop running public courses taught by other people, and focus completely on delivering my own consultancy and training through Crossjoin. There are various reasons for doing this but mostly it’s because organising and marketing public courses is very time-consuming and I’d like to free up that time for other purposes. There’s one final public Power BI course on the Technitrain site and after that all public courses that I teach will appear on the Crossjoin site. I’d like to say thank you to everyone who has taught for Technitrain or who has attended a course over the years.

What do I want to do with the time this frees up? There’s so much Power BI work out there I don’t think I’ll have any trouble keeping myself busy, but I’d really like to build more Power BI custom data connectors. I’ve built a few already for my customers and published another one I built for fun here; when this functionality comes out of preview (which should be soon I hope) and gets added to SSAS and Excel I think there will be a lot of SaaS providers who will want custom data connectors built for their platforms.

One other side-effect of this change is that I have changed my Twitter username to @cwebb_bi. If you already follow me on Twitter then you won’t need to do anything except remember to use the new username when you mention or DM me. If you don’t already follow me, please do so!

Thinking about Microsoft BI as a whole, it’s been another massively successful year for Power BI. Its popularity has now extended beyond small/medium businesses attracted by the price and dyed-in-the-wool Microsoft shops and I have seen several cases where it has replaced Tableau and Qlik based on its features alone. Another few years of this kind of growth and Microsoft will have the kind of domination in BI that it has enjoyed with Windows and Office.

I’ve also been pleased to see how Azure Analysis Services has grown this year too. I particularly like how the dev team have focussed on adding new features that take advantage of the cloud to solve problems easily that are much harder to solve on-premises – the new auto scale-out feature is a great example of this. It will be interesting to see if we get a Multidimensional version of Azure Analysis Services in 2018 – if we do it will be a vote of confidence in a platform whose users are wondering whether it will ever see any investment again.

Finally, thinking about my other great enthusiasm – Power Query – it seems like things have gone a bit quiet recently. Apart from custom data connectors there hasn’t been much in the way of new functionality added in 2017. I suppose it’s a mature platform now and to be honest I struggle to think how it could be improved; parameters need some work for sure and there are a lot of people complaining about performance in Excel, which can be awful if you’re working with lots of complex queries. Maybe also the web-based interface seen in the CDS preview will appear in other places?

Anyway, time to sign off and get back to enjoying a few more days of holiday before work starts again. Thanks for reading and I hope you all have a great 2018!

PS For those of you looking for video training on Power BI, DAX, SSAS and MDX you can get 12% off all subscriptions at Project Botticelli by clicking here and using the discount code CHRIS12BYE2017

Using the Resource Usage Profiler Event For SSAS Multidimensional Query Performance Tuning, Part 2: Many-To-Many Relationships And Non-Materialised Referenced Relationships

You probably know that using many-to-many relationships or non-materialised referenced relationships can be bad for Analysis Services Multidimensional query performance. How can you measure their impact, though? In the first post in this series I showed how the Resource Usage Profiler event could be used to monitor Storage Engine activity; in this post I’ll use it to show the effect of using these features on the amount of Storage Engine activity that takes place during query execution.

Many-to-many relationships

In my previous post I built a very simple SSAS cube with one measure group called Fact from a table with 5000 rows, then built a dimension called ID from that same table with 5000 members on its only hierarchy. For this post I added a second measure group called Fact Bridge based on the same table, added a new role-playing copy of the existing dimension called M2M ID, and then created a many-to-many relationship from this new dimension to the original measure group via the new measure group.

image

Because all the dimensions and measure groups are built from the same table, one member on the M2M ID dimension is linked to just one member on the ID dimension, so selecting a member on the M2M ID dimension will give the same result as selecting the member with the same key on the ID dimension. Even though SSAS sees a many-to-many relationship, in the data it’s a one-to-one relationship.

As I showed in my last post, selecting one member from the ID dimension in a query like the following:

[sourcecode language=”text” padlinenumbers=”true”]
select
{[Measures].[My Measure]}
on 0,
{[ID].[ID].&[1]}
on 1
from
[M2M Test]
[/sourcecode]

image

…results in a single page of 256 rows being read and a single row being returned from the cube, as shown by the Resource Usage event in a Profiler trace:

image

If the query is changed to use the M2M ID dimension instead:

[sourcecode language=”text” highlight=”4″]
select
{[Measures].[My Measure]}
on 0,
{[M2M ID].[ID].&[1]}
on 1
from
[M2M Test]
[/sourcecode]

image

…the same result is returned, because the member [M2M ID].[ID].&[1] is associated with one member from the [ID].[ID] hierarchy, the member [ID].[ID].&[1], via the intermediate measure group. However the Resource Usage event shows something very different:

image

The ROWS_SCANNED value has gone from 256 to 5256, and ROWS_RETURNED has gone from 1 to 5003! Why? Part of the explanation is that we now have two measure groups that must be scanned, and the Resource Usage statistics are totals for all Storage Engine activity across all measure groups. In this query the Fact Bridge measure group is scanned first to resolve the many-to-many relationship between the M2M ID and ID dimensions, and then the Fact measure group is scanned to get the value for the measure My Measure. The Fact Bridge measure group only accounts for 256 rows scanned and 1 row returned though, the remaining 5000 rows scanned are from the main Fact measure group. The problem here is that SSAS does not translate the filter on the M2M ID dimension into a filter on the ID dimension (this is a limitation of the way SSAS handles many-to-many) so all the rows on the main Fact measure group get scanned in this query.

This explains something that I have blogged about before here, namely that if you partition your measure group by a dimension that is used in a many-to-many relationship you’ll see that all partitions are scanned and not just the partitions you expect to be scanned. The Resource Usage event shows that even when you don’t see unexpected partition scans happening, using a many-to-many relationship in a query can result in a lot of extra Storage Engine activity and therefore potentially worse query performance.

Non-materialised referenced relationships

Something similar happens when you use non-materialised referenced relationships (although materialised referenced relationships are OK). To test this I created another variation on my original cube, with just one measure group and the ID dimension as before but now with a new, role-playing instance of the ID dimension joining to the measure group through the ID dimension using a non-materialised referenced relationship.

image

image

The following query returns the same result as the two queries above:

[sourcecode language=”text” padlinenumbers=”true”]
select {[Measures].[My Measure]} on 0,
{[Ref ID].[ID].&[1]}
on 1
from
[RefDimTest]
[/sourcecode]

image

…but again, the Resource Usage event shows the entire measure group is being scanned when this query runs:

image

I’m not much of a fan of referenced relationships anyway – you can usually get rid of them by redesigning your SSAS dimensions or your underlying dimensional model – so this is one more reason not to use them.