The AutoSetDefaultInitialCatalog Analysis Services Server Property

In Shabnam Watson’s recent blog post on a bug she found when trying to create a Live connection from Power BI to Analysis Services she mentioned that the AutoSetDefaultInitialCatalog server property could be used to solve her problem. This piqued my interested because I’d seen this property but had no idea what it did exactly or why it was there. Luckily, now I work for Microsoft, it’s even easier for me to find out about things like this from the dev team and Akshai Mirchandani was able to help.

First of all, what does it do? The documentation on this property has just been added here, and this is what it says:

AutoSetDefaultInitialCatalog
A Boolean property. When set to true, new client connections automatically default to the first catalog (database) the user has permissions to connect to.
When set to false, no initial catalog is specified. Clients must select a default catalog prior to running queries or discover operations against a database on the server. If no default catalog is specified, an error is returned. If Initial Catalog property is specified in the connection string, the default catalog will be applied from this property.

The default value for this property is true.

Let me illustrate what this means. Say you have an instance of Analysis Services (in this case it’s Tabular, but it could be Multidimensional) with two databases on it:

image

I’ve expanded the Roles tab for each database reasons that will become clear later.

Next, let’s say you run a simple trace on this server looking at the Discover End and Session Initialize events:

image

…and while this trace is running, you open up SQL Server Management Studio and connect to the SSAS instance. Here’s what you see in Profiler:

image

Now, just to be clear, all I did was open up SQL Server Management Studio and connect to the instance. I did not open up a DAX query window or anything like that; all that happened was the list of databases on the instance was displayed in the Object Explorer pane.

image

The interesting thing to notice from the trace above is that when I did that there are five Session Initialize events and even though the Database column in Profiler is blank, you can see from the list of role names in the TextData column that in each case a connection has been made to the Adventure Works Internet Sales database.

This is because when you open a connection to Analysis Services and do not set the Initial Catalog connection string property, what happens is that you will get a connection to the default database on the instance. Which database is the default? It’s just the first database that the user has permission to access on the instance, which is a bit random.

This happens at other times too. Let’s say you right click on the EmptyDB database and process it in SQL Management Studio:

image

Here’s what I see in Profiler:

image

In this case there are three connections to the default database, Adventure Works Internet Sales, when the database I am processing is EmptyDB!

Most of the time these unnecessary connections have no impact at all but sometimes they can cause problems such as the ones Shabnam describes in her blog post. For example:

  • It can cause performance problems, because there is an overhead to opening a connection – for example roles are evaluated when a connection is opened
  • Monitoring and auditing gets complicated because, as you can see from the traces above, there are a whole lot of connections to the default database taking place that you aren’t expecting
  • Most importantly, when a connection is opened a read-commit lock is acquired on that database and in a few rare cases this can cause deadlocks and other locking-related issues

This is why the AutoSetDefaultInitialCatalog server property was introduced. With this server property set to False, when you open a connection to SSAS with no Initial Catalog set, then you get a connection with no database set. You can find this server property in SQL Server Management Studio in the Analysis Server properties dialog (which you can find by right-clicking on your instance name, selecting Properties, and going to the General tab) and checking the Advanced (All) Properties box.

image

With AutoSetDefaultInitialCatalog set to False, here’s what Profiler shows when I rerun my original test of connecting to SQL Server Management Studio:

image

Note that there are now no Session Initialize events now.

Here’s what opening up a new MDX query window in SQL Management Studio shows with AutoSetDefaultInitialCatalog set to False if you don’t explicitly set a database when you connect:


image

image

Note the empty database dropdown box on the toolbar and the “Error loading metadata: No cubes were found” error message shown in the Metadata pane.

So why didn’t the dev team set AutoSetDefaultInitialCatalog to False by default on new instances? The problem with doing this is that it is a potential breaking change that could cause errors in some client tools. I’m not aware of any specific cases where this might happen but if you did decide to change AutoSetDefaultInitialCatalog to False on your instance you would need to test thoroughly to make sure it didn’t break anything. My feeling is, though, it is probably a good idea to AutoSetDefaultInitialCatalog to False on production servers and do the appropriate testing just in case those unnecessary connections are causing problems.

The Second Edition Of “The Definitive Guide To DAX” Is Out!

If you’re a Power BI fan there are three possible answers to the question “Did you know that the second edition of The Definitive Guide To DAX has just been published?”:

Answer#1: Yup, I’ve already got my copy!

If this is your answer there’s no need to read any further.

Answer #2: What’s “The Definitive Guide To DAX”?

If, on the other hand, you’re new to Power BI and this is what you’re thinking then I should explain that “The Definitive Guide To DAX” is a book by Marco Russo and Alberto Ferrari and is what its title suggests it is – the sum total of human knowledge about the DAX calculation and query language used by Power BI, written by the two people who know most about it outside the development team. Marco and Alberto are friends of mine but I don’t think anyone can accuse me of bias when I say that it’s a book that every Power BI developer needs to own, so go out and buy it! If you use Power BI you need to learn DAX and while this book may not be a simple step-by-step tutorial it has in it somewhere answers to just about every question you’ll ever ask about DAX – and, more importantly, the answers it has are as correct and as up-to-date as they possibly can be. I can tell you that it’s proved invaluable to me in my work at least twice in the last week alone.

Answer #3: Yes, I saw that but I already have the first edition – is it worth buying this one too?

This is a slightly more difficult question to answer, but I’m still going to recommend that you buy the second edition. As Marco says in his announcement blog post, a lot of the existing content has been updated and rewritten and a lot of new content has been added. If you care about following all the latest DAX best practices and you don’t want the new hire in your department to mock you because you’ve never heard of DAX Studio, you need to buy this new edition.

[Note: I didn’t get a free copy of this book for review (yet?) but I have an O’Reilly Online Learning account which means I could read it as soon as it was published]

PS I know someone needs to write the “Definitive Guide to M” but it’s not going to be me, at least not right now.

Setting Azure Analysis Services Server Properties Not Visible In SQL Server Management Studio

Users of on-premises Analysis Services will know that most of the useful server properties can be set in SQL Server Management Studio, some (such as MaxIntermediateRowsetSize) can only be set by editing the msmdsrv.ini file. How do you set these properties in Azure Analysis Services though, when there is no msmdsrv.ini file to edit?

The solution is to use an XMLA script to make the change. The easy way to do this is to open up the server properties dialog in SQL Management Studio by right clicking on your instance name in the Object Explorer and selecting Properties:

image

Then, in the server properties dialog, change any server property but do not click ok. Instead, click on the Script button and then select Script Action to New Query Window:

image

This will create a new XMLA query window in SSMS (the connection dialog for this window will be open too, which will freeze the server properties dialog, so you’ll need to either connect or dismiss the dialog to close the server properties dialog) with the XMLA script to make the server properties change you made. The actual change won’t take place, though, unless you execute the script – so don’t do that.

Instead, change the name of the server property in the script to the one you actually want to set and enter the value you want to set it to:

image

Note that you can’t just enter the name of the server property in most cases because server properties can be grouped into sections, so you’ll need to enter the section names too. For example for the MaxIntermediateRowsetSize property you’ll need to enter DAX\DQ\MaxIntermediateRowsetSize.

After that, all you need to do is hit the Execute button and the change will be made.

Tables, Numbers, Immutability And Power Query Performance

Following on from my last two posts on Power Query performance, I thought I would write about another trick that can make a big difference to your query execution times. It’s actually something that Ehren from the dev team mentioned in a forums thread that I blogged about here but it deserves to be better known.

Here’s one of the almost-optimised queries from my previous posts:

[sourcecode language='html'  padlinenumbers='true']
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Sorted Rows" = 
    Table.Sort(
        #"Changed Type",
        {{"Column2", Order.Descending}}
        ),
    Column2 = 
    #"Sorted Rows"[Column2]{0}
in
    Column2
[/sourcecode]

This query returns a number (the maximum value in the column called Column2 of the table loaded in from a csv file) and takes about 5 seconds to run. Now consider the following query, which is the same as the previous query but with only the last line different:

[sourcecode language='html'  padlinenumbers='true']
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Sorted Rows" = 
    Table.Sort(
        #"Changed Type",
        {{"Column2", Order.Descending}}
        ),
    Column2 = 
    #"Sorted Rows"[Column2]{0}
in
    Column2 + Column2 + Column2 + Column2
[/sourcecode]

This query returns the number returned by the previous query added together four times. As you would expect – and hope – this query also takes about 5 seconds to run. Now consider this query:

[sourcecode language='html'  padlinenumbers='true']
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Sorted Rows" = 
    Table.Sort(
        #"Changed Type",
        {{"Column2", Order.Descending}}
        )
in
    #"Sorted Rows"[Column2]{0} +
    #"Sorted Rows"[Column2]{0} +
    #"Sorted Rows"[Column2]{0} +
    #"Sorted Rows"[Column2]{0}
[/sourcecode]

Instead of taking the value #”Sorted Rows”[Column2]{0} and storing it in the variable Column2 then adding Column2 four times, I’m  adding the expression #”Sorted Rows”[Column2]{0} together four times. The query returns the same number as the previous query. However this query takes 20 seconds to run! Why?

As well as the brief remarks at the end of the post I mentioned earlier, more details can be found in section 3.6 of the Power Query Language Specification on Immutability:

Once a value has been calculated, it is immutable, meaning it can no longer be changed. This simplifies the model for evaluating an expression and makes it easier to reason about the result since it is not possible to change a value once it has been used to evaluate a subsequent part of the expression. For instance, a record field is only computed when needed. However, once computed, it remains fixed for the lifetime of the record. Even if the attempt to compute the field raised an error, that same error will be raised again on every attempt to access that record field.


An important exception to the immutable-once-calculated rule applies to list and table values. Both have streaming semantics. That is, repeated enumeration of the items in a list or the rows in a table can produce varying results. Streaming semantics enables the construction of M expressions that transform data sets that would not fit in memory at once.

In the second query above the variable Column2 returns a number and after that number has been assigned to Column2 it cannot be changed. However in the last line of the third query when I add #”Sorted Rows”[Column2]{0} together four times, #”Sorted Rows” is a variable that returns a table which means it gets evaluated four times during the addition.

In summary, if you’re getting a table or list from a data source and you want to use a value inside that table or list multiple times, store it in a variable in a let expression before you do so.

Power Query/M Optimisation: Getting The Maximum Value From A Column, Part 2

In part 1 of this series – which I strongly recommend you read before reading this post – I showed how removing columns from a table can make a dramatic improvement to the performance of certain transformations in Power Query. In this post I’ll show some tricks taught to me by Curt Hagenlocher of the dev team that can improve performance even more.

First of all, let me remind you of my original, totally unoptimised query which takes over a minute to run:

[sourcecode language='html'  padlinenumbers='true']
let
    Source = 
    Csv.Document(
        File.Contents(
            "C:\Users\chwebb\Downloads\pp-2018.csv"
            ),
            [Delimiter=",", Columns=16, 
            Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = 
    Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Sorted Rows" = 
    Table.Sort(
        #"Changed Type",
        {{"Column2", Order.Descending}}
        ),
    Column2 = 
    #"Sorted Rows"{0}[Column2]
in
    Column2
[/sourcecode]

In particular, I’d like to draw your attention to the last step where the position and lookup operators (I blogged about them here) are used to return the value in the Column2 column of the first line of the sorted table:

#”Sorted Rows”{0}[Column2]

Breaking this expression down:

  • #”Sorted Rows” returns the sorted table from the previous step in the query
  • {0} returns the first line of that table as a record
  • [Column2] gives the value in the Column2 field from that record

It turns out that changing the order of {0} and [Column2] in this expression makes a big impact on performance. So using the following expression in the last step instead:

#”Sorted Rows”[Column2]{0}

…reduces query execution time to 5-7 seconds! I’m told this is because asking for the column first does the same thing internally as manually removing all other columns in the table which, as I showed in my last post, has a dramatic impact on performance.

It’s important to understand that this is not a general recommendation to request the column before the row in expressions like this, because in other scenarios requesting the row before the column might perform better. What you need to do is test writing expressions like this both ways to see what gives you the best performance.

One piece of general advice that Curt did give me, though, was that most of the optimisations that the Power Query engine can use only happen on tables – not records or lists – so you should always try to work with tables whenever you can. For this particular query, using the Table.FirstN function to get the first row of the table like so:

Table.FirstN(#”Sorted Rows”,1)[Column2]{0}

…allows for another internal optimisation to kick in, taking the query execution time down to around 2 seconds – the same performance as the original query in my previous post.

Power Query/M Optimisation: Getting The Maximum Value From A Column, Part 1

I’ve just learned some really interesting new Power Query/M optimisation tricks! I’m going to write them up here as a series of worked examples rather than generalised patterns because I’m still trying to understand them properly myself and I’m not really sure what lessons to draw from them exactly; instead, I hope they give you some new ideas to try when optimising your own queries. I do think they will be useful to a lot of people though.

In this first part I’m going to set up the scenario and show you what I found out from my own experimentation. The really mind-blowing tricks shown to me by the ever-helpful Curt Hagenlocher of the Power Query dev team will be covered in part 2.

Let’s say you have a large csv file which contains a numeric column and you want to get the maximum value from that column. In this case I’m going to use the 2018 Price Paid data from the UK Land Registry available here. This csv file contains 1021215 rows, one for each property transaction in England and Wales in 2018; the second column in this file contains the the price paid for the property, so the aim here is to get the maximum price paid for all property transactions in 2018.

You can build this query quickly and easily, and get excellent performance, with a few clicks in the UI. After connecting to the csv file and setting the data type on Column2 to Whole Number, all you need to do is select Column2, go to the Transform tab in the Power Query Editor window and click the Statistics then select Maximum from the dropdown menu:

image

This returns the number we’re looking for:

image

The query takes around 1.5 seconds to run (I used the technique I blogged about here to measure the duration). Here’s the M code for the query:

let
Source =
Csv.Document(
File.Contents(
"C:\Users\chwebb\Downloads\pp-2018.csv"
),
[Delimiter=",", Columns=16,
Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Changed Type" =
Table.TransformColumnTypes(
Source,
{{"Column2", Int64.Type}}
),
#"Calculated Maximum" =
List.Max(#"Changed Type"[Column2])
in
#"Calculated Maximum"

I, of course, did not use this very efficient method when I first built my query. Instead did the following: after loading the data I sorted the table by Column2 in descending order and then right-clicked in the top cell in Column2 and selected Drill Down:

image

Here’s the resulting M code:

let
Source =
Csv.Document(
File.Contents(
"C:\Users\chwebb\Downloads\pp-2018.csv"
),
[Delimiter=",", Columns=16,
Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Changed Type" =
Table.TransformColumnTypes(
Source,
{{"Column2", Int64.Type}}
),
#"Sorted Rows" =
Table.Sort(
#"Changed Type",
{{"Column2", Order.Descending}}
),
Column2 =
#"Sorted Rows"{0}[Column2]
in
Column2

The performance of this query is much worse: 75 seconds, although this varies a lot. So I tried to work out what’s going on here and see if I could improve performance… and that’s when I started learning.

The variation in the amount of time taken to run the query made me think about memory usage and the 256MB container size limit (see this blog post for important background information) and sure enough, Resource Monitor showed that this query was hitting the 256MB limit – unsurprising because sorting is one of those transformations that requires a table to be loaded into memory completely (at least in the worst case – and this seemed like the worst case). Why not reduce the size of the table then? Since only Column2 is needed by the query output I removed all other columns in the table before doing the sort, resulting in the following M:

let
Source =
Csv.Document(
File.Contents(
"C:\Users\chwebb\Downloads\pp-2018.csv"
),
[Delimiter=",", Columns=16,
Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Changed Type" =
Table.TransformColumnTypes(
Source,
{{"Column2", Int64.Type}}
),
#"Removed Other Columns" =
Table.SelectColumns(
#"Changed Type",
{"Column2"}
),
#"Sorted Rows" =
Table.Sort(
#"Removed Other Columns",
{{"Column2", Order.Descending}}
),
Column2 =
#"Sorted Rows"{0}[Column2]
in
Column2

This reduced query execution time a lot – it still varied, but now it was in the range of 5 to 8 seconds.

This leads to the first important performance tuning tip: remove all unnecessary columns from your tables to reduce memory overhead, especially if you’re doing memory-intensive transformations such as sorts, merges, groupings, pivots or unpivots. Of course you should be removing all columns you don’t need for your reports anyway, but the point here is that:

  • You should remove columns you don’t need in your dataset as early as possible in your queries
  • In situations where you need to do more complex transformations in intermediate queries (ie queries that don’t load direct into the dataset but whose output is used by queries that do), remove all but the columns needed by the transformation

Tune in next week for part 2, where I’ll show even more examples of how this particular query can be tuned – and I promise that all you Power Query fans will learn something new!

Adding More Aggregate Columns To The Output Of Table.Profile

A few years ago I blogged about the Table.Profile M function and how you could use it to create a table of descriptive statistics for your data:

https://blog.crossjoin.co.uk/2016/01/12/descriptive-statistics-in-power-bim-with-table-profile/

Since that post was written a new, optional second parameter has been added to the function called additionalAggregates which allows you to add your own custom columns containing aggregate values to the output of Table.Profile, so I thought I’d write a follow-up on how to use it.

Consider the following query:

[sourcecode language='text'  padlinenumbers='true']
let
    Source = 
    #table(
        type table[Month = text, Sales = number],
        {
            {"Jan",1},
            {"Feb",5},
            {"Mar",17},
            {"Apr",1}
        }
    ),
    Profile = 
    Table.Profile(
        Source
    )
in
    Profile
[/sourcecode]

 

There are two steps here. Source returns the following table using the #table function:

image

The second step uses Table.Profile to return a table of summary statistics:

image

[Several columns returned aren’t shown in the screenshot above]

If you want some extra columns on this table you can now add them with the second additionalAggregates parameter of Table.Profile like so:

[sourcecode language='text'  htmlscript='false' highlight='15,16,17,18']
let
    Source = 
    #table(
        type table[Month = text, Sales = number],
        {
            {"Jan",1},
            {"Feb",5},
            {"Mar",17},
            {"Apr",1}
        }
    ),
    Profile = 
    Table.Profile(
        Source,
        {
        {"Median", each Type.Is(_, type number), List.Median},
        {"Mode", each Type.Is(_, type number), List.Mode}
        }
    )
in
    Profile
[/sourcecode]

In this example I’ve added two extra columns called Median and Mode, which return the median and mode values of every numeric column:

image

The additionalAggregates parameter takes a list, where each item is itself a list of three items:

  • The name of the new column to add.
  • A function to call to determine where the aggregate column will be applied to a given column in the source table. In this case I’m checking to see if a column is type number and only returning a median or mode if it is.
  • A function to call to calculate the aggregate value across the column: in this case List.Median and List.Mode. It’s a good idea to use an aggregate function that does not require storing every value in the column in memory (ideally one that can be folded, if you’re using a data source where query folding is supported) here to ensure fast performance.

[Thanks to Curt Hagenlocher for the information here]

Excel Dynamic Arrays And FilterXML

I’ll admit that I’m a bit less excited about Excel dynamic arrays than I was when I last blogged about them. Don’t get me wrong: from a pure Excel point-of-view they are still very cool, but I’ve since found out that the CubeValue function can’t be used with dynamic arrays which prevents me from doing all the really fun SSAS/Power BI/Power Pivot things I wanted to do with cube formulas.

It’s not all doom and gloom though. Several years ago I blogged about the then-new WebService and FilterXML functions (here and here). I very quickly found that the WebService function was very limited indeed and that Power Query did everything it did but better; on the other hand I felt FilterXML I had some unexplored potential, although I never got round to doing any exploring. Fast forward to last year and I saw that FilterXML was one of a number of existing functions that are affected by the new dynamic array behaviour, a change that makes it even more flexible.

Consider the following, publicly-available XML file:

http://www.hmrc.gov.uk/softwaredevelopers/rates/exrates-monthly-0719.xml

It’s a list of exchange rates published by the UK tax authorities and the contents look like this:

image

Just the kind of data you want to bring into Excel. The WebService function doesn’t work with this URL for some reason but it’s very easy to bring this data into an Excel table using Power Query with a few clicks using the Web data source:

image

If you prefer to work with dynamic arrays though (and I’m sure there are lots of reasons why that might be true), you can do that with a combination of Power Query and FilterXML.

The first thing to do is to use Power Query to load the entire XML document into a single cell in a worksheet. Here’s a query to do this:

[sourcecode language='text'  padlinenumbers='true']
let
    Source = 
    Text.FromBinary(
        Web.Contents(
            "http://www.hmrc.gov.uk/softwaredevelopers" &
            "/rates/exrates-monthly-0719.xml"
        )
    ),
    #"Converted to Table" = 
    #table(
        1, 
        {{Source}}
    ),
    #"Cleaned Text" = 
    Table.TransformColumns(
        #"Converted to Table",
        {{"Column1", Text.Clean, type text}}
    ),
    #"Replaced Value" = 
    Table.ReplaceValue(
        #"Cleaned Text",
        "> <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value1" = 
    Table.ReplaceValue(
        #"Replaced Value",
        ">  <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value2" = 
    Table.ReplaceValue(
        #"Replaced Value1",
        ">   <",
        "><",
        Replacer.ReplaceText,
        {"Column1"}
    ),
    #"Replaced Value3" = 
    Table.ReplaceValue(
        #"Replaced Value2",
        ">    <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    )
in
    #"Replaced Value3"
[/sourcecode]

One interesting point to make here: the FilterXML function does not like spaces between closing and opening angle brackets in XML (maybe this is why WebService errors too?) so I’m removing all occurrences of this, as well as removing any unprintable characters. Here’s the output, a table with one column and one row where the only cell contains the full XML:

image

You can then use an Excel formula like this to run an XPath query against this XML document:

[sourcecode language='text' ]
=FILTERXML($A$2, "/exchangeRateMonthList/exchangeRate/countryName")
[/sourcecode]

to get a list of all the country names spilling out to as many rows in the worksheet as necessary:

3d266a16-fa27-4927-9f79-558be7891293

[If you want a comparison with how FilterXML used to work in all its CTRL+SHIFT+ENTER glory see the “Scraping a whole XML document” section here; if you want to learn XPath, the query language used by FilterXML, there is a good tutorial here]

I’m not an XPath expert, or even an Excel expert, so I’ll finish here but hopefully this will prove useful to someone. You can download an Excel workbook containing the demos from this post here – note that neither Power Query nor dynamic arrays work in Excel Online yet, so don’t look at the workbook in the browser.

SSASDiag: A Tool To Help Diagnose Analysis Services Problems

There are a lot of great community-developed tools out there for Analysis Services developers to use (BI Developer Extensions, DAX Studio, Tabular Editor, Analysis Services Query Analyzer to name a few) and they have saved me vast amounts of time and effort over the years. When I joined Microsoft last month I came across one which I had never seen before but which is nevertheless quite mature and feature-rich: the SSAS Diagnostics Tool or SSASDiag for short. It’s available on GitHub here:
https://github.com/ssasdiag/SSASDiag

…and you can read the documentation here:
https://github.com/ssasdiag/SSASDiag/wiki/SSAS-Diagnostics—Analysis

image

It’s an open source tool developed by the people who support Analysis Services here at Microsoft and is intended to help them collect and analyse the information they need to troubleshoot on-premises SSAS  issues, but it’s available for anyone to use. I haven’t had a chance to take a proper look at it yet myself, unfortunately, but I thought it would be interesting for any SSAS fans out there to check out.

[Thanks to Jon Burchel for providing all the background information for this post]

Sharing Data Between Organisations With Azure Data Share

If you’ve ever built a BI solution it’s likely you will have had to integrate third party data, and if that’s the case you will know how painful it often is to get your hands on that data. Badly designed portals you have to log into every week to download the data, CSV files emailed to you, APIs with complex authentication – it’s usually an unreliable, slow and manual process. This is why I was interested to read about a new Azure service that previewed this week called Azure Data Share that aims to provide a simple and secure way to share data between organisations.

You can read the announcement blog post here:

https://azure.microsoft.com/en-us/blog/announcing-preview-of-azure-data-share/

…read the documentation here:

https://docs.microsoft.com/en-us/azure/data-share/

…and watch an introductory video here:

https://channel9.msdn.com/Shows/Azure-Friday/Share-data-simply-and-securely-using-Azure-Data-Share

What could this be used for in Power BI? Well, just for fun I uploaded the contents of a Common Data Model folder to a storage account in one Azure account and used Azure Data Share to copy that folder into a storage account in another Azure account. In the destination the data was stored in Azure Data Lake Gen2 storage, so I was able to attach the CDM folder as a dataflow in Power BI. It took a lot of trial-and-error on my part to get the permissions on the various storage accounts working properly (I’m not an expert on this…) but it worked. This proves that third-party data can be exposed directly inside Power BI as a dataflow using Azure Data Share, which I think is pretty darned cool. If you’re a company that sells data and you want your customers to be able to consume that data easily in Power BI, I think this might be a good way to do it.