BI Survey 17: Power BI Highlights

Every year, in return for publicising it to my readers, I get sent a copy of the findings of the BI Survey – the largest survey of BI product users in the world. As always they make for very interesting reading indeed, and although I can’t give away all the details I have been allowed to blog about a few of the highlights for Power BI:

  • Power BI is now the third most frequently considered product in competitive evaluations, after QlikView and Tableau.
  • Indeed, based on responses from vendors and resellers, Microsoft is now the third ‘most significant competitor’ after Tableau and Qlik and is in a much stronger position than it was two years ago – clearly as a result of the impact that Power BI has made, although Excel, SSAS and SSRS contribute to this too.
  • Unsurprisingly Power BI’s exceptional price/performance ratio is the main reason that organisations purchase it. Nonetheless it gets very high satisfaction ratings too.
  • Power BI is also now the third most frequently used front-end for SSAS, after SSRS (which is in the #1 spot and quite a way out in front) and Excel (either on its own or with an add-in).

Overall it’s a very strong showing for Microsoft. If you’re conducting a competitive evaluation of BI tools, or if you’re a BI vendor, it’s probably worth buying a copy of the full findings.

Exploring The New SSRS 2017 API In Power BI

One of the new features in Reporting Services 2017 is the new REST API. The announcement is here:

https://blogs.msdn.microsoft.com/sqlrsteamblog/2017/10/02/sql-server-2017-reporting-services-now-generally-available/

And the online documentation for the API is here:

https://app.swaggerhub.com/apis/microsoft-rs/SSRS/2.0

Interestingly, the new API seems to be OData-compliant – which means you can browse it in Power BI/Get&Transform/Power Query and build your own reports from it. For example, in Power BI Desktop I can browse the API of the SSRS instance installed on my local machine by entering the following URL:

http://localhost/reports/api/v2.0

…into a new OData feed connection:

[Screenshots: connecting to the SSRS REST API as an OData feed in Power BI Desktop and browsing its contents]
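If you prefer to see what's going on behind the UI, here's a minimal M sketch of the same connection. It assumes a default local SSRS instance at http://localhost/reports and navigates to a Reports entity set; the entity set names you actually see will be whatever the feed exposes:

let
    // Connect to the SSRS 2017 REST API as an OData feed
    Source = OData.Feed("http://localhost/reports/api/v2.0"),
    // Navigate to the Reports entity set to list the reports on the instance
    Reports = Source{[Name = "Reports", Signature = "table"]}[Data]
in
    Reports

The same navigation pattern works for the other entity sets (datasets, data sources, subscriptions and so on), so a simple monitoring report only needs one short query per entity set.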

This means you can build Power BI reports on all aspects of your SSRS reports (reports on reports – how meta is that?), datasets, data sources, subscriptions and so on. I guess this will be useful for any Power BI fans who also have to maintain and monitor a large number of SSRS reports.

However, the most interesting (to me) function isn't exposed when you browse the API in this way – it's the /DataSets({Id})/Model.GetData function. This function returns the data from an SSRS dataset. It isn't possible to call this function directly from M code in Power BI or Excel because it involves making a POST request to a web service, and that's not something that Power BI or Excel support. However, it is possible to call it from a Power BI custom data extension – I built a quick PoC to prove that it works. This means that it would be possible to build a custom data extension that connects to SSRS and allows a user to import data from any SSRS dataset. Why do this? Well, it would turn SSRS into a kind of centralised repository for data, with the same data being shared with SSRS reports and Power BI (and eventually Excel, when Excel supports custom data extensions). SSRS dataset caching would also come in handy here, allowing you to do things like run an expensive SQL query once, cache the results in SSRS, then share those cached results with multiple reports in both SSRS and Power BI. Would this really be useful? Hmm, I'm not sure, but I thought I'd post the idea here to see what you all think…
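For anyone curious what such a custom data extension might do internally, here's a very rough M sketch of the kind of POST it could wrap. The dataset ID, request body and row limit below are purely illustrative placeholders, not the documented contract; see the SwaggerHub documentation linked above for the real details:

let
    // Hypothetical dataset ID: in reality you would look this up
    // from the /DataSets entity set first
    DataSetId = "00000000-0000-0000-0000-000000000000",
    Url = "http://localhost/reports/api/v2.0/DataSets(" & DataSetId
          & ")/Model.GetData",
    // Supplying the Content option makes Web.Contents issue a POST;
    // as noted above, this kind of call is only viable from a custom
    // data extension, not from a plain Power BI or Excel query
    Response = Web.Contents(
        Url,
        [
            Headers = [#"Content-Type" = "application/json"],
            // Placeholder request body; check the API documentation
            // for the shape it actually expects
            Content = Json.FromValue([DataSetParameters = {}, MaxRows = 1000])
        ]
    ),
    // Parse the JSON that comes back into an M value
    Result = Json.Document(Response)
in
    Result

Wrapped up in a custom data extension with proper authentication, something along these lines would let a user pick an SSRS dataset and pull its (possibly cached) results straight into Power BI.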

Power BI And SQL Server 2016 BI Announcements At PASS Summit 2015

[Image: the cloud and boxed SQL Server, "happy together"]

This year’s PASS Summit is drawing to a close as I write this, and I have to say that the number of Microsoft BI-related announcements made over the last few days has been overwhelming. There have been announcements made via blog posts, such as (shock!) the roadmap blog post:
http://blogs.technet.com/b/dataplatforminsider/archive/2015/10/29/microsoft-business-intelligence-our-reporting-roadmap.aspx

…which you should probably read before anything else, as well as the following posts which have more details on specific areas:
http://blogs.technet.com/b/dataplatforminsider/archive/2015/10/28/sql-server-2016-community-technology-preview-3-0-is-available.aspx

http://blogs.msdn.com/b/analysisservices/archive/2015/10/28/what-s-new-for-sql-server-2016-analysis-services-in-ctp3.aspx

http://blogs.msdn.com/b/sqlrsteamblog/archive/2015/10/28/pin-reporting-services-charts-to-power-bi-dashboards-with-sql-server-2016-ctp-3-0.aspx

There have also been a lot of other announcements made in sessions about functionality that will be available at some point in the next few months, including (and in no particular order):

  • The performance problem with Excel subtotals that I described in this blog post: https://blog.crossjoin.co.uk/2011/10/07/excel-subtotals-when-querying-multidimensional-and-tabular-models/ is finally going to be addressed in Excel 2016 in an update that will be available before the end of the year. This is going to solve a lot of people’s performance problems – problems that people may not even realise they had.
  • SSDT for SSAS 2016 will have a script view where you can see all of your DAX calculations in one place
  • SSDT will be getting monthly updates so new functionality can be delivered much more quickly
  • On top of the improvements in SSAS Tabular DirectQuery mentioned in the blog posts above, we’ll also get support for row-level security and calculated columns (but only ones that reference values in the same row of the table that the calculated column is on)
  • SSAS Tabular will also get Translations, but only for metadata and not for data
  • There will be a Power BI Enterprise Gateway, the corporate big brother of the Personal Gateway
  • Datazen will be rolled into SSRS and Datazen reports will be a new ‘mobile’ report type
  • The Power BI mobile app will be able to display these new SSRS mobile reports as well as Power BI reports
  • The Power BI team will be releasing a new custom data visualisation component every week. We had the new Chiclet slicer this week, which I am already using lots, and in one demo I spotted a ProClarity-style decomposition tree
  • Power BI Desktop will work with SSAS Multidimensional as a live data source (ie not by importing data, but by running DAX queries in the background) by the end of this year
  • PowerBI.com dashboard tiles will become properly interactive, and you will be able to pin entire reports as well as just individual components to them
  • You'll be able to embed ranges and charts from Excel workbooks into PowerBI.com reports; the integration looks much nicer than the rather basic functionality that's already there
  • Power Map/3D maps will be embedded in Power BI Desktop and PowerBI.com
  • You’ll be able to run R scripts in Power BI Desktop and display R visualisations in there too
  • There was a demo of an Android(?) phone version of the Power BI mobile app where, when the phone camera saw a QR code, it displayed a report for the product that the QR code represented over the camera feed. Virtual reality BI!
  • Power BI Desktop will get a “Get Insights” button that, when pushed, will display a report that does some basic statistical analysis of your data, looking for minimums, maximums, outliers etc
  • The Power BI API will be able to give you a list of reports and their URLs
  • Power BI will soon have its own registration page for applications that use the API; no need to go to the Azure Portal.
  • Synonyms and phrasings for Q&A will be coming to Power BI by the end of the year

I *think* that’s everything, but I may well have missed a few things. Many of the features that were mentioned in passing would have deserved a five-minute slot in a keynote in previous years.

Power BI is finally a commercially viable product and it’s getting even better every week – the competition should be very worried. I’m also really pleased that MS are taking corporate, on-premises BI seriously at last and that SSRS is back in favour (I would have loved more new features in SSAS Multidimensional, but hey, you can’t have everything) – if you’re wondering what the picture at the top of this post is, it’s the cloud and boxed SQL Server “happy together” at last, and it appeared in several MS presentations this week. The box is back! Most importantly, for the first time in a long time, Microsoft has a coherent vision for how all of its BI products should work together, it’s working on new features to make that vision a reality, and it is willing to share it with us as a roadmap.

In summary I can’t remember the last time I felt this positive about the future of Microsoft BI. What MS have achieved over the last year has been remarkable, and it seems like it’s the leadership of James Phillips that has made all the difference – every MS employee I’ve talked to has had good things to say about him and I guess this explains why he got a promotion in the reorg last week. I hope all this continues.

Submit Your Feedback On BI Features In SQL Server V.Next

Following on from last month’s post on ideas for new features in SSAS Multidimensional, if you are interested in telling Microsoft what features you think should be added to the on-prem SQL Server BI tools in the next version you can do so here:

http://support.powerbi.com/forums/282523-bi-in-sql-vnext/filters/top

Unsurprisingly, there are plenty of pleas for SSRS to get some love. My suggestion is to integrate Power Query with SSRS: it would add a lot of new data sources that SSRS desperately needs; it would add data transformation and calculation capabilities; and it would also provide the beginnings of a common developer experience for corporate and self-service BI tools – Power Query integrated with Report Builder would be a useful companion to the Power BI Dashboard Designer.

Caching The Rows Returned By An MDX Query

Here's another tip for those of you struggling with the performance of SSRS reports that run on top of an Analysis Services Multidimensional cube. SSRS reports often require quite complex set expressions on the rows axis of an MDX query, and one of the weaknesses of SSAS is that, while it can (usually) cache the values of the cells returned by a query, it can't cache the structure of the cellset itself. What does this mean exactly? Well, consider the following query:

SELECT
{[Measures].[Internet Sales Amount]} ON 0,
NONEMPTY(
 GENERATE(
  [Date].[Calendar].[Month].MEMBERS,
  {[Date].[Calendar].CURRENTMEMBER}
  *
  HEAD(
   ORDER(
    [Customer].[Customer].[Customer].MEMBERS,
    [Measures].[Internet Sales Amount],
    BDESC),
   2)
 ),
 [Measures].[Internet Sales Amount])
ON 1
FROM [Adventure Works]
WHERE([Product].[Category].&[3])

Here I’m taking every month on the Calendar hierarchy of the Date dimension and finding the top two customers by Internet Sales Amount for each Month; notice also that I’m slicing the query by a Product Category. The results look like this:

[Screenshot: query results showing the top two customers for each month]

On my laptop this query takes just over three seconds to run, however many times you run it (and yes, I know there are other ways this query could be optimised, but let's imagine it's a query that can't be). The reason it is consistently slow is that the vast majority of the query's execution time is spent evaluating the set used on rows – even when the Storage Engine has cached the values of Internet Sales Amount for all combinations of month and customer, it still takes the Formula Engine a long time to find the top two customers for each month. Unfortunately, once the set of rows has been found it is discarded, and the next time the query is run it has to be re-evaluated.

How can we improve this? SSAS can't cache the results of a set used on an axis in a query, but it can cache the result of a calculated measure; calculated measures can return strings, and those strings can contain representations of sets. Therefore, if you go into Visual Studio and add the following calculated measure to the cube's MDX Script, on the Calculations tab of the Cube Editor:

CREATE MEMBER CURRENTCUBE.MEASURES.REPORTROWS AS
SETTOSTR(
 NONEMPTY(
  GENERATE(
   [Date].[Calendar].[Month].MEMBERS,
   {[Date].[Calendar].CURRENTMEMBER}
   *
   HEAD(
    ORDER(
     [Customer].[Customer].[Customer].MEMBERS,
     [Measures].[Internet Sales Amount],
     BDESC),
    2)
  ),
  [Measures].[Internet Sales Amount])
);

You can then use this calculated measure in your query as follows:

SELECT
{[Measures].[Internet Sales Amount]} ON 0,
STRTOSET(MEASURES.REPORTROWS) ON 1
FROM [Adventure Works]
WHERE([Product].[Category].&[3])

Having done this, on my laptop the query is just as slow as before the first time it is run but on subsequent executions it returns almost instantly. This is because the first time the query is run the set expression used on rows is evaluated inside the calculated measure ReportRows and it is then turned into a string using the SetToStr() function; this string is then returned on the rows axis of the query and converted back to a set using the StrToSet() function. The second time the query is run the string returned by the ReportRows measure has already been cached by the Formula Engine, which explains why it is so fast.

Couldn’t I have used a static named set declared on the cube to do this instead? I could, if I knew that the Where clause of the query would never change, but if I wanted to change the slice and look at a different Product Category I would expect to see a different set of rows displayed. While in theory I could create one gigantic named set containing every set of rows that ever might need to be displayed and then display the appropriate subset based on what’s present in the Where clause, this set could take a very long time to evaluate and thus cause performance problems elsewhere. The beauty of the calculated measure approach is that if you change the Where clause the calculated measure will cache a new result for the new context.

There are some things to watch out for if you use this technique, however:

  • It relies on Formula Engine caching to work. That’s why I declared the calculated measure on the cube – it won’t work if the calculated measure is declared in the WITH clause. There are a lot of other things that you can do that will prevent the Formula Engine cache from working too, such as declaring any other calculated members in the WITH clause, using subselects in your query (unless you have SSAS 2012 SP1 CU4), using non-deterministic functions and so on.
  • Remember also that users who are members of different roles can’t share formula engine caches, so if you have a lot of roles then the effectiveness of this technique will be reduced.
  • There is a limit to the size of strings that SSAS calculated measures can return, and you may hit that limit if your set is large. In my opinion an SSRS report should never return more than a few hundred rows at most for the sake of usability, but I know that in the real world customers do love to run gigantic reports…
  • There is also a limit to the size of the Formula Engine flat cache (the cache that is being used here), which is 10% of the TotalMemoryLimit. I guess it is possible that if you run a lot of different queries you could hit this limit, and if you do then the flat cache is completely emptied.

Tuning Queries with the WITH CACHE Statement

One of the side-effects of the irritating limitations that SSRS places on the MDX you can use in your reports is the widespread use of calculated measures to get the columns you want. For example, a query like this (note, this query isn’t on the Adventure Works cube but on a simpler cube built on the Adventure Works DW database):

SELECT
{[Measures].[Sales Amount]}
*
[Date].[Day Number Of Week].[Day Number Of Week].MEMBERS
ON 0,
[Product].[Product].[Product].MEMBERS ON 1
FROM [Adventure Works DW]

[Screenshot: query results]

…which wouldn’t be allowed in SSRS, could be rewritten like so:

WITH
MEMBER MEASURES.D1 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[1])
MEMBER MEASURES.D2 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[2])
MEMBER MEASURES.D3 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[3])
MEMBER MEASURES.D4 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[4])
MEMBER MEASURES.D5 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[5])
MEMBER MEASURES.D6 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[6])
MEMBER MEASURES.D7 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[7])
SELECT
{MEASURES.D1,MEASURES.D2,MEASURES.D3,MEASURES.D4,MEASURES.D5,MEASURES.D6,MEASURES.D7}
ON 0,
[Product].[Product].[Product].MEMBERS ON 1
FROM [Adventure Works DW]

…to get it in an SSRS-friendly format with only measures on columns.

For the last few days I've had the pleasure of working with Bob Duffy (a man so frighteningly intelligent he's not only an SSAS Maestro but a SQL Server MCM as well) on tuning an SSRS report like this on a fairly large cube. As Bob found, the problem with this style of query is that it isn't all that efficient: if you look in Profiler at what happens on a cold cache, you can see seven separate Query Subcube events and seven separate partition scans (indicated by the Progress Report Begin/End events), one for each calculated measure on columns.

[Screenshot: Profiler trace showing seven Query Subcube events and seven partition scans]

The first thing Bob tried in order to tune this was to rewrite the query something like this:

SELECT
{[Measures].[Sales Amount]}
ON 0,
NON EMPTY
[Product].[Product].[Product].MEMBERS
*
[Date].[Day Number Of Week].[Day Number Of Week].MEMBERS
ON 1
FROM [Adventure Works DW]

…and pivot the data in the SSRS tablix to get the desired layout, with the Day Numbers on columns. The interesting thing, though, is that for this particular report, while rewriting the query in this way made it run faster (there is now only one Query Subcube event and one partition scan), it actually made the SSRS report slower overall, simply because SSRS took a long time to pivot the values.

Instead, together we came up with a way to tune the original query using the WITH CACHE statement like so:

WITH
CACHE AS
'([Measures].[Sales Amount]
, [Product].[Product].[Product].MEMBERS
, [Date].[Day Number Of Week].[Day Number Of Week].MEMBERS)'

MEMBER MEASURES.D1 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[1])
MEMBER MEASURES.D2 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[2])
MEMBER MEASURES.D3 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[3])
MEMBER MEASURES.D4 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[4])
MEMBER MEASURES.D5 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[5])
MEMBER MEASURES.D6 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[6])
MEMBER MEASURES.D7 AS
([Measures].[Sales Amount], [Date].[Day Number Of Week].&[7])
SELECT
{MEASURES.D1,MEASURES.D2,MEASURES.D3,MEASURES.D4,MEASURES.D5,MEASURES.D6,MEASURES.D7}
ON 0,
[Product].[Product].[Product].MEMBERS ON 1
FROM [Adventure Works DW]

What the WITH CACHE statement does here is load all the data needed for the query into the Storage Engine cache before anything else happens. So even though there are still seven Query Subcube events, one for each column, there's now only one partition scan, and each of the seven Query Subcube events now hits the cache:

[Screenshot: Profiler trace showing a single partition scan, with the Query Subcube events hitting cache]

There's no guarantee that this approach will result in the best performance even when you have a query in this form, but it's worth testing if you do. It's certainly the first time in a long while that I've used the WITH CACHE statement in the real world – so it's interesting from an MDX point of view too.

Prompts for Reporting Services

I got an email earlier this week from Eric Nelson telling me about a new Silverlight parameter prompting application for Reporting Services called “Prompts for Reporting Services” that he’s developed and open-sourced, and since it’s got some features that look useful for anyone building SSRS reports on SSAS I thought I’d share it here.

Some of the features Eric highlighted in his mail are:

Internal/Global Prompts:  An internal prompt is just a regular parameter.  A global prompt is a report whose parameters can be reused by other reports (you can create the prompt once and reference it from multiple reports).

Tree Prompt:  This prompt uses cascading parameters to fetch its data, which makes it perform really well compared to an indented hierarchy parameter.

Cascading Search Prompt:  This prompt fetches no data to begin with and only queries the cube when a search is executed.  I have found this really useful when a parameter is required that has 1,000+ members; lists that size tend to lock up the web browser when rendering and are really hard for the user to navigate.

A few screenshots:

[Screenshots: the SingleSelectTree and MultiSelect prompts]

It’s available for download here:

http://code.google.com/p/prompts/