Generating Reports Using Word Documents And Flow

The new “Populate a Microsoft Word template” action in Flow that was released last week got me thinking: apart from the obvious mail-merge applications, this could be used to build reports in Word! Here’s a very simple example.

Let’s say you have a table in an Excel worksheet stored in OneDrive for Business:

image

Let’s say you have a Word document that looks like this:

image

Shown in this screenshot are two Plain Text Content Controls called ProductName and SalesAmount, where text can be entered and with placeholder text already added. The screenshot above is with the document in Design Mode with the controls visible; the document would normally look like this:

image

The idea is to loop through the rows in the Excel table and use the data on each row to populate the content controls in the template and then create a new Word document. Here’s a Flow that does this:

image

Looking at some of these actions in a bit more detail:

  • List rows present in a table gets the data from the Excel table shown above

    image

  • Apply to each loops over each row in the table
  • Populate a Microsoft Word Template takes the Word document shown above and then puts the value from the Product column in the current row from Excel into the ProductName content control, and the value from the Sales column in the current row into the SalesAmount content control:

    image

  • Create file saves the resulting new Word document in a separate folder:

    image

Here are the three Word files that are created:

image

And here’s what the contents of AppleSales.docx looks like:

image

This is only a very, very basic example of what’s possible: you could convert these documents to pdf, email them out as attachments, and even in the future get your source data direct from Power BI.

Why would you want to build reports in Word though? Some reports are very text heavy, and although you can do text substitution in Power BI (for example using the Enlighten Data Story visual), nothing does highly-formatted text as well as Word. OK, it’s not going to be something that you do very often – the vast majority of reports you build will be better off in Power BI, as paginated reports/SSRS, or in Excel – but there are definitely some use cases. The new Flow action is very limited right now but with support for more types of content control even more sophisticated reports will be possible using this technique.

Power BI Roadmap Announcements In The April 2019 Release Notes

The latest version of the “release notes” document that details the roadmap for Dynamics 365 and the whole Power Platform – which includes Power BI – has just been released. You can view it online here:

https://docs.microsoft.com/en-us/business-applications-release-notes/april19/

There are lots of very exciting announcements for Power BI. Here are a few of the features that are promised for later this year that caught my eye, that haven’t already been announced elsewhere (although some of them were discussed in public sessions at the last PASS Summit):

  • Expression-based formatting in Power BI reports – the ability to use a DAX expression to control properties such as colour and line style within a visual. One of those features that advanced Power BI users will love, and which will allow all kinds of interesting customisations.
  • Performance analyzer for reports – a tool in Power BI Desktop to tell you how long each visual on a page takes to refresh, making it easier to understand where your performance bottlenecks are. I wish we could have something similar for Power Query queries…
  • Customised navigation through apps
  • Shared and certified datasets – this is going to be incredibly useful! As the docs say “With shared datasets in Power BI, we are allowing a single dataset to be used by multiple reports, across workspaces”. I can’t wait. Lots of other cool, related features too here.
  • Deeper integration of paginated reports in Power BI – including the ability to create paginated reports on top of Power BI datasets.
  • Large-scale compute engine for dataflows – this is somewhat cryptic. The docs say “We are enhancing the dataflow calculation engine in Power BI Premium to enable high-performing large-scale data transformations (TBs of data).”. Are these enhancements to the Power Query engine, or is it being replaced with something else entirely (eg to make it more like Azure Data Factory?). I don’t know.
  • New dataflow viewer to visualize dataflow dependencies – I guess something similar to what we have in the “Query Dependencies” view in the Power Query Editor?
  • More new Q&A features: auto-suggestions and did you mean. I’m glad Microsoft is continuing to invest in Q&A, I think it’s going to be one of those features that suddenly goes from being a demo feature that few people actually use to something that will be very powerful indeed. In fact, I would argue that it’s already a lot more powerful than most people realise.
  • Python support in the Power BI service
  • Power BI Template Apps – the ability to create multiple apps in different workspaces from a single template. This was something that used to be possible with content packs (remember them?) so I guess anyone still using them should be able to finally migrate away. Useful for end-user customisation of apps.
  • Better integration between Flow and Power BI, for example integrated alert creation, and the ability to query a dataset and trigger refresh of a dataset from Flow. This last feature will be really important – there are lots of scenarios where you only want to refresh a dataset when you know the data is ready.

The Is Nullable Column Property In Power BI

The new Relationships view (or Modeling view – it seems to have two names) in Power BI Desktop that has been in preview since November 2018 not only makes it easier to work with complex models and set properties more easily, it also exposes a brand new property on a column: the “Is nullable” property. It’s visible at the bottom of the new Properties pane when you click on a column:

image

I was, naturally, curious about what it did and I couldn’t find any documentation so I did a bit of investigation of my own and asked a few people at Microsoft.

It turns out that it is primarily intended for validation purposes, so that if you know a column should never contain a null value and then, at a later date, a null value does appear in that column then you’ll get the following error when you try to refresh a table in Import mode:

image

Column ‘MyColumn’ in Table ‘TestTable’ contains blank values and this is not allowed for columns on the one side of a many-to-one relationship or for columns that are used as the primary key of a table

The only other use I have found for it is when it is used in combination with the DAX CombineValues() function in DirectQuery mode. In his excellent article on this subject, Marco notes that the SQL generated when you use CombineValues() to concatenate values from multiple columns and create a calculated column that can be used in a relationship includes a check for null values in those columns. I can confirm that if you turn off the ‘is nullable’ property on all the columns used by CombineValues(), there are no checks for null values in the SQL queries.

There may be a few other things that it does, but I’ve been told they are likely to be there to ensure correctness rather than performance optimisations. Nonetheless if you do find other scenarios where it has an effect please let me know in a comment.

[Thanks to Akshai Mirchandani and Jeffrey Wang for providing much of the information used in this post]

Power BI Data Privacy Levels And Cloud /Web-Based Data Sources Or Dataflows

I have already blogged in great detail many times about Power BI/Power Query data privacy settings (see this series for example) but there’s always something new to learn. Recently I was asked a question by Ian Eckert about how Power BI handles data privacy for cloud or web-based data sources after a dataset has been published, and it prompted yet more revelations…

Consider the following M query:

let
    Source = Xml.Tables(
        Web.Contents("https://blog.crossjoin.co.uk/feed/")
        ),
    channel = Source{0}[channel],
    language = channel{0}[language],
    out = Json.Document(
        Web.Contents(
            "https://data.gov.uk/api", 
            [
                RelativePath="3/action/package_search", 
                Query=[q=language]
            ]
            )
            ),
    result = out[result],
    results = result[results],
    #"Converted to Table" = Table.FromList(
        results, 
        Splitter.SplitByNothing(), 
        null, 
        null, 
        ExtraValues.Error
        ),
    Column1 = #"Converted to Table"{0}[Column1],
    #"Converted to Table1" = Record.ToTable(Column1)
in
    #"Converted to Table1"

It doesn’t do anything particularly interesting, but it does take data from one web-based data source (the RSS feed for this blog) and sends it to another (the UK government’s open data metadata search web service). As a result, in Power BI Desktop, if you set the data privacy settings for both data sources to Public then the query runs, but if you set the data privacy settings for both data sources to Private:

image

image

…As expected, you get the following error:

image

Formula.Firewall: Query ‘Test’ (step ‘Converted to Table1’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.

Now the strange thing is that, when you publish the dataset that contains this M query, refresh always works. Why? What’s more, other datasets that do something similar will always fail when refreshed.

It turns out that when you publish a dataset that uses cloud or web-based data sources like the two used here, the Power BI service does not use the data privacy settings you have set in Power BI Desktop but instead it automatically assigns data privacy levels as follows:

  • Data sources, like the ones used here, that use Anonymous authentication are automatically given the privacy level Public
  • All other data sources are given the privacy level Private.

Interestingly, Power BI dataflows also count as cloud-based data sources and because they do not use Anonymous authentication they default to Private too, so if you are combining data from a dataflow with another data source in your dataset then you need to be careful of this.

What’s more there is at the time of writing no way to change these data privacy levels in the Power BI web-based portal. Hopefully this will change soon.

There are some workarounds though!

First of all, you can force refresh to take place through a gateway. This might sound strange because in theory, if you’re only using cloud or web-based data sources, a gateway should not be necessary. However there are already similar scenarios where a gateway is needed, for example if you are scraping data from a web page you need to use a gateway, and if you are combining data from a cloud-based data source with an on-premises data source you also need to use a gateway. If you add your cloud/web-based data sources as data sources in your gateway (unfortunately it does not seem to be possible to add a dataflow as a data source in a gateway, though) you can set their data privacy levels in the Advanced Settings section in the Manage Gateways screen:

image

You will also need to set the “Use a data gateway” option to On in the Settings dialog for your dataset after it has been published:

image

The other workaround is to copy the M code for your query and paste it into a new blank M query in an entity in a dataflow, as Matthew Roche shows here. While it does not seem to be possible to set data privacy levels for individual data sources when creating an entity, it is possible to turn off data privacy checks for an entity completely. If you create a query that sends data from one data source to another (regardless, as far as I can see, of the authentication mechanism used), you will see the following message in the Power Query Online query editor:

image

The evaluation was cancelled because combining data from multiple sources may reveal data from one source to another. Click Continue if the possibility of revealing data is okay.

If you click Continue, data privacy checks are turned off and the query runs; you can also click the Options button on the ribbon and check the “Allow combining data from multiple sources” option:

image

If one of your data sources is already itself a dataflow you may need to do some editing of the M query to make things work, but as Matthew Roche shows here it is possible to have an entity in a dataflow refer to another entity without using a computed entity (which is a Premium-only feature).

[Thanks to Arthi Ramasubramanian Iyer from Microsoft for providing background information for this post]

Power BI/Power Query/M Pre-cons At SQLKonferenz, SQLBits and SQL Saturday Iceland

If you’re learning Power BI and you want to deepen your understanding of its data transformation and loading functionality, the Power Query engine and the M language, I’m doing one-day pre-conference seminars on this subject at three conferences over the next few months:

You’ll learn about all the functionality in the Power Query Editor window plus concepts such as merging, appending, parameters, functions, data privacy and M. I hope to see you at one of them!

Updated Power BI Video Training Course Now Available

I’m pleased to announce that the updated version of my Power BI video training course is now available via my friends at Project Botticelli. You can find out more and purchase it here:

https://projectbotticelli.com/power-bi-desktop-and-cloud-course-video-tutorial?pk_campaign=cwblog2019&code=crossjoin

If you book using the discount code CROSSJOIN you’ll get 10% off!

I’ve recorded a lot of new content to cover all the new features that were added to Power BI in 2018, and there are also new segments on Azure Analysis Services and Power BI Premium. You can watch a few of the videos for free to get a taste of the course.

Don’t forget that my older MDX and SSAS Multidimensional courses are also still available on the same site.

ODBC Data Sources, The SqlCapabilities Option And Power Query/Power BI Data Refresh Performance

A few weeks ago I received a great tip from Simon Nuss about a Power Query feature relating to ODBC data sources that I hadn’t seen before (although it’s in Power BI Desktop and Excel 365, so it can’t be that new) and which is potentially important for the performance of data refresh and of the Power Query Editor’s preview pane. It’s only visible if you click “Advanced options” when creating your ODBC data source:

image

It’s the “Supported row reduction clauses” option:

image

What does it do? Well, first of all let’s see what happens if you don’t set it. Let’s say you have an ODBC data source that uses the Microsoft ODBC Driver for SQL Server to connect to the Adventure Works DW database. If you create a new query that uses this data source to get the contents of the DimDate table it works pretty much as you’d expect; here’s the M code:

let
    Source =
        Odbc.DataSource(
            "dsn=Test SQL",
            [HierarchicalNavigation=true]
        ),
    #"Adventure Works DW_Database" =
        Source{
            [Name="Adventure Works DW",Kind="Database"]
            }[Data],
    dbo_Schema =
        #"Adventure Works DW_Database"{
            [Name="dbo",Kind="Schema"]
            }[Data],
    DimDate_Table =
        dbo_Schema{[Name="DimDate",Kind="Table"]}[Data]
in
    DimDate_Table

image

However, if you look in Profiler when the contents of the Preview window is refreshed, you’ll see that the Power Query engine is running a query that gets the entire contents of the table:

image

Obviously, if you are connecting to a big table this could make the Power Query Editor slow. What’s more, as this blog post shows, the Power Query Editor only actually wants to display 1000 rows and the fact that the query is requesting all the rows from the table suggests that even basic query folding is not happening on this ODBC data source.

Now let’s see what happens if you do select something from the “Supported row reduction clauses” dropdown. As the screenshot above shows, there are four values in the dropdown that relate to the form of SQL supported by your ODBC source to only get the first N rows from a table. If you don’t know which one to choose you can click the Detect button and it will select one for you. In this case I selected TOP:

image

Having done this, when you connect to the same DimDate table, you’ll see that now the Power Query Editor is only requesting the top 1000 rows from the table:

image

Also, if you look at the M code for the query, the Odbc.DataSource function has the as-yet undocumented SqlCapabilities option set in its second parameter,

Odbc.DataSource(
 "dsn=Test SQL",
 [HierarchicalNavigation=true,
 SqlCapabilities=[LimitClauseKind=LimitClauseKind.Top]
 ]
)

Actually it’s not quite undocumented because it is mentioned here as something that is only available if you’re using Odbc.DataSource in a custom connector; obviously things have changed.

Setting this option can make a big difference to the responsiveness of the Power Query Editor when you’re working with large tables and/or slow data sources – I saw this myself this week when working with one of my customers. It’s also going to be important if you use the Keep Rows/Keep Top Rows button on the ribbon or the Table.FirstN() M function; I suspect there are other, more obscure, reasons why it might speed refresh up even if you aren’t filtering the rows of the table but I’m not 100% sure.

One last thing to mention is that there is another undocumented function which is what, I think, the Detect button in the screenshots above uses to find out how to set the SqlCapabilities option. It’s called Odbc.InferOptions and it can be used like so:

Odbc.InferOptions("dsn=Test SQL")

This function returns a record containing a field called LimitClauseKind that tells you the type of limit clause that is supported:

image

image

Here’s a list of the possible LimitClauseKind values I found using #shared:

image

%d bloggers like this: