Exporting Data From Websites For Use In Power BI Using Power Automate UI Flows

As a Power BI developer I know how Power Query makes it easy to get data from web sources, but I also know there are limits to what it can do. I can get data from tables on web pages, I can get data from web services, but when confronted with a website where you have to click a button to download a file there’s a problem. Take https://data.police.uk/data/ for example:

This site allows you to download crime data for England and Wales, but to do so you need to check some boxes and then click the “Generate file” button at the bottom (yes I know there’s also an API, but this is only an example). You can’t click a button in Power Query.

Enter Power Automate UI Flows. I’ve been a fan of Power Automate for a while now, but it was only with the release of the new Power Automate Desktop preview last week that I decided to check out its Robotic Process Automation (RPA) capabilities. Basically, if you’re still doing lots of manual downloading, clicking and moving of files before you can load your data into Power BI, then Power Automate UI Flows can do all of this for you automatically by recording and replaying everything you do.

Now I’m not a UI Flows expert by any stretch of the imagination, but within an hour I was able to create a UI Flow that:

  • Opened a browser and went to https://data.police.uk/data/ (quick tip: if you want to save a file from a browser you need to use the special Automation Browser, not Chrome, Edge or Firefox)
  • Checked the boxes to download crime, outcome and stop and search data for Thames Valley Police
  • Clicked the Generate file button
  • Clicked the Download Now button on the next screen when it appeared
  • Downloaded the resulting zip file to a particular location
  • Unzipped the file to a subfolder
  • Deleted the zip file

Here’s what it looked like in Power Automate Desktop:

Now this is my first ever UI Flow so it’s probably not as robust as it could be, but it seemed to work ok. I was then able to call it from Power Automate and refresh a dataset once the UI Flow had finished:
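
On the Power Query side, picking up the flow’s output is then straightforward. Here’s a minimal sketch in M that combines all the CSV files the flow unzipped; the folder path, and the assumption that every file shares the same layout, are mine rather than anything the flow guarantees:

let
  // folder the UI Flow unzips the download into (hypothetical path)
  Source = Folder.Files("C:\CrimeData\Unzipped"),
  // keep only the CSV files
  #"Csv Only" = Table.SelectRows(Source, each [Extension] = ".csv"),
  // parse each file's binary content as a CSV document
  #"Parsed" = Table.AddColumn(
      #"Csv Only",
      "Data",
      each Csv.Document([Content], [Delimiter = ",", Encoding = 65001])
    ),
  // combine everything into one table, promoting the first row to headers
  #"Combined" = Table.Combine(
      List.Transform(#"Parsed"[Data], Table.PromoteHeaders)
    )
in
  #"Combined"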

I guess the point of this post is not to tell you how to use Power Automate UI Flows to extract data from web pages for use with Power BI, just that they’re there for you to use if you need them. It’s important to be aware of the other tools available in the Power Platform and understand how they can be used alongside Power BI, rather than trying to solve every problem with Power BI alone. Not only are there a lot of cool things happening with Power Automate that are directly relevant to Power BI (such as the new Export actions and the upcoming ability to kick off a Flow when a Power BI dataflow finishes refreshing – see the demo at 33:00 here), but there are other features in Power Automate beyond UI Flows that can be used for BI purposes, such as the ability to run an Office Script from a Flow, which can be useful for preparing Excel data for Power BI.

If you want to learn more about Power Automate Desktop there’s a good multi-part tutorial on YouTube starting here; there’s also a three-part series on extracting data from web pages (part 1, part 2, part 3), although it uses an example that Power Query handles much more elegantly 😉

Power BI FastTrack Recognized Solution Architect 2021 Nominations Open

Are you really, really good at designing Power BI solutions? If so, you should nominate yourself for Microsoft’s Power BI FastTrack Recognized Solution Architect (FTRSA) distinction! Here’s the official description of what this is:

FastTrack Recognized Solution Architect is a recognition conferred by the Power Platform product engineering team upon a practicing solution architect for consistently exhibiting deep architecture expertise and creating high quality solutions during customer engagements. A FastTrack Recognized Solution Architect typically works for a systems integrator partner.

You can see some of the people who received this distinction at the Microsoft Business Applications Summit 2020 here:

https://powerplatform.microsoft.com/en-us/fasttrack/recognized-solution-architects/

I’ll be honest, the bar is set extremely high and only the very best will be successful. To be eligible for the FTRSA distinction, an architect must meet the following requirements:

  • Must have a minimum of 2 years of experience with Power BI and a minimum of 5 years of experience with Enterprise BI solutions.
  • Must have a minimum of 2 years of experience as an Enterprise BI architect.
  • Must be working for a partner with a Gold certification in the Data Analytics MPN Competency.
  • Must have been lead architect for at least 2 Power BI in-production implementations with at least 200 active users.

Why nominate yourself?

  • FTRSA architects, and the partners they work for, will be listed on a public-facing page on the Power Platform product websites in recognition of their contributions towards driving customer success.
  • Partners can indicate the number of FTRSA(s) they have in their promotional materials. They may also point prospective customers to the public-facing profile listing page mentioned above.
  • The architects will get an e-badge that can be shared on LinkedIn and other social media platforms.
  • The architects will receive a special mention at the annual Microsoft Business Applications Summit.

If you’re interested and meet all the requirements you can nominate yourself here:

http://aka.ms/FTRSANomination

Nominations close on November 30th 2020.

Power BI Report Performance And The Number Of Visuals On A Page

When you’re faced with a slow Power BI report it’s very easy to assume that the problem is something to do with the dataset, how it’s modelled and how the DAX has been written, and then disappear down a rabbit hole of Marco-and-Alberto videos trying to shave milliseconds off the time taken by each DAX query the report runs. It’s certainly an assumption that I make all the time, and indeed most performance problems can be fixed in this way. However, this can mean that you miss other important ways of improving report performance.

Two things happened last week to remind me of this. First of all, I spent some time with my colleague Miguel Myers (check out his YouTube channel), who showed me how he had tuned a customer’s report and reduced one page’s rendering time from 27 seconds to 10 seconds without making any changes to the dataset or DAX, while still showing the same information on the page. He did this partly by using a smaller number of visuals to display the same values – something I blogged about here when I learned it from him earlier this year – but mostly by reducing the number of visuals, like shapes and textboxes, that don’t display any data at all. Secondly, I was helping some other colleagues load test a Power BI report using this tool, trying to understand whether the performance bottleneck was inside Power BI or on the client machines running the load test; it turned out to be the latter.

You may have read the title of this post and guessed that I’m going to talk about reducing the number of visuals that display data from your dataset as a way of improving performance, but that’s not the case. In this blog post I want to show how visuals that do not display any data from your dataset can have a significant impact on report performance. Before we carry on I suggest you read the series of posts I wrote late last year on measuring the performance of reports in the browser using Chrome/Edge DevTools (part 1 is here, part 2 is here, part 3 is here) because I’ll be using techniques described in these posts in my testing.

I have a Power BI dataset with a single table in it and that table contains one column and one row:
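
For reference, a one-column, one-row table like this can be created directly in M with #table; the column name and value below are made up, since the actual contents don’t matter for the test:

let
  // a single text column and a single row (both hypothetical)
  Source = #table(
      type table [MyColumn = text],
      {{"Hello"}}
    )
in
  Source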

Here’s a report page with a single card on it that displays that single value:

As you might expect, this page is very quick to render. Using a secure embed link and the Lighthouse tab in DevTools, the Time To Interactive reading on a cold cache is 3.7 seconds (which, remember, is a worst-case test and slower than most users will experience).

Now, consider a second report page in the same pbix file with the same card visual on it, but this time with 260 rectangle visuals added:

These rectangle visuals don’t display any data. The Time To Interactive reading for this page is now a whopping 24.3 seconds! What’s more, Power BI Desktop is noticeably more sluggish when editing this page.

OK, this is an extreme example, but it goes to show how adding too many visuals to your report pages, even visuals that do not display any data, can have a noticeable impact on report performance. What if you need lots of shape visuals to provide visual grouping in your report, though? The easy solution is not to use lots and lots of shapes but to create an image with the outlines you want and set it as the page background. It can be made to look exactly the same, and in this case using an image instead of all those shapes results in a page with a Time To Interactive reading of 4.5 seconds – only a little slower than the original page. Most users won’t be able to spot the difference either:

None of what I have said so far is particularly new or ground-breaking; indeed, another one of my colleagues, Chris Hamill, wrote about this last year in a great post on how he uses PowerPoint to design backgrounds for Power BI reports. There is one last question that needs to be considered though, and it’s the one I was confronted with this week during the load testing: how do you know whether you have this problem without completely redesigning your reports? Even if you don’t think you have a particularly cluttered report, is there still an opportunity to shave a second off your report load time by removing visuals? The information on the Network tab in DevTools can give you a hint.

Here’s what the Network tab (see this post for how to use it) in DevTools shows for the first report page above with only the card visual on it:

Focusing on the events in the 1.5 second period after the red line shown in the timeline at the top (which is the load event for the page) and before the first of the querydata requests (which are fired when the visuals request data from the dataset, one of the last things that happen in a report render), you’ll notice there are no significant gaps.

Compare this with the same part of the Network waterfall, a period of almost 14 seconds, for the version of the page with the 260 rectangles on:

Even without seeing the details, one thing is obvious: there are some large gaps with no network activity going on. These gaps are where JavaScript processing is happening on the client, and the more visuals on your report, the longer that processing is likely to take. Other factors can have a big impact on how long it takes too: the browser you use (the new Edge and Chrome are going to be a lot faster than Internet Explorer), the spec of your PC, and other resource-hungry processes running on it (such as other browser instances during a load test) all have an effect. Therefore, if you see gaps on the Network tab, you may be able to improve the performance of your report by reducing the number of visuals and/or using a more modern browser and/or upgrading your PC. However, I must point out that the behaviour of the Power BI web front end is not officially documented and could change at any time, so be careful how you interpret anything you see in DevTools.

Viewing Data Privacy Partition Information With Power Query Query Diagnostics

Back in May 2020 a new feature was added to Power BI’s Power Query Query Diagnostics functionality: the ability to view data privacy partition information. The announcement blog post has some basic information on this feature but in this post I’ll go into a bit more detail about how it actually works.

If you want a refresher on the subject of Power Query data privacy then this video is a good place to start. The Power Query documentation also has a detailed article on what data privacy partitions are.

Let’s see how Query Diagnostics displays partition information using a simple example. First of all, here’s a query called DayName that reads data from an Excel file that looks like this:

and returns the text of the day name shown:

let
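  // read the workbook from disk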
  Source = Excel.Workbook(
      File.Contents("C:\SelectedDay.xlsx"), 
      null, 
      true
    ),
  DayName_Table
    = Source{[Item = "DayName", Kind = "Table"]}[Data],
  #"Changed Type" = Table.TransformColumnTypes(
      DayName_Table, 
      {{"Day Name", type text}}
    ),
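  // take the text value in the first row of the Day Name column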
  #"Day Name" = #"Changed Type"{0}[Day Name]
in
  #"Day Name"

Secondly, here’s a query that gets data from the DimDate table in the Adventure Works DW in SQL Server and filters the EnglishDayNameOfWeek column by the day name returned from the Excel workbook:

let
  Source = Sql.Databases("localhost"),
  AdventureWorksDW2017 = Source{[Name
    = "AdventureWorksDW2017"]}[Data],
  dbo_DimDate = AdventureWorksDW2017{[
    Schema = "dbo", 
    Item = "DimDate"
  ]}[Data],
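  // filtering the SQL Server data by a value that comes from Excel crosses
  // two data sources, which is what creates separate firewall partitions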
  #"Filtered Rows" = Table.SelectRows(
      dbo_DimDate, 
      each ([EnglishDayNameOfWeek] = DayName)
    )
in
  #"Filtered Rows"

Both the Excel workbook and the SQL Server database data sources have their privacy level set to Public.

With the Query Diagnostics options set to record data privacy partition information:

…you can click on Diagnose Step for the last step of the second query above and you’ll get a diagnostics query that returns the following:

There are three rows in this query, one for each partition (only the leftmost columns are shown here). Each partition is identified by the values in the Partition Key column. The second and third rows represent the partitions that access the SQL Server database and the Excel workbook respectively. You can see there are list values in the Accessed Resources column for these rows: these are lists of the data sources accessed by these partitions. For example here’s what the list contains for the second row above:

The first row represents the partition where the data from the DimDate table is filtered by the day name from Excel. This row has a null value for Accessed Resources because it doesn’t access any data sources directly; it does have a value in the Partition Inputs column, another list, that returns the keys of the partitions that feed data into this partition. Here’s what that list looks like:

The values in the Firewall Group column tell you something useful about the data sources used in each partition. As the article on partitioning explains, when data passes from one partition to another, it may be buffered in memory – which may then have an impact on performance. Certain types of data source such as local files and folders are considered trusted, which means that they won’t be buffered even in the presence of Private data so long as all the data sources in the partition are trusted too.

Scrolling to the right-hand side of the query shows information on the amount of time taken to evaluate each partition, while the Diagnostics column contains a nested table that has the rows from the Detailed diagnostics query but just filtered by partition:

Indeed, if you look at the Detailed and Aggregated query diagnostics queries you’ll also see a Partition Key column, which not only means that you can see partition information when looking at all the diagnostics information for your query, but also that in situations where the Partitions query diagnostics query doesn’t appear (which happens…) you can still see useful information about data privacy.
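
If you do want to look at just one partition’s events in those queries, it’s simply a row filter on that column. Here’s a sketch; the query name Detailed and the key value "1" are placeholders for whatever your own diagnostics session generates:

let
  // "Detailed" stands in for the name of your Detailed diagnostics query
  Source = Detailed,
  // keep only the events belonging to one data privacy partition
  #"Filtered Rows" = Table.SelectRows(Source, each [Partition Key] = "1")
in
  #"Filtered Rows"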

Power Query data privacy is one of the most difficult subjects in the whole of Power BI, but it’s also really important that you understand it: apart from the performance implications it can determine whether your query even runs or not. Having information like this in Query Diagnostics means that for the first time we have detailed information about how data privacy rules are applied in a particular query.

The M Behind The New Power Query Data Types In Excel

The big news this week – at least for me – was the release of the new Power Query data types to the Excel insiders channel. You can read all about it here:

https://insider.office.com/en-us/blog/power-query-data-types-in-excel

They’re the latest manifestation of Excel linked data types, and cool things are happening with linked data types and Power BI featured tables too.

The announcement blog post explains pretty much everything you can do right now with Power Query data types but I was curious about the M code that is used to create them. Here’s an example query that takes this source table:

…and creates this Power Query data type:

let
  Source = #table(
      type table[
        Fruit = text, 
        Colour = text, 
        Sales = number
      ], 
      {
        {"Apples", "Green", 10}, 
        {"Lemons", "Yellow", 20}, 
        {"Strawberries", "Red", 30}
      }
    ),
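  // fold the three columns into one record-valued column shown as a data type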
  #"Created data type"
    = Table.CombineColumnsToRecord(
        Source, 
        "Data type", 
        {"Fruit", "Colour", "Sales"}, 
        [DisplayNameColumn = "Fruit", TypeName
          = "Excel.DataType"]
      )
in
  #"Created data type"

The magic happens in the #"Created data type" step, with the Table.CombineColumnsToRecord function: so Power Query data types are basically columns that contain record values, plus (I guess, I need to check) some extra metadata.
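
One quick way to test that theory: if the Data type column really does contain ordinary record values, record field access should get the original values back out. This is my own experiment rather than anything from the announcement post, and FruitDataType is a stand-in name for the query above:

let
  // FruitDataType stands in for the query shown above
  Source = FruitDataType,
  // reach into the record in each row of the "Data type" column
  // and read a single field back out
  #"Sales Again" = Table.AddColumn(
      Source,
      "Sales Again",
      each [Data type][Sales],
      type number
    )
in
  #"Sales Again"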
