A Look At Lobe – A Free, Easy-To-Use Tool For Training Machine Learning Models

A few months ago I heard about a new tool from Microsoft called Lobe which makes it easy to train machine learning models. It’s nothing to do with Power BI but I find anything to do with self-service data analytics interesting, and when I finally got round to playing with it today I thought it was so much fun that it deserved a blog post.

You can download it and learn more at https://www.lobe.ai/ and there’s a great ten minute video describing how to use it here:

The most impressive thing about it is not what it does but how it does it: a lot of tools claim to make machine learning easy for non-technical users but Lobe really is easy to use. My AI/ML knowledge is very basic but I got up and running with it extremely quickly.

To test it out I downloaded lots of pictures of English churches and trained a model to detect whether the church had a tower or a spire. After I labelled the pictures appropriately:

…Lobe was able to train the model:

I could test it inside the tool. The model was able to tell whether a church had a tower:

…or a spire:

…very reliably!

If I have one criticism it’s that when you want to use your model things get a lot more technical, at least compared to something like AI Builder for Power Apps and Power Automate, but I guess that’s because it is just a tool for training models. There have been some recent improvements here though (see this blog post) and Lobe does provide a local API for testing purposes that can be consumed in Power BI with some custom M code.

Here’s an example of how to call the local API in Power Query:

let
  Source = Folder.Files("C:\Churches"),
  #"Removed Other Columns"
    = Table.SelectColumns(
    Source,
    {"Content", "Name"}
  ),
  #"Added Custom" = Table.AddColumn(
    #"Removed Other Columns",
    "CallAPI",
    each Text.FromBinary(
      Web.Contents(

        //Insert Lobe Connect URL here                              
        "http://localhost...",
        [
          Content = Json.FromValue(
            [
              image = Binary.ToText(
                [Content],
                BinaryEncoding.Base64
              )
            ]
          ),
          Headers = [
            #"Content-Type"
              = "application/json"
          ]
        ]
      )
    )
  ),
  #"Parsed JSON"
    = Table.TransformColumns(
    #"Added Custom",
    {{"CallAPI", Json.Document}}
  ),
  #"Expanded CallAPI"
    = Table.ExpandRecordColumn(
    #"Parsed JSON",
    "CallAPI",
    {"predictions"},
    {"predictions"}
  ),
  #"Expanded predictions"
    = Table.ExpandListColumn(
    #"Expanded CallAPI",
    "predictions"
  ),
  #"Expanded predictions1"
    = Table.ExpandRecordColumn(
    #"Expanded predictions",
    "predictions",
    {"label", "confidence"},
    {"label", "confidence"}
  ),
  #"Pivoted Column" = Table.Pivot(
    #"Expanded predictions1",
    List.Distinct(
      #"Expanded predictions1"[label]
    ),
    "label",
    "confidence",
    List.Sum
  ),
  #"Changed Type"
    = Table.TransformColumnTypes(
    #"Pivoted Column",
    {
      {"Tower", type number},
      {"Spire", type number}
    }
  ),
  #"Removed Columns"
    = Table.RemoveColumns(
    #"Changed Type",
    {"Content"}
  )
in
  #"Removed Columns"

You can export models to a variety of other places for production use, including Azure Functions and Azure Machine Learning.

Definitely something to keep an eye on, especially because it will soon be able to do object detection and data classification as well as image classification.

Ten Reasons Why I’m Excited About The New Power BI/Excel Integration Features

My favourite Power BI announcement at the Microsoft Business Applications Summit was, without a doubt, that Excel PivotTables connected to Power BI datasets will very soon work in the browser and not just on the desktop. This is something I have wanted for a long time, way before I joined Microsoft, so this is a feature I have a personal interest in. However I also think it’s an incredibly important step forward for Power BI in general and in this post I’ll outline the reasons why.

Before we carry on please make sure you read this post on the Excel blog which has more details on all of the new Power BI/Excel integration features that are being released. Quick summary: if you’re reading this in late May 2021 you probably won’t have all of this functionality available in your tenant yet but it is coming very soon.

So why exactly am I excited?

It makes Excel a third option for building Power BI reports

Up to now, if you wanted to build Power BI reports and share them with other people online you had two choices: regular Power BI reports and paginated reports. Now Excel gives you a third option: you can upload Power BI-connected Excel workbooks to a Power BI workspace, make them available via a Power BI app, and not only will they be fully interactive but the data in them will also update automatically when the data in your dataset updates.

PivotTables are the best way to explore Power BI data

Why do we need Excel as an option for building reports on data stored in Power BI? The first reason is data exploration. Excel PivotTables are a much better way to explore your data than any other method in Power BI, in my opinion. Why try to recreate an Excel PivotTable using a matrix visual in a regular Power BI report when you can give your users the real thing?

Cube functions also now work in the browser – and they make it easy to design financial reports

The Excel cube functions (CubeMember, CubeValue etc) are, I think, the best-kept feature in Excel. While PivotTables are great for exploring data they aren’t always so great when you want to build highly-formatted reports. The Excel cube functions make it easy to bind individual cells in a worksheet to individual values in your dataset and because they’re just like any other Excel function they allow you to use all of Excel’s formatting and charting functionality. This then makes it possible to build certain types of report, such as financial reports, much more easily. If you want to learn more about them check out this video from Peter Myers – it shows how to use them with Analysis Services but they work just the same when connected to a Power BI dataset.

Organisational data types make it easy to access Power BI data

While the Excel cube functions are very powerful they are also somewhat difficult to use and sometimes suffer from performance problems. The new organisational data types in Excel do something very similar and while they don’t yet have all the features you need to build complex reports they are also a lot easier to understand for most business users.

Excel formulas are easier than DAX for a many calculations

Everyone knows DAX can be hard sometimes. However, once you’ve got the data you need from your dataset into Excel using a PivotTable, cube functions or organisational data types you can then do your own calculations on that data using regular Excel formulas. This not only allows business users to add their own calculations easily but for BI professionals it could be the case that an Excel formula is easier to write and faster to execute than the equivalent DAX.

Excel can visualise data in ways that Power BI can’t

Excel is a very mature data visualisation tool and it has some types of chart and some formatting options that aren’t (yet) available in Power BI’s native visuals. One example that springs to mind is that you can add error bars to a bar chart in Excel; another is sparklines, although they are coming to Power BI later this year.

Power Pivot reports will also work in the browser

Even if you don’t have a Power BI pro licence, if you have a commercial version of Excel you’ll have Power Pivot and the Excel Data Model. And guess what, Power Pivot reports also now work in the browser!

Collaborate in real-time with your colleagues in Excel Online

With Excel reports connected to Power BI stored in OneDrive for Business or a SharePoint document library you get great features for collaboration and co-authoring, so you and your colleagues can analyse data together even if you’re not in the same room.

There’s a lot of other cool stuff happening in Excel right now

The Excel team are on a hot streak at the moment: dynamic arrays, LAMBDAs, LET, the beginnings of Power Query on the Mac and lots more cool new stuff has been delivered recently. If you’re only familiar with the Excel features you learned on a course 20 years ago you’re missing out on some really powerful functionality for data analysis.

Everyone knows Excel!

Last of all, it goes without saying that Excel is by far the most popular tool for working with data in the world. Everyone has it, everyone knows it and everyone wants to use it. As Power BI people we all know how difficult it is to persuade our users to abandon their old Excel habits, so why not meet them halfway? Storing data in a Power BI dataset solves many of the problems of using Excel as a reporting tool: no more manual exports, old-of-date data or multiple versions of the truth. Using Excel to build reports on top of a Power BI dataset may be much easier to learn and accept for many business users – at least at first – than learning how to build reports in Power BI Desktop.

Video: Advanced Analytics Features In Power BI

Following on from my last post, another SQLBits session of mine I wanted to highlight was “Advanced Analytics Features In Power BI”. The subject is a bit outside my normal area of expertise but it’s also one that I don’t think gets enough attention: it’s all the features available in Power BI reports that can help you explain why something happened rather than just what happened. Things I talk about include:

  • Adding forecasts to line charts
  • Symmetry shading, ratio lines and clustering on scatter charts
  • The “Explain the increase” and “Find where this distribution is different” features
  • The Key Influencers and Decomposition Tree visuals
  • Custom visuals such as Sanddance
  • Natural language querying with Q&A

Video: Performance Tuning Power BI Dataset Refresh

The team at SQLBits have been publishing all the session recordings from their last (online) conference on their YouTube channel. There’s a lot of great content there to check out and this post is to highlight one of my sessions, “Performance tuning Power BI dataset refresh”.

In this session I look at all of the factors that can influence how long it takes to import data into Power BI and what you can do to make it faster. Topics covered include:

  • Choosing a dataset storage mode
  • The importance of good data modelling
  • How the type of data source you use effects how quickly data can load
  • Ways to measure refresh performance, such as using SQL Server Profiler and Power Query Query Diagnostics
  • Power Query options that can influence refresh times such as disabling data previews
  • Query folding in the Power Query engine
  • Vertipaq engine features that affect refresh, such as calculated columns and calculated tables
  • How dataflows can help refresh performance

Power Query And Power BI Connectivity Announcements At The Microsoft Business Applications Summit

There were a lot of exciting announcements at the Microsoft Business Applications Summit this week but if you only watched the keynotes or read the recap on the Power BI blog you will have missed all the Power Query-related news in the “Data Prep in Power BI, Power Platform and Excel using Power Query” session:

https://mymbas.microsoft.com/sessions/1332f59f-a051-4a06-ae50-8f3185501a88

It covers all the new things that have happened in Power Query over the last few months such as Diagram View and, more importantly, talks about what’s going to happen in the next few months. It’s relatively short but for those of you with no time or patience, here’s a summary of the roadmap announcements:

[BTW “Power Query Online” is the browser-based version of Power Query that is used in Power BI dataflows]

My highlights are:

  • The ability to create a dataflow quickly by uploading a file to Power Query Online without needing to use a gateway to connect to a file on-premises, useful for one-time import scenarios.
  • Multi-value M parameter support – useful for dynamic M parameters and other things I can’t talk about yet 😉
  • The things that Miguel talks about regarding “easier design experiences” for Synapse are kept intentionally vague but it’s worth listening to carefully to what he says here!
  • Native SQL support for Snowflake, BigQuery and Redshift – this is really useful for anyone who wants to use DirectQuery with these databases because it will allow you to write your own SQL query and use it as the source of a table, rather than having to use a table or a view.
  • AAD based Single Sign-On support for Redshift and BigQuery (similar to what we have today for Snowflake) will also be very important for DirectQuery, because it means that the identity of the user running the report can be passed back to the database.
  • A dataflows connector for Excel Power Query – which means, at last, you’ll be able to get data from a dataflow direct into Excel. This will make a lot of Excel users very happy, I think: a lot of the time all users want is a table of data dumped to Excel and dataflows will be a great way to do provide them with that.

Last of all, the session showcases the great new home for all things Power Query – http://www.powerquery.com/ – which has great resources, newly-updated documentation and a blog. Make sure you check it out!

Power BI, Excel Organisation Data Types And Images

Excel Organisation data types were released last year (see here for details), but did you know that you can now use them to bring images as well as text and numbers into Excel? Here’s a super-simple example that shows you how to do this.

Here’s a table called ‘Fruit With Image’ in a dataset that I have published to the Power BI Service:

Notice that the Data Category property on the Image column, which contains the URL of a picture of each type of fruit listed, to “Image URL” (for more details on what this does see here). If I use this table in a Power BI report, I see the name of each fruit and a picture:

So far no surprises. I can also set this table up as a Featured Table (for more details see here) so it can be used as the source for an Organisation Data Type in Excel:

The cool thing is that when I type these fruit names into Excel and mark them as the “Fruit With Image” data type (see here for more details), I can then access the Image field and it will show the image that the URL points to inside a cell:

Measuring DirectQuery Performance In Power BI

If you have a slow DirectQuery report in Power BI one of the first questions you need to ask is how long the SQL queries that Power BI generates take to run. This is a more complicated question to answer than you might think, though, and in this post I’ll explain why.

I happen to have access to some of the famous New York taxi data in a Snowflake database, and in there is a table with trip data that has 173 million rows that I have a built a Power BI dataset from. The data and the database used are not really important here though – what is important is that it’s DirectQuery and a large-ish amount of data. Here’s a report page with a single table visual on it, showing passenger count aggregated by the hack license field:

It’s slow, but how slow? Here’s what Performance Analyzer shows when I refresh the table:

The DAX query takes 5.4 seconds but the Direct Query time is only 3.3 seconds – and the numbers don’t seem to add up. Here’s what Profiler captures for the same refresh shown in Performance Analyzer:

This shows there’s a gap of 2 seconds between the DirectQuery End event and the Query End event. What if I paste the DAX query into DAX Studio? Here’s what the Server Timings tab shows:

This is a different query execution to the two examples above, both of which show data for the same execution, which explains why the numbers are slightly different here – but again there seems to be an extra second of stuff happening and DAX Studio suggests that it’s in the Formula Engine.

So what is going on? The answer lies in understanding what the DirectQuery End Profiler event actually measures: it’s the amount of time between the Analysis Services engine handing a query over to the Power Query engine and the Analysis Services engine receiving the first row in the resultset back, including the time taken for the Power Query engine to fold the query.

Therefore if it takes a long time to get all the rows in the resultset then that could explain what’s going on here. Unfortunately there’s no way of knowing from Profiler events how long this takes – but there is another way. Going back to Performance Analyzer, if you export the data from it to JSON (by clicking the Export button) and load it into Power Query, you can see more detail about a DirectQuery query execution. Here’s the data from the first execution above:

[There’s a very good paper documenting what’s in the Performance Analyzer JSON file here]

Looking at the record in the metrics column for the Execute Direct Query event you can see the same 3.2 second duration shown above in Profiler. Notice that there are two other metrics here as well: RowsRead, which is the total number of rows returned by the resultset; and DataReadDuration, which is the amount of time to read these rows after the first row has been received plus some other Analysis Services Engine operations such as encoding of column values, joining with unpushed semijoins, projections of aggregations such as Average and saving the resultset to the in-memory cache. In this case the SQL query has returned 43191 rows and this takes 1.95 seconds – which explains the gap between the end of the Execute Direct Query event and the end of the query.

One last question: why this SQL query is returning so many rows when the DAX query is only asking for the top 502 rows?

The reason is that, at the time of writing at least, the Analysis Services engine can only push a top(n) operation down to a DirectQuery SQL query in very simple scenarios where there are no measures and no aggregation involved – and in this case we’re summing up values. As a result, if you’re using DirectQuery mode and have a visual like this that can potentially display a large number of rows and includes a measure or aggregated values, you may end up with slow performance.

[Thanks to Jeffrey Wang for providing the information in this post]

%d bloggers like this: