Creating A Partitioned Table In SSAS Tabular 2017 And SSDT Using M Functions

One of the coolest new features in SSAS Tabular 2017 and Azure Analysis Services is the integration of Power Query and M for data loading. Over the last year or so the Analysis Services team blog has posted a lot of fairly complex examples of how to use this functionality, but now that the latest release of SSDT has proper support for shared expressions I thought it would be a good idea to show a simple example of how to use it to create a partitioned table using M functions.

For this example I’ll be using the FactInternetSales fact table from the Adventure Works DW sample database, and the aim is to create a table in an SSAS Tabular project that has one partition for each year of data in FactInternetSales. Assuming that a new SSAS Tabular project has been created at the 1400 compatibility level with an integrated workspace:

image

…the first thing to do is to right-click on the Data Sources folder in the Tabular Model Explorer pane and select Import From Data Source:

This brings up the Get Data dialog:

Select SQL Server database and then click Connect. Enter the server name and database name in the SQL Server database dialog:

Choose how SSAS is to authenticate when it connects to the SQL Server database and click Connect:

Select the FactInternetSales table from the list of tables in the Adventure Works DW database:

This opens the Query Editor window; in it there is one query called FactInternetSales:

Here’s where it gets interesting. The first thing to do is to create a function that returns a filtered subset of the rows in the FactInternetSales table using the technique I blogged about here for Power BI. On the Query Editor menu bar, click Query/Parameters/New Parameter and create two new parameters called StartDate and EndDate that return the numbers 20010101 and 20011231. Here’s what they should look like:

These parameters are going to be used to filter the OrderDateKey column on the FactInternetSales table. Do this by clicking on the down arrow on the column header of OrderDateKey then selecting Number Filters and then Between:

In the Filter Rows dialog use the StartDate parameter for the start of the filter range and the EndDate parameter for the end of the filter range, then click OK:

Because the OrderDateKey contains dates stored as numbers in the YYYYMMDD format the result is a table that only contains sales where the order date is in the year 2001. This table should not be loaded into SSAS though, so right click on the FactInternetSales in the Queries pane and make sure that the Create New Table is not checked:

Next, on the same right-click menu, select Create Function:

In the Create Function dialog name the new function GetFactData then click OK:

The new GetFactData function will now be visible in the Queries pane; enter 20010101 for the StartDate parameter and 20011231 for the EndDate parameter and click Invoke:

This creates yet another new query called Invoked Function which should be renamed Internet Sales:

Right-click on this query and make sure Create New Table is selected. Next, click the Import button on the toolbar to close the Query Editor and load the Internet Sales table into SSAS.

At this point the Tabular Model Explorer will show all of the queries created above listed under the Expressions folder, and a single table called Internet Sales with a single partition:

Next, right-click on the Internet Sales table in the Tables folder and select Partitions:

This opens the Partition Manager dialog. Rename the existing partition to Internet Sales 2001:

Note that the M query for this partition calls the GetFactData() function to get the rows from FactInternetSales where OrderDateKey is between 20010101 and 20011231:

[sourcecode language='text'  padlinenumbers='true']
let
    Source = GetFactData(20010101, 20011231)
in
    Source
[/sourcecode]

Click the New button to create new partitions, one for each year of data in the FactInternetSales table. Each new partition will initially contain the same M code shown above and should be edited so that the query gets data for the appropriate year:

Click OK, and the end result is a table with one partition per year:

What’s the point of using M functions to return the data for a partition, rather than the traditional method of using a SQL query embedded in each partition? One reason to do this would be to make maintenance easier: if you need to do something like add a new column to a fact table, rather than editing lots of partitions you just need to edit the function and all the partitions will reflect that change. I can think of a few others, but I’ll save them for future blog posts…

45 thoughts on “Creating A Partitioned Table In SSAS Tabular 2017 And SSDT Using M Functions

  1. Reading this makes me wonder what would go into extending M to allow an expression to iterate over a set of parameter members and generate the partitons internally. Basically doing the last step for you.

    As it is, we need to do that last step using AMO, which is not always natural for BI developers

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      That’s not possible right now – while M can be used to load data, it can’t be used to alter the structure of the SSAS Tabular model itself. In the future… well, it would be possible without any changes to the M language, but the dev team would have to be sure it was the right approach.

  2. While this is very interesting, I am having some problems getting to work.
    I could not get this to work on an existing project. I had to change to integrated workspace mode, but there was no way to select the created parameters in the filter of the initial fact query, so I could not make a function.

    I then created a new project as described and created everything perfectly for my date dimension (filtered to just show the days I will be loading).
    I then tried to import a fact table and apply the parameters, and the option to use a parameter in the filter rows dialog box was gone again.
    It turns you need to create new parameters for every single table you wish to filter, which is not great as this will create a proliferation of expressions and make keeping track of things quite difficult.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Hi Michael, I agree the UI still needs a lot of work – I get confused by what’s going on too, and I get frustrated by the kind of scenarios you describe… It’s still being worked on though so hopefully these problems will be dealt with soon.

  3. Thank you for your great posts here ! I have learnt much from your blog. now I am thinking a problem, that is: how to get max value from all cells in powerpivot value area? And this max value will change accordingly with the layout of powerpivot. It’s really difficult for me, could you give me some clue?

  4. Chris,
    This is great. I found an issue after testing. I would like to manage the parameter dynamically, means to set it from another query. I did see options to set parameter as Any value/List of values/Query. But i got error when set parameter as query and refer to it from the funcation.
    Do you have chance to try the parameter as Query option?
    Anyway, it’s quite difficult to use comparing to PQ in Excel/PBI Desktop. Most of time, i have to build in PQ/PBI, then copy M code to SQL 2017. 🙁
    Hope MSFT can improve the usability soon.
    Thanks, MD

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      No, I haven’t tried this yet – I would not be surprised to find that it isn’t supported yet though… I agree the usability is not great, but it gets better with each monthly release of SSDT.

  5. I’m very new to SSAS and feel lucky to have found your blog, I think it’s going to be very useful.
    I came across this post as I’m struggling with a 490 million row Fact table.

    I followed the instructions and I initially created a single partition for a 3 month period (I’m going to work in quarters, not years). This deployed successfully and I am now creating a 2nd partition for the following quarter.

    The two quarters I have created so far are for 2016 and my question is, what happens when I reach the current quarter and my partition will be 01-Oct-17 to 31-Dec-17? I will be adding to this partition will each daily load of data so how can I reprocess this partition only each day until Jan-18 and I create the next one?

    Thanks.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Dylan is correct – there are a lot of options out there for automating partition creation and processing, either using .NET code, using SSIS or in some other way. If you choose a naming convention for your partitions that includes some indication of the date range of data it contains, it should be easy to work out which one contains the latest data when you need to process it.

      1. Thanks very much both, I’ll be looking into this more soon. There was a lot going on in that github post so may look at the SSIS option first.

        Before I look into automation I have the issue of getting the earlier data into my model in Azure AS. I created some partitions that spanned a quarter but when trying to process it received the error “Memory error: You have reached the maximum allowable memory allocation for your tier. Consider upgrading to a tier with more available memory.” I tried creating smaller partitions, reducing down to a single day (approx. 300k records) but processing that single partition is still failing with the same error. Any help or guidance gratefully received!

      2. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
        Chris Webb says:

        Azure SSAS is an in-memory database, so you need enough memory to store the data you want to analyse. It sounds like you have more data than is currently allowed in the tier you have purchased. As the message says, you can upgrade to a more expensive tier which will allow you to work with more data; you should also look at the data you are loading and delete any columns you don’t need for your reports, because that will reduce your memory usage.

  6. You’re right. I just looked at the metrics and am at 9.5gb of my 10gb cache limit. This is going to be a problem as I’ve hardly any of the data I need in there yet.

    Looks like I’m going to have to re-evaluate my approach.

    Thanks Chris.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      It is definitely worth looking at which columns you import- it can make a dramatic difference to memory usage

      1. I had to make a number of changes to reduce the number of columns in my fact table and am now at a point of creating my SSAS model. I followed the steps below again and created a partition for a single day which was worked as expected.

        Having deployed the model, I then created further partitions within SSMS which also worked fine however when I made a further change to my model within SSAS and re-deployed, the partitions I had created within SSMS had been lost. Is this expected behavior or have I done something wrong? I am about to start looking at the partition automation again but am unsure whether I need to be creating them in SSAS, or SSMS (as I originally thought).

        Hopefully this is should be a quick answer!

        Thanks.

  7. Not completely related to this topic but there’s hardly any currently that seem to cover the new get data feature in SQL 2017 so posting it here.
    When I import a table, I am seeing that is is automatically creating a partition and the name of the query changes to TableName/Partition. Is that expected behaviour?
    Also, if I add a calculated column to this table in the model and then go and edit the query to say add a new column then the calculated column goes away. Seems to me like I’m doing something wrong but can’t seem to figure out what that is.

  8. Also, after one make a change to the Table “design” i.e. say add a column, are we supposed to use “Import” or “File -> Close & Update”. What is the difference between the two?

  9. Is it possible to dynamically create partitions at the time of processing? For example, I want to partition my fact table based on an ID column. This list of ID’s I want to filter on comes from another table. Is it possible to dynamically partition the fact table, for each value comming from the ID table?

  10. ive written something in c# to process my partitions and which also creates new partitions when needed. i am using monthly partitions which each day checks whether a partition already exists for that date and creates one it necessary. im not c# expert but its not tricky code.

  11. Great blog once again Chris, I thought I would let you know that I was struggling with create an Expression and then it was never loading to a table.

    Fortunately I read through this blog post, which had the hidden gem of making sure that the “Create New Table” was enabled. And now it works!

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      I guess it depends on the date range you want the partition to include. I’m not a fan of different-sized partitions though: for me, if you’re going to partition by month then each partition should just contain a single month. If you data for months after the end of the current month, create more partitions for that data. If not, make the current partition range 20180101 to 20180131.

  12. This approach only seems to work because the OrderDateKey is a smart key that contains the date within it. This goes against Kimball design recommendations and it isn’t what we do in our data warehouse. Can we perform the above using M Functions if the OrderDateKey is actually an integer that doesn’t contain date information within it? Or are we stuck using SQL and joining against the Date table?

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Actually, this doesn’t go against Kimball recommendations specifically for the reason shown here: https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/calendar-date-dimension/ If you have a meaningless surrogate key value you could write a more complex function that filters on your Date dimension, though.

      1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
        Chris Webb says:

        Yes, it does seem to be contradictory. I’ve certainly seen the recommendation to use smart keys for a Date dimension in one of the Kimball books too, though I can’t remember where exactly. It’s also something I’ve seen in a lot of data warehouses.

  13. It appears to me that using a custom function like this prevents query folding. I really hope I’m wrong, because that would mean that I have to pull all the data from the source system, which pretty much defeats the point of partitioning.

  14. It appears to me that using a custom function like this prevents query folding. I really hope I’m wrong, because that would mean that I have to pull all the data from the source system, which pretty much defeats the point of partitioning.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      No, that’s not true – there must be some other reason why query folding is not taking place.

  15. Hi Chris,

    I’m new to Tabular models and Thanks for your blog. This logic suits for my requirement. Could you please help me how to edit/Add/Delete columns from the function. So that there is no need to change each partitions

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      You need to update the code for the function to add or remove columns, and to do that you need to learn some M (which isn’t too difficult).

  16. Hi Chris,
    I’m starting this approach with default partitions already in my model. When I implement this solution I keep getting an error that says the partitions schema has changed. Any idea what to do with this error without deleting and re-adding the table?

    Thank you,

  17. Hi Chris,

    I have successfully used this to partition one fact table and is a really nice way to partition. However I am having all sorts of issues trying to create other partitions on another fact table in the model in VS 2019. The dialogue box just disappears most of the time when I try to create new parameters for a new fact table. I am also wary of creating all these parameters and functions to partition fact tables when the UI seems very shaky. I find the Expression sections a bit messy and difficult with just 1 fact table let alone a dozen so I will revert back to create the partitions another way I think. It’s shame as this is a really nice way to create them!

  18. Is this way of creating multiple partitions in a Tabular model supported in Standard edition of SQL Server (or is it strictly an Enterprise feature)?

  19. I am interested in examples to help me create partitions in Analysis services tabular model using powershell, Any pointers will be appreciated. Thank you!

Leave a Reply to Chris WebbCancel reply