Naming Tables, Columns And Measures In Power BI

I see a lot of Power BI datasets in the course of my work, and as a result I see a lot of datasets that look like this:

Fields

What’s wrong with this picture? Look at the names:

  • The tables and columns have the same names that they had in the data source, in this case a SQL Server database. Note the table name prefixes of “Dim” for dimensions and “Fact” for fact tables.
  • The column and measure names either don’t have spaces or use underscores instead of spaces.
  • What on earth does the measure name _PxSysF even mean?

Datasets like this seem to work perfectly well and are often built by professional BI developers but these names are a mess – and this can cause a lot of problems later on.

This is an issue I’ve been moaning about for years, but I wanted to blog about it again because it’s just as important today for Power BI as it was ten years ago for Analysis Services. My advice is to make naming a top priority when you’re building a dataset. If you have already published your dataset it will be difficult to change the names you’ve used – if you do so, you risk breaking reports and calculations that you and other people have built on it – so this is something that should be dealt with as early in the development process as possible.

In my opinion there are three things to consider when naming a table, column or measure:

  • You should use human-readable names rather than any kind of technical naming convention, with spaces where you would expect to have spaces and all vowels present. For example, that means having names like [Sales Amount] rather than [Sales_Amount] or [SlsAmt]; similarly, prefixes like “Dim” and “Fact” might make sense to you but won’t mean anything to your users.
  • You should use the correct business terminology, the terminology that your users will know and understand, rather than just make up some names that seem appropriate. Your users might not understand what [Total Sales Value] is if the generally accepted term is [Net Sales Amount].
  • The names you use should be consistent across all datasets that contain the same data. That means that if you have a table called Sales in one dataset it should be called Sales in every other dataset that you build from the same data source, not Transactions, FactSales or something else.

This advice might be controversial to some people, especially those with a database background, but to me designing a dataset is more like designing a user interface rather than designing a database. Indeed the consequences of a dataset with no thought put into naming are similar to the consequences of a poor user interface:

  • If your end users don’t understand what your report is trying to show, what’s the point of even building a report?
  • Reuse of datasets is a good thing and ideally any dataset you build should be easy for other people to build reports from. If those other people don’t understand what the names of your tables, columns and measures mean they won’t be able to build new Power BI reports from your dataset using Live connections or use Analyze in Excel.
  • It’s not just other people who are building reports from your dataset that you need to think about. If you want to use the new visual personalisation feature or Q&A then you’ll only be able to if your end users can understand the names you’ve used.
  • Even if you’re the only person building datasets and reports in your organisation, you owe it to yourself to make your code as readable as possible and to save yourself the effort of having to rename columns and measures when you use them in a visual.

That’s enough ranting for now. Good naming is only one part of good data modelling but it’s something that’s too often neglected!

26 responses

  1. So true, I hate it when tables are prepended with “fact” and “dim”. Really all valid points and would be lovely as bullet points or a cheat sheet.

  2. _Sales_Amount is in a dimension table. In almost all cases, it should be in a fact table. Right? (Not to mention the naming convention, as Chris points out.)

  3. Dim and Fact are fine as a prefix inside a large database where techy people look. At the front end it should just be Sales and Customers. Any surrogate keys should net be seen ( 99% of end users are not interested in surrogate keys). If you have to explain the attribute or object you have failed. Simples.

  4. Yes, and no. If a user do not know what a fact table is, he should not get into designing reports in any bi tool. Abbreviations is sometimes inevitable, an dealing with full names for some measures like “Year Over Year Forecasted Full Year Revenue in constant currency rate” may not be the best idea. For me it is not using abbreviation that is the problem. The issue is in overusing it, inconsistency, lack of business definitions, no documentation. I do like when tables are clearly marked with fact or dim. I do not like where I have hundreds of columns and measures all “human friendly long descriptions” with no real consistent convention

  5. Great stuff Chris, I’ve been trying to push this point with my report designers and we have actually incorporated in our data-flow strategies as well because we are trying to build data sources that people can intuitively understand and build their own insights on. This is the essence of BI and where it needs to be heading.

  6. Some points I would agree but NetSalesAmount vs Net Sales Amount ?. The camel-casing is a nice tradeoff vs spaces/underscores. Definitely hide internal fields and tables

    • I disagree – camel casing is fine for a technical audience, but it can be confusing for a business analyst and you would not want to see it used in a report (for example as a column header).

      • Hi Chris, Can we use URL parameters for filtering reports if we have spaces in the table and column names? Will be great if you could point me to a solution. Thanks

  7. Pingback: Naming Tables, Columns And Measures In Power BI – ScienceGeek

  8. Pingback: Good Practices for Naming Things in Power BI – Curated SQL

  9. This is an important topic and your blog post is a great start but naming conventions should never be developed at the scope of a single table or subject area. Much Power BI, Tableau, and similar development is like a return to the wild, wild West. Gone are precepts and enduring processes like data governance where subject matter experts hash out naming conventions and haggle over some specifics at the enterprise level. Most enterprise sorely need a return to or perhaps first time use of data governance.

    More technically, it is not the names assigned that are most important, but where to store and manage them. The master of column names should exist in an enterprise glossary along with definitions. The glossary should tie to a lineage document or database so that the upstream database names and downstream BI product uses for the column can be tracked.

    Specific to Power BI, Microsoft has not yet provided a means to assign names globally. There is no metadata layer where the translation occurs. For each new PBIX file hosting data, the imported table/column names must be reassigned manually. Care must be taken that all manual reassignments exactly match the character string in the glossary. Some efficiency might be gained to the extent that your organization is willing to base a number of PBI reports on a single PBI “model only” host file.

    I have tried to use Python to decode and edit the JSON model elements in the PBIX [zip] file in order to have programming rename the elements to match an external glossary but this is quite complex and so far, I have been unsuccessful. Has anyone else tried to programmatically rename tables/columns or make any other changes to the PBIX file except through the PBI Desktop developer UI?

    • Hello,
      I think editing the PBIX is quite difficult and risky as we don’t know how the Power BI team will modify it in the future. By the way, it will not be usefull when the XMLA endpoint will be generally available. Then translations will solve the problem.

    • I used to be a DBA on IBM Mainframe DB2.
      We used the “unreadable” names. Well, they are unreadable to the “average user”, but to DBA’s and developers, after a couple of years you could read them like words.
      .
      We used a very specific syntax called the “IBM OF Language”. (I wish I could find documentation for it).
      .
      We had 2 people assigned to define new names for our “data dictionary”. Every element used in names had a STANDARD 3 letter short form (with a few 2 letter exceptions to make things interesting). The “OF language” had a very specific syntax that defined how names were built up.
      .
      All of the names used suffixes to identify it’s type, ie TB=Table, VW= View, N=Numeric data, T=Text, F=Flag etc.
      .
      I agree, this sort of esoteric language is OK for professional developers, but PowerBI (and PowerQuery / PowerPivot / PivotTables) are intended for use by “normal” people. They don’t have time to learn the abbreviations and the syntax.
      .
      That being said, I agree with the thesis of the article: there still should be commonly agreed upon names used by everyone. This is another situation where standards are used to redirect “creativity”. Names should be standard, let the developers focus their “creativity” on the design of the analysis.
      .

  10. Thank you for sharing this post, I would like to agree with the point of view of the article, I hope you will have more sharing to help us understand it. Thank you very much.

  11. Thank you for sharing this post, I would like to agree with the point of view of the article, I hope you will have more sharing to help us understand it. Thank you very much.OK Roi em , Cai nay hoanga ạoi noi dung comment cho chi, lam lay chuc cai comment khac nhau roi dan nhe ok ạ

  12. Hi Chris. Any benefit in Net Sales Amount vs Net_Sales_Amount other than readability? Does intellisense or Q&A work better with spaces?

  13. Does [Net Sales Amount] include account level rebates?
    That’s the problem with human-readable names. Different business units can have the same word mean different things.
    Worse case, a user sums Net Sales, says Sales are doing awesome, and at year end finds out profits were horrible… due to accidentally excluding account volume level rebates.

  14. I have a few simple rules:

    Do Top-Down-Naming (or hierarchical naming) whenever you can – for the Sort-Order.
    Most do the opposite, by intuition … though we have omni-directional reading capabilities.

    Example:
    Top Down: 2020-01-31 Bottom up: 31-01-2020.
    Sales revenue (net), Sales revenue (grs), or Vehicles (cnt), etc

    Measures: Start with the Dimension name, and then the Measure. Again Top-Down.

    Top down naming is also the most intuitive way to read a tables.


    +Underscores, don’t work well with “Wrap text”. For that we need a “blank underscore”. So no.


    ++the Measure-tables are to be placed in Process-Order. Yes, that is the key to opening the “virtual dimension of measures” or “value engineering” (or any other type of construction) – enjoy.

  15. Pingback: Translations life cycle management - The BI Power

  16. Thank you very much for mentioning it. I was wondering all that time if I was the only one complaining about those huggly names.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: