Power BI Pro Tip: Confusion about TOPN() versus RANKX()

This post serves as a follow-up on my Power BI Pro Tip about using RANKX to show Top X results. I am writing this because I discovered that there is a lot of confusion about the RANKX() versus the TOPN() function.

Let me try to explain what each function does. The RANKX() function ranks individual data rows according to a certain ranking attribute. It’s result is a numerical value associated with each and every single row of the data table, as you can see below:

 

Now, the TOPN() function sounds the same, right? That’s were the confusion comes from. Also, the call to the function is really similar:

RANKX( ; )

TOPN( ; ; )

 

However, TOPN does not return a value for each row in the data table. It returns a table that contains the top N items (N is the number you specified in the first argument) from the original data table according to the ranking attribute you specified.

In itself this is pretty useless, since you cannot display this data in any way. If you could it would be an alternative way to get a top N ranking to RANKX.

To make TOPN useful you need to wrap it in another function, such as SUMX or AVERAGEX. Let’s see an example:

MyMeasure := SUMX(TOPN(10;Sales;Sales[Sales Amount]);Sales[Sales Amount])

 

Now, MyMeasure equals the sum of sales amount for the best performing cities. At this point I do not know which cities it were, and maybe that is not even important to you. The total sales amount for the top 10 performers is returned.

This comes in handy when benchmarking an individual or organization against a bigger population. You could do an AVERAGEX of the TOPN result and that would be the average score for the top 10 performers. A dashboard showing how an individual measures up against the top 10 is then quickly created.

 

 

Power BI Pro Tip: Dealing with errors when reading Excel files

If you use Power Query to read an Excel file that has errors (such as #VALUE, #REF, #N/A, etc) the rows that contain the error will not be loaded into the data model but instead will be flagged as error rows. But what to do if you really want to keep the data rows with the errors with some replacement value for the error instead of skipping the error row altogether?

To demonstrate this I created this simple Excel table that contains only errors (actually all errors that Excel can generate that I am aware of):

How they are generated and what they mean is not the point here, but I am sure most of you have seen some if not all of them before.

On loading this table using Power Query all five rows will be marked as error rows:

As a result, no data is loaded into your data model. This is as expected since Power Query filters out the rows that have an error, which in this case are all of the input rows.

However, in some cases this is not what you want. Suppose this was a 50 column dataset and that in one column sometimes an error occurs. Then do you want to disregard the rows with errors and thus delete all possible valuable info in the rows or do you want to somehow fix the errors and flag them as problematic but still load them? The latter is sometimes the better choice.

This can easily be done by editing the Power Query and using the Replace Errors function (on the Transform tab):

Make sure you have the column with the errors in it selected, enter a valid value for the data type in the column (in my case it was Any since there is really no other data available in the MyCol) and hit OK.

Power Query will replace the error with the value you entered:

Now, on loading the rows with errors will end up in the data model and your data will be flagged with the label you chose.

Hope this helps! Until next time!

Power BI for partners

Frequently, partners ask me about getting access to Power BI for a little longer than a trial period. Up until recently the only answer I could give was there was nothing available. However, there is news! Partners can now get access to Power BI through the Partner benefits portal. This blog explains it all: http://blogs.technet.com/b/uspartner_ts2team/archive/2014/07/10/power-bi-for-office-365-now-available-to-competency-partners.aspx

Hope this helps!

Power BI pro tip: using Access Online for data entry

With powerful self-service BI tools such as Power BI comes the need for business user data entry; data does not exist in source systems or does need to be enhanced / enriched before going into the report, or the business user just wants to change the way the data is organized. In those cases (which are present more often than not) we need to find a way to give the business user an easy to use way to do data entry while keeping it robust: i.e. not use a tool the user could easily make mistakes in and hurt the reporting process. You could use Excel but you would have to secure it so no mistakes can be made. Also, SharePoint lists are a good option if you have less than 5000 data rows (that’s the hard limit in SharePoint Online). If you need to store a lot of data and need a robust solution, Access Services or Access Online is a great tool for the job and the best part is it works perfectly with Power BI.

Perhaps the biggest change in Access 2013 is that it now stores that in SQL Server Databases rather than Access files. In this post I will show you how to build a sample application concerning reports on KPIs for production plants around the world. The data is entered by the business user using a web form generated by Access and the dashboard is created using Power BI. So here we go.

First step is to get the data. For that I created a simple Access 2013 application that I published on my SharePoint Online site. The Access application consists of three tables: KPIs, Periods, Plants and of course the actual facts: the KPI Values. On top of this sits a very basic data entry screen that enables the user to enter new actuals and targets for a KPI for a period for a given plant:

I entered some test data and saved the app. Imagine your business user just entering their data in here.

The next step is to get the data out of the SQL database Access Services will store it in and build a report / dashboard on top of it. For this, you will need to go to the Info pane of the File menu in Access. Look for the ‘Manage’ button next to Connections:

If you click it you get a big flyout presenting you with a lot of options. You will need to select the following:

-From My location or From Any location. I chose from Any.

-Enable Read Only connections.

See this screenshot:

Now, click on ‘View Read-Only Connection Information’ and leave it open for now. You will need to later.

Next step is to start Excel, go to Power Query, select From Database à SQL Server (and not Access since data is stored in SQL Server by default in Access 2013).

Copy paste the server and database name from the Connection information screen in Access and choose Ok. In next screen enter your credentials and passwords (again copy/paste from the connection information screen in Access). After a while you can select the table you are interested in and you can load the data into PowerPivot. I loaded my Plants, Periods and Values (I skipped KPIs since it was only the KPI label):

Next step is to create relationships between tables in PowerPivot, hide some columns as well as add a KPI definition. I ended up with this model:

Now, with Power View I created the following basic report (I did not give myself time to work on the layout, this is just quick and dirty):

 

This concludes this Power BI Pro Tip!

Hybrid BI environments with Power BI whitepaper

A new whitepaper highlighting best practices for building hybrid business intelligence environments with Power BI has been released. I highly recommend reading it: http://blogs.msdn.com/b/powerbi/archive/2014/08/26/best-practices-for-building-hybrid-business-intelligence-environments-with-power-bi.aspx

 

Combining text (csv) files using Power Query – follow-up

I have been getting quite a few responses on my original post on how to combine text (csv) files using Power Query. One of the FAQs is how to keep the filename where the data came from in your result set. Said differently: what you want is the contents of all files in a folder plus the filename of the originating files all in one table. I thought it was simply a matter of adding a column, but Nicolas pointed out in a comment that adding a column would create a cross product of all data and all filenames. So, I needed to come up with another solution. The solution I present here might not be the best one, but it works. The magic trick here is knowing that the Csv.Document() function exists. Allow me to explain. First of all, I followed the “normal” approach to list files in the folder using Power Query. What you get is: Now, it might be tempting to expand the ‘Content’ column (as I did in the original post). However, as in this scenario the goal is to get the contents as well as keep the originating filename, we need a different approach. What we need to do is add a custom column that equals the following:

What this is doing is opening the contents in a single column and it expects CSV format. The custom column shows up like this: Next step is to expand the table. The contents will be displayed in your custom column: Then, I did a split on the CSV separator (semi-column in this case), so I ended up with three columns: The only thing left to do is clean up (remove columns) and filter rows (since my CSVs had headers the header from the second CSV is still in my data). The end result is: For your reference, here is my code:

With this I hope Sunflowers and Nicolas are happy J

Macro malware on the rise (again)

I have reported on the subject of macros and how I consider them a security risk multiple times on this blog (see http://dutchdatadude.com/macros-are-dead/ and http://dutchdatadude.com/keep-macros-under-control/). Also, in my talks macros are a frequent topic.

Now, Virus Bulletin (https://www.virusbtn.com/virusbulletin/archive/2014/07/vb201407-VBA) has observed a rise in macro related malware, specifically trojan horses. I encourage you to read the report and see for yourself why macros need to be treated carefully. Maybe, sometime in the future, macros will be really extinct. I certainly hope so. However, before that happens we will need to find a way to bring comparable functionality to end-users without the security problems.

Power BI Pro Tip: Name your linked tables

Just a quick Power BI Pro Tip this time: if you use linked tables to add data to your data model in Excel, before you press the ‘Add To Data Model’ button be sure to go to the table properties in Excel and give your table a better name (better than Table X). This makes figuring out which data you are looking at so much easier. You will thank me later J

Power BI Pro Tip: Show Top x results with RANKX() function

One of the more frequent scenarios is listing the top X results, such as most profitable products, biggest customers, top 10 best selling stores, etc. Also doing a top X selection helps reduce clutter in charts: a lot of data points can work as noise and obscure the data points that really matter and make the biggest impact.

In this post I describe an approach to implementing these scenarios using Power Pivot’s RANKX() function.

Let’s start with a simple dataset consisting of products (P1…P20 in my sample), Cities, Sales Amount and Number of products sold:

 

After adding this table to the Power Pivot data model, we can use the RANKX() function to get the best selling products / cities etc. I added the following measures to my table:

Sum of Sales Amount:=SUM([Sales Amount])

Sum of Number Sold:=SUM([Number Sold])

Rank of products by sales amount:=RANKX(ALL(Sales[Product]);[Sum of Sales Amount])

Rank of city by number sold:=RANKX(ALL(Sales[City]);[Sum of Number Sold])

 

These measures allow me to determine the top selling products by sales amount and best cities by number of products sold.

Only thing left to do is to use a Pivot Table / Pivot Graph or Power View / Power Map visualization and display the results.

 

If you create a new Pivot Table and add the Product column and the ‘Rank of product by sales amount’ measure you get the following:


 

So how do we get the top 10 selling products by sales amount is a nice ordered fashion? Very easy, just a matter of the right sorting and filtering. Click on the little downwards pointing triangle button at Row Labels and choose ‘More Sort Options’. There I chose Ascending and then selected the rank measure:


 

Now the Pivot Table is sorted by rank with the highest ranking product at the top. Now, to filter out only the top ten, we press the same button again and choose Value Filters and then Top 10. Here I made the following selections:


 

This seems maybe a bit counter intuitive, but what this does is return the lowest ten ranks (which would be 1 to 10 or the highest ranking products). Alternatively I could have used a Lower Than or Equal To Value Filter with these settings to produce the same result:

And here it is: a top 10 of products by sales amount.

 

Of course, you can also use Power View or Power Map to visualize these results. Here is a Power View based on the same information:

The trick here is to create the visualization just as normal (as above). Above displays the sales amount by product and the number sold by city. However, the catch here is that both the graph as well as the map have a filter on them that utilizes the rank measures I created. Here is the filter for the chart. The ‘Rank of products by sales amount’ measure is filtered to showing only values less than or equal to 10, i.e. the top 10.

What’s best about this is that it is very easy to change from top 10 to top 15 to top 5 or anything you desire. Also, the Power View is fully interactive. For example, clicking on one of the cities on the right shows which products are sold in that city. Note that it does not show the top 10 products in that city however.

Hope you liked this Power BI Pro Tip!