Combining text (csv) files using Power Query – follow-up

I have been getting quite a few responses on my original post on how to combine text (csv) files using Power Query. One of the FAQs is how to keep the filename where the data came from in your result set. Said differently: what you want is the contents of all files in a folder plus the filename of the originating files all in one table. I thought it was simply a matter of adding a column, but Nicolas pointed out in a comment that adding a column would create a cross product of all data and all filenames. So, I needed to come up with another solution. The solution I present here might not be the best one, but it works. The magic trick here is knowing that the Csv.Document() function exists. Allow me to explain. First of all, I followed the “normal” approach to list files in the folder using Power Query. What you get is: Now, it might be tempting to expand the ‘Content’ column (as I did in the original post). However, as in this scenario the goal is to get the contents as well as keep the originating filename, we need a different approach. What we need to do is add a custom column that equals the following:

What this is doing is opening the contents in a single column and it expects CSV format. The custom column shows up like this: Next step is to expand the table. The contents will be displayed in your custom column: Then, I did a split on the CSV separator (semi-column in this case), so I ended up with three columns: The only thing left to do is clean up (remove columns) and filter rows (since my CSVs had headers the header from the second CSV is still in my data). The end result is: For your reference, here is my code:

With this I hope Sunflowers and Nicolas are happy J

Macro malware on the rise (again)

I have reported on the subject of macros and how I consider them a security risk multiple times on this blog (see http://dutchdatadude.com/macros-are-dead/ and http://dutchdatadude.com/keep-macros-under-control/). Also, in my talks macros are a frequent topic.

Now, Virus Bulletin (https://www.virusbtn.com/virusbulletin/archive/2014/07/vb201407-VBA) has observed a rise in macro related malware, specifically trojan horses. I encourage you to read the report and see for yourself why macros need to be treated carefully. Maybe, sometime in the future, macros will be really extinct. I certainly hope so. However, before that happens we will need to find a way to bring comparable functionality to end-users without the security problems.

Power BI Pro Tip: Name your linked tables

Just a quick Power BI Pro Tip this time: if you use linked tables to add data to your data model in Excel, before you press the ‘Add To Data Model’ button be sure to go to the table properties in Excel and give your table a better name (better than Table X). This makes figuring out which data you are looking at so much easier. You will thank me later J

Power BI Pro Tip: Show Top x results with RANKX() function

One of the more frequent scenarios is listing the top X results, such as most profitable products, biggest customers, top 10 best selling stores, etc. Also doing a top X selection helps reduce clutter in charts: a lot of data points can work as noise and obscure the data points that really matter and make the biggest impact.

In this post I describe an approach to implementing these scenarios using Power Pivot’s RANKX() function.

Let’s start with a simple dataset consisting of products (P1…P20 in my sample), Cities, Sales Amount and Number of products sold:

 

After adding this table to the Power Pivot data model, we can use the RANKX() function to get the best selling products / cities etc. I added the following measures to my table:

Sum of Sales Amount:=SUM([Sales Amount])

Sum of Number Sold:=SUM([Number Sold])

Rank of products by sales amount:=RANKX(ALL(Sales[Product]);[Sum of Sales Amount])

Rank of city by number sold:=RANKX(ALL(Sales[City]);[Sum of Number Sold])

 

These measures allow me to determine the top selling products by sales amount and best cities by number of products sold.

Only thing left to do is to use a Pivot Table / Pivot Graph or Power View / Power Map visualization and display the results.

 

If you create a new Pivot Table and add the Product column and the ‘Rank of product by sales amount’ measure you get the following:


 

So how do we get the top 10 selling products by sales amount is a nice ordered fashion? Very easy, just a matter of the right sorting and filtering. Click on the little downwards pointing triangle button at Row Labels and choose ‘More Sort Options’. There I chose Ascending and then selected the rank measure:


 

Now the Pivot Table is sorted by rank with the highest ranking product at the top. Now, to filter out only the top ten, we press the same button again and choose Value Filters and then Top 10. Here I made the following selections:


 

This seems maybe a bit counter intuitive, but what this does is return the lowest ten ranks (which would be 1 to 10 or the highest ranking products). Alternatively I could have used a Lower Than or Equal To Value Filter with these settings to produce the same result:

And here it is: a top 10 of products by sales amount.

 

Of course, you can also use Power View or Power Map to visualize these results. Here is a Power View based on the same information:

The trick here is to create the visualization just as normal (as above). Above displays the sales amount by product and the number sold by city. However, the catch here is that both the graph as well as the map have a filter on them that utilizes the rank measures I created. Here is the filter for the chart. The ‘Rank of products by sales amount’ measure is filtered to showing only values less than or equal to 10, i.e. the top 10.

What’s best about this is that it is very easy to change from top 10 to top 15 to top 5 or anything you desire. Also, the Power View is fully interactive. For example, clicking on one of the cities on the right shows which products are sold in that city. Note that it does not show the top 10 products in that city however.

Hope you liked this Power BI Pro Tip!

Version compatibility between Power Pivot Data Models in Excel 2010 / 2013 and SharePoint 2010 / 2013

I have been getting a lot of questions on compatibility around Excel 2010 / Excel 2013 / SharePoint 2010 / SharePoint 2013. To be honest, I have been confused myself.

I encourage you to check out our Excel help page, which makes it crystal clear:

Note to SharePoint Server 2010 customers: Client applications used to create a Data Model need to align with the server applications that host a Data Model. For SharePoint Server 2010, this means you’ll need to continue to use Excel 2010 and a Power Pivot for Excel add-in to create and maintain a Data Model. Excel 2013 cannot be used to create Data Models that run on SharePoint Server 2010.

This means that you can upload an Excel 2013 file with a Power Pivot data model in it to SharePoint. However, if you interact with it (click refresh, click a slicer, etc) you will get an error message.

It does not get any more specific than that, right?

 

 

You will not believe these sites have been built with SharePoint

I get myself in many discussions about SharePoint as application platform for public / customer facing sites. The reason I end up in the discussion is because people are looking to use our great BI tooling to provide insight to their customers.

Most people do not believe that SharePoint is used to create internet sites (most people just know SharePoint as a collaboration platform).

Here is a great site that gives you an idea of what websites were created with SharePoint and it even features a great BI visualization of the data (PivotViewer). I encourage you to go check it out!

http://www.spsdemo.com/livepivot

SPS

My personal favorite is http://www.ferrari.com. Can you believe it has been built using SharePoint?

Ferrari

Using Power BI to analyze the BI market

Disclaimer: Yes, I know the Gartner Magic Quadrants normally should be bought to be able to use. The information in this post however is based on the following web search: http://www.bing.com/images/search?q=gartner+magic+quadrant+bi&FORM=AWIR which happens to return the Gartner Magic Quadrants…

Recently I thought about the Gartner Magic Quadrants and how they offer a great view on the BI market and the BI vendors. However, they provide static views and it is hard to spot developments over time. With that, I thought to myself that an animated scatter plot would be just the thing to make these MQs much more interesting.

Power View (part of Power BI) provides this kind of visualization and here is what I did.

First, I followed a very advanced procedure to get coordinates for each BI vendor for each year in each MQ. This advanced procedure involved complex machine learning resizing the MQ images to 100×100 pixels and me manually writing down all the coordinates. This resulted in the following table:

Vendor Date Xcoord Ycoord Completeness Of Vision Ability To Execute
arcplan

1-1-2007

18

73

-32

-23

Panoroma Software

1-1-2007

37

74

-13

-24

Spotfire

1-1-2007

32

73

-18

-23

Applix

1-1-2007

29

68

-21

-18

Actuate

1-1-2007

46

54

-4

-4

 

The vendor is or course the name of the BI vendor in the MQ. Date is the year the MQ was released (I did not take into account months). Xcoord and Ycoord are the X and Y coordinates respectively and the next two columns (Completeness Of Vision and Ability To Execute) takes Xcoord and Ycoord and substract 50 from it so the midpoint for the scatter plot is 0.

Although by now I would have been ready to plot the data I decided to make it a bit more interesting by also including market capitalization for each vendor. In order to do this I added another table which looked like this:

Vendor Stock
arcplan PRIVATE 1
Panoroma Software PRIVATE 2
Spotfire PRIVATE 3
Applix PRIVATE 4
Actuate BIRT
QlikTech QLIK
MicroStrategy MSTR
SAS PRIVATE 5

 

Using the Stock Indicator I used Power Query to get the stock information for the “date” the MQ was published (which I fixed to 1st of January of that year). The data I got from http://www.quandl.com which provides CSV downloads per stock (for example the historical market capitalization for MSFT can be retrieved from http://www.quandl.com/api/v1/datasets/DMDRN/MSFT_MKT_CAP.csv). What I needed next was a function that would retrieve the market capitalization data for a given stock label. I create a function called MarketCap that takes a stock label as input and then reaches out to Quandl to download the CSV. Next it does some formatting and returns the data. Here is the code:

All I needed to do was get Power Query to iterate over all stock labels and call this function for every stock label. I used my Excel table shown above as source and inserted a customer column that called the MarketCap function with the stock label. Next, I expanded the returned column and removed some columns. The code is shown below:

Next, I loaded all of this into PowerPivot and created a relationship between the MQ scores table and the Vendor table.

However, I could not create a relationship between the MarketCapData and the Vendor table, since I would need to draw two relationships: one on Vendor and one on date (or actually year). However, I still wanted to get the Market Capitalization per vendor for the moment the MQ was released. Luckily, we have LOOKUPVALUE (which I discussed earlier). With LOOKUPVALUE I added a calculated column to the Scores table to retrieve the Market Capitalization:

Next I went ahead and did some cleanup (hide columns / tables).

Finally, it was time to build the visualization using Power View:

What you see here is a line graph showing the market cap data and a scatter plot with play axis on Date that shows the scores per vendor for Completeness Of Vision (horizontal axis) and Ability To Execute (vertical axis). I could have used market cap data for the size of each vendor’s indicator but that makes the chart unreadable. So now we can go back in time and see all the MQs for BI one after the other in a nice animation:

Notice how in the video I show the development of Microsoft over time. Also see that the top chart responds to the selection I made in the scatter plot.

This wraps up this post. Hope you liked this demonstration of meta-BI using Power BI!

Power BI Pro Tip: LOOKUPVALUE() function

Power Pivot is a great way to do data modelling and analysis right in Excel. It works great for data that is dimensionally organized (facts and dimensions) as well as other forms of data. It even enables you to define relationships between datasets regardless of source. However, one thing that has been hard is the following: consider the scenario where you have stock values for certain stocks for certain days, like below:

 

Let’s also assume you have a table that shows marketshare (or something else) per company on a certain date, like this:

Finally, you have a third table that lists the stock label by company, like so:

Now, assume that you would like to add the stock value of a company on a certain date next to the market share for that company at that date so as to provide more context to a potential relationship between market share and stock value. Maybe a bigger market share has an impact on stock value?

Naturally, what you would do is load these tables into Power Pivot so you get the following:

Now, the next step would be to add relationships between these two tables. The relationships should be defined as follows:

I.E.: Stock labels and Market shares are related on the Company column, whereas Stock and Stock Labels are related on the Stock / Stock label column.

We can now try to get the stock value for the company at a certain date, but how? Just using RELATED() to get stock values will not work as it will return a table. You could use MAX or MIN to then get a maximum or minimum value, but that is not what we are after: we wanted to return the stock value for that company at exact that date. More generally, this problem occurs when a table is related “twice” to another table, such as monthly targets by person vs. actuals (the relationship between the actual and target table is double: both on month as well as person).

The solution is using LOOKUPVALUE() and here is how. In the Marketshares table I add a calculated column with the following definition:

 

 

This might seem complex, so allow me to explain. What this does is the following:

Look up and return a value from

The Value column in the Stocks table (Stocks[Value])

For which

The stock label is equal to the stock label on record for the company (RELATED(‘Stock Labels’[Stock Label]))

And

The date equals the date of the market share information.

The result is:

Pretty nifty huh? Turns out that to use LOOKUPVALUE() this way you do not even have to be able to relate the lookup table to the data model at all. In my example the relationship between Stocks and Stock Labels is not even necessary, although I find it good practice to include all relationships just for clarity.

Power BI Pro Tip: DIVIDE() function

If you ever used Power Pivot to calculate things such as sales amount per capita or averages of some sort you will have run into the situation that the denominator (the column you want to divide by) is empty or zero. To cope with a potential division by zero, Power Pivot outputs Infinity. This can be seen in the screenshot below where the ‘Per Capita’ column is defined as:

Of course you can fix this by using

and others to work around the error. However, the DIVIDE() function makes this all a lot easier!

The DIVIDE() function takes two required and one optional parameters, which are: numerator, denominator and an optional value to return when division by zero occurs.

To see what DIVIDE() does, consider the following screenshot:

The Divide1 column here is defined as:

Whereas the Divide2 column contains the following function: 

 

The results are great, no Infinities are returned! By default DIVIDE() returns empty in case of a problem (Divide1 column). You can override this by specifying the third parameter so to return a fixed value in cased of a problem (Divide2 column in my example).

Hope this helps!