Webinar on Azure Machine Learning

Last week I did a live webinar on Azure Machine Learning. The webinar can serve as an introduction to machine learning, data mining and predictive analytics, as well as to Microsoft’s solution in this space: Azure Machine Learning.

Watch the recording here (in Dutch, sorry).

Power BI Pro Tip: making date / time calculations work (Time Intelligence)

Every so often I get asked how to do a year-over-year, quarter-over-quarter, month-over-month or year-vs-year calculation in Power BI. In most cases people would like to create a KPI that compares one period’s performance to another’s. Power BI (specifically DAX) provides great functions for this: the Time Intelligence functions. In these scenarios PREVIOUSMONTH, PREVIOUSYEAR and SAMEPERIODLASTYEAR are used most. However, there are some frequent mistakes that result in errors when using these functions:

1) You will need a Date table in your model. Technically you do not need one, but you do need to make sure the column you use for the time-based calculations contains only unique values/dates. This is often not the case in the fact table itself, for example when sales happen more than once a day. Once you have the date table in the model, make sure to create a relationship between your fact table’s date column (for example the sales date) and the date table. One way to generate such a date table is sketched right after this list.

2) The time intelligence functions should really be used as measures, not as calculated columns. In the Excel 2013 PowerPivot window this means they go below the horizontal line (in the calculation area), not in the columns above it.

3) Time intelligence functions work best when using totals, averages or other aggregated info.
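
As an aside: if your model does not have a date table yet, one way to generate one is with Power Query. The sketch below is just one possible approach; the date range and the helper columns are assumptions you will want to adapt to your own data:

let
    // Assumed range for illustration; make sure it covers all dates in your fact table
    StartDate = #date(2013, 1, 1),
    EndDate = #date(2015, 12, 31),
    // One row per day guarantees the unique dates the time intelligence functions need
    DayCount = Duration.Days(EndDate - StartDate) + 1,
    Dates = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    DateTable = Table.FromColumns({Dates}, {"Date"}),
    Typed = Table.TransformColumnTypes(DateTable, {{"Date", type date}}),
    // Helper columns that reports typically slice on
    AddYear = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    AddMonth = Table.AddColumn(AddYear, "Month", each Date.Month([Date]), Int64.Type)
in
    AddMonth

Load the result as a table called ‘Date’, mark it as a date table and relate ‘Date’[Date] to the date column of your fact table.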

Here are some examples. I will use ‘Date’[Date] to refer to the date column in my date table. Also, as a best practice I split the calculation in two parts: the first part just calculates the total sales, while the other calculations refer to that base measure.

Sales:=SUM([SalesAmount])
SalesPreviousYear:=[Sales](PREVIOUSYEAR('Date'[Date]))
SalesPreviousMonth:=[Sales](PREVIOUSMONTH('Date'[Date]))
SalesSamePeriodLastYear:=[Sales](SAMEPERIODLASTYEAR('Date'[Date]))

Note that the last one uses SAMEPERIODLASTYEAR, which is more flexible: it will select the same day or the same month in the previous year, depending on the selection the user makes in the tables/graphs. This is however not always what you want, so you can make it more specific by using PREVIOUSYEAR / PREVIOUSMONTH etc.

You could also use DATEADD to be even more flexible:

Sales14DaysBack:=[Sales](DATEADD('Date'[Date],-14,DAY))

Notice by the way that I tend to use the short-hand notation to avoid having to type CALCULATE all the time (yes, I am lazy :)). Here is an example of the two ways to get the sales for the previous year; the first line is the short-hand, the second is the more elaborate but equally correct option:

SalesPreviousYearShortHand:=[Sales](PREVIOUSYEAR('Date'[Date]))
SalesPreviousYear:=CALCULATE([Sales];PREVIOUSYEAR('Date'[Date]))

Hope this helps!

 

Loading multiple JSON files using Power Query

I had to figure out recently how to load multiple JSON files using Power Query. It turned out to be less easy than expected, so I figured it is worth blogging about…

The scenario: I have multiple JSON files sitting in a container in Azure Blob Storage; I would like to load them all into a data model for use in Power BI. I am assuming all the files you want to load are in one container. My solution will not work for multiple containers.

I will be using Power Query for this, from the Power BI Designer. You could do the same using Power Query in Excel.

First, let’s connect to the blob storage. This part is easy. Just click Get Data → More in the Power BI Designer, select ‘Azure’, choose Microsoft Azure Blob Storage and click Connect:


In Excel, navigate to the Power Query tab, select From Azure → From Microsoft Azure Blob Storage:


You will need to enter your Azure Storage account name and key. Next, you will see a list of containers in the blob storage. Select the container the data is in and choose Edit:


What we will need to do is create a function that loads the JSON files. To do this we use an approach similar to loading multiple Excel or CSV files (see here and here respectively): first we just load one file and then we convert it into a function which we will call for all files we want to load.

So first, click on ‘Binary’ in the first column for one of the rows representing a JSON file. You will see a one-column table listing all records in the JSON file (the exact number of rows depends on the length of the JSON file):

What you want to do is convert these records into a table by clicking the ‘To Table’ button:

You will probably see a ‘To Table’ dialog, allowing you to customize the conversion; for JSON you normally do not have to change the defaults, so just click OK.

Next step is to expand the resulting Column1 to see some actual data. To do this click the expand button to the right of the column header and click OK (I deselected the ‘use original column name as prefix’ option):

And voila: a nice looking table of the records in this JSON file:

We are not done however; this was the easy part. Remember we need to create a function that will enable us to iterate over multiple files.

To start editing the code, hop over to the Advanced Editor (View → Advanced Editor). Your code should look something like this (your last line will be different from mine, since it depends on the contents of the JSON):

let
    Source = AzureStorage.Blobs("yourstorageaccount"),
    container = Source{[Name="yourcontainer"]}[Data],
    contents = container{[#"Folder Path"="https://yourstorageaccount.blob.core.windows.net/yourcontainer/",Name="yourjsonfilename"]}[Content],
    #"Imported JSON" = Json.Document(contents),
    #"Table from List" = Table.FromList(#"Imported JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expand Column1" = Table.ExpandRecordColumn(#"Table from List", "Column1", {"date", "product", "store", "counter", "customer", "amount"}, {"date", "product", "store", "counter", "customer", "amount"})
in
    #"Expand Column1"

First, we will need to wrap this in a function, so add this line at the top:

let LoadJson = (path, name) =>

Then, add this at the bottom:

in LoadJson

We need to edit the line that defines ‘contents’ to look like this:

contents = container{[#"Folder Path"=path, Name=name]}[Content]

Your code should look like this:

let LoadJson = (path, name) =>
    let
        Source = AzureStorage.Blobs("yourstorageaccount"),
        container = Source{[Name="yourcontainer"]}[Data],
        contents = container{[#"Folder Path"=path, Name=name]}[Content],
        #"Imported JSON" = Json.Document(contents),
        #"Table from List" = Table.FromList(#"Imported JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
        #"Expand Column1" = Table.ExpandRecordColumn(#"Table from List", "Column1", {"date", "product", "store", "counter", "customer", "amount"}, {"date", "product", "store", "counter", "customer", "amount"})
    in
        #"Expand Column1"
in
    LoadJson

Click ‘Done’ and give the query a descriptive name (I suggest naming it the same as the function: LoadJson)… phew, that was not too bad, right? So now, let’s use this function on all our JSON files. Let’s do the same as we did at the start: connect to the blob and stop at the screen where you have the list of files in the container:

Since we can only apply the function to JSON files, my first step is to filter on the Extension column, keeping just ‘.json’:

Then, we need to get rid of all the columns except Name and Folder Path. To do this, select the columns to keep and choose Remove Other Columns.

Now, let’s call the function and pass in the path and name parameters. Insert a custom column (Add Column → Add Custom Column) with the following formula:

LoadJson([Folder Path],[Name])

Then, we need to expand the resulting custom column by clicking on the little expand button again:

Click ‘OK’. Now you have all the contents visible. To clean up, let’s delete the Name and Folder Path columns since we do not need them anymore. Since this is JSON, you will probably want to fix the data types before reporting on this.
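
For reference, here is roughly what the complete query from these steps looks like in the Advanced Editor. This is a sketch using the placeholder storage account, container and JSON field names from earlier; substitute your own:

let
    Source = AzureStorage.Blobs("yourstorageaccount"),
    container = Source{[Name="yourcontainer"]}[Data],
    // Keep only the JSON files in the container
    #"Filtered Rows" = Table.SelectRows(container, each [Extension] = ".json"),
    // Keep only the two columns the function needs
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows", {"Name", "Folder Path"}),
    // Call LoadJson for every file; each call returns a table of records
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each LoadJson([Folder Path], [Name])),
    // Expand the per-file tables into one combined table
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"date", "product", "store", "counter", "customer", "amount"}),
    // The file name and folder path are no longer needed
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Custom", {"Name", "Folder Path"})
in
    #"Removed Columns"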

And…. You’re done, how cool is this?

CTP2 of the new SQL Server 2016 has been released to the public

With this new beta release of SQL Server we get better insight into what we can expect in the full product. Some of the top capabilities in SQL Server 2016 CTP2 are:

  • Real-time Operational Analytics & In-Memory OLTP, enhanced for up to 30x faster transactions for a greater number of applications: customers can configure the in-memory columnstore to work on top of a transactional database to achieve real-time operational analytics with breakthrough OLTP performance.
  • Always Encrypted helps protect data at rest, in motion and while in use, on-premises and in the cloud. With Always Encrypted, SQL Server can perform operations on encrypted data. Best of all, the encryption key resides with the application in the customer's trusted environment.
  • Stretch Database technology keeps historical data at users' fingertips by transparently stretching warm and cold data in a more secure manner to Microsoft Azure on demand without application changes.

I advise you to take a special look at stretched databases.

It is a technique originating in Azure, where it is already very easy to scale a working database up or down. And not only a single database: a set of databases (called an Elastic Pool) can be tailored to meet each individual database’s demand for power, which varies over time. A brand new concept that you could use in use cases such as having a database per customer or per project, where the scale needs differ per database and change over time.

In my next blog post I will dig deeper into the new concept of elastic databases.

Harry

Automatically building a Microsoft BI machine using PowerShell – preparation: install files using Azure File Service (post #3)

This post is #3 in the series to automatically build a Microsoft BI machine using PowerShell – see the start of series.

 

In this series so far:

Start of series – introduction and layout of subjects
Post #2 – Preparation: install files using Azure disk


In our last post we looked at one way of working with the install files required for automating the installation of a BI machine: using disks. This post will focus on sharing the install files using the Azure File Service. The Azure File Service exposes file shares using the standard SMB 2.1 protocol. It is in some ways an addition to storage accounts. See http://blogs.msdn.com/b/windowsazurestorage/archive/2014/05/12/introducing-microsoft-azure-file-service.aspx for more information. This service is in beta at the moment, so you will need to subscribe to the beta using the Azure Preview portal: http://azure.microsoft.com/en-us/services/preview/. Look for ‘Azure Files’ in the list and click ‘Try it’ to get your account activated for the preview.

The Azure File Service is not exposed in any portal, probably because it is in preview. Also, keep in mind that while the service is in preview, existing storage accounts will not have access to the File Service, so we will need to create a new storage account as well. To do this, log in to the portal, click ‘New’ and create a new storage account. After the storage account has been created, you will need to use PowerShell to create a file share. Make sure you have the latest version of Azure PowerShell installed and then run the following in Azure PowerShell or the ISE:

# Fill in your storage account name and key (both can be found in the portal)
$storageAccountName = "YourStorageAccountName"
$storageAccountKey = "YourStorageAccountKey"
# Create a storage context and use it to create the new file share
$ctx = New-AzureStorageContext $storageAccountName $storageAccountKey
$s = New-AzureStorageShare YourShareName -Context $ctx

After this runs you should be able to access the file share in multiple ways, but the easiest way I found is mapping the share as a drive in a VM by running:

Net use e: \\YourStorageAccountName.file.core.windows.net\YourShareName /u:YourStorageAccountName YourStorageAccountKey

Now you can download and store files on the share just as you can with disks, as discussed in post #2 on using install files using Azure disks.

Next post will be our final step of the preparation: logging.