29 Sep 2015
The pricing table on the Power BI website does a good job of explaining when a free account is sufficient and when a Pro account is required. However, it does not cover everything, nor is it really clear (in my opinion). So, after some digging I came up with this: a step-by-step wizard that helps you determine whether you can use a free account for Power BI or need Pro (below); simply answer a series of Yes/No questions and you will know whether free is enough or you really need Pro. Please note that this is not official communication and I am in no way responsible for any errors. Use this at your own risk.
Enjoy!
22 Sep 2015
Azure Data Factory provides a great number of data processing activities out of the box (for example running Hive or Pig scripts on Hadoop / HDInsight).
In many cases, though, you just need to run an activity that you have already built, or know how to build, in .NET. So, how would you go about that? Would you need to convert all those items to Hive scripts?
Actually, no. Enter custom .NET activities. Using these you can run a .NET library on Azure Batch or HDInsight (whichever you like) and make it part of your Data Factory pipeline. Regardless of whether you use Batch or HDInsight, you can just run your .NET code on it. I prefer Batch since it provides more auto-scaling options, is cheaper and in general makes more sense to me; I mean, why run .NET code on an HDInsight service that runs Hive and Pig? It feels weird. However, if you already have HDInsight running and prefer to minimize the number of components to manage, choosing HDInsight might make more sense than Batch.
So, how would you do this? First of all, you need a custom activity. For this you will need a .NET class library with a class that implements the IDotNetActivity interface. Please refer to https://azure.microsoft.com/en-us/documentation/articles/data-factory-use-custom-activities/ for details. Trust me, it is not hard; I have done it.
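To give you an idea of the shape of such a class, here is a minimal sketch. The namespace and class name below mirror the EntryPoint used in the pipeline JSON further down; the exact Execute signature depends on the version of the Data Factory NuGet packages you use, so double-check against the article above:

using System.Collections.Generic;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;

namespace AzureDataFactoryCustomActivityNS
{
    public class AzureDataFactoryCustomActivity : IDotNetActivity
    {
        public IDictionary<string, string> Execute(
            IEnumerable<LinkedService> linkedServices,
            IEnumerable<Dataset> datasets,
            Activity activity,
            IActivityLogger logger)
        {
            // Your actual processing goes here: read the input blobs,
            // transform them and write the results to the output location.
            logger.Write("Custom activity started.");

            // Whatever you return here shows up in the activity's logs.
            return new Dictionary<string, string>();
        }
    }
}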
Next, once you have a zip file as described on the page above, make sure to upload it to an Azure Blob store you can use later. The pipeline will need to know which assembly to load and from where.
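If you prefer to script that upload rather than use the portal, a small snippet with the Azure Storage SDK does the trick. The container and blob names below follow the PackageFile value in the pipeline JSON further down; the connection string and local path are placeholders for your own environment:

using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Placeholder connection string: use the storage account your Azure Storage linked service points to.
var account = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>");
var container = account.CreateCloudBlobClient().GetContainerReference("adfcustomactivity");
container.CreateIfNotExists();

// Blob path matches the PackageFile setting used in the pipeline below.
var blob = container.GetBlockBlobReference("customactivitycontainer/AzureDataFactoryCustomActivity.zip");
using (var zip = File.OpenRead(@"C:\temp\AzureDataFactoryCustomActivity.zip"))
{
    blob.UploadFromStream(zip);
}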
If you go with Batch, you will need to create an Azure Batch account and pool. If you decide to use HDInsight, either let ADF spin up a cluster on demand or make sure your HDInsight cluster is ready.
You will need to create input and output tables in Azure Data Factory, as well as linked services to Storage and Batch or HDInsight. Your pipeline will look a bit like this:
{
    "name": "ADFTutorialPipelineCustom",
    "properties": {
        "description": "Use custom activity",
        "activities": [
            {
                "Name": "MyDotNetActivity",
                "Type": "DotNetActivity",
                "Inputs": [
                    {
                        "Name": "EmpTableFromBlob"
                    }
                ],
                "Outputs": [
                    {
                        "Name": "OutputTableForCustom"
                    }
                ],
                "LinkedServiceName": "AzureBatchLinkedService1",
                "typeProperties": {
                    "AssemblyName": "AzureDataFactoryCustomActivity.dll",
                    "EntryPoint": "AzureDataFactoryCustomActivityNS.AzureDataFactoryCustomActivity",
                    "PackageLinkedService": "AzureStorageLinkedService1",
                    "PackageFile": "adfcustomactivity/customactivitycontainer/AzureDataFactoryCustomActivity.zip",
                    "extendedProperties": {
                        "SliceStart": "$$Text.Format('{0:yyyyMMddHH-mm}', Time.AddMinutes(SliceStart, 0))"
                    }
                },
                "Policy": {
                    "Concurrency": 1,
                    "ExecutionPriorityOrder": "OldestFirst",
                    "Retry": 3,
                    "Timeout": "00:30:00",
                    "Delay": "00:00:00"
                }
            }
        ],
        "start": "2015-09-07T14:00:00Z",
        "end": "2015-09-07T18:00:00Z",
        "isPaused": false
    }
}
Switching from Batch to HDInsight just means changing the LinkedServiceName for the activity to point to your HDInsight or on-demand HDInsight cluster.
Tables are passed to the .NET activity using a connection string. So, essentially, if both your input and output tables are defined as blob storage items, your custom assembly gets a connection string to the blob storage, reads the input files, does its processing and writes the output files before handing control back to ADF.
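For example, inside Execute you would typically look up the input table and its storage linked service to get that connection string. Roughly like this, following the tutorial linked above and using the names from this pipeline (exact class and property names depend on your SDK version, so treat this as a sketch):

// Inside Execute() (requires using System.Linq;):
// find the input dataset and the Azure Storage linked service it uses.
Dataset inputDataset = datasets.Single(d => d.Name == activity.Inputs.Single().Name);

var inputStore = linkedServices
    .First(ls => ls.Name == inputDataset.Properties.LinkedServiceName)
    .Properties.TypeProperties as AzureStorageLinkedService;

string connectionString = inputStore.ConnectionString;
// From here, CloudStorageAccount.Parse(connectionString) gives you access to the
// input blobs; process them and write your results to the output table's location.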
Using this framework the sky is the limit: anything you can run in .NET can now be part of your ADF processing pipeline…pretty cool!
15 Sep 2015
A little while ago an R package for AzureML was released, which enables R users to interface with Azure Machine Learning (Azure ML). Specifically, it enables you to easily use one of the coolest features of Azure ML: publishing and consuming algorithms / experiments as web services.
Check it out: https://cran.r-project.org/web/packages/AzureML/vignettes/AzureML.html.