12 Nov 2014
Big Data and Analytics are the buzzwords you hear everywhere these days. Everyone is talking about them. Often one or more of the following terms are added to the mix: Machine Learning, Predictive Analytics, Self-Service BI, Advanced Visualization and Data Exploration.
It is starting to look as if all these things have become a goal in themselves instead of a means to an end. Recently a customer told me: “We do Big Data.” Because I wanted to know what the goal of that effort was, I asked: “Impressive, but why?” The answer was:
“What do you mean, why? Now that we do Big Data, we will be successful!”
We (technology vendors, Big Data enthusiasts, advisory firms, consultants, you name it) sometimes forget that all this technology is merely a means in the hands of people to reach a goal. That goal is usually optimizing business operations or gaining some other economic advantage. The technologies are a fantastic starting point for reaching that goal, but they are not enough. People make or break the use of technology. Data does not lie, interpretation does. In other words: you may have the data, but it is the interpretation that determines whether it becomes a success. “What does the data say?” has long ceased to be the only question that needs to be asked; at least as important are “what can I do with this?” and “how do we communicate it?”
If we want to make Big Data and the like a success together, we have to invest in ways to make using, embedding and sharing information easy, and also fun. Because data is fun.
Recently I received a text message at 22:30 from a data analyst at a customer. I had spoken with him that day and equipped him with some ideas on how to use the tools and bend them to his will. The message read: “Just ‘woke up’ behind my laptop at the office. It is completely dark here and everyone went home a long time ago. Normally I would be one of the first out the door at the end of the working day, but this time I enjoyed myself so much that I lost track of time!”
Although working until 22:30 is not exactly healthy and I certainly would not recommend it to everyone, I do wish everyone who works with data that magical creative itch you feel when you are building something beautiful; and if that makes you lose track of time once in a while, I could live with that…
On Monday 24 and Tuesday 25 November the BI & IM Symposium (http://www.bisymposium.nl) will be held again; the largest event of its kind in the Benelux. I hope to see you there.
04 Nov 2014
This is the first post in my new first look series. The series focuses on new products, recent announcements, previews, or things I have not yet had the time to look at, and serves as an introduction to the subject.
First look posts are fairly short and high level.
Today in first look: Azure Data Factory. This service was only recently announced and is available to all Azure customers in preview mode. To get hold of it, make sure you open the Azure preview portal: in your classic Azure portal, click on your email address in the top right and choose ‘Switch to new portal’, or go directly to https://portal.azure.com.
So what is Azure Data Factory? I may be downplaying it a bit, but essentially Data Factory gives you ETL in the cloud. It connects to both on-premises and cloud data stores and enables you to read data from those stores, apply transformations and publish data to stores, while at the same time providing rich analytics on how the data flow is doing. The paradigm here is a factory floor: pieces of data enter the factory floor as raw materials, undergo some treatment (transformations) and go out the door at the other end of the floor as finished products. The canvas of Data Factory closely resembles this floor and shows an assembly line for data. Here is a very simple example, which retrieves data in hourly batches from a table in Blob Storage and stores it in a SQL Azure table:

More info is available at http://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/.
More to come!
21 Oct 2014
This post serves as a follow-up to my Power BI Pro Tip about using RANKX to show Top X results. I am writing it because I discovered that there is a lot of confusion about the RANKX() function versus the TOPN() function.
Let me try to explain what each function does. The RANKX() function ranks individual data rows according to a certain ranking attribute. Its result is a numerical value associated with each and every row of the data table, as you can see below:

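To make this concrete, a ranking measure could look like the sketch below; it assumes a hypothetical Sales table with a City column and a Sales Amount column and ranks each city by its total sales:
CityRank := RANKX(ALL(Sales[City]); CALCULATE(SUM(Sales[Sales Amount])))
Sliced by city, this returns 1 for the best selling city, 2 for the runner-up, and so on.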
Now, the TOPN() function sounds like it does the same, right? That’s where the confusion comes from. Also, the call to the function looks really similar:
RANKX(<table>; <expression>)
TOPN(<n>; <table>; <orderBy expression>)
However, TOPN does not return a value for each row in the data table. It returns a table that contains the top N items (N is the number you specified in the first argument) from the original data table according to the ranking attribute you specified.
In itself this is pretty useless, since you cannot display the resulting table directly. If you could, it would be an alternative to RANKX for getting a top N ranking.
To make TOPN useful you need to wrap it in another function, such as SUMX or AVERAGEX. Let’s see an example:
MyMeasure := SUMX(TOPN(10;Sales;Sales[Sales Amount]);Sales[Sales Amount])
Now, MyMeasure equals the sum of the sales amount for the ten best performing cities. At this point I do not know which cities those were, and maybe that is not even important to you. The total sales amount for the top 10 performers is returned.
This comes in handy when benchmarking an individual or organization against a bigger population. You could do an AVERAGEX of the TOPN result and that would be the average score for the top 10 performers. A dashboard showing how an individual measures up against the top 10 is then quickly created.
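As a sketch (again assuming the hypothetical Sales table from above), such a benchmark measure could look like this:
Top10Average := AVERAGEX(TOPN(10;Sales;Sales[Sales Amount]);Sales[Sales Amount])
Putting an individual’s sales amount next to Top10Average on a dashboard gives you exactly that benchmark.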
14 Oct 2014
Just wanted to share this with you. Kenneth Cukier talks about big data and its future, risks and issues. I found it very inspiring! Also interesting to see that he reads the same books as I do: we use the same examples :)
http://www.ted.com/talks/kenneth_cukier_big_data_is_better_data#t-403254
07 Oct 2014
If you use Power Query to read an Excel file that contains errors (such as #VALUE!, #REF!, #N/A, etc.), the rows that contain an error will not be loaded into the data model but will instead be flagged as error rows. But what if you really want to keep the rows, with some replacement value in place of the error, instead of skipping the error rows altogether?
To demonstrate this I created this simple Excel table that contains only errors (actually all errors that Excel can generate that I am aware of):

How they are generated and what they mean is not the point here, but I am sure most of you have seen some if not all of them before.
On loading this table using Power Query all five rows will be marked as error rows:

As a result, no data is loaded into your data model. This is as expected since Power Query filters out the rows that have an error, which in this case are all of the input rows.
However, in some cases this is not what you want. Suppose this was a 50-column dataset in which one column sometimes contains an error. Do you then want to disregard the rows with errors, and thus throw away all the potentially valuable information in those rows, or do you want to somehow fix the errors, flag them as problematic, but still load the rows? The latter is sometimes the better choice.
This can easily be done by editing the query and using the Replace Errors function (on the Transform tab):

Make sure you have the column with the errors selected, enter a valid value for the column’s data type (in my case the type was Any, since there is really no other data available in MyCol) and hit OK.
Power Query will replace the error with the value you entered:

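Behind the scenes this step is simply a call to Table.ReplaceErrorValues in the generated M code. A minimal sketch, assuming the workbook table is named ‘Errors’ and ‘Error!’ is the replacement value you entered (both names are just for illustration; MyCol is the column from my example):
let
    Source = Excel.CurrentWorkbook(){[Name="Errors"]}[Content],
    ReplacedErrors = Table.ReplaceErrorValues(Source, {{"MyCol", "Error!"}})
in
    ReplacedErrors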
Now, on loading, the rows with errors will end up in the data model and the problematic values will be flagged with the label you chose.
Hope this helps! Until next time!