First look: Esri ArcGIS Maps in Power BI

29 Sep 2016

Esri is a leader in the GIS industry and ArcGIS is a very popular product to build great maps. Now, you can use ArcGIS maps in Power BI (in preview). See the official information here: https://powerbi.microsoft.com/en-us/blog/announcing-arcgis-maps-for-power-bi-by-esri-preview/. This is really cool, I know a lot of you have been asking for this for a long time!

You will find the option to enable this preview in PowerBI.com, not in the Power BI Desktop. Log in to PowerBI and open the settings. You can find the ArcGIS preview there and enable it by simply selecting the checkbox:

With that enabled, create a report with some geographical information (or edit an existing one). I used the Google Analytics data that keeps track of my blog. Google Analytics data can be loaded into Power BI simply by using the content pack. In edit mode in the report you will find the ArcGIS component in the Visualizations list:

Click it and create your map as you would with the normal map. I noticed it needs some time to build the map (probably due to the preview) but once it is done it is fully interactive with the other items on your report as you would expect:

You can change a lot of the ArcGIS options, such as switching out maps, changing symbol styles, adding reference layers, etc.

I love this – the awesome power of ArcGIS and Power BI combined! I cannot wait to see what you will create with this.

Adding sequence numbers using R in Azure ML

16 Aug 2016

When going through data preparation sometimes sequence numbers need to be added. If you are like me, you probably spent some time looking for a component in Azure ML to do this. I never found it.

Turns out it is really easy to do this in R and as a result also very easy to do in Azure ML.

In your experiment, add an Execute R Script component and connect it to the data flow.

Edit the script and add a column to the dataset that equals:

seq.int(nrow(dataset1))

See my code example:

# Map 1-based optional input ports to variables]
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset1$time=seq.int(nrow(dataset1)) 
# Select data.frame to be sent to the output Dataset port 
maml.mapOutputPort("dataset1");

On the third line the column is added and defined as a sequence number. The resulting dataset indeed has an extra column (called time) that like this:

The small histogram at the top and the details that right confirm it has only unique values and starts at 1; our sequence column has been added!

Power BI learning resources – follow up 3

05 Aug 2016

Another great resource for learning about Power BI is the course on EDX: Analyzing and Visualizing Data with Power BI. Granted, has been around for a while, but I forgot blogging about it; maybe it is a bit easier to find now.

Enjoy the new skills you will learn with this!

Power BI Refresh scenarios against data storage solutions explained

07 Jun 2016

A recurring theme with customers and partners is Power BI data refresh. More specifically, there is some confusion on what refresh scenario requires a Pro version and what can be done with the Free version of Power BI. I made the diagram below to help explain this. It shows the refresh scenarios against data storage solutions, such as SQL Azure, SQL in a virtual machine in Azure or SQL server on premises. I used these as examples, there are other options as well. I think the overall time carries over to other data storage solutions. The diagram shows the refresh that can be done using a Power BI Free account as orange and the refresh scenarios that need Power BI Pro as green lines. As shown in the diagram, if you want to refresh against on-premises sources or a database running in a VM in Azure you will need a gateway and Power BI Pro. This applies not only to the creator of the report and schedule but also to every consumer. If you use PAAS solutions for data storage in Azure such as SQL Azure, it becomes a bit more difficult and it is really dependent on the type of refresh required. If you need a refresh cycle higher than once a day (either max 8 times per 24 hours or live) you will need Power BI Pro. If you just want to refresh against such as SQL Azure and once a day is enough you can do that using Power BI Free. Again, the license requirement carries over from author to viewer; if the author of the report requires Pro, then the viewers also need Pro.

Power BI Refresh scenarios against data storage solutions

Hope this helps. If you have any questions or feedback, please comment below!

Roles in a data driven organization

12 Apr 2016

All this talk all the time about Big Data and Advanced Analytics is all well and good, in fact it is something I do most of my time. The technology is there and has great potential. The biggest question now is how to use these technologies to their full extent and maximize the benefits of the technologies for your organization. The answer lies in becoming a data driven organization.

A data driven organization is an organization that breathes data, not only in the sense of producing data, but also in the sense of analyzing, consuming and really understanding data, both their own as well as the data others can provide. In order to have a sense to become a data driven organization, you will need to change People, Process and Technology. There is enough talk about the Technology in the market already (and on this blog), so I will come back to that later and not go into much detail now. Let’s look at the other two: People and Process. I view Process as very much related to People: bringing in new skills without the proper Process in place for how to work with them and for the new People to work together will not be very useful.

So, what People do you need? In other words: what roles do you need in a data driven organization? I see four required roles in any organization that wants to be more data-driven. This is not to say that these four roles should be four different people; it is very well possible that someone might take on more than one role. I am however confident that there exist very few people who will able to do all four roles since each requires specific skills, focus and passion.

The four roles are: Wrangler, Scientist, Artist and Communicator. Let’s look at the four roles in more detail.

Wrangler

The role Wrangler, or data wrangler as others call this role is responsible for identifying, qualifying and providing access to data sets. In this sense the data is the wild horse that the wrangler tames. This role is a need for the Scientist role to work with qualified, trustable and managed data sources. In much situations, this looks a lot like the current data management roles already present in organizations. This role lives mostly in IT. Keywords here are databases, connection strings, Hadoop, protocols, file formats, data quality, master data management, data classification.

Scientist

More popularly called the Data Scientist, a lot of people seem to believe that as long as you hire a Data Scientist you are a data driven company. This is much the same as saying that if you have Hadoop you ‘do Big Data’. This is about as smart as saying that if you got your driver’s license you make an excellent Formula 1 driver. It is just not true, sorry. Note also, that the opposite applies; if you are a great Formula 1 driver you could be a very bad driver on open roads. Running Hadoop does say you use Big Data. Hiring a Data Scientist does not mean you are a data driven company.

A Scientist is someone who applies maths, a lot of maths, to convert data into information. He or she applies statistical models and things like deep learning, data mining and machine learning to make this happen. Scientists are the rock stars of this data-focused world since they are the once actually making the magic happen. However, they cannot do it alone. They need good quality and trustable data, which is what the Wrangler supplies. Also, these Scientists happen to be ill-understood by the rest of the organization. This do this experiment: have your (Data) Scientist stick around the water cooler for 15 minutes every day and let him / her talk to people (I know, for some this is hard already). Then, check how quickly the person the Scientist is talking to disconnects. My experience is that someone who is not a fellow Scientist or Communicator will not make it for 15 minutes. Just try it, you will see what I mean.

Keywords here are data mining, R, Python, machine learning, statistics, algorithms.

Artist

The Artist role converts information the Scientist brewed up to insight that the consumers can understand and use. This role focusses on esthetics and the best way of data visualization to bring the message across in the best possible way. While the Wrangler is a very IT focused role and the Scientist is very mathematical, the Artist often comes from the creative arts world. The Artist just loves making things understandable and loves making the world a better place by creating beautiful things, such as great looking reports and dashboards. They often employ storytelling and other powerful visual methods such as infographics to convey their message to the consumers.

Keywords: data visualization, dashboard design, signaling colors, storytelling.

Communicator

The last role in data driven organizations is a chameleon; If you look at the types of person in the Wrangler, Scientist and Artist role it is clear to see that these are very different people, with different backgrounds and different passions. Just as much as some of them find it hard to talk to the rest of the organization they can find it difficult to talk among their own and work together. In order to make sure there is no communications breakdown, many organizations invest in a Communicator; someone who has enough understanding of the passion of the people in the other roles to be able to level with them, understand their needs and explain the needs of others to them. Sub types of the Communicator is the Wrangler-Scientist communicator and the Scientist-Artist communicator.

This concludes the roles I see in a data driven organization; of course these roles with need the be supported with the right Processes and Technology. Having a Technology platform instead of disparate tools will help you to achieve this and make the best out of the investments you are making in these roles.

Older Newer

Dutch Data Dude Data Nerdiness

First look: Esri ArcGIS Maps in Power BI

Adding sequence numbers using R in Azure ML

Power BI learning resources – follow up 3

Power BI Refresh scenarios against data storage solutions explained

Roles in a data driven organization