Adding sequence numbers using R in Azure ML
16 Aug 2016
When going through data preparation, sequence numbers sometimes need to be added. If you are like me, you probably spent some time looking for a component in Azure ML to do this. I never found one.
It turns out this is really easy to do in R and, as a result, also very easy to do in Azure ML.
In your experiment, add an Execute R Script component and connect it to the data flow.
Edit the script and add a column to the dataset that equals:
seq.int(nrow(dataset1))
See my code example:
# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset1$time <- seq.int(nrow(dataset1))
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("dataset1");
On the third line the column is added and defined as a sequence number. The resulting dataset indeed has an extra column (called time) that looks like this:

The small histogram at the top and the details on the right confirm that the column has only unique values and starts at 1; our sequence column has been added!
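If you want to try the idea outside Azure ML first, here is a minimal sketch in plain R. The data frame and its value column are made up for illustration and simply stand in for whatever maml.mapInputPort(1) would return. The sketch uses seq_len() rather than seq.int(); for a dataset with at least one row both give the same numbers, but seq_len() also handles a dataset with zero rows, where seq.int(0) would return 1 0.

```r
# Hypothetical stand-in for the data frame coming from the input port
dataset1 <- data.frame(value = c(10.5, 3.2, 7.8, 1.1))

# Add the sequence column; seq_len(n) yields 1, 2, ..., n (and nothing for n = 0)
dataset1$time <- seq_len(nrow(dataset1))

# The same checks the visualization gives you: starts at 1, only unique values
stopifnot(dataset1$time[1] == 1)
stopifnot(!anyDuplicated(dataset1$time))

# Zero-row case: the column is added without an error
empty <- dataset1[0, "value", drop = FALSE]
empty$time <- seq_len(nrow(empty))
stopifnot(nrow(empty) == 0, "time" %in% names(empty))

print(dataset1)
```

Inside the Execute R Script component, only the line that assigns dataset1$time would change; the maml.mapInputPort and maml.mapOutputPort calls stay as they are.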
Dutch Data Dude