Adding sequence numbers using R in Azure ML
16 Aug 2016
When going through data preparation, sometimes sequence numbers need to be added. If you are like me, you probably spent some time looking for a component in Azure ML to do this. I never found it.
Turns out it is really easy to do this in R and as a result also very easy to do in Azure ML.
In your experiment, add an Execute R Script component and connect it to the data flow.
Edit the script and add a column to the dataset that equals:
seq.int(nrow(dataset1))
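To get a feel for what this expression returns before wiring it into an experiment, you can try it in a plain R session. The sketch below uses a made-up three-row data frame; the name df and its value column are purely illustrative and not part of the Azure ML template.

```r
# Hypothetical stand-in for the dataset that arrives on the input port
df <- data.frame(value = c(10.5, 7.2, 3.9))

# nrow(df) is 3, so seq.int(3) yields the integers 1, 2, 3
df$time <- seq.int(nrow(df))

print(df)
```

Each row gets the next integer, starting at 1, in the order the rows already have.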
See my code example:
# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset1$time <- seq.int(nrow(dataset1))
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("dataset1")
The third line, the assignment to dataset1$time, adds the column and defines it as a sequence number. The resulting dataset indeed has an extra column (called time) that looks like this:
The small histogram at the top and the details on the right confirm that it contains only unique values and starts at 1; our sequence column has been added!
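The same check can also be done in code rather than by eye. This is a minimal sketch for a plain R session, again on a made-up data frame; the helper name is_sequence is hypothetical. It also swaps in seq_len() for seq.int(), which is an alternative and not what the script above uses: seq.int(0) returns c(1, 0), so the original expression would fail on a dataset with zero rows, while seq_len(0) returns an empty vector.

```r
# Hypothetical helper: TRUE when x is exactly 1, 2, ..., length(x)
is_sequence <- function(x) {
  identical(as.integer(x), seq_along(x))
}

# Made-up data standing in for the Azure ML dataset
df <- data.frame(value = c(10.5, 7.2, 3.9))
df$time <- seq_len(nrow(df))

# Starts at 1, has no gaps or duplicates, ends at the row count
stopifnot(is_sequence(df$time))

# Zero-row case: seq_len(0) is integer(0), so the assignment still works
empty <- data.frame(value = numeric(0))
empty$time <- seq_len(nrow(empty))
stopifnot(is_sequence(empty$time))
```

If the dataset can never be empty, seq.int(nrow(dataset1)) and seq_len(nrow(dataset1)) give the same result.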