This post demonstrates a function I wrote because I could not find anything quite like it on the R discussion boards. Warning: I assume you have enough background in R to follow the code and put it to use. For an excellent resource to get started with R, I recommend R in Action.
When handed a file of demand history to forecast, the first step is to turn the raw transactions into an orderly series of demand by month that R can work with. The function “as.aggr.ts” does just that.
library(lubridate)  # provides floor_date(), year() and month()

as.aggr.ts <- function(x, time.unit = "month") {
  # Takes a data frame of quantity and date transactions and aggregates the
  # quantity as a monthly time series.
  #
  # Args:
  #   x: data frame with trans. 'date' (Date class) in first column
  #      and 'qty' in second
  #   time.unit: bucket size passed to floor_date(); the ts frequency below
  #              is fixed at 12, so only "month" gives a sensible result
  #
  # Intermediate values:
  #   x$unit: first date of month the transaction lands in
  #   x.aggr: data frame of aggregated quantities and 'bucket' dates
  #
  # Returns:
  #   x.ts: summed by month and fully populated time series

  # Find the first date of each bucket
  x$unit <- floor_date(x[, 1], unit = time.unit)

  # Sum the quantities within their monthly bucket
  x.aggr <- aggregate(x[, 2], by = list(x$unit), sum)
  names(x.aggr) <- c("bucket", "qty")
  x.aggr$bucket <- as.Date(x.aggr$bucket)

  # Merge the bucketed quantities with sequence of all dates
  # to insert "0" months if needed
  x.aggr <- merge(x.aggr,
                  data.frame(bucket = seq.Date(min(x.aggr$bucket),
                                               max(x.aggr$bucket),
                                               by = time.unit)),
                  all.y = TRUE)
  x.aggr$qty[is.na(x.aggr$qty)] <- 0  # Populate the NA quantities with 0

  # Build the series from the qty vector, starting at the first bucket
  x.ts <- ts(x.aggr$qty,
             start = c(year(x.aggr$bucket[1]), month(x.aggr$bucket[1])),
             frequency = 12)
  return(x.ts)
}
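The step that inserts the "0" months is the least obvious part of the function, so here is a minimal base R sketch of it in isolation. The data frame `aggr` and its values are made up for illustration and are not part of the sample data set.

```r
# Hypothetical bucketed data with a gap: nothing recorded for February 2001
aggr <- data.frame(
  bucket = as.Date(c("2001-01-01", "2001-03-01")),
  qty    = c(5, 6)
)

# Every month between the first and last bucket
all.months <- data.frame(
  bucket = seq.Date(min(aggr$bucket), max(aggr$bucket), by = "month")
)

# all.y = TRUE keeps months with no matching row in aggr; their qty is NA
filled <- merge(aggr, all.months, all.y = TRUE)

# Replace the NA quantities with 0
filled$qty[is.na(filled$qty)] <- 0

print(filled)
```

Because the merge keeps every row of the full date sequence, the resulting series has no missing periods, which is what ts() needs to line the values up with the calendar.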
The following example assumes you have the data below in a data frame in R. The data comes from the “Transactions” tab of the Retail Transactional Dataset sample file on the customers-dna website:
Timestamp | Items_Number |
1/11/2001 | 1 |
4/8/2001 | 1 |
7/1/2001 | 1 |
10/24/2001 | 1 |
2/9/2001 | 1 |
4/29/2001 | 1 |
11/4/2001 | 1 |
6/6/2001 | 1 |
3/10/2001 | 2 |
9/12/2001 | 1 |
11/1/2001 | 1 |
12/18/2001 | 1 |
8/1/2001 | 1 |
7/7/2001 | 1 |
3/22/2001 | 1 |
7/10/2001 | 1 |
6/8/2001 | 1 |
7/26/2001 | 3 |
3/8/2001 | 1 |
11/11/2001 | 1 |
3/29/2001 | 1 |
1/17/2001 | 2 |
7/20/2001 | 1 |
1/16/2001 | 1 |
3/1/2001 | 1 |
2/7/2001 | 1 |
1/31/2001 | 1 |
4/6/2001 | 1 |
6/11/2001 | 1 |
10/27/2001 | 1 |
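The timestamps above are month/day/year text, while the function expects real dates in the first column. A minimal sketch of the conversion follows, assuming the data was read in as character strings. The names `trans`, `date` and `qty` are my own choices for illustration, and only the first five rows are typed in.

```r
# First five rows of the table above, entered by hand for illustration
trans <- data.frame(
  date = c("1/11/2001", "4/8/2001", "7/1/2001", "10/24/2001", "2/9/2001"),
  qty  = c(1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Parse the month/day/year text into Date objects
trans$date <- as.Date(trans$date, format = "%m/%d/%Y")

str(trans)
```

With all 30 rows loaded the same way, the data frame has the date in the first column and the quantity in the second, which is the layout the function expects.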
After running these 30 sales transactions through the “as.aggr.ts” function, the following monthly time series is produced:
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001   5   2   6   3   0   3   7   1   1   2   3   1
Let me know how this function helps your own time series analysis projects.