R Technical Note: Aggregating data into monthly time series

This post demonstrates a function I had to create because I didn’t find anything quite like it on the R discussion boards.  Warning: I’m assuming you have enough background with R to understand the code and put it to work.  For an excellent resource to get started with R, I would recommend R in Action.

When handed a file of demand history that we want to forecast, the first thing to do is turn the raw transactions into an orderly series of demand by month that R can work with.  The function “as.aggr.ts” does just that.  It relies on the lubridate package for floor_date, year, and month.

library(lubridate)  # provides floor_date(), year() and month()

as.aggr.ts <- function(x, time.unit = "month") {

  # Takes a data frame of dated quantity transactions and aggregates the
  # quantity into a monthly time series.
  #
  # Args:
  #  x: data frame with the transaction date (class Date) in the first
  #     column and the quantity in the second
  #  time.unit: bucket size passed to floor_date() and seq.Date(); the
  #     result is built with frequency = 12, so any value other than
  #     "month" would need a matching frequency
  #
  # Returns:
  #  x.ts: quantities summed by month as a fully populated time series

  # Find the first date of the bucket each transaction lands in
  x$unit <- floor_date(x[, 1], unit = time.unit)

  # Sum the quantities within their monthly bucket
  x.aggr <- aggregate(x[, 2], by = list(x$unit), sum)
  names(x.aggr) <- c("bucket", "qty")
  x.aggr$bucket <- as.Date(x.aggr$bucket)

  # Merge the bucketed quantities with the sequence of all bucket dates
  # to insert "0" months if needed
  x.aggr <- merge(x.aggr,
                  data.frame(bucket = seq.Date(min(x.aggr$bucket), max(x.aggr$bucket), by = time.unit)),
                  all.y = TRUE)
  x.aggr$qty[is.na(x.aggr$qty)] <- 0 # Populate the NA quantities with 0

  # Start the series at the year and month of the earliest bucket
  # (merge() returns the rows sorted by bucket, so that is row 1)
  x.ts <- ts(x.aggr$qty,
             start = c(year(x.aggr$bucket[1]), month(x.aggr$bucket[1])),
             frequency = 12)
  return(x.ts)
}
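The zero-filling step is the part that is easiest to get wrong, so here is a minimal sketch of it in isolation, using base R only. The two-row `totals` data frame is made up for illustration and is not part of the sample data; it simply has a gap in February.

```r
# Hypothetical monthly totals with a gap: February 2001 has no transactions.
totals <- data.frame(
  bucket = as.Date(c("2001-01-01", "2001-03-01")),
  qty    = c(5, 6)
)

# Every month between the first and last bucket, whether or not it had sales.
all.months <- data.frame(
  bucket = seq(min(totals$bucket), max(totals$bucket), by = "month")
)

# all.y = TRUE keeps every row of all.months; months without a match in
# totals come back with qty = NA, which is then replaced by 0.
filled <- merge(totals, all.months, all.y = TRUE)
filled$qty[is.na(filled$qty)] <- 0

print(filled)
```

Without this step the empty month would simply be missing, and ts() would place the later quantities in the wrong months.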

The example below assumes you have this data in a data frame in R, with the dates stored as class Date.  The data comes from the “Transactions” tab of the Retail Transactional Dataset sample file on the customers-dna website:

Timestamp Items_Number
1/11/2001 1
4/8/2001 1
7/1/2001 1
10/24/2001 1
2/9/2001 1
4/29/2001 1
11/4/2001 1
6/6/2001 1
3/10/2001 2
9/12/2001 1
11/1/2001 1
12/18/2001 1
8/1/2001 1
7/7/2001 1
3/22/2001 1
7/10/2001 1
6/8/2001 1
7/26/2001 3
3/8/2001 1
11/11/2001 1
3/29/2001 1
1/17/2001 2
7/20/2001 1
1/16/2001 1
3/1/2001 1
2/7/2001 1
1/31/2001 1
4/6/2001 1
6/11/2001 1
10/27/2001 1
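One way to get this data into the shape the function expects is sketched below. It assumes the rows have been pasted in as text and that the quantity column is named Items_Number; the names `raw` and `sales` are just illustrative. The dates are month/day/year, so they are parsed explicitly into class Date.

```r
# Assumption: the sample rows are pasted in as text. In practice they would
# more likely be read from an exported file with read.csv().
raw <- "Timestamp Items_Number
1/11/2001 1
4/8/2001 1
7/1/2001 1
10/24/2001 1
2/9/2001 1
4/29/2001 1
11/4/2001 1
6/6/2001 1
3/10/2001 2
9/12/2001 1
11/1/2001 1
12/18/2001 1
8/1/2001 1
7/7/2001 1
3/22/2001 1
7/10/2001 1
6/8/2001 1
7/26/2001 3
3/8/2001 1
11/11/2001 1
3/29/2001 1
1/17/2001 2
7/20/2001 1
1/16/2001 1
3/1/2001 1
2/7/2001 1
1/31/2001 1
4/6/2001 1
6/11/2001 1
10/27/2001 1"

sales <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

# Parse month/day/year strings into Date so floor_date() can bucket them.
sales$Timestamp <- as.Date(sales$Timestamp, format = "%m/%d/%Y")

str(sales)

# With as.aggr.ts defined as above and lubridate loaded:
# sales.ts <- as.aggr.ts(sales)
```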

After running these 30 sales transactions through the “as.aggr.ts” function, the following monthly time series is produced:

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001   5   2   6   3   0   3   7   1   1   2   3   1
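A quick sanity check on a result like this is to confirm that the series total matches the total quantity in the raw transactions, which is 34 for the table above. The sketch below re-enters the printed values by hand, so it runs without the function; the names `monthly` and `summer` are illustrative only.

```r
# The printed monthly values, re-entered by hand for illustration.
monthly <- ts(c(5, 2, 6, 3, 0, 3, 7, 1, 1, 2, 3, 1),
              start = c(2001, 1), frequency = 12)

# Total demand should equal the sum of the transaction quantities (34).
total.qty <- sum(monthly)

# The ts object carries its own calendar, so months can be selected by
# c(year, month) rather than by position.
summer <- window(monthly, start = c(2001, 6), end = c(2001, 8))

print(total.qty)
print(summer)
```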

 

Let me know how this function helps your own time series analysis projects.
