Saturday, July 13, 2013

Read from Excel

We get a lot of data in the form of Excel spreadsheets. Generally I have saved each worksheet as separate text files and read into R from them. But that is extra work and goes against my preference of having a single source for any data.

I guess I'd never tried hard enough to read directly from Excel into R, but seeing Jeffrey Breen's slides last year on Tapping the Data Deluge with R inspired me to give the XLConnect package a try.

It works well, but I missed one thing. I often put comments in the beginning of the text file starting with a "#" or some other comment character which read.table can handle. Martin Studer had a suggestion for handling such comments in the Excel file.

The function below expands on that to allow comment lines and blank lines throughout the Excel dataset, for example like this
# some data
X Y Z
1 2 3
2 3 4
3   5#where is Y?
4 5 6

# note things change here
17 15 13
16 14 12

So, in the function,
  • The whole worksheet is read in as a data.frame d.all, possibly skipping a fixed number of rows.
  • Blank rows are removed
  • The data.frame is written to a temporary text file
  • The text file is read in as the desired data.frame, skipping the first row of names added by write.table and dealing with the comments
readFromExcel <- function(wbName,sheetName=1,comment.char="#",header=TRUE,skip=NULL,as.is=TRUE,sep="\t") {
  require(XLConnect) 
  if(is.null(skip)) skip <- 0
  tempFile <- basename(paste0(wbName,'TEMP.txt'))
  d.all <- readWorksheetFromFile(wbName,sheetName,header=FALSE,startRow=skip+1)
  d.all <- d.all[rowSums(is.na(d.all)) != ncol(d.all),] # remove blank rows
  write.table(d.all,tempFile,quote=FALSE,row.names=FALSE,sep=sep)
  df <- read.table(tempFile,header=header,comment.char=comment.char,sep=sep,skip=1,as.is=as.is)
  file.remove(tempFile)
  df
}
The function could be improved by allowing more arguments to pass to readWorksheet such as colTypes or dateTimeFormat.

Friday, July 5, 2013

pocket calendar

Some years back, before I learned R, I put together an Excel spreadsheet to make a pocket calendar. The Hipster PDA was all the rage, but I preferred something smaller, like the PocketMod, but not so cute. I recently converted this to R.


# make a 4-page pdf with a 15 week pocket calendar
# print out landscape, double-sided, open to left
# usage:
#   'make15(start="m/d/yyyy")'
# I start on a Monday.


MonthString <- function(d1,d2) { 
# d1 and d2 are Date objects
# returns a string of the month(s) and year(s) of d1 and d2
  if (format(d1,'%Y%')!=format(d2,'%Y')) return(paste(format(d1,"%b %Y"),format(d2,"%b %Y"),sep=' - '))
  if (format(d1,'%b%')!=format(d2,'%b')) return(paste(format(d1,"%b"),format(d2,"%b %Y"),sep=' - '))
  return(format(d1,"%b %Y"))
}

make15 <- function(start) {
  start <- as.Date(start,format="%m/%d/%Y")
  days <- start+0:104
  panelorder <- c(1,2,3,4,14,13,16,15,5,6,7,8,10,9,12,11)
  filename <- paste("cal-15_",start,".pdf",sep="")
  pdf(file=filename,height=8.5,width=11)
  par(mfcol=c(2,2),mar=c(.5,.5,.5,.5))
  for(i in panelorder) {
    plot(1,1,type='n',xlim=c(0,1),ylim=c(0,1),bty='n',axes=FALSE,
         xlab='',ylab='')
    if(i==16) {
      rect(.55,0,1,1)
      text(.75,c(.75,.6),format(c(days[1],days[105]),"%B %d, %Y"))}
    else {
      rect(0,.75,.5,1,col=grey(.95),border=NA)
      text(.25,1-.02,MonthString(days[(i-1)*7+1],days[(i-1)*7+7]),adj=c(.5,1))
      segments(0,1:3/4,1)
      text(0,3:1/4-.02,format(days[(i-1)*7+1:3],"%a %d"),adj=c(0,1))
      text(1,4:1/4-.02,format(days[102:105-(i-1)*7],"%a %d"),adj=c(1,1))
    }
  }
  dev.off()
  print(paste(filename,'starts on a',format(start,"%A")))
}

The function make15("7/8/2013") will make a 15 week calendar starting July 8 on a four page PDF, size 8.5 x 11 inches. Print double sided, cut in quarters, assemble, and staple. I usually carry two, held together by an elastic band.