Saturday, July 13, 2013

Read from Excel

We get a lot of data in the form of Excel spreadsheets. Generally I have saved each worksheet as separate text files and read into R from them. But that is extra work and goes against my preference of having a single source for any data.

I guess I'd never tried hard enough to read directly from Excel into R, but seeing Jeffrey Breen's slides last year on Tapping the Data Deluge with R inspired me to give the XLConnect package a try.

It works well, but I missed one thing. I often put comments in the beginning of the text file starting with a "#" or some other comment character which read.table can handle. Martin Studer had a suggestion for handling such comments in the Excel file.

The function below expands on that to allow comment lines and blank lines throughout the Excel dataset, for example like this
# some data
X Y Z
1 2 3
2 3 4
3   5#where is Y?
4 5 6

# note things change here
17 15 13
16 14 12

So, in the function,
  • The whole worksheet is read in as a data.frame d.all, possibly skipping a fixed number of rows.
  • Blank rows are removed
  • The data.frame is written to a temporary text file
  • The text file is read in as the desired data.frame, skipping the first row of names added by write.table and dealing with the comments
readFromExcel <- function(wbName,sheetName=1,comment.char="#",header=TRUE,skip=NULL,as.is=TRUE,sep="\t") {
  require(XLConnect) 
  if(is.null(skip)) skip <- 0
  tempFile <- basename(paste0(wbName,'TEMP.txt'))
  d.all <- readWorksheetFromFile(wbName,sheetName,header=FALSE,startRow=skip+1)
  d.all <- d.all[rowSums(is.na(d.all)) != ncol(d.all),] # remove blank rows
  write.table(d.all,tempFile,quote=FALSE,row.names=FALSE,sep=sep)
  df <- read.table(tempFile,header=header,comment.char=comment.char,sep=sep,skip=1,as.is=as.is)
  file.remove(tempFile)
  df
}
The function could be improved by allowing more arguments to pass to readWorksheet such as colTypes or dateTimeFormat.

Friday, July 5, 2013

pocket calendar

Some years back, before I learned R, I put together an Excel spreadsheet to make a pocket calendar. The Hipster PDA was all the rage, but I preferred something smaller, like the PocketMod, but not so cute. I recently converted this to R.


# make a 4-page pdf with a 15 week pocket calendar
# print out landscape, double-sided, open to left
# usage:
#   'make15(start="m/d/yyyy")'
# I start on a Monday.


MonthString <- function(d1,d2) { 
# d1 and d2 are Date objects
# returns a string of the month(s) and year(s) of d1 and d2
  if (format(d1,'%Y%')!=format(d2,'%Y')) return(paste(format(d1,"%b %Y"),format(d2,"%b %Y"),sep=' - '))
  if (format(d1,'%b%')!=format(d2,'%b')) return(paste(format(d1,"%b"),format(d2,"%b %Y"),sep=' - '))
  return(format(d1,"%b %Y"))
}

make15 <- function(start) {
  start <- as.Date(start,format="%m/%d/%Y")
  days <- start+0:104
  panelorder <- c(1,2,3,4,14,13,16,15,5,6,7,8,10,9,12,11)
  filename <- paste("cal-15_",start,".pdf",sep="")
  pdf(file=filename,height=8.5,width=11)
  par(mfcol=c(2,2),mar=c(.5,.5,.5,.5))
  for(i in panelorder) {
    plot(1,1,type='n',xlim=c(0,1),ylim=c(0,1),bty='n',axes=FALSE,
         xlab='',ylab='')
    if(i==16) {
      rect(.55,0,1,1)
      text(.75,c(.75,.6),format(c(days[1],days[105]),"%B %d, %Y"))}
    else {
      rect(0,.75,.5,1,col=grey(.95),border=NA)
      text(.25,1-.02,MonthString(days[(i-1)*7+1],days[(i-1)*7+7]),adj=c(.5,1))
      segments(0,1:3/4,1)
      text(0,3:1/4-.02,format(days[(i-1)*7+1:3],"%a %d"),adj=c(0,1))
      text(1,4:1/4-.02,format(days[102:105-(i-1)*7],"%a %d"),adj=c(1,1))
    }
  }
  dev.off()
  print(paste(filename,'starts on a',format(start,"%A")))
}

The function make15("7/8/2013") will make a 15 week calendar starting July 8 on a four page PDF, size 8.5 x 11 inches. Print double sided, cut in quarters, assemble, and staple. I usually carry two, held together by an elastic band.

Saturday, October 27, 2012

Viewing large dataframes in RStudio


UPDATE: March 2015.
Well here is a lot easier way. utils::View brings up the good old unlimited viewer in a separate window.

I am doing more of my work in RSudio. It has some nice features.
But for large dataframes, the View function is limited to about 1000 rows and 100 columns.
In the standard R Console, View handles large dataframes well.
For when I do want to look at the whole dataframe in RStudio, I wrote this function to write the dataframe to a temporary .csv file and open it with the default application – for me Excel2010, which handles plenty of rows and columns.
myView <- function(dframe) {
    # RStudio does not have a good viewer for large data frames.  This
    # function writes a dataframe to a temporary .csv and then opens it,
    # presumably in excel (if that is the file association).
    csvName <- paste0(tempdir(), "\\myView-", substitute(dframe),
         format(Sys.time(), "%H%M%S"), ".csv")
    write.csv(dframe, file = csvName)
    shell.exec(csvName)
}
It is slower to open the file in Excel, but sometimes worth the wait to view the whole thing.

Note that shell.exec is Windows only, so this needs to be modified for other systems.