I guess I'd never tried hard enough to read directly from Excel into R, but seeing Jeffrey Breen's slides last year on Tapping the Data Deluge with R inspired me to give the XLConnect package a try.
It works well, but I missed one thing. I often put comments in the beginning of the text file starting with a "#" or some other comment character which
read.table
can handle. Martin Studer had a suggestion for handling such comments in the Excel file.
The function below expands on that to allow comment lines and blank lines throughout the Excel dataset, for example like this
# some data X Y Z 1 2 3 2 3 4 3 5#where is Y? 4 5 6 # note things change here 17 15 13 16 14 12
So, in the function,
- The whole worksheet is read in as a data.frame
d.all
, possibly skipping a fixed number of rows. - Blank rows are removed
- The data.frame is written to a temporary text file
- The text file is read in as the desired data.frame, skipping the first row of names added by write.table and dealing with the comments
readFromExcel <- function(wbName,sheetName=1,comment.char="#",header=TRUE,skip=NULL,as.is=TRUE,sep="\t") { require(XLConnect) if(is.null(skip)) skip <- 0 tempFile <- basename(paste0(wbName,'TEMP.txt')) d.all <- readWorksheetFromFile(wbName,sheetName,header=FALSE,startRow=skip+1) d.all <- d.all[rowSums(is.na(d.all)) != ncol(d.all),] # remove blank rows write.table(d.all,tempFile,quote=FALSE,row.names=FALSE,sep=sep) df <- read.table(tempFile,header=header,comment.char=comment.char,sep=sep,skip=1,as.is=as.is) file.remove(tempFile) df }The function could be improved by allowing more arguments to pass to
readWorksheet
such as colTypes
or dateTimeFormat
.
No comments:
Post a Comment