Tuesday, January 24, 2012

Summary stats in R

I've been looking for an easy way to create a data.frame of summary statistics in R, and haven't been able to find one. For a data.frame, the summary() function returns a table of pre-formatted text rather than numbers, and it isn't easily converted into a data.frame. This makes it hard to add other stats to the result, or to query it from other functions. I've written a simple function that uses boxplot() (with plotting switched off) plus colSums() and colMeans() to build a tidy data.frame with one row per statistic and one column per variable.
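To make the problem concrete, here is a minimal sketch of what summary() gives back for a data.frame. The data frame `df` is made up for illustration and is not from my real data.

```r
# Hypothetical example data, invented for this illustration.
df <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))

s <- summary(df)

# The result is a "table" object whose cells are formatted strings,
# so the numbers cannot be used directly in further calculations.
print(class(s))
print(typeof(s))
print(s)
```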

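It doesn't do any checking of the data; you need to do that yourself. In particular, every column must be numeric, because colMeans() stops with an error otherwise. The code is licensed under GPL3 or later. Please link back here if you cross-post it elsewhere.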
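If you want a starting point for that checking, something along these lines might do. This is only a sketch: `check_numeric_columns()` is a hypothetical helper that is not part of the function below, and it only tests that the input is a non-empty data.frame with numeric columns.

```r
# Hypothetical helper, not part of summary_stats(): stops with an error
# if the input is not a non-empty data.frame of numeric columns.
check_numeric_columns <- function(these_data) {
  stopifnot(is.data.frame(these_data), nrow(these_data) > 0)

  # Names of any columns that are not numeric
  is_num <- vapply(these_data, is.numeric, logical(1))
  non_numeric <- names(these_data)[!is_num]

  if (length(non_numeric) > 0) {
    stop("non-numeric columns: ", paste(non_numeric, collapse = ", "))
  }
  invisible(TRUE)
}

# Passes silently for an all-numeric data.frame
check_numeric_columns(data.frame(x = c(1, 2, 3), y = c(4, NA, 6)))
```

Calling it on a data.frame with a character or factor column raises an error that names the offending columns.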

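summary_stats <- function(these_data, output_dir) {
  # these_data: a data.frame with numeric columns only (no checking is done).
  # output_dir: currently unused; kept so that existing calls still work.

  # Number of missing values in each column
  num_NAs <- as.data.frame(t(colSums(is.na(these_data))))
  rownames(num_NAs) <- "NA count"

  # Column means, ignoring missing values
  means <- as.data.frame(t(colMeans(these_data, na.rm = TRUE)))
  rownames(means) <- "means"

  # Number of rows (including NAs), repeated for every column
  num_dat <- as.data.frame(t(rep(nrow(these_data), ncol(these_data))))
  rownames(num_dat) <- "num data"
  names(num_dat) <- names(these_data)

  # Whisker ends, hinges and median as computed by boxplot(), without plotting
  stats <- boxplot(these_data, plot = FALSE)
  stats <- as.data.frame(stats$stats[1:5, , drop = FALSE])
  names(stats) <- names(these_data)
  rownames(stats) <- c("minimum (excl outliers)", "lower quartile", "median",
                       "upper quartile", "maximum (excl outliers)")

  # Stack everything into one data.frame, one row per statistic
  output <- rbind(num_NAs, num_dat, means, stats)
  return(output)
}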
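Here is a rough usage sketch showing the two things I wanted: querying a single value and adding another statistic. So that the snippet runs on its own, it starts with a condensed stand-in for summary_stats() that produces the same row names; the example data and the "std dev" row are made up for illustration.

```r
# Condensed stand-in for summary_stats(), so this snippet is self-contained.
summary_stats <- function(these_data) {
  m <- rbind(
    colSums(is.na(these_data)),               # NA count
    nrow(these_data),                         # num data (recycled per column)
    colMeans(these_data, na.rm = TRUE),       # means
    boxplot(these_data, plot = FALSE)$stats   # five boxplot statistics
  )
  dimnames(m) <- list(
    c("NA count", "num data", "means",
      "minimum (excl outliers)", "lower quartile", "median",
      "upper quartile", "maximum (excl outliers)"),
    names(these_data)
  )
  as.data.frame(m)
}

# Hypothetical example data
df <- data.frame(a = c(1, 2, NA, 4, 5), b = c(10, 20, 30, 40, 50))
out <- summary_stats(df)

# Query a single value by row and column name
median_a <- out["median", "a"]

# Add another statistic as a new row
out["std dev", ] <- sapply(df, sd, na.rm = TRUE)
print(out)
```

Because the result is an ordinary data.frame, rows can be looked up by name and new rows appended without any string parsing.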

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.