Friday, July 31, 2015

Removing Outliers to Plot Data

I am currently working a lot with R. One simple thing that helps me to better visualize data is to plot it excluding outliers.

To do so, first read the data:

data <- read.table("myfile.txt")

Then, you can check how the data is distributed:

# read.table returns a data frame; data[[1]] is its first (numeric) column
quantile(data[[1]], c(.02, .05, .10, .50, .90, .95, .98))

An example output would be:

  2%   5%  10%  50%  90%  95%  98% 
 189  190  190  194  241  275  316 

Now, to plot your data leaving out the lowest 1% and the highest 1% of the values, first compute those two cut-offs:

x <- quantile(data[[1]], c(.01, .99))

And then use them as the limits of the x axis:

plot(data, xlim=c(x[[1]], x[[2]]))
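
Setting axis limits only hides the extreme points; they are still in the data. If you would rather drop them before plotting, a minimal sketch could look like the one below. It uses made-up values instead of a file, and the names values, bounds and trimmed are just placeholders for illustration.

```r
# Hypothetical data: 1000 values around 200 plus three extreme ones
set.seed(42)
values <- c(rnorm(1000, mean = 200, sd = 10), 900, 1200, -500)

# 1st and 99th percentiles, as in the post
bounds <- quantile(values, c(.01, .99))

# Keep only the values that fall inside those two cut-offs
trimmed <- values[values >= bounds[[1]] & values <= bounds[[2]]]

# The axes now adapt to the remaining values on their own
plot(trimmed)
```

With this variant no xlim is needed, but keep in mind that the trimmed vector is shorter than the original, so anything computed from it (mean, counts, and so on) no longer describes the full data set.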