Tuesday, July 2, 2013

Customize your .Rprofile and Keep Your Workspace Clean

Like your .bashrc, .vimrc, or the many other dotfiles you may have in your home directory, your .Rprofile is sourced every time you start an R session. On Mac and Linux, this file usually lives at ~/.Rprofile. On Windows, the user-level file goes in the directory R treats as home (typically your Documents folder; run path.expand("~") in R to check), while the site-wide Rprofile.site lives under the R installation's etc directory. Over the years I've grown and pruned my .Rprofile to set various options and define "utility" functions I use frequently at the interactive prompt.
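If you're not sure which file R will pick up, a minimal sketch like the one below lists the candidate locations in the order documented in ?Startup: the R_PROFILE_USER environment variable if it is set, then the current working directory, then the home directory. The variable names `candidates` and `existing` are just illustrative.

```r
# Candidate user-profile locations, in the order R checks them at startup.
candidates <- c(
  Sys.getenv("R_PROFILE_USER", unset = NA),   # explicit override, if set
  file.path(getwd(), ".Rprofile"),            # project-level profile
  file.path(path.expand("~"), ".Rprofile")    # user-level profile
)
candidates <- candidates[!is.na(candidates) & nzchar(candidates)]

# Only the first candidate that exists is sourced; the others are ignored.
existing <- candidates[file.exists(candidates)]
print(head(existing, 1))
```

Because only one user profile is read per session, a project-level .Rprofile replaces the one in your home directory rather than adding to it.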

One of the dangers of defining too many functions in your .Rprofile is that your code becomes less portable and less reproducible. For example, if I were to define adf() as a shortcut to as.data.frame(), code using adf() that I send to other folks would fail with an error that the function adf could not be found. I'm fully aware that setting the option stringsAsFactors=FALSE carries the same risk, but it's a tradeoff I'm willing to accept for convenience.

Most of the functions I define here are useful for exploring interactively. In particular, the n() function below is handy for getting a numbered list of all the columns in a data frame; lsp() and lsa() list all functions in a package, and all objects and classes in the environment, respectively (both were taken from Karthik Ram's .Rprofile); and the o() function opens the current working directory in a new Finder window on my Mac. In addition to a few other self-explanatory functions, I also turn off those significance stars, set a default CRAN mirror so it doesn't ask me every time, and source in the biocLite() function for installing Bioconductor packages (note: this makes R require web access, which might slow down your R initialization).
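To make these concrete, here is a rough sketch of what a few of the settings and helpers might look like. These are illustrative reconstructions written from the descriptions above, not the actual definitions from my .Rprofile, and the CRAN mirror URL is an assumption you should swap for your own.

```r
# Hypothetical sketch of a few settings and helpers; not the original code.

# Turn off significance stars in model summaries.
options(show.signif.stars = FALSE)

# Set a default CRAN mirror so install.packages() doesn't prompt.
# The URL is an assumption; use whichever mirror you prefer.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Numbered list of the columns in a data frame
# (the row numbers of the result serve as the column index).
n <- function(df) {
  data.frame(column = names(df), stringsAsFactors = FALSE)
}

# List the objects exported by an attached package.
lsp <- function(pkg) {
  ls(paste0("package:", pkg))
}

# Open the current working directory in a Finder window (macOS only).
o <- function() {
  system2("open", ".")
}
```

For example, n(mtcars) prints one row per column name, and lsp("stats") lists everything the stats package exports.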

Finally, you'll notice that I'm creating a new hidden environment, and defining all the functions here as objects in this hidden environment. This allows me to keep my workspace clean, and remove all objects from that workspace without nuking any of these utility functions.
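The pattern looks roughly like the sketch below. The environment name .env, the use of attach(), and the two helpers are assumptions chosen for illustration, not a copy of my actual file.

```r
# Hypothetical sketch of the hidden-environment pattern; the helpers are
# illustrative stand-ins, not the original definitions.

# Names starting with "." are skipped by ls(), so .env stays out of sight.
.env <- new.env()

# Show the first and last k rows of a data frame.
.env$ht <- function(d, k = 5) rbind(head(d, k), tail(d, k))

# List objects in the global workspace along with their (first) class.
# Passing envir explicitly keeps get() from resolving a name to a variable
# local to the helper instead of the workspace object.
.env$lsa <- function() {
  obj_names <- ls(envir = globalenv())
  vapply(obj_names,
         function(nm) class(get(nm, envir = globalenv()))[1],
         character(1))
}

# Put the helpers on the search path so they can be called by name.
attach(.env)
```

With this in place, rm(list = ls()) clears your own objects, but ht() and lsa() keep working because they live on the search path rather than in the global workspace.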

I used to keep my .Rprofile synced across multiple installations using Dropbox, but now I keep all my dotfiles in a single git-versioned directory, symlinked where they need to go (usually ~/). My .Rprofile is below: feel free to steal or adapt however you'd like.

13 comments:

  1. This is really useful, thank you!

  2. RE: your modified ls() function... You may want to look at ls.str("package:car") as an alternative that lists arguments. I find it handy.

  3. Stephen, can I get some 101 help? I do not have a .Rprofile file currently (at least in my library/Frameworks/R.Framework/Version/3.0/Resources folders). Is this where I put it? If I start with yours, do I just place it in a text file via a text editor, or use some other file format?

    I am on a Mac 10.8. Also, if this makes a difference, I use RStudio.

    Thanks

    1. If you're on a Mac it should be in your home directory, i.e., ~/.Rprofile. It's hidden. Fire up a terminal and type 'vim ~/.Rprofile' to edit.

  4. According to http://cran.us.r-project.org/bin/macosx/tools/, on the Mac Tcl/Tk ships with R 3.0.0 and higher. You might check whether you really need the gsubfn.engine option.

    1. Thanks, you're right, I removed that option.

  5. Great post. I have a few of those but I didn't have it in an environment. Will definitely align my profile with this. I also had ta for tail but it doesn't save a lot of keystrokes.

  6. I've enjoyed many of these functions since you last posted your Rprofile. Thanks again!

  7. I discovered that the way `lsa` is written, if you have an object `x` in your workspace, it will always get returned as a character due to the way get() works in obj_type. From the help: if pos or envir is omitted, it will search as if the name of the object appeared unquoted in an expression. So it just interprets the `x` in the function definition as a literal `x` character and so always returns object x as class character, no matter what it really is in your global environment. I don't fully understand it, but that's what I think is happening... eg: https://gist.github.com/ateucher/8743987

    1. @act - thanks very much. I don't fully understand it either, but I ran your demo code and saw the same thing with any object named x. I've updated the code posted here. Thanks.

  8. Stephen, thank you for the post! Could you (or other people) clarify the interplay between R and '.Rprofile' in the case of multiple '.Rprofile' files (i.e. user's home directory, project's top-level and sub-directories) and multiple R sessions? Here's my relevant question on StackOverflow: http://stackoverflow.com/questions/21985888/accessing-global-variables-fails. I figured out the solution to the problem, but it would be nice to have my lesson learned confirmed.

  9. Stephen, how big does your invisible .env have to get before you realise it's better maintained as a package and loaded in your .Rprofile? It's a neat trick for one or two functions, but like a lot of ad-hoc techniques in R (and programming in general) it doesn't scale well. Put in a package, those functions can be properly documented, shared with other users without hacking them out of your .Rprofile, and even used in R processes that didn't use your .Rprofile (--vanilla) by simply loading them as a package.

    1. Great points, Barry. These are issues I've actually been thinking about lately, but I had forgotten about this post. Here's the package I started that has a few of these things implemented. It's not ready for distribution yet, but I will post about it here when it is.

      https://github.com/stephenturner/Tmisc


Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.