Tutorial: How to work with large R projects
Alex Zolotovitski, www.zolot.us
Medio Inc, www.medio.com
How to work with large R projects. Alex Zolotovitski, [email protected] UseR! 2013, Albacete, Spain

Contents
1. Workflow of R Projects
2. Reporting and Literate Programming. Packages knitr, highlight, brew, R2HTML. Code2HTML(); ReleaseOut()
3. IDE: Eclipse+StatET, RStudio and other (vim, ess, ...).
4. Naming and style conventions
5. Structure of a project directory. Package ProjectTemplate. CreateProject()
6. Helper functions to work with a number of large projects.
7. R Work Journal: Code2HTML(); MakeRWJournals(); createRWJalbum()
8. ToDo
9. References

1. Workflow of R Projects
CRISP-DM: lifecycle for a data mining project (diagram labels: Model Monitoring, Model Deployment)
Many projects:
Objective: reduce overhead in the cycle "start - leave - find - return - remember - continue"
Key points
Each Project:

2. Reporting and Literate Programming. Packages knitr, highlight, brew, R2HTML. Code2HTML(); ReleaseOut()
Literate programming: Self-reported / Self-explaining code [1-3]. Reproducible Research.
[http://www.r-bloggers.com/how-to-set-up-a-reproducible-r-project/]:
Treat data as read-only files: do data munging in R code, but always start with the source data.
Consider output artifacts (figures and tables) as disposable: the data plus the R script is the canonical source.

[Rich Fitzjohn: http://nicercode.github.io/blog/2013-04-05-projects/]
1. Treat data as read only
In my mind, this is probably the most important goal of setting up a project. Data are typically time consuming and/or expensive to collect. Working with them interactively (e.g., in Excel) where they can be modified means you are never sure of where the data came from, or how they have been modified. My suggestion is to put your data into the data directory and treat it as read only. Within your scripts you might generate derived data sets either temporarily (in an R session only) or semi-permanently (as a file in output/), but the original data is always left in an untouched state.
2. Treat generated output as disposable
In this approach, files in directories figs/ and output/ are all generated by the scripts. A nice thing about this approach is that if the filenames of generated files change (e.g., changing from phylogeny.pdf to mammal-phylogeny.pdf) files with the old names may still stick around, but because they're in this directory you know you can always delete them. Before submitting a paper, I will go through and delete all the generated files and rerun the analysis to make sure that I can create all the analyses and figures from the data.

What is Sweave [4]?
Sweave is a tool that allows embedding R code in (sort of) LaTeX documents. The document contains both documentation parts (written in LaTeX) and code parts (written in R). The code is evaluated in R.
The resulting console output, figures and tables are automatically inserted into the final document. This produces a .tex file on which it is possible to run LaTeX:
.Rnw → .tex → .pdf
R packages: highlight, brew, knitr, R2HTML::RweaveHTML
Package knitr: .R → .Rmd → .md → .HTML
R code → document (pdf, HTML) to publish.

Problems:
Change input data (rare) or modify .R code (often): rerun the whole chain to recreate the .html
Navigation through the code
Need:
1. Additional research
2. Forks in environment/workspace → data to save and restore.
3. If the .R code is modified, it takes a long time to rerun all code to recreate the .html
4. Dynamic HTML with JavaScript navigation.
Need: R code ↔ R work journal → document (pdf, HTML) to publish
Similar to the journal in JMP or the notebook in IPython.

3. IDE: Eclipse+StatET, RStudio and other (vim, ess, ...).
Eclipse + StatET vs RStudio [7, 8]
Pro:
Ctrl+R, V
Search
Multi-win
Other useful Eclipse plugins may be: mylyn (tasks), SVN, git, python, java, toad for clouds (hive), ...
Full screen view on click
Contra:
Installation [7].
Solution: a portable R+Eclipse package for Windows that does not require installation: http://dl.dropbox.com/u/37458038/REclipse-J.zip - just download, unzip in any folder and click REclipse.bat. It assumes that the JRE is installed in the default location.

4. Naming and style conventions
Not a point for discussion here. The attached code can easily be adapted to your own naming and style conventions.
Among equivalent alternatives I prefer the shorter one, e.g. x = 1 where it is equivalent to x <- 1, and 'a' where it is equivalent to "a".

5. Structure of a project directory. Package ProjectTemplate. CreateProject()
http://projecttemplate.net/architecture.html
file:///C:/z/eclipse/work/R-svn-ass/00_commonR/71_TestProjTemplate/zProj2-min/
file:///T:/work/UseR-2013/lib/newProjTemplName/
file:///M:/88_XBox.LTV
Template Folder:
Just created Proj:
After 2nd ReleaseOut():

6. Helper Functions. Wrappers, one-liners, aliases
1. do something
2. print reminder
3. print hint / template for the next step
4. mnemonics

Function            Wrapper for
#' == Create Project
CreateProject()     Create new Project
libra()             install.packages + library()
theFile             global variable - current R code

#' == Remind objects:
DT()                strftime(Sys.time()) - current Date and Time
st({})              system.time + play sound - for long executed blocks
hee()               nrow + head
sg()                dev.print - save graphics to .png file
srm()               save & remove
^RV                 ## copy output from console to the code

#' == Save state:
sa()                save.image, save
Code2HTML()         R code theFile to html R Work Journal
MakeRWJournals()    the same for many R files
createRWJalbum()    to create albums of galleries
ReleaseOut()        move R code and all output to a DateTime-version folder before new data
#' - ///            exit location - mark place in the file

#' == Restore state:
rmall()             rm all
rmDF()              rm data frames and lists
init                initialise environment
loo(); gff('saved:')  find saved data; find saved locations
lo()                load saved data
lsDF()              ls data frames

#' Convenience, aliases
tocsv()             write.csv
totsv()             write.table
suss()              subset + grep
gre2()              grep
df()                data.frame

7. R Work Journal: Code2HTML(); MakeRWJournals(); createRWJalbum()
Code2HTML() features:
1. Transforms an .R file into a self-documented .html file containing all R code with output pictures, headers, table of contents and gallery.
2. The titles in the body and in the contents are clickable, to navigate from contents to body and back.
3. The pictures are clickable to resize.
4. The html file has partial R syntax highlighting. Full R syntax highlighting in the resulting html is possible, but the resulting file becomes almost twice as large.
5. Parts of the resulting html file can be folded.
6. If you fold the TOC in the browser, select all, copy, and paste from the browser into a text editor, you should get the pure original R file.
7. If the .R code is modified, recreating the .html is fast.
8. It is not a replacement for knitr or Sweave, because the output is not a document to print, but rather an R work journal.
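A minimal sketch of Code2HTML() features 1 and 2 (all code in one page, with contents and body titles linked both ways). It is not the Code2HTML() from this tutorial: script_to_html() is a hypothetical name, and the sketch assumes that section headers are comment lines starting with #' ==, as in the helper list of section 6. Pictures, folding and syntax highlighting are left out.

```r
# Hypothetical helper, NOT the tutorial's Code2HTML(): turns the lines of an
# R script into one HTML string with a clickable table of contents.
# Assumption: section headers are comment lines that start with "#' ==".
script_to_html = function(lines) {
  esc = function(x) {
    # escape the characters that are special in HTML
    x = gsub('&', '&amp;', x, fixed = TRUE)
    x = gsub('<', '&lt;', x, fixed = TRUE)
    gsub('>', '&gt;', x, fixed = TRUE)
  }
  is_head = grepl("^#' ==", lines)
  ids = cumsum(is_head)                       # running header number per line
  titles = trimws(sub("^#' ==", '', lines[is_head]))

  # table of contents: each entry links to its header in the body
  toc = sprintf('<li><a id="toc%d" href="#sec%d">%s</a></li>',
                seq_along(titles), seq_along(titles), esc(titles))

  # body: header lines link back to the contents, other lines stay as code
  body = esc(lines)
  body[is_head] = sprintf('<a id="sec%d" href="#toc%d"><b>%s</b></a>',
                          ids[is_head], ids[is_head], body[is_head])

  paste(c('<html><body>', '<ul>', toc, '</ul>',
          '<pre>', body, '</pre>', '</body></html>'),
        collapse = '\n')
}

# Made-up four-line script used only for illustration
demo = c("#' == Load data", 'x = 1:10', "#' == Plot", 'plot(x)')
html = script_to_html(demo)
```

The result is a single string, so writeLines(html, 'demo.r.html') would save it as a page (the file name is arbitrary). Because the script lines are only escaped and wrapped, regenerating the page after a code change needs no rerun of the analysis, which is the point of feature 7.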
7. R Work Journal
1. 2. 3. RAlbum 42d 95_ABC_LTV 97_tutorial-demo

8. To do:
Max: execute code from r.html
Min: navigate between r.html and .R views in Eclipse

9. References
1. David Smith, How to set up a reproducible R project. www.r-bloggers.com/how-to-set-up-a-reproducible-r-project/
2. Carl Boettiger, My research workflow, based on Github. http://carlboettiger.info/2012/05/06/research-workflow.html
3. Rich Fitzjohn, Nice R Code. http://nicercode.github.io/blog/2013-04-05-projects
4. Daniel Falster, Why I want to write nice R code. http://nicercode.github.io/blog/2013-04-05-why-nice-code
5. William Stafford Noble, A Quick Guide to Organizing Computational Biology Projects. www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000424
6. Data: Data Management. http://software-carpentry.org/4_0/data/mgmt.html
7. Eclipse and StatET 2.0 Install For Running R. http://lukemiller.org/index.php/2012/01/eclipse-and-statet-2-0-install-for-running-r
8. Eclipse Platform Runtime Binary. http://download.eclipse.org/eclipse/downloads/drops4/S-4.3RC4-201306052000/#RCPRuntime
9. StatET Installation. www.walware.de/?page=/index.mframe
10. Installation & Update of the Eclipse Plug-in StatET. www.walware.de/?page=/it/statet/installation.html
11. Longhow Lam, A guide to Eclipse and the R plug-in StatET. www.splusbook.com/RIntro/R_Eclipse_StatET.pdf
12. http://en.wikipedia.org/wiki/Literate_Programming
13. http://en.wikipedia.org/wiki/Noweb
14. CRAN Task View: Reproducible Research. http://cran.r-project.org/web/views/ReproducibleResearch.html
15. Nicola Sartori, An Sweave tutorial. www.cepe.ethz.ch/education/NPecoHS2010/Sartori-Sweave.pdf
16. Package knitr. http://yihui.name/knitr/, http://cran.r-project.org/web/packages/knitr/index.html