1. Sean Davis
  2. ReproducibleResearchTutorial

Commits

Sean Davis  committed 2abf0ec

initial commit

  • Branches master


Files changed (14)

File Vignette/.gitignore

+ExampleSweaveDocument-concordance.tex
+ExampleSweaveDocument-intro.pdf
+ExampleSweaveDocument.log
+ExampleSweaveDocument.synctex.gz
+ExampleSweaveDocument.tex
+Literate_Programming_book_cover.jpg
+ReproducibleResearch-concordance.tex
+ReproducibleResearch.log
+ReproducibleResearch.nav
+ReproducibleResearch.snm
+ReproducibleResearch.synctex.gz
+ReproducibleResearch.tex
+ReproducibleResearch.toc
+ReproducibleResearch.vrb
+Volcano.Rmd
+Volcano.html
+Volcano.md
+figure

File Vignette/ExampleSweaveDocument.Rnw

+\documentclass{article}
+\title{A Very Minimal Sweave Document}
+
+\begin{document}
+\SweaveOpts{concordance=TRUE}
+\maketitle
+
+\section{Introduction}
+This includes one small code block that generates a simple plot.
+
+<<intro,fig=TRUE>>=
+x = rnorm(100)
+y = 1:100
+plot(x,y)
+@
+
+\end{document}
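
The minimal document above can be both tangled (code only) and woven (LaTeX with code and results). The following is a rough sketch using only base R's `utils` package. The file name `Minimal.Rnw`, the scratch directory, and the chunk contents are illustrative assumptions and not part of this commit; the figure chunk is left out so that no graphics device is needed.

```r
# Sketch: tangle and weave a tiny Rnw document with base R only.
# All file names and the chunk body are hypothetical.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<intro>>=",
  "x <- 1:10",
  "mean(x)",
  "@",
  "\\end{document}"
)

workdir <- tempfile("sweave-demo")   # scratch directory for the outputs
dir.create(workdir)
rnw_file <- file.path(workdir, "Minimal.Rnw")
writeLines(rnw, rnw_file)

old_wd <- setwd(workdir)             # Stangle/Sweave write to the working directory
utils::Stangle(rnw_file, quiet = TRUE)   # tangle: writes Minimal.R
utils::Sweave(rnw_file, quiet = TRUE)    # weave: writes Minimal.tex
setwd(old_wd)

tangled <- readLines(file.path(workdir, "Minimal.R"))
woven   <- readLines(file.path(workdir, "Minimal.tex"))
```

`tangled` holds just the R code from the chunk, while `woven` is the LaTeX source with the code and its output embedded, ready for `pdflatex`.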

File Vignette/ExampleSweaveDocument.pdf

Binary file added.

File Vignette/ReproducibleResearch.Rnw

+\documentclass[14pt]{beamer}
+\usetheme{Warsaw}
+\usepackage{hyperref}
+\usepackage{verbatim}
+%\usepackage{listings}
+\newcommand{\Rfunction}[1]{{\texttt{#1}}}
+\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
+\newcommand{\Robject}[1]{{\texttt{#1}}}
+\newcommand{\Rpackage}[1]{{\textit{#1}}}
+\newcommand{\Rclass}[1]{{\textit{#1}}}
+\newcommand{\code}[1]{{\texttt{#1}}}
+\newcommand{\software}[1]{{\textit{#1}}}
+\SweaveOpts{png=true,format=png,pdf=true,cache=False,echo=True}
+
+\title[Reproducible Research with R]{Reproducible Research with R}
+\subtitle{Using knitr, Sweave, version control, and packages to improve reproducibility}
+\author{Sean Davis}
+\institute[NCI]{National Cancer Institute}
+
+\begin{document}
+\SweaveOpts{concordance=TRUE}
+
+
+%------------------ Title 
+
+\begin{frame}[plain]
+  \titlepage
+\end{frame}
+
+\begin{frame}
+\frametitle{Outline}
+  \tableofcontents
+\end{frame}
+
+%------------------ Reproducible Research Section
+
+\section{Reproducible Research}
+
+\begin{frame}
+\frametitle{What is Reproducible Research?}
+\begin{exampleblock}{}
+  {\large ``The term \textit{reproducible research} refers to the idea that the ultimate product of research is the paper along with the full computational environment used to produce the results in the paper such as the code, data, etc. necessary for reproduction of the results and building upon the research.''}
+  \vskip5mm
+  \hspace*\fill{\small--- Wikipedia}
+\end{exampleblock}
+
+Details available \href{http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4720217}{in this set of review articles}.
+\end{frame}
+
+\begin{frame}[plain]
+\begin{figure}
+\includegraphics[width=\textwidth,height=0.8\textheight,keepaspectratio]{Literate_Programming_book_cover.jpg}
+\caption{The first description of \textit{Literate Programming} came from Donald Knuth in the early 1980s}
+\end{figure}
+\end{frame}
+
+\begin{frame}
+\frametitle{What is Literate Programming?}
+\begin{exampleblock}{}
+{\small ``The \textit{literate programming paradigm} represents a move away from writing programs in the manner and order imposed by the computer, and instead enables programmers to develop programs in the order demanded by the logic and flow of their thoughts.  Literate programs are written as an uninterrupted exposition of logic in an ordinary human language, much like the text of an essay, in which macros are included to hide abstractions and traditional source code.''}
+  \vskip5mm
+  \hspace*\fill{\small--- Wikipedia}
+\end{exampleblock}
+\end{frame}
+
+\begin{frame}
+\frametitle{Tangling and Weaving}
+Literate programming tools are used to produce two products:
+\begin{itemize}
+\pause
+\item{The \textit{tangled} code that is meant to be consumed by the computer to produce the results of the analysis.}
+\pause
+\item{The \textit{woven} document that renders the documentation, code, and results into a human-consumable format.}
+\end{itemize}
+\end{frame}
+
+\begin{frame}
+\frametitle{Other Aspects of Reproducible Research}
+\begin{itemize}
+\item{Versioning of data \textit{and} code}
+\item{Data and code availability}
+\item{Data and code provenance}
+\item{Dependency tracking}
+\item{Documentation}
+\end{itemize}
+\end{frame}
+
+%----------------------- Sweave Section
+
+\subsection{Sweave}
+
+\begin{frame}
+\frametitle{What is Sweave?}
+\begin{itemize}
+\item{A literate programming tool for R.}
+\item{Based on noweb markup and \LaTeX.}
+\item{A set of tools for working with Rnw (RNoWeb) files.}
+\end{itemize}
+\end{frame}
+
+\begin{frame}[fragile]
+\scriptsize\verbatiminput{ExampleSweaveDocument.Rnw}
+\normalsize
+\end{frame}
+
+\begin{frame}[fragile]{Running Sweave}
+\begin{block}{Run from command line}
+{\small\begin{verbatim}
+R CMD Sweave ExampleSweaveDocument.Rnw
+R CMD pdflatex ExampleSweaveDocument.tex
+\end{verbatim}
+}
+\end{block}
+\begin{block}{Run from within R}
+{\small\begin{verbatim}
+Sweave("ExampleSweaveDocument.Rnw")
+system("R CMD pdflatex ExampleSweaveDocument.tex")
+\end{verbatim}
+}
+\end{block}
+\end{frame}
+
+
+%----------------------- knitr Section
+
+\subsection{knitr}
+
+\begin{frame}
+\frametitle{knitr Introduction}
+\begin{exampleblock}{}
+  {\large ``The knitr package was designed to be a transparent engine for dynamic report generation with R, solve some long-standing problems in Sweave, and combine features in other add-on packages into one package''}
+  \vskip5mm
+  \hspace*\fill{\small--- The knitr website}
+\end{exampleblock}
+
+\end{frame}
+
+\begin{frame}[fragile]{knitr on an R script}
+\begin{block}{Volcano.R}
+{\small\begin{verbatim}
+z <- 2 * volcano        # Exaggerate the relief
+x <- 10 * (1:nrow(z))   # 10 meter spacing (S to N)
+y <- 10 * (1:ncol(z))   # 10 meter spacing (E to W)
+par(mar=rep(.5,4))
+persp(x, y, z, theta = 120, phi = 15, scale = FALSE, axes = FALSE)
+\end{verbatim}
+}
+\end{block}
+\end{frame}
+
+\begin{frame}[fragile]{knitr on an R script}
+\begin{block}{spin the R script}
+\begin{verbatim}
+# install.packages('knitr')
+library(knitr)
+spin('Volcano.R')
+\end{verbatim}
+\end{block}
+This will produce:
+\begin{itemize}
+\item{Volcano.Rmd}
+\item{Volcano.md}
+\item{Volcano.html}
+\end{itemize}
+\end{frame}
+
+\begin{frame}{knitr for literate programming}
+\begin{itemize}
+\item{Knitting traditional Rnw files (as used by Sweave) to produce \LaTeX{}/PDF}
+\item{Knitting \href{http://daringfireball.net/projects/markdown/syntax}{markdown}-based .Rmd files}
+\item{``Spinning'' an R file directly to HTML}
+\item{``Spinning'' an R file to Rmd first and then editing the Rmd file}
+\end{itemize}
+\end{frame}
+
+\section{R Packages}
+
+\begin{frame}{Advantages of R packages}
+\begin{itemize}
+\item{Standard packaging mechanism}
+\item{Versioned}
+\item{Maintains provenance}
+\item{Documented}
+\item{Can contain both code \textit{and} data}
+\item{Tracks dependencies}
+\item{Simplifies literate programming}
+\end{itemize}
+\end{frame}
+
+\begin{frame}[fragile]{Create a simple package}
+<<createpackage,size="scriptsize">>=
+package.skeleton(code_files='../pubmed.R',
+                 path='..', name='pubmedR',
+                 force=TRUE)
+@
+This produces a directory, \code{pubmedR}, that holds the skeleton of an R package.
+\end{frame}
+
+\begin{frame}{Next steps}
+\begin{itemize}
+\item{Edit DESCRIPTION file}
+\item{Edit .Rd files (documentation)}
+\item{Add data files or further R code}
+\item{R CMD check}
+\item{R CMD INSTALL}
+\item{SHARE!!!}
+\end{itemize}
+\end{frame}
+
+%------------------- Version Control
+
+\section{Version Control}
+
+\begin{frame}[fragile]{What is Version Control?}
+\begin{exampleblock}{}
+  {``Revision control, also known as version control and source control (and an aspect of software configuration management), is the management of changes to documents, computer programs, large web sites, and other collections of information.''}
+  \vskip5mm
+  \hspace*\fill{\small--- Wikipedia}
+\end{exampleblock}
+\end{frame}
+
+\subsection{git}
+
+\begin{frame}[fragile]
+<<sessionInfo,size="scriptsize">>=
+sessionInfo()
+@ 
+\end{frame}
+
+\end{document}

File Vignette/ReproducibleResearch.pdf

Binary file added.

File Vignette/Volcano.R

+z <- 2 * volcano        # Exaggerate the relief
+x <- 10 * (1:nrow(z))   # 10 meter spacing (S to N)
+y <- 10 * (1:ncol(z))   # 10 meter spacing (E to W)
+par(mar=rep(.5,4))
+persp(x, y, z, theta = 120, phi = 15, scale = FALSE, axes = FALSE)

File pubmed.R

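+pubmedQuery = function(search) {
+  library(XML)
+  # Percent-encode the query so that spaces (e.g. in "a AND b") are valid in the URL
+  url = sprintf("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%s&tool=R",
+                URLencode(search, reserved=TRUE))
+  return(xmlRoot(xmlTreeParse(url)))
+}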
+
+getQueryCounts = function(genes,terms,baseline=NULL) {
+  retmat = matrix(rep(0,length(terms)*length(genes)),ncol=length(terms))
+  colnames(retmat)=terms
+  rownames(retmat)=genes
+  pb = txtProgressBar(max=length(terms)*length(genes),style=3)
+  i = 0
+  for(gene in genes) {
+    for(term in terms) {
+      i=i+1
+      setTxtProgressBar(pb,i)
+      search = sprintf("%s AND %s",term,gene)
+      if(!is.null(baseline)) {
+        search = sprintf("%s AND %s",search,baseline)
+      }
+      doc=pubmedQuery(search)
+      if(length(getNodeSet(doc,'//ErrorList'))>0) {
+        # If we are here, an error or "not found" occurred.
+        retmat[gene,term]=0
+      } else {
+        retmat[gene,term]=as.integer(xmlValue(getNodeSet(doc,"/eSearchResult/Count")[[1]]))
+      }
+    }
+    Sys.sleep(0.33)
+  }
+  close(pb)  # finish the progress bar before returning
+  return(retmat)
+}
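
`getQueryCounts` joins the term, the gene, and the optional baseline with `AND`, and `pubmedQuery` places the result in the esearch URL. The sketch below reproduces only that string construction, without any network access. `build_esearch_url` is a hypothetical helper that is not part of this commit, the example search words are made up, and percent-encoding the query with `URLencode` is an assumption about how the spaces are best handled.

```r
# Hypothetical helper: build the esearch URL for one gene/term combination.
build_esearch_url <- function(term, gene, baseline = NULL) {
  search <- sprintf("%s AND %s", term, gene)
  if (!is.null(baseline)) {
    search <- sprintf("%s AND %s", search, baseline)
  }
  # Percent-encode the query so spaces become %20
  sprintf(
    "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%s&tool=R",
    utils::URLencode(search, reserved = TRUE)
  )
}

# Made-up example inputs
url <- build_esearch_url("apoptosis", "TP53", baseline = "human")
print(url)
```

Keeping the URL construction separate from the HTTP call makes the query logic easy to check offline before any requests are sent to NCBI.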

File pubmedR/DESCRIPTION

+Package: pubmedR
+Type: Package
+Title: What the package does (short line)
+Version: 1.0
+Date: 2013-01-10
+Author: Who wrote it
+Maintainer: Who to complain to <yourfault@somewhere.net>
+Description: More about what it does (maybe more than one line)
+License: What license is it under?
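
The generated DESCRIPTION still contains placeholder text. One way to see what a completed file might look like is to write and re-read one with base R's DCF helpers. Every field value below except the package name, type, version, and date is a hypothetical placeholder chosen for illustration; the `Imports` line is an assumption based on the code's use of the XML package.

```r
# Sketch: write a filled-in DESCRIPTION and read it back.
# Title, Description, Imports, and License values are illustrative assumptions.
desc <- data.frame(
  Package     = "pubmedR",
  Type        = "Package",
  Title       = "Count PubMed Hits for Gene and Term Pairs",
  Version     = "1.0",
  Date        = "2013-01-10",
  Description = "Queries NCBI esearch and tabulates hit counts.",
  Imports     = "XML",
  License     = "GPL-2",
  stringsAsFactors = FALSE
)

desc_file <- tempfile("DESCRIPTION")   # scratch file, not the real package file
write.dcf(desc, file = desc_file)

fields <- read.dcf(desc_file)          # character matrix, one column per field
print(fields[1, c("Package", "Version", "Imports")])
```

Reading the file back with `read.dcf` is a quick check that the fields parse before running `R CMD check`.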

File pubmedR/NAMESPACE

+exportPattern("^[[:alpha:]]+")
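
The default `exportPattern` exports every object whose name starts with a letter, so names beginning with a dot stay internal. A small sketch of how that regular expression selects names; `.internalHelper` is a hypothetical name used only for contrast.

```r
# Names that might be defined in the package; the dotted one is hypothetical.
candidates <- c("pubmedQuery", "getQueryCounts", ".internalHelper")

# Same regular expression as in the NAMESPACE file
exported <- grep("^[[:alpha:]]+", candidates, value = TRUE)
print(exported)
```

Here only `pubmedQuery` and `getQueryCounts` match, which is why the generated notes suggest editing the exports once the package grows helper functions.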

File pubmedR/R/pubmed.R

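+pubmedQuery = function(search) {
+  library(XML)
+  # Percent-encode the query so that spaces (e.g. in "a AND b") are valid in the URL
+  url = sprintf("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%s&tool=R",
+                URLencode(search, reserved=TRUE))
+  return(xmlRoot(xmlTreeParse(url)))
+}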
+
+getQueryCounts = function(genes,terms,baseline=NULL) {
+  retmat = matrix(rep(0,length(terms)*length(genes)),ncol=length(terms))
+  colnames(retmat)=terms
+  rownames(retmat)=genes
+  pb = txtProgressBar(max=length(terms)*length(genes),style=3)
+  i = 0
+  for(gene in genes) {
+    for(term in terms) {
+      i=i+1
+      setTxtProgressBar(pb,i)
+      search = sprintf("%s AND %s",term,gene)
+      if(!is.null(baseline)) {
+        search = sprintf("%s AND %s",search,baseline)
+      }
+      doc=pubmedQuery(search)
+      if(length(getNodeSet(doc,'//ErrorList'))>0) {
+        # If we are here, an error or "not found" occurred.
+        retmat[gene,term]=0
+      } else {
+        retmat[gene,term]=as.integer(xmlValue(getNodeSet(doc,"/eSearchResult/Count")[[1]]))
+      }
+    }
+    Sys.sleep(0.33)
+  }
+  close(pb)  # finish the progress bar before returning
+  return(retmat)
+}

File pubmedR/Read-and-delete-me

+* Edit the help file skeletons in 'man', possibly combining help files
+  for multiple functions.
+* Edit the exports in 'NAMESPACE', and add necessary imports.
+* Put any C/C++/Fortran code in 'src'.
+* If you have compiled code, add a useDynLib() directive to
+  'NAMESPACE'.
+* Run R CMD build to build the package tarball.
+* Run R CMD check to check the package tarball.
+
+Read "Writing R Extensions" for more information.

File pubmedR/man/getQueryCounts.Rd

+\name{getQueryCounts}
+\alias{getQueryCounts}
+%- Also NEED an '\alias' for EACH other topic documented here.
+\title{
+%%  ~~function to do ... ~~
+}
+\description{
+%%  ~~ A concise (1-5 lines) description of what the function does. ~~
+}
+\usage{
+getQueryCounts(genes, terms, baseline = NULL)
+}
+%- maybe also 'usage' for other objects documented here.
+\arguments{
+  \item{genes}{
+%%     ~~Describe \code{genes} here~~
+}
+  \item{terms}{
+%%     ~~Describe \code{terms} here~~
+}
+  \item{baseline}{
+%%     ~~Describe \code{baseline} here~~
+}
+}
+\details{
+%%  ~~ If necessary, more details than the description above ~~
+}
+\value{
+%%  ~Describe the value returned
+%%  If it is a LIST, use
+%%  \item{comp1 }{Description of 'comp1'}
+%%  \item{comp2 }{Description of 'comp2'}
+%% ...
+}
+\references{
+%% ~put references to the literature/web site here ~
+}
+\author{
+%%  ~~who you are~~
+}
+\note{
+%%  ~~further notes~~
+}
+
+%% ~Make other sections like Warning with \section{Warning }{....} ~
+
+\seealso{
+%% ~~objects to See Also as \code{\link{help}}, ~~~
+}
+\examples{
+##---- Should be DIRECTLY executable !! ----
+##-- ==>  Define data, use random,
+##--	or do  help(data=index)  for the standard data sets.
+
+## The function is currently defined as
+function (genes, terms, baseline = NULL) 
+{
+    retmat = matrix(rep(0, length(terms) * length(genes)), nc = length(terms))
+    colnames(retmat) = terms
+    rownames(retmat) = genes
+    pb = txtProgressBar(max = length(terms) * length(genes), 
+        style = 3)
+    i = 0
+    for (gene in genes) {
+        for (term in terms) {
+            i = i + 1
+            setTxtProgressBar(pb, i)
+            search = sprintf("\%s AND \%s", term, gene)
+            if (!is.null(baseline)) {
+                search = sprintf("\%s AND \%s", search, baseline)
+            }
+            doc = pubmedQuery(search)
+            if (length(getNodeSet(doc, "//ErrorList")) > 0) {
+                retmat[gene, term] = 0
+            }
+            else {
+                retmat[gene, term] = as.integer(xmlValue(getNodeSet(doc, 
+                  "/eSearchResult/Count")[[1]]))
+            }
+        }
+        Sys.sleep(0.33)
+    }
+    return(retmat)
+  }
+}
+% Add one or more standard keywords, see file 'KEYWORDS' in the
+% R documentation directory.
+\keyword{ ~kwd1 }
+\keyword{ ~kwd2 }% __ONLY ONE__ keyword per line

File pubmedR/man/pubmedQuery.Rd

+\name{pubmedQuery}
+\alias{pubmedQuery}
+%- Also NEED an '\alias' for EACH other topic documented here.
+\title{
+%%  ~~function to do ... ~~
+}
+\description{
+%%  ~~ A concise (1-5 lines) description of what the function does. ~~
+}
+\usage{
+pubmedQuery(search)
+}
+%- maybe also 'usage' for other objects documented here.
+\arguments{
+  \item{search}{
+%%     ~~Describe \code{search} here~~
+}
+}
+\details{
+%%  ~~ If necessary, more details than the description above ~~
+}
+\value{
+%%  ~Describe the value returned
+%%  If it is a LIST, use
+%%  \item{comp1 }{Description of 'comp1'}
+%%  \item{comp2 }{Description of 'comp2'}
+%% ...
+}
+\references{
+%% ~put references to the literature/web site here ~
+}
+\author{
+%%  ~~who you are~~
+}
+\note{
+%%  ~~further notes~~
+}
+
+%% ~Make other sections like Warning with \section{Warning }{....} ~
+
+\seealso{
+%% ~~objects to See Also as \code{\link{help}}, ~~~
+}
+\examples{
+##---- Should be DIRECTLY executable !! ----
+##-- ==>  Define data, use random,
+##--	or do  help(data=index)  for the standard data sets.
+
+## The function is currently defined as
+function (search) 
+{
+    library(XML)
+    return(xmlRoot(xmlTreeParse(sprintf("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=\%s&tool=R", 
+        search))))
+  }
+}
+% Add one or more standard keywords, see file 'KEYWORDS' in the
+% R documentation directory.
+\keyword{ ~kwd1 }
+\keyword{ ~kwd2 }% __ONLY ONE__ keyword per line

File pubmedR/man/pubmedR-package.Rd

+\name{pubmedR-package}
+\alias{pubmedR-package}
+\alias{pubmedR}
+\docType{package}
+\title{
+What the package does (short line)
+~~ package title ~~
+}
+\description{
+More about what it does (maybe more than one line)
+~~ A concise (1-5 lines) description of the package ~~
+}
+\details{
+\tabular{ll}{
+Package: \tab pubmedR\cr
+Type: \tab Package\cr
+Version: \tab 1.0\cr
+Date: \tab 2013-01-10\cr
+License: \tab What license is it under?\cr
+}
+~~ An overview of how to use the package, including the most important ~~
+~~ functions ~~
+}
+\author{
+Who wrote it
+
+Maintainer: Who to complain to <yourfault@somewhere.net>
+~~ The author and/or maintainer of the package ~~
+}
+\references{
+~~ Literature or other references for background information ~~
+}
+~~ Optionally other standard keywords, one per line, from file KEYWORDS in ~~
+~~ the R documentation directory ~~
+\keyword{ package }
+\seealso{
+~~ Optional links to other man pages, e.g. ~~
+~~ \code{\link[<pkg>:<pkg>-package]{<pkg>}} ~~
+}
+\examples{
+~~ simple examples of the most important functions ~~
+}