Sean Davis avatar Sean Davis committed b97835c

working vignette

Comments (0)

Files changed (3)

+Cheung-015.pdf
+Cheung-017.pdf
+Cheung-024.pdf
+Cheung.aux
+Cheung.log
+Cheung.out
+Cheung.tex
+Rplots.pdf

inst/doc/Cheung.Rnw

 
 <<loadaffy>>=
 library(affy)
+library(CheungSubsetCelData)
 @ 
 
 To get an overview of the affy package, use:
 abatch
 @ 
 
-The \texttt{annotation} accessor function is used to determine the array type after loading affy data into R.  In this case, the array type is the ``hgfocus'' array.
+The \texttt{annotation} accessor function is used to determine the array type after loading affy data into R.  In this case, the array type is the ``hgfocus'' array.  \textit{Note that the information stored in the AffyBatch object is ``raw data''.  It has not been summarized to probeset, yet, for example}.  
 
 \section{Exploratory Data Analysis}
 
 summary(pm(abatch))
 @ 
 
-\section{histograms}
+\subsection{histograms}
 
 \textbf{Exercise 2}:  Create a matrix of the \texttt{pm} intensities.  Then, make a histogram of the intensities for the first array.  Is this plot useful to you?  How could you improve it?  (Hint: you might want to transform the data before plotting).
 
 
 <<exercise3,results=hide,echo=false>>=
 pmmat = pm(abatch)
-hist(pmmat[,1])
-hist(log2(pmmat[,1]))
+plot(density(pmmat[,1]))
 @ 
 
+\textbf{Exercise 4}:  Look at the help for the \texttt{plotDensity.AffyBatch} function and use the function to compare the densities for all the arrays.  Do the arrays have similar intensity distributions?  
 
-<<results=hide,echo=false>>=
-pmmat = pm(abatch)
-plot(density(pmmat[,1]))
+<<exercise4,echo=false,results=hide>>=
+plotDensity.AffyBatch(abatch)
+@ 
+
+\subsection{boxplots}
+
+In addition to density plots and histograms, a boxplot can be quite useful for examine at a high level the relative distributions between samples.  
+
+\textbf{Exercise 5}:  Create a boxplot using the ``pm'' intensity matrix.  Again, is this helpful, or do you need to transform the data?  Do these intensity distributions appear to be similar to each other?  As an aside, what happens if you call ``boxplot'' on the AffyBatch object directly?  
+
+<<exercise5,echo=false,results=hide>>=
+boxplot(pmmat)
+boxplot(log2(pmmat))
+boxplot(abatch)
+@ 
+
+\subsection{Scatterplots}
+
+We can compare arrays in a pairwise fashion to look for similarity and for nonlinear effects.  
+
+\textbf{Exercise 6}: Make a \texttt{pairs} plot of the first four samples in the AffyBatch object.  Hint: the AffyBatch class has a method for the \texttt{pairs} method.  Do any samples in the pairs plot look unusual or problematic?
+
+<<exercise6,results=hide,echo=false>>=
+pairs(abatch[,1:4])
 @ 
 
-\textbf{Exercise 4}:  
+\subsection{MA plots}
+
+Remember that MA plots are useful to:
+
+\begin{itemize}
+  \item{Compare arrays to each other}
+  \item{Look for linear and non-linear trends that need to be corrected}
+\end{itemize}
+
+\textbf{Exercise 7}:  Look at the help for the affy package to find a method that creates an MA plot (or plots) and use it to look at the MA plots for the first four arrays.  Do you see any effects that appear to need some correction?
+
+<<exercise7,results=hide,echo=false>>=
+MAplot(abatch[,1:4])
+@ 
+
+\textbf{Exercise 8}:  Try using the \texttt{Mbox} method on your AffyBatch object.  What is it showing you?  
+
+<<exercise8,results=hide,echo=false>>=
+Mbox(abatch)
+@ 
+
+\subsection{Image Plots}
+
+Microarrays are physical media that are subject to scratches, defects, and other systematic problems that may show up as a recognizable problem on the array.  Since modern microarrays store the position location of the intensities on the physical array, we can plot the data as it was arranged on the array.  This is not a routine plot, but it can be useful to check for arrays that appear to be significant outliers or problems.
+
+<<fig=true>>=
+image(abatch[,1],main='First array')
+@ 
+
+
+\section{Normalization}
+
+We are going to ignore the background subtraction step for the purposes of this vignette and move along to normalization.  The goal of normalization is to remove as much technical bias as possible while conserving biological variability.  There are a number of methods for normalization in the affy package.  The \texttt{normalize.methods} function returns the possible normalization methods that we can apply to our raw AffyBatch dataset.
+
+<<>>=
+# ask about the methods for normalization for our AffyBatch object
+normalize.methods(abatch)
+@ 
+
+Our goal in this section is to try at least one normalization method and then to evaluate the changes with regard to reducing some of the problems we saw in the raw data.  
+
+\subsection{Apply a normalization method}
+
+Here, I am going to apply ``quantiles'' normalization to our raw AffyBatch object; ``quantiles'' normalization is the default.  
+
+<<normbatch>>=
+normbatch = normalize(abatch,method='quantiles.robust')
+class(normbatch)
+normbatch
+@ 
+
+The object returned by our normalization is itself an AffyBatch object, so all the work done in the section above working with the raw AffyBatch object will pay off to examine the new normalized AffyBatch object.
+
+\textbf{Exercise 8}:  Make a density plot, Mbox plot, and some MA plots of the normalized data.  Are the problems that you noted with the raw AffyBatch object improved or removed?
+
+<<exercise8,results=hide,echo=false>>=
+plotDensity.AffyBatch(normbatch)
+Mbox(normbatch)
+par(mfrow=c(2,2))
+MAplot(normbatch[,1:4])
+@ 
+
+
+\section{Getting an ExpressionSet}
+
+Remember that preprocessing involves the following four steps:
+
+\begin{enumerate}
+  \item{Background correction}
+  \item{Normalization}
+  \item{Perfect-match/mismatch correction}
+  \item{Summarization}
+\end{enumerate}
+
+The affy package offers a function called \texttt{expresso} that allows custom combinations of choices for each step.  The available options for each step are given by the respective lookup functions.
+
+<<>>=
+bgcorrect.methods()
+normalize.methods(abatch) # this one takes an object as an argument--???
+pmcorrect.methods()
+express.summary.stat.methods()
+@ 
+
+The most popular approach for working with affymetrix data is to use rma.  The choices for each step of the RMA approach are given in Table \ref{tab:rma}.
+
+\begin{table}
+\begin{centering}
+  \begin{tabular}{c | c}
+  Step & Method \\
+  \hline
+  Background Correction & RMA \\
+  Normalization & quantiles \\
+  PM/MM correction & pmonly \\
+  Summarization & median polish \\
+  \hline
+  \end{tabular}
+  \label{tab:rma}
+  \caption{RMA settings for the steps of preprocessing of affy arrays}
+\end{centering}
+\end{table}
+
+Applying rma to the abatch object is very straightforward, though, as the affy package has a wrapper for rma.  
+
+<<rma>>=
+eset = rma(abatch)
+@ 
+
+I have also created some sample information stored in the data object ``samples''.  Here, I am going to assign it to our eset object pData slot.
+
+<<eset>>=
+data(samples)
+head(samples)
+pData(eset)=samples
+sampleNames(eset)=paste0(samples$Patient,samples$rep)
+@ 
+
+These samples are actually pairs of replicates.  If we want to look at the high-level structure of the data, we can use a simple hierarchical clustering.  Here is one example.  Feel free to modify it to try other clustering distance metrics or linkage methods.
+
+<<fig=true>>=
+plot(hclust(dist(t(exprs(eset)))))
+@ 
+
+One of the pairs does not cluster together, so we may need to spend some time sorting that detail out; it may be a technical issue, but it may also be a sample naming mixup.  Since these samples came from a larger cohort of patients, such sample labeling errors are not uncommon.
+
+Feel free to experiment further with clustering and data exploration.  Once we are satisfied that we have sorted out data quality issues,  with an ExpressionSet in hand, we can move forward to other tasks such as hypothesis testing.
+
 
 \begin{thebibliography}{1}
 

Binary file added.

Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.