Wiki

Clone wiki

rTASSEL / Home

Getting Started with rTASSEL

Author: Brandon Monier

Last updated: 2020-04-22

Table of Contents

Introduction

Overview

Thanks for checking out rTASSEL! In this document, we will go over the functionalities used to work with the TASSEL software via R.

TASSEL is a software package used to evaluate traits associations, evolutionary patterns, and linkage disequilibrium. Strengths of this software include:

  1. The opportunity for a number of new and powerful statistical approaches to association mapping such as a General Linear Model (GLM) and Mixed Linear Model (MLM). MLM is an implementation of the technique which our lab's published Nature Genetics paper - Unified Mixed-Model Method for Association Mapping - which reduces Type I error in association mapping with complex pedigrees, families, founding effects and population structure.

  2. An ability to handle a wide range of indels (insertion & deletions). Most software ignore this type of polymorphism; however, in some species (like maize), this is the most common type of polymorphism.

More information can be found in the following paper:

Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.

Detailed documentation and source code can be found on our website:

https://www.maizegenetics.net/tassel

Motivation

The main goal of developing this package is to construct an R-based front-end to connect to a variety of highly used TASSEL methods and analytical tools. By using R as a front-end, we aim to utilize a unified scripting workflow that exploits the analytical prowess of TASSEL in conjunction with R's popular data handling and parsing capabilities without ever having the user to switch between these two environments.

Disclaimer

Due to the experimental nature of this package's lifecycle, end functionalities are prone to change after end-user input is obtained in the near future.

Installation and Preliminary Steps

Prerequisites - installing rJava

Since TASSEL is written in Java, Java JDK will need to be installed on your machine. Additionally, for R to communicate with Java, the R package rJava will need to be installed. Detailed information can be found using the following links, depending on your OS:

Potential rJava installation problems

macOS - you have upgraded Java and rJava no longer works...

If you previously had rJava working through RStudio, then you upgraded your Java and it now longer works, try the following:

At the command line type:

R CMD javareconf

Then check for a left over symbolic link via:

ls -ltr /usr/local/lib/libjvm.dylib

If the link exists, remove it, then create it fresh via these commands:

rm /usr/local/lib/libjvm.dylib
sudo ln -s $(/usr/libexec/java_home)/lib/server/libjvm.dylib /usr/local/lib

You should now be able to enter RStudio and setup rJava.

macOS - clang: error: unsupported option '-fopenmp'

Change clang paths to gcc (via Homebrew). Add the following to your .bash_profile file:

export LD=/usr/local/bin/gcc-9
export CC=/usr/local/bin/gcc-9
export CXX=/usr/local/bin/g++-9
export CPP=/usr/local/bin/cpp-9

alias c++=/usr/local/bin/c++-9
alias g++=/usr/local/bin/g++-9
alias gcc=/usr/local/bin/gcc-9
alias cpp=/usr/local/bin/cpp-9
alias ld=/usr/local/bin/gcc-9
alias cc=/usr/local/bin/gcc-9

NOTE: Make sure your gcc version that you have is 9.

Install from BitBucket

After you have rJava up and running on your machine, install rTASSEL by installing the source code from our BitBucket repository:

if (!require("devtools")) install.packages("devtools")
devtools::install_bitbucket(
    repo = "bucklerlab/rTASSEL",
    ref = "master",
    build_vignettes = FALSE
)

Preliminary steps

Setting Memory

Since genome-wide association analyses can use up a lot of computational resources, memory allocation to rTASSEL can be modified. To change the amount of memory, use the base options() function and modify the following parameter:

options(java.parameters = c("-Xmx<memory>", "-Xms<memory>"))

Replace <memory> with a specified unit of memory. For example, if I want to allocate a maximum of 6 GB of memory for my operations, I would use the input "-Xmx6g", where g stands for gigabyte (GB). More information about memory allocation can be found here.

NOTE: Setting Java memory options for rTASSEL and any rJava-related packages needs to be set before loading the rTASSEL package!

Loading rTASSEL

After source code has been compiled and optional Java memory options have been set, the package can be loaded into your environment using:

library(rTASSEL)

Or, if you want to use a function without violating your environment you can use rTASSEL::<function>, where <function> is an rTASSEL function.

The importance of logging your progress

Before we begin analyzing data, optional parameters can be set up to make rTASSEL more efficient. To prevent your R console from being overloaded with TASSEL logging information, it is highly recommended that you start a logging file. This file will house all of TASSEL's logging output which is beneficial for debugging and tracking the progress of your analytical workflow. To start a logging file, use the following command:

rTASSEL::startLogger(fullPath = NULL, fileName = NULL)

If the rTASSEL::startLogger() parameters are set to NULL, the logging file will be created in your current working directory. If you are unsure of what your working directory is in R, use the base getwd() command.

Additionally, since this is a general walkthrough, certain intricaces of each function may glossed over. If you would like to study a function in full, refer to the R documentation by using ?<function> in the console, where <function> is an rTASSEL-based function.

Reading Data

Overview

Like TASSEL, rTASSEL will read two main types of data:

  • Genotype data
  • Phenotype data

This data can be read in several different ways. In the following examples, we will demonstrate various ways genotype and phenotype information can be loaded into rTASSEL objects.

Loading genotype data

From a path

Currently, reading in genotype data to rTASSEL is based off of file locations as paths. Genotype/sequencing data can be stored in a variety of formats. rTASSEL can read and store a wide variety of file types:

  • hapmap (HMP)
  • HDF5 (hierarchical data format version 5)
  • VCF (variant call format)
  • Plink

To load this genotype data, simply store your file location as a string object in R. For this example, we will load two toy data sets - one being a VCF file and the other being a hapmap file. These data sets can be accessed via the rTASSEL package itself:

# Load hapmap data
genoPathHMP <- system.file(
    "extdata",
    "mdp_genotype.hmp.txt",
    package = "rTASSEL"
)
genoPathHMP
## [1] "/home/bm646/Development/R/library/rTASSEL/extdata/mdp_genotype.hmp.txt"
# Load VCF data
genoPathVCF <- system.file(
    "extdata",
    "maize_chr9_10thin40000.recode.vcf",
    package = "rTASSEL"
)
genoPathVCF
## [1] "/home/bm646/Development/R/library/rTASSEL/extdata/maize_chr9_10thin40000.recode.vcf"

Now that we have the file paths to this data, we can pass this to TASSEL and create a formal TasselGenotypePhenotype class object in R using the following:

# Load in hapmap file
tasGenoHMP <- rTASSEL::readGenotypeTableFromPath(
    path = genoPathHMP
)

# Load in VCF file
tasGenoVCF <- rTASSEL::readGenotypeTableFromPath(
    path = genoPathVCF
)

When we call these objects, a summary of the data will be posted to the R console:

tasGenoHMP
## A TasselGenotypePhenotype Dataset
##   Class.............. TasselGenotypePhenotype 
##   Taxa............... 281 
##   Positions.......... 3093 
##   Taxa x Positions... 869133 
## ---
##   Genotype Table..... [x]
##   Phenotype Table.... [ ]

This summary details the number of Taxa (Taxa) and marker positions (Positions) within the data set. Additionally, since we can load both genotype and phenotype information into this object, a helpful check will be displayed to show what is populating the object ([x] or [ ]).

Additional information about TasselPhenotypeGenotype data sets

In general, this S4 class data object houses "slot" information relating to TASSEL/Java pointers of the respective data.

class(tasGenoHMP)
## [1] "TasselGenotypePhenotype"
## attr(,"package")
## [1] "rTASSEL"
slotNames(tasGenoHMP)
## [1] "name"            "jTasselObj"      "jTaxaList"       "jPositionList"  
## [5] "jGenotypeTable"  "jPhenotypeTable"

Technically, this object does not contain the full information of the data represented in R space, but merely contains addresses to the memory store of the reference TASSEL object ID. For example, if we wanted to extract the GenotypeTable with the S4 @ operator, we would get something that looks like this:

tasGenoHMP@jGenotypeTable
## [1] "Java-Object{net.maizegenetics.dna.snp.CoreGenotypeTable@2cfb4a64}"

This entity is a rJava internal identifier. It isn't until we call downstream rTASSEL functions where we will bring the TASSEL data into the R environment.

Loading phenotype data

From a path

Similar to reading in genotype data, phenotype data can also be read in via paths. If you already have preconstructed phenotype data in a file, this option will most likely work best for you. One caveat to this is how the data file is constructed in terms of columns and trait data for TASSEL analyses. More information about how these files can be found at this link under the Numerical Data section.

Loading this type of data is very similar to how genotype data is loaded. here, we will use the rTASSEL::readPhenotypeFromPath() function:

# Read from phenotype path
phenoPath  <- system.file("extdata", "mdp_traits.txt", package = "rTASSEL")
phenoPath
## [1] "/home/bm646/Development/R/library/rTASSEL/extdata/mdp_traits.txt"
# Load into rTASSEL `TasselGenotypePhenotype` object
tasPheno <- rTASSEL::readPhenotypeFromPath(
    path = phenoPath
)

# Inspect object
tasPheno
## A TasselGenotypePhenotype Dataset
##   Class.............. TasselGenotypePhenotype 
##   Taxa............... 301 
##   Positions.......... NA 
##   Taxa x Positions... NA 
## ---
##   Genotype Table..... [ ]
##   Phenotype Table.... [x]
## ---
##   Traits: Taxa EarHT dpoll EarDia

The object output is very similar to the genotype table output with some minor additions to which traits are displayed in the file.

From an R data frame

In some cases you might want to first modify your phenotype data set in R and then load it into the TASSEL environment. If you wish to choose this route, you will need to use the rTASSEL::readPhenotypeFromDataFrame() function along with a couple of parameters. First, we will construct an R data frame and load it with this function:

# Create phenotype data frame
phenoDF <- read.table(phenoPath, header = TRUE)
colnames(phenoDF)[1] <- "Taxon"

# Inspect first few rows
head(phenoDF)
##   Taxon EarHT  dpoll  EarDia
## 1   811 59.50 -999.0 -999.00
## 2 33-16 64.75   64.5 -999.00
## 3 38-11 92.25   68.5   37.90
## 4  4226 65.50   59.5   32.22
## 5  4722 81.13   71.5   32.42
## 6  A188 27.50   62.0   31.42
# Load into rTASSEL `TasselGenotypePhenotype` object
tasPhenoDF <- rTASSEL::readPhenotypeFromDataFrame(
    phenotypeDF = phenoDF,
    taxaID = "Taxon",
    attributeTypes = NULL
)

# Inspect new object
tasPhenoDF
## A TasselGenotypePhenotype Dataset
##   Class.............. TasselGenotypePhenotype 
##   Taxa............... 301 
##   Positions.......... NA 
##   Taxa x Positions... NA 
## ---
##   Genotype Table..... [ ]
##   Phenotype Table.... [x]
## ---
##   Traits: Taxa EarHT dpoll EarDia

The phenotypeDF parameter is for the R data frame object. The taxaID parameter is needed to determine which column of your data frame is your TASSEL taxa data. The final parameter (attributeTypes) is optional. If this parameter is set to NULL, all remaining data frame columns will be classified as TASSEL data types. If this is not the case for your data (e.g. if you have covariate or factor data in your experiment), you will need to specify which columns are what TASSEL data type (i.e. data, covariate, or factor). This will have to be passed as an R vector of string elements (e.g. c("data", "factor", "covariate")). Currently, this data type needs to be entered in the same order as they are found in the data frame.

Loading genotype and phenotype data simultaneously

In association studies, we are interested in combining our genotype and phenotype data. To usually run this operation in TASSEL, an intersect combination between the two data sets is needed. To run this in rTASSEL, we can use the rTASSEL::readGenotypePhenotype() function. The parameter input needed for this function is, of course, a genotype and phenotype object. For genotype input, the following can be used:

  • a path to a genotype file
  • a prior TasselGenotypePhenotype object

For phenotype input, the following can be used:

  • a path to a phenotype data set,
  • a prior TasselGenotypePhenotype object
  • an R data frame

For example, if we wanted to read the prior TasselGenotypePhenotype genotype and phenotype objects from earlier:

tasGenoPheno <- rTASSEL::readGenotypePhenotype(
    genoPathOrObj = tasGenoHMP,
    phenoPathDFOrObj = tasPheno
)
tasGenoPheno
## A TasselGenotypePhenotype Dataset
##   Class.............. TasselGenotypePhenotype 
##   Taxa............... 279 
##   Positions.......... 3093 
##   Taxa x Positions... 862947 
## ---
##   Genotype Table..... [x]
##   Phenotype Table.... [x]
## ---
##   Traits: Taxa EarHT dpoll EarDia

We can also use a combination of the above parameter options (e.g. load a genotype path and a phenotype data frame, etc.). One caveat though, if you load in a phenotype data frame object with this function, the prior parameters from the rTASSEL::readPhenotypeFromDataFrame will be needed (i.e. the taxaID and attributeTypes parameters):

tasGenoPhenoDF <- rTASSEL::readGenotypePhenotype(
    genoPathOrObj = genoPathHMP,
    phenoPathDFOrObj = phenoDF,
    taxaID = "Taxon",
    attributeTypes = NULL
)
tasGenoPhenoDF
## A TasselGenotypePhenotype Dataset
##   Class.............. TasselGenotypePhenotype 
##   Taxa............... 279 
##   Positions.......... 3093 
##   Taxa x Positions... 862947 
## ---
##   Genotype Table..... [x]
##   Phenotype Table.... [x]
## ---
##   Traits: Taxa EarHT dpoll EarDia

Extracting data from rTASSEL objects

Get genotype data

If you want to bring in genotype data into the R environment, you can use the rTASSEL::getSumExpFromGenotypeTable() function. All this function needs is a TasselGenotypePhenotype class object containing a genotype table:

tasSumExp <- rTASSEL::getSumExpFromGenotypeTable(
    tasObj = tasGenoPheno
)
tasSumExp
## class: RangedSummarizedExperiment 
## dim: 3093 279 
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(3): tasselIndex refAllele altAllele
## colnames(279): 33-16 38-11 ... WF9 YU796NS
## colData names(2): Sample TasselIndex

As you can see above, the gentoype object is returned as a SummarizedExperiment class R object. More information about these objects can be found here.

From this object, we extract taxa information and their respective TASSEL integer locations:

SummarizedExperiment::colData(tasSumExp)
## DataFrame with 279 rows and 2 columns
##           Sample TasselIndex
##         <factor>   <integer>
## 33-16      33-16           0
## 38-11      38-11           1
## 4226        4226           2
## 4722        4722           3
## A188        A188           4
## ...          ...         ...
## W22          W22         274
## W64A        W64A         275
## WD            WD         276
## WF9          WF9         277
## YU796NS  YU796NS         278

We can also extract the allelic marker data using SummarizedExperiment::rowData():

SummarizedExperiment::rowData(tasSumExp)
## DataFrame with 3093 rows and 3 columns
##      tasselIndex   refAllele   altAllele
##        <integer> <character> <character>
## 1              0           A           C
## 2              1           C           G
## 3              2           G           T
## 4              3           A           T
## 5              4           A           G
## ...          ...         ...         ...
## 3089        3088           A           G
## 3090        3089           C           G
## 3091        3090           C           G
## 3092        3091           A           G
## 3093        3092           C           G

And get marker coordinates using SummarizedExperiment::rowRanges()

SummarizedExperiment::rowRanges(tasSumExp)
## GRanges object with 3093 ranges and 3 metadata columns:
##          seqnames    ranges strand | tasselIndex   refAllele   altAllele
##             <Rle> <IRanges>  <Rle> |   <integer> <character> <character>
##      [1]        1    157104      + |           0           A           C
##      [2]        1   1947984      + |           1           C           G
##      [3]        1   2914066      + |           2           G           T
##      [4]        1   2914171      + |           3           A           T
##      [5]        1   2915078      + |           4           A           G
##      ...      ...       ...    ... .         ...         ...         ...
##   [3089]       10 147819958      + |        3088           A           G
##   [3090]       10 148332885      + |        3089           C           G
##   [3091]       10 148488692      + |        3090           C           G
##   [3092]       10 148816538      + |        3091           A           G
##   [3093]       10 148907116      + |        3092           C           G
##   -------
##   seqinfo: 10 sequences from an unspecified genome; no seqlengths

Extract phenotype data

If you want to bring in phenotype data into the R environment, you can use the rTASSEL::getPhenotypeDF() function. All this function needs is a TasselGenotypePhenotype class object containing a phenotype table:

tasExportPhenoDF <- rTASSEL::getPhenotypeDF(
    tasObj = tasGenoPheno
)
tasExportPhenoDF
## # A tibble: 279 x 4
##    Taxa   EarHT dpoll EarDia
##    <chr>  <dbl> <dbl>  <dbl>
##  1 33-16   64.8  64.5  NaN  
##  2 38-11   92.2  68.5   37.9
##  3 4226    65.5  59.5   32.2
##  4 4722    81.1  71.5   32.4
##  5 A188    27.5  62     31.4
##  6 A214N   65    69     32.0
##  7 A239    47.9  61     36.1
##  8 A272    35.6  70    NaN  
##  9 A441-5  53.5  67.5   35.0
## 10 A554    38.5  66     33.4
## # … with 269 more rows

As shown above, an R tibble-based data frame is exported with converted data types translated from TASSEL. See the following table what TASSEL data types are tranlated into within the R environment:

TASSEL Data Converted R Data type
taxa character
data numeric
covariate numeric
factor factor

Filtering genotype data

Prior to association analyses, filtration of genotype data may be necessary. In TASSEL, this accomplished through the Filter menu using two primary plugins:

  • Filter Site Builder plugin
  • Filter Taxa Builder plugin

In rTASSEL, this can also be accomplished using the follwing functions:

  • rTASSEL::filterGenotypeTableSites()
  • rTASSEL::filterGenotypeTableTaxa()

These objects take a TasselGenotypePhenotype class object. For example, in our genotype data set, if we want to remove monomorphic and low coverage sites, we could use the following parameters in rTASSEL::filterGenotypeTableSites():

tasGenoPhenoFilt <- rTASSEL::filterGenotypeTableSites(
    tasObj = tasGenoPheno,
    siteMinCount = 150,
    siteMinAlleleFreq = 0.05,
    siteMaxAlleleFreq = 1.0
)
tasGenoPhenoFilt
## A TasselGenotypePhenotype Dataset
##   Class.............. TasselGenotypePhenotype 
##   Taxa............... 279 
##   Positions.......... 2555 
##   Taxa x Positions... 712845 
## ---
##   Genotype Table..... [x]
##   Phenotype Table.... [x]
## ---
##   Traits: Taxa EarHT dpoll EarDia

We can then compare this to our original pre-filtered data set:

tasGenoPheno
## A TasselGenotypePhenotype Dataset
##   Class.............. TasselGenotypePhenotype 
##   Taxa............... 279 
##   Positions.......... 3093 
##   Taxa x Positions... 862947 
## ---
##   Genotype Table..... [x]
##   Phenotype Table.... [x]
## ---
##   Traits: Taxa EarHT dpoll EarDia

These functions can work on any TasselGenotypePhenotype class object that contains genotypic data, regardless of single or combined TASSEL objects.

Analysis - Relatedness

Create a kinship matrix object

In TASSEL, for mixed linear model analyses, a kinship matrix calculated from genotype data is necessary. This can be accomplished by calculating a kinship TASSEL object using the function rTASSEL::kinshipMatrix(). The main parameter input is a TasselGenotypePhenotype class object that contains a genotype data set:

tasKin <- rTASSEL::kinshipMatrix(tasObj = tasGenoPheno)

Since this object is essentially a TASSEL reference object, it is relatively not of use until association analysis. We can coerce this TASSEL object by using the following function:

# Get full R matrix
tasKinRMat <- rTASSEL::kinshipToRMatrix(tasKin)

# Inspect the first 5 rows and columns
tasKinRMat[1:5, 1:5]
##           33-16     38-11     4226      4722     A188
## 33-16  1.796955  0.039558  0.06060 -0.008674  0.03524
## 38-11  0.039558  1.910151  0.01960 -0.006317 -0.02938
## 4226   0.060600  0.019603  1.92646 -0.017025  0.02449
## 4722  -0.008674 -0.006317 -0.01703  1.453315 -0.02040
## A188   0.035244 -0.029376  0.02449 -0.020401  2.00156

As you can see, we can take the initial TASSEL object and convert it into a standard R object of matrix class.

Calculate a distance matrix

Very similar to kinship matrix calculation, a distance matrix can also be calculated using genotype data

tasDist <- rTASSEL::distanceMatrix(tasObj = tasGenoPheno)

And just like the kinship matrix, we can also coerce this into an R matrix

# Get full R matrix
tasDistRMat <- rTASSEL::distanceToRMatrix(tasDist)

# Inspect the first 5 rows and columns
tasDistRMat[1:5, 1:5]
##        33-16  38-11   4226   4722   A188
## 33-16 0.0000 0.2799 0.2794 0.2698 0.2882
## 38-11 0.2799 0.0000 0.2920 0.2777 0.3042
## 4226  0.2794 0.2920 0.0000 0.2863 0.2999
## 4722  0.2698 0.2777 0.2863 0.0000 0.2920
## A188  0.2882 0.3042 0.2999 0.2920 0.0000

Analysis - Association

Overview

One of TASSEL's most powerful functionalities is its capability of performing a variety of different association modeling techniques. If you have started reading the walkthrough here it is strongly suggested that you read the other components of this walkthrough since the following parameters require what we have previously created!

If you are not familar with these methods, more information about how these operate in base TASSEL can be found at following links:

The rTASSEL::assocModelFitter() function has several primary components:

  • tasObj: a TasselGenotypePhenotype class R object
  • formula: an R-based linear model formula
  • fitMarkers: a boolean parameter to differentiate between BLUE and GLM analyses
  • kinship: a TASSEL kinship object
  • fastAssociation: a boolean parameter for data sets that have many traits

Probably the most important concept of this function is formula parameter. If you are familar with standard R linear model functions, this concept is fairly similar. In TASSEL, a linear model is composed of the following scheme:

y ~ A

...where y is any TASSEL data type and A is any TASSEL covariate and / or factor types:

<data> ~ <covariate> and/or <factor>

This model can be written out in several ways. With an example phenotype data, we can have the following variables that are represented in TASSEL in the following way:

  • Taxon <taxa>
  • EarHT <data>
  • dpoll <data>
  • EarDia <data>
  • location <factor>
  • Q1 <covariate>
  • Q2 <covariate>
  • Q3 <covariate>

Using this data, we could write out the following formula in R

list(EarHT, dpoll, EarDia) ~ location + Q1 + Q2 + Q3

In the above example, we use a base list() function to indicate analysis on multiple numeric data types. For covariate and factor information, we use + operator. One problem with this implementation is that it can become cumbersome and prone to error if we want to analyze the entirety of a large data set or all data and/or factor and covariate types.

A work around for this problem is to utilize a special character to indicate all elements within the model (.). By using the . operator we can simplify the above model into the following:

. ~ .

This indicates we want to analyze the whole data set and leave nothing out. If we want to analyze all data types and only a handful of factor and/or covariates, we can use something like this:

. ~ location + Q1 + Q2

Or vice-versa:

list(EarHT, dpoll) ~ .

Take note we can be very specific with what we want to include in our trait model! In the above example we have deliberately left out EarDia from our model.

Additionally, we can also fit marker and kinship data to our model which can change our analytical methods. Since these options in TASSEL are binary, additional parameters are passed for this function. In this case, genotype/marker data is fitted using the fitMarker parameter and kinship is fitted using the kinship parameter.

Fast Association implements methods described by Shabalin (2012). This method provides an ordinary least squares solution for fixed effect models. For this method to proper work it is necessary that your have:

  • No missing data in your phenotype data set
  • Phenotypes and genotypes have been merged using an intersect join. Since this is currently the only option of join genotype and phenotype data, you do not have to worry about this for now.

NOTE: since we are working with "toy" data, empirical insight will not be elucidated upon in the following steps. This is simply to show the user how properly use these functions and the outputs that they give.

In the following examples, we will run example data and in return, obtain TASSEL association table reports in the form of an R list object containing tibble-based R data frames.

Calculate BLUEs

To caclulate best linear unbiased estimates (BLUEs), numeric phenotype data can be used along with covariate and factor data only if it is intended to control for field variation. Since genotype data is not needed for this method, we can leave the fitMarkers, kinship, and fastAssociation to NULL or FALSE:

# Read in phenotype data
phenoPathCov <- system.file("extdata", "mdp_phenotype.txt", package = "rTASSEL")
tasPhenoCov <- rTASSEL::readPhenotypeFromPath(phenoPathCov)

# Calculate BLUEs
tasBLUE <- rTASSEL::assocModelFitter(
    tasObj = tasPhenoCov,
    formula = . ~ .,                  # <- All data is used!
    fitMarkers = FALSE,
    kinship = NULL,
    fastAssociation = FALSE
)
## Running all traits...

## Association Analysis : BLUEs
# Return BLUE output
tasBLUE
## $BLUE
## # A tibble: 284 x 4
##    Taxa   EarDia EarHT dpoll
##    <chr>   <dbl> <dbl> <dbl>
##  1 33-16    47.2  82.6  51.7
##  2 38-11    46.8 107.   56.6
##  3 4226     56.7  81.9  53.8
##  4 4722     46.1  92.8  62.5
##  5 A188     43.3  44.5  49.9
##  6 A214N   185.   74.1 109. 
##  7 A239     52.1  63.7  51.1
##  8 A272     38.3  37.7  67.9
##  9 A441-5   40.5  59.8  61.1
## 10 A554     46.6  53.7  53.3
## # … with 274 more rows
## 
## $BLUE_ANOVA
## # A tibble: 3 x 9
##   Trait       F         p taxaDF taxaMS errorDF errorMS modelDF modelMS
##   <chr>   <dbl>     <dbl>  <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 EarDia   6.93 1.67e- 46    281   28.5     241    4.11     284    29.1
## 2 EarHT  167.   2.30e-225    281  696.      274    4.18     284   815. 
## 3 dpoll   12.1  1.01e- 76    283   49.4     269    4.09     286    68.9

Calculate GLM

Similar to BLUEs, we can fit a generalized linear model (GLM) by simply fitting marker data to our model. For this, we need a genotype data set combined with our phenotype data in a TasselGenotypePhenotype class object:

# Calculate GLM
tasGLM <- rTASSEL::assocModelFitter(
    tasObj = tasGenoPheno,             # <- our prior TASSEL object
    formula = list(EarHT, dpoll) ~ .,  # <- only EarHT and dpoll are ran
    fitMarkers = TRUE,                 # <- set this to TRUE for GLM
    kinship = NULL,
    fastAssociation = FALSE
)
## Running all non <data> traits and/or <taxa>...

## Association Analysis : GLM
# Return GLM output
tasGLM
## $GLM_Stats
## # A tibble: 6,186 x 18
##    Trait Marker Chr      Pos marker_F       p marker_Rsq   add_F   add_p
##    <fct> <chr>  <fct>  <dbl>    <dbl>   <dbl>      <dbl>   <dbl>   <dbl>
##  1 EarHT PZB00… 1     1.57e5   5.02   0.00721   0.0359   8.47    0.00391
##  2 EarHT PZA01… 1     1.95e6   1.08   0.300     0.00404  1.08    0.300  
##  3 EarHT PZA03… 1     2.91e6   0.514  0.474     0.00189  0.514   0.474  
##  4 EarHT PZA03… 1     2.91e6   0.0467 0.954     0.000340 0.00678 0.934  
##  5 EarHT PZA03… 1     2.92e6   1.74   0.188     0.00668  1.74    0.188  
##  6 EarHT PZA03… 1     2.92e6   1.49   0.223     0.00578  1.49    0.223  
##  7 EarHT PZA00… 1     2.97e6   4.03   0.0190    0.0307   6.40    0.0120 
##  8 EarHT PZA02… 1     3.21e6   3.08   0.0477    0.0230   1.91    0.168  
##  9 EarHT PZA02… 1     3.21e6   0.227  0.634     0.000897 0.227   0.634  
## 10 EarHT PZA00… 1     3.21e6   1.21   0.301     0.00882  0.111   0.740  
## # … with 6,176 more rows, and 9 more variables: dom_F <dbl>, dom_p <dbl>,
## #   marker_df <dbl>, marker_MS <dbl>, error_df <dbl>, error_MS <dbl>,
## #   model_df <dbl>, model_MS <dbl>, minorObs <dbl>
## 
## $GLM_Genotypes
## # A tibble: 14,416 x 7
##    Trait Marker     Chr       Pos   Obs Allele Estimate
##    <fct> <chr>      <fct>   <dbl> <dbl> <fct>     <dbl>
##  1 EarHT PZB00859.1 1      157104    63 A        -18.7 
##  2 EarHT PZB00859.1 1      157104   207 C        -10.2 
##  3 EarHT PZB00859.1 1      157104     3 M          0   
##  4 EarHT PZA01271.1 1     1947984   131 C         -2.60
##  5 EarHT PZA01271.1 1     1947984   137 G          0   
##  6 EarHT PZA03613.2 1     2914066    78 G         -1.96
##  7 EarHT PZA03613.2 1     2914066   196 T          0   
##  8 EarHT PZA03613.1 1     2914171    69 A          4.17
##  9 EarHT PZA03613.1 1     2914171   207 T          4.36
## 10 EarHT PZA03613.1 1     2914171     2 W          0   
## # … with 14,406 more rows

Calculate MLM

Adding to our complexity, we can fit a mixed linear model (MLM) by adding kinship to our analysis. In addition to the prior parameters, we will also need a TASSEL kinship object (see Create a kinship matrix object in the Analysis - Relatedness section):

# Calculate MLM
tasMLM <- rTASSEL::assocModelFitter(
    tasObj = tasGenoPheno,             # <- our prior TASSEL object
    formula = EarHT ~ .,               # <- run only EarHT
    fitMarkers = TRUE,                 # <- set this to TRUE for GLM
    kinship = tasKin,                  # <- our prior kinship object
    fastAssociation = FALSE
)
## Running all non <data> traits and/or <taxa>...

## Association Analysis : MLM
# Return GLM output
tasMLM
## $MLM_Stats
## # A tibble: 3,094 x 18
##    Trait Marker Chr       Pos    df        F        p add_effect     add_F
##    <fct> <chr>  <fct>   <dbl> <dbl>    <dbl>    <dbl>      <dbl>     <dbl>
##  1 EarHT None   <NA>       NA     0 NaN      NaN         NaN     NaN      
##  2 EarHT PZB00… 1      157104     2   1.67     0.191      -2.60    3.22   
##  3 EarHT PZA01… 1     1947984     1   2.98     0.0853    NaN     NaN      
##  4 EarHT PZA03… 1     2914066     1   0.480    0.489     NaN     NaN      
##  5 EarHT PZA03… 1     2914171     2   0.0377   0.963       0.126   0.00676
##  6 EarHT PZA03… 1     2915078     1   0.673    0.413     NaN     NaN      
##  7 EarHT PZA03… 1     2915242     1   0.744    0.389     NaN     NaN      
##  8 EarHT PZA00… 1     2973508     2   1.04     0.357       0.616   0.158  
##  9 EarHT PZA02… 1     3205252     2   3.93     0.0208     -1.25    0.377  
## 10 EarHT PZA02… 1     3205262     1   0.0182   0.893     NaN     NaN      
## # … with 3,084 more rows, and 9 more variables: add_p <dbl>,
## #   dom_effect <dbl>, dom_F <dbl>, dom_p <dbl>, errordf <dbl>,
## #   MarkerR2 <dbl>, `Genetic Var` <dbl>, `Residual Var` <dbl>,
## #   `-2LnLikelihood` <dbl>
## 
## $MLM_Effects
## # A tibble: 7,211 x 7
##    Trait Marker     Locus    Site Allele Effect   Obs
##    <fct> <chr>      <fct>   <dbl> <fct>   <dbl> <dbl>
##  1 EarHT PZB00859.1 1      157104 A       -7.91    63
##  2 EarHT PZB00859.1 1      157104 C       -2.72   207
##  3 EarHT PZB00859.1 1      157104 M        0        3
##  4 EarHT PZA01271.1 1     1947984 C       -4.46   131
##  5 EarHT PZA01271.1 1     1947984 G        0      137
##  6 EarHT PZA03613.2 1     2914066 G        1.93    78
##  7 EarHT PZA03613.2 1     2914066 T        0      196
##  8 EarHT PZA03613.1 1     2914171 A        2.80    69
##  9 EarHT PZA03613.1 1     2914171 T        2.55   207
## 10 EarHT PZA03613.1 1     2914171 W        0        2
## # … with 7,201 more rows
## 
## $MLM_Residuals_EarHT
## # A tibble: 279 x 2
##    Taxa    EarHT
##    <chr>   <dbl>
##  1 33-16   0.836
##  2 38-11   7.34 
##  3 4226    4.05 
##  4 4722    5.80 
##  5 A188   -5.33 
##  6 A214N   1.27 
##  7 A239   -1.09 
##  8 A272   -6.00 
##  9 A441-5 -0.780
## 10 A554   -5.13 
## # … with 269 more rows

Calculate Fast Association

Finally, we can run fast association analysis in our GLM model by setting the fastAssociation parameter to TRUE. NOTE: this is only really effective if you have many phenotype traits:

# Read data - need only non missing data!
phenoPathFast <-system.file(
    "extdata",
    "mdp_traits_nomissing.txt",
    package = "rTASSEL"
)

# Creat rTASSEL object - use prior TASSEL genotype object
tasGenoPhenoFast <- rTASSEL::readGenotypePhenotype(
    genoPathOrObj = tasGenoHMP,
    phenoPathDFOrObj = phenoPathFast
)


# Calculate MLM
tasFAST <- rTASSEL::assocModelFitter(
    tasObj = tasGenoPhenoFast,         # <- our prior TASSEL object
    formula = . ~ .,                   # <- run all of the phenotype data
    fitMarkers = TRUE,                 # <- set this to TRUE for GLM
    kinship = NULL,
    fastAssociation = TRUE             # <- set this to TRUE for fast assoc.
)
## Running all traits...

## Association Analysis : Fast Association
# Return GLM output
tasFAST
## $FastAssociation
## # A tibble: 640 x 7
##    Trait  Marker      Chr       Pos    df     r2          p
##    <fct>  <chr>       <fct>   <dbl> <dbl>  <dbl>      <dbl>
##  1 dpoll  PZA00258.3  1     2973508     1 0.0435 0.000466  
##  2 EarDia PZA02869.8  1     4429897     1 0.0458 0.000325  
##  3 EarDia PZA02869.4  1     4429927     1 0.0606 0.0000331 
##  4 EarDia PZA02869.2  1     4430055     1 0.0504 0.000160  
##  5 EarDia PHM2244.142 1     5562502     1 0.0473 0.000260  
##  6 dpoll  PZA00181.2  1     8366411     1 0.0775 0.00000241
##  7 dpoll  PZA00528.1  1     8367944     1 0.0412 0.000661  
##  8 dpoll  PZA00447.6  1     9023947     1 0.0585 0.0000462 
##  9 EarHT  PZA00447.8  1     9024005     1 0.0469 0.000275  
## 10 dpoll  PZA00447.8  1     9024005     1 0.0449 0.000376  
## # … with 630 more rows

Updated