Create Sample Sheet
The createsamplesheet.pl script automatically creates a sample sheet file for a folder of fastq files.
The Script
The script performs the following steps:
- Identifies fastq files in the fastq folder
- Determines if the files are uncompressed or gzip compressed
- Determines if the folder is paired-end or single-end
- Parses the sample name of each sample from the fastq filename
- Reads primers from the primer file and converts bases other than AGCTagct to N (qiime mode only)
- Uses Cutadapt to count how many times each primer is seen in the first 4000 reads of each file (qiime mode only; the number of reads can be configured using the -s option)
- Creates a samplesheet using the most frequent primer for each sample (qiime mode only)
Sample names are modified to be compatible with Qiime: forbidden characters are converted to ".".
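The primer-handling steps above can be sketched in Python. This is only an illustration, not the script's actual Perl code: the function names are hypothetical, and the per-primer hit counts are assumed to have already been collected (the script obtains them with Cutadapt, which is not called here).

```python
import re


def normalize_primer(seq):
    """Replace every character other than A, G, C, T (either case) with N."""
    return re.sub(r"[^AGCTagct]", "N", seq)


def most_frequent_primer(hit_counts):
    """Return the primer name with the highest hit count.

    hit_counts is a hypothetical dict mapping primer name -> number of
    reads in which the primer was seen. Returns None when there are no
    primers or no primer was seen at all.
    """
    if not hit_counts or max(hit_counts.values()) == 0:
        return None
    return max(hit_counts, key=hit_counts.get)
```

For example, a degenerate primer such as GTGYCAGCMGCCGCGGTAA would be normalized to GTGNCAGCNGCCGCGGTAA before counting.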
Input
Options for createsamplesheet.pl
| Option | Description |
|---|---|
| -f folder | A folder containing fastq files to process |
| -o file | Name of the output samplesheet file |
| -z | fastq files are gzip compressed (filenames end with fastq.gz) |
| -h | Print usage instructions and exit |
| -v | Print more information while running (verbose) |
Fastq file support: Folders containing either paired-end or single-end fastq files are supported. Files may be gzip compressed (.fastq.gz) or uncompressed, but a folder must not contain a mix of both. Fastq files must have Illumina-formatted names or be named sample_*_R1_*.fastq or sample_*_R1.fastq. Otherwise the script cannot determine the sample name of each file, or which files contain R1 reads and which contain R2 reads. In that case, create the samplesheet by hand or rename the files to a parseable format.
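As a rough illustration of the naming rules, the following Python sketch parses filenames of the form sample_*_R1_*.fastq or sample_*_R1.fastq (optionally ending in .gz). The regular expression and function names are assumptions made for this example, not taken from createsamplesheet.pl. In particular, it assumes the sample name is everything before the first underscore, and it is slightly looser than the documented patterns because it also accepts sample_R1.fastq.

```python
import re

# Assumed pattern: <sample>_<anything>_R1|R2[_<anything>].fastq[.gz]
FASTQ_NAME = re.compile(
    r"^(?P<sample>[^_]+)_(?:.*_)?(?P<read>R[12])(?:_.*)?\.fastq(?:\.gz)?$"
)


def parse_fastq_name(filename):
    """Return (sample, read) for a parseable filename, else None."""
    m = FASTQ_NAME.match(filename)
    if m is None:
        return None
    return m.group("sample"), m.group("read")


def pair_files(filenames):
    """Group files by sample and report whether any sample has an R2 file."""
    samples = {}
    for name in sorted(filenames):
        parsed = parse_fastq_name(name)
        if parsed is None:
            continue  # unparseable names are skipped in this sketch
        sample, read = parsed
        samples.setdefault(sample, {})[read] = name
    paired = any("R2" in reads for reads in samples.values())
    return samples, paired
```

Files that return None here are the kind that would need to be renamed or listed in a hand-made samplesheet.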
Output
- Sample sheet file
- Named "samplesheet.txt" by default
Columns in the sample sheet file
- #sample
  - The first column in the file. Contains the sample name, as parsed from the fastq file name, with forbidden characters converted to "."
- fastqR1
  - Name of the R1 fastq file
- fastqR2
  - Name of the R2 fastq file, only present for paired-end datasets
- BarcodeSequence
  - Qiime-specific empty column
- LinkerPrimerSequence
  - Qiime-specific column, contains R1 primer sequence
- ReversePrimer
  - Qiime-specific column, contains R2 primer sequence, only present for paired-end datasets
- Group
  - The first two characters of the sample name (often sufficient to split the samples into appropriate experimental groups)
- Description
  - Contains the original sample name, before forbidden characters were converted to "."
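The column layout above can be illustrated with a small Python sketch that builds one row of the sample sheet. Several details are assumptions made for this example: the function name is hypothetical, the set of forbidden characters is taken to be anything other than letters, digits, and periods, and the Group column is derived from the cleaned sample name rather than the original one.

```python
import re


def make_row(sample, fastq_r1, fastq_r2=None, fwd_primer="", rev_primer=""):
    """Build one sample sheet row as a list of column values.

    Column order: #sample, fastqR1, [fastqR2], BarcodeSequence,
    LinkerPrimerSequence, [ReversePrimer], Group, Description.
    Bracketed columns are included only for paired-end data.
    """
    # Assumption: every character that is not alphanumeric or "." is forbidden.
    clean = re.sub(r"[^A-Za-z0-9.]", ".", sample)
    row = [clean, fastq_r1]
    if fastq_r2 is not None:
        row.append(fastq_r2)
    row += ["", fwd_primer]  # BarcodeSequence is left empty
    if fastq_r2 is not None:
        row.append(rev_primer)
    row += [clean[:2], sample]  # Group, then the unmodified name as Description
    return row
```

A single-end sample named ab-1 would produce a row whose first column is ab.1, whose Group is ab, and whose Description is ab-1.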
Running the program
Load necessary software modules:
$ module load riss_util
Run the script. You must specify the location of a folder containing fastq files to process using the "-f" option:
$ createsamplesheet.pl -f /path/to/fastq/folder
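When the script is called from a Python workflow, the command line can be assembled from the options listed in the Input section. The helper below is a hypothetical sketch: it only builds the argument list from the documented -f, -o, -z, and -v options and does not run anything.

```python
def build_command(folder, output=None, gzipped=False, verbose=False):
    """Return the createsamplesheet.pl argument list for the given settings."""
    cmd = ["createsamplesheet.pl", "-f", folder]
    if output is not None:
        cmd += ["-o", output]  # name of the output samplesheet file
    if gzipped:
        cmd.append("-z")  # fastq files are gzip compressed
    if verbose:
        cmd.append("-v")  # print more information while running
    return cmd
```

The resulting list could be passed to subprocess.run(cmd, check=True), assuming the riss_util module has already been loaded in the calling environment.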
Support
If you are having issues, please contact John Garbe at jgarbe@umn.edu