The SraTailor manual
Last update: 11-Sep-2014
Shinya Oki
Kyushu University
Video tutorials are available at the following URL:
http://www.devbio.med.kyushu-u.ac.jp/sra_tailor/
Contact: [email protected]
0. Introduction
A growing number of researchers have published studies that used high-throughput sequencing
technology, such as ChIP-seq. In principle, these authors are obligated to deposit their raw sequence
data into public databases. For example, GEO (Gene Expression Omnibus) at NCBI contains
thousands of raw sequence datasets in the form of SRAs (Sequence Read Archives); these files are
freely accessible, and allow the full datasets to be exploited by other researchers. However,
downloading and processing the raw data requires sophisticated skills with the CLI (Command-Line
Interface) and scripting languages.
To overcome this hurdle, we have developed SraTailor, a simple GUI (Graphical User Interface)
application. SraTailor converts the SRAs of published ChIP-seq experiments into BigWig files,
which can be graphically visualized on genome browsers. SraTailor requires only the accession
number of the GEO repository; once the user has provided this information, SraTailor automatically
processes the information in the SRAs into BigWig or other formats.
This manual for SraTailor consists of three main chapters, followed by FAQs, appendices and references.
Chapter 1 is an SraTailor tutorial recommended for all users. Chapter 2 is an advanced tutorial, including
how to obtain the GSM accession number (2.2) and how to produce BigWig files and other file types such
as BAM and peak-call files (2.3). Chapter 3 is for users who are skilled with the CLI or work with sequencers.
Contents
1. Getting Started
1.1. Add Genome
1.2. Make BigWig
1.3. Viewer (IGV)
1.4. Sequential submissions
2. Advanced Tutorial
2.1. Add Genome (mouse mm9 assembly)
2.2. Get the GSM number
2.2.1. Obtaining GSM numbers from published papers
2.2.2. Obtaining GSM numbers by keyword search
2.3. Make BigWig and other file formats
2.4. View BigWig and other file formats
3. Advanced Usage of SraTailor
3.1. Peak calling with input DNA data as control
3.2. Additional tools for mapping and peak-calling
3.3. Options for mapping and peak-calling
3.4. Files as input
4. FAQs
5. Appendices
5.1. Algorithmic flowchart of SraTailor
5.2. File composition of SraTailor.app
6. References
1. Getting Started (20 min)
This tutorial is written primarily for the Mac version of SraTailor. The algorithms and operation of
the Ubuntu/Windows versions of SraTailor are almost identical to those of the Mac version, except for
the graphics.
1.1. Add Genome (2–3 min)
SraTailor requires library files for the genome assemblies of interest. The library files are
used for mapping the reads, converting file formats, and so on. This section demonstrates how to
prepare library files, using the yeast genome (sacCer3) as an example. This genome was chosen
because it is relatively small and the process can be completed quickly.
(1) Launch SraTailor.
(2) Select “Add Genome” from the “menu” window and click the “Continue” button.
(3) Select “sacCer3 (S.cerevisiae)” from the “Add genome” list window and click the “OK” button.
An “Add Genome Error” message will appear if your Mac is not connected to the Internet or Energy Saver is turned on.
- “Unconnected to internet.” => Connect your Mac to the internet.
- “Inactivate computer sleep mode.” => See section 2.4 of the “Installation manual of Mac SraTailor”.
(4) It takes 1–2 minutes to prepare the sacCer3 libraries*. The process is shown in a Terminal
window, which must not be closed until the process is completed.
(*) It takes between a few minutes and a few hours to prepare a genome library, depending on the genome size: For
example, it takes 1–2 min for S. cerevisiae, 10 min for Drosophila and C. elegans, and 1–2 hr for mammals.
(5) You can close the Terminal window once the following messages are shown.
sacCer3 libraries have been prepared.
You can close this window.
Now you are ready to go to the next section (1.2).
1.2. Make BigWig (2-3 min)
You are now ready to make BigWig files with the yeast sacCer3 libraries. This section demonstrates
how to make BigWig files from GEO (Gene Expression Omnibus) accession number GSM461562,
an MNase-seq dataset from S. cerevisiae.
(1) Launch SraTailor.
(2) Select “Make BigWig” from the “menu” window and click the “Continue” button.
An error message will appear if your Mac is not connected to the Internet or Energy Saver is turned on.
- “Unconnected to internet.” => Connect your Mac to the internet.
- “Inactivate computer sleep mode.” => See section 2.4 of the “Installation manual of Mac SraTailor”.
(3) Enter information in the “Make BigWig” window as follows:
Input: GSM461562
Genome: sacCer3 (S.cerevisiae)
Filename: MNase
(or any arbitrary word)
Save to: Desktop
(4; only for Mac) Press the tab key or click the cartoon image at the bottom of the “Make BigWig”
window to commit the text you entered, and then click the “Continue” button. See Q1 in the FAQs
(chapter 4) for details about the cartoon image.
(5) When the “Final confirmation” window appears, click the “Run” button.
(6) It takes a few minutes to create the BigWig file. The process is shown in a Terminal window,
which must not be closed until the process is completed. If a host key confirmation prompt like
the following appears, type “y” and then press the “Enter” key.
The server's host key is not cached. You have no guarantee
:
Store key in cache? (y/n)
(7) You can close the Terminal window once the process is completed.
(8) The “MNase_GSM461562.bw” file can be found in the “MNase_GSM461562” folder on the Desktop.
1.3. Viewer (IGV) (2-3 min)
BigWig is the most suitable file format for visualizing the extent of mapped reads (so-called
coverage scores). The IGV genome browser, which was installed with the initial settings, allows
smooth browsing of BigWig files.
(1) Launch SraTailor.
(2) Select “Viewer (IGV)” from the “menu” window and click the “Continue” button.
(3) IGV is launched and an “IGV” window is opened.
(4) Select “More…” from the genome list at the top left of the IGV window.
(5) Select “S. cerevisiae (sacCer3)” from the “Genomes to add to list” window, and click “OK”.
(6) From the menu bar of IGV, select “File” and then “Load from File”.
[Figure: the IGV window, with the genome list, chromosome list, text field, Zoom out and Zoom in buttons, ruler, track name, and track labeled.]
(7) Open the BigWig file produced in the earlier step (1.2).
Desktop/MNase_GSM461562/MNase_GSM461562.bw
(8) View the BigWig file and arrange it as you like. Try the mouse operations suggested below.
- Click and drag the ruler and track area horizontally.
- Click Zoom buttons at the upper right of the window.
- Set to “Autoscale” by control- or right-clicking the track name (MNase_GSM461562.bw).
- Enter a gene name of interest in the text field (e.g., YML032C).
See “Help” in the IGV menu bar to learn more about navigation.
1.4. Sequential submissions (~ 10 min)
If you want SraTailor to produce multiple BigWig files from multiple GSM numbers, you can
submit the GSM numbers sequentially to the “Make BigWig” window. In this tutorial, three BigWig
files will be produced by sequentially submitting three GSMs from yeast MNase-seq data.
(1) Launch SraTailor.
(2) Select “Make BigWig” from the “menu” window, and click the “Continue” button.
(3) Enter parameters as shown in (A) in the table below, and click the “Run” button.
(4) Repeat (1)–(3) with parameters (B).
(5) Repeat (1)–(3) with parameters (C).
Now you will see three Terminal windows: one is processing (A), and the other two (B and C) are
paused with the following message.
Waiting for previous Make BigWig tasks to complete...
After a few minutes, the first submission (A) will be completed, and the processing will be
resumed automatically for the second submission (B), and finally for the third (C).
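This manual does not describe how SraTailor queues submissions internally. As a rough illustration only, the sketch below shows one common way for a shell script to make later jobs wait for an earlier one: a lock directory created with mkdir, which either succeeds or fails atomically. The lock path and the function names acquire_lock, release_lock and run_job are hypothetical.

```sh
# Hypothetical sketch: serialize "Make BigWig"-style jobs with a lock directory.
# mkdir is atomic, so only one job at a time can create the lock and proceed.

# Hypothetical lock location; override by setting LOCK_DIR before calling.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/make_bigwig.lock}"

acquire_lock() {
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        # Another job holds the lock: report once, then poll until it is free.
        echo "Waiting for previous Make BigWig tasks to complete..." >&2
        until mkdir "$LOCK_DIR" 2>/dev/null; do
            sleep 5
        done
    fi
}

release_lock() {
    rmdir "$LOCK_DIR"
}

run_job() {
    acquire_lock
    # If the job is interrupted, free the lock so queued jobs can continue.
    trap 'release_lock; exit 1' INT TERM
    echo "processing $1"
    # ... download, mapping and conversion steps would run here ...
    trap - INT TERM
    release_lock
}
```

Each submission would call run_job with its own accession; a second call made while the first is still running blocks in acquire_lock until the first job removes the lock.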
2. Advanced Tutorial (2–3 hrs)
This chapter demonstrates how to visualize ChIP-seq data, using a dataset for Pou5f1 (also
known as Oct3/4) from mouse embryonic stem cells. In this tutorial, you will learn about GEO
accession numbers, the file formats produced by SraTailor, and how to make and browse these files.
2.1. Add Genome (mouse mm9 assembly) (1–2 hrs)
Following section 1.1, prepare the mouse mm9 genome libraries. Because this operation takes 1–2
hours to complete, take a long coffee break.
2.2. Get the GSM number
The GSM number is required for SraTailor to produce BigWig and other format files. GSM
numbers can be obtained in two ways: from published papers (2.2.1) and by keyword search (2.2.2).
If you want to submit accession numbers not beginning with GSM (e.g., SRX, DRX, ERX), see Q10
in the FAQs (chapter 4).
2.2.1. Obtaining GSM numbers from published papers
(1) Download the paper (Ang YS et al., 2011, Cell 145(2):183-97) from the link below.
http://www.ncbi.nlm.nih.gov/pubmed/21477851
(2) You will find “GSE22934” in the ACCESSION NUMBERS section on p. 196.
(3) Open the GEO DataSets web site from the link below, enter “GSE22934” in the text field, and click “Search”.
http://www.ncbi.nlm.nih.gov/gds/
(4) Click the top-hit link as shown below.
Wdr5 mediates self-renewal and reprogramming via the embryonic stem cell core transcriptional network
(5) A GSE record is a series of biologically comparable samples processed on the same platform,
and the data of each sample are deposited as a GSM record. All the GSM records in a GSE are shown
by clicking “More…” under “Samples” at the middle left of the page. In this tutorial, “GSM566277
Oct4_ChIP-seq” is our target, so you can enter “GSM566277” as the input for the “Make BigWig”
operation.
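If you copy the sample list of a GSE record into a text file, a short filter such as the following can pull out the GSM numbers, for example to submit them one after another as in 1.4. The helper name extract_gsm is hypothetical and not part of SraTailor.

```sh
# Hypothetical helper: print each distinct GSM accession found on standard input.
extract_gsm() {
    # grep -o prints only the matching part of each line;
    # sort -u then removes duplicates (and sorts the result).
    grep -o 'GSM[0-9][0-9]*' | sort -u
}
```

Usage: extract_gsm < samples.txt, where samples.txt holds the text copied from the GEO page.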
2.2.2. Obtaining GSM numbers by keyword search
(1) Open the GEO DataSets web site from the link below.
http://www.ncbi.nlm.nih.gov/gds/
(2) You can perform text searches for keywords of interest. For example, enter the following words:
ChIP-seq Oct4 Wdr5
(3) Click the top-hit link as shown below.
Wdr5 mediates self-renewal and reprogramming via the embryonic stem cell core transcriptional network
(4) You can find “GSM566277” as described in 2.2.1 (5).
2.3. Make BigWig and other file formats (20–30 min)
SraTailor is able to produce not only BigWig files, but also files of other formats. A description
of these formats is provided later (2.4), but we can go ahead and make some now.
(1) Launch SraTailor.
(2) Select “Make BigWig” from the “menu” window and click the “Continue” button.
(3) In the “Make BigWig” window, enter text and check options as shown below.
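Input: GSM566277
Genome: mm9 (Mouse)
Filename: Pou5f1_mESC
Save to: Desktop
[Screenshots: the “Make BigWig” window on Mac and on Ubuntu, with the optional output formats checked.]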
(4; only for Mac) Press the tab key or click the cartoon image at the bottom of the “Make BigWig”
window to commit the text you entered before clicking the “Continue” button. See Q1 in the
FAQs (chapter 4) for details about the cartoon image.
(5) When the “Final confirmation” window appears, click the “Run” button.
(6) A dialog will ask you to install FastQC (see 2.4 (8) for details). Allow the installation, and
accept the license agreement.
(7) Wait about 10–30 minutes. When the process is completed, you can close the Terminal window.
(8) You can find the “Pou5f1_mESC_GSM566277” folder on the Desktop.
2.4. View BigWig and other file formats
(1) Open the IGV window according to the steps in 1.3 [(1)–(3)].
(2) Select “More…” from the genome list at the top left of the IGV window.
(3) Select “Mouse mm9” from the “Genomes to add to list” window, and click “OK”.
(4) From the IGV menu bar, select “File” and then “Load from File”.
(5) Open the following files, located in the “Pou5f1_mESC_GSM566277” folder on the Desktop.
Pou5f1_mESC_GSM566277_peaks.bb
Pou5f1_mESC_GSM566277_summits.bb
Pou5f1_mESC_GSM566277.bam
Pou5f1_mESC_GSM566277.bw
(6) Enter the gene name “Pou5f1” in the text field of the IGV window.
(7) After arranging the view, you can see that transcription factor Pou5f1 binds to its own upstream
enhancer, as shown below.
(8) The files now opened in IGV are all binary files, shown in green in the table. The binary files
are stored in a compressed binary form rather than as human-readable text, and are made from the
text files shown in the left column (pink). The contents of the text files can be viewed with text
editors (TextEdit.app on Mac; gedit on Ubuntu), whereas binary files cannot. The text files can also
be visualized with IGV, but viewing them is extremely slow, whereas binary files are much more
suitable for smooth browsing. Opening the text files with a text editor and viewing the binary files
with IGV will help you understand the differences between these file formats. To learn more, see the
UCSC Genome Bioinformatics site (https://genome.ucsc.edu/FAQ/FAQformat.html).
In addition, the Pou5f1_mESC_GSM566277_fastqc.html file is provided if you check the Fastq/QC
checkbox. Double-click it to see a summary of the read quality (the so-called quality check).
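The naming pattern of the files above can be summarized as a mapping from each text file to its binary counterpart. The sketch below assumes the usual pairings (SAM to BAM, bedGraph to BigWig, BED to BigBed); the function name is hypothetical, and the converters named in the comments are common choices, not a description of SraTailor's own scripts.

```sh
# Hypothetical helper: print the binary file name that corresponds to a text file.
# Typical converters (not necessarily those used by SraTailor):
#   SAM      -> BAM    : samtools view -b
#   bedGraph -> BigWig : bedGraphToBigWig (UCSC utilities)
#   BED      -> BigBed : bedToBigBed      (UCSC utilities)
binary_counterpart() {
    case "$1" in
        *.sam)      echo "${1%.sam}.bam" ;;
        *.bedGraph) echo "${1%.bedGraph}.bw" ;;
        *.bed)      echo "${1%.bed}.bb" ;;
        *)          echo "no binary counterpart known for: $1" >&2
                    return 1 ;;
    esac
}
```

For example, binary_counterpart Pou5f1_mESC_GSM566277_peaks.bed prints Pou5f1_mESC_GSM566277_peaks.bb.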
3. Advanced Usage of SraTailor
This chapter is recommended for users who are skilled in processing sequence data using CLI
tools such as bowtie2 and MACS, or who have worked with high-throughput sequencers.
3.1. Peak calling with input DNA data as control (30–60 min)
MACS, one of the most widely used peak-calling algorithms, is installed at the initial launch of
SraTailor and is executed for peak calling when users check “Bed” and “BigBed” in “Make BigWig”
window (2.3). The precision of peak detection is increased if the experimental data are processed
along with control sequence data such as genomic DNA not subjected to ChIP or subjected to ChIP
with a control antibody. In this tutorial, you will obtain the BED file for the Pou5f1 ChIP-seq from
mESCs by processing the data along with a pre-made control BAM file.
(1) Open “GSE22934” at the GEO web site as shown in 2.2.1 (5).
(2) Find “GSM566281 Input DNA” as the control sample for “GSM566277 Oct4_ChIP-seq”.
(3) Launch SraTailor.
(4) Select “Make BigWig” from the “menu” window and click the “Continue” button.
(5) Fill in the fields as shown below; check “BAM” in the “Optional Settings” field, and click “Run”.
Input: GSM566281
Genome: mm9 (Mouse)
Filename: Input
(or an arbitrary word)
Save to: Desktop
(6) Input_GSM566281.bam will be generated in the “Input_GSM566281” folder on the Desktop.
(7) Launch SraTailor again.
(8) Select “Make BigWig” from the “menu” window and click the “Continue” button.
(9) Fill in the fields as shown below, and check “BED” and “BigBed” in the “Optional Settings” field.
Input: GSM566277
Genome: mm9 (Mouse)
Filename: Pou_vs_Input
(or an arbitrary word)
Save to: Desktop
(10) Check “Select control BAM” (only for Mac) and select the Input_GSM566281.bam generated earlier.
(11) .bed and .bb files are generated in the “Pou_vs_Input_GSM566277” folder on the Desktop. In
IGV, compare these peak-call data with the results obtained without a control
(Pou5f1_mESC_GSM566277_peaks.bb, produced in 2.3).
3.2. Additional tools for mapping and peak-calling (10 min)
By default, mapping and peak calling are performed with Bowtie2 and MACS, which can
optionally be replaced with Bowtie or BWA and with MACS2, respectively.
(1) Fill in the fields as shown below, and check “BED” or “BigBed” in the “Optional Settings” field.
Input: GSM461562
Genome: sacCer3 (S.cerevisiae)
Filename: test
(or an arbitrary word)
Save to: Desktop
(2; only for Mac) Check “Options for mapping and peak-calling”.
(3) Select “Bowtie” and “MACS2”, and run.
[Screenshots: the option settings on Mac and on Ubuntu.]
(4) Bowtie and MACS2 are additionally installed only when these tools are selected for the first
time. Index files for Bowtie are prepared as well.
3.3. Options for mapping and peak-calling
SraTailor executes Bowtie2 and MACS for mapping and peak calling, respectively. Both tools
can be executed with various optional settings in CLI operation. For example, the Bowtie2 -N 1
option allows mapping with at most one mismatch per seed, and the MACS -p 1e-10 option sets the
P-value cutoff for peak detection to 1E-10. Such options are also available in SraTailor in the same
manner as in the CLI. For example:
(A) Bowtie2 with the --very-sensitive -k2 options.
(B) BWA in BWA-MEM mode with the -k15 -a options.
(C) MACS2 in callpeak mode with the --broad -q 1e-10 options.
$ bowtie2 -p INT -t [Options] -x index -q in.fq -S out.sam
$ bowtie -p INT -t --sam [Options] index -q in.fq -S out.sam
$ bwa [Options*] -t INT index in.fq > out.sai    (aln)
$ bwa [Options*] -t INT index in.fq > out.sam    (bwasw or mem)
$ macs14 -t in.bam [-c ctrl.bam ] -f BAM -g GSIZE -n out [Options]
$ macs2 callpeak -t in.bam [-c ctrl.bam ] -f BAM -g GSIZE -n out [Options]
The commands scripted in SraTailor are shown above. Users’ optional settings are inserted at
[Options]. The following are variables: INT, maximum number of threads; index, path to the indexes
for mapping; GSIZE, genome size parameter [mm, hs, dm, or ce]; the others are input or output file
names. [Options*] must begin with aln, bwasw or mem for BWA-backtrack, BWA-SW or
BWA-MEM, respectively. Because MACS2 is preset with the callpeak sub-command in SraTailor,
users’ additional optional settings are limited to those of the callpeak mode.
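To make the [Options] placeholder concrete, the sketch below assembles (but does not run) a MACS2 command line following the template above, with an optional control BAM as in 3.1. The function name macs2_cmd and its argument order are hypothetical; only the macs2 flags that appear in the template are used.

```sh
# Hypothetical sketch: print a MACS2 command built from the template above.
# Arguments: treatment BAM, control BAM ("" for none), GSIZE, output name,
# then any number of user options, appended verbatim as [Options].
macs2_cmd() {
    in_bam=$1; ctrl_bam=$2; gsize=$3; out=$4
    shift 4
    cmd="macs2 callpeak -t $in_bam"
    # The control BAM is optional (see 3.1).
    if [ -n "$ctrl_bam" ]; then
        cmd="$cmd -c $ctrl_bam"
    fi
    cmd="$cmd -f BAM -g $gsize -n $out"
    if [ "$#" -gt 0 ]; then
        cmd="$cmd $*"
    fi
    echo "$cmd"
}
```

For example, macs2_cmd Pou.bam Input.bam mm Pou_vs_Input --broad -q 1e-10 prints the command for setting (C) with a control. Building a string is only for display; a script that actually runs MACS2 should pass the arguments to macs2 directly rather than through eval.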
The main features of the mapping tools are as follows:
Key options of Bowtie2
Mapping time and sensitivity decrease in the following order:
--very-sensitive > --sensitive (default) > --fast > --very-fast.
More sensitive mapping: -N 1 (allows at most one mismatch per seed)
Reporting multiple alignments: -a or -k <INT> (reports all or up to <INT> alignments per read)
Trimming the reads: -5 or -3 <INT> (trims <INT> bases from the 5’ or 3’ end of each read)
Local alignment mode: --local (allows mismatched bases at one or both ends of a read to be soft-clipped)
MACS and MACS2 are used for peak calling, and the latter accepts more advanced optional
settings. By default, both tools are designed for ChIP-seq data of transcription factors (i.e.,
narrow peaks) and return two types of BED-formatted files describing peak regions and summit
positions. --broad is a MACS2-specific option for broad peak calling (e.g., for data obtained from
ChIP for modified histones or RNA polymerase II), and provides two BED files in broadPeak and
gappedPeak format. Other key options are -p and -q (MACS2 only) to set the statistical
threshold, and -m to set the range of enrichment ratios against background used to select the
regions for building the peak model.
3.4. Files as input
In addition to GSM numbers, SraTailor also accepts Fastq files as input. If you have performed
ChIP-seq experiments yourself, you can therefore process your own Fastq files in SraTailor,
resulting in peak calling and conversion into BigWig files. In the same way, SAM- and
BAM-formatted files are also accepted as input, allowing the mapping step to be skipped before
calculation of the coverage and peak-call data. Note that input files must have the extension
.fq (Fastq), .sam (SAM) or .bam (BAM).
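The extension rule above amounts to a dispatch on the file name. The sketch below is a hypothetical illustration of that decision, not SraTailor's own code; the function name and step names are made up for clarity.

```sh
# Hypothetical sketch: decide where processing starts, based on the extension.
first_step() {
    case "$1" in
        *.fq)  echo "map" ;;         # Fastq: reads must be mapped first
        *.sam) echo "sam-to-bam" ;;  # SAM: already mapped, convert to BAM
        *.bam) echo "coverage" ;;    # BAM: go straight to coverage and peak calling
        *)     echo "unsupported input (use .fq, .sam or .bam): $1" >&2
               return 1 ;;
    esac
}
```

Under this check a file named reads.fastq is rejected, in line with the requirement above that Fastq files end in .fq.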
For practice, you can process the Fastq file produced in 2.3.
(1) Open the “Pou5f1_mESC_GSM566277” folder on the Desktop, and find the Fastq file produced at
2.3 (Pou5f1_mESC_GSM566277.fq).
(2) Launch SraTailor.
(3) Select “Make BigWig” from the “menu” window and click the “Continue” button.
(4) Make sure that the “Input” text field is empty and a text cursor is in the field.
(5; for Mac) Drag the Pou5f1_mESC_GSM566277.fq icon into the text field, and release your mouse.
(5; for Ubuntu) Right-click Pou5f1_mESC_GSM566277.fq and select Copy. Then right-click the “Input”
text field and paste it there.
(6) The text field will be filled with the file path. After you run the process, the Fastq file is
converted to BigWig and other formats.
4. FAQs
Q1. In the “Make BigWig” window, text in the “Input” and “Filename” fields is not entered correctly.
A1. Pressing the tab key or moving the cursor elsewhere commits the text input. Alternatively, click the
cartoon image at the bottom of the window; this is equivalent to pressing the tab key. Make sure
that your text does not include spaces or symbols such as * - { ( “ (underscores are permitted).
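The rule in A1 can be checked before submission. The sketch below takes the conservative reading that a Filename may contain only ASCII letters, digits and underscores; SraTailor itself may accept more characters, the function name is hypothetical, and the check is not meant for the Input field when it holds a file path (3.4).

```sh
# Hypothetical check: succeed only if the name is non-empty and consists of
# ASCII letters, digits and underscores.
is_valid_name() {
    case "$1" in
        "" | *[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

For example, is_valid_name Pou5f1_mESC succeeds, while is_valid_name "Pou5f1-mESC" fails because of the hyphen.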
Q2. Why is Xcode needed?
A2. In the initial settings of SraTailor (1.3), bedtools and samtools are downloaded and installed
with the make command, which is provided by Xcode. Because Xcode is needed only for this step,
you may uninstall it after the initial setup of SraTailor.
Q3. I closed the Terminal window during the “Initial settings”, “Add Genome”, or “Make BigWig”
processes. Will my computer be affected?
A3. No. Although some intermediate files are left on your hard disk, they will be removed when you
retry the processing.
Q4. The “Initial settings”, “Add Genome”, or “Make BigWig” process does not end.
A4. This may be due to a network problem. In the Terminal.app window, look for the following
progress display.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
If “Current Speed” is “0”, no data are being transferred over the network. Check whether your
computer is correctly connected to the internet. If the condition does not improve, consult the
administrator of the network at your institute.
Q5. Which genome assembly should I select for each species?
A5. mm10, for example, is the latest version of the mouse genome assembly, but the earlier mm9
assembly is often preferred in the genomics field because more knowledge and data have
accumulated for it. Which is objectively “better” is beyond our scope, but you can find out which
is preferred by examining the number of records returned by GEO keyword searches for the names
of genome assemblies (e.g., mm9 or mm10).
Q6. What is indicated by the y-axis values shown in BigWig tracks in the IGV browser?
A6. The y-axis indicates the coverage score scaled by the total number of mapped reads. More
precisely, the number of reads mapped at a specific location is divided by the total number of
mapped reads, in millions. Such values are called RPM (Reads Per Million mapped reads) values,
and are useful for comparing coverage scores between tracks. For example:
[Figure: example BigWig tracks in IGV]
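The RPM calculation in A6 is a single division. The sketch below (function name hypothetical) computes it with awk and shows why RPM makes tracks comparable: 50 reads in a library of 25 million mapped reads and 20 reads in a library of 10 million mapped reads both give 2 RPM.

```sh
# RPM = reads at a position / (total mapped reads / 1,000,000)
rpm() {
    awk -v n="$1" -v total="$2" 'BEGIN { printf "%.2f\n", n / (total / 1000000) }'
}
```

Here rpm 50 25000000 and rpm 20 10000000 both print 2.00.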
Q7. What do the values shown in BigBed tracks in IGV indicate?
A7. The higher the value, the more statistically significant the region is. More precisely, the values
are calculated by the MACS peak caller as -10 * log10(P-value).
For example:
BED value = 256
==> MACS P-value = 10^(-256/10) = 10^-25.6
In MACS2, the values are calculated with Q-values instead of P-values.
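The conversion in A7 can be applied in either direction. A minimal awk-based sketch (function names hypothetical); for MACS2 tracks, read the result as a Q-value instead of a P-value.

```sh
# Convert between a MACS BED score and the P-value it encodes:
#   score = -10 * log10(P)   <=>   P = 10^(-score / 10)
score_to_p() {
    awk -v s="$1" 'BEGIN { printf "%.3g\n", 10 ^ (-s / 10) }'
}

p_to_score() {
    # awk's log() is the natural logarithm, so log10(p) = log(p) / log(10).
    awk -v p="$1" 'BEGIN { printf "%.1f\n", -10 * log(p) / log(10) }'
}
```

For the example above, score_to_p 256 prints 2.51e-26 (that is, 10^-25.6), and p_to_score 1e-10 prints 100.0.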
Q8. Why is peak calling restricted to human, mouse, C. elegans, and yeast genomes?
A8. The MACS peak caller requires a genome size value for each species, and this has only been
calculated for the species listed above. The author of this manual does not know the genome size
values of the species other than those mentioned. Please let us know these values if you learn them.
Q9. Does SraTailor accept data types other than ChIP-seq?
A9. SraTailor accepts data types in which Fastq reads are directly mapped onto a reference genome,
e.g., DNA-seq, MNase-seq, and FAIRE-seq. RNA-seq is also acceptable, but reads spanning splice
junctions are not mapped. Hi-C and Methyl-seq are not acceptable because their Fastq files require
intermediate processing prior to mapping.
Q10. Which accession numbers not beginning with GSM are accepted?
A10. SRAs are most frequently deposited in GEO, but some are in other repositories (NCBI, DDBJ,
and ENA). The table below shows accession numbers equivalent to GSE and GSM of GEO.
SraTailor accepts accession numbers beginning with the letters shown in pink.
For example, download the following paper in PDF format.
Waki H et al, 2011 PLoS Genet Oct;7(10):e1002311 (http://www.ncbi.nlm.nih.gov/pubmed/22028663)
You will find “DDBJ accession number: DRA000378” on p. 14. Search for DRA000378 with
Google, and open the link in the top hit (DRA Search). On the right of the page, you will find
accession numbers beginning with DRX, which can be accepted by SraTailor.
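The sketch below recognizes only the prefixes named in this manual (GSM, SRX, DRX, ERX); SraTailor may accept others listed in the table, and the function name and labels are illustrative. It also shows why DRA000378 itself is not entered: DRA accessions identify DDBJ submissions, whereas the DRX numbers on its page identify experiments.

```sh
# Illustrative sketch: classify an accession by its prefix.
# Only prefixes named in this manual are recognized.
accession_source() {
    case "$1" in
        GSM[0-9]*) echo "GEO sample" ;;
        SRX[0-9]*) echo "NCBI SRA experiment" ;;
        DRX[0-9]*) echo "DDBJ experiment" ;;
        ERX[0-9]*) echo "ENA experiment" ;;
        *)         echo "not recognized"
                   return 1 ;;
    esac
}
```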
Q11. How can I customize the “Make BigWig” pipeline?
A11. The “Make BigWig” process is programmed in run.sh in the sh directory (see Appendices). You
can replace the tools and algorithms with your customized ones by rewriting the shell scripts in a
text editor.
Q12. Can I remove tools installed by SraTailor if I have already installed them myself?
A12. Yes, but you first need to rewrite the “Make BigWig” process (run.sh in the sh directory) to replace
the paths of the tools with those you want to use. After this operation, you can remove the unnecessary
tools installed by SraTailor (in the bin directory; see Appendices).
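One way to point a rewritten run.sh at your own installations is to look each tool up on your PATH first and fall back to the bundled copy. The sketch below is hypothetical: the function name, the BOWTIE2 variable and the bundled path in the usage line are assumptions, not names taken from run.sh.

```sh
# Hypothetical sketch: print the path of a tool, preferring one on the PATH
# and falling back to a bundled bin directory.
resolve_tool() {
    name=$1
    bundled_bin=$2
    if command -v "$name" >/dev/null 2>&1; then
        command -v "$name"
    elif [ -x "$bundled_bin/$name" ]; then
        echo "$bundled_bin/$name"
    else
        echo "$name not found on PATH or in $bundled_bin" >&2
        return 1
    fi
}

# Hypothetical usage inside a customized run.sh:
# BOWTIE2="$(resolve_tool bowtie2 "$HOME/SraTailor/bin")" || exit 1
```

Once every tool resolves to your own installation, the copies in the bin directory are no longer used.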
Q13. How do I update SraTailor?
A13. Open Terminal.app and enter the following command.
curl http://devbio.med.kyushu-u.ac.jp/SraTailor/update.sh | sh
Q14. How do I uninstall SraTailor?
A14. (Mac) Open the “Applications” folder, and move SraTailor.app to “Trash”.
(Ubuntu) Move the SraTailor folder in the home directory to “Trash”.
5. Appendices
5.1. Algorithmic flowchart of SraTailor
“Make BigWig” (left) and “Add Genome” (right).
5.2. File composition of SraTailor.app
6. References
[Aspera Connect]
http://downloads.asperasoft.com/connect2/
[SRA Toolkit]
http://www.ncbi.nlm.nih.gov/books/NBK56560/
[Bowtie 2]
Langmead, B. and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat.
Methods 9, 357–359.
http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
[Bowtie]
Langmead, B., Trapnell, C., Pop, M. and Salzberg, S.L. (2009). Ultrafast and memory-efficient
alignment of short DNA sequences to the human genome. Genome Biol. 10, R25.
http://bowtie-bio.sourceforge.net/index.shtml
[BWA]
Li, H. and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler
transform. Bioinformatics 25, 1754–1760.
Li, H. and Durbin, R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler
transform. Bioinformatics 26, 589–595.
http://bio-bwa.sourceforge.net
[FastQC]
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
[SAMtools]
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G.,
Durbin, R., and 1000 Genome Project Data Processing Subgroup. (2009). The Sequence
alignment/map (SAM) format and SAMtools. Bioinformatics 25, 2078-9.
http://samtools.sourceforge.net
[BEDTools]
Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing
genomic features. Bioinformatics 26, 841-2.
http://bedtools.readthedocs.org/en/latest/
[MACS]
Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C.,
Myers, R.M., Brown, M., Li, W., and Liu, X.S. (2008). Model-based analysis of ChIP-Seq
(MACS). Genome Biol. 9, R137.
http://liulab.dfci.harvard.edu/MACS/
[MACS2]
https://github.com/taoliu/MACS/
[IGV]
Robinson, J.T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and
Mesirov, J.P. (2011). Integrative genomics viewer. Nat. Biotechnol. 29, 24-6.
http://www.broadinstitute.org/igv/
[UCSC Utilities]
http://hgdownload.cse.ucsc.edu/admin/exe/
[UCSC Sequence and Annotation Downloads]
http://hgdownload.soe.ucsc.edu/downloads.html