Tools for Maximising the Value of Genomic Data

WEHI Postgraduate Seminar Series 2003
Tools for Maximising the Value
of Genomic Data
Keith Satterley, Bioinformatics,
The Walter & Eliza Hall Institute of Medical Research
2 June 2003
http://bioinf.wehi.edu.au/resources/presentations.html
Overview
1. Genomic data – what is it, where is it
2. Gene Finding
   1. GenScan
3. Comparative Genomics
   1. Gene Finding
      1. Slam
      2. Twinscan
   2. Finding Regulatory Regions
      1. rVista
      2. Consite
      3. Toucan
4. Programming Tools
   1. Languages
      1. Perl
      2. BioPerl
      3. BioJava
      4. Bio???
   2. Slipper - a Perl program & results
5. Link References
6. Acknowledgements
1953
2003
http://www.geneticscongress2003.com/index.php
• Genomic data
– Whole genome data sets. According to http://www.ebi.ac.uk/genomes/ as at 28-May-03:
• Archaea – 16
• Bacteria – 107
• Organelles – 308
• Phages – 112
• Plasmids – 280
• Viroids – 40
• Viruses – 880
• TOTAL: 1743
Eukaryota (completed chromosomes)
http://www.ebi.ac.uk/genomes/
Each entry gives: Description: Chromosomes; Proteins
• Anopheles gambiae (Ensembl project data): 2L 2R 3L 3R X
• MUSTARD: Arabidopsis thaliana complete genome: I II III IV V; Proteome pages
• WORM: Caenorhabditis elegans complete genome: I II III IV V X; FastA, Proteome pages
• FLY: Drosophila melanogaster complete genome: X, 2-4, Y; FastA, Proteome pages
• Encephalitozoon cuniculi complete genome: I II III IV V VI VII VIII IX X XI
• HUMAN: Homo sapiens complete genome (source MIPS; Ensembl project data): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y; Proteome pages
• Homo sapiens complete genome parts (CON files): 14 21q
• Leishmania major: 1
• MOUSE: Mus musculus complete genome (Ensembl project data): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X
• Oryza sativa: 1
• Plasmodium falciparum: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
• RAT: Rattus norvegicus complete genome (Ensembl project data): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 X
• YEAST: Saccharomyces cerevisiae strain S288C complete genome: I-XVI; FastA, Proteome pages
• YEAST: Schizosaccharomyces pombe strain 972h- complete genome: I II III; Proteome pages
• Trypanosoma brucei: 1
http://gnn.tigr.org/sequenced_genomes/genome_guide_p1.shtml
gnn.tigr.org
GOLD – Genomes Online Database
http://www.genomesonline.org/
It’s a Fact:
Counting at 1 base per second, 24 hours a day, it would take you about 95 years to count the DNA in one cell.
Francis Collins
Director, National Human Genome Research Institute
25th. April 2003
“Here in the very month of the 50th anniversary of the
discovery of DNA’s double helix, I am pleased and honored
— perhaps I should say exhilarated — to declare the goals
of the Human Genome Project to be completed.”
3.1 million years to count to 100 trillion!
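The counting figures on this slide can be checked with a few lines of arithmetic. The sketch below is in Python and assumes about 3 billion bases for one copy of the human genome (an assumption, chosen because it reproduces the 95-year figure) and a 365.25-day year. Counting to 100 trillion at the same rate comes out a little over 3.1 million years.

```python
# Seconds in a 365.25-day year (assumption: continuous counting, no breaks).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def years_to_count(items, per_second=1.0):
    """Years needed to count `items` things at `per_second` per second."""
    return items / per_second / SECONDS_PER_YEAR

# Assumption: roughly 3 billion bases in one copy of the human genome.
GENOME_BASES = 3_000_000_000
HUNDRED_TRILLION = 100 * 10**12

genome_years = years_to_count(GENOME_BASES)
trillion_years = years_to_count(HUNDRED_TRILLION)

print(f"Counting one genome copy: {genome_years:.1f} years")
print(f"Counting to 100 trillion: {trillion_years / 1e6:.2f} million years")
```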
“..the information that will matter to you about your life is a fraction of your genetic
code — probably less than 1 percent.” J.Craig Venter, 25-04-2003(Bio-IT World)
http://www.genomesonline.org/
Most Recent Genomics News
BETHESDA, Md., May 20, 2003
By June, researchers from the Whitehead/MIT Center and the
Genome Sequencing Center at Washington University School
of Medicine expect to complete the sequencing work
(approximately four-fold coverage) necessary to create an
initial working draft of the genome of the chimpanzee (Pan
troglodytes).
The Whitehead/MIT team expects to complete a high-quality
draft of the dog genome sequence within the next 12 months.
After the genome of the boxer is sequenced, researchers plan
to sample and analyze DNA from 10 to 20 other dog breeds,
including the beagle, to study genetic variation within the
canine species.
http://www.genome.gov/11007358
Gene Finding
• Gene finding is about detecting coding regions and inferring gene structure.
• Gene finding is difficult:
– DNA sequence signals have low information content (degenerate and highly unspecific).
– It is difficult to discriminate real signals from spurious ones.
– Sequencing errors.
• Prokaryotes: high gene density and simple gene structure; short genes carry little information; genes can overlap.
• Eukaryotes: low gene density and complex gene structure; alternative splicing; pseudogenes.
Gene Finding
A good gene finding review has been prepared by Lorenzo Cerutti of the Swiss Institute of Bioinformatics. It is an EMBnet course (September 2002) entitled “Gene Finding”.
It is at:
http://www.ch.embnet.org/CoursEMBnet/Pages02/slides/gene_finding.pdf
Gene Finders
• GenScan - Uses generalized hidden Markov models to predict complete gene structure.
  http://genes.mit.edu/GENSCAN.html
• MZEF - Designed to predict only internal coding exons.
  http://www.cshl.org/genefinder
• FGENES - Uses linear discriminant analysis.
  http://genomic.sanger.ac.uk/gf/gf.shtml
• GeneFinder
  http://www.cshl.org/genefinder
• GRAIL 1, 1a, 2
  http://compbio.ornl.gov
• HMMgene - Designed to predict complete gene structure.
  http://genome.cbs.dtu.dk/services/HMMgene
• Genewise - Uses HMMs. Genewise is part of the Wise2 package.
  http://www.sanger.ac.uk/Software/Wise2
• Procrustes - Predicts gene structure from homology found in proteins.
  http://hto-13.usc.edu/software/procrustes/index.html
• GeneMark.hmm - Recently modified to predict gene structure in eukaryotes.
  http://opal.biology.gatech.edu/GeneMark
• Geneid - Recently updated to a new and faster version.
  http://www1.imim.es/geneid.html
Gene Finders
1. Overall performance is best for HMMgene and GENSCAN.
2. The accuracy of some programs depends on the G+C content. HMMgene and GENSCAN are the exceptions, because they use different parameter sets for different G+C contents.
3. For almost all the tested programs, "medium" exons (70-200 nucleotides long) are most accurately predicted. Accuracy decreases for shorter and longer exons, except for HMMgene.
4. Internal exons are much more likely to be correctly predicted than initial or terminal exons (start/stop codon detection is weak).
5. Initial and terminal exons are most likely to be missed completely.
6. Only HMMgene and GENSCAN have reliable scores for exon prediction.
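Point 2 depends on knowing the G+C content of the sequence being analysed. The Python sketch below shows one way to compute it and to pick a parameter class from it. The class boundaries (43%, 51%, 57%) are illustrative assumptions and are not taken from these slides, and the function names are hypothetical.

```python
from bisect import bisect_right

def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # nothing but gaps or ambiguity codes
    return (seq.count("G") + seq.count("C")) / acgt

def gc_class(seq, boundaries=(0.43, 0.51, 0.57)):
    """Index of the G+C class for seq (0 = lowest G+C).

    The boundaries are illustrative assumptions; a real gene finder
    would ship its own classes and one parameter set per class.
    """
    return bisect_right(boundaries, gc_content(seq))

example = "ATGCGCGGCTAAATTTGCGC"
print(f"G+C = {gc_content(example):.2f}, class {gc_class(example)}")
```

A gene finder that works this way would load the parameter set for the returned class before scoring exons.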
Gene prediction limits
1. Existing predictors are for protein coding regions
2. Non-coding areas are not detected (5’ and 3’ UTR)
3. Non-coding RNA genes are missed
4. Predictions are for ”typical” genes
5. Partial genes are often missed
6. Training sets may be biased
7. Atypical genes use other grammars
GenScan
• GENSCAN was developed by Chris Burge and
Samuel Karlin, Department of Mathematics, Stanford University
• Genscan is a general probabilistic model of the
gene structure of human genomic sequences.
• Genscan identifies complete exon/intron structures
of genes in both strands of genomic DNA.
• The new Genscan Web Server is at
http://genes.mit.edu/GENSCAN.html
• Genscan is also available for WEHI people at
http://www.wehi.edu.au/resources/PBC/index.html
with a greater choice of options.
Burge, C. and Karlin, S. (1997) Prediction of Complete Gene Structures in Human Genomic DNA. J. Mol. Biol. 268, 78-94
Comparative Genomics
Quotes from the 50/50 series of
interviews by Bio-IT World
Gene Myers
Professor, Dept. of Electrical Engineering & Computer Sciences
University of California, Berkeley .
“If you take a sequence and just run a gene
prediction program on it, the programs don’t
usually do very well. But if you take human
and mouse sequence, and compare them
against each other — looking for similar
regions — you get better predictions. And
the more genomes we have, the better it will
get.”
Quotes from the 50/50 series of
interviews by Bio-IT World
Richard Durbin
Head of Informatics, Wellcome Trust Sanger Institute.
• “Looking at the similarity between the human
genome and other species is a really powerful way
to get at functional sequences and to allow us to
work on them in different species.”
• “Several groups, including ours, have gene-finding
methods for comparative genomics. This is an
active area where we will see significant advances
in the next few years.”
Comparative Genomics
• The assumption that underlies comparative genomics is that the two genomes had a common ancestor and that each organism is a combination of the ancestor and the action of evolution.
• Evolution can be broadly thought of as the combination of two processes: mutational forces that generate random mutations in the genome sequence, and selection pressures that
1. Eliminate random mutations (negative selection),
2. Have no effect on mutations (neutral selection) or,
3. Increase the frequency of mutant alleles in the population as a result of a gain in fitness (positive selection).
• The combined action of mutation and selection is represented generally by a RATE MATRIX of base-pair changes between the two observed genomes.
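As a rough illustration of where such a matrix comes from, the Python sketch below tabulates observed base-pair changes between two aligned sequences and normalises each row. This gives observed substitution frequencies only. Estimating a true evolutionary rate matrix also needs a model of time and of multiple hits at one site, which is not attempted here. The function names and the toy alignment are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def substitution_counts(aln_a, aln_b):
    """Count aligned base pairs (a, b), skipping gaps and ambiguous bases."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    counts = Counter()
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a in BASES and b in BASES:
            counts[(a, b)] += 1
    return counts

def row_frequencies(counts):
    """Turn counts into per-row frequencies: P(base in B | base in A)."""
    matrix = {}
    for a in BASES:
        total = sum(counts[(a, b)] for b in BASES)
        matrix[a] = {b: (counts[(a, b)] / total if total else 0.0)
                     for b in BASES}
    return matrix

# Toy alignment (hypothetical data); '-' marks a gap.
seq_a = "ACGT-ACGTTA"
seq_b = "ACTTAACGCTA"
freqs = row_frequencies(substitution_counts(seq_a, seq_b))
for a in BASES:
    print(a, " ".join(f"{freqs[a][b]:.2f}" for b in BASES))
```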
Comparative Genomics
[Figure: evolutionary relationship between metazoans that are sequenced, or due for sequencing, including human, mouse, rat and C. elegans. Evolutionary distances are in millions of years.]
Comparative Genomics
• Comparative genomics may be defined as the derivation of genomic information following comparison of the information content of 2 or more species' genome sequences.
• There is a good article in Nature Reviews Genetics, April 2003, Vol 4 No 4, pp 251-262:
“Comparative Genomics: Genome-Wide Analysis in Metazoan Eukaryotes”, Abel Ureta-Vidal, Laurence Ettwiller & Ewan Birney, 2003.
http://www.nature.com/cgi-taf/DynaPage.taf?file=/nrg/journal/v4/n4/full/nrg1043_fs.html
The similarity is such that human chromosomes can be cut
(schematically at least) into about 150 pieces (only about 100 are
large enough to appear here), then reassembled into a reasonable
approximation of the mouse genome.
http://www.ornl.gov/TechResources/Human_Genome/graphics/slides/ttmousehuman.html
Comparative Genomics
• …there has been an explosion in the
availability of tools which may make it
difficult to decide which tool is most
suitable for your research.
• Indeed, to interpret these resources, you
must be aware of the differences between
them and between their underlying
assumptions.
Whole Genome Alignments
K-browser http://hanuman.math.berkeley.edu/cgi-bin/kbrowser
A multiple genome browser, currently set up for human, mouse and rat
based on the MAVID alignments, UCSC genome browser.
Comparative Gene Prediction
SLAM
http://baboon.math.berkeley.edu/~syntenic/slam.html
Example of a comparative genefinder
Employs a generalised pair hidden Markov model
approach for predicting gene structures within
syntenic genomic sequences
Performs gene finding and alignment of the
sequences simultaneously
SLAM
SLAM has been used for whole genome annotation projects.
For the mouse/human analysis, SLAM used a human/mouse synteny map, giving segments which were further broken up into 300 kb pieces.
These pieces were aligned by AVID.
SLAM was then run on all syntenic pieces using the AVID alignments as guides.
Coding lengths < 120 were discarded.
SLAM also predicted conserved non-coding sequences (CNS), the first de novo prediction of CNS in the human and mouse genomes.
The results are available at
http://bio.math.berkeley.edu/slam/mouse/
A similar result is available for Human/Rat.
seq1 SLAM CDS 2421 2478 . + 2 gene_id "000001"; transcript_id "000001.1"; frame "1"; exontype "internal"
seq1 SLAM CDS 3127 3805 . + 1 gene_id "000001"; transcript_id "000001.1"; frame "1"; exontype "internal"
------------------------------------------------------------
seq2 SLAM CDS 2134 2191 . + 2 gene_id "000001"; transcript_id "000001.1"; frame "2"; exontype "internal"
seq2 SLAM CDS 2867 3545 . + 1 gene_id "000001"; transcript_id "000001.1"; frame "2"; exontype "internal"
------------------------------------------------------------
> Protein 1: (244,244) aa (incomplete protein)
Y
Z
...
1 KCEAIASDCF LSGNVDIELK DHNNCISKIN VEDQKNCALS WAFASIYHLE 50
   CE IAS CF LSGNVDIE K D ++C S I    E+Q NC LS W F S  HLE
1 TCERIASSCF LSGNVDIEWK DKSSCFSSIE TEEQGNCNLS WLFTSKTHLE 50
http://baboon.math.berkeley.edu/~syntenic/slam.html
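The SLAM feature lines shown above follow a nine-column, GFF/GTF-style layout. The Python sketch below is one way to read them. It assumes whitespace-separated columns as printed on the slide (real files are normally tab-separated), and the `Feature` type and `parse_feature` name are hypothetical.

```python
import re
from typing import Dict, NamedTuple

class Feature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str]

    @property
    def length(self):
        return self.end - self.start + 1

# Attributes look like: gene_id "000001"; transcript_id "000001.1"; ...
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')

def parse_feature(line):
    """Parse one GFF/GTF-style line into a Feature."""
    fields = line.strip().split(None, 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return Feature(seqname, source, feature, int(start), int(end),
                   score, strand, frame, dict(ATTRIBUTE.findall(attrs)))

lines = [
    'seq1 SLAM CDS 2421 2478 . + 2 gene_id "000001"; transcript_id "000001.1"; frame "1"; exontype "internal"',
    'seq1 SLAM CDS 3127 3805 . + 1 gene_id "000001"; transcript_id "000001.1"; frame "1"; exontype "internal"',
]
exons = [parse_feature(line) for line in lines]
coding_length = sum(exon.length for exon in exons)
print(coding_length, "coding bases in", len(exons), "exons")
```

Summing the exon lengths per transcript is also how a length cut-off, such as discarding short coding predictions, could be applied.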
TwinScan
• One of the first gene predictors to substantially
exceed the performance of GENSCAN on a
genomic scale by using mouse–human
comparison was TWINSCAN (Korf et al. 2001).
http://genes.cs.wustl.edu/query.html
Other Comparative Gene Predictors
• DoubleScan http://www.sanger.ac.uk/cgi-bin/doublescan/submit
It is a program for comparative ab initio prediction of
protein coding genes in mouse and human DNA.
Generates exon candidates in both sequences.
• SGP-1....http://soft.ice.mpg.de/sgp-1
SGP-1 is a similarity-based gene prediction
program. Given two genomic DNA sequences, it
post-processes the pairwise local alignment to
predict single or multiple gene models of protein
coding genes in forward and reverse strands.
Regulatory Sequence
… Leroy Hood brought out this point in
his talk at the Bio2001 meeting in San
Diego (24–28 June 2001) with his statement
that
“The difference between man and
monkey is gene regulation.”
Quotes from the 50/50 series of
interviews by Bio-IT World
Lincoln Stein
Associate Professor, Cold Spring Harbor Laboratory .
“I think the places that we should be looking at
now are the non-repetitive, unique, noncoding DNA. … If they are conserved, they
must be important. There are discoveries in
there.”
Finding regulatory regions
rVISTA....http://teapot.jgi-psf.org/ovcharen/rvista/index.html
Consite....http://forkhead.cgb.ki.se/cgi-bin/consite
Footprinter....http://abstract.cs.washington.edu/~blanchem/FootPrinterWeb/FootprinterInput.pl
Toucan....http://www.esat.kuleuven.ac.be/~saerts/software/toucan.php/
Trafac....http://trafac.chmcc.org/trafac/index.jsp
VISTA is a set of tools for comparative genomics. It was designed to visualize
long sequence alignments of DNA from two or more species with annotation
information.
AVID is the alignment engine behind VISTA: a program for globally
aligning DNA sequences of arbitrary length.
mVISTA (main VISTA) A program for visualizing alignments of an
arbitrary number of genomic sequences from different species
rVISTA (regulatory VISTA) combines transcription factor binding sites
database search with a comparative sequence analysis.
http://teapot.jgi-psf.org/ovcharen/rvista/index.html
rVista
http://teapot.jgi-psf.org/ovcharen/rvista/index.html
A program that combines transcription factor
binding site (TFBS) searches with
comparative sequence analysis.
First, human and mouse sequences are aligned using the global
alignment program MAVID.
Second, potential transcription factor binding sites are predicted
by the Match™ program, based on the TRANSFAC Professional 5.3 library.
Third, the human-mouse sequence conservation of a DNA
region spanning each transcription factor binding site is assessed using a
novel strategy.
Human and/or mouse annotation determines the genomic location of each
predicted transcription factor hit.
ConSite
http://forkhead.cgb.ki.se/cgi-bin/consite
“Identification of conserved regulatory elements by comparative genome analysis”
Boris Lenhard, Albin Sandelin, Luis Mendoza, Pär Engström, Niclas Jareborg and Wyeth W Wasserman
Journal of Biology (BioMed Central, Open Access)
ConSite - Identification of conserved regulatory
elements by comparative genome analysis
• Consite is a web-based tool for detecting
transcription factor binding sites in
genomic sequences using phylogenetic
footprinting.
• Two orthologous genomic sequences are
aligned, and transcription factor binding
sites are reported only for those regions of
the alignment that exceed a certain
threshold of conservation.
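The filtering idea can be sketched in Python as follows: score a window of alignment columns around each candidate site by percent identity, and keep the site only if the window meets a threshold. This is an illustration of the principle, not ConSite's actual scoring. The window size, threshold, function names and toy alignment are all assumptions.

```python
def window_identity(aln_a, aln_b, start, end):
    """Fraction of identical, non-gap columns in alignment columns [start, end)."""
    columns = list(zip(aln_a[start:end].upper(), aln_b[start:end].upper()))
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return identical / len(columns)

def conserved_sites(aln_a, aln_b, sites, window=50, threshold=0.7):
    """Keep candidate sites whose surrounding window is conserved enough.

    `sites` holds (start, end, name) tuples in alignment coordinates.
    `window` and `threshold` are illustrative defaults, not ConSite's.
    """
    kept = []
    for start, end, name in sites:
        middle = (start + end) // 2
        lo = max(0, middle - window // 2)
        hi = min(len(aln_a), lo + window)
        if window_identity(aln_a, aln_b, lo, hi) >= threshold:
            kept.append((start, end, name))
    return kept

# Toy alignment: first half identical, second half completely different.
aln_a = "ACGTACGTAC" + "AAAAAAAAAA"
aln_b = "ACGTACGTAC" + "CCCCCCCCCC"
candidates = [(2, 6, "site_in_conserved_block"), (12, 16, "site_in_diverged_block")]
print(conserved_sites(aln_a, aln_b, candidates, window=6, threshold=0.8))
```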
ConSite
• The method is implemented as a graphical web
application, ConSite, which is at:
• http://forkhead.cgb.ki.se/cgi-bin/consite or
• http://www.phylofoot.org/
• Various tools are made available at phylofoot.org.
http://www.phylofoot.org/
Sequence View
Toucan
http://www.esat.kuleuven.ac.be/~saerts/software/toucan.php
• Toucan is a workbench for regulatory
sequence analysis on metazoan genomes :
comparative genomics, detection of significant
transcription factor binding sites, and detection
of cis-regulatory modules (combinations of
binding sites) in sets of
coexpressed/coregulated genes.
• Standalone Java application that is tightly linked
with Ensembl, and was built using the BioJava
package
Perl – A Programming Language.
• What is Perl?
• Perl is usually expanded as Practical Extraction and Report Language,
and was invented by Larry Wall.
• Perl is supported by its users and was written
entirely by volunteers.
Programming Tools
Perl
• Perl is remarkably good for slicing, dicing,
twisting, wringing, smoothing,
summarizing and otherwise mangling text!
• Perl's powerful regular expression
matching and string manipulation
operators simplify this job in a way that is
unequalled by any other modern
language.
Perl & Genome Data
• Although genome informatics groups are
constantly tinkering with other "high level"
languages such as Python, Tcl and recently
Java, nothing comes close to Perl's popularity.
• “In short, when the genome project was
foundering in a sea of incompatible data
formats, rapidly-changing techniques, and
monolithic data analysis programs that were
already antiquated on the day of their release,
Perl saved the day.”
Lincoln Stein.
Perl one-Liners!
• Take a blast output and print all of the
gi's(Genbank Identifiers) matched, one per
line.
• Solution: one line of Perl.
• perl -pe 'next unless ($_) = /^>gi\|(\d+)/;$_.="\n"' filename.blast
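For comparison, the same extraction can be written as a short Python function. This is a sketch rather than a drop-in replacement for the one-liner. It assumes, as the Perl pattern does, that hit lines start with `>gi|` followed by digits. The function name and the sample lines are hypothetical.

```python
import re

# Same pattern as the Perl one-liner: a line starting with ">gi|" then digits.
GI_LINE = re.compile(r"^>gi\|(\d+)")

def gi_numbers(lines):
    """Yield the GenBank identifier from every line that starts with '>gi|'."""
    for line in lines:
        match = GI_LINE.match(line)
        if match:
            yield match.group(1)

# Hypothetical lines standing in for a BLAST report.
sample = [
    "Sequences producing significant alignments:\n",
    ">gi|12345|ref|XX_000001.1| example hit one\n",
    "          Length = 1200\n",
    ">gi|678|gb|YY000002.1| example hit two\n",
]
for gi in gi_numbers(sample):
    print(gi)
```

To run it on a real report, pass an open file object instead of the sample list, for example `gi_numbers(open("filename.blast"))`.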
Perl Modules/Programs
• Perl can be used for complex programs.
• The RepeatMasker program is written in Perl. It
calls other programs written in other
languages (Crossmatch, written in C).
• Slipper is a 4500-line program written in Perl. It
calls RepeatMasker and Primer3 repeatedly and
processes the output files from them, writing
summarised results to disk.
SLiPPER
Sequence Length Polymorphism and Primer FindER
Programming: Keith Satterley, Specifications: Grant Morahan
Division of Bioinformatics & Genetics,
The Walter & Eliza Hall Institute of Medical Research
Slipper
• Masks Alu etc. repeats (using RepeatMasker);
• Selects SSLRs with user-specified parameters;
• Designs primers (using Primer3).
• Grant Morahan selects and tests chosen SSLRs
to become microsatellite markers on the mouse
genome.
• To derive a first generation systematic map of the
mouse, with sub-centiMorgan (1Mb) resolution.
• Extend to 10 times this density over 50 strains.
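To give a feel for the selection step, the Python sketch below finds simple tandem repeats (for example dimer repeats such as CACACA...) with a regular expression. It is not SLiPPER's algorithm. The repeat-unit length and minimum repeat count stand in for "user-specified parameters", and their defaults here are assumptions.

```python
import re

def find_ssrs(seq, unit_len=2, min_repeats=8):
    """Return (start, unit, repeat_count) for each tandem repeat found.

    unit_len and min_repeats are illustrative parameters; homopolymer
    runs such as AAAA... are skipped because the unit is a single base.
    """
    # One copy of the unit, then at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # e.g. "AA" repeated is a homopolymer, not a dimer repeat
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits

# Toy sequence: ten copies of the dimer CA flanked by other bases.
sequence = "GGT" + "CA" * 10 + "TTG"
print(find_ssrs(sequence))
```

Primers would then be designed in the unique sequence flanking each repeat, which is the job the slides assign to Primer3.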
UTILITY OF SLIPPER
[Figure: number of SSRs (y axis, 0 to 40) against position on chromosome in bp (x axis, 0 to 1,400,000), plotted for dimer repeats and multimer repeats. Legend: O* = polymorphic; O* B6 = NZO.]
Possible SLIPPER Data Analysis
• STRAIN RELATEDNESS AND EVOLUTION
- graphic depiction of allele sharing between strains
- probability of IBD v allele convergence by mutation
- comparison of close strain relatedness (eg B6 v B10; B6 v Ka; D1 v D2; NOD v NOR)
- overall strain relatedness –> cladogram
- pairwise strain dimorphism rate, useful for choosing 2 strains to be used in a cross
- comparison of results for reduced strains set with MIT markers
- comparison of haplotypes with Phenome database
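One of these analyses, the pairwise strain dimorphism rate, is simple enough to sketch in Python. For each pair of strains, it is the fraction of shared markers at which the two strains carry different alleles. The data layout, marker names and allele sizes below are invented for illustration and are not SLiPPER results.

```python
from itertools import combinations

def dimorphism_rates(genotypes):
    """Pairwise fraction of shared markers with differing alleles.

    `genotypes` maps strain -> {marker: allele size in bp}.
    Pairs with no markers in common are left out of the result.
    """
    rates = {}
    for a, b in combinations(sorted(genotypes), 2):
        shared = [m for m in genotypes[a] if m in genotypes[b]]
        if not shared:
            continue
        differing = sum(1 for m in shared if genotypes[a][m] != genotypes[b][m])
        rates[(a, b)] = differing / len(shared)
    return rates

# Hypothetical allele sizes for four made-up markers in three strains.
genotypes = {
    "B6":  {"m1": 120, "m2": 148, "m3": 202, "m4": 96},
    "B10": {"m1": 120, "m2": 148, "m3": 202, "m4": 98},
    "NOD": {"m1": 124, "m2": 152, "m3": 202, "m4": 90},
}
rates = dimorphism_rates(genotypes)
best_pair = max(rates, key=rates.get)
print(rates)
print("Most dimorphic pair:", best_pair)
```

A pair with a high rate offers more informative markers for a cross, which is the use the slide has in mind.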
O|B|F - Open Bioinformatics
Foundation
• The Open Bioinformatics Foundation is a non-profit,
volunteer-run organization focused on supporting open
source programming in bioinformatics.
• The foundation grew out of the volunteer projects
Bioperl, BioJava and Biopython.
• Underwrites and supports the BOSC conferences.
• Organizes and supports developer-centric
"hackathon" events.
• Manages servers, bank account & other assets.
Open Bioinformatics Foundation
• PROJECTS
– BioPerl
– BioJava
– BioPython
– BioRuby
– BioPipe
– BioSQL / OBDA
– MOBY
– DAS
– BioPathways†
– EMBOSS†
Open Bioinformatics Foundation
• June 27-28 2003 -- 4th Annual Bioinformatics
Open Source Conference
• www.open-bio.org/bosc/
ISMB 2003 - Brisbane
• Normally held in Europe and North America.
• For 2 days beforehand:
– BOSC (Open Source Conference).
– Biopathways, BioOntology, Text Mining & WEB03
• Tutorials on Sunday – choose 2 from 15 offered.
• ISMB for 4 days – over 50 non-parallel talks!
http://www.iscb.org/ismb2003/index.shtml
The bioperl project
• Officially organized in 1995 and existing
informally for several years prior, The
Bioperl Project is an international
association of developers of open source
Perl tools for bioinformatics, genomics and
life science research.
What is BioPerl
• Bioperl is a toolkit of Perl modules useful in
building bioinformatics solutions in Perl.
• It is built in an object-oriented manner
• The collection of modules can be used to
run a large range of Bioinformatics
programs and process their output files.
• There are modules to carry out analyses,
to graph data and to read many data
formats.
BioJava
http://www.biojava.org/
• The BioJava Project is an open-source
project dedicated to providing Java tools
for processing biological data.
• BioJava is a general bioinformatics toolkit.
It provides a framework for building
everything from simple scripts to complete
applications. BioJava is designed to be
used as a library.
BioJava
http://www.biojava.org/
• Currently, there are objects for:
• Sequences and features
– IO
– Processing, storing, manipulating
– Visualising
• Dynamic programming
– Single-sequence and pair-wise HMMs
– Viterbi-path, Forward and Backward algorithms
– Training models
– Sampling sequences from models
• External file formats and programs
– GFF
– Blast
– Meme
• Sequence Databases
– BioCorba interoperability
– ACeDB client
– DAS client
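For readers who have not met the Viterbi algorithm named in the dynamic programming list, here is a minimal sketch in Python. It is not BioJava code. It finds the most probable state path through a toy two-state HMM that labels each base as lying in a low-G+C or high-G+C region. All probabilities are invented for illustration.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for `obs`, computed in log space."""
    if not obs:
        return []
    # Best log-probability of any path ending in each state, per position.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
             for s in states}]
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1])
            scores[s] = score + math.log(emit_p[s][symbol])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Toy model (all numbers are assumptions chosen for the example).
STATES = ("low", "high")
START = {"low": 0.5, "high": 0.5}
TRANS = {"low": {"low": 0.9, "high": 0.1},
         "high": {"low": 0.1, "high": 0.9}}
EMIT = {"low": {"A": 0.35, "T": 0.35, "C": 0.15, "G": 0.15},
        "high": {"A": 0.15, "T": 0.15, "C": 0.35, "G": 0.35}}

path = viterbi("AAAAATTTTTGCGCGCGCGC", STATES, START, TRANS, EMIT)
print("".join("L" if s == "low" else "H" for s in path))
```

HMM-based gene finders apply the same recursion to much richer models, with states for exons, introns and intergenic sequence.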
Other Open Source Projects.
• BioDAS - Distributed Annotation System
(DAS) - A server system for the sharing of
Reference Sequences.
• Biopython - tools for computational
molecular biology, written in Python (an excellent
language for beginners, yet superb for
experts).
• BioRuby, BioSQL, MOBY, BioPathways
and BioOpera.
LINKS
Internet Resources
Prediction of exons and gene structure
SLAM....http://baboon.math.berkeley.edu/~syntenic/slam.html
SGP-1....http://soft.ice.mpg.de/sgp-1
TwinScan....http://genes.cs.wustl.edu
Finding regulatory regions by phylogenetic footprinting
Consite....http://forkhead.cgb.ki.se/cgi-bin/consite
rVISTA....http://teapot.jgi-psf.org/ovcharen/rvista/index.html
Toucan....http://www.esat.kuleuven.ac.be/~saerts/software/toucan.php
Whole-genome alignments in genome browser
ECR browser....http://nemo.lbl.gov/ecrBrowser
Ensembl....http://www.ensembl.org
UCSC....http://genome.ucsc.edu
A comprehensive, straightforward Links Page – one of the best!
http://apollo11.isto.unibo.it/
Genome Links from Ewan Birney et al.
• Genome aligners
AVID....http://www-gsd.lbl.gov/vista/details_avid.htm
BLASTZ....http://bio.cse.psu.edu
BLAT....http://genome.ucsc.edu
Exonerate....http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/Exonerate.html
GLASS....http://crossspecies.lcs.mit.edu
LAGAN/MLAGAN....http://lagan.stanford.edu
MegaBLAST....http://www.ncbi.nih.gov/blast/tracemb.html
MUMmer....http://www.tigr.org/software/mummer
PatternHunter....http://www.bioinformaticssolutions.com/products/ph.php
WABA....http://www.cse.ucsc.edu/~kent/xenoAli/index.html
• Prediction of exons or coding regions
DIALIGN2....http://bibiserv.techfak.uni-bielefeld.de/dialign
ExoFish....http://www.genoscope.cns.fr/proxy/cgi-bin/exofish.cgi
OrthoSeq....http://www.phylofoot.org/cgi-bin/orthoseq.cgi
ROSETTA/GLASS....http://crossspecies.lcs.mit.edu
• Prediction of exons and gene structure
DoubleScan....http://www.sanger.ac.uk/Software/analysis/doublescan
SLAM....http://baboon.math.berkeley.edu/~syntenic/slam.html
SGP-1....http://soft.ice.mpg.de/sgp-1
TwinScan....http://genes.cs.wustl.edu
Genomics Web sites
• Functional and Comparative Genomics Research - More technical information on HGP involvement with comparative and functional genomics.
• Virtual Library of Genetics - Links to genetic and genomic information organized by organism.
• Microbial Genome Program - U.S. Department of Energy program to study the genetic material of microbes that may be useful in helping DOE fulfill its missions.
• DOE Joint Genome Institute - Consortium of U.S. Department of Energy researchers developing and exploiting new technologies as a means for discovering and characterizing the basic principles and relationships underlying living systems.
• A Quick Guide to Sequenced Genomes - Illustrated index of organisms that have had their genomes sequenced. From the Genome News Network.
• Model Organisms for Biomedical Research - Information on model organisms from the National Institutes of Health.
• Mouse Genome Resources - Gateway to mouse resources in and beyond National Center for Biotechnology Information (NCBI) resources.
• Functional Genomics - Gateway to functional genomics sources from Science.
• Ecce homology: A primer on comparative genomics - From Modern Drug Discovery, a publication of the American Chemical Society.
Image Gallery – links
http://www.ornl.gov/TechResources/Human_Genome/education/images.html
• Gallery 1: Genome Science
• Gallery 2: Genome Tools and Technologies
• Gallery 3: Genomes to Life
• Gallery 4: Human Genome Project
• Gallery 5: Ethical, Legal, and Social Issues; Genomic Medicine
• Other Website Image Galleries and
Resources
• NIH NHGRI Press Photos
• CSHL Eugenics Archive
• RasMol Protein Gallery
• Photos of normal and abnormal chromosomes
• Access Excellence Graphics Gallery
• The Why Files Cool Image Gallery
• Genetics Animation Gallery
• Molecular Expressions Photo Gallery
• Gene Maps
– 1999 Online Gene Map from NCBI.
– Clickable 1996 Gene Map from Science magazine.
You can click on any one of the 24 different human
chromosomes and see examples of genes found.
• Chromosome Maps
– human chromosome 16
– human chromosome 19
U.S. Government Image Galleries
• Argonne National Laboratory Photo Gallery
• Brookhaven National Laboratory Image Library
• Fermi National Laboratory Photo Database
• Jefferson Laboratory Picture Exchange
• Lawrence Berkeley National Laboratory Image Gallery
• Lawrence Livermore National Laboratory Image Gallery
• Los Alamos National Laboratory Photo Gallery
• National Human Genome Research Institute Image Gallery
• National Renewable Energy Laboratory Photo Library
• Oak Ridge National Laboratory Image Gallery
• Pacific Northwest National Laboratory Photo Gallery
• Stanford Linear Accelerator Center Photo Archives
• Sandia National Laboratory Photo Gallery
• U.S. DOE Image Gallery
Acknowledgements
• WEHI Bioinformatics group
– Tim Beissbarth
– Alex Gout
– Terry Speed
– All the others in Bioinformatics who provide a
great environment to work in and with.
• Grant Morahan
• WEHI ITS - who provide the best infrastructure
of anywhere I know of.
Year by year we are becoming better equipped to
accomplish the things we are striving for.
But what are we actually striving for?
- Bertrand de Jouvenel, 1903-1987
Success is the ability to go
from failure to failure without
losing your enthusiasm.
- Winston Churchill, 1874-1965