How to submit nucleotide sequence data to the EMBL Data Library:
Information for Authors
The EMBL Data Library, Postfach 10.2209, D-6900 Heidelberg, Federal Republic of Germany
January 1990
1 The first step in getting an accession number
Before doing anything else, authors should get a copy of
a sequence data submission form. This form solicits all
of the information needed to make a database entry;
that is, the primary sequence data together with descriptive information such as the source of the
sequenced segment (e.g., organism, strain, tissue) and
the location of interesting regions within the sequence
(e.g., coding regions, regulatory signals). It also contains information about data formats. The data submission form exists in both a paper and a computer-readable version; the latter can be completed using a
text editor. These versions are available from the
following sources:
(a) Paper form: printed at the end of this article, from
the Development editorial office and available
upon request from EMBL, GenBank® and the
DNA Databank of Japan (DDBJ) at the addresses
given in Appendix II.
(b) Computer-readable form:
(1) With all releases of the EMBL and GenBank®
databases since January 1987 and with DDBJ
releases since January 1988.
(2) From EMBL by electronic mail (computer
network) via our file server. Anyone with access
to BITNET (either directly or via a gateway)
can send a request to the EMBL file server,
which will automatically return a copy of the
data submission form by electronic mail.
Instructions for using the EMBL file server are
given in Appendix I.
(3) From EMBL, on Macintosh or IBM-compatible
(5¼" or 3½") floppy diskettes. Complete information on how to contact the EMBL Data
Library is given in Appendix II.
(4) From GenBank® via electronic mail or on
floppy diskette. For information on requesting
the form from GenBank® via Telenet, contact
David Benton (+1-415-962-7360). Researchers
in Japan can obtain the form by dialing up the
DDBJ computer system (0559-75-6026).
2 What to submit to the EMBL Data Library
A data submission should include the following (for
further details, see the data submission form itself):
(a) the sequence itself, in computer-readable form
(computer network mail, magnetic tape or IBM-compatible or Macintosh floppy diskette). Printouts will be accepted only if the authors have no
access to a computer.
(b) a completed data submission form for each submitted sequence. The form is available from the
sources listed in section 1(a).
(c) a computer network address, a telex number or a
telefax number (advisable, to help speed things up,
but not required).
3 How to send data to the EMBL Data Library
Data can be sent to the Data Library in one of several
ways:
(a) Electronic file transfer: files can be sent via computer network to [email protected].
This BITNET address can be reached directly (by
people at BITNET sites) or via various gateways
from Arpanet, Usenet, JANET, etc. Ask your local
network expert for help or phone us (+49-6221-387-258).
(b) Telefax to Data Submissions, EMBL Data Library.
Our fax number is: +49-6221-387-306.
(c) Normal post. See address given in Appendix II.
4 How long will it take to get an accession
number?
We will process data submissions within 7 working days
of receipt and send authors notification of either what
accession number(s) their data have been assigned or
what additional information is needed. There are several things authors can do to minimise the time it takes
to get an accession number:
(a) Be sure that submissions include all the necessary
materials and that all relevant questions on the data
submission form have been answered.
(b) Check the data to be sure that they do not contain
inconsistencies/errors (e.g., a stop codon in the
middle of a region listed on the form as an exon).
(c) Be sure to include either a computer network
address or a telex or telefax number. If this information is not provided, notification of accession
numbers will be sent by regular post. Telephoning
is costly and time-consuming, and the Data Library
will therefore not attempt to contact authors by
phone.
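The consistency check suggested in item (b) can be sketched in a few lines of Python. This is only an illustration, not a tool used by the Data Library: the function name is hypothetical, the standard genetic code is assumed, and the reading frame is assumed to begin at the first base of the region being checked (which need not hold for every exon).

```python
# Stop codons of the standard genetic code (assumption: no alternative code applies).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_codons(sequence, start, end):
    """Return 1-based positions of in-frame stop codons inside a region.

    start and end are 1-based, inclusive positions in the sequence.
    The reading frame is assumed to begin at start. A stop codon that
    forms the last complete codon of the region is not reported.
    """
    region = sequence[start - 1:end].upper()
    # Offset of the last complete codon in the region.
    last_codon_offset = (len(region) // 3 - 1) * 3
    hits = []
    for offset in range(0, len(region) - 2, 3):
        codon = region[offset:offset + 3]
        if codon in STOP_CODONS and offset < last_codon_offset:
            hits.append(start + offset)
    return hits


# A region with a premature TAA at position 4 and a terminal TGA.
print(internal_stop_codons("ATGTAAGGGTGA", 1, 12))
```

Any position reported by such a check points either to a sequencing error or to a region that has been mislabelled on the form.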
Although we will process data submissions as quickly as
we can, we strongly encourage authors to submit their
data at or before the time they begin writing the
manuscript, rather than once it is finished. This way we
can process the data while the manuscript is being
written, and authors will not have to delay submission
of their manuscript while they wait for notification of
their accession number.
It should be emphasised that authors are responsible
for communicating their accession number(s) to the
journal at the time they submit their manuscript; the
Data Library will not contact the journal.
5 Data security
The data submission form asks authors whether their
submitted data can be made available to the public
immediately or whether it should be withheld until
publication.
6 Updating your data
Once a database entry has been created from a submission, a copy is sent to the submitter for his/her
reference and for comments or corrections. However, it
often happens that the entry is correct when it is created
but, with the passage of time, becomes out of date: the
authors may make corrections to the sequence itself, or
may discover new features of the sequence. Since such
findings are generally not published, the only way to
keep entries correct and up to date is for the authors to
communicate their new findings to the database. This
can be done by normal post or electronic mail to the
address given in Appendix II.
One type of update which merits separate mention is
that relating to citations. Most submissions represent
data that have not yet been accepted for publication, and therefore the journal citation is not available when the entry
is created. Adding this information at a later date
requires that the database staff identify which submissions correspond to which publications; while this is
often straightforward, it can also be problematic, especially if the journal does not print an accession
number in the article, or if the submitted and the
published data are not identical. We therefore strongly
encourage researchers to let us know when and where
data they have submitted to us are published.
Appendix I. EMBL network file server
Computer users with access to BITNET (directly or via
a gateway) can obtain copies of the data submission
form, or of database entries, by sending commands to a
file server running on the VAXcluster at EMBL. The
file server facility is provided free of charge, though
users may have to meet some or all of the communication costs, depending on the accounting system of
their local computer service.
To use this facility, send file server commands (as
electronic mail) to the address NETSERV@EMBL.BITNET.
Each line of the mail message should consist
of a single file server command, and nothing else. The
mail can be sent over BITNET, or from any other
network which has a gateway into BITNET (e.g.,
JANET in the UK or ARPANET in the USA).
The most important file server command, to get users
started, is HELP. If the file server receives this command, it will return a help file to the sender, explaining
in some detail how to use the facility.
In order to send electronic mail to a BITNET
address, users must find out which command they have
to use on their own local machine and how they should
format the address NETSERV@EMBL.BITNET.
Users who don't already know how to do this should
contact their local computer service, or if all else fails,
contact the Data Library and we will do our best to
help. Below are some examples which illustrate how to
send commands to the file server using a VAX/VMS
system that is a BITNET node running JNET software.
To send commands to the file server, you could
use the operating system command MAIL as follows:
$ MAIL <filename> "JNET% ""NETSERV@EMBL"""
where <filename> is the name of a file containing file
server commands.
To request help information the file should contain the
following command:
HELP
To request a copy of the data submission form, it should
contain the following GET command:
GET DATALIB: DATASUB.TXT
Users can also request specific sequences via the File
Server. Information on how to do this is provided in the
HELP file.
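For users who prepare their requests with a script, the rule that each line of the message holds exactly one file server command can be enforced when the command file is written. The following Python sketch is only an illustration: the function name is hypothetical, the commands are copied as printed in this appendix, and sending the resulting file is left to the local mail command.

```python
def build_command_file(commands):
    """Return message text with one file server command per line and nothing else."""
    lines = []
    for command in commands:
        command = command.strip()
        if not command:
            continue  # skip empty entries rather than emit blank lines
        if "\n" in command or "\r" in command:
            raise ValueError("each command must fit on a single line")
        lines.append(command)
    return "".join(line + "\n" for line in lines)


# Commands as printed in this appendix.
message = build_command_file(["HELP", "GET DATALIB: DATASUB.TXT"])
print(message, end="")
```

The text returned by build_command_file can be saved to a file and passed to the local mail command in place of <filename>.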
Appendix II. How to contact the nucleotide
sequence databases
EMBL Data Library:
(a) Computer network: [email protected] (for
data submissions); [email protected] (for questions requiring a personal response)
(b) Postal address: Data Submissions, EMBL Data
Library, Postfach 10.2209, 6900 Heidelberg,
Federal Republic of Germany
(c) Telephone: +49-6221-387-258
(d) Telefax: +49-6221-387-306
(e) Telex: 461613 (embl d)
GenBank®:
(a) Computer network address: [email protected]
(b) Postal address: GenBank® Submissions, Mail Stop
K710, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
(c) Telephone: +1-505-665-2177
(d) Telefax: +1-505-665-3493
DNA Databank of Japan:
(a) Computer network: [email protected] (for
data submissions); [email protected] (for other
enquiries)
(b) Postal address: Laboratory of Genetic Information
Analysis, Center for Genetic Information
Research, National Institute of Genetics, Mishima,
Shizuoka 411, Japan
(c) Telephone: +81-559-75-0771 x647
(d) Telefax: +81-559-75-6040
EMBL Data Library
Sequence Data Submission Form
This form solicits the information needed for a nucleotide or amino acid sequence database entry. By completing and returning it
to us promptly you help us to enter your data in the database accurately and rapidly. These data will be shared among the
following databases: EMBL Data Library (Heidelberg, Federal Republic of Germany); GenBank (Los Alamos, NM, U.S.A.
and Mountain View, CA, U.S.A.); DNA Data Bank of Japan (DDBJ; Mishima, Japan); National Biomedical Research
Foundation Protein Identification Resource (NBRF-PIR; Washington, D.C., U.S.A.); Martinsried Institute for Protein Sequence
Data (MIPS; Martinsried, Federal Republic of Germany) and International Protein Information Database in Japan (JIPID; Noda,
Japan).
Please answer all questions which apply to your data. If you submit 2 or more non-contiguous sequences, copy and fill out this
form for each additional sequence. Please include in your submission any additional sequence data which is not reported in your
manuscript but which has been reliably determined (for example, introns or flanking sequences). When submitting nucleic acid
sequences containing protein coding regions, also include a translation (SEPARATELY from the nucleic acid sequence). Then
send (1) this form, (2) a copy of your manuscript (if available) and (3) your sequence data (in machine readable form) to the
address shown below. Information about the various ways you can send us your data and about formats for the sequence data is
given in the following two sections.
Thank you.
SUBMITTING DATA TO THE EMBL DATA LIBRARY
We are happy to accept data submitted in any of the following ways: (1) Electronic file transfer: files can be sent via
computer network to: [email protected]. This BITNET/EARN address can be reached via various gateways from
Arpanet, Usenet, JANET, etc. Ask your local network expert for help or phone us. Please ensure that each line in your file is
not longer than 80 characters; longer lines often get truncated when they are sent. (2) Floppy disks: we can read Macintosh
and IBM-compatible diskettes. Please use the 'save as text only' feature of your editor to save your sequence file, as otherwise
we might have difficulty processing it. (3) Magnetic tapes: 9-track only (fixed-length records preferred); 800, 1600 or 6250
bpi (any blocksize); ASCII or EBCDIC character codes; any label type or unlabelled. Our address is:
EMBL Data Library Submissions
Postfach 10.2209
D-6900 Heidelberg
Federal Republic of Germany
Computer network: [email protected]
Telefax: (+49) 6221 387 306
Telephone: (+49) 6221 387 258
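Before sending a file by computer network, the 80-character limit on line length can be checked automatically. The Python sketch below is one possible way to do so; the function name is hypothetical and the file is assumed to be plain text.

```python
MAX_LINE_LENGTH = 80  # limit stated for files sent by computer network


def overlong_lines(text, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) pairs for every line longer than limit."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


sample = "A" * 80 + "\n" + "C" * 81 + "\n"
print(overlong_lines(sample))
```

An empty result means that no line risks being truncated in transit.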
When we receive your data we will assign them an accession number, which serves as a reference that permanently identifies
them in the database. We will inform you what accession number your data have been given and we recommend that you cite
this number when referring to these data in publications.
If your manuscript has already been accepted for publication, the accession number can be included at the galley proof stage as a
note added in proof. So that we can process your data and inform you of your accession number before you
receive the galley proofs, please return this form to us as soon as possible. We suggest that the note added
in proof should read approximately as follows: "The nucleotide sequence data reported will appear in the EMBL, GenBank and
DDBJ Nucleotide Sequence Databases under the accession number ______."
A computer-readable version of this form is available on the distribution tapes of the EMBL Data Library from Release 11
onwards and on GenBank Releases 48 onwards. The BIONET National Computer Resource for Molecular Biology (Mountain
View, CA, U.S.A.) also has a copy. Feel free to use the computer-readable form rather than this printed one. In this case, the
form should be filled out with a text editor and sent via computer network or normal post to the address indicated above.
FORMATS FOR SUBMITTED DATA
We would appreciate receiving the sequence data in a form which conforms as closely as possible to the following standards.
Each sequence should include the names of the authors.
Each distinct sequence should be listed separately using the same number of bases/residues per line. The length of each
sequence in bases/residues should be clearly indicated.
Enumeration should begin with a "1" and continue in the direction 5' to 3' (or amino- to carboxy- terminus).
Amino acid sequences should be listed using the one-letter code.
Translations of protein coding regions in nucleotide sequences should be submitted in a separate computer file from the
nucleotide sequences themselves.
The code for representing the sequence characters should conform to the IUPAC-IUB standards, which are described in:
Nucl. Acids Res. 13: 3021-3030 (1985) (for nucleic acids) and J. Biol. Chem. 243: 3557-3559 (1968) and Eur. J.
Biochem. 5: 151-153 (1968) (for amino acids).
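A sequence file that follows these conventions can be produced with a short script. The Python sketch below is only one possible layout, not a format prescribed by the Data Library: the "Authors:" and "Length:" labels, the default of 60 characters per line and the function name are all assumptions made for illustration.

```python
def format_sequence(authors, sequence, per_line=60, unit="bases"):
    """Lay out a sequence with author names, total length and numbered lines.

    Every line carries the same number of characters (except possibly the
    last), and each line is prefixed with the position of its first
    character, counting from 1.
    """
    residues = "".join(sequence.split()).upper()  # drop spaces and line breaks
    lines = [f"Authors: {authors}", f"Length: {len(residues)} {unit}"]
    for start in range(0, len(residues), per_line):
        chunk = residues[start:start + per_line]
        lines.append(f"{start + 1:>8} {chunk}")
    return "\n".join(lines) + "\n"


print(format_sequence("Smith J.", "acgt acgt ac", per_line=4), end="")
```

With the default of 60 characters per line, each output line stays well under 80 characters, so the same file is also suitable for transfer by computer network.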
El.5/11.89
I. GENERAL INFORMATION
Your last name
First name
Middle initials
Institution
Address
Computer mail address
Telex number
Telephone
Telefax number
On what medium and in what format are you sending us your sequence data? (see instructions on front page)
[ ] electronic mail
[ ] diskette: computer; operating system; editor
[ ] magnetic tape: record length; blocksize; label type
    density: [ ] 800  [ ] 1600  [ ] 6250
    character code: [ ] ASCII  [ ] EBCDIC
[ ] printed copy (please, ONLY if it is impossible to send us machine-readable data)
II. CITATION INFORMATION
These data are
[ ] published
[ ] in press
[ ] submitted
[ ] in preparation
[ ] no plans to publish
authors
title of paper
journal
volume
first-last pages
year
Do you agree that these data can be made available in the database before they appear in print?
[ ] yes
[ ] no, they should be made available only after publication (estimated date: ______)
Does the sequence which you are sending with this form include data that does not appear in the above citation?
[ ] no
[ ] yes, from position ______ to ______ [ ] base pairs OR [ ] amino acid residues
(If your sequence contains 2 or more such spans, use the feature table in section IV to indicate their positions)
If so, how should these data be cited in the database?
[ ] published
[ ] in press
[ ] submitted
[ ] in preparation
[ ] no plans to publish
authors
address (if different from that given in section I)
title of paper
journal
volume
first-last pages
year
List references to papers and/or database entries which report sequences overlapping with that submitted here.
first author; journal, vol., pages, year and/or database, accession number
C2.1/11.89
III. DESCRIPTION OF SEQUENCED SEGMENT
Wherever possible, please use standard nomenclature or conventions. If a question is not applicable to your sequence, answer
by writing N.A.; if the information is relevant but not available, write a question mark (?).
What kind of molecule did you sequence? (check all boxes which apply)
[ ] genomic DNA
[ ] cDNA to mRNA
[ ] organelle DNA
[ ] tRNA
[ ] genomic RNA
[ ] cDNA to genomic RNA
[ ] organelle RNA
    please specify organelle
[ ] rRNA
[ ] virus
[ ] snRNA
[ ] provirus
[ ] scRNA
[ ] other nucleic acid (please specify)
[ ] peptide:
    [ ] sequence assembled by: [ ] overlap of sequenced fragments [ ] homology with related sequence [ ] other (please specify)
    [ ] partial: [ ] N-terminal or [ ] C-terminal or [ ] internal fragment
length of sequence: [ ] base pairs or [ ] amino acid residues
gene name(s) (e.g., lacZ)
gene product name(s) (e.g., beta-D-galactosidase)
Enzyme Commission number (e.g., EC 3.2.1.23)
gene product subunit structure (e.g., hemoglobin
The following items refer to the original source of the molecule you have sequenced.
organism (species) name (e.g., Escherichia coli; Mus musculus)
sub-species
strain (e.g., K12; BALB/c)
name/number of individual or isolate (e.g., patient 123; influenza virus A/PR/8#4)
developmental stage
haplotype
tissue type
cell type
[ ] germ line  [ ] rearranged
The following items refer to the immediate experimental source of the submitted sequence.
name of cell line (e.g., HeLa; 3T3-L1)
library (type; name)
clone(s)
The following items refer to the position of the submitted sequence in the genome.
chromosome (or segment) name/number
map position
units: [ ] genome % or [ ] nucleotide number or [ ] other
Using single words or short phrases, describe the properties of the sequence in terms of:
its associated phenotype(s);
the biological/enzymatic activity of its product;
the general functional classification of the gene and/or gene product;
macromolecules to which the gene product can bind (e.g., DNA, calcium, other proteins);
subcellular localization of the gene product;
any other relevant information.
Example (for viral erbB nucleotide sequence): transforming capacity; EGF receptor-related; tyrosine kinase; oncogene;
transmembrane protein.
C3.1/2.88
IV. FEATURES OF THE SEQUENCE
Please list below the types and locations of all significant features experimentally identified within the sequence. Be sure
that your sequence is numbered beginning with "1."
In the column marked    fill in
feature    type of feature (see information below)
from       number of first base/amino acid in the feature
to         number of last base/amino acid in the feature
bp         x, if your numbers refer to positions of base pairs in a nucleotide sequence
aa         x, if your numbers refer to positions of amino acid residues in a peptide sequence
id         method by which the feature was identified: E = experimentally; S = by similarity with known
           sequence or to an established consensus sequence; P = by similarity to some other pattern, such as
           an open reading frame
comp       x, if feature is located on the nucleic acid strand complementary to that reported here
Significant features include:
regulatory signals (e.g., promoters, attenuators, enhancers)
transcribed regions (e.g., mRNA, rRNA, tRNA) (indicate reading frame if start and stop codons are not present)
regions subject to post-transcriptional modification (e.g., introns, modified bases)
translated regions
extent of signal peptide, prepropeptide, propeptide, mature peptide
regions subject to post-translational modification (e.g., glycosylated or phosphorylated sites)
other domains/sites of interest (e.g., extracellular domain, DNA-binding domain, active site, inhibitory site)
sites involved in bonding (disulfide, thiolester, intrachain, interchain)
regions of protein secondary structure (e.g., alpha helix or beta sheet)
conflicts with sequence data reported by other authors
variations and polymorphisms
The first 2 lines of the table are filled in with examples.
If you think you will need more space than the table below provides, please photocopy this page before you fill it out.
Numbering for features on the sequence submitted here: [ ] matches paper  [ ] does not match paper
feature             from    to     bp    aa    id    comp
EXAMPLE TATA box    1       8
EXAMPLE exon 1      9       264
C4.1/2.88
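Authors who keep their feature list in a computer file may find it useful to check each row against the rules of this section before filling in the table. The Python sketch below is an illustration only: the dictionary keys simply mirror the column headings, the function name is hypothetical, and the "bp" and "id" values given to the two example rows are assumptions, since only their names and positions are taken from the examples in the table.

```python
ID_METHODS = {"E", "S", "P"}  # experimentally, by similarity, by pattern


def row_problems(row, sequence_length):
    """Return a list of problems found in one feature table row."""
    problems = []
    if row["from"] < 1:
        problems.append("numbering must begin with 1")
    if row["to"] < row["from"]:
        problems.append("'to' is smaller than 'from'")
    if row["to"] > sequence_length:
        problems.append("'to' lies beyond the end of the sequence")
    if row["bp"] == row["aa"]:
        problems.append("mark exactly one of bp and aa")
    if row["id"] not in ID_METHODS:
        problems.append("id must be E, S or P")
    return problems


# Names and positions follow the two example rows; bp and id are assumed.
rows = [
    {"feature": "TATA box", "from": 1, "to": 8, "bp": True, "aa": False, "id": "E", "comp": False},
    {"feature": "exon 1", "from": 9, "to": 264, "bp": True, "aa": False, "id": "E", "comp": False},
]
for row in rows:
    print(row["feature"], row_problems(row, sequence_length=264))
```

A row that yields an empty list is consistent with the column definitions given above; it says nothing about whether the feature itself is biologically correct.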