Data Transcription Procedures
Introduction
This section describes the practices for transcribing data from specimen labels into a Microsoft Excel worksheet using the data entry template (File:Data Entry Template 08-Mar-2013.xls).
Procedures
Sort Specimens
Sort the specimens by collecting event (Locality, Date, Collecting Method, Collectors) for a taxon (scientific group). Do not mix up the taxa. If the specimen labels appear to be entirely heterogeneous, this step may be skipped.
Barcode Specimens
Apply the unique identifier (barcode) to the bottom of the specimen, taking care that the pin avoids the encoded barcode area of the media. Some pins have a blunted tip, which makes it difficult to penetrate the barcode media. In this case, use a loose pin with a sharp tip to carefully pierce, but not fully penetrate, the media. The slight piercing of the barcode media will avoid potential "spinning" of the label in the future.
If you come upon a misprinted barcode, whether the encoded part is smudged or a digit is unreadable, do not use it! The barcode uniquely and unambiguously identifies the specimen. If the barcode number is not correct, the database reference for that specimen will not be correct, which causes serious problems if not rectified quickly. Copying the correct barcode number is extremely important!
Save New File
Open the data entry template and save the file according to this standard: [most specific taxonomic name for all specimens in drawer]_[collection or location that the specimens belong to; empty if OSUC specimens]_[the date for today].xls
Examples:
- Scelio_AEIC_22-Mar-2007.xls
- Misc_Scelionidae_Colombia_17-Feb-2007.xls
- Misc_Coleoptera_06-Apr-2007.xls
Each drawer or box of specimens should be entered in a single file. Do not overwrite the data entry template file and always begin from the data entry template.
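The naming standard above can be sketched in a few lines of Python. This is only an illustration: the function name `data_entry_filename` is hypothetical, and replacing spaces with underscores is an assumption inferred from the examples (e.g. 'Misc_Scelionidae'). Month abbreviations are hard-coded so the result does not depend on the computer's locale.

```python
from datetime import date

# English month abbreviations, hard-coded to avoid locale-dependent output.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def data_entry_filename(taxon, collection="", day=None):
    """Build '[taxon]_[collection]_[dd-Mon-yyyy].xls' (hypothetical helper).

    The collection part is left out when it is empty (OSUC specimens).
    Assumption: spaces become underscores, as in the examples above.
    """
    day = day or date.today()
    stamp = f"{day.day:02d}-{MONTHS[day.month - 1]}-{day.year}"
    parts = [taxon.strip(), collection.strip(), stamp]
    return "_".join(p.replace(" ", "_") for p in parts if p) + ".xls"
```

For example, `data_entry_filename("Scelio", "AEIC", date(2007, 3, 22))` reproduces the first example filename in the list above.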
Transcribe Data
Transcribe the specimen labels into the Raw Data worksheet of the spreadsheet. Each label on the specimen corresponds to one of the numbered label columns in the worksheet, with the top-most label being label1. Copy all of the label data as is, including any misspellings, incorrect information, or apparently extraneous characters. The goal of transcribing specimen labels is to produce as close to an electronic reproduction of the labels as possible. Any additional information about the specimen that does not have its own column, such as condition or preservation medium, should be added to the comments column. Any supplementary specimen identification information should also be included in the comments column to avoid losing useful information.
Columns that are not needed for transcription (e.g. type, life_status) can be removed, with a few exceptions, but new columns cannot be added. If a column is inadvertently removed, it can be re-added but must bear exactly the same column name. Refer to the data entry template for the original column names. The following columns must be present in every data entry spreadsheet: date, name, comments, at least one identification column, and all of the label columns. Do not remove these columns.
See data entry template information and additional transcription notes.
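The column rules above lend themselves to an automated check of a worksheet's header row. The sketch below is a hypothetical helper, not part of the template: it assumes eight label columns named label1 to label8 (from the 'Label1-8' row of the template table), treats the seven taxonomic rank columns as the identification columns, and compares names case-insensitively.

```python
# Identification columns listed in the data entry template (assumed complete).
ID_COLUMNS = {"order", "superfamily", "family", "subfamily",
              "genus", "species", "subspecies"}
ALWAYS_REQUIRED = ["date", "name", "comments"]
LABEL_COLUMNS = [f"label{i}" for i in range(1, 9)]  # label1 .. label8


def check_header(header, template_header):
    """Return a list of problems found in a worksheet header row."""
    names = [h.strip().lower() for h in header]
    allowed = {h.strip().lower() for h in template_header}
    problems = []
    # Columns that must never be removed.
    for col in ALWAYS_REQUIRED + LABEL_COLUMNS:
        if col not in names:
            problems.append(f"missing required column: {col}")
    # At least one identification column must remain.
    if not ID_COLUMNS.intersection(names):
        problems.append("missing identification column")
    # New columns cannot be added.
    for col in names:
        if col not in allowed:
            problems.append(f"column not in template: {col}")
    return problems
```

An empty list means the header satisfies the rules stated in this section; it does not check the contents of any cell.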
Specimen Container Marking
When transcription of all specimens is completed, place a Post-it Note or another form of adhered stationery on the finished drawer or box. Finding a specimen post-transcription is far easier when the specimen container is marked after specimen transcription. There will always be transcription mistakes, so finding offending specimens quickly and easily is essential.
Mark the specimen container in the following format:
- Into Excel: [date finished]
- By: [your name]
- Filename: [data entry filename]
Example:
- Into Excel: 18-Apr-2007
- By: Joe
- Filename: Misc_Scelionidae_USNM_15-Apr-2007.xls
Data Entry Template Information
Column Name | Description | Required |
---|---|---|
date | Date the record was recorded | yes |
name | Name of the person who recorded the record | yes |
order | Order in which the record is identified | no; unless order is the most specific identification |
superfamily | Superfamily in which the record is identified | no; unless superfamily is the most specific identification |
family | Family in which the record is identified | yes; unless the most specific identification is broader than family (only to order or superfamily) |
subfamily | Subfamily in which the record is identified | no; unless subfamily is the most specific identification |
genus | Genus in which the record is identified | sometimes; required for genus, species, and subspecies-level identifications |
species | Specific epithet in which the record is identified | sometimes; required for species- and subspecies-level identifications |
subspecies | Subspecific epithet in which the record is identified | sometimes; required for subspecies-level identifications |
cuid | Unique specimen identifier (aka collecting unit identifier) for a record usually in the form 'collection code' 'specimen number' (e.g. 'OSUC 213455') | yes |
alt_ids | Alternative specimen identifier for a record. Unlike a cuid, these identifiers do not need to be globally unique but must be used specifically for the record and not a "lot" or group of specimens (e.g. '145a Mitch', 'DNA Specimen #12'). Multiple identifiers must be separated by a semicolon then a space (e.g. '745 platy; Browner #2'). | no |
inst | Institution code of the depository for the specimen or lot | no; default: OSUC |
type | Type status for the record (e.g. 'Holotype', 'Syntype', 'Paratype') | no; default: non-type |
life_status | Life stage for the record (e.g. 'nymph', 'larva', 'egg') | no; default: adult |
preparations | The storage vessel (preparation type) and contents contained within the vessel (preparation contents) are collectively considered to be a preparation (see this Darwin Core term for additional info). Each preparation in the preparations column must be separated by a semicolon. A preparation can be a preparation type alone (e.g. 'slide', 'slide; jar'), a pipe (|) separated list of number of preparation type items, preparation type, then preparation contents (e.g. '1|jar|ethanol', '|slide|skin; 2|box|partial skeleton'), or a mix of both (e.g. 'slide; 2|jar|ethanol'). When not specified, number of preparation type items defaults to 1. See Preparation Terms for available preparation types and contents. When specimen groups are defined in spm_num, this column is ignored | no |
spm_num | Number of specimens associated with the record. When more than one specimen is present and the specimens differ in sex, this column contains the number of specimens of each sex, with the entries separated by a space (e.g. '10-F 5-M 1-U'). When a record needs a varied set of specimen counts, sexes, life stages, and preparations, this column contains granular specimen groups, each formatted according to the column names as follows: spm_num|spm_sex|life_status[preparations]. In this format, the spm_num part must be numeric or blank, and if preparations are not needed for a specimen group, the brackets can be omitted. Multiple specimen groups must be separated by a semicolon. Examples of specimen groups within the spm_num column: '|F|larva;2||nymph[slide;1|jar|ethanol]', '1|M|adult[vial];2|U|egg[vial;slide]', '1|U|deutonymph;2|F|adult;1|M|adult' | no; default: 1 |
spm_sex | Specimen sex (M = male, F = female, U = unsexed) for all of the specimens associated with the record. When multiple sexes or specimen groups are defined in spm_num, this column is ignored | no; default: U |
comments | Comments on the condition of the specimen, additional information not present on the record labels, or any other information that is desired to be associated with the record | yes; may be blank |
determiner | Determiner of the record with last name first, a comma, then his/her initials (e.g. 'Johnson, N F') | no |
det_date | Determination date of the record (year only) | no; default: 'current year' |
Label1-8 | Verbatim label data attached to the medium containing the specimen or lot | yes; may be blank |
new_comments | Excel formula for concatenating the label data for database entry. Ignore this column! | yes; DO NOT modify this column |
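
The preparations and spm_num formats described in the table above are compact, so a small parser may help when checking entries. The following is a minimal sketch, not the parser used by the database: the function names are hypothetical, and filling a blank group count, sex, or life stage with the column defaults (1, U, adult) is an assumption. Note that semicolons inside square brackets separate preparations, not specimen groups.

```python
def parse_preparations(text):
    """Parse e.g. 'slide; 2|jar|ethanol' into (count, type, contents) tuples."""
    preps = []
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        if "|" in item:
            # Expects exactly count|type|contents; anything else raises ValueError.
            count, ptype, contents = (p.strip() for p in item.split("|"))
            preps.append((int(count) if count else 1, ptype, contents))
        else:
            preps.append((1, item, ""))  # preparation type alone
    return preps


def split_groups(text):
    """Split on semicolons that are not inside [...] brackets."""
    groups, current, depth = [], "", 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == ";" and depth == 0:
            groups.append(current)
            current = ""
        else:
            current += ch
    groups.append(current)
    return [g.strip() for g in groups if g.strip()]


def parse_spm_num(text):
    """Return one dict per specimen group found in an spm_num cell."""
    text = text.strip()
    result = []
    if "|" not in text:
        # Simple forms: blank (default 1), '3', or '10-F 5-M 1-U'.
        # sex/life_status of None means "take it from the row's own column".
        for token in text.split() or ["1"]:
            count, _, sex = token.partition("-")
            result.append({"count": int(count), "sex": sex or None,
                           "life_status": None, "preparations": []})
        return result
    for grp in split_groups(text):
        head, _, rest = grp.partition("[")
        count, sex, life = (p.strip() for p in head.split("|"))
        result.append({
            "count": int(count) if count else 1,   # assumed default
            "sex": sex or "U",                     # assumed default
            "life_status": life or "adult",        # assumed default
            "preparations": parse_preparations(rest.rstrip("]")),
        })
    return result
```

Malformed entries (for example a non-numeric count) raise `ValueError`, which is a simple way to surface cells that need a second look.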
DEA Processed Specimen Data Template Information
Main worksheet
Column Name | Description | Required |
---|---|---|
date | Date the record was recorded | yes |
name | Name of the person who recorded the record | yes |
cuid | Unique specimen identifier (aka collecting unit identifier) for a record usually in the form 'collection code' 'specimen number' (e.g. 'OSUC 213455') | yes |
alt_ids | Alternative specimen identifier for a record. Unlike a cuid, these identifiers do not need to be globally unique but must be used specifically for the record and not a "lot" or group of specimens (e.g. '145a Mitch', 'DNA Specimen #12'). Multiple identifiers must be separated by a semicolon then a space (e.g. '745 platy; Browner #2'). | no |
inst | Institution code of the depository for the specimen or lot | yes |
type | Type status for the record (e.g. 'Holotype', 'Syntype', 'Paratype') | no; default: non-type |
spm_num | see spm_num notes above under Data Entry Template Information. | no; default: 1 |
spm_sex | Specimen sex (M = male, F = female, U = unsexed) for all of the specimens associated with the record. When multiple sexes or specimen groups are defined in spm_num, this column is ignored | no; default: U |
life_status | Life stage for the record (e.g. 'nymph', 'larva', 'egg'). When specimen groups are defined in spm_num, this column is ignored | no; default: adult |
preparations | see preparations notes above under Data Entry Template Information. | no |
new_comments | Comments on the condition of the specimen, additional information not present on the record labels, or any other information that is desired to be associated with the record. This column also contains the verbatim specimen labels with the transliteration of each label enclosed within square brackets ([...]) and separated by a space | yes; may be blank |
family | Family in which the record is identified | yes; identifications to a rank higher than order must be replaced with the taxon ID in the format 'id=tnid' |
subfamily | Subfamily in which the record is identified | no; unless subfamily is the most specific identification |
genus | Genus in which the record is identified | sometimes; required for genus, species, and subspecies-level identifications |
species | Specific epithet in which the record is identified | sometimes; required for species- and subspecies-level identifications |
subspecies | Subspecific epithet in which the record is identified | sometimes; required for subspecies-level identifications |
determiner | Determiner of the record with last name first, a comma, then his/her initials (e.g. 'Johnson, N F'). If complete given names are known, they may be included (e.g. 'Johnson, Norman F') | no |
det_date | Determination date of the record (year only) | no; default: 'current year' |
stated_loc & stated_date | These columns are present for legacy reasons. Ignore these columns | no; deprecated |
loc_name | A locality name that exactly matches an existing locality name within the database | yes |
coll_date | The specific date (day, month and year) a specimen was collected (in Excel date format). 'start_date', 'end_date', and 'period' must be blank | yes |
start_date | The specific starting date (day, month and year) a specimen was collected within a range (in Excel date format). 'coll_date' and 'period' must be blank; 'end_date' must be present and later than 'start_date' | yes |
end_date | The specific ending date (day, month and year) a specimen was collected within a range (in Excel date format). 'coll_date' and 'period' must be blank; 'start_date' must be present and earlier than 'end_date' | yes |
period | The non-specific date (missing a day, month or year) a specimen was collected (in text format). 'coll_date', 'start_date', and 'end_date' must be blank | yes |
coll_method | The collecting method used to collect the specimen (e.g. 'by hand', 'trap', etc.) | yes |
collector1 | The first collector of the specimen with last name first, a comma, then his/her initials (e.g. 'Johnson, N F'). If complete given names are known, they may be included (e.g. 'Johnson, Norman F') | |
collector2 | The second collector of the specimen with last name first, a comma, then his/her initials (e.g. 'Johnson, N F'). If complete given names are known, they may be included (e.g. 'Johnson, Norman F') | |
collector3 | The third collector of the specimen with last name first, a comma, then his/her initials (e.g. 'Johnson, N F'). If complete given names are known, they may be included (e.g. 'Johnson, Norman F'). If more than three collectors were given, use 'et al.' as the third collector and include the rest of the collecting party within the 'new_comments' column | |
field_code | The field code, field number, or other term used to define the code that is used to identify a given collecting event (e.g. 'MA-02-18A-10', 'BJZ 2007SEP1a', etc.) | yes |
habitat | The habitat in which a specimen was collected. Habitat should omit distinct, definite biological associations for a particular organism which should go into the associations part of this worksheet | yes |
assoc_cuid | Unique specimen identifier (aka collecting unit identifier) for the vouchered biological association usually in the form 'collection code' 'specimen number' (e.g. 'OSUC 213455'). Not required for unvouchered associations | yes |
assoc_inst | Institution code of the depository for the vouchered biological association. Not required for unvouchered associations | yes |
association_type | The type of biological association between the record and the association (e.g. 'collected on / collecting site of') | yes |
assoc_order | Order in which the biological association is identified | yes |
assoc_family | Family in which the biological association is identified | yes |
assoc_genus | Genus in which the biological association is identified | yes |
assoc_species | Species in which the biological association is identified | yes |
assoc2_cuid | Unique specimen identifier (aka collecting unit identifier) for the second vouchered biological association usually in the form 'collection code' 'specimen number' (e.g. 'OSUC 213455'). Not required for unvouchered associations | yes |
assoc2_inst | Institution code of the depository for the second vouchered biological association. Not required for unvouchered associations | yes |
association2_type | The type of biological association between the record and the second association (e.g. 'collected on / collecting site of') | yes |
assoc2_order | Order in which the second biological association is identified | yes |
assoc2_family | Family in which the second biological association is identified | yes |
assoc2_genus | Genus in which the second biological association is identified | yes |
assoc2_species | Species in which the second biological association is identified | yes |
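
Two rules in the Main worksheet table are easy to get wrong by hand: the four collecting date columns are mutually exclusive, and a fourth or later collector is replaced by 'et al.'. The sketch below illustrates both under stated assumptions: the function names are hypothetical, dates are passed as Python `date` objects rather than Excel cells, and a blank cell is represented by `None` or an empty string.

```python
from datetime import date


def check_collecting_dates(coll_date=None, start_date=None,
                           end_date=None, period=None):
    """Return a list of rule violations for one row's date columns."""
    problems = []
    has_range = start_date is not None or end_date is not None
    used = [coll_date is not None, has_range, bool(period)]
    if sum(used) > 1:
        problems.append("use only one of coll_date, start_date/end_date, or period")
    if has_range:
        if start_date is None or end_date is None:
            problems.append("start_date and end_date must both be present")
        elif end_date <= start_date:
            problems.append("end_date must be later than start_date")
    return problems


def assign_collectors(collectors):
    """Return (collector1, collector2, collector3, overflow).

    With more than three collectors, collector3 becomes 'et al.' and the
    remaining names are returned so they can be added to new_comments.
    """
    collectors = list(collectors)
    if len(collectors) <= 3:
        padded = collectors + [""] * (3 - len(collectors))
        return (padded[0], padded[1], padded[2], [])
    return (collectors[0], collectors[1], "et al.", collectors[2:])
```

Both functions only report or rearrange values; deciding how to correct a row is left to the person doing the data entry.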
Transcription Notes
- If a specimen has an ambiguous identification, use the parent taxon above the uncertain taxon. Examples: Paratelenomus / Telenomus => Telenominae; ? Maamingidae => Diaprioidea; nr. Parascelio => Scelioninae; Trissolcus cf. basalis => Trissolcus; Telenomus n.sp. => Telenomus
- A determination label at the top left of the unit tray applies to all of the specimens subsequent to that label unless a noticeable demarcation is present. This means that the determiner and the determination date will apply for all of the specimens as well as the taxonomic name.
- If an ambiguous identification is used as defined above, put the determination in the comments column. The exception to this rule is for a specimen with a determination label attached to it. Since the determination label will be transcribed along with the rest of the labels, only the first specimen will be missing the determination in the comments for a determined series. Example:
genus | species | ... | comments | ... | label2 |
---|---|---|---|---|---|
Breviscelio | | ... | | ... | Breviscelio n.sp., NF Johnson, 2007 |
Breviscelio | | ... | Breviscelio n.sp. | ... | |
- After every few specimens that have different labels, go back and re-read the label data to check for obvious transcription mistakes. If you have any doubts about the accuracy of a transcription, go back to the specimen to ensure that the copied label data is correct. Quality always takes priority over quantity.
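The notes on ambiguous identifications can be partly mechanized by flagging determinations that contain an uncertainty marker. This is a rough sketch: the marker list is taken only from the examples in the first note and is certainly not exhaustive, the function names are hypothetical, and choosing the correct parent taxon still requires a person who knows the classification.

```python
import re

# Markers drawn from the examples above ('/', '?', 'nr.', 'cf.', 'n.sp.').
# Assumption: this list is illustrative, not a complete set of qualifiers.
AMBIGUITY_MARKERS = re.compile(r"(\?|/|\bnr\.|\bcf\.|\bn\.\s?sp\.)")


def is_ambiguous(determination):
    """True when the determination contains an uncertainty marker."""
    return bool(AMBIGUITY_MARKERS.search(determination))


def comment_for(determination, has_det_label):
    """Text to put in the comments column for this specimen.

    The determination is copied to comments only when it is ambiguous and
    the specimen carries no determination label of its own, because an
    attached label is transcribed with the other labels anyway.
    """
    if is_ambiguous(determination) and not has_det_label:
        return determination
    return ""
```

For the Breviscelio series in the example table, the first specimen (which carries the determination label) gets an empty comment, and the following specimens get 'Breviscelio n.sp.'.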