Data Validation
Latest revision as of 15:44, 10 November 2011
Introduction
This section contains information on validating data for various bioinformatics tasks.
OSUC Database
Invalid Locality Detection and Correction
Using the Data_Validator package, run the listInvalidLocalities procedure. It assigns coordinates to localities within US states that lack them, and it lists localities whose coordinates are invalid for their country.
US state-level locality correction begins by gathering all of the localities within each US state that lack coordinates. The midpoint coordinate for the state is then obtained from geonames and assigned to every locality within that state that has no coordinates.
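The state-level step can be sketched in Python as follows. This is only an illustration of the logic described above, not the Data_Validator code: the record layout (dictionaries with "name", "state", "lat", and "lon" keys), the function name, and the midpoint value in the usage note are all assumptions, and the midpoints are passed in as a plain mapping rather than fetched from geonames.

```python
from typing import Dict, List, Optional, Tuple

# (latitude, longitude) in decimal degrees
Coordinate = Tuple[float, float]


def fill_missing_coordinates(
    localities: List[Dict[str, Optional[object]]],
    state_midpoints: Dict[str, Coordinate],
) -> List[str]:
    """Assign the state midpoint to every locality that lacks coordinates.

    Localities are modified in place. Returns the names of the localities
    that were updated. The record layout is hypothetical.
    """
    updated = []
    for loc in localities:
        # A locality counts as "without coordinates" if either value is missing.
        if loc.get("lat") is None or loc.get("lon") is None:
            midpoint = state_midpoints.get(loc["state"])
            if midpoint is None:
                # No midpoint known for this state; leave the record untouched.
                continue
            loc["lat"], loc["lon"] = midpoint
            updated.append(loc["name"])
    return updated
```

For example, given a placeholder midpoint such as `{"Ohio": (40.25, -83.0)}` (an illustrative value, not geonames data), a locality in Ohio with no coordinates receives that pair, while localities that already have coordinates are left unchanged.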
Country-level invalid locality detection is accomplished by obtaining the bounding box of a particular country from geonames, then gathering all of the localities within that country. When the coordinate for a locality falls outside the country's bounding box, the locality name is displayed for manual correction.
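A minimal Python sketch of the bounding box test is given below. It assumes the box is expressed as south, north, west, and east bounds in decimal degrees and that localities are (name, latitude, longitude) tuples; the class and function names are hypothetical and do not come from the Data_Validator package. The antimeridian branch is an addition for countries whose box crosses 180 degrees longitude.

```python
from typing import Iterable, List, NamedTuple, Tuple


class BoundingBox(NamedTuple):
    """Country bounding box in decimal degrees (assumed layout)."""
    south: float
    north: float
    west: float
    east: float


def is_inside(box: BoundingBox, lat: float, lon: float) -> bool:
    """Return True if the coordinate lies within the box, edges included."""
    if not (box.south <= lat <= box.north):
        return False
    if box.west <= box.east:
        return box.west <= lon <= box.east
    # The box crosses the antimeridian, so the longitude range wraps around.
    return lon >= box.west or lon <= box.east


def list_invalid_localities(
    localities: Iterable[Tuple[str, float, float]],
    box: BoundingBox,
) -> List[str]:
    """Return names of localities whose coordinate falls outside the box."""
    return [name for name, lat, lon in localities if not is_inside(box, lat, lon)]
```

Because a bounding box is a rectangle around the whole country, this test only catches coordinates that are clearly out of range; a point inside the box can still lie outside the actual border, which is consistent with the flagged names being reviewed manually.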
Go to the schedule of past and future runs of invalid locality detection and correction.
OSUC Specimen Data Verification
Using the Data_Validator package, run the listCollectionComposition procedure to get a breakdown of the number of specimens in the collection by order. This composition dictates how many specimens of each order will be verified. After determining the number of specimens needed for an order, run the listRandomSpecimens procedure in the same package to generate a random list of specimens from that order. For each listed specimen, go to the taxon under which it was identified, then randomly select a unit tray for that taxon and a specimen within that tray. The specimen selected for checking will probably not be the one generated from the database. Note: select the unit tray and specimen randomly, not arbitrarily.
When the data on a physical specimen do not match the database, or the specimen is not in the database, the situation must be assessed. Sometimes a whole unit tray, or a taxon spanning multiple unit trays (multi-UT), may need to have its specimens re-transcribed and entered, but most often the error will be minor and involve a single mistranscribed specimen.
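The sampling logic can be illustrated with a short Python sketch. The text does not state how the composition is turned into per-order counts, so proportional allocation (with largest-remainder rounding) is assumed here purely for illustration; the function names, the order counts, and the tray layout are hypothetical and are not part of the Data_Validator package. The second function shows one way to make the tray and specimen choice genuinely random rather than arbitrary, by drawing from a random number generator.

```python
import random
from typing import Dict, List, Tuple


def allocate_sample(composition: Dict[str, int], total: int) -> Dict[str, int]:
    """Split `total` checks across orders in proportion to specimen counts.

    Uses largest-remainder rounding so the quotas always sum to `total`.
    Proportional allocation is an assumption, not the documented rule.
    """
    collection_size = sum(composition.values())
    exact = {order: total * n / collection_size for order, n in composition.items()}
    quotas = {order: int(share) for order, share in exact.items()}
    leftover = total - sum(quotas.values())
    # Hand the remaining checks to the orders with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda o: exact[o] - quotas[o], reverse=True)
    for order in by_remainder[:leftover]:
        quotas[order] += 1
    return quotas


def pick_tray_and_specimen(
    trays: Dict[str, List[str]], rng: random.Random
) -> Tuple[str, str]:
    """Randomly pick a non-empty unit tray, then a specimen within it."""
    non_empty = sorted(tray for tray, specimens in trays.items() if specimens)
    tray = rng.choice(non_empty)
    return tray, rng.choice(trays[tray])
```

With a made-up composition of 600, 300, and 100 specimens across three orders and 20 checks in total, `allocate_sample` assigns 12, 6, and 2 checks respectively. Counting positions in the tray and letting a generator pick the index is one way to avoid the bias that comes from reaching for whichever specimen is most convenient.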
Go to the schedule of past and future runs of OSUC specimen data verification.