XBio:D Data Management

From xBio:D Wiki
Revision as of 16:10, 18 September 2015 by Nfj (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search


xBio:D Data Management

This section provides instructions on performing common data management tasks on the xBio:D database that are not privy to general users. These tasks are not available through web applications and must either be performed by running a script or a database procedure.


Change Occurrence (Specimen) Record CUID

In the Oracle SQL Developer, turn DBMS_OUTPUT on, then run the procedure EXEC updateSpmID(old_cuid, new_cuid);. The procedure will only produce output when a problem occurred.


Remove Occurrence (Specimen) Record

In the Oracle SQL Developer, turn DBMS_OUTPUT on, then run the procedure EXEC removeSpm(cuid);. The procedure will only produce output when a problem occurred.


Merged/Remove Locality Name

In the Oracle SQL Developer, turn DBMS_OUTPUT on, then run the procedure EXEC change_locality(obsolete_loc_name, valid_loc_name);. The procedure removes the obsolete locality and places all of the occurrence records from the obsolete locality into the valid locality name. There is also an additional, optional parameter coll_date, which accepts the format DD-MON-YYYY, that will change the functioning of this procedure to change only those occurrence records that were collected on that particular date. If all of the occurrences were collected on that day, the locality is removed. The procedure will only produce output when a problem occurred.


Add Collection Records or Taxon Checklist as a Project to the xBio:D IPT

  1. Log into the database server (osuc.biosci.ohio-state.edu) via SSH and open DwC-A collection export script, export_dwca_collections.py. This script is set to run every Sunday at 8PM.
    1. ssh [username]@osuc.biosci.ohio-state.edu
    2. sudo nano /var/www/cgi-bin/export_dwca_collections.py
  2. Go to the handleDwCAExport function and add the taxon or collection you would like to export.
    1. The exportDwCA function of the OJ_Exporter package takes the collection Storage_Regime_ID as the identifier.
    2. The exportTaxonDwCA function of the OJ_Exporter package takes the taxon tnuid as the identifier.
  3. Execute the export_dwca_collections.py script immediately or wait until after the weekly export execution
  4. Download the DwC-A file created for the new collection or checklist
    1. Collection records example: http://osuc.biosci.ohio-state.edu/data_exports/dwca_1.zip
    2. Taxon checklist example: http://osuc.biosci.ohio-state.edu/data_exports/dwca_tax_195000.zip
  5. Create project in xBio:D IPT for the data you will be disseminating.
    1. Log into IPT at (xbiod.osu.edu/ipt/) and go to Manage Resources
    2. Scroll to the bottom to Create New Resource, specify a short name and resource type, then upload the previously exported file and create resource
    3. Fill in the metadata section of the resource with the appropriate values for provider, owner, etc.
  6. Log into the hymfiles server (hymfiles.biosci.ohio-state.edu) via Remote Desktop and open IPT DwC-A import script, populate_ipt.py. The Python script is located in C:\backup_scripts.
  7. Go to the handlePopulateIPT function and add the taxon or collection you would like to export.
    1. The numeric identifiers are the same as defined in the export step above
    2. The second parameter for the populate script is the resource short name specified above.
  8. Register and publish the new project. The resource will now be available on GBIF and can be shared with other data aggregators like iDigBio, VertNet, etc.


Specialized Scripts for DB processing Tasks

These scripts include DOI lookup for xBio:D publication, processing the gender of people, and uploading of PDFs into Zenodo. The scripts are located in /scripts on the Database server.


Aggregation Update

Many DB queries use tables that are aggregated from many separate tables to speed up processing. These aggregation tasks are performed automatically every Saturday under the DB job named aggregation_update. These tasks include the following:

  • processDescriberIDs - Goes through each taxonomic concept that does not already have a describer specified within the Protonym table and attempts to infer from an original description or replacement name literature annotation the author of the taxon. If a match is found, then the individual describer or describer party is added to the Protonym table.
  • update_people_stats - Builds the AGGR_Person_Stats table that contains the # of occurrences collected, # of papers published, # of determinations made, # of distinct events collected, and the total # of contributions for each person.
  • update_genera_stats - Builds the AGGR_Genus_Stats table that contains the # of occurrences determined, # of papers published, # of images taken, # of collections deposited, and distinct events collected for each tnuid.
  • update_place_children - Builds the AGGR_Place_Children table that contains a place along with each subordinate place at any lower position within the geopolitical hierarchy.
  • update_taxon_children - Builds the AGGR_Taxon_Children table that contains a tnuid along with each subordinate tnuid at any lower position within the taxonomic hierarchy. Species-group taxa are excluded as parent taxa within this table.
  • update_fossil_taxa - Builds the AGGR_Fossil_Taxa table that contains a tcid for each taxon concept that is a fossil. Fossil status is only applied to species-group taxa, so this procedure also added taxa in which all of its subordinate taxa are fossils.
  • update_recordings - Builds the AGGR_Taxon_Recordings table that contains a tnuid and recording info id for any recording that is contained within the taxon.
  • update_images - Builds the AGGR_Taxon_Images table that contains a tnuid and image id for any specimen or taxon image that is contained within the taxon. Species-group taxa are excluded as parent taxa within this table.
  • update_habitats - Builds the AGGR_Habitat table that contains a tnuid and habitat for any occurrence that is contained within the taxon. Species-group taxa are excluded as parent taxa within this table.
  • update_taxon_contributors - Builds the AGGR_Taxon_Contribs table that contains a tnuid, contributor, and contribution flags for every contribution from the contributor within the taxon. Species-group taxa are excluded as parent taxa within this table.
  • update_inst_tcids - Builds the AGGR_Inst_Tcids table that contains the tcids for every collection where an occurrence is present. Species-group taxa are excluded as parent taxa within this table.
  • update_associations - Builds the AGGR_Taxon_Assocs table that contains a tnuid and associated tnuid for every taxon that is contained within the taxon. Species-group taxa are excluded as parent taxa within this table.