xBio:D Roadmap

The Database

The core of the database is the information found on specimen labels: the place of collection, the time of collection, who did the collecting, how the specimen was collected, and the identification of the specimen. All of this is linked together by the unique identifier for each specimen (the collecting unit ID). That ID in turn links to information on where the specimen is deposited and to any images (or other media) of the specimen. The database also stores information on the published literature.
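As a rough sketch of how the collecting unit ID ties these pieces together, consider the following Python illustration; the field names here are hypothetical and do not reflect the actual database schema.

 from dataclasses import dataclass, field

 @dataclass
 class CollectingUnit:
     # Hypothetical illustration of the core record; the real
     # database schema differs.
     collecting_unit_id: str    # unique specimen identifier
     locality: str              # place of collection
     date_collected: str        # time of collection
     collected_by: str          # who did the collecting
     method: str                # how the specimen was collected
     identification: str        # current determination of the specimen
     depository: str = ""       # institution where the specimen is held
     media: list = field(default_factory=list)  # pointers to images/other media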


IPT

The data in the database are publicly available through our own portals. Additionally, the data are meant to be regularly harvested and cached by data aggregators, including the Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio), and the Symbiota Collections of Arthropods Network (SCAN). These aggregators connect to resources made available with the Integrated Publishing Toolkit (IPT), a Java program produced by GBIF. At weekly intervals the database produces a Darwin Core (DwC) file containing the information we are sharing, one file per resource. We have, or intend to have, a couple dozen such resources.
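As an illustration of what such an export involves, the following Python sketch writes a minimal tab-delimited Darwin Core occurrence file. The column headers are standard DwC terms, but the record and file name are invented examples, not the actual export code.

 import csv

 # Standard Darwin Core terms as column headers; the record below
 # is an invented example, not real data.
 fields = ["occurrenceID", "scientificName", "country",
           "eventDate", "recordedBy"]
 records = [{"occurrenceID": "OSUC 0000000",
             "scientificName": "Genus species",
             "country": "United States",
             "eventDate": "2001-07-15",
             "recordedBy": "A. Collector"}]

 with open("occurrence.txt", "w", newline="") as f:
     writer = csv.DictWriter(f, fieldnames=fields, delimiter="\t")
     writer.writeheader()
     writer.writerows(records)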


Specimage

The name of this app is pronounced "spess-ee-maj," a mashup of the words "specimen" and "image." Fundamentally, it is an image management system. It differs from similar commercially available programs in that the specimen seen in each image is linked to its collecting unit ID, which in turn provides access to all of the information in the core database associated with that specimen. Specimage also has an upload function for adding new images. During upload a thumbnail and a web-friendly JPG version of the original image are produced, and the user specifies the license under which the image may be distributed. The core database stores only pointers to the locations of the actual images.
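The derivative-generation step might look something like the following sketch, written here with the Pillow imaging library; the output sizes, quality setting, and file names are assumptions, not the actual Specimage code.

 from PIL import Image

 def make_derivatives(original_path):
     """Produce a thumbnail and a web-friendly JPG from an original
     image. Sizes and naming here are illustrative assumptions."""
     img = Image.open(original_path).convert("RGB")

     # Web-friendly version: cap the longest side at 1024 px.
     web = img.copy()
     web.thumbnail((1024, 1024))
     web.save("web.jpg", "JPEG", quality=85)

     # Thumbnail: cap the longest side at 160 px.
     thumb = img.copy()
     thumb.thumbnail((160, 160))
     thumb.save("thumb.jpg", "JPEG", quality=85)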


HOL

HOL - Hymenoptera On Line - is intended as a generic portal to the data we have. Text entered into the search box is interpreted in as many ways as possible: as a specimen ID, as the name of an organism, as a place name, as a person's name, and so on. The results of these interpretations are presented as a series of tabs, each containing expandable sections for predefined categories of information. Wildcards (% and _) are accepted in text input. Most of the information is live, i.e., extracted directly from the database and therefore as current as possible. Some summary information, however, is collated weekly and so may be slightly out of date.
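Because % and _ are the same wildcard characters that SQL LIKE uses, search input can pass through to a query more or less directly. The following sketch illustrates the idea; the table, columns, and sample data are hypothetical.

 import sqlite3

 # Hypothetical in-memory table standing in for the real database.
 conn = sqlite3.connect(":memory:")
 conn.execute("CREATE TABLE taxon_names (name TEXT)")
 conn.executemany("INSERT INTO taxon_names VALUES (?)",
                  [("Scelio floridanus",), ("Scelio rufus",),
                   ("Apis mellifera",)])

 def search_names(text):
     # % and _ in the user's input act as SQL LIKE wildcards.
     cur = conn.execute(
         "SELECT name FROM taxon_names WHERE name LIKE ?", (text,))
     return [row[0] for row in cur.fetchall()]

 print(search_names("Scelio%"))        # matches both Scelio entries
 print(search_names("Apis mell_fera")) # _ matches a single character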

OJ_Break API

OJ_Break is the name of the API used to interact with many (ultimately all!) of the web-based data portals. OJ_Break returns JSON objects that are then parsed and formatted for display by JavaScript code in the web page.
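The general request-and-parse pattern might look like the following Python sketch; the endpoint URL, method name, and parameters are hypothetical placeholders, not the real OJ_Break interface.

 import json
 import urllib.parse
 import urllib.request

 # The endpoint and method name below are hypothetical, for
 # illustration only; consult the OJ_Break documentation for the
 # real ones.
 def get_specimen(specimen_id):
     url = ("https://example.osu.edu/api/getSpecimenInfo?id="
            + urllib.parse.quote(specimen_id))
     with urllib.request.urlopen(url) as resp:
         return json.loads(resp.read().decode("utf-8"))

 # The returned JSON object can then be formatted for display,
 # e.g. data = get_specimen("OSUC 0000000")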

bioguid.osu.edu

Biodiversity Information Standards (TDWG) has set up and, as far as we know, maintains a vocabulary for the basic kinds of information that we share. In some of the data portals (e.g., HNS) we have the option of delivering the information in RDF. The domain bioguid.osu.edu is intended as a resolution mechanism for the (hopefully) globally unique identifiers that we support. In its original formulation, TDWG recommended the use of Life Science Identifiers (LSIDs) as the format for these identifiers. The community has now largely abandoned that format in favor of stable URLs, so the resolver software should be able to handle both. Unfortunately, bioguid.osu.edu presently appears to be offline.
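A resolver that accepts both formats might be sketched as follows; the LSID components and redirect target shown are illustrative assumptions, not the actual bioguid.osu.edu configuration.

 def resolve(identifier):
     """Sketch of a resolver accepting both LSIDs and stable URLs.
     The URL it builds is a hypothetical target."""
     if identifier.startswith("urn:lsid:"):
         # LSID form: urn:lsid:authority:namespace:object
         _, _, authority, namespace, obj = identifier.split(":", 4)
         return "https://example.osu.edu/%s/%s" % (namespace, obj)
     if identifier.startswith(("http://", "https://")):
         return identifier  # stable URLs resolve as-is
     raise ValueError("Unrecognized identifier format: " + identifier)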

HNS

HNS is the Hymenoptera Name Server. Its function is to provide the basic information associated with a taxonomic name. The code for this portal is compiled and stored within the database itself; it does not use the OJ_Break API.

osuc-mgr

The osuc-mgr (database manager) is a set of forms used to enter or edit information in the database. Access is protected by a username/password combination, and different roles are specified for different users. It uses the OJ_Break API. The app is designed for entering or editing individual pieces of information; for batch input of specimen information, see the description of DEA below. Information in the database may be edited in osuc-mgr, but never deleted outright.

DEA

DEA, the Digital Entry Assistant, is intended as a means of batch uploading specimen collecting data. Users first transcribe specimen data into an Excel spreadsheet (a template is provided). Each row in the spreadsheet is a different specimen, and the columns are the attributes to be associated with that specimen. The spreadsheet is then used as the input to DEA, a set of Python scripts built on the Django framework. Internally, a MySQL database stores information about data set uploads, their status, etc. DEA parses the input and checks whether the individual pieces of attribute data (such as the country in which a specimen was collected) are already in the database. If not, the user is prompted to enter the missing pieces (via osuc-mgr). When all instances of missing data are resolved, DEA manages the upload and proper storage of all of the information.
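A much-simplified sketch of the checking step might look like the following; it reads a CSV export of the spreadsheet and uses a hard-coded set of known values in place of a real database query, and the column name is illustrative.

 import csv

 # Stand-in for a database query: the set of country names already
 # present in the database. In DEA this check runs against the real
 # database.
 KNOWN_COUNTRIES = {"United States", "Canada", "Mexico"}

 def find_missing_countries(path):
     """Report country values in the transcription file that are not
     yet in the database; the user resolves these via osuc-mgr."""
     missing = set()
     with open(path, newline="") as f:
         for row in csv.DictReader(f):
             if row["country"] not in KNOWN_COUNTRIES:
                 missing.add(row["country"])
     return missing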