Do you know how content from the collections actually reaches the
Europeana portal, where it can then be showcased, for example as one of our content highlights?
Today we would like to explain the OpenUp! technical architecture and
content workflow.
One of the main tasks in the
OpenUp! project is to harvest the standardised metadata of multimedia objects
of natural history data providers and to transform this data into the Europeana
schema. The transformed data is aggregated in the OpenUp! Metadata Database of
the Europeana Natural History Aggregator established by the OpenUp! project and
subsequently handed over to Europeana (Berendsohn &
Güntsch 2012).
Data or metadata?
We need to explain how the terms
‘data’ and ‘metadata’ are used in the OpenUp! project. Metadata usually refers
to the technical data of a multimedia object, e.g. aperture, camera type, etc.
In OpenUp!, however, natural history domain data (information about the
physical object) is included in the metadata for multimedia objects. Europeana
likewise uses the term ‘metadata’ for domain data (= records) related to the
physical object, and features this associated metadata along with the digital
object. This metadata is distributed under the CC zero (CC0) licence in
Europeana – under the full control of the provider. Only a minimum set of
mandatory concepts is required for Europeana (Fig. 1).
Fig.1) Green squares display the metadata provided to Europeana under CC0 (no copyright). The red square shows the data which is licensed under the CC framework, including the thumbnail (preview) and the source object (indicated by the link View). The licence used is indicated by an icon (Creative Commons licences).
The OpenUp! architecture is
divided into two integral parts. The first
part addresses the data provision, including the set-up of the BioCASe Provider
Software and the mapping to the domain standard ABCD (Access to Biological
Collections Data) and its extension EFG (Extension for Geosciences). The second
part is the Europeana Natural History Aggregator, which builds the OpenUp!
Metadata Database, handles the transformation of the domain standard ABCD (EFG)
to the Europeana standard ESE (Europeana Semantic Elements) and enables the
harvest by Europeana.
The overall OpenUp! to Europeana
(technical aggregation) workflow consists of seven major steps that are
visualized in the graphic (Fig. 2) and described below.
Workflow Description:
Content provider and coordination
(Steps 1–3):
The technical set-up for data
provision in OpenUp! can also be used, and is used, to provide data to the GBIF
network.
Step1: Domain standard ABCD and its extension EFG
As the first step, the provider's
metadata associated with the multimedia objects (collection data) is mapped to the
ABCD domain standard (zoology and botany) and its extension EFG (palaeontology,
mineralogy and anthropology). The mapping to the ABCD standard is carried out
using the BioCASe Provider Software. The BioCASe Provider Software then
serves as a web interface that provides the data for harvesting.
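The idea behind the mapping step can be illustrated with a small sketch. In
practice the mapping is configured inside the BioCASe Provider Software, not
written by hand; the Python below only shows the principle. The provider column
names (`catalogue_no`, `sci_name`, `image_url`) are hypothetical, and the concept
paths are written in the style of ABCD and should be checked against the ABCD
documentation before use.

```python
# Hypothetical provider columns mapped to ABCD-style concept paths.
# This is an illustrative subset, not the OpenUp! mapping itself.
UNIT = "/DataSets/DataSet/Units/Unit"
COLUMN_TO_ABCD = {
    "catalogue_no": UNIT + "/UnitID",
    "sci_name": UNIT + "/Identifications/Identification/Result"
                       "/TaxonIdentified/ScientificName/FullScientificNameString",
    "image_url": UNIT + "/MultiMediaObjects/MultiMediaObject/FileURI",
}


def map_row_to_abcd(row):
    """Return {concept_path: value} for every mapped, non-empty column."""
    mapped = {}
    for column, path in COLUMN_TO_ABCD.items():
        value = row.get(column)
        if value not in (None, ""):
            mapped[path] = value
    return mapped


# Example collection record; the unmapped "shelf" column is simply ignored.
example_row = {
    "catalogue_no": "B 10 0000001",
    "sci_name": "Bellis perennis L.",
    "image_url": "http://example.org/img/1.jpg",
    "shelf": "A3",
}
abcd_unit = map_row_to_abcd(example_row)
```

Columns without a mapping, or with empty values, do not appear in the result,
which is why a later compliance check for mandatory concepts is useful.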
Step2 (optional): Data quality check
Before the harvest (Step 4),
providers can check their data with the Data Quality Toolkit,
which provides a service for automated testing of their data quality, e.g.
conformity of the data or checks of scientific names against reference services.
After testing the data, providers can apply necessary changes in their source
data or in the mapping between the database and the BioCASe Provider Software.
Step3: Compliance check and monitoring of data provision
Providers can check their mapping
and the correct use of concepts in the BioCASe Monitor
Service by appending their data source access point URL to the
Monitor Service URL. Sample values for each concept are displayed and concept
values are counted on demand, which helps to detect inconsistencies or incorrect
use of concepts according to the ABCD
documentation.
Furthermore, the BioCASe Monitor
provides a compliance check for Europeana and displays error messages if
mandatory concepts for the ABCD to ESE transformation are missing.
Providers should ensure they have a functional data source and a correct mapping
before requesting a test harvest. The OpenUp! Helpdesk
provides documentation and technical assistance for the set-up of the BioCASe
Provider Software (BPS) and the ABCD (EFG) mapping, and assists the providers in
troubleshooting, in close collaboration with the BioCASe
Helpdesk and the GBIF team.
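A compliance check of this kind boils down to testing every record for missing
mandatory values. The following is a minimal sketch of that idea, not the
BioCASe Monitor implementation. The field names in `MANDATORY` are assumptions
made for illustration; the real list of mandatory concepts is defined by the
ABCD to ESE transformation.

```python
# Assumed mandatory fields (hypothetical names, for illustration only).
MANDATORY = ("unit_id", "title", "object_url", "rights")


def missing_mandatory(record, mandatory=MANDATORY):
    """List the mandatory fields that are absent or blank in one record."""
    return [name for name in mandatory
            if not str(record.get(name) or "").strip()]


def compliance_report(records):
    """Map the index of each non-compliant record to its missing fields."""
    report = {}
    for index, record in enumerate(records):
        missing = missing_mandatory(record)
        if missing:
            report[index] = missing
    return report


sample_records = [
    {"unit_id": "1", "title": "Bellis perennis L.",
     "object_url": "http://example.org/img/1.jpg",
     "rights": "http://creativecommons.org/licenses/by-sa/3.0/"},
    {"unit_id": "2", "title": "  ", "object_url": "http://example.org/img/2.jpg"},
]
report = compliance_report(sample_records)
```

Here the first record passes, while the second is reported with a blank title
and a missing rights statement.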
The progress in content provision
is monitored in the BioCASe Monitor Service by the coordination teams of the
content providing Work Packages 4 & 5 in OpenUp!.
Step4: (Test) Harvest
Once the mapping has been quality
checked by the coordination teams of the content providing Work Packages and
the OpenUp! Helpdesk, a test harvest with the GBIF Harvester, the HIT
(Harvesting and Indexing Toolkit), is initiated. Test results and valid content
are communicated back to the provider to allow for further adjustments.
Technical problems encountered during the test harvest are fixed in collaboration
with AIT, the OpenUp! Helpdesk and the BioCASe Helpdesk team. A harvest of the
entire data source is initiated after successful completion of the test harvest
and confirmation by the provider.
The data provider can check the
visualization of their content in Europeana with the Europeana Content Checker
tool. This tool is also used by the WP coordination for a final quality check
and to detect issues in the display of data/content in Europeana. Problems
with the display of the data in the Europeana portal that are not related to
the data provided are communicated back to Europeana.
Fig.2) OpenUp! technical architecture with main steps indicated
Step5: HIT Harvest
The HIT Harvester stores batches of
ABCD (EFG) records in the central aggregator's OpenUp! Metadata Database. This
database stores only the metadata, including the URLs of the multimedia data.
Step6: ABCD (EFG) transformation to ESE
The metadata in the ABCD (EFG)
standard used by the natural history domain are transformed into ESE, which is
used as a cross-domain metadata standard in Europeana. The transformation is
carried out using Pentaho Data Integration (Pentaho Kettle). The mapping tool
picks up the metadata, transforms them and stores them in a metadata database.
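To give an impression of what such a transformation does, here is a minimal
sketch in Python. The actual transformation runs in Pentaho Data Integration;
this code is not taken from it. The target keys are element names from ESE and
Dublin Core, but which source value feeds which element is an assumption made
for illustration, as are the input field names (`unit_id`, `scientific_name`,
`file_uri`, `licence_uri`, `record_uri`) and the provider string.

```python
def abcd_to_ese(unit, provider="OpenUp!"):
    """Flatten one simplified ABCD-style unit into a flat ESE-style record.

    The field-to-element assignment below is illustrative only.
    """
    ese = {
        "dc:identifier": unit["unit_id"],
        # Fall back to the identifier if no scientific name is available.
        "dc:title": unit.get("scientific_name") or unit["unit_id"],
        "europeana:type": unit.get("media_type", "IMAGE"),
        "europeana:object": unit["file_uri"],
        "europeana:isShownBy": unit["file_uri"],
        "europeana:rights": unit["licence_uri"],
        "europeana:provider": provider,
    }
    # A landing page for the record is optional in this sketch.
    if unit.get("record_uri"):
        ese["europeana:isShownAt"] = unit["record_uri"]
    return ese


example_unit = {
    "unit_id": "B 10 0000001",
    "scientific_name": "Bellis perennis L.",
    "file_uri": "http://example.org/img/1.jpg",
    "licence_uri": "http://creativecommons.org/licenses/by-sa/3.0/",
}
ese_record = abcd_to_ese(example_unit)
```

The result is a flat set of key and value pairs, which reflects the flat
structure of ESE compared with the nested ABCD schema.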
Step7: OAI-PMH and Europeana harvest
The metadata are periodically
harvested by Europeana via a single OAI-PMH (Open Archives Initiative
Protocol for Metadata Harvesting) access point at the metadata database.
Previews of multimedia objects for presentation and queries in the Europeana
portal are generated by Europeana from the full object URLs given in the
metadata. This is the final step in the workflow when providing data in the
flat ESE standard.
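For readers unfamiliar with OAI-PMH, the sketch below shows how a harvester
typically pages through records: it sends a `ListRecords` request and, if the
response carries a `resumptionToken`, repeats the request with that token. The
verb, arguments and XML namespace come from the OAI-PMH specification; the base
URL, the `ese` metadata prefix and the sample response are assumptions made for
illustration, and no network request is made here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="ese", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument, so it replaces
    metadataPrefix on follow-up requests.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in
                   root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample response (not real OpenUp! data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

An empty or absent `resumptionToken` signals that the list is complete, at which
point the harvester stops requesting further pages.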
This is the currently implemented
technical workflow. We will publish an updated workflow, including the semantic
enrichment and EDM transformation, in upcoming newsletters and on our blog. Stay tuned!
References: