Dataverse Integration

Overview

Dataverse is an open source repository for research data. Archivematica can be configured to use a Dataverse Repository as a Transfer Source Location. Dataverse Transfer Source Locations can be configured to display all available datasets or a subset of them.

Datasets are retrieved directly using the Dataverse API and processed using the Dataverse transfer type, which enables some additional processing steps as described below.
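
As an illustration of what retrieving a dataset over the Dataverse API can look like, the sketch below builds (but does not send) a request for a dataset's metadata using the Dataverse native API endpoint `/api/datasets/:persistentId/` and the `X-Dataverse-key` header. It is not necessarily the exact call Archivematica makes; the host name, DOI, token, and the function name `build_dataset_request` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_dataset_request(base_url, persistent_id, api_token):
    """Build a GET request for a dataset's metadata (nothing is sent here)."""
    # The persistent identifier (for example a DOI) goes in the query string.
    query = urlencode({"persistentId": persistent_id})
    url = f"{base_url.rstrip('/')}/api/datasets/:persistentId/?{query}"
    request = Request(url, method="GET")
    # Dataverse accepts the API token in the X-Dataverse-key header.
    request.add_header("X-Dataverse-key", api_token)
    return request


# Hypothetical values, for illustration only.
example = build_dataset_request(
    "https://dataverse.example.org", "doi:10.5072/FK2/ABCDEF", "secret-token"
)
print(example.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual request against a real Dataverse instance.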

Dataverse integration is supported in Archivematica 1.8 and above, and has been developed and tested using Dataverse version 4.8.6.

Important

As of Archivematica 1.8, the Dataverse workflow has two known limitations:

  • Multiple authors are not captured in the Dataverse METS; only the first author listed is recorded.
  • It is not possible to delete packages after extraction when using the Dataverse transfer type.

These two limitations, along with other possible enhancements to the workflow, could be addressed in a future release. Please see the issues filed in GitHub.

Selecting Datasets for preservation

When a Dataverse Transfer Source Location is selected in the Transfer tab of the Dashboard, users can browse a list of available datasets. Selecting a directory icon will expand the view to display the list of files included in the dataset. Individual files can’t be selected for Transfer.

When a Dataverse dataset is selected, the transfer type ‘Dataverse’ must also be selected.

Dataset contents

Dataverse provides a metadata file called dataset.json that lists all of the files included in the dataset as well as other descriptive metadata.
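
To give a sense of how the file listing in dataset.json can be read, here is a minimal sketch. It assumes the file follows the layout of the Dataverse native API response, with a `files` list under `latestVersion` and a `dataFile` object per entry; the sample content and the function name `list_files` are made up for illustration, and a real dataset.json may differ.

```python
import json

# Trimmed, made-up sample shaped like a Dataverse native API response (assumed layout).
SAMPLE_DATASET_JSON = """
{
  "data": {
    "latestVersion": {
      "files": [
        {"label": "survey.tab",
         "dataFile": {"filename": "survey.tab",
                      "contentType": "text/tab-separated-values",
                      "md5": "0123456789abcdef0123456789abcdef"}},
        {"label": "readme.txt",
         "dataFile": {"filename": "readme.txt",
                      "contentType": "text/plain",
                      "md5": "fedcba9876543210fedcba9876543210"}}
      ]
    }
  }
}
"""


def list_files(dataset_json):
    """Return filename, content type and checksum for each file entry."""
    doc = json.loads(dataset_json)
    # Accept documents with or without the top-level "data" wrapper.
    dataset = doc.get("data", doc)
    version = dataset.get("latestVersion", {})
    files = []
    for entry in version.get("files", []):
        data_file = entry.get("dataFile", {})
        files.append({
            "filename": data_file.get("filename", entry.get("label")),
            "content_type": data_file.get("contentType"),
            "md5": data_file.get("md5"),
        })
    return files


for item in list_files(SAMPLE_DATASET_JSON):
    print(item["filename"], item["content_type"], item["md5"])
```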

When a dataset includes tabular data files, Dataverse creates derivative formats and additional metadata files. See the Dataverse guide describing how a tabular data file bundle works.

Archivematica detects tabular data file bundles and retrieves all derivative files and metadata files.

Processing Dataverse datasets

Archivematica creates a Dataverse METS file to describe the contents and structure of the dataset as retrieved from Dataverse. Archivematica also creates an agents.json file that includes details of the Dataverse instance configured in the Storage Service. This information is used to populate the Dataverse PREMIS agent details in the AIP METS.

Fixity checks are conducted using any checksums provided by Dataverse. Other microservices are carried out as normal (and as configured in the processing configuration). The final AIP will contain descriptive metadata provided by Dataverse, attributes to indicate any derivatives generated by Dataverse, and attributes to indicate the outcome of fixity checks conducted using checksums provided by Dataverse.
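
The idea behind a checksum-based fixity check can be sketched as follows: recompute the digest of the retrieved file and compare it with the value supplied by the repository. This is a generic illustration rather than Archivematica's own implementation; the function name `checksum_matches` and the choice of MD5 as the default algorithm are assumptions.

```python
import hashlib
import io


def checksum_matches(stream, expected_hex, algorithm="md5"):
    """Recompute a digest over a binary stream and compare it to the expected value."""
    digest = hashlib.new(algorithm)
    # Read in chunks so large files do not need to fit in memory.
    for chunk in iter(lambda: stream.read(65536), b""):
        digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# In-memory stand-in for a retrieved file; a real check would open the file in "rb" mode.
sample = io.BytesIO(b"hello")
print(checksum_matches(sample, "5d41402abc4b2a76b9719d911017c592"))
```

A mismatch (a `False` result) would correspond to a failed fixity check.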

Important

When you process a Dataverse dataset that includes packaged material (for example, .zip or .tar files), Archivematica can extract the contents of these files and run preservation microservices on them. This occurs during Microservice: Extract packages on the Transfer tab. However, due to a known bug, you must not choose to delete the packages after they have been extracted.

Dataverse METS file

Archivematica generates a Dataverse METS file that describes the contents of the dataset as retrieved from Dataverse. The Dataverse METS includes:

  • descriptive metadata about the dataset, mapped to the DDI standard
  • a <mets:fileSec> section that lists all files provided, grouped by type (original, metadata or derivative)
  • a <mets:structMap> section that describes the structure of the files as provided by Dataverse. This is particularly helpful for understanding which files were provided in a tabular data file bundle.

The Dataverse METS is found in the final AIP in this location: <AIP Name>/data/objects/metadata/transfers/<transfer name>/METS.xml. This is also where you will find the dataset.json metadata file provided by Dataverse and the agents.json metadata file created by Archivematica.
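
The grouping of files in the <mets:fileSec> can be inspected with a short script. The sketch below assumes a flat layout in which each <mets:fileGrp> carries a USE attribute and each file has a <mets:FLocat> with an xlink:href; the sample XML, its file names, and the function name `files_by_use` are invented for illustration, and a real Dataverse METS may be structured differently.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Made-up sample; file names and layout are illustrative only.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                            xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="original">
      <mets:file ID="f1"><mets:FLocat LOCTYPE="OTHER" xlink:href="survey.sav"/></mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="derivative">
      <mets:file ID="f2"><mets:FLocat LOCTYPE="OTHER" xlink:href="survey.tab"/></mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="metadata">
      <mets:file ID="f3"><mets:FLocat LOCTYPE="OTHER" xlink:href="survey-ddi.xml"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def files_by_use(mets_xml):
    """Map each fileGrp USE value to the list of file locations it contains."""
    root = ET.fromstring(mets_xml)
    grouped = {}
    for group in root.iterfind(".//mets:fileGrp", NS):
        use = group.get("USE", "unknown")
        # Only look at files directly inside this group.
        for flocat in group.iterfind("mets:file/mets:FLocat", NS):
            href = flocat.get("{http://www.w3.org/1999/xlink}href")
            grouped.setdefault(use, []).append(href)
    return grouped


for use, hrefs in files_by_use(SAMPLE_METS).items():
    print(use, hrefs)
```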

AIP METS file

The Archival Information Package (AIP) METS file follows the basic structure of a standard Archivematica AIP METS file. Derivatives generated by Dataverse are indicated using the USE attribute of the METS fileGrp element (USE="derivative").

The descriptive metadata (dmdSecs) in the Dataverse METS file are copied over to the AIP METS file.

In the PREMIS Object entity, relationships between original and derivative tabular format files from Dataverse are described using PREMIS semantic units. A PREMIS derivation event indicates the derivative file was generated from the original file, and a Dataverse Agent indicates the Event was carried out by Dataverse prior to ingest, rather than by Archivematica.

Fixity checks that use checksums provided by Dataverse are recorded as PREMIS events, with the eventOutcomeDetailNote semantic unit indicating the source of the checksum.
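
As a rough illustration of how these events could be pulled out of a METS file, the sketch below collects the outcome and detail note of every PREMIS event whose eventType is "fixity check". It matches elements by local name so that it does not depend on a particular PREMIS namespace version. The sample XML, including the namespace URI and the wording of the note, is made up, and the function name `fixity_events` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the namespace URI and note wording are illustrative only.
SAMPLE_EVENTS = """<root xmlns:premis="info:lc/xmlns/premis-v2">
  <premis:event>
    <premis:eventType>fixity check</premis:eventType>
    <premis:eventOutcomeInformation>
      <premis:eventOutcome>pass</premis:eventOutcome>
      <premis:eventOutcomeDetail>
        <premis:eventOutcomeDetailNote>checksum supplied by Dataverse</premis:eventOutcomeDetailNote>
      </premis:eventOutcomeDetail>
    </premis:eventOutcomeInformation>
  </premis:event>
  <premis:event>
    <premis:eventType>ingestion</premis:eventType>
  </premis:event>
</root>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def fixity_events(xml_text):
    """Return (outcome, detail note) pairs for every fixity check event."""
    root = ET.fromstring(xml_text)
    results = []
    for element in root.iter():
        if local_name(element.tag) != "event":
            continue
        # Flatten the event's descendants into a local-name -> text mapping.
        fields = {
            local_name(child.tag): (child.text or "").strip()
            for child in element.iter()
        }
        if fields.get("eventType") == "fixity check":
            results.append(
                (fields.get("eventOutcome"), fields.get("eventOutcomeDetailNote"))
            )
    return results


for outcome, note in fixity_events(SAMPLE_EVENTS):
    print(outcome, "-", note)
```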

Configuration

Integration with a Dataverse repository is configured in the Storage Service. For detailed instructions, see the Administrators Manual.