Storage service- Administrator instructions

The Archivematica Storage Service allows the configuration of storage spaces associated with multiple Archivematica pipelines. It allows a storage administrator to configure what storage is available to each Archivematica installation, both locally and remote.

Home page of the Storage Service

On this page:

Storage Service entities and organization

Pipelines

A pipeline refers to a single installation of an Archivematica dashboard. The Storage Service can be used to configure spaces and locations across multiple Archivematica pipelines.

Spaces

A space models a specific storage device. That device might be a locally- accessible disk, a network share, or a remote system accessible via a protocol like FEDORA, SWIFT, DuraCloud, or LOCKSS. The space provides the Storage Service with configuration to read and/or write data stored within itself.

Packages are not stored directly inside a space; instead, packages are stored within locations, which are organized subdivisions of a space.

Locations

A location is a subdivision of a space. Each location is assigned a specific purpose, such as AIP storage, DIP storage, transfer source or transfer backlog, in order to provide an organized way to structure content within a space.

Packages

The Storage Service is oriented to storing packages. A “package” is a bundle of one or more files transferred from an external service; for example, a package may be an AIP, a backlogged transfer, or a DIP. Each package is stored in a location.

Archivematica Configuration

When installing Archivematica, options to configure it with the Storage Service will be presented.

Configuring the Storage Service during Archivematica installation.

If you have installed the Storage Service at a different URL, you may change that here.

The top button ‘Use default transfer source & AIP storage locations’ will attempt to automatically configure default Locations for Archivematica, register a new Pipeline, and generate an error if the Storage Service is not available. Use this option if you want the Storage Service to automatically set up the configured default values.

The bottom button ‘Register this pipeline & set up transfer source and AIP storage locations’ will only attempt to register a new Pipeline with the Storage Service, and will not error if not Storage Service can be found. It will also open a link to the provided Storage Service URL, so that Locations can be configured manually. Use this option if the default values not desired, or the Storage Service is not running yet. Locations will have to be configured manually before any Transfers can be processed, or AIPs stored.

If the Storage Service is running, the URL to it should be entered, and Archivematica will attempt to register its dashboard UUID as a new Pipeline. Otherwise, the dashboard UUID is displayed, and a Pipeline for this Archivematica instance can be manually created and configured. The dashboard UUID is also available in Archivematica under Administration -> General.

Change the port in the web server configuration

The storage services uses nginx by default, so you can edit /etc/nginx/sites-enabled/storage and change the line that says

listen 8000;

change 8000 to whatever port you prefer to use.

Keep in mind that in a default installation of Archivematica, the dashboard is running in Apache on port 80. So it is not possible to make nginx run on port 80 on the same machine. If you install the storage service on its own server, you can set it to use port 80.

Make sure to adjust the dashboard UUID in the Archivematica dashboard under Administration -> General.

Spaces

Storage Service spaces screen.

A storage Space contains all the information necessary to connect to the physical storage. It is where protocol-specific information, like an NFS export path and hostname, or the username of a system accessible only via SSH, is stored. All locations must be contained in a space.

A space is usually the immediate parent of the Location folders. For example, if you had transfer source locations at /home/artefactual/archivematica- sampledata-2013-10-10-09-17-20 and /home/artefactual/maildir_transfers, the Space’s path would be /home/artefactual/

Currently supported protocols are local filesystem, NFS, pipeline local filesystem, LOCKSS, DuraCloud, Arkivum, Fedora and Swift.

Some protocols require a staging path. This is a temporary location on the Storage Service server that is used when copying material from that service service (DuraCloud, Swift, etc) to another space within the Storage Service.

Local Filesystem

Local Filesystem spaces handle storage that is available locally on the machine running the storage service. Typically this is the hard drive, SSD or raid array attached to the machine, but it could also encompass remote storage that has already been mounted. For remote storage that has been locally mounted, we recommend using a more specific Space if one is available.

Fields

  • Path: Absolute path to the Space on the local filesystem
  • Size: (Optional) Maximum size allowed for this space. Set to 0 or leave blank for unlimited.

NFS

NFS spaces are for NFS exports mounted on the Storage Service server, and the Archivematica pipeline.

Fields

  • Path: Absolute path the space is mounted at on the filesystem local to the storage service
  • Size: (Optional) Maximum size allowed for this space. Set to 0 or leave blank for unlimited.
  • Remote name: Hostname or IP address of the remote computer exporting the NFS mount.
  • Remote path: Export path on the NFS server
  • Version: nfs or nfs4 - as would be passed to the mount command.
  • Manually Mounted: Check this if it has been mounted already. Otherwise, the Storage Service will try to mount it. Note: this feature is not yet available.

Pipeline Local Filesystem

Pipeline Local Filesystems refer to the storage that is local to the Archivematica pipeline, but remote to the storage service. For this Space to work properly, passwordless SSH must be set up between the Storage Service host and the Archivematica host.

For example, the storage service is hosted on storage_service_host and Archivematica is running on archivematica1 . The transfer sources for Archivematica are stored locally on archivematica1, but the storage service needs access to them. The Space for that transfer source would be a Pipeline Local Filesystem.

Note

Passwordless SSH must be set up between the Storage Service host and the computer Archivematica is running on.

Fields

  • Path: Absolute path to the space on the remote machine.
  • Size: (Optional) Maximum size allowed for this space. Set to 0 or leave blank for unlimited.
  • Remote name: Hostname or IP address of the computer running Archivematica. Should be SSH accessible from the Storage Service computer.
  • Remote user: Username on the remote host

LOCKSS

Archivematica can store AIPs in a LOCKSS network via LOCKSS-O-Matic, which uses SWORD to communicate between the Storage Service and a Private LOCKSS Network (PLN).

Fields:

  • Size: (Optional) Maximum size allowed for this space. Set to 0 or leave blank for unlimited.
  • Path: Absolute path to the space on the remote machine.
  • Staging path: Absolute path to a staging area. Must be UNIX filesystem compatible, preferably on the same filesystem as the path.
  • Service document IRI: URL of LOCKSS-o-matic service document IRI, eg. http://lockssomatic.example.org/api/sword/2.0/sd-iri
  • Content Provider ID: On-Behalf-Of value when communicating with LOCKSS-o-matic
  • Externally available domain: Base URL for this server that LOCKSS will be able to access. Generally this is the URL for the home page of the Storage Service.
  • Keep local copy? Check the box if you wish to store a local copy of the AIPs even after they are stored in LOCKSS.

Note

When creating a Location for a LOCKSS space (see below), the Purpose of the Location must be AIP Storage.

DuraCloud

Archivematica can use DuraCloud as an access protocol for the Storage Service in version 0.5 and higher. Typically one Storage Service space has a one to one relationship with a space within DuraCloud.

Fields:

  • Size: (Optional) Maximum size allowed for this space. Set to 0 or leave blank for unlimited.
  • Path: Absolute path to the space on the remote machine. Normally left blank for DuraCloud implementations.
  • Staging path: Absolute path to a staging area. Must be UNIX filesystem compatible, preferably on the same filesystem as the path.
  • Host: Hostname of the DuraCloud instance, e.g. example.duracloud.org
  • User: Username to authenticate as
  • Password: Password to authenticate with
  • Duraspace: Name of the Space within DuraCloud

Arkivum

Archivematica can use Arkivum’s A-Stor as an access protocol in version 0.7 and higher. A-Stor can expose a CIFS share to the Storage Service so that the storage service can copy files to an A-Stor datapool for AIP storage, for example.

Add an entry to /etc/fstab on the Storage Service, then mount the A-Stor CIFS share.

Example:

//ARK00092/astor /mnt/astor cifs
defaults,guest,file_mode=0666,dir_mode=0777,uid=archivematica,gid
=archivematica,forcegid,forceuid,rw 0 1

In this example, ARK00092 is the name of the appliance and should be resolvable through DNS or be set as an entry in /etc/hosts.

Then, choosing Arkivum as the access protocol, create a new space in the Storage Service:

Fields

  • Size: (Optional) Maximum size allowed for this space. Set to 0 or leave blank for unlimited.
  • Path: local path on the Storage Service machine to the CIFS share.

Example: /mnt/astor

  • Staging Path: Absolute path to a staging area. Must be UNIX filesystem compatible, preferably on the same filesystem as the path.

Example: /mnt/astor/archivematica1/tmp

  • Host: Arkivum appliance hostname or IP address with port.
  • Remote user: (Optional) Username on the remote machine accessible via passwordless ssh.
  • Remote name: (Optional) Name or IP of the remote machine.

Swift

OpenStack’s Swift is available as an access protocol in Storage Service 0.7 and higher. At this time, locations within Swift have been tested as AIP Storage, DIP Storage and Transfer Backlog. Using Swift as Transfer Source is possible, but under-tested at this time.

Fields

  • Size (Optional): Maximum size allowed for this space. Set to 0 or leave blank for unlimited.
  • Path: Absolute path to the space on the storage service machine.
  • Staging Path: Absolute path to a staging area. Must be UNIX filesystem compatible, preferably on the same filesystem as the path.
  • Auth url: URL to authenticate against
  • Auth version: OpenStack auth version
  • Username: Username to authenticate as. E.g. http://example.com:5000/v2.0/
  • Password: Password to authenticate with
  • Container: Name of the Swift container. To list available containers in your Swift installation, run swift list from the command line.
  • Tenant: The tenant/account name, required when connecting to an auth 2.0 system.
  • Region (Optional): Region in Swift.

Fedora via SWORD2

Fedora via SWORD2 is currently supported in the Storage Service as an Access Protocol to facilitate use of the Archidora plugin, which allows ingest of material from Islandora to Archivematica. This workflow is in beta testing as of Storage Service 0.7/Archivematica 1.4/Islandora 7.x-1.5.

Fields

  • Size (Optional): Maximum size allowed for this space. Set to 0 or leave blank for unlimited.
  • Path: Absolute path to the space on the storage service machine.
  • Staging Path: Absolute path to a staging area. Must be UNIX filesystem compatible, preferably on the same filesystem as the path.
  • Fedora user: Fedora user name (for SWORD functionality)
  • Fedora password: Fedora password (for SWORD functionality)
  • Fedora name: Name or IP of the remote Fedora machine

Locations

Storage Service locations screen.

A storage Location is contained in a Space, and knows its purpose in the Archivematica system. A Location is also where Packages are stored. Each Location is associated with a pipeline and can only be accessed by that pipeline.

Currently, a Location can have one of six purposes: Transfer Source, Transfer Backlog, Currently Processing, Storage Service Internal Processing, AIP Storage, or DIP Storage. Transfer source locations display in Archivematica’s Transfer tab, and any folder in a transfer source can be selected to become a Transfer. Transfer backlog stores transfers until such a time that the archivist contiues processing them. AIP storage locations are where the completed AIPs are put for long- term storage. Likewise, DIP storage is used for storing DIPs until such a time that they can be uploaded to an access system. During processing, Archivematica uses the currently processing location associated with that pipeline.

Only one currently processing location should be associated with a given pipeline. Likewise, there should only be one Storage Service Internal Processing location for each Storage Service installation. If you want the same directory on disk to have multiple purposes, multiple Locations with different purposes can be created.

Fields

  • Purpose: What use the Location is for
  • Pipeline: Which pipelines this location is available to.
  • Relative Path: Path to this Location, relative to the space that contains it.
  • Description: Description of the Location to be displayed to the user.
  • Quota: (Optional) Maximum size allowed for this space. Set to 0 or leave blank for unlimited.
  • Enabled: If checked, this location is accessible to pipelines associated with it. If unchecked, it will not show up to any pipeline.

Pipelines

Storage Service pipelines screen.

A pipeline is an Archivematica instance registered with the Storage Service, including the server and all associated clients. Each pipeline is uniquely identified by a UUID, which can be found in the dashboard under Administration -> General Configuration. When installing Archivematica, it will attempt to register its UUID with the Storage Service, with a description of “Archivematica on <hostname>”.

Fields

  • UUID: Unique identifier of the Archivematica pipeline
  • Description: Description of the pipeline displayed to the user. e.g. Sankofa demo site
  • Enabled: If checked, this pipeline can access locations associate with it. If unchecked, all locations will be disabled, even if associated.
  • Default Locations: If checked, the default locations configured in Administration -> Configuration will be created or associated with the new pipeline.

Packages

Storage Service packages screen.

A Package is a file that Archivematica has stored in the Storage Service, commonly an Archival Information Package (AIP). They cannot be created or deleted through the Storage Service interface, though a deletion request can be submitted through Archivematica that must be approved or rejected by the storage service administrator. To learn more about deleting an AIP, see Deleting an AIP.

Administration

Storage Service Administration screen part 1. Storage Service Administration screen part 2. Storage Service Administration screen part 3.

The Administration section manages the users and settings for the Storage Service.

Users

Only registered users can long into the storage service, and the Users page is where users can be created or modified.

The storage service has two types of users: administrative users, and regular users. The only distinction between the two types is for email notifications; administrators will be notified by email when special events occur, while regular users will not.

Settings

Settings control the behavior of the Storage Service. Default Locations are the created or associated with pipelines when they are created.

Pipelines are disabled upon creation? sets whether a newly created Pipeline can access its Locations. If a Pipeline is disabled, it cannot access any of its locations. By disabling newly created Pipelines, it provides some security against unwanted perusal of the files in Locations, or use by unauthorized Archivematica instances. This can be configured individually when creating a Pipeline manually through the Storage Service website.

Default Locations sets which existing locations should be associated with a newly created Pipeline, or which new Locations should be created for each new Pipeline. No matter what is configured here, a Currently Processing location is created for all Pipelines, since one is required. Multiple Transfer Source or AIP Storage Locations can be configured by holding down Ctrl when selecting them. New Locations in an existing Space can be created for Pipelines that use default locations by entering the relevant information.

How to Configure a Location

For Spaces of the type “Local Filesystem,” Locations are basically directories (or more accurately, paths to directories). You can create Locations for Transfer Source, Currently Processing, or AIP and DIP Storage.

To create and configure a new Location:

  1. In the Storage Service, click on the “Spaces” tab.
  2. Under the Space that you want to add the Location to, click on the “Create Location here” link.
  3. Choose a purpose (e.g. AIP Storage) and pipeline, and enter a “Relative Path” (e.g. var/mylocation) and human-readable description. The Relative Path is relative to the Path defined in the Space you are adding the Location to. For example, for the default Space, the Path is / so your Location path would be relative to that (in the example here, the complete path would end up being /var/mylocation).

Note

If the path you are defining in your Location doesn’t exist, you must create it manually and make sure it is writable by the archivematica user.

  1. Save the Location settings.
  2. The new location will now be available as an option under the appropriate options in the Dashboard, for example as a Transfer location (which must be enabled under the Dashboard “Administration” tab) or as a destination for AIP storage.

Back to the top