Upgrade from Archivematica 1.11.x to 1.12.1¶
On this page:
- Clean up completed transfers watched directory
- Create a backup
- Upgrade Ubuntu package install
- Upgrade CentOS/Red Hat package install
- Upgrade in indexless mode
- Upgrade with output capturing disabled
- Update search indices
Note
While it is possible to upgrade a GitHub-based source install using ansible, these instructions do not cover that scenario.
Clean up completed transfers watched directory¶
Note
Ignore this section if you are upgrading from Archivematica 1.11.
Upgrading from Archivematica 1.10.x or older to Archivematica 1.12.1 can result in a number of completed transfers appearing as failed in the Archivematica dashboard, as well as corresponding failure notification emails being sent. These are not actual failures, but are unintentional side effects of changes made in Archivematica 1.11 to the workflow and to how metadata files are stored and copied into the SIP.
To prevent these failures from occurring during an upgrade from Archivematica 1.10 or earlier:
Confirm that all transfers and ingests are complete.
Check that there are no transfers or SIPs that are still being processed or awaiting decisions in the Transfer and Ingest tabs. If there are, finish processing the transfers/ingests before proceeding.
Delete all contents of the completedTransfers watched directory.
sudo rm -rf /var/archivematica/sharedDirectory/watchedDirectories/SIPCreation/completedTransfers/*
Perform the upgrade as described below.
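Before running the rm command above, it can be useful to see how many entries the directory still holds. The following is a minimal sketch, assuming bash and GNU find; the function name count_entries is hypothetical and not part of Archivematica.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Archivematica): count the top-level
# entries in a directory so they can be reviewed before deletion.
count_entries() {
    local dir="$1"
    # -mindepth/-maxdepth 1 restricts the listing to direct children.
    find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Example (path taken from the step above):
# count_entries /var/archivematica/sharedDirectory/watchedDirectories/SIPCreation/completedTransfers
```

A count of 0 after the deletion step suggests the directory is empty; any other number means entries remain and may deserve a closer look.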
Create a backup¶
Before starting any upgrade procedure on a production system, we strongly recommend backing up your system. If you are using a virtual machine, take a snapshot of it before making any changes. Alternatively, back up the file systems being used by your system. Exact procedures for updating will depend on your local installation. At a minimum you should make backups of:
- The Storage Service SQLite (or MySQL) database
- The dashboard MySQL database
This is a simple example of backing up these two databases:
sudo cp /var/archivematica/storage-service/storage.db ~/storage_db_backup.db
mysqldump -u root -p MCP > ~/am_backup.sql
If you do not have a password set for the root user in MySQL, you can take out the ‘-p’ portion of that command. If there is a problem during the upgrade process, you can restore your MySQL database from this backup and try the upgrade again.
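A backup is only useful if the files were actually written. As a minimal sketch, assuming bash, the hypothetical helper check_backups below (not part of Archivematica) confirms that each backup file exists and is non-empty; it does not validate the contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every given backup file exists
# and is non-empty; report the first file that fails the check.
check_backups() {
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Backup missing or empty: $f" >&2
            return 1
        fi
    done
    echo "All backups present"
}

# Example, using the file names from the commands above:
# check_backups ~/storage_db_backup.db ~/am_backup.sql
```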
If you are upgrading from Archivematica 1.8 or lower to version 1.9 or higher, note that the supported Elasticsearch version changed from 1.x to 6.x. In that case we also recommend creating a backup of your Elasticsearch data, especially if you do not have access to the AIP storage locations in the local filesystem.
You can follow these steps in order to create a backup of Elasticsearch:
# Remove and recreate the folder that stores the backup
sudo rm -rf /var/lib/elasticsearch/backup-repo/
sudo mkdir -p /var/lib/elasticsearch/backup-repo/
sudo chown elasticsearch:elasticsearch /var/lib/elasticsearch/backup-repo/
# Allow elasticsearch to write files to the backup
echo 'path.repo: ["/var/lib/elasticsearch/backup-repo"]' |sudo tee -a /etc/elasticsearch/elasticsearch.yml
# Restart ElasticSearch and wait for it to start
sudo service elasticsearch restart
sleep 60s
# Configure the ES backup
curl -XPUT "localhost:9200/_snapshot/backup-repo" -H 'Content-Type: application/json' -d \
'{
"type": "fs",
"settings": {
"location": "./",
"compress": true
}
}'
# Take the actual backup, and copy it to a safe place
curl -X PUT "localhost:9200/_snapshot/backup-repo/am_indexes_backup?wait_for_completion=true"
cp /var/lib/elasticsearch/backup-repo elasticsearch-backup -rf
For more info, refer to the ElasticSearch 6.8 docs.
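To confirm that the snapshot finished, you can query the snapshot and look at its reported state. This is a minimal sketch, assuming the repository and snapshot names used above and a response that contains a "state" field; snapshot_succeeded is a hypothetical helper that only inspects the JSON it receives on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a snapshot-info JSON response on stdin and
# succeed only if it reports "state": "SUCCESS".
snapshot_succeeded() {
    grep -Eq '"state"[[:space:]]*:[[:space:]]*"SUCCESS"'
}

# Example, assuming the repository and snapshot names used above:
# curl -s "localhost:9200/_snapshot/backup-repo/am_indexes_backup" | snapshot_succeeded \
#     && echo "snapshot OK" || echo "snapshot not complete"
```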
Upgrade on Ubuntu packages¶
Update the operating system.
sudo apt-get update && sudo apt-get upgrade
Update package sources.
In Ubuntu 16.04:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
echo 'deb [arch=amd64] http://packages.archivematica.org/1.12.x/ubuntu xenial main' | sudo tee -a /etc/apt/sources.list
echo 'deb [arch=amd64] http://packages.archivematica.org/1.12.x/ubuntu-externals xenial main' | sudo tee -a /etc/apt/sources.list
Optionally you can remove the lines referencing packages.archivematica.org/1.11.x from /etc/apt/sources.list.
In Ubuntu 18.04:
echo 'deb [arch=amd64] http://packages.archivematica.org/1.12.x/ubuntu bionic main' | sudo tee -a /etc/apt/sources.list
echo 'deb [arch=amd64] http://packages.archivematica.org/1.12.x/ubuntu-externals bionic main' | sudo tee -a /etc/apt/sources.list
Optionally you can remove the lines referencing packages.archivematica.org/1.11.x from /etc/apt/sources.list.
Update the Storage Service.
sudo apt-get update
sudo apt-get install archivematica-storage-service
Update Archivematica. During the update process you may be asked about updating configuration files. Choose to accept the maintainer's versions. You will also be asked about updating the database; say ‘ok’ to each of those steps. If you have set a password for the root MySQL database user, enter it when prompted.
sudo apt-get install archivematica-common
sudo apt-get install archivematica-dashboard
sudo apt-get install archivematica-mcp-server
sudo apt-get install archivematica-mcp-client
Restart services.
sudo service archivematica-storage-service restart
sudo service gearman-job-server restart
sudo service archivematica-mcp-server restart
sudo service archivematica-mcp-client restart
sudo service archivematica-dashboard restart
sudo service nginx restart
Depending on your browser settings, you may need to clear your browser cache to make the dashboard pages load properly. For example, in Firefox or Chrome you should be able to reload the page and bypass the cache with Ctrl+Shift+R (Cmd+Shift+R on macOS).
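The package source commands in this section append to /etc/apt/sources.list every time they run, so repeating the step leaves duplicate entries. As a minimal sketch, assuming bash and grep, the hypothetical helper append_once (not part of Archivematica) adds a line only when it is not already present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line
# is not already there, so re-running the step creates no duplicates.
append_once() {
    local line="$1" file="$2"
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Example (run as root; Ubuntu 18.04 line from the step above):
# append_once 'deb [arch=amd64] http://packages.archivematica.org/1.12.x/ubuntu bionic main' /etc/apt/sources.list
```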
Upgrade on CentOS/Red Hat packages¶
Upgrade the repositories for 1.12:
sudo sed -i 's/1.11.x/1.12.x/g' /etc/yum.repos.d/archivematica*
Remove the currently installed version of ghostscript:
sudo rpm -e --nodeps ghostscript ghostscript-x11 ghostscript-core ghostscript-fonts
Upgrade Archivematica packages:
sudo yum update
Once the new packages are installed, upgrade the databases for both Archivematica and the Storage Service. This can be done with:
sudo -u archivematica bash -c " \
set -a -e -x
source /etc/default/archivematica-dashboard || \
source /etc/sysconfig/archivematica-dashboard \
|| (echo 'Environment file not found'; exit 1)
cd /usr/share/archivematica/dashboard
/usr/share/archivematica/virtualenvs/archivematica-dashboard/bin/python manage.py migrate --noinput
";
sudo -u archivematica bash -c " \
set -a -e -x
source /etc/default/archivematica-storage-service || \
source /etc/sysconfig/archivematica-storage-service \
|| (echo 'Environment file not found'; exit 1)
cd /usr/lib/archivematica/storage-service
/usr/share/archivematica/virtualenvs/archivematica-storage-service/bin/python manage.py migrate
";
Restart the Archivematica related services, and continue using the system:
sudo systemctl restart archivematica-storage-service
sudo systemctl restart archivematica-dashboard
sudo systemctl restart archivematica-mcp-client
sudo systemctl restart archivematica-mcp-server
Depending on your browser settings, you may need to clear your browser cache to make the dashboard pages load properly. For example, in Firefox or Chrome you should be able to reload the page and bypass the cache with Ctrl+Shift+R (Cmd+Shift+R on macOS).
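The sed command in this section edits the repository files in place. To preview the result first, a sketch along the following lines may help, assuming bash and sed; preview_repo_upgrade is a hypothetical helper, and it escapes the dots so that 1.11.x is matched literally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print what a repo file would look like after the
# version bump, without modifying it. Dots are escaped so "1.11.x" is
# matched literally instead of as regular-expression wildcards.
preview_repo_upgrade() {
    sed 's/1\.11\.x/1.12.x/g' "$1"
}

# Example:
# for f in /etc/yum.repos.d/archivematica*; do preview_repo_upgrade "$f"; done
```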
Upgrade on Vagrant / Ansible¶
This upgrade method will work with Vagrant machines, but also with cloud based virtual machines, or physical servers.
Connect to your Vagrant machine or server
vagrant ssh # Or ssh <your user>@<host>
Install Ansible
sudo pip install ansible
Check out the deployment repo:
git clone https://github.com/artefactual/deploy-pub.git
Go into the appropriate playbook folder and install the needed roles.
Ubuntu 16.04 (Xenial):
cd deploy-pub/playbooks/archivematica-xenial
ansible-galaxy install -f -p roles/ -r requirements.yml
Ubuntu 18.04 (Bionic):
cd deploy-pub/playbooks/archivematica-bionic
ansible-galaxy install -f -p roles/ -r requirements.yml
CentOS 7:
cd deploy-pub/playbooks/archivematica-centos7
ansible-galaxy install -f -p roles/ -r requirements.yml
All the following steps should be run from the respective playbook folder for your operating system.
Verify that vars-singlenode.yml has the appropriate contents for Elasticsearch and Archivematica, or update it with your own.
Create a hosts file.
echo 'am-local ansible_connection=local' > hosts
Upgrade Archivematica by running:
ansible-playbook -i hosts singlenode.yml --tags=elasticsearch,archivematica-src
Upgrade in indexless mode¶
As of Archivematica 1.7, Archivematica can be run in indexless mode; that is, without Elasticsearch. Installing Archivematica without Elasticsearch, or with limited Elasticsearch functionality, means reduced consumption of compute resources and lower operational complexity. By setting the archivematica_src_search_enabled configuration attribute, administrators can define how many things Elasticsearch is indexing, if any. This can impact searching across several different dashboard pages.
Upgrade your existing Archivematica pipeline following the instructions above.
Modify the relevant systemd EnvironmentFile files by adding lines that set the relevant environment variables to false.
If you are using Ubuntu, run the following commands.
sudo sh -c 'echo "ARCHIVEMATICA_DASHBOARD_DASHBOARD_SEARCH_ENABLED=false" >> /etc/default/archivematica-dashboard'
sudo sh -c 'echo "ARCHIVEMATICA_MCPSERVER_MCPSERVER_SEARCH_ENABLED=false" >> /etc/default/archivematica-mcp-server'
sudo sh -c 'echo "ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_SEARCH_ENABLED=false" >> /etc/default/archivematica-mcp-client'
If you are using CentOS, run the following commands.
sudo sh -c 'echo "ARCHIVEMATICA_DASHBOARD_DASHBOARD_SEARCH_ENABLED=false" >> /etc/sysconfig/archivematica-dashboard'
sudo sh -c 'echo "ARCHIVEMATICA_MCPSERVER_MCPSERVER_SEARCH_ENABLED=false" >> /etc/sysconfig/archivematica-mcp-server'
sudo sh -c 'echo "ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_SEARCH_ENABLED=false" >> /etc/sysconfig/archivematica-mcp-client'
Restart services.
If you are using Ubuntu, run the following commands.
sudo service archivematica-dashboard restart
sudo service archivematica-mcp-client restart
sudo service archivematica-mcp-server restart
If you are using CentOS, run the following commands.
sudo -u root systemctl restart archivematica-dashboard
sudo -u root systemctl restart archivematica-mcp-client
sudo -u root systemctl restart archivematica-mcp-server
If you had previously installed and started the Elasticsearch service, you can turn it off now.
sudo -u root systemctl stop elasticsearch
sudo -u root systemctl disable elasticsearch
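The echo commands in this section add a new line on every run, which can leave several assignments of the same variable in an environment file. As a minimal sketch, assuming bash and GNU sed, the hypothetical helper set_env_var (not part of Archivematica) replaces an existing assignment or appends one if none exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in an environment file, replacing
# an existing KEY= line if there is one and appending otherwise.
set_env_var() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        # GNU sed in-place edit; "|" is the delimiter so values may contain "/".
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# Example (run as root, Ubuntu path from the step above):
# set_env_var /etc/default/archivematica-dashboard ARCHIVEMATICA_DASHBOARD_DASHBOARD_SEARCH_ENABLED false
```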
Upgrade with output capturing disabled¶
As of Archivematica 1.7.1, output capturing can be disabled at upgrade or at any other time. This means the stdout and stderr from preservation tasks are not captured, which can result in a performance improvement. See the Task output capturing configuration page for more details. In order to disable output capturing, set the ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CAPTURE_CLIENT_SCRIPT_OUTPUT environment variable to false and restart the MCP Client process(es). Consult the installation instructions for your deployment method for more details on how to set environment variables and restart Archivematica processes.
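For a package install, one way to do this is to follow the same pattern as the indexless mode commands on this page. The sketch below assumes the MCP Client environment file lives at /etc/default/archivematica-mcp-client (Ubuntu) or /etc/sysconfig/archivematica-mcp-client (CentOS), as used elsewhere on this page; disable_output_capture is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: disable output capturing for the MCP Client by appending the
# variable to its environment file. The default path is the Ubuntu
# location; on CentOS pass /etc/sysconfig/archivematica-mcp-client.
disable_output_capture() {
    local env_file="${1:-/etc/default/archivematica-mcp-client}"
    echo "ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CAPTURE_CLIENT_SCRIPT_OUTPUT=false" >> "$env_file"
}

# Then, as root:
# disable_output_capture
# service archivematica-mcp-client restart
```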
Update search indices¶
Note
Ignore this section if you are planning to run Archivematica without search indices.
Archivematica releases may introduce changes that require updating the search indices to function properly, e.g. Archivematica v1.12.0 introduced new fields to the search indices and made some changes to text field types. Please keep an eye on our release notes before you start the upgrade.
The update can be accomplished in one of two ways. Preferably, you can reindex the documents, which is usually faster because the documents that you already have indexed are re-ingested. We would love to know if this does not work for you; in that case, you can recreate the indices instead, which takes much longer to complete because it accesses the original data, e.g. your AIPs.
Reindex the documents¶
In Elasticsearch, it is possible to add new fields to search indices but it is not possible to update existing ones. The recommended strategy is to create new indices with our desired mapping and reindex our documents. This is based on the Reindex API.
Warning
Before you continue, we recommend backing up your Elasticsearch data. Please read the official docs for instructions.
Assuming that your Elasticsearch cluster is available via 127.0.0.1:9200, this is how we can list existing indices:
$ curl -s -X GET 'http://localhost:9200/_cat/indices/%2A?v=&s=index:desc'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open transfers lYqkYjwZRy2XG8CP_3S3PQ 5 1 0 0 1.2kb 1.2kb
yellow open transferfiles K5gnDZyOQz2JdIeZ6adJsQ 5 1 0 0 1.2kb 1.2kb
yellow open aips yAyK_koXThaZcWsBYfzN7w 5 1 17 0 101.4mb 101.4mb
yellow open aipfiles TVrrX8jkRhWWxGfvK_M6zg 5 1 11987 0 2.9gb 2.9gb
Ensure that the Elasticsearch heap size is big enough to accommodate the size of the indices. The current size can be found under /etc/default/elasticsearch (Ubuntu) or /etc/sysconfig/elasticsearch (CentOS):
$ grep ES_JAVA_OPTS= /etc/default/elasticsearch
ES_JAVA_OPTS="-Xms2g -Xmx2g"
For our example, it should be greater than 3G. Update ES_JAVA_OPTS as follows and restart the service to apply the changes:
ES_JAVA_OPTS="-Xms3g -Xmx3g"
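To compare the heap setting with your own indices, it can help to find the largest index in bytes. This is a minimal sketch, assuming bash and awk, and assuming that the largest index is the figure to compare against the heap, which this page implies but does not state as a rule; largest_index is a hypothetical helper that reads "index size" pairs from standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "index size_in_bytes" lines on stdin and
# print the name and size of the largest index.
largest_index() {
    awk '$2 + 0 > max + 0 { max = $2; name = $1 }
         END { if (name != "") print name, max }'
}

# Example, assuming Elasticsearch listens on localhost:9200
# (h= selects columns, bytes=b prints sizes as plain byte counts):
# curl -s 'http://localhost:9200/_cat/indices?h=index,store.size&bytes=b' | largest_index
```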
Given our four indices (transfers, transferfiles, aips and aipfiles), our plan is to copy each one into a temporary index with a _new suffix and then delete the original. Next, we will restart the archivematica-dashboard service, which automatically creates new, empty indices with the desired mapping. At that point, we will use the Reindex API to re-ingest all the documents from the temporary indices into the new ones. Within this process, the new mappings will be automatically applied. This can all be done automatically by running the following script:
#!/usr/bin/env bash
set -o errexit
set -o pipefail
set -o nounset
es_url="http://localhost:9200"
index_list='aips aipfiles transfers transferfiles'
echo -e "\nIndex list before reindexing:\n"
curl -s -X GET "${es_url}/_cat/indices/%2A?v=&s=index:desc"
echo -e "\n"
# Clone indices with _reindex API call:
for index in $index_list; do
    echo "Reindex ${index} in ${index}_new..."
    curl -s -X POST \
        "${es_url}/_reindex" \
        -H 'Content-Type: application/json' \
        -d '{
            "source": {
                "index": "'"${index}"'"
            },
            "dest": {
                "index": "'"${index}_new"'"
            }
        }' > /dev/null
done
echo -e "\n\n"
echo -e "Index list after tmp indices creation\n"
indices_output=$(curl -s -X GET "${es_url}/_cat/indices/%2A?v=&s=index:desc")
curl -s -X GET "${es_url}/_cat/indices/%2A?v=&s=index:desc"
echo -e "\n"
# Delete old indices
for index in $index_list; do
    echo "Deleting ${index}..."
    curl -s -X DELETE "${es_url}/${index}" > /dev/null
done
# Restart archivematica-dashboard to create indices with new mappings
echo -e "\nRestarting archivematica-dashboard"
sudo service archivematica-dashboard restart
# Wait 30 seconds
echo "Wait 30 seconds to ensure dashboard has created the empty indices with new mapping"
sleep 30
echo -e "\n"
# When an index has no docs, the reindex call doesn't create the new index (typically the transferfiles index).
# Check that the new index has been created before reindexing.
# Reindex from *_new indices:
for index in $index_list; do
    if echo "$indices_output" | grep -q "${index}_new"; then
        echo "Indexing ${index} using ${index}_new ..."
        curl -s -X POST \
            "${es_url}/_reindex" \
            -H 'Content-Type: application/json' \
            -d '{
                "source": {
                    "index": "'"${index}_new"'"
                },
                "dest": {
                    "index": "'"${index}"'"
                }
            }' > /dev/null
    fi
done
echo -e "\n"
# Delete temporary indices
for index in $index_list; do
    if echo "$indices_output" | grep -q "${index}_new"; then
        echo "Deleting ${index}_new..."
        curl -s -X DELETE "${es_url}/${index}_new" > /dev/null
    fi
done
echo -e "\n\nReindexing done:\n"
curl -s -X GET "${es_url}/_cat/indices/%2A?v=&s=index:desc"
echo -e "\n"
For the example above, this script took 11 minutes to complete. If it fails, check the logs (/var/log/elasticsearch.log). Most likely, the JVM ran out of heap memory. You can start over by restoring your backup or putting back the old indices. The output that we expect to see is similar to the following:
Index list before reindexing:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open transfers lYqkYjwZRy2XG8CP_3S3PQ 5 1 3 0 11.6kb 11.6kb
yellow open transferfiles K5gnDZyOQz2JdIeZ6adJsQ 5 1 0 0 1.2kb 1.2kb
yellow open aips yAyK_koXThaZcWsBYfzN7w 5 1 17 0 101.4mb 101.4mb
yellow open aipfiles TVrrX8jkRhWWxGfvK_M6zg 5 1 12905 0 2.6gb 2.6gb
Reindex aips in aips_new...
Reindex aipfiles in aipfiles_new...
Reindex transfers in transfers_new...
Reindex transferfiles in transferfiles_new...
Index list after tmp indices creation
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open transfers_new gdFevH8yRdiNTdrPcfo8Lg 5 1 0 0 460b 460b
yellow open transfers lYqkYjwZRy2XG8CP_3S3PQ 5 1 3 0 11.6kb 11.6kb
yellow open transferfiles K5gnDZyOQz2JdIeZ6adJsQ 5 1 0 0 1.2kb 1.2kb
yellow open aips_new uJ-ehaYLTfe_1lOSErfu3Q 5 1 17 0 96.8mb 96.8mb
yellow open aips yAyK_koXThaZcWsBYfzN7w 5 1 17 0 101.4mb 101.4mb
yellow open aipfiles_new 00Xxu7v2QvWsq92gM247xQ 5 1 12905 0 3.1gb 3.1gb
yellow open aipfiles TVrrX8jkRhWWxGfvK_M6zg 5 1 12905 0 2.6gb 2.6gb
Deleting aips...
Deleting aipfiles...
Deleting transfers...
Deleting transferfiles...
Restarting archivematica-dashboard
Wait 30 seconds to ensure dashboard has created the empty indices with new mapping
Indexing aips using aips_new ...
Indexing aipfiles using aipfiles_new ...
Indexing transfers using transfers_new ...
Deleting aips_new...
Deleting aipfiles_new...
Deleting transfers_new...
Reindexing done:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open transfers FC7aSVPmSmmCc_LTv1AQRA 5 1 3 0 1.2kb 1.2kb
yellow open transferfiles 5JMAft3FQwmosZQFi7eJNw 5 1 0 0 1.2kb 1.2kb
yellow open aips EtwXG3-4SO2Px-4QMRufXA 5 1 17 0 102.1mb 102.1mb
yellow open aipfiles -PFuzslgTeWJ4CWny8VZoA 5 1 12905 0 3gb 3gb
Note
We may implement this script as a Django command in the future for better usability. For the time being, please download the script and tweak as needed.
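The script sends every _reindex response to /dev/null, so a failed reindex can go unnoticed before the old indices are deleted. One possible tweak, sketched here under the assumption that the response JSON carries a "failures" array that is empty on success, is to inspect each response instead. reindex_ok is a hypothetical helper and not part of the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a _reindex JSON response on stdin and
# succeed only if its "failures" array is present and empty.
reindex_ok() {
    grep -Eq '"failures"[[:space:]]*:[[:space:]]*\[[[:space:]]*\]'
}

# Example, replacing "> /dev/null" in one of the reindex calls:
# curl -s -X POST "${es_url}/_reindex" -H 'Content-Type: application/json' \
#     -d '{"source": {"index": "aips"}, "dest": {"index": "aips_new"}}' \
#     | reindex_ok || { echo "Reindex of aips failed" >&2; exit 1; }
```

Stopping at the first failed response keeps the original indices in place, since the deletion loop would not be reached.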
Recreate the indices¶
This method will allow you to delete and rebuild the existing Elasticsearch indices so that all the Backlog and Archival Storage column fields are fully populated, including for transfers and AIPs ingested prior to the upgrade to Archivematica 1.12.1. Run the commands described in Rebuild the indexes to fully delete and rebuild the indices.
Execution example:
sudo -u archivematica bash -c " \
set -a -e -x
source /etc/default/archivematica-dashboard || \
source /etc/sysconfig/archivematica-dashboard \
|| (echo 'Environment file not found'; exit 1)
cd /usr/share/archivematica/dashboard
/usr/share/archivematica/virtualenvs/archivematica-dashboard/bin/python \
manage.py rebuild_transfer_backlog --from-storage-service --no-prompt
";
sudo -u archivematica bash -c " \
set -a -e -x
source /etc/default/archivematica-dashboard || \
source /etc/sysconfig/archivematica-dashboard \
|| (echo 'Environment file not found'; exit 1)
cd /usr/share/archivematica/dashboard
/usr/share/archivematica/virtualenvs/archivematica-dashboard/bin/python \
manage.py rebuild_aip_index_from_storage_service --delete-all
";
Note
Please note, the use of encrypted or remote Transfer Backlog and AIP Store locations may require use of the option to rebuild indices from the Storage Service API rather than from the filesystem. At this time, it is not possible to rebuild the indices for all types of remote locations.
Note
Please note, the execution of this command may take a long time for big AIP and Transfer Backlog storage locations, especially if the packages are stored compressed or encrypted, or you are using a third party service. If that is the case, you may want to reindex the Elasticsearch documents instead.