Task output capturing configuration
Archivematica allows users to configure their MCP client(s) in order to control whether or not output streams (stdout and stderr) from the client scripts are captured and then passed from the task workers to the task manager (the MCP server).
In the default configuration, these output streams are captured. In this
state, when a user clicks on the gear icon of a microservice in order to view
the tasks that have run, each task representation contains a section entitled
Standard streams with two sub-sections: Standard output (stdout), containing
the stdout from the client script, and Standard error (stderr), containing
the stderr.
However, in some cases, serializing these output streams and moving them around (capturing them) can have a non-trivial performance cost. (We have measured a 6% reduction in processing time for some transfers when output capturing is disabled.) Archivematica can be configured to avoid paying this cost by disabling the capturing of these output streams. The trade-off is that the stdout and stderr that document the running of a preservation task will no longer be stored in the database or displayed in the tasks interface.
In order to configure Archivematica to stop capturing output streams, one must
set the MCP client's output capturing environment variable to
false before starting (or re-starting) the MCP
client process. The way that Archivematica environment variables are set
depends on the deployment method used. Please consult the relevant
documentation for your deployment method if you are interested in disabling
output capturing:
- Environment variable configuration for CentOS package-based installations
- Environment variable configuration for Ubuntu package-based installations
- Environment variable configuration for Ubuntu Ansible-based installations
- Environment variable configuration for Docker Compose installations
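Whatever the deployment method, the goal is the same: the MCP client process must find the variable set to false in its environment when it starts. The following is a minimal Python sketch of that idea only. The variable name `EXAMPLE_CAPTURE_OUTPUT` and the helper `env_with_capture_disabled` are hypothetical placeholders, not Archivematica names, and a short Python child process stands in for the MCP client.

```python
import os
import subprocess
import sys

# Hypothetical placeholder: substitute the variable name given in the
# documentation for your deployment method.
CAPTURE_VAR = "EXAMPLE_CAPTURE_OUTPUT"


def env_with_capture_disabled(base_env=None):
    """Return a copy of the environment with the capture flag set to 'false'."""
    env = dict(os.environ if base_env is None else base_env)
    env[CAPTURE_VAR] = "false"
    return env


# Start a child process (standing in for the MCP client) with the flag set
# and have it report the value it sees in its own environment.
child = subprocess.run(
    [sys.executable, "-c", f"import os; print(os.environ['{CAPTURE_VAR}'])"],
    env=env_with_capture_disabled(),
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = child.stdout.strip()
```

Because the value is read from the process environment, changing it has no effect on a client that is already running, which is why the variable has to be set before the client is started or re-started.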
Note that when output capturing is disabled, stdout is never captured, while stderr is captured only when the preservation task returns a non-zero exit code (i.e., when it fails). This allows the user to get the performance benefits while still having useful debugging information in the case where a task fails.
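The capture rule described above can be sketched in a few lines of Python. This is an illustration of the policy only, not Archivematica's implementation: the function `run_task` and its `capture_output_enabled` parameter are hypothetical, and the sketch does not model the serialization and transport between task workers and the MCP server, which is where the measured cost arises.

```python
import subprocess
import sys


def run_task(command, capture_output_enabled):
    """Run a command and return (exit_code, stdout, stderr) under the capture policy."""
    if capture_output_enabled:
        # Default behaviour: keep both streams.
        result = subprocess.run(command, capture_output=True, text=True)
        return result.returncode, result.stdout, result.stderr

    # Capturing disabled: stdout is discarded outright. stderr is still
    # piped so that it can be kept if, and only if, the task fails.
    result = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    stderr = result.stderr if result.returncode != 0 else ""
    return result.returncode, "", stderr


# Two stand-in "client scripts": one succeeds, one fails with exit code 2.
succeeding = [sys.executable, "-c",
              "import sys; print('done'); sys.stderr.write('warn')"]
failing = [sys.executable, "-c",
           "import sys; print('done'); sys.stderr.write('boom'); sys.exit(2)"]

captured = run_task(succeeding, capture_output_enabled=True)
quiet_success = run_task(succeeding, capture_output_enabled=False)
quiet_failure = run_task(failing, capture_output_enabled=False)
```

With capturing disabled, the succeeding task yields no output at all, while the failing task still yields its stderr alongside the non-zero exit code.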
It is also possible to tweak the configuration of your database server in
order to push the existing limits further. This can be a good solution for
users familiar with the MySQL storage engine. In particular, we have had some
success increasing the InnoDB log file size (innodb_log_file_size). See
Should MySQL update the default innodb_log_file_size? for more. In the
long term, we expect to find better ways to store task results. If you are
interested, please see issue #314.