Configuration settings

The configuration file

Workbench uses a YAML configuration whose location is indicated in the --config argument. This file defines the various options it will use to create, update, or delete Islandora content. The simplest configuration file needs only the following four options:

task: create
host: https://localhost
username: admin
password: islandora

In this example, the task being performed is creating nodes (and optionally media). Other tasks are create_from_files, update, delete, add_media, update_media, and delete_media. Some of the configuration settings documented below are used in all tasks, while others are only used in specific tasks.

Note

Workbench has a lot of configuration settings, and knowing which ones to use can be a little overwhelming. A good practice is to create one or more template configuration files that you can reuse for commonly-performed tasks, changing only the settings that pertain to the current job. More info on this suggestion is available here.

Configuration settings

The settings defined in a configuration file are documented in the tables below, grouped into broad functional categories for easier reference. The order of the options in the configuration file doesn't matter, and settings do not need to be grouped together in any specific way in the configuration file.

Warning

Many settings have default values that are applied if the setting is absent from your configuration file. All default values are indicated in the tables below. If you need to override a default value for a setting, include that setting in your config file so that its value, instead of the default, is applied.

Note that you can define some configuration settings as command-line arguments to the workbench script. If they are provided as command-line arguments, they override the same settings in the configuration file.

Printing all configuration values

If you need to see your current configuration (based on the default values for settings plus the settings in your config file), add --print_config as a command-line argument when you run Workbench. For example:

./workbench --config myconfig.yml --print_config

Workbench will not perform its task (e.g. it will not create or update content); it will just print the configuration settings and stop running. Note that the value of the password setting is obfuscated, but all other settings are exactly as defined by the Workbench defaults and your current config file.

A couple of tricks you might find useful if you are running Workbench on Linux or macOS:

Print the configuration to a file: ./workbench --config myconfig.yml --print_config > myconfig.txt
Filter out a single configuration setting (using temp_dir as an example): ./workbench --config myconfig.yml --print_config | grep temp_dir

Use of quotation marks

Generally speaking, you do not need to use quotation marks around values in your configuration file. You may wrap values in quotation marks if you wish, and many examples in this documentation do that (especially the host setting). The only values that should not be wrapped in quotation marks are those that take true or false as values. In YAML, as in many other computer and markup languages, "true" is a string (an ordinary sequence of characters that happens to spell an English word), whereas true is a reserved symbol with exactly one meaning: the boolean opposite of false.

For example, the following is a valid configuration file:

task: create
host: http://localhost:8000
username: admin
password: islandora
nodes_only: true
csv_field_templates:
 - field_linked_agent: relators:aut:person:Jordan, Mark
 - field_model: 25

But the following version is not valid, since there are quotes around "true" in the nodes_only setting:

task: create
host: http://localhost:8000
username: admin
password: islandora
nodes_only: "true"
csv_field_templates:
 - field_linked_agent: relators:aut:person:Jordan, Mark
 - field_model: 25

Use of spaces and other syntactical features

Configuration setting names should start a new line and not have any leading spaces. The exception is illustrated in the values of the csv_field_templates setting in the above examples, where the setting's value is a list of other values. In this case the members of the list start with a dash and a space (- ). The space after the dash is significant. (However, the leading space before the dash is insignificant, and is used for appearance only.) For example, this snippet is valid:

csv_field_templates:
 - field_linked_agent: relators:aut:person:Jordan, Mark
 - field_model: 25

whereas this one is not:

csv_field_templates:
 -field_linked_agent: relators:aut:person:Jordan, Mark
 -field_model: 25

Some setting values are represented in Workbench documentation using square brackets, like this one:

export_csv_field_list: ['field_description', 'field_extent']

Strictly speaking, YAML lists can be represented as either a series of entries on their own lines that start with - or as entries enclosed in [ and ]. It's best to follow the examples provided throughout the Workbench documentation.
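
As a small illustration of the two list syntaxes, the fragment below writes the same `export_csv_field_list` value both ways. The second form is commented out because a setting should only be defined once in a configuration file; this is a sketch of the YAML syntax only, not a recommendation of one form over the other.

```yaml
# Flow style: entries enclosed in [ and ].
export_csv_field_list: ['field_description', 'field_extent']

# Block style: the same list, one entry per line, each starting with "- ".
# (Commented out so the setting is not defined twice in one file.)
# export_csv_field_list:
#  - field_description
#  - field_extent
```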

Required configuration settings

Setting	Required	Description
task	✔️	One of 'create', 'create_from_files', 'update', 'delete', 'add_media', 'delete_media', 'update_media', 'export_csv', 'get_data_from_view', 'create_terms', or 'delete_media_by_node'. See "Choosing a task" for more information.
host	✔️	The hostname of your Islandora repository, including `http://` or `https://`, and the port number if it is not the default 80.
username		The username used to authenticate the requests. This Drupal user should be a member of the "Administrator" role. If you want to create nodes that are owned by a specific Drupal user, include their numeric user ID in the `uid` column in your CSV.
password		The user's password. You can also set the password in your `ISLANDORA_WORKBENCH_PASSWORD` environment variable. If you do this, omit the `password` option in your configuration file. If a password is not available in either your configuration file or in the environment variable, Workbench will prompt for a password.
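
A minimal sketch of a configuration file that leaves out `password`, assuming the password has already been set in the `ISLANDORA_WORKBENCH_PASSWORD` environment variable before Workbench is run. The hostname is a placeholder, not a real repository.

```yaml
task: create
host: https://islandora.example.org   # placeholder hostname
username: admin
# No password setting here. Workbench looks for ISLANDORA_WORKBENCH_PASSWORD
# in the environment, and prompts for a password if it is not set there either.
```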

Drupal settings

Setting	Required	Default value	Description
content_type		islandora_object	The machine name of the Drupal node content type you are creating or updating. Required in "create" and "update" tasks.
drupal_filesystem		fedora://	One of 'fedora://', 'public://', or 'private://' (the wrapping quotation marks are required). Only used with Drupal 8.x - 9.1; starting with Drupal 9.2, the filesystem is automatically detected from the media's configuration. Will eventually be deprecated.
allow_adding_terms		false	In `create`, `update`, `add_media`, `update_media`, `create_terms`, and `update_terms` tasks, determines if Workbench will add taxonomy terms if they do not exist in the target vocabulary. See more information in the "Taxonomy reference fields" section. Note: this setting is not required in `create_terms` tasks unless you are adding new terms to a taxonomy reference field on the term entries.
protected_vocabularies		[] (empty list)	Allows you to exclude vocabularies from having new terms added via `allow_adding_terms`. See more information in the "Taxonomy reference fields" section.
vocab_id	✔️ in `create_terms` tasks.		Identifies the vocabulary you are adding terms to in `create_terms` tasks. See more information in the "Creating taxonomy terms" section.
update_mode		replace	Determines if Workbench will `replace`, `append` (add to), or `delete` field values during `update` tasks. See more information in the "Updating nodes" section. Also applies to `update_media` tasks.
validate_terms_exist		true	If set to false, during `--check` Workbench will not query Drupal to determine if taxonomy terms exist. The structure of term values in the CSV is still validated; this option only tells Workbench to not check for each term's existence in the target Drupal. Useful to speed up the `--check` process if you know terms don't exist in the target Drupal.
validate_parent_node_exists		true	If set to false, during `--check` Workbench will not query Drupal to determine if nodes whose node IDs are in `field_member_of` exist. Useful to speed up the `--check` process if you know the parent nodes already exist in the target Drupal.
max_node_title_length		255	Set to the number of allowed characters for node titles if your Drupal uses Node Title Length. If you are unsure of the maximum node title length your site allows, check the length of the "title" column in your Drupal database's "node_field_data" table.
max_image_alt_text_length		255	Set to the number of allowed characters for image alt text values. Drupal's database structure limits this to 255 characters.
list_missing_drupal_fields		false	Set to `true` to tell Workbench to provide a list of fields that exist in your input CSV but that cannot be matched to Drupal field names (or reserved column names such as "file"). If `false`, Workbench will still check for CSV column headers that it can't match to Drupal fields, but will exit upon finding the first such field. This option produces a list of fields instead of exiting on detecting the first field.
standalone_media_url		false	Set to `true` if your Drupal instance has the "Standalone media URL" option at `/admin/config/media/media-settings` checked. The Drupal default is to have this unchecked, so you only need to use this Workbench option if you have changed Drupal's default. More information is available.
require_entity_reference_views		true	Set to `false` to tell Workbench to not require a View to expose the values in an entity reference field configured to use an Entity Reference View. Additional information is available here.
entity_reference_view_endpoints			A list of mappings from Drupal/CSV field names to Views REST Export endpoints used to look up term names for entity reference fields configured to use an Entity Reference View. Additional information is available here.
text_format_id		basic_html	The text format ID (machine name) to apply to all Drupal text fields that have a "formatted" field type. See "Text fields with markup" for more information.
field_text_format_ids			Defines a mapping between field machine names and the machine names of format IDs for "formatted" fields. See "Text fields with markup" for more information.
paragraph_fields			Defines structure of paragraph fields in the input CSV. See "Entity Reference Revisions fields (paragraphs)" for more information.
credentials_file_path			The absolute or relative path to a simple YAML file that contains the `username` and `password` config settings. Putting your `username` and `password` settings in this file allows you to omit them from the main configuration file. See "User management" for more information.
credentials_key_file_path			The absolute or relative path to a plain text file that contains the key required to use an encrypted credentials file. Putting the key in this file replaces prompting the user for the key.
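
The fragment below sketches how a few of the Drupal settings above might be combined in a `create` or `update` configuration. The vocabulary IDs listed under `protected_vocabularies` are assumed examples; substitute the machine names of the vocabularies on your own site.

```yaml
content_type: islandora_object
# Let Workbench add terms that do not yet exist in the target vocabulary...
allow_adding_terms: true
# ...except in these vocabularies (IDs are assumed examples).
protected_vocabularies:
 - islandora_model
 - islandora_media_use
# Skip the per-term existence lookups during --check.
validate_terms_exist: false
# Text format applied to "formatted" text fields (this is the default value).
text_format_id: basic_html
```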

Input data location settings

Setting	Default value	Description
input_dir	input_data	The full or relative path to the directory containing the files and metadata CSV file.
input_csv	metadata.csv	Path to the CSV metadata file. Can be absolute, or if just the filename is provided, will be assumed to be in the directory named in `input_dir`. Can also be the URL to a Google spreadsheet (see the "Using Google Sheets as input data" section for more information).
google_sheets_csv_filename	google_sheet.csv	Local CSV filename for data from a Google spreadsheet. See the "Using Google Sheets as input data" section for more information.
google_sheets_gid	0	The "gid" of the worksheet to use in a Google Sheet. See the "Using Google Sheets as input data" section for more information.
excel_worksheet	Sheet1	If using an Excel file as your input CSV file, the name of the worksheet that the CSV data will be extracted from.
input_data_zip_archives	[]	List of local file paths and/or URLs to .zip archives to extract into the directory defined in `input_dir`. See "Using a local or remote .zip archive as input data" for more info.
delete_zip_archive_after_extraction	true	Tells Workbench to delete a remote input zip archive after it has been downloaded and extracted. Applies to remote input archive files only (i.e., starts with "http"); local .zip archives are not deleted.
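
A hedged sketch of the input location settings working together, with one local and one remote .zip archive. All paths and the URL are hypothetical placeholders.

```yaml
input_dir: /data/workbench/batch_01        # hypothetical directory
input_csv: metadata.csv                    # filename only, so it is looked for in input_dir
input_data_zip_archives:
 - /data/archives/batch_01.zip             # local archive (local archives are not deleted)
 - https://files.example.org/batch_01.zip  # remote archive (placeholder URL)
# Keep the downloaded copy of the remote archive after it has been extracted.
delete_zip_archive_after_extraction: false
```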

Input CSV file settings

Setting	Default value	Description
id_field	id	The name of the field in the CSV that uniquely identifies each record.
delimiter	, [comma]	The delimiter used in the CSV file, for example, "," or "\t" (must use double quotes with "\t"). If omitted, defaults to ",".
subdelimiter	| [pipe]	The subdelimiter used in the CSV file to define multiple values in one field. If omitted, defaults to "|". Can be a string of multiple characters, e.g. "^^^".
ignore_csv_columns		Used in the `create` and `update` tasks only. A list of CSV column headers that Workbench should ignore. For example, `ignore_csv_columns: [Target Collection, Ready to publish]`
csv_start_row		Used in `create` and `update` tasks. Tells Workbench to ignore all rows/records in input CSV (or Google Sheet or Excel) before the designated row number. More information is available.
csv_stop_row		Used in `create` and `update` tasks. Tells Workbench to ignore all rows/records in input CSV (or Google Sheet or Excel) after the designated row number. More information is available.
csv_start_row_skip		Used in `create` and `update` tasks. Tells Workbench to ignore all rows/records in input CSV (or Google Sheet or Excel) after the designated row number. More information is available.
csv_stop_row_skip		Used in `create` and `update` tasks. Tells Workbench to ignore all rows/records in input CSV (or Google Sheet or Excel) before the designated row number. More information is available.
csv_rows_to_process		Used in `create` and `update` tasks. Tells Workbench to process only the rows/records in input CSV (or Google Sheet or Excel) with the specified "id" column values. More information is available.
csv_headers	names	Used in "create", "update" and "create_terms" tasks. Set to "labels" to allow use of field labels (as opposed to machine names) as CSV column headers.
clean_csv_values_skip	[] (empty list)	Used in all tasks that use CSV input files. See "How Workbench cleans your input data" for more information.
columns_with_term_names	[] (empty list)	Used in all tasks that allow creation of terms on entity ingest. See "Using numbers as term names" for more information.
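
To illustrate several of the input CSV settings above, here is a sketch for a tab-delimited file in which only part of the file should be processed. The row numbers are arbitrary placeholders.

```yaml
# Tab-delimited input; double quotes are required around "\t".
delimiter: "\t"
# A multi-character subdelimiter for multivalued fields.
subdelimiter: "^^^"
# Column headers Workbench should ignore (create and update tasks only).
ignore_csv_columns: [Target Collection, Ready to publish]
# Ignore rows before row 100 and rows after row 200.
csv_start_row: 100
csv_stop_row: 200
```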

Input CSV content templating settings

Setting	Default value	Description
csv_field_templates		Used in the `create` and `update` tasks only. A list of Drupal field machine names and corresponding values that are copied into the CSV input file. More detail provided in the "CSV field templates" section.
csv_value_templates		Used in the `create` and `update` tasks only. A list of Drupal field machine names and corresponding templates. More detail provided in the "CSV value templates" section.
csv_value_templates_for_paged_content		Used in `create` tasks only. Similar to `csv_value_templates` but applies to paged/child items created using the "Using subdirectories" method of creating paged content. More detail provided in the "CSV value templates" section.
csv_value_templates_rand_length	5	Length of the `$random_alphanumeric_string` and `$random_number_string` variables in CSV value templates. More detail provided in the "CSV value templates" section.
allow_csv_value_templates_if_field_empty	[]	List of fields to populate with CSV value templates if the CSV field is empty. More detail provided in the "CSV value templates" section.
field_viewer_override_models		List of mappings from values in the Islandora Display vocabulary to values in the input CSV's `field_model` column. Allows for automatic population of the `field_viewer_override` field. More detail provided at "Automatically populating the "Viewer override" field".
field_viewer_override_extensions		List of mappings from values in the Islandora Display vocabulary to extensions on files named in the input CSV `file` column. Allows for automatic population of the `field_viewer_override` field. More detail provided at "Automatically populating the "Viewer override" field".

Exporting CSV and files settings

See the "Generating CSV files" section for more information.

Setting	Required	Default value	Description
output_csv			Used in "create" tasks. The full or relative (to the "workbench" script) path to a CSV file with one record per node created by Workbench.
output_csv_include_input_csv		false	Used in "create" tasks in conjunction with `output_csv`. Include in the output CSV all the fields (and their values) from the input CSV.
export_csv_term_mode		tid	Used in "export_csv" tasks to indicate whether vocabulary term IDs or names are included in the output CSV file. Set to "tid" (the default) to include term IDs, or set to "name" to include term names. See "Exporting field data into a CSV file" for more information.
export_csv_field_list		[] (empty list)	List of fields to include in exported CSV data. If empty, all fields will be included. See "Using a Drupal View to identify content to export as CSV" for more information.
view_parameters			List of URL parameter/value strings to include in requests to a View. See "Using a Drupal View to identify content to export as CSV" for more information.
export_csv_file_path	✔️ in `get_data_from_view` tasks.		Used in the "export_csv" and "get_data_from_view" tasks. The path to the exported CSV file. Required in the "get_data_from_view" task; in the "export_csv" task, if left empty (the default), the file will be named after the value of the `input_csv` with ".csv_file_with_field_values" appended and saved in the directory identified in `input_dir`.
export_file_directory			Used in the "export_csv" and "get_data_from_view" tasks. The path to the directory where files corresponding to the data in the CSV output file will be written. If the directory doesn't exist, Workbench will create it (but not any leading directories); if Workbench cannot create the directory, it will exit.
export_file_url_instead_of_download		false	Used in the "export_csv" and "get_data_from_view" tasks. Tells Workbench to not download media files but instead to export the direct URLs to the files.
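
A rough sketch of an `export_csv` configuration using the settings in this table. It assumes that the input CSV identifies the nodes to export, as described in "Exporting field data into a CSV file"; the hostname, filenames, and output path are hypothetical.

```yaml
task: export_csv
host: https://islandora.example.org      # placeholder hostname
username: admin
password: islandora
input_dir: input_data
input_csv: nodes_to_export.csv           # hypothetical filename
# Write term names to the output CSV instead of term IDs.
export_csv_term_mode: name
# Limit the export to these fields; an empty list exports all fields.
export_csv_field_list: ['field_description', 'field_extent']
export_csv_file_path: /tmp/exported_nodes.csv   # hypothetical path
```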

Media settings

Setting	Required	Default value	Description
nodes_only		false	Include this option in `create` tasks, set to `true`, if you want to only create nodes and not their accompanying media. See the "Creating nodes but not media" section for more information.
allow_missing_files		false	Determines if `file` values that point to missing (not found) files are allowed. Used in the `create` and `add_media` tasks. If set to true, `file` values that point to missing files are allowed. For `create` tasks, a `true` value will result in nodes without attached media. For `add_media` tasks, a `true` value will skip adding a media for the missing `file` CSV value. Defaults to false (which means all `file` values must name files that exist at their specified locations). Note that this setting has no effect on empty `file` values; these are always logged, and their corresponding media are simply not created.
exit_on_first_missing_file_during_check		true	Removed as a configuration setting November 1, 2022. Use `strict_check` instead.
strict_check			Replaced with `perform_soft_checks` as of commit dfa60ff (July 14, 2023).
media_use_tid		`http://pcdm.org/use#OriginalFile`	The term ID for the term from the "Islandora Media Use" vocabulary you want to apply to the media being created in `create` and `add_media` tasks. You can provide a term URI instead of a term ID, for example `"http://pcdm.org/use#OriginalFile"`. You can specify multiple values for this setting by joining them with the subdelimiter configured in the `subdelimiter` setting; for example, `media_use_tid: 17|18`. You can also set this at the object level by including `media_use_tid` in your CSV file; values there will override the value set in your configuration file. If you are "Adding multiple media", you define media use term IDs in a slightly different way.
media_type	✔️ in `add_media` and `update_media` tasks		Overrides, for all media being created, Workbench's default definition of whether the media being created is an image, file, document, audio, or video. Used in the `create`, `create_from_files`, `add_media`, and `update_media` tasks. More detail provided in the "Configuring Media Types" section. Required in all `update_media` and `add_media` tasks, not to override Workbench's defaults, but to explicitly indicate the media type being updated/added.
media_types_override			Overrides default media type definitions on a per file extension basis. Used in the `create`, `add_media`, and `create_from_files` tasks. More detail provided in the "Configuring Media Types" section.
media_type_file_fields			Defines the name of the media field that references media's file (i.e., the field on the Media type). Usually used with custom media types and accompanied by either the `media_type` or `media_types_override` option. For more information, see the "Configuring Media Types" section.
mimetype_extensions			Overrides Workbench's default mappings between MIME types and file extensions. Usually used with remote files where the remote web server returns a MIME type that is not standard. For more information, see the "Configuring Media Types" section.
extensions_to_mimetypes			Overrides Workbench's default mappings between file extension and MIME types. For more information, see the "Configuring Media Types" section.
delete_media_with_nodes		true	When a node is deleted using a `delete` task, by default, all of its media are automatically deleted. Set this option to false to not delete all of a node's media (you do not generally want to keep the media without the node).
use_node_title_for_media_title		true	If set to `true` (default), name media the same as the parent node's title value. Truncates the value of the field to 255 characters. Applies to both `create` and `add_media` tasks.
use_nid_in_media_title		false	If set to `true`, uses the parent node's node ID in the media title, following the pattern `{node_id}-Original File`. Applies to both `create` and `add_media` tasks.
field_for_media_title			Identifies a CSV field name (machine name, not human readable) that will be used as the media title. For example, `field_for_media_title: id`. Truncates the value of the field to 255 characters. Applies to both `create` and `add_media` tasks.
use_node_title_for_remote_filename		false	Set to true to use a version of the parent node's title as the filename for a remote (http[s]) file. Replaces all non-alphanumeric characters with an underscore (`_`). Truncates the value of the field to 255 characters. Applies to both `create` and `add_media` tasks. Note: this setting replaces (the previously undocumented) `use_nid_in_media_filename` setting.
field_for_remote_filename			Identifies a CSV field name (machine name, not human readable) that will be used as the filename for a remote (http[s]) file. For example, `field_for_remote_filename: id`. Truncates the value of the field to 255 characters. If the field is empty in the CSV, the CSV ID field value will be used. Applies to both `create` and `add_media` tasks. Note: this setting replaces (the previously undocumented) `field_for_media_filename` setting.
delete_tmp_upload		false	For remote files, if set to `true`, the temporary copy of the remote file is deleted after it is used to create media. If `false`, the copy will remain in the location defined in your `temp_dir` setting. If the file cannot be deleted (e.g. a virus scanner is scanning it), it will remain and an error message will be added to the log file.
additional_files			Maps a set of CSV field names to media use term IDs to create additional media (additional to the media created from the file named in the "file" column, that is) in `create` and `add_media` tasks. See "Adding multiple media" for more information. Also used in `export_csv` and `get_data_from_view` tasks to indicate which media to export. See "Exporting image, video, etc. files along with CSV data" for more information.
fixity_algorithm		None	Checksum/hash algorithm to use during transmission fixity checking. Must be one of "md5", "sha1", or "sha256". See "Fixity checking" for more information.
validate_fixity_during_check		false	Perform checksum validation during `--check`. See "Fixity checking" for more information.
delete_media_by_node_media_use_tids		[] (empty list)	During `delete_media_by_node` tasks, allows you to specify which media to delete. Only media with the listed term IDs from the Islandora Media Use vocabulary will be deleted. By default (an empty list), all media are deleted. See "Deleting media using node IDs" for more information.
update_media_by_node_media_use_tids	✔️	[] (empty list)	During `update_media_by_node` tasks, in conjunction with the `media_type` setting, allows you to specify which media to update. Only media of the specified type with the listed term IDs from the Islandora Media Use vocabulary will be updated. By default (an empty list), no media are updated. See "Updating media using node IDs" for more information.
keep_filename_parent_directory		true	For all tasks that create or replace media, this setting determines if the file created by Drupal includes in its filename the name of the file's parent directory as it appears in the input CSV or in directories used during the creation of paged content. For example, when creating paged content, if a page file named "1918-04-12-04.tif" is in a newspaper issue directory named "1918-04-12", with `keep_filename_parent_directory` set to "true" (the default), the name of the file created by Drupal will be "1918-04-12/1918-04-12-04.tif"; if `keep_filename_parent_directory` is set to false, the file's name will be simply "1918-04-12-04.tif".
remote_file_cookie_name			Name of the session cookie used to provide access to remote files. See "Session cookies for remote files" for more information.
remote_file_cookie_value			The session cookie data corresponding to `remote_file_cookie_name`.
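
The following sketch combines a few of the media settings above in a `create` task. The hostname is a placeholder, the term IDs 17 and 18 are taken from the example in the `media_use_tid` description and will differ on your site, and any additional requirements described in "Fixity checking" are assumed to be met.

```yaml
task: create
host: https://islandora.example.org   # placeholder hostname
username: admin
password: islandora
# Create nodes even when a file named in the CSV "file" column cannot be found.
allow_missing_files: true
# Apply two Media Use terms, joined with the default subdelimiter.
media_use_tid: 17|18
# Check fixity during transmission, and also validate checksums during --check.
fixity_algorithm: sha256
validate_fixity_during_check: true
```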

Islandora model settings

Setting	Required	Default value	Description
model [singular]			Used in the `create_from_files` task only. Defines the term ID from the "Islandora Models" vocabulary for all nodes created using this task. Note: one of `model` or `models` is required. More detail provided in the "Creating nodes from files" section.
models [plural]			Used in the `create_from_files` task only. Provides a mapping between file extensions and terms in the "Islandora Models" vocabulary. Note: one of `model` or `models` is required. More detail provided in the "Creating nodes from files" section.

Paged and compound content settings

See the section "Creating paged content" for more information.

Setting	Default value	Description
paged_content_from_directories	false	Set to true if you are using the "Using subdirectories" method of creating paged content.
paged_content_from_directories_parents_exist	false	Set to true if you are using the "Adding children to nodes that already exist" method of creating paged content.
page_files_source_dir_field	id [or whatever is defined as your `id` column using the `id_field` configuration setting]	Set to `directory` if your input CSV contains a "directory" column that names each row's page, if you are using the "Using subdirectories" method of creating paged content.
paged_content_sequence_separator	- [hyphen]	The character used to separate the page sequence number from the rest of the filename. Used when creating paged content with the "Using subdirectories" method. Note: this configuration option was originally misspelled "paged_content_sequence_seprator".
paged_content_page_model_tid		Required if `paged_content_from_directories` is true. The term ID (or URI) from the Islandora Models taxonomy to assign to pages.
paged_content_image_file_extension		If the subdirectories containing your page image files also contain other files (that are not page images), you need to use this setting to tell Workbench which files to create pages from. Common values include `tif`, `tiff`, and `jpg`.
paged_content_additional_page_media		A mapping of Media Use term IDs (or URIs) to file extensions telling Workbench which Media Use term to apply to media created from additional files such as OCR text files.
paged_content_page_viewer_override		The term ID from the Islandora Display taxonomy to assign to pages. If not included, defaults to the value of the `field_viewer_override` (or whatever is configured in the `viewer_override_fieldname` setting) in the parent's record in the CSV file.
viewer_override_fieldname	field_viewer_override	The Drupal fieldname in your content type that stores entries from the "Islandora Display" vocabulary. The most likely other value for this setting is `field_display_hints`.
paged_content_page_content_type		Set to the machine name of the Drupal node content type for pages created using the "Using subdirectories" method if it is different than the content type of the parent (which is specified in the content_type setting).
page_title_template	'$parent_title, page $weight'	Template used to generate the titles of pages/members in the "Using subdirectories" method.
paged_content_page_weight_multiplier	1	Multiplies the sequence indicator value embedded in page filenames by this value, to space out the resulting `field_weight` values. See documentation on the "Using subdirectories" method for more information.
paged_content_ignore_files	["Thumbs.db"]	List of filenames or `*` wildcard patterns that you want Workbench to ignore when it scans directories to create page- or child-level nodes. See "Creating paged, compound, and collection content" for more information.
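
As a sketch of how these settings might fit together for the "Using subdirectories" method, consider the fragment below. The term ID, the underscore separator, and the ignore patterns are assumed examples; the `$parent_title` and `$weight` variables are the ones that appear in the default `page_title_template`.

```yaml
paged_content_from_directories: true
# Required when paged_content_from_directories is true (term ID is a placeholder).
paged_content_page_model_tid: 29
# Page filenames use an underscore, not the default hyphen, before the sequence number.
paged_content_sequence_separator: '_'
# Only create pages from .tif files in each subdirectory.
paged_content_image_file_extension: tif
page_title_template: '$parent_title, image $weight'
# Space out the resulting field_weight values.
paged_content_page_weight_multiplier: 10
# Filenames and wildcard patterns to skip when scanning directories.
paged_content_ignore_files: ["Thumbs.db", ".DS_Store", "*.bak"]
```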

Logging settings

See the "Logging" section for more information.

Setting	Default value	Description
log_file_path	workbench.log	The path to the log file, absolute or relative to the directory Workbench is run from.
log_file_mode	a [append]	Set to "w" to overwrite the log file, if it exists.
log_request_url	false	Whether or not to log the request URL (and its method). Useful for debugging.
log_json	false	Whether or not to log the raw request JSON POSTed, PUT, or PATCHed to Drupal. Useful for debugging.
log_headers	false	Whether or not to log the raw HTTP headers used in all requests. Useful for debugging.
log_response_status_code	false	Whether or not to log the HTTP response code. Useful for debugging.
log_response_time	false	Whether or not to log the response time of each request that is slower than the average response time for the last 20 HTTP requests Workbench makes to the Drupal server. Useful for debugging.
log_response_body	false	Whether or not to log the raw HTTP response body. Useful for debugging.
log_file_name_and_line_number	false	Whether or not to add the filename and line number that created the log entry. Useful for debugging.
log_term_creation	true	Whether or not to log the creation of new terms during "create" and "update" tasks (does not apply to the "create_terms" task). `--check` will still report that terms in the CSV do not exist in their respective vocabularies.
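
When troubleshooting, a configuration along the following lines might be useful. This is only a sketch; the log filename is a hypothetical example, and you would normally switch on just the options you need.

```yaml
log_file_path: create_batch_01.log   # hypothetical filename, relative to where Workbench is run
log_file_mode: w                     # overwrite the log file instead of appending to it
log_request_url: true
log_response_status_code: true
log_json: true
```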

HTTP settings

Setting	Default value	Description
user_agent	Islandora Workbench	String to use as the User-Agent header in HTTP requests.
allow_redirects	true	Whether or not to allow Islandora Workbench to respond to HTTP redirects.
secure_ssl_only	true	Whether or not to require valid SSL certificates. Set to `false` if you want to ignore SSL certificates.
enable_http_cache	true	Whether or not to enable Workbench's client-side request cache. Set to `false` if you want to disable the cache during troubleshooting, etc.
http_cache_storage	memory	The backend storage type for the client-side cache. Set to `sqlite` if you are getting out of memory errors while running Islandora Workbench.
http_cache_storage_expire_after	1200	Length of the client-side cache lifespan (in seconds). Reduce this number if you are using the `sqlite` storage backend and the database is using too much disk space. Note that reducing the cache lifespan will result in increased load on your Drupal server.
http_max_retries	3	Number of times to retry a request. Set to 0 to disable retries. See "Reducing Workbench's impact on Drupal" for more information.
http_backoff_factor	1	Number of seconds that the HTTP retry mechanism uses to determine how long to pause between retries. See "Reducing Workbench's impact on Drupal" for more information.
http_retry_on_status_codes	[500, 502, 503, 504]	List of HTTP response status codes to retry for. You should not normally change this setting. See "Reducing Workbench's impact on Drupal" for more information.
http_retry_allowed_methods	["HEAD", "GET", "POST", "PUT", "PATCH", "DELETE"]	List of HTTP request methods to retry for. You should not normally change this setting. See "Reducing Workbench's impact on Drupal" for more information.
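
To give a feel for how the retry settings interact, here is a minimal sketch that assumes a simple exponential backoff, where each pause is double the previous one. The exact formula Workbench uses is not stated in this table, so treat the doubling rule and the helper names `backoff_delays` and `should_retry` as assumptions for illustration only.

```python
# Mirrors the default value of "http_retry_on_status_codes".
RETRY_STATUS_CODES = {500, 502, 503, 504}


def backoff_delays(max_retries, backoff_factor):
    """Return the pause (in seconds) before each retry, doubling each time.

    Assumed rule: delay = backoff_factor * 2 ** retry_index.
    """
    return [backoff_factor * (2 ** attempt) for attempt in range(max_retries)]


def should_retry(status_code, attempts_made, max_retries):
    """Retry only for listed status codes, and only while retries remain."""
    return status_code in RETRY_STATUS_CODES and attempts_made < max_retries


# With the defaults (http_max_retries: 3, http_backoff_factor: 1).
print(backoff_delays(3, 1))
```

Under this assumed rule, raising `http_backoff_factor` stretches every pause proportionally, while raising `http_max_retries` adds progressively longer pauses at the end of the sequence.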

Rollback configuration and CSV file settings

See "Rolling back" for more information.

Setting	Default value	Description
rollback_dir	Value of `input_dir` setting	Absolute path to the directory where you want your "rollback.csv" file to be written.
rollback_config_file_path	rollback.yml	Relative (to workbench) or absolute path to write the rollback config YAML file to.
rollback_csv_file_path	[input_dir/rollback.csv]	Relative (to workbench) or absolute path to write the rollback CSV file to.
timestamp_rollback	false	Set to `true` to add a timestamp to the "rollback.yml" and corresponding "rollback.csv" generated in "create" and "create_from_files" tasks.
rollback_config_filename_template		Defines a template that will be used to create the rollback configuration file. The two placeholders available in this template are `$config_filename` and `$input_csv_filename`.
rollback_csv_filename_template		Defines a template that will be used to create the rollback CSV file. The two placeholders available in this template are `$config_filename` and `$input_csv_filename`.
rollback_file_include_node_info	`false`	Set to `true` to include `title`, `field_member_of` and `file` columns in your rollback CSV file.
rollback_file_comments		Defines a list of lines to be added to both the rollback config and CSV file as comments.
include_password_in_rollback_config_file	`false`	Set to `true` to include the value of the `password` configuration setting in your rollback config YAML file.
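
The `$`-prefixed placeholders in the two filename template settings resemble those of Python's `string.Template`. Assuming that substitution style, the sketch below shows how such a template might expand. The template string, the sample values, and the function name `expand_rollback_template` are hypothetical; see "Rolling back" for the values Workbench actually substitutes.

```python
from string import Template


def expand_rollback_template(template, config_filename, input_csv_filename):
    """Fill in the two placeholders named in the table above."""
    return Template(template).substitute(
        config_filename=config_filename,
        input_csv_filename=input_csv_filename,
    )


# Hypothetical template and values, for illustration only.
name = expand_rollback_template(
    "rollback-$config_filename-$input_csv_filename",
    "my_config",
    "my_input",
)
print(name)
```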

CSV ID to node ID map settings

See "The CSV ID to node ID map" documentation for more information.

Setting	Default value	Description
csv_id_to_node_id_map_path	[temp_dir]/csv_id_to_node_id_map.db	Path to the SQLite database file used to store CSV row ID to node ID mappings for nodes created during `create` and `create_from_files` tasks. If you want to store your map database outside of the temporary directory, use an absolute path to the database file for this setting. If you want to disable population of this database, set this config setting to `false`.
csv_id_to_node_id_map_dir	Value of the `temp_dir` setting.	Path to the directory containing the SQLite database file used to store CSV row ID to node ID mappings for nodes created during `create` and `create_from_files` tasks. This setting does the same thing as `csv_id_to_node_id_map_path` but is named more descriptively and will likely replace `csv_id_to_node_id_map_path` in the future.
csv_id_to_node_id_map_filename	csv_id_to_node_id_map.db	Filename of the SQLite database file. Does not require a `.db` extension.
query_csv_id_to_node_id_map_for_parents	false	Queries the CSV ID to node ID map when creating compound content. Set to `true` if you want to use parent IDs from previous Workbench sessions. Note: this setting is automatically set to `true` in secondary task config files. See "Creating parent/child relationships across Workbench sessions" for more information.
ignore_duplicate_parent_ids	true	Tells Workbench to ignore entries in the CSV ID to node ID map that have the same parent ID value. Set to `false` if you want Workbench to warn you that there are duplicate parent IDs in your CSV ID to node ID map. See "Creating parent/child relationships across Workbench sessions" for more information.
csv_id_to_node_id_map_allowed_hosts	`["", the value of the "host" configuration setting]`	Tells Workbench to only use entries in the CSV ID to node ID map that have one of the specified values in the table's "host" column. `""` is for empty "host" column values, which applies to all rows added prior to June 2025, when this setting was introduced. See "'host' values in the map" for more information.

Hook settings

See "Hooks" for more information.

Setting	Default value	Description
bootstrap		List of absolute paths to one or more scripts that execute prior to Workbench connecting to Drupal.
show_bootstrap_script_output	false	Whether or not to print to the console any output produced by bootstrap scripts (e.g., printed messages). Applies to all registered bootstrap scripts.
shutdown		List of absolute paths to one or more scripts that execute after Workbench has finished connecting to Drupal.
show_shutdown_script_output	false	Whether or not to print to the console any output produced by shutdown scripts (e.g., printed messages). Applies to all registered shutdown scripts.
preprocessors		List of absolute paths to one or more scripts that are applied to CSV values prior to the values being ingested into Drupal.
node_post_create		List of absolute paths to one or more scripts that execute after a node is created.
node_post_update		List of absolute paths to one or more scripts that execute after a node is updated.
media_post_create		List of absolute paths to one or more scripts that execute after a media entity is created.

`run_scripts` task settings

See "Running scripts on existing entities" for more information.

Setting	Required	Default value	Description
run_scripts	✔️		List of absolute paths to one or more scripts that execute.
run_scripts_entity_type	✔️		One of "node", "media", or "term".
run_scripts_threads		1	Number of concurrent threads to use.
run_scripts_threads		true	Whether or not to log the output of scripts in the Workbench log file. Set to false if your script writes its own log.

Miscellaneous settings

Setting	Default value	Description
check_for_workbench_updates	true	Whether or not to have `--check` compare your local Git commits with those in the "main" branch at https://github.com/mjordan/islandora_workbench and log whether your copy of Workbench is up to date or not.
perform_soft_checks	false	If set to true, `--check` will not exit when it encounters an error with parent/child order, file extension registration with Drupal media file fields, missing files named in the `files` CSV column, or EDTF validation. Instead, it will log any errors it encounters and exit after it has checked all rows in the CSV input file. Note: this setting replaces `strict_check` as of commit dfa60ff (July 14, 2023).
temp_dir	The system's temporary directory, as returned by Python's gettempdir() function.	Relative or absolute path to the directory where you want Workbench's temporary files to be written. These include the ".preprocessed" version of your input CSV, remote files temporarily downloaded to create media, and the CSV ID to node ID map database.
sqlite_db_filename	[temp_dir]/workbench_temp_data.db	Path to the SQLite database file used to store session data.
pause		Defines the number of seconds to pause between all 'POST', 'PUT', 'PATCH', 'DELETE' requests to Drupal. Include it in your configuration to lessen the impact of Islandora Workbench on your site during large jobs, for example pause: 1.5. More information is available in the "Reducing Workbench's impact on Drupal" documentation.
adaptive_pause		Defines the number of seconds to pause between each REST request to Drupal. Works like "pause" but only takes effect when the Drupal server's response to the most recent request is slower (determined by the "adaptive_pause_threshold" value) than the average response time for the last 20 requests. More information is available in the "Reducing Workbench's impact on Drupal" documentation.
adaptive_pause_threshold	2	A weighting of the response time for the most recent request, relative to the average response time of the last 20 requests. This weighting determines how much slower the Drupal server's response to the most recent Workbench request must be in order for adaptive pausing to take effect for the next request. For example, if set to "1", adaptive pausing will happen when the response time is equal to the average of the last 20 response times; if set to "2", adaptive pausing will take effect if the last request's response time is double the average.
progress_bar	false	Show a progress bar when running Workbench instead of row-by-row output.
show_percentage_of_csv_input_processed	false	Show the percentage of input CSV processed in each line of output in `create`, `update`, `delete`, and `add_media` tasks.
drupal_8		Deprecated.
path_to_python	python	Used in `create` tasks that also use the `secondary_tasks` option. Tells Workbench the path to the python interpreter. For details on when to use this option, refer to the end of the "Secondary Tasks" section of "Creating paged, compound, and collection content". Note: this setting goes in the config file for the primary task, not the secondary tasks' config files.
path_to_workbench_script	workbench	Used in `create` tasks that also use the `secondary_tasks` option. Tells Workbench the path to the `workbench` script. For details on when to use this option, refer to the end of the "Secondary Tasks" section of "Creating paged, compound, and collection content". Note: this setting goes in the config file for the primary task, not the secondary tasks' config files.
contact_sheet_output_dir	contact_sheet_output	Used in `create` tasks to specify the name of the directory where contact sheet output is written. Can be relative (to the Workbench directory) or absolute. See "Generating a contact sheet" for more information.
contact_sheet_css_path	assets/contact_sheet/contact-sheet.css	Used in `create` tasks to specify the path of the CSS stylesheet to use in contact sheets. Can be relative (to the Workbench directory) or absolute. See "Generating a contact sheet" for more information.
node_exists_verification_view_endpoint		Used in `create` tasks to tell Workbench to check whether a node with a matching value in your input CSV already exists. See "Checking if nodes already exist" for more information.
redirect_status_code	301	Used in `create_redirect` tasks to set the HTTP response code in redirects. See "Prompting the user" for more information.
remind_user_to_run_check	false	Prompt the user to confirm whether they have run Workbench with `--check`. See "Prompting the user" for more information.
user_prompts	[] (empty list)	List of custom prompts to present to the user. See "Prompting the user" for more information.
completion_message	None	A message to display to the user just before Workbench completes executing.
check_lock_file_path		Defines a "lock file" that contains data generated by a successful `--check`. See "Checking configuration and input data" for more information.
secondary_tasks		A list of configuration file names that are executed as secondary tasks after the primary task to create compound objects. See "Using a secondary task" for more information.
prompt_user_before_delete_task	false	If set to `true`, Workbench will prompt the user "You are about to delete [number] nodes and their attached media. Continue? (y/n)". See "Deleting nodes" for more information.
recovery_mode_starting_from_node_id		Identifies the node ID of the last node created during an interrupted `create` task. See "Recovering from interrupted 'create' tasks" for more information.
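
The relationship between `adaptive_pause` and `adaptive_pause_threshold` described in the table above can be sketched as follows. This is a simplified model, not Workbench's actual code: the class name `AdaptivePauser`, the choice to compare against the average of earlier requests only, and the use of "greater than or equal to" are all assumptions made for illustration.

```python
from collections import deque


class AdaptivePauser:
    """Decide whether to pause based on recent response times."""

    def __init__(self, adaptive_pause, threshold=2, window=20):
        self.adaptive_pause = adaptive_pause  # seconds to pause when triggered
        self.threshold = threshold            # weighting applied to the average
        self.response_times = deque(maxlen=window)  # keeps the last 20 by default

    def pause_after(self, response_time):
        """Record a response time and return the pause before the next request."""
        pause = 0
        if self.response_times:
            average = sum(self.response_times) / len(self.response_times)
            # Pause only if this response was slow relative to recent history.
            if response_time >= average * self.threshold:
                pause = self.adaptive_pause
        self.response_times.append(response_time)
        return pause


pauser = AdaptivePauser(adaptive_pause=3, threshold=2)
for seconds in [1.0, 1.0, 2.5, 1.0]:
    print(seconds, "->", pauser.pause_after(seconds))
```

In this model, a lower threshold makes pausing trigger more readily, while `pause` (unlike `adaptive_pause`) would apply unconditionally between write requests.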

When you run Islandora Workbench with the --check argument, it verifies that all configuration options required for the current task are present, and tells you if any are missing.

Note

Islandora Workbench automatically converts any configuration keys to lowercase, e.g., Task is equivalent to task.
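
The effect of this normalization can be pictured with a short sketch. The function name `normalize_keys` is hypothetical, and the sketch only handles top-level keys; it is meant to show the outcome, not Workbench's own implementation.

```python
def normalize_keys(config):
    """Return a copy of the config with every top-level key lowercased."""
    return {str(key).lower(): value for key, value in config.items()}


# Keys as a user might have typed them in a configuration file.
raw_config = {"Task": "create", "HOST": "https://localhost", "username": "admin"}
normalized = normalize_keys(raw_config)
print(normalized)
```

Note that only the keys change; values such as `https://localhost` are left exactly as written.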

Validating the syntax of the configuration file

When you run Workbench, it confirms that your configuration file is valid YAML. This is a syntax check only, not a content check. If the file is valid YAML, Workbench then goes on to perform a long list of application-specific checks.

If this syntax check fails, some detail about the problem will be displayed to the user. The same information plus the entire Python stack trace is also logged to a file named "workbench.log" in the directory Islandora Workbench is run from. This file name is Workbench's default log file name, but in this case (validating the config file's YAML syntax), that file name is used regardless of the log file location defined in the configuration's log_file_path option. The reason the error is logged in the default location instead of the value in the configuration file (if one is present) is that the configuration file isn't valid YAML and therefore can't be parsed.

Example configuration files

These examples provide inline annotations explaining why the settings are included in the configuration file. Blank rows/lines are included for readability.

Create nodes only, no media

task: create
host: https://localhost
username: admin
password: islandora

# This setting tells Workbench to create nodes with no media.
# Also, this tells --check to skip all validation of "file" locations.
# Other media settings, like "media_use_tid", are also ignored.
nodes_only: true

Use a custom log file location

task: create
host: https://localhost
username: admin
password: islandora

# This setting tells Workbench to write its log file to the location specified
# instead of the default "workbench.log" within the directory Workbench is run from.
log_file_path: /home/mark/workbench_log.txt

Include some CSV field templates

task: create
host: https://localhost
username: admin
password: islandora

# The values in this list of field templates are applied to every row in the
# input CSV file before the CSV file is used to populate Drupal fields. The
# field templates are also applied during the "--check" in order to validate
# the values of the fields.
csv_field_templates:
 - field_member_of: 205
 - field_model: 25

Use a Google Sheet as input CSV

task: create
host: https://localhost
username: admin
password: islandora
input_csv: https://docs.google.com/spreadsheets/d/13Mw7gtBy1A3ZhYEAlBzmkswIdaZvX18xoRBxfbgxqWc/edit
# You only need to specify the google_sheets_gid option if the worksheet in the Google Sheet
# is not the default one.
google_sheets_gid: 1867618389

Create nodes and media from files (no input CSV file)

task: create_from_files
host: https://localhost
username: admin
password: islandora

# The files to create the nodes from are in this directory.
input_dir: /tmp/sample_files

# This tells Workbench to write a CSV file containing node IDs of the
# created nodes, plus the field names used in the target content type
# ("islandora_object" by default).
output_csv: /tmp/sample_files.csv

# All nodes should get the "Model" value corresponding to this URI.
model: 'https://schema.org/DigitalDocument'

Create taxonomy terms

task: create_terms
host: https://localhost
username: admin
password: islandora
input_csv: my_term_data.csv
vocab_id: myvocabulary

Ignore some columns in your input CSV file

task: create
host: https://localhost
username: admin
password: islandora
input_csv: input.csv

# This tells Workbench to ignore the 'date_generated' and 'batch_id'
# columns in the input.csv file.
ignore_csv_columns: ['date_generated', 'batch_id']

Generating sample Islandora content

task: create_from_files
host: https://localhost
username: admin
password: islandora
# This directory must match the one defined in the script's 'dest_dir' variable.
input_dir: /tmp/autogen_input
media_use_tid: 17
output_csv: /tmp/my_sample_content_csv.csv
model: http://purl.org/coar/resource_type/c_c513
# This is the script that generates the sample content.
bootstrap:
 - "/home/mark/Documents/hacking/workbench/generate_image_files.py"

Running a post-action script

task: create
host: https://localhost
username: admin
password: islandora
node_post_create: ['/home/mark/hacking/islandora_workbench/scripts/entity_post_task_example.py']
# node_post_update: ['/home/mark/hacking/islandora_workbench/scripts/entity_post_task_example.py']
# media_post_create: ['/home/mark/hacking/islandora_workbench/scripts/entity_post_task_example.py']

Create nodes and media from files

task: create_from_files
host: https://localhost
username: admin
password: islandora
input_dir: path/to/files
models:
 - 'http://purl.org/coar/resource_type/c_1843': ['zip', 'tar', '']
 - 'https://schema.org/DigitalDocument': ['pdf', 'doc', 'docx', 'ppt', 'pptx']
 - 'http://purl.org/coar/resource_type/c_c513': ['tif', 'tiff', 'jp2', 'png', 'gif', 'jpg', 'jpeg']
 - 'http://purl.org/coar/resource_type/c_18cc': ['mp3', 'wav', 'aac']
 - 'http://purl.org/coar/resource_type/c_12ce': ['mp4']
