Configuration

The configuration file

Workbench uses a YAML configuration whose location is indicated in the --config argument. This file defines the various options it will use to create, update, or delete Islandora content. The simplest configuration file needs only the following four options:

task: create
host: "http://localhost:8000"
username: admin
password: islandora

In this example, the task being performed is creating nodes (and optionally media). Other tasks are create_from_files, update, delete, add_media, update_media, and delete_media. Some of the configuration settings documented below are used in all tasks, while others are only used in specific tasks.

Configuration settings

The settings defined in a configuration file are documented in the tables below, grouped into broad functional categories for easier reference. The order of the options in the configuration file doesn't matter, and settings do not need to be grouped together in any specific way in the configuration file.

Use of quotation marks

Generally speaking, you do not need to use quotation marks around values in your configuration file. You may wrap values in quotation marks if you wish, and many examples in this documentation do so (especially for the host setting). The only values that must not be wrapped in quotation marks are those that take true or false as values. In YAML, as in many other computer and markup languages, "true" is a string (an ordinary sequence of characters), whereas true is a reserved symbol with exactly one meaning: the boolean opposite of false.

For example, the following is a valid configuration file:

task: create
host: http://localhost:8000
username: admin
password: islandora
nodes_only: true
csv_field_templates:
 - field_linked_agent: relators:aut:person:Jordan, Mark
 - field_model: 25

But the following version is not valid, since there are quotes around "true" in the nodes_only setting:

task: create
host: http://localhost:8000
username: admin
password: islandora
nodes_only: "true"
csv_field_templates:
 - field_linked_agent: relators:aut:person:Jordan, Mark
 - field_model: 25

Use of spaces and other syntactical features

Configuration setting names should start a new line and not have any leading spaces. The exception is illustrated in the values of the csv_field_templates setting in the above examples, where the setting's value is a list of other values. In this case the members of the list start with a dash and a space (-). The trailing space in these values is significant. (However, the leading space before the dash is insignificant, and is used for appearance only.) For example, this snippet is valid:

csv_field_templates:
 - field_linked_agent: relators:aut:person:Jordan, Mark
 - field_model: 25

whereas this one is not:

csv_field_templates:
 -field_linked_agent: relators:aut:person:Jordan, Mark
 -field_model: 25

Some setting values are represented in Workbench documentation using square brackets, like this one:

export_csv_field_list: ['field_description', 'field_extent']

Strictly speaking, YAML lists can be represented as either a series of entries on their own lines that start with - or as entries enclosed in [ and ]. It's best to follow the examples provided throughout the Workbench documentation.
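
As a small illustration of the two list notations, the fragment below writes the same two-item list in block style. The setting and field names are the ones used in the example above; a YAML parser should read this form and the bracketed form as equivalent.

```yaml
# Block style: one entry per line, each starting with a dash and a space.
# Equivalent to: export_csv_field_list: ['field_description', 'field_extent']
export_csv_field_list:
  - field_description
  - field_extent
```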

Required configuration settings

Setting Required Default value Description
task ✔️ One of 'create', 'create_from_files', 'update', 'delete', 'add_media', 'delete_media', 'update_media', 'export_csv', 'get_data_from_view', 'create_terms', or 'delete_media_by_node'. See "Choosing a task" for more information.
host ✔️ The hostname, including http:// or https:// of your Islandora repository, and port number if not the default 80.
username ✔️ The username used to authenticate the requests. This Drupal user should be a member of the "Administrator" role. If you want to create nodes that are owned by a specific Drupal user, include their numeric user ID in the uid column in your CSV.
password The user's password. You can also set the password in your ISLANDORA_WORKBENCH_PASSWORD environment variable. If you do this, omit the password option in your configuration file. If a password is not available in either your configuration file or in the environment variable, Workbench will prompt for a password.
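
The following sketch shows what a configuration file might look like when the password is supplied through the environment instead of the file. The hostname and port are hypothetical, and the example assumes ISLANDORA_WORKBENCH_PASSWORD has already been set in the shell that runs Workbench.

```yaml
# Minimal configuration with the password kept out of the file.
# Assumes ISLANDORA_WORKBENCH_PASSWORD is set in the environment; if it is
# not, Workbench will prompt for the password instead.
task: create
# Hypothetical host; the port is included because it is not the default.
host: https://islandora.example.org:8443
username: admin
```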

Drupal settings

Setting Required Default value Description
content_type islandora_object The machine name of the Drupal node content type you are creating or updating. Required in "create" and "update" tasks.
drupal_filesystem fedora:// One of 'fedora://', 'public://', or 'private://' (the wrapping quotation marks are required). Only used with Drupal 8.x - 9.1; starting with Drupal 9.2, the filesystem is automatically detected from the media's configuration. Will eventually be deprecated.
allow_adding_terms false In create, update, add_media, update_media, create_terms, and update_terms tasks, determines if Workbench will add taxonomy terms if they do not exist in the target vocabulary. See more information in the "Taxonomy reference fields" section. Note: this setting is not required in create_terms tasks unless you are adding new terms to a taxonomy reference field on the term entries.
protected_vocabularies [] (empty list) Allows you to exclude vocabularies from having new terms added via allow_adding_terms. See more information in the "Taxonomy reference fields" section.
vocab_id ✔️ in create_terms tasks. Identifies the vocabulary you are adding terms to in create_terms tasks. See more information in the "Creating taxonomy terms" section.
update_mode replace Determines if Workbench will replace, append (add to), or delete field values during update tasks. See more information in the "Updating nodes" section. Also applies to update_media tasks.
validate_terms_exist true If set to false, during --check Workbench will not query Drupal to determine if taxonomy terms exist. The structure of term values in the CSV is still validated; this option only tells Workbench to not check for each term's existence in the target Drupal. Useful to speed up the --check process if you know terms don't exist in the target Drupal.
validate_parent_node_exists true If set to false, during --check Workbench will not query Drupal to determine if nodes whose node IDs are in field_member_of exist. Useful to speed up the --check process if you know the parent nodes already exist in the target Drupal.
max_node_title_length 255 Set to the number of allowed characters for node titles if your Drupal uses Node Title Length. If you are unsure of the maximum node title length your site allows, check the length of the "title" column in your Drupal database's "node_field_data" table.
list_missing_drupal_fields false Set to true to tell Workbench to provide a list of fields that exist in your input CSV but that cannot be matched to Drupal field names (or reserved column names such as "file"). If false, Workbench will still check for CSV column headers that it can't match to Drupal fields, but will exit upon finding the first such field. This option produces a list of fields instead of exiting on detecting the first field.
standalone_media_url false Set to true if your Drupal instance has the "Standalone media URL" option at /admin/config/media/media-settings checked. The Drupal default is to have this unchecked, so you only need to use this Workbench option if you have changed Drupal's default. More information is available.
require_entity_reference_views true Set to false to tell Workbench to not require a View to expose the values in an entity reference field configured to use an Entity Reference View. Additional information is available here.
entity_reference_view_endpoints A list of mappings from Drupal/CSV field names to Views REST Export endpoints used to look up term names for entity reference fields configured to use an Entity Reference View. Additional information is available here.
text_format_id basic_html The text format ID (machine name) to apply to all Drupal text fields that have a "formatted" field type. See "Text fields with markup" for more information.
field_text_format_ids Defines a mapping between field machine names and the machine names of format IDs for "formatted" fields. See "Text fields with markup" for more information.
paragraph_fields Defines structure of paragraph fields in the input CSV. See "Entity Reference Revisions fields (paragraphs)" for more information.
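
To show how several of the Drupal settings above could be combined, here is a sketch of an update task configuration. The hostname is hypothetical, and the values were chosen only for illustration; they are not recommendations.

```yaml
task: update
host: https://islandora.example.org   # hypothetical host
username: admin
password: islandora
content_type: islandora_object
# Add new values to existing field values instead of replacing them.
update_mode: append
# Create taxonomy terms that do not yet exist in the target vocabulary.
allow_adding_terms: true
# Skip the parent node lookup during --check to speed it up.
validate_parent_node_exists: false
# Report every unmatched CSV column instead of exiting on the first one.
list_missing_drupal_fields: true
```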

Input data location settings

Setting Required Default value Description
input_dir input_data The full or relative path to the directory containing the files and metadata CSV file.
input_csv metadata.csv Path to the CSV metadata file. Can be absolute, or if just the filename is provided, will be assumed to be in the directory named in input_dir. Can also be the URL to a Google spreadsheet (see the "Using Google Sheets as input data" section for more information).
google_sheets_csv_filename google_sheet.csv Local CSV filename for data from a Google spreadsheet. See the "Using Google Sheets as input data" section for more information.
google_sheets_gid 0 The "gid" of the worksheet to use in a Google Sheet. See "Using Google Sheets as input data" section for more information.
excel_worksheet Sheet1 If using an Excel file as your input CSV file, the name of the worksheet that the CSV data will be extracted from.
input_data_zip_archives [] List of local file paths and/or URLs to .zip archives to extract into the directory defined in input_dir. See "Using a local or remote .zip archive as input data" for more info.
delete_zip_archive_after_extraction true Tells Workbench to delete an input zip archive after it has been extracted.
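
The fragment below sketches how the input location settings might be used with an Excel file. The directory and file names are hypothetical, and the example assumes the spreadsheet sits inside the directory named in input_dir.

```yaml
# Hypothetical directory holding the files and the metadata spreadsheet.
input_dir: /data/workbench/batch_01
# Only the filename is given, so it is assumed to be inside input_dir.
input_csv: batch_01.xlsx
# Worksheet that the CSV data is extracted from (this is also the default).
excel_worksheet: Sheet1
```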

Input CSV file settings

Setting Required Default value Description
id_field id The name of the field in the CSV that uniquely identifies each record.
delimiter , [comma] The delimiter used in the CSV file, for example, "," or "\t" (must use double quotes with "\t"). If omitted, defaults to ",".
subdelimiter | [pipe] The subdelimiter used in the CSV file to define multiple values in one field. If omitted, defaults to "|". Can be a string of multiple characters, e.g. "^^^".
csv_field_templates Used in the create and update tasks only. A list of Drupal field machine names and corresponding values that are copied into the CSV input file. More detail provided in the "CSV field templates" section.
csv_value_templates Used in the create and update tasks only. A list of Drupal field machine names and corresponding templates. More detail provided in the "CSV value templates" section.
csv_value_templates_for_paged_content Used in the create task only. Similar to csv_value_templates but applies to paged/child items created using the "Using subdirectories" method of creating paged content. More detail provided in the "CSV value templates" section.
csv_value_templates_rand_length 5 Length of the $random_alphanumeric_string and $random_number_string variables in CSV value templates. More detail provided in the "CSV value templates" section.
allow_csv_value_templates_if_field_empty [] List of fields to populate with CSV value templates if the CSV field is empty.
ignore_csv_columns Used in the create and update tasks only. A list of CSV column headers that Workbench should ignore. For example, ignore_csv_columns: [Target Collection, Ready to publish]
csv_start_row Used in create and update tasks. Tells Workbench to ignore all rows/records in input CSV (or Google Sheet or Excel) before the designated row number. More information is available.
csv_stop_row Used in create and update tasks. Tells Workbench to ignore all rows/records in input CSV (or Google Sheet or Excel) after the designated row number. More information is available.
csv_rows_to_process Used in create and update tasks. Tells Workbench to process only the rows/records in input CSV (or Google Sheet or Excel) with the specified "id" column values. More information is available.
csv_headers names Used in "create", "update" and "create_terms" tasks. Set to "labels" to allow use of field labels (as opposed to machine names) as CSV column headers.
clean_csv_values_skip [] (empty list) Used in all tasks that use CSV input files. See "How Workbench cleans your input data" for more information.
columns_with_term_names [] (empty list) Used in all tasks that allow creation of terms on entity ingest. See "Using numbers as term names" for more information.
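
As an illustration of the input CSV settings, the following fragment describes a tab-delimited file that uses field labels as column headers. The id_field value and the row numbers are hypothetical; the other values come from the descriptions above.

```yaml
id_field: local_identifier   # hypothetical column name
# Double quotes are required around "\t".
delimiter: "\t"
# A subdelimiter can be a string of multiple characters.
subdelimiter: "^^^"
# Use field labels, not machine names, as CSV column headers.
csv_headers: labels
ignore_csv_columns: [Target Collection, Ready to publish]
# Ignore rows before row 10 and after row 50.
csv_start_row: 10
csv_stop_row: 50
```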

Output CSV settings

See "Generating CSV files" section for more information.

Setting Required Default value Description
output_csv Used in "create" tasks. The full or relative (to the "workbench" script) path to a CSV file with one record per node created by Workbench.
output_csv_include_input_csv false Used in "create" tasks in conjunction with output_csv. Include in the output CSV all the fields (and their values) from the input CSV.
export_csv_term_mode tid Used in "export_csv" tasks to indicate whether vocabulary term IDs or names are included in the output CSV file. Set to "tid" (the default) to include term IDs, or set to "name" to include term names. See "Exporting field data into a CSV file" for more information.
export_csv_field_list [] (empty list) List of fields to include in exported CSV data. If empty, all fields will be included. See "Using a Drupal View to identify content to export as CSV" for more information.
view_parameters List of URL parameter/value strings to include in requests to a View. See "Using a Drupal View to identify content to export as CSV" for more information.
export_csv_file_path ✔️ in get_data_from_view tasks. Used in the "export_csv" and "get_data_from_view" tasks. The path to the exported CSV file. Required in the "get_data_from_view" task; in the "export_csv" task, if left empty (the default), the file will be named after the value of the input_csv with ".csv_file_with_field_values" appended and saved in the directory identified in input_dir.
export_file_directory Used in the "export_csv" and "get_data_from_view" tasks. The path to the directory where files corresponding to the data in the CSV output file will be written.
export_file_media_use_term_id http://pcdm.org/use#OriginalFile Used in the "export_csv" and "get_data_from_view" tasks. The term ID or URI from the Islandora Media Use vocabulary that identifies the file you want to export.
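
A sketch of an export_csv configuration that uses some of the output settings above. The hostname, input filename, and output path are hypothetical.

```yaml
task: export_csv
host: https://islandora.example.org   # hypothetical host
username: admin
password: islandora
input_csv: nodes_to_export.csv        # hypothetical input file
# Write term names to the output CSV instead of term IDs.
export_csv_term_mode: name
# Limit the export to these fields; an empty list would export all fields.
export_csv_field_list: ['field_description', 'field_extent']
# Hypothetical output path; if omitted, the name is derived from input_csv.
export_csv_file_path: /tmp/exported_field_values.csv
```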

Media settings

Setting Required Default value Description
nodes_only false Include this option in create tasks, set to true, if you want to only create nodes and not their accompanying media. See the "Creating nodes but not media" section for more information.
allow_missing_files false Determines if file values that point to missing (not found) files are allowed. Used in the create and add_media tasks. If set to true, file values that point to missing files are allowed. For create tasks, a true value will result in nodes without attached media. For add_media tasks, a true value will skip adding a media for the missing file CSV value. Defaults to false (which means all file values must name files that exist at their specified locations). Note that this setting has no effect on empty file values; these are always logged, and their corresponding media are simply not created.
exit_on_first_missing_file_during_check true Removed as a configuration setting November 1, 2022. Use strict_check instead.
strict_check Replaced with perform_soft_checks as of commit dfa60ff (July 14, 2023).
media_use_tid http://pcdm.org/use#OriginalFile The term ID for the term from the "Islandora Media Use" vocabulary you want to apply to the media being created in create and add_media tasks. You can provide a term URI instead of a term ID, for example "http://pcdm.org/use#OriginalFile". You can specify multiple values for this setting by joining them with the subdelimiter configured in the subdelimiter setting; for example, media_use_tid: 17|18. You can also set this at the object level by including media_use_tid in your CSV file; values there will override the value set in your configuration file. If you are "Adding multiple media", you define media use term IDs in a slightly different way.
media_type ✔️ in add_media and update_media tasks Overrides, for all media being created, Workbench's default definition of whether the media being created is an image, file, document, audio, or video. Used in the create, create_from_files, add_media, and update_media tasks. More detail provided in the "Configuring Media Types" section. Required in all update_media and add_media tasks, not to override Workbench's defaults, but to explicitly indicate the media type being updated/added.
media_types_override Overrides default media type definitions on a per file extension basis. Used in the create, add_media, and create_from_files tasks. More detail provided in the "Configuring Media Types" section.
media_type_file_fields Defines the name of the media field that references media's file (i.e., the field on the Media type). Usually used with custom media types and accompanied by either the media_type or media_types_override option. For more information, see the "Configuring Media Types" section.
mimetype_extensions Overrides Workbench's default mappings between MIME types and file extensions. Usually used with remote files where the remote web server returns a MIME type that is not standard. For more information, see the "Configuring Media Types" section.
extensions_to_mimetypes Overrides Workbench's default mappings between file extension and MIME types. For more information, see the "Configuring Media Types" section.
delete_media_with_nodes true When a node is deleted using a delete task, by default, all of its media are automatically deleted. Set this option to false to not delete all of a node's media (you do not generally want to keep the media without the node).
use_node_title_for_media_title true If set to true (default), name media the same as the parent node's title value. Truncates the value of the field to 255 characters. Applies to both create and add_media tasks.
use_nid_in_media_title false If set to true, uses the parent node's node ID in the media title, following the pattern {node_id}-Original File. Applies to both create and add_media tasks.
field_for_media_title Identifies a CSV field name (machine name, not human readable) that will be used as the media title. For example, field_for_media_title: id. Truncates the value of the field to 255 characters. Applies to both create and add_media tasks.
use_node_title_for_remote_filename false Set to true to use a version of the parent node's title as the filename for a remote (http[s]) file. Replaces all non-alphanumeric characters with an underscore (_). Truncates the value of the field to 255 characters. Applies to both create and add_media tasks. Note: this setting replaces (the previously undocumented) use_nid_in_media_filename setting.
field_for_remote_filename Identifies a CSV field name (machine name, not human readable) that will be used as the filename for a remote (http[s]) file. For example, field_for_remote_filename: id. Truncates the value of the field to 255 characters. If the field is empty in the CSV, the CSV ID field value will be used. Applies to both create and add_media tasks. Note: this setting replaces (the previously undocumented) field_for_media_filename setting.
delete_tmp_upload false For remote files, if set to true, the temporary copy of the remote file is deleted after it is used to create media. If false, the copy will remain in the location defined in your temp_dir setting. If the file cannot be deleted (e.g. a virus scanner is scanning it), it will remain and an error message will be added to the log file.
additional_files Maps a set of CSV field names to media use terms IDs to create additional media (additional to the media created from the file named in the "file" column, that is) in create and add_media tasks. See "Adding multiple media" for more information.
fixity_algorithm None Checksum/hash algorithm to use during transmission fixity checking. Must be one of "md5", "sha1", or "sha256". See "Fixity checking" for more information.
validate_fixity_during_check false Perform checksum validation during --check. See "Fixity checking" for more information.
delete_media_by_node_media_use_tids [] (empty list) During delete_media_by_node tasks, allows you to specify which media to delete. Only media with the listed terms IDs from the Islandora Media Use vocabulary will be deleted. By default (an empty list), all media are deleted. See "Deleting Media" for more information.
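
The following fragment sketches how a few of the media settings could appear together in a create task. The term IDs 17 and 18 are the illustrative values used in the media_use_tid description and will differ on a real site.

```yaml
task: create
# Allow file values that point to missing files; the affected nodes are
# created without attached media.
allow_missing_files: true
# Two media use terms joined with the default subdelimiter.
media_use_tid: 17|18
# Verify checksums, and also validate them during --check.
fixity_algorithm: sha256
validate_fixity_during_check: true
# Remove the temporary copy of each remote file once its media exists.
delete_tmp_upload: true
```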

Islandora model settings

Setting Required Default value Description
model [singular] Used in the create_from_files task only. Defines the term ID from the "Islandora Models" vocabulary for all nodes created using this task. Note: one of model or models is required. More detail provided in the "Creating nodes from files" section.
models [plural] Used in the create_from_files task only. Provides a mapping between file extensions and terms in the "Islandora Models" vocabulary. Note: one of model or models is required. More detail provided in the "Creating nodes from files" section.
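
As a rough sketch, a create_from_files configuration that assigns one model to every node might look like the fragment below. The hostname and the term ID 25 are hypothetical, and other settings may be needed in practice.

```yaml
task: create_from_files
host: https://islandora.example.org   # hypothetical host
username: admin
password: islandora
input_dir: input_data
# Hypothetical term ID from the "Islandora Models" vocabulary, applied to
# every node created by this task.
model: 25
```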

Paged and compound content settings

See the section "Creating paged content" for more information.

Setting Required Default value Description
paged_content_from_directories false Set to true if you are using the "Using subdirectories" method of creating paged content.
page_files_source_dir_field id [or whatever is defined as your id column using the id_field configuration setting] Used with the "Using subdirectories" method of creating paged content. Set to directory if your input CSV contains a "directory" column that names each row's page directory.
paged_content_sequence_separator - [hyphen] The character used to separate the page sequence number from the rest of the filename. Used when creating paged content with the "Using subdirectories" method. Note: this configuration option was originally misspelled "paged_content_sequence_seprator".
paged_content_page_model_tid Required if paged_content_from_directories is true. The term ID (or its URI) from the Islandora Models taxonomy to assign to pages.
paged_content_image_file_extension If the subdirectories containing your page image files also contain other files (that are not page images), you need to use this setting to tell Workbench which files to create pages from. Common values include tif, tiff, and jpg.
paged_content_additional_page_media A mapping of Media Use term IDs (or URIs) to file extensions telling Workbench which Media Use term to apply to media created from additional files such as OCR text files.
paged_content_page_display_hints The term ID from the Islandora Display taxonomy to assign to pages. If not included, defaults to the value of the field_display_hints in the parent's record in the CSV file.
paged_content_page_content_type Set to the machine name of the Drupal node content type for pages created using the "Using subdirectories" method if it is different than the content type of the parent (which is specified in the content_type setting).
page_title_template '$parent_title, page $weight' Template used to generate the titles of pages/members in the "Using subdirectories" method.
paged_content_ignore_files ["Thumbs.db"] List of filenames that you want Workbench to ignore when it scans directories to create page- or child-level nodes. See "Using subdirectories" for more information.
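
To make the paged content settings more concrete, here is a sketch of a fragment for the "Using subdirectories" method. The term ID 30 and the extra ignored filename are hypothetical.

```yaml
task: create
paged_content_from_directories: true
# Required when paged_content_from_directories is true.
# Hypothetical term ID from the Islandora Models taxonomy.
paged_content_page_model_tid: 30
# Page filenames are assumed to look like "mybook_001.tif".
paged_content_sequence_separator: "_"
# Only create pages from .tif files in each subdirectory.
paged_content_image_file_extension: tif
page_title_template: '$parent_title, page $weight'
paged_content_ignore_files: ["Thumbs.db", ".DS_Store"]
```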

Logging settings

See the "Logging" section for more information.

Setting Required Default value Description
log_file_path workbench.log The path to the log file, absolute or relative to the directory Workbench is run from.
log_file_mode a [append] Set to "w" to overwrite the log file, if it exists.
log_request_url false Whether or not to log the request URL (and its method). Useful for debugging.
log_json false Whether or not to log the raw request JSON POSTed, PUT, or PATCHed to Drupal. Useful for debugging.
log_headers false Whether or not to log the raw HTTP headers used in all requests. Useful for debugging.
log_response_status_code false Whether or not to log the HTTP response code. Useful for debugging.
log_response_time false Whether or not to log the response time of each request that is slower than the average response time for the last 20 HTTP requests Workbench makes to the Drupal server. Useful for debugging.
log_response_body false Whether or not to log the raw HTTP response body. Useful for debugging.
log_file_name_and_line_number false Whether or not to add the filename and line number that created the log entry. Useful for debugging.
log_term_creation true Whether or not to log the creation of new terms during "create" and "update" tasks (does not apply to the "create_terms" task). --check will still report that terms in the CSV do not exist in their respective vocabularies.
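
For a debugging session, the logging settings could be combined along the lines of the sketch below. The log path is hypothetical.

```yaml
# Hypothetical absolute path to the log file.
log_file_path: /var/log/workbench/ingest.log
# Overwrite the log on each run instead of appending to it.
log_file_mode: w
# Record each request's URL and method, plus the response status code.
log_request_url: true
log_response_status_code: true
# Log responses slower than the average of the last 20 requests.
log_response_time: true
```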

HTTP settings

Setting Required Default value Description
user_agent Islandora Workbench String to use as the User-Agent header in HTTP requests.
allow_redirects true Whether or not to allow Islandora Workbench to respond to HTTP redirects.
secure_ssl_only true Whether or not to require valid SSL certificates. Set to false if you want to ignore SSL certificates.
enable_http_cache true Whether or not to enable Workbench's client-side request cache. Set to false if you want to disable the cache during troubleshooting, etc.
http_cache_storage memory The backend storage type for the client-side cache. Set to sqlite if you are getting out of memory errors while running Islandora Workbench.
http_cache_storage_expire_after 1200 Length of the client-side cache lifespan (in seconds). Reduce this number if you are using the sqlite storage backend and the database is using too much disk space. Note that reducing the cache lifespan will result in increased load on your Drupal server.
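
The fragment below is one way the HTTP settings might be adjusted on a machine that runs out of memory during large jobs. The values are illustrative only.

```yaml
# Move the client-side request cache from memory to SQLite.
http_cache_storage: sqlite
# Keep cached responses for 10 minutes, not the default 20, to limit disk
# use. A shorter lifespan increases load on the Drupal server.
http_cache_storage_expire_after: 600
# Ignore invalid SSL certificates; assumed here to be a development server.
secure_ssl_only: false
```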

Rollback configuration and CSV file settings

See "Rolling back" for more information.

Setting Required Default value Description
rollback_dir Value of input_dir setting Absolute path to the directory where you want your "rollback.csv" file to be written.
rollback_config_file_path rollback.yml Relative (to workbench) or absolute path to write the rollback config YAML file to.
rollback_csv_file_path [input_dir/rollback.csv] Relative (to workbench) or absolute path to write the rollback CSV file to.
timestamp_rollback false Set to true to add a timestamp to the "rollback.yml" and corresponding "rollback.csv" generated in "create" and "create_from_files" tasks.
rollback_config_filename_template Defines a template that will be used to create the rollback configuration file. The two placeholders available in this template are $config_filename and $input_csv_filename.
rollback_csv_filename_template Defines a template that will be used to create the rollback CSV file. The two placeholders available in this template are $config_filename and $input_csv_filename.
rollback_file_comments Defines a list of lines to be added to both the rollback config and CSV file as comments.
include_password_in_rollback_config_file false Set to true to include the value of the password configuration setting in your rollback config YAML file.
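
A sketch of how some of the rollback settings might be set. The directory and comment text are hypothetical, and the list form of rollback_file_comments is assumed from its description as "a list of lines".

```yaml
# Hypothetical absolute path where "rollback.csv" is written.
rollback_dir: /data/workbench/rollbacks
# Add a timestamp to the generated rollback.yml and rollback.csv.
timestamp_rollback: true
# Lines added as comments to both rollback files (list form is assumed).
rollback_file_comments:
  - "Rollback files for the batch_01 ingest"
# Keep the password out of the generated rollback config (the default).
include_password_in_rollback_config_file: false
```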

Miscellaneous settings

Setting Required Default value Description
perform_soft_checks false If set to true, --check will not exit when it encounters an error with parent/child order, file extension registration with Drupal media file fields, missing files named in the files CSV column, or EDTF validation. Instead, it will log any errors it encounters and exit after it has checked all rows in the CSV input file. Note: this setting replaces strict_check as of commit dfa60ff (July 14, 2023).
temp_dir Value of the temporary directory defined by your system as defined by Python's gettempdir() function. Relative or absolute path to the directory where you want Workbench's temporary files to be written. These include the ".preprocessed" version of your input CSV, remote files temporarily downloaded to create media, and the CSV ID to node ID map database.
sqlite_db_filename [temp_dir]/workbench_temp_data.db Name of the SQLite database filename used to store session data.
pause Defines the number of seconds to pause between all 'POST', 'PUT', 'PATCH', 'DELETE' requests to Drupal. Include it in your configuration to lessen the impact of Islandora Workbench on your site during large jobs, for example pause: 1.5. More information is available in the "Reducing Workbench's impact on Drupal" documentation.
adaptive_pause Defines the number of seconds to pause between each REST request to Drupal. Works like "pause" but only takes effect when the Drupal server's response to the most recent request is slower (determined by the "adaptive_pause_threshold" value) than the average response time for the last 20 requests. More information is available in the "Reducing Workbench's impact on Drupal" documentation.
adaptive_pause_threshold 2 A weighting of the response time for the most recent request, relative to the average response times of the last 20 requests. This weighting determines how much slower the Drupal server's response to the most recent Workbench request must be in order for adaptive pausing to take effect for the next request. For example, if set to "1", adaptive pausing will happen when the response time is equal to the average of the last 20 response times; if set to "2", adaptive pausing will take effect if the last request's response time is double the average.
progress_bar false Show a progress bar when running Workbench instead of row-by-row output.
bootstrap List of absolute paths to one or more scripts that execute prior to Workbench connecting to Drupal. More information is available in the "Hooks" documentation.
shutdown List of absolute paths to one or more scripts that execute after Workbench has finished connecting to Drupal. More information is available in the "Hooks" documentation.
preprocessors List of absolute paths to one or more scripts that are applied to CSV values prior to the values being ingested into Drupal. More information is available in the "Hooks" documentation.
node_post_create List of absolute paths to one or more scripts that execute after a node is created. More information is available in the "Hooks" documentation.
node_post_update List of absolute paths to one or more scripts that execute after a node is updated. More information is available in the "Hooks" documentation.
media_post_create List of absolute paths to one or more scripts that execute after a media is created. More information is available in the "Hooks" documentation.
drupal_8 Deprecated.
path_to_python python Used in create tasks that also use the secondary_tasks option. Tells Workbench the path to the python interpreter. For details on when to use this option, refer to the end of the "Secondary Tasks" section of "Creating paged, compound, and collection content".
path_to_workbench_script workbench Used in create tasks that also use the secondary_tasks option. Tells Workbench the path to the workbench script. For details on when to use this option, refer to the end of the "Secondary Tasks" section of "Creating paged, compound, and collection content".
contact_sheet_output_dir contact_sheet_output Used in create tasks to specify the name of the directory where contact sheet output is written. Can be relative (to the Workbench directory) or absolute. See "Generating a contact sheet" for more information.
contact_sheet_css_path assets/contact_sheet/contact-sheet.css Used in create tasks to specify the path of the CSS stylesheet to use in contact sheets. Can be relative (to the Workbench directory) or absolute. See "Generating a contact sheet" for more information.
node_exists_verification_view_endpoint Used in create tasks to tell Workbench to check whether a node with a matching value in your input CSV already exists. See "Checking if nodes already exist" for more information.
redirect_status_code 301 Used in create_redirect tasks to set the HTTP response code in redirects. See "Prompting the user" for more information.
remind_user_to_run_check false Prompt the user to confirm whether they have run Workbench with --check. See "Prompting the user" for more information.
user_prompts [] (empty list) List of custom prompts to present to the user. See "Prompting the user" for more information.
check_lock_file_path Defines a "lock file" that contains data generated by a successful --check. See "Checking configuration and input data" for more information.
secondary_tasks A list of configuration file names that are executed as secondary tasks after the primary task to create compound objects. See "Using a secondary task" for more information.
csv_id_to_node_id_map_path [temp_dir]/csv_id_to_node_id_map.db Path to the SQLite database file used to store CSV row ID to node ID mappings for nodes created during create and create_from_files tasks. If you want to store your map database outside of a temporary directory, use an absolute path to the database file for this setting. If you want to disable population of this database, set this config setting to false. See "Generating CSV files" for more information.
query_csv_id_to_node_id_map_for_parents false Queries the CSV ID to node ID map when creating compound content. Set to true if you want to use parent IDs from previous Workbench sessions. Note: this setting is automatically set to true in secondary task config files. See "Creating parent/child relationships across Workbench sessions" for more information.
ignore_duplicate_parent_ids true Tells Workbench to ignore entries in the CSV ID to node ID map that have the same parent ID value. Set to false if you want Workbench to warn you that there are duplicate parent IDs in your CSV ID to node ID map. See "Creating parent/child relationships across Workbench sessions" for more information.
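
As an illustration of the three CSV ID to node ID map settings described above, here is a minimal sketch of a configuration that keeps the map outside the temporary directory and reuses it across sessions. The database path and the input CSV filename are hypothetical; substitute your own.

```yaml
task: create
host: "http://localhost:8000"
username: admin
password: islandora
input_csv: input.csv

# Hypothetical absolute path, used so the map database is stored
# outside of the temporary directory.
csv_id_to_node_id_map_path: /home/mark/workbench_data/csv_id_to_node_id_map.db

# Use parent IDs recorded in the map during previous Workbench sessions.
query_csv_id_to_node_id_map_for_parents: true

# Warn about duplicate parent IDs in the map instead of ignoring them.
ignore_duplicate_parent_ids: false
```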

When you run Islandora Workbench with the --check argument, it verifies that all configuration options required for the current task are present and, if any are missing, tells you so.

Note

Islandora Workbench automatically converts any configuration keys to lowercase, e.g., Task is equivalent to task.
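
To illustrate the note above, the following sketch uses mixed-case keys. Given the lowercasing behavior described, it should be treated the same as the minimal configuration with all-lowercase keys. The note only mentions keys, so this example assumes values (such as the password) are left exactly as written.

```yaml
# Keys are converted to lowercase, so "Task", "HOST", and "Username"
# are read as "task", "host", and "username".
Task: create
HOST: "http://localhost:8000"
Username: admin
password: islandora
```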

Validating the syntax of the configuration file

When you run Workbench, it confirms that your configuration file is valid YAML. This is a syntax check only, not a content check. If the file is valid YAML, Workbench then goes on to perform a long list of application-specific checks.

If this syntax check fails, some detail about the problem will be displayed to the user. The same information plus the entire Python stack trace is also logged to a file named "workbench.log" in the directory Islandora Workbench is run from. This file name is Workbench's default log file name, but in this case (validating the config file's YAML syntax), that file name is used regardless of the log file location defined in the configuration's log_file_path option. The reason the error is logged in the default location instead of the value in the configuration file (if one is present) is that the configuration file isn't valid YAML and therefore can't be parsed.
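
For illustration, here is a sketch of a configuration file that is not valid YAML and so would be expected to fail this syntax check: the host value opens a double-quoted string that is never closed. Because the file cannot be parsed, any log_file_path it contains cannot be read either, which is why the error ends up in the default "workbench.log". The log path shown is hypothetical.

```yaml
task: create
# The closing quotation mark is missing on the next line, so the file is invalid YAML.
host: "http://localhost:8000
username: admin
password: islandora
log_file_path: /home/mark/workbench_log.txt
```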

Example configuration files

These examples provide inline annotations explaining why the settings are included in the configuration file. Blank rows/lines are included for readability.

Create nodes only, no media

task: create
host: "http://localhost:8000"
username: admin
password: islandora

# This setting tells Workbench to create nodes with no media.
# Also, this tells --check to skip all validation of "file" locations.
# Other media settings, like "media_use_tid", are also ignored.
nodes_only: true

Use a custom log file location

task: create
host: "http://localhost:8000"
username: admin
password: islandora

# This setting tells Workbench to write its log file to the location specified
# instead of the default "workbench.log" within the directory Workbench is run from.
log_file_path: /home/mark/workbench_log.txt

Include some CSV field templates

task: create
host: "http://localhost:8000"
username: admin
password: islandora

# The values in this list of field templates are applied to every row in the
# input CSV file before the CSV file is used to populate Drupal fields. The
# field templates are also applied during the "--check" in order to validate
# the values of the fields.
csv_field_templates:
 - field_member_of: 205
 - field_model: 25

Use a Google Sheet as input CSV

task: create
host: "http://localhost:8000"
username: admin
password: islandora
input_csv: 'https://docs.google.com/spreadsheets/d/13Mw7gtBy1A3ZhYEAlBzmkswIdaZvX18xoRBxfbgxqWc/edit'
# You only need to specify the google_sheets_gid option if the worksheet in the Google Sheet
# is not the default one.
google_sheets_gid: 1867618389

Create nodes and media from files (no input CSV file)

task: create_from_files
host: "http://localhost:8000"
username: admin
password: islandora

# The files to create the nodes from are in this directory.
input_dir: /tmp/sample_files

# This tells Workbench to write a CSV file containing node IDs of the
# created nodes, plus the field names used in the target content type
# ("islandora_object" by default).
output_csv: /tmp/sample_files.csv

# All nodes should get the "Model" value corresponding to this URI.
model: 'https://schema.org/DigitalDocument'

Create taxonomy terms

task: create_terms
host: "http://localhost:8000"
username: admin
password: islandora
input_csv: my_term_data.csv
vocab_id: myvocabulary

Ignore some columns in your input CSV file

task: create
host: "http://localhost:8000"
username: admin
password: islandora
input_csv: input.csv

# This tells Workbench to ignore the 'date_generated' and 'batch_id'
# columns in the input.csv file.
ignore_csv_columns: ['date_generated', 'batch_id']

Generating sample Islandora content

task: create_from_files
host: "http://localhost:8000"
username: admin
password: islandora
# This directory must match the one defined in the script's 'dest_dir' variable.
input_dir: /tmp/autogen_input
media_use_tid: 17
output_csv: /tmp/my_sample_content_csv.csv
model: http://purl.org/coar/resource_type/c_c513
# This is the script that generates the sample content.
bootstrap:
 - "/home/mark/Documents/hacking/workbench/generate_image_files.py"

Running a post-action script

task: create
host: "http://localhost:8000"
username: admin
password: islandora
node_post_create: ['/home/mark/hacking/islandora_workbench/scripts/entity_post_task_example.py']
# node_post_update: ['/home/mark/hacking/islandora_workbench/scripts/entity_post_task_example.py']
# media_post_create: ['/home/mark/hacking/islandora_workbench/scripts/entity_post_task_example.py']

Create nodes and media from files

task: create_from_files
host: "http://localhost:8000"
username: admin
password: islandora
input_dir: path/to/files
models:
 - 'http://purl.org/coar/resource_type/c_1843': ['zip', 'tar', '']
 - 'https://schema.org/DigitalDocument': ['pdf', 'doc', 'docx', 'ppt', 'pptx']
 - 'http://purl.org/coar/resource_type/c_c513': ['tif', 'tiff', 'jp2', 'png', 'gif', 'jpg', 'jpeg']
 - 'http://purl.org/coar/resource_type/c_18cc': ['mp3', 'wav', 'aac']
 - 'http://purl.org/coar/resource_type/c_12ce': ['mp4']
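
Prompt the user before running

The prompt-related settings listed in the settings table can be combined in one configuration. This is a minimal sketch that assumes user_prompts accepts a list of plain strings; the prompt text is made up for illustration. See "Prompting the user" for the authoritative details.

```yaml
task: create
host: "http://localhost:8000"
username: admin
password: islandora

# Ask the user to confirm that they have run Workbench with --check.
remind_user_to_run_check: true

# Hypothetical custom prompt presented to the user.
user_prompts: ['Have you backed up your Drupal database?']
```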