Extract step properties

The Extract step takes information from the data source and places it in the record set that is the result of the extraction workflow. For more information see Extract step and Extracting data.

Description

This subsection is collapsed by default in the interface, to give more screen space to other important parts.

Name: The name of the step. This name will be displayed on top of the step's icon in the Steps pane.

Comments: The text entered here will be displayed in the tooltip that appears when hovering over the step in the Steps pane.

Extraction Definition

  • Data Table: Defines where the data will be placed in the extracted record. The root table is record, any other table inside the record is a detail table. For more information see Extracting transactional data.

  • Append values to current record: When the Extract step is inside a loop, check this to ensure that the extraction will be done in the same detail table as any previous extractions within the same loop. This ensures that, if multiple extracts are present, only one detail table is created.

Field Definition

The following field definition settings are identical for all fields.

  • Field List: The Field List displays each of the single fields that belong to the selected step in a drop-down. Fields can be re-ordered and re-named within the Order and rename fields dialog (see Order and rename fields dialog). Select one of the fields to make further settings for that field.

    Tip: To change the name of a field quickly, right-click it in the Data Model and select Rename.

  • Add Unique ID to extraction field: Check to have a unique identifier (a random set of numerical characters) added to the end of the extracted value if it is the same as a previously extracted value. For instance, if an extracted field named "FullName" contains the value "John.Smith" and another document record, later, also contains the same value, then this second "John.Smith" will have a unique identifier (e.g. "John.Smith187").

    This option is only available for String type data fields.

  • Mode: Determines the origin of the data. Fields always belong to an Extract step, but they don't necessarily contain extracted data. See Fields for more information.

    • Type: The data type of the selected data; see Data types. Make sure that the data format that the DataMapper expects matches the actual format of the data in the data source; see Data Format.

Settings for location-based fields in a Text file

  • Left: Defines the start of the data selection to extract, counting the number of characters from the left. E.g. 1 means: start with the first character from the left.

  • Right: Defines the end of the data selection to extract, counting the number of characters from the left. E.g. 3 means: stop after the third character from the left.

  • Top offset: The vertical offset (the number of lines) from the current pointer location in the Data Viewer).

  • Height: The height of the selection box, in the number of lines. When set to 0, this instructs the DataMapper to extract all lines starting from the given position until the end of the record and store them in a single field.

  • Use selection: Click to use the value (Left, Right, Top offset and Height) of the current data selection (in the Data Viewer) for the extraction.

    Note: If the selection contains multiple lines, only the first line is extracted.

  • Post Function: Enter a JavaScript expression to be run after the extraction.

    A Post function script operates directly on the extracted data, and its results replace the extracted data. For example, the Post function script replace("-", ""); would replace the first dash character that occurs inside the extracted string.

    • Use JavaScript Editor: Click to display the boundaries dialog.

    Tip: Click and drag the icon Resize Post function icon at the bottom right of the Post function area to make it larger.

  • Trim: Select to trim empty characters at the beginning or the end of the field.

  • Concatenation string: The (HTML) string used to concatenate lines when they are joined.

  • Split: Separate the selection into individual fields based on the Concatenation stringdefined above.

Settings for location-based fields in a PDF file

These are the settings for location-based fields in a PDF file.

  • Left: Defines the start of the data selection to extract, counting the number of characters from the left. E.g. 1 means: start with the first character from the left.

  • Right: Defines the end of the data selection to extract, counting the number of characters from the left. E.g. 3 means: stop after the third character from the left.

  • Top offset: The vertical offset (the number of lines) from the current pointer location in the Data Sample (Viewer).

  • Height: The height of the selection box, in the number of lines.

  • Use selection: Click to use the value (Left, Right, Top offset and Height) of the current data selection for the extraction.

    Note: If the selection contains multiple lines, the lines are by default joined and extracted into one field. To split the lines, select the option Split lines (see below).

  • Post Function: Enter a JavaScript expression to be run after the extraction. For examplereplace("-","")would replace a single dash character inside the extracted string.

  • Trim: Select to trim empty characters at the beginning or the end of the field.

  • Type: The data type of the selected data; see Data types. If the selected data is split (see below), this setting is applied to the first extracted field. Make sure that the data format that the DataMapper expects matches the actual format of the data in the data source; see Data Format.

  • Split:

    • Split lines: Separate a multi-line selection into individual fields .

    • Join lines: Join the lines in the selection with the Concatenation string defined below.

  • Concatenation string: The (HTML) string used to concatenate lines when they are joined.

Settings for location-based fields in CSV and Database files

These are the settings for location-based fields in CSV and Database files.

  • Column: Drop-down listing all fields in the Data Sample, of which the value will be used.

  • Top offset: The vertical offset from the current pointer location in the Data Sample (Viewer).

    • Use selection: Click to use the value of the current data selection for the extraction.

      Note: If the selection contains multiple lines, only the first line is selected.

  • Post Function: Enter a JavaScript expression to be run after the extraction. For examplereplace("-","")would replace a single dash character inside the extracted string.

    • Use JavaScript Editor: Click to display the boundaries dialog.

  • Trim: Select to trim empty characters at the beginning or the end of the field.

Settings for location-based fields in an XML file

These are the settings for location-based fields in an XML file.

  • XPath: The path to the XML field that is extracted.

    • Use selection: Click to use the value of the current data selection for the extraction.

      Note: If the selection contains multiple lines, only the first line is selected.

  • Post Function: Enter a JavaScript expression to be run after the extraction. For examplereplace("-","")would replace a single dash character inside the extracted string.

    • Use JavaScript Editor: Click to display the boundaries dialog.

  • Trim: Select to trim empty characters at the beginning or the end of the field.

Settings for location-based fields in a JSON file

These are the settings for location-based fields in a JSON file.

  • JsonPath: The path to the JSON element that is extracted. The JsonPath can be relative or absolute. An absolute path starts with $ (the root), a relative path starts with . (the current element). For an overview of the JsonPath syntax, see https://github.com/json-path/jsonpath.

    Note: A JsonPath expression can define more than one item (for example: .* returns anything in the current element). If more than one item is returned, the Extract step will keep an array of all returned items.

    The full JsonPath to an element is displayed at the bottom left of the window when you select it. To copy the path, right-click it and select Copy.

    • Use selection: Click to use the value of the current data selection for the extraction.

    Note: If a key in a JSON file has a name that looks like a function (e.g. "TLIST(A1)"), then the Extract step has to use a JsonPath with bracket notation instead of the default dot notation. For information about the bracket notation see https://goessner.net/articles/JsonPath/.

  • Post Function: Enter a JavaScript expression to be run after the extraction. For examplereplace("-","")would replace a single dash character inside the extracted string.

    • Use JavaScript Editor: Click to display the boundaries dialog.

  • Trim: Select to trim empty characters at the beginning or the end of the field.

Data Format

Format settings can be defined in three places: in the user preferences (DataMapper preferences), the current data mapping configuration (Data format settings) and per field via the Step properties.
Any format settings specified per field are always used, regardless of the user preferences or data source settings.

Note: Data format settings tell the DataMapper how to read and parse data from the data source. They don't determine how these data are formatted in the Data Model or in a template. In the Data Model, data are converted to the native data type. Dates, for example, are converted to a DateTime object. How they are displayed in the Data Model depends on the preferences (see Default Format).

  • Negative Sign Before: Any value in a numeric field that has a "-" sign is interpreted as a negative value.
  • Decimal Separator: Set the decimal separator for a numerical value.
  • Thousand Separator: Set the thousand separator for a numerical value.
  • Currency Sign: Set the currency sign for a currency value.
  • Treat empty as 0: A numerical empty value is treated as a 0 value.
  • Date/Time Format: Set the date format for a date value.
    • Automatic: Select this option to parse dates automatically, without specifying a format. This is the default setting for new Date fields.
    • ISO8601: This setting allows for dates with different timestamp formats, or belonging to different time zones, to be parsed inside a single job. Dates that do not include a specific time are automatically considered to use the current locale's time zone.
      Select the ISO template to be used when parsing the timestamp. Other ISO8601 formats can be handled via the Custom option.
    • Custom: Set a custom date format. For the markers available in the DataMapper see Date.
  • Date Language: Set the language for a date value (ex: If English is selected, the term May will be identified as the month of May).
  • Use offset from UTC: Select the default time zone, which is to be used to extract any timestamp that does not already include time zone information with the time.