Extracting transactional data

Promotional data are data about customers, such as addresses, names and phone numbers. In Connect, each record in the extracted record set represents one recipient. The number of fields that contain promotional data is the same in each record. These data are stored on the root level of the extracted record.

Transactional data, on the other hand, are used in communications about transactions between a company and their customers or suppliers: invoices, statements, and purchase orders, for example. Naturally these data differ per customer. They are stored in detail tables in the extracted record. The number of detail lines in a detail table can vary from record to record.

(For more information about detail tables, multiple detail tables and nested detail tables, see Detail tables.)

Detail tables are created when an Extract step is added within a Repeat step. The Repeat step goes through a number of lines or nodes. An Extract step within that loop extracts data from each line or node.
How exactly this loop is constructed depends on the type of source data.

Tip: To break out of a loop and immediately jump to the next task following the current loop, use an Action task and set its action to Break out of repeat loop.

Note that such loops work well when the data are structured uniformly, e.g. when all rows have the same number of columns, and when detail data always start at the same point in the line. Obviously, that is not always the case.
How to extract transactional data that is structured differently is explained in another topic: Extracting data with an Action step script.
The topic: Extracting data of variable length explains a few ways to extract data from a variable number of lines into one data field.

From a CSV file or a Database

The transactional data (also called line items) appear in multiple rows.

  1. Select a field in the column that contains the first line item information.

  2. Right-click this data selection and select Add Repeat.

    This adds a Repeat step with a GoTo step inside it. The GoTo step moves the cursor down to the next line, until there are no more lines (see Goto step).

  3. (Optional.) Add an empty detail table via the Data Model pane: right-click the Data Model and select Add a table. Give the detail table a name.

  4. Select the Repeat step on the Steps pane.

  5. Start extracting data (see Adding an extraction).
    When you drag & drop data on the name of a detail table in the Data Model pane, the data are added to that detail table.
    Dropping the data somewhere else on the Data Model pane creates a new detail table, with a default name that you can change later on (see Renaming a detail table).
    The extraction step is placed inside the Repeat step, just before the GoTo step.

From an XML file

The transactional data appears in repeated elements.

  1. Right-click one of the repeating elements and select Add Repeat.

    This adds a Repeat step to the data mapping configuration.
    By default, the Repeat type of this step is set to For Each, so that each of the repeated elements is iterated over. You can see this on the Step properties pane, as long as the Repeat step is selected on the Steps pane. In the Collection field, you will find the corresponding node path.

    Tip: You may edit the XPath in the Collection field, to include or exclude elements from the loop.
    One example of this is given in a How-to: Using Xpath in a Repeat step.
    The example in the How-to uses the starts-with() function.

    For an overview of XPath functions, see Mozilla: XPath Functions.
    In addition, it is possible to use JavaScript statements in an XPath in the Collection field to dynamically select elements; see Repeat Definition.

    The Goto step isn't used in XML extraction workflows in most cases. Instead, the cursor can be moved using Xpath, a path-like syntax to identify and navigate nodes in an XML document.

  2. (Optional.) Add an empty detail table via the Data Model pane: right-click the Data Model and select Add a table. Give the detail table a name.

  3. Select the Repeat step on the Steps pane.

  4. Extract the data: inside a repeating element, select the data that you want to extract. Then right-click the selected nodes and select Add Extraction, or drag & drop them in the Data Model.
    When you drag & drop data on the name of a detail table in the Data Model pane, the data are added to that detail table.
    Dropping the data somewhere else on the Data Model pane creates a new detail table. By default, the table is named after the repeating element. You can change it later on (see Renaming a detail table).
    The new Extract step will be located in the Repeat step.

From a JSON file

The transactional data appears in repeated elements.

  1. Move the cursor to the parent element of the repeating elements. By default the cursor is located at the top of the page, but previous steps may have moved it. Note that an Extract step does not move the cursor.

    1. Select the parent element of the repeating elements.

    2. Right-click and select Add Goto. The Goto step will move the cursor to the start of the first line item.

    Instead of using a Goto step you could define the path to the collection of elements directly in the JsonPath Collection field of the Repeat step. Note that this cannot be an absolute path. An absolute path points to one element only.

  2. Right-click the opening bracket of the first of the repeating elements and select Add Repeat. This adds a Repeat step to the data mapping configuration.
    By default, the Repeat type of this step is set to For Each. With this setting the Extract step goes through the defined collection of elements, without the need for a Goto step within the loop. (With the other Repeat type settings the loop must contain a Goto step.)
    You can find the corresponding JsonPath in the JsonPath Collection field on the Step properties pane, as long as the Repeat step is selected on the Steps pane.

    Tip: You may edit the JsonPath in the JsonPath Collection field to include or exclude elements from the loop. For an overview of the JsonPath syntax, see https://github.com/json-path/jsonpath.

  3. (Optional.) Add an empty detail table via the Data Model pane: right-click the Data Model and select Add a table. Give the detail table a name.

  4. Select the Repeat step on the Steps pane.

  5. Extract the data: inside the first of the repeating elements, select the data that you want to extract. Then right-click the selected nodes and select Add Extraction, or drag & drop in the Data Model.
    When you drag & drop data on the name of a detail table in the Data Model pane, the data are added to that detail table.
    Dropping the data somewhere else on the Data Model pane creates a new detail table. By default, the table is named after the repeating node. You can change it later on (see Renaming a detail table).
    The new Extract step will be located in the Repeat step.

About JsonPath

The DataMapper moves through JSON files using JsonPath, a path-like syntax to identify and navigate elements in a JSON document. For an overview of the JsonPath syntax and functions supported, see https://github.com/json-path/jsonpath.

In the DataMapper the JsonPath can be absolute (start with $ which is the root) or relative to the current position (start with . which is the current element).
Note, however, that with a relative JsonPath going up to a parent element is not possible.

Tip: The full JsonPath to an element is displayed at the bottom left of the window when you select it. To copy the path, right-click it and select Copy.

Note: If a key in a JSON file has a name that looks like a function (e.g. "TLIST(A1)"), then the Extract step has to use a JsonPath with bracket notation instead of the default dot notation. For information about the bracket notation see https://goessner.net/articles/JsonPath/.

From a Text or a PDF file

In a PDF or Text file, transactional data appears on multiple lines and can be spread over multiple pages.

  1. Add a Goto step if necessary. Make sure that the cursor is located where the extraction loop must start. By default the cursor is located at the top of the page, but previous steps may have moved it. Note that an Extract step does not move the cursor.

    1. Select an element in the first line item.

    2. Right-click on the selection and select Add Goto. The Goto step will move the cursor to the start of the first line item.

  2. Add a Repeat step where the loop must stop.

    1. In the line under the last line item, look for a text that can be used as a condition to stop the loop, for example "Subtotals",  Total" or "Amount".

    2. Select that text, right-click on it and select Add Repeat. The Repeat step loops over all lines until the selected text is found.

  3. Include/exclude lines. Lines between the start and end of the loop that don't contain a line item must be excluded from the extraction. Or rather, all lines that contain a line item have to be included. This is done by adding a Condition step within the Repeat step.

    1. Select the start of the Repeat step on the Steps pane.

    2. Look for something in the data that distinguishes lines with a line item from other lines (or the other way around). Often, a "." or "," appears in prices or totals at the same place in every line item, but not on other lines.

    3. Select that data, right-click on it and select Add Conditional.

      Selecting data - especially something as small as a dot - can be difficult in a PDF file. To make sure that a Condition step checks for certain data: Set the Right operand to Value (in the Step properties pane). Make a selection in the Data Viewer and click the Use selected text button in the Right Operand section. You will now be able to see whether or not the proper text is extracted by the current selection. Repeat this until you are satisfied that the proper data is being extracted. Click on the Use selection button in the Left Operand section to fill out the coordinates. The point of origin of each character is at the bottom left of each of them and extends up and to the right.

    In the Data Viewer, you will see a green check mark in the left margin next to each included line and an X for other lines.

    Example with condition

  4. (Optional.) Add an empty detail table to the Data Model: right-click the Data Model and select Add a table. Give the detail table a name.

  5. Extract the data (see Adding an extraction).
    When you drag & drop data on the name of a detail table in the Data Model pane, the data are added to that detail table.
    Dropping the data somewhere else on the Data Model pane, or using the contextual menu in the Data Viewer, creates a new detail table, with a default name that you can change later on (see Renaming a detail table).

    Note: In a PDF or Text file, pieces of data often have a variable size: a product description, for example, may be short and fit on one line, or be long and cover two lines. To learn how to handle this, see Extracting data of variable length.

    1. Select the amount or amounts.

    2. Click on the end of the Repeat step () in the Steps panel.

    3. Right-click on the selected data and select Add Extraction.

    Extract the sum or totals. If the record contains sums or totals at the end of the line items list, the end of the Repeat step is a good place to add an Extract step for these data. After the loop step, the cursor position is at the end of line items.Alternatively, right-click on the end of the Repeat step in the Steps panel and select Add a Step > Add Extraction.

Tip: This how-to describes in detail how to extract an item description that appears in a variable number of lines: How to extract multiline items.