Setting up detail tables with a Text or PDF file

Detail tables are created when an Extract step is added within a Repeat step. The Repeat step goes through a number of lines or nodes. An Extract step within that loop extracts data from each line or node.
How exactly this loop is constructed depends on the type of source data.

This topic explains how to set up detail tables when extracting data from a Text or PDF file.

Tip: To break out of a loop and immediately jump to the next task following the current loop, use an Action task and set its action to Break out of repeat loop.

How to extract transactional data that is not structured uniformly is explained in another topic: Extracting data with an Action step script.
The topic Extracting data of variable length explains a few ways to extract data from a variable number of lines into one data field.

For more information about detail tables, multiple detail tables and nested detail tables, see Detail tables.

Creating the loop

In a PDF or Text file, transactional data appear on multiple lines and can be spread over multiple pages.

  1. Add a Goto step if necessary. Make sure that the cursor is located where the extraction loop must start. By default the cursor is located at the top of the page, but previous steps may have moved it. Note that an Extract step does not move the cursor.

    1. Select an element in the first line item.

    2. Right-click on the selection and select Add Goto. The Goto step will move the cursor to the start of the first line item.

  2. Add a Repeat step where the loop must stop.

    1. In the line under the last line item, look for text that can be used as a condition to stop the loop, for example "Subtotals", "Total" or "Amount".

    2. Select that text, right-click on it and select Add Repeat. The Repeat step loops over all lines until the selected text is found.

  3. Include/exclude lines. Lines between the start and end of the loop that don't contain a line item must be excluded from the extraction. Or rather, all lines that contain a line item have to be included. This is done by adding a Condition step within the Repeat step.

    1. Select the start of the Repeat step on the Steps pane.

    2. Look for something in the data that distinguishes lines with a line item from other lines (or the other way around). Often, a "." or "," appears in prices or totals at the same place in every line item, but not on other lines.

    3. Select that data, right-click on it and select Add Conditional.

      Selecting data, especially something as small as a dot, can be difficult in a PDF file. To make sure that a Condition step checks for the right data:

      1. Set the Right operand to Value (in the Step properties pane).

      2. Make a selection in the Data Viewer and click the Use selected text button in the Right Operand section. The Right Operand section now shows the text that the current selection extracts.

      3. Repeat the previous step until the selection extracts the proper data.

      4. Click the Use selection button in the Left Operand section to fill out the coordinates.

      When adjusting a selection, keep in mind that each character's point of origin is its bottom-left corner; the character extends up and to the right from that point.

    In the Data Viewer, you will see a green check mark in the left margin next to each included line and an X for other lines.


  4. (Optional.) Add an empty detail table to the Data Model: right-click the Data Model and select Add a table. Give the detail table a name.

  5. Extract the data (see Adding an extraction).
    When you drag & drop data on the name of a detail table in the Data Model pane, the data are added to that detail table.
    Dropping the data somewhere else on the Data Model pane, or using the contextual menu in the Data Viewer, creates a new detail table, with a default name that you can change later on (see Renaming a detail table).

    Note: In a PDF or Text file, pieces of data often have a variable size: a product description, for example, may be short and fit on one line, or be long and cover two lines. To learn how to handle this, see Extracting data of variable length.

  6. Extract the sum or totals. If the record contains sums or totals at the end of the line items list, the end of the Repeat step is a good place to add an Extract step for these data. After the Repeat step, the cursor is positioned at the end of the line items.

    1. Select the amount or amounts.

    2. Click on the end of the Repeat step in the Steps pane.

    3. Right-click on the selected data and select Add Extraction.

    Alternatively, right-click on the end of the Repeat step in the Steps pane and select Add a Step > Add Extraction.
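
The logic of the loop described above can be sketched in ordinary code. The following Python example is only an illustration of what the Goto, Repeat, Condition and Extract steps do together; it is not DataMapper code and does not use any DataMapper API. The sample page, the column widths, the position of the decimal point (DOT_COLUMN) and the function names row and extract_detail_table are all assumptions made up for this sketch.

```python
def row(item, description, price):
    """Build one fixed-width line: 12 + 20 + 8 characters (hypothetical layout)."""
    return f"{item:<12}{description:<20}{price:>8}"


# Hypothetical page of a Text file; real data will look different.
PAGE = [
    "Invoice 1001",
    row("Item", "Description", "Price"),
    row("A-100", "Widget", "12.50"),
    row("", "(blue, large)", ""),   # continuation line, not a line item
    row("B-200", "Gadget", "7.25"),
    row("Subtotal", "", "19.75"),
]

# Assumed position of the "." in the price of every line item.
DOT_COLUMN = 37


def extract_detail_table(lines, first_item_text, stop_text):
    # Goto: move the cursor to the first line item.
    cursor = 0
    while cursor < len(lines) and first_item_text not in lines[cursor]:
        cursor += 1

    detail_table = []
    # Repeat: loop over the lines until the stop text is found.
    while cursor < len(lines) and stop_text not in lines[cursor]:
        line = lines[cursor]
        # Condition: only lines with a "." at the expected place are line items.
        if len(line) > DOT_COLUMN and line[DOT_COLUMN] == ".":
            # Extract: each included line becomes one row of the detail table.
            detail_table.append({
                "item": line[0:12].strip(),
                "description": line[12:32].strip(),
                "price": line[32:40].strip(),
            })
        cursor += 1

    # After the loop, the cursor is at the line with the stop text,
    # which is where the totals can be extracted.
    total = lines[cursor][32:40].strip() if cursor < len(lines) else None
    return detail_table, total


table, total = extract_detail_table(PAGE, "A-100", "Subtotal")
for record in table:
    print(record)
print("Total:", total)
```

In this sketch the continuation line "(blue, large)" is skipped because it has no "." at the expected position, which mirrors how the Condition step excludes lines that do not contain a line item.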

Tip: This how-to describes in detail how to extract an item description that appears in a variable number of lines: How to extract multiline items.