Merge PDFs

This tutorial demonstrates a simple technique for merging multiple PDF files using standard OL Connect Automate nodes. The flow processes an array of file paths, converts each PDF into a content set, and then combines those content sets into a single output using the paginated output node.

The method relies on the document mapping node to interpret each PDF and generate a content set, which the output engine can later merge.

Before starting, ensure you’re familiar with the essential concepts of OL Connect Automate and Node-RED, including editor features and flow design. See Documentation, Training, and Support for links to getting started topics and information about using samples and tutorials. Nodes shown in flow example images may display their entered names, set in the node Properties panel, instead of their default names.

Overview

The flow follows this sequence. Details of each step are provided below.

1. Provide an array of PDF file paths, using the inject and folder listing nodes.
2. Iterate through the files, using the split node.
3. Generate a content set for each file, using the document mapping node.
4. Prepare the message for the join node, using a change node.
5. Collect all generated content set IDs, using the join node.
6. Prepare the data for output, using a second change node.
7. Pass the IDs to the paginated output node to merge them.

Steps

Provide an array of PDF paths
- The process begins with an inject node. This node triggers the flow manually or on a schedule.
- When activated, the flow retrieves the list of PDF files from a folder using the folder listing node. This node reads the contents of a directory and generates a message containing an array of file paths in msg.payload.
- Sample output:

msg.payload = [
  "C:\\workspace\\jobs\\1\\Invoice 1.pdf",
  "C:\\workspace\\jobs\\1\\Invoice 2.pdf",
  "C:\\workspace\\jobs\\1\\Invoice 3.pdf"
]
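
If the folder might contain files other than PDFs, the array can be filtered before it reaches the split node. The following is a minimal sketch of such a filter for a function node, assuming msg.payload is an array of path strings as in the sample above. The helper name keepPdfPaths and the notes.txt entry are hypothetical and used only for illustration.

```javascript
// Hypothetical helper: keep only paths that end in ".pdf" (case-insensitive).
function keepPdfPaths(paths) {
  return paths.filter(
    (p) => typeof p === "string" && p.toLowerCase().endsWith(".pdf")
  );
}

// In a function node, the body would be roughly:
//   msg.payload = keepPdfPaths(msg.payload);
//   return msg;

// Stand-alone demonstration with made-up paths.
const sample = [
  "C:\\workspace\\jobs\\1\\Invoice 1.pdf",
  "C:\\workspace\\jobs\\1\\notes.txt",
  "C:\\workspace\\jobs\\1\\Invoice 2.PDF"
];
console.log(keepPdfPaths(sample)); // the two PDF paths, in their original order
```

Filtering preserves the order of the remaining entries.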

Iterate through the files
- The split node is used to iterate over the array of file paths. It emits one message per file, allowing the downstream nodes to process each document individually.
- Example output from the split node:
  
  msg.payload = "C:\\workspace\\jobs\\1\\Invoice 1.pdf"
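
To illustrate what happens at this step, here is a simplified model of splitting an array payload, written as plain JavaScript. It is a sketch and not the split node's actual implementation: the function name splitArrayMessage and the "demo-sequence" identifier are assumptions. The msg.parts object mirrors the sequence metadata (id, index, count) that Node-RED attaches to each split message so that a join node can reassemble the sequence later.

```javascript
// Simplified model: emit one message per array element, tagged with its
// position in the original sequence.
function splitArrayMessage(msg) {
  const id = msg._msgid || "demo-sequence"; // placeholder sequence id for this sketch
  const count = msg.payload.length;
  return msg.payload.map((item, index) => ({
    ...msg,
    payload: item,
    parts: { id, index, count, type: "array" }
  }));
}

const messages = splitArrayMessage({
  payload: [
    "C:\\workspace\\jobs\\1\\Invoice 1.pdf",
    "C:\\workspace\\jobs\\1\\Invoice 2.pdf"
  ]
});
console.log(messages[0].payload, messages[0].parts);
```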

Generate content sets
- The document mapping node processes each PDF and generates a content set in the OL Connect Automate database by executing a data mapping configuration.
- For a simple PDF merge, configure the boundaries as follows:
  
  Boundaries Trigger: On all pages
- This treats the entire input file as a single record, ensuring that each PDF produces exactly one content set.
- After processing, the node adds the content set ID to the message:
  
  msg.contentSetId = 1234

Prepare the message for the join node
- The join node collects values from msg.payload, so the content set ID must be copied there first.
- Add a change node before the join node with the rule:
  
  Set msg.payload = msg.contentSetId
- The message now becomes:
  
  msg.payload = 1234
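
The same rule could be expressed in a function node instead of a change node. The sketch below assumes the incoming message carries msg.contentSetId as shown earlier; the function name copyContentSetIdToPayload is hypothetical.

```javascript
// Equivalent of the change rule "Set msg.payload = msg.contentSetId".
// The same msg object is modified and returned, so its other properties
// (such as msg.parts, added by the split node) stay intact.
function copyContentSetIdToPayload(msg) {
  msg.payload = msg.contentSetId;
  return msg;
}

// In a function node, the body would be roughly:
//   msg.payload = msg.contentSetId;
//   return msg;
const example = copyContentSetIdToPayload({
  payload: "C:\\workspace\\jobs\\1\\Invoice 1.pdf",
  contentSetId: 1234,
  parts: { id: "demo-sequence", index: 0, count: 3 }
});
console.log(example.payload);
```

Returning the original message rather than a new object matters here, because the join node in automatic mode relies on msg.parts to know which sequence a message belongs to.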

Collect content sets
- The join node collects all incoming payload values into a single array.
- Configuration:
  
  Mode: Automatic
- After all files are processed, the message will contain something like:
  
  msg.payload = [1234, 2345, 3456]
- Each value represents a content set stored in the OL Connect Automate database.
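
As a rough illustration of automatic mode, the sketch below buffers payloads per sequence and emits a single array once every part has arrived. It is a simplified model and not the join node's implementation; createJoiner and the "seq-1" identifier are hypothetical. It assumes each message carries msg.parts with id, index, and count, as produced by the split node.

```javascript
// Simplified model of join in automatic mode: group messages by parts.id and
// place each payload at parts.index, so arrival order does not matter.
function createJoiner() {
  const pending = new Map();
  return function onMessage(msg) {
    const { id, index, count } = msg.parts;
    const group = pending.get(id) || { items: new Array(count), received: 0 };
    group.items[index] = msg.payload;
    group.received += 1;
    pending.set(id, group);
    if (group.received < count) {
      return null; // still waiting for the rest of the sequence
    }
    pending.delete(id);
    return { payload: group.items };
  };
}

const join = createJoiner();
const partsFor = (index) => ({ id: "seq-1", index, count: 3 });
join({ payload: 2345, parts: partsFor(1) }); // null, sequence incomplete
join({ payload: 1234, parts: partsFor(0) }); // null, sequence incomplete
const joined = join({ payload: 3456, parts: partsFor(2) });
console.log(joined.payload); // 1234, 2345, 3456 in index order
```

Because each value is placed by its index, the resulting array follows the order of the original file list even if the messages complete out of order.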

Prepare the data for output
- The paginated output node expects the content set IDs to be stored in msg.contentSetId.
- Add another change node with the rule:
  
  Set msg.contentSetId = msg.payload
- Result:
  
  msg.contentSetId = [1234, 2345, 3456]
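
If this step is done in a function node rather than a change node, it can also check the collected values before they reach the output node. The sketch below assumes that content set IDs are integers, as in the examples above; the function name toContentSetIds and the error messages are hypothetical.

```javascript
// Move the collected IDs to msg.contentSetId, rejecting anything that does
// not look like a non-empty array of integer IDs (an assumption of this sketch).
function toContentSetIds(msg) {
  const ids = msg.payload;
  if (!Array.isArray(ids) || ids.length === 0) {
    throw new Error("Expected a non-empty array of content set IDs");
  }
  if (!ids.every((id) => Number.isInteger(id))) {
    throw new Error("Every content set ID must be an integer");
  }
  msg.contentSetId = ids;
  return msg;
}

const ready = toContentSetIds({ payload: [1234, 2345, 3456] });
console.log(ready.contentSetId);
```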

Generate the merged output
- The paginated output node processes the array of content set IDs.
- The output is configured with an output preset, which can produce a PDF or another format such as PostScript, PCL, or AFP.

Result

Using this technique, OL Connect Automate converts each input PDF into a content set, and then merges those content sets into a single document during output generation.

Because the PDFs are processed through the standard OL Connect pipeline, the merged output can benefit from the full set of output capabilities available in the output preset. For example, the output configuration can apply:

- Impositioning.
- Banner or separator pages.
- Other output finishing options supported by OL Connect.

This approach also allows additional processing steps to be introduced before output generation.

For example, a paginated job node could be added to the flow to perform sorting or grouping of the PDFs prior to merging. In such a scenario, the document mapping node extracts data from the input PDFs using its data mapping configuration. The extracted data can then be used by the paginated job node to control document ordering, grouping, or other job-level processing before the final output is generated.