How to Extract Images From Documents

Use the BA Insight Image Extractor to extract images inside PDFs and Open XML documents such as .docx, .xlsx, and .pptx files.

How to Configure the Image Extractor

  • Feature page > Existing Components:

    • Click the Image Extractor feature link to open its Configuration section.

      The Image Extractor has no configuration settings, so there is nothing to set here.

Input Property: FileRawData

Output Property: ExtractedImagesBinaryData
Type: Byte Array – Multi
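
To make the FileRawData to ExtractedImagesBinaryData mapping concrete for the Open XML case, here is a minimal Python sketch. It is not BA Insight's implementation. It assumes the input is the raw bytes of a .docx, .xlsx, or .pptx file. These files are ZIP packages that store embedded media under word/media/, xl/media/, or ppt/media/. The function name extract_openxml_images is hypothetical. PDFs are left out because extracting their images needs a dedicated PDF parsing library.

```python
import io
import zipfile

# Media folders used by Word, Excel, and PowerPoint Open XML packages.
MEDIA_PREFIXES = ("word/media/", "xl/media/", "ppt/media/")


def extract_openxml_images(file_raw_data: bytes) -> list[bytes]:
    """Return the bytes of every part stored under the package's media folder.

    Hypothetical helper: the input plays the role of FileRawData and the
    returned list (one entry per embedded file, normally an image) plays the
    role of the multi-valued ExtractedImagesBinaryData property.
    """
    images: list[bytes] = []
    with zipfile.ZipFile(io.BytesIO(file_raw_data)) as package:
        for name in package.namelist():
            # Skip folder entries; keep only files inside a media folder.
            if name.startswith(MEDIA_PREFIXES) and not name.endswith("/"):
                images.append(package.read(name))
    return images


if __name__ == "__main__":
    # Build a tiny stand-in .docx in memory with one embedded "image".
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr("word/document.xml", "<w:document/>")
        z.writestr("word/media/image1.png", b"\x89PNG fake bytes")
    print(len(extract_openxml_images(buffer.getvalue())))  # 1
```

Returning a list rather than a single value mirrors the "Multi" in the output type: one document can yield zero, one, or many images.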

How to Test Your Image Extractor Pipeline

BA Insight recommends that you test the Image Extractor before you use it in production.

Use the following steps to test your Image Extractor:

  1. Click Feature Testing.

    See the Testing UI.

  2. Test Target: Select one of the following:
    • Test the whole configuration:
      • The document is processed exactly as a real document would be.
    • Test a specific feature:
      • The document is only processed by the feature that you select in the drop-down list.
    • Test a specific component:
      • The document is only processed by the component that you select in the drop-down list.
      • You can also click Skip Trigger to test the feature without applying any of its triggers.
  3. Select one of the following input sources:
    • Recorded Data:
      • Choose previously recorded data, if you have any.
    • Paste RAW Text Data:
      • Define the input data for the test here.
      • Paste XML in the same format that the Recorder uses.
  4. Log Level:
    • The default is Error. To change it, select Warning, Info, or Debug from the drop-down list.
  5. Click Start Test to see your testing results.

    See the Input and Output Properties.

Note: If there are no errors, the log does not return any results.
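
The note above follows from how a severity threshold usually works. The sketch below is a hypothetical model, not the product's code. It assumes the four levels are ordered Error, Warning, Info, Debug from most to least severe. It also assumes that selecting a level shows entries at that level and every more severe level. The names LogLevel and visible_entries, and the sample log entries, are illustrative only.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    # Hypothetical ordering: a lower value means a more severe entry.
    ERROR = 0
    WARNING = 1
    INFO = 2
    DEBUG = 3


def visible_entries(entries, selected):
    """Return the (level, message) entries shown when `selected` is the log level.

    An entry is shown only if it is at least as severe as the selected level.
    """
    return [(level, message) for level, message in entries if level <= selected]


if __name__ == "__main__":
    # Made-up sample entries for a run that produced no errors.
    run_log = [
        (LogLevel.INFO, "Processing document"),
        (LogLevel.DEBUG, "Found 2 image parts"),
    ]
    # No ERROR entries, so the default level shows an empty log.
    print(visible_entries(run_log, LogLevel.ERROR))  # []
    print(len(visible_entries(run_log, LogLevel.DEBUG)))  # 2
```

Under this model, an empty log at the default Error level means the run had no errors. Switching to Info or Debug is how you would see the less severe entries.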