Analyze graphic content with the AI Visual Interpreter

The AI Visual Interpreter component uses an AI model to analyze visual content in your documents, such as images, graphs, and unstructured data. The interpreted content can be appended as metadata and used to give chatbots context-aware responses. Users can then query their documents for information that was interpreted from non-textual assets.

By integrating your own AI with AutoClassifier, you agree to the Upland AI Disclaimer. This disclaimer appears on all components with AI functionality in the administration portal.

Prerequisites

Note the following prerequisites before configuring the AI Visual Interpreter component:

If you are using Azure OpenAI, you must create and deploy an Azure OpenAI service resource. The resource must have a deployed model that supports visual analysis.
If you are using OpenAI, you must use gpt-4o or a later model.

Configure the component

To configure your component, do the following:

In the AutoClassifier administration portal, add a new component to a new or existing pipeline.
When adding your component, select AI Visual Interpreter from the New Component list and provide a name for your component.
In the Configuration section, use the Select AI Model field to choose Azure OpenAI or OpenAI.
1. If you select Azure OpenAI, enter the deployment URL of your Azure OpenAI service resource in the Deployment URL field.
2. If you select OpenAI, enter the name of the model you are using in the Model field. By default, this field is set to gpt-4o.
In the API Key field, enter the API key for your AI resource.

In the Accepted Extensions field, enter a comma-delimited list of the file extensions that you want the component to process. By default, this field contains all supported extensions: .doc, .docx, .ppt, .pptx, .xls, .xlsx, .one, .jpg, .jpeg, .png, .tiff, .tif, and .pdf. The following table shows which features are supported for each file type:

Feature	Images*	.doc	.docx	.pdf	.ppt	.pptx	.xls	.xlsx	.one
Default	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Remove minimum size images from analysis	Yes	Yes	Yes	Yes	No	No	Yes	Yes	No
Page range to process	Yes**	Yes	Yes	Yes	Yes	Yes	No	No	Yes
Document intelligence	Yes	No	Yes	Yes	No	Yes	No	Yes	Yes

* Image files include .jpg, .jpeg, .png, .tiff, and .tif.

** Because each image file is a single entity, page ranges do not apply to it.

In the Prompt field, enter the prompt that is sent to your AI model to guide how it interprets your visual content. By default, the field contains the following prompt:

"Extract and analyze all visible content from the image. Capture all readable text, including titles, labels, and annotations. Convert line, bar, and pie charts into markdown tables with approximate values (add a note: 'Numbers are approximate.'). Use markdown for clarity (headers, lists, tables). Provide a detailed description and key takeaways from the image. Retain superscripts and citations where applicable. Return only the extracted content and analysis—no greetings, explanations, or extra text. {NormalResponseFormat}"

You can customize this prompt to suit your needs, but you must not change the {NormalResponseFormat} string.
In the Page Range to Process field, specify the pages of your document that you want to process. This field accepts three formats:
- Dash separated: Use a dash to include every page in a range. For example, "1-4" includes pages 1, 2, 3, and 4.
- Comma separated: Use commas to list individual pages. For example, "1, 4" includes only pages 1 and 4.
- Combination: Combine dash-separated and comma-separated values. For example, "1-4, 9" includes pages 1, 2, 3, 4, and 9.
  
  Page ranges are not supported for Excel files (.xls and .xlsx). If you are processing an Excel file, leave this field empty.
In the Minimum Size Of Images field, specify the minimum image size, as width x height in pixels. The default is 80x80. Use whole numbers only; decimal values are not supported.
Select the Remove Minimum Size Images from Analysis checkbox to exclude images smaller than the Minimum Size Of Images value from analysis.
Select the Save as HTML files checkbox to save the analysis result as an HTML file.
In the Location to Save HTML Files field, specify the path where the HTML files are saved. This field applies only if you selected the Save as HTML files checkbox.
Select the Generate Questions from AI checkbox to have your AI model generate questions based on its interpretation and add them as metadata to enrich indexing and querying. When this checkbox is selected, the {NormalResponseFormat} string in the Prompt field is replaced with {QuestionSuggestionsResponseFormat}.

You must not change the {QuestionSuggestionsResponseFormat} string.
Select the Analyze with Document Intelligence checkbox to apply document intelligence capabilities to your processed documents. To use this feature, you must configure a Document Intelligence component and place it before the AI Visual Interpreter component in your pipeline.
- If this checkbox is selected, configure your Document Intelligence component as follows:
  - The Document Intelligence Model must be prebuilt-layout.
  - The Page Range must match the Page Range to Process value in this component.
  - The Accepted Extensions must match the Accepted Extensions value in this component.
Save your configuration.
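The Accepted Extensions value and the feature support table can be checked together before you save. The following Python sketch is a hypothetical offline helper, not part of AutoClassifier: `parse_extensions` and `supports` are illustrative names, and the feature keys (`page_range`, `remove_min_size`, and so on) are labels invented here for the table rows. Supported cells are transcribed as True and unsupported cells as False.

```python
# Hypothetical helper: transcribes the feature support table from the
# Accepted Extensions step. It is not an AutoClassifier API.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tiff", ".tif"}

# Table columns, in order; all image types share the "images" column.
COLUMNS = ["images", ".doc", ".docx", ".pdf", ".ppt", ".pptx", ".xls", ".xlsx", ".one"]

# One row per feature; True = supported, False = not supported.
FEATURES = {
    "default":               [True, True, True, True, True, True, True, True, True],
    "remove_min_size":       [True, True, True, True, False, False, True, True, False],
    # Images are marked as supported, but a page range has no effect on a single image.
    "page_range":            [True, True, True, True, True, True, False, False, True],
    "document_intelligence": [True, False, True, True, False, True, False, True, True],
}


def parse_extensions(value: str) -> list[str]:
    """Split a comma-delimited Accepted Extensions value into lowercase extensions."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


def supports(extension: str, feature: str) -> bool:
    """Return True if the table lists `feature` as supported for `extension`."""
    ext = extension.lower()
    column = "images" if ext in IMAGE_EXTENSIONS else ext
    if column not in COLUMNS:
        raise ValueError(f"not a supported extension: {extension!r}")
    return FEATURES[feature][COLUMNS.index(column)]


# Usage: flag configured extensions that cannot use a page range.
extensions = parse_extensions(".pdf, .XLSX, .png")
print([ext for ext in extensions if not supports(ext, "page_range")])  # ['.xlsx']
```

Encoding the table as data keeps the lookup in one place, so a change to the supported features only requires editing one row.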
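The placeholder rules for the Prompt field can be expressed as a simple check. This Python sketch is hypothetical: `check_prompt` and `enable_questions` are illustrative names, and the sketch only models the documented behavior (a custom prompt keeps {NormalResponseFormat}, which is replaced with {QuestionSuggestionsResponseFormat} when Generate Questions from AI is selected). It does not reproduce how AutoClassifier fills in these placeholders.

```python
# Hypothetical check of the documented placeholder rules for the Prompt field.
NORMAL = "{NormalResponseFormat}"
QUESTIONS = "{QuestionSuggestionsResponseFormat}"


def enable_questions(prompt: str) -> str:
    """Mimic the documented swap when Generate Questions from AI is selected."""
    return prompt.replace(NORMAL, QUESTIONS)


def check_prompt(prompt: str, generate_questions: bool) -> list[str]:
    """Return a list of problems; an empty list means the placeholder looks intact."""
    expected = QUESTIONS if generate_questions else NORMAL
    unexpected = NORMAL if generate_questions else QUESTIONS
    problems = []
    if expected not in prompt:
        problems.append(f"missing {expected}")
    if unexpected in prompt:
        problems.append(f"unexpected {unexpected}")
    return problems


custom = "Describe each chart as a markdown table. {NormalResponseFormat}"
print(check_prompt(custom, generate_questions=False))  # []
print(check_prompt(enable_questions(custom), generate_questions=True))  # []
print(check_prompt("Describe each chart.", generate_questions=False))  # ['missing {NormalResponseFormat}']
```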
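The three Page Range to Process formats can be illustrated with a small parser. This is a minimal Python sketch under stated assumptions, not AutoClassifier's own parser: `parse_page_range` is a hypothetical name, pages are assumed to start at 1, and an empty value is returned as an empty list, which this sketch treats as "no page filter".

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a value such as "1-4, 9" into a sorted list of page numbers."""
    if not spec.strip():
        return []  # Assumption: an empty field means no page filter.
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # Dash separated: include every page from start to end.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            # Comma separated: a single page; int() raises ValueError on bad input.
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.add(page)
    return sorted(pages)


print(parse_page_range("1-4"))     # [1, 2, 3, 4]
print(parse_page_range("1, 4"))    # [1, 4]
print(parse_page_range("1-4, 9"))  # [1, 2, 3, 4, 9]
```

Collecting pages in a set means overlapping entries such as "1-4, 3" do not produce duplicates.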
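The Minimum Size Of Images and Remove Minimum Size Images from Analysis settings can be pictured as a filter. This Python sketch is hypothetical (`parse_min_size`, `filter_small_images`, and the `Image` record are illustrative names), and it assumes that an image counts as below the minimum when either its width or its height is smaller than the configured value; the documentation does not say how the two dimensions are combined.

```python
import re
from dataclasses import dataclass

# Whole-number WIDTHxHEIGHT, for example "80x80"; decimals do not match.
_SIZE = re.compile(r"^\s*(\d+)\s*[xX]\s*(\d+)\s*$")


def parse_min_size(value: str) -> tuple[int, int]:
    """Parse a Minimum Size Of Images value into (width, height) in pixels."""
    match = _SIZE.match(value)
    if not match:
        raise ValueError(f"expected WIDTHxHEIGHT in whole pixels, got {value!r}")
    return int(match.group(1)), int(match.group(2))


@dataclass
class Image:
    name: str
    width: int
    height: int


def filter_small_images(images: list[Image], min_size: str = "80x80") -> list[Image]:
    """Keep only images that meet the minimum (assumption: both dimensions must)."""
    min_width, min_height = parse_min_size(min_size)
    return [img for img in images if img.width >= min_width and img.height >= min_height]


images = [Image("logo.png", 64, 64), Image("chart.png", 640, 480), Image("banner.png", 1200, 40)]
print([img.name for img in filter_small_images(images)])  # ['chart.png']
```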
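The Document Intelligence requirements can be checked mechanically before saving. In this Python sketch, component settings are represented as plain dictionaries and the pipeline as an ordered list of component types; those representations, the dictionary keys, and `check_document_intelligence` are all hypothetical. Page ranges are compared after removing whitespace, and extensions are compared as unordered, case-insensitive sets; both are assumptions about what "must match" means.

```python
def _extensions(value: str) -> set[str]:
    # Assumption: extension order and letter case do not matter.
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def _page_range(value: str) -> str:
    # Assumption: only whitespace differences are ignored, so "1-4,9" equals "1-4, 9".
    return "".join(value.split())


def check_document_intelligence(visual: dict, doc_intel: dict, pipeline: list[str]) -> list[str]:
    """Return the documented requirements that a configuration does not meet."""
    problems = []
    if doc_intel.get("model") != "prebuilt-layout":
        problems.append("Document Intelligence Model must be prebuilt-layout")
    if _page_range(doc_intel.get("page_range", "")) != _page_range(visual.get("page_range", "")):
        problems.append("Page Range must match Page Range to Process")
    if _extensions(doc_intel.get("accepted_extensions", "")) != _extensions(visual.get("accepted_extensions", "")):
        problems.append("Accepted Extensions must match")
    di, avi = "Document Intelligence", "AI Visual Interpreter"
    if di not in pipeline or avi not in pipeline or pipeline.index(di) > pipeline.index(avi):
        problems.append("Document Intelligence must come before AI Visual Interpreter")
    return problems


visual = {"page_range": "1-4, 9", "accepted_extensions": ".pdf, .docx"}
doc_intel = {"model": "prebuilt-layout", "page_range": "1-4,9", "accepted_extensions": ".docx,.pdf"}
print(check_document_intelligence(visual, doc_intel, ["Document Intelligence", "AI Visual Interpreter"]))  # []
print(check_document_intelligence(visual, doc_intel, ["AI Visual Interpreter", "Document Intelligence"]))
# ['Document Intelligence must come before AI Visual Interpreter']
```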

Additional information

Note the following:

If caching is enabled, the generated output for a document is cached and reused on subsequent calls to reduce costs.
HTML dump data is not cached. If caching is enabled, the extracted output is retrieved from the cache, but generating the HTML dump still requires a call to the AI model.
Use the HTML dump feature only for troubleshooting, to visualize the output returned by the AI model. It is not recommended for production use.

Output details

On the Pipeline Testing page, you can test your component configuration. The test displays the following output properties:

Output property	Description
VisualInterpretation	The complete text extracted and interpreted from the file, in markdown format.
PageWiseQuestions	The questions your AI model generated for each page, if the Generate Questions from AI checkbox is selected.
TotalQuestionSuggestions	The questions your AI model generated across all pages, if the Generate Questions from AI checkbox is selected.