About Preprocessing Settings

Preprocessing settings apply corrections, transformations, or enhancements to a document before the Optical Character Recognition (OCR) process begins.

The Preprocessing settings are described below.

Note: Not all Preprocessing and Postprocessing Settings have examples.

Image

  • Blank Page Removal – removes blank pages from a document. The options are None, All, First, or Last. This setting detects only pages that contain no text at all. For example, a page that has a header or footer, or that uses the “This page intentionally left blank” convention, is not treated as blank.

See Blank Page Removal example below.

[Image: Blank Page Removal, before and after]

  • Split Facing Pages – splits facing pages from a book, magazine, or other bound document into two separate pages.

See Split Facing Pages example below.

[Image: Split Facing Pages, before and after]

  • Detect Page Orientation – detects the orientation of each page in a document and rotates pages to the correct orientation.

See Detect Page Orientation example below.

[Image: Detect Page Orientation, before and after]

  • Deskew Images – straightens text and images that are tilted at a slight angle, up to a maximum of 20%.

See Deskew Images example below.

[Image: Deskew Images, before and after]

  • Correct Trapezoid Distortions – flattens distortions that can occur when scanning bound documents such as books. The binding can lift the pages away from the scanner glass where the two pages meet, which distorts the scanned image.

  • Invert Images – inverts the colors of document pages. For example, pages with a white background and black text become pages with a black background and white text, like a film negative. For limited use only.
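As a rough illustration of what inversion means, the Python sketch below assumes an 8-bit grayscale page stored as rows of values from 0 (black) to 255 (white) and maps each value v to 255 - v. The function name `invert_page` is hypothetical; the product's own implementation is not documented here.

```python
def invert_page(page: list[list[int]]) -> list[list[int]]:
    """Invert an 8-bit grayscale page (hypothetical helper): 0 (black) <-> 255 (white)."""
    return [[255 - value for value in row] for row in page]


# A white page with one black "text" pixel becomes a black page with one white pixel.
page = [
    [255, 255, 255],
    [255, 0, 255],
]
print(invert_page(page))  # [[0, 0, 0], [0, 255, 0]]
```

Applying the inversion twice returns the original page, which is why inverted output can always be inverted back.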

See Invert Images example below.

[Image: Invert Images, before and after]

  • Remove Color Marks – removes red, green, blue, and yellow objects from the entire document, including colored backgrounds, stamps, and signatures. For limited use only.
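One simple way to picture this setting is that black, white, and gray pixels have nearly equal red, green, and blue values, while red, green, blue, and yellow marks do not. The sketch below assumes RGB pixels and treats any pixel whose channel spread exceeds a hypothetical `threshold` as a color mark, replacing it with white. This is an illustrative heuristic, not the product's documented method; `remove_color_marks` is a hypothetical name.

```python
Pixel = tuple[int, int, int]  # (red, green, blue), each 0-255


def remove_color_marks(page: list[list[Pixel]], threshold: int = 60) -> list[list[Pixel]]:
    """Replace strongly colored pixels with white; keep black, white, and gray pixels.

    `threshold` is a hypothetical cutoff on the spread between the largest and
    smallest color channel of a pixel.
    """
    white: Pixel = (255, 255, 255)
    cleaned = []
    for row in page:
        new_row = []
        for r, g, b in row:
            # Gray-scale pixels have nearly equal channels; colored ones do not.
            is_colored = max(r, g, b) - min(r, g, b) > threshold
            new_row.append(white if is_colored else (r, g, b))
        cleaned.append(new_row)
    return cleaned


page = [
    [(0, 0, 0), (255, 0, 0), (255, 255, 0)],          # black text, red mark, yellow mark
    [(128, 128, 128), (0, 0, 255), (255, 255, 255)],  # gray, blue mark, white paper
]
print(remove_color_marks(page))
```

Because every strongly colored pixel is whitened, colored backgrounds, stamps, and signatures all disappear together, which is why the setting is for limited use only.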

See Remove Color Marks example below.

[Image: Remove Color Marks, before and after]

  • Binarize – converts all pages to black and white.

See Binarize example below.

[Image: Binarize, before and after]

Note: There is a similar setting available in the Postprocessing settings.
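Binarization is commonly done by thresholding: pixels darker than a cutoff become black and the rest become white. The sketch below uses a single global threshold of 128 on 8-bit grayscale values. Both the threshold and the `binarize` function are hypothetical, and OCR engines often use adaptive methods (for example, Otsu's method) instead of a fixed cutoff.

```python
def binarize(page: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Map each 8-bit grayscale pixel to pure black (0) or pure white (255).

    `threshold` is a hypothetical fixed cutoff, not a documented product value.
    """
    return [[255 if value >= threshold else 0 for value in row] for row in page]


print(binarize([[12, 127, 128, 240]]))  # [[0, 0, 255, 255]]
```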

  • Allow Enhanced Resolution Preparation If Needed – increases the resolution of a document to 300 dpi if its resolution is below 150 dpi.
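The rule in this setting can be written as a small function. The sketch assumes that a page's resolution is known in dpi; `enhanced_resolution` is a hypothetical name, and the two constants come from the setting description.

```python
MIN_DPI = 150     # pages below this resolution are enhanced
TARGET_DPI = 300  # resolution that enhanced pages are raised to


def enhanced_resolution(dpi: int) -> tuple[int, float]:
    """Return (output dpi, scale factor) for a page scanned at `dpi`."""
    if dpi < MIN_DPI:
        return TARGET_DPI, TARGET_DPI / dpi
    return dpi, 1.0  # 150 dpi or higher: left unchanged


print(enhanced_resolution(100))  # (300, 3.0)
print(enhanced_resolution(150))  # (150, 1.0)
print(enhanced_resolution(200))  # (200, 1.0)
```

Pixel dimensions scale with dpi: a page 8.5 inches wide is 850 pixels across at 100 dpi and 2,550 pixels across at 300 dpi, a scale factor of 3.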

  • Clean White Noise – cleans white noise, which is white dots embedded within the characters of scanned text, typically in documents printed on legacy printers. White noise makes the text appear ragged. This setting attempts to fill in the dots. You can specify a dot size if necessary, but some dots may be over- or under-corrected depending on the dot size. Use Auto for best results.

See Clean White Noise example below.

  • Clean Black Noise – cleans black noise, which is black specks embedded within the characters of scanned text. This setting attempts to remove the specks. You can specify a dot size if necessary, but some specks may be over- or under-corrected depending on the dot size. Use Auto for best results.

See Clean Black Noise example below.
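Both noise settings can be pictured as flipping small connected groups of pixels. The sketch below is an assumption about the general technique, not the product's documented algorithm: it works on a binary page (1 for black, 0 for white) and uses 4-connected groups. `clean_noise` is a hypothetical function, and its `max_dot_size` parameter stands in for the dot size option.

```python
from collections import deque

BLACK, WHITE = 1, 0  # binary page: 1 = ink, 0 = paper


def clean_noise(page: list[list[int]], noise_value: int, max_dot_size: int) -> list[list[int]]:
    """Flip every 4-connected group of `noise_value` pixels that has at most `max_dot_size` pixels.

    noise_value=WHITE fills white dots inside characters (white noise);
    noise_value=BLACK removes isolated black specks (black noise).
    """
    rows, cols = len(page), len(page[0])
    result = [row[:] for row in page]  # leave the input page unchanged
    seen = [[False] * cols for _ in range(rows)]
    for start_r in range(rows):
        for start_c in range(cols):
            if seen[start_r][start_c] or page[start_r][start_c] != noise_value:
                continue
            # Breadth-first search collects one connected group of noise-colored pixels.
            group = []
            queue = deque([(start_r, start_c)])
            seen[start_r][start_c] = True
            while queue:
                r, c = queue.popleft()
                group.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc] and page[nr][nc] == noise_value:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(group) <= max_dot_size:
                for r, c in group:
                    result[r][c] = 1 - noise_value  # flip to the opposite color
    return result


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],  # white dot inside a black stroke
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1],  # isolated black speck
]
print(clean_noise(page, WHITE, max_dot_size=1)[2][2])  # 1: the white dot is filled
print(clean_noise(page, BLACK, max_dot_size=1)[4][4])  # 0: the speck is removed
```

The sketch also shows why the dot size matters: if `max_dot_size` is large enough, the enclosed white interior of letters such as "o" or "e" is filled too (over-correction), and if it is too small, some dots remain (under-correction).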

Languages

  • Language – select the language or languages used in the document for character recognition. You can select more than one language, but doing so may increase the likelihood of recognition issues. Select no more than one Asian language in a Compose Profile, because the Language setting does not support the recognition of multiple Asian languages.

Tip: Create separate Compose Profiles for different languages used in your company's documents to minimize recognition issues.

Settings

  • OCR Profile – select a priority for OCR or a text extraction type:
    • Balanced (both speed and accuracy)
    • Speed
    • Accuracy
    • Aggressive Text Extraction – detects blurry text, text that appears close to images or graphs, and output produced by plotter devices.
  • Detect Font Formatting – detects font formatting such as bold, underline, italic, font size, and font family, and preserves it in the output file if the output format supports it. Use the default for best results.
  • Detect Table Of Contents – detects a table of contents in a document. If the table of contents was set up correctly in the original document, it appears in the final document as a set of links to the topic headings.
  • Detect Footnotes – detects footnotes in a document.
  • Use Optimized Processing for Large Files – detects large PDFs of more than 100 pages. If this setting is not selected, large PDF files may take longer to process and the compose may fail with memory errors. Bookmarks in the original PDF do not appear in the composed document; this is a limitation of the Use Optimized Processing for Large Files setting.

Note: If you experience any issues, clear the Detect Footnotes and Use Optimized Processing for Large Files settings, which are selected by default.

See also

About Compose Profiles

About Postprocessing Settings

Creating a Compose Profile

Applying a Compose Profile

Compose Profiles in Action