How to Extract Languages

The Language Detector pipeline stage uses the NTextCat library to detect all languages present in a selected multilingual metadata property (for example, the document body).

How to Add the Language Detector to AutoClassifier

  1. Navigate to the AutoClassifier Pipelines component page.

  2. Click New Component and select Language Detector from the component list.

  3. Name your new Language Detector component and click Add.

  4. Click Apply to save your changes.

  5. Ensure your new Language Detector component is placed in the list of existing pipeline stages.

How to Configure the Language Detector Component

  1. Open your Language Detector component.
  2. Paragraph Threshold - Enter the minimum paragraph length.
  3. Input Property - Enter the property that you want to process.
  4. Regex Pattern - Enter the pattern used to split document content into paragraphs. Modify this only if you need custom paragraph splitting.
  5. Click Apply.
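
To illustrate how the Paragraph Threshold and Regex Pattern settings might interact, here is a minimal Python sketch. It is not the component's implementation and does not call NTextCat. It assumes that the threshold is a character count, that paragraphs shorter than the threshold are skipped, and that blank lines separate paragraphs. The names PARAGRAPH_PATTERN, split_paragraphs, detect_languages, and toy_detector are hypothetical; toy_detector is a stand-in for a real language identifier.

```python
import re
from typing import Callable, List, Optional

# Hypothetical pattern: a blank line separates paragraphs. The component's
# actual default Regex Pattern is not stated in this document.
PARAGRAPH_PATTERN = r"\n\s*\n"


def split_paragraphs(text: str, pattern: str, threshold: int) -> List[str]:
    """Split text on the regex and keep paragraphs of at least `threshold` characters."""
    parts = (part.strip() for part in re.split(pattern, text))
    return [part for part in parts if len(part) >= threshold]


def detect_languages(
    text: str,
    detect_one: Callable[[str], Optional[str]],
    pattern: str = PARAGRAPH_PATTERN,
    threshold: int = 10,
) -> List[str]:
    """Return the distinct languages found, in order of first appearance."""
    found: List[str] = []
    for paragraph in split_paragraphs(text, pattern, threshold):
        language = detect_one(paragraph)
        if language and language not in found:
            found.append(language)
    return found


def toy_detector(paragraph: str) -> Optional[str]:
    """Stand-in detector for the example only; a real one would use a language model."""
    return "fr" if "ceci" in paragraph.lower() else "en"


sample = "This is an English paragraph.\n\nCeci est le paragraphe en question.\n\nHi"
languages = detect_languages(sample, toy_detector)
```

In this sketch the final paragraph ("Hi") is shorter than the threshold, so it never reaches the detector. The returned list holds one entry per detected language, similar in shape to a multi-value text property.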

Output Property      Description
DetectedLanguages    Text-multi