How to Extract Languages
The Language Detector pipeline stage uses the NTextCat library to detect all languages present in a specified multilingual metadata property (for example, the document body).
How to Add the Language Detector to AutoClassifier
- Navigate to the AutoClassifier Pipelines component page.
- Click New Component and select Language Detector from the component list.
- Name your new Language Detector component and click Add.
- Click Apply to save your changes.
- Ensure your new Language Detector component is placed in the list of existing pipeline stages.
How to Configure the Language Detector Component
To configure your Language Detector component, select it from the components list and complete the following fields in the Configuration section:
- Paragraph Threshold: Enter the minimum paragraph length.
- Input Property: Enter the name of the property that you want to process.
- Regex Pattern: Enter the regex pattern used to split document content into paragraphs. Modify this value only if you need custom paragraph splitting.
- Click Apply.
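To make the interplay of Regex Pattern and Paragraph Threshold more concrete, here is a minimal Python sketch of how such a stage might work. It is an illustration under assumptions, not the component's actual implementation. The function names, the default pattern, the default threshold, the assumption that paragraphs shorter than the threshold are skipped, and the tiny stop-word detector (a stand-in for NTextCat) are all hypothetical.

```python
import re
from typing import Callable, List, Optional

# Hypothetical stand-in for a real language identifier such as NTextCat.
STOPWORDS = {
    "en": {"the", "and", "is", "of"},
    "fr": {"le", "la", "et", "est", "les"},
}


def toy_detect(paragraph: str) -> Optional[str]:
    """Guess a language by counting known stop words (illustration only)."""
    words = set(re.findall(r"\w+", paragraph.lower()))
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def detect_languages(
    text: str,
    detect: Callable[[str], Optional[str]] = toy_detect,
    regex_pattern: str = r"(?:\r?\n){2,}",  # assumed default: split on blank lines
    paragraph_threshold: int = 20,          # assumed unit: characters
) -> List[str]:
    """Split text into paragraphs, skip short ones, collect distinct languages."""
    detected: List[str] = []
    for paragraph in re.split(regex_pattern, text):
        paragraph = paragraph.strip()
        if len(paragraph) < paragraph_threshold:
            continue  # assumption: paragraphs below the threshold are ignored
        language = detect(paragraph)
        if language is not None and language not in detected:
            detected.append(language)
    return detected


sample = (
    "The cat is on the mat and the dog is outside.\n\n"
    "Le chat est sur le tapis et le chien est dehors.\n\n"
    "Hi"
)
print(detect_languages(sample))
```

In this sketch, a higher threshold filters out short fragments such as headings or signatures that are hard to classify reliably, and a custom pattern changes where paragraph boundaries fall. The returned list corresponds loosely to a multi-valued output such as DetectedLanguages.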
Output Property | Description |
---|---|
DetectedLanguages | Text-multi |