AutoClassifier Components

You can add AutoClassifier components to your pipelines to enable specific functionality when the pipeline is triggered. The following table lists all of the AutoClassifier pipeline components that are currently available.

Component	Description
Amazon Comprehend Medical	This component analyzes medical text for entity extraction, sentiment analysis, and language detection based on AWS Comprehend services.
Amazon Comprehend NLP	This component analyzes text for entity extraction, sentiment analysis, and language detection based on AWS Comprehend services.
Azure Document Intelligence	This component allows you to leverage models created in Azure Document Intelligence Studio to extract relevant data from your documents.
Chunk Processor	This component allows you to independently process the chunks that were created in a document chunker stage. Each chunk can be processed as a separate document and additional pipeline functionality can be applied to each chunk.
Content Enrichment	This component allows you to integrate SharePoint Content Enrichment Web Services into your pipeline.
Custom Entity Extraction	This component extracts entities from any text based on a list of entity names.
Custom Vision AI	This component analyzes images and extracts detected tags based on trainable services from Custom Vision API.
Document Chunker	This component allows you to break down your indexed content into smaller, more manageable segments so that search results are more accurate and relevant to the user's search query.
Duplicates Detection	This component detects the degree of similarity between documents and can be used to group documents based on the similarity.
Email Processing	This component automatically extracts the following email-related metadata properties from .msg and .eml files: To, From, Subject, Sent date, and Received date.
HTML Markup Cleaner	This component removes all HTML markup tags from the configured metadata properties and returns plain text.
Image Extractor	This component extracts images inside PDFs and Open XML documents (docx, xlsx, pptx).
Image Processor Amazon Rekognition	This component analyzes text from images based on AWS Rekognition services.
Image Processor MS Computer Vision	This component analyzes images and extracts detected text and concepts based on Microsoft Computer Vision API.
Item Sorter	This component saves the metadata of processed items to disk and sorts the items by common values of a specific metadata property.
Language Detector	This component detects multiple languages in documents using the NTextCat library.
LexisNexis	This component calls the Lexis Search Advantage Classification Engine to extract legal metadata from content.
MeSH Tagger	This component returns related PubMed articles and applies Medical Subject Headings (MeSH) term tags.
Metadata Filtering	This component filters the metadata that is received and only allows the configured ones to be returned as output.
Metadata Name Sanitizer	This component removes special characters from metadata names.
Metadata Singularization	This component singularizes received metadata.
Metadata Values Capture	This component captures and exports metadata values usage across the processed items.
Microsoft Computer Vision OCR	This component analyzes images and extracts detected text and concepts based on Microsoft Computer Vision READ API.
Microsoft Text Analytics	This component analyzes text for entity extraction, sentiment analysis, and language detection based on Microsoft Cortana Intelligence Suite services.
NLQ Metadata Capturer	This component captures and processes metadata values to be used in NLQ processing.
Offline Processing	This component allows for offline processing of documents and metadata. Use it for time-consuming algorithms that would otherwise impact crawl performance. Results are picked up on the next incremental crawl.
PACER Metadata Extractor	This component extracts metadata specific to PACER court documents.
Recorder	This component records content and metadata of documents during crawl. This can be used to collect content from source systems or play back documents for testing and troubleshooting.
Regex Extractor	This component provides a standardized approach for applying regular expressions.
Rules Engine	This component is used by BA Insight AutoClassifier to provide a rule-based way to automatically tag documents during crawling using content and metadata.
SciSpacy NER	This component analyzes text with ScispaCy models and extracts named entities and their entity links.
Script	This component allows you to define a script in C# or VB.NET to process document content and metadata. For example, you can add a script component to store information, extract additional metadata, or use external data sources such as databases.
Section Headers Extractor	This component extracts section headers from documents based on regex patterns.
Section Information Extractor	This component is used to identify document sections and extract specific information from each section using NLP.
Slide Title Extractor	This component extracts the titles from PowerPoint documents (pptx).
Smart Previews	This component is needed by BA Insight Smart Previews to generate crawl-time document previews. Smart Previews provides high-level preview functionality for search results in SharePoint.
SmartHub Best Bets Feeder	This component feeds the SmartHub Best Bets engine with data at crawl time.
SmartHub QnA Document Feeder	This component feeds the SmartHub Q&A provider index with data at crawl time.
SmartHub QnA Feeder	This component feeds the SmartHub Q&A engine with data at crawl time.
Spacy NER	This component analyzes text with spaCy models and extracts named entities, key phrases, and sentences.
Summary Generator	This component generates a text summary based on provided or calculated important words or entities.
Tag Threshold Limits	This component applies thresholds to properties to limit the number of values returned.
Tika Extractor	This component extracts the body and metadata from raw binary files. You can use this component when the calling source system is unable to extract the body and metadata, as classification stages require this data.
Video Processor Microsoft Video Indexer	This component analyzes videos and extracts detected text and concepts based on Azure Video Indexer API.
West KM Metadata V6 Elastic	This component calls the West KM Version 6 Legal Knowledge Management Classification Elastic Engine to extract legal metadata from content.
West KM Transactional V5	This component calls the West KM Legal Knowledge Management Classification Engine to extract legal metadata from content.
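
To illustrate how several of the simpler metadata components in the table might combine in one pipeline, the following is a minimal conceptual sketch in Python. It is not the AutoClassifier API: the function names, the dictionary-based metadata model, and the sample item are all assumptions made for illustration. It only mimics the described behavior of the Metadata Name Sanitizer, HTML Markup Cleaner, Metadata Filtering, and Tag Threshold Limits components.

```python
import re
from html import unescape
from typing import Callable, Dict, List

# Hypothetical metadata model: property name -> list of values.
Metadata = Dict[str, List[str]]
Stage = Callable[[Metadata], Metadata]


def metadata_name_sanitizer() -> Stage:
    """Remove everything except letters, digits, and underscores from names."""
    def run(meta: Metadata) -> Metadata:
        return {re.sub(r"[^A-Za-z0-9_]", "", name): values
                for name, values in meta.items()}
    return run


def html_markup_cleaner(properties: List[str]) -> Stage:
    """Strip HTML tags from the values of the configured properties only."""
    tag = re.compile(r"<[^>]+>")

    def clean(value: str) -> str:
        return unescape(tag.sub("", value)).strip()

    def run(meta: Metadata) -> Metadata:
        return {name: ([clean(v) for v in values] if name in properties else values)
                for name, values in meta.items()}
    return run


def metadata_filtering(allowed: List[str]) -> Stage:
    """Keep only the configured properties in the output."""
    def run(meta: Metadata) -> Metadata:
        return {name: values for name, values in meta.items() if name in allowed}
    return run


def tag_threshold_limits(limits: Dict[str, int]) -> Stage:
    """Cap the number of values returned for each configured property."""
    def run(meta: Metadata) -> Metadata:
        return {name: (values[:limits[name]] if name in limits else values)
                for name, values in meta.items()}
    return run


def run_pipeline(meta: Metadata, stages: List[Stage]) -> Metadata:
    """Apply each stage in order, passing the output of one to the next."""
    for stage in stages:
        meta = stage(meta)
    return meta


# Made-up sample item, for illustration only.
item = {
    "Doc Summary!": ["<p>Quarterly <b>report</b></p>"],
    "Tags": ["finance", "q3", "draft", "internal"],
    "Temp#Id": ["42"],
}
stages = [
    metadata_name_sanitizer(),
    html_markup_cleaner(["DocSummary"]),
    metadata_filtering(["DocSummary", "Tags"]),
    tag_threshold_limits({"Tags": 2}),
]
result = run_pipeline(item, stages)
print(result)
```

Stage order matters in this sketch: names are sanitized first so that the later stages can refer to the cleaned property names.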