Use AutoClassifier to enrich the metadata of your indexed content

AutoClassifier is BA Insight’s content enrichment and classification engine. It analyzes documents and metadata as they pass through the indexing pipeline and generates structured information that improves search relevance and enables AI-driven discovery. By applying linguistic analysis, entity extraction, and taxonomy mapping, AutoClassifier transforms unstructured data into context-rich, searchable content. Integrated directly into the Connectivity Hub, AutoClassifier operates as an enrichment service that enhances content before it is indexed in the search platform.
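To make the idea of taxonomy mapping concrete, here is a minimal sketch in Python. It is a toy illustration only and does not reflect AutoClassifier's actual algorithm or API: the `TAXONOMY` dictionary, the `classify` function, and the sample text are all hypothetical, and simple keyword matching stands in for real linguistic analysis.

```python
import re

# Hypothetical taxonomy for illustration: category -> trigger terms.
TAXONOMY = {
    "Finance": ["invoice", "budget", "revenue"],
    "Legal": ["contract", "nda", "liability"],
}


def classify(text, taxonomy=TAXONOMY):
    """Return {category: matched terms} for every category with at least one hit."""
    # Lowercase the text and split it into alphanumeric tokens.
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    matches = {}
    for category, terms in taxonomy.items():
        hits = sorted(tokens.intersection(terms))
        if hits:
            matches[category] = hits
    return matches


doc = "The signed contract covers the Q3 budget and revenue targets."
metadata = classify(doc)
print(metadata)
# {'Finance': ['budget', 'revenue'], 'Legal': ['contract']}
```

The output is the kind of structured metadata that an enrichment step attaches to an item before it is indexed, so that the categories can later drive filters and relevance.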

Best practices for using AutoClassifier

Proper setup of AutoClassifier is key to maximizing the accuracy and efficiency of metadata enrichment. The following best practices can help maintain optimal performance and ensure reliable results:

  • Integrate with Connectivity Hub ahead of indexing: Configure AutoClassifier as a mid-pipeline enrichment service within Connectivity Hub so that content is enriched before it reaches the index. This ensures that downstream components such as SmartHub and Smart Previews receive enriched metadata for advanced display and filtering.

  • Start with a controlled taxonomy: Begin with a clear, well-defined taxonomy or classification schema. Avoid overly complex models at first; start small and expand gradually as classification accuracy improves.

  • Leverage metadata mapping: Map enriched fields returned from AutoClassifier to standardized index schema fields in the Indexing Service. Proper mapping ensures metadata is consistently exposed in search results and UI filters.

  • Use the Recorder component for validation and testing: Enable the Recorder component to capture sample enrichment transactions between the Connectivity Hub and AutoClassifier. This allows administrators to review classification results, analyze metadata accuracy, and troubleshoot enrichment issues before deploying changes to production.

  • Monitor enrichment throughput: Track the performance of AutoClassifier during large-scale crawls. Monitor processing times and error rates in the Connectivity Hub logs to identify bottlenecks.

  • Validate enrichment results: After configuration, sample indexed items and confirm that enriched metadata appears correctly in the target search index. Validate entity accuracy and classification confidence levels.

  • Plan for scalability: For large environments, deploy AutoClassifier in a distributed or load-balanced configuration to handle parallel enrichment requests efficiently.
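
The mapping, validation, and monitoring practices above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not BA Insight code: the field names in `FIELD_MAP`, the item and record shapes, the 0.7 confidence threshold, and all three helper functions are hypothetical. In a real deployment the equivalent data would come from your index, the Recorder output, or the Connectivity Hub logs.

```python
from statistics import mean

# Hypothetical mapping: enriched field name -> index schema field name.
FIELD_MAP = {
    "ac_topics": "Topics",
    "ac_entities_org": "Organizations",
}


def map_enriched_fields(enriched, field_map=FIELD_MAP):
    """Rename enriched fields to index schema names; unmapped fields are dropped."""
    return {field_map[k]: v for k, v in enriched.items() if k in field_map}


def find_low_confidence(items, threshold=0.7):
    """Return IDs of sampled items whose confidence is missing or below the threshold."""
    return [item["id"] for item in items if item.get("confidence", 0.0) < threshold]


def summarize_throughput(records):
    """Summarize (duration_ms, succeeded) tuples into count, mean time, and error rate."""
    if not records:
        return {"count": 0, "mean_ms": 0.0, "error_rate": 0.0}
    durations = [duration for duration, _ in records]
    errors = sum(1 for _, succeeded in records if not succeeded)
    return {
        "count": len(records),
        "mean_ms": mean(durations),
        "error_rate": errors / len(records),
    }


# Example usage with made-up sample data.
sample = [
    {"id": "doc-1", "confidence": 0.92},
    {"id": "doc-2", "confidence": 0.41},
]
print(map_enriched_fields({"ac_topics": ["Finance"], "ac_debug": "x"}))
print(find_low_confidence(sample))
print(summarize_throughput([(120, True), (340, False), (200, True)]))
```

Keeping the mapping in one table makes it easy to review against the index schema, and running the same checks on a small sample before and after a taxonomy change gives a quick signal of whether accuracy or throughput has regressed.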