How to Use Natural Language Processing (NLP)
- Microsoft Text Analytics Natural Language Processing (NLP)
- Amazon Comprehend (NLP)
- Amazon Comprehend Medical (NLP)
- How to Perform On-Premise NLP with spaCy
High Data Usage: Save Money by Using a Trigger
To help you avoid unexpectedly high cloud data costs, BA Insight provides a predefined "trigger" for the NLP component:
- It applies only to AutoClassifier version 5.0.
- The trigger is enabled by default, in sample form.
- The trigger provides sample code that lists the sources to process (with sample values).
- You must modify the trigger code according to your needs.
Not implementing this trigger can result in high data costs from your cloud data provider.
For example, when you crawl 5 content sources with a total of 1 million items, sending a large share of those documents for natural language extraction can generate high cloud data costs.
Use the following steps:
- Add the desired component as you normally would.
- When configuring the component, expand the Trigger section and adapt the predefined trigger to your needs.
- Continue with component configuration.
The predefined script value (modify before using):
Sample Script - Modify Before Using
//change the trigger script accordingly
//to the sources that you want this stage to process
//for the example below to work, ContentSource metadata must be
//available on the item to be processed; otherwise adapt your
//script to match your metadata and sources you want to process
var allowedContentSources = new List<string>() { "Source1", "Source2" };
string contentSource = item.Get<string>("ContentSource");
if (string.IsNullOrEmpty(contentSource) ||
    !allowedContentSources.Contains(contentSource))
    return false;
return true;
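To check how the sample condition behaves before relying on it, the same logic can be exercised in a small standalone program. The sketch below is illustrative only: `FakeItem` is a hypothetical stand-in that mimics just the `Get<string>` call used in the sample script (it is not the BA Insight item type), and `ContentSourceTrigger.ShouldProcess` is a hypothetical wrapper around the sample condition.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the pipeline item; it mimics only Get<T>(name).
public class FakeItem
{
    private readonly Dictionary<string, object> _metadata = new Dictionary<string, object>();

    public FakeItem Set(string name, object value)
    {
        _metadata[name] = value;
        return this;
    }

    // Returns the stored value, or default(T) (null for string) when it is missing.
    public T Get<T>(string name)
    {
        return _metadata.TryGetValue(name, out var value) && value is T typed ? typed : default(T);
    }
}

public static class ContentSourceTrigger
{
    // Same condition as the sample trigger script, wrapped in a method.
    public static bool ShouldProcess(FakeItem item, List<string> allowedContentSources)
    {
        string contentSource = item.Get<string>("ContentSource");
        if (string.IsNullOrEmpty(contentSource) ||
            !allowedContentSources.Contains(contentSource))
            return false;
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        var allowed = new List<string>() { "Source1", "Source2" };

        // Item from an allowed source: processed.
        Console.WriteLine(ContentSourceTrigger.ShouldProcess(
            new FakeItem().Set("ContentSource", "Source1"), allowed)); // True

        // Item from a source that is not listed: skipped.
        Console.WriteLine(ContentSourceTrigger.ShouldProcess(
            new FakeItem().Set("ContentSource", "Source3"), allowed)); // False

        // Item without ContentSource metadata: skipped.
        Console.WriteLine(ContentSourceTrigger.ShouldProcess(new FakeItem(), allowed)); // False

        // List<string>.Contains is case-sensitive, so "source1" does not match "Source1".
        Console.WriteLine(ContentSourceTrigger.ShouldProcess(
            new FakeItem().Set("ContentSource", "source1"), allowed)); // False
    }
}
```

The last case shows that the comparison is case-sensitive, so the names in the list must match the ContentSource values exactly as they appear on your items.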
The components affected by this change are all the BA Insight components that make API requests to Microsoft:
- Microsoft Text Analytics
- Custom Vision AI
- Image Processor MS Computer Vision
- Video Processor Microsoft Video Indexer
How to Trigger Your Pipelines Only for Video Files
Enter the script below in the code window of the Trigger section.
With this script, the pipeline runs only if the detected file extension is one of the video formats you list.
- Add to the script only the formats you truly wish to process.
- Adding all the supported video file extensions is unnecessary and inefficient.
This reduces data usage and therefore cost.
For more on using Pipeline Triggers, see Add Triggers to Determine When Your Pipelines Run.
// add here all allowed extensions, but always in lowercase
var allowedExtensions = new List<string>() { "mpeg4", "mp4", "avi" };
string fileext = item.Get<string>("escbase_fileextension");
if (fileext != null)
{
    if (allowedExtensions.Contains(fileext.ToLower()))
        return true;
}
return false;
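A possible variation of the extension check, shown here as a standalone sketch, uses a case-insensitive set instead of lowercasing each value. It also tolerates a leading dot or surrounding whitespace; whether the `escbase_fileextension` value ever carries either is an assumption, not something this page states. The class name `VideoExtensionTrigger` and its `ShouldProcess` method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class VideoExtensionTrigger
{
    // Case-insensitive set, so neither the entries nor the incoming value need ToLower().
    private static readonly HashSet<string> AllowedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "mpeg4", "mp4", "avi" };

    // Assumption: the metadata value may or may not include a leading dot or padding.
    public static bool ShouldProcess(string fileExtension)
    {
        if (string.IsNullOrWhiteSpace(fileExtension))
            return false;

        string normalized = fileExtension.Trim().TrimStart('.');
        return AllowedExtensions.Contains(normalized);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(VideoExtensionTrigger.ShouldProcess("MP4"));  // True
        Console.WriteLine(VideoExtensionTrigger.ShouldProcess(".avi")); // True
        Console.WriteLine(VideoExtensionTrigger.ShouldProcess("docx")); // False
        Console.WriteLine(VideoExtensionTrigger.ShouldProcess(null));   // False
    }
}
```

In an actual trigger, the same logic would be written inline, with the value read through `item.Get<string>("escbase_fileextension")` as in the script above; whether helper classes can be declared inside a trigger script is not confirmed here.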