How to Classify Videos

High Data Usage: Save Money by Using a Trigger
How to Trigger Your Pipelines Only for Video Files
How to Extract Labels and Concepts from Videos

High Data Usage: Save Money by Using a Trigger

To help you avoid accidentally high cloud data costs, BA Insight designed a predefined "trigger" for the NLP component:

The trigger is enabled by default, in sample form.
The trigger provides sample code that lists the sources to process (with sample values).
You must modify the trigger code according to your needs.

IMPORTANT! Not implementing this trigger can result in high data costs from your cloud data provider.

For example, if you crawl 5 different content sources with a total of 1M items and every item is sent for Natural Language extraction, the number of documents processed can generate high cloud data costs.

Use the following steps:

Add the desired component as usual.
When configuring the component, expand the Trigger section and adapt the predefined trigger to your needs.
Continue with component configuration.

The predefined script value (modify before using):

Sample Script - Modify Before Using

// Change the trigger script to match the sources
// that you want this stage to process.
// For the example below to work, ContentSource metadata must be
// available on the item to be processed; otherwise, adapt the
// script to match your own metadata and the sources you want to process.

var allowedContentSources = new List<string>() { "Source1", "Source2" };
string contentSource = item.Get<string>("ContentSource");
if (string.IsNullOrEmpty(contentSource) ||
    !allowedContentSources.Contains(contentSource))
      return false;

return true;

The components affected by this change are all the BA Insight components that make API requests to Microsoft:

Microsoft Text Analytics
Custom Vision AI
Image Processor MS Computer Vision
Video Processor Microsoft Video Indexer

How to Trigger Your Pipelines Only for Video Files

Enter the script below in the code window of the component's Trigger section. With this trigger in place, the component runs only when the detected file extension is one of the video formats listed in the script.

Add to the script only the formats you truly wish to process.
Adding all the supported video file extensions is unnecessary and inefficient.

This helps you reduce data usage and, in turn, cost.

For more on using Pipeline Triggers, see Add Triggers to Determine When Your Pipelines Run.

// Add all allowed extensions here, always in lowercase.
var allowedExtensions = new List<string>() { "mpeg4", "mp4", "avi" };
string fileExt = item.Get<string>("escbase_fileextension");

// Run only when the extension is present and in the allowed list.
// ToLowerInvariant avoids culture-specific casing surprises.
if (fileExt != null && allowedExtensions.Contains(fileExt.ToLowerInvariant()))
    return true;

return false;
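
The content-source check and the file-extension check can also be combined in a single trigger, so that only video files from selected sources reach the component. The script below is a minimal sketch, not a BA Insight-supplied script. It assumes that both the ContentSource and escbase_fileextension metadata are available on the item, it reuses the sample source names from the predefined trigger, and it assumes (this is not confirmed by the product) that the extension might arrive with a leading dot, which it strips defensively. Like the other trigger scripts, it only runs inside the Trigger code window, where item is provided by the pipeline.

```csharp
// Sketch only: adapt the source names and extensions to your environment.
// "Source1" and "Source2" are the sample values from the predefined trigger.
var allowedContentSources = new List<string>() { "Source1", "Source2" };
var allowedExtensions = new List<string>() { "mp4", "avi" }; // always lowercase

string contentSource = item.Get<string>("ContentSource");
string fileExt = item.Get<string>("escbase_fileextension");

// Skip items with missing metadata or from sources that are not allowed.
if (string.IsNullOrEmpty(contentSource) || string.IsNullOrEmpty(fileExt) ||
    !allowedContentSources.Contains(contentSource))
    return false;

// Assumption: tolerate a leading dot (".mp4") and mixed case ("MP4").
string normalizedExt = fileExt.TrimStart('.').ToLowerInvariant();
return allowedExtensions.Contains(normalizedExt);
```

Checking the content source first keeps the logic easy to read: an item is rejected as soon as any required condition fails, and only the final line decides on the extension.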

How to Extract Labels and Concepts from Videos

Video Processor: Microsoft Video Indexer

This component uses the Microsoft Video Indexer API to analyze videos and extract detected labels and concepts.

To configure, use the following steps:

Open your pipeline.
Expand the New Component section.
Select Video Processor Microsoft Video Indexer.
Enter a component name.
Click the + Add link.
Click Apply.
Click the name of the component in the ordered list to open it for configuration.

Api Key: Copy the Microsoft Video Indexer API key and paste it in this field.
Account name: This is the name of the Microsoft Video Indexer account.
Location: This is the Microsoft Azure region associated with the Video Indexer account.
Extract transcripts: Enable this checkbox if you want to obtain transcripts from the videos.
Explicit content detection: Enable this checkbox to detect whether the video contains explicit or inappropriate scenes.
Language detection: Enable this checkbox to detect the language spoken in the videos.
Accepted extensions: Specify the video file extensions you want the stage to process. You must separate these extensions using a semicolon (;).
Results interrogation interval in seconds: Specify the interval that the stage uses to request results from Microsoft. The annotation process is asynchronous: the video file is uploaded to Microsoft, and the stage then queries the server for results at this interval.
Number of interrogation retries: Specify the maximum number of retries for the interrogation process. If the number of retries exceeds the specified number, the annotation process is aborted without returning any results.
Maximum video size (in MB): The video processor only processes videos of this size or smaller.
Overwrite raw data with extracted info: Enable this checkbox if you want the raw data to be replaced with the extracted labels and transcripts. This is useful when you want to index the information extracted from the video rather than the raw video data.
Send raw response as metadata: Enable this checkbox to store the annotation response as a serialized JSON.
Additional input property: Specify an input property of type List<byte[]>, which represents the additional videos to be processed by the pipeline. For example, a list of all of the videos previously extracted from a document.
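
Because the stage requests results at a fixed interval and stops after a fixed number of retries, those two settings together limit how long a video can wait for its results. The snippet below is a rough sketch of that arithmetic. It assumes the total wait is approximately the interval multiplied by the number of retries, ignoring upload time and request latency, and the values 30 and 20 are hypothetical, not product defaults.

```csharp
using System;

// Hypothetical values for illustration; use the ones from your configuration.
int interrogationIntervalSeconds = 30; // "Results interrogation interval in seconds"
int interrogationRetries = 20;         // "Number of interrogation retries"

// Approximate upper bound on polling time before the annotation is aborted.
TimeSpan maxPollingTime =
    TimeSpan.FromSeconds((double)interrogationIntervalSeconds * interrogationRetries);

Console.WriteLine(maxPollingTime); // prints 00:10:00
```

If your longest videos are likely to take more than this time to be indexed, consider raising either setting, because the annotation process is aborted without returning results once the retries are used up.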

Input Properties

File RawData
(Optional) The property specified in the “Additional input property” configuration option.

Output Properties

Property	Type
`MSVideoAllVideoLabels`	Text – Multi
`MSVideoTranscripts`	Text – Single
`MSVideoEntireJSON`	Minified JSON
`MSVideoExplicitContent`	Text – “True”/“False”
`MSVideoLanguage`	Text – Multi
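
As an illustration of how these output properties might be used, the sketch below is a trigger for a hypothetical component placed after the Video Indexer stage that skips any item flagged as explicit. It assumes that output properties can be read with item.Get<string> in the same way as the metadata in the trigger scripts earlier on this page; verify this in your environment before relying on it.

```csharp
// Sketch: trigger for a component placed after the Video Indexer stage.
// Assumption: the output property is readable with item.Get<string>.
string explicitFlag = item.Get<string>("MSVideoExplicitContent");

// The property holds the text "True" or "False".
// A missing value is treated here as "not flagged".
bool isExplicit = string.Equals(explicitFlag, "True", StringComparison.OrdinalIgnoreCase);
return !isExplicit;
```

Note that MSVideoExplicitContent is only meaningful when the Explicit content detection checkbox is enabled on the Video Indexer component.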