Microsoft Text Analytics Natural Language Processing (NLP)

This component analyzes text and extracts detected languages, entities, key phrases and sentiments. This component uses the Microsoft Text Analytics API.

How to Add the Microsoft Text Analytics Component

Navigate to the AutoClassifier Pipelines component page.
Click New Component and select Microsoft Text Analytics from the component list.
Name your new Microsoft Text Analytics component and click Add.
Click Apply to save your changes.
Ensure your new Microsoft Text Analytics component is placed in the list of existing pipeline stages.

How to Configure the Microsoft Text Analytics Component

To configure your Microsoft Text Analytics component, select it from the components list and complete the following fields in the Configuration section:

Azure Endpoint: The base URL of the Azure endpoint of your Text Analytics Cognitive Service. For example: https://centralus.api.cognitive.microsoft.com. For information on locating your Microsoft Azure endpoint address, see the Microsoft documentation.
Api Key: Set up your Microsoft Azure account, obtain an API key, and enter the key in this field.
Input Property: Property configured for entity extraction. Default value: 'body'
Extract Languages: Enable this checkbox to extract and output the languages that are detected in the current document.
Language score threshold: Specify a value between 0 and 1 that represents the minimum confidence score accepted for a detected language.
Default Language: Specify the language used to analyze the text when no detected language is available.
Max Characters for Language Detection: Specify the number of characters (the first X characters of each document) used for document language detection. Setting this limit sends fewer requests to the Microsoft Azure cloud service, which reduces costs and improves performance.
Use detected language: Enable this checkbox to use the detected language with the highest confidence to analyze the text for entities, key phrases, or sentiments.
Extract entities: Enable this checkbox to detect entities in the input text.
Extract Linked Entities: Enable this checkbox to detect linked entities in the input text.
Entities “No. of Matches” threshold: Specify the minimum number of occurrences for a given entity to be included in the output results.
Extract key phrases: Enable this checkbox to detect key phrases in the input text.
Extract sentiments: Enable this checkbox to analyze the input text and determine whether it contains negative or positive content. The sentiment score is a value between 0 and 1, where 0.5 is neutral. Enabling sentiment extraction also enables sentence extraction.
Sentiment score threshold: Specify the minimum score used to determine whether the overall document has a positive or negative sentiment. A score above the threshold sets the MicrosoftPositiveSentimentDetected property to true. A score below the threshold sets the property to false.
Maximum No. of requested labels: Specify the maximum number of distinct entities to return per item property.
Sub-documents batch size: A large document is processed in batches of sub-documents, each under 5K characters. Specify the number of sub-documents to process at once. The recommended maximum is 5.
Generate Document Summary: Enable this checkbox to generate a summary for the indexed documents.
Summary Type: Specify whether the document summary is extractive or abstractive.
Summary Sentence Count: Specify the number of sentences to include in the generated summary.
Send raw response as metadata: Enable this option to attach the JSON response from Microsoft Text Analytics to the list of output properties.
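
The effect of several of these options can be illustrated with a short Python sketch. It is not the component's implementation: every function name is hypothetical, and the fixed-width chunking, the 4999-character piece size, and the use of "greater than or equal" for the language score threshold are assumptions made for illustration. It shows how the language options, the sub-document split with a batch size, and the entity "No. of Matches" threshold combined with the maximum number of labels could behave.

```python
from collections import Counter

# Assumption: the text above only says "under 5K characters".
MAX_SUB_DOCUMENT_CHARS = 4999


def language_sample(text, max_chars):
    """Return the first max_chars characters, the part sent for language detection."""
    return text[:max_chars]


def choose_language(detected, score_threshold, default_language):
    """Pick the highest-scoring detected language that meets the threshold.

    detected is a list of (language, score) pairs. Falls back to
    default_language when no pair reaches score_threshold.
    """
    accepted = [(lang, score) for lang, score in detected if score >= score_threshold]
    if not accepted:
        return default_language
    return max(accepted, key=lambda pair: pair[1])[0]


def split_into_sub_documents(text, limit=MAX_SUB_DOCUMENT_CHARS):
    """Cut a long document into consecutive pieces of at most limit characters."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def batches(items, batch_size):
    """Group items into lists of at most batch_size items, one list per batch."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def filter_entities(entities, min_matches, max_labels):
    """Keep entities seen at least min_matches times, most frequent first,
    and return no more than max_labels of them."""
    counts = Counter(entities)
    kept = [name for name, n in counts.most_common() if n >= min_matches]
    return kept[:max_labels]
```

With these helpers, a 12,000-character document would be split into three sub-documents, and a batch size of 2 would group them into two batches. A naive fixed-width split like this can cut a sentence in half, which the real component may well avoid.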

Input Properties

File RawData
(Optional) The property specified in the Additional input property configuration option.

Output Properties

Property	Type
MicrosoftExtractedLanguages	Text – Multi
MicrosoftExtractedEntities	Text – Multi
MicrosoftExtractedEntitiesLinks	Text – Multi
MicrosoftExtractedPhrases	Text – Multi
MicrosoftExtractedSentiments	Boolean
MicrosoftPositiveSentimentDetected	Boolean
MicrosoftExtractedSentimentScore	Double
MicrosoftExtractedSentences*	Text – Multi
MicrosoftRawResponse	Text
MSSerializedEntitiesJson	Text

Note: *The MicrosoftExtractedSentences metadata property is returned only if Extract sentiments is enabled.
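
As a rough illustration of how MicrosoftExtractedSentimentScore and MicrosoftPositiveSentimentDetected relate, the following Python sketch applies the Sentiment score threshold to a score between 0 and 1. The function name is hypothetical, and treating a score exactly equal to the threshold as not positive is an assumption, since the field description only covers scores above or below the threshold.

```python
def positive_sentiment_detected(score, threshold):
    """Return True when score is above threshold.

    Assumption: a score equal to the threshold is treated as not positive.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("sentiment score must be between 0 and 1")
    return score > threshold
```

For example, with a threshold of 0.5 (the neutral point), a score of 0.8 would yield True and a score of 0.2 would yield False.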