Microsoft Text Analytics Natural Language Processing (NLP)
This component analyzes text and extracts detected languages, entities, key phrases and sentiments. This component uses the Microsoft Text Analytics API.
How to Add the Microsoft Text Analytics Component
- Navigate to the AutoClassifier Pipelines component page.
- Click New Component and select Microsoft Text Analytics from the component list.
- Name your new Microsoft Text Analytics component and click Add.
- Click Apply to save your changes.
- Ensure your new Microsoft Text Analytics component is placed in the list of existing pipeline stages.
How to Configure the Microsoft Text Analytics Component
To configure your Microsoft Text Analytics component, select it from the components list and complete the following fields in the Configuration section:
- Azure Endpoint: The base URL of the Azure endpoint for your Text Analytics Cognitive Service. For example: https://centralus.api.cognitive.microsoft.com. For information on locating your Microsoft Azure endpoint address, see the Microsoft documentation.
- Api Key: Set up your Microsoft Azure account, obtain an API key, and enter the key in this field.
- Input Property: Property configured for entity extraction. Default value: 'body'
- Extract Languages: Enable this checkbox to extract and output the languages that are detected in the current document.
- Language score threshold: Specify a value between 0 and 1 that represents the minimum confidence score accepted for a detected language.
- Default Language: Specify the language that is used to analyze the text when no detected language is found.
- Max Characters for Language Detection: Specify the number of characters (the first X characters of each document) that are used for document language detection. Setting this limit means fewer requests are sent to the Microsoft Azure cloud service, which reduces costs and improves performance.
- Use detected language: Enable this checkbox to use the detected language with the highest confidence to analyze the text for entities, key phrases, or sentiments.
- Extract entities: Enable this checkbox to detect entities in the input text.
- Extract Linked Entities: Enable this checkbox to detect linked entities in the input text.
- Entities “No. of Matches” threshold: Specify the minimum number of occurrences for a given entity to be included in the output results.
- Extract key phrases: Enable this checkbox to detect key phrases in the input text.
- Extract sentiments: Enable this checkbox to analyze the input text and determine whether its content is negative or positive. The sentiment score is a value between 0 and 1, where 0.5 is neutral. Enabling sentiment extraction also enables sentence extraction.
- Sentiment score threshold: Specify the minimum number to determine if an overall document has a positive or negative sentiment. A score above the threshold flags the MicrosoftPositiveSentimentDetected property as true. A score below the threshold flags the property as false.
- Maximum No. of requested labels: Specify the maximum number of distinct entities to return per item property.
- Sub-documents batch size: A large document is split into sub-documents of under 5K characters, which are processed in batches. Specify the number of sub-documents to process at once. Recommended maximum: 5.
- Generate Document Summary: Enable this checkbox to generate a summary for the indexed documents.
- Summary Type: Specify if you want your document summary to be extractive or abstractive.
- Summary Sentence Count: Specify the number of sentences that you want to include in your generated summary.
- Send raw response as metadata: Click to attach the JSON response from Microsoft Text Analytics to the list of output properties.
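The thresholds and limits above interact in a fairly mechanical way, so a small sketch may help when choosing values. The Python below is an illustration only, not the component's implementation: every function name is hypothetical, the 5,000-character cap is one reading of "under 5K characters", treating the language threshold as inclusive is an assumption, and ranking entities by frequency before applying the label limit is also an assumption.

```python
from collections import Counter
from typing import Dict, List

# Assumption: "under 5K characters" is read here as a 5,000-character cap.
MAX_SUBDOC_CHARS = 5000


def text_for_language_detection(text: str, max_chars: int) -> str:
    """Keep only the first max_chars characters (Max Characters for Language Detection)."""
    return text[:max_chars]


def pick_language(scores: Dict[str, float], threshold: float, default: str) -> str:
    """Return the best-scoring language that meets the threshold, else the default.

    Assumption: a score equal to the threshold is accepted.
    """
    accepted = {lang: score for lang, score in scores.items() if score >= threshold}
    if not accepted:
        return default
    return max(accepted, key=accepted.get)


def split_into_subdocuments(text: str, max_chars: int = MAX_SUBDOC_CHARS) -> List[str]:
    """Cut a large document into consecutive chunks of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def batches(items: List[str], batch_size: int) -> List[List[str]]:
    """Group sub-documents into batches (Sub-documents batch size)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def positive_sentiment_detected(score: float, threshold: float) -> bool:
    """True when the score is above the threshold.

    The text above does not say what happens at exactly the threshold;
    this sketch treats that case as False.
    """
    return score > threshold


def filter_entities(mentions: List[str], min_matches: int, max_labels: int) -> List[str]:
    """Keep entities seen at least min_matches times, capped at max_labels.

    Assumption: the most frequent entities are kept first.
    """
    counts = Counter(mentions)
    kept = [entity for entity, count in counts.most_common() if count >= min_matches]
    return kept[:max_labels]


if __name__ == "__main__":
    subdocs = split_into_subdocuments("x" * 12000)
    print(len(subdocs))                              # 3
    print([len(b) for b in batches(subdocs, 2)])     # [2, 1]
    print(pick_language({"en": 0.9, "fr": 0.4}, 0.5, "de"))  # en
    print(positive_sentiment_detected(0.72, 0.6))    # True
```

With these assumptions, a 12,000-character document and a batch size of 2 gives three sub-documents sent in two batches, which is why a lower Max Characters for Language Detection value and a sensible batch size both reduce the number of requests.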
Input Properties
File RawData
- (Optional) The property specified in the Additional input property configuration option.
Output Properties
Property | Type |
---|---|
MicrosoftExtractedLanguages | Text – Multi |
MicrosoftExtractedEntities | Text – Multi |
MicrosoftExtractedEntitiesLinks | Text – Multi |
MicrosoftExtractedPhrases | Text – Multi |
MicrosoftExtractedSentiments | Boolean |
MicrosoftPositiveSentimentDetected | Boolean |
MicrosoftExtractedSentimentScore | Double |
MicrosoftExtractedSentences* | Text – Multi |
MicrosoftRawResponse | Text |
MSSerializedEntitiesJson | Text |
Note: *The MicrosoftExtractedSentences metadata property is returned only if Extract sentiments is enabled.