How to Set Up Your Content Source for Indexing
Use the instructions below to define a new content source to index.
- When you define a new content source, you specify the repository that your Connector crawls to extract data.
- You also make the content source available in the Elastic index.
Below, you use the Content Info page to do two things:
- Link the content in a repository to the BA Insight connector that you define
- Provide basic indexing information
To set up your web service content source for indexing manually, follow these steps:
- Open ConnectivityHub.
- Click Content Sources from the top horizontal menu.
- Click New > Advanced Web Service content.
- A new screen appears with the Content Info tab open. See the following graphic (contains sample values).
All of the fields on this screen are required.
*The Target Index is a read-only field that is automatically populated (from the "Title" field) after you save the Content Source. Appropriate naming conventions are used.
- Complete the following fields:
Field | Description |
---|---|
Target | Where the content is pushed. |
Connection and Title | The connection to the named (Title) content source from which your Connector pulls content. Use the drop-down to change the connection. |
Crawl start date | Specify the starting date for the crawled data. Use the US format |
Max paging size | Leave the default, or use the drop-down, to specify the number of items that can be queued at any time. BA Insight recommends |
Content Localization | To specify, go to the Microsoft list and scroll down to the Language table. |
Max file size | Leave the default setting, 50 (MB), or enter the maximum file size to be processed in MB. Any files that are larger than the specified size are not indexed. |
Property prefix | Make sure that (This is the specified property name prefix for each metadata name in your content source system.) |
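
As a rough illustration of what the Max file size and Property prefix settings mean for crawled items, here is a minimal Python sketch. It is not ConnectivityHub code. The class name, the function names, the `ws_` prefix value, the use of 1 MB = 1024 × 1024 bytes, and the idea that the prefix is simply prepended to each metadata name are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumption: 1 MB is treated as 1024 * 1024 bytes in this sketch.
MB = 1024 * 1024


@dataclass
class ContentSourceSettings:
    """Hypothetical stand-in for two fields on the Content Info tab."""
    max_file_size_mb: int = 50     # default value named in the table
    property_prefix: str = "ws_"   # hypothetical prefix, for illustration only


def should_index(file_size_bytes: int, settings: ContentSourceSettings) -> bool:
    """Return False for files larger than the configured maximum size."""
    return file_size_bytes <= settings.max_file_size_mb * MB


def prefixed_properties(metadata: dict, settings: ContentSourceSettings) -> dict:
    """Prepend the property prefix to every metadata name (assumed behavior)."""
    return {f"{settings.property_prefix}{name}": value
            for name, value in metadata.items()}


if __name__ == "__main__":
    settings = ContentSourceSettings()
    print(should_index(10 * MB, settings))    # small file: kept
    print(should_index(200 * MB, settings))   # oversized file: skipped
    print(prefixed_properties({"Author": "Example"}, settings))
```

In this sketch, a file exactly at the limit is still indexed, and only files strictly larger than the limit are skipped, which matches the wording "larger than the specified size" in the table.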