Website Connector Prerequisites
Connector Features and Requirements
| Features | Supported | Additional Information |
|---|---|---|
| Searchable content types | Yes | All content types. Meta tags found in HTML documents can be extracted via Connectivity Hub. See the Connectivity Hub documentation for how to configure your content sources for this. |
| Content Update | Full and Incremental | Websites do not expose APIs that report changes, so an incremental crawl rescans the entire website and requests every page. However, only updated pages are processed further. Updated pages are identified by the ETag or Last-Modified HTTP headers: if either or both change, the page is considered updated. |
| Permission Types | No | All content is indexed as public. To assign security, use the ACL Script in the Content > Advanced tab. |
| Required Software | .NET Framework v4.7.2 | |
| Hardware | | Rendering HTML web pages requires a large amount of CPU resources and memory. BA Insight recommends the following hardware: |
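
The change detection described in the Content Update row can be sketched as follows. This is a minimal illustration, not the connector's actual implementation: the function names `fingerprint` and `is_updated` are hypothetical, and treating a page that sends neither header as updated is an assumption, since the behavior in that case is not stated here.

```python
from typing import Mapping, Optional, Tuple

# (ETag, Last-Modified) as observed on a crawl; either value may be absent.
Fingerprint = Tuple[Optional[str], Optional[str]]


def fingerprint(headers: Mapping[str, str]) -> Fingerprint:
    """Extract the ETag and Last-Modified values, ignoring header-name case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("etag"), lowered.get("last-modified")


def is_updated(previous: Optional[Fingerprint], current: Fingerprint) -> bool:
    """Return True if the page should be processed again."""
    if previous is None:
        # Page was not seen on an earlier crawl.
        return True
    if current == (None, None):
        # Assumption: without either header there is nothing to compare,
        # so the page is conservatively treated as updated.
        return True
    # Updated if either header value, or both, changed.
    return previous != current


old = fingerprint({"ETag": '"abc"', "Last-Modified": "Mon, 01 Jan 2024 00:00:00 GMT"})
new = fingerprint({"etag": '"abd"', "last-modified": "Mon, 01 Jan 2024 00:00:00 GMT"})
print(is_updated(old, new))
```

Every page is still requested on each incremental crawl; a check like this only decides whether the page moves on to further processing.
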
Authentication Protocols
The following authentication protocols are supported:
| Authentication Protocol | Description | Prerequisite |
|---|---|---|
| Anonymous Access | The connector does not pass any authentication information to the web server. | None |
| HTTP Basic Authentication | The connector passes the username and password for authentication via the standard HTTP headers. | The username and password of the account to use for authentication. |
| Azure AD Application | The connector interacts with Azure Active Directory to obtain a token and passes it in the HTTP Authorization header. | The website must be secured via Azure AD. The connector requires: |
| OAuth Authentication | The connector interacts with the identity provider to obtain refresh, access, and ID tokens to use for authentication. The access and ID tokens are provided to a bootstrapping page on the website for initialization. | The application used by the website must be configured as follows: The website must be modified to add an extra page that initializes the application for crawling. When the website is crawled, this page is called with the ID or access token passed via the URL. The page is then responsible for storing the necessary token in the right location, so that the crawling account is treated as successfully authenticated and the browser does not prompt for authentication. Additionally, make sure you have the following information before starting the installation and configuration: |
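
To make the protocols in the table more concrete, here is a rough sketch of how credentials might be attached to a request in each case. It is illustrative only and does not show the connector's internals. The function names are hypothetical, the `Bearer` scheme for the Azure AD token is an assumption (the table only says the token is passed in the Authorization header), and both the bootstrapping page address and the `id_token` query parameter name are placeholders that depend on how your website implements that page.

```python
import base64
from urllib.parse import urlencode


def basic_auth_header(username: str, password: str) -> dict:
    """HTTP Basic: base64 of 'username:password' in the Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


def token_auth_header(access_token: str) -> dict:
    """Token in the Authorization header (assumed here to use the Bearer scheme)."""
    return {"Authorization": f"Bearer {access_token}"}


def bootstrap_url(page_url: str, token: str, param: str = "id_token") -> str:
    """URL of the bootstrapping page with the token passed as a query parameter.

    Both the parameter name and the use of the query string are assumptions.
    """
    return f"{page_url}?{urlencode({param: token})}"


print(basic_auth_header("user", "pass"))
print(token_auth_header("example-token"))
print(bootstrap_url("https://example.com/crawl-init", "abc.def"))
```

Anonymous access corresponds to sending none of these: the request carries no Authorization header and no token.
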