Website Connector Prerequisites

Connector Features and Requirements

Features	Supported	Additional Information
Searchable content types	Yes	All content types. Meta tags found in HTML documents can be extracted via Connectivity Hub; see the Connectivity Hub documentation for how to configure your content sources to do this.
Content Update	Full and Incremental	Because websites have no API for reporting changes, an incremental crawl performs a full rescan of the website and requests every page. However, only updated pages are processed further. Updates are detected through the ETag and Last-Modified HTTP response headers: if either header (or both) changes, the page is considered updated.
Permission Types	No	All content is indexed as public. To assign security, use the ACL Script on the Content > Advanced tab.
Required Software		.NET Framework v4.7.2
Hardware		Rendering HTML web pages requires a large amount of CPU and memory. BA Insight recommends a server with at least 5 GB of RAM and 8 CPU cores available to the connector so that it can process sites correctly.
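
The update check described for incremental crawls can be sketched as follows. This is a minimal illustration, not the connector's implementation: the `PageValidators` type and the helper names are hypothetical, header values are compared as opaque strings, and treating a page that returns neither header as always updated is an assumption this page does not state.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PageValidators:
    """Hypothetical record of the change-detection headers seen for one page."""
    etag: Optional[str]
    last_modified: Optional[str]


def validators_from_headers(headers: Mapping[str, str]) -> PageValidators:
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return PageValidators(lowered.get("etag"), lowered.get("last-modified"))


def is_updated(previous: Optional[PageValidators], current: PageValidators) -> bool:
    if previous is None:
        # Page not seen in an earlier crawl: process it.
        return True
    if current.etag is None and current.last_modified is None:
        # Assumption: with neither header there is nothing to compare,
        # so the page is treated as updated.
        return True
    # Updated if either header changed (or both did).
    return (previous.etag != current.etag
            or previous.last_modified != current.last_modified)


old = validators_from_headers({"ETag": '"v1"', "Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"})
new = validators_from_headers({"etag": '"v2"', "last-modified": "Wed, 21 Oct 2015 07:28:00 GMT"})
print(is_updated(old, new))  # True: the ETag changed
```

Note that every page is still requested in this scheme; the comparison only decides whether a fetched page goes on to further processing.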

Authentication Protocols

The following authentication protocols are supported:

Authentication Protocol	Description	Prerequisite
Anonymous Access	The connector does not pass any authentication information to the web server.	None
HTTP Basic Authentication	The connector passes the username and password for authentication in the standard HTTP Authorization header.	The username and password of the account to use for authentication
Azure AD Application	The connector interacts with Azure Active Directory to obtain a token and passes it in the HTTP Authorization header.	The website must be secured via Azure AD. The connector requires: the ID of the Azure tenant where the website is deployed; the client ID of the application in Azure AD; and a certificate, uploaded to the certificate store on the computer where the connector is installed, with which to obtain an Azure AD token.
OAuth Authentication	The connector interacts with the identity provider to obtain refresh, access, and ID tokens to use for authentication. The access and ID tokens are passed to a bootstrapping page on the website for initialization.	The application used by the website must be configured as follows: allow the authorization code flow with PKCE; issue refresh, access, and ID tokens; and include the connector's OAuth redirect URL in its list of authorized redirect URLs, typically http://localhost:2406/oauthresult.aspx. The redirect URL is case-sensitive and must match exactly the URL used to access the connector; the /oauthresult.aspx part is always lowercase. The website must also be modified to add an extra page that initializes the application for crawling. When the website is crawled, this page is called with the ID or access tokens passed in the URL. The page is then responsible for storing the necessary token in the right location so that the crawling account is treated as authenticated and the browser does not prompt for authentication. Before starting the installation and configuration, also make sure you have the client ID of the application used by the website and the authentication endpoint of the identity server.
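
For reference, the HTTP Basic Authentication row relies on the standard Authorization header, which can be built as below, for example to check the crawl account against the site before configuring the connector. This is a minimal sketch using Python's standard library; the function name is hypothetical, UTF-8 encoding of the credentials is an assumption, and it does not show how the connector itself stores or sends credentials.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build the Authorization header used by HTTP Basic authentication (RFC 7617)."""
    if ":" in username:
        # RFC 7617 does not allow a colon in the user-id.
        raise ValueError("username must not contain ':'")
    # Credentials are "username:password", base64-encoded. UTF-8 is assumed here;
    # the charset a given web server expects can vary.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("Aladdin", "open sesame"))
# {'Authorization': 'Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='}
```

Because base64 is an encoding rather than encryption, these credentials are readable by anyone who can observe the traffic, so sites using this method are normally served over HTTPS.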
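
The Azure AD Application row most likely corresponds to the standard Microsoft identity platform client credentials grant with a certificate credential; this page does not say so explicitly, so treat the following as a sketch under that assumption. It only builds the v2.0 token request and the resulting bearer header and does not send anything. Producing `client_assertion` (a JWT signed with the certificate's private key) is out of scope, and the `scope` value (for example the website application's App ID URI followed by `/.default`) is a hypothetical placeholder that depends on how the website's app registration is set up.

```python
from urllib.parse import quote, urlencode


def azure_ad_token_request(tenant_id: str, client_id: str, scope: str,
                           client_assertion: str) -> tuple[str, bytes]:
    """Return (url, form body) for a client-credentials token request.

    Assumes the Microsoft identity platform v2.0 endpoint and a certificate-based
    client assertion (a signed JWT) that has already been produced elsewhere.
    """
    url = ("https://login.microsoftonline.com/"
           f"{quote(tenant_id, safe='')}/oauth2/v2.0/token")
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "scope": scope,
        "client_assertion_type":
            "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": client_assertion,
    }).encode("ascii")
    return url, body


def bearer_header(access_token: str) -> dict[str, str]:
    # The access token returned by Azure AD is sent as a bearer token.
    return {"Authorization": f"Bearer {access_token}"}


url, body = azure_ad_token_request(
    "00000000-0000-0000-0000-000000000000",                 # hypothetical tenant ID
    "11111111-1111-1111-1111-111111111111",                 # hypothetical client ID
    "api://11111111-1111-1111-1111-111111111111/.default",  # assumed scope
    "<signed-jwt>",                                         # placeholder assertion
)
print(url)
```

The response to this POST (sent as `application/x-www-form-urlencoded`) contains an `access_token`, which is what `bearer_header` would wrap.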
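
The PKCE requirement in the OAuth Authentication row refers to the authorization code flow of RFC 7636: the client generates a random code verifier, sends its SHA-256 hash (the code challenge) in the authorization request, and later presents the verifier when exchanging the authorization code for tokens. The sketch below shows the first of those steps with Python's standard library. The identity provider endpoint, client ID, and default `scope` are hypothetical placeholders (many providers issue refresh tokens only when `offline_access` or an equivalent is requested); only the redirect URL comes from this page.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def _b64url(data: bytes) -> str:
    # base64url without padding, as required by RFC 7636.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_code_verifier() -> str:
    # 32 random bytes encode to 43 characters, the minimum length RFC 7636 allows.
    return _b64url(secrets.token_bytes(32))


def code_challenge_s256(verifier: str) -> str:
    # S256 method: BASE64URL(SHA256(ASCII(code_verifier))).
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())


def authorization_url(auth_endpoint: str, client_id: str, redirect_uri: str,
                      challenge: str, state: str,
                      scope: str = "openid offline_access") -> str:
    params = {
        "response_type": "code",
        "client_id": client_id,
        # Must match a registered redirect URL exactly, including case.
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return f"{auth_endpoint}?{urlencode(params)}"


verifier = make_code_verifier()
url = authorization_url(
    "https://idp.example.com/authorize",      # hypothetical endpoint
    "my-client-id",                           # hypothetical client ID
    "http://localhost:2406/oauthresult.aspx",
    code_challenge_s256(verifier),
    state=secrets.token_urlsafe(16),
)
print(url)
```

The verifier is kept secret on the client and sent only in the later token request, which is what lets the identity provider confirm that the same client started and finished the flow.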