About the Website Connector
The Website connector is used to crawl web pages and documents from any given website.
- Starting from a given URL, the connector crawls the web page and recursively indexes the URLs it finds there.
- The Website connector supports several authentication mechanisms for accessing websites.
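
The recursive crawl described above can be pictured as a traversal that starts at one URL and keeps following newly discovered links. The following is a minimal sketch, not the connector's actual implementation: the SITE dictionary, the crawl function, and the example.com URLs are hypothetical, the breadth-first order is an assumption, and a real crawl would fetch and render each page instead of reading links from a dictionary.

```python
from collections import deque

# Hypothetical in-memory "site": each URL maps to the URLs linked from that page.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}


def crawl(start_url, get_links):
    """Return URLs in the order they are first visited, starting at start_url."""
    visited = []
    seen = {start_url}          # guards against revisiting pages (and link cycles)
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        visited.append(url)     # a real crawler would index the page here
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


order = crawl("https://example.com/", lambda url: SITE.get(url, []))
print(order)
```

The seen set is what keeps the traversal from looping when pages link back to each other, as the home page and page b do here.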
 
Authentication
The Website connector accesses sites using the following authentication methods:
- Public access
 - Basic login
 - Trusted certificate authentication
 - OAuth-based authentication (OAuth specifies a process for resource owners to authorize third-party access to their server resources without providing credentials)
 - NTLM authentication
 
Capabilities and Limitations
- The connector honors the robots.txt and sitemap files, if found.
 - The connector renders the pages crawled and executes any JavaScript found. 
- As a result, the time taken to crawl each page can be significant, and the crawl speed varies greatly depending on the complexity of the pages to index.
 - Whether a page load is complete is decided by the networkidle0 property, which tells the connector to consider navigation finished once there have been no network connections for at least 500 ms.
 - The connector does not crawl links that contain a hash mark (#), also known as a number sign or pound sign.
 - Links are collected by calling querySelectorAll('a[href]') on a page.
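
To illustrate how the link rules above interact, here is a rough sketch that approximates them with the Python standard library. It is not the connector's code: AnchorCollector, crawlable_links, the sample page, and the sample robots.txt rules are hypothetical, the HTML parser only approximates what querySelectorAll('a[href]') returns in a real browser, and the point at which the connector actually applies robots.txt is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class AnchorCollector(HTMLParser):
    """Collects href values from <a> tags, roughly like querySelectorAll('a[href]')."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href is not None:
            self.hrefs.append(href)


def crawlable_links(html, base_url, robots_lines, user_agent="*"):
    """Return absolute URLs that pass the hash-mark rule and the robots.txt rules."""
    robots = RobotFileParser()
    robots.parse(robots_lines)          # parse rules already held in memory
    collector = AnchorCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        if "#" in href:                 # links with a hash mark are not crawled
            continue
        url = urljoin(base_url, href)   # resolve relative links against the page URL
        if robots.can_fetch(user_agent, url):
            links.append(url)
    return links


# Hypothetical robots.txt rules and page content.
ROBOTS = ["User-agent: *", "Disallow: /private/"]
PAGE = """
<a href="/docs/intro.html">Intro</a>
<a href="/docs/intro.html#setup">Setup</a>
<a href="/private/admin.html">Admin</a>
<a name="top">No href</a>
"""

links = crawlable_links(PAGE, "https://example.com/", ROBOTS)
print(links)
```

In this sample, only the first link survives: the second contains a hash mark, the third is disallowed by the sample robots.txt rules, and the fourth has no href attribute.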
Web Applications
                                                
When indexing web applications with the connector, make sure the account the connector uses to crawl has no write permission.
Because all pages are rendered in a headless browser before indexing, any link that performs an action such as add, edit, or delete may be detected by the connector and triggered accidentally.
Alternatively, implement any add, edit, or delete action through JavaScript click events rather than HTML A tags.
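
The difference between the two approaches can be shown with a small sketch. It assumes, as stated earlier, that links are gathered from A tags that have an href attribute. The AnchorHrefs class, the followed_links function, the /items/42/delete URL, and the deleteItem handler are hypothetical and serve only as an illustration.

```python
from html.parser import HTMLParser


class AnchorHrefs(HTMLParser):
    """Collects href values from <a> tags only, approximating a[href] selection."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href is not None:
            self.hrefs.append(href)


def followed_links(html):
    """Return the hrefs a link collector based on A tags would pick up."""
    parser = AnchorHrefs()
    parser.feed(html)
    return parser.hrefs


# Risky: the delete action is a plain link, so a crawler may request it.
RISKY = '<a href="/items/42/delete">Delete</a>'
# Safer: the action is bound to a click event, so there is no link to collect.
SAFER = '<button type="button" onclick="deleteItem(42)">Delete</button>'

print(followed_links(RISKY))
print(followed_links(SAFER))
```

The first snippet exposes a URL that would be collected and visited, while the second exposes none, which is why click-event actions are the safer pattern when the crawl account cannot be made read-only.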