How to Define What Data to Crawl

Push Incremental Crawl Mode

Push incremental crawl mode lets you override the default enumeration process for incremental crawls and configure Connectivity Hub to crawl the items and folders fed to it via the CrawlQueue API.

  • This option is useful if you need near-real-time incremental crawls and you have a way to supply the items that must be recrawled to keep the index fresh.

    • For example, for SharePoint Online you can use a remote event receiver to identify the items to be crawled and submit those sites, lists, and items to Connectivity Hub for processing.

  • When push incremental crawl mode is enabled for your content source, the connector's standard incremental crawl logic is disabled and is not used.
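
The event-receiver scenario above can be sketched as a small batching step that sits between your change notifications and the CrawlQueue API. This is only an illustration under stated assumptions: the event shape (`url`, `kind`) and the `submit` callable are hypothetical, and the actual CrawlQueue API request format is not shown here. Replace `submit` with whatever call your deployment uses.

```python
def batch_changes(events, batch_size=100):
    """Deduplicate change events by URL (the last event wins) and yield batches."""
    latest = {}
    for event in events:
        latest[event["url"]] = event  # a later event for the same URL overwrites the earlier one
    entries = list(latest.values())
    for start in range(0, len(entries), batch_size):
        yield entries[start:start + batch_size]


def push_changes(events, submit, batch_size=100):
    """Hand each batch to `submit` (a placeholder for the real CrawlQueue call)."""
    sent = 0
    for batch in batch_changes(events, batch_size):
        submit(batch)
        sent += len(batch)
    return sent


# Hypothetical events, as a remote event receiver might collect them.
events = [
    {"url": "https://contoso.example/sites/hr/doc1.docx", "kind": "item"},
    {"url": "https://contoso.example/sites/hr/Policies", "kind": "list"},
    {"url": "https://contoso.example/sites/hr/doc1.docx", "kind": "item"},
]

received = []                      # stands in for the remote queue
sent = push_changes(events, submit=received.append, batch_size=2)
```

Deduplicating before submission keeps a burst of edits to the same document from queuing the same entry several times.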


A push incremental crawl processes the following:

  • Items submitted via the CrawlQueue API.
  • Any items that registered warnings in the previous run.
  • Any folders that failed to be enumerated in the previous run.
  • Any datastore that was modified after the previous run.
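
As a rough mental model, the four inputs listed above can be thought of as one combined work set. The sketch below is not Connectivity Hub's implementation; `PreviousRun`, `select_work`, and every field name are hypothetical and exist only to illustrate how the categories combine.

```python
from dataclasses import dataclass, field


@dataclass
class PreviousRun:
    """Hypothetical summary of the previous crawl run."""
    finished_at: float                                   # epoch seconds
    warned_items: set = field(default_factory=set)       # items that registered warnings
    failed_folders: set = field(default_factory=set)     # folders that failed enumeration


def select_work(queued, previous, datastore_modified_at):
    """Group what a push incremental crawl would process into four categories."""
    return {
        "queued": set(queued),                            # submitted via the CrawlQueue API
        "retry_items": set(previous.warned_items),
        "retry_folders": set(previous.failed_folders),
        "modified_datastores": {
            name
            for name, modified_at in datastore_modified_at.items()
            if modified_at > previous.finished_at         # changed after the previous run
        },
    }


previous = PreviousRun(finished_at=1000.0, warned_items={"doc-7"}, failed_folders={"/hr"})
work = select_work(
    queued=["doc-1", "doc-7"],
    previous=previous,
    datastore_modified_at={"ds-a": 900.0, "ds-b": 1200.0},
)
```

Nothing outside these four categories is picked up, which is why changes that are never pushed to the queue are not seen by this mode.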

Repository change requires full crawl

If you edit the Connection and change the selected repository list, a full crawl is required, because push incremental crawl mode cannot detect this type of change.