How to Schedule Crawls and Related Tasks

Use the following information to schedule specific types of crawls. Crawl types are defined in the first topic below. For more information about scheduling crawls, see How to Use Cron Expressions.

Crawling your content sources

You can crawl your content sources to capture all of the content within them. A crawl can be either full or incremental.

In Connectivity Hub 4.0, BA Insight introduced a new distributed crawler that allows you to execute multiple parallel crawl jobs from a single Quartz instance. Because crawl processes are distributed across a network of interconnected servers, crawl jobs benefit from improved load balancing and scalability across your server farm. Connectivity Hub now applies crawl-related target configuration settings, such as the number of sync threads, to the entire server farm. For more information, see how to configure your target.

Full Crawl

Full crawls catalog or "crawl" the content in the content sources you have defined.

Full crawls have the following characteristics:

  • Displayed in the user interface as Target Full Update.
  • Typically run just after Connectivity Hub or the Connector Framework is installed and configured.
  • Capture all content in your content sources, regardless of its state (last updated, changed, etc.).
  • Can be scheduled.
  • Are run selectively because they can be very time- and resource-intensive.

Incremental Crawl

An incremental crawl catalogs or "crawls" only the content in your content sources that has changed since the last crawl.

Incremental crawls have the following characteristics:

  • Displayed in the user interface as Target Incremental Update.
  • Cannot be run as the initial crawl; an incremental crawl must be run AFTER the initial full crawl is performed.
  • Capture only content that has changed since the last crawl.
  • Can be scheduled.
  • Are run routinely.
  • Are far less time- and resource-intensive than full crawls.

Cleaning Stage

  • The Cleaning Stage is not a crawl type like Full or Incremental.
  • The Cleaning Stage can appear in the user interface during either Full or Incremental crawls.

Cleaning stage in Full Crawl

All items that are not created or updated by a Full crawl are removed from the search index.

  • For example, if you run a full crawl, then add a content filter so that fewer items are crawled, and then run another full crawl, the cleaning stage removes the items that the content filter excluded.

Cleaning stage in Incremental Crawl

If your Connector is folder-based, this stage detects deleted folders and removes all items in those folders from the search index.

How to Schedule Crawls

Note: As of Connectivity Hub 4.0, a single Quartz scheduler will handle all of your crawl jobs for your entire Connectivity Hub farm. To ensure a successful upgrade of your crawl jobs from Connectivity Hub 3.0 to Connectivity Hub 4.0, refer to the Before Upgrading instructions.

  1. To schedule a crawl against a defined Target, select Tasks from the links at the top of the Connectivity Hub page.
  2. The Timer Jobs page opens.
  3. Select a target from the Scope drop-down menu.
  4. In the Job field, select the type of crawl (job) to be created.
    1. Target Incremental Update: An incremental crawl that captures only the content from your content sources that has changed since the last crawl was executed.
    2. Target Full Update: A full crawl that captures all of the content from your content sources that is specified for capture, regardless of when it was last captured.
    3. Content Reset:
  5. Schedule: Select the crawl frequency.
    1. One-time: The crawl runs once when you click the Create button.
    2. Scheduled: Set the frequency of the crawl by clicking the Schedule builder button, or click the Advanced button to schedule the job in cron format (see the example after these steps).
  6. Log Level:
    1. Select the logging level for the job.
    2. The log level determines which classes of messages, such as errors or warnings, are written to the job log.
      1. Available log levels include:
        1. Error
        2. Warn
        3. Info
        4. Debug
        5. Trace
        6. All
  7. Alert recipients
    1. Optional.
    2. Enter the email addresses of the users to be notified in case of a job error.
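
Because Connectivity Hub schedules its jobs with Quartz, the Advanced option typically expects a Quartz-style cron expression with six or seven fields, beginning with seconds (seconds, minutes, hours, day of month, month, day of week, and an optional year). The expressions below are illustrative examples only; for the exact syntax supported by your version, see How to Use Cron Expressions.

  0 0 2 * * ?            Runs the job every day at 2:00 AM.
  0 30 1 ? * MON-FRI     Runs the job at 1:30 AM, Monday through Friday.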

Specify a custom timeout value for your crawls

If you want to specify a custom timeout value for your crawls, you can do so through a custom registry entry:

  1. Open the Windows Registry Editor.
  2. In the left pane, navigate to the HKEY_LOCAL_MACHINE\SOFTWARE\Upland BA Insight\Connectivity Hub key.
  3. With the Connectivity Hub key selected, right-click in the right pane and select New > String Value.
  4. Name this value "ConnectorServiceTimeout".
  5. Right-click the new value and select Modify.
  6. In the Value data field, enter your desired timeout value in milliseconds.
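
If you prefer to script this change rather than edit the registry by hand, the following minimal sketch uses Python's standard winreg module to create the same value. The 300000-millisecond (five-minute) timeout is only an example; substitute the value you need, and run the script with administrative privileges (a 64-bit Python is assumed so that registry redirection does not write the value under WOW6432Node instead).

  import winreg

  # Path of the Connectivity Hub key under HKEY_LOCAL_MACHINE, as used in the steps above.
  KEY_PATH = r"SOFTWARE\Upland BA Insight\Connectivity Hub"

  # Example timeout only: 300000 ms = 5 minutes. Replace with your desired value in milliseconds.
  TIMEOUT_MS = "300000"

  # Open (or create) the Connectivity Hub key and write the ConnectorServiceTimeout string value.
  with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE) as key:
      winreg.SetValueEx(key, "ConnectorServiceTimeout", 0, winreg.REG_SZ, TIMEOUT_MS)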

You can also configure a timeout value for your jobs; see Run Required Jobs for more information.