Capacity Planning and Preview Mode: Offline or Online

Document Considerations

As you plan your capacity requirements for Smart Previews, record the values that apply to your environment for the following criteria. These values may change over time.

Requirement | Value required by your environment

How many documents are indexed per content source?

How frequently do the crawled documents change?

For how many documents do you want to generate Previews?

For what types of documents do you want to generate Previews?

The following sections provide details about the Preview processes.

Preview Process

Previews are generated in one of two modes:

  • Online mode
  • Offline mode

Online Mode

  1. The user clicks the preview icon.
  2. A fetcher downloads the document.
  3. The preview is generated, stored in the database, and returned to the user. This generation happens only once per document.
  4. Subsequent requests for the same preview retrieve it directly from the database.
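
The online steps above amount to a generate-once, read-many cache. The following is a minimal sketch of that pattern, not the product's implementation: the function name get_preview, the dictionary standing in for the database, and the fetch and render callables are all hypothetical.

```python
from typing import Callable, Dict


def get_preview(doc_url: str,
                cache: Dict[str, bytes],
                fetch: Callable[[str], bytes],
                render: Callable[[bytes], bytes]) -> bytes:
    """Return the preview for doc_url, generating it on the first request only.

    cache stands in for the preview database; fetch and render are
    hypothetical stand-ins for the fetcher and the preview generator.
    """
    if doc_url in cache:
        # Step 4: a later request is served directly from the store.
        return cache[doc_url]
    document = fetch(doc_url)    # Step 2: the fetcher downloads the document.
    preview = render(document)   # Step 3: the preview is generated once...
    cache[doc_url] = preview     # ...stored...
    return preview               # ...and returned to the user.
```

In this sketch the cost of fetching and rendering is paid by the first user who requests a given preview, which is why online mode suits small or simple documents.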

Offline Mode

  1. The documents are captured during crawling and sent for preview generation.
  2. The documents are moved to the preview server, which generates the previews and stores them in the database.
  3. No user request is required at any point in this process.
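
For contrast with online mode, the offline steps can be sketched as a loop that runs during the crawl. This is an illustrative outline only: build_previews_at_crawl_time, the render callable, and the dictionary used as the database are hypothetical names, not part of Smart Previews.

```python
from typing import Callable, Dict, Iterable, Tuple


def build_previews_at_crawl_time(crawled: Iterable[Tuple[str, bytes]],
                                 render: Callable[[bytes], bytes],
                                 cache: Dict[str, bytes]) -> int:
    """Generate and store a preview for every crawled document.

    crawled yields (document URL, document content) pairs as the crawler
    produces them; cache stands in for the preview database.
    Returns the number of previews generated.
    """
    generated = 0
    for doc_url, content in crawled:
        # The preview is rendered and stored before any user asks for it.
        cache[doc_url] = render(content)
        generated += 1
    return generated
```

Here the rendering cost is paid up front for every document in the crawl, whether or not a user ever opens its preview.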

Which Preview Process is Right for You?

Using offline Preview generation, Smart Previews generates and stores Previews at crawl time.

Preview Process | Primary Benefit

Offline (crawl-time) Previews
  • Previews for large or complex documents.
  • Optimal response time.

Online (on-demand) Previews
  • Primarily for small or simple (text-only) documents.
  • Optimal use of limited system resources.

Crawling and Document Modifications

Consider the following when determining hardware requirements for the Smart Previews components: 

  • Crawling
  • Database cache
  • Document modification operations

Initial Offline Preview Cache Build

  • The initial Preview cache database is populated during a full crawl after Smart Previews is first deployed.
  • Future crawls, whether full or incremental, update the Preview cache database only for documents that have changed since the last crawl.
  • Smart Previews receives a copy of each crawled document.
  • This initial build of the Preview cache can require more hardware resources than normal operations.
    • BA Insight recommends completing this initial build before you go live in a production environment.

Recommended Database Cache Size

  • Smart Preview Cache:
    • 8 GB per 100k documents
  • Longitude_Configuration database:
    • 150 MB per 100k documents
  • Longitude_UserProfile database:
    • 10 MB per user, assuming 30 documents in the workspace
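
The sizing figures above scale linearly, so a rough estimate is simple arithmetic. The sketch below applies them as given; the function name estimate_storage is hypothetical, the units are read as megabytes, and the per-user figure keeps the stated assumption of 30 documents in each workspace.

```python
def estimate_storage(num_documents: int, num_users: int) -> dict:
    """Estimate database sizes from the recommended per-unit figures.

    Figures used (from the recommendations above):
      - Smart Preview cache: 8 GB per 100k documents
      - Longitude_Configuration: 150 MB per 100k documents
      - Longitude_UserProfile: 10 MB per user (30 documents in workspace)
    """
    units = num_documents / 100_000  # number of 100k-document blocks
    return {
        "preview_cache_gb": 8 * units,
        "configuration_db_mb": 150 * units,
        "user_profile_db_mb": 10 * num_users,
    }


# Example: 500,000 documents and 200 users.
print(estimate_storage(500_000, 200))
# {'preview_cache_gb': 40.0, 'configuration_db_mb': 750.0, 'user_profile_db_mb': 2000}
```

Treat the result as a starting point. Actual sizes depend on document types and on how many documents users keep in their workspaces.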

How Long Does it Take to Generate a Preview?

  • Previews are typically generated at a rate of 600 to 1,500 documents per core per hour (equivalent to about 230 to 400 MB per core per hour).
  • The first time a user Previews a search result, the Preview can require more time to render because the browser needs to cache the required resources.
    • This is the same behavior that you see when you open any SharePoint site.
  • Generation speed varies by file type. For example:
    • Emails (excluding attachments) process near the top speed of 1500 files/core/hr.
    • Scanned PDF files are closer to the lower end of the range (600 files/core/hr).
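
The throughput range above can be turned into a rough duration estimate for a preview build. The sketch below assumes that throughput scales linearly with the number of cores, which the figures imply but do not guarantee; the function names are hypothetical.

```python
from typing import Tuple


def estimate_build_hours(num_documents: int, cores: int,
                         docs_per_core_hour: float) -> float:
    """Hours needed to generate previews at a given per-core rate."""
    if cores <= 0 or docs_per_core_hour <= 0:
        raise ValueError("cores and docs_per_core_hour must be positive")
    return num_documents / (cores * docs_per_core_hour)


def build_hours_range(num_documents: int, cores: int) -> Tuple[float, float]:
    """Return (best case, worst case) hours using the 1500 and 600 rates."""
    best = estimate_build_hours(num_documents, cores, 1500)
    worst = estimate_build_hours(num_documents, cores, 600)
    return best, worst


# Example: 1,200,000 documents on 8 cores.
print(build_hours_range(1_200_000, 8))  # (100.0, 250.0)
```

A corpus dominated by scanned PDF files will sit near the worst-case figure, while one dominated by emails will sit near the best case.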

How to Choose a Preview Process

Offline vs. Online

  • Offline (crawl-time) and Online (on-demand) Previews are designed to address different requirements.
    • Both processes can be used simultaneously by the same hardware configuration.
    • Both processes are activated by default.
  • Typically, you specify a rule that enables documents up to a specified size to be generated online.
    • Rules can also be applied based on date or file type (regex).
    • Larger or less frequently accessed documents are generated offline.
    • You can use both processes or only one of them.
  • With Offline Preview generation, Smart Previews generates Previews for documents while they are being crawled.
    • This means that Previews are already available when a user requests them and require no additional processing at that point.
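
The size and file-type rules described above can be pictured as a small routing function. This is a sketch of the decision logic only, not the product's rule syntax: PreviewRule, choose_mode, and their fields are hypothetical, and date-based rules are left out for brevity.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreviewRule:
    """Hypothetical rule: which documents may be generated online."""
    max_online_bytes: int                       # size limit for online mode
    online_name_pattern: Optional[str] = None   # optional regex on file name


def choose_mode(file_name: str, size_bytes: int, rule: PreviewRule) -> str:
    """Return "online" or "offline" for a document under the given rule."""
    if size_bytes > rule.max_online_bytes:
        return "offline"  # larger documents are generated at crawl time
    if rule.online_name_pattern and not re.search(
            rule.online_name_pattern, file_name, re.IGNORECASE):
        return "offline"  # file type is not allowed for online generation
    return "online"


# Example: text and email files up to 5 MB are generated on demand.
rule = PreviewRule(max_online_bytes=5 * 1024 * 1024,
                   online_name_pattern=r"\.(txt|eml)$")
print(choose_mode("notes.txt", 10_000, rule))     # online
print(choose_mode("scan.pdf", 10_000, rule))      # offline
print(choose_mode("big.txt", 50_000_000, rule))   # offline
```

Routing by size first keeps the expensive documents out of the interactive path, which matches the guidance that larger documents are generated offline.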