How to Use Offline Processing
Offline Processing allows any component to run offline instead of during indexing.
- The purpose of Offline Processing is to keep indexing fast when a component requires a long processing time.
- When a document passes through the offline processor component, the data to be processed is captured and the document continues on through the rest of the pipeline.
- The captured document data will be processed by the Offline Processing service.
- When the request is complete, recrawling the document applies any available tags to the document.
- If the document or its metadata has changed since it was first captured, the new data is queued for reprocessing.
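The flow described above can be pictured with a small model. This is an illustrative sketch only, not product code: `OfflineStore`, `on_crawl`, `process_next`, and `fingerprint` are hypothetical names, and hashing the body and metadata is just one assumed way to detect a change. The sketch also assumes that tags from an earlier version of a document are discarded when the document changes, which the description above does not specify.

```python
import hashlib
from collections import deque
from dataclasses import dataclass, field


def fingerprint(body: str, metadata: dict) -> str:
    """Hash body and metadata so a later change can be detected (assumed approach)."""
    payload = body + "|" + repr(sorted(metadata.items()))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class OfflineStore:
    """Hypothetical stand-in for the state kept by an offline processing service."""
    queue: deque = field(default_factory=deque)   # captured data awaiting processing
    captured: dict = field(default_factory=dict)  # doc_id -> fingerprint at capture time
    tags: dict = field(default_factory=dict)      # doc_id -> tags produced offline

    def on_crawl(self, doc_id: str, body: str, metadata: dict) -> list:
        """Called when a document passes through the online pipeline."""
        fp = fingerprint(body, metadata)
        if self.captured.get(doc_id) != fp:
            # New or changed document: capture it and queue it, then let the
            # document continue through the pipeline without waiting.
            self.captured[doc_id] = fp
            self.tags.pop(doc_id, None)  # assumption: stale tags are dropped
            self.queue.append((doc_id, body))
            return []
        # Unchanged document: a recrawl applies whatever tags are available.
        return self.tags.get(doc_id, [])

    def process_next(self, slow_component) -> None:
        """Run the slow component on the oldest queued item."""
        doc_id, body = self.queue.popleft()
        self.tags[doc_id] = slow_component(body)
```

In this model the first crawl returns no tags and only queues the data; the tags appear on a later crawl, after `process_next` has run, which mirrors why a recrawl is needed.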
Requirements
- Configure both an Online Pipeline and an Offline Pipeline
- Set up the Offline Processing component in the Online Pipeline
- Set up one or more components in the Offline Pipeline
Important!
All components are identical, whether they are used Online or Offline.
A component is configured the same way in either type of pipeline.
Offline Processing Example
Use the following steps to set up and configure an existing or new pipeline with Offline Processing.
- Set up a new Offline Pipeline or select an existing pipeline
- Add the desired component to the Offline Pipeline
Configure the Component
Note: In this example the Component requires the Body as input.
- Add the Offline Processing component to an Online Pipeline
Configure the Component
- File Storage Location: (Optional) File share for storing raw file data.
- Store Raw Data: (Optional) Select this option if a component configured in the Offline Pipeline requires the raw data file.
- Include Properties: (Optional) Comma-separated list of any properties required by components configured in the Offline Pipeline.
- Offline Pipeline: Select the Offline Pipeline that will process the data.
Info
If Store Raw Data is selected, a File Storage Location must be specified, and it must be accessible to all AutoClassifier Engine sites.
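The relationship between these settings can be expressed as a simple check. The sketch below is hypothetical: the component is configured through the product's user interface, and `OfflineProcessingSettings`, its field names, and the share path in the usage example are invented here purely to illustrate the rule that Store Raw Data depends on File Storage Location and that Include Properties is a comma-separated list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OfflineProcessingSettings:
    """Hypothetical mirror of the Offline Processing component's settings."""
    offline_pipeline: str                        # required
    file_storage_location: Optional[str] = None  # optional file share
    store_raw_data: bool = False                 # optional
    include_properties: str = ""                 # optional, comma-separated

    def property_list(self) -> List[str]:
        """Split the comma-separated list, ignoring blanks and extra spaces."""
        return [p.strip() for p in self.include_properties.split(",") if p.strip()]

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the settings are consistent."""
        problems = []
        if not self.offline_pipeline.strip():
            problems.append("Offline Pipeline must be selected.")
        if self.store_raw_data and not (self.file_storage_location or "").strip():
            problems.append("Store Raw Data requires a File Storage Location.")
        return problems
```

For example, `OfflineProcessingSettings("Offline A", store_raw_data=True).validate()` reports the missing storage location, while adding a share such as `r"\\share\raw"` (a made-up path) clears it.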
Crawl to Complete Configuration
- Perform a full crawl.
- Stats:
- Success: Number of items the Offline Processing service has successfully processed.
- Crawled: Number of items crawled and ready for the Offline Processing service.
- Queued: Number of items currently being processed.
- Failed: Number of items that failed processing.
- Using the Delete option on Failed Items:
- When using "Delete all", failed items are purged from the system.
- These items are reprocessed on subsequent crawls.
- Using the Reprocess option on Failed Items:
- When using "Reprocess", failed items are requested to be reprocessed.
- Perform a second full crawl to pick up metadata tags once Offline Processing is complete.
Info
When using Connectivity Hub to crawl, a second full crawl is unnecessary.
When an item has been processed, AutoClassifier is notified to recrawl it.
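The four statistics and the two options for failed items can be read as a small state model. The following is a sketch under stated assumptions, not the product's implementation: `State`, `ItemTracker`, and its methods are hypothetical names, and the sketch assumes that "Reprocess" returns failed items to the Crawled state, which the description above does not confirm.

```python
from collections import Counter
from enum import Enum


class State(Enum):
    CRAWLED = "Crawled"  # crawled and ready for the Offline Processing service
    QUEUED = "Queued"    # currently being processed
    SUCCESS = "Success"  # successfully processed
    FAILED = "Failed"    # failed processing


class ItemTracker:
    """Hypothetical tracker mapping item ids to their current state."""

    def __init__(self):
        self.items = {}

    def stats(self) -> dict:
        """Count items per state, as shown in the Stats list."""
        counts = Counter(self.items.values())
        return {s.value: counts.get(s, 0) for s in State}

    def delete_all_failed(self) -> None:
        """'Delete all': purge failed items; a later crawl would add them again."""
        self.items = {k: v for k, v in self.items.items() if v is not State.FAILED}

    def reprocess_failed(self) -> None:
        """'Reprocess': request processing again (assumed to mean back to Crawled)."""
        for item_id, state in self.items.items():
            if state is State.FAILED:
                self.items[item_id] = State.CRAWLED
```

The difference between the two options in this model is that deleted items disappear until a subsequent crawl captures them again, whereas reprocessed items stay in the system and are picked up without another crawl.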