How to Configure Your Components
Add the Component Service Locations
- Go to the Configuration Settings > Configuration Settings page.
- Click the General link to expand it.
- The Product Version field displays the current version of your AutoClassifier engine.
- Maximum Number of Recorded Content Records: When using the Recorder component, use this option to specify the maximum number of records that are stored.
- Click Save to save your changes.
Specify Parallelism Performance Settings
Control parallelism when you run AutoClassifier by using the following steps:
- Click Performance to expand the section:
- Processing Max Degree of Parallelism Type: Use the drop-down list and select one of the following choices:
- ProcessorsCount: This option limits the maximum degree of parallelism to the number of processors that are available on the server.
- TaskScheduleDefault: This option relies on the System.Tasks.TaskScheduler implementation instead of maximum parallelism.
- Custom: This option allows you to enter a number into the Processing Max Degree of Parallelism field to specify the maximum degree of parallelism.
- Click Save at the bottom of the page to save your changes.
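As a rough illustration of what the three parallelism choices mean, the following Python sketch maps each choice onto a worker limit. It is not AutoClassifier code: the function name resolve_max_parallelism and its return convention (None meaning "defer to the scheduler default") are assumptions made for this example only.

```python
import os


def resolve_max_parallelism(kind, custom_value=None):
    """Return a worker limit for the given setting, or None for 'no explicit limit'.

    Hypothetical helper: the setting names come from the page above,
    everything else is an assumption for illustration.
    """
    if kind == "ProcessorsCount":
        # Cap the degree of parallelism at the number of processors on the server.
        return os.cpu_count() or 1
    if kind == "TaskScheduleDefault":
        # No explicit cap: the scheduler implementation decides.
        return None
    if kind == "Custom":
        # Use the number entered in Processing Max Degree of Parallelism.
        if custom_value is None or custom_value < 1:
            raise ValueError("Custom requires a positive number")
        return custom_value
    raise ValueError(f"Unknown parallelism type: {kind}")


if __name__ == "__main__":
    for kind, value in [("ProcessorsCount", None), ("TaskScheduleDefault", None), ("Custom", 4)]:
        print(kind, "->", resolve_max_parallelism(kind, value))
```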
Specify the Query or Profiling Settings
- Click Profiling to expand it:
- Click Default Language and select a default rules language other than the default, English. If you change the default language, the Lucene index must be flushed.
- Profiler Languages: This is the list of languages for the rule engine profiler.
- Exclude Metadata Fields: Add or delete any metadata fields that you want to exclude from the default entries.
- By default, a comma-delimited (,) list of SharePoint keywords is listed to exclude.
- Any metadata field is skipped by the profiler if the field appears in this list.
- Current default: Keywords, TaxKeyword, d5cdd505-2e9c-101b-9397-08002b2cf9ae/TaxKeyword, TaxCatchAll, Windows XP Keywords, MetaInfo, TaxKeywordtaxhtfield
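The effect of the Exclude Metadata Fields list can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the helper names parse_excluded and fields_to_profile are hypothetical, and exact, case-sensitive name matching is assumed because the page does not say how the profiler compares field names.

```python
# Default value of the setting, copied from the list above.
DEFAULT_EXCLUDED = (
    "Keywords, TaxKeyword, d5cdd505-2e9c-101b-9397-08002b2cf9ae/TaxKeyword, "
    "TaxCatchAll, Windows XP Keywords, MetaInfo, TaxKeywordtaxhtfield"
)


def parse_excluded(setting):
    """Split the comma-delimited setting into a set of field names."""
    return {name.strip() for name in setting.split(",") if name.strip()}


def fields_to_profile(metadata, excluded):
    """Keep only the metadata fields that are not in the exclusion list."""
    return {name: value for name, value in metadata.items() if name not in excluded}


if __name__ == "__main__":
    excluded = parse_excluded(DEFAULT_EXCLUDED)
    document = {"Title": "Quarterly report", "TaxKeyword": "finance", "MetaInfo": "..."}
    print(fields_to_profile(document, excluded))
```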
- If you enable any of the following settings, performance is decreased. For this reason, whatever rule features you are not using should be disabled:
- Allow Case Sensitive Queries: If True, return results based on the query case. For example, if True, this rule is matched as specified: CASE("BOB").
- Allow SoundsLike Queries: If True, you can write and return matches on SOUNDSLIKE("Paliperidone").
- Allow White Space Bound Queries: If True, preserves white space when tokenizing.
- Allow Regular Expression Queries: If True, returns results for REGEX queries such as URLs.
- Allow Non Stemmed Queries Feature: You can use non-stemmed queries such as NOSTEM("tricky"). When True, this rule returns a match on tricky, only. If False, trick would also be returned as a match.
- Disable Stemming: This option disables all stemming. When False, the term tricky would remain tricky instead of being stemmed to the word trick.
- Enable Stopwords: If True, it returns matches on, and stores, words such as "the" when entered in a query. (The list of stopwords is different for each language.)
English stopwords include:
"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
Note: If you enable stop words, the stop words that match an operator, such as "and" and "or," are ignored during rule analysis. For this reason, if you write a rule such as "Big and Rich" and the text contains the phrase "Big or Rich," the rule matches because the operators "and" and "or" are not used during rule analysis.
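The stop word note above can be made concrete with a small Python sketch. It is a simplified model, not the rules engine: it assumes that "and" and "or" are dropped from both the rule and the text before comparison, and the function names significant_terms and phrase_matches are hypothetical.

```python
# Stop words that are also rule operators, per the note above.
OPERATOR_STOPWORDS = {"and", "or"}


def significant_terms(text):
    """Lowercase the text, split on white space, and drop operator stop words."""
    return [word for word in text.lower().split() if word not in OPERATOR_STOPWORDS]


def phrase_matches(rule, text):
    """Return True if the rule's remaining terms appear consecutively in the text."""
    rule_terms = significant_terms(rule)
    text_terms = significant_terms(text)
    if not rule_terms:
        return False
    size = len(rule_terms)
    return any(
        text_terms[start:start + size] == rule_terms
        for start in range(len(text_terms) - size + 1)
    )


if __name__ == "__main__":
    # Both "and" and "or" are ignored, so the two phrases compare as equal.
    print(phrase_matches("Big and Rich", "Big or Rich"))
    print(phrase_matches("Big and Rich", "Big Data"))
```

In this model the first call matches and the second does not, which mirrors the behavior the note warns about.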
- Specify the following Prefilter options:
- Prefilter Rules:
- This option enables the rules engine to run faster processing for larger rule sets.
- BA Insight recommends that you leave this option set to True.
- Prefilter Duplicate Rules:
- These values are only available if Prefilter Rules is set to True.
- If this option is set to False, both of the following rules match:
- Rule 1: "New York"
- Rule 2: "York"
- Text: I live in New York
- Two entities should not return the same text. For this reason, set this option to True.
- Prefilter Partition Size: Leave the default setting, or enter a new size.
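To illustrate the "New York" and "York" example under Prefilter Duplicate Rules, the Python sketch below finds both rule matches and then drops any match that lies entirely inside a longer one. The page does not describe how the engine removes duplicates, so this containment check is only one plausible interpretation, and the helper names find_matches and drop_contained are hypothetical.

```python
def find_matches(rules, text):
    """Return (start, end, rule) tuples for every occurrence of each rule in the text."""
    matches = []
    for rule in rules:
        start = text.find(rule)
        while start != -1:
            matches.append((start, start + len(rule), rule))
            start = text.find(rule, start + 1)
    return matches


def drop_contained(matches):
    """Discard any match whose span lies inside a strictly longer match."""
    kept = []
    for match in matches:
        contained = any(
            other[0] <= match[0]
            and match[1] <= other[1]
            and (other[1] - other[0]) > (match[1] - match[0])
            for other in matches
        )
        if not contained:
            kept.append(match)
    return kept


if __name__ == "__main__":
    text = "I live in New York"
    matches = find_matches(["New York", "York"], text)
    print([rule for _, _, rule in matches])                  # both rules match
    print([rule for _, _, rule in drop_contained(matches)])  # only the longer rule remains
```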
- Binary Data Extraction Limit: This setting is used with Tika Extractor only.
- This is the number of bytes that are extracted if the Tika Extractor is used.
- By default, this is set to -1, which means that there is no limit.
- Lazy Load Rules: If True, the initial load after an IIS reset operation is faster, but tagging is slower.
- BA Insight recommends that you leave this value set to the default setting.
- Click Save at the bottom of the page to save your changes.
How to Set Up Your Lucene Index
The Lucene index is used with the Rules Engine when you test rules.
Use this section to specify the index settings:
- Click Lucene Index to expand:
- Enable Document Indexing: Leave the default setting, True, to index your tagged documents.
- Click False if:
- You want to tag documents but not include these documents in the Lucene index.
- You want to tag documents in a content source but not pass these documents into the index.
- Document Count: See the number of documents currently in the index.
- Clear index: Click this option if, for example, you change the Profiling > Default Language or Profiler Languages setting.
- Index Size: See the current size of the index.
- Max Lucene Buffer: See this buffer setting in MB, which is set by default.
- Free Space limit: Change the number of GB available for the index, which is set by default.
- Index Commit Interval: Specify the interval at which new documents are added to the index.
- Documents added to the Lucene index are not available until either:
- A commit operation is performed per scheduled interval
- You access the index, for example:
- Go to the Lucene Index page
- Access the Lucene index by an edit/test rule operation in the taxonomy manager
- Index Optimization Interval: Leave the default setting or specify a new optimization point for the index. Index optimization increases performance.
- Document Count To Trigger Optimization: Leave the default setting, or enter a new number of documents that will trigger an index optimization operation.
- Index Segment Number: Leave the default setting, or specify a new number of index segments to be used for optimization.
- Click Save at the bottom of the page to save your changes.
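The Index Commit Interval behavior described above (documents become available only after a scheduled commit or when the index is accessed) can be modeled with a short Python sketch. This is a toy model, not the Lucene API or AutoClassifier code: the class SketchIndex, its method names, and the use of seconds for the interval are all assumptions for illustration.

```python
class SketchIndex:
    """Toy model: added documents stay pending until a commit makes them visible."""

    def __init__(self, commit_interval_seconds):
        self.commit_interval = commit_interval_seconds
        self._pending = []
        self._visible = []
        self._last_commit = 0.0

    def add(self, document):
        # New documents are buffered and not yet searchable.
        self._pending.append(document)

    def commit(self, now):
        # Move everything pending into the visible set.
        self._visible.extend(self._pending)
        self._pending.clear()
        self._last_commit = now

    def tick(self, now):
        # Scheduled commit: runs only once the interval has elapsed.
        if now - self._last_commit >= self.commit_interval:
            self.commit(now)

    def access(self, now):
        # Accessing the index (for example, a rule test) also makes pending documents available.
        self.commit(now)
        return list(self._visible)

    def visible_count(self):
        return len(self._visible)


if __name__ == "__main__":
    index = SketchIndex(commit_interval_seconds=60)
    index.add("doc-a")
    index.tick(now=30)   # interval not yet elapsed, doc-a still pending
    print(index.visible_count())
    index.tick(now=60)   # scheduled commit runs
    print(index.visible_count())
```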