How to Configure the Spellchecker Function
About Spellchecker
The spellchecker function returns relevant results for queries that contain typos or misspellings, or whose terms have alternate spellings.
- By default, SmartHub provides an American English dictionary with approximately 100,000 words.
- This dictionary can be customized to fit your industry, company, and brand queries.
- The dictionary that is used by SmartHub for a specific query is driven by the language configured for the search page that is issuing the query.
- The default SmartHub dictionary is located in the <SmartHub_root>\Dictionary directory.
About Using Single and Multiple Languages in SmartHub
The following information about languages and dictionaries in SmartHub is important to understand. Predefined languages are located in the <SmartHub_root>\js\cultures directory of your local SmartHub installation.
Single Language/Dictionary Setup
In a typical environment you use:
- 1 Language
- 1 Dictionary
- You can customize your dictionary by adding terms to it.
Multilingual/Dictionary Setup
In a multilingual environment, consider the following:
- To set up multiple languages, you define a set of languages: en-US, fr-FR, etc.
- You define which page matches which language.
- In this scenario, you have multiple dictionaries.
- There is one dictionary for each language you select.
- If a dictionary for a language is missing, SmartHub defaults to English (en-US).
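The fallback rule above can be sketched as a small lookup. This is an illustration only, not SmartHub's own code: the function name `resolve_dictionary` is hypothetical, and the sketch assumes that each dictionary is a file named `<language-code>.txt` in a single directory.

```python
from pathlib import Path

DEFAULT_LANGUAGE = "en-US"  # fallback language named in the text above


def resolve_dictionary(dictionary_dir, language_code):
    """Return the dictionary path for language_code, or the en-US path if it is missing.

    Hypothetical helper: it only mirrors the documented fallback rule.
    """
    directory = Path(dictionary_dir)
    candidate = directory / f"{language_code}.txt"
    if candidate.is_file():
        return candidate
    # No dictionary for this language: fall back to the English one.
    return directory / f"{DEFAULT_LANGUAGE}.txt"
```

A script like this can be used to check, before going live, which pages would silently fall back to the English dictionary.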
Changing Your Environment Language
- Changing your environment language is unnecessary except in special circumstances.
- Consider languages other than the native language of your environment ONLY IF you set up a multilingual configuration and LanguageRedirects (via a custom settings file).
- The values you use under LanguageRedirects become the language of the page and the name of the dictionary.
Language Example
If you open any language file in the <SmartHub_root>\js\cultures directory, you can see the language name that it defines. This name is independent of the language file name:
- fr.JS defines a language named fr-FR
- In that case, you can use fr-FR in your languageRedirects so that the spellchecker looks for a dictionary named fr-FR.txt inside the SmartHub "Dictionary" directory.
Configuring SmartHub to Support Custom Dictionaries
Upgrading SmartHub overwrites all SmartHub files.
- Any changes made to the web.config file or the dictionary file must be backed up before SmartHub is upgraded.
- After SmartHub is upgraded, new dictionary files must be merged with the old ones (if necessary) and restored.
To avoid this, perform the following steps:
- Navigate to your SmartHub installation directory.
- Clone the Dictionary folder and name the copy "CustomDictionary".
- Make sure that the dictionary files from the original Dictionary folder exist in the new CustomDictionary folder.
- Open IIS and expand the SmartHub site.
- Right-click on the SmartHub site and create a new Virtual Directory.
- Name it "Dictionary" and point it to the CustomDictionary directory you created earlier.
At this point you have your own copy of the dictionary without upgrades interfering with your customizations.
Upgrade note: Any additional words that SmartHub provides with new packages as part of the default dictionary will not appear in your dictionary copy. You need to add any new words to your copy of the dictionary.
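One way to find the words that a new package adds is to compare the shipped dictionary with your copy. The sketch below is a minimal illustration, not a SmartHub tool. It assumes that each line holds a term followed by whitespace and a frequency; the helper names are hypothetical.

```python
def read_entries(lines):
    """Parse 'term frequency' lines into a dict, skipping lines that do not fit.

    Assumption: the frequency is the last whitespace-separated token on the line.
    """
    entries = {}
    for line in lines:
        parts = line.strip().rsplit(None, 1)
        if len(parts) == 2 and parts[1].isascii() and parts[1].isdigit():
            entries[parts[0]] = int(parts[1])
    return entries


def missing_terms(custom_lines, shipped_lines):
    """Return shipped entries whose term is absent from the custom dictionary."""
    custom = read_entries(custom_lines)
    shipped = read_entries(shipped_lines)
    return {term: freq for term, freq in shipped.items() if term not in custom}


def append_missing(custom_lines, shipped_lines):
    """Return the custom lines followed by one line per missing shipped term.

    Frequencies already set in the custom dictionary are left untouched.
    """
    merged = [line.rstrip("\n") for line in custom_lines if line.strip()]
    for term, freq in missing_terms(custom_lines, shipped_lines).items():
        merged.append(f"{term} {freq}")
    return merged
```

To use it, read both files into lists of lines (for example with `pathlib.Path.read_text().splitlines()`), review the result of `missing_terms`, and write the merged lines back to your copy once you are satisfied.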
Adding New Words to SmartHub for Spell Checking
The dictionary used for spellchecking is selected based on the language code of your SmartHub search page.
- By default, the dictionary is set to en-US.txt.
- You can create your own dictionary, but SmartHub uses it only if the dictionary name matches the language code.
Dictionary Format and Term Frequency Settings
The dictionary contains:
- One line for each recognized term (word).
- Each line contains the term along with a number (between 1 and 9223372036854775807).
- The number represents the frequency of the term.
- Spell checking suggestions are ordered by descending frequency, and only the top suggestion is returned to the UI.
- Since there could be multiple possible suggestions for a typo, you should ensure the most frequent spelling of a term has the highest number in the dictionary.
- For example, a typo for "SharePoint" could be corrected to:
- "share point"
- "sharp point"
- or "sharepoint"
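Before deploying an edited dictionary, it can help to check every line against the format described above. The following is a minimal sketch with hypothetical helper names. It assumes that the frequency is the last whitespace-separated token on each line, which is an assumption about the file layout rather than something this page specifies.

```python
MAX_FREQUENCY = 9223372036854775807  # upper bound stated above (2**63 - 1)


def parse_line(line):
    """Split one dictionary line into (term, frequency).

    Assumption: the frequency is the last whitespace-separated token.
    """
    parts = line.strip().rsplit(None, 1)
    if len(parts) != 2:
        raise ValueError(f"expected a term and a frequency: {line!r}")
    term, raw = parts
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"frequency is not a whole number: {line!r}")
    frequency = int(raw)
    if not 1 <= frequency <= MAX_FREQUENCY:
        raise ValueError(f"frequency out of range: {line!r}")
    return term, frequency


def find_invalid_lines(lines):
    """Return (line_number, message) for every non-blank line that fails parse_line."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        try:
            parse_line(line)
        except ValueError as error:
            problems.append((number, str(error)))
    return problems
```

An empty result from `find_invalid_lines` means that every non-blank line has a term and a frequency in the documented range.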
Sample of Lines in en-US Dictionary
How to Set the Most Frequent Words for Spelling Correction
Use a text editor to create new lines in the dictionary and assign a frequency number to them depending on how important/relevant they are in your environment.
Frequency number: The formula used to compute spelling suggestions is more complex than a numeric comparison between frequencies. As a general rule, however, the frequency of a word should correlate directly with how many times you would find that word in the documents available at search time. The higher the number, the more relevant the word is.
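The general rule can be applied mechanically once the dictionary is loaded as a mapping of term to frequency. The sketch below uses a hypothetical helper, `promote_term`, that raises a preferred spelling just above its competing spellings. Because the real suggestion formula is more complex than a frequency comparison, treat this as a starting point and confirm the result on your search page.

```python
MAX_FREQUENCY = 9223372036854775807  # largest frequency the dictionary accepts


def promote_term(entries, preferred, rivals):
    """Return a copy of entries in which preferred outranks every listed rival.

    entries maps term -> frequency. Rivals missing from entries are ignored,
    and a preferred term that already ranks higher keeps its frequency.
    """
    updated = dict(entries)
    highest_rival = max((updated[r] for r in rivals if r in updated), default=0)
    current = updated.get(preferred, 0)
    # Cap at the documented maximum; a rival already at the cap ends in a tie.
    updated[preferred] = min(max(current, highest_rival + 1), MAX_FREQUENCY)
    return updated
```

For example, with "share point" at 400 and "sharp point" at 50, promoting "sharepoint" over both gives it a frequency of 401 and leaves the original mapping unchanged.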
Adding Dictionaries for Other Languages to SmartHub for Spell Checking
As mentioned in the previous steps, the dictionary used for spell checking is selected depending on the language code of your SmartHub search page. The out-of-the-box dictionary is en-US.txt, but you can create your own dictionary as long as its name matches the language code.
You can download additional "base" dictionaries which contain general words and frequencies for a given language from open source repositories, such as: https://github.com/hermitdave/FrequencyWords/tree/master/content/2018.
Rename the dictionary file to "<language-code>.txt" (where the language code matches the language configured for your search page), and store it in the Dictionary folder.
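The rename-and-store step can be scripted when you install several languages. This is a minimal sketch with a hypothetical function name. It assumes that the downloaded file is already in the term-and-frequency layout that SmartHub expects, and it copies the file rather than moving it so that the download stays intact.

```python
import shutil
from pathlib import Path


def install_dictionary(downloaded_file, dictionary_dir, language_code):
    """Copy a downloaded base dictionary to <dictionary_dir>/<language_code>.txt.

    Hypothetical helper: an existing dictionary with the same name is overwritten,
    so back up any customized file first.
    """
    target = Path(dictionary_dir) / f"{language_code}.txt"
    shutil.copyfile(downloaded_file, target)
    return target
```

If you followed the custom dictionary setup described earlier, pass your CustomDictionary folder as `dictionary_dir` so that the file survives upgrades.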