How to Configure StopWords
About StopWords
"Stop words" are words that add little or no meaning to a sentence and can therefore be ignored without changing what the sentence means.
Typically, these are some of the most common, short function words, such as:
- the
- is
- at
- which
- and
- on
etc.
In SmartHub, the list of stop words is used when processing query text to obtain the keywords that will be highlighted in the SmartPreviews document viewer.
Because any group of words can be chosen as stop words for a given purpose, you can configure the list of stop words to suit your needs.
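As a rough illustration of how a stop words list can be used to reduce query text to keywords, here is a minimal Python sketch. It is not SmartHub's implementation: the function name `extract_keywords`, the tokenization rule, and the small `STOP_WORDS` set are assumptions made for the example.

```python
import re

# Illustrative subset only; a real list would come from a StopWords file.
STOP_WORDS = {"the", "is", "at", "which", "and", "on"}


def extract_keywords(query, stop_words):
    """Return the query terms that are not stop words, in their original order."""
    # Assumed tokenization: lowercase the query and split it into runs of
    # word characters and apostrophes.
    terms = re.findall(r"[\w']+", query.lower())
    return [term for term in terms if term not in stop_words]


if __name__ == "__main__":
    print(extract_keywords("Which reports are on the server", STOP_WORDS))
    # prints ['reports', 'are', 'server']
```

Note that "are" survives here only because it is missing from the small example set; it is part of the default English list.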
How to Configure StopWords
StopWords lists can be found under <SmartHubFolder>\LanguageDetails\StopWords.
- Each language that has stop words must have a .txt file named after its language code.
- For example:
- en-US.txt for English
- fr-FR.txt for French
- de-DE.txt for German
and so on.
By default, in the StopWords folder, the en-US.txt file contains the stop words list for the English language:
i me my myself we our ours ourselves you your yours yourself yourselves he him his himself she her hers herself it its itself they them their theirs themselves what which who whom this that these those am is are was were be been being have has had having do does did doing a an the and but if or because as until while of at by for with about against between into through during before after above below to from up down in out on off over under again further then once here there when where why how all any both each few more most other some such no nor not only own same so than too very s t can will just don should now
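To show how a file like this could be read, the following Python sketch loads a stop words file into a set. It assumes the words are separated by whitespace (spaces or line breaks) and that the file is UTF-8 encoded; neither detail is confirmed here, and `load_stop_words` is a hypothetical helper, not part of SmartHub.

```python
import tempfile
from pathlib import Path


def load_stop_words(path):
    """Read a stop words file and return its words as a lowercase set."""
    # Assumption: words are whitespace-separated and the file is UTF-8.
    text = Path(path).read_text(encoding="utf-8")
    return {word.lower() for word in text.split()}


if __name__ == "__main__":
    # Demonstrate with a throwaway file instead of a real SmartHub folder.
    with tempfile.TemporaryDirectory() as folder:
        sample = Path(folder) / "en-US.txt"
        sample.write_text("i me my myself we our", encoding="utf-8")
        print(sorted(load_stop_words(sample)))
```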
How to Modify StopWords Files
The default files are overwritten every time SmartHub is updated. To keep your changes, work on a custom copy instead:
- Create a copy of the file.
- Rename the copy to <language-code>.custom.txt.
- Modify the new file.
Example
For example, to add or remove stop words for the English language:
- Copy the file en-US.txt.
- Rename the copy to en-US.custom.txt.
- Modify the copy as needed.
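The copy-and-rename steps above can also be scripted. The Python sketch below is one possible way to do it; `create_custom_copy` is a hypothetical helper, and the folder you pass in must be your own StopWords folder. The demo uses a temporary folder so that it runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def create_custom_copy(stop_words_dir, language_code):
    """Copy <language-code>.txt to <language-code>.custom.txt and return the new path."""
    folder = Path(stop_words_dir)
    default_file = folder / f"{language_code}.txt"
    custom_file = folder / f"{language_code}.custom.txt"
    # Leave an existing custom file alone so earlier edits are not lost.
    if not custom_file.exists():
        shutil.copyfile(default_file, custom_file)
    return custom_file


if __name__ == "__main__":
    # Throwaway folder standing in for the StopWords folder.
    with tempfile.TemporaryDirectory() as folder:
        (Path(folder) / "en-US.txt").write_text("a an the", encoding="utf-8")
        print(create_custom_copy(folder, "en-US").name)  # prints en-US.custom.txt
```

After the copy exists, edit it with any text editor.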
Stop words are cached for 24 hours, so each time you modify a StopWords file you must perform an iisreset for the change to take effect right away.