How to Configure StopWords

 

About StopWords

"Stop words" are words that add little meaning to a sentence and can therefore be ignored without changing what the sentence means.

Typically, these are some of the most common, short function words, such as:

  • the
  • is
  • at
  • which
  • and
  • on
    etc.

In SmartHub, the list of stop words is used when processing query text to obtain the keywords that will be highlighted in the SmartPreviews document viewer.
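The idea can be illustrated with a minimal sketch. This is not SmartHub's actual implementation: the function name `extract_keywords`, the tokenization rule, and the sample stop word set are assumptions made for illustration only.

```python
import re

# Sample stop words taken from the short list above (illustrative only).
STOP_WORDS = {"the", "is", "at", "which", "and", "on"}

def extract_keywords(query, stop_words=STOP_WORDS):
    """Return the words of the query, in order, with stop words removed."""
    # Assumption: words are runs of letters, digits, or underscores, compared in lowercase.
    tokens = re.findall(r"\w+", query.lower())
    return [token for token in tokens if token not in stop_words]

print(extract_keywords("Which report is on the server"))
# prints ['report', 'server']
```

Only the remaining words ("report" and "server" in this case) would be candidates for highlighting.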

Because the right set of stop words depends on the purpose, you can configure the list of stop words as needed.

How to Configure StopWords

StopWords lists can be found under <SmartHubFolder>\LanguageDetails\StopWords. 

  • Each language that has stop words must have a .txt file named after its language code.
    • For example:
      • en-US.txt for English
      • fr-FR.txt for French
      • de-DE.txt for German
        and so on.
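The naming convention above can be sketched as a small helper that builds the expected file path for a language code. The function name `stop_words_file` and the install folder `C:\SmartHub` are hypothetical; substitute your own <SmartHubFolder>.

```python
from pathlib import PureWindowsPath

def stop_words_file(smarthub_folder, language_code):
    """Build the path of the stop words file for the given language code."""
    return (
        PureWindowsPath(smarthub_folder)
        / "LanguageDetails"
        / "StopWords"
        / f"{language_code}.txt"
    )

# C:\SmartHub is a placeholder for the real SmartHub folder.
print(stop_words_file(r"C:\SmartHub", "fr-FR"))
# prints C:\SmartHub\LanguageDetails\StopWords\fr-FR.txt
```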

By default, the StopWords folder contains the en-US.txt file, which holds the stop words list for the English language:

en-US.txt
i
me
my
myself
we
our
ours
ourselves
you
your
yours
yourself
yourselves
he
him
his
himself
she
her
hers
herself
it
its
itself
they
them
their
theirs
themselves
what
which
who
whom
this
that
these
those
am
is
are
was
were
be
been
being
have
has
had
having
do
does
did
doing
a
an
the
and
but
if
or
because
as
until
while
of
at
by
for
with
about
against
between
into
through
during
before
after
above
below
to
from
up
down
in
out
on
off
over
under
again
further
then
once
here
there
when
where
why
how
all
any
both
each
few
more
most
other
some
such
no
nor
not
only
own
same
so
than
too
very
s
t
can
will
just
don
should
now
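As the listing shows, the file holds one stop word per line. A minimal sketch of reading such a file into a set follows. The function name `load_stop_words` is hypothetical, and the UTF-8 encoding, the lowercasing, and the skipping of blank lines are assumptions rather than documented SmartHub behavior.

```python
def load_stop_words(path):
    """Read a stop words file (one word per line) into a set."""
    # Assumption: the file is UTF-8; "utf-8-sig" also tolerates a leading BOM.
    with open(path, encoding="utf-8-sig") as handle:
        # Trim whitespace, lowercase each word, and ignore blank lines.
        return {line.strip().lower() for line in handle if line.strip()}
```

A set keeps membership checks fast when each word of a query is tested against the list.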

How to Modify StopWords Files

The default files are overwritten every time SmartHub is updated. To make changes that persist, take the following steps:

  1. Create a copy of the file.
  2. Rename the copy to <language-code>.custom.txt.
  3. Modify the new file.

Example

To add or remove stop words for the English language:

  1. Copy the file en-US.txt.
  2. Rename the copy to en-US.custom.txt.
  3. Modify the new file as needed.
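The copy-and-rename steps can also be scripted. The sketch below is one possible approach, not an official tool: the function name `create_custom_copy` is hypothetical, and the choice to leave an existing custom file untouched is an assumption made to avoid losing earlier edits.

```python
import shutil
from pathlib import Path

def create_custom_copy(stop_words_dir, language_code):
    """Copy <language-code>.txt to <language-code>.custom.txt and return the new path."""
    source = Path(stop_words_dir) / f"{language_code}.txt"
    target = Path(stop_words_dir) / f"{language_code}.custom.txt"
    # Assumption: never overwrite an existing custom file, so earlier edits are kept.
    if not target.exists():
        shutil.copyfile(source, target)
    return target
```

Calling `create_custom_copy(stop_words_dir, "en-US")` with the path of your StopWords folder produces en-US.custom.txt, which you can then edit.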


StopWords are cached for 24 hours. Therefore, every time you modify the StopWords files, you must perform an iisreset for the changes to take effect immediately.