Stop-words

Stop-words support

Stop-Words

RediSearch has a pre-defined default list of stop-words. These are words that are usually so common that they do not add much information to search, but take up a lot of space and CPU time in the index.

When indexing, stop-words are discarded and not indexed. When searching, they are also ignored and treated as if they were not sent to the query processor. This is done when parsing the query.

At the moment, the default stop-word list applies to all full-text indexes in all languages and can be overridden manually at index creation time.

Default stop-word list

The following words are treated as stop-words by default:

 a,    is,    the,   an,   and,  are, as,  at,   be,   but,  by,   for,
 if,   in,    into,  it,   no,   not, of,  on,   or,   such, that, their,
 then, there, these, they, this, to,  was, will, with

Overriding the default stop-words

Stop-words for an index can be defined (or disabled completely) on index creation using the STOPWORDS argument in the [FT.CREATE command.

The format is STOPWORDS {number} {stopword} ... where number is the number of stopwords given. The STOPWORDS argument must come before the SCHEMA argument. For example:

FT.CREATE myIndex STOPWORDS 3 foo bar baz SCHEMA title TEXT body TEXT 

Disabling stop-words completely

Disabling stopwords completely can be done by passing STOPWORDS 0 on FT.CREATE.

Avoiding stop-word detection in search queries

In rare use cases, where queries are very long and are guaranteed by the client application to not contain stopwords, it is possible to avoid checking for them when parsing the query. This saves some CPU time and is only worth it if the query has dozens or more terms in it. Using this without verifying that the query doesn't contain stop-words might result in empty queries.

Rate this page