Rechercher dans le manuel MySQL
12.9.4 Full-Text Stopwords
The stopword list is loaded and searched for full-text queries
using the server character set and collation (the values of the
character_set_server
and
collation_server
system
variables). False hits or misses might occur for stopword
lookups if the stopword file or columns used for full-text
indexing or searches have a character set or collation different
from character_set_server
or
collation_server
.
Case sensitivity of stopword lookups depends on the server
collation. For example, lookups are case insensitive if the
collation is utf8mb4_0900_ai_ci
, whereas
lookups are case-sensitive if the collation is
utf8mb4_0900_as_cs
or
utf8mb4_bin
.
Stopwords for InnoDB Search Indexes
InnoDB
has a relatively short list of
default stopwords, because documents from technical, literary,
and other sources often use short words as keywords or in
significant phrases. For example, you might search for
“to be or not to be” and expect to get a sensible
result, rather than having all those words ignored.
To see the default InnoDB
stopword list,
query the
INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD
table.
- +-------+
- +-------+
- | a |
- | about |
- | an |
- | are |
- | at |
- | be |
- | by |
- | com |
- | de |
- | en |
- | for |
- | how |
- | i |
- | it |
- | la |
- | of |
- | that |
- | the |
- | this |
- | was |
- | what |
- | who |
- | will |
- | und |
- | the |
- | www |
- +-------+
To define your own stopword list for all
InnoDB
tables, define a table with the same
structure as the
INNODB_FT_DEFAULT_STOPWORD
table,
populate it with stopwords, and set the value of the
innodb_ft_server_stopword_table
option to a value in the form
before creating the full-text index. The stopword table must
have a single db_name
/table_name
VARCHAR
column
named value
. The following example
demonstrates creating and configuring a new global stopword
table for InnoDB
.
- -- Create a new stopword table
- Query OK, 0 rows affected (0.01 sec)
- -- Insert stopwords (for simplicity, a single stopword is used in this example)
- Query OK, 1 row affected (0.00 sec)
- -- Create the table
- Query OK, 0 rows affected (0.01 sec)
- -- Insert data into the table
- ('Call me Ishmael.','Herman Melville','Moby-Dick'),
- ('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
- ('I am an invisible man.','Ralph Ellison','Invisible Man'),
- ('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
- ('It was love at first sight.','Joseph Heller','Catch-22'),
- ('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
- ('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
- ('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
- Query OK, 8 rows affected (0.00 sec)
- -- Set the innodb_ft_server_stopword_table option to the new stopword table
- Query OK, 0 rows affected (0.00 sec)
- -- Create the full-text index (which rebuilds the table if no FTS_DOC_ID column is defined)
- Query OK, 0 rows affected, 1 warning (1.17 sec)
Verify that the specified stopword ('Ishmael') does not appear
by querying the words in
INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
.
By default, words less than 3 characters in length or
greater than 84 characters in length do not appear in an
InnoDB
full-text search index. Maximum
and minimum word length values are configurable using the
innodb_ft_max_token_size
and
innodb_ft_min_token_size
variables. This default behavior does not apply to the ngram
parser plugin. ngram token size is defined by the
ngram_token_size
option.
- Query OK, 0 rows affected (0.00 sec)
- +-----------+
- | word |
- +-----------+
- | across |
- | burn |
- | buy |
- | comes |
- | dalloway |
- | flowers |
- | happened |
- | herself |
- | invisible |
- | less |
- | love |
- | man |
- +-----------+
To create stopword lists on a table-by-table basis, create
other stopword tables and use the
innodb_ft_user_stopword_table
option to specify the stopword table that you want to use
before you create the full-text index.
The stopword file is loaded and searched using
latin1
if
character_set_server
is
ucs2
, utf16
,
utf16le
, or utf32
.
To override the default stopword list for MyISAM tables, set
the ft_stopword_file
system
variable. (See Section 5.1.8, “Server System Variables”.) The
variable value should be the path name of the file containing
the stopword list, or the empty string to disable stopword
filtering. The server looks for the file in the data directory
unless an absolute path name is given to specify a different
directory. After changing the value of this variable or the
contents of the stopword file, restart the server and rebuild
your FULLTEXT
indexes.
The stopword list is free-form, separating stopwords with any
nonalphanumeric character such as newline, space, or comma.
Exceptions are the underscore character (_
)
and a single apostrophe ('
) which are
treated as part of a word. The character set of the stopword
list is the server's default character set; see
Section 10.3.2, “Server Character Set and Collation”.
The following list shows the default stopwords for
MyISAM
search indexes. In a MySQL source
distribution, you can find this list in the
storage/myisam/ft_static.c
file.
a's able about above according
accordingly across actually after afterwards
again against ain't all allow
allows almost alone along already
also although always am among
amongst an and another any
anybody anyhow anyone anything anyway
anyways anywhere apart appear appreciate
appropriate are aren't around as
aside ask asking associated at
available away awfully be became
because become becomes becoming been
before beforehand behind being believe
below beside besides best better
between beyond both brief but
by c'mon c's came can
can't cannot cant cause causes
certain certainly changes clearly co
com come comes concerning consequently
consider considering contain containing contains
corresponding could couldn't course currently
definitely described despite did didn't
different do does doesn't doing
don't done down downwards during
each edu eg eight either
else elsewhere enough entirely especially
et etc even ever every
everybody everyone everything everywhere ex
exactly example except far few
fifth first five followed following
follows for former formerly forth
four from further furthermore get
gets getting given gives go
goes going gone got gotten
greetings had hadn't happens hardly
has hasn't have haven't having
he he's hello help hence
her here here's hereafter hereby
herein hereupon hers herself hi
him himself his hither hopefully
how howbeit however i'd i'll
i'm i've ie if ignored
immediate in inasmuch inc indeed
indicate indicated indicates inner insofar
instead into inward is isn't
it it'd it'll it's its
itself just keep keeps kept
know known knows last lately
later latter latterly least less
lest let let's like liked
likely little look looking looks
ltd mainly many may maybe
me mean meanwhile merely might
more moreover most mostly much
must my myself name namely
nd near nearly necessary need
needs neither never nevertheless new
next nine no nobody non
none noone nor normally not
nothing novel now nowhere obviously
of off often oh ok
okay old on once one
ones only onto or other
others otherwise ought our ours
ourselves out outside over overall
own particular particularly per perhaps
placed please plus possible presumably
probably provides que quite qv
rather rd re really reasonably
regarding regardless regards relatively respectively
right said same saw say
saying says second secondly see
seeing seem seemed seeming seems
seen self selves sensible sent
serious seriously seven several shall
she should shouldn't since six
so some somebody somehow someone
something sometime sometimes somewhat somewhere
soon sorry specified specify specifying
still sub such sup sure
t's take taken tell tends
th than thank thanks thanx
that that's thats the their
theirs them themselves then thence
there there's thereafter thereby therefore
therein theres thereupon these they
they'd they'll they're they've think
third this thorough thoroughly those
though three through throughout thru
thus to together too took
toward towards tried tries truly
try trying twice two un
under unfortunately unless unlikely until
unto up upon us use
used useful uses using usually
value various very via viz
vs want wants was wasn't
way we we'd we'll we're
we've welcome well went were
weren't what what's whatever when
whence whenever where where's whereafter
whereas whereby wherein whereupon wherever
whether which while whither who
who's whoever whole whom whose
why will willing wish with
within without won't wonder would
wouldn't yes yet you you'd
you'll you're you've your yours
yourself yourselves zero
Document created the 26/06/2006, last modified the 26/10/2018
Source of the printed document:https://www.gaudry.be/en/mysql-rf-fulltext-stopwords.html
The infobrol is a personal site whose content is my sole responsibility. The text is available under CreativeCommons license (BY-NC-SA). More info on the terms of use and the author.
References
These references and links indicate documents consulted during the writing of this page, or which may provide additional information, but the authors of these sources can not be held responsible for the content of this page.
The author This site is solely responsible for the way in which the various concepts, and the freedoms that are taken with the reference works, are presented here. Remember that you must cross multiple source information to reduce the risk of errors.