Rechercher dans le manuel MySQL
15.6.2.4 InnoDB FULLTEXT Indexes
FULLTEXT
indexes are created on text-based
columns (CHAR
,
VARCHAR
, or
TEXT
columns) to help speed up
queries and DML operations on data contained within those columns,
omitting any words that are defined as stopwords.
A FULLTEXT
index is defined as part of a
CREATE TABLE
statement or added to
an existing table using ALTER TABLE
or CREATE INDEX
.
Full-text search is performed using MATCH()
... AGAINST
syntax. For usage information, see
Section 12.9, “Full-Text Search Functions”.
InnoDB
FULLTEXT
indexes are
described under the following topics in this section:
InnoDB Full-Text Index Design
InnoDB
FULLTEXT
indexes
have an inverted index design. Inverted indexes store a list of
words, and for each word, a list of documents that the word
appears in. To support proximity search, position information
for each word is also stored, as a byte offset.
When creating an InnoDB
FULLTEXT
index, a set of index tables is
created, as shown in the following example:
- +----------+----------------------------------------------------+-------+
- +----------+----------------------------------------------------+-------+
- | 333 | test/fts_0000000000000147_00000000000001c9_index_1 | 289 |
- | 334 | test/fts_0000000000000147_00000000000001c9_index_2 | 290 |
- | 335 | test/fts_0000000000000147_00000000000001c9_index_3 | 291 |
- | 336 | test/fts_0000000000000147_00000000000001c9_index_4 | 292 |
- | 337 | test/fts_0000000000000147_00000000000001c9_index_5 | 293 |
- | 338 | test/fts_0000000000000147_00000000000001c9_index_6 | 294 |
- | 330 | test/fts_0000000000000147_being_deleted | 286 |
- | 331 | test/fts_0000000000000147_being_deleted_cache | 287 |
- | 332 | test/fts_0000000000000147_config | 288 |
- | 328 | test/fts_0000000000000147_deleted | 284 |
- | 329 | test/fts_0000000000000147_deleted_cache | 285 |
- | 327 | test/opening_lines | 283 |
- +----------+----------------------------------------------------+-------+
The first six tables represent the inverted index and are
referred to as auxiliary index tables. When incoming documents
are tokenized, the individual words (also referred to as
“tokens”) are inserted into the index tables along
with position information and the associated Document ID
(DOC_ID
). The words are fully sorted and
partitioned among the six index tables based on the character
set sort weight of the word's first character.
The inverted index is partitioned into six auxiliary index
tables to support parallel index creation. By default, two
threads tokenize, sort, and insert words and associated data
into the index tables. The number of threads is configurable
using the
innodb_ft_sort_pll_degree
option. Consider increasing the number of threads when creating
FULLTEXT
indexes on large tables.
Auxiliary index table names are prefixed with
fts_
and postfixed with
index_*
. Each index table is associated with
the indexed table by a hex value in the index table name that
matches the table_id
of the indexed table.
For example, the table_id
of the
test/opening_lines
table is
327
, for which the hex value is 0x147. As
shown in the preceding example, the “147” hex value
appears in the names of index tables that are associated with
the test/opening_lines
table.
A hex value representing the index_id
of the
FULLTEXT
index also appears in auxiliary
index table names. For example, in the auxiliary table name
test/fts_0000000000000147_00000000000001c9_index_1
,
the hex value 1c9
has a decimal value of 457.
The index defined on the opening_lines
table
(idx
) can be identified by querying the
INFORMATION_SCHEMA.INNODB_INDEXES
table for this value (457).
- +----------+------+----------+-------+
- +----------+------+----------+-------+
- | 457 | idx | 327 | 283 |
- +----------+------+----------+-------+
Index tables are stored in their own tablespace if the primary table is created in a file-per-table tablespace.
The other index tables shown in the preceding example are
referred to as common index tables and are used for deletion
handling and storing the internal state of
FULLTEXT
indexes. Unlike the inverted index
tables, which are created for each full-text index, this set of
tables is common to all full-text indexes created on a
particular table.
Common auxiliary tables are retained even if full-text indexes
are dropped. When a full-text index is dropped, the
FTS_DOC_ID
column that was created for the
index is retained, as removing the FTS_DOC_ID
column would require rebuilding the table. Common axillary
tables are required to manage the FTS_DOC_ID
column.
fts_*_deleted
andfts_*_deleted_cache
Contain the document IDs (DOC_ID) for documents that are deleted but whose data is not yet removed from the full-text index. The
fts_*_deleted_cache
is the in-memory version of thefts_*_deleted
table.fts_*_being_deleted
andfts_*_being_deleted_cache
Contain the document IDs (DOC_ID) for documents that are deleted and whose data is currently in the process of being removed from the full-text index. The
fts_*_being_deleted_cache
table is the in-memory version of thefts_*_being_deleted
table.fts_*_config
Stores information about the internal state of the
FULLTEXT
index. Most importantly, it stores theFTS_SYNCED_DOC_ID
, which identifies documents that have been parsed and flushed to disk. In case of crash recovery,FTS_SYNCED_DOC_ID
values are used to identify documents that have not been flushed to disk so that the documents can be re-parsed and added back to theFULLTEXT
index cache. To view the data in this table, query theINFORMATION_SCHEMA.INNODB_FT_CONFIG
table.
When a document is inserted, it is tokenized, and the individual
words and associated data are inserted into the
FULLTEXT
index. This process, even for small
documents, could result in numerous small insertions into the
auxiliary index tables, making concurrent access to these tables
a point of contention. To avoid this problem,
InnoDB
uses a FULLTEXT
index cache to temporarily cache index table insertions for
recently inserted rows. This in-memory cache structure holds
insertions until the cache is full and then batch flushes them
to disk (to the auxiliary index tables). You can query the
INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
table to view tokenized data for recently inserted rows.
The caching and batch flushing behavior avoids frequent updates to auxiliary index tables, which could result in concurrent access issues during busy insert and update times. The batching technique also avoids multiple insertions for the same word, and minimizes duplicate entries. Instead of flushing each word individually, insertions for the same word are merged and flushed to disk as a single entry, improving insertion efficiency while keeping auxiliary index tables as small as possible.
The innodb_ft_cache_size
variable is used to configure the full-text index cache size (on
a per-table basis), which affects how often the full-text index
cache is flushed. You can also define a global full-text index
cache size limit for all tables in a given instance using the
innodb_ft_total_cache_size
option.
The full-text index cache stores the same information as auxiliary index tables. However, the full-text index cache only caches tokenized data for recently inserted rows. The data that is already flushed to disk (to the full-text auxiliary tables) is not brought back into the full-text index cache when queried. The data in auxiliary index tables is queried directly, and results from the auxiliary index tables are merged with results from the full-text index cache before being returned.
InnoDB
uses a unique document identifier
referred to as a Document ID (DOC_ID
) to map
words in the full-text index to document records where the word
appears. The mapping requires an FTS_DOC_ID
column on the indexed table. If an FTS_DOC_ID
column is not defined, InnoDB
automatically
adds a hidden FTS_DOC_ID
column when the
full-text index is created. The following example demonstrates
this behavior.
The following table definition does not include an
FTS_DOC_ID
column:
When you create a full-text index on the table using
CREATE FULLTEXT INDEX
syntax, a warning is
returned which reports that InnoDB
is
rebuilding the table to add the FTS_DOC_ID
column.
- Query OK, 0 rows affected, 1 warning (0.19 sec)
- +---------+------+--------------------------------------------------+
- | Level | Code | Message |
- +---------+------+--------------------------------------------------+
- +---------+------+--------------------------------------------------+
The same warning is returned when using
ALTER TABLE
to add a full-text
index to a table that does not have an
FTS_DOC_ID
column. If you create a full-text
index at CREATE TABLE
time and do
not specify an FTS_DOC_ID
column,
InnoDB
adds a hidden
FTS_DOC_ID
column, without warning.
Defining an FTS_DOC_ID
column at
CREATE TABLE
time is less
expensive than creating a full-text index on a table that is
already loaded with data. If an FTS_DOC_ID
column is defined on a table prior to loading data, the table
and its indexes do not have to be rebuilt to add the new column.
If you are not concerned with CREATE FULLTEXT
INDEX
performance, leave out the
FTS_DOC_ID
column to have
InnoDB
create it for you.
InnoDB
creates a hidden
FTS_DOC_ID
column along with a unique index
(FTS_DOC_ID_INDEX
) on the
FTS_DOC_ID
column. If you want to create your
own FTS_DOC_ID
column, the column must be
defined as BIGINT UNSIGNED NOT NULL
and named
FTS_DOC_ID
(all upper case), as in the
following example:
The FTS_DOC_ID
column does not need to be
defined as an AUTO_INCREMENT
column, but
AUTO_INCREMENT
could make loading data
easier.
If you choose to define the FTS_DOC_ID
column
yourself, you are responsible for managing the column to avoid
empty or duplicate values. FTS_DOC_ID
values
cannot be reused, which means FTS_DOC_ID
values must be ever increasing.
Optionally, you can create the required unique
FTS_DOC_ID_INDEX
(all upper case) on the
FTS_DOC_ID
column.
If you do not create the FTS_DOC_ID_INDEX
,
InnoDB
creates it automatically.
FTS_DOC_ID_INDEX
cannot be defined as a
descending index because the InnoDB
SQL
parser does not use descending indexes.
The permitted gap between the largest used
FTS_DOC_ID
value and new
FTS_DOC_ID
value is 65535.
To avoid rebuilding the table, the FTS_DOC_ID
column is retained when dropping a full-text index.
Deleting a record that has a full-text index column could result
in numerous small deletions in the auxiliary index tables,
making concurrent access to these tables a point of contention.
To avoid this problem, the Document ID
(DOC_ID
) of a deleted document is logged in a
special FTS_*_DELETED
table whenever a record
is deleted from an indexed table, and the indexed record remains
in the full-text index. Before returning query results,
information in the FTS_*_DELETED
table is
used to filter out deleted Document IDs. The benefit of this
design is that deletions are fast and inexpensive. The drawback
is that the size of the index is not immediately reduced after
deleting records. To remove full-text index entries for deleted
records, run OPTIMIZE TABLE
on the indexed
table with
innodb_optimize_fulltext_only=ON
to rebuild the full-text index. For more information, see
Optimizing InnoDB Full-Text Indexes.
InnoDB
FULLTEXT
indexes
have special transaction handling characteristics due its
caching and batch processing behavior. Specifically, updates and
insertions on a FULLTEXT
index are processed
at transaction commit time, which means that a
FULLTEXT
search can only see committed data.
The following example demonstrates this behavior. The
FULLTEXT
search only returns a result after
the inserted lines are committed.
- ('Call me Ishmael.','Herman Melville','Moby-Dick'),
- ('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
- ('I am an invisible man.','Ralph Ellison','Invisible Man'),
- ('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
- ('It was love at first sight.','Joseph Heller','Catch-22'),
- ('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
- ('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
- ('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
- +----------+
- +----------+
- | 0 |
- +----------+
- +----------+
- +----------+
- | 1 |
- +----------+
You can monitor and examine the special text-processing aspects
of InnoDB
FULLTEXT
indexes
by querying the following INFORMATION_SCHEMA
tables:
You can also view basic information for
FULLTEXT
indexes and tables by querying
INNODB_INDEXES
and
INNODB_TABLES
.
For more information, see Section 15.14.4, “InnoDB INFORMATION_SCHEMA FULLTEXT Index Tables”.
Traduction non disponible
Le manuel MySQL n'est pas encore traduit en français sur l'infobrol. Seule la version anglaise est disponible pour l'instant.
Document créé le 26/06/2006, dernière modification le 26/10/2018
Source du document imprimé : https://www.gaudry.be/mysql-rf-innodb-fulltext-index.html
L'infobrol est un site personnel dont le contenu n'engage que moi. Le texte est mis à disposition sous licence CreativeCommons(BY-NC-SA). Plus d'info sur les conditions d'utilisation et sur l'auteur.
Références
Ces références et liens indiquent des documents consultés lors de la rédaction de cette page, ou qui peuvent apporter un complément d'information, mais les auteurs de ces sources ne peuvent être tenus responsables du contenu de cette page.
L'auteur de ce site est seul responsable de la manière dont sont présentés ici les différents concepts, et des libertés qui sont prises avec les ouvrages de référence. N'oubliez pas que vous devez croiser les informations de sources multiples afin de diminuer les risques d'erreurs.