Rechercher dans le manuel MySQL

15.6.2.4 InnoDB FULLTEXT Indexes

FULLTEXT indexes are created on text-based columns (CHAR, VARCHAR, or TEXT columns) to help speed up queries and DML operations on data contained within those columns, omitting any words that are defined as stopwords.

A FULLTEXT index is defined as part of a CREATE TABLE statement or added to an existing table using ALTER TABLE or CREATE INDEX.

Full-text search is performed using MATCH() ... AGAINST syntax. For usage information, see Section 12.9, “Full-Text Search Functions”.

InnoDB FULLTEXT indexes are described under the following topics in this section:

InnoDB Full-Text Index Design

InnoDB FULLTEXT indexes have an inverted index design. Inverted indexes store a list of words, and for each word, a list of documents that the word appears in. To support proximity search, position information for each word is also stored, as a byte offset.

Table des matières Haut

InnoDB Full-Text Index Tables

When creating an InnoDB FULLTEXT index, a set of index tables is created, as shown in the following example:

  1. mysql> CREATE TABLE opening_lines (
  2.        opening_line TEXT(500),
  3.        author VARCHAR(200),
  4.        title VARCHAR(200),
  5.        FULLTEXT idx (opening_line)
  6.        ) ENGINE=InnoDB;
  7.  
  8. mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_TABLES
  9.        WHERE name LIKE 'test/%';
  10. +----------+----------------------------------------------------+-------+
  11. | table_id | name                                               | space |
  12. +----------+----------------------------------------------------+-------+
  13. |      333 | test/fts_0000000000000147_00000000000001c9_index_1 |   289 |
  14. |      334 | test/fts_0000000000000147_00000000000001c9_index_2 |   290 |
  15. |      335 | test/fts_0000000000000147_00000000000001c9_index_3 |   291 |
  16. |      336 | test/fts_0000000000000147_00000000000001c9_index_4 |   292 |
  17. |      337 | test/fts_0000000000000147_00000000000001c9_index_5 |   293 |
  18. |      338 | test/fts_0000000000000147_00000000000001c9_index_6 |   294 |
  19. |      330 | test/fts_0000000000000147_being_deleted            |   286 |
  20. |      331 | test/fts_0000000000000147_being_deleted_cache      |   287 |
  21. |      332 | test/fts_0000000000000147_config                   |   288 |
  22. |      328 | test/fts_0000000000000147_deleted                  |   284 |
  23. |      329 | test/fts_0000000000000147_deleted_cache            |   285 |
  24. |      327 | test/opening_lines                                 |   283 |
  25. +----------+----------------------------------------------------+-------+

The first six tables represent the inverted index and are referred to as auxiliary index tables. When incoming documents are tokenized, the individual words (also referred to as tokens) are inserted into the index tables along with position information and the associated Document ID (DOC_ID). The words are fully sorted and partitioned among the six index tables based on the character set sort weight of the word's first character.

The inverted index is partitioned into six auxiliary index tables to support parallel index creation. By default, two threads tokenize, sort, and insert words and associated data into the index tables. The number of threads is configurable using the innodb_ft_sort_pll_degree option. Consider increasing the number of threads when creating FULLTEXT indexes on large tables.

Auxiliary index table names are prefixed with fts_ and postfixed with index_*. Each index table is associated with the indexed table by a hex value in the index table name that matches the table_id of the indexed table. For example, the table_id of the test/opening_lines table is 327, for which the hex value is 0x147. As shown in the preceding example, the 147 hex value appears in the names of index tables that are associated with the test/opening_lines table.

A hex value representing the index_id of the FULLTEXT index also appears in auxiliary index table names. For example, in the auxiliary table name test/fts_0000000000000147_00000000000001c9_index_1, the hex value 1c9 has a decimal value of 457. The index defined on the opening_lines table (idx) can be identified by querying the INFORMATION_SCHEMA.INNODB_INDEXES table for this value (457).

  1. mysql> SELECT index_id, name, table_id, space from INFORMATION_SCHEMA.INNODB_INDEXES
  2.        WHERE index_id=457;
  3. +----------+------+----------+-------+
  4. | index_id | name | table_id | space |
  5. +----------+------+----------+-------+
  6. |      457 | idx  |      327 |   283 |
  7. +----------+------+----------+-------+

Index tables are stored in their own tablespace if the primary table is created in a file-per-table tablespace.

The other index tables shown in the preceding example are referred to as common index tables and are used for deletion handling and storing the internal state of FULLTEXT indexes. Unlike the inverted index tables, which are created for each full-text index, this set of tables is common to all full-text indexes created on a particular table.

Common auxiliary tables are retained even if full-text indexes are dropped. When a full-text index is dropped, the FTS_DOC_ID column that was created for the index is retained, as removing the FTS_DOC_ID column would require rebuilding the table. Common axillary tables are required to manage the FTS_DOC_ID column.

  • fts_*_deleted and fts_*_deleted_cache

    Contain the document IDs (DOC_ID) for documents that are deleted but whose data is not yet removed from the full-text index. The fts_*_deleted_cache is the in-memory version of the fts_*_deleted table.

  • fts_*_being_deleted and fts_*_being_deleted_cache

    Contain the document IDs (DOC_ID) for documents that are deleted and whose data is currently in the process of being removed from the full-text index. The fts_*_being_deleted_cache table is the in-memory version of the fts_*_being_deleted table.

  • fts_*_config

    Stores information about the internal state of the FULLTEXT index. Most importantly, it stores the FTS_SYNCED_DOC_ID, which identifies documents that have been parsed and flushed to disk. In case of crash recovery, FTS_SYNCED_DOC_ID values are used to identify documents that have not been flushed to disk so that the documents can be re-parsed and added back to the FULLTEXT index cache. To view the data in this table, query the INFORMATION_SCHEMA.INNODB_FT_CONFIG table.

Table des matières Haut

InnoDB Full-Text Index Cache

When a document is inserted, it is tokenized, and the individual words and associated data are inserted into the FULLTEXT index. This process, even for small documents, could result in numerous small insertions into the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, InnoDB uses a FULLTEXT index cache to temporarily cache index table insertions for recently inserted rows. This in-memory cache structure holds insertions until the cache is full and then batch flushes them to disk (to the auxiliary index tables). You can query the INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE table to view tokenized data for recently inserted rows.

The caching and batch flushing behavior avoids frequent updates to auxiliary index tables, which could result in concurrent access issues during busy insert and update times. The batching technique also avoids multiple insertions for the same word, and minimizes duplicate entries. Instead of flushing each word individually, insertions for the same word are merged and flushed to disk as a single entry, improving insertion efficiency while keeping auxiliary index tables as small as possible.

The innodb_ft_cache_size variable is used to configure the full-text index cache size (on a per-table basis), which affects how often the full-text index cache is flushed. You can also define a global full-text index cache size limit for all tables in a given instance using the innodb_ft_total_cache_size option.

The full-text index cache stores the same information as auxiliary index tables. However, the full-text index cache only caches tokenized data for recently inserted rows. The data that is already flushed to disk (to the full-text auxiliary tables) is not brought back into the full-text index cache when queried. The data in auxiliary index tables is queried directly, and results from the auxiliary index tables are merged with results from the full-text index cache before being returned.

Table des matières Haut

InnoDB Full-Text Index Document ID and FTS_DOC_ID Column

InnoDB uses a unique document identifier referred to as a Document ID (DOC_ID) to map words in the full-text index to document records where the word appears. The mapping requires an FTS_DOC_ID column on the indexed table. If an FTS_DOC_ID column is not defined, InnoDB automatically adds a hidden FTS_DOC_ID column when the full-text index is created. The following example demonstrates this behavior.

The following table definition does not include an FTS_DOC_ID column:

  1. mysql> CREATE TABLE opening_lines (
  2.        opening_line TEXT(500),
  3.        author VARCHAR(200),
  4.        title VARCHAR(200)
  5.        ) ENGINE=InnoDB;

When you create a full-text index on the table using CREATE FULLTEXT INDEX syntax, a warning is returned which reports that InnoDB is rebuilding the table to add the FTS_DOC_ID column.

  1. mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
  2. Query OK, 0 rows affected, 1 warning (0.19 sec)
  3. Records: 0  Duplicates: 0  Warnings: 1
  4.  
  5. mysql> SHOW WARNINGS;
  6. +---------+------+--------------------------------------------------+
  7. | Level   | Code | Message                                          |
  8. +---------+------+--------------------------------------------------+
  9. | Warning |  124 | InnoDB rebuilding table to add column FTS_DOC_ID |
  10. +---------+------+--------------------------------------------------+

The same warning is returned when using ALTER TABLE to add a full-text index to a table that does not have an FTS_DOC_ID column. If you create a full-text index at CREATE TABLE time and do not specify an FTS_DOC_ID column, InnoDB adds a hidden FTS_DOC_ID column, without warning.

Defining an FTS_DOC_ID column at CREATE TABLE time is less expensive than creating a full-text index on a table that is already loaded with data. If an FTS_DOC_ID column is defined on a table prior to loading data, the table and its indexes do not have to be rebuilt to add the new column. If you are not concerned with CREATE FULLTEXT INDEX performance, leave out the FTS_DOC_ID column to have InnoDB create it for you. InnoDB creates a hidden FTS_DOC_ID column along with a unique index (FTS_DOC_ID_INDEX) on the FTS_DOC_ID column. If you want to create your own FTS_DOC_ID column, the column must be defined as BIGINT UNSIGNED NOT NULL and named FTS_DOC_ID (all upper case), as in the following example:

Note

The FTS_DOC_ID column does not need to be defined as an AUTO_INCREMENT column, but AUTO_INCREMENT could make loading data easier.

  1. mysql> CREATE TABLE opening_lines (
  2.        opening_line TEXT(500),
  3.        author VARCHAR(200),
  4.        title VARCHAR(200)
  5.        ) ENGINE=InnoDB;

If you choose to define the FTS_DOC_ID column yourself, you are responsible for managing the column to avoid empty or duplicate values. FTS_DOC_ID values cannot be reused, which means FTS_DOC_ID values must be ever increasing.

Optionally, you can create the required unique FTS_DOC_ID_INDEX (all upper case) on the FTS_DOC_ID column.

  1. mysql> CREATE UNIQUE INDEX FTS_DOC_ID_INDEX on opening_lines(FTS_DOC_ID);

If you do not create the FTS_DOC_ID_INDEX, InnoDB creates it automatically.

Note

FTS_DOC_ID_INDEX cannot be defined as a descending index because the InnoDB SQL parser does not use descending indexes.

The permitted gap between the largest used FTS_DOC_ID value and new FTS_DOC_ID value is 65535.

To avoid rebuilding the table, the FTS_DOC_ID column is retained when dropping a full-text index.

Table des matières Haut

InnoDB Full-Text Index Deletion Handling

Deleting a record that has a full-text index column could result in numerous small deletions in the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, the Document ID (DOC_ID) of a deleted document is logged in a special FTS_*_DELETED table whenever a record is deleted from an indexed table, and the indexed record remains in the full-text index. Before returning query results, information in the FTS_*_DELETED table is used to filter out deleted Document IDs. The benefit of this design is that deletions are fast and inexpensive. The drawback is that the size of the index is not immediately reduced after deleting records. To remove full-text index entries for deleted records, run OPTIMIZE TABLE on the indexed table with innodb_optimize_fulltext_only=ON to rebuild the full-text index. For more information, see Optimizing InnoDB Full-Text Indexes.

Table des matières Haut

InnoDB Full-Text Index Transaction Handling

InnoDB FULLTEXT indexes have special transaction handling characteristics due its caching and batch processing behavior. Specifically, updates and insertions on a FULLTEXT index are processed at transaction commit time, which means that a FULLTEXT search can only see committed data. The following example demonstrates this behavior. The FULLTEXT search only returns a result after the inserted lines are committed.

  1. mysql> CREATE TABLE opening_lines (
  2.        opening_line TEXT(500),
  3.        author VARCHAR(200),
  4.        title VARCHAR(200),
  5.        FULLTEXT idx (opening_line)
  6.        ) ENGINE=InnoDB;
  7.  
  8. mysql> BEGIN;
  9.  
  10. mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
  11.        ('Call me Ishmael.','Herman Melville','Moby-Dick'),
  12.        ('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
  13.        ('I am an invisible man.','Ralph Ellison','Invisible Man'),
  14.        ('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
  15.        ('It was love at first sight.','Joseph Heller','Catch-22'),
  16.        ('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
  17.        ('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
  18.        ('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
  19.  
  20. mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');
  21. +----------+
  22. | COUNT(*) |
  23. +----------+
  24. |        0 |
  25. +----------+
  26.  
  27. mysql> COMMIT;
  28.  
  29. mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');
  30. +----------+
  31. | COUNT(*) |
  32. +----------+
  33. |        1 |
  34. +----------+

Table des matières Haut

Monitoring InnoDB Full-Text Indexes

You can monitor and examine the special text-processing aspects of InnoDB FULLTEXT indexes by querying the following INFORMATION_SCHEMA tables:

You can also view basic information for FULLTEXT indexes and tables by querying INNODB_INDEXES and INNODB_TABLES.

For more information, see Section 15.14.4, “InnoDB INFORMATION_SCHEMA FULLTEXT Index Tables”.


Rechercher dans le manuel MySQL

Traduction non disponible

Le manuel MySQL n'est pas encore traduit en français sur l'infobrol. Seule la version anglaise est disponible pour l'instant.

Document créé le 26/06/2006, dernière modification le 26/10/2018
Source du document imprimé : https://www.gaudry.be/mysql-rf-innodb-fulltext-index.html

L'infobrol est un site personnel dont le contenu n'engage que moi. Le texte est mis à disposition sous licence CreativeCommons(BY-NC-SA). Plus d'info sur les conditions d'utilisation et sur l'auteur.

Références

  1. Consulter le document html Langue du document :en Manuel MySQL : https://dev.mysql.com/

Ces références et liens indiquent des documents consultés lors de la rédaction de cette page, ou qui peuvent apporter un complément d'information, mais les auteurs de ces sources ne peuvent être tenus responsables du contenu de cette page.
L'auteur de ce site est seul responsable de la manière dont sont présentés ici les différents concepts, et des libertés qui sont prises avec les ouvrages de référence. N'oubliez pas que vous devez croiser les informations de sources multiples afin de diminuer les risques d'erreurs.

Table des matières Haut