8.5.5 Bulk Data Loading for InnoDB Tables

Contents

These performance tips supplement the general guidelines for fast inserts in Section 8.2.5.1, “Optimizing INSERT Statements”.

When importing data into InnoDB, turn off autocommit mode, because it performs a log flush to disk for every insert. To disable autocommit during your import operation, surround it with SET autocommit and COMMIT statements:
1. SET autocommit=0;
2. ... SQL import statements ...
3. COMMIT;
The mysqldump option --opt creates dump files that are fast to import into an InnoDB table, even without wrapping them with the SET autocommit and COMMIT statements.
If you have UNIQUE constraints on secondary keys, you can speed up table imports by temporarily turning off the uniqueness checks during the import session:
1. SET unique_checks=0;
2. ... SQL import statements ...
3. SET unique_checks=1;
For big tables, this saves a lot of disk I/O because InnoDB can use its change buffer to write secondary index records in a batch. Be certain that the data contains no duplicate keys.
If you have FOREIGN KEY constraints in your tables, you can speed up table imports by turning off the foreign key checks for the duration of the import session:
1. SET foreign_key_checks=0;
2. ... SQL import statements ...
3. SET foreign_key_checks=1;
For big tables, this can save a lot of disk I/O.
Use the multiple-row INSERT syntax to reduce communication overhead between the client and the server if you need to insert many rows:
1. INSERT INTO yourtable VALUES (1,2), (5,5), ...;
This tip is valid for inserts into any table, not just InnoDB tables.
When doing bulk inserts into tables with auto-increment columns, set innodb_autoinc_lock_mode to 2 (interleaved) instead of 1 (consecutive). See Section 15.6.1.4, “AUTO_INCREMENT Handling in InnoDB” for details.
When performing bulk inserts, it is faster to insert rows in PRIMARY KEY order. InnoDB tables use a clustered index, which makes it relatively fast to use data in the order of the PRIMARY KEY. Performing bulk inserts in PRIMARY KEY order is particularly important for tables that do not fit entirely within the buffer pool.
For optimal performance when loading data into an InnoDB FULLTEXT index, follow this set of steps:
1. Define a column FTS_DOC_ID at table creation time, of type BIGINT UNSIGNED NOT NULL, with a unique index named FTS_DOC_ID_INDEX. For example:
  1. CREATE TABLE t1 (
  2. FTS_DOC_ID BIGINT unsigned NOT NULL AUTO_INCREMENT,
  3. title varchar(255) NOT NULL DEFAULT '',
  4. text mediumtext NOT NULL,
  5. PRIMARY KEY (`FTS_DOC_ID`)
  6. ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
  7. CREATE UNIQUE INDEX FTS_DOC_ID_INDEX on t1(FTS_DOC_ID);
2. Load the data into the table.
3. Create the FULLTEXT index after the data is loaded.
Note

When adding FTS_DOC_ID column at table creation time, ensure that the FTS_DOC_ID column is updated when the FULLTEXT indexed column is updated, as the FTS_DOC_ID must increase monotonically with each INSERT or UPDATE. If you choose not to add the FTS_DOC_ID at table creation time and have InnoDB manage DOC IDs for you, InnoDB will add the FTS_DOC_ID as a hidden column with the next CREATE FULLTEXT INDEX call. This approach, however, requires a table rebuild which will impact performance.

Find a PHP function

Document created the 26/06/2006, last modified the 26/10/2018
Source of the printed document:https://www.gaudry.be/en/mysql-rf-optimizing-innodb-bulk-data-loading.html

The infobrol is a personal site whose content is my sole responsibility. The text is available under CreativeCommons license (BY-NC-SA). More info on the terms of use and the author.

References

Manuel MySQL : https://dev.mysql.com/

These references and links indicate documents consulted during the writing of this page, or which may provide additional information, but the authors of these sources can not be held responsible for the content of this page.
The author of this site is solely responsible for the way in which the various concepts, and the freedoms that are taken with the reference works, are presented here. Remember that you must cross multiple source information to reduce the risk of errors.