Rechercher dans le manuel MySQL
15.8.10.1 Configuring Persistent Optimizer Statistics Parameters
The persistent optimizer statistics feature improves plan stability by storing statistics to disk and making them persistent across server restarts so that the optimizer is more likely to make consistent choices each time for a given query.
Optimizer statistics are persisted to disk when
innodb_stats_persistent=ON
or
when individual tables are created or altered with
STATS_PERSISTENT=1
.
innodb_stats_persistent
is
enabled by default.
Formerly, optimizer statistics were cleared on each server restart and after some other operations, and recomputed on the next table access. Consequently, different estimates could be produced when recalculating statistics, leading to different choices in query execution plans and thus variations in query performance.
Persistent statistics are stored in the
mysql.innodb_table_stats
and
mysql.innodb_index_stats
tables, as described
in Section 15.8.10.1.5, “InnoDB Persistent Statistics Tables”.
To revert to using non-persistent optimizer statistics, you can
modify tables using an ALTER TABLE
statement. For related information, see
Section 15.8.10.2, “Configuring Non-Persistent Optimizer Statistics Parameters”
tbl_name
STATS_PERSISTENT=0
The innodb_stats_auto_recalc
configuration option, which is enabled by default, determines
whether statistics are calculated automatically whenever a
table undergoes substantial changes (to more than 10% of the
rows). You can also configure automatic statistics
recalculation for individual tables using a
STATS_AUTO_RECALC
clause in a
CREATE TABLE
or
ALTER TABLE
statement.
innodb_stats_auto_recalc
is
enabled by default.
Because of the asynchronous nature of automatic statistics
recalculation (which occurs in the background), statistics may
not be recalculated instantly after running a DML operation
that affects more than 10% of a table, even when
innodb_stats_auto_recalc
is
enabled. In some cases, statistics recalculation may be
delayed by a few seconds. If up-to-date statistics are
required immediately after changing significant portions of a
table, run ANALYZE TABLE
to
initiate a synchronous (foreground) recalculation of
statistics.
If innodb_stats_auto_recalc
is disabled, ensure the accuracy of optimizer statistics by
issuing the ANALYZE TABLE
statement for each applicable table after making substantial
changes to indexed columns. You might run this statement in
your setup scripts after representative data has been loaded
into the table, and run it periodically after DML operations
significantly change the contents of indexed columns, or on a
schedule at times of low activity. When a new index is added
to an existing table, or a column is added or dropped, index
statistics are calculated and added to the
innodb_index_stats
table regardless of the
value of
innodb_stats_auto_recalc
.
To ensure statistics are gathered when a new index is
created, either enable the
innodb_stats_auto_recalc
option, or run ANALYZE TABLE
after creating each new index when the persistent statistics
mode is enabled.
15.8.10.1.2 Configuring Optimizer Statistics Parameters for Individual Tables
innodb_stats_persistent
,
innodb_stats_auto_recalc
, and
innodb_stats_persistent_sample_pages
are global configuration options. To override these
system-wide settings and configure optimizer statistics
parameters for individual tables, you can define
STATS_PERSISTENT
,
STATS_AUTO_RECALC
, and
STATS_SAMPLE_PAGES
clauses in
CREATE TABLE
or
ALTER TABLE
statements.
STATS_PERSISTENT
specifies whether to enable persistent statistics for anInnoDB
table. The valueDEFAULT
causes the persistent statistics setting for the table to be determined by theinnodb_stats_persistent
configuration option. The value1
enables persistent statistics for the table, while the value0
turns off this feature. After enabling persistent statistics through aCREATE TABLE
orALTER TABLE
statement, issue anANALYZE TABLE
statement to calculate the statistics, after loading representative data into the table.STATS_AUTO_RECALC
specifies whether to automatically recalculate persistent statistics for anInnoDB
table. The valueDEFAULT
causes the persistent statistics setting for the table to be determined by theinnodb_stats_auto_recalc
configuration option. The value1
causes statistics to be recalculated when 10% of the data in the table has changed. The value0
prevents automatic recalculation for this table; with this setting, issue anANALYZE TABLE
statement to recalculate the statistics after making substantial changes to the table.STATS_SAMPLE_PAGES
specifies the number of index pages to sample when estimating cardinality and other statistics for an indexed column, such as those calculated byANALYZE TABLE
.
All three clauses are specified in the following
CREATE TABLE
example:
The MySQL query optimizer uses estimated
statistics about key
distributions to choose the indexes for an execution plan,
based on the relative
selectivity of the
index. Operations such as ANALYZE
TABLE
cause InnoDB
to sample
random pages from each index on a table to estimate the
cardinality of the
index. (This technique is known as
random dives.)
To give you control over the quality of the statistics
estimate (and thus better information for the query
optimizer), you can change the number of sampled pages using
the parameter
innodb_stats_persistent_sample_pages
,
which can be set at runtime.
innodb_stats_persistent_sample_pages
has a default value of 20. As a general guideline, consider
modifying this parameter when encountering the following
issues:
Statistics are not accurate enough and the optimizer chooses suboptimal plans, as shown by
EXPLAIN
output. The accuracy of statistics can be checked by comparing the actual cardinality of an index (as returned by runningSELECT DISTINCT
on the index columns) with the estimates provided in themysql.innodb_index_stats
persistent statistics table.If it is determined that statistics are not accurate enough, the value of
innodb_stats_persistent_sample_pages
should be increased until the statistics estimates are sufficiently accurate. Increasinginnodb_stats_persistent_sample_pages
too much, however, could causeANALYZE TABLE
to run slowly.ANALYZE TABLE
is too slow. In this caseinnodb_stats_persistent_sample_pages
should be decreased untilANALYZE TABLE
execution time is acceptable. Decreasing the value too much, however, could lead to the first problem of inaccurate statistics and suboptimal query execution plans.If a balance cannot be achieved between accurate statistics and
ANALYZE TABLE
execution time, consider decreasing the number of indexed columns in the table or limiting the number of partitions to reduceANALYZE TABLE
complexity. The number of columns in the table's primary key is also important to consider, as primary key columns are appended to each nonunique index.For related information, see Section 15.8.10.3, “Estimating ANALYZE TABLE Complexity for InnoDB Tables”.
By default, InnoDB
reads uncommitted data
when calculating statistics. In the case of an uncommitted
transaction that deletes rows from a table,
InnoDB
excludes records that are
delete-marked when calculating row estimates and index
statistics, which can lead to non-optimal execution plans for
other transactions that are operating on the table
concurrently using a transaction isolation level other than
READ UNCOMMITTED
. To avoid
this scenario,
innodb_stats_include_delete_marked
can be enabled to ensure that InnoDB
includes delete-marked records when calculating persistent
optimizer statistics.
When
innodb_stats_include_delete_marked
is enabled, ANALYZE TABLE
considers delete-marked records when recalculating statistics.
innodb_stats_include_delete_marked
is a global setting that affects all InnoDB
tables, and it is only applicable to persistent optimizer
statistics.
The persistent statistics feature relies on the internally
managed tables in the mysql
database, named
innodb_table_stats
and
innodb_index_stats
. These tables are set up
automatically in all install, upgrade, and build-from-source
procedures.
Table 15.7 Columns of innodb_table_stats
Column name | Description |
---|---|
database_name |
Database name |
table_name |
Table name, partition name, or subpartition name |
last_update |
A timestamp indicating the last time that InnoDB
updated this row |
n_rows |
The number of rows in the table |
clustered_index_size |
The size of the primary index, in pages |
sum_of_other_index_sizes |
The total size of other (non-primary) indexes, in pages |
Table 15.8 Columns of innodb_index_stats
Column name | Description |
---|---|
database_name |
Database name |
table_name |
Table name, partition name, or subpartition name |
index_name |
Index name |
last_update |
A timestamp indicating the last time that InnoDB
updated this row |
stat_name |
The name of the statistic, whose value is reported in the
stat_value column |
stat_value |
The value of the statistic that is named in stat_name
column |
sample_size |
The number of pages sampled for the estimate provided in the
stat_value column |
stat_description |
Description of the statistic that is named in the
stat_name column |
Both the innodb_table_stats
and
innodb_index_stats
tables include a
last_update
column showing when
InnoDB
last updated index statistics, as
shown in the following example:
- *************************** 1. row ***************************
- database_name: sakila
- table_name: actor
- last_update: 2014-05-28 16:16:44
- n_rows: 200
- clustered_index_size: 1
- sum_of_other_index_sizes: 1
- ...
- *************************** 1. row ***************************
- database_name: sakila
- table_name: actor
- index_name: PRIMARY
- last_update: 2014-05-28 16:16:44
- stat_name: n_diff_pfx01
- stat_value: 200
- sample_size: 1
- ...
The innodb_table_stats
and
innodb_index_stats
tables are ordinary
tables and can be updated manually. The ability to update
statistics manually makes it possible to force a specific
query optimization plan or test alternative plans without
modifying the database. If you manually update statistics,
issue the FLUSH TABLE
command to make
MySQL reload the updated statistics.
tbl_name
Persistent statistics are considered local information,
because they relate to the server instance. The
innodb_table_stats
and
innodb_index_stats
tables are therefore not
replicated when automatic statistics recalculation takes
place. If you run ANALYZE TABLE
to initiate a synchronous recalculation of statistics, this
statement is replicated (unless you suppressed logging for
it), and recalculation takes place on the replication slaves.
The innodb_table_stats
table contains one
row per table. The data collected is demonstrated in the
following example.
Table t1
contains a primary index (columns
a
, b
) secondary index
(columns c
, d
), and
unique index (columns e
,
f
):
After inserting five rows of sample data, the table appears as follows:
- +---+---+------+------+------+------+
- | a | b | c | d | e | f |
- +---+---+------+------+------+------+
- | 1 | 1 | 10 | 11 | 100 | 101 |
- | 1 | 2 | 10 | 11 | 200 | 102 |
- | 1 | 3 | 10 | 11 | 100 | 103 |
- | 1 | 4 | 10 | 12 | 200 | 104 |
- | 1 | 5 | 10 | 12 | 100 | 105 |
- +---+---+------+------+------+------+
To immediately update statistics, run
ANALYZE TABLE
(if
innodb_stats_auto_recalc
is
enabled, statistics are updated automatically within a few
seconds assuming that the 10% threshold for changed table rows
is reached):
- +---------+---------+----------+----------+
- +---------+---------+----------+----------+
- +---------+---------+----------+----------+
Table statistics for table t1
show the last
time InnoDB
updated the table statistics
(2014-03-14 14:36:34
), the number of rows
in the table (5
), the clustered index size
(1
page), and the combined size of the
other indexes (2
pages).
- *************************** 1. row ***************************
- database_name: test
- table_name: t1
- last_update: 2014-03-14 14:36:34
- n_rows: 5
- clustered_index_size: 1
- sum_of_other_index_sizes: 2
The innodb_index_stats
table contains
multiple rows for each index. Each row in the
innodb_index_stats
table provides data
related to a particular index statistic which is named in the
stat_name
column and described in the
stat_description
column. For example:
- +------------+--------------+------------+-----------------------------------+
- | index_name | stat_name | stat_value | stat_description |
- +------------+--------------+------------+-----------------------------------+
- | PRIMARY | n_diff_pfx01 | 1 | a |
- | PRIMARY | n_diff_pfx02 | 5 | a,b |
- | i1 | n_diff_pfx01 | 1 | c |
- | i1 | n_diff_pfx02 | 2 | c,d |
- | i1 | n_diff_pfx03 | 2 | c,d,a |
- | i1 | n_diff_pfx04 | 5 | c,d,a,b |
- | i2uniq | n_diff_pfx01 | 2 | e |
- | i2uniq | n_diff_pfx02 | 5 | e,f |
- +------------+--------------+------------+-----------------------------------+
The stat_name
column shows the following
types of statistics:
size
: Wherestat_name
=size
, thestat_value
column displays the total number of pages in the index.n_leaf_pages
: Wherestat_name
=n_leaf_pages
, thestat_value
column displays the number of leaf pages in the index.n_diff_pfx
: WhereNN
stat_name
=n_diff_pfx01
, thestat_value
column displays the number of distinct values in the first column of the index. Wherestat_name
=n_diff_pfx02
, thestat_value
column displays the number of distinct values in the first two columns of the index, and so on. Additionally, wherestat_name
=n_diff_pfx
, theNN
stat_description
column shows a comma separated list of the index columns that are counted.
To further illustrate the
n_diff_pfx
statistic, which provides cardinality data, consider once
again the NN
t1
table example that was
introduced previously. As shown below, the
t1
table is created with a primary index
(columns a
, b
), a
secondary index (columns c
,
d
), and a unique index (columns
e
, f
):
After inserting five rows of sample data, the table appears as follows:
- +---+---+------+------+------+------+
- | a | b | c | d | e | f |
- +---+---+------+------+------+------+
- | 1 | 1 | 10 | 11 | 100 | 101 |
- | 1 | 2 | 10 | 11 | 200 | 102 |
- | 1 | 3 | 10 | 11 | 100 | 103 |
- | 1 | 4 | 10 | 12 | 200 | 104 |
- | 1 | 5 | 10 | 12 | 100 | 105 |
- +---+---+------+------+------+------+
When you query the index_name
,
stat_name
, stat_value
,
and stat_description
where
stat_name LIKE 'n_diff%'
, the following
result set is returned:
- FROM mysql.innodb_index_stats
- +------------+--------------+------------+------------------+
- | index_name | stat_name | stat_value | stat_description |
- +------------+--------------+------------+------------------+
- | PRIMARY | n_diff_pfx01 | 1 | a |
- | PRIMARY | n_diff_pfx02 | 5 | a,b |
- | i1 | n_diff_pfx01 | 1 | c |
- | i1 | n_diff_pfx02 | 2 | c,d |
- | i1 | n_diff_pfx03 | 2 | c,d,a |
- | i1 | n_diff_pfx04 | 5 | c,d,a,b |
- | i2uniq | n_diff_pfx01 | 2 | e |
- | i2uniq | n_diff_pfx02 | 5 | e,f |
- +------------+--------------+------------+------------------+
For the PRIMARY
index, there are two
n_diff%
rows. The number of rows is equal
to the number of columns in the index.
For nonunique indexes, InnoDB
appends the
columns of the primary key.
Where
index_name
=PRIMARY
andstat_name
=n_diff_pfx01
, thestat_value
is1
, which indicates that there is a single distinct value in the first column of the index (columna
). The number of distinct values in columna
is confirmed by viewing the data in columna
in tablet1
, in which there is a single distinct value (1
). The counted column (a
) is shown in thestat_description
column of the result set.Where
index_name
=PRIMARY
andstat_name
=n_diff_pfx02
, thestat_value
is5
, which indicates that there are five distinct values in the two columns of the index (a,b
). The number of distinct values in columnsa
andb
is confirmed by viewing the data in columnsa
andb
in tablet1
, in which there are five distinct values: (1,1
), (1,2
), (1,3
), (1,4
) and (1,5
). The counted columns (a,b
) are shown in thestat_description
column of the result set.
For the secondary index (i1
), there are
four n_diff%
rows. Only two columns are
defined for the secondary index (c,d
) but
there are four n_diff%
rows for the
secondary index because InnoDB
suffixes all
nonunique indexes with the primary key. As a result, there are
four n_diff%
rows instead of two to account
for the both the secondary index columns
(c,d
) and the primary key columns
(a,b
).
Where
index_name
=i1
andstat_name
=n_diff_pfx01
, thestat_value
is1
, which indicates that there is a single distinct value in the first column of the index (columnc
). The number of distinct values in columnc
is confirmed by viewing the data in columnc
in tablet1
, in which there is a single distinct value: (10
). The counted column (c
) is shown in thestat_description
column of the result set.Where
index_name
=i1
andstat_name
=n_diff_pfx02
, thestat_value
is2
, which indicates that there are two distinct values in the first two columns of the index (c,d
). The number of distinct values in columnsc
and
is confirmed by viewing the data in columnsc
andd
in tablet1
, in which there are two distinct values: (10,11
) and (10,12
). The counted columns (c,d
) are shown in thestat_description
column of the result set.Where
index_name
=i1
andstat_name
=n_diff_pfx03
, thestat_value
is2
, which indicates that there are two distinct values in the first three columns of the index (c,d,a
). The number of distinct values in columnsc
,d
, anda
is confirmed by viewing the data in columnc
,d
, anda
in tablet1
, in which there are two distinct values: (10,11,1
) and (10,12,1
). The counted columns (c,d,a
) are shown in thestat_description
column of the result set.Where
index_name
=i1
andstat_name
=n_diff_pfx04
, thestat_value
is5
, which indicates that there are five distinct values in the four columns of the index (c,d,a,b
). The number of distinct values in columnsc
,d
,a
andb
is confirmed by viewing the data in columnsc
,d
,a
, andb
in tablet1
, in which there are five distinct values: (10,11,1,1
), (10,11,1,2
), (10,11,1,3
), (10,12,1,4
) and (10,12,1,5
). The counted columns (c,d,a,b
) are shown in thestat_description
column of the result set.
For the unique index (i2uniq
), there are
two n_diff%
rows.
Where
index_name
=i2uniq
andstat_name
=n_diff_pfx01
, thestat_value
is2
, which indicates that there are two distinct values in the first column of the index (columne
). The number of distinct values in columne
is confirmed by viewing the data in columne
in tablet1
, in which there are two distinct values: (100
) and (200
). The counted column (e
) is shown in thestat_description
column of the result set.Where
index_name
=i2uniq
andstat_name
=n_diff_pfx02
, thestat_value
is5
, which indicates that there are five distinct values in the two columns of the index (e,f
). The number of distinct values in columnse
andf
is confirmed by viewing the data in columnse
andf
in tablet1
, in which there are five distinct values: (100,101
), (200,102
), (100,103
), (200,104
) and (100,105
). The counted columns (e,f
) are shown in thestat_description
column of the result set.
The size of indexes for tables, partitions, or subpartitions
can be retrieved using the
innodb_index_stats
table. In the following
example, index sizes are retrieved for table
t1
. For a definition of table
t1
and corresponding index statistics, see
Section 15.8.10.1.6, “InnoDB Persistent Statistics Tables Example”.
- +-------+------------+-------+
- | pages | index_name | size |
- +-------+------------+-------+
- | 1 | PRIMARY | 16384 |
- | 1 | i1 | 16384 |
- | 1 | i2uniq | 16384 |
- +-------+------------+-------+
For partitions or subpartitions, the same query with a
modified WHERE
clause can be used to
retrieve index sizes. For example, the following query
retrieves index sizes for partitions of table
t1
:
Traduction non disponible
Le manuel MySQL n'est pas encore traduit en français sur l'infobrol. Seule la version anglaise est disponible pour l'instant.
Document créé le 26/06/2006, dernière modification le 26/10/2018
Source du document imprimé : https://www.gaudry.be/mysql-rf-innodb-persistent-stats.html
L'infobrol est un site personnel dont le contenu n'engage que moi. Le texte est mis à disposition sous licence CreativeCommons(BY-NC-SA). Plus d'info sur les conditions d'utilisation et sur l'auteur.
Références
Ces références et liens indiquent des documents consultés lors de la rédaction de cette page, ou qui peuvent apporter un complément d'information, mais les auteurs de ces sources ne peuvent être tenus responsables du contenu de cette page.
L'auteur de ce site est seul responsable de la manière dont sont présentés ici les différents concepts, et des libertés qui sont prises avec les ouvrages de référence. N'oubliez pas que vous devez croiser les informations de sources multiples afin de diminuer les risques d'erreurs.