Rechercher dans le manuel MySQL
8.2.1.16 GROUP BY Optimization
The most general way to satisfy a GROUP BY
clause is to scan the whole table and create a new temporary
table where all rows from each group are consecutive, and then
use this temporary table to discover groups and apply
aggregate functions (if any). In some cases, MySQL is able to
do much better than that and avoid creation of temporary
tables by using index access.
The most important preconditions for using indexes for
GROUP BY
are that all GROUP
BY
columns reference attributes from the same index,
and that the index stores its keys in order (as is true, for
example, for a BTREE
index, but not for a
HASH
index). Whether use of temporary
tables can be replaced by index access also depends on which
parts of an index are used in a query, the conditions
specified for these parts, and the selected aggregate
functions.
There are two ways to execute a GROUP BY
query through index access, as detailed in the following
sections. The first method applies the grouping operation
together with all range predicates (if any). The second method
first performs a range scan, and then groups the resulting
tuples.
Loose Index Scan can also be used in the absence of
GROUP BY
under some conditions. See
Skip Scan Range Access Method.
Loose Index Scan
The most efficient way to process GROUP
BY
is when an index is used to directly retrieve
the grouping columns. With this access method, MySQL uses
the property of some index types that the keys are ordered
(for example, BTREE
). This property
enables use of lookup groups in an index without having to
consider all keys in the index that satisfy all
WHERE
conditions. This access method
considers only a fraction of the keys in an index, so it is
called a Loose Index
Scan. When there is no WHERE
clause, a Loose Index Scan reads as many keys as the number
of groups, which may be a much smaller number than that of
all keys. If the WHERE
clause contains
range predicates (see the discussion of the
range
join type in
Section 8.8.1, “Optimizing Queries with EXPLAIN”), a Loose Index Scan looks
up the first key of each group that satisfies the range
conditions, and again reads the smallest possible number of
keys. This is possible under the following conditions:
The query is over a single table.
The
GROUP BY
names only columns that form a leftmost prefix of the index and no other columns. (If, instead ofGROUP BY
, the query has aDISTINCT
clause, all distinct attributes refer to columns that form a leftmost prefix of the index.) For example, if a tablet1
has an index on(c1,c2,c3)
, Loose Index Scan is applicable if the query hasGROUP BY c1, c2
. It is not applicable if the query hasGROUP BY c2, c3
(the columns are not a leftmost prefix) orGROUP BY c1, c2, c4
(c4
is not in the index).The only aggregate functions used in the select list (if any) are
MIN()
andMAX()
, and all of them refer to the same column. The column must be in the index and must immediately follow the columns in theGROUP BY
.Any other parts of the index than those from the
GROUP BY
referenced in the query must be constants (that is, they must be referenced in equalities with constants), except for the argument ofMIN()
orMAX()
functions.For columns in the index, full column values must be indexed, not just a prefix. For example, with
c1 VARCHAR(20), INDEX (c1(10))
, the index uses only a prefix ofc1
values and cannot be used for Loose Index Scan.
If Loose Index Scan is applicable to a query, the
EXPLAIN
output shows
Using index for group-by
in the
Extra
column.
Assume that there is an index
idx(c1,c2,c3)
on table
t1(c1,c2,c3,c4)
. The Loose Index Scan
access method can be used for the following queries:
The following queries cannot be executed with this quick select method, for the reasons given:
The columns in the
GROUP BY
clause do not form a leftmost prefix of the index:The query refers to a part of a key that comes after the
GROUP BY
part, and for which there is no equality with a constant:Were the query to include
WHERE c3 =
, Loose Index Scan could be used.const
The Loose Index Scan access method can be applied to other
forms of aggregate function references in the select list,
in addition to the MIN()
and
MAX()
references already
supported:
AVG(DISTINCT)
,SUM(DISTINCT)
, andCOUNT(DISTINCT)
are supported.AVG(DISTINCT)
andSUM(DISTINCT)
take a single argument.COUNT(DISTINCT)
can have more than one column argument.There must be no
GROUP BY
orDISTINCT
clause in the query.The Loose Index Scan limitations described previously still apply.
Assume that there is an index
idx(c1,c2,c3)
on table
t1(c1,c2,c3,c4)
. The Loose Index Scan
access method can be used for the following queries:
A Tight Index Scan may be either a full index scan or a range index scan, depending on the query conditions.
When the conditions for a Loose Index Scan are not met, it
still may be possible to avoid creation of temporary tables
for GROUP BY
queries. If there are range
conditions in the WHERE
clause, this
method reads only the keys that satisfy these conditions.
Otherwise, it performs an index scan. Because this method
reads all keys in each range defined by the
WHERE
clause, or scans the whole index if
there are no range conditions, it is called a
Tight Index Scan. With
a Tight Index Scan, the grouping operation is performed only
after all keys that satisfy the range conditions have been
found.
For this method to work, it is sufficient that there be a
constant equality condition for all columns in a query
referring to parts of the key coming before or in between
parts of the GROUP BY
key. The constants
from the equality conditions fill in any “gaps”
in the search keys so that it is possible to form complete
prefixes of the index. These index prefixes then can be used
for index lookups. If the GROUP BY
result
requires sorting, and it is possible to form search keys
that are prefixes of the index, MySQL also avoids extra
sorting operations because searching with prefixes in an
ordered index already retrieves all the keys in order.
Assume that there is an index
idx(c1,c2,c3)
on table
t1(c1,c2,c3,c4)
. The following queries do
not work with the Loose Index Scan access method described
previously, but still work with the Tight Index Scan access
method.
There is a gap in the
GROUP BY
, but it is covered by the conditionc2 = 'a'
:The
GROUP BY
does not begin with the first part of the key, but there is a condition that provides a constant for that part:
Traduction non disponible
Le manuel MySQL n'est pas encore traduit en français sur l'infobrol. Seule la version anglaise est disponible pour l'instant.
Document créé le 26/06/2006, dernière modification le 26/10/2018
Source du document imprimé : https://www.gaudry.be/mysql-rf-group-by-optimization.html
L'infobrol est un site personnel dont le contenu n'engage que moi. Le texte est mis à disposition sous licence CreativeCommons(BY-NC-SA). Plus d'info sur les conditions d'utilisation et sur l'auteur.
Références
Ces références et liens indiquent des documents consultés lors de la rédaction de cette page, ou qui peuvent apporter un complément d'information, mais les auteurs de ces sources ne peuvent être tenus responsables du contenu de cette page.
L'auteur de ce site est seul responsable de la manière dont sont présentés ici les différents concepts, et des libertés qui sont prises avec les ouvrages de référence. N'oubliez pas que vous devez croiser les informations de sources multiples afin de diminuer les risques d'erreurs.