Rechercher dans le manuel MySQL

8.2.2.1 Optimizing IN and EXISTS Subquery predicates with Semijoin Transformations

A semijoin is a preparation-time transformation that enables multiple execution strategies such as table pullout, duplicate weedout, first match, loose scan, and materialization. The optimizer uses semijoin strategies to improve subquery execution, as described in this section.

For an inner join between two tables, the join returns a row from one table as many times as there are matches in the other table. But for some questions, the only information that matters is whether there is a match, not the number of matches. Suppose that there are tables named class and roster that list classes in a course curriculum and class rosters (students enrolled in each class), respectively. To list the classes that actually have students enrolled, you could use this join:

  1. SELECT class.class_num, class.class_name
  2. FROM class INNER JOIN roster
  3. WHERE class.class_num = roster.class_num;

However, the result lists each class once for each enrolled student. For the question being asked, this is unnecessary duplication of information.

Assuming that class_num is a primary key in the class table, duplicate suppression is possible by using SELECT DISTINCT, but it is inefficient to generate all matching rows first only to eliminate duplicates later.

The same duplicate-free result can be obtained by using a subquery:

  1. SELECT class_num, class_name
  2. FROM class
  3. WHERE class_num IN (SELECT class_num FROM roster);

Here, the optimizer can recognize that the IN clause requires the subquery to return only one instance of each class number from the roster table. In this case, the query can use a semijoin; that is, an operation that returns only one instance of each row in class that is matched by rows in roster.

In MySQL 8.0.16 and later, this strategy can also be employed with EXISTS subqueries by transforming an EXISTS subquery that appears at the top level of a WHERE or ON clause of an IN or EXISTS predicate that refers to a subquery:

  1. WHERE ... EXISTS (SELECT ... FROM ...)
  2.  
  3. -> WHERE ... 1 IN (SELECT 1 FROM ...)

After this, the subquery operation can be handled as a semijoin.

Beginning with MySQL 8.0.17, the following subqueries are transformed into antijoins:

  • NOT IN (SELECT ... FROM ...)

  • NOT EXISTS (SELECT ... FROM ...).

  • IN (SELECT ... FROM ...) IS NOT TRUE

  • EXISTS (SELECT ... FROM ...) IS NOT TRUE.

  • IN (SELECT ... FROM ...) IS FALSE

  • EXISTS (SELECT ... FROM ...) IS FALSE.

In short, any case in which a subquery of the form IN (SELECT ... FROM ...) or EXISTS (SELECT ... FROM ...) is negated is transformed into an antijoin.

An antijoin is an operation that returns only rows for which there is no match. Consider the query shown here:

  1. SELECT class_num, class_name
  2. FROM class
  3. WHERE class_num NOT IN (SELECT class_num FROM roster);

This query is rewritten internally as the antijoin SELECT class_num, class_name FROM class ANTIJOIN roster ON class_num, which returns one instance of each row in class that is not matched by any rows in roster. This means that, for each row in class, as soon as a match is found in roster, the row in class can be discarded.

Antijoin transformations cannot in most cases be applied if the expressions being compared are nullable. An exception to this rule is that (... NOT IN (SELECT ...)) IS NOT FALSE and its equivalent (... IN (SELECT ...)) IS NOT TRUE can be transformed into antijoins.

Outer join and inner join syntax is permitted in the outer query specification, and table references may be base tables, derived tables, view references, or common table expressions.

In MySQL, a subquery must satisfy these criteria to be handled as a semijoin (or, in MySQL 8.0.17 and later, an antijoin if NOT modifies the subquery):

  • It must be an IN (or =ANY) subquery that appears at the top level of the WHERE or ON clause, possibly as a term in an AND expression. For example:

    1. SELECT ...
    2. FROM ot1, ...
    3. WHERE (oe1, ...) IN (SELECT ie1, ... FROM it1, ... WHERE ...);

    Here, ot_i and it_i represent tables in the outer and inner parts of the query, and oe_i and ie_i represent expressions that refer to columns in the outer and inner tables.

    In MySQL 8.0.17 and later, the subquery can also be the argument to an expression modified by NOT, IS [NOT] TRUE, or IS [NOT] FALSE.

  • It must be a single SELECT without UNION constructs.

  • It must not contain a GROUP BY or HAVING clause.

  • It must not be implicitly grouped (it must contain no aggregate functions).

  • It must not have ORDER BY with LIMIT.

  • The statement must not use the STRAIGHT_JOIN join type in the outer query.

  • The STRAIGHT_JOIN modifier must not be present.

  • The number of outer and inner tables together must be less than the maximum number of tables permitted in a join.

The subquery may be correlated or uncorrelated. In MySQL 8.0.16 and later, decorrelation looks at equality predicates in the WHERE clause of a subquery used as the argument to EXISTS, and makes it possible to optimize it as if it was used within IN (SELECT b FROM ...). The term trivially correlated means that the predicate is an equality predicate, that it is the sole predicate in the WHERE clause (or is combined with AND), and that one operand is from a table referenced in the subquery and the other operand is from the outer query block.

DISTINCT is permitted, as is LIMIT unless ORDER BY is also used.

If a subquery meets the preceding criteria, MySQL converts it to a semijoin (or, in MySQL 8.0.17 or later, an antijoin if applicable) and makes a cost-based choice from these strategies:

  • Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query.

  • Duplicate Weedout: Run the semijoin as if it was a join and remove duplicate records using a temporary table.

  • FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.

  • LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.

  • Materialize the subquery into an indexed temporary table that is used to perform a join, where the index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned. For more information about materialization, see Section 8.2.2.2, “Optimizing Subqueries with Materialization”.

Each of these strategies can be enabled or disabled using the following optimizer_switch system variable flags:

  • The semijoin flag controls whether semijoins are used. Starting with MySQL 8.0.17, this also applies to antijoins.

  • If semijoin is enabled, the firstmatch, loosescan, duplicateweedout, and materialization flags enable finer control over the permitted semijoin strategies.

  • If the duplicateweedout semijoin strategy is disabled, it is not used unless all other applicable strategies are also disabled.

  • If duplicateweedout is disabled, on occasion the optimizer may generate a query plan that is far from optimal. This occurs due to heuristic pruning during greedy search, which can be avoided by setting optimizer_prune_level=0.

These flags are enabled by default. See Section 8.9.2, “Switchable Optimizations”.

The optimizer minimizes differences in handling of views and derived tables. This affects queries that use the STRAIGHT_JOIN modifier and a view with an IN subquery that can be converted to a semijoin. The following query illustrates this because the change in processing causes a change in transformation, and thus a different execution strategy:

  1. FROM t1
  2.            FROM t2);
  3.  
  4. FROM t3 JOIN v ON t3.x = v.a;

The optimizer first looks at the view and converts the IN subquery to a semijoin, then checks whether it is possible to merge the view into the outer query. Because the STRAIGHT_JOIN modifier in the outer query prevents semijoin, the optimizer refuses the merge, causing derived table evaluation using a materialized table.

EXPLAIN output indicates the use of semijoin strategies as follows:

  • For extended EXPLAIN output, the text displayed by a following SHOW WARNINGS shows the rewritten query, which displays the semijoin structure. (See Section 8.8.3, “Extended EXPLAIN Output Format”.) From this you can get an idea about which tables were pulled out of the semijoin. If a subquery was converted to a semijoin, you will see that the subquery predicate is gone and its tables and WHERE clause were merged into the outer query join list and WHERE clause.

  • Temporary table use for Duplicate Weedout is indicated by Start temporary and End temporary in the Extra column. Tables that were not pulled out and are in the range of EXPLAIN output rows covered by Start temporary and End temporary have their rowid in the temporary table.

  • FirstMatch(tbl_name) in the Extra column indicates join shortcutting.

  • LooseScan(m..n) in the Extra column indicates use of the LooseScan strategy. m and n are key part numbers.

  • Temporary table use for materialization is indicated by rows with a select_type value of MATERIALIZED and rows with a table value of <subqueryN>.


Rechercher dans le manuel MySQL

Traduction non disponible

Le manuel MySQL n'est pas encore traduit en français sur l'infobrol. Seule la version anglaise est disponible pour l'instant.

Document créé le 26/06/2006, dernière modification le 26/10/2018
Source du document imprimé : https://www.gaudry.be/mysql-rf-semijoins.html

L'infobrol est un site personnel dont le contenu n'engage que moi. Le texte est mis à disposition sous licence CreativeCommons(BY-NC-SA). Plus d'info sur les conditions d'utilisation et sur l'auteur.

Références

  1. Consulter le document html Langue du document :en Manuel MySQL : https://dev.mysql.com/

Ces références et liens indiquent des documents consultés lors de la rédaction de cette page, ou qui peuvent apporter un complément d'information, mais les auteurs de ces sources ne peuvent être tenus responsables du contenu de cette page.
L'auteur de ce site est seul responsable de la manière dont sont présentés ici les différents concepts, et des libertés qui sont prises avec les ouvrages de référence. N'oubliez pas que vous devez croiser les informations de sources multiples afin de diminuer les risques d'erreurs.

Table des matières Haut