Rechercher dans le manuel MySQL
A semijoin is a preparation-time transformation that enables multiple execution strategies such as table pullout, duplicate weedout, first match, loose scan, and materialization. The optimizer uses semijoin strategies to improve subquery execution, as described in this section.
For an inner join between two tables, the join returns a row
from one table as many times as there are matches in the other
table. But for some questions, the only information that
matters is whether there is a match, not the number of
matches. Suppose that there are tables named
class
and roster
that
list classes in a course curriculum and class rosters
(students enrolled in each class), respectively. To list the
classes that actually have students enrolled, you could use
this join:
- SELECT class.class_num, class.class_name
However, the result lists each class once for each enrolled student. For the question being asked, this is unnecessary duplication of information.
Assuming that class_num
is a primary key in
the class
table, duplicate suppression is
possible by using
SELECT
DISTINCT
, but it is inefficient to generate all
matching rows first only to eliminate duplicates later.
The same duplicate-free result can be obtained by using a subquery:
Here, the optimizer can recognize that the
IN
clause requires the subquery to return
only one instance of each class number from the
roster
table. In this case, the query can
use a semijoin; that is,
an operation that returns only one instance of each row in
class
that is matched by rows in
roster
.
In MySQL 8.0.16 and later, this strategy can also be employed
with EXISTS
subqueries by transforming an
EXISTS
subquery that appears at the top
level of a WHERE
or ON
clause of an IN
or
EXISTS
predicate that refers to a subquery:
After this, the subquery operation can be handled as a semijoin.
Beginning with MySQL 8.0.17, the following subqueries are transformed into antijoins:
NOT IN (SELECT ... FROM ...)
NOT EXISTS (SELECT ... FROM ...)
.IN (SELECT ... FROM ...) IS NOT TRUE
EXISTS (SELECT ... FROM ...) IS NOT TRUE
.IN (SELECT ... FROM ...) IS FALSE
EXISTS (SELECT ... FROM ...) IS FALSE
.
In short, any case in which a subquery of the form IN
(SELECT ... FROM ...)
or EXISTS (SELECT ...
FROM ...)
is negated is transformed into an
antijoin.
An antijoin is an operation that returns only rows for which there is no match. Consider the query shown here:
This query is rewritten internally as the antijoin
SELECT class_num, class_name FROM class ANTIJOIN
roster ON class_num
, which returns one instance of
each row in class
that is
not matched by any rows in
roster
. This means that, for each row in
class
, as soon as a match is found in
roster
, the row in class
can be discarded.
Antijoin transformations cannot in most cases be applied if
the expressions being compared are nullable. An exception to
this rule is that (... NOT IN (SELECT ...)) IS NOT
FALSE
and its equivalent (... IN (SELECT
...)) IS NOT TRUE
can be transformed into antijoins.
Outer join and inner join syntax is permitted in the outer query specification, and table references may be base tables, derived tables, view references, or common table expressions.
In MySQL, a subquery must satisfy these criteria to be handled
as a semijoin (or, in MySQL 8.0.17 and later, an antijoin if
NOT
modifies the subquery):
It must be an
IN
(or=ANY
) subquery that appears at the top level of theWHERE
orON
clause, possibly as a term in anAND
expression. For example:Here,
ot_
andi
it_
represent tables in the outer and inner parts of the query, andi
oe_
andi
ie_
represent expressions that refer to columns in the outer and inner tables.i
In MySQL 8.0.17 and later, the subquery can also be the argument to an expression modified by
NOT
,IS [NOT] TRUE
, orIS [NOT] FALSE
.It must not contain a
GROUP BY
orHAVING
clause.It must not be implicitly grouped (it must contain no aggregate functions).
It must not have
ORDER BY
withLIMIT
.The statement must not use the
STRAIGHT_JOIN
join type in the outer query.The
STRAIGHT_JOIN
modifier must not be present.The number of outer and inner tables together must be less than the maximum number of tables permitted in a join.
The subquery may be correlated or uncorrelated. In MySQL
8.0.16 and later, decorrelation looks at equality predicates
in the WHERE
clause of a subquery used as
the argument to EXISTS
, and makes it
possible to optimize it as if it was used within IN
(SELECT b FROM ...)
. The term trivially
correlated means that the predicate is an equality
predicate, that it is the sole predicate in the
WHERE
clause (or is combined with
AND
), and that one operand is from a table
referenced in the subquery and the other operand is from the
outer query block.
DISTINCT
is permitted, as is
LIMIT
unless ORDER BY
is
also used.
If a subquery meets the preceding criteria, MySQL converts it to a semijoin (or, in MySQL 8.0.17 or later, an antijoin if applicable) and makes a cost-based choice from these strategies:
Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query.
Duplicate Weedout: Run the semijoin as if it was a join and remove duplicate records using a temporary table.
FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.
LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.
Materialize the subquery into an indexed temporary table that is used to perform a join, where the index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned. For more information about materialization, see Section 8.2.2.2, “Optimizing Subqueries with Materialization”.
Each of these strategies can be enabled or disabled using the
following optimizer_switch
system variable flags:
The
semijoin
flag controls whether semijoins are used. Starting with MySQL 8.0.17, this also applies to antijoins.If
semijoin
is enabled, thefirstmatch
,loosescan
,duplicateweedout
, andmaterialization
flags enable finer control over the permitted semijoin strategies.If the
duplicateweedout
semijoin strategy is disabled, it is not used unless all other applicable strategies are also disabled.If
duplicateweedout
is disabled, on occasion the optimizer may generate a query plan that is far from optimal. This occurs due to heuristic pruning during greedy search, which can be avoided by settingoptimizer_prune_level=0
.
These flags are enabled by default. See Section 8.9.2, “Switchable Optimizations”.
The optimizer minimizes differences in handling of views and
derived tables. This affects queries that use the
STRAIGHT_JOIN
modifier and a view with an
IN
subquery that can be converted to a
semijoin. The following query illustrates this because the
change in processing causes a change in transformation, and
thus a different execution strategy:
The optimizer first looks at the view and converts the
IN
subquery to a semijoin, then checks
whether it is possible to merge the view into the outer query.
Because the STRAIGHT_JOIN
modifier in the
outer query prevents semijoin, the optimizer refuses the
merge, causing derived table evaluation using a materialized
table.
EXPLAIN
output indicates the
use of semijoin strategies as follows:
For extended
EXPLAIN
output, the text displayed by a followingSHOW WARNINGS
shows the rewritten query, which displays the semijoin structure. (See Section 8.8.3, “Extended EXPLAIN Output Format”.) From this you can get an idea about which tables were pulled out of the semijoin. If a subquery was converted to a semijoin, you will see that the subquery predicate is gone and its tables andWHERE
clause were merged into the outer query join list andWHERE
clause.Temporary table use for Duplicate Weedout is indicated by
Start temporary
andEnd temporary
in theExtra
column. Tables that were not pulled out and are in the range ofEXPLAIN
output rows covered byStart temporary
andEnd temporary
have theirrowid
in the temporary table.FirstMatch(
in thetbl_name
)Extra
column indicates join shortcutting.LooseScan(
in them
..n
)Extra
column indicates use of the LooseScan strategy.m
andn
are key part numbers.Temporary table use for materialization is indicated by rows with a
select_type
value ofMATERIALIZED
and rows with atable
value of<subquery
.N
>
Nederlandse vertaling
U hebt gevraagd om deze site in het Nederlands te bezoeken. Voor nu wordt alleen de interface vertaald, maar nog niet alle inhoud.Als je me wilt helpen met vertalingen, is je bijdrage welkom. Het enige dat u hoeft te doen, is u op de site registreren en mij een bericht sturen waarin u wordt gevraagd om u toe te voegen aan de groep vertalers, zodat u de gewenste pagina's kunt vertalen. Een link onderaan elke vertaalde pagina geeft aan dat u de vertaler bent en heeft een link naar uw profiel.
Bij voorbaat dank.
Document heeft de 26/06/2006 gemaakt, de laatste keer de 26/10/2018 gewijzigd
Bron van het afgedrukte document:https://www.gaudry.be/nl/mysql-rf-semijoins.html
De infobrol is een persoonlijke site waarvan de inhoud uitsluitend mijn verantwoordelijkheid is. De tekst is beschikbaar onder CreativeCommons-licentie (BY-NC-SA). Meer info op de gebruiksvoorwaarden en de auteur.
Referenties
Deze verwijzingen en links verwijzen naar documenten die geraadpleegd zijn tijdens het schrijven van deze pagina, of die aanvullende informatie kunnen geven, maar de auteurs van deze bronnen kunnen niet verantwoordelijk worden gehouden voor de inhoud van deze pagina.
De auteur Deze site is als enige verantwoordelijk voor de manier waarop de verschillende concepten, en de vrijheden die met de referentiewerken worden genomen, hier worden gepresenteerd. Vergeet niet dat u meerdere broninformatie moet doorgeven om het risico op fouten te verkleinen.