Rechercher dans le manuel MySQL

8.2.1.12 Condition Filtering

In join processing, prefix rows are those rows passed from one table in a join to the next. In general, the optimizer attempts to put tables with low prefix counts early in the join order to keep the number of row combinations from increasing rapidly. To the extent that the optimizer can use information about conditions on rows selected from one table and passed to the next, the more accurately it can compute row estimates and choose the best execution plan.

Without condition filtering, the prefix row count for a table is based on the estimated number of rows selected by the WHERE clause according to whichever access method the optimizer chooses. Condition filtering enables the optimizer to use other relevant conditions in the WHERE clause not taken into account by the access method, and thus improve its prefix row count estimates. For example, even though there might be an index-based access method that can be used to select rows from the current table in a join, there might also be additional conditions for the table in the WHERE clause that can filter (further restrict) the estimate for qualifying rows passed to the next table.

A condition contributes to the filtering estimate only if:

  • It refers to the current table.

  • It depends on a constant value or values from earlier tables in the join sequence.

  • It was not already taken into account by the access method.

In EXPLAIN output, the rows column indicates the row estimate for the chosen access method, and the filtered column reflects the effect of condition filtering. filtered values are expressed as percentages. The maximum value is 100, which means no filtering of rows occurred. Values decreasing from 100 indicate increasing amounts of filtering.

The prefix row count (the number of rows estimated to be passed from the current table in a join to the next) is the product of the rows and filtered values. That is, the prefix row count is the estimated row count, reduced by the estimated filtering effect. For example, if rows is 1000 and filtered is 20%, condition filtering reduces the estimated row count of 1000 to a prefix row count of 1000 × 20% = 1000 × .2 = 200.

Consider the following query:

  1.   FROM employee JOIN department ON employee.dept_no = department.dept_no
  2.   WHERE employee.first_name = 'John'
  3.   AND employee.hire_date BETWEEN '2018-01-01' AND '2018-06-01';

Suppose that the data set has these characteristics:

  • The employee table has 1024 rows.

  • The department table has 12 rows.

  • Both tables have an index on dept_no.

  • The employee table has an index on first_name.

  • 8 rows satisfy this condition on employee.first_name:

    1. employee.first_name = 'John'
  • 150 rows satisfy this condition on employee.hire_date:

    1. employee.hire_date BETWEEN '2018-01-01' AND '2018-06-01'
  • 1 row satisfies both conditions:

    1. employee.first_name = 'John'
    2. AND employee.hire_date BETWEEN '2018-01-01' AND '2018-06-01'

Without condition filtering, EXPLAIN produces output like this:

+----+------------+--------+------------------+---------+---------+------+----------+
| id | table      | type   | possible_keys    | key     | ref     | rows | filtered |
+----+------------+--------+------------------+---------+---------+------+----------+
| 1  | employee   | ref    | name,h_date,dept | name    | const   | 8    | 100.00   |
| 1  | department | eq_ref | PRIMARY          | PRIMARY | dept_no | 1    | 100.00   |
+----+------------+--------+------------------+---------+---------+------+----------+

For employee, the access method on the name index picks up the 8 rows that match a name of 'John'. No filtering is done (filtered is 100%), so all rows are prefix rows for the next table: The prefix row count is rows × filtered = 8 × 100% = 8.

With condition filtering, the optimizer additionally takes into account conditions from the WHERE clause not taken into account by the access method. In this case, the optimizer uses heuristics to estimate a filtering effect of 16.31% for the BETWEEN condition on employee.hire_date. As a result, EXPLAIN produces output like this:

+----+------------+--------+------------------+---------+---------+------+----------+
| id | table      | type   | possible_keys    | key     | ref     | rows | filtered |
+----+------------+--------+------------------+---------+---------+------+----------+
| 1  | employee   | ref    | name,h_date,dept | name    | const   | 8    | 16.31    |
| 1  | department | eq_ref | PRIMARY          | PRIMARY | dept_no | 1    | 100.00   |
+----+------------+--------+------------------+---------+---------+------+----------+

Now the prefix row count is rows × filtered = 8 × 16.31% = 1.3, which more closely reflects actual data set.

Normally, the optimizer does not calculate the condition filtering effect (prefix row count reduction) for the last joined table because there is no next table to pass rows to. An exception occurs for EXPLAIN: To provide more information, the filtering effect is calculated for all joined tables, including the last one.

To control whether the optimizer considers additional filtering conditions, use the condition_fanout_filter flag of the optimizer_switch system variable (see Section 8.9.2, “Switchable Optimizations”). This flag is enabled by default but can be disabled to suppress condition filtering (for example, if a particular query is found to yield better performance without it).

If the optimizer overestimates the effect of condition filtering, performance may be worse than if condition filtering is not used. In such cases, these techniques may help:

  • If a column is not indexed, index it so that the optimizer has some information about the distribution of column values and can improve its row estimates.

  • Similarly, if no column histogram information is available, generate a histogram (see Section 8.9.6, “Optimizer Statistics”).

  • Change the join order. Ways to accomplish this include join-order optimizer hints (see Section 8.9.3, “Optimizer Hints”), STRAIGHT_JOIN immediately following the SELECT, and the STRAIGHT_JOIN join operator.

  • Disable condition filtering for the session:

    1. SET optimizer_switch = 'condition_fanout_filter=off';

    Or, for a given query, using an optimizer hint:

    1. SELECT /*+ SET_VAR(optimizer_switch = 'condition_fanout_filter=off') */ ...

Find a PHP function

Document created the 26/06/2006, last modified the 26/10/2018
Source of the printed document:https://www.gaudry.be/en/mysql-rf-condition-filtering.html

The infobrol is a personal site whose content is my sole responsibility. The text is available under CreativeCommons license (BY-NC-SA). More info on the terms of use and the author.

References

  1. View the html document Language of the document:en Manuel MySQL : https://dev.mysql.com/

These references and links indicate documents consulted during the writing of this page, or which may provide additional information, but the authors of these sources can not be held responsible for the content of this page.
The author This site is solely responsible for the way in which the various concepts, and the freedoms that are taken with the reference works, are presented here. Remember that you must cross multiple source information to reduce the risk of errors.

Contents Haut