Rechercher dans le manuel MySQL

23.2.4 HASH Partitioning

Partitioning by HASH is used primarily to ensure an even distribution of data among a predetermined number of partitions. With range or list partitioning, you must specify explicitly which partition a given column value or set of column values should be stored in; with hash partitioning, this decision is taken care of for you, and you need only specify a column value or expression based on a column value to be hashed and the number of partitions into which the partitioned table is to be divided.

To partition a table using HASH partitioning, it is necessary to append to the CREATE TABLE statement a PARTITION BY HASH (expr) clause, where expr is an expression that returns an integer. This can simply be the name of a column whose type is one of MySQL's integer types. In addition, you most likely want to follow this with PARTITIONS num, where num is a positive integer representing the number of partitions into which the table is to be divided.

Note

For simplicity, the tables in the examples that follow do not use any keys. You should be aware that, if a table has any unique keys, every column used in the partitioning expression for this table must be part of every unique key, including the primary key. See Section 23.6.1, “Partitioning Keys, Primary Keys, and Unique Keys”, for more information.

The following statement creates a table that uses hashing on the store_id column and is divided into 4 partitions:

  1. CREATE TABLE employees (
  2.     id INT NOT NULL,
  3.     fname VARCHAR(30),
  4.     lname VARCHAR(30),
  5.     hired DATE NOT NULL DEFAULT '1970-01-01',
  6.     separated DATE NOT NULL DEFAULT '9999-12-31',
  7.     job_code INT,
  8.     store_id INT
  9. )
  10. PARTITION BY HASH(store_id)
  11. PARTITIONS 4;

If you do not include a PARTITIONS clause, the number of partitions defaults to 1; using the PARTITIONS keyword without a number following it results in a syntax error.

You can also use an SQL expression that returns an integer for expr. For instance, you might want to partition based on the year in which an employee was hired. This can be done as shown here:

  1. CREATE TABLE employees (
  2.     id INT NOT NULL,
  3.     fname VARCHAR(30),
  4.     lname VARCHAR(30),
  5.     hired DATE NOT NULL DEFAULT '1970-01-01',
  6.     separated DATE NOT NULL DEFAULT '9999-12-31',
  7.     job_code INT,
  8.     store_id INT
  9. )
  10. PARTITION BY HASH( YEAR(hired) )
  11. PARTITIONS 4;

expr must return a nonconstant, nonrandom integer value (in other words, it should be varying but deterministic), and must not contain any prohibited constructs as described in Section 23.6, “Restrictions and Limitations on Partitioning”. You should also keep in mind that this expression is evaluated each time a row is inserted or updated (or possibly deleted); this means that very complex expressions may give rise to performance issues, particularly when performing operations (such as batch inserts) that affect a great many rows at one time.

The most efficient hashing function is one which operates upon a single table column and whose value increases or decreases consistently with the column value, as this allows for pruning on ranges of partitions. That is, the more closely that the expression varies with the value of the column on which it is based, the more efficiently MySQL can use the expression for hash partitioning.

For example, where date_col is a column of type DATE, then the expression TO_DAYS(date_col) is said to vary directly with the value of date_col, because for every change in the value of date_col, the value of the expression changes in a consistent manner. The variance of the expression YEAR(date_col) with respect to date_col is not quite as direct as that of TO_DAYS(date_col), because not every possible change in date_col produces an equivalent change in YEAR(date_col). Even so, YEAR(date_col) is a good candidate for a hashing function, because it varies directly with a portion of date_col and there is no possible change in date_col that produces a disproportionate change in YEAR(date_col).

By way of contrast, suppose that you have a column named int_col whose type is INT. Now consider the expression POW(5-int_col,3) + 6. This would be a poor choice for a hashing function because a change in the value of int_col is not guaranteed to produce a proportional change in the value of the expression. Changing the value of int_col by a given amount can produce widely differing changes in the value of the expression. For example, changing int_col from 5 to 6 produces a change of -1 in the value of the expression, but changing the value of int_col from 6 to 7 produces a change of -7 in the expression value.

In other words, the more closely the graph of the column value versus the value of the expression follows a straight line as traced by the equation y=cx where c is some nonzero constant, the better the expression is suited to hashing. This has to do with the fact that the more nonlinear an expression is, the more uneven the distribution of data among the partitions it tends to produce.

In theory, pruning is also possible for expressions involving more than one column value, but determining which of such expressions are suitable can be quite difficult and time-consuming. For this reason, the use of hashing expressions involving multiple columns is not particularly recommended.

When PARTITION BY HASH is used, the storage engine determines which partition of num partitions to use based on the modulus of the result of the expression. In other words, for a given expression expr, the partition in which the record is stored is partition number N, where N = MOD(expr, num). Suppose that table t1 is defined as follows, so that it has 4 partitions:

  1. CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE)
  2.     PARTITION BY HASH( YEAR(col3) )
  3.     PARTITIONS 4;

If you insert a record into t1 whose col3 value is '2005-09-15', then the partition in which it is stored is determined as follows:

  1. MOD(YEAR('2005-09-01'),4)
  2. =  MOD(2005,4)
  3. =  1

MySQL 8.0 also supports a variant of HASH partitioning known as linear hashing which employs a more complex algorithm for determining the placement of new rows inserted into the partitioned table. See Section 23.2.4.1, “LINEAR HASH Partitioning”, for a description of this algorithm.

The user-supplied expression is evaluated each time a record is inserted or updated. It may also—depending on the circumstances—be evaluated when records are deleted.


Suchen Sie im MySQL-Handbuch

Deutsche Übersetzung

Sie haben gebeten, diese Seite auf Deutsch zu besuchen. Momentan ist nur die Oberfläche übersetzt, aber noch nicht der gesamte Inhalt.

Wenn Sie mir bei Übersetzungen helfen wollen, ist Ihr Beitrag willkommen. Alles, was Sie tun müssen, ist, sich auf der Website zu registrieren und mir eine Nachricht zu schicken, in der Sie gebeten werden, Sie der Gruppe der Übersetzer hinzuzufügen, die Ihnen die Möglichkeit gibt, die gewünschten Seiten zu übersetzen. Ein Link am Ende jeder übersetzten Seite zeigt an, dass Sie der Übersetzer sind und einen Link zu Ihrem Profil haben.

Vielen Dank im Voraus.

Dokument erstellt 26/06/2006, zuletzt geändert 26/10/2018
Quelle des gedruckten Dokuments:https://www.gaudry.be/de/mysql-rf-partitioning-hash.html

Die Infobro ist eine persönliche Seite, deren Inhalt in meiner alleinigen Verantwortung liegt. Der Text ist unter der CreativeCommons-Lizenz (BY-NC-SA) verfügbar. Weitere Informationen auf die Nutzungsbedingungen und dem Autor.

Referenzen

  1. Zeigen Sie - html-Dokument Sprache des Dokuments:en Manuel MySQL : https://dev.mysql.com/

Diese Verweise und Links verweisen auf Dokumente, die während des Schreibens dieser Seite konsultiert wurden, oder die zusätzliche Informationen liefern können, aber die Autoren dieser Quellen können nicht für den Inhalt dieser Seite verantwortlich gemacht werden.
Der Autor Diese Website ist allein dafür verantwortlich, wie die verschiedenen Konzepte und Freiheiten, die mit den Nachschlagewerken gemacht werden, hier dargestellt werden. Denken Sie daran, dass Sie mehrere Quellinformationen austauschen müssen, um das Risiko von Fehlern zu reduzieren.

Inhaltsverzeichnis Haut