17.3.2 Handling an Unexpected Halt of a Replication Slave

Contents

In order for replication to be resilient to unexpected halts of the server (sometimes described as crash-safe) it must be possible for the slave to recover its state before halting. This section describes the impact of an unexpected halt of a slave during replication and how to configure a slave for the best chance of recovery to continue replication.

After an unexpected halt of a slave, upon restart the slave's SQL thread must recover which transactions have been executed already. The information required for recovery is stored in the slave's relay log info log. From MySQL 8.0, this log is created by default as an InnoDB table named mysql.slave_relay_log_info (with the system variable relay_log_info_repository set to the default of TABLE). By using this transactional storage engine the information is always recoverable upon restart.

Updates to the relay log info log table are committed together with the transactions, meaning that the slave's progress information recorded in that log is always consistent with what has been applied to the database, even in the event of an unexpected server halt. Previously, this information was stored by default in a file in the data directory that was updated after the transaction had been applied. This held the risk of losing synchrony with the master depending at which stage of processing a transaction the slave halted at, or even corruption of the file itself. The setting relay_log_info_repository = FILE is now deprecated, and will be removed in a future release. For further information on the slave logs, see Section 17.2.4, “Replication Relay and Status Logs”.

When the relay log info log is stored in the mysql.slave_relay_log_info table, DML transactions and also atomic DDL make the following three updates together, atomically:

Apply the transaction on the database.
Update the replication positions in the mysql.slave_relay_log_info table.
Update the GTID in the mysql.gtid_executed table (when GTIDs are enabled and the binary log is disabled on the server).

In all other cases, including DDL statements that are not fully atomic, and exempted storage engines that do not support atomic DDL, the mysql.slave_relay_log_info table might be missing updates associated with replicated data if the server halts unexpectedly. Restoring updates in this case is a manual process. For details on atomic DDL support in MySQL 8.0, and the resulting behavior for the replication of certain statements, see Section 13.1.1, “Atomic Data Definition Statement Support”.

Exactly how a replication slave recovers from an unexpected halt is influenced by the chosen method of replication, whether the slave is single-threaded or multithreaded, the setting of variables such as relay_log_recovery, and whether features such as MASTER_AUTO_POSITION are being used.

The following table shows the impact of these different factors on how a single-threaded slave recovers from an unexpected halt.

Table 17.3 Factors Influencing Single-threaded Replication Slave Recovery

GTID	MASTER_AUTO_POSITION	`relay_log_recovery`	`relay_log_info_repository`	Crash type	Recovery guaranteed	Relay log impact
OFF	Any	1	TABLE	Server	Yes	Lost
OFF	Any	1	Any	OS	No	Lost
OFF	Any	0	TABLE	Server	Yes	Remains
OFF	Any	0	TABLE	OS	No	Remains
ON	ON	Any	Any	Any	Yes	Lost
ON	OFF	0	TABLE	Server	Yes	Remains
ON	OFF	0	Any	OS	No	Remains

As the table shows, when using a single-threaded slave the following configurations are most resilient to unexpected halts:

When using GTIDs and MASTER_AUTO_POSITION, set relay_log_recovery=1. With this configuration the setting of relay_log_info_repository and other variables does not impact on recovery. Note that to guarantee recovery, sync_binlog=1 (which is the default) must also be set on the slave, so that the slave's binary log is synchronized to disk at each write. Otherwise, committed transactions might not be present in the slave's binary log.
When using file position based replication, set relay_log_recovery=1 and relay_log_info_repository=TABLE.

Note

During recovery the relay log is lost.

The following table shows the impact of these different factors on how a multithreaded slave recovers from an unexpected halt.

Table 17.4 Factors Influencing Multithreaded Replication Slave Recovery

GTID	`sync_relay_log`	`MASTER_AUTO_POSITION`	`relay_log_recovery`	`relay_log_info_repository`	Crash type	Recovery guaranteed	Relay log impact
OFF	1	Any	1	TABLE	Any	Yes	Lost
OFF	>1	Any	1	TABLE	Server	Yes	Lost
OFF	>1	Any	1	Any	OS	No	Lost
OFF	1	Any	0	TABLE	Server	Yes	Remains
OFF	1	Any	0	TABLE	OS	No	Remains
ON	Any	ON	Any	Any	Any	Yes	Lost
ON	1	OFF	0	TABLE	Server	Yes	Remains
ON	1	OFF	0	Any	OS	No	Remains

As the table shows, when using a multithreaded slave the following configurations are most resilient to unexpected halts:

When using GTIDs and MASTER_AUTO_POSITION, set relay_log_recovery=1. With this configuration the setting of relay_log_info_repository and other variables does not impact on recovery.
When using file position based replication, set relay_log_recovery=1, sync_relay_log=1, and relay_log_info_repository=TABLE.

Note

During recovery the relay log is lost.

It is important to note the impact of sync_relay_log=1, which requires a write of to the relay log per transaction. Although this setting is the most resilient to an unexpected halt, with at most one unwritten transaction being lost, it also has the potential to greatly increase the load on storage. Without sync_relay_log=1, the effect of an unexpected halt depends on how the relay log is handled by the operating system. Also note that when relay_log_recovery=0, the next time the slave is started after an unexpected halt the relay log is processed as part of recovery. After this process completes, the relay log is deleted.

An unexpected halt of a multithreaded replication slave using the recommended file position based replication configuration above may result in a relay log with transaction inconsistencies (gaps in the sequence of transactions) caused by the unexpected halt. See Section 17.4.1.33, “Replication and Transaction Inconsistencies”. If the relay log recovery process encounters such transaction inconsistencies they are filled and the recovery process continues automatically.

When you are using multi-source replication and relay_log_recovery=1, after restarting due to an unexpected halt all replication channels go through the relay log recovery process. Any inconsistencies found in the relay log due to an unexpected halt of a multithreaded slave are filled.

Find a PHP function

Document created the 26/06/2006, last modified the 26/10/2018
Source of the printed document:https://www.gaudry.be/en/mysql-rf-replication-solutions-unexpected-slave-halt.html

The infobrol is a personal site whose content is my sole responsibility. The text is available under CreativeCommons license (BY-NC-SA). More info on the terms of use and the author.

References

Manuel MySQL : https://dev.mysql.com/

These references and links indicate documents consulted during the writing of this page, or which may provide additional information, but the authors of these sources can not be held responsible for the content of this page.
The author This site is solely responsible for the way in which the various concepts, and the freedoms that are taken with the reference works, are presented here. Remember that you must cross multiple source information to reduce the risk of errors.

Contents Haut