Mysql replication

  • Master has a “dump thread” that reads the binary log and sends it to the slave IO thread
  • IO thread reads the binlog from master and puts it into a relay log
  • A SQL thread reads entries from the relay log and writes it to slave disk
  • By default both the IO and SQL thread is single threaded. In mysql 5.7 onwards there is the ability to enable parallel replication.


  • binlog file position based replication
  • GTID based replication

binlog file position

  • unique by filename and position - replication coordinates
  • Configuration
    • On primary
      • set: server-id
      • enable binary logging: log-bin
      • create replication user
    • On replica
      • mysql> CHANGE MASTER TO -> MASTER_HOST='source_host_name', -> MASTER_USER='replication_user_name', -> MASTER_PASSWORD='replication_password', -> MASTER_LOG_FILE='recorded_log_file_name', -> MASTER_LOG_POS=recorded_log_position;


  • unique identifier produced for every transaction upon commit
  • GTID = source_id: transaction_id
  • GTIDs are stored in a table named gtid_executed, in the mysql database
  • each transaction can be identified and tracked as it is committed on the originating server and applied by any replicas; not necessary to refer to log files or positions within those files when starting a new replica or failing over to a new source, which greatly simplifies these tasks
  • Configuration
    • Primary and replica
      • Enable gtid : gtid_mode=ON, enforce-gtid-consistency=ON


replication types
  • async - default, commits happen on the master, doesn’t care if slave has applied the change
  • semi-sync - ensures that at least one slave has received the the commit and stored it into the relay log, only then writes it to master binlog. However slave not necessarily committed it
  • group replication / sync - 2PC - ensures that data is committed on at least one slave
debugging replication lag
  • Replication lag can happen either due to a delay on the IO Thread (delays in transferring the data from the primary to the replica) or SQL Thread (due to delays in processing the backlog of the binlogs)
  • To determine which is the root cause
    1. Run “SHOW MASTER STATUS” on the primary node. Take note of the file and the position
    2. On the replica run “SHOW SLAVE STATUS”. Take note of the following fields:
      • Master_Log_File and the Read_master_log_Pos - shows the copied over binlogs
      • Relay_Master_Log_file - indicates at which point which the SQL Thread has caught up with the binlog
    3. If the gap is between the maste_log_file on the primary and replica, the lag is due to the IO Tread. If the gap is between the relay_master_log_file and the master_log_file on the replica, then the delay is due a lag in the SQL Thread


Enabling parallel replication

Slave parallel type

  • DATABASE: Parallelism only among multiple databases, not too useful
  • LOGICAL_CLOCK: Parallism within the same database. In 5.7.4 it uses intervals to determine which are the transactions that can be executed together. The larger the interval, the more transactions that can be executed together. In 8.0 the ability to identify the intervals have been approved.


How to determine the performance of parallel replication?

In the performance_schema tables, you can get timing of the coordinator and workers on the replica via the replication_applier_status_by_coordinator, replication_applier_status_by_worker tables


Optimizing parallel replication

  • Slowling down the master to batch commit transactions together binlog_group_commit_sync_delay, binlog_group_commit_sync_no_delay_count


Binlog servers

Server that can forward binary logs along. Removes the need for intermediary masters solely for the purpose of forwarding bin logs between distributed DCs. Also ensures that the binary logs forwarded are exactly the same as received from the master.

checking actual replication lag

logical backup and restore


Differences between primary and replica in replication

Because the binlog is logical - this is possible. However if there are alter tables that refer to indexes that dont exist on the other node it could break the replication