An easy way to describe MySQL’s Binary Log Group Commit

It struck me today; there is an easy way to describe MySQL’s Binary Log group commit improvements from MySQL 5.0-5.7 by using the example of a single ferry trying to ship passengers from point A to point B:

Ferry from Point A to point B

MySQL 5.0 Behaviour

In MySQL 5.0, the ferry will pick up the next passenger in line from point A, and transfer them to point B. The trip between A and B takes about 10 minutes return trip, so it’s possible that several new passengers will arrive while the ferry is in transit. That doesn’t matter; when the ferry arrives back at point A, it will only pick up the very next passenger in line.

MySQL 5.6 Behaviour

In MySQL 5.6, the ferry will pick up all passengers from the line at point A, and then transfer them to point B. Each time it returns to point A to pick up new passengers, it will collect everyone who is waiting and transfer them across to point B.

This is measurably better performance in real-life situations where many passengers tend to arrive while waiting for the ferry to arrive back at point A, and the trip between A and B tends to take some time. It is not so measurable in naive benchmarks that run in a single-thread.

There is no configuration necessary to enable group commit in 5.6. It works by default.

MySQL 5.7 Behaviour

MySQL 5.7 behaves similarly to 5.6 in that it will pick up all waiting passengers from point A and transfer them to point B, but with one notable enhancement!

When the ferry arrives back at point A to pick up waiting passengers, it can be configured to wait just a little bit longer with the knowledge that new passengers will likely arrive. For example: if you know the trip between point A and point B is 10 minutes in duration, why not wait an extra 30 seconds at point A before departing? This may save you on roundtrips and improve the overall number of passengers that can be transported.

The configuration variables for artificial delay are binlog-group-commit-sync-delay (delay in microseconds) and binlog-group-commit-sync-no-delay-count (number of transactions to wait for before deciding to abort waiting).

Conclusion

In this example passengers are obviously transactions, and the ferry is an expensive fsync operation. It’s important to note that there is just one ferry in operation (a single set of ordered binary logs), so being able to tune this in 5.7 provides a nice level of advanced configuration.

Published by

morgo

I joined MySQL AB in 2006, left, and am now back at Oracle working on the MySQL team. I’ve also worked at Percona and InPowered.

  • Mark Callaghan

    I would have preferred back in the day for this to be called out in the upstream docs as a serious performance problem. I even remember a post about group commit being fixed for InnoDB, which was true when you used InnoDB without replication.

    Most vendors do the same (marketing in documentation). I don’t think it helps the community.

    • I don’t think it helps the product experience either (to have lurking pitfalls).

      Not to make excuses for why it wasn’t documented as such, but we have been working very hard on removing lurking rough edges in MySQL 5.7.

  • Weixiang Zhai

    Hi, Morgan, I have a question : Why using the same mutex/cond during every stages? This will lead to hot mutex contention of LOCK_done

    • I have an answer back from the Developer on this:

      ——-

      We use it during every stage, because it is simple and no obvious cost. In BGC, there are leader thread and non-leader thread. non-leader threads need to wait its leader to finish the BGC process. They wait Cond_done broadcasted by their leader. That is protected by LOCK_done. Leader threads need to wakeup all other thread in its queue after finishing the BGC process. They broadcast Cond_done to non-leader threads and that is also protected by LOCK_done.

      You can say it is hot, since very thread needs to use it. But first, it doesn’t block BGC process. Second, both of leader and non-leader threads just hold the lock a very little time. And most of the threads use it when they are waiting for their leader to finish BGC process. So it is not a matter.

      —-