High Availability is key to any production environment. Anyone who has worked in consumer-based applications knows how hard it is to maintain uptime and make the application available 24/7 — especially in a fast-evolving environment, or one that needs to change quickly to meet demand for tough competition.
New features in applications mostly require changes at schema level of the database, and to make such changes with billions of rows in a single table is very hard. It always requires metadata lock on a table where you want to make changes and MySQL makes a copy of the table, which takes significant time depending on the size of the table. This become more complex if we also maintain referential integrity at the database level.
We’ve made database level changes of all sizes, and with few of proprietary databases for Oracle and SQL Server. To make changes on these databases, if they are running in an enterprise kind of environment, its not that hard you can handle downtime for few hours on low-peak hours, or even on weekends, to alter the schema or perform version upgrades.
Many of the big, globally successful startups choose MySQL as their back-end database, and we are no exception. Back in 2014, Careem databases were less than a few hundred gigabytes, but as Careem became a phenomenal success, the data growth skyrocketed and altering of the schema become extremely challenging.
At the start of 2016 we started looking at a few alternatives to make the changes in Careem database schema. I still remember our discussion with one Square engineer who made a utility for online schema change and it was available on GitHub with the name of Shift. But in our case, the biggest obstacle was database foreign key constraints which Shift didn’t support. We also has a look at Percona, the famous utility of pt-online-schema-change, but again, the problem was foreign key constraints, but we decided to go with the same trigger based approach.
Our experience with that approach went horribly wrong and at that time we decided never to go this route again. Careem runs its infrastructure on AWS and heavily use its RDS service for it all databases. We decided to start work on an alternative approach and since then we’ve been working to make any alterations first in the dedicated slave (read replica) and then use our terraform scripts to create the new master-slave hierarchy.
- Create a new Slave(read replica).
- Stop replication manually.
- Alter the schema – e.g. add columns/index
- Resume replication.
- Create new chain replication.
- Shift all read-only traffic dedicated replicas.
At the time of planned downtime:-
- Stop writing to master database by stopping applications.
- Update Route53 to point to the new database.
- Reboot the old database to release all connections.
- Promote the new database as a new master.
- Start all applications and verify they are making connections to new DB.
- Make the new master as Multi-AZ.
After creating the hierarchy we usually start redirecting traffic by using AWS Route53 weightage routing and at time of the swap, we take clean planned downtime of no more than 10 minutes to promote the new master.
The downside of this approach is, of course, a 10 min downtime, but we’ve learned that if you manage this at very low peak hours it’s manageable.
After the release of the new MySQL version 8.0.13 — which comes with the Tencent patch supporting instant alteration of the schema — it will make life easier but we are still evaluating this new MySQL feature.
This article was written by Khurram Naseem, a database architect at Careem, with contributions from Lamak Qaizar.