The last time I wrote this (did you read that dev blog? It's really good, I promise), Tranquility's auto-reboot on weekends took approximately 4 minutes and 20-40 seconds, just enough for a quick cup of tea. Today's auto-reboot, a year and a half later, took 3 minutes and 34 seconds - just enough for a quicker cup of tea (I measured it while writing this dev blog). Given the improvements we have made since 2019, an auto-reboot downtime of 3 minutes and 30-40 seconds is pretty normal these days.
I could focus on this improvement of 50-60 seconds, a 22.5% improvement between 3 Dec 2019 and 3 July 2021, and predict the end of downtime on 19 December 2026 with this super-scientific graph, but the reality is more complicated than that. There is a (soft) lower bound of approximately 3 minutes, because the three activities during downtime - shutdown, database jobs, startup - last approximately 1 minute each. That bound holds unless fundamental changes are made, and the most fundamental change is still to not have any downtime at all. Downtime will not become much shorter than 160-200 seconds; instead there must first be fewer downtimes and then none at all. Nevertheless, I wanted to start this blog with a concrete example of the improvements made in downtime reduction since last time. And now another no-downtime experiment is being planned for September 9.
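For the curious, the tongue-in-cheek extrapolation above can be sketched in a few lines of Python. This is only a rough illustration, not how the graph was actually produced: the function name `extrapolate_downtime` is made up for this example, and the 276-second baseline is an assumption picked from within the quoted 4 minutes and 20-40 seconds range because it gives an improvement of roughly 22.5%. The 180-second floor is the approximate soft lower bound mentioned above.

```python
from datetime import date, timedelta


def extrapolate_downtime(start_day, start_secs, end_day, end_secs, floor_secs=0.0):
    """Extend the straight line through two measurements until it hits floor_secs.

    Hypothetical helper for illustration only; it assumes the improvement
    continues at a constant rate, which the post argues it will not.
    """
    elapsed_days = (end_day - start_day).days
    rate = (start_secs - end_secs) / elapsed_days  # seconds saved per day
    if rate <= 0:
        raise ValueError("downtime did not improve; nothing to extrapolate")
    days_needed = (start_secs - floor_secs) / rate
    return start_day + timedelta(days=round(days_needed))


# Dates and the 214 s (3 min 34 s) measurement come from the text above.
# The 276 s baseline is an ASSUMPTION chosen from the quoted 260-280 s range.
first = date(2019, 12, 3)
latest = date(2021, 7, 3)

naive_end = extrapolate_downtime(first, 276, latest, 214)
floor_hit = extrapolate_downtime(first, 276, latest, 214, floor_secs=180)

print("Naive 'end of downtime':", naive_end)
print("Trend reaches the ~3 minute floor:", floor_hit)
```

With the assumed baseline, the naive line reaches zero on 2026-12-19, but it crosses the roughly 3-minute floor long before that, which is why the straight line stops being a useful prediction.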
The purpose of this second no-downtime experiment is at least four-fold:
- Verify the fixes made for the issues discovered in the previous experiment in the live production environment
- Verify that no other code/features have regressed since last time and in general look for further issues
- Observe memory usage
- Verify that our technology platform (which you will hear more about later) is not making any downtime assumptions
So what did we discover last time, I hear you ask?