On May 3, 2021 at around 8am, the Python Packaging Authority (PyPA) made the decision to disable access to pypi.org and files.pythonhosted.org from non-SNI clients, including Python 2.7.6, which Meya v1 uses as its core:
- Meya v1 platform runs on Python 2.7.6. Python 2 is EOL as of 2020, and the recommendation is to migrate to Meya v2 running on Python 3
- The Meya v1 AWS autoscaler relies on pip to spin up new nodes daily as demand increases
- PyPA disabled Python 2.7.6 access to PyPI on May 3, so pip could no longer install packages, preventing the platform from scaling up
- No new nodes could come online as demand increased, and existing nodes would eventually go offline due to overload
- This resulted in a complete outage of Meya v1 (not v2)
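To check whether a given interpreter is a non-SNI client, the standard library exposes a flag on the `ssl` module. The following is a minimal sketch, not the diagnostic we actually ran; the helper name `sni_report` is made up for illustration. On interpreters that predate the flag (such as 2.7.6), the attribute is absent, so the sketch treats a missing flag as "no SNI".

```python
import ssl
import sys


def sni_report():
    """Return a small dict describing this interpreter's SNI support.

    `sni_report` is a hypothetical helper name used only for this example.
    """
    return {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "openssl": ssl.OPENSSL_VERSION,
        # ssl.HAS_SNI does not exist on very old interpreters,
        # so a missing attribute is treated as "no SNI support".
        "has_sni": bool(getattr(ssl, "HAS_SNI", False)),
    }


if __name__ == "__main__":
    report = sni_report()
    print(report)
    if not report["has_sni"]:
        print("No SNI support: requests to pypi.org may be rejected.")
```

Running this on each base image before it goes into an autoscaling group would flag an interpreter that can no longer reach the package index.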
How did we fix it?
We immediately became aware of the outage and created a Slack and Zoom incident response room to get Meya v1 back online as fast as possible, but the resulting solution was highly complex due to a series of cascading requirements:
- We upgraded Python to 2.7.12, which created a series of cascading changes
- We were required to upgrade the operating system to Ubuntu 16.04 (Xenial) from Ubuntu 14.04
- This upgrade resulted in MySQL connectivity issues related to SSL certificate verification, which we resolved using AWS certificate management features
- Bots came back online, but they were too slow to meet our QoS threshold, so we optimized our database connection management
The result of these changes:
- 4:30pm ET: bots came back online, but slow
- 6:30pm ET: bots were now fast, but not yet fully stable
- 8:00pm ET: bots were now stable
- Ongoing: bots are even faster due to the combination of optimizations undertaken in the process
What was affected?
- The entire v1 platform was affected
- v2 was not affected whatsoever
What will change moving forward?
- We’ve made the call to freeze our v1 codebase into an image to remove the pip dependency for scale up. This will protect scale up from future deprecations by PyPA/pip/Python, however unlikely
- We’ve adjusted our monitoring to account for similar situations should they come up again
- We recommend customers migrate to Meya v2 within the next 6-12 months due to the EOL on Python 2 itself
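As an illustration of the kind of monitoring check that could catch a similar situation early, here is a minimal sketch of a probe that reports whether a package index answers over HTTPS. It is an assumption about what such a check might look like, not a description of our actual monitoring; the function name `index_reachable`, the default URL, and the injectable `opener` parameter are all hypothetical.

```python
import urllib.request


def index_reachable(url="https://pypi.org/simple/",
                    opener=urllib.request.urlopen,
                    timeout=5):
    """Return True if the index at `url` answers with HTTP 200.

    `opener` is injectable so the check can be exercised without network
    access; by default it is urllib.request.urlopen.
    """
    try:
        response = opener(url, timeout=timeout)
        try:
            return response.status == 200
        finally:
            response.close()
    except Exception:
        # Any TLS, DNS, or HTTP failure counts as unreachable for alerting.
        return False
```

A scheduled job could call `index_reachable()` from the same interpreter the autoscaler uses and raise an alert when it returns `False`, so that a TLS or SNI incompatibility shows up before new nodes fail to start.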