I’ve recently been handed the reins of a project at work that is rather important to our traders. Unfortunately, it’s a system with a lot of external dependencies – by my count, there are about seven different external systems that could fail at any time, and if any one of them goes down, there’s not much we can do until it comes back up.
So, today was pretty much as bad as it gets – three of the external systems went down at different points during the day. Unfortunately, the reason they go down seems to vary every time, making the failures almost impossible to simulate in a test environment. So the best I can do is code as defensively as possible around the issue, try to extrapolate other reasons that may cause failure, and then release another patch. And of course, just when I think I’ve got everything solved, along comes another new problem, the likes of which I haven’t seen before.
Worst of all, due to the different timezones supported by the external systems, I have to wait until they’ve all finished their end-of-day cycles before I can release a patch. Unfortunately for me, this happens to be at 10:45 PM local time. So, recently, night after night I’ve been staying up, waiting for 10:45 PM to tick over, so I can update the server, run some tests, and then go to bed.
15 minutes more…