Lessons We All Learn
-
Email is the worst monitoring and alerting mechanism except for all the others.
-
A good monitoring system is the basis of any good architecture.
-
Do not make production changes on Fridays; you will gain enemies if you do.
-
You cannot deliver a secure, maintainable, well-documented project if your biggest priority is delivering it as fast as possible.
-
Your most critical services are kept alive by a handful of people whose job description does not mention those services at all.
-
Most of your actual work is not covered by your OKRs.
-
Absence of a signal is itself a signal.
-
The severity of an incident is measured by the number of rules broken in resolving it.
-
If a post-mortem follow-up task is not picked up within a week, it's unlikely to be completed at all.
-
There is no cloud; it's just someone else's computer.
-
Serverless isn't.
-
If you settle on "human error" as the root cause, you're doing it wrong.
-
Git should be your only source of truth. Discard any local files or changes: what's not pushed to the repository does not exist.
-
If you break it, you own it - for now. If you fix it, you own it forever.
-
"Obsolete" doesn't mean it's not in use and relied on heavily.
-
If you see a big-name company give a conference talk about some cool thing they made, that company has probably already abandoned it.
-
"Prod" is just another name for "staging". In other words, you test in prod.
-
Your infrastructure uses a lot more self-signed certificates than you think. A lot more. In places that make you weep.
-
Self-signed certificates beget long-lived certs, which beget a lack of certificate validity monitoring, which begets curl -k, which begets a lack of certificate deployment automation, which begets self-signed certificates.
-
Containers create at least as many problems as they solve.
-
Kubernetes creates problems that haven't even been invented yet.
-
The source you're looking at is not the code running in production.
-
Two is one, and one is none.
-
Very few operations are truly idempotent.
-
"Asserting state" beats "monitoring for compliance" any day.
-
Your network team has a way into the network that your security team doesn't know about.
-
There are very few network restrictions that creative and determined use of SSH port forwarding can't overcome.
-
It is tempting to jump right into implementing a solution when the right thing may well be to not do the thing that requires the solution in the first place.
-
Turning things off permanently is surprisingly difficult.
-
That "completely automated" solution you set up requires at least three manual steps you didn't document.
-
Schrödinger's Backup -- "The condition of any backup is unknown until a restore is attempted." -- is overly optimistic. If you’ve never restored from a backup, you don’t actually have backups.
-
If you’ve never failed over to another region, you don’t actually have failovers.
-
If you’ve never rolled back a deploy, you don’t have a mature deploy pipeline.
-
In any organization practicing continuous integration, half of all commits are to fake out CI tests.
-
There's an xkcd for the precise situation you find yourself in.
-
Eventual consistency doesn't help when the system you're debugging hasn't converged yet.
-
Real change can only be implemented above layer 7.
-
Any sufficiently successful product launch is indistinguishable from a DDoS; any sufficiently advanced user indistinguishable from an attacker.
-
Your herculean efforts to upgrade the OS across your entire fleet completed just in time for the EOL announcement of the version you upgraded to.
-
Doubling your time estimate in the hopes of beating expectations won't work, because your manager takes your estimate, has a hearty laugh, and then resets it to what they already promised up the chain.
-
Management will always happily spend $$$ on outside consultants to tell them what you've been saying for years.
-
Management would much rather invest in inventing a new, square wheel than in fixing an old, round one.
-
A: However well you understand concurrency, you only need one coworker who doesn't understand concurrency to make your life an unending hell. B: You always have at least one coworker who doesn't completely understand concurrency.
Note: Many of these came from Jan Schaumann's Tweets :thumbsup:.
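One of the lessons above, "Absence of a signal is itself a signal", can be made concrete with a dead man's switch: alert on what has stopped reporting, not only on what reports an error. The following is a minimal sketch in Python, assuming each job records a heartbeat timestamp somewhere you can read it. The function name `stale_sources`, the job names, and the 300-second threshold are all hypothetical and chosen only for illustration.

```python
def stale_sources(last_seen, now, max_silence_seconds):
    """Return the sorted names of sources whose last heartbeat is too old.

    last_seen maps a source name to the time (seconds since the epoch)
    at which it last reported in. A source counts as stale when it has
    been silent for longer than max_silence_seconds.
    """
    return sorted(
        name
        for name, seen_at in last_seen.items()
        if now - seen_at > max_silence_seconds
    )


# Hypothetical heartbeat data; in practice "now" would come from time.time()
# and the timestamps from wherever your jobs record their last run.
now = 1_000_000.0
last_seen = {
    "backup-job": now - 30,      # reported 30 seconds ago
    "cert-renewal": now - 7200,  # silent for two hours
    "log-shipper": now - 90,     # reported 90 seconds ago
}

silent = stale_sources(last_seen, now, max_silence_seconds=300)
print(silent)  # ['cert-renewal']
```

The design point is that the check starts from the list of sources you expect to hear from, so a job that quietly stops running shows up as a finding instead of as nothing at all.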