When Rot Happens
I'm sure everyone has seen repositories that they aren't really happy working in. I've seen my fair share of them and, to be honest, have helped create (or didn't help fix) some of them.
Some people call it "rot" and in this post, I'd like to write about how I've seen it start, the small symptoms that constitute it, and how to try to mitigate it.
Broken Windows Theory
No repository starts with rot. The first file you create, the first line you write, the first build process you set up, it's all nice and clean. Most likely, you've written your best code in the first few days of the repository.
But as time goes by and you (and your team) keep on adding line after line, deadlines start to loom. There's that little bit of something that is not up to snuff, but you don't have time to fix it. You'll get to it later. The small thing is nothing major, it's some little if statement that you know will work but you also know doesn't quite fit in there.
Now if that was all there was to it, it would probably be ok. The problem is that it's most likely not the last line you write nor the last your team writes. Somebody on the team sees the code, thinks it's "ok" to do the same, and adds another if statement or maybe does something that's equally not up to snuff...
Repeat this process 5 times and voilà! Your code has rotted.
This is what we call the Broken Windows Theory in action. The theory states that if you have a broken window in a building, soon, all the windows will be broken too. It illustrates how one bad practice can lead to other bad practices and in the end you have a mess.
The Symptoms of the Little Rot
The above example should give you some hint on what the first symptoms are but let me give some more examples of what I've seen in the past.
Bad naming: Bad names in methods and classes are an easy one to creep in. They seem benign at first but one bad name tends to lead to another, as many engineers try to match the code around them, so one bad name will get copied over multiple times.
Unused files: These are files left over but not actually used (e.g. previous implementations, classes that are no longer referenced, etc.). It's actually a dangerous rot as other engineers might trip upon it thinking it's working code, try to read it and mistakenly think that's the behavior of the system.
Warnings not dealt with: I'm sure people have seen code where there are hundreds of warnings popping up either at compile time or runtime. They're "warnings" so they don't actually stop your development (they're just annoying). People tend to deal with the first warning that pops up as it feels embarrassing, but once one is ignored, it's easy to ignore the rest, resulting in tens and hundreds of them popping up.
READMEs out of date: This is actually quite common. READMEs are quite important as they make it much easier for new engineers to get up to speed. But as development progresses and things change, the READMEs are something that gets forgotten about very quickly.
There are many more examples but I hope you get the idea. Again, it's that small bit of something that's not quite right that you let creep in.
The Rot that Got Big
Now that I've got some little symptoms out of the way, let's talk about how they can end up, i.e. the "Horror Stories".
One time, I was on a team that wasn't really interested in making the engineers' experience better. Most engineers on the team were only interested in getting "their own work done". This led to a lot of rot creeping in.
One thing that caught my eye was the sheer number of Git branches in the repo. We had about 20 engineers working on the same repo and the repo had 250 branches. There were branches from people who had left, some that were forgotten, some for PRs that were yet to be reviewed, etc. You could also see engineers suddenly introducing new versions of an API while, at the same time, another engineer was making changes to the old version of the API.
What was really worse was that, once in a while, a team member would ask "shouldn't we clean them up?" but there was pretty much silence from the rest of the team.
I guess the "rot" had jumped out of the code and into the team.
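For a branch pile-up like the one above, a low-stakes first step could be to simply list which branches haven't been touched in a while, without deleting anything. Below is a minimal sketch in Python, not something that team actually used. The function names and the 90-day threshold are my own assumptions for illustration. It relies on git's for-each-ref command to print each remote-tracking branch with its last commit time.

```python
import subprocess
from datetime import datetime, timedelta, timezone


def parse_branches(for_each_ref_output):
    """Parse lines of '<unix timestamp> <branch name>' into (name, datetime) pairs."""
    branches = []
    for line in for_each_ref_output.splitlines():
        line = line.strip()
        if not line:
            continue
        timestamp, name = line.split(" ", 1)
        last_commit = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
        branches.append((name, last_commit))
    return branches


def stale_branches(branches, now, max_age_days=90):
    """Return names of branches whose last commit is older than max_age_days, oldest first.

    The 90-day default is an arbitrary assumption; pick whatever suits your team.
    """
    cutoff = now - timedelta(days=max_age_days)
    old = [(when, name) for name, when in branches if when < cutoff]
    return [name for when, name in sorted(old)]


def list_remote_branches():
    """Ask git for every remote-tracking branch and its last commit time.

    Must be run inside a clone. It only reads refs and never deletes anything.
    """
    result = subprocess.run(
        ["git", "for-each-ref",
         "--format=%(committerdate:unix) %(refname:short)",
         "refs/remotes"],
        capture_output=True, text=True, check=True,
    )
    return parse_branches(result.stdout)
```

Inside a clone, calling stale_branches(list_remote_branches(), datetime.now(timezone.utc)) gives you a list you can paste into the team chat. The script only reports. Deciding what to actually delete is still a conversation for the team.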
How to Fix the Rot
In the last example, I didn't have the power to fix it, but if I were to fix it now (and assuming I had the authority), I would start by fixing one small thing. It could be fixing a typo in the code, adding some documentation, updating the README, or removing a compiler warning. The small thing should preferably be something that's:
easy to fix (like under an hour)
not controversial
noticeable
The reasons for the first two points are probably obvious. You want the fix to be something that's easy to do and not something that's going to cause a lot of discussion. The last point is important as you want people to see that something has changed. From there, other engineers will notice and may start to fix other things (or at least try not to add more rot).
This assumes people are not evil and that when they see good practice, they will try to follow it. When one person follows, there's a higher chance that others will follow too. This is what we're aiming for when we fix the small things. You want that virtuous cycle to start. There's a limit to what a single person can do but if you get others on board, I can assure you it will make a difference.
I might be repeating myself at the end of every blog post, but this is no easy task and it will take time. It's best not to allow too much rot to creep in, but once it does, you will need to keep at it to actually fix it.