Research shows that code comments rarely stay in sync with the code they describe. Psychology research suggests that incorrect comments are worse than no comments at all. Should we get rid of code comments entirely, or are some worth keeping?

A 2019 study¹ of three million commits across 1,500 projects confirmed earlier findings that, in most cases, code and comments don’t co-evolve. In other words, comments easily drift out of sync with the code they are supposed to describe.

That study looked at comments that were checked in with the code. I can only assume that it’s even worse when the documentation is written in another system entirely, such as a Word document or a wiki.

Psychology tells us that we will tend to believe those incorrect comments even when we recognize that they may be out of date.² The result is wasted time and effort as we head down the wrong path.

This problem would go away entirely if we stopped documenting code at all, but that introduces other problems. Some documentation is valuable enough that we want it present. The question is where to draw the line: what do we want to document, and what do we not?

API documentation

When we’re writing APIs, there is a fairly clear need for documentation. What does this API do? What parameters does it take? What side effects should we be aware of?

The assumption for most APIs is that the source code is not available, or at least not easily accessible. This makes the need for documentation much stronger.
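As a minimal sketch of what that might look like, here is a Python docstring that answers those three questions for a caller who never sees the function body. The function reserve_seats and its parameters are hypothetical, invented purely for illustration.

```python
def reserve_seats(available: list[str], count: int) -> list[str]:
    """Reserve seats from the front of the list of open seats.

    Parameters:
        available: Labels of the seats that are still open.
        count: How many seats to reserve. Must be between 0 and
            len(available), inclusive.

    Returns:
        The labels of the reserved seats, in their original order.

    Raises:
        ValueError: If count is negative or exceeds the open seats.

    Side effects:
        Removes the reserved labels from `available` in place.
    """
    if count < 0 or count > len(available):
        raise ValueError("count must be between 0 and len(available)")
    reserved = available[:count]
    # This in-place deletion is the side effect the docstring warns about.
    del available[:count]
    return reserved
```

The side-effect note is arguably the most important line here: a caller without the source has no other way to learn that the list they passed in will be modified.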

“Why” comments

Comments that explain why a thing is being done can be legitimately valuable.

Why did we choose this particular algorithm?
Why are we explicitly handling an edge case that should be impossible?
Why are we calling a method a second time when once should have been enough?

These are things that will not be obvious from just reading the code, so a comment can be useful.
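To make the second of those questions concrete, here is a small Python sketch. The function average_rating and the backstory in its comment are made up for illustration; the point is only that the comment records a reason the code alone cannot express.

```python
def average_rating(ratings: list[int]) -> float:
    # Why handle an empty list: callers are expected to filter out
    # unrated products first, so this case should be impossible.
    # (Hypothetical history) a nightly import once bypassed that
    # filter, and the resulting division by zero broke the report.
    if not ratings:
        return 0.0
    return sum(ratings) / len(ratings)
```

Without the comment, a later reader might reasonably delete the guard as dead code. Note that the comment says nothing about what the check does, since the code already makes that plain.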

“What” comments

Comments that explain what the code is doing are sloppy. Refactor the code to make it easier to read and to remove the need for that comment.

Comments explaining what a method does imply that the method name isn’t clear enough. Rename it.
Comments around a block of code in a method imply that the block is doing something different from the rest of the method. Extract it into a method with a good name.
Comments describing the purpose of a variable imply that the variable has a poor name. Give it a better one.
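Here is a rough before-and-after sketch of those three refactorings in Python. All of the names (calc, subtotal, total_with_tax and their parameters) are hypothetical and chosen only to show the pattern.

```python
# Before: "what" comments compensate for unclear names.
def calc(d: list[float], r: float) -> float:
    # r is the tax rate
    # add up the prices
    t = 0.0
    for x in d:
        t += x
    # apply the tax
    return t * (1 + r)


# After: the names carry what the comments used to say.
def subtotal(prices: list[float]) -> float:
    return sum(prices)


def total_with_tax(prices: list[float], tax_rate: float) -> float:
    return subtotal(prices) * (1 + tax_rate)
```

Both versions compute the same result, but the second has no comments left to drift out of date: the method was renamed, the commented block was extracted, and the variable got a better name.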

Conclusion

Because code and comments rarely stay in sync, we want as few comments as possible. Those few should be valuable, explaining things that cannot easily be inferred from the code. The code itself should be as readable as possible, since it is the best documentation we have.

¹ Wen, F., Nagy, C., Bavota, G., & Lanza, M. (2019). A Large-Scale Empirical Study on Code-Comment Inconsistencies. 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC). doi:10.1109/icpc.2019.00019
² “You surely understand in principle that worthless information should not be treated differently from a complete lack of information, but WYSIATI makes it very difficult to apply that principle. Unless you decide immediately to reject evidence (for example, by determining that you received it from a liar), your System 1 will automatically process the information available as if it were true” (Daniel Kahneman, Thinking, Fast and Slow)