Recently I have been working on rewriting the innards of an open source library. When working through understanding how it fully works (since the full “feature set” is not clearly defined in documentation or tests), I will come along areas where “similarly-usable” blocks of code have been combined into parameterized methods. “Turn that copy paste code into a method”, is what your CS professor maybe told you. They were correct, in a way. As a novice programmer you may have run into situations where you were literally copying and pasting code across different uses, like some program that gets data from different sources and processes the text to calculate the length. But when code gets more complex and particularly more poorly defined (as is the case in Software Development), the whole “let me turn reused code into a function call” mentality can reach a breaking point.
The reason for this is because while creating functions oftentimes is good for preventing reuse of code, it is a bad for a more linearly readable code path, and that matters during debugging. Because you’re not just producing some little block of code that checks all the green boxes on an automated testing environment and thus you’ve completed your task for the grade. You are either maintaining or creating pieces of code that will be maintained moving forward. Either way, the code will be changed, tweaked, and improved upon consistently, and poorly comprehensible code paths will hurt that.
So the solution is to ask yourself when you are creating new methods, “what is this method actually used for?”. In the case of MQTT.js, the open source repository I’m in the depths of at present, I do not think the original authors knew what they wanted from each method. I think this because the method names are vague and their calls are highly parameterized. _cleanup
takes in a boolean and an optional callback. There is reconnect
which calls _reconnect
, and setupStream
, which sets up a stream in the sense that it creates a stream from a factory method and then creates a writable stream as well and connects that to the original stream and then creates a method called work which does work supposedly and then passes in that work function to be called somewhere and also we are creating a CONNECT
packet which will be used for connecting and then we also potentially add some information to the connect packet and then we send the connect packet and then we maybe send the authentication packet and then we set up the CONNACK
Timer and then we also adjust the max listeners on the stream and finally we have “setup the stream”.
The problem that this codebase runs into is one of the main problems that have to be actively combatted in modern collaborative codebases. Someone smart maybe writes a small bit of code to do something, maybe based around a spec. And then the spec changes or bugs / missed features are added, and maybe people add features outside of the spec but that are just useful to have in an implementation, and all of a sudden what might’ve been a useful method name becomes a vague gloop bucket, and anyone trying to repair the code will have to wade through every gloop bucket method to figure out what the hell the code is actually doing. And then when they get to implementing, they either choose the route of plopping their new functionality into the existing gloop bucket methods or creating sparkly new methods that are succinct in principle but also technically implement a concept that is now spread over multiple methods.
In my experience, fixing this problem when it’s gotten out of hand is far more difficult than catching the rot early on. Either way, don’t take it for granted that internal method signatures in a code base make sense, and when you sense there’s a misalignment with the functionality and the naming, it might indicate a greater entanglement of functionality in a codebase that needs to be nipped in the bud.