Comment by sesm

3 months ago

To me, the most common source of confusion when reading code is the lack of documented intention behind it. You can think of deciphering intention from code as a backwards problem, which is exponentially harder than the forward problem (checking whether given code actually does what's intended). Deciphering intention also gets much harder when there are bugs in the code, because you can't be sure whether they are bugs or intentional behavior.

A related problem I have as an amateur is the lack of structure documentation. I clone a repo and try to figure the code out, and I am met with complex folder structures hosting interconnected source files that call functions among them in ways I have to disentangle. There is usually no high level schematic, no bird's eye view documentation, no discussion of the overall flow of execution or the main code paths.

Why? I don't understand; twenty or thirty years ago, I was taught to provide visual high-level schematics and structural documentation. The latter may be difficult, but surely some schematics with notes on intent cannot be too onerous to draw.

  • This is so interesting. I've heard that the whole "learning styles" thing is basically bunk, but this really reminds me of that.

    I don't find the kind of schematics you're asking for to be useful at all. When I do come across them, I see them as busy-work that people had to make to justify starting some project, or because it's a required artifact. But I never think "oh good, here's a useful thing".

    Those are honestly my unconscious thoughts on these sorts of things. But I do realize, if I think about it consciously for a moment, that some people must find this sort of thing useful.

    But it's just different strokes for different folks! What I want is a description of the goals of a piece of software (what should I expect to be able to do with this?), an entry point, and prose documentation on what each component of the code is and why it exists. But a visual bird's-eye view of what is connected to what is just not how I go about understanding things.

    • The thing that both of your goals have in common is that they require software which is architected and not just an interconnected mess of everything calling everything else.

      If you do have such a mess, you can't really get a good visual overview (it's approaching a complete graph), and you also can't get prose documentation of code components because there are no clear responsibilities.

    • Learning styles is bunk because it's effectively a practice effect that's self-reinforcing, not because we don't have preferences at all. So the fact that you have not found utility innately in those schematics (per this thesis; I'm just using it to illustrate the learning concept) would mean that you have not had as much interactive exposure to them, and you therefore decline to engage with them. The bunk aspect of learning styles is that you could deliberately engage with them, and doing so (again, in a theoretical learning context, not while doing work) in combination with other ways of engaging with the code stack would lead to better and deeper learning than just sticking with what you are comfortable with. Plus there's the added bonus of improving your use of schematics for future learning endeavors. So, to your point, in learning, "different strokes for different folks" just reflects the methods of learning you have been most exposed to, and is quite malleable!

  • It's not just a problem when you are an amateur. This is something that every project should provide.

    But there are also many projects which do. Sometimes you need to search a bit for it. Actually I would expect that most big projects have such documentation somewhere in some form.

    - WebKit: https://github.com/WebKit/WebKit/blob/main/Introduction.md

    - Chrome/Chromium: https://www.chromium.org/developers/how-tos/getting-around-t...

    - PyTorch: https://github.com/pytorch/pytorch/blob/main/CONTRIBUTING.md...

    - RETURNN (my own): https://returnn.readthedocs.io/en/latest/getting_started/tec...

    - Mold: https://github.com/rui314/mold/blob/main/docs/design.md

    And then for some popular projects you will also find some independent overviews:

    - Quake: https://fabiensanglard.net/quake3/ (and many more on https://fabiensanglard.net/)

    - Linux: https://tldp.org/LDP/khg/HyperNews/get/tour/tour.html

    - CPython: https://realpython.com/cpython-source-code-guide/

    - LLVM: https://blog.regehr.org/archives/1453

    One problem, of course, is that those documents can be outdated and don't go into much detail. But they will still give you important insights and should be a good starting point.

  • Thank you for putting a name to something I really struggle with. Learning a programming language in school and hacking together quick projects did NOT prepare me for jumping into an actual codebase with all of its internal complexity. "Structural documentation" would be a godsend to me for most of the repos I look at.

    On a tangential note, I also wish I had a better understanding of which files are hand-crafted and important to grok and which are just boilerplate that was autogenerated by a script or copy+pasted from some doc. There are a lot of files that are intimidating to look at, but if I talked to the developer who implemented it they might say "Oh you don't need to worry about that, you just need to include that as config for package X".

  • I feel the same about folder structures. Each language, project, and framework seems to have its own conventions about where to put files. And it's often poorly documented.

    I think larger open-source projects are more likely to include schematics since they onboard more contributors. Niche projects, or projects with only a few core contributors, are less likely to spend time documenting high-level schematics.

  • I don't think I've ever seen a schematic for a program in all my years programming. Now that you mention it, it really is peculiar given most if not all other schools of engineering have schematics drawn up at some point or another.

    • Isn’t that sort of what a UML diagram is?

      I suspect they aren’t used often because most programming is carpentry rather than engineering. Which isn’t intended as a slight… my life would be much less pleasant without furniture.

> you can't be sure if they are bugs or intentional behavior.

I'm in the middle of a huge legacy codebase and I keep asking two questions:

HOW was this working? WAS this working?

This is quite a valuable comment that I agree with 100% and, from what I understand, a highly unpopular opinion. I hope more people see this.

  • There is consensus that documenting why you did something is good (which is what the root comment is talking about). Documenting what you did is commonly thought to be a crutch for writing unreadable code.

    • I hate the second thought, because not documenting is clearly not stopping people from committing the unreadable code. Instead we get "my code is self-documenting, I'm not going to write documentation".

      And as for the third opinion of "the documentation becomes out of date when the code changes", I would prefer slightly incorrect comments to decipher code rather than no comments to decipher code. Doubly so because I can compare the comments to historical revisions.

    • While I _absolutely_ agree with those sentiments, I have seen nothing like consensus on them myself (in mostly tech startups, but also fintech, and financial (yes, those are different things)). If I limit it to programmers I respect, the percentages go up, but to _maybe_ 75% tops.

  • The idea that code should communicate intent, whether by comments or other means, is popular and common. It ends up being somewhat difficult to execute. In practice, you have to write code like you write prose: keep your audience in mind, be aware of the contextual information available to that audience, and write code that will make sense to that particular audience, ignoring the needs of non-audiences. The tricky part is when your audience includes people in the future. But even how you account for a future of missing design docs and broken links will vary based on things like team composition, business/problem domain, whether the project is open source, etc.

I regularly have this discussion when it comes to comments.

The intention of the code is much more resilient than the details, so comments should generally focus on the why.

(It’s also important to document the code that isn’t there: algorithms that were rejected due to performance, “obvious” enhancements that don’t actually achieve the intended effect, etc.)
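
To make that concrete, here is a minimal sketch of what I mean by "why" comments plus notes on the code that isn't there. The function and the requirement it mentions (callers displaying items in entry order) are made up for illustration, not taken from any real project:

```python
from typing import Hashable, Iterable, TypeVar

T = TypeVar("T", bound=Hashable)


def dedupe_preserving_order(items: Iterable[T]) -> list[T]:
    """Return items with duplicates removed, keeping the first occurrence of each."""
    # WHY: (hypothetical requirement) callers display these items in the order
    # the user entered them, so first-seen order must survive deduplication.
    #
    # REJECTED: list(set(items)). Shorter, but a set does not promise to keep
    # insertion order, so the displayed order could change.
    #
    # REJECTED: checking `item not in result` against the list itself. Correct,
    # but each membership test scans the list, so the whole thing is quadratic.
    seen: set[T] = set()
    result: list[T] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


if __name__ == "__main__":
    print(dedupe_preserving_order([3, 1, 3, 2, 1]))
```

None of those comments restate what the loop does; they record the constraint and the two "obvious" simplifications a future reader might otherwise try.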

And no one ever talks about this. Doing a code review when dropped in cold is SO hard for this reason.

Yes, I can tell what you're doing here, but is that what you're SUPPOSED to be doing?

Maybe it's me, but I can usually guess intention fairly quickly and also imagine what the thought process was and what kinds of mistakes it probably has.

But heck, I'm old and I've been reading other people's code for decades.