Comment by ykonstant

12 days ago

A related problem I have as an amateur is the lack of structure documentation. I clone a repo and try to figure the code out, and I am met with complex folder structures hosting interconnected source files that call functions among them in ways I have to disentangle. There is usually no high level schematic, no bird's eye view documentation, no discussion of the overall flow of execution or the main code paths.

Why? I don't understand; twenty or thirty years ago, I was taught to provide visual high-level schematics and structural documentation. The latter may be difficult, but surely some schematics with notes on intent cannot be too onerous to draw.

This is so interesting. I've heard that the whole "learning styles" thing is basically bunk, but this really reminds me of that.

I don't find the kind of schematics you're asking for to be useful at all. When I do come across them, I see them as busy-work that people had to make to justify starting some project, or because it's a required artifact. But I never think "oh good, here's a useful thing".

Those are honestly my unconscious thoughts on these sorts of things. But I do realize, if I think about it consciously for a moment, that some people must find this sort of thing useful.

But it's just different strokes for different folks! What I want is a description of the goals of a piece of software (what should I expect to be able to do with this?), an entry point, and prose documentation on what each component of the code is and why it exists. But a visual bird's-eye view of what is connected to what is just not how I go about understanding things.

  • The thing that both of your goals have in common is that they require software which is architected and not just an interconnected mess of everything calling everything else.

    If you do have such a mess, you can't really get a good visual overview (it's approaching a complete graph), and you also can't get prose documentation of code components because there are no clear responsibilities.

    • Well, I agree that better architecture is useful, but I don't think it's required by my "digging into what's going on starting from an entry point" process. I do think trying to write the prose documentation I described tends to drive better architecture. Trying to write the documentation on a module of spaghetti is often what helps me realize there is no through line to why all this stuff is in this module.

  • Learning styles is bunk because it's effectively a practice effect that's self-reinforcing, not because we don't have preferences at all. So the fact that you have not found utility innately in those schematics (per this thesis; I'm just using it to illustrate the learning concept) would mean that you have not had as much interactive exposure to them, and you therefore decline to engage with them. The bunk aspect of learning styles is that you could deliberately engage with them, and doing so (again, in a theoretical learning context, not doing work) in combination with other ways of engaging with the code stack would lead to better and deeper learning than just sticking with what you are comfortable with. Plus there's the added bonus of improving your use of the schematics for future learning endeavors. So to your point, in learning, "different strokes for different folks" just reflects the methods of learning you have been most exposed to, and is quite malleable!

    • Maybe, but as I get older I find this sort of stuff increasingly unconvincing. I've had a lot of time to get a lot of exposure to a lot of different things now, and I have more faith in my own sense of my preferences than in "well you just haven't done the other stuff enough!" now.

      I think people prefer different things because they prefer different things.

It's not just a problem when you are an amateur. This is something that every project should provide.

But there are also many projects which do. Sometimes you need to search a bit for it. Actually I would expect that most big projects have such documentation somewhere in some form.

- WebKit: https://github.com/WebKit/WebKit/blob/main/Introduction.md

- Chrome/Chromium: https://www.chromium.org/developers/how-tos/getting-around-t...

- PyTorch: https://github.com/pytorch/pytorch/blob/main/CONTRIBUTING.md...

- RETURNN (my own): https://returnn.readthedocs.io/en/latest/getting_started/tec...

- Mold: https://github.com/rui314/mold/blob/main/docs/design.md

And then for some popular projects you will also find some independent overviews:

- Quake: https://fabiensanglard.net/quake3/ (and many more on https://fabiensanglard.net/)

- Linux: https://tldp.org/LDP/khg/HyperNews/get/tour/tour.html

- CPython: https://realpython.com/cpython-source-code-guide/

- LLVM: https://blog.regehr.org/archives/1453

One problem is of course that those documents can be outdated and don't go into much detail. But they will still give you important insights and should be a good starting point.

Thank you for putting a name to something I really struggle with. Learning a programming language in school and hacking together quick projects did NOT prepare me for jumping into an actual codebase with all of its internal complexity. "Structural documentation" would be a godsend to me for most of the repos I look at.

On a tangential note, I also wish I had a better understanding of which files are hand-crafted and important to grok and which are just boilerplate that was autogenerated by a script or copy+pasted from some doc. There are a lot of files that are intimidating to look at, but if I talked to the developer who implemented it they might say "Oh you don't need to worry about that, you just need to include that as config for package X".
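As a rough first pass at that question, here is a minimal sketch that sorts files by whether their first few lines carry a "generated" marker. It assumes the tooling left such a header at all, which plenty of generators and copy-pasted config files do not, so treat a miss as "unknown" rather than "hand-written". The marker list and the function names (`looks_generated`, `split_repo`) are my own illustration, not any project's convention; the only patterns I'm sure are in real use are Go's "Code generated ... DO NOT EDIT." header and the `@generated` tag.

```python
import re
from pathlib import Path

# Assumed marker list: Go's "Code generated ... DO NOT EDIT." header,
# the "@generated" tag, and a loose "auto-generated"/"autogenerated" match.
# Extend it for whatever ecosystem you are reading.
GENERATED_MARKERS = re.compile(
    r"(code generated .* do not edit|@generated|auto-?generated)",
    re.IGNORECASE,
)


def looks_generated(text: str, head_lines: int = 10) -> bool:
    """Return True if any of the first `head_lines` lines contains a marker."""
    head = text.splitlines()[:head_lines]
    return any(GENERATED_MARKERS.search(line) for line in head)


def split_repo(root: str) -> tuple[list[Path], list[Path]]:
    """Walk `root` and return (unmarked_files, marked_as_generated_files)."""
    unmarked, generated = [], []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything inside .git
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file: ignore it
        (generated if looks_generated(text) else unmarked).append(path)
    return unmarked, generated
```

Calling `split_repo(".")` from a checkout gives two lists of paths; the second is the "you probably don't need to read this" pile, and the first still needs a human (or the developer) to say what matters.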

I feel the same about folder structures. Each language, project, and framework seems to have its own conventions about where to put files. And it's often poorly documented.

I think larger open-source projects are more likely to include schematics since they onboard more contributors. Niche projects, or projects with only a few core contributors, are less likely to spend time documenting high-level schematics.

I don't think I've ever seen a schematic for a program in all my years programming. Now that you mention it, it really is peculiar given most if not all other schools of engineering have schematics drawn up at some point or another.

  • To be fair, all other schools of engineering concern themselves with constructing actual physical objects, not abstract intangible mathematical structures.

    • To be fair, our school of engineering has moved on from parsing payroll and accounts payable and is now controlling physical objects and their interactions with the physical world, so it's probably time to start taking things seriously, like our peers.

  • Isn’t that sort of what a UML diagram is?

    I suspect they aren’t used often because most programming is carpentry rather than engineering. Which isn’t intended as a slight… my life would be much less pleasant without furniture.