Web Browser Engineering (2021)

15 hours ago (browser.engineering)

One great thing about this book is the 'stuff I didn't do' part.

Layout is really hard. Just tables by themselves are hard, even without any css around them. CSS makes layout impossibly difficult. I challenge anyone to keep the whole CSS spec and its associated behaviors in their head.

At this point css + html + javascript have become a dynamic PDL, and the combination is probably one of the most complex pieces of software today.

As an aside, video decoding is offloaded onto hardware, so it's not as battery intensive as it used to be.

  • For the absolutely massive amount of code one needs to implement for production-grade CSS layout, the Servo source code is illustrative and IMO quite cool to see. For instance, this file just implements block and inline contexts; there's a bit of Rust boilerplate here, but the vast majority of lines are "business logic" around various parts of the specification. And there's a whole folder of these. https://github.com/servo/servo/blob/main/components/layout/f...

    But implementing a layout engine is doable. CSS is not magic; there's a spec that can be (meticulously) transformed into code. I've occasionally shown code like this to people frustrated that CSS seems arbitrary, just to show them that there is a logic to the execution environment. Granted, you're not going to regularly click into it the way you'd click into the implementation of a library, but it's no different from something like React in that regard. I think it helps!

    • FWIW, Pavel, one of the authors, has devoted considerable time to what is one of the very, very few attempts at a formal specification for CSS (the static/float layout fragment, cf. [1]). It's a Racket program generating Z3 SMT solver code for verifying an instance layout (which also looks like Scheme) so it's not for the faint-hearted ;) but maybe just what an FP fan on HN is looking for as a challenge.

      [1]: https://pavpanchekha.com/blog/css-floats.html


  • Yes, layout is difficult, especially because (I think):

    1. The most "core" parts of layout, like the CSS 2 stuff, are pretty poorly considered, with a bunch of weird features that interact in strange ways. (Floats and clearance? Margin collapsing?) Some parts of this "core" were intended to be universal even though they're a bad fit for other layout modes. (Margin and padding, for example, don't have a clear purpose for, say, grid elements.)

    2. It's not well-modularized the way JS APIs are. A JS API can often be implemented fairly stand-alone, but each layout module interacts with every other layout module since they can be nested in various ways. I think newer specs like grid are trying to be stricter with this but there are fundamental challenges: the actual 2D screen is a shared resource that different layout modes must split up.
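A side note on the margin-collapsing example in point 1: here is a minimal Python sketch of just the arithmetic CSS 2 applies once a set of vertical margins is already known to adjoin. The name `collapse_margins` is made up for illustration, and the sketch deliberately skips the genuinely hard part, which is deciding whether margins adjoin at all (padding, borders, clearance, and new formatting contexts all get in the way).

```python
def collapse_margins(*margins: float) -> float:
    """Size of the single margin that replaces a set of adjoining margins.

    Rule sketched here: take the largest positive margin and add the most
    negative margin (which is <= 0). With no positive margins the result is
    the most negative one; with no margins at all it is 0.
    """
    largest_positive = max((m for m in margins if m > 0), default=0.0)
    most_negative = min((m for m in margins if m < 0), default=0.0)
    return largest_positive + most_negative


# Two stacked paragraphs: a 20px bottom margin meets a 30px top margin.
gap = collapse_margins(20, 30)
print(gap)  # the gap is 30, not 50
```

Even this tiny rule shows why people find it surprising: the gap between two boxes is not the sum of their margins, and negative margins subtract from the largest positive one instead of simply overlapping.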

  • Layout is so difficult that it made me quit using Common Lisp and ncurses to build my passion project and become the very thing I swore to destroy (a React developer).

    I can't be the only one who wants a simpler layout language than CSS that's designed with two decades of hindsight to provide the maximum simplicity-expressiveness product. Are there any serious projects to engineer something like this, or has everyone given up and either embraced CSS3 (waiting for the LLVM backend) or gone back to plain text?

  • For React Native the facebook engineers just gave up and were like "all you get is flexbox layout" and people were quite okay with that (although some people grumble about lack of display grid)

    https://github.com/facebook/yoga

    • It works great for small devices, but I prefer ios constraint layout (and I think android got it too). No need for spacers.

  • This Babylonian tower will crumble one day.

    Layout does not have to be so complex. There are dozens of GUI frameworks with simpler layout system. Those are enough for applications everyone uses.

    • Actually, almost every GUI toolkit's scheme for layout has issues, and none of them are perfect.

      The ones that use absolute pixel positioning fail when using different resolution displays.

      The ones that use box packing fail when you need to deal with different sized displays.

      The ones that use constraint programming fail when you need to lay out hundreds or thousands of widgets.

      CSS-style layout has its own pros and cons, but there is no alternative to it that is clearly better under all circumstances. If you are doing layout and want to be resolution-independent, function on everything from phones to giant displays, and have thousands of things to lay out, CSS is actually likely better than any alternative.

    • And they all have massive issues, or just provide a worse version of CSS. Qt's QSS, for example, is just a less well documented, non-standard, and very sparsely discussed CSS implementation. Oh, and it doesn't work for everything in Qt.

    • Pixel positioning is so nice! I remember how easy it was to lay out UIs with VB.

  • I did layout for a feature phone browser. WAP's and, mercifully, XHTML-basic’s ideas of layout were simple enough that I could treat spans as a concave octagon: a rectangle with a bite out of the upper left and lower right corners. It made it at least an order of magnitude faster to paint and scroll a long document, and much simpler to think about.

    In the years since I’ve used a lot of tricks for web app CSS that I’m not sure I’m smart enough to figure out. But then I’ve never thought as long and as hard about typesetting as I did at the time so who knows.
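For anyone trying to picture the "concave octagon" described above, here is one possible reading of it as a Python sketch. This is a guess at the shape, not the commenter's actual code: `Rect`, `SpanRegion`, and all field names are hypothetical. The idea is that a wrapped span is fully described by its column edges, where the text starts on the first line, and where it ends on the last line, so it decomposes into at most three rectangles.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rect:
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


@dataclass
class SpanRegion:
    """A wrapped inline span: a rectangle minus two corner 'bites'."""
    left: int         # left edge of the text column
    right: int        # right edge of the text column
    top: int          # top of the first line
    bottom: int       # bottom of the last line
    first_x: int      # where the span starts on its first line
    last_x: int       # where the span ends on its last line
    line_height: int

    def rects(self) -> List[Rect]:
        # A span that fits on one line is just a plain rectangle.
        if self.bottom - self.top <= self.line_height:
            return [Rect(self.first_x, self.top, self.last_x, self.bottom)]
        first_bottom = self.top + self.line_height
        last_top = self.bottom - self.line_height
        # First line: from the start of the span to the right edge.
        parts = [Rect(self.first_x, self.top, self.right, first_bottom)]
        # Middle lines (if any): full column width.
        if last_top > first_bottom:
            parts.append(Rect(self.left, first_bottom, self.right, last_top))
        # Last line: from the left edge to the end of the span.
        parts.append(Rect(self.left, last_top, self.last_x, self.bottom))
        return parts

    def contains(self, x: int, y: int) -> bool:
        return any(r.contains(x, y) for r in self.rects())


# A span that starts mid-line, covers one full line, and ends early on a third.
region = SpanRegion(left=0, right=100, top=0, bottom=30,
                    first_x=40, last_x=25, line_height=10)
```

With a representation like this, hit testing and repainting need only a handful of comparisons per span, regardless of how many lines it wraps across, which is consistent with the speedup described.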

  • Modern CSS implementations are full blown geometric constraint solvers now. The only things approaching their algorithmic complexity are now other geometric constraint solvers like CAD kernels and silicon layout software.

  • > As an aside, video decoding is offloaded onto hardware, so it's not as battery intensive as it used to be.

    This is technically but not usefully true with most videos on the web today.

    The video decode itself is accelerated, but each frame passes through JavaScript to be composited.

    The only time video is fully hardware decoded is when it's a simple video element to a static video file.

    • > The video decode itself is accelerated, but each frame passes through JavaScript to be composited

      I don't think that's true, and it's even less true once DRM video is involved - it becomes very difficult to get other software on the machine to even see the video, at least on Windows. You can very occasionally see bugs where the hardware accelerated playback ends up in a different place to where the browser thinks the video should have been put, too.

      What does happen is the video data gets reassembled in Javascript (e.g. Video.js) because the native player doesn't support HLS. Not quite the same thing. It's just reformatting MPEG-TS to the similar but not identical MP4. Oddly, the browser in my LG TV does play HLS video natively, and I think Safari does?

    • > not usefully true with most videos on the web today

      > The only time video is fully hardware decoded is when it's a simple video element to a static video file.

      These seem in disagreement to me. The vast majority of videos on the web are simple video elements going to static video files. It is not usual for each frame to pass through JavaScript before being displayed.


    • I think by "JavaScript" here you mean rendering; that's partially true. In macOS and Windows these days (also I think Linux with GTK4 on Wayland, though only in a limited way), the window manager is itself composited and a window can send a small display list to that window manager for it to composite. In that case, it's possible to actually have the video decoding happen entirely in hardware and never have the browser directly interact with decoded video bits. That said, usually the window manager compositor is pretty limited and the browser will only do this when the stars align. The sorts of things that can break it are any kind of weird clipping, transparency, or effects applied to the videos.

    • > each frame passes through JavaScript to be composited

      What do you mean by that? There is no Javascript doing the actual compositing, and the actual compositing is (usually) hardware accelerated.

  • > I challenge anyone to keep the whole CSS spec and its associated behaviors in their head.

    Lol, no way.

    People are always "guess what JS does, wut."

    Doesn't hold a candle to Cascading Stylesheets.

This looks awesome. About 15 years ago, I started working on a headless browser and maintained it for several years. It used SpiderMonkey as the js interpreter and had a custom DOM implementation. It ran all the modern js from the time, AJAX, etc. Later, I added a custom Flash runtime. It basically did everything but draw to the screen. That project was a lot of fun.

I'm definitely interested in going through this book.

  • Umm, if you wanted/want to draw to the screen, what library will you use?

    • The book uses Tk (via Python's tkinter library) for the first 10 chapters and then switches to Skia, which is used by maybe all of the browsers now (I believe Webkit on Linux just switched to it from Cairo). It seems to be by far the most common high-performance 2D graphics library.

    • I didn’t have a clue. That wasn’t part of my skillset. All we needed was a headless browser that was automated. It was crawling a few million pages a day. I had a debug console where I could see cached pages, headers, cookies, etc.

It's refreshing that browser engineering seems to be becoming a "trend" now. The ecosystem is quite sparse, with basically only Google, Apple and Mozilla defining it. I look forward to a future with more independent browser engines.

  • I don't think it's worth trying to write a rendering engine for HTML. You will never finish - HTML is a spec fully owned by Google and Apple at this point and it's just too complex to implement from scratch.

    The interesting space is really post-HTML UI/document tech. There's another thread running about Typst which is a sort of better LaTeX. Markdown was highly impactful. There's a lot of scope for people to do interesting things in this space that are "HTML but better". It doesn't even have to be a markup format - Typst and React HTML both blur the lines between code and data. Jetpack Compose shows how to use Kotlin's DSL features to make something that looks a bit like a UI description but which is actually code.

    Of course it means you have to then either distribute a 'browser' for your format, or find a way to display it in the browser. But compiling down to some JS/HTML/WASM thing is certainly possible. You can also use portable GUI toolkits like JavaFX; that also gives you accessibility. Or do both!

    Once you define your own UI language there's a lot of scope to try things that HTML doesn't do well. An obvious one is separation of content and style. HTML tried and never really got there. XSLT tried harder but was a weird pure functional language with XML as its syntax. React does quite well with going JSON->boxes but the underlying protocols are always ad-hoc and tacked on, so you can't really write useful tooling on top of that.

    Another idea would be a format that's natively immune to XSS.

    • > I don't think it's worth trying to write a rendering engine for HTML. You will never finish - HTML is a spec fully owned by Google and Apple at this point and it's just too complex to implement from scratch.

      This keeps being repeated. But it leans on three false assumptions.

      - That it has to be "finished" at all. For many use-cases, a subset (of a subset) might just be fine. The screen in my refrigerator, or the information display in a train, might want to render some HTML, but when the HTML is controlled and constrained, there's no need for "everything".

      - That it has to adhere to "the spec". See above, but also, if the HTML+CSS+JS is less controlled, for quite a few use-cases it's fine to ignore lots of the quirks or even large parts of the specs. Even Chrome and FF don't implement "all", whatever "the spec" might be in the first place. But a browser in a TV set-top box, my e-reader, some dedicated wikipedia-device, or the "help section of an app" is fine if it breaks on complex sites.

      - That it must be implemented from scratch. Even if you forego the big rendering engines, JS VMs and so forth, there are a lot of libs that do DOM handling, CSS parsing, JS runtime etc. There are a lot of shoulders to stand on, aside from "just run chrome headless".

      By repeating this mantra that it's not worth "building a new browser" or "rendering engine", we only cement the status quo further. And promote the idea that your car, refrigerator, test-runner, help-section, dashboard, e-reader and whatnot must run either a full chrome or firefox. We stifle innovation.

    • I was an early user of KDE back in 2000 and thought they were absolutely insane for trying to write their own web browser engine when it was controlled by Microsoft and Netscape. The web was just too complex and nothing worked in it, there was just no way a tiny open source project like that could make any useful headway with browser technology.

      Of course, jump forward 24 years and the KDE browser engine is basically the only game in town: the basis of both Chrome and Safari. Absolutely no way I saw that coming.

    • Another thing you can feasibly do is implement flexbox or a similar useful subset of layout! https://www.yogalayout.dev/ is one such library that powers React Native. Letting people bring CSS intuition when writing greenfield code for a simpler engine can be a great way to onboard users.
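To give a feel for how small the core of a flexbox-like subset can be, here is a rough Python sketch of the "grow" half of main-axis sizing: leftover space is handed out in proportion to each item's flex-grow factor. This is not Yoga's API or algorithm; `distribute_main_axis` and its tuple format are assumptions for illustration, and the sketch ignores shrinking, min/max constraints, wrapping, and the special case where the grow factors sum to less than 1.

```python
from typing import List, Tuple


def distribute_main_axis(container_size: float,
                         items: List[Tuple[float, float]]) -> List[float]:
    """Return final main-axis sizes for (base_size, flex_grow) items.

    Free space = container size minus the sum of base sizes. If there is
    free space and at least one item can grow, each item receives a share
    proportional to its flex_grow value.
    """
    free_space = container_size - sum(base for base, _ in items)
    total_grow = sum(grow for _, grow in items)
    if free_space <= 0 or total_grow == 0:
        # Nothing to distribute (shrinking is out of scope for this sketch).
        return [float(base) for base, _ in items]
    return [base + free_space * grow / total_grow for base, grow in items]


# Three 50px items in a 300px row with flex-grow 1, 2 and 0.
sizes = distribute_main_axis(300, [(50, 1), (50, 2), (50, 0)])
print(sizes)  # [100.0, 150.0, 50.0]
```

A real engine layers a lot on top of this (the cross axis, alignment, nested containers), but the one-dimensional model is a big part of why a flexbox-only engine is a tractable project.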

    • > You will never finish - HTML is a spec fully owned by Google and Apple at this point and it's just too complex to implement from scratch.

      sounds like a skill issue

  • Something that uses less RAM would be nice. Other than that and the spyware from Capital-G Google Chrome and Capital-M Mozilla Firefox, I don't have a problem with it being sparse. It's millions of hours of de-duplicated work.

    I'd like an alternative to HTML though. If I was to make a browser maybe I'd focus on replacing HTML because I can't stand it, and replacing js just because the runtime is heavy.

    Like, a browser that only runs wasm and has nearly no JS runtime would make me giggle

    • > Like, a browser that only runs wasm

      That's not a browser.

      More or less by definition, a browser is an application that can use HTTP (and potentially other protocols) to make requests to other systems and retrieve stuff described using HTML (and possibly other formats).

      Sure, a tool that just loads wasm and executes it would be fun (and probably exists, at least for the local case). But it's not a web browser.


  • have you by any chance looked into alternative browser engines such as servo, ladybird, goanna, netsurf, sciter, flow etc?

  • Or perhaps an entirely new platform/protocol, since this one is completely saturated with complexity at this point.

    • I keep coming back to this idea as the (albeit ideal) future of the web. HTML keeps morphing and changing to fit the increasingly complex requirements of modern web apps. I mean the W3C spec is 114 million words (1). I think that web apps as a concept are a good idea, but I just can't believe that HTML/CSS/JS are the best technologies to fill that out. I'd love to see someone tackle a new, "micro-sized app format", with a much simpler document format, and something like a FORTH as a scripting language. Uxn (2) and Decker (3) are good examples of this, but obviously a proper implementation would have to be built with the full range of possible UI and accessibility in mind, not just monochrome bitmap displays.

      A web standard so simple, anyone can implement it!

      One can dream...

      (1) https://drewdevault.com/2020/03/18/Reckless-limitless-scope.... (2) https://100r.co/site/uxn.html (3) https://internet-janitor.itch.io/decker

      EDIT: There is Project Gemini (https://geminiprotocol.net/), but it doesn't support styling or scripting.

The author's post explaining why Python was chosen: https://browserbook.substack.com/p/why-python

Apparently some of it now runs in the browser ("in the book itself") by compiling Python to JS?

https://browserbook.substack.com/p/compiling-python-to-js

  • writing it in python makes me actually want to read the book, in fact i definitely will give it a read. if it was done in rust or go id probably skip it, and c++ is just too hard to look at for a fun project. will come back after reading and hopefully have something more meaningful to say about it.

One of the authors here—thank you all for the nice words. Happy to answer questions!

  • Thank you for this amazing book! I always wanted to learn more about the technological foundations that I rely on as web developer (or engineer, as you put it).

    I am just curious, what was the process that led you to decide to use Python to implement the browser? I feel like JavaScript via node would have offered a more related programming experience.

It is so exciting to see material like this being made!

Browsers seem like mysterious, undecipherable black boxes, which is very likely how G wants them to be perceived, but that perception is cracking thanks to the efforts and results of projects like ladybird and others!

I hope to one day be able to jump in and contribute to break that moat! And this book looks like an amazing start!

  • > which is very likely how G wants them to be perceived

    One of the authors of the book is Chris, who leads the Blink rendering team at G :)

  • > I hope to one day be able to jump in and contribute to break that moat!

    The moat isn't caused by a lack of non-chrome browser engines, it's because so few people use a non-chrome browser engine. Firefox already exists - it's just that ~no-one uses it and for websites that don't work with it those users have learnt to just open up chrome.

    I'd love for the moat to be broken, and contributing to a browser engine like ladybird would be fun - but it doesn't contribute to breaking the moat. I'd love to know what would.

    • I'm one of those ~nobodies. Firefox is actually quite good these days, I use it at home and at work, 100% of the time - i.e. no Chrome or Safari fallback needed.

      If anyone's looking for a reason to try a switch again, consider this your sign.


    • Firefox is great. I use Nightly. Sync bookmarks and everything. Chrome completely unnecessary.

  • I'm one of the book's two authors (the other is the head of Blink Rendering!) and I've talked to a number of people on the Chrome team. None of them have struck me as trying to keep browsers mysterious! On the contrary, folks who work on Chrome, Firefox, Safari, and Ladybird all seem incredibly excited to talk browsers and discuss how they work. The world of browser development is surprisingly small, the engineers often move between companies, and I think it'd be tough to keep a "conspiracy" going.

    But I do think there's a real lack of teaching material (why I wrote the book) and even "common vocabulary" to discuss browser internals, especially for the core phases like layout and raster, which is something Chris and I are hoping to create with the book.

I've been levelling up on browser internals, and this book is awesome. It helps build up intuition on how browsers work, without going through the millions of lines of chrome code.

Nice book. I would recommend splitting chapter 9 into two separate chapters, where executing JavaScript via Duktape is one chapter and interacting with the DOM and events is a separate, later chapter.

  • It might be best read in two sittings! The chapters get longer as the book goes on and tackles more advanced topics, and I do recommend following along in code as you progress through the book.

This is wonderful!

I had an opportunity to run a tutorial on basic command line usage for newer software engineers. It's always fun to see people's expressions or read their reactions to seeing me telnet to port 25 and 80.

I'm so incredibly thankful that there are people like Pavel and Chris putting effort into articles like this. You are truly the best of us
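For readers who have never tried the telnet-to-port-80 trick, here is roughly the same thing as a small Python sketch using only the standard `socket` module. The function names `build_request` and `fetch` are made up for this example, it speaks plain unencrypted HTTP/1.0 only, and it is an illustration of the idea, not code taken from the book.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build the text you would otherwise type into telnet by hand."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "",  # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request over a TCP socket and return the raw response."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks)


# Inspect the request bytes without touching the network.
print(build_request("example.org", "/index.html"))
```

Calling `fetch("example.org")` on a machine with network access should return the status line, headers, and body as one byte string, which is the same raw text telnet would show.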

This is amazing, I just want to drop everything and start digging through this. Well done!

Why Python? Why not a systems programming language like C, OCaml or Go (or newer languages like Zig or Odin)?

Are web browsers not considered to be "system software"?

  • I think the point is to demonstrate how things work and are designed, and python is easy for everyone to understand. I don't think the author is recommending trying to write a production web browser in python. (or probably at all ;)

  • Going to take a wild guess that maybe they're going to rely on the excellent, extensive standard library of Python which C and Zig can't compete with. The second constraint was probably that they want to keep the number of lines of code low to encourage more people to buy the book. That's where Python does better than Go - you can do a lot with list comprehensions and you don't have if err != nil every few lines.

    • Definitely the second reason, but we actually try hard not to use too much of the standard library, for easy porting. But it's nice that sockets & ssl are standard, plus a (bad) GUI library.

  • They definitely are system software, since they include compilers and interpreters, software libraries and other such things which AFAIK have always been considered system software.

    Browsers these days are about as complex as any operating system, or perhaps more complex when you consider all the non-systems stuff in them.

Is there a promotional code for HN? I was a happy user of HTMLUnit [1] with Jython [2] in the past and am very interested in a future where we can automatically generate portions of browser code using code generation and verification techniques. I've never felt as comfortable with tools like Playwright/Cypress/Selenium as I did with HTMLUnit (with all due respect to both).

[1] https://htmlunit.sourceforge.io/

[2] https://www.jython.org/

  • The book isn’t out yet, so no promo code, but the whole thing is free online.

    • Thank you for your prompt response. I understand the item isn't available, but I clicked the "Add to Cart" button thinking it would either place me on a waitlist or ship once available. However, I was then prompted to enter a promo code, which caused the confusion.


I hope the AI gets good enough to dynamically translate from one language to another with high reliability, in case not everyone is a fan of Python