
Comment by tzs

1 year ago

> In the last 3 years, about 50k signatures had been uploaded to PyPI by 1069 unique keys. Of those 1069 unique keys, about 30% of them were not discoverable on major public keyservers, making it difficult or impossible to meaningfully verify those signatures. Of the remaining 71%, nearly half of them were unable to be meaningfully verified at the time of the audit (2023-05-19)

Why not include the public key in the package?

99% of the time what I want out of package signing is to know that the new version of the package I'm downloading is from the same people as the old version. I don't actually need to know who those people are...just that they are the same people as before.

> Why not include the public key in the package?

Because PyPI (or an attacker) could always substitute a new key. There's very little value in the signature and key coming from the same source: the key (and its justified identity) always needs to come from a source of trust, not the source that's being verified.

> 99% of the time what I want out of package signing is to know that the new version of the package I'm downloading is from the same people as the old version. I don't actually need to know who those people are...just that they are the same people as before.

This might be a misunderstanding, but I don't think you actually want this: lots of large packages have multiple release managers (and contributors who come and go); you don't want to manually resolve each new human identity that appears for a package distribution.

What most people actually want is a strong cryptographic attestation that the package distribution came from the same source as the thing hosting the source code, since both that service and the owner of the repository are presumed trusted.

Notably, PGP is incapable of providing either of these: you only get key IDs, which are neither strong human identities nor a strong binding to a service. Key IDs might correspond to keys with email (or other identities) in them, but that's (1) not guaranteed, and (2) not a strong proof of identity (since anybody can claim any identity in a PGP key).

  • >> but I don't think you actually want this: lots of large packages have multiple release managers (and contributors who come and go); you don't want to manually resolve each new human identity that appears for a package distribution.

    Nope, you assume wrong. That's exactly what I (also) want, that is, knowing that the *authors* remained the same, whoever they are.

    >> What most people actually want is a strong cryptographic attestation that the package distribution came from the same source as the thing hosting the source code

    Nope, nobody really needs more of that, since that's what your HTTPS certificate is for.

    People *really* want to mitigate the risk of pypi infrastructure getting fully compromised, which is very likely, given how many eggs you keep in the same basket there.

    PGP signatures were the last-ditch defense: not very convenient, but also not as bad as they are painted. From now on, there won't be even that little.

    • > Nope, you assume wrong. That's exactly what I (also) want, that is, knowing that the authors remained the same, whoever they are.

      The point is that they don't remain the same. Assuming that they do is an operational error.

      > Nope, nobody really needs more of that, since that's what your HTTPS certificate is for.

      HTTPS provides transport security, i.e. an authenticity relationship between you and GitHub's servers. It doesn't provide artifact authenticity for the source on that server, and cannot. That's what the comment above is referring to.


    • > That's exactly what I (also) want, that is, knowing that the authors remained the same, whoever they are

      The authors are often many people. You can have one person signing on behalf of all the others. PGP isn't going to tell you that the authors remained the same, only that the signer did (or that many people have access to the same private key and hopefully every one of them is completely trustworthy).

      PGP doesn't let you verify that the authors remained the same. Only the key. If you wanted to actually verify authors, you'd have to have all of them sign their own commits, and you'd have to validate every commit, not just the release, otherwise you're just back to trusting whoever holds the key. Many projects very regularly get new committers, too, so you'd have to validate many new signatures with every single update.

      > Nope, nobody really needs more of that, since that's what your HTTPS certificate is for.

      No, it's not. Your HTTPS certificate will not tell you "this PyPI package release is actually built and uploaded by the same person who controls the GitHub repository linked on the package page". PyPI hosts distributions. It frequently has source distributions, but it doesn't necessarily host "source code", which would usually mean the source repository. Even with that, it's Transport Layer Security, or a Secure Socket Layer. It does not authenticate anything other than the socket/transport itself.

      I'm fine with PGP, but most people don't really know how to use it. They add a key and think they're safe when it validates, but that only protects you if you already trust the key. PGP signing doesn't tell you "this is safe", just "this was signed by the person who has the private key for this public key", which isn't as useful without a lot of personal footwork or a trusted authority.

      PGP key signing parties were a thing for a reason. Using PGP properly requires either an initial leap of trust (importing your distro's keys and trusting what they trust), a lot of diligence (personally verifying identities), or a small amount of diligence with a good web of trust (you sign keys that you know are good, and so does everybody you know, so a lot of what you find online you can validate through your links).


    • > that's what your HTTPS certificate is for.

      Not really... That certificate doesn't go back in time. If a domain expires, an attacker could reregister it under their name and get a valid certificate.

      You'd be downloading from the right domain name with a valid HTTPS certificate, but you're not downloading from the same place as before.


    • >> knowing that the authors remained the same

      The problem is that "authors" is not a well-defined concept, and especially larger projects will have very regular author changes. Is the author the person who made the last commit? The person who uploaded it to PyPI? The person who is currently managing the project? What if it isn't a person but a company?

      >> that's what your HTTPS certificate is for

      A lot of open source projects rely on untrusted third-party mirrors. The main server will just randomly redirect you to a mirror near you, so HTTPS certificates are pretty much useless because you are connecting to a third-party domain. They use signatures to prevent the mirror from doing weird stuff, and they guarantee that the mirror is serving the upstream content as-is.
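
A minimal sketch of the "mirror serves upstream content as-is" check described above. It is a simplification: real mirror setups typically sign a manifest and then check file hashes against it, whereas this sketch assumes the manifest of expected digests (`trusted_digests`, a hypothetical name) was already obtained and verified through a trusted channel. The function names are illustrative and not taken from any particular tool.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_download(filename: str, content: bytes, trusted_digests: dict[str, str]) -> bool:
    """Check bytes fetched from an untrusted mirror against a trusted digest.

    trusted_digests maps file names to expected SHA-256 hex digests and is
    assumed (hypothetically) to come from the upstream project, not the mirror.
    """
    expected = trusted_digests.get(filename)
    if expected is None:
        # Unknown files are rejected rather than trusted by default.
        return False
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_hex(content), expected)
```

Note that this only shows the integrity half: the authenticity of the manifest itself still has to come from a signature or another trusted source, which is the point under debate in this thread.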


  • There is some security even if they provide the public key. Bootstrapping is a problem, but clients can keep track of a mapping from package names to public keys and issue a warning if that ever changes. That's how SSH and RDP work, and while I've never had an actual security hole plugged by this, I've had cases where my remote machine went down and its IP was reassigned while the DNS record still pointed at it, so the warning about mismatching keys was actually helpful.

    • > There is some security even if they provide the public key.

      That security is integrity, which PyPI already provides through strong cryptographic digests of each package distribution. Codesigning schemes need to provide authenticity, not just integrity; a codesigning scheme that's downgradeable to arbitrary key trust is just a hashing scheme that is more complicated than necessary.

    • The problem with TOFU is that it assumes long lived keys (itself a bad practice) OR it assumes that the end user will be fine with regular notices that the keys that have signed their packages have changed, and will be able to correctly differentiate false positives from real positives.
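
The trust-on-first-use idea discussed in this sub-thread (a client-side mapping from package names to public keys, with a warning when a key changes) could be sketched roughly as follows. This is only an illustration under stated assumptions: the JSON pin store, the function names, and the use of a SHA-256 fingerprint over raw key bytes are all hypothetical choices, not how pip, PyPI, or SSH actually implement it.

```python
import hashlib
import json
from pathlib import Path


def key_fingerprint(public_key_bytes: bytes) -> str:
    """Return a SHA-256 fingerprint of the raw public key bytes."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def check_pinned_key(store_path: Path, package: str, public_key_bytes: bytes) -> str:
    """Compare a package's key against the locally pinned one.

    Returns "first-use" (key pinned now), "match", or "changed".
    The pin store is a hypothetical JSON file of {package: fingerprint}.
    """
    pins = json.loads(store_path.read_text()) if store_path.exists() else {}
    fingerprint = key_fingerprint(public_key_bytes)
    pinned = pins.get(package)
    if pinned is None:
        # First contact: there is nothing to compare against, so trust and pin.
        pins[package] = fingerprint
        store_path.write_text(json.dumps(pins, indent=2))
        return "first-use"
    # Later contacts: the pin is never silently replaced.
    return "match" if pinned == fingerprint else "changed"
```

A "changed" result is exactly the warning the replies above worry about: the sketch cannot tell a legitimate key rotation from an attack, so the decision falls back to the user.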

  • > Because PyPI ... could always substitute a new key.

    Isn't that what public key servers are for?

    For publishing my FOSS to Sonatype, I first had to publish my public key, e.g. to keyserver.ubuntu.com.

    I don't know PyPI, but from this OC, it sounds like PyPI does not have the same prerequisite.

    • Yep. Unfortunately, PGP's keyserver network has been dead for years[1]. There are two big (non-synchronizing) ones left, and they're the two I used to do the analysis that's linked in this announcement (meaning they're the ones that are largely missing well-formed keys for the signatures on PyPI).

      This was discussed a bit on Sunday's thread[2], and my understanding is that Maven's ability to use PGP in this way is effectively due to Sonatype assuming a large amount of operational and maintenance burden. PyPI doesn't have that kind of resources available to it. Even assuming that the service was gifted that kind of support, it would still cause a lot of heartburn with existing signatures and carry forward all of the legacy baggage of PGP that we're trying to eliminate entirely.

      [1]: https://news.ycombinator.com/item?id=36021172


  • > Notably, PGP is incapable of providing either of these: you only get key IDs, which are neither strong human identities nor a strong binding to a service. Key IDs might correspond to keys with email (or other identities) in them, but that's (1) not guaranteed, and (2) not a strong proof of identity (since anybody can claim any identity in a PGP key).

    Depends. If the distributor maintains a repository of trusted public keys (as the repositories of Linux distributions do, for example), it gives you a guarantee. As was said, most of the time you just want to know that the key used to sign a package has not changed. That is the same level of security that SSH offers (the first time you connect to a server it saves the public key, then gives an error if that public key changes). That is really enough for a package on PyPI, or for signing git commits and similar.

    We should ask ourselves whether the complexity of PGP is needed. Probably not, just as the complexity of X.509 certificates is not needed, since a simple RSA signature of the package with a public key hosted on a server would be sufficient. But PGP is practical, has good tooling built around it, and is pretty universal, so why not?

What happens if the developer loses his key? Or if it expires?

PyPI could show a warning that the key has changed, which is not an actionable or helpful warning. Then everyone gets used to seeing these warnings every now and then, and you've gained nothing.

Getting signatures to do something useful is hard.

  • > What happens if the developer loses his key? Or if it expires?

    What happens if a developer loses their Google Titan key that is required to log in to PyPI?

    • They either have their backup codes, or there's probably a manual process by which the PyPI team can get them their account back if they can sufficiently show they are the real developers. If you have any form of automated signature verification, you basically need a concept for how to handle recovery. But if this concept comes down to "trust PyPI", then you really can just skip the whole thing and rely on PyPI giving you the right packages and HTTPS to secure the connection.

Did I just witness the invention of a kind of "software package blockchain"?

By the way, it would be a proper but sustainable proof-of-work blockchain, as in most cases you would need to pay developers to "mint new blocks".

OK, maybe let's forget about the blockchain. It's a loaded term. But the idea of software signature TOFU sounds indeed good!

  • Anyone that combats something based on the name alone isn't really worth listening to. There can be great use cases for Blockchain just like this, wherein the proof of work is less taxing, or optional. Of course, HN has a rabid response towards the term alone, but these technologies actually can provide some great solutions to a more robust form of git lfs, dockerhub, or huggingface model centralization that will inevitably fail at some point.

    • > There can be great use cases for Blockchain just like this, wherein the proof of work is less taxing, or optional.

      The proof-of-work part was more of a joke, I admit. Developers "mining" "software package blocks" is not really "proof of work" (even if it is in some sense, of course). :-)

      > but these technologies actually can provide some great solutions to a more robust form of git lfs, dockerhub, or huggingface model centralization

      Well, that's the Merkle trees part of the tech. You don't need any "blockchain" for that. Something like IPFS ( https://ipfs.tech/ ) is for example a nice demonstration of that.

      But yes, there are useful applications of blockchains. Like https://www.namecoin.org/

      But those are really rare.
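
To make the Merkle tree point above concrete, here is a minimal sketch of computing a Merkle root over a list of data chunks. It is not IPFS's actual data structure (IPFS uses a Merkle DAG with its own encoding); the `0x00`/`0x01` prefixes that separate leaf hashes from inner-node hashes and the choice to duplicate the last node on odd-sized levels are assumptions made for this illustration.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()


def merkle_root(chunks: list[bytes]) -> bytes:
    """Compute a Merkle root over the chunks, in order.

    Leaves and inner nodes are hashed with different one-byte prefixes
    (an assumption of this sketch) so a leaf can't be passed off as a node.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [sha256(b"\x00" + chunk) for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of nodes: pair the last one with a copy of itself.
            level.append(level[-1])
        level = [
            sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Changing any single chunk changes the root, which is what lets content-addressed systems detect tampering without a blockchain or any consensus mechanism.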
