Removing PGP from PyPI (2023)

7 hours ago (blog.pypi.org)

This is slightly old news. For those curious, PGP support on the modern PyPI (i.e. the new codebase that began to be used in 2017-18) was always vestigial, and this change merely polished off a component that was, empirically[1], doing very little to improve the security of the packaging ecosystem.

Since then, PyPI has been working to adopt PEP 740[2], which both enforces a more modern cryptographic suite and signature scheme (built on Sigstore, although the design is adaptable) and is bootstrapped on PyPI's support for Trusted Publishing[3], meaning that it doesn't have the fundamental "identity" problem that PyPI-hosted PGP signatures have.

The hard next step from there is putting verification in client hands, which is the #1 thing that makes any signature scheme actually useful.

[1]: https://blog.yossarian.net/2023/05/21/PGP-signatures-on-PyPI...

[2]: https://peps.python.org/pep-0740/

[3]: https://docs.pypi.org/trusted-publishers/

  • It's good that PyPI signs whatever is uploaded to PyPI using PyPI's key now.

    GPG ASC support on PyPI was nearly as useful as uploading signatures to sigstore.

    1. Is it yet possible to - with pypa/twine - sign a package uploaded to PyPI, using a key that users somehow know to trust as a release key for that package?

    2. Does pip check software publisher keys at package install time? Which keys does pip trust to sign which package?

    3. Is there any way to specify which keys to trust for a given package in a requirements.txt file?

    4. Is there any way to specify which keys to trust for a version of a given package with different bdist releases, with Pipfile.lock, or pixi or uv?

    People probably didn't GPG sign packages on PyPI because it wasn't easy or required to sign a package using a registered key/DID in order to upload.

    Anyone can upload a signature for any artifact to sigstore. Sigstore is a centralized cryptographic signature database for any file.

    Why should package installers trust that a software artifact publisher key [on sigstore or the GPG keyserver] is a release key?

    gpg --recv-key downloads a public key for a given key fingerprint over HKP (or HKPS, which is HKP over TLS with the same CA cert bundle as everything else).

    GPG keys can be wrapped as W3C DIDs FWIU.

    W3C DIDs can optionally be centrally generated (like LetsEncrypt with ACME protocol).

    W3C DIDs can optionally be centrally registered.

    GPG or not, each software artifact publisher key must be retrieved over a different channel than the packages.

    If PyPI acts as the (package,release_signing_key) directory and/or the keyserver, is that any better than hosting .asc signatures next to the downloads?

    GPG signatures and wheel signatures were and are still better than just checksums.

    Why should we trust that a given key is a release signing key for that package?

    Why should we trust that a release signing key used at the end of a [SLSA] CI build hasn't been compromised?

    How do clients grant and revoke their trust of a package release signing key with this system?

    ... With GPG or [GPG] W3C DIDs or whichever key algo and signed directory service.
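To make the "better than just checksums" point concrete: a pinned checksum (such as the `--hash=sha256:...` entries used by pip's hash-checking mode) only shows that the downloaded bytes match what was recorded when the pin was made. It says nothing about who produced them. A minimal Python sketch, where the artifact bytes and the helper name are made up for illustration:

```python
import hashlib
import hmac


def matches_pinned_sha256(artifact: bytes, pinned_hex: str) -> bool:
    """Return True if the artifact's SHA-256 digest equals the pinned digest."""
    actual = hashlib.sha256(artifact).hexdigest()
    # compare_digest avoids leaking where the two strings differ.
    return hmac.compare_digest(actual, pinned_hex.lower())


# Hypothetical artifact contents; a real pin would come from a lock file.
released = b"example wheel contents"
pin = hashlib.sha256(released).hexdigest()

print(matches_pinned_sha256(released, pin))            # same bytes
print(matches_pinned_sha256(released + b"\x00", pin))  # tampered bytes
```

The check catches a swapped file, but it cannot answer any of the questions above about which key is allowed to sign which package.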

This feels like perfect being the enemy of good enough. There are examples where the system falls over but that doesn't mean that it completely negates the benefits.

It is very easy to get blinkered into thinking that the specific problems they're citing absolutely need to be solved, and there is quite possibly an element of using that as an excuse to reduce some maintenance overhead without understanding its benefits.

  • Its benefits are very much completely negated in real-world use. See https://blog.yossarian.net/2023/05/21/PGP-signatures-on-PyPI... - the data suggests that nobody is verifying these PGP signatures at all.

    • I stopped reading after this: "PGP is an insecure [1] and outdated [2] ecosystem that hasn't reflected cryptographic best practices in decades [3]."

      The first link [1] suggests avoiding encrypted email due to potential plaintext CC issues and instead recommends Signal or (check this) WhatsApp. However, with encrypted email, I have (or can have) full control over the keys and infrastructure, a level of security that Signal or WhatsApp can't match.

      The second link [2] is Moxie's rant, which I don't entirely agree with. Yes, GPG has a learning curve. But instead of teaching people how to use it, we're handed dumbed-down products like Signal (I've been using it since its early days as a simple sms encryption app, and I can tell you, it's gone downhill), which has a brilliant solution: it forces you to remember (better to say to write down) a huge random hex monstrosity just to decrypt a database backup later. And no, you can't change it.

      Despite the ongoing criticisms of GPG, no suitable alternative has been put forward and the likes of Signal, Tarsnap, and others [1] simply don't cut it. Many other projects running for years (with relatively good security track records, like kernel, debian, or cpan) have no problem with GPG. This is 5c.

      [1] https://latacora.micro.blog/2019/07/16/the-pgp-problem.html

      [2] https://moxie.org/2015/02/24/gpg-and-me.html

      [3] https://blog.cryptographyengineering.com/2014/08/13/whats-ma...

    • I believe the article you linked to doesn’t seem to say anything about “nobody verifying PGP signatures”. We would need PyPI to publish their Datadog & Google Analytics data, but I’d say the set of users who actually verify OpenPGP signatures intersects with the set of users faking/scrambling telemetry.

Does it matter much if the key can be verified? I mean it seems like a pretty big step up security wise to know that a new version of a package is signed with the same key as previous versions.

  • > I mean it seems like a pretty big step up security wise to know that a new version of a package is signed with the same key as previous versions.

    A key part of the rationale for removing PGP uploads from PyPI was that you can't in fact know this, given the current state (and expected future) of key distribution in PGP.

    (But also: yes, it's indeed important that the key can be verified i.e. considered authentic for an identity. Without that, you're in "secure phone call with the devil" territory.)
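The "same key as previous versions" idea amounts to trust on first use. A rough Python sketch of what such a client-side check could look like, assuming key fingerprints were reliably available (which the reply above disputes); the `KeyPins` class, package name, and fingerprints are all hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class KeyPins:
    """Hypothetical client-side store: package name -> first-seen key fingerprint."""
    pins: dict[str, str] = field(default_factory=dict)

    def check(self, package: str, fingerprint: str) -> str:
        pinned = self.pins.get(package)
        if pinned is None:
            # First sighting: nothing to compare against, so trust and record it.
            self.pins[package] = fingerprint
            return "pinned-on-first-use"
        return "same-key" if pinned == fingerprint else "key-changed"


pins = KeyPins()
print(pins.check("example-pkg", "AAAA1111"))  # pinned-on-first-use
print(pins.check("example-pkg", "AAAA1111"))  # same-key
print(pins.check("example-pkg", "BBBB2222"))  # key-changed
```

The first call is the weak point: nothing tells the client whether the first-seen key is authentic, and a legitimate key rotation looks the same as an attacker's key.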

I performed a similar analysis on RubyGems and found that of the top 10k most-downloaded gems, less than one percent had valid signatures. That plus the general hassle of managing key material means that this was a dead-end for large scale adoption.

I'm still hopeful that sigstore will see wide adoption and bring authorial attestation (code signing) to the masses.

What’s the current best solution for associating a public key to an identity or person?

This is not related to cryptography protocols.

OpenPGP key server verifies email. Keybase was a good idea but seems dead. Maybe identity providers?

  • > Maybe identity providers?

    That's essentially all Sigstore is: it uses an identity provider to bind an identity (like an email) to a short-lived signing key.
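As a toy illustration of that binding (not Sigstore's actual API): the sketch below models a certificate that ties an identity to an ephemeral key for a short window, and accepts a signature only if the identity matches and the signing time falls inside that window. The class, the function names, and the 10-minute lifetime are assumptions for the example. Real clients also verify the certificate chain, the signature itself, and transparency log inclusion, none of which is modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ShortLivedCert:
    """Toy stand-in for a certificate binding an identity to a signing key."""
    identity: str          # e.g. an email asserted by the identity provider
    key_id: str            # placeholder for the ephemeral public key
    not_before: datetime
    not_after: datetime


def issue_cert(identity: str, key_id: str, now: datetime,
               lifetime: timedelta = timedelta(minutes=10)) -> ShortLivedCert:
    # The lifetime value is an illustrative assumption.
    return ShortLivedCert(identity, key_id, now, now + lifetime)


def binding_holds(cert: ShortLivedCert, expected_identity: str,
                  signed_at: datetime) -> bool:
    """Accept only if the identity matches and signing happened while the cert was valid."""
    in_window = cert.not_before <= signed_at <= cert.not_after
    return in_window and cert.identity == expected_identity


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
cert = issue_cert("maintainer@example.com", "ephemeral-key-1", now)
print(binding_holds(cert, "maintainer@example.com", now + timedelta(minutes=5)))
print(binding_holds(cert, "maintainer@example.com", now + timedelta(hours=1)))
print(binding_holds(cert, "someone-else@example.com", now + timedelta(minutes=5)))
```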

I am curious why we still need PyPI to hold packages: it may be better to install from github.

Github provides much better integrated experience: source code, issues, docs, etc.

  • I don't think this is that terrible of an idea, actually. Before PyPI disabled searching, I'd say that the value of centralization was from that, and possibly due to security, but I think any claim of security from a central repo is deluding ourselves these days. There are so many opportunities for supply chain attacks that maybe this isn't actually worse. Requiring pip to refer to a github owner/repo might eliminate some of the squatter problems we have, too.

On the other hand PGP keys were widely successful for cpan, the perl5 repo. It's very simple to use, not as complicated as with pypi.

  • I dunno. I mean, sure, it's a worldwide-mirrored, cryptographically secure, curated, hierarchically and categorically organized, simple set of flat files, with multiple separate community projects, to test all packages on all supported Perl versions and platforms, with multiple different frontends, bug tracking, search engines, documentation hubs, security groups, and an incredibly long history of support and maintenance by the community.

    But it's, like, old. You can't make something new be like something old. That's not cool. If what we're doing isn't new and cool, what is the point even?

> Of those 1069 unique keys, about 30% of them were not discoverable on major public keyservers, making it difficult or impossible to meaningfully verify those signatures. Of the remaining 71%, nearly half of them were unable to be meaningfully verified at the time of the audit (2023-05-19).

A PGP keyserver provides no identity verification. It is simply a place to store keys. So I don't understand this statement. What is the ultimate goal here? I thought that things like this mostly provided a consistent identity for contributing entities with no requirement to know who the people behind the identities actually were in real life.

  • You're thinking one step past the failure state here: the problem isn't that keyservers don't provide identity verification, but that the PGP key distribution ecosystem isn't effectively delivering keys anymore.

    There are probably multiple reasons for this, but the two biggest ones are likely (1) that nobody knows how to upload keys to keyservers anymore, and (2) that keyservers don't gossip/share keys anymore, following the SKS network's implosion[1].

    Or put another way: a necessary precondition of signature verification is key retrieval, whether or not trust in a given key identity (or claimant human identity) is established. One of PGP's historic strengths was that kind of key retrieval, and the data strongly suggests that that's no longer the case.

    [1]: https://gist.github.com/rjhansen/67ab921ffb4084c865b3618d695...

    • The SKS keyserver thing was 5 years ago. It seems to be working. Was uploading a key somewhere a requirement for submitting to PyPI? Why were the keys not available from PyPI?

      It just seems to me that there wasn't anything here in the first place. Something something PGP keys. Perhaps they were hoping for someone to come along and make a working system and no one ever did.

  • These keys could have related signatures from other keys, that some users or maintainers may have reason to trust.

    (But for 30% of keys this was not even theoretically possible, while for another 40% of keys it was not practically possible, according to the article.)

Most people do security badly so let’s not do it at all.

Right.

  • Unfortunately we live in a world of limited time and resources, and priorities need to be adjusted accordingly.

    Honestly, I would put the blame on PGP. It has a … special UX. I tried to use it on 3 separate occasions, and ended up doing something else (probably less secure) because I couldn’t manage to make the damn thing work. I might not be a genius, but I am also not completely stupid.

Wouldn't another very good answer be for PyPI to have a keyserver and require keys be sent to it for them to be used in package publishing?

  • Wouldn't that make the maintenance burden worse? Now PyPI has to host a keyserver, with its own attack surface. And presumably 99.7% of the keys would be only for PyPI, so folks would have no incentive to secure them. The two modes that work are either no signing, or mandatory signing like many app stores. Obviously the middle way is the worst of both worlds, no security for 99+% of packages, but all the maintenance headache. And mandatory signing raises the possibility that PyPI would be replaced by an alternate repository that's easier to contribute to. The open source world depends to a shocking degree on the volunteer labor of people who have a lot of things they could be doing with their time, and a "small" speed bump for enhanced security can have knock-on effects that are not small.

    • Sure, it all hinges on whether the signatures provided any value. And the conclusion seems to be that they didn't.

      Without something showing that "keyservers present an untenable risk" (and Debian, Ubuntu, Launchpad, and others do have keyserver infrastructure), it seems like too far of a conclusion to reach casually. But of course, it adds attack surface for the simple fact that a public facing thing was stood up where once it was not. Though that isn't the kind of trivial conclusion I imagine you had in mind.

      I don't see why there's a binary choice between "signing is no longer supported" and "signing is mandatory" when before that wasn't the case. If it truly provided no value, or so small a value with so high a maintenance burden that it harmed the project that way, then it makes sense--but that didn't seem to be the place from which the article argued.

  • From here: https://caremad.io/posts/2013/07/packaging-signing-not-holy-... which is linked from the article: since PyPI has so many packages and anyone can sign up to add a package, it would be extremely unmanageable.

    • That's fair and I appreciate that detail even without having followed the link in the original article. But while not being "the holy grail" why must the perfect be the enemy of the good, if it was providing a value?

      I certainly allow for the "if it was providing a value" to be a gargantuan escape hatch through which any other perspective may be removed.

      But by highlighting the difficulty in verifying signatures and saying it was because the keys were hard to find (or may have been expired or other signing errors per the footnote) a fairly straightforward solution presents itself: add keyserver infrastructure, check it when signed packages are posted, reject if key verification fails using that keyserver.

      All told it seems like it wasn't providing a value, so throwing more resources at the effort was not done. But something about highlighting how "keys being hard to find" helped justify the action doesn't quite pass muster to my mind.
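The proposal above (check a key directory at upload time and reject on failure) can be sketched roughly as follows. This is an illustration in Python, not how PyPI works: `KeyDirectory`, `accept_upload`, and `fake_verify` are hypothetical names, and the verifier is a hash-based stand-in because the standard library has no PGP support.

```python
import hashlib
from typing import Callable

# Hypothetical key directory: fingerprint -> public key material.
KeyDirectory = dict[str, bytes]
# Hypothetical verifier: (public_key, artifact, signature) -> bool.
Verifier = Callable[[bytes, bytes, bytes], bool]


def accept_upload(directory: KeyDirectory, verify: Verifier, fingerprint: str,
                  artifact: bytes, signature: bytes) -> tuple[bool, str]:
    """Decide whether a signed upload should be accepted."""
    public_key = directory.get(fingerprint)
    if public_key is None:
        return False, "key not registered"
    if not verify(public_key, artifact, signature):
        return False, "signature does not verify"
    return True, "ok"


def fake_verify(public_key: bytes, artifact: bytes, signature: bytes) -> bool:
    # NOT a real signature scheme; a placeholder so the sketch runs end to end.
    return signature == hashlib.sha256(public_key + artifact).digest()


directory = {"FP-1": b"demo-public-key"}
artifact = b"example sdist bytes"
good_sig = hashlib.sha256(b"demo-public-key" + artifact).digest()

print(accept_upload(directory, fake_verify, "FP-1", artifact, good_sig))
print(accept_upload(directory, fake_verify, "FP-2", artifact, good_sig))
print(accept_upload(directory, fake_verify, "FP-1", artifact, b"bogus"))
```

A gate like this only guarantees that a retrievable key verifies the upload. It does not address the separate question raised elsewhere in the thread of whether that key belongs to someone worth trusting.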

I dunno, not all projects are equally important or popular, so it seems to me that the number of downloads which had keys is the better metric to look at.

But if there are fundamental issues with the key system, the percentages don’t matter anyway.

  • You're absolutely right that the number of downloads is probably a more important metric! But also yes, I think the basic "can't discover valid keys for a large majority of packages" is a sufficient justification, which is why I went with it :-)

    The raw data behind the blog post is archived here[1]. It would be pretty easy to reduce it back down to package names, and see which/what percent of those names are in the top 500/1000/5000/etc. of PyPI packages by downloads. My prediction is that there's no particular relationship between "uploads a PGP key" and popularity, but that's speculative.

    [1]: https://github.com/woodruffw/pypi-pgp-statistics
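The reduction described above could look something like this in Python. The input shapes are assumptions (a set of package names that uploaded signatures, and a list of package names ranked by downloads); the layout of the archived data is not modeled, and the sample names are invented. Name normalization follows the PEP 503 rule.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def percent_signed_in_top_n(signed: set[str], ranked: list[str], n: int) -> float:
    """Percent of signature-uploading packages that fall in the top n by downloads."""
    signed_norm = {normalize(name) for name in signed}
    if not signed_norm:
        return 0.0
    top = {normalize(name) for name in ranked[:n]}
    return 100.0 * len(signed_norm & top) / len(signed_norm)


# Invented sample data for illustration only.
signed = {"Alpha", "beta_pkg", "gamma", "delta"}
ranked = ["alpha", "zeta", "beta-pkg", "eta", "theta"]

for n in (1, 3, 5):
    print(n, percent_signed_in_top_n(signed, ranked, n))
```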

I feel like there is a broader issue being pushed aside here. Verifying a signature means you have a cryptographic guarantee that whoever generated an artifact possessed a private key associated with a public key. That key doesn't necessarily need to be published in a web-facing keystore to be useful. For packages associated with an OS-approved app store or a Linux distro's official repo, the store of trusted keys is baked into the package manager.

What value does that provide? As the installer of something, you almost never personally know the developer. You don't really trust them. At best, you trust the operating system vendor to sufficiently vet contributors to a blessed app store. Whoever published package A is actually a maintainer of Arch Linux. Whoever published app B went through whatever the heck hoops Apple makes you go through. If malware gets through, some sort of process failed that can potentially be remediated.

If you're downloading a package from PyPI or RubyGems or crates.io or whatever, a web repository that does no vetting and allows anyone to publish anything, what assurance is this giving? Great, some package was legitimately published by a person who also published a public key. Who are they exactly? A pseudonym on Github with a cartoon avatar? Does that make them trustworthy? If they publish malware, what process can be changed to prevent that from happening again? As far as I can tell, nothing.

If you change the keystore provider to sigstore, what does that give you? Fulcio just requires that you control an e-mail address to issue you a signing key. They're not vetting you in any way or requiring you to disclose a real-world identity that can be pursued if you do something bad. It's a step up in a limited scope of use cases in which packages are published by corporate entities that control an e-mail domain and ideally use their own private artifact registry. It does nothing for public repositories in which anyone is allowed to publish anything.

Fundamentally, if a public repository allows anyone to publish anything, does no vetting and requires no real identity disclosure, what is the basis of trust? If you're going to say something like "well I'm looking for .whl files but only from Microsoft," then the answer is for Microsoft to host its own repository that you can download from, not for Microsoft to publish packages to PyPI.

There are examples that make it simpler for the consumer to get everything from a single place. Docker Hub, for instance. You can choose to only ever pull official library images and verify them against sigstore, but that works because Docker is itself a well-funded corporate entity that restricts who can publish official library images by vetting and verifying real identities.