Checks And Balances: Machine Learning And Zero-Knowledge Proofs

by Elena Burger

For the past few years, zero-knowledge proofs on blockchains have been useful for two key purposes: (1) scaling compute-constrained networks by processing transactions off-chain and verifying the results on mainnet; and (2) protecting user privacy by enabling shielded transactions, viewable only to those who possess the decryption key. Within the context of blockchains, it’s clear why these properties are desirable: a decentralized network like Ethereum can’t increase throughput or block size without untenable demands on validator processing power, bandwidth, and latency (hence the need for validity rollups), and all transactions are visible to anyone (hence the demand for on-chain privacy solutions).

But zero-knowledge proofs are also useful for a third class of capabilities: efficiently verifying that any kind of computation (not just those within an off-chain instantiation of the EVM) has run correctly. This has implications far beyond blockchains.

Advancements in systems that leverage the ability of zero-knowledge proofs to succinctly verify computation are now making it possible for users to demand the same degree of trustlessness and verifiability assured by blockchains from every digital product in existence, most crucially from machine learning models. High demand for blockchain compute has incentivized zero-knowledge proof research, creating modern proving systems with smaller memory footprints and faster proving and verification times — making it now possible to verify certain small machine learning algorithms on-chain today.

We’ve all by now likely experienced the potential of interacting with an extremely powerful machine learning product. A few days ago, I used GPT-4 to help me create an AI that consistently beats me at chess. It felt like a poetic microcosm of all of the advances in machine learning that have occurred over the past few decades: it took the developers at IBM twelve years to produce Deep Blue, a model running on a 32-node IBM RS/6000 SP computer and capable of evaluating nearly 200 million chess moves per second, which beat the chess champion Garry Kasparov in 1997. By comparison, it took me a few hours – with minimal coding on my part – to create a program that could triumph over me.

Admittedly, I doubt the AI I created would be able to beat Garry Kasparov at chess, but that’s not the point. The point is anyone playing around with GPT-4 has likely had a similar experience gaining superpowers: with little effort, you can create something that approaches or surpasses your own capabilities. We are all researchers at IBM; we are all Garry Kasparov.

Obviously, this is thrilling and a bit daunting to consider. And for anyone working in the crypto industry, the natural impulse (after marveling at what machine learning can do) is to consider potential vectors of centralization, and ways those vectors can be decentralized into a network that people can transparently audit and own. Today’s models are made by ingesting an enormous amount of publicly available text and data, but only a small number of people control and own those models. More specifically, the question isn’t “will AI be tremendously valuable,” the question is “how do we build these systems in such a way that anyone interacting with them will be able to reap their economic benefits and, if they so desire, ensure that their data is used in a way that honors their right to privacy.”

Recently, there has been a vocal effort to pause or mitigate the advancement of major AI projects like ChatGPT. Halting progress is likely not the solution here: it would instead be better to push for models that are open-source, and in cases where model providers want their weights or data to be private, to secure them with privacy-preserving zero-knowledge proofs that are on-chain and fully auditable. Today, the latter use-case around private model weights and data is not yet feasible on-chain, but advances in zero-knowledge proving systems will make it possible in the future.

Verifiable and ownable machine learning

A chess AI like the one I built using ChatGPT feels relatively benign at this point: a program with a fairly uniform output, which doesn’t use data that violates valuable intellectual property or infringes on privacy. But what happens when we want assurance that the model we are told is being run behind an API is indeed the one that ran? Or if I wanted to ingest attested data into a model that lives on-chain, with assurance that the data is indeed coming from a legitimate party? And what if I wanted assurance that the “people” submitting data were in fact people, and not bots seeking to sybil-attack my network? Zero-knowledge proofs, with their ability to succinctly represent and verify arbitrary programs, are a way to do this.

It’s important to note that today, the primary use-case for zero-knowledge proofs in the context of machine learning on-chain is to verify correct computation. In other words, zero-knowledge proofs, and more specifically SNARKs (Succinct Non-Interactive Arguments of Knowledge), are most useful for their succinctness properties in the ML context. This is because zero-knowledge proofs protect the privacy of the prover (and of the data it processed) from a prying verifier; they do not hide a user’s input data from the prover itself. Privacy-enhancing technologies like Fully Homomorphic Encryption (FHE), Functional Encryption, or Trusted Execution Environments (TEE) are more applicable for letting an untrusted prover run computations over private input data (exploring those more deeply falls outside the scope of this piece).

Let’s take a step back and understand at a high-level the kinds of machine learning applications you could represent in zero-knowledge. (For a deeper dive on ZK specifically, see our piece on improvements in zero-knowledge proving algorithms and hardware, Justin Thaler’s work on SNARK performance, or our zero-knowledge canon.) Zero-knowledge proofs typically represent programs as arithmetic circuits: using these circuits, the prover generates a proof from public and private inputs, and the verifier mathematically checks that the output of this statement is correct, without obtaining any information about the private inputs.
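To make the circuit idea concrete, here is a minimal sketch of the statement “I know a private x such that x^3 + x + 5 equals a public y,” flattened into gates over a prime field. The modulus, function names, and the statement itself are illustrative assumptions, not taken from any particular proving system. The check below only tests whether a witness satisfies the gate constraints; a real SNARK would convince the verifier of the same fact without ever showing it the witness.

```python
# Toy arithmetic circuit over a prime field (illustrative only).
# A real proof system proves these constraints hold WITHOUT revealing x.
P = 2**61 - 1  # hypothetical prime modulus chosen for this sketch

def build_witness(x):
    """Prover side: evaluate every gate of y = x^3 + x + 5 (mod P)."""
    v1 = (x * x) % P       # multiplication gate
    v2 = (v1 * x) % P      # multiplication gate
    y = (v2 + x + 5) % P   # addition gate
    return {"x": x, "v1": v1, "v2": v2, "y": y}

def constraints_hold(w, public_y):
    """Check each gate constraint and that the output matches the public value."""
    return (
        w["v1"] == (w["x"] * w["x"]) % P
        and w["v2"] == (w["v1"] * w["x"]) % P
        and w["y"] == (w["v2"] + w["x"] + 5) % P
        and w["y"] == public_y
    )

witness = build_witness(3)               # private input x = 3
print(witness["y"])                      # 35
print(constraints_hold(witness, 35))     # True
print(constraints_hold(witness, 36))     # False
```

Here x is the private input, y is the public input, and the intermediate values v1 and v2 are the rest of the witness that the prover fills in.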

We’re still at a very early stage of what is computationally practical to verify using zero-knowledge proofs on-chain, but improvements in algorithms are expanding the realm of what is feasible. Here are five ways zero-knowledge proofs can be applied in machine learning.

1. Model Authenticity: You want assurance that the machine learning model some entity claims has been run is indeed the one that ran. Examples include a case where a model is accessible behind an API, and the purveyor of a particular model has multiple versions – say, a cheaper, less accurate one, and a more expensive, higher-performance one. Without proofs, you have no way of knowing whether the purveyor is serving you the cheaper model when you’ve actually paid for the more expensive one (e.g., the purveyor wants to save on server costs and boost their profit margin).

To do this, you’d want separate proofs for each instantiation of a model. A practical way to accomplish this is through Dan Boneh, Wilson Nguyen, and Alex Ozdemir’s framework for functional commitments, a SNARK-based zero-knowledge commitment scheme that allows a model owner to commit to a model, which users can input their data into and receive verification that the committed model has run. Some applications built on top of Risc Zero, a general purpose STARK-based VM, are also enabling this. Other research conducted by Daniel Kang, Tatsunori Hashimoto, Ion Stoica, and Yi Sun has demonstrated that it’s possible to verify valid inference on the ImageNet dataset, with 92% accuracy (which is on par with the highest performing non-ZK verified ImageNet models).

But just receiving proof that the committed model has run is not necessarily enough. A model may not accurately represent a given program, so one would want the committed model to be audited by a third party. Functional commitments allow the prover to establish that it used a committed model, but they don’t guarantee anything about the model that has been committed. If we can make zero-knowledge proofs performant enough for proving training (see example #4, below), we could one day start to get those guarantees as well.
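The “commit, then audit” step can be illustrated with a much simpler primitive than a functional commitment. The sketch below uses a plain salted SHA-256 hash commitment, which is an assumption made for illustration and not the scheme from the functional commitments work: the model owner publishes a digest, and a third-party auditor who is later shown the weights and salt can confirm they match what was committed. Unlike a functional commitment, this requires revealing the weights to the auditor and says nothing about which model actually served a given request. The weight values and salt are hypothetical.

```python
import hashlib
import json

def commit_to_model(weights, salt):
    """Return a hex digest binding the owner to these exact weights."""
    payload = json.dumps(weights, sort_keys=True).encode() + salt
    return hashlib.sha256(payload).hexdigest()

def opens_to(commitment, weights, salt):
    """Auditor side: re-derive the digest from the revealed weights and salt."""
    return commit_to_model(weights, salt) == commitment

# Hypothetical models: the one paid for, and a cheaper substitute.
premium = {"layer1": [0.12, -0.5, 0.33], "bias": [0.01]}
cheap = {"layer1": [0.1, -0.5, 0.3], "bias": [0.0]}
salt = b"hypothetical-random-salt"

published = commit_to_model(premium, salt)
print(opens_to(published, premium, salt))  # True
print(opens_to(published, cheap, salt))    # False
```

The commitment pins down which model was promised; it is the proof system layered on top that would tie each inference to that commitment.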

2. Model Integrity: You want assurance that the same machine learning algorithm is being run on different users’ data the same way. This is useful in areas where you don’t want arbitrary bias applied, like credit scoring decisions and loan applications. You could use functional commitments for this as well. To do this, you would commit to a model and its parameters, and allow people to submit data. The output would verify that the model ran with the committed parameters for each user’s data. Alternatively, the model and its parameters could be made public and the users themselves could prove that they applied the appropriate model and parameters to their own (authenticated) data. This might be especially useful in the medical field, where certain information about patients is required by law to remain confidential. In the future, this could enable a medical diagnosis system that is able to learn and improve from real-time user data that remains completely private.

3. Attestations: You want to integrate attestations from external verified parties (e.g., any digital platform or piece of hardware that can produce a digital signature) into a model or any other kind of smart contract running on-chain. To do this, you would verify the signature using a zero-knowledge proof, and use the proof as an input in a program. Anna Rose and Tarun Chitra recently hosted an episode of the Zero Knowledge podcast with Daniel Kang and Yi Sun where they explored recent advancements in this field.

Specifically, Daniel and Yi recently released work on ways to verify that images taken by cameras with attested sensors were subject to transformations like cropping, resizing, or limited redactions – useful in cases where you want to prove that an image wasn’t deepfaked but did undergo some legitimate form of editing. Dan Boneh and Trisha Datta have also done similar work around verifying provenance of an image using zero-knowledge proofs.

But, more broadly, any digitally attested piece of information is a candidate for this form of verification: Jason Morton, who is working on the EZKL library (more on this in the following section), calls this “giving the blockchain eyes.” Any signed endpoint (e.g., Cloudflare’s SXG service, third-party notaries) produces digital signatures that can be verified, which could be useful for proving provenance and authenticity from a trusted party.

4. Decentralized Inference or Training: You want to perform machine-learning inference or training in a decentralized way, and allow people to submit data to a public model. To do this, you might deploy an already-existing model on-chain, or architect an entirely new network, and use zero-knowledge proofs to compress the model. Jason Morton’s EZKL library is creating a method for ingesting ONNX and JSON files, and converting them into ZK-SNARK circuits. A recent demo at ETHDenver showed that this can be used in applications like creating an image-recognition-based on-chain scavenger hunt, where creators of the game can upload a photo, generate a proof of the image, and players can upload images; the verifier checks whether the image the user uploads sufficiently matches the proof generated by the creator. EZKL can now verify models of up to 100 million parameters, implying that it could be used to verify ImageNet-sized models (which have 60 million parameters) on-chain.

Other teams, like Modulus Labs, are benchmarking different proof systems for on-chain inference. Modulus’s benchmarks ran up to 18 million parameters. On the training side, Gensyn is building a decentralized compute system, where users can input public data, and have their models trained by a decentralized network of nodes, with verification for correctness of training.

5. Proof of Personhood: You want to verify that someone is a unique person without compromising their privacy. To do this, you would create a method of verification – for example, biometric scanning, or a method for submitting government ID in an encrypted manner. Then you would use zero-knowledge proofs to check that someone has been verified, without revealing any information about that person’s identity, whether that identity is fully recognizable, or pseudonymous, like a public key.

Worldcoin is doing this through their proof-of-personhood protocol, a way to ensure sybil-resistance by generating unique iris codes for users. Crucially, private keys created for the WorldID (and the other private keys for the crypto wallet created for Worldcoin users) are completely separate from the iris code generated locally by the project’s eye-scanning orb. This separation completely decouples biometric identifiers from any form of users’ keys that could be attributable to a person. Worldcoin also permits applications to embed an SDK that allows users to log in with the WorldID, and leverages zero-knowledge proofs for privacy, by allowing the application to check that the person has a WorldID, but does not enable individual user tracking (for more detail, see this blogpost).

This example uses the privacy-preserving properties of zero-knowledge proofs to combat weaker, more malicious forms of artificial intelligence (e.g., by proving that you are a real human, not a bot, without revealing any information about yourself), so it’s quite different from the other examples listed above.
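For the proof-of-personhood case, the “verified once, but not trackable” property is often built around a per-application nullifier. The sketch below is an assumption-laden illustration of that idea, not Worldcoin’s implementation: all names are hypothetical, SHA-256 stands in for a circuit-friendly hash, and the secret is checked directly where a real system would accept a zero-knowledge proof instead.

```python
import hashlib

def h(*parts):
    """Hash byte strings together, length-prefixed to avoid ambiguity."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big") + part)
    return digest.hexdigest()

def identity_commitment(secret):
    """Public value registered once per verified person."""
    return h(b"commitment", secret)

def nullifier(secret, app_id):
    """Per-application tag: stable within one app, unlinkable across apps."""
    return h(b"nullifier", secret, app_id)

class App:
    """Hypothetical application that admits each verified person once."""
    def __init__(self, app_id, registry):
        self.app_id = app_id
        self.registry = registry
        self.seen = set()

    def sign_in(self, secret):
        # A real system would receive a zero-knowledge proof that the
        # commitment is registered and the nullifier derives from the same
        # secret. Here the secret is checked directly, for illustration.
        if identity_commitment(secret) not in self.registry:
            return False
        tag = nullifier(secret, self.app_id)
        if tag in self.seen:
            return False
        self.seen.add(tag)
        return True

registry = {identity_commitment(b"alice-secret")}
forum, poll = App(b"forum", registry), App(b"poll", registry)
print(forum.sign_in(b"alice-secret"))  # True
print(forum.sign_in(b"alice-secret"))  # False (already used in this app)
print(poll.sign_in(b"alice-secret"))   # True
print(forum.sign_in(b"bot-secret"))    # False (never verified)
```

Because the nullifier mixes in the application identifier, two applications comparing the tags they have seen cannot tell whether the same person used both.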

Model architectures and challenges

Breakthroughs in proving systems that implement SNARKs (Succinct Non-Interactive Arguments of Knowledge) have been key drivers in putting many machine learning models on-chain. Some teams are making custom circuits in existing architectures (including Plonk, Plonky2, AIR, and more). On the custom circuit side, Halo 2 has become a popular backend used by both Daniel Kang et al. in their work and Jason Morton’s EZKL project. Halo 2’s prover times are quasilinear, proof sizes are usually just a few kilobytes, and verifier times are constant. Perhaps more importantly, Halo 2 has strong developer tooling, making it a popular SNARK backend used by developers. Other teams, like Risc Zero, are aiming for a generalized VM strategy. And others are creating custom frameworks using Justin Thaler’s super-efficient proof systems based on the sum-check protocol.

Proof generation and verifier times depend, in absolute terms, on the hardware generating and checking the proofs as well as the size of the circuit for proof generation. But the crucial thing to note here is that regardless of the program being represented, the proof size will always be relatively small, so the burden on the verifier checking the proof is constrained. There are, however, some subtleties here: for proof systems like Plonky2, which use a FRI-based commitment scheme, proof size may increase with the complexity of the statement, unless the proof is wrapped at the end in a pairing-based SNARK like Plonk or Groth16, whose proofs don’t grow in size with the complexity of the statement being proven.

The implication here for machine learning models is that once you have designed a proof system that accurately represents a model, the cost of actually verifying outputs will be quite cheap. What developers have to consider most are prover time and memory: representing models in a way that they can be proven relatively quickly, with proof sizes ideally around a few kilobytes. To prove the correct execution of machine learning models in zero knowledge, you need to encode model architecture (layers, nodes, and activation functions), parameters, constraints, and matrix multiplication operations and represent them as circuits. This involves breaking down these properties into arithmetic operations that can be performed over a finite field.
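As a rough illustration of what “arithmetic operations over a finite field” means for a model, the sketch below evaluates one dense layer’s matrix-vector product using only field additions and multiplications. The modulus, the 8-bit fixed-point scale, and the function names are assumptions picked for the example; real frameworks make their own encoding choices.

```python
P = 2**61 - 1   # hypothetical prime modulus for illustration
SCALE = 2**8    # fixed-point scale: 8 fractional bits (an assumption)

def to_field(value):
    """Encode a real number as a fixed-point field element."""
    return round(value * SCALE) % P

def from_field(element, scale):
    """Decode, treating the upper half of the field as negative numbers."""
    signed = element - P if element > P // 2 else element
    return signed / scale

def dense_in_field(weights, inputs):
    """Matrix-vector product using only field additions and multiplications."""
    out = []
    for row in weights:
        acc = 0
        for w, x in zip(row, inputs):
            acc = (acc + to_field(w) * to_field(x)) % P
        out.append(acc)
    return out

weights = [[0.5, -1.25], [2.0, 0.75]]
inputs = [1.5, -2.0]
encoded = dense_in_field(weights, inputs)
# Each product carries two scale factors, so decode with SCALE squared.
decoded = [from_field(e, SCALE * SCALE) for e in encoded]
print(decoded)  # [3.25, 1.5]
```

The values here were chosen to be exactly representable with 8 fractional bits; arbitrary floating-point weights would be rounded on the way in.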

The area is still nascent. Accuracy and fidelity may suffer in the process of converting a model into a circuit. When a model is represented as an arithmetic circuit, those aforementioned model parameters, constraints, and matrix multiplication operations may need to be approximated and simplified. And when arithmetic operations are encoded as elements in the proof’s finite field, some precision might be lost (or the cost to generate a proof without these optimizations with current zero-knowledge frameworks would be untenably high). Additionally, parameters and activations of machine learning models are often encoded as 32-bit floating-point numbers for precision, but zero-knowledge proofs today can’t represent 32-bit floating-point operations in the necessary arithmetic circuit format without massive overheads. As a result, developers may choose to use quantized machine learning models, whose 32-bit floating-point values have already been converted into 8-bit integers. These types of models are favorable to representation as zero-knowledge proofs, but the model being verified might be a crude approximation of the higher-quality initial model.
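The precision trade-off is easy to see in a small sketch. The code below applies symmetric linear quantization to a handful of made-up weights, mapping them to signed 8-bit integers and back; the scheme and the sample values are assumptions for illustration, and production toolchains use more elaborate variants.

```python
def quantize(values, bits=8):
    """Symmetric linear quantization of floats to signed integers."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    return [round(v / scale) for v in values], scale

def dequantize(q_values, scale):
    """Map quantized integers back to approximate floats."""
    return [q * scale for q in q_values]

weights = [0.8123, -0.0371, 0.2549, -1.27, 0.0004]   # hypothetical weights
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)                                 # [81, -4, 25, -127, 0]
print(max_error <= scale / 2 + 1e-12)    # True
```

The integers are friendly to circuit arithmetic, but the smallest weight collapses to zero, which is the kind of lost fidelity described above.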

At this stage, it’s admittedly a game of catch-up. As zero-knowledge proofs become more optimized, machine learning models grow in complexity. There are a number of promising areas for optimizations already: proof recursion could reduce overall proof size by allowing proofs to be used as inputs for the next proof, unlocking proof compression. There are emerging frameworks too, like Linear A’s fork of Apache’s Tensor Virtual Machine (TVM), which advances a transpiler for converting floating-point numbers into zero-knowledge friendly integer representations. And finally, we at a16z crypto are optimistic that future work will make it much more reasonable to represent 32-bit integers in SNARKs.

The two definitions of “scale”

Zero-knowledge proofs scale through compression: SNARKs allow you to take an enormously complex system (a virtual machine, a machine learning model) and mathematically represent it so that the cost of verifying it is less than the cost of running it. Machine learning, on the other hand, scales through expansion: models today get better with more data, parameters, and GPUs/TPUs involved in the training and inference process. Centralized companies can run servers at a pretty much unbounded magnitude: charge a monthly fee for API calls, and cover the costs of operation.

The economic realities of blockchain networks operate almost in the inverse: developers are encouraged to optimize their code to make it computationally feasible (and inexpensive) to run on-chain. This asymmetry has a tremendous benefit: it has created an environment where proof systems need to become more efficient. We should be pushing for ways to demand the same benefits blockchains provide – namely, verifiable ownership, and a shared notion of truth – in machine learning as well.

While blockchains have incentivized optimizing zk-SNARKs, every field in computing will benefit.

***

Acknowledgements: Justin Thaler, Dan Boneh, Guy Wuollet, Sam Ragsdale, Ali Yahya, Chris Dixon, Eddy Lazzarin, Tim Roughgarden, Robert Hackett, Tim Sullivan, Jason Morton, Peiyuan Liao, Tarun Chitra, Brian Retford, Daniel Kang, Yi Sun, Anna Rose, Modulus Labs, DC Builder.

***

Elena Burger is a deal partner at a16z crypto, with a focus on games, NFTs, web3 media, and decentralized infrastructure. Prior to joining the team, she spent four years as an equities analyst at Gilder, Gagnon, Howe, and Co. She has a Bachelor’s degree from Barnard College, Columbia University, where she majored in history.

