Jump Crypto Releases Research on Proof of Solvency Vulnerabilities

Statistical Attacks on Proof of Solvency

Written by Nihar Shah

TL;DR

  • For proof of solvency mechanisms to prevent an exchange from misappropriating consumer deposits, consumers must check that their deposits are included in the exchange's reported list of deposits.
  • In theory, only a handful of random checks on the exchange are needed to keep an exchange honest — but this fails in practice for two reasons.
  • An exchange can likely predict which consumers will check, and an exchange can also likely suppress a handful of failed checks — which means it can weaken or undermine the probabilistic security that proof of solvency offers.
  • Thus, exchanges and users should be thoughtful about the mechanism for users to launch such checks and to raise potential issues to restore such guarantees.

Introduction

In the wake of FTX's collapse, many prominent crypto exchanges have announced plans around "proof of solvency." Indeed, exchanges as important and diverse as Binance, BitMEX, ByBit, Gate.io, Kraken, OKX, and more have already rolled out infrastructure to support such checks.[1] Through some combination of professional and user-powered audits, exchanges use proof of solvency to attest that they hold the funds that users custody with them.

But, proof of solvency is not perfect. The crypto community already understands some of the flaws. From a verifiability perspective, exchanges may not control the on-chain addresses that they claim. From a financial perspective, proof of solvency does not guarantee actual corporate solvency, as exchanges hold other assets and liabilities on their balance sheet (ranging from the mundane, like honoring salary obligations, to the complex, like pledging assets as collateral). From a technical perspective, proof of solvency is not necessarily plug-and-play and requires care in selecting the appropriate approach (discussed by Chalkias, Chatzigiannis, and Ji among others). These are ongoing problems that the community is working to solve.

This article explores a less-discussed problem: the proof of solvency mechanism risks being undermined by a malicious exchange in statistical ways. Critically, if exchanges can predict future attestations or sow doubt on failed attestations, they can successfully misappropriate consumer funds. The strong probability guarantees behind proof of solvency in theory are remarkably brittle in practice.

There are a few solutions that a well-intentioned exchange can implement to ease this problem, but the solutions ultimately require care and future development. Indeed, most simple solutions that fix one part of the problem (and come with good optics on face value) actually worsen other parts of the problem — and a malicious exchange can use this to its advantage. It is ultimately up to the crypto community to hold exchanges to account on the thoughtfulness of their roadmap. Well-intentioned exchanges will surely welcome those standards.

Proof of Solvency's Strength, in Theory

There is a substantial literature on proof of solvency from a technical perspective (e.g. a primer by Vitalik), and so we will not rehash it here. Our focus is on the "proof of liabilities" component of the process, in which an exchange reports the sum total of customer deposits. (This is then balanced by the exchange proving the sum total of their reserves, for an on-chain representation of solvency).[2]

To operationalize and attest to this, exchanges typically pick one (or both) of two ways. They can present an anonymized list of deposits to an external auditor, or they can publish an anonymized tree of deposits (as a Merkle or Verkle tree). These have different trust assumptions and technical specifications, but they are equivalent for our purposes.

This system works as long as users check their deposits and ensure that they are included in the total sum. A malicious exchange may try to exclude or under-report deposits for a subset of customers, and then misappropriate those funds. Neither an auditor nor a tree will intrinsically know that Alice held 5 BTC with an exchange, unless Alice herself checks her inclusion. Thus, at face value, a successful check on the exchange holistically requires each and every user to check his or her deposit's inclusion.
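To make the inclusion check concrete, here is a minimal sketch of what Alice's check could look like against a published Merkle root. The leaf encoding, the use of SHA-256, the duplicate-last-node padding rule, and all function names are illustrative assumptions, not any exchange's actual format. Real deployments typically use Merkle sum trees so that the root also commits to total liabilities; this plain tree only shows inclusion.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(user_id: str, balance: int) -> bytes:
    # Hypothetical leaf encoding: "<anonymized id>:<balance in smallest units>"
    return h(f"{user_id}:{balance}".encode())

def build_levels(leaves):
    """Return every level of the tree, from the leaves up to the root."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2 == 1:
            cur = cur + [cur[-1]]  # assumed padding rule: duplicate the last node
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def make_proof(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify_inclusion(leaf, proof, root):
    """Recompute the root from a leaf and its proof, then compare."""
    node = leaf
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

# Usage: Alice checks that her 5 BTC deposit (here in whole units) is included.
deposits = [("alice", 5), ("bob", 2), ("carol", 7)]
leaves = [leaf_hash(u, b) for u, b in deposits]
levels = build_levels(leaves)
root = levels[-1][0]
print(verify_inclusion(leaf_hash("alice", 5), make_proof(levels, 0), root))
```

If the exchange had under-reported Alice's balance, the leaf she computes from her own records would not hash up to the published root, and the check would fail. Nobody else can run this check for her, because only she knows what her leaf should contain.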

And of course, most users will not check. The median crypto user does not know how to interact with a Merkle tree, which involves some technical complexity. The barrier to entry is lower for checking inclusion in an auditor's list, but it is still non-trivial. And more generally, we should expect fewer users to check than hoped for, due to the externality problem. A user who checks their deposits is providing a diffuse benefit to the community by keeping the exchange honest, and pays a concentrated cost (their time) to do so.[3]

But in theory, this does not matter. The math behind proof of liabilities appears wonderfully robust at first glance, and it works well with just a few good citizens policing the exchange. The probability that an exchange gets away with some form of misappropriating deposits is just the probability that every one of those depositors does not check. By contrast, in theory, an exchange can be caught if just a single misappropriated depositor checks on his or her deposit. To state this formally, if we denote the probability of a random user checking as $\pi$, this simplifies to the following expression for an exchange that misappropriates $n$ deposits.

$$P(\text{Not Caught}) = \prod_{i=1}^{n} P(\text{Depositor } i \text{ doesn't check}) = (1 - \pi)^n$$

For instance, if an exchange misappropriates the deposits of just sixty-nine users and each user checks with a 1% probability, it has a 50-50 chance of getting caught versus getting away with its actions. By the time an exchange targets even five hundred users, it has basically a zero chance of getting away with its actions. For exchanges, which have user numbers in the millions, these are minuscule targets and not worth the risk.
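The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative; the only inputs are the 1% checking probability and the target counts quoted above.

```python
import math

def p_not_caught(n: int, pi: float) -> float:
    """Probability that none of n targeted depositors checks,
    when each checks independently with probability pi."""
    return (1.0 - pi) ** n

def targets_for_escape_probability(target: float, pi: float) -> float:
    """Solve (1 - pi)**n = target for n (real-valued)."""
    return math.log(target) / math.log(1.0 - pi)

for n in (69, 500):
    print(n, p_not_caught(n, 0.01))

# Number of targets at which escape and capture are equally likely.
print(targets_for_escape_probability(0.5, 0.01))
```

With a 1% checking probability, the escape probability is close to one half at sixty-nine targets and falls below 1% at five hundred.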

Proof of Solvency's Brittleness, in Practice

As seductive as this framework is, it is remarkably less effective in practice. There are two critical issues:

  1. An exchange may be able to predict which customers are likely to check their deposits. In the simplest form, an exchange may use some basic demographics to distinguish crypto-savvy users (who are comfortable with downloading Merkle trees and verifying inclusion) and crypto-naive users (who are not).
  2. A single failed attestation is unlikely to trigger immediate action, for two reasons. First and foremost, there will be some false positive rate due to user error. Lists and trees represent snapshots of deposits in time — and users may misinterpret the output. Second, a single user likely will not generate enough reach to alert the community, particularly if the exchange quickly remedies or obfuscates the issue (e.g. by misappropriating a different user's deposit).

Thus, we augment the model above in a few critical ways. First, we partition the space of customers into two types: one that is sufficiently savvy to check their balances (denoted by $\theta_1$) and one that is not (denoted by $\theta_2$).[4] The first type checks on their deposit with some probability $\pi$, while the second type never checks.

Second, a malicious exchange actively tries to predict which consumers are which type. The exchange only wants to misappropriate the deposits of the second type (i.e. the non-checking), as they can do so with zero risk if successful.[5] To represent this knowledge, we introduce the parameter $\phi \in [0,1]$, which an exchange uses to get an updated probability of a given targeted consumer's type.

Notice that when $\phi = 0$, the exchange has no predictive capabilities, and so their conditional guess of a given consumer's type is just the baseline probability of the two types. When $\phi = 1$, the exchange knows the consumer's type perfectly and thus can guarantee they are only misappropriating the deposit of the second (non-checking) type.

Third, the exchange only gets caught if at least a certain number of people have failed attestations. A single failed attestation may not be enough to land the exchange in trouble, and so we generalize the model to require $j$ failed attestations.

With these three complications, we can update the probability of the exchange getting away with misappropriating $n$ deposits as the following equation. A full derivation can be found in the appendix.
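Based on the appendix's verbal description (a Binomial model in which the exchange escapes when $j - 1$ or fewer of the $n$ targeted consumers report a failed attestation), the expression can be sketched as below. The shorthand $p_\phi$ is introduced here for illustration. The linear form of $P(\theta_1 \mid \phi)$ on the second line is an assumption: it is one simple choice that matches the two endpoints described in the text (the baseline probability at $\phi = 0$, and zero at $\phi = 1$), and other forms would also fit.

```latex
P(\text{Not Caught}) = \sum_{k=0}^{j-1} \binom{n}{k} \, p_\phi^{\,k} \, (1 - p_\phi)^{\,n-k},
\qquad p_\phi = \pi \cdot P(\theta_1 \mid \phi)

% Assumed functional form (not confirmed by the text):
P(\theta_1 \mid \phi) = (1 - \phi) \, P(\theta_1)
```

As a sanity check, setting $j = 1$ and $\phi = 0$ collapses the sum to $(1 - \pi P(\theta_1))^n$, which is the simple model with an aggregate checking probability of $\pi P(\theta_1)$.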

This changes the math radically. To illustrate, we parameterize the equation and compute the 90-10 point, i.e. the number of depositors that an exchange could misappropriate from to face a 90% chance of getting away and only a 10% chance of getting caught (informally, a good chance of getting away). In this simulation, we set $\pi = 0.2$ and $P(\theta_1) = 0.05$, i.e. the savvy population makes up 5% of the total population and checks 20% of the time, for an aggregate 1% of the population checking. We try a variety of values for $\phi$ and $j$ and compute the following values for the number $n$ of consumers to target.
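The 90-10 point can be computed numerically with a sketch like the one below. It assumes the Binomial model described in the appendix and, as a labeled assumption, a linear form for the exchange's updated belief, $P(\theta_1 \mid \phi) = (1 - \phi) P(\theta_1)$, which matches the endpoints described for $\phi = 0$ and $\phi = 1$. The function names and the grid of $\phi$ and $j$ values are illustrative, and the output is not a reproduction of the article's own figures.

```python
import math

def p_check(pi: float, p_theta1: float, phi: float) -> float:
    # ASSUMPTION: the exchange's knowledge phi linearly shrinks the chance
    # that a targeted consumer is the checking type.
    return pi * (1.0 - phi) * p_theta1

def p_not_caught(n: int, j: int, p: float) -> float:
    """P(fewer than j of n targets report a failed check), Binomial(n, p).
    Requires 0 <= p < 1. Terms are computed in log space for large n."""
    if p <= 0.0:
        return 1.0
    total = 0.0
    for k in range(0, min(j - 1, n) + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def max_targets(j: int, p: float, escape_prob: float = 0.9) -> int:
    """Largest n for which the exchange still escapes with probability >= escape_prob."""
    if p <= 0.0:
        raise ValueError("with p == 0 the exchange is never caught")
    lo, hi = 0, 1
    while p_not_caught(hi, j, p) >= escape_prob:  # exponential search for an upper bound
        lo, hi = hi, hi * 2
    while hi - lo > 1:                            # bisection on the monotone escape probability
        mid = (lo + hi) // 2
        if p_not_caught(mid, j, p) >= escape_prob:
            lo = mid
        else:
            hi = mid
    return lo

for phi in (0.0, 0.5, 0.9, 0.99):      # illustrative grid
    for j in (1, 5, 10):
        p = p_check(pi=0.2, p_theta1=0.05, phi=phi)
        print(f"phi={phi:<5} j={j:<3} n={max_targets(j, p)}")
```

The search relies on the escape probability falling monotonically as $n$ grows, which holds for a Binomial with fixed $p$.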

We can visualize these same results below, focusing on $\phi = 0$ and $\phi = 0.99$ (corresponding to no knowledge versus high knowledge, on the part of the exchange) across a wider range of $j$ values (the number of failed attestations before an exchange is caught). We plot the number of consumers needed for this 90% threshold on a logarithmic scale.

The results are striking, in that they show the strong theoretical guarantees around the proof of solvency mechanism fail quickly in practice. In the strongest version of the theoretical model, an exchange can only misappropriate ten accounts before facing a non-trivial chance of getting caught. By contrast, in the most extreme cases in practice, the exchange can misappropriate nearly half a million accounts for that same risk, if they do well predicting types and squashing intermittent reports. While we do not claim the plausibility of this particular parameterization, this table shows that either component — predictability of checkers or critical mass of failed attestations — can weaken any theoretical security offered by proof of solvency. Together, they can render it functionally useless.

Risks in the Status Quo

The concerns in this article may seem academic, but in fact exchanges are currently well-positioned to weaken proof of solvency in both ways. The existing designs around proof of solvency lend themselves to unusually strong forms of predictability and opaque forms of adjudicating failed attestations.

Consider first the predictability of the checking type. Exchanges could certainly use demographic information, as suggested earlier, for predictive abilities. But they actually can do something much better. Exchanges know exactly the subset of people likely to check: the users who navigate over to the portion of the exchange's native website or application to download the tree, check their proof, or get their address within the tree. Regardless of where the tree or list is ultimately stored, a user's initial interactions with proof of solvency always start on the exchange's platform, which is a highly revealing signal — and one that is trivial for the exchange to collect.

Indeed, a malicious exchange could take this to the extreme. Consider a simple design in which the exchange takes some time (e.g. twenty-four hours) to send a Merkle tree after a depositor requests an updated copy. Over that lag, the exchange re-tags deposits from non-checking customers to checking customers, regenerates the tree, and pushes it forward. If operationalized correctly, an exchange could misappropriate almost every deposit on its platform and yet "pass" every proof of solvency check.[6]

Even with trees that are readily downloadable, there is an attack surface. In particular, an exchange can still collect signals about users and weaponize those signals over long time scales. For instance, an exchange may (reasonably) figure that an account that has not checked on its deposit since inception is unlikely to check in the next month — and may misappropriate those tokens accordingly.

Second, consider the adjudication mechanism. The status quo implementations by exchanges generally assume that the proof of solvency checks succeed. If a check fails, there are often no official mechanisms to escalate or verify, leaving users to publicize it on Twitter or other social channels. This is problematic for two reasons. First, users will have naturally high error rates, as deposits are reported at a given point in time and users may forget about trades (i.e. changes to their deposits) made since then. The lack of an official adjudication strategy makes it hard to correct these false positives, which obfuscates true positives. Second, a lone voice, or handful of voices arguing on Twitter, can easily be mistaken for FUD. Again, a malicious exchange could easily lean into this narrative, critiquing such users as engagement farmers and convincing their userbases to ignore them.

Note that many exchanges currently work with external auditors, which helps these two issues slightly. First, auditors offer a separate surface for users to check their inclusion in a list or tree (although those users will likely still need to interact with the original exchange to learn how to do so). Second, auditors may also be better equipped to adjudicate concerns from users who find a failed check. But, auditors do not entirely solve either problem, particularly when the checking and adjudication workflows are opaque. Moreover, auditors introduce new trust assumptions, as they may be compromised.

No Easy Solutions

There are a variety of potential solutions to stop exchanges from predicting users who check and to process failed attestations reliably, but not all of them are ideal. Indeed, many solutions that improve one problem worsen the second problem, or introduce new problems or complications of their own.

At the very least, though, exchanges should take two direct steps. First, they should move as much of the mechanism off their native platforms as possible. Second, they should set up clear adjudication processes for failed attestations. These steps may rely on auditors or other centralized third parties for now, but they at least allow for more transparency and clarity than the current designs in many cases. Off-platform processes allow for obvious signals to be hidden from the exchange. Clear adjudication processes allow for obvious false positives to be removed and the remaining cases to be investigated further.

But, the two problems outlined in this article will persist, and they will require more creative solutions to solve fully. We discuss five hypothetical fixes here. They are not perfect, but they indicate some potential directions for future work:

  1. Exchanges could hold the hands of users in checking proof of solvency (and many already do). This is good because it increases the baseline rate of users checking, which makes it easier to spot issues. But this is problematic for the two core reasons noted in this article. The more extensive the process, the more signals that an exchange acquires on its users. Moreover, a poorly-written process may confuse crypto-naive users (who would have otherwise not performed checks), and this increases the frequency of false positives.
  2. Exchanges could offer bounties to users who find correct failed attestations. Again, this is good because it increases the baseline rate of users checking. However, by making incentives overly high-powered, this may strongly increase the false positive rate and overwhelm any adjudicators (particularly if there are no penalties on users for leveling accusations).
  3. Exchanges could push the total tree or user-specific proofs to all of their users automatically, which almost entirely blocks their ability to predict which users check. However, this will increase the false positive rate, by making the information (which is non-trivial to understand correctly) widely disseminated. This may also unnecessarily scare users who are new to crypto.
  4. Exchanges could generate trees or proofs faster and more frequently. This will lower the false positive rate because those lists will more accurately reflect current balances (and so users will be less prone to misinterpreting the results). However, faster generation can be a double-edged sword, because it also makes it easier for exchanges to generate "corrected" proofs and trees after a suspicious consumer starts to investigate.
  5. Exchanges could ask auditing firms or regulatory bodies to employ "undercover" auditors, i.e. auditors masquerading as ordinary consumers. This can be effective because such individuals are far less prone to false positives and have the ability to investigate an exchange after even one or two failed attestations. They may also be savvier about covering their tracks. However, as a whole, consumers may not trust processes that rely heavily on an arm's-length relationship between an exchange and another centralized institution.

In the short run, the best solution likely involves some mix of all these processes. Neither problem introduced in this article can ever be fully eliminated, but the optimal solution will mitigate both issues (around predicting checkers and handling failed attestations) to reasonable degrees.

In the long run, though, the crypto community can truly be creative in the solution space. For instance, the team at Jump Crypto has theorized tools like browser extensions that perform proof of solvency checks automatically in the background. Such hypothetical tools would also keep track of a given account's trades, and thus could automatically pair high verification rates (both in the number of users checking and the frequency of those users checking) with low false positive rates. This is just one potential idea amongst many others, and with time and technology, these problems may recede.[7]
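As an illustration of how such a tool could lower the false positive rate, the sketch below reconstructs what a user's balance should have been at the snapshot time by undoing every balance change made since then, and only then compares it to the attested amount. The record type, field names, and function names are hypothetical; this is not a description of any existing extension.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class BalanceChange:
    timestamp: int  # Unix seconds
    delta: int      # signed change in the asset's smallest unit

def balance_at(current_balance: int, changes: Iterable[BalanceChange],
               snapshot_time: int) -> int:
    """Undo every change made after the snapshot to recover the balance at that time."""
    later = sum(c.delta for c in changes if c.timestamp > snapshot_time)
    return current_balance - later

def attestation_matches(reported: int, current_balance: int,
                        changes: Iterable[BalanceChange], snapshot_time: int) -> bool:
    """True when the attested amount equals the reconstructed snapshot balance."""
    return reported == balance_at(current_balance, changes, snapshot_time)

# Usage: the user holds 7 units now, after a deposit of 3 and a later withdrawal of 1.
history = [BalanceChange(timestamp=100, delta=+3), BalanceChange(timestamp=200, delta=-1)]
print(attestation_matches(reported=8, current_balance=7, changes=history, snapshot_time=150))
```

A user comparing the attested 8 units to today's 7 units would raise a false alarm; accounting for the withdrawal made after the snapshot removes it.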

Conclusion

This article is not a critique of exchanges, which are rapidly building up their proof of solvency infrastructures. These are commendable and timely efforts, and we anticipate that these mechanisms will become more commonplace and mature over time.

But, proof of solvency is also no silver bullet. We already understand the gaps that the mechanism leaves when it comes to financial health, technical security, and verifiability. This article adds to the growing set of issues by pointing out gaps that can emerge when the technology behind proof of solvency is embedded within a shoddy mechanism for interacting with users. Malicious exchanges can use this to launch statistical or probabilistic attacks and impair the technology in moderate to severe degrees.

The onus is thus on exchanges to develop robust mechanisms that undermine their predictive signals and process failed attestations. The onus is also on users to hold exchanges to that standard. Indeed, we believe honest exchanges would likely welcome such efforts, as it makes it easier for them to credibly distinguish themselves from malicious ones. With these and parallel efforts, proof of solvency can be the tool that restores trust in exchanges during this crypto winter.

Please let us know if we missed anything or got anything wrong. Thanks to the research team at Jump Crypto, especially to Rahul Maganti and Don Beaver for feedback. This article does not constitute financial advice.

Appendix

In this section, we explain the derivation of the model.

First, we note that the probability of an exchange not getting caught can be modeled through a simple Binomial distribution. If an exchange must fail $j$ attestations or more to get caught (out of $n$ consumer deposits misappropriated), then the probability of not getting caught is simply the probability that $j - 1$ or fewer consumers report failed attestations, where all terms are conditioned on the knowledge $\phi$ that the exchange has about a user's type.

In turn, each of these terms can be decomposed into fundamentals. First, we estimate the probability of checking by conditioning on the two types of users. As a reminder, only the first type of user actually performs checks.

Second, we estimate the probability of not checking by conditioning on the two types of users. Users of either type can fail to check, although with different probabilities.

These two terms can be substituted into the original equation for the expression we use. Notice that this nests the simple model that does not partition users into two types, by setting $\phi = 0$.
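The two decompositions described above can be written out as a sketch, using only what the text states: the first type checks with probability $\pi$ and the second type never checks. The conditional type probabilities $P(\theta_1 \mid \phi)$ and $P(\theta_2 \mid \phi)$ are left general here, since their exact functional form is not given in the text.

```latex
% Probability that a targeted consumer checks, by the law of total probability:
P(\text{check} \mid \phi)
  = \underbrace{P(\text{check} \mid \theta_1)}_{\pi} P(\theta_1 \mid \phi)
  + \underbrace{P(\text{check} \mid \theta_2)}_{0} P(\theta_2 \mid \phi)
  = \pi \, P(\theta_1 \mid \phi)

% Probability that a targeted consumer does not check:
P(\text{no check} \mid \phi)
  = (1 - \pi) \, P(\theta_1 \mid \phi) + 1 \cdot P(\theta_2 \mid \phi)
  = 1 - \pi \, P(\theta_1 \mid \phi)
```

The second line uses $P(\theta_1 \mid \phi) + P(\theta_2 \mid \phi) = 1$, so the two probabilities sum to one as a Binomial success and failure probability must.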

  1. There is heterogeneity in how these various exchanges operationalize proof of solvency. Those details are not particularly important for the article. In general, though, most exchanges utilize an external auditor to audit deposits and intermediate user interactions, although many exchanges also let users directly access the Merkle tree of deposits. ↩︎
  2. Some of the terms in this domain are overloaded. For instance, Jesse Powell of Kraken has referred to the overall solvency check as "proof of reserves." In this article, we hew to the convention presented in Dagher et al. (2015), Chalkias et al. (2020), and other academic work, in which proof of reserves refers to assets, proof of liabilities refers to deposits, and proof of solvency refers to the balance of the two. ↩︎
  3. This externality problem would be mitigated if exchanges segmented each user's deposit from other users' deposits. However, exchanges do not do this from a practical perspective. Moreover, even if they did, it is unclear if bankruptcy proceedings would recognize this segmentation as meaningful. ↩︎
  4. This may seem like an assumption, but it is just a re-framing of the problem. Any space can be partitioned into disjoint and exhaustive types, e.g. the world is comprised of Joe Bidens and non-Joe Bidens. ↩︎
  5. In this stylized model, consumers have homogeneous deposit distributions, and so it is always a strictly dominant strategy for an exchange to target a $\theta_2$ consumer. In practice, there may be heterogeneity in deposit size and that would complicate the targeting decision. As one example, we expect institutional clients, who are both professional and well-capitalized, to be checking their deposits constantly, meaning that an exchange would likely leave their deposits untouched unless it has very sophisticated adversarial strategies. ↩︎
  6. There is an even more alarming attack vector available, which is where an exchange sends different trees to different individuals, allowing them to even bypass the step of regenerating trees. Fortunately, this particular vector can be mitigated as long as exchanges publish the hash of the Merkle root on their platform. ↩︎
  7. One novel solution that has been discussed by Vitalik and others is non-custodial exchanges. But, this represents a complete paradigm shift in the purpose of an exchange, and so we do not consider it as a direct solution to the problems outlined in this article. ↩︎