
A Quick Gasprice Market Analysis


Here is a file that contains data, extracted from geth, about transaction fees in every block between 4710000 and 4730000. For each block, it contains an object of the form:

{
    "block":4710000,
    "coinbase":"0x829bd824b016326a401d083b33d092293333a830",
    "deciles":[40,40.1,44.100030001,44.100030001,44.100030001,44.100030001,44.100030001,44.100030001,50,66.150044,100]
    ,"free":10248,
    "timedelta":8
}

The “deciles” variable contains 11 values, where the lowest is the lowest gasprice in each block, the next is the gasprice that only 10% of other transaction gasprices are lower than, and so forth; the last is the highest gasprice in each block. This gives us a convenient summary of the distribution of transaction fees that each block contains. We can use this data to perform some interesting analyses.
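
As a rough sketch (not the original analysis code), here is how such a per-block summary could be computed in Python; the filename is a placeholder for the linked data file, which is assumed to be a JSON array of objects like the one above:

import json

import numpy as np

# Placeholder filename; the file is assumed to be a JSON array of per-block
# objects shaped like the example above.
with open("txfees_4710000_4730000.json") as f:
    blocks = json.load(f)

def decile_summary(gasprices):
    """Return the 11 values: minimum, 10th, 20th, ..., 90th percentile, maximum."""
    return [float(x) for x in np.percentile(gasprices, list(range(0, 101, 10)))]

# Reproduce the "deciles" field for one block from its raw list of gasprices.
print(decile_summary([40, 40.1, 44.1, 44.1, 50, 66.15, 100]))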

First, a chart of the deciles, taking 50-block moving averages to smooth it out:

What we see is a gasprice market that seems to actually stay reasonably stable over the course of more than three days. There are a few occasional spikes, most notably the one around block 4720000, but otherwise the deciles all stay within the same band all the way through. The only exception is the highest gasprice transaction (that red squiggle at the top left), which fluctuates wildly because it can be pushed upward by a single very-high-gasprice transaction.
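
For reference, a minimal sketch of how such a chart could be produced (matplotlib and numpy; the filename is again a placeholder, and this is not the original plotting code):

import json

import matplotlib.pyplot as plt
import numpy as np

blocks = json.load(open("txfees_4710000_4730000.json"))  # placeholder filename
deciles = np.array([b["deciles"] for b in blocks])        # shape: (num_blocks, 11)

# Smooth each decile series with a 50-block moving average.
window = 50
kernel = np.ones(window) / window
for i in range(11):
    smoothed = np.convolve(deciles[:, i], kernel, mode="valid")
    plt.plot(smoothed, label="%d%%" % (i * 10))

plt.xlabel("block (offset from 4710000)")
plt.ylabel("gasprice (gwei)")
plt.legend(fontsize="small")
plt.show()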

We can try to interpret the data in another way: by calculating, for each gasprice level, the average number of blocks that you need to wait until you see a block where the lowest gasprice included is lower than that gasprice. Assuming that miners are rational and all have the same view (implying that if the lowest gasprice in a block is X, then that means there are no more transactions with gasprices above X waiting to be included), this might be a good proxy for the average amount of time that a transaction sender needs to wait to get included if they use that gasprice. The stats are:

There is clear clustering going on at the 4, 10 and 20 levels; it seems to be an underexploited strategy to send transactions with fees slightly above these levels, getting in before the crowd of transactions right at the level but only paying a little more.
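
One way to approximate the wait-time statistic described above is to treat the average gap between qualifying blocks as the expected wait (a sketch, with the same placeholder filename):

import json

import numpy as np

blocks = json.load(open("txfees_4710000_4730000.json"))  # placeholder filename
min_prices = [b["deciles"][0] for b in blocks]

def avg_blocks_to_wait(gasprice):
    """Average gap until a block whose cheapest included transaction is at or
    below `gasprice` -- a rough proxy for expected inclusion time."""
    waits, current = [], 0
    for m in min_prices:
        current += 1
        if m <= gasprice:
            waits.append(current)
            current = 0
    return float(np.mean(waits)) if waits else float("inf")

for level in (1, 4, 10, 20, 40):
    print(level, round(avg_blocks_to_wait(level), 2))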

However, there is quite a bit of evidence that miners do not have the same view; that is, some miners see a very different set of transactions from other miners. First of all, we can filter blocks by miner address, and check what the deciles of each miner are. Here is the output of this data, splitting by 2000-block ranges so we can spot behavior that is consistent across the entire period, and filtering out miners that mine fewer than 10 blocks in any period, as well as filtering out blocks with more than 21000 free gas (high levels of free gas may signify an abnormally high minimum gas price policy, like for example 0x6a7a43be33ba930fe58f34e07d0ad6ba7adb9b1f at ~40 gwei and 0xb75d1e62b10e4ba91315c4aa3facc536f8a922f5 at ~10 gwei). We get:

0x829bd824b016326a401d083b33d092293333a830 [30, 28, 27, 21, 28, 34, 23, 24, 32, 32]
0xea674fdde714fd979de3edf0f56aa9716b898ec8 [17, 11, 10, 15, 17, 23, 17, 13, 16, 17]
0x5a0b54d5dc17e0aadc383d2db43b0a0d3e029c4c [31, 17, 20, 18, 16, 27, 21, 15, 21, 21]
0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5 [20, 16, 19, 14, 17, 18, 17, 14, 15, 15]
0xb2930b35844a230f00e51431acae96fe543a0347 [21, 17, 19, 17, 17, 25, 17, 16, 19, 19]
0x180ba8f73897c0cb26d76265fc7868cfd936e617 [13, 13, 15, 18, 12, 26, 16, 13, 20, 20]
0xf3b9d2c81f2b24b0fa0acaaa865b7d9ced5fc2fb [26, 25, 23, 21, 22, 28, 25, 24, 26, 25]
0x4bb96091ee9d802ed039c4d1a5f6216f90f81b01 [17, 21, 17, 14, 21, 32, 14, 14, 19, 23]
0x2a65aca4d5fc5b5c859090a6c34d164135398226 [26, 24, 20, 16, 22, 33, 20, 18, 24, 24]
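
A sketch of the grouping and filtering just described; the per-period statistic shown in the table above is not spelled out, so the median of each miner's per-block minimum gasprices is used here as a stand-in:

import json
from collections import defaultdict

import numpy as np

blocks = json.load(open("txfees_4710000_4730000.json"))  # placeholder filename

# Group blocks by (miner, 2000-block period), skipping blocks with more than
# 21000 free gas (a sign of an unusually high minimum-gasprice policy).
groups = defaultdict(list)
for b in blocks:
    if b["free"] > 21000:
        continue
    period = (b["block"] - 4710000) // 2000
    groups[(b["coinbase"], period)].append(b["deciles"][0])

num_periods = 10
for miner in sorted({m for m, _ in groups}):
    per_period = [groups.get((miner, p), []) for p in range(num_periods)]
    # Keep only miners with at least 10 qualifying blocks in every period.
    if all(len(g) >= 10 for g in per_period):
        print(miner, [int(round(float(np.median(g)))) for g in per_period])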

The first miner is consistently higher than the others; the last is also higher than average, and the second is consistently among the lowest.

Another thing we can look at is timestamp differences - the difference between a block’s timestamp and its parent. There is a clear correlation between timestamp difference and lowest gasprice:

This makes a lot of sense, as a block that comes right after another block should be cleaning up only the transactions that are too low in gasprice for the parent block to have included, and a block that comes a long time after its predecessor would have many more not-yet-included transactions to choose from. The differences are large, suggesting that a single block is enough to bite off a very substantial chunk of the unconfirmed transaction pool, adding to the evidence that most transactions are included quite quickly.
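
A quick sketch of how that relationship can be checked from the same data, bucketing the lowest included gasprice by timestamp difference:

import json

import numpy as np

blocks = json.load(open("txfees_4710000_4730000.json"))  # placeholder filename
timedeltas = np.array([b["timedelta"] for b in blocks])
min_prices = np.array([b["deciles"][0] for b in blocks])

# Average lowest included gasprice for each timestamp difference.
for td in range(1, 31):
    mask = timedeltas == td
    if mask.sum() >= 20:  # skip buckets with too few blocks
        print(td, round(float(min_prices[mask].mean()), 2))

print("correlation:", round(float(np.corrcoef(timedeltas, min_prices)[0, 1]), 3))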

However, if we look at the data in more detail, we see very many instances of blocks with low timestamp differences that contain many transactions with higher gasprices than their parents. Sometimes we do see blocks that actually look like they clean up what their parents could not, like this:

{"block":4710093,"coinbase":"0x5a0b54d5dc17e0aadc383d2db43b0a0d3e029c4c","deciles":[25,40,40,40,40,40,40,43,50,64.100030001,120],"free":6030,"timedelta":8},
{"block":4710094,"coinbase":"0xea674fdde714fd979de3edf0f56aa9716b898ec8","deciles":[4,16,20,20,21,21,22,29,30,40,59],"free":963366,"timedelta":2},

But sometimes we see this:

{"block":4710372,"coinbase":"0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5","deciles":[1,30,35,40,40,40,40,40,40,55,100],"free":13320,"timedelta":7},
{"block":4710373,"coinbase":"0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5","deciles":[1,32,32,40,40,56,56,56,56,70,80],"free":1672720,"timedelta":2}

And sometimes we see miners suddenly including many 1-gwei transactions:

{"block":4710379,"coinbase":"0x5a0b54d5dc17e0aadc383d2db43b0a0d3e029c4c","deciles":[21,25,31,40,40,40,40,40,40,50,80],"free":4979,"timedelta":13},
{"block":4710380,"coinbase":"0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5","deciles":[1,1,1,1,1,1,40,45,55,61.10006,2067.909560115],"free":16730,"timedelta":35}

This strongly suggests that a miner including transactions with gasprice X should NOT be taken as evidence that there are not still many transactions with gasprice higher than X left to process. This is likely because of imperfections in network propagation.

In general, however, what we see seems to be a rather well-functioning fee market, though there is still room to improve in fee estimation and, most importantly of all, continuing to work hard to improve base-chain scalability so that more transactions can get included in the first place.


Notes on Blockchain Governance


In which I argue that “tightly coupled” on-chain voting is overrated, that the status quo of “informal governance” as practiced by Bitcoin, Bitcoin Cash, Ethereum, Zcash and similar systems is much less bad than commonly thought, that people who think that the purpose of blockchains is to completely expunge soft mushy human intuitions and feelings in favor of completely algorithmic governance (emphasis on “completely”) are absolutely crazy, and that loosely coupled voting as done by Carbonvotes and similar systems is underrated, as well as describe what framework should be used when thinking about blockchain governance in the first place.

See also: https://medium.com/@Vlad_Zamfir/against-on-chain-governance-a4ceacd040ca

One of the more interesting recent trends in blockchain governance is the resurgence of on-chain coin-holder voting as a multi-purpose decision mechanism. Votes by coin holders are sometimes used in order to decide who operates the super-nodes that run a network (eg. DPOS in EOS, NEO, Lisk and other systems), sometimes to vote on protocol parameters (eg. the Ethereum gas limit) and sometimes to vote on and directly implement protocol upgrades wholesale (eg. Tezos). In all of these cases, the votes are automatic - the protocol itself contains all of the logic needed to change the validator set or to update its own rules, and does this automatically in response to the result of votes.

Explicit on-chain governance is typically touted as having several major advantages. First, unlike the highly conservative philosophy espoused by Bitcoin, it can evolve rapidly and accept needed technical improvements. Second, by creating an explicit decentralized framework, it avoids the perceived pitfalls of informal governance, which is viewed to either be too unstable and prone to chain splits, or prone to becoming too de-facto centralized - the latter being the same argument made in the famous 1972 essay “Tyranny of Structurelessness”.

Quoting Tezos documentation:

While all blockchains offer financial incentives for maintaining consensus on their ledgers, no blockchain has a robust on-chain mechanism that seamlessly amends the rules governing its protocol and rewards protocol development. As a result, first-generation blockchains empower de facto, centralized core development teams or miners to formulate design choices.

And:

Yes, but why would you want to make [a minority chain split] easier? Splits destroy network effects.

On-chain governance used to select validators also has the benefit that it allows for networks that impose high computational performance requirements on validators without introducing economic centralization risks and other traps of the kind that appear in public blockchains (eg. the validator’s dilemma).

So far, all in all, on-chain governance seems like a very good bargain…. so what’s wrong with it?

What is Blockchain Governance?

To start off, we need to describe more clearly what the process of “blockchain governance” is. Generally speaking, there are two informal models of governance, which I will call the “decision function” view of governance and the “coordination” view of governance. The decision function view treats governance as a function f(x1, x2 ... xn) -> y, where the inputs are the wishes of various legitimate stakeholders (senators, the president, property owners, shareholders, voters, etc) and the output is the decision.


The decision function view is often useful as an approximation, but it clearly frays very easily around the edges: people often can and do break the law and get away with it, sometimes rules are ambiguous, and sometimes revolutions happen - and all three of these possibilities are, at least sometimes, a good thing. And often even behavior inside the system is shaped by incentives created by the possibility of acting outside the system, and this once again is at least sometimes a good thing.

The coordination model of governance, in contrast, sees governance as something that exists in layers. The bottom layer is, in the real world, the laws of physics themselves (as a geopolitical realist would say, guns and bombs), and in the blockchain space we can abstract a bit further and say that it is each individual’s ability to run whatever software they want in their capacity as a user, miner, stakeholder, validator or whatever other kind of agent a blockchain protocol allows them to be. The bottom layer is always the ultimate deciding layer; if, for example, all Bitcoin users wake up one day and decide to edit their clients’ source code and replace the entire code with an Ethereum client that listens to balances of a particular ERC20 token contract, then that means that that ERC20 token is bitcoin. The bottom layer’s ultimate governing power cannot be stopped, but the actions that people take on this layer can be influenced by the layers above it.

The second (and crucially important) layer is coordination institutions. The purpose of a coordination institution is to create focal points around how and when individuals should act in order to better coordinate behavior. There are many situations, both in blockchain governance and in real life, where if you act in a certain way alone, you are likely to get nowhere (or worse), but if everyone acts together a desired result can be achieved.


An abstract coordination game. You benefit heavily from making the same move as everyone else.


In these cases, it’s in your interest to go if everyone else is going, and stop if everyone else is stopping. You can think of coordination institutions as putting up green or red flags in the air saying “go” or “stop”, with an established culture that everyone watches these flags and (usually) does what they say. Why do people have the incentive to follow these flags? Because everyone else is already following these flags, and you have the incentive to do the same thing as what everyone else is doing.


A Byzantine general rallying his troops forward. The purpose of this isn't just to make the soldiers feel brave and excited, but also to reassure them that everyone else feels brave and excited and will charge forward as well, so an individual soldier is not just committing suicide by charging forward alone.


Strong claim: this concept of coordination flags encompasses all that we mean by "governance"; in scenarios where coordination games (or more generally, multi-equilibrium games) do not exist, the concept of governance is meaningless.

In the real world, military orders from a general function as a flag, and in the blockchain world, the simplest example of such a flag is the mechanism that tells people whether or not a hard fork “is happening”. Coordination institutions can be very formal, or they can be informal, and often give suggestions that are ambiguous. Flags would ideally always be either red or green, but sometimes a flag might be yellow, or even holographic, appearing green to some participants and yellow or red to others. Sometimes there are also multiple flags that conflict with each other.

The key questions of governance thus become:

  • What should layer 1 be? That is, what features should be set up in the initial protocol itself, and how does this influence the ability to make formulaic (ie. decision-function-like) protocol changes, as well as the level of power of different kinds of agents to act in different ways?
  • What should layer 2 be? That is, what coordination institutions should people be encouraged to care about?

The Role of Coin Voting

Ethereum also has a history with coin voting, including:




These three are all examples of loosely coupled coin voting, or coin voting as a layer 2 coordination institution. Ethereum does not have any examples of tightly coupled coin voting (or, coin voting as a layer 1 in-protocol feature), though it does have an example of tightly coupled miner voting: miners’ right to vote on the gas limit. Clearly, tightly coupled voting and loosely coupled voting are competitors in the governance mechanism space, so it’s worth dissecting: what are the advantages and disadvantages of each one?

Assuming zero transaction costs, and if used as a sole governance mechanism, the two are clearly equivalent. If a loosely coupled vote says that change X should be implemented, then that will serve as a “green flag” encouraging everyone to download the update; if a minority wants to rebel, they will simply not download the update. If a tightly coupled vote implements change X, then the change happens automatically, and if a minority wants to rebel they can install a hard fork update that cancels the change. However, there clearly are nonzero transaction costs associated with making a hard fork, and this leads to some very important differences.

One very simple, and important, difference is that tightly coupled voting creates a default in favor of the blockchain adopting what the majority wants, requiring minorities to exert great effort to coordinate a hard fork to preserve a blockchain’s existing properties, whereas loosely coupled voting is only a coordination tool, and still requires users to actually download and run the software that implements any given fork. But there are also many other differences. Now, let us go through some arguments against voting, and dissect how each argument applies to voting as layer 1 and voting as layer 2.

Low Voter Participation

One of the main criticisms of coin voting mechanisms so far is that, no matter where they are tried, they tend to have very low voter participation. The DAO Carbonvote only had a voter participation rate of 4.5%:


Additionally, wealth distribution is very unequal, and the results of these two factors together are best described by this image created by a critic of the DAO fork:


The EIP 186 Carbonvote had ~2.7 million ETH voting. The DAO proposal votes did not fare better, with participation never reaching 10%. And outside of Ethereum things are not sunny either; even in Bitshares, a system where the core social contract is designed around voting, the top delegate in an approval vote only got 17% of the vote, and in Lisk it got up to 30%, though as we will discuss later these systems have other problems of their own.

Low voter participation means two things. First, the vote has a harder time achieving a perception of legitimacy, because it only reflects the views of a small percentage of people. Second, an attacker with only a small percentage of all coins can sway the vote. These problems exist regardless of whether the vote is tightly coupled or loosely coupled.

Game-Theoretic Attacks

Aside from “the big hack” that received the bulk of the media attention, the DAO also had a number of much smaller game-theoretic vulnerabilities; this article from HackingDistributed does a good job of summarizing them. But this is only the tip of the iceberg. Even if all of the finer details of a voting mechanism are implemented correctly, voting mechanisms in general have a large flaw: in any vote, the probability that any given voter will have an impact on the result is tiny, and so the personal incentive that each voter has to vote correctly is almost insignificant. And if each person’s stake is small, their incentive to vote correctly is insignificant squared. Hence, a relatively small bribe spread out across the participants may suffice to sway their decision, possibly in a way that they collectively might quite disapprove of.
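
To make the “insignificant squared” point concrete, here is a toy back-of-the-envelope calculation; all numbers are invented for illustration and are not taken from any real vote:

# Toy numbers, purely for illustration.
harm_to_community = 20_000_000   # $ destroyed if the bad proposal passes
total_stake = 10_000_000         # total coins that could vote
your_stake = 1_000               # your coins
p_pivotal = 1e-4                 # generous estimate of your vote being decisive

# You only bear your proportional share of the harm, and only in the case
# where your vote actually changes the outcome.
expected_personal_loss = p_pivotal * harm_to_community * (your_stake / total_stake)
print(round(expected_personal_loss, 2))  # 0.2 -> a $0.50 bribe already more than covers it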

Now you might say, people are not evil selfish profit-maximizers that will accept a $0.5 bribe to vote to give twenty million dollars to Josh Garza just because the above calculation says their individual chance of affecting anything is tiny; rather, they would altruistically refuse to do something that evil. There are two responses to this criticism.

First, there are ways to make a “bribe” that are quite plausible; for example, an exchange can offer interest rates for deposits (or, even more ambiguously, use the exchange’s own money to build a great interface and features), with the exchange operator using the large quantity of deposits to vote as they wish. Exchanges profit from chaos, so their incentives are clearly quite misaligned with users and coin holders.

Second, and more damningly, in practice it seems like people, at least in their capacity as crypto token holders, are profit maximizers, and seem to see nothing evil or selfish about taking a bribe or two. As “Exhibit A”, we can look at the situation with Lisk, where the delegate pool seems to have been successfully captured by two major “political parties” that explicitly bribe coin holders to vote for them, and also require each member in the pool to vote for all the others.

Here’s LiskElite, with 55 members (out of a total 101):


Here’s LiskGDT, with 33 members:


And as “Exhibit B” some voter bribes being paid out in Ark:


Here, note that there is a key difference between tightly coupled and loosely coupled votes. In a loosely coupled vote, direct or indirect vote bribing is also possible, but if the community agrees that some given proposal or set of votes constitutes a game-theoretic attack, they can simply socially agree to ignore it. And in fact this has kind of already happened - the Carbonvote contains a blacklist of addresses corresponding to known exchange addresses, and votes from these addresses are not counted. In a tightly coupled vote, there is no way to create such a blacklist at protocol level, because agreeing who is part of the blacklist is itself a blockchain governance decision. But since the blacklist is part of a community-created voting tool that only indirectly influences protocol changes, voting tools that contain bad blacklists can simply be rejected by the community.

It’s worth noting that this section is not a prediction that all tightly coupled voting systems will quickly succumb to bribe attacks. It’s entirely possible that many will survive for one simple reason: all of these projects have founders or foundations with large premines, and these act as large centralized actors that are interested in their platforms’ success, are not vulnerable to bribes, and hold enough coins to outweigh most bribe attacks. However, this kind of centralized trust model, while arguably useful in some contexts in a project’s early stages, is clearly one that is not sustainable in the long term.

Non-Representativeness

Another important objection to voting is that coin holders are only one class of user, and may have interests that collide with those of other users. In the case of pure cryptocurrencies like Bitcoin, store-of-value use (“hodling”) and medium-of-exchange use (“buying coffees”) are naturally in conflict, as the store-of-value use case prizes security much more than the medium-of-exchange use case, which more strongly values usability. With Ethereum, the conflict is worse, as there are many people who use Ethereum for reasons that have nothing to do with ether (see: cryptokitties), or even value-bearing digital assets in general (see: ENS).

Additionally, even if coin holders are the only relevant class of user (one might imagine this to be the case in a cryptocurrency where there is an established social contract that its purpose is to be the next digital gold, and nothing else), there is still the challenge that a coin holder vote gives a much greater voice to wealthy coin holders than to everyone else, opening the door for centralization of holdings to lead to unencumbered centralization of decision making. Or, in other words...


And if you want to see a review of a project that seems to combine all of these disadvantages at the same time, see this: https://btcgeek.com/bitshares-trying-memorycoin-year-ago-disastrous-ends/.

This criticism applies to both tightly coupled and loosely coupled voting equally; however, loosely coupled voting is more amenable to compromises that mitigate its unrepresentativeness, and we will discuss this more later.

Centralization

Let’s look at the existing live experiment that we have in tightly coupled voting on Ethereum, the gas limit. Here’s the gas limit evolution over the past two years:


You might notice that the general feel of the curve is a bit like another chart that may be quite familiar to you:


Basically, they both look like magic numbers that are created and repeatedly renegotiated by a fairly centralized group of guys sitting together in a room. What’s happening in the first case? Miners are generally following the direction favored by the community, which is itself gauged via social consensus aids similar to those that drive hard forks (core developer support, Reddit upvotes, etc; in Ethereum, the gas limit has never gotten controversial enough to require anything as serious as a coin vote).

Hence, it is not at all clear that voting will be able to deliver results that are actually decentralized, if voters are not technically knowledgeable and simply defer to a single dominant tribe of experts. This criticism once again applies to tightly coupled and loosely coupled voting equally.

Update: since writing this, it seems like Ethereum miners managed to up the gas limit from 6.7 million to 8 million all without even discussing it with the core developers or the Ethereum Foundation. So there is hope; but it takes a lot of hard community building and other grueling non-technical work to get to that point.

Digital Constitutions

One approach that has been suggested to mitigate the risk of runaway bad governance algorithms is “digital constitutions” that mathematically specify desired properties that the protocol should have, and require any new code changes to come with a computer-verifiable proof that they satisfy these properties. This seems like a good idea at first, but this too should, in my opinion, be viewed skeptically.

In general, the idea of having norms about protocol properties, and having these norms serve the function of one of the coordination flags, is a very good one. This allows us to enshrine core properties of a protocol that we consider to be very important and valuable, and make them more difficult to change. However, this is exactly the sort of thing that should be enforced in loosely coupled (ie. layer two), rather than tightly coupled (layer one) form.

Basically any meaningful norm is actually quite hard to express in its entirety; this is part of the complexity of value problem. This is true even for something as seemingly unambiguous as the 21 million coin limit. Sure, one can add a line of code saying assert total_supply <= 21000000, and put a comment around it saying “do not remove at all costs”, but there are plenty of roundabout ways of doing the same thing. For example, one could imagine a soft fork that adds a mandatory transaction fee that is proportional to coin value * time since the coins were last sent, and this is equivalent to demurrage, which is equivalent to deflation. One could also implement another currency, called Bjtcoin, with 21 million new units, and add a feature where if a bitcoin transaction is sent the miner can intercept it and claim the bitcoin, instead giving the recipient bjtcoin; this would rapidly force bitcoins and bjtcoins to be fungible with each other, increasing the “total supply” to 42 million without ever tripping up that line of code. “Softer” norms like not interfering with application state are even harder to enforce.

We want to be able to say that a protocol change that violates any of these guarantees should be viewed as illegitimate - there should be a coordination institution that waves a red flag - even if it gets approved by a vote. We also want to be able to say that a protocol change that follows the letter of a norm but blatantly violates its spirit should still be viewed as illegitimate. And having norms exist on layer 2 - in the minds of humans in the community, rather than in the code of the protocol - best achieves that goal.

Toward A Balance

However, I am also not willing to go the other way and say that coin voting, or other explicit on-chain voting-like schemes, have no place in governance whatsoever. The leading alternative seems to be core developer consensus; however, the failure mode of a system being controlled by “ivory tower intellectuals” who care more about abstract philosophies and solutions that sound technically impressive than about real day-to-day concerns like user experience and transaction fees is, in my view, also a real threat to be taken seriously.

So how do we solve this conundrum? Well, first, we can heed the words of slatestarcodex in the context of traditional politics:

The rookie mistake is: you see that some system is partly Moloch [ie. captured by misaligned special interests], so you say “Okay, we’ll fix that by putting it under the control of this other system. And we’ll control this other system by writing ‘DO NOT BECOME MOLOCH’ on it in bright red marker.”
(“I see capitalism sometimes gets misaligned. Let’s fix it by putting it under control of the government. We’ll control the government by having only virtuous people in high offices.”)
I’m not going to claim there’s a great alternative, but the occasionally-adequate alternative is the neoliberal one – find a couple of elegant systems that all optimize along different criteria approximately aligned with human happiness, pit them off against each other in a structure of checks and balances, hope they screw up in different places like in that swiss cheese model, keep enough individual free choice around that people can exit any system that gets too terrible, and let cultural evolution do the rest.

In blockchain governance, it seems like this is the only way forward as well. The approach for blockchain governance that I advocate is “multifactorial consensus”, where different coordination flags and different mechanisms and groups are polled, and the ultimate decision depends on the collective result of all of these mechanisms together. These coordination flags may include:

  • The roadmap (ie. the set of ideas broadcasted earlier on in the project’s history about the direction the project would be going)
  • Consensus among the dominant core development teams
  • Coin holder votes
  • User votes, through some kind of sybil-resistant polling system
  • Established norms (eg. non-interference with applications, the 21 million coin limit)

I would argue that it is very useful for coin voting to be one of several coordination institutions deciding whether or not a given change gets implemented. It is an imperfect and unrepresentative signal, but it is a Sybil-resistant one - if you see 10 million ETH voting for a given proposal, you cannot dismiss that by simply saying “oh, that’s just hired Russian trolls with fake social media accounts”. It is also a signal that is sufficiently disjoint from the core development team that if needed it can serve as a check on it. However, as described above, there are very good reasons why it should not be the only coordination institution.

And underpinning it all is the key difference from traditional systems that makes blockchains interesting: the “layer 1” that underpins the whole system is the requirement for individual users to assent to any protocol changes, and their freedom, and credible threat, to “fork off” if someone attempts to force changes on them that they consider hostile (see also: http://vitalik.ca/general/2017/05/08/coordination_problems.html).

Tightly coupled voting is also okay to have in some limited contexts - for example, despite its flaws, miners’ ability to vote on the gas limit is a feature that has proven very beneficial on multiple occasions. The risk that miners will try to abuse their power may well be lower than the risk that any specific gas limit or block size limit hard-coded by the protocol on day 1 will end up leading to serious problems, and in that case letting miners vote on the gas limit is a good thing. However, “allowing miners or validators to vote on a few specific parameters that need to be rapidly changed from time to time” is a very far cry from giving them arbitrary control over protocol rules, or letting voting control validation, and these more expansive visions of on-chain governance have a much murkier potential, both in theory and in practice.

Governance, Part 2: Plutocracy Is Still Bad


Coin holder voting, both for governance of technical features, and for more extensive use cases like deciding who runs validator nodes and who receives money from development bounty funds, is unfortunately continuing to be popular, and so it seems worthwhile for me to write another post explaining why I (and Vlad Zamfir and others) do not consider it wise for Ethereum (or really, any base-layer blockchain) to start adopting these kinds of mechanisms in a tightly coupled form in any significant way.

I wrote about the issues with tightly coupled voting in a blog post last year that focused on theoretical issues as well as on some practical issues experienced by voting systems over the previous two years. Now, the latest scandal in DPOS land seems to be substantially worse. Because the delegate rewards in EOS are now so high (5% annual inflation, about $400m per year), the competition on who gets to run nodes has essentially become yet another frontier of US-China geopolitical economic warfare.

And that’s not my own interpretation; I quote from this article (original in Chinese):

EOS supernode voting: multibillion-dollar profits leading to crypto community inter-country warfare

Looking at community recognition, Chinese nodes feel much less represented in the community than US and Korea. Since the EOS.IO official Twitter account was founded, there has never been any interaction with the mainland Chinese EOS community. For a listing of the EOS officially promoted events and interactions with communities see the picture below.

With no support from the developer community, facing competition from Korea, the Chinese EOS supernodes have invented a new strategy: buying votes.

The article then continues to describe further strategies, like forming “alliances” that all vote (or buy votes) for each other.

Of course, it does not matter at all who the specific actors are that are buying votes or forming cartels; this time it’s some Chinese pools, last time it was “members located in the USA, Russia, India, Germany, Canada, Italy, Portugal and many other countries from around the globe”, next time it could be totally anonymous, or run out of a smartphone snuck into Trendon Shavers’s prison cell. What matters is that blockchains and cryptocurrency, originally founded in a vision of using technology to escape from the failures of human politics, have essentially all but replicated it. Crypto is a reflection of the world at large.

The EOS New York community’s response seems to be that they have issued a strongly worded letter to the world stating that buying votes will be against the constitution. Hmm, what other major political entity has made accepting bribes a violation of the constitution? And how has that been going for them lately?




The second part of this article will involve me, an armchair economist, hopefully convincing you, the reader, that yes, bribery is, in fact, bad. There are actually people who dispute this claim; the usual argument has something to do with market efficiency, as in “isn’t this good, because it means that the nodes that win will be the nodes that can be the cheapest, taking the least money for themselves and their expenses and giving the rest back to the community?” The answer is, kinda yes, but in a way that’s centralizing and vulnerable to rent-seeking cartels and explicitly contradicts many of the explicit promises made by most DPOS proponents along the way.

Let us create a toy economic model as follows. There are a number of people, all of whom are running to be delegates. The delegate slot gives a reward of $100 per period, and candidates promise to share some portion of that as a bribe, equally split among all of their voters. The actual N delegates (eg. N = 35) in any period are the N delegates that received the most votes; that is, during every period a threshold of votes emerges where if you get more votes than that threshold you are a delegate, if you get less you are not, and the threshold is set so that N delegates are above the threshold.

We expect that voters vote for the candidate that gives them the highest expected bribe. Suppose that all candidates start off by sharing 1%; that is, equally splitting $1 among all of their voters. Then, if some candidate becomes a delegate with K voters, each voter gets a payment of 1/K. The candidate that it’s most profitable to vote for is a candidate that’s expected to be in the top N, but is expected to earn the fewest votes within that set. Thus, we expect votes to be fairly evenly split among 35 delegates.

Now, some candidates will want to secure their position by sharing more; by sharing 2%, you are likely to get twice as many votes as those that share 1%, as that’s the equilibrium point where voting for you has the same payout as voting for anyone else. The extra guarantee of being elected that this gives is definitely worth losing an additional 1% of your revenue when you do get elected. We can expect delegates to bid up their bribes and eventually share something close to 100% of their revenue. So the outcome seems to be that the delegate payouts are largely simply returned to voters, making the delegate payout mechanism close to meaningless.
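
A toy sketch of that equilibrium logic (not a model of any real chain; the numbers are invented): when per-voter payouts equalize, vote counts end up proportional to the bribe share offered, so offering a larger share reliably buys proportionally more votes.

reward_per_delegate = 100.0
total_voters = 7000.0

def equilibrium_votes(shares):
    # Per-voter payout share_i * reward / K_i equalizes across candidates,
    # so vote counts K_i end up proportional to the share offered.
    total = sum(shares)
    return [total_voters * s / total for s in shares]

# Candidate 0 offers 2% of the delegate reward, the other 34 offer 1%.
shares = [0.02] + [0.01] * 34
votes = equilibrium_votes(shares)
payouts = [s * reward_per_delegate / k for s, k in zip(shares, votes)]

print(round(votes[0] / votes[1], 2))   # ~2.0: double the share, double the votes
print({round(p, 6) for p in payouts})  # one value: per-voter payouts are equal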

But it gets worse. At this point, there’s an incentive for delegates to form alliances (aka political parties, aka cartels) to coordinate their share percentages; this reduces losses to the cartel from chaotic competition that accidentally leads to some delegates not getting enough votes. Once a cartel is in place, it can start bringing its share percentages down, as dislodging it is a hard coordination problem: if a cartel offers 80%, then a new entrant offers 90%, then to a voter, seeking a share of that extra 10% is not worth the risk of either (i) voting for someone who gets insufficient votes and does not pay rewards, or (ii) voting for someone who gets too many votes and so pays out a reward that’s excessively diluted.

Sidenote: Bitshares DPOS used approval voting, where you can vote for as many candidates as you want; it should be pretty obvious that with even slight bribery, the equilibrium there is that everyone just votes for everyone.

Furthermore, even if cartel mechanics don’t come into play, there is a further issue. This equilibrium of coin holders voting for whoever gives them the most bribes, or a cartel that has become an entrenched rent seeker, contradicts explicit promises made by DPOS proponents.

Quoting “Explain Delegated Proof of Stake Like I’m 5”:

If a Witness starts acting like an asshole, or stops doing a quality job securing the network, people in the community can remove their votes, essentially firing the bad actor. Voting is always ongoing.

From “EOS: An Introduction”:

By custom, we suggest that the bulk of the value be returned to the community for the common good - software improvements, dispute resolution, and the like can be entertained. In the spirit of “eating our own dogfood,” the design envisages that the community votes on a set of open entry contracts that act like “foundations” for the benefit of the community. Known as Community Benefit Contracts, the mechanism highlights the importance of DPOS as enabling direct on-chain governance by the community (below).

The flaw in all of this, of course, is that the average voter has only a very small chance of impacting which delegates get selected, and so they only have a very small incentive to vote based on any of these high-minded and lofty goals; rather, their incentive is to vote for whoever offers the highest and most reliable bribe. Attacking is easy. If a cartel equilibrium does not form, then an attacker can simply offer a share percentage slightly higher than 100% (perhaps using fee sharing or some kind of “starter promotion” as justification), capture the majority of delegate positions, and then start an attack. If they get removed from the delegate position via a hard fork, they can simply restart the attack again with a different identity.




The above is not intended purely as a criticism of DPOS consensus or its use in any specific blockchain. Rather, the critique reaches much further. There has been a large number of projects recently that extol the virtues of extensive on-chain governance, where on-chain coin holder voting can be used not just to vote on protocol features, but also to control a bounty fund. Quoting a blog post from last year:

Anyone can submit a change to the governance structure in the form of a code update. An on-chain vote occurs, and if passed, the update makes its way on to a test network. After a period of time on the test network, a confirmation vote occurs, at which point the change goes live on the main network. They call this concept a “self-amending ledger”.
Such a system is interesting because it shifts power towards users and away from the more centralized group of developers and miners. On the developer side, anyone can submit a change, and most importantly, everyone has an economic incentive to do it. Contributions are rewarded by the community with newly minted tokens through inflation funding. This shifts from the current Bitcoin and Ethereum dynamics where a new developer has little incentive to evolve the protocol, thus power tends to concentrate amongst the existing developers, to one where everyone has equal earning power.

In practice, of course, what this can easily lead to is funds that offer kickbacks to users who vote for them, leading to the exact scenario that we saw above with DPOS delegates. In the best case, the funds will simply be returned to voters, giving coin holders an interest rate that cancels out the inflation, and in the worst case, some portion of the inflation will get captured as economic rent by a cartel.

Note also that the above is not a criticism of all on-chain voting; it does not rule out systems like futarchy. However, futarchy is untested, but coin voting is tested, and so far it seems to lead to a high risk of economic or political failure of some kind - far too high a risk for a platform that seeks to be an economic base layer for development of decentralized applications and institutions.




So what’s the alternative? The answer is what we’ve been saying all along: cryptoeconomics. Cryptoeconomics is fundamentally about the use of economic incentives together with cryptography to design and secure different kinds of systems and applications, including consensus protocols. The goal is simple: to be able to measure the security of a system (that is, the cost of breaking the system or causing it to violate certain guarantees) in dollars. Traditionally, the security of systems often depends on social trust assumptions: the system works if 2 of 3 of Alice, Bob and Charlie are honest, and we trust Alice, Bob and Charlie to be honest because I know Alice and she’s a nice girl, Bob registered with FINCEN and has a money transmitter license, and Charlie has run a successful business for three years and wears a suit.

Social trust assumptions can work well in many contexts, but they are difficult to universalize; what is trusted in one country or one company or one political tribe may not be trusted in others. They are also difficult to quantify; how much money does it take to manipulate social media to favor some particular delegate in a vote? Social trust assumptions seem secure and controllable, in the sense that “people” are in charge, but in reality they can be manipulated by economic incentives in all sorts of ways.

Cryptoeconomics is about trying to reduce social trust assumptions by creating systems where we introduce explicit economic incentives for good behavior and economic penalties for bad behavior, and making mathematical proofs of the form “in order for guarantee X to be violated, at least these people need to misbehave in this way, which means the minimum amount of penalties or foregone revenue that the participants suffer is Y”. Casper is designed to accomplish precisely this objective in the context of proof of stake consensus. Yes, this does mean that you can’t create a “blockchain” by concentrating the consensus validation into 20 uber-powerful “supernodes” and you have to actually think to make a design that intelligently breaks through and navigates existing tradeoffs and achieves massive scalability in a still-decentralized network. But the reward is that you don’t get a network that’s constantly liable to breaking in half or becoming economically captured by unpredictable political forces.




  1. It has been brought to my attention that EOS may be reducing its delegate rewards from 5% per year to 1% per year. Needless to say, this doesn't really change the fundamental validity of any of the arguments; the only result of this would be 5x less rent extraction potential at the cost of a 5x reduction to the cost of attacking the system.
  2. Some have asked: but how can it be wrong for DPOS delegates to bribe voters, when it is perfectly legitimate for mining and stake pools to give 99% of their revenues back to their participants? The answer should be clear: in PoW and PoS, it's the protocol's role to determine the rewards that miners and validators get, based on the miners and validators' observed performance, and the fact that miners and validators that are pools pass along the rewards (and penalties!) to their participants gives the participants an incentive to participate in good pools. In DPOS, the reward is constant, and it's the voters' role to vote for pools that have good performance, but with the key flaw that there is no mechanism to actually encourage voters to vote in that way instead of just voting for whoever gives them the most money without taking performance into account. Penalties in DPOS do not exist, and are certainly not passed on to voters, so voters have no "skin in the game" (penalties in Casper pools, on the other hand, do get passed on to participants).

On Radical Markets


Recently I had the fortune to have received an advance copy of Eric Posner and Glen Weyl’s new book, Radical Markets, which could be best described as an interesting new way of looking at the subject that is sometimes called “political economy” - tackling the big questions of how markets and politics and society intersect. The general philosophy of the book, as I interpret it, can be expressed as follows. Markets are great, and price mechanisms are an awesome way of guiding the use of resources in society and bringing together many participants’ objectives and information into a coherent whole. However, markets are socially constructed because they depend on property rights that are socially constructed, and there are many different ways that markets and property rights can be constructed, some of which are unexplored and potentially far better than what we have today. Contra doctrinaire libertarians, freedom is a high-dimensional design space.

The book interests me for multiple reasons. First, although I spend most of my time in the blockchain/crypto space heading up the Ethereum project and in some cases providing various kinds of support to projects in the space, I do also have broader interests, of which the use of economics and mechanism design to make more open, free, egalitarian and efficient systems for human cooperation, including improving or replacing present-day corporations and governments, is a major one. The intersection of interests between the Ethereum community and Posner and Weyl’s work is multifaceted and plentiful; Radical Markets dedicates an entire chapter to the idea of “markets for personal data”, redefining the economic relationship between ourselves and services like Facebook, and well, look what the Ethereum community is working on: markets for personal data.

Second, blockchains may well be used as a technical backbone for some of the solutions described in the book, and Ethereum-style smart contracts are ideal for the kinds of complex systems of property rights that the book explores. Third, the economic ideas and challenges that the book brings up are ideas that have also been explored, and will continue to be explored, at great length by the blockchain community for its own purposes. Posner and Weyl’s ideas often have the feature that they allow economic incentive alignment to serve as a substitute for subjective ad-hoc bureaucracy (eg. Harberger taxes can essentially replace eminent domain), and given that blockchains lack access to trusted human-controlled courts, these kinds of solutions may prove to be even more ideal for blockchain-based markets than they are for “real life”.

I will warn that readers are not at all guaranteed to find the book’s proposals acceptable; at least the first three have already been highly controversial and they do contravene many people’s moral preconceptions about how property should and should not work and where money and markets can and can’t be used. The authors are no strangers to controversy; Posner has on previous occasions even proven willing to argue against such notions as human rights law. That said, the book does go to considerable lengths to explain why each proposal improves efficiency if it could be done, and offer multiple versions of each proposal in the hopes that there is at least one (even if partial) implementation of each idea that any given reader can find agreeable.

What do Posner and Weyl talk about?

The book is split into five major sections, each arguing for a particular reform: self-assessed property taxes, quadratic voting, a new kind of immigration program, breaking up big financial conglomerates that currently make banks and other industries act like monopolies even if they appear at first glance to be competitive, and markets for selling personal data. Properly summarizing all five sections and doing them justice would take too long, so I will focus on a deep summary of one specific section, dealing with a new kind of property taxation, to give the reader a feel for the kinds of ideas that the book is about.

Harberger taxes

See also: “Property Is Only Another Name for Monopoly”, Posner and Weyl

Markets and private property are two ideas that are often considered together, and it is difficult in modern discourse to imagine one without (or even with much less of) the other. In the 19th century, however, many economists in Europe were both libertarian and egalitarian, and it was quite common to appreciate markets while maintaining skepticism toward the excesses of private property. A rather interesting example of this is the Bastiat-Proudhon debate from 1849-1850 where the two dispute the legitimacy of charging interest on loans, with one side focusing on the mutual gains from voluntary contracts and the other focusing on their suspicion of the potential for people with capital to get even richer without working, leading to unbalanced capital accumulation.

As it turns out, it is absolutely possible to have a system that contains markets but not property rights: at the end of every year, collect every piece of property, and at the start of the next year have the government auction every piece out to the highest bidder. This kind of system is intuitively quite unrealistic and impractical, but it has the benefit that it achieves perfect allocative efficiency: every year, every object goes to the person who can derive the most value from it (ie. the highest bidder). It also gives the government a large amount of revenue that could be used to completely substitute income and sales taxes or fund a basic income.

Now you might ask: doesn’t the existing property system also achieve allocative efficiency? After all, if I have an apple, and I value it at $2, and you value it at $3, then you could offer me $2.50 and I would accept. However, this fails to take into account imperfect information: how do you know that I value it at $2, and not $2.70? You could offer to buy it for $2.99 so that you can be sure that you’ll get it if you really are the one who values the apple more, but then you would be gaining practically nothing from the transaction. And if you ask me to set the price, how do I know that you value it at $3, and not $2.30? And if I set the price to $2.01 to be sure, I would be gaining practically nothing from the transaction. Unfortunately, there is a result known as the Myerson-Satterthwaite Theorem which means that no solution is efficient; that is, any bargaining algorithm in such a situation must at least sometimes lead to inefficiency from mutually beneficial deals falling through.
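
A minimal Monte Carlo sketch of this inefficiency, using a simple posted-price mechanism and made-up uniform valuations; the Myerson-Satterthwaite result says that no bargaining mechanism can avoid losing some of these trades entirely:

import random

random.seed(0)
posted_price = 2.50
beneficial = completed = 0

for _ in range(100_000):
    seller_value = random.uniform(2, 3)   # what the apple is worth to me
    buyer_value = random.uniform(2, 3)    # what the apple is worth to you
    if buyer_value > seller_value:        # a mutually beneficial trade exists
        beneficial += 1
        # With a fixed posted price, the trade only happens if both sides accept.
        if seller_value <= posted_price <= buyer_value:
            completed += 1

print(completed / beneficial)  # ~0.5: roughly half of the beneficial trades fall through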

If there are many buyers you have to negotiate with, things get even harder. If a developer (in the real estate sense) is trying to make a large project that requires buying 100 existing properties, and 99 have already agreed, the remaining one has a strong incentive to charge a very high price, much higher than their actual personal valuation of the property, hoping that the developer will have no choice but to pay up.


Well, not necessarily no choice. But a very inconvenient and both privately and socially wasteful choice.


Re-auctioning everything once a year completely solves this problem of allocative efficiency, but at a very high cost to investment efficiency: there’s no point in building a house in the first place if six months later it will get taken away from you and re-sold in an auction. All property taxes have this problem; if building a house costs you $90 and brings you $100 of benefit, but then you have to pay $15 more property tax if you build the house, then you will not build the house and that $10 gain is lost to society.

One of the more interesting ideas from the 19th century economists, and specifically Henry George, was a kind of property tax that did not have this problem: the land value tax. The idea is to charge tax on the value of land, but not the improvements to the land; if you own a $100,000 plot of dirt you would have to pay $5,000 per year taxes on it regardless of whether you used the land to build a condominium or simply as a place to walk your pet doge.


A doge.


Weyl and Posner are not convinced that Georgian land taxes are viable in practice:

Consider, for example, the Empire State Building. What is the pure value of the land beneath it? One could try to infer its value by comparing it to the value of adjoining land. But the building itself defines the neighborhood around it; removing the building would almost certainly change the value of its surrounding land. The land and the building, even the neighborhood, are so tied together, it would be hard to figure out a separate value for each of them.

Arguably this does not exclude the possibility of a different kind of Georgian-style land tax: a tax based on the average of property values across a sufficiently large area. That would preserve the property that improving a single piece of land would not (greatly) perversely increase the taxes that its owner has to pay, without having to find a way to distinguish land from improvements in an absolute sense. But in any case, Posner and Weyl move on to their main proposal: self-assessed property taxes.

Consider a system where property owners themselves specify what the value of their property is, and pay a tax rate of, say, 2% of that value per year. But here is the twist: whatever value they specify for their property, they have to be willing to sell it to anyone at that price.

If the tax rate is equal to the chance per year that the property gets sold, then this achieves optimal allocative efficiency: raising your self-assessed property value by $1 increases the tax you pay by $0.02, but it also means there is a 2% chance that someone will buy the property and pay $1 more, so there is no incentive to cheat in either direction. It does harm investment efficiency, but vastly less so than all property being re-auctioned every year.
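
A small sketch of that incentive argument (illustrative numbers only): when the tax rate equals the yearly chance of a forced sale, your expected cost is the same no matter what value you declare, so neither overstating nor understating the value helps you.

def expected_yearly_cost(declared, true_value, tax_rate=0.02, p_sale=0.02):
    tax = tax_rate * declared
    # If someone buys at your declared price, you gain `declared` but lose an
    # asset worth `true_value` to you; this happens with probability p_sale.
    expected_sale_surplus = p_sale * (declared - true_value)
    return tax - expected_sale_surplus

true_value = 100_000
for declared in (90_000, 100_000, 110_000):
    # All three print 2000.0: declaring high or low changes nothing in expectation.
    print(declared, round(expected_yearly_cost(declared, true_value), 2))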

Posner and Weyl then point out that if more investment efficiency is desired, a hybrid solution with a lower property tax is possible:

When the tax is reduced incrementally to improve investment efficiency, the loss in allocative efficiency is less than the gain in investment efficiency. The reason is that the most valuable sales are ones where the buyer is willing to pay significantly more than the seller is willing to accept. These transactions are the first ones enabled by a reduction in the price as even a small price reduction will avoid blocking these most valuable transactions. In fact, it can be shown that the size of the social loss from monopoly power grows quadratically in the extent of this power. Thus, reducing the markup by a third eliminates close to 5/9 = (3² − 2²)/3² of the allocative harm from private ownership.

This concept of quadratic deadweight loss is a truly important insight in economics, and is arguably the deep reason why “moderation in all things” is such an attractive principle: the first step you take away from an extreme will generally be the most valuable.
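
The arithmetic behind that 5/9 figure, under the book’s assumption that the loss grows with the square of the markup:

# Deadweight loss assumed proportional to markup squared (the book's claim).
def loss(markup):
    return markup ** 2

full, reduced = 3, 2   # cutting the markup by a third: 3 -> 2
print((loss(full) - loss(reduced)) / loss(full))   # 0.555... = 5/9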


The book then proceeds to give a series of side benefits that this tax would have, as well as some downsides. One interesting side benefit is that it removes an information asymmetry flaw that exists with property sales today, where owners have the incentive to expend effort on making their property look good even in potentially misleading ways. With a properly set Harberger tax, if you somehow manage to trick the world into thinking your house is 5% more valuable, you’ll get 5% more when you sell it but until that point you’ll have to pay 5% more in taxes, or else someone will much more quickly snap it up from you at the original price.

The downsides are smaller than they seem; for example, one natural disadvantage is that it exposes property owners to uncertainty due to the possibility that someone will snap up their property at any time, but that is hardly an unknown as it’s a risk that renters already face every day. But Weyl and Posner do propose more moderate ways of introducing the tax that don’t have these issues. First, the tax can be applied to types of property that are currently government owned; it’s a potentially superior alternative to both continued government ownership and traditional full-on privatization. Second, the tax can be applied to forms of property that are already “industrial” in usage: radio spectrum licenses, domain names, intellectual property, etc.

The Rest of the Book

The remaining chapters bring up ideas that are similar in spirit to the discussion on Harberger taxes in their use of modern game-theoretic principles to make mathematically optimized versions of existing social institutions. One of the proposals is for something called quadratic voting, which I summarize as follows.

Suppose that you can vote as many times as you want, but voting costs “voting tokens” (say each citizen is assigned N voting tokens per year), and it costs tokens in a nonlinear way: your first vote costs one token, your second vote costs two tokens, and so forth. If someone feels more strongly about something, the argument goes, they would be willing to pay more for a single vote; quadratic voting takes advantage of this by perfectly aligning quantity of votes with cost of votes: if you’re willing to pay up to 15 tokens for a vote, then you will keep buying votes until your last one costs 15 tokens, and so you will cast 15 votes in total. If you’re willing to pay up to 30 tokens for a vote, then you will keep buying votes until you can’t buy any more for a price less than or equal to 30 tokens, and so you will end up casting 30 votes. The voting is “quadratic” because the total amount you pay for N votes goes up proportionately to N^2.
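
A minimal sketch of these mechanics (assuming willingness-to-pay is expressed in whole tokens per vote):

def cost_of_votes(n):
    # The k-th vote costs k tokens, so n votes cost 1 + 2 + ... + n,
    # which grows roughly quadratically in n.
    return n * (n + 1) // 2

def votes_bought(willingness_per_vote):
    # Keep buying votes as long as the next (marginal) vote costs no more
    # than what a single vote is worth to you.
    n = 0
    while n + 1 <= willingness_per_vote:
        n += 1
    return n

print(votes_bought(15), cost_of_votes(votes_bought(15)))   # 15 votes, 120 tokens
print(votes_bought(30), cost_of_votes(votes_bought(30)))   # 30 votes, 465 tokens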


After this, the book describes a market for immigration visas that could greatly expand the number of immigrants admitted while making sure local residents benefit, and at the same time aligning incentives to encourage visa sponsors to choose immigrants that are more likely to succeed in the country and less likely to commit crimes; then an enhancement to antitrust law; and finally the idea of setting up markets for personal data.

Markets in Everything

There are plenty of ways that one could respond to each individual proposal made in the book. I personally, for example, find the immigration visa scheme that Posner and Weyl propose well-intentioned and see how it could improve on the status quo, but also overcomplicated; it seems simpler to me to have a scheme where visas are auctioned or sold every year, with an additional requirement for migrants to obtain liability insurance. Robin Hanson recently proposed greatly expanding liability insurance mandates as an alternative to many kinds of regulation, and while imposing new mandates on an entire society seems unrealistic, a new expanded immigration program seems like the perfect place to start considering them. Paying people for personal data is interesting, but there are concerns about adverse selection: to put it politely, the kinds of people that are willing to sit around submitting lots of data to Facebook all year to earn $16.92 (Facebook’s current annualized revenue per user) are not the kinds of people that advertisers are willing to burn hundreds of dollars per person trying to market Rolexes and Lambos to. However, what I find more interesting is the general principle that the book tries to promote.

Over the last hundred years, there truly has been a large amount of research into designing economic mechanisms that have desirable properties and that outperform simple two-sided buy-and-sell markets. Some of this research has been put into use in some specific industries; for example, combinatorial auctions are used in airports, radio spectrum auctions and several other industrial use cases, but it hasn’t really seeped into any kind of broader policy design; the political systems and property rights that we have are still largely the same as we had two centuries ago. So can we use modern economic insights to reform base-layer markets and politics in such a deep way, and if so, should we?

Normally, I love markets and clean incentive alignment, and dislike politics and bureaucrats and ugly hacks, and I love economics, and so I naturally love the idea of using economic insights to design markets that work better, so that we can reduce the role of politics and bureaucrats and ugly hacks in society. Hence, I love this vision. So let me be a good intellectual citizen and do my best to try to make a case against it.

There is a limit to how complex economic incentive structures and markets can be because there is a limit to users’ ability to think and re-evaluate and give ongoing precise measurements for their valuations of things, and people value reliability and certainty. Quoting Steve Waldman criticizing Uber surge pricing:

Finally, we need to consider questions of economic calculation. In macroeconomics, we sometimes face tradeoffs between an increasing and unpredictably variable price-level and full employment. Wisely or not, our current policy is to stabilize the price level, even at short-term cost to output and employment, because stable prices enable longer-term economic calculation. That vague good, not visible on a supply/demand diagram, is deemed worth very large sacrifices. The same concern exists in a microeconomic context. If the “ride-sharing revolution” really takes hold, a lot of us will have decisions to make about whether to own a car or rely upon the Sidecars, Lyfts, and Ubers of the world to take us to work every day. To make those calculations, we will need something like predictable pricing. Commuting to our minimum wage jobs (average is over!) by Uber may be OK at standard pricing, but not so OK on a surge. In the desperate utopia of the “free-market economist”, there is always a solution to this problem. We can define futures markets on Uber trips, and so hedge our exposure to price volatility! In practice that is not so likely…

And:

It’s clear that in a lot of contexts, people have a strong preference for price-predictability over immediate access. The vast majority of services that we purchase and consume are not price-rationed in any fine-grained way. If your hairdresser or auto mechanic is busy, you get penciled in for next week…

Strong property rights are valuable for the same reason: beyond the arguments about allocative and investment efficiency, they provide the mental convenience and planning benefits of predictability.

It’s worth noting that even Uber itself doesn’t do surge pricing in the “market-based” way that economists would recommend. Uber is not a market where drivers can set their own prices, riders can see what prices are available, and themselves choose their tradeoff between price and waiting time. Why does Uber not do this? One argument is that, as Steve Waldman says, “Uber itself is a cartel”, and wants to have the power to adjust market prices not just for efficiency but also for reasons such as profit maximization, strategically setting prices to drive out competing platforms (and taxis and public transit), and public relations. As Waldman further points out, one Uber competitor, Sidecar, does have the ability for drivers to set prices, and I would add that I have seen ride-sharing apps in China where passengers can offer drivers higher prices to try to coax them to get a car faster.

A possible counter-argument that Uber might give is that drivers themselves are actually less good at setting optimal prices than Uber’s own algorithms, and in general people value the convenience of one-click interfaces over the mental complexity of thinking about prices. If we assume that Uber won its market dominance over competitors like Sidecar fairly, then the market itself has decided that the economic gain from marketizing more things is not worth the mental transaction costs.

Harberger taxes, at least to me, seem like they would lead to these exact kinds of issues multiplied by ten; people are not experts at property valuation, and would have to spend a significant amount of time and mental effort figuring out what self-assessed value to put for their house, and they would complain much more if they accidentally put a value that’s too low and suddenly find that their house is gone. If Harberger taxes were to be applied to smaller property items as well, people would need to juggle a large amount of mental valuations of everything. A similar critique could apply to many kinds of personal data markets, and possibly even to quadratic voting if implemented in its full form.

I could challenge this by saying “ah, even if that’s true, this is the 21st century, we could have companies that build AIs that make pricing decisions on your behalf, and people could choose the AI that seems to work best; there could even be a public option”; and Posner and Weyl themselves suggest that this is likely the way to go. And this is where the interesting conversation starts.

Tales from Crypto Land

One reason why this discussion particularly interests me is that the cryptocurrency and blockchain space itself has, in some cases, run up against similar challenges. In the case of Harberger taxes, we actually did consider almost exactly that same proposal in the context of the Ethereum Name System (our decentralized alternative to DNS), but the proposal was ultimately rejected. I asked the ENS developers why it was rejected. Paraphrasing their reply, the challenge is as follows.

Many ENS domain names are of a type that would only be interesting to precisely two classes of actors: (i) the “legitimate owner” of some given name, and (ii) scammers. Furthermore, in some particular cases, the legitimate owner is uniquely underfunded, and scammers are uniquely dangerous. One particular case is MyEtherWallet, an Ethereum wallet provider. MyEtherWallet provides an important public good to the Ethereum ecosystem, making Ethereum easier to use for many thousands of people, but is able to capture only a very small portion of the value that it provides; as a result, the budget that it has for outbidding others for the domain name is low. If a scammer gets their hands on the domain, users trusting MyEtherWallet could easily be tricked into sending all of their ether (or other Ethereum assets) to a scammer. Hence, because there is generally one clear “legitimate owner” for any domain name, a pure property rights regime presents little allocative efficiency loss, and there is a strong overriding public interest toward stability of reference (ie. a domain that’s legitimate one day doesn’t redirect to a scam the next day), so any level of Harberger taxation may well bring more harm than good.

I suggested to the ENS developers the idea of applying Harberger taxes to short domains (eg. abc.eth), but not long ones; the reply was that it would be too complicated to have two classes of names. That said, perhaps there is some version of the proposal that could satisfy the specific constraints here; I would be interested to hear Posner and Weyl’s feedback on this particular application.

Another story from the blockchain and Ethereum space that has a more pro-radical-market conclusion is that of transaction fees. The notion of mental transaction costs, the idea that the inconvenience of even thinking about whether or not some small payment for a given digital good is worth it is enough of a burden to prevent “micro-markets” from working, is often used as an argument for why mass adoption of blockchain tech would be difficult: every transaction requires a small fee, and the mental expenditure of figuring out what fee to pay is itself a major usability barrier. These arguments increased further at the end of last year, when both Bitcoin and Ethereum transaction fees briefly spiked up by a factor of over 100 due to high usage (talk about surge pricing!), and those who accidentally did not pay high enough fees saw their transactions get stuck for days.

That said, this is a problem that we have now, arguably, to a large extent overcome. After the spikes at the end of last year, Ethereum wallets developed more advanced algorithms for choosing what transaction fees to pay to ensure that one’s transaction gets included in the chain, and today most users are happy to simply defer to them. In my own personal experience, the mental transaction costs of worrying about transaction fees do not really exist, much like a driver of a car does not worry about the gasoline consumed by every single turn, acceleration and braking made by their car.


Personal price-setting AIs for interacting with open markets: already a reality in the Ethereum transaction fee market


A third kind of “radical market” that we are considering implementing in the context of Ethereum’s consensus system is one for incentivizing deconcentration of validator nodes in proof of stake consensus. It’s important for blockchains to be decentralized, a similar challenge to what antitrust law tries to solve, but the tools at our disposal are different. Posner and Weyl’s solution to antitrust, banning institutional investment funds from owning shares in multiple competitors in the same industry, is far too subjective and human-judgement-dependent to work in a blockchain, but for our specific context we have a different solution: if a validator node commits an error, it gets penalized an amount proportional to the number of other nodes that have committed an error around the same time. This incentivizes nodes to set themselves up in such a way that their failure rate is maximally uncorrelated with everyone else’s failure rate, reducing the chance that many nodes fail at the same time and threaten the blockchain’s integrity. I want to ask Posner and Weyl: though our exact approach is fairly application-specific, could a similarly elegant “market-based” solution be discovered to incentivize market deconcentration in general?
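
A hypothetical sketch of that penalty rule (the exact formula used in Ethereum's design may differ; this only illustrates the proportionality):

def correlated_penalty(base_penalty, fraction_failing_now):
    # A validator that commits an error is penalized proportionally to the
    # fraction of other validators that failed around the same time.
    return base_penalty * fraction_failing_now

# Failing together with 40% of the network costs 40x more than failing
# together with 1% of it, so validators are pushed to set themselves up so
# that their failures are uncorrelated with everyone else's.
print(correlated_penalty(1.0, 0.40))  # 0.4
print(correlated_penalty(1.0, 0.01))  # 0.01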

All in all, I am optimistic that the various behavioral kinks around implementing “radical markets” in practice could be worked out with the help of good defaults and personal AIs, though I do think that if this vision is to be pushed forward, the greatest challenge will be finding progressively larger and more meaningful places to test it out and show that the model works. I particularly welcome the use of the blockchain and crypto space as a testing ground.

Another Kind of Radical Market

The book as a whole tends to focus on centralized reforms that could be implemented on an economy from the top down, even if their intended long-term effect is to push more decision-making power to individuals. The proposals involve large-scale restructurings of how property rights work, how voting works, how immigration and antitrust law works, and how individuals see their relationship with property, money, prices and society. But there is also the potential to use economics and game theory to come up with decentralized economic institutions that could be adopted by smaller groups of people at a time.

Perhaps the most famous examples of decentralized institutions from game theory and economics land are (i) assurance contracts, and (ii) prediction markets. An assurance contract is a system where some public good is funded by giving anyone the opportunity to pledge money, and only collecting the pledges if the total amount pledged exceeds some threshold. This ensures that people can donate money knowing that either they will get their money back or there actually will be enough to achieve some objective. A possible extension of this concept is Alex Tabarrok’s dominant assurance contracts, where an entrepreneur offers to refund participants more than 100% of their deposits if a given assurance contract does not raise enough money.

Prediction markets allow people to bet on the probability that events will happen, potentially even conditional on some action being taken (“I bet $20 that unemployment will go down if candidate X wins the election”); there are techniques for people interested in the information to subsidize the markets. Any attempt to manipulate the probability that a prediction market shows simply creates an opportunity for people to earn free money (yes I know, risk aversion and capital efficiency etc etc; still close to free) by betting against the manipulator.

Posner and Weyl do give one example of what I would call a decentralized institution: a game for choosing who gets an asset in the event of a divorce or a company splitting in half, where both sides provide their own valuation, the person with the higher valuation gets the item, but they must then give an amount equal to half the average of the two valuations to the loser. There’s some economic reasoning by which this solution, while not perfect, is still close to mathematically optimal.
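
A minimal sketch of that splitting game (the names and numbers here are purely illustrative):

def divide_asset(valuation_a, valuation_b):
    # Both sides report a valuation; the higher bidder keeps the asset and
    # pays the other side half of the average of the two reported valuations.
    payment = (valuation_a + valuation_b) / 2 / 2
    winner = 'A' if valuation_a >= valuation_b else 'B'
    return winner, payment

# If A values the house at 500k and B at 400k, A keeps it and pays B 225k.
print(divide_asset(500_000, 400_000))  # ('A', 225000.0)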

One particular category of decentralized institutions I’ve been interested in is improving incentivization for content posting and content curation in social media. Some ideas that I have had include:

  • Proof of stake conditional hashcash (when you send someone an email, you give them the opportunity to burn $0.5 of your money if they think it’s spam); a rough sketch of this idea follows the list
  • Prediction markets for content curation (use prediction markets to predict the results of a moderation vote on content, thereby encouraging a market of fast content pre-moderators while penalizing manipulative pre-moderation)
  • Conditional payments for paywalled content (after you pay for a piece of downloadable content and view it, you can decide after the fact if payments should go to the author or to proportionately refund previous readers)
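
As a rough sketch of the first idea, under the simplifying (and hypothetical) assumption that balances are just numbers tracked by some settlement layer:

class ConditionalHashcash:
    """Toy sketch: each message locks a small sender deposit that the
    recipient may burn if the message is spam, and that is refunded otherwise."""
    def __init__(self, deposit=0.5):
        self.deposit = deposit
        self.stakes = {}          # message_id -> locked deposit

    def send(self, message_id, sender_balance):
        assert sender_balance >= self.deposit
        self.stakes[message_id] = self.deposit
        return sender_balance - self.deposit

    def resolve(self, message_id, is_spam, sender_balance):
        stake = self.stakes.pop(message_id)
        # Burn the stake if the recipient flags spam; otherwise refund it.
        return sender_balance if is_spam else sender_balance + stake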

And ideas I have had in other contexts:


Twitter scammers: can prediction markets incentivize an autonomous swarm of human and AI-driven moderators to flag these posts and warn users not to send them ether within a few seconds of the post being made? And could such a system be generalized to the entire internet, where there is no single centralized moderator that can easily take posts down?


Some ideas others have had for decentralized institutions in general include:

I would be interested in hearing Posner and Weyl’s opinion on these kinds of “radical markets”, which groups of people can spin up and start using by themselves without requiring potentially contentious society-wide changes to political and property rights. Could decentralized institutions like these be used to solve the key defining challenges of the twenty-first century: promoting beneficial scientific progress, developing informational public goods, reducing global wealth inequality, and the big meta-problem behind fake news, government-driven and corporate-driven social media censorship, and regulation of cryptocurrency products: how do we do quality assurance in an open society?

All in all, I highly recommend Radical Markets (and by the way I also recommend Eliezer Yudkowsky’s Inadequate Equilibria) to anyone interested in these kinds of issues, and look forward to seeing the discussion that the book generates.

STARKs, Part 3: Into the Weeds


Special thanks to Eli ben Sasson for his kind assistance, as usual. Special thanks to Chih-Cheng Liang and Justin Drake for review.

Trigger warning: math and lots of python

As a followup to Part 1 and Part 2 of this series, this post will cover what it looks like to actually implement a STARK, complete with an implementation in python. STARKs (“Scalable Transparent ARguments of Knowledge”) are a technique for creating a proof that f(x) = y where f may potentially take a very long time to calculate, but where the proof can be verified very quickly. A STARK is “doubly scalable”: for a computation with t steps, it takes roughly O(t * log(t)) steps to produce a proof, which is likely optimal, and it takes ~O(log^2(t)) steps to verify, which for even moderately large values of t is much faster than the original computation. STARKs can also have a privacy-preserving “zero knowledge” property, though the use case we will apply them to here, making verifiable delay functions, does not require this property, so we do not need to worry about it.

First, some disclaimers:

  • This code has not been thoroughly audited; soundness in production use cases is not guaranteed
  • This code is very suboptimal (it’s written in Python, what did you expect)
  • STARKs “in real life” (ie. as implemented in Eli and co’s production implementations) tend to use binary fields and not prime fields for application-specific efficiency reasons; however, they do stress in their writings that the prime field-based approach to STARKs described here is also legitimate and can be used
  • There is no “one true way” to do a STARK. It’s a broad category of cryptographic and mathematical constructs, with different setups optimal for different applications and constant ongoing research to reduce prover and verifier complexity and improve soundness.
  • This article absolutely expects you to know how modular arithmetic and prime fields work, and be comfortable with the concepts of polynomials, interpolation and evaluation. If you don’t, go back to Part 2, and also this earlier post on quadratic arithmetic programs

Now, let’s get to it.

MIMC

Here is the function we’ll be doing a STARK of:

import time

# `modulus` is the prime field modulus defined further below;
# `round_constants` is a small list of constants that gets cycled through.
def mimc(inp, steps, round_constants):
    start_time = time.time()
    for i in range(steps-1):
        inp = (inp**3 + round_constants[i % len(round_constants)]) % modulus
    print("MIMC computed in %.4f sec" % (time.time() - start_time))
    return inp

We choose MIMC (see paper) as the example because it is both (i) simple to understand and (ii) interesting enough to be useful in real life. The function can be viewed visually as follows:



Note: in many discussions of MIMC, you will typically see XOR used instead of +; this is because MIMC is typically done over binary fields, where addition _is_ XOR; here we are doing it over prime fields.

In our example, the round constants will be a relatively small list (eg. 64 items) that gets cycled through over and over again (that is, after k[64] it loops back to using k[1]).

MIMC with a very large number of rounds, as we’re doing here, is useful as a verifiable delay function - a function which is difficult to compute, and particularly non-parallelizable to compute, but relatively easy to verify. MIMC by itself achieves this property to some extent because MIMC can be computed “backward” (recovering the “input” from its corresponding “output”), but computing it backward takes about 100 times longer than computing it forward (and neither direction can be significantly sped up by parallelization). So you can think of computing the function in the backward direction as being the act of “computing” the non-parallelizable proof of work, and computing the function in the forward direction as being the process of “verifying” it.



x -> x^((2p-1)/3) gives the inverse of x -> x^3; this is true because of Fermat's Little Theorem, a theorem that despite its supposed littleness is arguably much more important to mathematics than Fermat's more famous "Last Theorem".
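
For concreteness, here is a sketch (not part of the original code) of running MIMC in the backward direction, assuming the same `round_constants` as above and a prime `modulus` of the form 6k+5 as described below:

def mimc_backward(output, steps, round_constants, modulus):
    # Invert each forward step x -> x**3 + k by subtracting the round constant
    # and then undoing the cube via x -> x**((2*modulus - 1) // 3); this large
    # exponentiation is what makes the backward direction ~100x slower.
    inv_exp = (2 * modulus - 1) // 3
    for i in reversed(range(steps - 1)):
        output = pow((output - round_constants[i % len(round_constants)]) % modulus,
                     inv_exp, modulus)
    return output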

What we will try to achieve here is to make verification much more efficient by using a STARK - instead of the verifier having to run MIMC in the forward direction themselves, the prover, after completing the computation in the “backward direction”, would compute a STARK of the computation in the “forward direction”, and the verifier would simply verify the STARK. The hope is that the overhead of computing a STARK can be less than the difference in speed running MIMC forwards relative to backwards, so a prover’s time would still be dominated by the initial “backward” computation, and not the (highly parallelizable) STARK computation. Verification of a STARK can be relatively fast (in our python implementation, ~0.05-0.3 seconds), no matter how long the original computation is.

All calculations are done modulo 2^256 - 351 * 2^32 + 1; we are using this prime field modulus because it is the largest prime below 2^256 whose multiplicative group contains an order-2^32 subgroup (that is, there’s a number g such that successive powers of g modulo this prime loop around back to 1 after exactly 2^32 cycles), and which is of the form 6k+5. The first property is necessary to make sure that our efficient versions of the FFT and FRI algorithms can work, and the second ensures that MIMC actually can be computed “backwards” (see the use of x -> x^((2p-1)/3) above).
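
A quick sanity check of those two properties of the modulus:

modulus = 2**256 - 351 * 2**32 + 1
# The multiplicative group has order modulus - 1, which must be divisible by
# 2**32 for the FFT/FRI machinery, and modulus % 6 == 5 so cubing is invertible.
assert (modulus - 1) % 2**32 == 0
assert modulus % 6 == 5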

Prime field operations

We start off by building a convenience class that does prime field operations, as well as operations with polynomials over prime fields. The code is here. First some trivial bits:

class PrimeField():
    def __init__(self, modulus):
        # Quick primality test
        assert pow(2, modulus, modulus) == 2
        self.modulus = modulus

    def add(self, x, y):
        return (x+y) % self.modulus

    def sub(self, x, y):
        return (x-y) % self.modulus

    def mul(self, x, y):
        return (x*y) % self.modulus

And the Extended Euclidean Algorithm for computing modular inverses (the equivalent of computing 1/x in a prime field):

# Modular inverse using the extended Euclidean algorithm
def inv(self, a):
    if a == 0:
        return 0
    lm, hm = 1, 0
    low, high = a % self.modulus, self.modulus
    while low > 1:
        r = high//low
        nm, new = hm-lm*r, high-low*r
        lm, low, hm, high = nm, new, lm, low
    return lm % self.modulus

The above algorithm is relatively expensive; fortunately, for the special case where we need to do many modular inverses at once, there’s a simple mathematical trick, called Montgomery batch inversion, that lets us compute them all at the cost of only one “real” inversion:



Using Montgomery batch inversion to compute modular inverses. Inputs purple, outputs green, multiplication gates black; the red square is the _only_ modular inversion.

The code below implements this algorithm, with some slightly ugly special case logic so that if there are zeroes in the set of what we are inverting, it sets their inverse to 0 and moves along.

def multi_inv(self, values):
    # Running products: partials[i] = values[0] * values[1] * ... * values[i-1]
    partials = [1]
    for i in range(len(values)):
        partials.append(self.mul(partials[-1], values[i] or 1))
    # The one and only "real" modular inversion: invert the total product
    inv = self.inv(partials[-1])
    # Walk backwards, peeling off one factor at a time to recover each inverse
    outputs = [0] * len(values)
    for i in range(len(values), 0, -1):
        outputs[i-1] = self.mul(partials[i-1], inv) if values[i-1] else 0
        inv = self.mul(inv, values[i-1] or 1)
    return outputs

This batch inverse algorithm will prove important later on, when we start dealing with dividing sets of evaluations of polynomials.

Now we move on to some polynomial operations. We treat a polynomial as an array, where element i is the ith degree term (eg. x^3 + 2x + 1 becomes [1, 2, 0, 1]). Here’s the operation of evaluating a polynomial at one point:

# Evaluate a polynomial at a point
def eval_poly_at(self, p, x):
    y = 0
    power_of_x = 1
    for i, p_coeff in enumerate(p):
        y += power_of_x * p_coeff
        power_of_x = (power_of_x * x) % self.modulus
    return y % self.modulus


Challenge
What is the output of f.eval_poly_at([4, 5, 6], 2) if the modulus is 31?

Mouseover below for answer
6 * 2^2 + 5 * 2 + 4 = 38, 38 mod 31 = 7.

There is also code for adding, subtracting, multiplying and dividing polynomials; this is textbook long addition/subtraction/multiplication/division. The one non-trivial thing is Lagrange interpolation, which takes as input a set of x and y coordinates, and returns the minimal polynomial that passes through all of those points (you can think of it as being the inverse of polynomial evaluation):

# Build a polynomial that returns 0 at all specified xs
def zpoly(self, xs):
    root = [1]
    for x in xs:
        root.insert(0, 0)
        for j in range(len(root)-1):
            root[j] -= root[j+1] * x
    return [x % self.modulus for x in root]

def lagrange_interp(self, xs, ys):
    # Generate master numerator polynomial, eg. (x - x1) * (x - x2) * ... * (x - xn)
    root = self.zpoly(xs)

    # Generate per-value numerator polynomials, eg. for x=x2,
    # (x - x1) * (x - x3) * ... * (x - xn), by dividing the master
    # polynomial back by each x coordinate
    nums = [self.div_polys(root, [-x, 1]) for x in xs]

    # Generate denominators by evaluating numerator polys at each x
    denoms = [self.eval_poly_at(nums[i], xs[i]) for i in range(len(xs))]
    invdenoms = self.multi_inv(denoms)

    # Generate output polynomial, which is the sum of the per-value numerator
    # polynomials rescaled to have the right y values
    b = [0 for y in ys]
    for i in range(len(xs)):
        yslice = self.mul(ys[i], invdenoms[i])
        for j in range(len(ys)):
            if nums[i][j] and ys[i]:
                b[j] += nums[i][j] * yslice
    return [x % self.modulus for x in b]

See the “M of N” section of this article for a description of the math. Note that we also have special-case methods lagrange_interp_4 and lagrange_interp_2 to speed up the very frequent operations of Lagrange interpolation of degree < 4 and degree < 2 polynomials.
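
A small usage sketch (assuming the full PrimeField class from the linked code, which also provides div_polys):

f = PrimeField(31)
# Interpolate the unique degree < 3 polynomial through (1, 10), (2, 20), (5, 6)
poly = f.lagrange_interp([1, 2, 5], [10, 20, 6])
assert all(f.eval_poly_at(poly, x) == y for x, y in [(1, 10), (2, 20), (5, 6)])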

Fast Fourier Transforms

If you read the above algorithms carefully, you might notice that Lagrange interpolation and multi-point evaluation (that is, evaluating a degree < N polynomial at N points) both take quadratic time to execute, so for example doing a Lagrange interpolation of one thousand points takes a few million steps to execute, and a Lagrange interpolation of one million points takes a few trillion. This is an unacceptably high level of inefficiency, so we will use a more efficient algorithm, the Fast Fourier Transform.

The FFT only takes O(n * log(n)) time (ie. ~10,000 steps for 1,000 points, ~20 million steps for 1 million points), though it is more restricted in scope; the x coordinates must be a complete set of roots of unity of some order N = 2^k. That is, if there are N points, the x coordinates must be successive powers 1, p, p^2, p^3… of some p where p^N = 1. The algorithm can, surprisingly enough, be used for multi-point evaluation or interpolation, with one small parameter tweak.


Challenge Find a 16th root of unity mod 337 that is not an 8th root of unity.

Mouseover below for answer
59, 146, 30, 297, 278, 191, 307, 40

You could have gotten this list by doing something like [print(x) for x in range(337) if pow(x, 16, 337) == 1 and pow(x, 8, 337) != 1], though there is a smarter way that works for much larger moduli: first, identify a quadratic non-residue mod 337 (that is, a value that is not a perfect square mod 337), by looking for a value x such that pow(x, 336 // 2, 337) != 1 (these are easy to find; one answer is 5), and then taking the (336 / 16)'th power of it.
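
The same recipe in code (a small sketch using the non-residue check described above):

# Find a quadratic non-residue mod 337, then raise it to the (336 // 16)'th power.
g = next(x for x in range(2, 337) if pow(x, 336 // 2, 337) != 1)   # g == 5
root16 = pow(g, 336 // 16, 337)
assert pow(root16, 16, 337) == 1 and pow(root16, 8, 337) != 1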

Here’s the algorithm (in a slightly simplified form; see code here for something slightly more optimized):

def fft(vals, modulus, root_of_unity):
    if len(vals) == 1:
        return vals
    L = fft(vals[::2], modulus, pow(root_of_unity, 2, modulus))
    R = fft(vals[1::2], modulus, pow(root_of_unity, 2, modulus))
    o = [0 for i in vals]
    for i, (x, y) in enumerate(zip(L, R)):
        y_times_root = y*pow(root_of_unity, i, modulus)
        o[i] = (x+y_times_root) % modulus
        o[i+len(L)] = (x-y_times_root) % modulus
    return o

def inv_fft(vals, modulus, root_of_unity):
    f = PrimeField(modulus)
    # Inverse FFT
    invlen = f.inv(len(vals))
    return [(x*invlen) % modulus for x in
            fft(vals, modulus, f.inv(root_of_unity))]

You can try running it on a few inputs yourself and check that it gives results that, when you use eval_poly_at on them, give you the answers you expect to get. For example:

>>> fft.fft([3,1,4,1,5,9,2,6], 337, 85, inv=True)
[46, 169, 29, 149, 126, 262, 140, 93]
>>> f = poly_utils.PrimeField(337)
>>> [f.eval_poly_at([46, 169, 29, 149, 126, 262, 140, 93], f.exp(85, i)) for i in range(8)]
[3, 1, 4, 1, 5, 9, 2, 6]

A Fourier transform takes as input [x[0] .... x[n-1]], and its goal is to output x[0] + x[1] + ... + x[n-1] as the first element, x[0] + x[1] * w + x[2] * w**2 + ... + x[n-1] * w**(n-1) as the second element, etc etc; a fast Fourier transform accomplishes this by splitting the data in half, doing an FFT on both halves, and then gluing the result back together.
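
If it helps, here is a naive O(n^2) version of the same transform (a sketch useful only for sanity-checking fft() on small inputs):

def naive_dft(vals, modulus, root_of_unity):
    # Directly evaluate the polynomial with coefficients `vals` at successive
    # powers of the root of unity; this is what fft() computes in O(n * log(n)).
    n = len(vals)
    return [sum(vals[j] * pow(root_of_unity, i * j, modulus) for j in range(n)) % modulus
            for i in range(n)]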


A diagram of how information flows through the FFT computation. Notice how the FFT consists of a "gluing" step followed by two copies of the FFT on two halves of the data, and so on recursively until you're down to one element.

I recommend this for more intuition on how or why the FFT works and polynomial math in general, and this thread for some more specifics on DFT vs FFT, though be warned that most literature on Fourier transforms talks about Fourier transforms over real and complex numbers, not prime fields. If you find this too hard and don’t want to understand it, just treat it as weird spooky voodoo that just works because you ran the code a few times and verified that it works, and you’ll be fine too.

Thank Goodness It’s FRI-day (that’s “Fast Reed-Solomon Interactive Oracle Proofs of Proximity”)

Reminder: now may be a good time to review and re-read Part 2

Now, we’ll get into the code for making a low-degree proof. To review, a low-degree proof is a (probabilistic) proof that at least some high percentage (eg. 80%) of a given set of values represent the evaluations of some specific polynomial whose degree is much lower than the number of values given. Intuitively, just think of it as a proof that “some Merkle root that we claim represents a polynomial actually does represent a polynomial, possibly with a few errors”. As input, we have:

  • A set of values that we claim are the evaluation of a low-degree polynomial
  • A root of unity; the x coordinates at which the polynomial is evaluated are successive powers of this root of unity
  • A value N such that we are proving the degree of the polynomial is strictly less than N
  • The modulus

Our approach is a recursive one, with two cases. First, if the degree is low enough, we just provide the entire list of values as a proof; this is the “base case”. Verification of the base case is trivial: do an FFT or Lagrange interpolation or whatever else to interpolate the polynomial representing those values, and verify that its degree is < N. Otherwise, if the degree is higher than some set minimum, we do the vertical-and-diagonal trick described at the bottom of Part 2.

We start off by putting the values into a Merkle tree and using the Merkle root to select a pseudo-random x coordinate (special_x). We then calculate the “column”:

# Calculate the set of x coordinates
xs = get_power_cycle(root_of_unity, modulus)

column = []
for i in range(len(xs)//4):
    x_poly = f.lagrange_interp_4( 
        [xs[i+len(xs)*j//4] for j in range(4)],
        [values[i+len(values)*j//4] for j in range(4)],
    )
    column.append(f.eval_poly_at(x_poly, special_x))

This packs a lot into a few lines of code. The broad idea is to re-interpret the polynomial P(x) as a polynomial Q(x, y), where P(x) = Q(x, x**4). If P has degree < N, then P'(y) = Q(special_x, y) will have degree < N/4. Since we don’t want to take the effort to actually compute Q in coefficient form (that would take a still-relatively-nasty-and-expensive FFT!), we instead use another trick. For any given value of x4, there are 4 corresponding values of x: x, modulus - x, and x multiplied by the two modular square roots of -1. So we already have four values of Q(?, x**4), which we can use to interpolate the polynomial R(x) = Q(x, x**4), and from there calculate R(special_x) = Q(special_x, x**4) = P'(x**4). There are N/4 possible values of x4, and this lets us easily calculate all of them.
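
To see the "four values of x for each value of x**4" claim concretely, here is a toy check with the small prime 337 (which, like our modulus, is 1 mod 4, so -1 has square roots):

p = 337
i_unit = pow(5, (p - 1) // 4, p)    # 5 is a non-residue mod 337, so this is a sqrt of -1
assert pow(i_unit, 2, p) == p - 1
x = 100
quartet = [x, p - x, x * i_unit % p, x * (p - i_unit) % p]
# All four values share the same fourth power, so they sit in the same "row"
assert len({pow(v, 4, p) for v in quartet}) == 1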


A diagram from part 2; it helps to keep this in mind when understanding what's going on here

Our proof consists of some number (eg. 40) of random queries from the list of values of x4 (using the Merkle root of the column as a seed), and for each query we provide Merkle branches of the five values of Q(?, x**4):

m2 = merkelize(column)

# Pseudo-randomly select y indices to sample
# (m2[1] is the Merkle root of the column)
ys = get_pseudorandom_indices(m2[1], len(column), 40)

# Compute the Merkle branches for the values in the polynomial and the column
branches = []
for y in ys:
    branches.append([mk_branch(m2, y)] +
                    [mk_branch(m, y + (len(xs) // 4) * j) for j in range(4)])

The verifier’s job will be to verify that these five values actually do lie on the same degree < 4 polynomial. From there, we recurse and do an FRI on the column, verifying that the column actually does have degree < N/4. That really is all there is to FRI.

As a challenge exercise, you could try creating low-degree proofs of polynomial evaluations that have errors in them, and see how many errors you can sneak past the verifier (hint: you’ll need to modify the prove_low_degree function; with the default prover, even one error will balloon up and cause verification to fail).

The STARK

Reminder: now may be a good time to review and re-read Part 1

Now, we get to the actual meat that puts all of these pieces together: def mk_mimc_proof(inp, steps, round_constants) (code here), which generates a proof of the execution result of running the MIMC function with the given input for some number of steps. First, some asserts:

assert steps <= 2**32 // extension_factor
assert is_a_power_of_2(steps) and is_a_power_of_2(len(round_constants))
assert len(round_constants) < steps

The extension factor is the extent to which we will be “stretching” the computational trace (the set of “intermediate values” of executing the MIMC function). We need the step count multiplied by the extension factor to be at most 2^32, because we don’t have roots of unity of order 2^k for k > 32.

Our first computation will be to generate the computational trace; that is, all of the intermediate values of the computation, from the input going all the way to the output.

# Generate the computational trace
computational_trace = [inp]
for i in range(steps-1):
    computational_trace.append((computational_trace[-1]**3 + round_constants[i % len(round_constants)]) % modulus)
output = computational_trace[-1]

We then convert the computation trace into a polynomial, “laying down” successive values in the trace on successive powers of a root of unity g1 where g1^steps = 1, and we then evaluate the polynomial on a larger set, of successive powers of a root of unity g2 where g2^(steps * 8) = 1 (note that g2^8 = g1).

computational_trace_polynomial = inv_fft(computational_trace, modulus, subroot)
p_evaluations = fft(computational_trace_polynomial, modulus, root_of_unity)

Black: powers of `g1`. Purple: powers of `g2`. Orange: 1. You can look at successive roots of unity as being arranged in a circle in this way. We are "laying" the computational trace along powers of `g1`, and then extending it to compute the values of the same polynomial at the intermediate values (ie. the powers of `g2`).

We can convert the round constants of MIMC into a polynomial. Because these round constants loop around very frequently (in our tests, every 64 steps), it turns out that they form a degree-64 polynomial, and we can fairly easily compute its expression, and its extension:

skips2 = steps // len(round_constants)
constants_mini_polynomial = fft(round_constants, modulus, f.exp(subroot, skips2), inv=True)
constants_polynomial = [0 if i % skips2 else constants_mini_polynomial[i//skips2] for i in range(steps)]
constants_mini_extension = fft(constants_mini_polynomial, modulus, f.exp(root_of_unity, skips2))

Suppose there are 8192 steps of execution and 64 round constants. Here is what we are doing: we are doing an FFT to compute the round constants as a function of g1^128. We then add zeroes in between the constants to make it a function of g1 itself. Because g1^128 loops around every 64 steps, we know this function of g1 will as well. We only compute 512 steps of the extension, because we know that the extension repeats after 512 steps as well.

We now, as in the Fibonacci example in Part 1, calculate C(P(x)), except this time it’s C(P(x), P(g1*x), K(x)):

# Create the composed polynomial such that
# C(P(x), P(g1*x), K(x)) = P(g1*x) - P(x)**3 - K(x)
c_of_p_evaluations = [(p_evaluations[(i+extension_factor)%precision] -
                          f.exp(p_evaluations[i], 3) -
                          constants_mini_extension[i % len(constants_mini_extension)])
                      % modulus for i in range(precision)]
print('Computed C(P, K) polynomial')

Note that here we are no longer working with polynomials in coefficient form; we are working with the polynomials in terms of their evaluations at successive powers of the higher-order root of unity.

c_of_p is intended to be Q(x) = C(P(x), P(g1*x), K(x)) = P(g1*x) - P(x)**3 - K(x); the goal is that for every x that we are laying the computational trace along (except for the last step, as there’s no step “after” the last step), the next value in the trace is equal to the previous value in the trace cubed, plus the round constant. Unlike the Fibonacci example in Part 1, where if one computational step was at coordinate k, the next step is at coordinate k+1, here we are laying down the computational trace along successive powers of the lower-order root of unity (g1), so if one computational step is located at x = g1^i, the “next” step is located at g1^(i+1) = g1^i * g1 = x * g1. Hence, for every power of the lower-order root of unity (g1) (except the last), we want it to be the case that P(x*g1) = P(x)**3 + K(x), or P(x*g1) - P(x)**3 - K(x) = Q(x) = 0. Thus, Q(x) will be equal to zero at all successive powers of the lower-order root of unity g1 (except the last).

There is an algebraic theorem that proves that if Q(x) is equal to zero at all of these x coordinates, then it is a multiple of the minimal polynomial that is equal to zero at all of these x coordinates: Z(x) = (x - x_1) * (x - x_2) * ... * (x - x_n). Since proving that Q(x) is equal to zero at every single coordinate we want to check is too hard (as verifying such a proof would take longer than just running the original computation!), instead we use an indirect approach to (probabilistically) prove that Q(x) is a multiple of Z(x). And how do we do that? By providing the quotient D(x) = Q(x) / Z(x) and using FRI to prove that it’s an actual polynomial and not a fraction, of course!

We chose the particular arrangement of lower and higher order roots of unity (rather than, say, laying the computational trace along the first few powers of the higher order root of unity) because it turns out that computing Z(x) (the polynomial that evaluates to zero at all points along the computational trace except the last), and dividing by Z(x) is trivial there: the expression of Z is a fraction of two terms.
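
A toy check of why this works, using the small prime 337 and an 8th root of unity (plain Python, not the PrimeField class):

p, g = 337, pow(5, 336 // 8, 337)     # g is an 8th root of unity mod 337
poly = [1]                             # the constant polynomial 1, little-endian coefficients
for i in range(8):
    root = pow(g, i, p)
    # Multiply poly by (x - root)
    poly = [(a - root * b) % p for a, b in zip([0] + poly, poly + [0])]
# The product of (x - g**i) over the full set of roots is x**8 - 1, which is why
# Z(x), which omits only the last point, is (x**steps - 1) / (x - x_at_last_step).
assert poly == [p - 1] + [0] * 7 + [1]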

# Compute D(x) = Q(x) / Z(x)
# Z(x) = (x^steps - 1) / (x - x_atlast_step)
z_num_evaluations = [xs[(i * steps) % precision] - 1 for i in range(precision)]
z_num_inv = f.multi_inv(z_num_evaluations)
z_den_evaluations = [xs[i] - last_step_position for i in range(precision)]
d_evaluations = [cp * zd * zni % modulus for cp, zd, zni in zip(c_of_p_evaluations, z_den_evaluations, z_num_inv)]
print('Computed D polynomial')

Notice that we compute the numerator and denominator of Z directly in “evaluation form”, and then use the batch modular inversion to turn dividing by Z into a multiplication (* zd * zni), and then pointwise multiply the evaluations of Q(x) by these inverses of Z(x). Note that at the powers of the lower-order root of unity except the last (ie. along the portion of the low-degree extension that is part of the original computational trace), Z(x) = 0, so this computation involving its inverse will break. This is unfortunate, though we will plug the hole by simply modifying the random checks and FRI algorithm to not sample at those points, so the fact that we calculated them wrong will never matter.

Because Z(x) can be expressed so compactly, we get another benefit: the verifier can compute Z(x) for any specific x extremely quickly, without needing any precomputation. It’s okay for the prover to have to deal with polynomials whose size equals the number of steps, but we don’t want to ask the verifier to do the same, as we want verification to be succinct (ie. ultra-fast, with proofs as small as possible).

Probabilistically checking D(x) * Z(x) = Q(x) at a few randomly selected points allows us to verify the transition constraints - that each computational step is a valid consequence of the previous step. But we also want to verify the boundary constraints - that the input and the output of the computation are what the prover says they are. Just asking the prover to provide evaluations of P(1), D(1), P(last_step) and D(last_step) (where last_step (or g1^(steps-1)) is the coordinate corresponding to the last step in the computation) is too fragile; there’s no proof that those values are on the same polynomial as the rest of the data. So instead we use a similar kind of polynomial division trick:

# Compute interpolant of ((1, input), (x_atlast_step, output))
interpolant = f.lagrange_interp_2([1, last_step_position], [inp, output])
i_evaluations = [f.eval_poly_at(interpolant, x) for x in xs]

zeropoly2 = f.mul_polys([-1, 1], [-last_step_position, 1])
inv_z2_evaluations = f.multi_inv([f.eval_poly_at(zeropoly2, x) for x in xs])

# B = (P - I) / Z2
b_evaluations = [((p - i) * invq) % modulus for p, i, invq in zip(p_evaluations, i_evaluations, inv_z2_evaluations)]
print('Computed B polynomial')

The argument is as follows. The prover wants to prove P(1) == input and P(last_step) == output. If we take I(x) as the interpolant - the line that crosses the two points (1, input) and (last_step, output), then P(x) - I(x) would be equal to zero at those two points. Thus, it suffices to prove that P(x) - I(x) is a multiple of (x - 1) * (x - last_step), and we do that by… providing the quotient!


Purple: computational trace polynomial (P). Green: interpolant (I) (notice how the interpolant is constructed to equal the input (which should be the first step of the computational trace) at x=1 and the output (which should be the last step of the computational trace) at x=g1^(steps-1)). Red: P - I. Yellow: the minimal polynomial that equals 0 at x=1 and x=g1^(steps-1) (that is, Z2). Pink: (P - I) / Z2.


Challenge Suppose you wanted to also prove that the value in the computational trace after the 703rd computational step is equal to 8018284612598740. How would you modify the above algorithm to do that?
Mouseover below for answer
Set I(x) to be the interpolant of (1, input), (g ** 703, 8018284612598740), (last_step, output), and make a proof by providing the quotient B(x) = (P(x) - I(x)) / ((x - 1) * (x - g ** 703) * (x - last_step))

Now, we commit to the Merkle root of P, D and B combined together.

# Compute their Merkle roots
mtree = merkelize([pval.to_bytes(32, 'big') +
                   dval.to_bytes(32, 'big') +
                   bval.to_bytes(32, 'big') for
                   pval, dval, bval in zip(p_evaluations, d_evaluations, b_evaluations)])
print('Computed hash root')

Now, we need to prove that P, D and B are all actually polynomials, and of the right max-degree. But FRI proofs are big and expensive, and we don’t want to have three FRI proofs. So instead, we compute a pseudorandom linear combination of P, D and B (using the Merkle root of P, D and B as a seed), and do an FRI proof on that:

k1 = int.from_bytes(blake(mtree[1] + b'\x01'), 'big')
k2 = int.from_bytes(blake(mtree[1] + b'\x02'), 'big')
k3 = int.from_bytes(blake(mtree[1] + b'\x03'), 'big')
k4 = int.from_bytes(blake(mtree[1] + b'\x04'), 'big')

# Compute the linear combination. We don't even bother calculating it
# in coefficient form; we just compute the evaluations
root_of_unity_to_the_steps = f.exp(root_of_unity, steps)
powers = [1]
for i in range(1, precision):
    powers.append(powers[-1] * root_of_unity_to_the_steps % modulus)

l_evaluations = [(d_evaluations[i] +
                  p_evaluations[i] * k1 + p_evaluations[i] * k2 * powers[i] +
                  b_evaluations[i] * k3 + b_evaluations[i] * powers[i] * k4) % modulus
                  for i in range(precision)]

Unless all three of the polynomials have the right low degree, it’s almost impossible that a randomly selected linear combination of them will (you have to get extremely lucky for the terms to cancel), so this is sufficient.

We want to prove that the degree of D is less than 2 * steps, and that of P and B are less than steps, so we actually make a random linear combination of P, P * x^steps, B, B * x^steps and D, and check that the degree of this combination is less than 2 * steps.

Now, we do some spot checks of all of the polynomials. We generate some random indices, and provide the Merkle branches of the polynomial evaluated at those indices:

# Do some spot checks of the Merkle tree at pseudo-random coordinates, excluding
# multiples of `extension_factor`
branches = []
samples = spot_check_security_factor
positions = get_pseudorandom_indices(l_mtree[1], precision, samples,
                                     exclude_multiples_of=extension_factor)
for pos in positions:
    branches.append(mk_branch(mtree, pos))
    branches.append(mk_branch(mtree, (pos + skips) % precision))
    branches.append(mk_branch(l_mtree, pos))
print('Computed %d spot checks' % samples)

The get_pseudorandom_indices function returns some random indices in the range [0…precision-1], and the exclude_multiples_of parameter tells it to not give values that are multiples of the given parameter (here, extension_factor). This ensures that we do not sample along the original computational trace, where we are likely to get wrong answers.

The proof (~250-500 kilobytes altogether) consists of a set of Merkle roots, the spot-checked branches, and a low-degree proof of the random linear combination:

o = [mtree[1],
     l_mtree[1],
     branches,
     prove_low_degree(l_evaluations, root_of_unity, steps * 2, modulus, exclude_multiples_of=extension_factor)]

The largest parts of the proof in practice are the Merkle branches, and the FRI proof, which consists of even more branches. And here’s the “meat” of the verifier:

for i, pos in enumerate(positions):
    x = f.exp(G2, pos)
    x_to_the_steps = f.exp(x, steps)
    mbranch1 =  verify_branch(m_root, pos, branches[i*3])
    mbranch2 =  verify_branch(m_root, (pos+skips)%precision, branches[i*3+1])
    l_of_x = verify_branch(l_root, pos, branches[i*3 + 2], output_as_int=True)

    p_of_x = int.from_bytes(mbranch1[:32], 'big')
    p_of_g1x = int.from_bytes(mbranch2[:32], 'big')
    d_of_x = int.from_bytes(mbranch1[32:64], 'big')
    b_of_x = int.from_bytes(mbranch1[64:], 'big')

    zvalue = f.div(f.exp(x, steps) - 1,
                   x - last_step_position)
    k_of_x = f.eval_poly_at(constants_mini_polynomial, f.exp(x, skips2))

    # Check transition constraints Q(x) = Z(x) * D(x)
    assert (p_of_g1x - p_of_x ** 3 - k_of_x - zvalue * d_of_x) % modulus == 0

    # Check boundary constraints B(x) * Z2(x) + I(x) = P(x)
    interpolant = f.lagrange_interp_2([1, last_step_position], [inp, output])
    zeropoly2 = f.mul_polys([-1, 1], [-last_step_position, 1])
    assert (p_of_x - b_of_x * f.eval_poly_at(zeropoly2, x) -
            f.eval_poly_at(interpolant, x)) % modulus == 0

    # Check correctness of the linear combination
    assert (l_of_x - d_of_x -
            k1 * p_of_x - k2 * p_of_x * x_to_the_steps -
            k3 * b_of_x - k4 * b_of_x * x_to_the_steps) % modulus == 0

At every one of the positions that the prover provides a Merkle proof for, the verifier checks the Merkle proof, and checks that C(P(x), P(g1*x), K(x)) = Z(x) * D(x) and B(x) * Z2(x) + I(x) = P(x) (reminder: for x that are not along the original computation trace, Z(x) will not be zero, and so C(P(x), P(g1*x), K(x)) likely will not evaluate to zero). The verifier also checks that the linear combination is correct, and calls verify_low_degree_proof(l_root, root_of_unity, fri_proof, steps * 2, modulus, exclude_multiples_of=extension_factor) to verify the FRI proof. And we’re done!

Well, not really; soundness analysis to prove how many spot-checks for the cross-polynomial checking and for the FRI are necessary is really tricky. But that’s all there is to the code, at least if you don’t care about making even crazier optimizations. When I run the code above, we get a STARK proving “overhead” of about 300-400x (eg. a MIMC computation that takes 0.2 seconds to calculate takes 60 seconds to prove), suggesting that with a 4-core machine computing the STARK of the MIMC computation in the forward direction could actually be faster than computing MIMC in the backward direction. That said, these are both relatively inefficient implementations in python, and the proving-to-running-time ratio for properly optimized implementations may be different. Also, it’s worth pointing out that the STARK proving overhead for MIMC is remarkably low, because MIMC is almost perfectly “arithmetizable” - its mathematical form is very simple. For “average” computations, which contain less arithmetically clean operations (eg. checking if a number is greater or less than another number), the overhead is likely much higher, possibly around 10000-50000x.

A Guide to 99% Fault Tolerant Consensus


Special thanks to Emin Gun Sirer for review

We’ve heard for a long time that it’s possible to achieve consensus with 50% fault tolerance in a synchronous network where messages broadcasted by any honest node are guaranteed to be received by all other honest nodes within some known time period (if an attacker has more than 50%, they can perform a “51% attack”, and there’s an analogue of this for any algorithm of this type). We’ve also heard for a long time that if you want to relax the synchrony assumption, and have an algorithm that’s “safe under asynchrony”, the maximum achievable fault tolerance drops to 33% (PBFT, Casper FFG, etc all fall into this category). But did you know that if you add even more assumptions (specifically, you require observers to also be actively watching the consensus, and not just downloading its output after the fact), you can increase fault tolerance all the way to 99%?

This has in fact been known for a long time; Leslie Lamport’s famous 1982 paper “The Byzantine Generals Problem” (link here) contains a description of the algorithm. The following will be my attempt to describe and reformulate the algorithm in a simplified form.

Suppose that there are N nodes, which we label 0....N-1, and there is a known bound D on network latency plus clock disparity (eg. D = 8 seconds). Each node has the ability to publish a value at time T (a malicious node can of course propose values earlier or later than T). All nodes wait (N-1) * D seconds, running the following process. Define x : i as “the value x signed by node i”, x : i : j as “the value x signed by i, and that value and signature together signed by j”, etc. The proposals published in the first stage will be of the form v: i for some v and i, containing the signature of the node that proposed it.

If a validator i receives some message v : i[1] : ... : i[k], where i[1] ... i[k] is a list of indices that have (sequentially) signed the message already (just v by itself would count as k=0, and v:i as k=1), then the validator checks that (i) the time is less than T + k * D, and (ii) they have not yet seen a valid message containing v; if both checks pass, they publish v : i[1] : ... : i[k] : i.

At time T + (N-1) * D, nodes stop listening, and they use some “choice” function to pick a value out of all the values they have seen valid messages for (eg. they take the highest one). They then decide this value.
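
A minimal sketch of the per-node relay rule described above (this is my own abstraction, not Lamport's pseudocode; signatures and the `broadcast` primitive are abstracted away):

def on_message(node_id, value, signers, now, T, D, seen, broadcast):
    # Accept a value carried by k signatures only if it arrives before T + k*D
    # and we have not yet seen a valid message containing that value.
    k = len(signers)
    if now < T + k * D and value not in seen:
        seen.add(value)
        broadcast(value, signers + [node_id])   # re-sign and relay

def decide(seen):
    # At time T + (N-1)*D, apply some agreed-upon "choice" function,
    # e.g. pick the highest value seen.
    return max(seen) if seen else None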


Node 1 (red) is malicious, and nodes 0 and 2 (grey) are honest. At the start, the two honest nodes make their proposals y and x, and the attacker proposes both w and z late. w reaches node 0 on time but not node 2, and z reaches neither node on time. At time T + D, nodes 0 and 2 rebroadcast all values they've seen that they have not yet broadcasted, but add their signatures on (x and w for node 0, y for node 2). Both honest nodes saw {x, y, w}; they can then use some standard choice function (eg. alphabetically highest: y).


Now, let’s explore why this works. What we need to prove is that if one honest node has seen a particular value (validly), then every other honest node has also seen that value (and if we prove this, then we know that all honest nodes are running the same choice function, so they will output the same value). Suppose that any honest node receives a message v : i[1] : ... : i[k] that they perceive to be valid (ie. it arrives before time T + k * D). Suppose x is the index of a single other honest node. Either x is part of {i[1] ... i[k]} or it is not.

  • In the first case (say x = i[j] for this message), we know that the honest node x had already broadcasted that message, and they did so in response to a message with j-1 signatures that they received before time T + (j-1) * D, so they broadcast their message at that time, and so the message must have been received by all honest nodes before time T + j * D.
  • In the second case, since the honest node sees the message before time T + k * D, then they will broadcast the message with their signature and guarantee that everyone, including x, will see it before time T + (k+1) * D.

Notice that the algorithm uses the act of adding one’s own signature as a kind of “bump” on the timeout of a message, and it’s this ability that guarantees that if one honest node saw a message on time, they can ensure that everyone else sees the message on time as well, as the definition of “on time” increments by more than network latency with every added signature.

In the case where one node is honest, can we guarantee that passive observers can also see the outcome, even if we require them to be watching the process the whole time? With the scheme as written, there’s a problem. Suppose that a commander and some subset of k (malicious) validators produce a message v : i[1] : .... : i[k], and broadcast it directly to some “victims” just before time T + k * D. The victims see the message as being “on time”, but when they rebroadcast it, it only reaches all honest consensus-participating nodes after T + k * D, and so all honest consensus-participating nodes reject it.


But we can plug this hole. We require D to be a bound on two times network latency plus clock disparity. We then put a different timeout on observers: an observer accepts v : i[1] : .... : i[k] before time T + (k - 0.5) * D. Now, suppose an observer sees a message and accepts it. They will be able to broadcast it to an honest node before time T + k * D, and the honest node will issue the message with their signature attached, which will reach all other observers before time T + (k + 0.5) * D, the timeout for messages with k+1 signatures.
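The observer-side check is then just a shifted version of the validator check; a one-line sketch (assuming, as above, that D = 8 seconds now bounds two network latencies plus clock disparity):

    def observer_accepts(T, k, arrival_time, D=8):
        # Observers time out half a step earlier than validators: a message accepted
        # "just in time" can still be relayed to an honest validator before that
        # validator's own T + k * D deadline, re-signed, and received by every other
        # observer before their T + (k + 0.5) * D deadline for (k+1)-signature messages.
        return arrival_time < T + (k - 0.5) * D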


Retrofitting onto other consensus algorithms

Suppose that we have some other consensus algorithm (eg. PBFT, Casper FFG, chain-based PoS) whose output can be seen by occasionally-online observers (we’ll call this the threshold-dependent consensus algorithm, as opposed to the algorithm above, which we’ll call the latency-dependent consensus algorithm). Suppose that the threshold-dependent consensus algorithm runs continuously, in a mode where it is constantly “finalizing” new blocks onto a chain (ie. each finalized value points to some previous finalized value as a “parent”; if there’s a sequence of pointers A -> ... -> B, we’ll call A a descendant of B). We can retrofit the latency-dependent algorithm onto this structure, giving always-online observers access to a kind of “strong finality” on checkpoints, with fault tolerance ~95% (you can push this arbitrarily close to 100% by adding more validators and requiring the process to take longer).

Every time the time reaches some multiple of 4096 seconds, we run the latency-dependent algorithm, choosing 512 random nodes to participate in the algorithm. A valid proposal is any valid chain of values that were finalized by the threshold-dependent algorithm. If a node sees some finalized value before time T + k * D (D = 8 seconds) with k signatures, it accepts the chain into its set of known chains and rebroadcasts it with its own signature added; observers use a threshold of T + (k - 0.5) * D as before.

The “choice” function used at the end is simple (see the sketch after this list):

  • Finalized values that are not descendants of what was already agreed to be a finalized value in the previous round are ignored
  • Finalized values that are invalid are ignored
  • To choose between two valid finalized values, pick the one with the lower hash
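A minimal sketch of that choice function; `is_descendant`, `is_valid` and `hash_of` are assumed helpers supplied by the surrounding client:

    def choose(candidates, prev_finalized, is_descendant, is_valid, hash_of):
        # candidates: finalized values for which valid (on-time) signature chains were seen
        eligible = [c for c in candidates
                    if is_descendant(c, prev_finalized)   # must extend the previously agreed value
                    and is_valid(c)]                      # must itself be valid
        # deterministic tie-break: lowest hash wins; if nothing new qualifies,
        # stick with the previously finalized value
        return min(eligible, key=hash_of) if eligible else prev_finalized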

If 5% of validators are honest, there is only a roughly 1 in 1 trillion chance that none of the 512 randomly selected nodes will be honest, and so as long as the network latency plus clock disparity is less than D/2 the above algorithm will work, correctly coordinating nodes on some single finalized value, even if multiple conflicting finalized values are presented because the fault tolerance of the threshold-dependent algorithm is broken.
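As a sanity check on that figure, a quick back-of-the-envelope calculation (assuming the 512 participants are sampled independently from a very large validator set):

    honest_fraction = 0.05
    sample_size = 512
    p_no_honest_sampled = (1 - honest_fraction) ** sample_size
    print(p_no_honest_sampled)   # ~3.9e-12, i.e. on the order of one in a trillion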

If the fault tolerance of the threshold-dependent consensus algorithm is met (usually 50% or 67% honest), then the threshold-dependent consensus algorithm will either not finalize any new checkpoints, or it will finalize new checkpoints that are compatible with each other (eg. a series of checkpoints where each points to the previous as a parent), so even if network latency exceeds D/2 (or even D), and as a result nodes participating in the latency-dependent algorithm disagree on which value they accept, the values they accept are still guaranteed to be part of the same chain and so there is no actual disagreement. Once latency recovers back to normal in some future round, the latency-dependent consensus will get back “in sync”.

If the assumptions of both the threshold-dependent and latency-dependent consensus algorithms are broken at the same time (or in consecutive rounds), then the algorithm can break down. For example, suppose in one round, the threshold-dependent consensus finalizes Z -> Y -> X and the latency-dependent consensus disagrees between Y and X, and in the next round the threshold-dependent consensus finalizes a descendant W of X which is not a descendant of Y; in the latency-dependent consensus, the nodes who agreed Y will not accept W, but the nodes that agreed X will. However, this is unavoidable; the impossibility of safe-under-asynchrony consensus with more than 1/3 fault tolerance is a well known result in Byzantine fault tolerance theory, as is the impossibility of more than 1/2 fault tolerance even allowing synchrony assumptions but assuming offline observers.

Layer 1 Should Be Innovative in the Short Term but Less in the Long Term


See update 2018-08-29

One of the key tradeoffs in blockchain design is whether to build more functionality into base-layer blockchains themselves (“layer 1”), or to build it into protocols that live on top of the blockchain, and can be created and modified without changing the blockchain itself (“layer 2”). The tradeoff has so far shown itself most in the scaling debates, with block size increases (and sharding) on one side and layer-2 solutions like Plasma and channels on the other, and to some extent blockchain governance, with loss and theft recovery being solvable by either the DAO fork or generalizations thereof such as EIP 867, or by layer-2 solutions such as Reversible Ether (RETH). So which approach is ultimately better? Those who know me well, or have seen me out myself as a dirty centrist, know that I will inevitably say “some of both”. However, in the longer term, I do think that as blockchains become more and more mature, layer 1 will necessarily stabilize, and layer 2 will take on more and more of the burden of ongoing innovation and change.

There are several reasons why. The first is that layer 1 solutions require ongoing protocol change to happen at the base protocol layer, base layer protocol change requires governance, and it has still not been shown that, in the long term, highly “activist” blockchain governance can continue without causing ongoing political uncertainty or collapsing into centralization.

To take an example from another sphere, consider Moxie Marlinspike’s defense of Signal’s centralized and non-federated nature. A document by a company defending its right to maintain control over an ecosystem it depends on for its key business should of course be viewed with massive grains of salt, but one can still benefit from the arguments. Quoting:

One of the controversial things we did with Signal early on was to build it as an unfederated service. Nothing about any of the protocols we’ve developed requires centralization; it’s entirely possible to build a federated Signal Protocol-based messenger, but I no longer believe that it is possible to build a competitive federated messenger at all.

And:

Their retort was “that’s dumb, how far would the internet have gotten without interoperable protocols defined by 3rd parties?” I thought about it. We got to the first production version of IP, and have been trying for the past 20 years to switch to a second production version of IP with limited success. We got to HTTP version 1.1 in 1997, and have been stuck there until now. Likewise, SMTP, IRC, DNS, XMPP, are all similarly frozen in time circa the late 1990s. To answer his question, that’s how far the internet got. It got to the late 90s.
That has taken us pretty far, but it’s undeniable that once you federate your protocol, it becomes very difficult to make changes. And right now, at the application level, things that stand still don’t fare very well in a world where the ecosystem is moving … So long as federation means stasis while centralization means movement, federated protocols are going to have trouble existing in a software climate that demands movement as it does today.

At this point in time, and in the medium term going forward, it seems clear that decentralized application platforms, cryptocurrency payments, identity systems, reputation systems, decentralized exchange mechanisms, auctions, privacy solutions, programming languages that support privacy solutions, and most other interesting things that can be done on blockchains are spheres where there will continue to be significant and ongoing innovation. Decentralized application platforms often need continued reductions in confirmation time; payments need fast confirmations, low transaction costs, privacy, and many other built-in features; exchanges are appearing in many shapes and sizes, including on-chain automated market makers, frequent batch auctions, combinatorial auctions and more. Hence, “building in” any of these into a base layer blockchain would be a bad idea, as it would create a high level of governance overhead as the platform would have to continually discuss, implement and coordinate newly discovered technical improvements. For the same reason that federated messengers have a hard time getting off the ground without re-centralizing, blockchains would also need to choose between adopting activist governance, with the perils that entails, and falling behind newly appearing alternatives.

Even Ethereum’s limited level of application-specific functionality, precompiles, has seen some of this effect. Less than a year ago, Ethereum adopted the Byzantium hard fork, including operations to facilitate elliptic curve operations needed for ring signatures, ZK-SNARKs and other applications, using the alt-bn128 curve. Now, Zcash and other blockchains are moving toward BLS12-381, and Ethereum would need to fork again to catch up. In part to avoid having similar problems in the future, the Ethereum community is looking to upgrade the EVM to E-WASM, a virtual machine that is sufficiently more efficient that there is far less need to incorporate application-specific precompiles.

But there is also a second argument in favor of layer 2 solutions, one that does not depend on speed of anticipated technical development: sometimes there are inevitable tradeoffs, with no single globally optimal solution. This is less easily visible in Ethereum 1.0-style blockchains, where there are certain models that are reasonably universal (eg. Ethereum’s account-based model is one). In sharded blockchains, however, one type of question that does not exist in Ethereum today crops up: how to do cross-shard transactions? That is, suppose that the blockchain state has regions A and B, where few or no nodes are processing both A and B. How does the system handle transactions that affect both A and B?

The current answer involves asynchronous cross-shard communication, which is sufficient for transferring assets and some other applications, but insufficient for many others. Synchronous operations (eg. to solve the train and hotel problem) can be bolted on top with cross-shard yanking, but this requires multiple rounds of cross-shard interaction, leading to significant delays. We can solve these problems with a synchronous execution scheme, but this comes with several tradeoffs:

  • The system cannot process more than one transaction for the same account per block
  • Transactions must declare in advance what shards and addresses they affect
  • There is a high risk of any given transaction failing (and still being required to pay fees!) if the transaction is only accepted in some of the shards that it affects but not others

It seems very likely that a better scheme can be developed, but it would be more complex, and may well have limitations that this scheme does not. There are known results preventing perfection; at the very least, Amdahl’s law puts a hard limit on the ability of some applications and some types of interaction to process more transactions per second through parallelization.

So how do we create an environment where better schemes can be tested and deployed? The answer is an idea that can be credited to Justin Drake: layer 2 execution engines. Users would be able to send assets into a “bridge contract”, which would calculate (using some indirect technique such as interactive verification or ZK-SNARKs) state roots using some alternative set of rules for processing the blockchain (think of this as equivalent to layer-two “meta-protocols” like Mastercoin/OMNI and Counterparty on top of Bitcoin, except because of the bridge contract these protocols would be able to handle assets whose “base ledger” is defined on the underlying protocol), and which would process withdrawals if and only if the alternative ruleset generates a withdrawal request.
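A very rough sketch of the bridge-contract pattern follows; everything here is a placeholder (in particular `verify_state_transition` and `verify_withdrawal` stand in for whatever interactive-verification or ZK-SNARK machinery the execution engine actually uses), not a description of any specific design.

    class BridgeContract:
        def __init__(self, genesis_state_root):
            self.state_root = genesis_state_root   # state root under the layer-2 ruleset
            self.balances = {}                     # assets held by the bridge on the base chain

        def deposit(self, user, amount):
            # assets move into the bridge; the layer-2 ruleset credits them off-chain
            self.balances[user] = self.balances.get(user, 0) + amount

        def update_state_root(self, new_root, proof):
            # the proof is checked indirectly, eg. via interactive verification or a ZK-SNARK
            assert verify_state_transition(self.state_root, new_root, proof)
            self.state_root = new_root

        def withdraw(self, user, amount, inclusion_proof):
            # pay out only if the alternative ruleset's state authorizes this withdrawal
            assert verify_withdrawal(self.state_root, user, amount, inclusion_proof)
            self.balances[user] -= amount
            # ... transfer `amount` back to `user` on the base chain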




Note that anyone can create a layer 2 execution engine at any time, different users can use different execution engines, and one can switch from one execution engine to any other, or to the base protocol, fairly quickly. The base blockchain no longer has to worry about being an optimal smart contract processing engine; it need only be a data availability layer with execution rules that are quasi-Turing-complete so that any layer 2 bridge contract can be built on top, and that allow basic operations to carry state between shards (in fact, only ETH transfers being fungible across shards is sufficient, but it takes very little effort to also allow cross-shard calls, so we may as well support them), but does not require complexity beyond that. Note also that layer 2 execution engines can have different state management rules than layer 1, eg. not having storage rent; anything goes, as it’s the responsibility of the users of that specific execution engine to make sure that it is sustainable, and if they fail to do so the consequences are contained to within the users of that particular execution engine.

In the long run, layer 1 would not be actively competing on all of these improvements; it would simply provide a stable platform for the layer 2 innovation to happen on top. Does this mean that, say, sharding is a bad idea, and we should keep the blockchain size and state small so that even 10 year old computers can process everyone’s transactions? Absolutely not. Even if execution engines are something that gets partially or fully moved to layer 2, consensus on data ordering and availability is still a highly generalizable and necessary function; to see how difficult layer 2 execution engines are without layer 1 scalable data availability consensus, look at the difficulties in Plasma research, particularly its difficulty in naturally extending to fully general-purpose blockchains. And if people want to throw a hundred megabytes per second of data into a system where they need consensus on availability, then we need a hundred megabytes per second of data availability consensus.

Additionally, layer 1 can still improve on reducing latency; if layer 1 is slow, the only strategy for achieving very low latency is state channels, which often have high capital requirements and can be difficult to generalize. State channels will always beat layer 1 blockchains in latency as state channels require only a single network message, but in those cases where state channels do not work well, layer 1 blockchains can still come closer than they do today.

Hence, the other extreme position, that blockchain base layers can be truly absolutely minimal, and not bother with either a quasi-Turing-complete execution engine or scalability to beyond the capacity of a single node, is also clearly false; there is a certain minimal level of complexity that is required for base layers to be powerful enough for applications to build on top of them, and we have not yet reached that level. Additional complexity is needed, though it should be chosen very carefully to make sure that it is maximally general purpose, and not targeted toward specific applications or technologies that will go out of fashion in two years due to loss of interest or better alternatives.

And even in the future base layers will need to continue to make some upgrades, especially if new technologies (eg. STARKs reaching higher levels of maturity) allow them to achieve stronger properties than they could before, though developers today can take care to make base layer platforms maximally forward-compatible with such potential improvements. So it will continue to be true that a balance between layer 1 and layer 2 improvements is needed to continue improving scalability, privacy and versatility, though layer 2 will continue to take up a larger and larger share of the innovation over time.

Update 2018.08.29: Justin Drake pointed out to me another good reason why some features may be best implemented on layer 1: those features are public goods, and so could not be efficiently or reliably funded with feature-specific use fees, and hence are best paid for by subsidies paid out of issuance or burned transaction fees. One possible example of this is secure random number generation.

A CBC Casper Tutorial


Special thanks to Vlad Zamfir, Aditya Asgaonkar, Ameen Soleimani and Jinglan Wang for review

In order to help more people understand “the other Casper” (Vlad Zamfir’s CBC Casper), and specifically the instantiation that works best for blockchain protocols, I thought that I would write an explainer on it myself, from a less abstract and more “close to concrete usage” point of view. Vlad’s descriptions of CBC Casper can be found here and here and here; you are welcome and encouraged to look through these materials as well.

CBC Casper is designed to be fundamentally very versatile and abstract, and come to consensus on pretty much any data structure; you can use CBC to decide whether to choose 0 or 1, you can make a simple block-by-block chain run on top of CBC, or a 2⁹²-dimensional hypercube tangle DAG, and pretty much anything in between.

But for simplicity, we will first focus our attention on one concrete case: a simple chain-based structure. We will suppose that there is a fixed validator set consisting of N validators (a fancy word for “staking nodes”; we also assume that each node is staking the same amount of coins, cases where this is not true can be simulated by assigning some nodes multiple validator IDs), time is broken up into ten-second slots, and validator k can create a block in slot k, N + k, 2N + k, etc. Each block points to one specific parent block. Clearly, if we wanted to make something maximally simple, we could just take this structure, impose a longest chain rule on top of it, and call it a day.


The green chain is the longest chain (length 6) so it is considered to be the "canonical chain".
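A minimal sketch of that toy structure (this is just the setup described above, not anything from the CBC spec itself):

    N = 5            # validator set size in the running example
    SLOT_TIME = 10   # seconds per slot

    def proposer_for_slot(slot):
        # validator k proposes in slots k, N + k, 2N + k, ...
        return slot % N

    class Block:
        def __init__(self, slot, proposer, parent):
            self.slot = slot          # slot the block was produced in
            self.proposer = proposer  # index of the validator that signed it
            self.parent = parent      # exactly one parent block (None for the genesis block)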


However, what we care about here is adding some notion of “finality” - the idea that some block can be so firmly established in the chain that it cannot be overtaken by a competing block unless a very large portion (eg. 1/4) of validators commit a uniquely attributable fault - act in some way which is clearly and cryptographically verifiably malicious. If a very large portion of validators do act maliciously to revert the block, proof of the misbehavior can be submitted to the chain to take away those validators’ entire deposits, making the reversion of finality extremely expensive (think hundreds of millions of dollars).

LMD GHOST

We will take this one step at a time. First, we replace the fork choice rule (the rule that chooses which chain among many possible choices is “the canonical chain”, ie. the chain that users should care about), moving away from the simple longest-chain-rule and instead using “latest message driven GHOST”. To show how LMD GHOST works, we will modify the above example. To make it more concrete, suppose the validator set has size 5, which we label A, B, C, D, E, so validator A makes the blocks at slots 0 and 5, validator B at slots 1 and 6, etc. A client evaluating the LMD GHOST fork choice rule cares only about the most recent (ie. highest-slot) message (ie. block) signed by each validator:


Latest messages in blue, slots from left to right (eg. A's block on the left is at slot 0, etc.)


Now, we will use only these messages as source data for the “greedy heaviest observed subtree” (GHOST) fork choice rule: start at the genesis block, then each time there is a fork choose the side where more of the latest messages support that block’s subtree (ie. more of the latest messages support either that block or one of its descendants), and keep doing this until you reach a block with no children. We can compute for each block the subset of latest messages that support either the block or one of its descendants:


Now, to compute the head, we start at the beginning, and then at each fork pick the higher number: first, pick the bottom chain as it has 4 latest messages supporting it versus 1 for the single-block top chain, then at the next fork support the middle chain. The result is the same longest chain as before. Indeed, in a well-running network (ie. the orphan rate is low), almost all of the time LMD GHOST and the longest chain rule will give the exact same answer. But in more extreme circumstances, this is not always true. For example, consider the following chain, with a more substantial three-block fork:


Scoring blocks by chain length. If we follow the longest chain rule, the top chain is longer, so the top chain wins.



Scoring blocks by number of supporting latest messages and using the GHOST rule (latest message from each validator shown in blue). The bottom chain has more recent support, so if we follow the LMD GHOST rule the bottom chain wins, though it's not yet clear which of the three blocks takes precedence.


The LMD GHOST approach is advantageous in part because it is better at extracting information in conditions of high latency. If two validators create two blocks with the same parent, they should really be both counted as cooperating votes for the parent block, even though they are at the same time competing votes for themselves. The longest chain rule fails to capture this nuance; GHOST-based rules do.
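Here is a compact sketch of the LMD GHOST rule just described, reusing the toy `Block` structure from above; how ties between equally supported children are broken, and how `blocks_by_validator` and `children` are maintained, are left to the implementer.

    def latest_messages(blocks_by_validator):
        # blocks_by_validator: validator index -> all blocks signed by that validator;
        # a validator's "latest message" is simply their highest-slot block
        return {v: max(blocks, key=lambda b: b.slot)
                for v, blocks in blocks_by_validator.items()}

    def is_ancestor(ancestor, block):
        while block is not None:
            if block is ancestor:
                return True
            block = block.parent
        return False

    def lmd_ghost(genesis, children, latest):
        # children: block -> list of child blocks; latest: validator -> latest message (block)
        def support(block):
            # count latest messages equal to this block or one of its descendants
            return sum(1 for msg in latest.values() if is_ancestor(block, msg))
        head = genesis
        while children.get(head):
            # a real implementation needs a deterministic tie-break here
            head = max(children[head], key=support)
        return head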

Detecting finality

But the LMD GHOST approach has another nice property: it’s sticky. For example, suppose that for two rounds, 4/5 of validators voted for the same chain (we’ll assume that B, the one validator of the five that did not, is attacking):



What would need to actually happen for the chain on top to become the canonical chain? Four of five validators built on top of E’s first block, and all four recognized that E had a high score in the LMD fork choice. Just by looking at the structure of the chain, we can know for a fact at least some of the messages that the validators must have seen at different times. Here is what we know about the four validators’ views:


A's view

C's view

D's view

E's view
Blocks produced by each validator in green, the latest messages we know that they saw from each of the other validators in blue.


Note that all four of the validators could have seen one or both of B’s blocks, and D and E could have seen C’s second block, making that the latest message in their views instead of C’s first block; however, the structure of the chain itself gives us no evidence that they actually did. Fortunately, as we will see below, this ambiguity does not matter for us.

A’s view contains four latest-messages supporting the bottom chain, and none supporting B’s block. Hence, in (our simulation of) A’s eyes the score in favor of the bottom chain is at least 4-1. The views of C, D and E paint a similar picture, with four latest-messages supporting the bottom chain. Hence, all four of the validators are in a position where they cannot change their minds unless two other validators change their minds first to bring the score to 2-3 in favor of B’s block.

Note that our simulation of the validators’ views is “out of date” in that, for example, it does not capture that D and E could have seen the more recent block by C. However, this does not alter the calculation for the top vs bottom chain, because we can very generally say that any validator’s new message will have the same opinion as their previous messages, unless two other validators have already switched sides first.


A minimal viable attack. A and C illegally switch over to support B's block (and can get penalized for this), giving it a 3-2 advantage, and at this point it becomes legal for D and E to also switch over.


Since fork choice rules such as LMD GHOST are sticky in this way, and clients can detect when the fork choice rule is “stuck on” a particular block, we can use this as a way of achieving asynchronously safe consensus.

Safety Oracles

Actually detecting all possible situations where the chain becomes stuck on some block (in CBC lingo, the block is “decided” or “safe”) is very difficult, but we can come up with a set of heuristics (“safety oracles”) which will help us detect some of the cases where this happens. The simplest of these is the clique oracle. If there exists some subset V of the validators making up portion p of the total validator set (with p > 1/2) that all make blocks supporting some block B and then make another round of blocks still supporting B that references their first round of blocks, then we can reason as follows:

Because of the two rounds of messaging, we know that this subset V all (i) support B (ii) know that B is well-supported, and so none of them can legally switch over unless enough others switch over first. For some competing B' to beat out B, the support such a B' can legally have is initially at most 1-p (everyone not part of the clique), and to win the LMD GHOST fork choice its support needs to get to 1/2, so at least 1/2 - (1-p) = p - 1/2 need to illegally switch over to get it to the point where the LMD GHOST rule supports B'.

As a specific case, note that the p=3/4 clique oracle offers a 1/4 level of safety, and a set of blocks satisfying the clique can (and in normal operation, will) be generated as long as 3/4 of nodes are online. Hence, in a BFT sense, the level of fault tolerance that can be reached using two-round clique oracles is 1/4, in terms of both liveness and safety.
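The arithmetic of that argument is simple enough to state directly (a sketch with equally weighted validators, not a full clique-detection algorithm):

    def clique_safety_margin(p):
        # p: fraction of validators observed to form a two-round clique supporting B.
        # A competing value starts with at most (1 - p) legal support and needs 1/2
        # to win LMD GHOST, so at least 1/2 - (1 - p) = p - 1/2 of validators must
        # switch illegally (slashably) for B to be reverted.
        assert p > 0.5, "the clique oracle only gives guarantees for p > 1/2"
        return p - 0.5

    print(clique_safety_margin(0.75))   # 0.25: reverting B requires >= 1/4 of validators to equivocate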

This approach to consensus has many nice benefits. First of all, the short-term chain selection algorithm, and the “finality algorithm”, are not two awkwardly glued together distinct components, as they admittedly are in Casper FFG; rather, they are both part of the same coherent whole. Second, because safety detection is client-side, there is no need to choose any thresholds in-protocol; clients can decide for themselves what level of safety is sufficient to consider a block as finalized.

Going Further

CBC can be extended further in many ways. First, one can come up with other safety oracles; higher-round clique oracles can reach 1/3 fault tolerance. Second, we can add validator rotation mechanisms. The simplest is to allow the validator set to change by a small percentage every time the p=3/4 clique oracle is satisfied, but there are other things that we can do as well. Third, we can go beyond chain-like structures, and instead look at structures that increase the density of messages per unit time, like the Serenity beacon chain’s attestation structure:



In this case, it becomes worthwhile to separate attestations from blocks; a block is an object that actually grows the underlying DAG, whereas an attestation contributes to the fork choice rule. In the Serenity beacon chain spec, each block may have hundreds of attestations corresponding to it. However, regardless of which way you do it, the core logic of CBC Casper remains the same.

To make CBC Casper’s safety “cryptoeconomically enforceable”, we need to add validity and slashing conditions. First, we’ll start with the validity rule. A block contains both a parent block and a set of attestations that it knows about that are not yet part of the chain (similar to “uncles” in the current Ethereum PoW chain). For the block to be valid, the block’s parent must be the result of executing the LMD GHOST fork choice rule given the information included in the chain including in the block itself.


Dotted lines are uncle links, eg. when E creates a block, E notices that C is not yet part of the chain, and so includes a reference to C.
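As a sketch, the validity check amounts to re-running the fork choice from earlier against the block's own view; `view_from_chain` and `tree_from_chain`, which reconstruct the latest messages and block tree visible through the block's parent and uncle links, are assumed helpers.

    def block_is_valid(block, genesis):
        # The block's parent must be exactly what LMD GHOST returns when evaluated
        # using only the information included in the block's own chain (uncles included).
        latest = view_from_chain(block)     # assumed helper: validator -> latest message seen
        children = tree_from_chain(block)   # assumed helper: block -> children, within this view
        return lmd_ghost(genesis, children, latest) is block.parent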


We now can make CBC Casper safe with only one slashing condition: you cannot make two attestations M1 and M2, unless either M1 is in the chain that M2 is attesting to or M2 is in the chain that M1 is attesting to.


OK

Not OK
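A sketch of the corresponding slashing check; `attested_chain(m)`, returning the set of attestations included in the chain that attestation `m` points to, is an assumed helper.

    def is_slashable(m1, m2):
        # Two attestations by the same validator are slashable unless one of them
        # is already part of the chain that the other one is attesting to.
        if m1.validator != m2.validator or m1 == m2:
            return False
        return m1 not in attested_chain(m2) and m2 not in attested_chain(m1)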

The validity and slashing conditions are relatively easy to describe, though actually implementing them requires checking hash chains and executing fork choice rules in-consensus, so it is not nearly as simple as taking two messages and checking a couple of inequalities between the numbers that these messages commit to, as you can do in Casper FFG for the NO_SURROUND and NO_DBL_VOTE slashing conditions.

Liveness in CBC Casper piggybacks off of the liveness of whatever the underlying chain algorithm is (eg. if it’s one-block-per-slot, then it depends on a synchrony assumption that all nodes will see everything produced in slot N before the start of slot N+1). It’s not possible to get “stuck” in such a way that one cannot make progress; it’s possible to get to the point of finalizing new blocks from any situation, even one where there are attackers and/or network latency is higher than that required by the underlying chain algorithm.

Suppose that at some time T, the network “calms down” and synchrony assumptions are once again satisfied. Then, everyone will converge on the same view of the chain, with the same head H. From there, validators will begin to sign messages supporting H or descendants of H. From there, the chain can proceed smoothly, and will eventually satisfy a clique oracle, at which point H becomes finalized.


Chaotic network due to high latency.



Network latency subsides, a majority of validators see all of the same blocks or at least enough of them to get to the same head when executing the fork choice, and start building on the head, further reinforcing its advantage in the fork choice rule.



Chain proceeds "peacefully" at low latency. Soon, a clique oracle will be satisfied.


That’s all there is to it! Implementation-wise, CBC may arguably be considerably more complex than FFG, but in terms of ability to reason about the protocol, and the properties that it provides, it’s surprisingly simple.


On Collusion


Special thanks to Glen Weyl, Phil Daian and Jinglan Wang for review

Over the last few years there has been an increasing interest in using deliberately engineered economic incentives and mechanism design to align behavior of participants in various contexts. In the blockchain space, mechanism design first and foremost provides the security for the blockchain itself, encouraging miners or proof of stake validators to participate honestly, but more recently it is being applied in prediction markets, “token curated registries” and many other contexts. The nascent RadicalXChange movement has meanwhile spawned experimentation with Harberger taxes, quadratic voting, quadratic financing and more. More recently, there has also been growing interest in using token-based incentives to try to encourage quality posts in social media. However, as development of these systems moves closer from theory to practice, there are a number of challenges that need to be addressed, challenges that I would argue have not yet been adequately confronted.

As a recent example of this move from theory toward deployment, consider Bihu, a Chinese platform that has recently released a coin-based mechanism for encouraging people to write posts. The basic mechanism (see whitepaper in Chinese here) is that if a user of the platform holds KEY tokens, they have the ability to stake those KEY tokens on articles; every user can make k “upvotes” per day, and the “weight” of each upvote is proportional to the stake of the user making the upvote. Articles with a greater quantity of stake upvoting them appear more prominently, and the author of an article gets a reward of KEY tokens roughly proportional to the quantity of KEY upvoting that article. This is an oversimplification and the actual mechanism has some nonlinearities baked into it, but they are not essential to the basic functioning of the mechanism. KEY has value because it can be used in various ways inside the platform, but particularly a percentage of all ad revenues get used to buy and burn KEY (yay, big thumbs up to them for doing this and not making yet another medium of exchange token!).

This kind of design is far from unique; incentivizing online content creation is something that very many people care about, and there have been many designs of a similar character, as well as some fairly different designs. And in this case this particular platform is already being used significantly:


A few months ago, the Ethereum trading subreddit /r/ethtrader introduced a somewhat similar experimental feature where a token called “donuts” is issued to users that make comments that get upvoted, with a set amount of donuts issued weekly to users in proportion to how many upvotes their comments received. The donuts could be used to buy the right to set the contents of the banner at the top of the subreddit, and could also be used to vote in community polls. However, unlike what happens in the KEY system, here the reward that B receives when A upvotes B is not proportional to A’s existing coin supply; instead, each Reddit account has an equal ability to contribute to other Reddit accounts.


These kinds of experiments, attempting to reward quality content creation in a way that goes beyond the known limitations of donations/microtipping, are very valuable; under-compensation of user-generated internet content is a very significant problem in society in general (see “liberal radicalism” and “data as labor”), and it’s heartening to see crypto communities attempting to use the power of mechanism design to make inroads on solving it. But unfortunately, these systems are also vulnerable to attack.

Self-voting, plutocracy and bribes

Here is how one might economically attack the design proposed above. Suppose that some wealthy user acquires some quantity N of tokens, and as a result each of the user’s k upvotes gives the recipient a reward of N * q (q here probably being a very small number, eg. think q = 0.000001). The user simply upvotes their own sockpuppet accounts, giving themselves the reward of N * k * q. Then, the system simply collapses into each user having an “interest rate” of k * q per period, and the mechanism accomplishes nothing else.
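Putting illustrative (entirely hypothetical) numbers on that:

    N = 10_000_000   # attacker's KEY holdings
    k = 10           # upvotes per day
    q = 0.000001     # reward per staked token per upvote
    daily_self_reward = N * k * q   # 100 KEY per day, regardless of content quality
    interest_rate = k * q           # 0.001% per day, the same for every self-voting holder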

The actual Bihu mechanism seemed to anticipate this, and has some superlinear logic where articles with more KEY upvoting them gain a disproportionately greater reward, seemingly to encourage upvoting popular posts rather than self-upvoting. It’s a common pattern among coin voting governance systems to add this kind of superlinearity to prevent self-voting from undermining the entire system; most DPOS schemes have a limited number of delegate slots with zero rewards for anyone who does not get enough votes to join one of the slots, with similar effect. But these schemes invariably introduce two new weaknesses:

  • They subsidize plutocracy, as very wealthy individuals and cartels can still get enough funds to self-upvote.
  • They can be circumvented by users bribing other users to vote for them en masse.

Bribing attacks may sound farfetched (who here has ever accepted a bribe in real life?), but in a mature ecosystem they are much more realistic than they seem. In most contexts where bribing has taken place in the blockchain space, the operators use a euphemistic new name to give the concept a friendly face: it’s not a bribe, it’s a “staking pool” that “shares dividends”. Bribes can even be obfuscated: imagine a cryptocurrency exchange that offers zero fees and spends the effort to make an abnormally good user interface, and does not even try to collect a profit; instead, it uses coins that users deposit to participate in various coin voting systems. There will also inevitably be people that see in-group collusion as just plain normal; see a recent scandal involving EOS DPOS for one example:


Finally, there is the possibility of a “negative bribe”, ie. blackmail or coercion, threatening participants with harm unless they act inside the mechanism in a certain way.

In the /r/ethtrader experiment, fear of people coming in and buying donuts to shift governance polls led to the community deciding to make only locked (ie. untradeable) donuts eligible for use in voting. But there’s an even cheaper attack than buying donuts (an attack that can be thought of as a kind of obfuscated bribe): renting them. If an attacker is already holding ETH, they can use it as collateral on a platform like Compound to take out a loan of some token, giving them the full right to use that token for whatever purpose including participating in votes, and when they’re done they simply send the tokens back to the loan contract to get their collateral back - all without having to endure even a second of price exposure to the token that they just used to swing a coin vote, even if the coin vote mechanism includes a time lockup (as eg. Bihu does). In every case, issues around bribing, and accidentally over-empowering well-connected and wealthy participants, prove surprisingly difficult to avoid.

Identity

Some systems attempt to mitigate the plutocratic aspects of coin voting by making use of an identity system. In the case of the /r/ethtrader donut system, for example, although governance polls are done via coin vote, the mechanism that determines how many donuts (ie. coins) you get in the first place is based on Reddit accounts: 1 upvote from 1 Reddit account = N donuts earned. The ideal goal of an identity system is to make it relatively easy for individuals to get one identity, but relatively difficult to get many identities. In the /r/ethtrader donut system, that’s Reddit accounts, in the Gitcoin CLR matching gadget, it’s Github accounts that are used for the same purpose. But identity, at least the way it has been implemented so far, is a fragile thing….


Oh, are you too lazy to make a big rack of phones? Well maybe you’re looking for this:




Usual warning about how sketchy sites may or may not scam you, do your own research, etc. etc. applies.


Arguably, attacking these mechanisms by simply controlling thousands of fake identities like a puppetmaster is even easier than having to go through the trouble of bribing people. And if you think the response is to just increase security to go up to government-level IDs? Well, if you want to get a few of those you can start exploring here, but keep in mind that there are specialized criminal organizations that are well ahead of you, and even if all the underground ones are taken down, hostile governments are definitely going to create fake passports by the millions if we’re stupid enough to create systems that make that sort of activity profitable. And this doesn’t even begin to mention attacks in the opposite direction, identity-issuing institutions attempting to disempower marginalized communities by denying them identity documents…

Collusion

Given that so many mechanisms seem to fail in such similar ways once multiple identities or even liquid markets get into the picture, one might ask, is there some deep common strand that causes all of these issues? I would argue the answer is yes, and the “common strand” is this: it is much harder, and more likely to be outright impossible, to make mechanisms that maintain desirable properties in a model where participants can collude, than in a model where they can’t. Most people likely already have some intuition about this; specific instances of this principle are behind well-established norms and often laws promoting competitive markets and restricting price-fixing cartels, vote buying and selling, and bribery. But the issue is much deeper and more general.

In the version of game theory that focuses on individual choice - that is, the version that assumes that each participant makes decisions independently and that does not allow for the possibility of groups of agents working as one for their mutual benefit, there are mathematical proofs that at least one stable Nash equilibrium must exist in any game, and mechanism designers have a very wide latitude to “engineer” games to achieve specific outcomes. But in the version of game theory that allows for the possibility of coalitions working together, called cooperative game theory, there are large classes of games that do not have any stable outcome that a coalition cannot profitably deviate from.

Majority games, formally described as games of N agents where any subset of more than half of them can capture a fixed reward and split it among themselves, a setup eerily similar to many situations in corporate governance, politics and many other situations in human life, are part of that set of inherently unstable games. That is to say, if there is a situation with some fixed pool of resources and some currently established mechanism for distributing those resources, and it’s unavoidably possible for 51% of the participants to conspire to seize control of the resources, then no matter what the current configuration is, there is always some conspiracy that can emerge that would be profitable for the participants. However, that conspiracy would then in turn be vulnerable to potential new conspiracies, possibly including a combination of previous conspirators and victims… and so on and so forth.

Round   A      B      C
1       1/3    1/3    1/3
2       1/2    1/2    0
3       2/3    0      1/3
4       0      1/3    2/3


This fact, the instability of majority games under cooperative game theory, is arguably highly underrated as a simplified general mathematical model of why there may well be no “end of history” in politics and no system that proves fully satisfactory; I personally believe it’s much more useful than the more famous Arrow’s theorem, for example.

There are two ways to get around this issue. The first is to try to restrict ourselves to the class of games that are “identity-free” and “collusion-safe”, where we do not need to worry about either bribes or identities. The second is to try to attack the identity and collusion resistance problems directly, and actually solve them well enough that we can implement non-collusion-safe games with the richer properties that they offer.

Identity-free and collusion-safe game design

The class of games that is identity-free and collusion-safe is substantial. Even proof of work is collusion-safe up to the bound of a single actor having ~23.21% of total hashpower, and this bound can be increased up to 50% with clever engineering. Competitive markets are reasonably collusion-safe up until a relatively high bound, which is easily reached in some cases but in other cases is not.

In the case of governance and content curation (both of which are really just special cases of the general problem of identifying public goods and public bads) a major class of mechanism that works well is futarchy - typically portrayed as “governance by prediction market”, though I would also argue that the use of security deposits is fundamentally in the same class of technique. The way futarchy mechanisms, in their most general form, work is that they make “voting” not just an expression of opinion, but also a prediction, with a reward for making predictions that are true and a penalty for making predictions that are false. For example, my proposal for “prediction markets for content curation DAOs” suggests a semi-centralized design where anyone can upvote or downvote submitted content, with content that is upvoted more being more visible, where there is also a “moderation panel” that makes final decisions. For each post, there is a small probability (proportional to the total volume of upvotes+downvotes on that post) that the moderation panel will be called on to make a final decision on the post. If the moderation panel approves a post, everyone who upvoted it is rewarded and everyone who downvoted it is penalized, and if the moderation panel disapproves a post the reverse happens; this mechanism encourages participants to make upvotes and downvotes that try to “predict” the moderation panel’s judgements.
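A stripped-down sketch of that mechanism; the audit probability, the panel, and the reward and penalty sizes are all assumed parameters here rather than the proposal's actual ones.

    import random

    def maybe_audit(post, audit_rate, panel_approves, reward, penalize):
        # A post is escalated to the moderation panel with probability proportional
        # to its total vote volume; voters are then settled against the panel's verdict.
        volume = len(post.upvoters) + len(post.downvoters)
        if random.random() >= audit_rate * volume:
            return  # no audit this time; these votes carry no payout
        approved = panel_approves(post)
        for user in post.upvoters:
            (reward if approved else penalize)(user)
        for user in post.downvoters:
            (penalize if approved else reward)(user)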

Another possible example of futarchy is a governance system for a project with a token, where anyone who votes for a decision is obligated to purchase some quantity of tokens at the price at the time the vote begins if the vote wins; this ensures that voting on a bad decision is costly, and in the limit if a bad decision wins a vote everyone who approved the decision must essentially buy out everyone else in the project. This ensures that an individual vote for a “wrong” decision can be very costly for the voter, precluding the possibility of cheap bribe attacks.



A graphical description of one form of futarchy, creating two markets representing the two "possible future worlds" and picking the one with a more favorable price. Source: this post on ethresear.ch


However, the range of things that mechanisms of this type can do is limited. In the case of the content curation example above, we’re not really solving governance, we’re just scaling the functionality of a governance gadget that is already assumed to be trusted. One could try to replace the moderation panel with a prediction market on the price of a token representing the right to purchase advertising space, but in practice prices are too noisy an indicator to make this viable for anything but a very small number of very large decisions. And often the value that we’re trying to maximize is explicitly something other than the maximum value of a coin.

Let’s take a more explicit look at why, in the more general case where we can’t easily determine the value of a governance decision via its impact on the price of a token, good mechanisms for identifying public goods and bads unfortunately cannot be identity-free or collusion-safe. If one tries to preserve the property of a game being identity-free, building a system where identities don’t matter and only coins do, there is an impossible tradeoff between either failing to incentivize legitimate public goods or over-subsidizing plutocracy.

The argument is as follows. Suppose that there is some author that is producing a public good (eg. a series of blog posts) that provides value to each member of a community of 10000 people. Suppose there exists some mechanism where members of the community can take an action that causes the author to receive a gain of $1. Unless the community members are extremely altruistic, for the mechanism to work the cost of taking this action must be much lower than $1, as otherwise the portion of the benefit captured by the member of the community supporting the author would be much smaller than the cost of supporting the author, and so the system collapses into a tragedy of the commons where no one supports the author. Hence, there must exist a way to cause the author to earn $1 at a cost much less than $1. But now suppose that there is also a fake community, which consists of 10000 fake sockpuppet accounts of the same wealthy attacker. This community takes all of the same actions as the real community, except instead of supporting the author, they support another fake account which is also a sockpuppet of the attacker. If it was possible for a member of the “real community” to give the author $1 at a personal cost of much less than $1, it’s possible for the attacker to give themselves $1 at a cost much less than $1 over and over again, and thereby drain the system’s funding. Any mechanism that can help genuinely under-coordinated parties coordinate will, without the right safeguards, also help already coordinated parties (such as many accounts controlled by the same person) over-coordinate, extracting money from the system.

A similar challenge arises when the goal is not funding, but rather determining what content should be most visible. What content do you think would get more dollar value supporting it: a legitimately high quality blog article benefiting thousands of people but benefiting each individual person relatively slightly, or this?



Or perhaps this?



Those who have been following recent politics “in the real world” might also point out a different kind of content that benefits highly centralized actors: social media manipulation by hostile governments. Ultimately, both centralized systems and decentralized systems are facing the same fundamental problem, which is that the “marketplace of ideas” (and of public goods more generally) is very far from an “efficient market” in the sense that economists normally use the term, and this leads to both underproduction of public goods even in “peacetime” but also vulnerability to active attacks. It’s just a hard problem.

This is also why coin-based voting systems (like Bihu’s) have one major genuine advantage over identity-based systems (like the Gitcoin CLR or the /r/ethtrader donut experiment): at least there is no benefit to buying accounts en masse, because everything you do is proportional to how many coins you have, regardless of how many accounts the coins are split between. However, mechanisms that do not rely on any model of identity and only rely on coins fundamentally cannot solve the problem of concentrated interests outcompeting dispersed communities trying to support public goods; an identity-free mechanism that empowers distributed communities cannot avoid over-empowering centralized plutocrats pretending to be distributed communities.

But it’s not just identity issues that public goods games are vulnerable to; it’s also bribes. To see why, consider again the example above, but where instead of the “fake community” being 10001 sockpuppets of the attacker, the attacker only has one identity, the account receiving funding, and the other 10000 accounts are real users - but users that receive a bribe of $0.01 each to take the action that would cause the attacker to gain an additional $1. As mentioned above, these bribes can be highly obfuscated, even through third-party custodial services that vote on a user’s behalf in exchange for convenience, and in the case of “coin vote” designs an obfuscated bribe is even easier: one can do it by renting coins on the market and using them to participate in votes. Hence, while some kinds of games, particularly prediction market or security deposit based games, can be made collusion-safe and identity-free, generalized public goods funding seems to be a class of problem where collusion-safe and identity-free approaches unfortunately just cannot be made to work.

Collusion resistance and identity

The other alternative is attacking the identity problem head-on. As mentioned above, simply going up to higher-security centralized identity systems, like passports and other government IDs, will not work at scale; in a sufficiently incentivized context, they are very insecure and vulnerable to the issuing governments themselves! Rather, the kind of “identity” we are talking about here is some kind of robust multifactorial set of claims that an actor identified by some set of messages actually is a unique individual. A very early proto-model of this kind of networked identity is arguably social recovery in HTC’s blockchain phone:


The basic idea is that your private key is secret-shared between up to five trusted contacts, in such a way that mathematically ensures that three of them can recover the original key, but two or fewer can’t. This qualifies as an “identity system” - it’s your five friends determining whether or not someone trying to recover your account actually is you. However, it’s a special-purpose identity system trying to solve a problem - personal account security - that is different from (and easier than!) the problem of attempting to identify unique humans. That said, the general model of individuals making claims about each other can quite possibly be bootstrapped into some kind of more robust identity model. These systems could be augmented if desired using the “futarchy” mechanic described above: if someone makes a claim that someone is a unique human, and someone else disagrees, and both sides are willing to put down a bond to litigate the issue, the system can call together a judgement panel to determine who is right.
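For readers unfamiliar with how a 3-of-5 threshold like that works mathematically, here is a textbook Shamir-secret-sharing sketch; it illustrates the general technique only and is not HTC's actual implementation.

    import random

    P = 2**127 - 1  # a prime modulus; a real system would match the field to the key size

    def split(secret, n=5, k=3):
        # random polynomial of degree k-1 with constant term = secret
        coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
        f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        return [(x, f(x)) for x in range(1, n + 1)]

    def recover(shares):
        # Lagrange interpolation at x = 0; any 3 shares suffice, 2 or fewer reveal nothing
        secret = 0
        for xi, yi in shares:
            num, den = 1, 1
            for xj, _ in shares:
                if xj != xi:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            secret = (secret + yi * num * pow(den, -1, P)) % P
        return secret

    shares = split(123456789)
    assert recover(shares[:3]) == 123456789   # any three of the five shares recover the key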

But we also want another crucially important property: we want an identity that you cannot credibly rent or sell. Obviously, we can’t prevent people from making a deal “you send me $50, I’ll send you my key”, but what we can try to do is prevent such deals from being credible - make it so that the seller can easily cheat the buyer and give the buyer a key that doesn’t actually work. One way to do this is to make a mechanism by which the owner of a key can send a transaction that revokes the key and replaces it with another key of the owner’s choice, all in a way that cannot be proven. The obvious difficulty is that on a fully transparent blockchain such a revocation would itself be publicly visible; perhaps the simplest way to get around this is to either use a trusted party that runs the computation and only publishes results (along with zero knowledge proofs proving the results, so the trusted party is trusted only for privacy, not integrity), or decentralize the same functionality through multi-party computation. Such approaches will not solve collusion completely; a group of friends could still come together and sit on the same couch and coordinate votes, but they will at least reduce it to a manageable extent that will not lead to these systems outright failing.

There is a further problem: initial distribution of the key. What happens if a user creates their identity inside a third-party custodial service that then stores the private key and uses it to clandestinely make votes on things? This would be an implicit bribe, the user’s voting power in exchange for providing to the user a convenient service, and what’s more, if the system is secure in that it successfully prevents bribes by making votes unprovable, clandestine voting by third-party hosts would also be undetectable. The only approach that gets around this problem seems to be…. in-person verification. For example, one could have an ecosystem of “issuers” where each issuer issues smart cards with private keys, which the user can immediately download onto their smartphone and send a message to replace the key with a different key that they do not reveal to anyone. These issuers could be meetups and conferences, or potentially individuals that have already been deemed by some voting mechanic to be trustworthy.

Building out the infrastructure for making collusion-resistant mechanisms possible, including robust decentralized identity systems, is a difficult challenge, but if we want to unlock the potential of such mechanisms, it seems unavoidable that we have to do our best to try. It is true that the current computer-security dogma around, for example, introducing online voting is simply “don’t”, but if we want to expand the role of voting-like mechanisms, including more advanced forms such as quadratic voting and quadratic finance, to more roles, we have no choice but to confront the challenge head-on, try really hard, and hopefully succeed at making something secure enough, for at least some use cases.

On Free Speech


“A statement may be both true and dangerous. The previous sentence is such a statement.” - David Friedman

Freedom of speech is a topic that many internet communities have struggled with over the last two decades. Cryptocurrency and blockchain communities, a major part of their raison d’etre being censorship resistance, are especially poised to value free speech very highly, and yet, over the last few years, the extremely rapid growth of these communities and the very high financial and social stakes involved have repeatedly tested the application and the limits of the concept. In this post, I aim to disentangle some of the contradictions, and make a case for what the norm of “free speech” really stands for.

“Free speech laws” vs “free speech”

A common, and in my own view frustrating, argument that I often hear is that “freedom of speech” is exclusively a legal restriction on what governments can act against, and has nothing to say regarding the actions of private entities such as corporations, privately-owned platforms, internet forums and conferences. One of the larger examples of “private censorship” in cryptocurrency communities was the decision of Theymos, the moderator of the /r/bitcoin subreddit, to start heavily moderating the subreddit, forbidding arguments in favor of increasing the Bitcoin blockchain’s transaction capacity via a hard fork.



Here is a timeline of the censorship as catalogued by John Blocke: https://medium.com/@johnblocke/a-brief-and-incomplete-history-of-censorship-in-r-bitcoin-c85a290fe43

Here is Theymos’s post defending his policies: https://www.reddit.com/r/Bitcoin/comments/3h9cq4/its_time_for_a_break_about_the_recent_mess/, including the now infamous line “If 90% of /r/Bitcoin users find these policies to be intolerable, then I want these 90% of /r/Bitcoin users to leave”.

A common strategy used by defenders of Theymos’s censorship was to say that heavy-handed moderation is okay because /r/bitcoin is “a private forum” owned by Theymos, and so he has the right to do whatever he wants in it; those who dislike it should move to other forums:





And it’s true that Theymos has not broken any laws by moderating his forum in this way. But to most people, it’s clear that there is still some kind of free speech violation going on. So what gives? First of all, it’s crucially important to recognize that freedom of speech is not just a law in some countries. It’s also a social principle. And the underlying goal of the social principle is the same as the underlying goal of the law: to foster an environment where the ideas that win are ideas that are good, rather than just ideas that happen to be favored by people in a position of power. And governmental power is not the only kind of power that speech needs to be protected from; there is also a corporation’s power to fire someone, an internet forum moderator’s power to delete almost every post in a discussion thread, and many other kinds of power, hard and soft.

So what is the underlying social principle here? Quoting Eliezer Yudkowsky:

There are a very few injunctions in the human art of rationality that have no ifs, ands, buts, or escape clauses. This is one of them. Bad argument gets counterargument. Does not get bullet. Never. Never ever never for ever.

Slatestarcodex elaborates:

What does “bullet” mean in the quote above? Are other projectiles covered? Arrows? Boulders launched from catapults? What about melee weapons like swords or maces? Where exactly do we draw the line for “inappropriate responses to an argument”? A good response to an argument is one that addresses an idea; a bad response is one that silences it. If you try to address an idea, your success depends on how good the idea is; if you try to silence it, your success depends on how powerful you are and how many pitchforks and torches you can provide on short notice. Shooting bullets is a good way to silence an idea without addressing it. So is firing stones from catapults, or slicing people open with swords, or gathering a pitchfork-wielding mob. But trying to get someone fired for holding an idea is also a way of silencing an idea without addressing it.

That said, sometimes there is a rationale for “safe spaces” where people who, for whatever reason, just don’t want to deal with arguments of a particular type, can congregate and where those arguments actually do get silenced. Perhaps the most innocuous of all is spaces like ethresear.ch where posts get silenced just for being “off topic” to keep the discussion focused. But there’s also a dark side to the concept of “safe spaces”; as Ken White writes:

This may come as a surprise, but I’m a supporter of ‘safe spaces.’ I support safe spaces because I support freedom of association. Safe spaces, if designed in a principled way, are just an application of that freedom… But not everyone imagines “safe spaces” like that. Some use the concept of “safe spaces” as a sword, wielded to annex public spaces and demand that people within those spaces conform to their private norms. That’s not freedom of association

Aha. So making your own safe space off in a corner is totally fine, but there is also this concept of a “public space”, and trying to turn a public space into a safe space for one particular special interest is wrong. So what is a “public space”? It’s definitely clear that a public space is not just “a space owned and/or run by a government”; the concept of privately owned public spaces is a well-established one. This is true even informally: it’s a common moral intuition, for example, that it’s less bad for a private individual to commit violations such as discriminating against races and genders than it is for, say, a shopping mall to do the same. In the case of the /r/bitcoin subreddit, one can make the case, regardless of who technically owns the top moderator position in the subreddit, that the subreddit very much is a public space. A few arguments particularly stand out:

  • It occupies “prime real estate”, specifically the word “bitcoin”, which makes people consider it to be the default place to discuss Bitcoin.
  • The value of the space was created not just by Theymos, but by thousands of people who arrived on the subreddit to discuss Bitcoin with an implicit expectation that it is, and will continue to be, a public space for discussing Bitcoin.
  • Theymos’s shift in policy was a surprise to many people, and it was not foreseeable ahead of time that it would take place.

If, instead, Theymos had created a subreddit called /r/bitcoinsmallblockers, and explicitly said that it was a curated space for small block proponents and attempting to instigate controversial hard forks was not welcome, then it seems likely that very few people would have seen anything wrong about this. They would have opposed his ideology, but few (at least in blockchain communities) would try to claim that it’s improper for people with ideologies opposed to their own to have spaces for internal discussion. But back in reality, Theymos tried to “annex a public space and demand that people within the space conform to his private norms”, and so we have the Bitcoin community block size schism, a highly acrimonious fork and chain split, and now a cold peace between Bitcoin and Bitcoin Cash.

Deplatforming

About a year ago at Deconomy I publicly shouted down Craig Wright, a scammer claiming to be Satoshi Nakamoto, finishing my explanation of why the things he says make no sense with the question “why is this fraud allowed to speak at this conference?”



Of course, Craig Wright’s partisans replied back with…. accusations of censorship:



Did I try to “silence” Craig Wright? I would argue, no. One could argue that this is because “Deconomy is not a public space”, but I think the much better argument is that a conference is fundamentally different from an internet forum. An internet forum can actually try to be a fully neutral medium for discussion where anything goes; a conference, on the other hand, is by its very nature a highly curated list of presentations, allocating a limited number of speaking slots and actively channeling a large amount of attention to those lucky enough to get a chance to speak. A conference is an editorial act by the organizers, saying “here are some ideas and views that we think people really should be exposed to and hear”. Every conference “censors” almost every viewpoint because there’s not enough space to give them all a chance to speak, and this is inherent to the format; so raising an objection to a conference’s judgement in making its selections is absolutely a legitimate act.

This extends to other kinds of selective platforms. Online platforms such as Facebook, Twitter and Youtube already engage in active selection through algorithms that influence what people are more likely to be recommended. Typically, they do this for selfish reasons, setting up their algorithms to maximize “engagement” with their platform, often with unintended byproducts like promoting flat earth conspiracy theories. So given that these platforms are already engaging in (automated) selective presentation, it seems eminently reasonable to criticize them for not directing these same levers toward more pro-social objectives, or at the least pro-social objectives that all major reasonable political tribes agree on (eg. quality intellectual discourse). Additionally, the “censorship” doesn’t seriously block anyone’s ability to learn Craig Wright’s side of the story; you can just go visit their website, here you go: https://coingeek.com/. If someone is already operating a platform that makes editorial decisions, asking them to make such decisions with the same magnitude but with more pro-social criteria seems like a very reasonable thing to do.

A more recent example of this principle at work is the #DelistBSV campaign, where some cryptocurrency exchanges, most famously Binance, removed support for trading BSV (the Bitcoin fork promoted by Craig Wright). Once again, many people, even reasonable people, accused this campaign of being an exercise in censorship, raising parallels to credit card companies blocking Wikileaks:



I personally have been a critic of the power wielded by centralized exchanges. Should I oppose #DelistBSV on free speech grounds? I would argue no, it’s ok to support it, but this is definitely a much closer call.

Many #DelistBSV participants like Kraken are definitely not “anything-goes” platforms; they already make many editorial decisions about which currencies they accept and refuse. Kraken only accepts about a dozen currencies, so they are passively “censoring” almost everyone. Shapeshift supports more currencies but it does not support SPANK, or even KNC. So in these two cases, delisting BSV is more like reallocation of a scarce resource (attention/legitimacy) than it is censorship. Binance is a bit different; it does accept a very large array of cryptocurrencies, adopting a philosophy much closer to anything-goes, and it does have a unique position as market leader with a lot of liquidity.

That said, one can argue two things in Binance’s favor. First of all, the delisting is retaliation against a truly malicious exercise of censorship on the part of core BSV community members, who threatened critics like Peter McCormack with legal letters (see Peter’s response); in “anarchic” environments with large disagreements on what the norms are, “an eye for an eye” in-kind retaliation is one of the better social norms to have because it ensures that people only face punishments that they in some sense have through their own actions demonstrated they believe are legitimate. Furthermore, the delistings won’t make it that hard for people to buy or sell BSV; Coinex has said that they will not delist (and I would actually oppose second-tier “anything-goes” exchanges delisting). But the delistings do send a strong message of social condemnation of BSV, which is useful and needed. So there’s a case to support all delistings so far, though on reflection Binance refusing to delist “because freedom” would also have been less unreasonable than it seems at first glance.

It’s in general absolutely potentially reasonable to oppose the existence of a concentration of power, but support that concentration of power being used for purposes that you consider prosocial as long as that concentration exists; see Bryan Caplan’s exposition on reconciling supporting open borders and also supporting anti-ebola restrictions for an example in a different field. Opposing concentrations of power only requires that one believe those concentrations of power to be on balance harmful and abusive; it does not mean that one must oppose all things that those concentrations of power do.

If someone manages to make a completely permissionless cross-chain decentralized exchange that facilitates trade between any asset and any other asset, then being “listed” on the exchange would not send a social signal, because everyone is listed; and I would support such an exchange existing even if it supports trading BSV. The thing that I do support is BSV being removed from already exclusive positions that confer higher tiers of legitimacy than simple existence.

So to conclude: censorship in public spaces bad, even if the public spaces are non-governmental; censorship in genuinely private spaces (especially spaces that are not “defaults” for a broader community) can be okay; ostracizing projects with the goal and effect of denying access to them, bad; ostracizing projects with the goal and effect of denying them scarce legitimacy can be okay.

Control as Liability


The regulatory and legal environment around internet-based services and applications has changed considerably over the last decade. When large-scale social networking platforms first became popular in the 2000s, the general attitude toward mass data collection was essentially “why not?”. This was the age of Mark Zuckerberg saying the age of privacy is over and Eric Schmidt arguing, “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.” And it made personal sense for them to argue this: every bit of data you can get about others was a potential machine learning advantage for you, every single restriction a weakness, and if something happened to that data, the costs were relatively minor. Ten years later, things are very different.

It is especially worth zooming in on a few particular trends.

  • Privacy. Over the last ten years, a number of privacy laws have been passed, most aggressively in Europe but also elsewhere, the most recent being the GDPR. The GDPR has many parts, but among the most prominent are: (i) requirements for explicit consent, (ii) requirement to have a legal basis to process data, (iii) users’ right to download all their data, (iv) users’ right to require you to delete all their data. Other jurisdictions are exploring similar rules.
  • Data localization rules. India, Russia and many other jurisdictions increasingly have or are exploring rules that require data on users within the country to be stored inside the country. And even when explicit laws do not exist, there’s a growing shift toward concern (eg. 1, 2) around data being moved to countries that are perceived to not sufficiently protect it.
  • Sharing economy regulation. Sharing economy companies such as Uber are having a hard time arguing to courts that, given the extent to which their applications control and direct drivers’ activity, they should not be legally classified as employers.
  • Cryptocurrency regulation. A recent FINCEN guidance attempts to clarify what categories of cryptocurrency-related activity are and are not subject to regulatory licensing requirements in the United States. Running a hosted wallet? Regulated. Running a wallet where the user controls their funds? Not regulated. Running an anonymizing mixing service? If you’re running it, regulated. If you’re just writing code… not regulated.

As Emin Gun Sirer points out, the FINCEN cryptocurrency guidance is not at all haphazard; rather, it’s trying to separate out categories of applications where the developer is actively controlling funds, from applications where the developer has no control. The guidance carefully separates out how multisignature wallets, where keys are held both by the operator and the user, are sometimes regulated and sometimes not:

If the multiple-signature wallet provider restricts its role to creating un-hosted wallets that require adding a second authorization key to the wallet owner’s private key in order to validate and complete transactions, the provider is not a money transmitter because it does not accept and transmit value. On the other hand, if … the value is represented as an entry in the accounts of the provider, the owner does not interact with the payment system directly, or the provider maintains total independent control of the value, the provider will also qualify as a money transmitter.

Although these events are taking place across a variety of contexts and industries, I would argue that there is a common trend at play. And the trend is this: control over users’ data and digital possessions and activity is rapidly moving from an asset to a liability. Before, every bit of control you had was good: it gave you more flexibility to earn revenue, if not now then in the future. Now, every bit of control you have is a liability: you might be regulated because of it. If you exhibit control over your users’ cryptocurrency, you are a money transmitter. If you have “sole discretion over fares, and can charge drivers a cancellation fee if they choose not to take a ride, prohibit drivers from picking up passengers not using the app and suspend or deactivate drivers’ accounts”, you are an employer. If you control your users’ data, you’re required to make sure you can argue just cause, have a compliance officer, and give your users access to download or delete the data.

If you are an application builder, and you are both lazy and fear legal trouble, there is one easy way to make sure that you violate none of the above new rules: don’t build applications that centralize control. If you build a wallet where the user holds their private keys, you really are still “just a software provider”. If you build a “decentralized Uber” that really is just a slick UI combining a payment system, a reputation system and a search engine, and don’t control the components yourself, you really won’t get hit by many of the same legal issues. If you build a website that just… doesn’t collect data (Static web pages? But that’s impossible!) you don’t have to even think about the GDPR.

This kind of approach is of course not realistic for everyone. There will continue to be many cases where going without the conveniences of centralized control simply sacrifices too much for both developers and users, and there are also cases where business model considerations (eg. it’s easier to prevent non-paying users from using software if the software stays on your servers) mandate a more centralized approach. But we’re definitely very far from having explored the full range of possibilities that more decentralized approaches offer.

Generally, unintended consequences of laws, discouraging entire categories of activity when one wanted to only surgically forbid a few specific things, are considered to be a bad thing. Here though, I would argue that the forced shift in developers’ mindsets, from “I want to control more things just in case” to “I want to control fewer things just in case”, also has many positive consequences. Voluntarily giving up control, and voluntarily taking steps to deprive oneself of the ability to do mischief, does not come naturally to many people, and while ideologically-driven decentralization-maximizing projects exist today, it’s not at all obvious at first glance that such services will continue to dominate as the industry mainstreams. What this trend in regulation does, however, is that it gives a big nudge in favor of those applications that are willing to take the centralization-minimizing, user-sovereignty-maximizing “can’t be evil” route.

Hence, even though these regulatory changes are arguably not pro-freedom, at least if one is concerned with the freedom of application developers, and the transformation of the internet into a subject of political focus is bound to have many negative knock-on effects, the particular trend of control becoming a liability is in a strange way even more pro-cypherpunk (even if not intentionally!) than policies of maximizing total freedom for application developers would have been. Though the present-day regulatory landscape is very far from an optimal one from the point of view of almost anyone’s preferences, it has unintentionally dealt the movement for minimizing unneeded centralization and maximizing users’ control of their own assets, private keys and data a surprisingly strong hand to execute on its vision. And it would be highly beneficial to the movement to take advantage of it.

Fast Fourier Transforms


Trigger warning: specialized mathematical topic

Special thanks to Karl Floersch for feedback

One of the more interesting algorithms in number theory is the Fast Fourier transform (FFT). FFTs are a key building block in many algorithms, including extremely fast multiplication of large numbers, multiplication of polynomials, and extremely fast generation and recovery of erasure codes. Erasure codes in particular are highly versatile; in addition to their basic use cases in fault-tolerant data storage and recovery, erasure codes also have more advanced use cases such as securing data availability in scalable blockchains and STARKs. This article will go into what fast Fourier transforms are, and how some of the simpler algorithms for computing them work.

Background

The original Fourier transform is a mathematical operation that is often described as converting data between the "frequency domain" and the "time domain". What this means more precisely is that if you have a piece of data, then running the algorithm would come up with a collection of sine waves with different frequencies and amplitudes that, if you added them together, would approximate the original data. Fourier transforms can be used for such wonderful things as expressing square orbits through epicycles and deriving a set of equations that can draw an elephant:



Ok fine, Fourier transforms also have really important applications in signal processing, quantum mechanics, and other areas, and help make significant parts of the global economy happen. But come on, elephants are cooler.

Running the Fourier transform algorithm in the "inverse" direction would simply take the sine waves and add them together and compute the resulting values at as many points as you wanted to sample.

The kind of Fourier transform we'll be talking about in this post is a similar algorithm, except instead of being a continuous Fourier transform over real or complex numbers, it's a discrete Fourier transform over finite fields (see the "A Modular Math Interlude" section here for a refresher on what finite fields are). Instead of talking about converting between "frequency domain" and "time domain", here we'll talk about two different operations: multi-point polynomial evaluation (evaluating a degree < N polynomial at N different points) and its inverse, polynomial interpolation (given the evaluations of a degree < N polynomial at N different points, recovering the polynomial). For example, if we are operating in the prime field with modulus 5, then the polynomial y = x² + 3 (for convenience we can write the coefficients in increasing order: [3,0,1]) evaluated at the points [0,1,2] gives the values [3,4,2] (not [3, 4, 7] because we're operating in a finite field where the numbers wrap around at 5), and we can actually take the evaluations [3,4,2] and the coordinates they were evaluated at ([0,1,2]) to recover the original polynomial [3,0,1].

There are algorithms for both multi-point evaluation and interpolation that can do either operation in O(N²) time. Multi-point evaluation is simple: just separately evaluate the polynomial at each point. Here's python code for doing that:

def eval_poly_at(poly, x, modulus):
    # Evaluate a polynomial (coefficients given in increasing order of degree)
    # at the point x, in the prime field with the given modulus
    y = 0
    power_of_x = 1
    for coefficient in poly:
        y += power_of_x * coefficient
        power_of_x *= x
    return y % modulus

The algorithm runs a loop going through every coefficient and does one thing for each coefficient, so it runs in O(N) time. Multi-point evaluation involves doing this evaluation at N different points, so the total run time is O(N²).
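
As a quick sanity check, here is a minimal multi-point evaluation wrapper built on the eval_poly_at function above, reproducing the mod-5 example from earlier (the helper name multi_eval is mine, not from any particular codebase):

def multi_eval(poly, xs, modulus):
    # O(N^2): just evaluate the polynomial separately at each of the N points
    return [eval_poly_at(poly, x, modulus) for x in xs]

print(multi_eval([3, 0, 1], [0, 1, 2], 5))   # [3, 4, 2]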

Lagrange interpolation is more complicated (search for "Lagrange interpolation" here for a more detailed explanation). The key building block of the basic strategy is that for any domain D and point x, we can construct a polynomial that returns 1 for x and 0 for any value in D other than x. For example, if D = [1,2,3,4] and x = 1, the polynomial is:

((x - 2) * (x - 3) * (x - 4)) / ((1 - 2) * (1 - 3) * (1 - 4))

You can mentally plug in 1, 2, 3 and 4 to the above expression and verify that it returns 1 for x=1 and 0 in the other three cases.

We can recover the polynomial that gives any desired set of outputs on the given domain by multiplying and adding these polynomials. If we call the above polynomial P_1, and the equivalent ones for x=2, x=3, x=4, P_2, P_3 and P_4, then the polynomial that returns [3,1,4,1] on the domain [1,2,3,4] is simply 3 * P_1 + P_2 + 4 * P_3 + P_4. Computing the P_i polynomials takes O(N²) time (you first construct the polynomial that returns 0 on the entire domain, which takes O(N²) time, then separately divide it by (x - x_i) for each x_i), and computing the linear combination takes another O(N²) time, so it's O(N²) runtime total.
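
Here is a hedged sketch of that O(N²) interpolation strategy in Python, reusing eval_poly_at from above; it assumes a prime modulus (so modular inverses can be computed via Fermat's little theorem), and the helper names are mine rather than code from any existing library:

def poly_mul(a, b, modulus):
    # Multiply two polynomials (coefficients in increasing order of degree)
    o = [0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            o[i + j] = (o[i + j] + av * bv) % modulus
    return o

def poly_div(a, b, modulus):
    # Divide polynomial a by polynomial b, assuming the division is exact
    a, out = a[:], []
    while len(a) >= len(b):
        quot = a[-1] * pow(b[-1], modulus - 2, modulus) % modulus
        out.insert(0, quot)
        for i in range(len(b)):
            a[len(a) - len(b) + i] = (a[len(a) - len(b) + i] - quot * b[i]) % modulus
        a.pop()
    return out

def lagrange_interp(xs, ys, modulus):
    # O(N^2) Lagrange interpolation over a prime field: build the polynomial
    # that is zero on the whole domain, divide it by (x - x_i) to get each
    # indicator polynomial, then scale each one and add them all together
    root = [1]
    for x in xs:
        root = poly_mul(root, [(-x) % modulus, 1], modulus)
    result = [0] * len(xs)
    for x, y in zip(xs, ys):
        numerator = poly_div(root, [(-x) % modulus, 1], modulus)
        scale = y * pow(eval_poly_at(numerator, x, modulus), modulus - 2, modulus) % modulus
        for i, c in enumerate(numerator):
            result[i] = (result[i] + c * scale) % modulus
    return result

print(lagrange_interp([0, 1, 2], [3, 4, 2], 5))   # [3, 0, 1]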

What Fast Fourier transforms let us do, is make both multi-point evaluation and interpolation much faster.

Fast Fourier Transforms

There is a price you have to pay for using this much faster algorithm, which is that you cannot choose any arbitrary field and any arbitrary domain. Whereas with Lagrange interpolation, you could choose whatever x coordinates and y coordinates you wanted, and whatever field you wanted (you could even do it over plain old real numbers), and you could get a polynomial that passes through them, with an FFT, you have to use a finite field, and the domain must be a multiplicative subgroup of the field (that is, a list of powers of some "generator" value). For example, you could use the finite field of integers modulo 337, and for the domain use [1, 85, 148, 111, 336, 252, 189, 226] (that's the powers of 85 in the field, eg. 85³ % 337 = 111; it stops at 226 because the next power of 85 cycles back to 1). Furthermore, the multiplicative subgroup must have size 2^n (there are ways to make it work for numbers of the form 2^m * 3^n and possibly slightly higher prime powers, but then it gets much more complicated and inefficient). The finite field of integers modulo 59, for example, would not work, because there are only multiplicative subgroups of order 2, 29 and 58; 2 is too small to be interesting, and the factor 29 is far too large to be FFT-friendly. The symmetry that comes from multiplicative groups of size 2^n lets us create a recursive algorithm that quite cleverly calculates the results we need from a much smaller amount of work.

To understand the algorithm and why it has a low runtime, it's important to understand the general concept of recursion. A recursive algorithm is an algorithm that has two cases: a "base case" where the input to the algorithm is small enough that you can give the output directly, and the "recursive case" where the required computation consists of some "glue computation" plus one or more uses of the same algorithm to smaller inputs. For example, you might have seen recursive algorithms being used for sorting lists. If you have a list (eg. [1,8,7,4,5,6,3,2,9]), then you can sort it using the following procedure:

  • If the input has one element, then it's already "sorted", so you can just return the input.
  • If the input has more than one element, then separately sort the first half of the list and the second half of the list, and then merge the two sorted sub-lists (call them A and B) as follows. Maintain two counters, apos and bpos, both starting at zero, and maintain an output list, which starts empty. Until either apos or bpos is at the end of the corresponding list, check if A[apos] or B[bpos] is smaller. Whichever is smaller, add that value to the end of the output list, and increase that counter by 1. Once this is done, add the rest of whatever list has not been fully processed to the end of the output list, and return the output list.

Note that the "glue" in the second procedure has runtime O(N): if each of the two sub-lists has N elements, then you need to run through every item in each list once, so it's O(N) computation total. So the algorithm as a whole works by taking a problem of size N, and breaking it up into two problems of size N/2, plus O(N) of "glue" execution. There is a theorem called the Master Theorem that lets us compute the total runtime of algorithms like this. It has many sub-cases, but in the case where you break up an execution of size N into k sub-cases of size N/k with O(N) glue (as is the case here), the result is that the execution takes time O(N * log(N)).
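
To make the recursion-plus-glue pattern concrete, here is a minimal sketch of that sorting procedure in Python (this is just ordinary merge sort, nothing FFT-specific):

def merge_sort(values):
    # Base case: a list with zero or one elements is already sorted
    if len(values) <= 1:
        return values
    # Recursive case: sort each half, then merge them (the O(N) "glue")
    A = merge_sort(values[:len(values) // 2])
    B = merge_sort(values[len(values) // 2:])
    output, apos, bpos = [], 0, 0
    while apos < len(A) and bpos < len(B):
        if A[apos] < B[bpos]:
            output.append(A[apos])
            apos += 1
        else:
            output.append(B[bpos])
            bpos += 1
    # Add whatever remains of the list that was not fully processed
    return output + A[apos:] + B[bpos:]

print(merge_sort([1, 8, 7, 4, 5, 6, 3, 2, 9]))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]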



An FFT works in the same way. We take a problem of size N, break it up into two problems of size N/2, and do O(N) glue work to combine the smaller solutions into a bigger solution, so we get O(N * log(N)) runtime total - much faster than O(N²). Here is how we do it. I'll describe first how to use an FFT for multi-point evaluation (ie. for some domain D and polynomial P, calculate P(x) for every x in D), and it turns out that you can use the same algorithm for interpolation with a minor tweak.

Suppose that we have an FFT where the given domain is the powers of x in some field, where x^(2^k) = 1 (eg. in the case we introduced above, the domain is the powers of 85 modulo 337, and 85⁸ = 1). We have some polynomial, eg. y = 6x⁷ + 2x⁶ + 9x⁵ + 5x⁴ + x³ + 4x² + x + 3 (we'll write it as p = [3, 1, 4, 1, 5, 9, 2, 6]). We want to evaluate this polynomial at each point in the domain, ie. at each of the eight powers of 85. Here is what we do. First, we break up the polynomial into two parts, which we'll call evens and odds: evens = [3, 4, 5, 2] and odds = [1, 1, 9, 6] (or evens = 2x³ + 5x² + 4x + 3 and odds = 6x³ + 9x² + x + 1; yes, this is just taking the even-degree coefficients and the odd-degree coefficients). Now, we note a mathematical observation: p(x) = evens(x²) + x * odds(x²) and p(-x) = evens(x²) - x * odds(x²) (think about this for yourself and make sure you understand it before going further).

Here, we have a nice property: evens and odds are both polynomials half the size of p, and furthermore, the set of possible values of x² is only half the size of the original domain, because there is a two-to-one correspondence: x and -x are both part of D (eg. in our current domain [1, 85, 148, 111, 336, 252, 189, 226], 1 and 336 are negatives of each other, as 336 = -1 % 337, as are (85, 252), (148, 189) and (111, 226)), and x and -x always have the same square. Hence, we can use an FFT to compute the result of evens(x) for every x in the smaller domain consisting of squares of numbers in the original domain ([1, 148, 336, 189]), and we can do the same for odds. And voila, we've reduced a size-N problem into two half-size problems.

The "glue" is relatively easy (and O(N) in runtime): we receive the evaluations of evens and odds as size-N/2 lists, so we simply do p[i] = evens_result[i] + domain[i] * odds_result[i] and p[N/2 + i] = evens_result[i] - domain[i] * odds_result[i] for each index i.

Here's the full code:

def fft(vals, modulus, domain):
    # Base case: a single evaluation of a degree-0 polynomial is just its value
    if len(vals) == 1:
        return vals
    # Recurse: evaluate the even-degree and odd-degree halves of the polynomial
    # over the half-size domain of squares (domain[::2])
    L = fft(vals[::2], modulus, domain[::2])
    R = fft(vals[1::2], modulus, domain[::2])
    o = [0 for i in vals]
    # Glue: p(x) = evens(x²) + x * odds(x²) and p(-x) = evens(x²) - x * odds(x²)
    for i, (x, y) in enumerate(zip(L, R)):
        y_times_root = y*domain[i]
        o[i] = (x+y_times_root) % modulus
        o[i+len(L)] = (x-y_times_root) % modulus
    return o

We can try running it:

>>> fft([3,1,4,1,5,9,2,6], 337, [1, 85, 148, 111, 336, 252, 189, 226])
[31, 70, 109, 74, 334, 181, 232, 4]

And we can check the result; evaluating the polynomial at the position 85, for example, actually does give the result 70. Note that this only works if the domain is "correct"; it needs to be of the form [x**i % modulus for i in range(n)] where x**n == 1.
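
If you want to construct such a domain yourself, here is one way to do it; this is a hedged sketch (make_domain is my own helper name), and it assumes n is a power of two that divides modulus - 1:

def make_domain(modulus, n):
    # Find an element of multiplicative order exactly n in the prime field,
    # then return its successive powers
    assert (modulus - 1) % n == 0
    for candidate in range(2, modulus):
        x = pow(candidate, (modulus - 1) // n, modulus)
        # x's order divides n; since n is a power of two, the order is
        # exactly n as long as x^(n/2) != 1
        if pow(x, n // 2, modulus) != 1:
            return [pow(x, i, modulus) for i in range(n)]

print(make_domain(337, 8))   # [1, 85, 148, 111, 336, 252, 189, 226]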

An inverse FFT is surprisingly simple:

def inverse_fft(vals, modulus, domain):
    vals = fft(vals, modulus, domain)
    return [x * modular_inverse(len(vals), modulus) % modulus for x in [vals[0]] + vals[1:][::-1]]

Basically, run the FFT again, but reverse the result (except the first item stays in place) and divide every value by the length of the list.

>>> domain = [1, 85, 148, 111, 336, 252, 189, 226]
>>> def modular_inverse(x, n): return pow(x, n - 2, n)
>>> values = fft([3,1,4,1,5,9,2,6], 337, domain)
>>> values
[31, 70, 109, 74, 334, 181, 232, 4]
>>> inverse_fft(values, 337, domain)
[3, 1, 4, 1, 5, 9, 2, 6]

Now, what can we use this for? Here's one fun use case: we can use FFTs to multiply numbers very quickly. Suppose we wanted to multiply 1253 by 1895. Here is what we would do. First, we would convert the problem into one that turns out to be slightly easier: multiply the polynomials [3, 5, 2, 1] by [5, 9, 8, 1] (that's just the digits of the two numbers in increasing order), and then convert the answer back into a number by doing a single pass to carry over tens digits. We can multiply polynomials with FFTs quickly, because it turns out that if you convert a polynomial into evaluation form (ie. f(x) for every x in some domain D), then you can multiply two polynomials simply by multiplying their evaluations. So what we'll do is take the polynomials representing our two numbers in coefficient form, use FFTs to convert them to evaluation form, multiply them pointwise, and convert back:

>>> p1 = [3,5,2,1,0,0,0,0]
>>> p2 = [5,9,8,1,0,0,0,0]
>>> x1 = fft(p1, 337, domain)
>>> x1
[11, 161, 256, 10, 336, 100, 83, 78]
>>> x2 = fft(p2, 337, domain)
>>> x2
[23, 43, 170, 242, 3, 313, 161, 96]
>>> x3 = [(v1 * v2) % 337 for v1, v2 in zip(x1, x2)]
>>> x3
[253, 183, 47, 61, 334, 296, 220, 74]
>>> inverse_fft(x3, 337, domain)
[15, 52, 79, 66, 30, 10, 1, 0]

This requires three FFTs (each O(N * log(N)) time) and one pointwise multiplication (O(N) time), so it takes O(N * log(N)) time altogether (technically a little bit more than O(N * log(N)), because for very big numbers you would need to replace 337 with a bigger modulus, and that would make multiplication harder, but close enough). This is much faster than schoolbook multiplication, which takes O(N²) time:

     3  5  2  1
   ------------
5 | 15 25 10  5
9 |    27 45 18  9
8 |       24 40 16  8
1 |           3  5  2  1
   ---------------------
    15 52 79 66 30 10  1

So now we just take the result, and carry the tens digits over (this is a "walk through the list once and do one thing at each point" algorithm so it takes O(N) time):

[15, 52, 79, 66, 30, 10, 1, 0]
[ 5, 53, 79, 66, 30, 10, 1, 0]
[ 5,  3, 84, 66, 30, 10, 1, 0]
[ 5,  3,  4, 74, 30, 10, 1, 0]
[ 5,  3,  4,  4, 37, 10, 1, 0]
[ 5,  3,  4,  4,  7, 13, 1, 0]
[ 5,  3,  4,  4,  7,  3, 2, 0]

And if we read the final row's digits from right to left (remember, they are stored with the least significant digit first), we get 2374435. Let's check the answer....

>>> 1253 * 1895
2374435

Yay! It worked. In practice, on such small inputs, the difference between O(N * log(N)) and O(N²) isn't that large, so schoolbook multiplication is faster than this FFT-based multiplication process just because the algorithm is simpler, but on large inputs it makes a really big difference.
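
As an aside, the digit-carrying pass above is short enough to write out; a minimal sketch (carry_and_combine is my own name for it):

def carry_and_combine(digits):
    # Digits are stored least-significant first; push each entry's tens over
    # to the next entry, then combine everything into a single integer. O(N).
    digits = list(digits)
    for i in range(len(digits) - 1):
        digits[i + 1] += digits[i] // 10
        digits[i] %= 10
    return int(''.join(str(d) for d in reversed(digits)))

print(carry_and_combine([15, 52, 79, 66, 30, 10, 1, 0]))   # 2374435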

But FFTs are useful not just for multiplying numbers; as mentioned above, polynomial multiplication and multi-point evaluation are crucially important operations in implementing erasure coding, which is a very important technique for building many kinds of redundant fault-tolerant systems. If you like fault tolerance and you like efficiency, FFTs are your friend.

FFTs and binary fields

Prime fields are not the only kind of finite field out there. Another kind of finite field (really a special case of the more general concept of extension fields, which are kind of like the finite-field equivalent of complex numbers) are binary fields. In a binary field, each element is expressed as a polynomial where all of the entries are 0 or 1, eg. x³ + x + 1. Adding polynomials is done modulo 2, and subtraction is the same as addition (as -1 = 1 mod 2). We select some irreducible polynomial as a modulus (eg. x⁴ + x + 1; x⁴ + 1 would not work because x⁴ + 1 can be factored into (x² + 1) * (x² + 1) so it's not "irreducible"); multiplication is done modulo that modulus. For example, in the binary field mod x⁴ + x + 1, multiplying x² + 1 by x³ + 1 would give x⁵ + x³ + x² + 1 if you just do the multiplication, but x⁵ + x³ + x² + 1 = (x⁴ + x + 1) * x + (x³ + x + 1), so the result is the remainder x³ + x + 1.
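
Here is a hedged sketch of this multiplication rule in Python, representing each field element as an integer whose bits are the polynomial's coefficients (eg. x³ + 1 -> 0b1001); binary_field_multiply is my own helper name, not code from the article's repository:

def binary_field_multiply(a, b, modulus):
    # Carry-less "schoolbook" multiplication: adding polynomials mod 2 is XOR
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1       # multiply a by x
        b >>= 1
    # Reduce the result modulo the irreducible modulus polynomial
    while result.bit_length() >= modulus.bit_length():
        shift = result.bit_length() - modulus.bit_length()
        result ^= modulus << shift
    return result

print(binary_field_multiply(0b1001, 0b101, 0b10011))   # 11, ie. x³ + x + 1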

We can express this example as a multiplication table. First multiply [1, 0, 0, 1] (ie. x³ + 1) by [1, 0, 1] (ie. x² + 1):

    1 0 0 1
   --------
1 | 1 0 0 1
0 |   0 0 0 0
1 |     1 0 0 1
   ------------
    1 0 1 1 0 1

The multiplication result contains an x⁵ term so we can subtract (x⁴ + x + 1) * x:

    1 0 1 1 0 1
  -   1 1 0 0 1    [(x⁴ + x + 1) shifted right by one to reflect being multiplied by x]
   ------------
    1 1 0 1 0 0 

And we get the result, [1, 1, 0, 1] (or x³ + x + 1).



Addition and multiplication tables for the binary field mod x⁴ + x + 1. Field elements are expressed as integers converted from binary (eg. x³ + x² -> 1100 -> 12)

Binary fields are interesting for two reasons. First of all, if you want to erasure-code binary data, then binary fields are really convenient because N bytes of data can be directly encoded as a binary field element, and any binary field elements that you generate by performing computations on it will also be N bytes long. You cannot do this with prime fields because prime fields' size is not exactly a power of two; for example, you could encode every 2 bytes as a number from 0...65535 in the prime field modulo 65537 (which is prime), but if you do an FFT on these values, then the output could contain 65536, which cannot be expressed in two bytes. Second, the fact that addition and subtraction become the same operation, and 1 + 1 = 0, creates some "structure" which leads to some very interesting consequences. One particularly interesting, and useful, oddity of binary fields is the "freshman's dream" theorem: (x+y)² = x² + y² (and the same for exponents 4, 8, 16... basically any power of two).
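
We can quickly check the "freshman's dream" identity with the binary_field_multiply sketch from above (again, just an illustration; addition in a binary field is XOR):

import random

modulus = 0b10011   # x⁴ + x + 1
for _ in range(10):
    x, y = random.randrange(16), random.randrange(16)
    lhs = binary_field_multiply(x ^ y, x ^ y, modulus)
    rhs = binary_field_multiply(x, x, modulus) ^ binary_field_multiply(y, y, modulus)
    assert lhs == rhs   # (x+y)² = x² + y² holds for every pair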

But if you want to use binary fields for erasure coding, and do so efficiently, then you need to be able to do Fast Fourier transforms over binary fields. But then there is a problem: in a binary field, there are no (nontrivial) multiplicative groups of order 2^n. This is because the multiplicative groups are all of order 2^n - 1. For example, in the binary field with modulus x⁴ + x + 1, if you start calculating successive powers of x+1, you cycle back to 1 after 15 steps - not 16. The reason is that the total number of elements in the field is 16, but one of them is zero, and you're never going to reach zero by multiplying any nonzero value by itself in a field, so the powers of x+1 cycle through every element but zero, so the cycle length is 15, not 16. So what do we do?

The reason we needed the domain to have the "structure" of a multiplicative group with 2^n elements before is that we needed to reduce the size of the domain by a factor of two by squaring each number in it: the domain [1, 85, 148, 111, 336, 252, 189, 226] gets reduced to [1, 148, 336, 189] because 1 is the square of both 1 and 336, 148 is the square of both 85 and 252, and so forth. But what if in a binary field there's a different way to halve the size of a domain? It turns out that there is: given a domain containing 2^k values, including zero (technically the domain must be a subspace), we can construct a half-sized new domain D' by taking x * (x+k) for x in D using some specific k in D. Because the original domain is a subspace, since k is in the domain, any x in the domain has a corresponding x+k also in the domain, and the function f(x) = x * (x+k) returns the same value for x and x+k so we get the same kind of two-to-one correspondence that squaring gives us.

x          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
x * (x+1)  0  0  6  6  7  7  1  1  4  4  2  2  3  3  5  5

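
This table can be reproduced with the binary_field_multiply sketch from earlier (here k = 1, and "+" in a binary field is XOR):

print([binary_field_multiply(x, x ^ 1, 0b10011) for x in range(16)])
# [0, 0, 6, 6, 7, 7, 1, 1, 4, 4, 2, 2, 3, 3, 5, 5]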

So now, how do we do an FFT on top of this? We'll use the same trick, converting a problem with an N-sized polynomial and N-sized domain into two problems each with an N/2-sized polynomial and N/2-sized domain, but this time using different equations. We'll convert a polynomial p into two polynomials evens and odds such that p(x) = evens(x*(k-x)) + x * odds(x*(k-x)). Note that for the evens and odds that we find, it will also be true that p(x+k) = evens(x*(k-x)) + (x+k) * odds(x*(k-x)). So we can then recursively do an FFT to evens and odds on the reduced domain [x*(k-x) for x in D], and then we use these two formulas to get the answers for two "halves" of the domain, one offset by k from the other.

Converting p into evens and odds as described above turns out to itself be nontrivial. The "naive" algorithm for doing this is itself O(N²), but it turns out that in a binary field, we can use the fact that (x² - kx)² = x⁴ - k² * x², and more generally (x² - kx)^(2^i) = x^(2^(i+1)) - k^(2^i) * x^(2^i), to create yet another recursive algorithm to do this in O(N * log(N)) time.

And if you want to do an inverse FFT, to do interpolation, then you need to run the steps in the algorithm in reverse order. You can find the complete code for doing this here: https://github.com/ethereum/research/tree/master/binary_fft, and a paper with details on more optimal algorithms here: http://www.math.clemson.edu/~sgao/papers/GM10.pdf

So what do we get from all of this complexity? Well, we can try running the implementation, which features both a "naive" O(N²) multi-point evaluation and the optimized FFT-based one, and time both. Here are my results:

>>> import binary_fft as b
>>> import time, random
>>> f = b.BinaryField(1033)
>>> poly = [random.randrange(1024) for i in range(1024)]
>>> a = time.time(); x1 = b._simple_ft(f, poly); time.time() - a
0.5752472877502441
>>> a = time.time(); x2 = b.fft(f, poly, list(range(1024))); time.time() - a
0.03820443153381348

And as the size of the polynomial gets larger, the naive implementation (_simple_ft) gets slower much more quickly than the FFT:

>>> f = b.BinaryField(2053)
>>> poly = [random.randrange(2048) for i in range(2048)]
>>> a = time.time(); x1 = b._simple_ft(f, poly); time.time() - a
2.2243144512176514
>>> a = time.time(); x2 = b.fft(f, poly, list(range(2048))); time.time() - a
0.07896280288696289

And voila, we have an efficient, scalable way to multi-point evaluate and interpolate polynomials. If we want to use FFTs to recover erasure-coded data where we are missing some pieces, then algorithms for this also exist, though they are somewhat less efficient than just doing a single FFT. Enjoy!

Sidechains vs Plasma vs Sharding


Special thanks to Jinglan Wang for review and feedback

One question that often comes up is: how exactly is sharding different from sidechains or Plasma? All three architectures seem to involve a hub-and-spoke architecture with a central “main chain” that serves as the consensus backbone of the system, and a set of “child” chains containing actual user-level transactions. Hashes from the child chains are usually periodically published into the main chain (sharded chains with no hub are theoretically possible but haven’t been done so far; this article will not focus on them, but the arguments are similar). Given this fundamental similarity, why go with one approach over the others?

Distinguishing sidechains from Plasma is simple. Plasma chains are sidechains that have a non-custodial property: if there is any error in the Plasma chain, then the error can be detected, and users can safely exit the Plasma chain and prevent the attacker from doing any lasting damage. The only cost that users suffer is that they must wait for a challenge period and pay some higher transaction fees on the (non-scalable) base chain. Regular sidechains do not have this safety property, so they are less secure. However, designing Plasma chains is in many cases much harder, and one could argue that for many low-value applications the security is not worth the added complexity.

So what about Plasma versus sharding? The key technical difference has to do with the notion of tight coupling. Tight coupling is a property of sharding, but NOT a property of sidechains or Plasma, that says that the validity of the main chain (“beacon chain” in ethereum 2.0) is inseparable from the validity of the child chains. That is, a child chain block that specifies an invalid main chain block as a dependency is by definition invalid, and more importantly a main chain block that includes an invalid child chain block is by definition invalid.

In non-sharded blockchains, this idea that the canonical chain (ie. the chain that everyone accepts as representing the “real” history) is by definition fully available and valid also applies; for example in the case of Bitcoin and Ethereum one typically says that the canonical chain is the “longest valid chain” (or, more pedantically, the “heaviest valid and available chain”). In sharded blockchains, this idea that the canonical chain is the heaviest valid and available chain by definition also applies, with the validity and availability requirement applying to both the main chain and shard chains. The new challenge that a sharded system has, however, is that users have no way of fully verifying the validity and availability of any given chain directly, because there is too much data. The challenge of engineering sharded chains is to get around this limitation by giving users a maximally trustless and practical indirect means to verify which chains are fully available and valid, so that they can still determine which chain is canonical. In practice, this includes techniques like committees, SNARKs/STARKs, fisherman schemes and fraud and data availability proofs.

If a chain structure does not have this tight-coupling property, then it is arguably not a layer-1 sharding scheme, but rather a layer-2 system sitting on top of a non-scalable layer-1 chain. Plasma is not a tightly-coupled system: an invalid Plasma block absolutely can have its header be committed into the main Ethereum chain, because the Ethereum base layer has no idea that it represents an invalid Plasma block, or even that it represents a Plasma block at all; all that it sees is a transaction containing a small piece of data. However, the consequences of a single Plasma chain failing are localized to within that Plasma chain.

  • Sharding: try really hard to ensure total validity/availability of every part of the system
  • Plasma: accept local faults, but try to limit their consequences


However, if you try to analyze the process of how users perform the “indirect validation” procedure to determine if the chain they are looking at is fully valid and available without downloading and executing the whole thing, one can find more similarities with how Plasma works. For example, a common technique used to prevent availability issues is fishermen: if a node sees a given piece of a block as unavailable, it can publish a challenge claiming this, creating a time period within which anyone can publish that piece of data. If a challenge goes unanswered for long enough, the block and all blocks that cite it as a dependency can be reverted. This seems fundamentally similar to Plasma, where if a block is unavailable users can publish a message to the main chain to exit their state in response. Both techniques eventually buckle under pressure in the same way: if there are too many false challenges in a sharded system, then users cannot keep track of whether or not all of the availability challenges have been answered, and if there are too many availability challenges in a Plasma system then the main chain could get overwhelmed as the exits fill up the chain’s block size limit. In both cases, it seems like there’s a system that has nominally O(C^2) scalability (where C is the computing power of one node) but where scalability falls to O(C) in the event of an attack. However, sharding has more defenses against this.

First of all, modern sharded designs use randomly sampled committees, so one cannot easily dominate even one committee enough to produce a fake block unless one has a large portion (perhaps >1/3) of the entire validator set of the chain. Second, there are better strategies for handling data availability than fishermen: data availability proofs. In a scheme using data availability proofs, if a block is unavailable, then clients’ data availability checks will fail and clients will see that block as unavailable. If the block is invalid, then even a single fraud proof will convince them of this fact for an entire block. An O(1)-sized fraud proof can convince a client of the invalidity of an O(C)-sized block, and so O(C) data suffices to convince a client of the invalidity of O(C^2) data (this is in the worst case where the client is dealing with N sister blocks all with the same parent of which only one is valid; in more likely cases, one single fraud proof suffices to prove invalidity of an entire invalid chain). Hence, sharded systems are theoretically less vulnerable to being overwhelmed by denial-of-service attacks than Plasma chains.

Second, sharded chains provide stronger guarantees in the face of large and majority attackers (with more than 1/3 or even 1/2 of the validator set). A Plasma chain can always be successfully attacked by a 51% attack on the main chain that censors exits; a sharded chain cannot. This is because data availability proofs and fraud proofs happen inside the client, rather than inside the chain, so they cannot be censored by 51% attacks. Third, the defenses provided by sharded chains are easier to generalize; Plasma’s model of exits requires state to be separated into discrete pieces each of which is in the interest of any single actor to maintain, whereas sharded chains relying on data availability proofs, fraud proofs, fishermen and random sampling are theoretically universal.

So there really is a large difference between validity and availability guarantees that are provided at layer 2, which are limited and more complex as they require explicit reasoning about incentives and which party has an interest in which pieces of state, and guarantees that are provided by a layer 1 system that is committed to fully satisfying them.

But Plasma chains have large advantages too. First, they can be iterated and new designs can be implemented more quickly, as each Plasma chain can be deployed separately without coordinating with the rest of the ecosystem. Second, sharding is inherently more fragile, as it attempts to guarantee absolute and total availability and validity of some quantity of data, and this quantity must be set in the protocol; too little, and the system has less scalability than it could have had; too much, and the entire system risks breaking. The maximum safe level of scalability also depends on the number of users of the system, which is an unpredictable variable. Plasma chains, on the other hand, allow different users to make different tradeoffs in this regard, and allow users to adjust more flexibly to changes in circumstances.

Single-operator Plasma chains can also be used to offer more privacy than sharded systems, where all data is public. Even where privacy is not desired, they are potentially more efficient, because the total data availability requirement of sharded systems requires a large extra level of redundancy as a safety margin. In Plasma systems, on the other hand, data requirements for each piece of data can be minimized, to the point where in the long term each individual piece of data may only need to be replicated a few times, rather than a thousand times as is the case in sharded systems.

Hence, in the long term, a hybrid system where a sharded base layer exists, and Plasma chains exist on top of it to provide further scalability, seems like the most likely approach, more able to serve different groups of users’ needs than sole reliance on one strategy or the other. And it is unfortunately not the case that at a sufficient level of advancement Plasma and sharding collapse into the same design; the two are in some key ways irreducibly different (eg. the data availability checks made by clients in sharded systems cannot be moved to the main chain in Plasma because these checks only work if they are done subjectively and based on private information). But both scalability solutions (as well as state channels!) have a bright future ahead of them.

The Dawn of Hybrid Layer 2 Protocols


Special thanks to the Plasma Group team for review and feedback

Current approaches to layer 2 scaling - basically, Plasma and state channels - are increasingly moving from theory to practice, but at the same time it is becoming easier to see the inherent challenges in treating these techniques as a fully fledged scaling solution for Ethereum. Ethereum was arguably successful in large part because of its very easy developer experience: you write a program, publish the program, and anyone can interact with it. Designing a state channel or Plasma application, on the other hand, relies on a lot of explicit reasoning about incentives and application-specific development complexity. State channels work well for specific use cases such as repeated payments between the same two parties and two-player games (as successfully implemented in Celer), but more generalized usage is proving challenging. Plasma, particularly Plasma Cash, can work well for payments, but generalization similarly incurs challenges: even implementing a decentralized exchange requires clients to store much more history data, and generalizing to Ethereum-style smart contracts on Plasma seems extremely difficult.

But at the same time, there is a resurgence of a forgotten category of "semi-layer-2" protocols - a category which promises less extreme gains in scaling, but with the benefit of much easier generalization and more favorable security models. A long-forgotten blog post from 2014 introduced the idea of "shadow chains", an architecture where block data is published on-chain, but blocks are not verified by default. Rather, blocks are tentatively accepted, and only finalized after some period of time (eg. 2 weeks). During those 2 weeks, a tentatively accepted block can be challenged; only then is the block verified, and if the block proves to be invalid then the chain from that block on is reverted, and the original publisher's deposit is penalized. The contract does not keep track of the full state of the system; it only keeps track of the state root, and users themselves can calculate the state by processing the data submitted to the chain from start to head. A more recent proposal, ZK Rollup, does the same thing without challenge periods, by using ZK-SNARKs to verify blocks' validity.


Anatomy of a ZK Rollup package that is published on-chain. Hundreds of "internal transactions" that affect the state (ie. account balances) of the ZK Rollup system are compressed into a package that contains ~10 bytes per internal transaction that specifies the state transitions, plus a ~100-300 byte SNARK proving that the transitions are all valid.
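
To make the shadow chain / optimistic rollup pattern described above concrete, here is a deliberately simplified sketch of the on-chain contract's bookkeeping; every name and parameter here is an illustrative assumption of mine, not a description of any real implementation:

import time

class ToyOptimisticRollup:
    CHALLENGE_PERIOD = 14 * 24 * 3600   # eg. two weeks

    def __init__(self, genesis_root, transition, deposit=10):
        # transition is the state transition function: (state_root, batch) -> new state_root
        self.transition = transition
        self.deposit = deposit
        # The contract only tracks state roots, batch data and who published them
        self.pending = [(genesis_root, None, None, 0)]

    def publish(self, batch_data, claimed_root, publisher):
        # A batch is accepted tentatively, without being verified on-chain
        self.pending.append((claimed_root, batch_data, publisher, time.time()))

    def challenge(self, index):
        # Anyone can re-execute the published data; if the claimed root is wrong,
        # revert from that block onward and penalize the publisher's deposit
        claimed_root, batch_data, publisher, published_at = self.pending[index]
        if time.time() - published_at > self.CHALLENGE_PERIOD:
            return "too late: that block is already finalized"
        prev_root = self.pending[index - 1][0]
        if self.transition(prev_root, batch_data) == claimed_root:
            return "challenge failed: the block is valid"
        self.pending = self.pending[:index]
        return "reverted; %s loses a %d ETH deposit" % (publisher, self.deposit)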


In both cases, the main chain is used to verify data availability, but does not (directly) verify block validity or perform any significant computation, unless challenges are made. This technique is thus not a jaw-droppingly huge scalability gain, because the on-chain data overhead eventually presents a bottleneck, but it is nevertheless a very significant one. Data is cheaper than computation, and there are ways to compress transaction data very significantly, particularly because the great majority of data in a transaction is the signature and many signatures can be compressed into one through many forms of aggregation. ZK Rollup promises 500 tx/sec, a 30x gain over the Ethereum chain itself, by compressing each transaction to a mere ~10 bytes; signatures do not need to be included because their validity is verified by the zero-knowledge proof. With BLS aggregate signatures a similar throughput can be achieved in shadow chains (more recently called "optimistic rollup" to highlight its similarities to ZK Rollup). The upcoming Istanbul hard fork will reduce the gas cost of data from 68 per byte to 16 per byte, increasing the throughput of these techniques by another 4x (that's over 2000 transactions per second).





So what is the benefit of data on-chain techniques such as ZK/optimistic rollup versus data off-chain techniques such as Plasma? First of all, there is no need for semi-trusted operators. In ZK Rollup, because validity is verified by cryptographic proofs there is literally no way for a package submitter to be malicious (depending on the setup, a malicious submitter may cause the system to halt for a few seconds, but this is the most harm that can be done). In optimistic rollup, a malicious submitter can publish a bad block, but the next submitter will immediately challenge that block before publishing their own. In both ZK and optimistic rollup, enough data is published on chain to allow anyone to compute the complete internal state, simply by processing all of the submitted deltas in order, and there is no "data withholding attack" that can take this property away. Hence, becoming an operator can be fully permissionless; all that is needed is a security deposit (eg. 10 ETH) for anti-spam purposes.

Second, optimistic rollup particularly is vastly easier to generalize; the state transition function in an optimistic rollup system can be literally anything that can be computed within the gas limit of a single block (including the Merkle branches providing the parts of the state needed to verify the transition). ZK Rollup is theoretically generalizable in the same way, though in practice making ZK SNARKs over general-purpose computation (such as EVM execution) is very difficult, at least for now. Third, optimistic rollup is much easier to build clients for, as there is less need for second-layer networking infrastructure; more can be done by just scanning the blockchain.

But where do these advantages come from? The answer lies in a highly technical issue known as the data availability problem (see note, video). Basically, there are two ways to try to cheat in a layer-2 system. The first is to publish invalid data to the blockchain. The second is to not publish data at all (eg. in Plasma, publishing the root hash of a new Plasma block to the main chain but without revealing the contents of the block to anyone). Published-but-invalid data is very easy to deal with, because once the data is published on-chain there are multiple ways to figure out unambiguously whether or not it's valid, and an invalid submission is unambiguously invalid so the submitter can be heavily penalized. Unavailable data, on the other hand, is much harder to deal with, because even though unavailability can be detected if challenged, one cannot reliably determine whose fault the non-publication is, especially if data is withheld by default and revealed on-demand only when some verification mechanism tries to verify its availability. This is illustrated in the "Fisherman's dilemma", which shows how a challenge-response game cannot distinguish between malicious submitters and malicious challengers:



Fisherman's dilemma. If you only start watching the given specific piece of data at time T3, you have no idea whether you are living in Case 1 or Case 2, and hence who is at fault.


Plasma and channels both work around the fisherman's dilemma by pushing the problem to users: if you as a user decide that another user you are interacting with (a counterparty in a state channel, an operator in a Plasma chain) is not publishing data to you that they should be publishing, it's your responsibility to exit and move to a different counterparty/operator. The fact that you as a user have all of the previous data, and data about all of the transactions you signed, allows you to prove to the chain what assets you held inside the layer-2 protocol, and thus safely bring them out of the system. You prove the existence of a (previously agreed) operation that gave the asset to you, and since no one else can prove the existence of an operation approved by you that sent the asset to someone else, you get the asset.

The technique is very elegant. However, it relies on a key assumption: that every state object has a logical "owner", and the state of the object cannot be changed without the owner's consent. This works well for UTXO-based payments (but not account-based payments, where you can edit someone else's balance upward without their consent; this is why account-based Plasma is so hard), and it can even be made to work for a decentralized exchange, but this "ownership" property is far from universal. Some applications (eg. Uniswap) don't have a natural owner, and even in those applications that do, there are often multiple people that can legitimately make edits to the object. And there is no way to allow arbitrary third parties to exit an asset without introducing the possibility of denial-of-service (DoS) attacks, precisely because one cannot prove whether the publisher or the challenger is at fault.

There are other issues peculiar to Plasma and channels individually. Channels do not allow off-chain transactions to users that are not already part of the channel (argument: suppose there existed a way to send $1 to an arbitrary new user from inside a channel. Then this technique could be used many times in parallel to send $1 to more users than there are funds in the system, already breaking its security guarantee). Plasma requires users to store large amounts of history data, which gets even bigger when different assets can be intertwined (eg. when an asset is transferred conditional on transfer of another asset, as happens in a decentralized exchange with a single-stage order book mechanism).

Because data-on-chain computation-off-chain layer 2 techniques don't have data availability issues, they have none of these weaknesses. ZK and optimistic rollup take great care to put enough data on chain to allow users to calculate the full state of the layer 2 system, ensuring that if any participant disappears a new one can trivially take their place. The only issue that they have is verifying computation without doing the computation on-chain, which is a much easier problem. And the scalability gains are significant: ~10 bytes per transaction in ZK Rollup, and a similar level of scalability can be achieved in optimistic rollup by using BLS aggregation to aggregate signatures. This corresponds to a theoretical maximum of ~500 transactions per second today, and over 2000 post-Istanbul.





But what if you want more scalability? Then there is a large middle ground between data-on-chain layer 2 and data-off-chain layer 2 protocols, with many hybrid approaches that give you some of the benefits of both. To give a simple example, the history storage blowup in a decentralized exchange implemented on Plasma Cash can be prevented by publishing a mapping of which orders are matched with which orders (that's less than 4 bytes per order) on chain:


Left: History data a Plasma Cash user needs to store if they own 1 coin. Middle: History data a Plasma Cash user needs to store if they own 1 coin that was exchanged with another coin using an atomic swap. Right: History data a Plasma Cash user needs to store if the order matching is published on chain.


Even outside of the decentralized exchange context, the amount of history that users need to store in Plasma can be reduced by having the Plasma chain periodically publish some per-user data on-chain. One could also imagine a platform which works like Plasma in the case where some state does have a logical "owner" and works like ZK or optimistic rollup in the case where it does not. Plasma developers are already starting to work on these kinds of optimizations.

There is thus a strong case to be made for developers of layer 2 scalability solutions to be more willing to publish per-user data on-chain at least some of the time: it greatly increases ease of development, generality and security, and reduces per-user load (eg. users no longer need to store history data). The efficiency losses of doing so are also overstated: even in a fully off-chain layer-2 architecture, users depositing, withdrawing and moving between different counterparties and providers is going to be an inevitable and frequent occurrence, and so there will be a significant amount of per-user on-chain data regardless. The hybrid route opens the door to a relatively fast deployment of fully generalized Ethereum-style smart contracts inside a quasi-layer-2 architecture.

See also:

Understanding PLONK


Special thanks to Justin Drake, Karl Floersch, Hsiao-wei Wang, Barry Whitehat, Dankrad Feist, Kobi Gurkan and Zac Williamson for review

Very recently, Ariel Gabizon, Zac Williamson and Oana Ciobotaru announced a new general-purpose zero-knowledge proof scheme called PLONK, standing for the unwieldy quasi-backronym "Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge". While improvements to general-purpose zero-knowledge proof protocols have been coming for years, what PLONK (and the earlier but more complex SONIC and the more recent Marlin) bring to the table is a series of enhancements that may greatly improve the usability and progress of these kinds of proofs in general.

The first improvement is that while PLONK still requires a trusted setup procedure similar to that needed for the SNARKs in Zcash, it is a "universal and updateable" trusted setup. This means two things: first, instead of there being one separate trusted setup for every program you want to prove things about, there is one single trusted setup for the whole scheme after which you can use the scheme with any program (up to some maximum size chosen when making the setup). Second, there is a way for multiple parties to participate in the trusted setup such that it is secure as long as any one of them is honest, and this multi-party procedure is fully sequential: first one person participates, then the second, then the third... The full set of participants does not even need to be known ahead of time; new participants could just add themselves to the end. This makes it easy for the trusted setup to have a large number of participants, making it quite safe in practice.

The second improvement is that the "fancy cryptography" it relies on is one single standardized component, called a "polynomial commitment". PLONK uses "Kate commitments", based on a trusted setup and elliptic curve pairings, but you can instead swap it out with other schemes, such as FRI (which would turn PLONK into a kind of STARK) or DARK (based on hidden-order groups). This means the scheme is theoretically compatible with any (achievable) tradeoff between proof size and security assumptions.


What this means is that use cases that require different tradeoffs between proof size and security assumptions (or developers that have different ideological positions about this question) can still share the bulk of the same tooling for "arithmetization" - the process for converting a program into a set of polynomial equations that the polynomial commitments are then used to check. If this kind of scheme becomes widely adopted, we can thus expect rapid progress in improving shared arithmetization techniques.

How PLONK works

Let us start with an explanation of how PLONK works, in a somewhat abstracted format that focuses on polynomial equations without immediately explaining how those equations are verified. A key ingredient in PLONK, as is the case in the QAPs used in SNARKs, is a procedure for converting a problem of the form "give me a value X such that a specific program P that I give you, when evaluated with X as an input, gives some specific result Y" into the problem "give me a set of values that satisfies a set of math equations". The program P can represent many things; for example the problem could be "give me a solution to this sudoku", which you would encode by setting P to be a sudoku verifier plus some initial values encoded and setting Y to 1 (ie. "yes, this solution is correct"), and a satisfying input X would be a valid solution to the sudoku. This is done by representing P as a circuit with logic gates for addition and multiplication, and converting it into a system of equations where the variables are the values on all the wires and there is one equation per gate (eg. x6 = x4 * x7 for multiplication, x8 = x5 + x9 for addition).

Here is an example of the problem of finding x such that P(x) = x**3 + x + 5 = 35 (hint: x = 3):


We can label the gates and wires as follows:


On the gates and wires, we have two types of constraints: gate constraints (equations between wires attached to the same gate, eg. a1 * b1 = c1) and copy constraints (claims about equality of different wires anywhere in the circuit, eg. a0 = a1 = b1 = b2 = a3 or c0 = a1). We will need to create a structured system of equations, which will ultimately reduce to a very small number of polynomial equations, to represent both.

In PLONK, the setup for these equations is as follows. Each equation is of the following form (think: L = left, R = right, O = output, M = multiplication, C = constant):


Each Q value is a constant; the constants in each equation (and the number of equations) will be different for each program. Each small-letter value is a variable, provided by the user: ai is the left input wire of the i'th gate, bi is the right input wire, and ci is the output wire of the i'th gate. For an addition gate, we set:


Plugging these constants into the equation and simplifying gives us ai + bi - ci = 0, which is exactly the constraint that we want. For a multiplication gate, we set:


For a constant gate setting ai to some constant x, we set:


You may have noticed that each end of a wire, as well as each wire in a set of wires that clearly must have the same value (eg. x), corresponds to a distinct variable; there's nothing so far forcing the output of one gate to be the same as the input of another gate (what we call "copy constraints"). PLONK does of course have a way of enforcing copy constraints, but we'll get to this later. So now we have a problem where a prover wants to prove that they have a bunch of xai, xbi and xci values that satisfy a bunch of equations that are of the same form. This is still a big problem, but unlike "find a satisfying input to this computer program" it's a very structured big problem, and we have mathematical tools to "compress" it.
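
As a concrete sketch, here is how the three gate types can be checked in Python, using the standard PLONK gate equation QL*a + QR*b + QO*c + QM*a*b + QC = 0 that the selector values above plug into (the function and variable names here are my own):

    # Sketch: checking PLONK-style gate constraints
    # Gate equation: QL*a + QR*b + QO*c + QM*a*b + QC = 0
    def gate_ok(QL, QR, QO, QM, QC, a, b, c):
        return QL*a + QR*b + QO*c + QM*a*b + QC == 0

    # Addition gate (QL=1, QR=1, QO=-1, QM=0, QC=0): checks a + b = c
    assert gate_ok(1, 1, -1, 0, 0, a=3, b=4, c=7)

    # Multiplication gate (QL=0, QR=0, QO=-1, QM=1, QC=0): checks a * b = c
    assert gate_ok(0, 0, -1, 1, 0, a=3, b=9, c=27)

    # Constant gate fixing a = 5 (QL=1, QC=-5, rest zero)
    assert gate_ok(1, 0, 0, 0, -5, a=5, b=0, c=0)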

From linear systems to polynomials

If you have read about STARKs or QAPs, the mechanism described in this next section will hopefully feel somewhat familiar, but if you have not that's okay too. The main ingredient here is to understand a polynomial as a mathematical tool for encapsulating a whole lot of values into a single object. Typically, we think of polynomials in "coefficient form", that is an expression like:

y = x^3 - 5x^2 + 7x - 2

But we can also view polynomials in "evaluation form". For example, we can think of the above as being "the" degree < 4 polynomial with evaluations (-2, 1, 0, 1) at the coordinates (0, 1, 2, 3) respectively.
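
Converting from evaluation form to coefficient form is just Lagrange interpolation. Here is a minimal Python sketch (over the rationals rather than a prime field, purely for readability) that recovers the coefficients above:

    from fractions import Fraction

    def lagrange_interpolate(xs, ys):
        # Returns coefficients [c0, c1, ...] (constant term first) of the unique
        # degree < len(xs) polynomial passing through the points (xs[i], ys[i]).
        n = len(xs)
        coeffs = [Fraction(0)] * n
        for i in range(n):
            # Build the i'th Lagrange basis polynomial, then scale it by ys[i]
            basis = [Fraction(1)]
            denom = Fraction(1)
            for j in range(n):
                if j == i:
                    continue
                # Multiply the basis polynomial by (x - xs[j])
                basis = [Fraction(0)] + basis
                for k in range(len(basis) - 1):
                    basis[k] -= Fraction(xs[j]) * basis[k + 1]
                denom *= xs[i] - xs[j]
            for k in range(n):
                coeffs[k] += ys[i] * basis[k] / denom
        return coeffs

    # Evaluations (-2, 1, 0, 1) at coordinates (0, 1, 2, 3)
    print(lagrange_interpolate([0, 1, 2, 3], [-2, 1, 0, 1]))
    # -> coefficients -2, 7, -5, 1 (constant term first), ie. x^3 - 5x^2 + 7x - 2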


Now here's the next step. Systems of many equations of the same form can be re-interpreted as a single equation over polynomials. For example, suppose that we have the system:


Let us define four polynomials in evaluation form: L(x) is the degree < 3 polynomial that evaluates to (2, 1, 8) at the coordinates (0, 1, 2), and at those same coordinates M(x) evaluates to (-1, 4, -1), R(x) to (3, -5, -1) and O(x) to (8, 5, -2) (it is okay to directly define polynomials in this way; you can use Lagrange interpolation to convert to coefficient form). Now, consider the equation:


Here, Z(x) is shorthand for (x-0) * (x-1) * (x-2) - the minimal (nontrivial) polynomial that returns zero over the evaluation domain (0, 1, 2). A solution to this equation (x1 = 1, x2 = 6, x3 = 4, H(x) = 0) is also a solution to the original system of equations, except the original system does not need H(x). Notice also that in this case, H(x) is conveniently zero, but in more complex cases H may need to be nonzero.
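
As a quick sanity check, here is a Python sketch verifying that (x1 = 1, x2 = 6, x3 = 4) makes L(x)*x1 + M(x)*x2 + R(x)*x3 - O(x) vanish over the whole evaluation domain (0, 1, 2), which is exactly what the equation asserts when H(x) = 0:

    # Evaluations of L, M, R, O over the domain points 0, 1, 2, as defined above
    L = {0: 2, 1: 1, 2: 8}
    M = {0: -1, 1: 4, 2: -1}
    R = {0: 3, 1: -5, 2: -1}
    O = {0: 8, 1: 5, 2: -2}

    x1, x2, x3 = 1, 6, 4

    # With H(x) = 0, the left side L(x)*x1 + M(x)*x2 + R(x)*x3 - O(x) must vanish
    # at every point of the domain (it is then divisible by Z(x) = x*(x-1)*(x-2)).
    for point in (0, 1, 2):
        assert L[point]*x1 + M[point]*x2 + R[point]*x3 - O[point] == 0
    print("all three original equations are satisfied")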

So now we know that we can represent a large set of constraints within a small number of mathematical objects (the polynomials). But in the equations that we made above to represent the gate wire constraints, the x1, x2, x3 variables are different per equation. We can handle this by making the variables themselves polynomials rather than constants in the same way. And so we get:


As before, each Q polynomial is a parameter that can be generated from the program that is being verified, and the a, b, c polynomials are the user-provided inputs.

Copy constraints

Now, let us get back to "connecting" the wires. So far, all we have is a bunch of disjoint equations about disjoint values that are independently easy to satisfy: constant gates can be satisfied by setting the value to the constant and addition and multiplication gates can simply be satisfied by setting all wires to zero! To make the problem actually challenging (and actually represent the problem encoded in the original circuit), we need to add an equation that verifies "copy constraints": constraints such as a(5) = c(7), c(10) = c(12), etc. This requires some clever trickery.

Our strategy will be to design a "coordinate pair accumulator", a polynomial p(x) which works as follows. First, let X(x) and Y(x) be two polynomials representing the x and y coordinates of a set of points (eg. to represent the set ((0, -2), (1, 1), (2, 0), (3, 1)) you might set X(x) = x and Y(x) = x^3 - 5x^2 + 7x - 2). Our goal will be to let p(x) represent all the points up to (but not including) the given position, so p(0) starts at 1, p(1) represents just the first point, p(2) the first and the second, etc. We will do this by "randomly" selecting two constants, v1 and v2, and constructing p(x) using the constraints p(0) = 1 and p(x+1) = p(x) * (v1 + X(x) + v2 * Y(x)) at least within the domain (0, 1, 2, 3).

For example, letting v1 = 3 and v2 = 2, we get:


X(x)                     0     1     2     3     4
Y(x)                    -2     1     0     1
v1 + X(x) + v2 * Y(x)   -1     6     5     8
p(x)                     1    -1    -6   -30  -240

Notice that (aside from the first column) every p(x) value equals the value to the left of it multiplied by the value to the left and above it.


The result we care about is p(4) = -240. Now, consider the case where instead of X(x) = x, we set X(x) = (2/3)x^3 - 4x^2 + (19/3)x (that is, the polynomial that evaluates to (0, 3, 2, 1) at the coordinates (0, 1, 2, 3)). If you run the same procedure, you'll find that you also get p(4) = -240. This is not a coincidence (in fact, if you randomly pick v1 and v2 from a sufficiently large field, it will almost never happen coincidentally). Rather, this happens because Y(1) = Y(3), so if you "swap the X coordinates" of the points (1, 1) and (3, 1) you're not changing the set of points, and because the accumulator encodes a set (as multiplication does not care about order) the value at the end will be the same.
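
Here is a small Python sketch of this accumulator, a toy version over the integers (the real protocol works over a prime field and encodes everything as polynomials, but the arithmetic is identical):

    # Toy coordinate-pair accumulator, with v1 = 3, v2 = 2 as above
    v1, v2 = 3, 2
    Y = [-2, 1, 0, 1]          # Y(x) at x = 0, 1, 2, 3

    def accumulate(X):
        p = 1
        for x in range(4):
            p *= v1 + X[x] + v2 * Y[x]
        return p

    print(accumulate([0, 1, 2, 3]))   # X(x) = x              -> -240
    print(accumulate([0, 3, 2, 1]))   # X swapped at 1 and 3  -> -240 (same, since Y(1) = Y(3))
    print(accumulate([0, 2, 1, 3]))   # X swapped at 1 and 2  -> -224 (different, since Y(1) != Y(2))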

Now we can start to see the basic technique that we will use to prove copy constraints. First, consider the simple case where we only want to prove copy constraints within one set of wires (eg. we want to prove a(1) = a(3)). We'll make two coordinate accumulators: one where X(x) = x and Y(x) = a(x), and the other where Y(x) = a(x) but X'(x) is the polynomial that evaluates to the permutation that flips (or otherwise rearranges) the values in each copy constraint; in the a(1) = a(3) case this would mean the permutation would start 0 3 2 1 4.... The first accumulator would be compressing ((0, a(0)), (1, a(1)), (2, a(2)), (3, a(3)), (4, a(4))..., the second ((0, a(0)), (3, a(1)), (2, a(2)), (1, a(3)), (4, a(4)).... The only way the two can give the same result is if a(1) = a(3).

To prove constraints between a, b and c, we use the same procedure, but instead "accumulate" together points from all three polynomials. We assign each of a, b, c a range of X coordinates (eg. a gets Xa(x) = x, ie. 0...n-1; b gets Xb(x) = n+x, ie. n...2n-1; c gets Xc(x) = 2n+x, ie. 2n...3n-1). To prove copy constraints that hop between different sets of wires, the "alternate" X coordinates would be slices of a permutation across all three sets. For example, if we want to prove a(2) = b(4) with n = 5, then X'a(x) would have evaluations 0 1 9 3 4 and X'b(x) would have evaluations 5 6 7 8 2 (notice the 2 and 9 flipped, where 9 corresponds to the b4 wire).

Then, instead of checking equality within one run of the procedure (ie. checking p(4) = p'(4) as before), we would check the product of the three different runs on each side:


The product of the three p(n) evaluations on each side accumulates all coordinate pairs in the a, b and c runs on each side together, so this allows us to do the same check as before, except that we can now check copy constraints not just between positions within one of the three sets of wires a, b or c, but also between one set of wires and another (eg. as in a(2) = b(4)).

And that's all there is to it!

Putting it all together

In reality, all of this math is done not over integers, but over a prime field; check the section "A Modular Math Interlude" here for a description of what prime fields are. Also, for mathematical reasons perhaps best appreciated by reading and understanding this article on FFT implementation, instead of representing wire indices with x = 0...n-1, we'll use powers of ω: 1, ω, ω^2, ..., ω^(n-1), where ω is a high-order root-of-unity in the field. This changes nothing about the math, except that the coordinate pair accumulator constraint checking equation changes from p(x + 1) = p(x) * (v1 + X(x) + v2 * Y(x)) to p(ω * x) = p(x) * (v1 + X(x) + v2 * Y(x)), and instead of using 0..n-1, n..2n-1, 2n..3n-1 as coordinates we use ω^i, g * ω^i and g^2 * ω^i where g can be some random high-order element in the field.
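
For intuition, here is how such an ω can be found in Python, using a tiny toy prime field (the prime and the domain size below are illustrative assumptions; real implementations use a large SNARK-friendly prime where n divides p - 1 by construction):

    # Toy example: find an n'th root of unity in the prime field modulo p
    p = 337   # small prime; p - 1 = 336 is divisible by 8, so 8th roots of unity exist
    n = 8

    # An element of order exactly n: omega^n = 1 but omega^(n/2) != 1
    omega = next(x for x in range(2, p) if pow(x, n, p) == 1 and pow(x, n // 2, p) != 1)

    domain = [pow(omega, i, p) for i in range(n)]
    print(domain)                     # the evaluation domain 1, omega, ..., omega^(n-1)
    assert pow(omega, n, p) == 1      # omega^n = 1, so Z(x) = x^n - 1 vanishes on this domain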

Now let's write out all the equations we need to check. First, the main gate-constraint satisfaction check:


Then the polynomial accumulator transition constraint (note: think of "= Z(x) * H(x)" as meaning "equals zero for all coordinates within some particular domain that we care about, but not necessarily outside of it"):


Then the polynomial accumulator starting and ending constraints:


The user-provided polynomials are:

  • The wire assignments a(x), b(x), c(x)
  • The coordinate accumulators Pa(x), Pb(x), Pc(x), Pa'(x), Pb'(x), Pc'(x)
  • The quotients H(x) and H1(x)...H6(x)

The program-specific polynomials that the prover and verifier need to compute ahead of time are:

  • QL(x), QR(x), QO(x), QM(x), QC(x), which together represent the gates in the circuit (note that QC(x) encodes public inputs, so it may need to be computed or modified at runtime)
  • The "permutation polynomials" σa(x), σb(x) and σc(x), which encode the copy constraints between the a, b and c wires

Note that the verifier need only store commitments to these polynomials. The only remaining polynomial in the above equations is Z(x) = (x - 1) * (x - ω) * ... * (x - ω^(n-1)) which is designed to evaluate to zero at all those points. Fortunately, ω can be chosen to make this polynomial very easy to evaluate: the usual technique is to choose ω to satisfy ω^n = 1, in which case Z(x) = x^n - 1.

The only constraint on v1 and v2 is that the user must not be able to choose a(x), b(x) or c(x) after v1 and v2 become known, so we can satisfy this by computing v1 and v2 from hashes of commitments to a(x), b(x) and c(x).

So now we've turned the program satisfaction problem into a simple problem of satisfying a few equations with polynomials (there are also some optimizations in PLONK that allow us to remove many of the polynomials in the above equations, which I will not go into here for simplicity's sake). But the polynomials themselves, both the program-specific parameters and the user inputs, are big. So the next question is, how do we get around this so we can make the proof short?

Polynomial commitments

A polynomial commitment is a short object that "represents" a polynomial, and allows you to verify evaluations of that polynomial, without needing to actually contain all of the data in the polynomial. That is, if someone gives you a commitment c representing P(x), they can give you a proof that can convince you, for some specific z, what the value of P(z) is. There is a further mathematical result that says that, over a sufficiently big field, if certain kinds of equations (chosen before z is known) about polynomials evaluated at a random z are true, those same equations are true about the whole polynomial as well. For example, if P(z) * Q(z) + R(z) = S(z) + 5, then we know that it's overwhelmingly likely that P(x) * Q(x) + R(x) = S(x) + 5 in general. Using such polynomial commitments, we could very easily check all of the polynomial equations above - make the commitments, use them as input to generate z, prove what the evaluations are of each polynomial at z, and then run the equations with these evaluations instead of the original polynomials. But how do these commitments work?

There are two parts: the commitment to the polynomial P(x) -> c, and the opening to a value P(z) at some z. To make a commitment, there are many techniques; one example is FRI, and another is Kate commitments which I will describe below. To prove an opening, it turns out that there is a simple generic "subtract-and-divide" trick: to prove that P(z) = a, you prove that


is also a polynomial (using another polynomial commitment). This works because if the quotient is a polynomial (ie. it is not fractional), then x - z is a factor of P(x) - a, so (P(x) - a)(z) = 0, so P(z) = a. Try it yourself with some polynomial, eg. P(x) = x^3 + 2x^2 + 5 with (z = 6, a = 293); then try (z = 6, a = 292) and see how it fails (if you're lazy, see WolframAlpha here vs here). Note also a generic optimization: to prove many openings of many polynomials at the same time, after committing to the outputs do the subtract-and-divide trick on a random linear combination of the polynomials and the outputs.
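
Here is a quick Python sketch of that check, doing the division (P(x) - a) / (x - z) by synthetic division and inspecting the remainder:

    def divide_by_linear(coeffs, z):
        # Divide a polynomial (coefficients from highest to lowest degree) by (x - z)
        # using synthetic division; returns (quotient_coeffs, remainder).
        quotient = []
        acc = 0
        for c in coeffs:
            acc = acc * z + c
            quotient.append(acc)
        return quotient[:-1], quotient[-1]

    # P(x) = x^3 + 2x^2 + 5, so P(x) - a has coefficients [1, 2, 0, 5 - a]
    print(divide_by_linear([1, 2, 0, 5 - 293], 6))  # remainder 0: 293 really is P(6)
    print(divide_by_linear([1, 2, 0, 5 - 292], 6))  # remainder 1: 292 is not, so the "quotient" is fractional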

So how do the commitments themselves work? Kate commitments are, fortunately, much simpler than FRI. A trusted-setup procedure generates a set of elliptic curve points G, G * s, G * s^2, ..., G * s^n, as well as G2 * s, where G and G2 are the generators of two elliptic curve groups and s is a secret that is forgotten once the procedure is finished (note that there is a multi-party version of this setup, which is secure as long as at least one of the participants forgets their share of the secret). These points are published and considered to be "the proving key" of the scheme; anyone who needs to make a polynomial commitment will need to use these points. A commitment to a degree-d polynomial is made by multiplying each of the first d+1 points in the proving key by the corresponding coefficient in the polynomial, and adding the results together.

Notice that this provides an "evaluation" of that polynomial at s, without knowing s. For example, x^3 + 2x^2 + 5 would be represented by (G * s^3) + 2 * (G * s^2) + 5 * G. We can use the notation [P] to refer to P encoded in this way (ie. G * P(s)). When doing the subtract-and-divide trick, you can prove that the two polynomials actually satisfy the relation by using elliptic curve pairings: check that e([P] - G * a, G2) = e([Q], [x] - G2 * z) as a proxy for checking that P(x) - a = Q(x) * (x - z).
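
Here is a toy Python illustration of the algebraic identity that the pairing check verifies. It uses plain modular arithmetic with a visible s (the prime modulus and secret below are assumptions for the sketch, and this is of course not secure); in real Kate commitments s stays hidden inside the curve points:

    p = 2**61 - 1        # a toy prime modulus (assumption for the sketch; not a real curve order)
    s = 123456789        # the trusted-setup secret; in reality nobody knows this value

    def ev(coeffs, x):
        # Evaluate a polynomial given coefficients from highest to lowest degree
        acc = 0
        for c in coeffs:
            acc = (acc * x + c) % p
        return acc

    P = [1, 2, 0, 5]     # P(x) = x^3 + 2x^2 + 5
    z, a = 6, 293        # the claimed opening P(z) = a
    Q = [1, 8, 48]       # the quotient (P(x) - a) / (x - z) from the division above

    # The identity that the pairing check e([P] - G * a, G2) = e([Q], [x] - G2 * z) verifies,
    # written out directly at the secret point s:
    assert (ev(P, s) - a) % p == (ev(Q, s) * (s - z)) % p
    print("opening proof identity holds")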

Other types of polynomial commitments have also been coming out more recently. A new scheme called DARK ("Diophantine arguments of knowledge") uses "hidden order groups" such as class groups to implement another kind of polynomial commitment. Hidden order groups are unique because they allow you to compress arbitrarily large numbers into group elements, even numbers much larger than the size of the group element, in a way that can't be "spoofed"; constructions from VDFs to accumulators to range proofs to polynomial commitments can be built on top of this. Another option is to use bulletproofs, using regular elliptic curve groups at the cost of the proof taking much longer to verify. Because polynomial commitments are much simpler than full-on zero knowledge proof schemes, we can expect more such schemes to get created in the future.

Recap

To finish off, let's go over the scheme again. Given a program P, you convert it into a circuit, and generate a set of equations that look like this:


You then convert this set of equations into a single polynomial equation:


You also generate from the circuit a list of copy constraints. From these copy constraints you generate the three polynomials representing the permuted wire indices: σa(x), σb(x), σc(x). To generate a proof, you compute the values of all the wires and convert them into three polynomials: a(x), b(x), c(x). You also compute six "coordinate pair accumulator" polynomials as part of the permutation-check argument. Finally you compute the cofactors Hi(x).

There is a set of equations between the polynomials that need to be checked; you can do this by making commitments to the polynomials, opening them at some random z (along with proofs that the openings are correct), and running the equations on these evaluations instead of the original polynomials. The proof itself is just a few commitments and openings and can be checked with a few equations. And that's all there is to it!


In-person meatspace protocol to prove unconditional possession of a private key


Recommended pre-reading: https://ethresear.ch/t/minimal-anti-collusion-infrastructure/5413

Alice slowly walks down the old, dusty stairs of the building into the basement. She thinks wistfully of the old days, when quadratic-voting in the World Collective Market was a much simpler process of linking her public key to a twitter account and opening up metamask to start firing off votes. Of course back then voting in the WCM was used for little; there were a few internet forums that used it for voting on posts, and a few million dollars donated to its quadratic funding oracle. But then it grew, and then the game-theoretic attacks came.

First came the exchange platforms, which started offering "dividends" to anyone who registered a public key belonging to an exchange and thus provably allowed the exchange to vote on their behalf, breaking the crucial "independent choice" assumption of the quadratic voting and funding mechanisms. And soon after that came the fake accounts - Twitter accounts, Reddit accounts filtered by karma score, national government IDs, all proved vulnerable to either government cheating or hackers, or both. Elaborate infrastructure was instituted at registration time to ensure both that account holders were real people, and that account holders themselves held the keys, not a central custody service purchasing keys by the thousands to buy votes.

And so today, voting is still easy, but initiation, while still not harder than going to a government office, is no longer exactly trivial. But of course, with billions of dollars in donations from now-deceased billionaires and cryptocurrency premines forming part of the WCM's quadratic funding pool, and elements of municipal governance using its quadratic voting protocols, participating is very much worth it.

After reaching the end of the stairs, Alice opens the door and enters the room. Inside the room, she sees a table. On the near side of the table, she sees a single, empty chair. On the far side of the table, she sees four people already sitting down on chairs of their own, the high-reputation Guardians randomly selected by the WCM for Alice's registration ceremony. "Hello, Alice," the person sitting on the leftmost chair, whose name she intuits is Bob, says in a calm voice. "Glad that you can make it," the person sitting beside Bob, whose name she intuits is Charlie, adds.

Alice walks over to the chair that is clearly meant for her and sits down. "Let us begin," the person sitting beside Charlie, whose name by logical progression is David, proclaims. "Alice, do you have your key shares?"

Alice takes out four pocket-sized notebooks, clearly bought from a dollar store, and places them on the table. The person sitting at the right, logically named Evan, takes out his phone, and immediately the others take out theirs. They open up their ethereum wallets. "So," Evan begins, "the current Ethereum beacon chain slot number is 28,205,913, and the block hash starts 0xbe48. Do all agree?". "Yes," Alice, Bob, Charlie and David exclaim in unison. Evan continues: "so let us wait for the next block."

The five intently stare at their phones. First for ten seconds, then twenty, then thirty. "Three skipped proposers," Bob mutters, "how unusual". But then after another ten seconds, a new block appears. "Slot number 28,205,917, block hash starts 0x62f9, so first digit 6. All agreed?"

"Yes."

"Six mod four is two, and as is prescribed in the Old Ways, we start counting indices from zero, so this means Alice will keep the third book, counting as usual from our left."

Bob takes the first, second and fourth notebooks that Alice provided, leaving the third untouched. Alice takes the remaining notebook and puts it back in her backpack. Bob opens each notebook to a page in the middle with the corner folded, and sees a sequence of letters and numbers written with a pencil in the middle of each page - a standard way of writing the key shares for over a decade, since camera and image processing technology got powerful enough to recognize words and numbers written on single slips of paper even inside an envelope. Bob, Charlie, David and Evan crowd around the books together, and each open up an app on their phone and press a few buttons.

Bob starts reading, as all four start typing into their phones at the same time:

"Alice's first key share is, 6-b-d-7-h-k-k-l-o-e-q-q-p-3-y-s-6-x-e-f. Applying the 100,000x iterated SHA256 hash we get e-a-6-6..., confirm?"

"Confirmed," the others replied. "Checking against Alice's precommitted elliptic curve point A0... match."

"Alice's second key share is, f-r-n-m-j-t-x-r-s-3-b-u-n-n-n-i-z-3-d-g. Iterated hash 8-0-3-c..., confirm?"

"Confirmed. Checking against Alice's precommitted elliptic curve point A1... match."

"Alice's fourth key share is, i-o-f-s-a-q-f-n-w-f-6-c-e-a-m-s-6-z-z-n. Iterated hash 6-a-5-6..., confirm?"

"Confirmed. Checking against Alice's precommitted elliptic curve point A3... match."

"Adding the four precommitted curve points, x coordinate begins 3-1-8-3. Alice, confirm that that is the key you wish to register?"

"Confirm."

Bob, Charlie, David and Evan glance down at their smartphone apps one more time, and each tap a few buttons. Alice catches a glance at Charlie's phone; she sees four yellow checkmarks, and an "approval transaction pending" dialog. After a few seconds, the four yellow checkmarks are replaced with a single green checkmark, with a transaction hash ID, too small for Alice to make out the digits from a few meters away, below. Alice's phone soon buzzes, with a notification dialog saying "Registration confirmed".

"Congratulations, Alice," Bob says. "Unconditional possession of your key has been verified. You are now free to send a transaction to the World Collective Market's MPC oracle to update your key."

"Only a 75% probability this would have actually caught me if I didn't actually have all four parts of the key," Alice thought to herself. But it seemed to be enough for an in-person protocol in practice; and if it ever wasn't then they could easily switch to slightly more complex protocols that used low-degree polynomials to achieve exponentially high levels of soundness. Alice taps a few buttons on her smartphone, and a "transaction pending" dialog shows up on the screen. Five seconds later, the dialog disappears and is replaced by a green checkmark. She jumps up with joy and, before Bob, Charlie, David and Evan can say goodbye, runs out of the room, frantically tapping buttons to vote on all the projects and issues in the WCM that she had wanted to support for months.

Review of Gitcoin Quadratic Funding Round 3


Special thanks to the Gitcoin team and especially Frank Chen for working with me through these numbers

The next round of Gitcoin Grants quadratic funding has just finished, and the numbers for how much each project has received have just been released. Here are the top ten:



Altogether, $163,279 was donated to 80 projects by 477 contributors, augmented by a matching pool of $100,000. Nearly half came from four contributions above $10,000: $37,500 to Lighthouse, and $12,500 each to Gas Station Network, Black Girls Code and Public Health Incentives Layer. Out of the remainder, about half came from contributions between $1,000 and $10,000, and the rest came from smaller donations of various sizes. But what matters more here are not the raw donations, but rather the subsidies that the quadratic funding mechanism applied. Gitcoin Grants is there to support valuable public goods in the Ethereum ecosystem, but also to serve as a testbed for this new quadratic donation matching mechanism, and to see how well it lives up to its promise of creating a democratic, market-based and efficient way of funding public goods. This time around, a modified formula based on pairwise-bounded coordination subsidies was used, which has the goal of minimizing distortion from large contributions from coordinated actors. And now we get to see how the experiment went.

Judging the Outcomes

First, the results. Ultimately, every mechanism for allocating resources, whether centralized, market-based, democratic or otherwise, must stand the test of delivering results, or else sooner or later it will be abandoned for another mechanism that is perceived to be better, even if it is less philosophically clean. Judging results is inherently a subjective exercise; any single person's analysis of a mechanism will inevitably be shaped by how well the results fit their own preferences and tastes. However, in those cases where a mechanism does output a surprising result, one can and should use that as an opportunity to learn, and see whether or not one missed some key information that other participants in the mechanism had.

In my own case, I found the top results very agreeable and a quite reasonable catalogue of projects that are good for the Ethereum community. One of the disparities between these grants and the Ethereum Foundation grants is that the Ethereum Foundation grants (see recent rounds here and here) tend to overwhelmingly focus on technology with only a small section on education and community resources, whereas in the Gitcoin grants while technology still dominates, EthHub is #2 and lower down defiprime.com is #14 and cryptoeconomics.study is #17. In this case my personal opinion is that EF has made a genuine error in undervaluing grants to community/education organizations and Gitcoin's "collective instinct" is correct. Score one for new-age fancy quadratic market democracy.

Another surprising result to me was Austin Griffith getting second place. I personally have never spent too much time thinking about Burner Wallet; I knew that it existed but in my mental space I did not take it too seriously, focusing instead on client development, L2 scaling, privacy and to a lesser extent smart contract wallets (the latter being a key use case of Gas Station Network at #8). After seeing Austin's impressive performance in this Gitcoin round, I asked a few people what was going on.

Burner Wallet (website, explainer article) is an "insta-wallet" that's very easy to use: just load it up on your desktop or phone, and there you have it. It was used successfully at EthDenver to sell food from food trucks, and generally many people appreciate its convenience. Its main weaknesses are lower security and that one of its features, support for xDAI, is dependent on a permissioned chain.

Austin's Gitcoin grant is there to fund his ongoing work, and I have heard one criticism: there are many prototypes, but comparatively few "things taken to completion". There is also the critique that as great as Austin is, it's difficult to argue that he's as important to the success of Ethereum as, say, Lighthouse and Prysmatic, though one can reply that what matters is not total value, but rather the marginal value of giving a given project or person an extra $10,000. On the whole, however, I feel like quadratic funding's (Glen would say deliberate!) tendency to select for things like Burner Wallet with populist appeal is a much needed corrective to the influence of the Ethereum tech elite (including myself!) who often value technical impressiveness and undervalue simple and quick things that make it really easy for people to participate in Ethereum. This one is slightly more ambiguous, but I'll say score two for new-age fancy quadratic market democracy.

The main thing that I was disappointed the Gitcoiner-ati did not support more was Gitcoin maintenance itself. The Gitcoin Sustainability Fund only got a total $1,119 in raw contributions from 18 participants, plus a match of $202. The optional 5% tips that users could give to Gitcoin upon donating were not included into the quadratic matching calculations, but raised another ~$1,000. Given the amount of effort the Gitcoin people put in to making quadratic funding possible, this is not nearly enough; Gitcoin clearly deserves more than 0.9% of the total donations in the round. Meanwhile, the Ethereum Foundation (as well as Consensys and individual donors) have been giving grants to Gitcoin that include supporting Gitcoin itself. Hopefully in future rounds people will support Gitcoin itself too, but for now, score one for good old-fashioned EF technocracy.

On the whole, quadratic funding, while still young and immature, seems to be a remarkably effective complement to the funding preferences of existing institutions, and it seems worthwhile to continue it and even increase its scope and size in the future.

Pairwise-bounded quadratic funding vs traditional quadratic funding

Round 3 differs from previous rounds in that it uses a new flavor of quadratic funding, which limits the subsidy per pair of participants. For example, in traditional QF, if two people each donate $10, the subsidy would be $10, and if two people each donate $10,000, the subsidy would be $10,000. This property of traditional QF makes it highly vulnerable to collusion: two key employees of a project (or even two fake accounts owned by the same person) could each donate as much money as they have, and get back a very large subsidy. Pairwise-bounded QF computes the total subsidy to a project by looking through all pairs of contributors, and imposes a maximum bound on the total subsidy that any given pair of participants can trigger (combined across all projects). Pairwise-bounded QF also has the property that it generally penalizes projects that are dominated by large contributors:
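
To make the bounding idea concrete, here is a rough Python sketch of one pairwise-bounding rule in this spirit. It is an illustrative simplification, not the exact formula Gitcoin used: the per-pair cap M and the damping factor M / (M + pair_total) are assumptions made for the sketch.

    import math

    def pairwise_bounded_match(contributions, M):
        # contributions[project][contributor] = amount donated
        # Step 1: for each pair of contributors, total up sqrt(c_i * c_j) across all projects
        pair_totals = {}
        for contribs in contributions.values():
            people = sorted(contribs)
            for i in range(len(people)):
                for j in range(i + 1, len(people)):
                    key = (people[i], people[j])
                    pair_totals[key] = pair_totals.get(key, 0) + math.sqrt(contribs[people[i]] * contribs[people[j]])
        # Step 2: compute each project's subsidy, damping each pair's term by M / (M + pair_total)
        # so that no single pair can trigger more than M of subsidy across all projects combined
        matches = {}
        for project, contribs in contributions.items():
            people = sorted(contribs)
            subsidy = 0.0
            for i in range(len(people)):
                for j in range(i + 1, len(people)):
                    key = (people[i], people[j])
                    term = math.sqrt(contribs[people[i]] * contribs[people[j]])
                    subsidy += term * M / (M + pair_totals[key])
            matches[project] = subsidy
        return matches

    example = {"small project": {"alice": 10, "bob": 10},
               "whale project": {"alice": 10000, "bob": 10000}}
    print(pairwise_bounded_match(example, M=100))

Even though the second project received 1000x more from the same pair of donors, the pair's combined subsidy across both projects stays below M, which is the intended effect of the bound.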



The projects that lost the most relative to traditional QF seem to be projects that have a single large contribution (or sometimes two). For example, "fuzz geth and Parity for EVM consensus bugs" got a $415 match compared to the $2000 it would have gotten in traditional QF; the decrease is explained by the fact that the contributions are dominated by two large $4500 contributions. On the other hand, cryptoeconomics.study got $1274, up nearly double from the $750 it would have gotten in traditional QF; this is explained by the large diversity of contributions that the project received and particularly the lack of large sponsors: the largest contribution to cryptoeconomics.study was $100.

Another desirable property of pairwise-bounded QF is that it privileges cross-tribal projects. That is, if there are projects that group A typically supports, and projects that group B typically supports, then projects that manage to get support from both groups get a more favorable subsidy (because the pairs that go between groups are not as saturated). Has this incentive for building bridges appeared in these results?

Unfortunately, my code of honor as a social scientist obliges me to report the negative result: the Ethereum community just does not yet have enough internal tribal structure for effects like this to materialize, and even when there are differences in correlations they don't seem strongly connected to higher subsidies due to pairwise-bounding. Here are the cross-correlations between who contributed to different projects:



Generally, all projects are slightly positively correlated with each other, with a few exceptions with greater correlation and one exception with broad roughly zero correlation: Nori (120 in this chart). However, Nori did not do well in pairwise-bounded QF, because over 94% of its donations came from a single $5000 donation.

Dominance of large projects

One other pattern that we saw in this round is that popular projects got disproportionately large grants:



To be clear, this is not just saying "more contributions, more match", it's saying "more contributions, more match per dollar contributed". Arguably, this is an intended feature of the mechanism. Projects that can get more people to donate to them represent public goods that serve a larger public, and so tragedy of the commons problems are more severe and hence contributions to them should be multiplied more to compensate. However, looking at the list, it's hard to argue that, say, Prysm ($3,848 contributed, $8,566 matched) is a more public good than Nimbus ($1,129 contributed, $496 matched; for the unaware, Prysm and Nimbus are both eth2 clients). The failure does not look too severe; on average, projects near the top do seem to serve a larger public and projects near the bottom do seem niche, but it seems clear that at least part of the disparity is not genuine publicness of the good, but rather inequality of attention. N units of marketing effort can attract attention of N people, and theoretically get N^2 resources.

Of course, this could be solved via a "layer on top" venture-capital style: upstart new projects could get investors to support them, in return for a share of matched contributions received when they get large. Something like this would be needed eventually; predicting future public goods is as important a social function as predicting future private goods. But we could also consider less winner-take-all alternatives; the simplest one would be adjusting the QF formula so it uses an exponent of eg. 1.5 instead of 2. I can see it being worthwhile to try a future round of Gitcoin Grants with such a formula ((x1^(2/3) + x2^(2/3) + ...)^(3/2) instead of (x1^(1/2) + x2^(1/2) + ...)^2) to see what the results are like.
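
For a quick feel of the difference, here is a small Python comparison of the two formulas on a crowd of small donors versus a single large donor:

    def qf_total(contributions, r=0.5):
        # Generalized quadratic funding total: (sum of x_i^r)^(1/r).
        # Standard QF uses r = 1/2 (outer exponent 2); the variant above uses r = 2/3 (outer exponent 1.5).
        return sum(x ** r for x in contributions) ** (1 / r)

    small_crowd = [1] * 100      # 100 donors of $1
    one_whale = [100]            # 1 donor of $100

    for contribs, label in [(small_crowd, "100 x $1"), (one_whale, "1 x $100")]:
        print(label,
              "standard QF:", round(qf_total(contribs, 0.5)),
              "exponent-1.5 variant:", round(qf_total(contribs, 2 / 3)))

With these example numbers, the crowd of one hundred $1 donors gets a 100x multiplier under standard QF but only a 10x multiplier under the 1.5-exponent variant, while the single $100 donor's total is unchanged.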

Individual leverage curves

One key question is, if you donate $1, or $5, or $100, how big an impact can you have on the amount of money that a project gets? Fortunately, we can use the data to calculate these deltas!



The different lines are for different projects; supporting projects with higher existing support will lead to you getting a bigger multiplier. In all cases, the first dollar is very valuable, with a matching ratio in some cases over 100:1. But the second dollar is much less valuable, and matching ratios quickly taper off; even for the largest projects increasing one's donation from $32 to $64 will only get a 1:1 match, and anything above $100 becomes almost a straight donation with nearly no matching. However, given that it's likely possible to get legitimate-looking Github accounts on the grey market for around those costs, having a cap of a few hundred dollars on the amount of matched funds that any particular account can direct seems like a very reasonable mitigation, despite its costs in limiting the bulk of the matching effect to small-sized donations.

Conclusions

On the whole, this was by far the largest and the most data-rich Gitcoin funding round to date. It successfully attracted hundreds of contributors, reaching a size where we can finally see many significant effects in play and drown out the effects of the more naive forms of small-scale collusion. The experiment already seems to be leading to valuable information that can be used by future quadratic funding implementers to improve their quadratic funding implementations. The case of Austin Griffith is also interesting because $23,911 in funds that he received comes, in relative terms, surprisingly close to an average salary for a developer if the grants can be repeated on a regular schedule. What this means is that if Gitcoin Grants does continue operating regularly, and attracts and expands its pool of donations, we could be very close to seeing the first "quadratic freelancer" - someone directly "working for the public", funded by donations boosted by quadratic matching subsidies. And at that point we could start to see more experimentation in new forms of organization that live on top of quadratic funding gadgets as a base layer. All in all, this foretells an exciting and, err, radical public-goods funding future ahead of us.

Hard Problems in Cryptocurrency: Five Years Later


Special thanks to Justin Drake and Jinglan Wang for feedback

In 2014, I made a post and a presentation with a list of hard problems in math, computer science and economics that I thought were important for the cryptocurrency space (as I then called it) to be able to reach maturity. In the last five years, much has changed. But exactly how much progress on what we thought then was important has been achieved? Where have we succeeded, where have we failed, and where have we changed our minds about what is important? In this post, I'll go through the 16 problems from 2014 one by one, and see just where we are today on each one. At the end, I’ll include my new picks for hard problems of 2019.

The problems are broken down into three categories: (i) cryptographic, and hence expected to be solvable with purely mathematical techniques if they are to be solvable at all, (ii) consensus theory, largely improvements to proof of work and proof of stake, and (iii) economic, and hence having to do with creating structures involving incentives given to different participants, and often involving the application layer more than the protocol layer. We see significant progress in all categories, though some more than others.

Cryptographic problems

  1. Blockchain Scalability

One of the largest problems facing the cryptocurrency space today is the issue of scalability ... The main concern with [oversized blockchains] is trust: if there are only a few entities capable of running full nodes, then those entities can conspire and agree to give themselves a large number of additional bitcoins, and there would be no way for other users to see for themselves that a block is invalid without processing an entire block themselves.
Problem: create a blockchain design that maintains Bitcoin-like security guarantees, but where the maximum size of the most powerful node that needs to exist for the network to keep functioning is substantially sublinear in the number of transactions.

Status: Great theoretical progress, pending more real-world evaluation.

Scalability is one technical problem that we have had a huge amount of progress on theoretically. Five years ago, almost no one was thinking about sharding; now, sharding designs are commonplace. Aside from ethereum 2.0, we have OmniLedger, LazyLedger, Zilliqa and research papers seemingly coming out every month. In my own view, further progress at this point is incremental. Fundamentally, we already have a number of techniques that allow groups of validators to securely come to consensus on much more data than an individual validator can process, as well as techniques that allow clients to indirectly verify the full validity and availability of blocks even under 51% attack conditions.

These are probably the most important technologies:

There are also other smaller developments like Cross-shard communication via receipts as well as "constant-factor" enhancements such as BLS signature aggregation.

That said, sharded blockchains have still not been seen in live operation. On the theoretical side, there are mainly disputes about details remaining, along with challenges having to do with stability of sharded networking, developer experience and mitigating risks of centralization; fundamental technical possibility no longer seems in doubt. But the challenges that do remain are challenges that cannot be solved by just thinking about them; only developing the system and seeing ethereum 2.0 or some similar chain running live will suffice.

  2. Timestamping

Problem: create a distributed incentive-compatible system, whether it is an overlay on top of a blockchain or its own blockchain, which maintains the current time to high accuracy. All legitimate users have clocks in a normal distribution around some "real" time with standard deviation 20 seconds ... no two nodes are more than 20 seconds apart. The solution is allowed to rely on an existing concept of "N nodes"; this would in practice be enforced with proof-of-stake or non-sybil tokens (see #9). The system should continuously provide a time which is within 120s (or less if possible) of the internal clock of >99% of honestly participating nodes. External systems may end up relying on this system; hence, it should remain secure against attackers controlling < 25% of nodes regardless of incentives.

Status: Some progress.

Ethereum has actually survived just fine with a 13-second block time and no particularly advanced timestamping technology; it uses a simple technique where a client does not accept a block whose stated timestamp is earlier than the client's local time. That said, this has not been tested under serious attacks. The recent network-adjusted timestamps proposal tries to improve on the status quo by allowing the client to determine the consensus on the time in the case where the client does not locally know the current time to high accuracy; this has not yet been tested. But in general, timestamping is not currently at the foreground of perceived research challenges; perhaps this will change once more proof of stake chains (including Ethereum 2.0 but also others) come online as real live systems and we see what the issues are.

  3. Arbitrary Proof of Computation

Problem: create programs POC_PROVE(P,I) -> (O,Q) and POC_VERIFY(P,O,Q) -> { 0, 1 } such that POC_PROVE runs program P on input I and returns the program output O and a proof-of-computation Q and POC_VERIFY takes P, O and Q and outputs whether or not Q and O were legitimately produced by the POC_PROVE algorithm using P.

Status: Fundamentally solved, ongoing progress.

This is basically saying, build a SNARK (or STARK, or SHARK, or...). And we've done it! SNARKs are now increasingly well understood, and are even already being used in multiple blockchains today (including tornado.cash on Ethereum). And SNARKs are extremely useful, both as a privacy technology (see Zcash and tornado.cash) and as a scalability technology (see ZK Rollup, STARKDEX and STARKing erasure coded data roots).

There are still challenges with efficiency; making arithmetization-friendly hash functions (see here and here for bounties for breaking proposed candidates) is a big one, and efficiently proving random memory accesses is another. Furthermore, there's the unsolved question of whether the O(n * log(n)) blowup in prover time is a fundamental limitation or if there is some way to make a succinct proof with only linear overhead as in bulletproofs (which unfortunately take linear time to verify). There are also ever-present risks that the existing schemes have bugs. In general, the problems are in the details rather than the fundamentals.

  4. Code Obfuscation

The holy grail is to create an obfuscator O, such that given any program P the obfuscator can produce a second program O(P) = Q such that P and Q return the same output if given the same input and, importantly, Q reveals no information whatsoever about the internals of P. One can hide inside of Q a password, a secret encryption key, or one can simply use Q to hide the proprietary workings of the algorithm itself.

Status: Slow progress.

In plain English, the problem is saying that we want to come up with a way to "encrypt" a program so that the encrypted program would still give the same outputs for the same inputs, but the "internals" of the program would be hidden. An example use case for obfuscation is a program containing a private key where the program only allows the private key to sign certain messages.

A solution to code obfuscation would be very useful to blockchain protocols. The use cases are subtle, because one must deal with the possibility that an on-chain obfuscated program will be copied and run in an environment different from the chain itself, but there are many possibilities. One that personally interests me is the ability to remove the centralized operator from collusion-resistance gadgets by replacing the operator with an obfuscated program that contains some proof of work, making it very expensive to run more than once with different inputs as part of an attempt to determine individual participants' actions.

Unfortunately this continues to be a hard problem. There is ongoing work attacking the problem, with one side making constructions (eg. this) that try to reduce the number of assumptions on mathematical objects that we do not know practically exist (eg. general cryptographic multilinear maps), and another side trying to make practical implementations of the desired mathematical objects. However, all of these paths are still quite far from creating something viable and known to be secure. See https://eprint.iacr.org/2019/463.pdf for a more general overview of the problem.

  5. Hash-Based Cryptography

Problem: create a signature algorithm relying on no security assumption but the random oracle property of hashes that maintains 160 bits of security against classical computers (ie. 80 vs. quantum due to Grover's algorithm) with optimal size and other properties.

Status: Some progress.

There have been two strands of progress on this since 2014. SPHINCS, a "stateless" (meaning, using it multiple times does not require remembering information like a nonce) signature scheme, was released soon after this "http://vitalik.ca/files/hard problems" list was published, and provides a purely hash-based signature scheme of size around 41 kB. Additionally, STARKs have been developed, and one can create signatures of similar size based on them. The fact that not just signatures, but also general-purpose zero knowledge proofs, are possible with just hashes was definitely something I did not expect five years ago; I am very happy that this is the case. That said, size continues to be an issue, and ongoing progress (eg. see the very recent DEEP FRI) is continuing to reduce the size of proofs, though it looks like further progress will be incremental.

The main not-yet-solved problem with hash-based cryptography is aggregate signatures, similar to what BLS aggregation makes possible. It's known that we can just make a STARK over many Lamport signatures, but this is inefficient; a more efficient scheme would be welcome. (In case you're wondering whether hash-based public key encryption is possible, the answer is no; you can't do anything with more than a quadratic attack cost.)

Consensus theory problems

6. ASIC-Resistant Proof of Work

One approach at solving the problem is creating a proof-of-work algorithm based on a type of computation that is very difficult to specialize ... For a more in-depth discussion on ASIC-resistant hardware, see https://blog.ethereum.org/2014/06/19/mining/.

Status: Solved as far as we can.

About six months after the "http://vitalik.ca/files/hard problems" list was posted, Ethereum settled on its ASIC-resistant proof of work algorithm: Ethash. Ethash is known as a memory-hard algorithm. The theory is that random-access memory in regular computers is well-optimized already and hence difficult to improve on for specialized applications. Ethash aims to achieve ASIC resistance by making memory access the dominant part of running the PoW computation. Ethash was not the first memory-hard algorithm, but it did add one innovation: it uses pseudorandom lookups over a two-level DAG, allowing for two ways of evaluating the function. First, one could compute it quickly if one has the entire (~2 GB) DAG; this is the memory-hard "fast path". Second, one can compute it much more slowly (still fast enough to check a single provided solution quickly) if one only has the top level of the DAG; this is used for block verification.

Ethash has proven remarkably successful at ASIC resistance; after three years and billions of dollars of block rewards, ASICs do exist but are at best 2-5 times more power and cost-efficient than GPUs. ProgPoW has been proposed as an alternative, but there is a growing consensus that ASIC-resistant algorithms will inevitably have a limited lifespan, and that ASIC resistance has downsides because it makes 51% attacks cheaper (eg. see the 51% attack on Ethereum Classic).

I believe that PoW algorithms that provide a medium level of ASIC resistance can be created, but such resistance is limited-term and both ASIC and non-ASIC PoW have disadvantages; in the long term the better choice for blockchain consensus is proof of stake.

7. Useful Proof of Work

making the proof of work function something which is simultaneously useful; a common candidate is something like Folding@home, an existing program where users can download software onto their computers to simulate protein folding and provide researchers with a large supply of data to help them cure diseases.

Status: Probably not feasible, with one exception.

The challenge with useful proof of work is that a proof of work algorithm requires many properties:

  • Hard to compute
  • Easy to verify
  • Does not depend on large amounts of external data
  • Can be efficiently computed in small "bite-sized" chunks

Unfortunately, there are not many computations that are useful that preserve all of these properties, and most computations that do have all of those properties and are "useful" are only "useful" for far too short a time to build a cryptocurrency around them.

However, there is one possible exception: zero-knowledge-proof generation. Zero knowledge proofs of aspects of blockchain validity (eg. data availability roots for a simple example) are difficult to compute, and easy to verify. Furthermore, they are durably difficult to compute; if proofs of "highly structured" computation become too easy, one can simply switch to verifying a blockchain's entire state transition, which becomes extremely expensive due to the need to model the virtual machine and random memory accesses.

Zero-knowledge proofs of blockchain validity provide great value to users of the blockchain, as they can substitute the need to verify the chain directly; Coda is doing this already, albeit with a simplified blockchain design that is heavily optimized for provability. Such proofs can significantly assist in improving the blockchain's safety and scalability. That said, the total amount of computation that realistically needs to be done is still much less than the amount that's currently done by proof of work miners, so this would at best be an add-on for proof of stake blockchains, not a full-on consensus algorithm.

8. Proof of Stake

Another approach to solving the mining centralization problem is to abolish mining entirely, and move to some other mechanism for counting the weight of each node in the consensus. The most popular alternative under discussion to date is "proof of stake" - that is to say, instead of treating the consensus model as "one unit of CPU power, one vote" it becomes "one currency unit, one vote".

Status: Great theoretical progress, pending more real-world evaluation.

Near the end of 2014, it became clear to the proof of stake community that some form of "weak subjectivity" is unavoidable. To maintain economic security, nodes need to obtain a recent checkpoint extra-protocol when they sync for the first time, and again if they go offline for more than a few months. This was a difficult pill to swallow; many PoW advocates still cling to PoW precisely because in a PoW chain the "head" of the chain can be discovered with the only data coming from a trusted source being the blockchain client software itself. PoS advocates, however, were willing to swallow the pill, seeing the added trust requirements as not being large. From there the path to proof of stake through long-duration security deposits became clear.

Most interesting consensus algorithms today are fundamentally similar to PBFT, but replace the fixed set of validators with a dynamic list that anyone can join by sending tokens into a system-level smart contract with time-locked withdrawals (eg. a withdrawal might in some cases take up to 4 months to complete). In many cases (including ethereum 2.0), these algorithms achieve "economic finality" by penalizing validators that are caught performing actions that violate the protocol in certain ways (see here for a philosophical view on what proof of stake accomplishes).

As of today, we have (among many other algorithms):

There continues to be ongoing refinement (eg. here and here). Eth2 phase 0, the chain that will implement FFG, is currently under implementation and enormous progress has been made. Additionally, Tendermint has been running, in the form of the Cosmos chain, for several months. Remaining arguments about proof of stake, in my view, have to do with optimizing the economic incentives, and further formalizing the strategy for responding to 51% attacks. Additionally, the Casper CBC spec could still use concrete efficiency improvements.

9. Proof of Storage

A third approach to the problem is to use a scarce computational resource other than computational power or currency. In this regard, the two main alternatives that have been proposed are storage and bandwidth. There is no way in principle to provide an after-the-fact cryptographic proof that bandwidth was given or used, so proof of bandwidth should most accurately be considered a subset of social proof, discussed in later problems, but proof of storage is something that certainly can be done computationally. An advantage of proof-of-storage is that it is completely ASIC-resistant; the kind of storage that we have in hard drives is already close to optimal.

Status: A lot of theoretical progress, though still a lot to go, as well as more real-world evaluation.

There are a number of blockchains planning to use proof of storage protocols, including Chia and Filecoin. That said, these algorithms have not been tested in the wild. My own main concern is centralization: will these algorithms actually be dominated by smaller users using spare storage capacity, or will they be dominated by large mining farms?

Economics

10. Stable-value cryptoassets

One of the main problems with Bitcoin is the issue of price volatility ... Problem: construct a cryptographic asset with a stable price.

Status: Some progress.

MakerDAO is now live, and has been holding stable for nearly two years. It has survived a 93% drop in the value of its underlying collateral asset (ETH), and there is now more than $100 million in DAI issued. It has become a mainstay of the Ethereum ecosystem, and many Ethereum projects have or are integrating with it. Other synthetic token projects, such as UMA, are rapidly gaining steam as well.

However, while the MakerDAO system has survived tough economic conditions in 2019, the conditions were by no means the toughest that could happen. In the past, Bitcoin has fallen by 75% over the course of two days; the same may happen to ether or any other collateral asset some day. Attacks on the underlying blockchain are an even larger untested risk, especially if compounded by price decreases at the same time. Another major challenge, and arguably the larger one, is that the stability of MakerDAO-like systems is dependent on some underlying oracle scheme. Different attempts at oracle systems do exist (see #16), but the jury is still out on how well they can hold up under large amounts of economic stress. So far, the collateral controlled by MakerDAO has been lower than the value of the MKR token; if this relationship reverses MKR holders may have a collective incentive to try to "loot" the MakerDAO system. There are ways to try to protect against such attacks, but they have not been tested in real life.

11. Decentralized Public Goods Incentivization

One of the challenges in economic systems in general is the problem of "public goods". For example, suppose that there is a scientific research project which will cost $1 million to complete, and it is known that if it is completed the resulting research will save one million people $5 each. In total, the social benefit is clear ... [but] from the point of view of each individual person contributing does not make sense ... So far, most solutions to public goods problems have involved centralization ... Additional Assumptions And Requirements: A fully trustworthy oracle exists for determining whether or not a certain public good task has been completed (in reality this is false, but this is the domain of another problem)

Status: Some progress.

The problem of funding public goods is generally understood to be split into two problems: the funding problem (where to get funding for public goods from) and the preference aggregation problem (how to determine what is a genuine public good, rather than some single individual's pet project, in the first place). This problem focuses specifically on the former, assuming the latter is solved (see the "decentralized contribution metrics" section below for work on that problem).

In general, there haven't been large new breakthroughs here. There are two major categories of solutions. First, we can try to elicit individual contributions, giving people social rewards for doing so. My own proposal for charity through marginal price discrimination is one example of this; another is the anti-malaria donation badges on Peepeth. Second, we can collect funds from applications that have network effects. Within blockchain land there are several options for doing this:

  • Issuing coins
  • Taking a portion of transaction fees at protocol level (eg. through EIP 1559)
  • Taking a portion of transaction fees from some layer-2 application (eg. Uniswap, or some scaling solution, or even state rent in an execution environment in ethereum 2.0)
  • Taking a portion of other kinds of fees (eg. ENS registration)

Outside of blockchain land, this is just the age-old question of how to collect taxes if you're a government, and charge fees if you're a business or other organization.

12. Reputation systems

Problem: design a formalized reputation system, including a score rep(A,B) -> V where V is the reputation of B from the point of view of A, a mechanism for determining the probability that one party can be trusted by another, and a mechanism for updating the reputation given a record of a particular open or finalized interaction.

Status: Slow progress.

There hasn't really been much work on reputation systems since 2014. Perhaps the best is the use of token curated registries to create curated lists of trustable entities/objects; the Kleros ERC20 TCR (yes, that's a token-curated registry of legitimate ERC20 tokens) is one example, and there is even an alternative interface to Uniswap (http://uniswap.ninja) that uses it as the backend to get the list of tokens and ticker symbols and logos from. Reputation systems of the subjective variety have not really been tried, perhaps because there is just not enough information about the "social graph" of people's connections to each other that has already been published to chain in some form. If such information starts to exist for other reasons, then subjective reputation systems may become more popular.

13. Proof of excellence

One interesting, and largely unexplored, solution to the problem of [token] distribution specifically (there are reasons why it cannot be so easily used for mining) is using tasks that are socially useful but require original human-driven creative effort and talent. For example, one can come up with a "proof of proof" currency that rewards players for coming up with mathematical proofs of certain theorems

Status: No progress, problem is largely forgotten.

The main alternative approach to token distribution that has instead become popular is airdrops; typically, tokens are distributed at launch either proportionately to existing holdings of some other token, or based on some other metric (eg. as in the Handshake airdrop). Verifying human creativity directly has not really been attempted, and with recent progress on AI the problem of creating a task that only humans can do but computers can verify may well be too difficult.

15 [sic]. Anti-Sybil systems

A problem that is somewhat related to the issue of a reputation system is the challenge of creating a "unique identity system" - a system for generating tokens that prove that an identity is not part of a Sybil attack ... However, we would like to have a system that has nicer and more egalitarian features than "one-dollar-one-vote"; arguably, one-person-one-vote would be ideal.

Status: Some progress.

There have been quite a few attempts at solving the unique-human problem. Attempts that come to mind include (incomplete list!):

With the growing interest in techniques like quadratic voting and quadratic funding, the need for some kind of human-based anti-sybil system continues to grow. Hopefully, ongoing development of these techniques and new ones can come to meet it.


14 [sic]. Decentralized contribution metrics

Incentivizing the production of public goods is, unfortunately, not the only problem that centralization solves. The other problem is determining, first, which public goods are worth producing in the first place and, second, determining to what extent a particular effort actually accomplished the production of the public good. This challenge deals with the latter issue.

Status: Some progress, some change in focus.

More recent work on determining value of public-good contributions does not try to separate determining tasks and determining quality of completion; the reason is that in practice the two are difficult to separate. Work done by specific teams tends to be non-fungible and subjective enough that the most reasonable approach is to look at relevance of task and quality of performance as a single package, and use the same technique to evaluate both.

Fortunately, there has been great progress on this, particularly with the discovery of quadratic funding. Quadratic funding is a mechanism where individuals can make donations to projects, and then, based on the number of people who donated and how much they donated, a formula is used to calculate how much they would have donated if they were perfectly coordinated with each other (ie. took each other's interests into account and did not fall prey to the tragedy of the commons). The difference between the amount they would have donated and the amount they actually donated for any given project is given to that project as a subsidy from some central pool (see #11 for where the central pool funding could come from). Note that this mechanism focuses on satisfying the values of some community, not on satisfying some given goal regardless of whether or not anyone cares about it. Because of the complexity of value problem, this approach is likely to be much more robust to unknown unknowns.

Quadratic funding has even been tried in real life with considerable success in the recent gitcoin quadratic funding round. There has also been some incremental progress on improving quadratic funding and similar mechanisms; particularly, pairwise-bounded quadratic funding to mitigate collusion. There has also been work on specification and implementation of bribe-resistant voting technology, preventing users from proving to third parties who they voted for; this prevents many kinds of collusion and bribe attacks.

16. Decentralized success metrics

Problem: come up with and implement a decentralized method for measuring numerical real-world variables ... the system should be able to measure anything that humans can currently reach a rough consensus on (eg. price of an asset, temperature, global CO2 concentration)

Status: Some progress.

This is now generally just called "the oracle problem". The largest known instance of a decentralized oracle running is Augur, which has processed outcomes for millions of dollars of bets. Token curated registries such as the Kleros TCR for tokens are another example. However, these systems still have not seen a real-world test of the forking mechanism (search for "subjectivocracy" here) either due to a highly controversial question or due to an attempted 51% attack. There is also research on the oracle problem happening outside of the blockchain space in the form of the "peer prediction" literature; see here for a very recent advancement in the space.

Another looming challenge is that people want to rely on these systems to guide transfers of quantities of assets larger than the economic value of the system's native token. In these conditions, token holders in theory have the incentive to collude to give wrong answers to steal the funds. In such a case, the system would fork and the original system token would likely become valueless, but the original system token holders would still get away with the returns from whatever asset transfer they misdirected. Stablecoins (see #10) are a particularly egregious case of this. One approach to solving this would be a system that assumes that altruistically honest data providers do exist, creates a mechanism to identify them, and only allows them to churn slowly, so that if malicious ones start getting voted in, the users of systems that rely on the oracle can first complete an orderly exit. In any case, more development of oracle tech is very much an important problem.

New problems

If I were to write the hard problems list again in 2019, some would be a continuation of the above problems, but there would be significant changes in emphasis, as well as significant new problems. Here are a few picks:

  • Cryptographic obfuscation: same as #4 above
  • Ongoing work on post-quantum cryptography: both hash-based as well as based on post-quantum-secure "structured" mathematical objects, eg. elliptic curve isogenies, lattices...
  • Anti-collusion infrastructure: ongoing work and refinement of https://ethresear.ch/t/minimal-anti-collusion-infrastructure/5413, including adding privacy against the operator, adding multi-party computation in a maximally practical way, etc.
  • Oracles: same as #16 above, but removing the emphasis on "success metrics" and focusing on the general "get real-world data" problem
  • Unique-human identities (or, more realistically, semi-unique-human identities): same as what was written as #15 above, but with an emphasis on a less "absolute" solution: it should be much harder to get two identities than one, but making it impossible to get multiple identities is both impossible and potentially harmful even if we do succeed
  • Homomorphic encryption and multi-party computation: ongoing improvements are still required for practicality
  • Decentralized governance mechanisms: DAOs are cool, but current DAOs are still very primitive; we can do better
  • Fully formalizing responses to PoS 51% attacks: ongoing work and refinement of https://ethresear.ch/t/responding-to-51-attacks-in-casper-ffg/6363
  • More sources of public goods funding: the ideal is to charge for congestible resources inside of systems that have network effects (eg. transaction fees), but doing so in decentralized systems requires public legitimacy; hence this is a social problem along with the technical one of finding possible sources
  • Reputation systems: same as #12 above

In general, base-layer problems are slowly but surely decreasing, but application-layer problems are only just getting started.

Quadratic Payments: A Primer

Special thanks to Karl Floersch and Jinglan Wang for feedback

If you follow applied mechanism design or decentralized governance at all, you may have recently heard one of a few buzzwords: quadratic voting, quadratic funding and quadratic attention purchase. These ideas have been gaining popularity rapidly over the last few years, and small-scale tests have already been deployed: the Taiwanese presidential hackathon used quadratic voting to vote on winning projects, Gitcoin Grants used quadratic funding to fund public goods in the Ethereum ecosystem, and the Colorado Democratic party also experimented with quadratic voting to determine their party platform.

To the proponents of these voting schemes, this is not just another slight improvement to what exists. Rather, it's an initial foray into a fundamentally new class of social technology which has the potential to overturn how we make many public decisions, large and small. The ultimate effect of these schemes rolled out in their full form could be as deeply transformative as the industrial-era advent of mostly-free markets and constitutional democracy. But now, you may be thinking: "These are large promises. What do these new governance technologies have that justifies such claims?"

Private goods, private markets...

To understand what is going on, let us first consider an existing social technology: money, and property rights - the invisible social technology that generally hides behind money. Money and private property are extremely powerful social technologies, for all the reasons classical economists have been stating for over a hundred years. If Bob is producing apples, and Alice wants to buy apples, we can economically model the interaction between the two, and the results seem to make sense:



Alice keeps buying apples until the marginal value of the next apple to her is less than the cost of producing it, which is pretty much exactly the optimal thing that could happen. And if the cost of producing the apples is greater than their value to Alice, then Alice just doesn't buy any:



This is all formalized in results such as the "fundamental theorems of welfare economics". Now, those of you who have learned some economics may be screaming, but what about imperfect competition? Asymmetric information? Economic inequality? Public goods? Externalities? Many activities in the real world, including those that are key to the progress of human civilization, benefit (or harm) many people in complicated ways. These activities and the consequences that arise from them often cannot be neatly decomposed into sequences of distinct trades between two parties.

But since when do we expect a single package of technologies to solve every problem anyway? "What about oceans?" isn't an argument against cars, it's an argument against car maximalism, the position that we need cars and nothing else. Much like how private property and markets deal with private goods, can we try to use economic means to deduce what kind of social technologies would work well for encouraging production of the public goods that we need?

... Public goods, public markets

Private goods (eg. apples) and public goods (eg. public parks, air quality, scientific research, this article...) are different in some key ways. When we are talking about private goods, production for multiple people (eg. the same farmer makes apples for both Alice and Bob) can be decomposed into (i) the farmer making some apples for Alice, and (ii) the farmer making some other apples for Bob. If Alice wants apples but Bob does not, then the farmer makes Alice's apples, collects payment from Alice, and leaves Bob alone. Even complex collaborations (the "I, Pencil" essay popular in libertarian circles comes to mind) can be decomposed into a series of such interactions. When we are talking about public goods, however, this kind of decomposition is not possible. When I write this blog article, it can be read by both Alice and Bob (and everyone else). I could put it behind a paywall, but if it's popular enough it will inevitably get mirrored on third-party sites, and paywalls are in any case annoying and not very effective. Furthermore, making an article available to ten people is not ten times cheaper than making the article available to a hundred people; rather, the cost is exactly the same. So I either produce the article for everyone, or I do not produce it for anyone at all.

So here comes the challenge: how do we aggregate together people's preferences? Some private and public goods are worth producing, others are not. In the case of private goods, the question is easy, because we can just decompose it into a series of decisions for each individual. Whatever amount each person is willing to pay for, that much gets produced for them; the economics is not especially complex. In the case of public goods, however, you cannot "decompose", and so we need to add up people's preferences in a different way.

First of all, let's see what happens if we just put up a plain old regular market: I offer to write an article as long as at least $1000 of money gets donated to me (fun fact: I literally did this back in 2011). Every dollar donated increases the probability that the goal will be reached and the article will be published; let us call this "marginal probability" p. At a cost of $k, you can increase the probability that the article will be published by k * p (though eventually the gains will decrease as the probability approaches 100%). Let's say to you personally, the article being published is worth $V. Would you donate? Well, donating a dollar increases the probability it will be published by p, and so gives you an expected $p * V of value. If p * V > 1, you donate, and quite a lot, and if p * V < 1 you don't donate at all.

Phrased less mathematically, either you value the article enough (and/or are rich enough) to pay, and if that's the case it's in your interest to keep paying (and influencing) quite a lot, or you don't value the article enough and you contribute nothing. Hence, the only blog articles that get published would be articles where some single person is willing to basically pay for it themselves (in my experiment in 2011, this prediction was experimentally verified: in most rounds, over half of the total contribution came from a single donor).



Note that this reasoning applies for any kind of mechanism that involves "buying influence" over matters of public concern. This includes paying for public goods, shareholder voting in corporations, public advertising, bribing politicians, and much more. The little guy has too little influence (not quite zero, because in the real world things like altruism exist) and the big guy has too much. If you had an intuition that markets work great for buying apples, but money is corrupting in "the public sphere", this is basically a simplified mathematical model that shows why.

We can also consider a different mechanism: one-person-one-vote. Let's say you can either vote that I deserve a reward for writing this article, or you can vote that I don't, and my reward is proportional to the number of votes in my favor. We can interpret this as follows: your first "contribution" costs only a small amount of effort, so you'll support an article if you care about it enough, but after that point there is no more room to contribute further; your second contribution "costs" infinity.



Now, you might notice that neither of the graphs above look quite right. The first graph over-privileges people who care a lot (or are wealthy), the second graph over-privileges people who care only a little, which is also a problem. The single sheep's desire to live is more important than the two wolves' desire to have a tasty dinner.

But what do we actually want? Ultimately, we want a scheme where how much influence you "buy" is proportional to how much you care. In the mathematical lingo above, we want your k to be proportional to your V. But here's the problem: your V determines how much you're willing to pay for one unit of influence. If Alice were willing to pay $100 for the article if she had to fund it herself, then she would be willing to pay $1 for an increased 1% chance it will get written, and if Bob were only willing to pay $50 for the article then he would only be willing to pay $0.5 for the same "unit of influence".

So how do we match these two up? The answer is clever: your n'th unit of influence costs you $n. That is, for example, you could buy your first vote for $0.01, but then your second would cost $0.02, your third $0.03, and so forth. Suppose you were Alice in the example above: you would keep buying units of influence until the cost of the next one got to $1, so you would buy 100 units. Bob would similarly buy until the cost got to $0.5, so he would buy 50 units. Alice's 2x higher valuation turned into 2x more units of influence purchased.

Let's draw this as a graph:



Now let's look at all three beside each other:

One dollar one vote | Quadratic voting | One person one vote



Notice that only quadratic voting has this nice property that the amount of influence you purchase is proportional to how much you care; the other two mechanisms either over-privilege concentrated interests or over-privilege diffuse interests.

Now, you might ask, where does the quadratic come from? Well, the marginal cost of the n'th vote is $n (or $0.01 * n), but the total cost of n votes is \(\approx \frac{n^2}{2}\). You can view this geometrically as follows:



The total cost is the area of a triangle, and you probably learned in math class that area is base * height / 2. And since here base and height are proportionate, that basically means that total cost is proportional to number of votes squared - hence, "quadratic". But honestly it's easier to think "your n'th unit of influence costs $n".
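
To make the "your n'th unit of influence costs $n" rule concrete, here is a minimal Python sketch of the Alice/Bob example above; the function name and the cent-denominated prices are illustrative choices, not part of any particular implementation.

def quadratic_votes(value_per_unit_cents, price_step_cents=1):
    # Buy influence while the marginal cost of the next unit (n * step)
    # is no more than the value of one unit of influence to the buyer.
    n = 0
    total_cost_cents = 0
    while (n + 1) * price_step_cents <= value_per_unit_cents:
        n += 1
        total_cost_cents += n * price_step_cents
    return n, total_cost_cents

# Alice values one unit of influence at $1.00, Bob at $0.50 (as above).
print(quadratic_votes(100))  # (100, 5050): 100 units for $50.50 in total
print(quadratic_votes(50))   # (50, 1275): half the valuation buys half the influence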

Finally, you might notice that above I've been vague about what "one unit of influence" actually means. This is deliberate; it can mean different things in different contexts, and the different "flavors" of quadratic payments reflect these different perspectives.

Quadratic Voting

See also the original paper: https://papers.ssrn.com/sol3/papers.cfm?abstract%5fid=2003531

Let us begin by exploring the first "flavor" of quadratic payments: quadratic voting. Imagine that some organization is trying to choose between two choices for some decision that affects all of its members. For example, this could be a company or a nonprofit deciding which part of town to make a new office in, or a government deciding whether or not to implement some policy, or an internet forum deciding whether or not its rules should allow discussion of cryptocurrency prices. Within the context of the organization, the choice made is a public good (or public bad, depending on whom you talk to): everyone "consumes" the results of the same decision, they just have different opinions about how much they like the result.

This seems like a perfect target for quadratic voting. The goal is that option A gets chosen if in total people like A more, and option B gets chosen if in total people like B more. With simple voting ("one person one vote"), the distinction between stronger vs weaker preferences gets ignored, so on issues where one side is of very high value to a few people and the other side is of low value to more people, simple voting is likely to give wrong answers. With a private-goods market mechanism where people can buy as many votes as they want at the same price per vote, the individual with the strongest preference (or the wealthiest) carries everything. Quadratic voting, where you can make n votes in either direction at a cost of \(n^2\), is right in the middle between these two extremes, and creates the perfect balance.



Note that in the voting case, we're deciding between two options, so different people will favor A over B or B over A; hence, unlike the graphs we saw earlier that start from zero, here voting and preference can both be positive or negative (which option is considered positive and which is negative doesn't matter; the math works out the same way).



As shown above, because the n'th vote has a cost of n, the number of votes you make is proportional to how much you value one unit of influence over the decision (the value of the decision multiplied by the probability that one vote will tip the result), and hence proportional to how much you care about A being chosen over B or vice versa. Hence, we once again have this nice clean "preference adding" effect.

We can extend quadratic voting in multiple ways. First, we can allow voting between more than two options. While traditional voting schemes inevitably fall prey to various kinds of "strategic voting" issues because of Arrow's theorem and Duverger's law, quadratic voting continues to be optimal in contexts with more than two choices.

The intuitive argument for those interested: suppose there are established candidates A and B and new candidate C. Some people favor C > A > B but others C > B > A. In a regular vote, if both sides think C stands no chance, they decide they may as well vote their preference between A and B, so C gets no votes, and C's failure becomes a self-fulfilling prophecy. In quadratic voting the former group would vote [A +10, B -10, C +1] and the latter [A -10, B +10, C +1], so the A and B votes cancel out and C's popularity shines through.

Second, we can look not just at voting between discrete options, but also at voting on the setting of a thermostat: anyone can push the thermostat up or down by 0.01 degrees n times by paying a cost of \(n^2\).


Plot twist: the side wanting it colder only wins when they convince the other side that "C" stands for "caliente".



Quadratic funding

See also the original paper: https://papers.ssrn.com/sol3/papers.cfm?abstract%5fid=3243656

Quadratic voting is optimal when you need to make some fixed number of collective decisions. But one weakness of quadratic voting is that it doesn't come with a built-in mechanism for deciding what goes on the ballot in the first place. Proposing votes is potentially a source of considerable power if not handled with care: a malicious actor in control of it can repeatedly propose some decision that a majority weakly approves of and a minority strongly disapproves of, and keep proposing it until the minority runs out of voting tokens (if you do the math you'll see that the minority would burn through tokens much faster than the majority). Let's consider a flavor of quadratic payments that does not run into this issue, and makes the choice of decisions itself endogenous (ie. part of the mechanism itself). In this case, the mechanism is specialized for one particular use case: individual provision of public goods.

Let us consider an example where someone is looking to produce a public good (eg. a developer writing an open source software program), and we want to figure out whether or not this program is worth funding. But instead of just thinking about one single public good, let's create a mechanism where anyone can raise funds for what they claim to be a public good project. Anyone can make a contribution to any project; a mechanism keeps track of these contributions and then at the end of some period of time the mechanism calculates a payment to each project. The way that this payment is calculated is as follows: for any given project, take the square root of each contributor's contribution, add these values together, and take the square of the result. Or in math speak:

\[(\sum_{i=1}^n \sqrt{c_i})^2\]

If that sounds complicated, here it is graphically:



In any case where there is more than one contributor, the computed payment is greater than the raw sum of contributions; the difference comes out of a central subsidy pool (eg. if ten people each donate $1, then the sum-of-square-roots is $10, and the square of that is $100, so the subsidy is $90). Note that if the subsidy pool is not big enough to make the full required payment to every project, we can just divide the subsidies proportionately by whatever constant makes the totals add up to the subsidy pool's budget; you can prove that this solves the tragedy-of-the-commons problem as well as you can with that subsidy budget.
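
As a rough sketch (not any particular implementation), here is the payment rule just described in Python, including the proportional scaling of subsidies when the matching pool cannot cover them in full; the project names and the budget figure are made up for illustration.

import math

def quadratic_match(contributions):
    # Raw quadratic-funding payment for one project: (sum of square roots)^2.
    return sum(math.sqrt(c) for c in contributions) ** 2

def allocate(projects, budget):
    # Subsidy = match minus raw contributions; scale the subsidies down
    # proportionately if together they exceed the matching-pool budget.
    raw = {name: sum(cs) for name, cs in projects.items()}
    subsidy = {name: quadratic_match(cs) - raw[name] for name, cs in projects.items()}
    total_subsidy = sum(subsidy.values())
    scale = min(1.0, budget / total_subsidy) if total_subsidy > 0 else 0.0
    return {name: raw[name] + subsidy[name] * scale for name in projects}

# Ten $1 donors vs. one $10 donor, with a $50 matching pool.
projects = {"many_small_donors": [1.0] * 10, "one_big_donor": [10.0]}
print(allocate(projects, budget=50.0))
# The ten-donor project's $90 subsidy is scaled down to the $50 available;
# the single-donor project gets no subsidy at all.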

There are two ways to intuitively interpret this formula. First, one can look at it through the "fixing market failure" lens, a surgical fix to the tragedy of the commons problem. In any situation where Alice contributes to a project and Bob also contributes to that same project, Alice is making a contribution to something that is valuable not only to herself, but also to Bob. When deciding how much to contribute, Alice was only taking into account the benefit to herself, not Bob, whom she most likely does not even know. The quadratic funding mechanism adds a subsidy to compensate for this effect, determining how much Alice "would have" contributed if she also took into account the benefit her contribution brings to Bob. Furthermore, we can separately calculate the subsidy for each pair of people (nb. if there are N people there are N * (N-1) / 2 pairs), add up all of these subsidies together, and give the project the combined subsidy from all pairs. And it turns out that this gives exactly the quadratic funding formula.
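
A quick numerical check of this pairwise reading, with arbitrary contribution amounts: assign each pair (i, j) a subsidy of 2·√(c_i·c_j), and the pairwise subsidies sum to exactly the quadratic funding subsidy.

import math
from itertools import combinations

contributions = [4.0, 9.0, 25.0]              # arbitrary example amounts

match = sum(math.sqrt(c) for c in contributions) ** 2
subsidy = match - sum(contributions)          # what the matching pool adds

pairwise = sum(2 * math.sqrt(ci * cj) for ci, cj in combinations(contributions, 2))

print(match, subsidy, pairwise)               # 100.0 62.0 62.0 -- the two agree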

Second, one can look at the formula through a quadratic voting lens. We interpret the quadratic funding as being a special case of quadratic voting, where the contributors to a project are voting for that project and there is one imaginary participant voting against it: the subsidy pool. Every "project" is a motion to take money from the subsidy pool and give it to that project's creator. Everyone sending \(c_i\) of funds is making \(\sqrt{c_i}\) votes, so there's a total of \(\sum_{i=1}^n \sqrt{c_i}\) votes in favor of the motion. To kill the motion, the subsidy pool would need to make more than \(\sum_{i=1}^n \sqrt{c_i}\) votes against it, which would cost it more than \((\sum_{i=1}^n \sqrt{c_i})^2\). Hence, \((\sum_{i=1}^n \sqrt{c_i})^2\) is the maximum transfer from the subsidy pool to the project that the subsidy pool would not vote to stop.

Quadratic funding is starting to be explored as a mechanism for funding public goods already; Gitcoin grants for funding public goods in the Ethereum ecosystem is currently the biggest example, and the most recent round led to results that, in my own view, did a quite good job of making a fair allocation to support projects that the community deems valuable.


Numbers in white are raw contribution totals; numbers in green are the extra subsidies.



Quadratic attention payments

See also the original post: https://kortina.nyc/essays/speech-is-free-distribution-is-not-a-tax-on-the-purchase-of-human-attention-and-political-power/

One of the defining features of modern capitalism that people love to hate is ads. Our cities have ads:


Source: https://www.flickr.com/photos/argonavigo/36657795264



Our subway turnstiles have ads:


Source: https://commons.wikimedia.org/wiki/File:NYC,_subway_ad_on_Prince_St.jpg



Our politics are dominated by ads:


Source: https://upload.wikimedia.org/wikipedia/commons/e/e3/Billboard_Challenging_the_validity_of_Barack_Obama%27s_Birth_Certificate.JPG



And even the rivers and the skies have ads. Now, there are some places that seem to not have this problem:



But really they just have a different kind of ads:



Now, recently there are attempts to move beyond this in some cities. And on Twitter. But let's look at the problem systematically and try to see what's going wrong. The answer is actually surprisingly simple: public advertising is the evil twin of public goods production. In the case of public goods production, there is one actor that is taking on an expenditure to produce some product, and this product benefits a large number of people. Because these people cannot effectively coordinate to pay for the public goods by themselves, we get far fewer public goods than we need, and the ones we do get are those favored by wealthy actors or centralized authorities. In the case of advertising, there is one actor that reaps a large benefit from forcing other people to look at some image, and this action harms a large number of people. Because these people cannot effectively coordinate to buy out the slots for the ads, we get ads we don't want to see, that are favored by... wealthy actors or centralized authorities.

So how do we solve this dark mirror image of public goods production? With a bright mirror image of quadratic funding: quadratic fees! Imagine a billboard where anyone can pay $1 to put up an ad for one minute, but if they want to do this multiple times the prices go up: $2 for the second minute, $3 for the third minute, etc. Note that you can pay to extend the lifetime of someone else's ad on the billboard, and this also costs you only $1 for the first minute, even if other people already paid to extend the ad's lifetime many times. We can once again interpret this as being a special case of quadratic voting: it's basically the same as the "voting on a thermostat" example above, but where the thermostat in question is the number of seconds an ad stays up.
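
A toy sketch of such a billboard in Python (the class and ad names are made up; I'm assuming the price schedule resets per buyer per ad, matching the thermostat analogy above):

from collections import defaultdict

class QuadraticBillboard:
    """Toy sketch of quadratic ad fees: each buyer's n'th minute purchased for a
    given ad costs $n, regardless of how long others have extended that ad."""
    def __init__(self):
        self.minutes_bought = defaultdict(int)   # (buyer, ad) -> minutes bought so far
        self.ad_time = defaultdict(int)          # ad -> total minutes on the billboard

    def buy_minute(self, buyer, ad):
        self.minutes_bought[(buyer, ad)] += 1
        self.ad_time[ad] += 1
        return self.minutes_bought[(buyer, ad)]  # dollar price of this particular minute

board = QuadraticBillboard()
print(board.buy_minute("alice", "park_concert"))  # $1: Alice's first minute for this ad
print(board.buy_minute("bob", "park_concert"))    # $1: Bob's first minute, extending the same ad
print(board.buy_minute("alice", "park_concert"))  # $2: Alice's second minute
print(board.ad_time["park_concert"])              # 3 minutes of airtime in total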

This kind of payment model could be applied in cities, on websites, at conferences, or in many other contexts, if the goal is to optimize for putting up things that people want to see (or things that people want other people to see, but even here it's much more democratic than simply buying space) rather than things that wealthy people and centralized institutions want people to see.

Complexities and caveats

Perhaps the biggest challenge to consider with this concept of quadratic payments is the practical implementation issue of identity and bribery/collusion. Quadratic payments in any form require a model of identity where individuals cannot easily get as many identities as they want: if they could, then they could just keep getting new identities and keep paying $1 to influence some decision as many times as they want, and the mechanism collapses into linear vote-buying. Note that the identity system does not need to be airtight (in the sense of preventing multiple-identity acquisition), and indeed there are good civil-liberties reasons why identity systems probably should not try to be airtight. Rather, it just needs to be robust enough that manipulation is not worth the cost.

Collusion is also tricky. If we can’t prevent people from selling their votes, the mechanisms once again collapse into one-dollar-one-vote. We don't just need votes to be anonymous and private (while still making the final result provable and public); we need votes to be so private that even the person who made the vote can't prove to anyone else what they voted for. This is difficult. Secret ballots do this well in the offline world, but secret ballots are a nineteenth century technology, far too inefficient for the sheer amount of quadratic voting and funding that we want to see in the twenty first century.

Fortunately, there are technological means that can help, combining together zero-knowledge proofs, encryption and other cryptographic technologies to achieve the precise desired set of privacy and verifiability properties. There's also proposed techniques to verify that private keys actually are in an individual's possession and not in some hardware or cryptographic system that can restrict how they use those keys. However, these techniques are all untested and require quite a bit of further work.

Another challenge is that quadratic payments, being a payment-based mechanism, continue to favor people with more money. Note that because the cost of votes is quadratic, this effect is dampened: someone with 100 times more money only has 10 times more influence, not 100 times, so the extent of the problem goes down by 90% (and even more for ultra-wealthy actors). That said, it may be desirable to mitigate this inequality of power further. This could be done either by denominating quadratic payments in a separate token of which everyone gets a fixed number of units, or by giving each person an allocation of funds that can only be used for quadratic-payments use cases: this is basically Andrew Yang's "democracy dollars" proposal.


A third challenge is the "rational ignorance" and "rational irrationality" problems: decentralized public decisions have the weakness that any single individual has very little effect on the outcome, and so little motivation to make sure they are supporting the decision that is best for the long term; instead, pressures such as tribal affiliation may dominate. There are many strands of philosophy that emphasize the ability of large crowds to be very wrong despite (or because of!) their size, and quadratic payments in any form do little to address this.

Quadratic payments do better at mitigating this problem than one-person-one-vote systems, and these problems can be expected to be less severe for medium-scale public goods than for large decisions that affect many millions of people, so it may not be a large challenge at first, but it's certainly an issue worth confronting. One approach is combining quadratic voting with elements of sortition. Another, potentially more long-term durable, approach is to combine quadratic voting with another economic technology that is much more specifically targeted toward rewarding the "correct contrarianism" that can dispel mass delusions: prediction markets. A simple example would be a system where quadratic funding is done retrospectively, so people vote on which public goods were valuable some time ago (eg. even 2 years), and projects are funded up-front by selling shares of the results of these deferred votes; by buying shares people would be both funding the projects and betting on which project would be viewed as successful in 2 years' time. There is a large design space to experiment with here.

Conclusion

As I mentioned at the beginning, quadratic payments do not solve every problem. They solve the problem of governing resources that affect large numbers of people, but they do not solve many other kinds of problems. A particularly important one is information asymmetry and low quality of information in general. For this reason, I am a fan of techniques such as prediction markets (see electionbettingodds.com for one example) to solve information-gathering problems, and many applications can be made most effective by combining different mechanisms together.

One particular cause dear to me personally is what I call "entrepreneurial public goods": public goods that in the present only a few people believe are important but in the future many more people will value. In the 19th century, contributing to abolition of slavery may have been one example; in the 21st century I can't give examples that will satisfy every reader because it's the nature of these goods that their importance will only become common knowledge later down the road, but I would point to life extension and AI risk research as two possible examples.

That said, we don't need to solve every problem today. Quadratic payments are an idea that has only become popular in the last few years; we still have not seen more than small-scale trials of quadratic voting and funding, and quadratic attention payments have not been tried at all! There is still a long way to go. But if we can get these mechanisms off the ground, there is a lot that they have to offer!

Christmas Special

Since it's Christmas time now, and we're theoretically supposed to be enjoying ourselves and spending time with our families instead of waging endless holy wars on Twitter, this blog post will offer some games that you can play with your friends that will help you have fun and at the same time understand some spooky mathematical concepts!

1.58 dimensional chess




This is a variant of chess where the board is set up like this:




The board is still a normal 8x8 board, but there are only 27 open squares. The other 37 squares should be covered up by checkers or Go pieces or anything else to denote that they are inaccessible. The rules are the same as chess, with a few exceptions:

  • White pawns move up, black pawns move left. White pawns take going left-and-up or right-and-up, black pawns take going left-and-down or left-and-up. White pawns promote upon reaching the top, black pawns promote upon reaching the left.
  • No en passant, castling, or two-step-forward pawn jumps.
  • Chess pieces cannot move onto or through the 37 covered squares. Knights cannot move onto the 37 covered squares, but don't care what they move "through".

The game is called 1.58 dimensional chess because the 27 open squares are chosen according to a pattern based on the Sierpinski triangle. You start off with a single open square, and then every time you double the width, you take the shape at the end of the previous step, and copy it to the top left, top right and bottom left corners, but leave the bottom right corner inaccessible. Whereas in a one-dimensional structure, doubling the width increases the space by 2x, and in a two-dimensional structure, doubling the width increases the space by 4x (\(4 = 2^2\)), and in a three-dimensional structure, doubling the width increases the space by 8x (\(8 = 2^3\)), here doubling the width increases the space by 3x (\(3 = 2^{1.58496}\)), hence "1.58 dimensional" (see Hausdorff dimension for details).
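
For the curious, here is a small sketch that builds the pattern exactly as described and recovers both the 27 open squares and the 1.58; the function name and the row/column orientation are arbitrary choices.

import math

def sierpinski_open_squares(n):
    # Open squares of a 2^n x 2^n board: at each doubling, copy the previous
    # pattern to the top-left, top-right and bottom-left quadrants and leave
    # the bottom-right quadrant covered. (Row 0 is the top.)
    open_squares = {(0, 0)}
    size = 1
    for _ in range(n):
        open_squares = (
            open_squares
            | {(r, c + size) for r, c in open_squares}
            | {(r + size, c) for r, c in open_squares}
        )
        size *= 2
    return open_squares

print(len(sierpinski_open_squares(3)))  # 27 open squares on the 8x8 board (37 covered)
print(math.log2(3))                     # ~1.58496: doubling the width triples the space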

The game is substantially simpler and more "tractable" than full-on chess, and it's an interesting exercise in showing how in lower-dimensional spaces defense becomes much easier than offense. Note that the relative value of different pieces may change here, and new kinds of endings become possible (eg. you can checkmate with just a bishop).

3 dimensional tic tac toe




The goal here is to get 4 in a straight line, where the line can go in any direction, along an axis or diagonal, including between planes. For example in this configuration X wins:




It's considerably harder than traditional 2D tic tac toe, and hopefully much more fun!

Modular tic-tac-toe

Here, we go back down to having two dimensions, except we allow lines to wrap around:



X wins



Note that we allow diagonal lines with any slope, as long as they pass through all four points. Particularly, this means that lines with slope +/- 2 and +/- 1/2 are admissible:




Mathematically, the board can be interpreted as a 2-dimensional vector space over the integers modulo 4, with the goal being to fill in a line that passes through four points of this space. Note that there exists at least one line passing through any two points.
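
A brute-force sketch of this, assuming cells are indexed by pairs in \(Z_4 \times Z_4\): it enumerates every wrap-around line (a set \(\{P + tD\}\) of four distinct points) and verifies the claim that any two cells share at least one line.

from itertools import product

def lines_mod4():
    # All wrap-around "lines": sets {P + t*D mod 4 : t = 0..3} with 4 distinct points.
    lines = set()
    for p in product(range(4), repeat=2):
        for d in product(range(4), repeat=2):
            if d == (0, 0):
                continue
            pts = frozenset(((p[0] + t * d[0]) % 4, (p[1] + t * d[1]) % 4) for t in range(4))
            if len(pts) == 4:
                lines.add(pts)
    return lines

lines = lines_mod4()
print(len(lines))  # 24: six slope classes (0, infinity, 1, -1, 2, 1/2) times 4 parallels
cells = list(product(range(4), repeat=2))
print(all(any(a in line and b in line for line in lines)
          for i, a in enumerate(cells) for b in cells[i + 1:]))  # True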

Tic tac toe over the 4-element binary field




Here, we have the same concept as above, except we use an even spookier mathematical structure, the 4-element field of polynomials over \(Z_2\) modulo \(x^2 + x + 1\). This structure has pretty much no reasonable geometric interpretation, so I'll just give you the addition and multiplication tables:
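
As a sketch, labeling the four elements \(0, 1, x, x+1\) as 0, 1, 2, 3, the tables can be computed like this (addition is coefficient-wise XOR; multiplication is carry-less, reduced by \(x^2 = x + 1\)):

def gf4_add(a, b):
    # Addition of polynomials over Z_2 is coefficient-wise XOR.
    return a ^ b

def gf4_mul(a, b):
    # Carry-less multiply of two degree-<=1 polynomials, then reduce x^2 -> x + 1.
    p = 0
    if b & 1:
        p ^= a
    if b & 2:
        p ^= a << 1
    if p & 4:           # an x^2 term appeared
        p ^= 0b111      # add x^2 + x + 1 (addition and subtraction coincide over Z_2)
    return p

print([[gf4_add(a, b) for b in range(4)] for a in range(4)])
# [[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]
print([[gf4_mul(a, b) for b in range(4)] for a in range(4)])
# [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]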




OK fine, here are all possible lines, excluding the horizontal and the vertical lines (which are also admissible) for brevity:




The lack of geometric interpretation does make the game harder to play; you pretty much have to memorize the twenty winning combinations, though note that they are basically rotations and reflections of the same four basic shapes (axial line, diagonal line, diagonal line starting in the middle, that weird thing that doesn't look like a line).

Now play 1.77 dimensional connect four. I dare you.




Modular poker

Everyone is dealt five cards (you can use whatever variant poker rules you want here in terms of how these cards are dealt and whether or not players have the right to swap cards out). The cards are interpreted as: jack = 11, queen = 12, king = 0, ace = 1. A hand is stronger than another hand if it contains a longer sequence, with any constant difference between consecutive cards (allowing wraparound), than the other hand.

Mathematically, this can be represented as follows: a hand is stronger if the player can come up with a line \(L(x) = mx+b\) such that they have cards for the numbers \(L(0)\), \(L(1)\) ... \(L(k)\) for the highest \(k\).
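
Here is a brute-force sketch of the longest-sequence search in Python, assuming a hand of distinct denominations (the repeated-denomination and tie-breaking rules described below are ignored for simplicity; the function name is arbitrary):

def longest_sequence(hand):
    # Longest run card, card+m, card+2m, ... (mod 13) contained in the hand,
    # over all starting cards and constant steps m; J=11, Q=12, K=0, A=1.
    cards = set(hand)
    best = 1
    for start in cards:
        for step in range(1, 13):
            length, x = 0, start
            while x in cards and length < 13:
                length += 1
                x = (x + step) % 13
            best = max(best, length)
    return best

print(longest_sequence([5, 9, 0, 4, 8]))  # 5: the y = 4x + 5 example hand (5, 9, K, 4, 8)
print(longest_sequence([0, 2, 3, 4, 8]))  # 3: e.g. 2 3 4, or K 4 8 with step 4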



Example of a full five-card winning hand. y = 4x + 5.



To break ties between equal maximum-length sequences, count the number of distinct length-three sequences they have; the hand with more distinct length-three sequences wins.



This hand has four length-three sequences: K 2 4, K 4 8, 2 3 4, 3 8 K. This is rare.



Only consider lines of length three or higher. If a hand has three or more of the same denomination, that counts as a sequence, but if a hand has two of the same denomination, any sequences passing through that denomination only count as one sequence.



This hand has no length-three sequences.



If two hands are completely tied, the hand with the higher highest card (using J = 11, Q = 12, K = 0, A = 1 as above) wins.

Enjoy!
