The optimistic pattern allows offloading heavy computation from the EVM (where everything is executed by every node in the network) to a single incentivized relayer that commits to the result of a deterministic or pure function. However, in order to assume the result is legitimate, the input data to the function (and not just its hash) must be made publicly available so that any watcher can challenge the result on-chain in a non-interactive way.
This works well for functions whose input payload is small (e.g. calculating the Nth smallest prime number, where N is the sole input) but whose computation is heavy (calculating large primes is really computationally expensive). However, if the optimistic function requires a large input blob, that blob quickly becomes the bottleneck for scaling. In our experiments implementing optimistic vote tallying, input data size became the limiting factor, allowing the system to relay only ~1000 votes per block (just a 10.72x improvement over the current Aragon Voting implementation).
In this post we present a design for ensuring the availability of input data without requiring it to be submitted on-chain, while maintaining as many properties of the optimistic pattern as possible.
The cost of guaranteeing data availability on-chain
Sending an Ethereum transaction imposes a cost on every existing node in the network, and on every future node that does a full sync. A transaction has a baseline cost of 21,000 gas, and one must also pay for every byte of transaction data sent (on top of paying for the computation that data could trigger). Every non-zero byte of transaction data costs 68 gas, while zero bytes cost 4 gas.
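As a quick sanity check, this calldata pricing rule can be sketched in a few lines of Python (gas constants as stated above, i.e. pre-Istanbul pricing):

```python
def calldata_gas(data: bytes) -> int:
    """Gas charged for transaction data: 68 per non-zero byte, 4 per zero byte."""
    return sum(68 if b != 0 else 4 for b in data)

# 4 non-zero bytes plus 4 zero bytes:
payload = bytes.fromhex("deadbeef") + bytes(4)
print(calldata_gas(payload))  # 4*68 + 4*4 = 288
```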
In the case of optimistic computation, the full input data must be submitted on-chain, bearing a cost of 68 gas per byte in the worst case. The data must also be hashed and logged, taking the total cost per byte of input data to:
- Transaction data cost: 68 gas per byte
- Hashing the input data: 3/16 gas per byte (plus a constant 30 gas)
- Logging the input data: 8 gas per byte (plus a constant 750 gas for a log with two topics)
Therefore, the marginal cost per byte of input payload is 76.1875 gas. Assume a constant cost of 100,000 gas for submitting an optimistic computation, and that a submission can only take up to 80% of the block gas limit (6.4M gas), leaving some breathing room for processing challenges (a challenge requires submitting the input data again, since only its hash is kept in storage). Under these assumptions, the maximum amount of data that can be passed as input is 82.69 KB per block.
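The arithmetic behind these numbers can be reproduced with a short Python sketch (the 100,000 gas fixed submission cost and the 6.4M gas budget are the assumptions stated above):

```python
TX_DATA = 68        # calldata cost per non-zero byte (worst case)
HASHING = 6 / 32    # keccak256: 6 gas per 32-byte word = 3/16 gas per byte
LOGGING = 8         # LOG data cost per byte

per_byte = TX_DATA + HASHING + LOGGING   # 76.1875 gas per byte
budget = 6_400_000 - 100_000             # 80% of the block gas limit, minus fixed cost

max_kb = budget / per_byte / 1000
print(f"{per_byte} gas/byte -> max input of {max_kb:.2f} KB per block")
# -> 76.1875 gas/byte -> max input of 82.69 KB per block
```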
The data availability problem for a smart contract's worldview
Vanilla optimistic systems, like optimistic DNSSEC, allow anyone to submit a bonded computation result and, thanks to the input data being available to any potential challenger, the computation can be challenged by anyone who sees an opportunity to profit from doing so.
If we removed the requirement for the submitter to send the full input data on-chain and only had them commit to its hash, the submitter would have no incentive to publish the input data anywhere, and therefore no one could verify the computation or submit challenges.
A smart contract has no way to know whether the submitter actually published the data without relying on a trusted or subjective oracle (like the Aragon Court). Allowing potential challengers to request that the submitter post the data on-chain opens up an unsolvable griefing attack against submitters, since the smart contract would not be able to prove that the submitter did indeed publish the data.
To achieve similar properties without posting the full input data to the blockchain, we propose the following protocol, which comes with a major trade-off: optimistic computations can only be submitted and challenged by a limited and somewhat static set of bonded participants.
Ensuring availability with a set of incentivized relayers and challengers
A minimum of three equally staked validators are appointed by the system as the validator set for the next X blocks. These validators could be selected:
- In a deposit auction: validator candidates post an offer to the system with the amount of tokens they are willing to lock up as their deposit in order to operate. The Nth highest offer sets the deposit price, which the N candidates that posted the highest offers must then deposit. Once all deposits are secured, the system is enabled.
- In a completely dictatorial manner by the system creators, although these validators should only be trusted if the entities have competing incentives (e.g. Apple, Google, and Microsoft) but also an incentive to cooperate to make the system work (e.g. the Internet). The deposit size would be centrally decided, and when all the parties deposit the chosen amount, the system opens up.
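As an illustration of the first option, a minimal Nth-price deposit auction could look like the following sketch (the function name, addresses, and token amounts are hypothetical, and on-chain this would of course be a contract rather than Python):

```python
def run_deposit_auction(offers: dict[str, int], n: int) -> tuple[int, list[str]]:
    """Hypothetical Nth-price deposit auction: the N highest bidders become
    validators, and every winner deposits the Nth highest offer."""
    if len(offers) < n:
        raise ValueError("not enough validator candidates")
    # Rank candidates by offer, highest first
    ranked = sorted(offers.items(), key=lambda kv: kv[1], reverse=True)
    winners = [addr for addr, _ in ranked[:n]]
    deposit_price = ranked[n - 1][1]  # the Nth highest offer sets the price
    return deposit_price, winners

price, validators = run_deposit_auction(
    {"0xa": 500, "0xb": 800, "0xc": 300, "0xd": 650}, n=3
)
print(price, validators)  # 500 ['0xb', '0xd', '0xa']
```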
Validators should establish a public or private P2P gossip network to message the group and each other individually.
When a user wishes to perform an optimistic computation, they must establish a connection with one of the validators and send them the input data. At this point, the user and validator could also negotiate fees for the execution of the computation, but such fees are outside the scope of this proposal.
After securing the fee payment (which could be shared with the rest of the validator set), the validator verifies the computation and forwards it to the other validators, who should verify it independently as well. Once the validator receives signed commitments from 2/3 of the validators over the hash of the input data and the hash of the result, they send the user a signed receipt that acts as a commitment that their computation will be relayed within a maximum of Y blocks.
Every Z blocks a submission window opens, during which only one of the validators (selected in a random, non-predictable way) is appointed as the relayer and has the sole right and responsibility to submit computations. In order to submit a computation, the relayer must provide valid signatures from 2/3 of the validators for the contract to accept the submission (the validator set size becomes the scalability bottleneck).
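A minimal sketch of the 2/3 quorum rule the contract would enforce on a submission, with signature verification abstracted away (all names here are hypothetical; on-chain this would be an ecrecover over each signed commitment):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Commitment:
    validator: str    # validator identity (address), assumed recovered from a signature
    input_hash: str   # hash of the input data
    result_hash: str  # hash of the computation result

def has_quorum(commitments: list[Commitment], validator_set: set[str],
               input_hash: str, result_hash: str) -> bool:
    """Accept a submission only if >= 2/3 of the validator set committed
    to the same (input_hash, result_hash) pair."""
    signers = {
        c.validator for c in commitments
        if c.validator in validator_set
        and c.input_hash == input_hash
        and c.result_hash == result_hash
    }
    return 3 * len(signers) >= 2 * len(validator_set)

validators = {"v1", "v2", "v3"}
commits = [Commitment("v1", "0xin", "0xout"), Commitment("v2", "0xin", "0xout")]
print(has_quorum(commits, validators, "0xin", "0xout"))  # True: 2 of 3 >= 2/3
```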
The other validators should keep the input data, both in case they become the relayer themselves and are rewarded for relaying it, and because the input data is required to challenge a relayed computation that is invalid.
If one of the validators acts honestly and decides to publish the data to provide transparency, they could become the default gateway for users to send their relay requests to (and this validator could earn a special fee as well, rather than only being rewarded for relaying messages).
Optimistic meta transactions: an off-chain execution engine
The optimistic pattern is a very powerful scalability construction for executing computation-heavy transactions. However, most contracts deployed to the EVM aren't computation-heavy but storage-heavy, which the optimistic pattern doesn't directly help with. One could, though, imagine storing just a Merkle root of the entire state of the contract, similarly to roll-up, and executing state transitions in an optimistic zkSNARK proof.
With this relayer model, users wouldn't have to pay for their computation directly or own any ETH. Transactions that aren't storage-heavy could become so cheap that projects could run validators themselves and effectively sponsor usage fees.
Comparing with the vanilla optimistic pattern
- Cheaper gas-wise, because it uses fewer chain resources.
- Users themselves don't have to put a deposit in order to perform an optimistic computation.
- Major scalability gains for use cases in which input data size is the scaling bottleneck.
- If 2/3 of the validators collude (and don't communicate with honest validators), they could submit malicious results, and there would be no obvious way to detect such behavior.
- 1/3 of the validators could censor a user from submitting optimistic computations (users should retain the option to perform non-optimistic computation, or use the vanilla optimistic pattern, to avoid censorship).
Future work & ongoing research
- Liveness guarantees in case validators stop signing in order to censor a relayed computation. If validators can't reach quorum, they could be required to put more funds at risk and to randomly publish the input data on-chain.
- Aggregating relayed computations in a tree so that only one hash needs to be put in storage: many computations could be relayed while adding just one storage slot.
- By using BLS signatures, the validator set size could be massively increased, as performing signature checks for any number of validators would have a constant cost of around 160,000 gas.
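The aggregation idea above can be sketched as a plain Merkle tree over computation hashes; one root commits to arbitrarily many relayed computations while occupying a single storage slot (sha256 stands in for keccak256 here to keep the sketch stdlib-only):

```python
import hashlib

def h(data: bytes) -> bytes:
    # sha256 as a stand-in for the EVM's keccak256
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Aggregate many computation hashes into one 32-byte root, so the
    contract stores a single slot regardless of how many computations
    are relayed in a batch."""
    assert leaves, "need at least one leaf"
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"computation-1", b"computation-2", b"computation-3"])
print(root.hex())  # one 32-byte commitment for all three computations
```

Challenging an individual computation in the batch would then require a Merkle proof of its inclusion under the stored root.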
Thanks to Luis Cuende and the Aragon One research team for the ideas, discussions, and help shaping this post. Also thanks to Dean Eigenmann, Georgios Konstantopoulos and Raul Jordan for providing feedback on early drafts.