The amount of gas consumed by a Solidity contract is of paramount importance as it determines the price users will pay for interacting with it. It's quite a broad topic, which can even be considered a particular case of an even broader topic in Computer Science: optimization.
There are many techniques to minimize gas usage, like packing variables smaller than one word together in structs (to save SSTOREs) or caching storage variables accessed more than once in a function (to save SLOADs). If you have previous experience or intuition on the topic of optimization, it's going to be very useful, but take gas prices into account and make sure they are up to date. As you can see in that table, optimizing the usage of SSTOREs is one of the first things you should probably try. Also, bear in mind that gas prices don't accurately reflect real CPU (or GPU, or ASIC) usage, so don't let your previous experience betray you.
Anyway, trying to write a comprehensive guide about how to optimize would exceed the scope of a blog post. In this article, we are going to discuss a particular optimization we did while working on sortition for jury drafting in the Aragon Court. It may be the case that you never have to apply it in your own code, but it’s a good trick to carry around anyway.
The problem of choosing non-ordered elements of a list proportionally to their weight is well formulated in this article about creating a Work token, along with some proposed solutions. In essence, on each (non-final) round of a dispute, we need to randomly choose a set of jurors from the total number of jurors available, proportionally to the number of tokens they hold. Let's visualize jurors as adjacent segments on a line, where the length of each segment represents the number of tokens in a juror's balance. Then we randomly choose a number, and the interval that number belongs to corresponds to the juror selected.
As you can read in the article linked above, it is not an easy problem to solve efficiently. The most naive approach would be a linear search from the beginning (zero), but that would have a complexity of O(n). The next natural idea would be to order jurors by their stake, so we could then use a binary search and achieve logarithmic time, but then inserting and updating would become much more expensive. Although we can foresee that searches will happen way more often than insertions or updates, we must take into account that writing (SSTORE) is way more expensive than reading (SLOAD) in the EVM, so, again, this doesn't seem like a good path to follow either.
Luckily, we found a better way to solve this in this article by Kleros. The solution is presented as “a sum tree.” In short, we just build a tree where each leaf represents a juror's stake, and the rest of the nodes contain the sum of all the nodes below.
You can see an example of such a tree below, with 4 jurors holding 35 tokens in total, the partial sums in the intermediate nodes, and the total amount in the root node.
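To make the structure concrete, here is a minimal Python sketch of such a k-ary sum tree (a toy model for illustration, not the Court's Solidity implementation): leaves hold juror stakes, inner nodes hold the sum of their children, and a sortition descends from the root, subtracting child sums until the drawn value lands in a leaf.

```python
def build(stakes, k=2):
    """Build a k-ary sum tree bottom-up.
    Leaves are (stake, juror_index); inner nodes are (sum, [children])."""
    nodes = [(s, i) for i, s in enumerate(stakes)]
    while len(nodes) > 1:
        nodes = [(sum(c[0] for c in nodes[i:i + k]), nodes[i:i + k])
                 for i in range(0, len(nodes), k)]
    return nodes[0]

def sortition(node, value):
    """Select the juror whose segment contains `value` in [0, total)."""
    while isinstance(node[1], list):          # inner node: descend
        for child in node[1]:
            if value < child[0]:
                node = child
                break
            value -= child[0]                 # skip this subtree's range
    return node[1]                            # leaf: the juror's index
```

With stakes [10, 5, 15, 5] (total 35), a drawn value of 12 falls into the second juror's segment [10, 15), while 16 falls into the third juror's segment [15, 30).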
With this data structure, the complexity for sortition becomes
O(K * log N / log K), where
K is the number of children of each node in the tree, while
N would be the number of leaves (jurors in this case). For writing operations, the complexity becomes
O(log N / log K). Therefore, increasing
K makes sortition worse while also making the addition and removal of jurors cheaper.
As pointed out above, reading (SLOAD) is cheaper than writing (SSTORE), but we also expect reads to happen much more often. We have done some rough calculations, and
K = 16 looks like a good trade-off, and it's the number we are currently using in the Court, but once we have real data, we will be able to produce better estimates of how often each operation takes place in a regular scenario, and come up with more fine-tuned numbers. There's another factor that tips the scales towards optimizing sortition, which is explained in the next section: checkpointing.
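A back-of-the-envelope sketch of that trade-off, using only the node counts implied by the O(K * log N / log K) and O(log N / log K) complexities above (actual gas depends on the real SLOAD/SSTORE prices and on the storage layout):

```python
import math

def sortition_reads(N, K):
    """Nodes inspected during one draw: up to K children per level,
    with ceil(log_K(N)) levels in the tree."""
    return K * math.ceil(math.log(N, K))

def update_writes(N, K):
    """Nodes rewritten when a stake changes: one per level."""
    return math.ceil(math.log(N, K))
```

For N = 10,000 jurors, K = 2 gives 28 reads and 14 writes per operation, while K = 16 gives 64 reads but only 4 writes: a wider tree makes each sortition inspect more siblings but makes every balance update far cheaper.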
Of course, when working with algorithms, we firmly believe in the motto "you can do better," so we can already think of some paths to explore, like using Fenwick trees, which look promising, especially for low values of K.
Now let's add the time dimension for some more fun.
As you can imagine, the tree is quite dynamic: it changes over time as jurors come and go, activate and deactivate, earn rewards, and get slashed…
In all rounds but the last one, we can simply draft on the current term (which is the minimum time unit in the Court). There is still the problem of balance updates, which we defer to the next term to ensure consistency during the whole term (that's the point of having terms as the minimum time unit); we could achieve that using queues, as we did initially. That approach has some upsides and downsides; you can read our initial discussions and gas estimations in this GitHub issue of our repo.
But when you enter the final round, where there is no draft and all jurors are allowed to vote, you definitely need to keep track of historical values, what we call "checkpointing," like MiniMe tokens do, mostly to prevent double voting. For regular rounds, moving tokens around wouldn't give jurors any advantage, as it's a lottery: the chances you gain in the recipient account are symmetrically lost from the sender account. In final rounds, nonetheless, all jurors are allowed to vote, so we need to fix a term and use the balances at that term for all jurors. This prevents jurors from performing Sybil attacks, in which they could double vote by moving tokens from one account to another.
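The checkpointing idea can be sketched in a few lines of Python (the names and structure here are illustrative, in the spirit of MiniMe tokens, not the Court's actual code): every balance update appends a (term, value) pair, and a historical read binary-searches that history.

```python
import bisect

class CheckpointedBalance:
    """Toy checkpointed value: an append-only (term, value) history,
    with historical lookups done via binary search."""

    def __init__(self):
        self.terms = []      # strictly increasing term numbers
        self.values = []     # balance as of the matching term

    def write(self, term, value):
        if self.terms and self.terms[-1] == term:
            self.values[-1] = value          # same term: overwrite in place
        else:
            self.terms.append(term)
            self.values.append(value)

    def value_at(self, term):
        """Balance at `term`: the last write at or before it (0 if none)."""
        i = bisect.bisect_right(self.terms, term)
        return self.values[i - 1] if i else 0
```

Note how every read of a node's value now costs a binary search over that node's history, which is exactly the extra SLOAD factor Jorge worries about in the quote below.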
The animation in the section above, although not accurate because we don't make a complete copy of the tree each time there's an update, gives us the intuition that keeping track of balances along time has a significant impact on gas costs. Jorge already pointed it out when we first considered the idea of checkpointing:
I am particularly worried about the added SLOAD cost while traversing the tree performing a sortition, as for every node in the tree we'd need to get its checkpointed value (even though it is a binary search, it would increase the number SLOADs by at least a factor of 2 and grows logarithmically with the number of balance updates that a juror has had).
Indeed, it turned out to be a huge problem, a deal-breaker. As you can see in this pull request, we were initially spending 1M gas for a 10-juror draft with big trees. That would mean using almost a whole block for a last regular round draft, which needs 81 jurors. We didn't want to be the next CryptoKitties clogging the network. Besides, it could mean that the business model for the Court wouldn't be viable.
After several failed attempts (which you can check in the pull request mentioned above), we finally came up with the idea of multi-sortition. Essentially, it consists of descending the tree only once for every batch, instead of once for each selected juror, visiting each intermediate level at most once and thus saving a lot of SLOADs. You can grasp the idea with these two animations:
You can check the details of the current implementation in our repo.
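The gist of multi-sortition can also be sketched in Python (an illustrative model, not the actual Solidity implementation): instead of restarting from the root for each juror, the whole batch of drawn values is partitioned among a node's children, and we recurse only into children that received values, so each intermediate node is visited at most once.

```python
def multi_sortition(node, values):
    """Resolve a whole batch of drawn values in a single descent.
    `node` is (sum, [children]) for inner nodes and (stake, juror_index)
    for leaves; `values` are offsets relative to this node's segment."""
    if not isinstance(node[1], list):
        return [node[1]] * len(values)       # every value here hits this leaf
    selected, offset = [], 0
    for child in node[1]:
        inside = [v - offset for v in values
                  if offset <= v < offset + child[0]]
        if inside:                           # visit a child only if needed
            selected += multi_sortition(child, inside)
        offset += child[0]
    return selected

# hand-built binary sum tree for stakes [10, 5, 15, 5] (total 35):
# inner nodes are (sum, [children]); leaves are (stake, juror_index)
tree = (35, [(15, [(10, 0), (5, 1)]),
             (20, [(15, 2), (5, 3)])])
```

Here `multi_sortition(tree, [2, 12, 16, 33])` selects jurors [0, 1, 2, 3] while reading each inner node once, whereas four independent descents would re-read the upper levels four times.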
Another question that arose during the early stages of development was whether drafting should be performed with or without replacement. In other words, once a juror has been selected in a dispute round, should we subtract the minimum active balance from their stake before trying to select the next juror, or not? If you consider that a juror holds one draft ticket for each multiple of the minimum active balance in their stake, a question arises: can a juror reuse those tickets during the same draft?
Luke initially suggested that they shouldn't be reused, i.e., that we should draft without replacement. This suggestion seems more reasonable, or fair, but it adds complexity, as we would have to keep records of those temporary balance modifications. As the tree data lives in storage, that would mean a lot of extra SSTOREs. That's why we decided to explore the implications of this difference in the way of drafting.
I'll use a simple example from our discussions (you can find them in our chat): imagine you have 20 blue cards and 80 red ones, and from them we extract 80 cards in total (each color represents a juror, and the number of cards of that color would be their balance in the tree).
- With replacement, the cards are put back after each extraction. This means that blue cards could theoretically be selected up to 80 times (quite unlikely, though). The experiment follows a binomial distribution, and therefore the expected value is E(X) = n*p. So for blue cards it would be 80*(20/100) = 16, while for red cards it would be 80*(80/100) = 64.
- Without replacement, it follows a hypergeometric distribution, and therefore the expectation is E(X) = n*(K/N), which again gives 16 for blue cards and 64 for red ones.
It can be proven that, for the general case, E(X) = n*(K/N), where K is the number of cards of the same color and N the total number of cards; therefore K/N equals p in the other case.
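A quick Monte Carlo check of these two expectations (illustrative only; the trial count and tolerance are arbitrary choices):

```python
import random

def mean_blue(replacement, n_blue=20, n_red=80, draws=80, trials=20000):
    """Average number of blue cards obtained across many repeated drafts."""
    deck = ['blue'] * n_blue + ['red'] * n_red
    total = 0
    for _ in range(trials):
        if replacement:
            picks = [random.choice(deck) for _ in range(draws)]
        else:
            picks = random.sample(deck, draws)   # no card is reused
        total += picks.count('blue')
    return total / trials
```

Both `mean_blue(True)` and `mean_blue(False)` hover around 16, matching E(X) = 80*(20/100); only the spread around that mean differs between the two ways of drafting.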
In the following graph, you can see the probability distribution of the Blue cards with replacement (green) and without (red) and the difference in the behavior between the two functions:
As you can see, probabilities in the without replacement case are distributed more closely to the center, while in the with replacement one, they are more spread apart. What's better? To be honest, I don't have an answer to that. I find it quite subjective. But we agreed that the expected value was the most important and objective parameter to take into account, as it means the average number of times a juror will be selected under those conditions. As it turned out to be the same for both distributions, while the gas cost was much higher for the without replacement case, the answer was clear: the EVM decided the trade-off.
As a final note on this, the without replacement case had yet another conceptual difficulty: what should the size of a ticket be? We mentioned the minimum active balance before because it seems the most natural option, and it was initially proposed so by Luke, but if you think about it, it is quite arbitrary: why not use 1 token unit (10^18 with the decimals)? Or 1 minimum unit (10^-18 tokens)? The choice of this unit would affect the underlying behavior. For the with replacement case, it doesn't matter, as all tickets are always in the pool, so the probability will always be the same. Since we finally went with replacement, we didn't need to explore those questions further.
Despite those optimizations, there are several loops involved in the batching, so we need to make sure we don't run out of gas. With the current configuration, we can draft the whole last regular round (final round doesn't have batching) in one batch, but all those parameters could change, in particular the initial number of jurors, the appeal step factor, or the number of rounds. Even gas prices can change. So our Court allows for batching, i.e., drafting jurors in multiple calls to the draft function.
The way we do this is by splitting the whole tree into contiguous regions and running each sortition batch restricted to one of those regions, as in the figure below:
The main reason for this is that multi-sortition works better when there isn't too much "distance" between selected jurors, as more sum-node reads are reused: the closer the jurors are, the more parent nodes they share, so the fewer intermediate checks we need while descending the tree.
As in the previous section, we had to analyze the impact of batching on the fairness of the selection. We work under the assumption that the sortition is done "with replacement" (see the previous section). The next sections will become a little bit math-heavy, but if you don’t feel like following those formulas, try to focus on the description of the scenario, the numeric examples, and the graphs. It should be enough to gain an intuition of the differences.
Let's say that the total stake in the tree is
S, we are extracting
n jurors, and we want to calculate the probability function for a given juror that has staked
J tokens to be chosen. Therefore the probability for each extraction is
p = J/S
This scenario follows a binomial distribution. Therefore, the probability mass function of that juror being elected k times is:

P(X = k) = C(n, k) * p^k * (1-p)^(n-k)

Its expected value is E[X] = np, and its variance Var(X) = np(1-p).
Putting some numbers to it, let's say that S = 1000, n = 80, and J = 100 (and therefore p = 0.1). The expectation will be E = 8, the variance V = 7.2, and it will look like this:
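We can verify those numbers exactly from the probability mass function (assuming S = 1000 and n = 80, the values consistent with p = 0.1 and E = 8):

```python
from math import comb

def binom_pmf(n, p):
    """Probability mass function of a Binomial(n, p) variable."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

pmf = binom_pmf(80, 0.1)                       # n = 80 draws, p = J/S = 0.1
mean = sum(k * q for k, q in enumerate(pmf))   # should be np = 8
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))   # np(1-p) = 7.2
```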
All in one
Now let's divide the stake into
r regions, each region with
S / r stake, and let's get
n' = n / r from each one (let's assume they are both divisible). The probability that a juror gets selected in one extraction, restricted to that region, assuming all its stake fits into it, will be
p'=J / (S / r) = Jr / S = rp.
The probability function will be:

P(X' = k) = C(n', k) * p'^k * (1-p')^(n'-k)
its expected value
E[X'] = n'p' = (n / r) * rp = np = E[X], and its variance
Var(X') = n'p'(1-p') = (n / r) * rp(1-rp) = np(1-rp) <= np(1-p) = Var(X)
So both cases have the same expected value, but the function looks a little bit narrower here (reflected in the variance). With the same numbers as before and with 4 regions (r = 4), the binomial function now has n' = 20 and p' = 0.4, the same expected value E = 8, and variance V = 4.8:
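Plugging the same numbers into the two variance formulas confirms the narrowing effect (n, p as above, with r = 4):

```python
n, p, r = 80, 0.1, 4

var_whole = n * p * (1 - p)        # drafting over the whole tree: np(1-p)
var_region = n * p * (1 - r * p)   # juror's stake entirely in one region: np(1-rp)

# the expected value is np either way; only the spread changes
expected = n * p
```

`var_whole` comes out to 7.2 and `var_region` to 4.8, while both cases keep E = 8.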
Even split stake
Now let's look at the case where the juror's stake is split across two or more regions. To simplify, let's assume there are only two, with half the stake in each region.
We can label Y_1 and Y_2 the random variables corresponding to the two regions, with probabilities p_1 = p_2 = (J/2) / (S/r) = rJ / (2S) = rp / 2 = p' / 2 and n_1 = n_2 = n / r.
The probability function of X'' = Y_1 + Y_2 will be:

P(X'' = k) = C(2n/r, k) * (rp/2)^k * (1 - rp/2)^(2n/r - k)
So it corresponds to a Binomial distribution of
n'' = 2n / r = 2n' and
p'' = rp / 2 = p' / 2 (see sum of binomials).
Its expected value is:
E[X''] = (2n / r) * (rp / 2) = np = E[X'] = E[X], and its variance:
Var(X'') = (2n / r) (rp / 2) (1 - rp / 2) = np(1 - rp / 2)
Var(X') <= Var(X'') <= Var(X)
Actually, if we took this example a little bit further, and imagined that the juror has the stake evenly distributed along all the regions (so we would be doing the sum of
r binomial distributions, with
n_i= n / r and
p_i= rp / r = p), we would see that the resulting distribution would be exactly the same as the original one, which matches well with the intuition.
Using the same numbers as before, with
J = 50 + 50:
Odd split stake
If the stake is not evenly split (J_1 + J_2 = J, J_1 != J_2), the probability for each region will be different, and then the sum of the binomials is not a binomial distribution. Even the task of finding a good approximation is not easy. See here for more details.
So now we have two independent binomial distributions with n_1 = n_2 = n / r and probabilities p_1 and p_2, so that
p_1 + p_2 = J_1 / (S / r) + J_2 / (S / r) = J / (S / r) = rJ / S = p' = rp
The expectation will be again:
E[X'''] = E[Y_1] + E[Y_2] = n_1 * p_1 + n_2 * p_2 = (n / r)(p_1 + p_2) = (n / r)rp = np = E[X]
We know that the variance will be less or equal than the even case (see here):
Var(X''') <= Var(X'')
Using the numbers above, let's suppose that the stake is split like
J_1 = 75, J_2 = 25:
In both cases, the charts look quite similar to the even partition versions.
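Even though the sum of two binomials with different p is not itself binomial, we can compute its exact distribution by convolution and check the derived mean and variance (using n/r = 20 draws per region and region stake S/r = 250, so p_1 = 75/250 = 0.3 and p_2 = 25/250 = 0.1):

```python
from math import comb

def binom_pmf(n, p):
    """Probability mass function of a Binomial(n, p) variable."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    """Distribution of the sum of two independent discrete variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# juror's 100 tokens split 75/25 across two regions of 250 stake each
dist = convolve(binom_pmf(20, 0.3), binom_pmf(20, 0.1))
mean = sum(k * q for k, q in enumerate(dist))
var = sum((k - mean) ** 2 * q for k, q in enumerate(dist))
```

`mean` comes out to 8 again, and `var` to 6.0, below the even-split variance np(1 - rp/2) = 6.4, just as the Var(X''') <= Var(X'') inequality predicts.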
Conclusion on partitioning
Partitioning the sortitions in separate regions has little effect on the probability distribution. The expected value is always the same and the variance decreases a bit (narrowing effect). This narrowing effect would only be significant in case that the juror stake is big in relation to the region size. Brought to the extreme, if the juror stake were to occupy a whole region, the distribution would concentrate on a single point,
n / r, meaning that the juror would be selected for all the values belonging to that region. But of course, that would be at the expense of the other regions, where that juror's probability of being chosen would be greatly reduced.
So, once again, we are slightly modifying the expected natural behavior of the algorithm to save some gas, after verifying that the impact of such modification is indeed minor.
Final appeal round slashing
On the final appeal round, there is no drafting, because all jurors can vote there. The problem here is that the total number of jurors can be really big, so trying to loop over all of them is a no-go in Ethereum. That means we cannot go one by one after the round is finished to slash them if needed. For rewards, this is not a problem, as we can simply let jurors claim them, and they have the incentive to do so.
So, the trade-off we chose this time is that we lock jurors' tokens when they vote, so we only slash jurors who vote but whose vote doesn't match the final winning ruling. Jurors who don't vote at all won't get slashed, as opposed to regular rounds, where all drafted jurors who don't vote for the winning ruling get slashed (for regular rounds, we lock their tokens right away when they get drafted).
This is, of course, far from ideal. It's inconsistent with previous rounds, and we are missing an opportunity to incentivize jurors to participate in the final round, where the goal is to have as many jurors voting as possible. They still have other reasons to do so, like wanting to keep the Court's reputation up, or maybe they already have tokens at stake from previous rounds of that dispute, but the risk of being slashed would be a stronger one.
The main purpose of this article is to show how restrictive developing on the EVM can be compared to more traditional platforms. We have seen a lot of improvements and trade-offs we had to make in our code to make it suitable for the EVM. The problems presented here are quite particular to the Court, so it's unlikely that you will be able to reuse these concepts out of the box in your project, but we think this article is useful for two reasons:
- It will help people interested in participating in the Court, either as a juror or as a consumer, to understand why we made some decisions about the sortition mechanism.
- It may serve as an inspiration to overcome similar problems.
For more, read Rage Against the (EV)Machine Part 1