Constructing simple SMTs
Understanding the finer details of how the Storage state machine operates requires a good grasp of the way the zkProver’s sparse Merkle trees (SMTs) are constructed. This document explains how these SMTs are built.
Consider keyvalue pair based binary SMTs. The focus here is on explaining how to construct an SMT that represents a given set of keyvalue pairs. And, for the sake of simplicity, we assume 8bit keylengths.
A NULL or empty SMT has a zero root. That is, it has no key and no value recorded in it. Similarly, a zero node or NULL node refers to a node that carries no value.
A binary SMT with one keyvalue pair¶
A binary SMT with a single keyvalue pair \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\), is built as follows.
Suppose that \(K_{\mathbf{a}} = 11010110\). In order to build a binary SMT with this single keyvalue \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\),

One computes the hash \(\mathbf{H}( \text{V}_{\mathbf{a}})\) of the value \(\text{V}_{\mathbf{a}}\),

Sets the leaf \(\mathbf{L}_{\mathbf{a}} := \mathbf{H}( \text{V}_{\mathbf{a}})\),

Sets the sibling leaf as a NULL leaf, simply represented as “\(\mathbf{0}\)”,

Computes the root as \(\mathbf{root}_{a0} = \mathbf{H}(\mathbf{L}_{\mathbf{a}} \ \mathbf{0} )\), with the leaf \(\mathbf{L}_{\mathbf{a}}\) on the left because the \(\text{lsb}(K_{\mathbf{a}}) = 0\). That is, between the two edges leading up to the root, the leaf \(\mathbf{L}_{\mathbf{a}}\) is on the left edge, while the NULL leaf “\(\mathbf{0}\)” is on the right.
See the below figure for the SMT representing the single keyvalue pair \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\), where \(K_{\mathbf{a}} = 11010110\).
Note that the last nodes in binary SMT branches are generally either leaves or zeronodes.
In the case where the leastsignificant bit, lsb of \(K_{\mathbf{a}}\) is \(1\), the SMT with a single keyvalue pair \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) would be a mirror image of what is seen in Figure 3. And its root, \(\mathbf{root}_{0a} = \mathbf{H}( \mathbf{0}\ \mathbf{L}_{\mathbf{a}} ) \neq \mathbf{root}_{a0}\) because \(\mathbf{H}\) is a collisionresistant hash function.
This example also explains why we need a zero node. Since all trees used in our design are binary SMTs, a zero node is used as a default sibling for computing the parent node. This helps to differentiate between roots (also between parent nodes) because a root node actually identifies an SMT. Therefore, it is crucial to distinguish between \(\mathbf{root}_{a0} = \mathbf{H}(\mathbf{L}_{\mathbf{a}} \ \mathbf{0} )\) and \(\mathbf{root}_{0a} = \mathbf{H}( \mathbf{0}\ \mathbf{L}_{\mathbf{a}})\) because they represent two distinct trees.
Binary SMTs with two keyvalue pairs¶
Consider now SMTs with two keyvalue pairs, \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\).
There are three distinct cases of how corresponding SMTs can be built, each determined by the keys, \(K_{\mathbf{a}}\) and \(K_{\mathbf{b}}\).
Case 1¶
The keys are such that the \(\text{lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{lsb}(K_{\mathbf{b}}) = 1\).
Suppose that the keys are given as \(K_{\mathbf{a}} = 11010110\) and \(K_{\mathbf{b}} = 11010101\).
To build a binary SMT with this two keyvalues, \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\),

One computes the hashes, \(\mathbf{H}(\text{V}_{\mathbf{a}})\) and \(\mathbf{H}( \text{V}_{\mathbf{b}})\) of the values, \(\text{V}_{\mathbf{a}}\) and \(\text{V}_{\mathbf{b}}\) , respectively,

Sets the leaves, \(\mathbf{L}_{\mathbf{a}} := \mathbf{H}( \text{V}_{\mathbf{a}})\) and \(\mathbf{L}_{\mathbf{b}} := \mathbf{H}( \text{V}_{\mathbf{b}})\),

Checks if the \(\text{lsb}(K_{\mathbf{a}})\) differs from the \(\text{lsb}(K_{\mathbf{b}})\),

Since the \(\text{lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{lsb}(K_{\mathbf{b}}) = 1\), it means the two leaves can be siblings,

One can then compute the root as, \(\mathbf{root}_{\mathbf{ab}} = \mathbf{H}(\mathbf{L}_{\mathbf{a}} \ \mathbf{L}_{\mathbf{b}})\).
Note that the leaf \(\mathbf{L}_{\mathbf{a}}\) is on the left because the \(\text{lsb}(K_{\mathbf{a}}) = 0\), but \(\mathbf{L}_{\mathbf{b}}\) is on the right because the \(\text{lsb}(K_{\mathbf{b}}) = 1\). That is, between the two edges leading up to the \(\mathbf{root}_{\mathbf{ab}}\), the leaf \(\mathbf{L}_{\mathbf{a}}\) must be on the edge from the left, while \(\mathbf{L}_{\mathbf{b}}\) is on the edge from the right.
Note that in this particular example, any second key \(K_{\mathbf{b}}\) with its LSB being \(1\) would result in the corresponding values being siblings. The algorithm unfolds as we build Merkle trees with more keyvalue pairs.
See the below figure for the SMT representing the two keyvalue pairs \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\), where \(K_{\mathbf{a}} = 11010110\) and \(K_{\mathbf{b}} = 11010101\).
Case 2¶
Both keys end with the same keybit. That is, the \(\text{lsb}(K_{\mathbf{a}}) = \text{lsb}(K_{\mathbf{b}})\), but their second leastsignificant bits differ.
Suppose that the two keys are given as \(K_{\mathbf{a}} = 11010100\) and \(K_{\mathbf{b}} = 11010110\).
To build a binary SMT with this two keyvalues, \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\);

One computes the hashes, \(\mathbf{H}(\text{V}_{\mathbf{a}})\) and \(\mathbf{H}( \text{V}_{\mathbf{b}})\) of the values, \(\text{V}_{\mathbf{a}}\) and \(\text{V}_{\mathbf{b}}\) , respectively.

Sets the leaves, \(\mathbf{L}_{\mathbf{a}} := \mathbf{H}( \text{V}_{\mathbf{a}})\) and \(\mathbf{L}_{\mathbf{b}} := \mathbf{H}( \text{V}_{\mathbf{b}})\).

Checks if the \(\text{lsb}(K_{\mathbf{a}})\) differs from the \(\text{lsb}(K_{\mathbf{b}})\). Since the \(\text{lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{lsb}(K_{\mathbf{b}}) = 0\), it means the two leaves cannot be siblings at this position because it would otherwise mean they share the same treeaddress 0, which is not allowed.

One continues to check if the second leastsignificant bits of \(K_{\mathbf{a}}\) and \(K_{\mathbf{b}}\) differ. Since the \(\text{second lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{second lsb}(K_{\mathbf{b}}) = 1\), it means the two leaves \(\mathbf{L}_{\mathbf{a}}\) and \(\mathbf{L}_{\mathbf{b}}\) can be siblings at their respective treeaddresses, 00 and 10.

Next is to compute the hash \(\mathbf{H}(\mathbf{L}_{\mathbf{a}} \ \mathbf{L}_{\mathbf{b}})\) and set it as the branch \(\mathbf{B}_{\mathbf{ab}} := \mathbf{H}(\mathbf{L}_{\mathbf{a}} \ \mathbf{L}_{\mathbf{b}})\) at the treeaddress 0. Note that the leaf \(\mathbf{L}_{\mathbf{a}}\) is on the left because the \(\text{second lsb}(K_{\mathbf{a}}) = 0\), while \(\mathbf{L}_{\mathbf{b}}\) is on the right because the \(\text{second lsb}(K_{\mathbf{b}}) = 1\).

The branch \(\mathbf{B}_{\mathbf{ab}} := \mathbf{H}(\mathbf{L}_{\mathbf{a}} \ \mathbf{L}_{\mathbf{b}})\) needs a sibling. Since all the values, \(\text{V}_{\mathbf{a}}\) and \(\text{V}_{\mathbf{b}}\), are already represented in the tree at \(\mathbf{L}_{\mathbf{a}}\) and \(\mathbf{L}_{\mathbf{b}}\), respectively, one therefore sets a NULL leaf “\(\mathbf{0}\)” as the sibling leaf to \(\mathbf{B}_{\mathbf{ab}}\).

As a result, it is possible to compute the root as, \(\mathbf{root}_{\mathbf{ab0}} = \mathbf{H}(\mathbf{B}_{\mathbf{ab}} \ \mathbf{0})\). Note that, the branch \(\mathbf{B}_{\mathbf{ab}}\) is on the left because the \(\text{lsb}(K_{\mathbf{a}}) = 0\), and \(\mathbf{0}\) must therefore be on the right. That is, between the two edges leading up to the \(\mathbf{root}_{\mathbf{ab0}}\), the branch \(\mathbf{B}_{\mathbf{ab}}\) must be on the edge from the left, while \(\mathbf{0}\) is on the edge from the right.
See the below figure depicting the SMT representing the two keyvalue pairs \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\), where \(K_{\mathbf{a}} = 11010100\) and \(K_{\mathbf{b}} = 11010110\).
Case 3¶
The first two leastsignificant bits of both keys are the same, but their third leastsignificant bits differ.
Suppose that the two keys are given as \(K_{\mathbf{a}} = 11011000\) and \(K_{\mathbf{b}} = 10010100\). The process for building a binary SMT with these two keyvalues, \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\) is the same as in Case 2;

One computes the hashes, \(\mathbf{H}(\text{V}_{\mathbf{a}})\) and \(\mathbf{H}( \text{V}_{\mathbf{b}})\) of the values, \(\text{V}_{\mathbf{a}}\) and \(\text{V}_{\mathbf{b}}\) , respectively.

Sets the leaves, \(\mathbf{L}_{\mathbf{a}} := \mathbf{H}( \text{V}_{\mathbf{a}})\) and \(\mathbf{L}_{\mathbf{b}} := \mathbf{H}( \text{V}_{\mathbf{b}})\).

Checks if the \(\text{lsb}(K_{\mathbf{a}})\) differs from the \(\text{lsb}(K_{\mathbf{b}})\). Since the \(\text{lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{lsb}(K_{\mathbf{b}}) = 0\), it means the two leaves cannot be siblings at this position as it would otherwise mean they share the same treeaddress 0, which is not allowed.

Next verifier continues to check if the second leastsignificant bits of \(K_{\mathbf{a}}\) and \(K_{\mathbf{b}}\) differ. Since the \(\text{second lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{second lsb}(K_{\mathbf{b}}) = 0\), it means the two leaves cannot be siblings at this position, because it would otherwise mean they share the same treeaddress 00, which is not allowed.

Once again he checks if the third leastsignificant bits of \(K_{\mathbf{a}}\) and \(K_{\mathbf{b}}\) differ. Since the \(\text{third lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{third lsb}(K_{\mathbf{b}}) = 1\), it means the two leaves \(\mathbf{L}_{\mathbf{a}}\) and \(\mathbf{L}_{\mathbf{b}}\) can be siblings at their respective treeaddresses, 000 and 100.

One then computes the hash \(\mathbf{H}(\mathbf{L}_{\mathbf{a}} \ \mathbf{L}_{\mathbf{b}})\), and sets it as the branch \(\mathbf{B}_{\mathbf{ab}} := \mathbf{H}(\mathbf{L}_{\mathbf{a}} \ \mathbf{L}_{\mathbf{b}})\) at the treeaddress 00. The leaf \(\mathbf{L}_{\mathbf{a}}\) is on the left because the third \(\text{lsb}(K_{\mathbf{a}}) = 0\), while \(\mathbf{L}_{\mathbf{b}}\) is on the right because the third \(\text{lsb}(K_{\mathbf{b}}) = 1\).

The branch \(\mathbf{B}_{\mathbf{ab}} := \mathbf{H}(\mathbf{L}_{\mathbf{a}} \ \mathbf{L}_{\mathbf{b}})\) needs a sibling. Since all the values, \(\text{V}_{\mathbf{a}}\) and \(\text{V}_{\mathbf{b}}\) , are already represented in the tree at \(\mathbf{L}_{\mathbf{a}}\) and \(\mathbf{L}_{\mathbf{b}}\), respectively, one therefore sets a NULL leaf “\(\mathbf{0}\)” as the sibling leaf to \(\mathbf{B}_{\mathbf{ab}}\).

One can now compute the hash \(\mathbf{H}(\mathbf{B}_{\mathbf{ab}} \ \mathbf{0})\), and set it as the branch \(\mathbf{B}_{\mathbf{ab0}} := \mathbf{H}(\mathbf{B}_{\mathbf{ab}} \ \mathbf{0})\) at the treeaddress 0. The hash is computed with the branch \(\mathbf{B}_{\mathbf{ab}}\) on the left because the second lsb of both keys, \(K_{\mathbf{a}}\) and \(K_{\mathbf{b}}\), equals \(0\). Therefore the NULL leaf “\(\mathbf{0}\)” must be on the right as an argument to the hash.

The branch \(\mathbf{B}_{\mathbf{ab0}} := \mathbf{H}(\mathbf{B}_{\mathbf{ab}} \ \mathbf{0})\) also needs a sibling. For the same reason given above, one sets a NULL leaf “\(\mathbf{0}\)” as the sibling leaf to \(\mathbf{B}_{\mathbf{ab0}}\).

Now, one is able to compute the root as \(\mathbf{root}_{\mathbf{ab00}} = \mathbf{H}(\mathbf{B}_{\mathbf{ab0}} \ \mathbf{0})\). Note that the hash is computed with the branch \(\mathbf{B}_{\mathbf{ab0}}\) on the left because the lsb of both keys, \(K_{\mathbf{a}}\) and \(K_{\mathbf{b}}\), equals \(0\). That is, between the two edges leading up to the \(\mathbf{root}_{\mathbf{ab00}}\), the branch \(\mathbf{B}_{\mathbf{ab0}}\) must be on the edge from the left, while “\(\mathbf{0}\)” is on the edge from the right.
See the below figure depicting the SMT representing the two keyvalue pairs \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\), where \(K_{\mathbf{a}} = 11011000\) and \(K_{\mathbf{b}} = 10010100\).
There are several other SMTs of two keyvalue pairs \((K_{\mathbf{x}}, \text{V}_{\mathbf{x}})\) and \((K_{\mathbf{z}}, \text{V}_{\mathbf{z}})\) that can be constructed depending on how long the strings of the common leastsignificant bits between \(K_{\mathbf{x}}\) and \(K_{\mathbf{z}}\) are.
In general, when building an SMT, leaves of keyvalue pairs with the same leastsignificant keybits share the same navigational path only until any of the corresponding keybits differ. These common strings of keybits dictate where the leaf storing the corresponding value is located in the tree.