Constructing simple SMTs
Understanding the finer details of how the Storage state machine operates requires a good grasp of the way the zkProver’s sparse Merkle trees (SMTs) are constructed. This document explains how these SMTs are built.
Consider key-value pair based binary SMTs. The focus here is on explaining how to construct an SMT that represents a given set of key-value pairs. And, for the sake of simplicity, we assume 8-bit key-lengths.
A NULL or empty SMT has a zero root. That is, it has no key and no value recorded in it. Similarly, a zero node or NULL node refers to a node that carries no value.
A binary SMT with one key-value pair¶
A binary SMT with a single key-value pair \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\), is built as follows.
Suppose that \(K_{\mathbf{a}} = 11010110\). In order to build a binary SMT with this single key-value \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\),
-
One computes the hash \(\mathbf{H}( \text{V}_{\mathbf{a}})\) of the value \(\text{V}_{\mathbf{a}}\),
-
Sets the leaf \(\mathbf{L}_{\mathbf{a}} := \mathbf{H}( \text{V}_{\mathbf{a}})\),
-
Sets the sibling leaf as a NULL leaf, simply represented as “\(\mathbf{0}\)”,
-
Computes the root as \(\mathbf{root}_{a0} = \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{0} )\), with the leaf \(\mathbf{L}_{\mathbf{a}}\) on the left because the \(\text{lsb}(K_{\mathbf{a}}) = 0\). That is, between the two edges leading up to the root, the leaf \(\mathbf{L}_{\mathbf{a}}\) is on the left edge, while the NULL leaf “\(\mathbf{0}\)” is on the right.
See the below figure for the SMT representing the single key-value pair \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\), where \(K_{\mathbf{a}} = 11010110\).
Note that the last nodes in binary SMT branches are generally either leaves or zero-nodes.
In the case where the least-significant bit, lsb of \(K_{\mathbf{a}}\) is \(1\), the SMT with a single key-value pair \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) would be a mirror image of what is seen in Figure 3. And its root, \(\mathbf{root}_{0a} = \mathbf{H}( \mathbf{0}\| \mathbf{L}_{\mathbf{a}} ) \neq \mathbf{root}_{a0}\) because \(\mathbf{H}\) is a collision-resistant hash function.
This example also explains why we need a zero node. Since all trees used in our design are binary SMTs, a zero node is used as a default sibling for computing the parent node. This helps to differentiate between roots (also between parent nodes) because a root node actually identifies an SMT. Therefore, it is crucial to distinguish between \(\mathbf{root}_{a0} = \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{0} )\) and \(\mathbf{root}_{0a} = \mathbf{H}( \mathbf{0}\| \mathbf{L}_{\mathbf{a}})\) because they represent two distinct trees.
Binary SMTs with two key-value pairs¶
Consider now SMTs with two key-value pairs, \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\).
There are three distinct cases of how corresponding SMTs can be built, each determined by the keys, \(K_{\mathbf{a}}\) and \(K_{\mathbf{b}}\).
Case 1¶
The keys are such that the \(\text{lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{lsb}(K_{\mathbf{b}}) = 1\).
Suppose that the keys are given as \(K_{\mathbf{a}} = 11010110\) and \(K_{\mathbf{b}} = 11010101\).
To build a binary SMT with this two key-values, \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\),
-
One computes the hashes, \(\mathbf{H}(\text{V}_{\mathbf{a}})\) and \(\mathbf{H}( \text{V}_{\mathbf{b}})\) of the values, \(\text{V}_{\mathbf{a}}\) and \(\text{V}_{\mathbf{b}}\) , respectively,
-
Sets the leaves, \(\mathbf{L}_{\mathbf{a}} := \mathbf{H}( \text{V}_{\mathbf{a}})\) and \(\mathbf{L}_{\mathbf{b}} := \mathbf{H}( \text{V}_{\mathbf{b}})\),
-
Checks if the \(\text{lsb}(K_{\mathbf{a}})\) differs from the \(\text{lsb}(K_{\mathbf{b}})\),
-
Since the \(\text{lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{lsb}(K_{\mathbf{b}}) = 1\), it means the two leaves can be siblings,
-
One can then compute the root as, \(\mathbf{root}_{\mathbf{ab}} = \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}})\).
Note that the leaf \(\mathbf{L}_{\mathbf{a}}\) is on the left because the \(\text{lsb}(K_{\mathbf{a}}) = 0\), but \(\mathbf{L}_{\mathbf{b}}\) is on the right because the \(\text{lsb}(K_{\mathbf{b}}) = 1\). That is, between the two edges leading up to the \(\mathbf{root}_{\mathbf{ab}}\), the leaf \(\mathbf{L}_{\mathbf{a}}\) must be on the edge from the left, while \(\mathbf{L}_{\mathbf{b}}\) is on the edge from the right.
Note that in this particular example, any second key \(K_{\mathbf{b}}\) with its LSB being \(1\) would result in the corresponding values being siblings. The algorithm unfolds as we build Merkle trees with more key-value pairs.
See the below figure for the SMT representing the two key-value pairs \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\), where \(K_{\mathbf{a}} = 11010110\) and \(K_{\mathbf{b}} = 11010101\).
Case 2¶
Both keys end with the same key-bit. That is, the \(\text{lsb}(K_{\mathbf{a}}) = \text{lsb}(K_{\mathbf{b}})\), but their second least-significant bits differ.
Suppose that the two keys are given as \(K_{\mathbf{a}} = 11010100\) and \(K_{\mathbf{b}} = 11010110\).
To build a binary SMT with this two key-values, \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\);
-
One computes the hashes, \(\mathbf{H}(\text{V}_{\mathbf{a}})\) and \(\mathbf{H}( \text{V}_{\mathbf{b}})\) of the values, \(\text{V}_{\mathbf{a}}\) and \(\text{V}_{\mathbf{b}}\) , respectively.
-
Sets the leaves, \(\mathbf{L}_{\mathbf{a}} := \mathbf{H}( \text{V}_{\mathbf{a}})\) and \(\mathbf{L}_{\mathbf{b}} := \mathbf{H}( \text{V}_{\mathbf{b}})\).
-
Checks if the \(\text{lsb}(K_{\mathbf{a}})\) differs from the \(\text{lsb}(K_{\mathbf{b}})\). Since the \(\text{lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{lsb}(K_{\mathbf{b}}) = 0\), it means the two leaves cannot be siblings at this position because it would otherwise mean they share the same tree-address 0, which is not allowed.
-
One continues to check if the second least-significant bits of \(K_{\mathbf{a}}\) and \(K_{\mathbf{b}}\) differ. Since the \(\text{second lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{second lsb}(K_{\mathbf{b}}) = 1\), it means the two leaves \(\mathbf{L}_{\mathbf{a}}\) and \(\mathbf{L}_{\mathbf{b}}\) can be siblings at their respective tree-addresses, 00 and 10.
-
Next is to compute the hash \(\mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}})\) and set it as the branch \(\mathbf{B}_{\mathbf{ab}} := \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}})\) at the tree-address 0. Note that the leaf \(\mathbf{L}_{\mathbf{a}}\) is on the left because the \(\text{second lsb}(K_{\mathbf{a}}) = 0\), while \(\mathbf{L}_{\mathbf{b}}\) is on the right because the \(\text{second lsb}(K_{\mathbf{b}}) = 1\).
-
The branch \(\mathbf{B}_{\mathbf{ab}} := \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}})\) needs a sibling. Since all the values, \(\text{V}_{\mathbf{a}}\) and \(\text{V}_{\mathbf{b}}\), are already represented in the tree at \(\mathbf{L}_{\mathbf{a}}\) and \(\mathbf{L}_{\mathbf{b}}\), respectively, one therefore sets a NULL leaf “\(\mathbf{0}\)” as the sibling leaf to \(\mathbf{B}_{\mathbf{ab}}\).
-
As a result, it is possible to compute the root as, \(\mathbf{root}_{\mathbf{ab0}} = \mathbf{H}(\mathbf{B}_{\mathbf{ab}} \| \mathbf{0})\). Note that, the branch \(\mathbf{B}_{\mathbf{ab}}\) is on the left because the \(\text{lsb}(K_{\mathbf{a}}) = 0\), and \(\mathbf{0}\) must therefore be on the right. That is, between the two edges leading up to the \(\mathbf{root}_{\mathbf{ab0}}\), the branch \(\mathbf{B}_{\mathbf{ab}}\) must be on the edge from the left, while \(\mathbf{0}\) is on the edge from the right.
See the below figure depicting the SMT representing the two key-value pairs \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\), where \(K_{\mathbf{a}} = 11010100\) and \(K_{\mathbf{b}} = 11010110\).
Case 3¶
The first two least-significant bits of both keys are the same, but their third least-significant bits differ.
Suppose that the two keys are given as \(K_{\mathbf{a}} = 11011000\) and \(K_{\mathbf{b}} = 10010100\). The process for building a binary SMT with these two key-values, \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\) is the same as in Case 2;
-
One computes the hashes, \(\mathbf{H}(\text{V}_{\mathbf{a}})\) and \(\mathbf{H}( \text{V}_{\mathbf{b}})\) of the values, \(\text{V}_{\mathbf{a}}\) and \(\text{V}_{\mathbf{b}}\) , respectively.
-
Sets the leaves, \(\mathbf{L}_{\mathbf{a}} := \mathbf{H}( \text{V}_{\mathbf{a}})\) and \(\mathbf{L}_{\mathbf{b}} := \mathbf{H}( \text{V}_{\mathbf{b}})\).
-
Checks if the \(\text{lsb}(K_{\mathbf{a}})\) differs from the \(\text{lsb}(K_{\mathbf{b}})\). Since the \(\text{lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{lsb}(K_{\mathbf{b}}) = 0\), it means the two leaves cannot be siblings at this position as it would otherwise mean they share the same tree-address 0, which is not allowed.
-
Next verifier continues to check if the second least-significant bits of \(K_{\mathbf{a}}\) and \(K_{\mathbf{b}}\) differ. Since the \(\text{second lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{second lsb}(K_{\mathbf{b}}) = 0\), it means the two leaves cannot be siblings at this position, because it would otherwise mean they share the same tree-address 00, which is not allowed.
-
Once again he checks if the third least-significant bits of \(K_{\mathbf{a}}\) and \(K_{\mathbf{b}}\) differ. Since the \(\text{third lsb}(K_{\mathbf{a}}) = 0\) and the \(\text{third lsb}(K_{\mathbf{b}}) = 1\), it means the two leaves \(\mathbf{L}_{\mathbf{a}}\) and \(\mathbf{L}_{\mathbf{b}}\) can be siblings at their respective tree-addresses, 000 and 100.
-
One then computes the hash \(\mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}})\), and sets it as the branch \(\mathbf{B}_{\mathbf{ab}} := \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}})\) at the tree-address 00. The leaf \(\mathbf{L}_{\mathbf{a}}\) is on the left because the third \(\text{lsb}(K_{\mathbf{a}}) = 0\), while \(\mathbf{L}_{\mathbf{b}}\) is on the right because the third \(\text{lsb}(K_{\mathbf{b}}) = 1\).
-
The branch \(\mathbf{B}_{\mathbf{ab}} := \mathbf{H}(\mathbf{L}_{\mathbf{a}} \| \mathbf{L}_{\mathbf{b}})\) needs a sibling. Since all the values, \(\text{V}_{\mathbf{a}}\) and \(\text{V}_{\mathbf{b}}\) , are already represented in the tree at \(\mathbf{L}_{\mathbf{a}}\) and \(\mathbf{L}_{\mathbf{b}}\), respectively, one therefore sets a NULL leaf “\(\mathbf{0}\)” as the sibling leaf to \(\mathbf{B}_{\mathbf{ab}}\).
-
One can now compute the hash \(\mathbf{H}(\mathbf{B}_{\mathbf{ab}} \| \mathbf{0})\), and set it as the branch \(\mathbf{B}_{\mathbf{ab0}} := \mathbf{H}(\mathbf{B}_{\mathbf{ab}} \| \mathbf{0})\) at the tree-address 0. The hash is computed with the branch \(\mathbf{B}_{\mathbf{ab}}\) on the left because the second lsb of both keys, \(K_{\mathbf{a}}\) and \(K_{\mathbf{b}}\), equals \(0\). Therefore the NULL leaf “\(\mathbf{0}\)” must be on the right as an argument to the hash.
-
The branch \(\mathbf{B}_{\mathbf{ab0}} := \mathbf{H}(\mathbf{B}_{\mathbf{ab}} \| \mathbf{0})\) also needs a sibling. For the same reason given above, one sets a NULL leaf “\(\mathbf{0}\)” as the sibling leaf to \(\mathbf{B}_{\mathbf{ab0}}\).
-
Now, one is able to compute the root as \(\mathbf{root}_{\mathbf{ab00}} = \mathbf{H}(\mathbf{B}_{\mathbf{ab0}} \| \mathbf{0})\). Note that the hash is computed with the branch \(\mathbf{B}_{\mathbf{ab0}}\) on the left because the lsb of both keys, \(K_{\mathbf{a}}\) and \(K_{\mathbf{b}}\), equals \(0\). That is, between the two edges leading up to the \(\mathbf{root}_{\mathbf{ab00}}\), the branch \(\mathbf{B}_{\mathbf{ab0}}\) must be on the edge from the left, while “\(\mathbf{0}\)” is on the edge from the right.
See the below figure depicting the SMT representing the two key-value pairs \((K_{\mathbf{a}}, \text{V}_{\mathbf{a}})\) and \((K_{\mathbf{b}}, \text{V}_{\mathbf{b}})\), where \(K_{\mathbf{a}} = 11011000\) and \(K_{\mathbf{b}} = 10010100\).
There are several other SMTs of two key-value pairs \((K_{\mathbf{x}}, \text{V}_{\mathbf{x}})\) and \((K_{\mathbf{z}}, \text{V}_{\mathbf{z}})\) that can be constructed depending on how long the strings of the common least-significant bits between \(K_{\mathbf{x}}\) and \(K_{\mathbf{z}}\) are.
In general, when building an SMT, leaves of key-value pairs with the same least-significant key-bits share the same navigational path only until any of the corresponding key-bits differ. These common strings of key-bits dictate where the leaf storing the corresponding value is located in the tree.