ToolszkEVMConceptsSparse merkle trees

Sparse Merkle Tree

zkProver's data is stored in the form of a special _sparse Merkle tree_ (SMT), which is a tree that combines the concept of a Merkle tree and that of a Patricia

zkProver's data is stored in the form of a special sparse Merkle tree (SMT), which is a tree that combines the concept of a Merkle tree and that of a Patricia trie. The design is based on how the sparse Merkle trees are constructed and how they store keys and values.

The content of this document is rather elementary. Experienced developers can fast-forward to later sections and only refer back as the need arises.

A typical Merkle tree has leaves, branches and a root. A leaf is a node with no child-nodes, while a branch is a node with child-nodes. The root is therefore the node with no parent-node.

See the below figure for an example of how a hash function H\mathbf{H} is used to create a Merkle tree recording eight values;

Va,Vb,Vc,Vd,Ve,Vf,Vg,Vh\text{V}_{\mathbf{a}}, \text{V}_{\mathbf{b}}, \text{V}_{\mathbf{c}}, \text{V}_{\mathbf{d}}, \text{V}_{\mathbf{e}}, \text{V}_{\mathbf{f}}, \text{V}_{\mathbf{g}}, \text{V}_{\mathbf{h}}

A Merkle tree example

Firstly, each leaf is nothing but the hash value H(Vi)\mathbf{H}(\text{V}_{\mathbf{i}}) of a particular value Vi\text{V}_{\mathbf{i}}, where i\mathbf{i} is an element of the index-set {a,b,c,d,e,f,g,h}\{ \mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{d}, \mathbf{e}, \mathbf{f}, \mathbf{g}, \mathbf{h} \}.

Secondly, the branch nodes are computed as follows;

Bab=H(H(Va)H(Vb)), Bcd=H(H(Vc)H(Vd)),Bef=H(H(Ve)H(Vf)), Bgh=H(H(Vg)H(Vh)),Babcd=H(BabBcd), and Befgh=H(BefBgh).\begin{aligned} &\mathbf{B}_{\mathbf{ab}} = \mathbf{H} \big(\mathbf{H}(\text{V}_{\mathbf{a}})\| \mathbf{H}(\text{V}_{\mathbf{b}})\big),\ \mathbf{B}_{\mathbf{cd}} = \mathbf{H} \big(\mathbf{H}(\text{V}_{\mathbf{c}})\| \mathbf{H}(\text{V}_{\mathbf{d}})\big), \\ &\mathbf{B}_{\mathbf{ef}} = \mathbf{H} \big(\mathbf{H}(\text{V}_{\mathbf{e}})\| \mathbf{H}(\text{V}_{\mathbf{f}})\big),\ \mathbf{B}_{\mathbf{gh}} = \mathbf{H} \big(\mathbf{H}(\text{V}_{\mathbf{g}})\| \mathbf{H}(\text{V}_{\mathbf{h}})\big), \\ &\mathbf{B}_{\mathbf{abcd}} = \mathbf{H} \big(\mathbf{B}_{\mathbf{ab}}\| \mathbf{B}_{\mathbf{cd}}\big),\ \text{and}\ \mathbf{B}_{\mathbf{efgh}} = \mathbf{H} \big( \mathbf{B}_{\mathbf{ef}}\| \mathbf{B}_{\mathbf{gh}}\big). \end{aligned}

Thirdly, the root is computed as roota..h=H(BabcdBefgh)\mathbf{root}_{\mathbf{a..h}} = \mathbf{H} \big(\mathbf{B}_{\mathbf{abcd}}\| \mathbf{B}_{\mathbf{efgh}} \big).

Leaves that share a parent-node are called siblings. The same terminology applies to branches. For example, Bab\mathbf{B}_{\mathbf{ab}} and Bcd\mathbf{B}_{\mathbf{cd}} are sibling branches because they are branches of the same parent, Babcd\mathbf{B}_{\mathbf{abcd}}. Similarly, Befgh\mathbf{B}_{\mathbf{efgh}} and Babcd\mathbf{B}_{\mathbf{abcd}} are sibling branches.

Keys and navigating a Merkle tree

Note the bits used to label edges in figure above. The strings of these bits are called keys, and they indicated where a node is located in a Merkle tree. Since keys uniquely locate nodes, they are used to navigate from the root to the leaves, and backwards.

Suppose one is given the key-value pair (Kd,Vd)( K_{\mathbf{d}} , V_{\mathbf{d}}), where the key is Kd=10010110K_{\mathbf{d}} = 10010110.

In order to locate the key-value pair (Kd,Vd)( K_{\mathbf{d}} , V_{\mathbf{d}}) in the Merkle depicted in the figure above, the key KdK_{\mathbf{d}} is read bit-by-bit from the right-most bit to the left-most bit. While traversing the tree from the root downwards,

  • a zero-key-bit "00" means "follow the edge going to the left",

  • a key-bit "11" means "follow the edge going to the right".

Since Kd=10010110K_{\mathbf{d} } = 10010110, as follows:

  1. Read the least-significant bit of KdK_{\mathbf{d}}, which is 00, hence traverse the tree to the left, and reach Babcd\mathbf{B_`{abcd}`}.
  2. Then read the second significant key-bit, which is "11" in this case. So take the edge going to the right, reaching Bcd\mathbf{B_`{cd}`}.
  3. Again, read the next key-bit, which is "11", hence follow the edge going to the right, reaching the leaf H(Vd)\mathbf{H}( V_{\mathbf{d}} ).

Since H(Vd)\mathbf{H}( V_{\mathbf{d}}) is a leaf and not a branch, and the navigation was correctly done with respect to the given key KdK_{\mathbf{d}}, the H(Vd)\mathbf{H}( V_{\mathbf{d}}) must be the leaf storing the value VdV_{\mathbf{d}}.

One can similarly climb the tree, going in the reverse direction, by using the key-bits of the given key in the reverse order. That is, starting with the last key-bit used to reach the leaf and ending with the least-significant bit of the key.

The tree-address of the value VxV_{\mathbf{x}}, herein refers to the position of the leaf Lx:=H(Vx)L_{\mathbf{x}} := \mathbf{H}( V_{\mathbf{x}}), denoted by the key-bits used to reach LdL_{\mathbf{d}} but in the reverse order.

In the above example, the tree-address of VdV_{\mathbf{d}} is 011.

Merkle proof example

Merkle trees can be used as commitment schemes. And here is an example that follows the (key,value)-pair approach used in the zkProver. Consider the Merkle tree shown in the figure above.

If the prover has committed to a value Vf\text{V}_{\mathbf{f}} by appending a new leaf H(Vf)\mathbf{H}(\text{V}_{\mathbf{f}}) to the Merkle tree as depicted in the above figure, he must then avail the following information, to enable verification of his claim;

  1. The Merkle root, denoted by roota..h\mathbf{root}_{\mathbf{a..h}},
  2. The value Vf\text{V}_{\mathbf{f}}, and
  3. The siblings; H(Ve)\mathbf{H}(\text{V}_{\mathbf{e}}), Bgh\mathbf{B}_{\mathbf{gh}} and Babcd\mathbf{B}_{\mathbf{abcd}}.

Instead of searching through all hash values stored in the tree, the verifier uses only a few hash values of relevant siblings. That is, three siblings in this case.

The verifier then checks the prover's claim by computing the Merkle root as follows;

  1. He computes H(Vf)\mathbf{H}(\text{V}_{\mathbf{f}}), which is the hash of the value Vf\text{V}_{\mathbf{f}}.

  2. Then uses the sibling H(Ve)\mathbf{H}(\text{V}_{\mathbf{e}}) to compute H(H(Ve)H(Vf))=:B~ef\mathbf{H} \big( \mathbf{H}(\text{V}_{\mathbf{e}})\|\mathbf{H}(\text{V}_{\mathbf{f}}) \big) =: \tilde{ \mathbf{B}}_{\mathbf{ef}}, which should be the same as the branch node Bef\mathbf{B}_{\mathbf{ef}}.

  3. Next, he computes H(B~efBgh)=:B~efgh\mathbf{H} \big( \tilde{ \mathbf{B}}_{\mathbf{ef}}\|\mathbf{B}_{\mathbf{gh}} \big) =: \tilde{ \mathbf{B}}_{\mathbf{efgh}}, corresponding to the branch node Befgh\mathbf{B}_{\mathbf{efgh}}.

  4. Now, uses H(BabcdB~efgh)=:root~a..h\mathbf{H} \big( \mathbf{B}_{\mathbf{abcd}}\| \tilde{ \mathbf{B}}_{\mathbf{efgh}} \big) =: \tilde{ \mathbf{root}}_{\mathbf{a..h}}.

The Merkle proof is concluded by checking whether root~ah\tilde{ \mathbf{root}}_{\mathbf{a\dots h}} equals to the publicly known root roota..h\mathbf{root}_{\mathbf{a..h}}.

The symbol tilde denoted ~\tilde{\Box}, is used throughout the document to indicate that the computed value, ~\tilde{\Box}, still needs to be checked, or tested to be true.

Edit on GitHub

Last updated on