What is a Bitcoin Merklized Abstract Syntax Tree (MAST)?

Time: 2023-01-25 09:56:44

Merklized Abstract Syntax Trees (MAST) are a proposed addition to Bitcoin that allows for smaller transaction sizes, more privacy, and larger smart contracts. In this post, we’ll explore the basics of MAST, describe its potential benefits, and summarize some of the current proposals to add it to the Bitcoin Protocol.

The problem: unused script data

Satoshi Nakamoto gave Bitcoin an interesting feature that wasn’t described in the original whitepaper. Instead of requiring bitcoins be received to a public key and spent by a digital signature, Nakamoto gave users the ability to write programs (called scripts) that would act as dynamic public keys and signatures.

When you specify a script (which is the default in every wallet), the consensus-enforced Bitcoin Protocol won’t let anyone spend your bitcoins unless the script returns True. This allows you to specify restrictions, called encumbrances, such as requiring that the spending transaction be signed by your private key.

More complex encumbrances are possible too, such as the following example we’ll use throughout this article: Alice wants to be able to spend her bitcoins at any time, but if her bitcoins aren’t spent within three months (maybe because she’s dead or incapacitated), she wants her siblings Bob and Charlie to have her bitcoins as long as they can agree on where to spend them.

The encumbrance script below, which specifies the policy described above, includes not just Alice’s public key (needed to verify a signature from her private key) but also some conditional logic, a timeout, and the public keys for both Bob and Charlie.

[Figure: the example encumbrance script]
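As a rough illustration of the kind of script described above, the policy could be sketched as a Python token list whose serialized size can then be estimated. The placeholder key bytes, the 144-blocks-per-day conversion, and the helper names are assumptions made for this sketch, not the article's exact script.

```python
# Hypothetical placeholders: real keys would be 33-byte compressed public keys.
ALICE_KEY = b"\x02" + b"\xaa" * 32
BOB_KEY = b"\x02" + b"\xbb" * 32
CHARLIE_KEY = b"\x02" + b"\xcc" * 32

# Roughly three months expressed in blocks (assumes about 144 blocks per day).
TIMEOUT_BLOCKS = 90 * 144

# Opcodes are strings, data pushes are bytes, numbers are ints.
script = [
    "OP_IF",
        ALICE_KEY, "OP_CHECKSIG",
    "OP_ELSE",
        TIMEOUT_BLOCKS, "OP_CHECKSEQUENCEVERIFY", "OP_DROP",
        2, BOB_KEY, CHARLIE_KEY, 2, "OP_CHECKMULTISIG",
    "OP_ENDIF",
]


def number_push_size(n: int) -> int:
    """Bytes needed to push a non-negative integer onto the stack."""
    if 0 <= n <= 16:
        return 1  # OP_0 .. OP_16 are single-byte opcodes
    magnitude_bytes = (n.bit_length() + 8) // 8  # leaves room for the sign bit
    return 1 + magnitude_bytes  # one length byte plus the number itself


def serialized_size(tokens) -> int:
    """Estimate the size in bytes of a script given as a token list."""
    total = 0
    for token in tokens:
        if isinstance(token, str):
            total += 1  # every opcode used here is one byte
        elif isinstance(token, bytes):
            total += 1 + len(token)  # length byte plus data (pushes up to 75 bytes)
        else:
            total += number_push_size(token)
    return total


print(serialized_size(script))
```

Under these assumptions the whole script comes to 114 bytes, and every one of them is published no matter which branch is actually used.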

In the current Bitcoin Protocol, all of the data above must be added to the block chain when Alice’s bitcoins are spent. That includes the parts of the script that don’t get used in a particular spend, such as Bob’s and Charlie’s public keys on the occasions when Alice spends her own bitcoins.

Unused encumbrance data increases the size of transactions, reduces privacy by publicly disclosing more information than necessary, and primarily limits smart contracts by their size rather than by their validation costs. MAST seeks to improve this situation by removing the need to directly include unused parts of a script in the block chain.

The origins of MAST

The idea behind MAST¹ comes from two pre-existing concepts, abstract syntax trees and merkle trees. Abstract Syntax Trees (ASTs) are a way of describing a program by splitting it into its individual parts, which can make it easier to analyze and optimize. To generate an AST, you connect each function to its dependencies until all of the dependencies have been mapped out. Here’s an AST for the example encumbrance described above:

[Figure: AST for the example encumbrance]

Merkle trees, on the other hand, allow you to verify that an individual element is a member of a set without the whole set being present. For example, Bitcoin SPV wallets use merkle trees to save bandwidth by verifying that individual received transactions are members of a block without downloading the full block.

[Figure: Example merkle tree]

To generate a merkle tree, each member is individually hashed, producing a short unique identifier for that member. Each of those identifiers is then paired with another identifier and hashed again, producing another short unique identifier for that pair. This step is repeated until only one identifier remains, called the merkle root, which uniquely identifies the whole set in just a few bytes of data.

To verify that a particular member is part of the set, someone with the whole set provides you with just the identifiers you need in order to connect that particular member to the merkle root of the whole set. This proof that the member belongs to the set is called a merkle proof.
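The two steps above (building the root, then proving membership) can be sketched in a few lines of Python. This is a simplified illustration rather than Bitcoin's consensus code: the function names are made up for the example, and pairing an odd leftover hash with itself is a convention borrowed from Bitcoin's block merkle trees.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for its block merkle trees."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_levels(members):
    """Return every level of the tree, from the leaf hashes up to the root."""
    level = [h(m) for m in members]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last hash with itself
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(members, index):
    """Collect the sibling hashes that link members[index] to the root."""
    proof = []
    for level in merkle_levels(members)[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # last hash of an odd level is paired with itself
        proof.append((level[sibling], index % 2 == 1))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(member, proof, root):
    """Recompute the path from one member to the root using only the proof."""
    node = h(member)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


members = [b"tx-a", b"tx-b", b"tx-c", b"tx-d", b"tx-e"]
root = merkle_levels(members)[-1][0]
proof = merkle_proof(members, 2)
print(len(proof), verify(b"tx-c", proof, root))
```

The verifier needs only the member, the proof, and the 32-byte root; it never sees the other members.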

In short, the technique behind ASTs allows us to split a program into its individual parts, and merkle trees allow us to verify the individual parts belong to a complete program without the entire program being present. This is the basis of MAST, which allows spenders to replace the unused parts of encumbrances with a merkle proof — reducing transaction size, increasing privacy, and making larger smart contracts possible.

An example of MAST

Let’s take our example encumbrance from above and split it into separate sub-scripts for each of the two possible outcomes we allow:

  1. Alice can spend her bitcoins at any time (below left)
  2. Or, after three months pass without Alice’s bitcoins being spent, Bob and Charlie can agree where to spend Alice’s bitcoins (below right)

Let’s create a merkle tree based on these two independent sub-scripts:

[Figure: merkle tree built from the two sub-scripts, Alice’s on the left and Bob and Charlie’s on the right]

The merkle root for this tree uniquely identifies Alice’s complete encumbrance in just 32 bytes of data. Alice then uses a substitute encumbrance that says that a spender must provide a merkle proof connecting the merkle root to one of the sub-scripts and that the sub-script must return True.

The merkle proof with sub-script could be visualized like either of the examples below depending on which sub-script we wanted to use:

[Figure: the two possible merkle proofs, one for each sub-script]
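For the two-branch case, the commitment check can be sketched as follows. The sub-scripts are textual stand-ins, the hash function is an assumption (actual MAST proposals specify their own hashing and serialization), and the sketch only checks the merkle proof; it does not execute the revealed sub-script.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Double SHA-256 (an assumption made for this sketch)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# Hypothetical textual stand-ins for the two sub-scripts.
ALICE_BRANCH = b"<Alice pubkey> OP_CHECKSIG"
HEIRS_BRANCH = (
    b"<3 months> OP_CSV OP_DROP 2 <Bob pubkey> <Charlie pubkey> 2 OP_CHECKMULTISIG"
)

# With only two leaves, the root is the hash of the two leaf hashes.
mast_root = h(h(ALICE_BRANCH) + h(HEIRS_BRANCH))


def branch_is_committed(sub_script, sibling_hash, sibling_is_left, root):
    """Check that a revealed sub-script belongs to the tree behind root."""
    leaf = h(sub_script)
    pair = sibling_hash + leaf if sibling_is_left else leaf + sibling_hash
    return h(pair) == root


# Alice spends: she reveals her branch plus the hash of the other branch.
print(branch_is_committed(ALICE_BRANCH, h(HEIRS_BRANCH), False, mast_root))

# Bob and Charlie spend: they reveal their branch plus the hash of Alice's.
print(branch_is_committed(HEIRS_BRANCH, h(ALICE_BRANCH), True, mast_root))
```

In both cases only one sub-script and one 32-byte hash are revealed; the unused branch appears on chain only as its hash.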

Benefit #1 — smaller transactions

Let’s start our examination of the benefits of MAST by looking at how it allows users of complex encumbrances to create smaller transactions.

In the example section above, we used an encumbrance that had two sub-scripts: either Alice spent her funds, or Bob and Charlie waited three months and spent the funds themselves. Let’s imagine an infinitely-extendable version of this where a third sub-script says that after three months and one day, Dan and Edith can spend the funds; or a fourth sub-script says that after three months and two days, Fred and George can spend the funds; etc…

This lets us plot the number of sub-scripts against how much encumbrance data must be added to the block chain, with and without MAST.

[Figure: encumbrance data size versus number of sub-scripts, with and without MAST]

Here’s the same plot in log scale:

[Figure: the same plot in log scale]

Although MAST starts off being slightly more expensive than the same example script without MAST for just two sub-scripts, non-MAST increases in cost linearly while MAST increases only logarithmically.
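A toy model makes the two growth rates concrete. The 70-byte sub-script size is a made-up round figure, and the model ignores fixed overheads such as the 32-byte root commitment, so its absolute numbers and its behavior for very small trees will not match the plot described above; only the shape of the growth is the point.

```python
HASH_SIZE = 32        # bytes per element of a merkle proof
SUB_SCRIPT_SIZE = 70  # assumed bytes per sub-script (hypothetical round figure)


def ceil_log2(n: int) -> int:
    """Depth of a balanced binary tree with n leaves."""
    return (n - 1).bit_length()


def bytes_without_mast(n: int) -> int:
    """Every sub-script is published, used or not."""
    return n * SUB_SCRIPT_SIZE


def bytes_with_mast(n: int) -> int:
    """One sub-script is published, plus one hash per tree level."""
    return SUB_SCRIPT_SIZE + ceil_log2(n) * HASH_SIZE


for n in (2, 16, 128, 1024):
    print(n, bytes_without_mast(n), bytes_with_mast(n))
```

Going from 2 to 1,024 sub-scripts multiplies the non-MAST size by 512 in this model, while the MAST size grows by only nine additional hashes.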

If saving bytes is the primary goal, this can be optimized further. For many encumbrances, spenders are much more likely to use one condition than the others. For example, Alice is hoping to live for a long time, so she constructs her merkle tree so that her spending condition is always near the top and all other conditions are on the bottom.


This gives us two different sizes for the MAST merkle proof, one for the best case where Alice is alive and spending her bitcoins, and one for the other cases where Alice is dead and her beneficiaries are spending her bitcoins; let’s overlay those sizes on the previous plot.

[Figure: the previous plot with the best-case and other-case proof sizes for the unbalanced tree overlaid]

We can see that Alice is now always using the same number of bytes in the best case no matter how many potential beneficiaries she adds to her encumbrance, and that the other potential spenders only use a few more bytes than the previous normal case.
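The effect of this arrangement on proof size can be sketched with one assumed tree shape: Alice's leaf hangs directly off the root, and all other sub-scripts sit in a balanced subtree under the root's other child. The shape and the function names are assumptions for illustration; the article's figure may arrange the tree differently.

```python
HASH_SIZE = 32  # bytes per element of a merkle proof


def ceil_log2(n: int) -> int:
    """Depth of a balanced binary tree with n leaves."""
    return (n - 1).bit_length()


def balanced_proof_bytes(n: int) -> int:
    """Proof size for any leaf when all n sub-scripts share a balanced tree."""
    return ceil_log2(n) * HASH_SIZE


def alice_proof_bytes(n: int) -> int:
    """Alice's leaf is a child of the root, so she needs one sibling hash."""
    return HASH_SIZE if n > 1 else 0


def other_proof_bytes(n: int) -> int:
    """Other leaves: a path through their subtree plus Alice's leaf hash."""
    return (ceil_log2(n - 1) + 1) * HASH_SIZE  # requires n >= 2


for n in (2, 5, 1024):
    print(n, balanced_proof_bytes(n), alice_proof_bytes(n), other_proof_bytes(n))
```

With 1,024 sub-scripts, Alice's proof stays at 32 bytes, while the other spenders pay 352 bytes instead of the 320 they would need in a balanced tree.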

Whichever arrangement Alice chooses, we see that MAST can make encumbrances with multiple sub-scripts much smaller, reducing transaction size so that users can pay less in fees and blocks can hold a greater number of advanced transactions.

Benefit #2 — more privacy

Because we’ve been discussing Alice’s example in depth throughout this article, you know all the details of the encumbrance, but imagine if all you saw was the data actually added to the block chain when Alice spent her bitcoins (below, left example):

[Figure: the data added to the block chain for each spend, Alice’s branch on the left and Bob and Charlie’s branch on the right]

With just this information, you wouldn’t know whether or not anyone besides Alice could access the funds or what conditions might restrict their spending. You might guess from the use of MAST that there were other conditions, but that would be only a guess — Alice might be just pretending to have other spendable parts of her merkle tree.

Alternatively, if all you saw was the other branch (above, right example), you wouldn’t know that the funds were spendable before the timeout or that a single person (Alice) could spend them. Again, you might guess that there were other conditions, but you couldn’t be sure just by looking at the block chain.

The ability to keep private any unused encumbrance conditions can be quite important to some users, such as businesses who want to keep their smart contract arrangements as confidential as possible from potential competitors. This stands in contrast to some altcoins that claim to be designed specifically for smart contracts but which provide no privacy for any part of those contracts.

Privacy can also provide another benefit that applies to all Bitcoin users, even those who don’t care about encumbrance privacy itself. Imagine that Alice is the only person who ever uses the non-MAST encumbrance template from the first section of this article. Because the full encumbrance is public, anyone can track all of Alice’s spending just by looking for cases where that template is used, destroying Alice’s privacy.


Anything that makes it easy to identify particular users also makes it easy to treat their bitcoins differently than other people’s bitcoins, a problem known as a lack of fungibility. If someone knows what Alice’s encumbrances look like, they can bribe or force miners not to mine those transactions in order to prevent Alice from spending her bitcoins.

MAST alone can’t entirely fix this because Alice (or Bob and Charlie) still needs to reveal part of the encumbrance when Alice’s bitcoins are spent, but it is possible that many different complex encumbrances can be resolved down to a smaller number of simple MAST-style encumbrances.

For example, Alice’s default spend looks like the default spend for any transaction where just a single signature is required, so Alice’s MAST-based transactions blend in with other MAST-based single-signature transactions. This returns Alice’s privacy and increases both her fungibility and the fungibility of everyone else who uses MAST-based encumbrances that can be satisfied by a single signature.


This particular benefit of MAST is likely to combine well with other proposed features that improve privacy and fungibility for Bitcoin users by allowing certain complex encumbrances to be satisfied by a single digital signature, for example Pieter Wuille’s and Gregory Maxwell’s generalized threshold trees, Andrew Poelstra’s scriptless scripts, and Thaddeus Dryja’s discrete log contracts.

But even if none of those things ever becomes possible on Bitcoin, MAST by itself provides users with more privacy and more fungibility for sophisticated encumbrances than they can get today on any altcoin that supports smart contracts through user-specified encumbrances.

Benefit #3 — larger smart contracts

Bitcoin has three different byte size hard limits that apply to individual scripts depending on how the encumbrance is constructed: a 10,000-byte limit added in July 2010 for bare scripts, a 520-byte limit for P2SH, and a 10,000-byte limit for segwit. Let’s overlay these thresholds on the size chart we used previously.

[Figure: the previous size chart with the bare script, P2SH, and segwit size limits overlaid]

We can see that even for our very basic infinitely-extendable example, MAST makes it possible to have many more conditional branches than would be allowed using any other current mechanism. Indeed, MAST scales so well that if you had at your disposal all of the energy believed to exist in the entire observable universe, you could only create a balanced merkle tree whose merkle proof would be about 8,448 bytes in size. Yet even a merkle proof of that size could be validated in less than a millisecond by a full node running on any modern laptop.
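The 8,448-byte figure is easier to picture once it is divided into 32-byte hashes. The sketch below does that arithmetic and then folds a proof of that length into a root, using made-up sibling hashes; how long the loop takes depends on the machine, so no timing is asserted here.

```python
import hashlib

PROOF_BYTES = 8448  # figure quoted in the text
HASH_SIZE = 32      # bytes per element of a merkle proof


def h(data: bytes) -> bytes:
    """Double SHA-256 (an assumption made for this sketch)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


depth = PROOF_BYTES // HASH_SIZE  # number of hashes in the proof
max_leaves = 2 ** depth           # leaves in a balanced tree of that depth

# Made-up sibling hashes standing in for a real proof of that depth.
siblings = [h(i.to_bytes(2, "big")) for i in range(depth)]


def fold_proof(sub_script: bytes, proof) -> bytes:
    """Hash a leaf up to the root, one sibling per level."""
    node = h(sub_script)
    for sibling in proof:
        node = h(node + sibling)
    return node


root = fold_proof(b"hypothetical sub-script", siblings)
print(depth, len(str(max_leaves)), len(root))
```

A proof of this size is 264 hashes deep, which corresponds to a balanced tree with up to 2^264 leaves (an 80-digit number), yet checking it takes only 264 hash operations on top of hashing the leaf.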

Other limits also apply to Bitcoin scripts, and MAST can bypass them because it doesn’t require full nodes to process unused sub-scripts. In this respect, MAST preserves and extends a long-held design goal of Bitcoin smart contracts: as much of the contract burden as possible should be placed on the contract participants, and network nodes, whose use of bandwidth, memory, and processing power goes uncompensated, should be burdened as little as possible.

So the real achievement of MAST is not that it allows Bitcoin users to create more advanced smart contracts than before, but that it does so without placing any new worst-case burdens on Bitcoin nodes.

Making MAST possible: multiple proposed methods

So far, two methods have been proposed on the bitcoin-dev mailing list for enabling the use of MAST in the Bitcoin Protocol, both of which are still draft proposals subject to change.

The first proposal is BIP114 by Johnson Lau (jl2012), which uses a segwit-based extension feature that allows native segwit addresses (bech32) to commit to the merkle root of a MAST encumbrance. Spenders can then select a single sub-script from the tree.

The second proposal is two yet-unnumbered BIPs (1, 2) by Mark Friedenbach (maaku), which increase the flexibility of the Script language in a way that allows programmers to write scripts that can themselves validate MAST-based encumbrances. If implemented in Friedenbach’s preferred way, this would make it possible to use merkle proofs in all three types of scripts currently supported by Bitcoin (bare, P2SH, and segwit).

Both approaches present tradeoffs when compared to each other but either approach will provide the benefits described previously (plus or minus a few bytes). Either approach can be activated as a soft fork.

Conclusion: so when do we get MAST?

After describing the benefits of MAST and briefly mentioning two proposals that would make it available on Bitcoin, you’re probably wondering when you’ll be able to use it. Sadly, I don’t know.

The path from idea, to proposal, to complete implementation, to proposed soft fork, to activated soft fork is not straightforward. I think the two years of drama surrounding segwit made that clear.

But it does seem to me that the basic idea behind MAST is something that has strong support among the Bitcoin technical community, and that the developers most interested in MAST will continue working on it unless it is proven to be completely untenable. Should those developers succeed in producing peer-reviewed soft-fork-ready code, it will be up to the readers of this article and other Bitcoin users to decide whether or not MAST becomes part of the Bitcoin Protocol.

Acknowledgements

I thank Mark Friedenbach, Jimmy Song, and John Newbery for their reviews of drafts of this article. What errors may remain are entirely my fault.

Footnotes

  1. Russell O’Connor is universally credited with first describing MAST in a discussion that some sources say may have also involved Pieter Wuille. Sources: Peter Todd, Gregory Maxwell (attributed via Jeremy Rubin et al.), and Mark Friedenbach (private correspondence).

https://bitcointechtalk.com/what-is-a-bitcoin-merklized-abstract-syntax-tree-mast-33fdf2da5e2f