Modeling the DAO attack in PlusCal

Maurice Herlihy's paper: "Blockchains from a distributed computing perspective" explains the DAO attack as follows:

"Figure 1 shows a fragment of a DAO-like contract, illustrating a function that allows an investor to withdraw funds. First, the function extracts the client's address (Line 2), then checks whether the client has enough funds to cover the withdrawal (Line 3). If so, the funds are sent to the client through an external function call (Line 4), and if the transfer is successful, the client’s balance is decremented (Line 5). 
This code is fatally  flawed. In June 2016, someone exploited this function to steal about $50 million funds from the DAO. As noted, the expression in Line 3 is a call to a function in the client's contract. Figure 2 shows the client's code. The client's contract immediately calls withdraw() again (Line 4). This re-entrant call again tests whether the client has enough funds to cover the withdrawal (Line 3), and because withdraw() decrements the balance only after the nested call is complete, the test erroneously passes, and the funds are transferred a second time, then a third, and so on, stopping only when the call stack overflows."
(Of course, that is a very simplified description of the DAO attack. More accurate descriptions are provided here and here.)

Even though the code seems sequential (after all the blockchain serializes everything), it has concurrency problems built in. This was a point made in Herlihy's paper as follows:
"In Ethereum, all contracts are recorded on the blockchain, and the ledger includes those contracts' current states. When a miner constructs a block, it fills that block with smart contracts and exe- cutes them one-by-one, where each contract's  final state is the next contract's initial state. These contract executions occur in order, so it would appear that there is no need to worry about concurrency." 
After showing DAO vulnerability and ERC20 token standard vulnerability, the paper says:
"We have seen that the notion that smart contracts do not need a concurrency model because execution is single-threaded is a dangerous illusion. Sergey and Hobor give an excellent survey of pitfalls and common bugs in smart contracts that are disguised versions of familiar concurrency pitfalls and bugs." 

Enter TLA+/PlusCal

Since TLA+/PlusCal is a great tool for catching concurrency problems, I thought it would be useful to model this DAO attack in PlusCal. After I got the idea, it took me a short time to model and model-check this in PlusCal. I used procedures in PlusCal (which I don't use often) to match the description of the problem.


TLA+ is all about invariant-based reasoning so I wrote the invariant first. Writing "SafeWithdrawal == (bankBalance=BALANCE /\ malloryBalance=0) \/ (bankBalance=BALANCE-AMOUNT /\ malloryBalance=AMOUNT)was too tight, because the updates of the balances are not happening atomically. That is how the invariant-based thinking helps us immediately: we can see that the withdrawal is a non-atomic operation, and realize that we should be more careful with the updates.

In the model checking pane, I set BALANCE as 10 and AMOUNT as 10. That is, initially Mallory has 10 coins in her bankBalance, and 0 in her wallet and wants to transfer her bankBalance and sets AMOUNT=10. When I run the model checker, it finds the double withdrawal problem immediately. Mallory's account got to 20 starting from 0! Normally we would expect it to go to 10 (line 27) temporarily, and then her bankBalance to be set to 0 (line 22). But this code managed to do double withdrawal, and the SafeWithdrawal invariant is violated.

The error trace contains 8 steps: Initially BankWithdraw is called, which then calls the MallorySendMoney to complete withdrawal. However, Mallory's SendMoney implementation includes another call to BankWithdraw and the balance check in line 18 passes because bankBalance is not decremented by amount (that comes in line 22). So the second BankWithdraw executes concurrently and Mallory manages to do double (and later triple) withdrawal.

Fixing things

Ok, let's check if we can fix this if we move the bankBalance subtraction before MallorySendMoney.
Of course for that we change SafeWithDrawal to accommodate the new way of updating bankBalance. But it turns out that is still too tight. If I call this with BALANCE=10 and AMOUNT=4, it is OK to have two withdrawals concurrently provided that in the final state no new money is produced: Invariant == bankBalance+malloryBalance <= BALANCE. I also model check for progress and write an EndState temporal formula for it: EndState == <>(bankBalance<=BALANCE-AMOUNT /\ bankBalance+malloryBalance=BALANCE). When we model check it, we see that this solves the problem.  So it leaves me puzzled, why, when it was this easy, the original BankWithdraw code was not coded this way and was left vulnerable to the attack.


These PlusCal models are available on my Github directory.

MAD questions

Should we come up with a PlusCal framework to facilitate modeling and model-checking of smart-contracts?

I had written about why you should model. Those apply here as well, and here things become even more critical. When money is involved, attackers get smart quickly, and it is easy to have vulnerabilities in concurrent code due to the many corner cases. Let TLA+/PlusCal show you those cornercases and help you design your protocol to achieve correctness guarantees. So if you are writing smartcontracts, I think it makes sense to first model-check and verify them. It doesn't take much effort, and it can save you from big problems.

Related links

Here is some previous discussion/context about why I started assigning TLA+/PlusCal modeling projects in distributed systems classes.

There is a vibrant Google Groups forum for TLA+ : https://groups.google.com/forum/#!forum/tlaplus

Clicking on label "tla" at the end of the post you can reach all my posts about TLA+

Comments

H said…
> When we model check it, we see that this solves the problem. So it leaves me puzzled, why, when it was this easy, the original BankWithdraw code was not coded this way and was left vulnerable to the attack.

I think the reason why is because you have to explicitly model that particular DAO attack to realize that you need to change BankWithdraw. PlusCal isn't finding an attack, just verifying that a specific one is effective.
Unknown said…
Hello,

Just created reddit account - this is my first post.

Have you checked my blog entry on DAO hack: https://jarjuk.wordpress.com/2016/12/09/sbuilder-ethereum-example/

A short summary:
- it uses a tool called sbuilder (aka specifcication builder), which produces TLA -language formal models
- sbuilder has plugin support, the block entry I linked uses _old_ GEM sbuilder-ethereum, a newer version is 'sbuilder-eth'
- a word of warning trying to simulate Solitidy language with TLA+ (or using any other language) is subject to false positives and false negatives.

Best Regards,
Jukkka

Popular posts from this blog

The end of a myth: Distributed transactions can scale

Hints for Distributed Systems Design

Foundational distributed systems papers

Learning about distributed systems: where to start?

Metastable failures in the wild

Scalable OLTP in the Cloud: What’s the BIG DEAL?

The demise of coding is greatly exaggerated

SIGMOD panel: Future of Database System Architectures

Dude, where's my Emacs?

There is plenty of room at the bottom