SPV, Bloom filters and checkpoints

This is a technical article that assumes a working knowledge of Bitcoin.

A full node, such as Bitcoin Core, knows the following:

  • every transaction that is currently being broadcast around the network
  • every transaction that has EVER been sent
  • all the unspent transaction outputs (UTXOs)

This requires a lot of data to be downloaded, stored and indexed. However, there are some shortcuts that reduce the amount of redundant information if you don't need a complete block chain.

Simplified Payment Verification (SPV)

SPV lets you determine that a particular transaction is in a block in the block chain without downloading the entire block chain. It relies on the following:

  • every transaction has a hash
  • every block has a hash
  • a transaction hash and the block hash can be linked using a Merkle tree proof.

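A Merkle tree is a tree of hashes: the transaction hashes are the leaves, each parent is the hash of its two children, and the single hash at the apex (the Merkle root) is stored in the block header, so the block hash commits to every transaction in the block.

A Merkle tree proof is the list of sibling hashes on the path from the leaf (transaction) up to the apex (Merkle root). Hashing the transaction hash together with each of these in turn must reproduce the Merkle root in the block header. The point of a Merkle tree proof is that you only need a small part of the block to prove the transaction is in the block.

Thus, when a wallet says it uses SPV, it means that before it accepts a transaction it checks that: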
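
As a rough illustration of the proof check, here is a minimal Python sketch. It assumes a made-up block of five transactions and a proof held as a flat list of sibling hashes. The function names (dsha256, merkle_root, merkle_proof, verify_proof) are hypothetical names chosen for this example and are not part of Bitcoinj or Bitcoin Core, and the real network protocol packs the proof into a more compact format.

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for transactions and blocks."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def next_level(level):
    """Hash adjacent pairs; an odd-length level repeats its last hash."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves):
    """Merkle root of a non-empty list of 32-byte transaction hashes."""
    level = list(leaves)
    while len(level) > 1:
        level = next_level(level)
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes on the path from leaves[index] up to the root."""
    proof, level = [], list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append(level[index ^ 1])  # the other half of this node's pair
        level = next_level(level)
        index //= 2
    return proof

def verify_proof(leaf, index, proof, root):
    """Recompute the root from a single leaf plus its proof."""
    node = leaf
    for sibling in proof:
        # An odd index means the node is the right-hand child of its pair.
        node = dsha256(sibling + node) if index % 2 else dsha256(node + sibling)
        index //= 2
    return node == root

# Hypothetical block containing five transactions.
txids = [dsha256(bytes([i])) for i in range(5)]
root = merkle_root(txids)
proof = merkle_proof(txids, 3)
print(len(proof), verify_proof(txids[3], 3, proof, root))
```

Note that the proof for one transaction out of five is only three hashes long; the proof grows with the logarithm of the number of transactions in the block, which is why so little data is needed.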

  1. there is a Merkle tree proof that the transaction is in a block
  2. the block itself is in the main chain of the block chain

The transaction is then "good" and will be added to the wallet.
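
The second check depends on block headers forming a chain, with each header containing the hash of the one before it. The Python sketch below illustrates only that linking, using toy 80-byte headers with made-up Merkle roots. The helper names (make_header, links_ok) are hypothetical, and the sketch deliberately leaves out the proof-of-work and difficulty checks that a real SPV client also performs.

```python
import hashlib
import struct

def dsha256(data: bytes) -> bytes:
    """Double SHA-256, used here as the block hash."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def make_header(prev_hash: bytes, merkle_root: bytes, nonce: int) -> bytes:
    """Pack a toy 80-byte header.

    Field order: version, previous block hash, Merkle root, time, bits, nonce.
    Time and bits are set to zero because this sketch does not check them.
    """
    return struct.pack("<I32s32sIII", 1, prev_hash, merkle_root, 0, 0, nonce)

def links_ok(headers) -> bool:
    """True if every header contains the hash of the header before it."""
    return all(
        headers[i][4:36] == dsha256(headers[i - 1])
        for i in range(1, len(headers))
    )

# Build a hypothetical three-block chain starting from an all-zero parent.
chain = []
prev = bytes(32)
for nonce in range(3):
    header = make_header(prev, dsha256(f"txs-{nonce}".encode()), nonce)
    chain.append(header)
    prev = dsha256(header)

# "Is this block in the chain?" becomes a lookup by block hash.
main_chain_hashes = {dsha256(h) for h in chain}
print(links_ok(chain), dsha256(chain[1]) in main_chain_hashes)
```

Swapping any header for a different one breaks the link from the header that follows it, so a block cannot be replaced without rebuilding everything after it.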

Bloom filtering and single HD account support

Many people have asked us why we only support a single HD account (namely Account 1 in the normal parlance).

The main reason is how we get our transactions from the Bitcoin Core nodes. We use a technique called Bloom filtering. We don't ask for transactions directly; instead we give the Bitcoin Core nodes a filter that we know will match all the transactions we are interested in (plus some false positives to put anyone spying off the scent a little).
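
To show the idea behind a Bloom filter, here is a minimal Python sketch. It is an illustration only: the BloomFilter class, its parameters and the address strings are hypothetical, and it derives bit positions from SHA-256 for simplicity rather than using the hash function and wire format of the real Bitcoin protocol.

```python
import hashlib

class BloomFilter:
    """A bit array plus several hash functions.

    Anything that was added always matches. Something that was never added
    may also match by chance; that is a false positive.
    """

    def __init__(self, size_bits: int, num_hashes: int):
        self.size = size_bits
        self.num_hashes = num_hashes  # must be below 256 in this sketch
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        # One bit position per hash function, derived from a seeded SHA-256.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

# Hypothetical wallet addresses loaded into one filter.
wallet_addresses = [f"wallet-address-{i}".encode() for i in range(200)]
bloom = BloomFilter(size_bits=4096, num_hashes=5)
for address in wallet_addresses:
    bloom.add(address)

# Count how many unrelated items match by chance.
others = [f"someone-else-{i}".encode() for i in range(1000)]
false_positives = sum(1 for item in others if item in bloom)
print(all(a in bloom for a in wallet_addresses), false_positives)
```

A node that receives such a filter cannot tell which matches are the wallet's own addresses and which are chance matches, which is where the privacy benefit comes from.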

Supporting just one account means creating filters for a steadily increasing number of addresses for both the main addresses and the change addresses. This starts off as "hundreds" and, as the wallets get used, will become "thousands".

Scaling this up to supporting any number of accounts means creating filters that match:

number of accounts x (main addresses + change addresses)

Thus we would have to match many more addresses, to the point (we think) where we would be receiving pretty much complete blocks. This would make us at least as slow as a Bitcoin Core node, and more likely slower since we would also be uploading very wide Bloom filters.

We think this would be far too slow to be useful, so we restrict ourselves to a single account.
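
One way to get a feel for this scaling concern is the standard approximation for a Bloom filter's false positive rate, p = (1 - e^(-kn/m))^k, where n is the number of items, m the number of bits and k the number of hash functions. The Python sketch below applies it with the filter limits from BIP 37 (36,000 bytes and 50 hash functions). The function names and the address counts are hypothetical, and the actual filter parameters a wallet uses may differ.

```python
import math

MAX_FILTER_BITS = 36_000 * 8  # BIP 37 limit on filter size, in bits
MAX_HASH_FUNCS = 50           # BIP 37 limit on the number of hash functions

def items_to_watch(accounts: int, main_addresses: int, change_addresses: int) -> int:
    """number of accounts x (main addresses + change addresses)"""
    return accounts * (main_addresses + change_addresses)

def false_positive_rate(n_items: int, m_bits: int = MAX_FILTER_BITS) -> float:
    """Approximate false positive rate of a filter holding n_items.

    Picks the number of hash functions closest to the optimum
    (m/n) * ln 2, clamped to the range 1..MAX_HASH_FUNCS.
    """
    k = round(m_bits / n_items * math.log(2))
    k = max(1, min(MAX_HASH_FUNCS, k))
    return (1 - math.exp(-k * n_items / m_bits)) ** k

# Hypothetical wallets: 1,000 main and 1,000 change addresses per account.
for accounts in (1, 10, 100, 1000):
    n = items_to_watch(accounts, 1000, 1000)
    print(accounts, n, false_positive_rate(n))
```

Under these assumptions, once the filter cannot grow any further, each extra account pushes the false positive rate up until nearly everything matches, at which point the wallet is effectively downloading whole blocks.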

We don't have the UTXO set

Using Bitcoinj we don't have access to the Unspent Transaction Output (UTXO) set, so we cannot check against it directly. Only implementations that have a full block store in their backend, and can query it directly, can use the UTXO set, and that means downloading the entire block chain.

Bitcoinj only speaks the Bitcoin network protocol, which does not support requests such as "give me all the UTXOs for this address".


Checkpoints

To reduce the number of blocks that need to be downloaded we include a checkpoints file in the installer. It contains the header of the block at each point where the Bitcoin difficulty level changes (every 2016 blocks).

This allows us to sync only from the last checkpoint before the wallet birth date, which saves a lot of time and is why we ask you to record the "datestamp" during wallet creation. For example, if the wallet datestamp corresponds to block 200,050 and we have a checkpoint at block 200,000, then syncing starts at block 200,000, just 50 blocks before the wallet was created, rather than at the first block in the chain.
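
The checkpoint lookup can be sketched in a few lines of Python. Everything here is hypothetical: the checkpoint times and heights are invented round numbers for illustration (real checkpoints sit on difficulty change boundaries), and the function names are not taken from Bitcoinj.

```python
from bisect import bisect_right

# Hypothetical checkpoints as (block time in Unix seconds, block height),
# sorted by time. Real values would come from the checkpoints file.
CHECKPOINTS = [
    (1_340_000_000, 190_000),
    (1_348_000_000, 200_000),
    (1_356_000_000, 210_000),
]

def checkpoint_before(wallet_birth_time: int):
    """Latest checkpoint at or before the wallet datestamp, or None."""
    times = [t for t, _ in CHECKPOINTS]
    i = bisect_right(times, wallet_birth_time)
    return CHECKPOINTS[i - 1] if i > 0 else None

def blocks_to_sync(wallet_birth_time: int, chain_tip_height: int) -> int:
    """Number of blocks between the starting point and the chain tip."""
    checkpoint = checkpoint_before(wallet_birth_time)
    start_height = checkpoint[1] if checkpoint else 0  # 0 = start of chain
    return chain_tip_height - start_height

# A wallet created shortly after the second checkpoint, with a
# hypothetical chain tip at height 300,000.
print(checkpoint_before(1_348_500_000), blocks_to_sync(1_348_500_000, 300_000))
```

Without a recorded datestamp there is no safe checkpoint to pick, and the sync has to start from the beginning of the chain.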

Since the checkpoints file is stored locally and is provided through our installer, it enables us to detect when a Bitcoin node is attempting to pass off its own forked chain (containing fake transactions) as the genuine Bitcoin block chain.

Connecting to a local Bitcoin Core node

MultiBit HD will automatically connect to a Bitcoin Core node running on localhost if it can detect one. It also connects to other nodes, since we use transaction propagation to determine when a transaction has been sent properly and when the change from a transaction can be used. If we relied purely on a single node (even a trusted one) we could not be confident that the wider Bitcoin network is relaying the transaction.

Related articles

Here are some related articles: