When a bookie crashes, any ledgers with entries on the bookie potentially become under-replicated. For this reason, we provide a recovery tool which will ensure that all ledgers which had entries on the bookie are fully replicated. At the moment, this is not an automatic process. The administrator must run this tool manually when he sees that the bookie has died.
To run recovery, with zk1.example.com as the zookeeper ensemble, and 192.168.1.10 as the failed bookie, do the following:
bookkeeper-server/bin/bookkeeper org.apache.bookkeeper.tools.BookKeeperTools zk1.example.com:2181 192.168.1.10:3181
It is necessary to specify the host and port portion of failed bookie, as this is how it identifies itself to zookeeper. It is possible to specify a third argument, which is the bookie to replicate to. If this is omitted, as in our example, a random bookie is chosen for each ledger fragment. A ledger fragment is a continuous sequence of entries in a bookie, which share the same ensemble.
The recovery process is as follows.