Metadata Management

There are two classes of metadata that need to be managed in Hedwig: one is the list of available hubs, which is used to track server availability (ZooKeeper is designed naturally for this); while the other is for data structures to track topic states and subscription states. This second class can be handled by any key/value store which provides ah CAS (Compare And Set) operation. The metadata in this class are:

Each kind of metadata is handled by a specific metadata manager. They are TopicOwnershipManager, TopicPersistenceManager and SubscriptionDataManager.

Topic Ownership Management

There are two ways to management topic ownership. One is leveraging ZooKeeper's ephemeral znodes to record the topic's owner info as a child ephemeral znode under its topic znode. When a hub server, owning a specific topic, crashes, the ephemeral znode which signifies topic ownership will be deleted due to the loss of the zookeeper session. Other hubs can then be assigned the ownership of the topic. The other one is to leverage the CAS operation provided by key/value stores to do leader election. CAS doesn't require the underlying key/value store to provide functionality similar to ZooKeeper's ephemeral nodes. With CAS it is possible to guarantee that only one hub server gains the ownership for a specific topic, which is more scalable and generic solution.

The implementation of a TopicOwnershipManager is required to implement following methods:



public void readOwnerInfo(ByteString topic, Callback<Versioned<HubInfo>> callback, Object ctx);

public void writeOwnerInfo(ByteString topic, HubInfo owner, Version version,
                           Callback<Version> callback, Object ctx);

public void deleteOwnerInfo(ByteString topic, Version version,
                            Callback<Void> callback, Object ctx);

Topic Persistence Info Management

Similar as TopicOwnershipManager, an implementation of TopicPersistenceManager is required to implement READ/WRITE/DELETE interfaces as below:


public void readTopicPersistenceInfo(ByteString topic,
                                     Callback<Versioned<LedgerRanges>> callback, Object ctx);

public void writeTopicPersistenceInfo(ByteString topic, LedgerRanges ranges, Version version,
                                      Callback<Version> callback, Object ctx);

public void deleteTopicPersistenceInfo(ByteString topic, Version version,
                                       Callback<Void> callback, Object ctx);

Subscription Data Management

SubscriptionDataManager has similar READ/CREATE/WRITE/DELETE interfaces as other managers. Besides that, the implementation needs to implement READ SUBSCRIPTIONS interface, which is to fetch all the subscriptions for a given topic.


public void createSubscriptionData(ByteString topic, ByteString subscriberId, SubscriptionData data,
                                   Callback<Version> callback, Object ctx);

public boolean isPartialUpdateSupported();

public void updateSubscriptionData(ByteString topic, ByteString subscriberId, SubscriptionData dataToUpdate, 
                                   Version version, Callback<Version> callback, Object ctx);

public void replaceSubscriptionData(ByteString topic, ByteString subscriberId, SubscriptionData dataToReplace,
                                    Version version, Callback<Version> callback, Object ctx);

public void deleteSubscriptionData(ByteString topic, ByteString subscriberId, Version version,
                                   Callback<Void> callback, Object ctx);

public void readSubscriptionData(ByteString topic, ByteString subscriberId,
                                 Callback<Versioned<SubscriptionData>> callback, Object ctx);

public void readSubscriptions(ByteString topic, Callback<Map<ByteString, Versioned<SubscriptionData>>> cb,
                              Object ctx);

Create/Update Subscriptions

The metadata for a subscription includes two parts, one is preferences and the other one is subscription state. SubscriptionPreferences tracks all the preferences for a subscriber (etc. Application could store its customized preferences for message filtering), while SubscriptionState is used internally to track the message consumption state for a given subscriber. These two kinds of metadata are quite different: SubscriptionPreferences is not updated
frequently while SubscriptionState is be updated frequently when messages are consumed. If the underlying key/value store supports independent field update for a given key (subscription), SubscriptionPreferences and SubscriptionState could be stored as two different fields for a given subscription. In this case isPartialUpdateSupported should return true. Otherwise, isPartialUpdateSupported should return false and the implementation should serialize/deserialize SubscriptionData as an opaque blob.

Read Subscriptions

Delete Subscription

How to choose a key/value store for Hedwig.

From the interface, several requirements needs to meet before picking up a key/value store for Hedwig:

ZooKeeper is the default implementation for Hedwig metadata management, which holds data in memory and provides filesystem-like namespace, meeting the above requirements. ZooKeeper is suitable for most Hedwig usecases. However, if your application needs to manage millions of topics/subscriptions, a more scalable solution would be HBase, which also meet the above requirements.