Batches

Introduction

A batch is a group of read and write operations that are logically related. When an app uses Syncbase without synchronization, a batch is equivalent to an ACID transaction.

Atomic: All writes that are part of the batch are committed together.
Consistent: Any batches started in the future necessarily see the effects of batches committed in the past.
Isolated: The concurrent execution of batches results in a state that would be equivalent to the batches executing serially in some order.
Durable: Once a batch has been committed, it will remain committed in the face of power loss or crashes.
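
The Atomic property can be pictured with a toy in-memory model in which writes are staged privately and become visible only when the batch commits. This is a minimal sketch, not the Syncbase implementation: `ToyStore` and `ToyBatch` are hypothetical names introduced here for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory store; NOT the Syncbase API, only a model of atomic commit. */
final class ToyStore {
    private final Map<String, String> committed = new HashMap<>();

    synchronized String get(String key) {
        return committed.get(key);
    }

    ToyBatch beginBatch() {
        return new ToyBatch();
    }

    /** Stages writes privately; nothing is visible in the store until commit(). */
    final class ToyBatch {
        private final Map<String, String> staged = new HashMap<>();

        void put(String key, String value) {
            staged.put(key, value);
        }

        /** Applies every staged write in one step. */
        void commit() {
            synchronized (ToyStore.this) {
                committed.putAll(staged);
            }
            staged.clear();
        }

        /** Drops every staged write; the store is left untouched. */
        void abort() {
            staged.clear();
        }
    }
}
```

In this model a reader of the store sees either none of a batch's writes or all of them, which is the behavior the Atomic guarantee describes.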

When an app uses Syncbase with synchronization, a batch no longer provides ACID semantics. Syncbase is a loosely coupled, decentralized, distributed storage system, so batches provide weaker guarantees that are appropriate for that environment.

Atomic: All read and write operations that are part of the batch are synchronized as an atomic unit. However, a conflict resolver may merge two batches by taking part of one batch and another part of the other batch.
Consistent: Consistency is impossible to provide when devices are allowed to work offline. A user could perform an operation on one device and then attempt to perform an operation on a second device before the two devices have synced with each other.
Isolated: Conflict resolvers could violate isolation guarantees by improperly merging two batches.
Durable: While batches are durable in the common case, there are two exceptions:
- The batch is committed on a device while partitioned from other devices. The device never syncs with other devices (e.g. dropped in the river).
- A poorly written conflict resolver erroneously discards the conflicting batch rather than merging it.
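
To make the effect of conflict resolution concrete, the sketch below models a batch as nothing more than the key/value writes it made and compares two resolvers. It is an illustration under that assumption only: `ResolverSketch`, `merge`, and `discardRemote` are hypothetical names and do not correspond to the Syncbase conflict resolution API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model: a batch is just the set of key/value writes it made. */
final class ResolverSketch {
    /** Keeps writes from both batches; on a key both wrote, the local value wins. */
    static Map<String, String> merge(Map<String, String> local, Map<String, String> remote) {
        Map<String, String> merged = new HashMap<>(remote);
        merged.putAll(local); // local overwrites remote on conflicting keys
        return merged;
    }

    /** Poorly written resolver: silently drops everything the remote batch wrote. */
    static Map<String, String> discardRemote(Map<String, String> local, Map<String, String> remote) {
        return new HashMap<>(local);
    }
}
```

With `merge`, the result can contain one key from the local batch and another from the remote batch, so neither batch is applied as a whole. With `discardRemote`, a batch that was committed on another device is lost, which is the second durability exception listed above.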

While the edge cases prevent us from claiming ACID semantics, we believe that the behavior above strikes a good balance between implementable semantics and useful behavior for the developer and user.

Batches are not limited to the data within a collection. If a batch contains data from multiple collections, peers will receive only the parts of the batch they are allowed to see.
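
One way to picture this is as a filter over the writes in a batch. The sketch below assumes, purely for illustration, that visibility is decided per collection and that a peer's access can be expressed as a set of readable collection names; `BatchVisibilitySketch`, `Write`, and `visibleTo` are hypothetical and not part of Syncbase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a multi-collection batch being filtered for one peer. */
final class BatchVisibilitySketch {
    /** One write in a batch, tagged with the collection it touched. */
    static final class Write {
        final String collection;
        final String key;
        final String value;

        Write(String collection, String key, String value) {
            this.collection = collection;
            this.key = key;
            this.value = value;
        }
    }

    /** Returns only the writes whose collection the peer is allowed to read. */
    static List<Write> visibleTo(List<Write> batch, Set<String> readableCollections) {
        List<Write> visible = new ArrayList<>();
        for (Write w : batch) {
            if (readableCollections.contains(w.collection)) {
                visible.add(w);
            }
        }
        return visible;
    }
}
```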

Using Batches

BatchDatabase is the entry point to the batch API. BatchDatabase is similar to Database, except that it provides commit and abort methods, and all operations on collection references obtained from a BatchDatabase are part of the batch.

RunInBatch

RunInBatch is the recommended way of doing batch operations. It detects concurrent batch errors and handles retries and commit/aborts automatically.

cat - <<EOF | sed 's///' >> $FILE
db.runInBatch(new Database.BatchOperation() {
  @Override
  public void run(BatchDatabase batchDb) throws SyncbaseException {
    Collection c1 = batchDb.createCollection();
    Collection c2 = batchDb.createCollection();

    c1.put("myKey", "myValue");
    c2.put("myKey", "myValue");

    // No need to commit. RunInBatch will commit and retry if necessary.
  }
}, new Database.BatchOptions());
EOF
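
The retry behavior described for RunInBatch can be pictured as a loop that starts a fresh batch, runs the operation, and tries to commit, repeating when the commit reports a concurrent batch conflict. The following is a rough sketch of that idea, not the actual Syncbase implementation. Every type in it (`Batch`, `BatchStarter`, `BatchOperation`, `ConcurrentBatchException`) and the `maxAttempts` parameter are hypothetical stand-ins, and the exception is unchecked only to keep the example short.

```java
/** Sketch of a commit-and-retry loop; every name here is hypothetical, not Syncbase API. */
final class RetrySketch {
    /** Stand-in for the error reported when a concurrent batch conflicts at commit time. */
    static final class ConcurrentBatchException extends RuntimeException {}

    /** Stand-in for a batch handle. */
    interface Batch {
        void commit();
        void abort();
    }

    /** Stand-in for a database that can start batches. */
    interface BatchStarter {
        Batch begin();
    }

    /** Stand-in for the caller's unit of work. */
    interface BatchOperation {
        void run(Batch batch);
    }

    /** Runs op in a batch, retrying on commit conflicts; returns the attempts used. */
    static int runInBatch(BatchStarter db, BatchOperation op, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            Batch batch = db.begin(); // fresh batch for every attempt
            try {
                op.run(batch);
            } catch (RuntimeException e) {
                batch.abort(); // the operation itself failed: abort and give up
                throw e;
            }
            try {
                batch.commit();
                return attempt;
            } catch (ConcurrentBatchException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                // Otherwise loop and rerun the operation against a new batch.
            }
        }
    }
}
```

Under this model the operation may run more than once, so it is safest for the operation to touch only the batch it is given and to avoid side effects that should not be repeated.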

Warning

Collection references previously obtained from Database are not part of the batch, so operations on them inside RunInBatch are not atomic with the rest of the batch. New collection references must be obtained from BatchDatabase.

The following code snippet demonstrates the WRONG way of using batches.

cat - <<EOF | sed 's///' >> $FILE
// WRONG: c1 is NOT part of the batch.
final Collection c1 = db.createCollection();
db.runInBatch(new Database.BatchOperation() {
    @Override
    public void run(BatchDatabase batchDb) throws SyncbaseException {
        Collection c2 = batchDb.createCollection();
        // WRONG: Only mutations on c2 are atomic since c1 reference
        // was obtained from Database and not BatchDatabase.
        c1.put("myKey", "myValue");
        c2.put("myKey", "myValue");
        // No need to commit. RunInBatch will commit and retry if necessary.
    }
}, new Database.BatchOptions());
EOF

BeginBatch

BeginBatch is an alternative way to start a batch. Unlike RunInBatch, it does not manage retries, commits, or aborts; the developer is responsible for handling them.

cat - <<EOF | sed 's///' >> $FILE
BatchDatabase batchDb = db.beginBatch(new Database.BatchOptions());

Collection c1 = batchDb.createCollection();
Collection c2 = batchDb.createCollection();

c1.put("myKey", "myValue");
c2.put("myKey", "myValue");

batchDb.commit();
EOF
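
Because BeginBatch leaves commit and abort to the caller, it is common to wrap the work so that the batch is committed on success and aborted if the work fails. The helper below is a minimal sketch of that pattern. `ManualBatchSketch`, its `Batch` interface, and `commitOrAbort` are hypothetical names that model only the commit and abort methods mentioned above, and the sketch does not attempt retries.

```java
/** Sketch of manual commit/abort handling; names are hypothetical, not Syncbase API. */
final class ManualBatchSketch {
    /** Minimal stand-in for a batch handle with commit and abort. */
    interface Batch {
        void commit();
        void abort();
    }

    /** Runs the work, commits on success, and aborts (then rethrows) on failure. */
    static void commitOrAbort(Batch batch, Runnable work) {
        try {
            work.run();
        } catch (RuntimeException e) {
            batch.abort(); // discard the partial batch before propagating the error
            throw e;
        }
        batch.commit();
    }
}
```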

Warning

Using a collection reference obtained from a BatchDatabase after the batch has been committed or aborted throws an exception.

The following code snippet demonstrates the WRONG way of using batches.

cat - <<EOF | sed 's///' >> $FILE
// WRONG: c1 is NOT part of the batch.
Collection c1 = db.createCollection();
BatchDatabase batchDb = db.beginBatch(new Database.BatchOptions());

// c2 is part of the batch.
Collection c2 = batchDb.createCollection();

// WRONG: Only mutations on c2 are atomic since c1 reference was obtained
// from Database and not BatchDatabase.
c1.put("myKey", "myValue");
c2.put("myKey", "myValue");

batchDb.commit();

// WRONG: Throws exception since c2 is from an already committed batch.
c2.put("myKey", "myValue");
EOF

Summary

Use batches to group operations that are logically related.
Use the recommended runInBatch method to perform batch operations to get the added benefit of automatic retries and commit/abort.
Ensure all collection references are obtained from BatchDatabase; otherwise, mutations may not be part of the batch.