How to get started with CockroachDB

Introduction

CockroachDB is a really interesting database system, getting the best of both RDBMS and NoSQL. It’s been developed by former Google developers, and it’s inspired by Google Spanner. However, unlike Google Spanner, which is offered as a service in Google Cloud, CockroachDB is an open-source database that can be installed on premise.

Also, CockroackDB allows you to use the PostgreSQL drivers as opposed to Spanner which only supports the gRPC protocol. So, you can practically reuse all the frameworks that have emerged in the Java ecosystem like connection pools, monitoring proxies (e.g. FlexyPool) or data access frameworks (e.g. Hibernate).

Unlike a typical RDBMS, CockroachDB is designed to be globally distributed and strongly resilient to disasters, hence its very unusual name. But what’s really exciting about CockroachDB is its non-locking timestamp ordering concurrency control mechanism which allows CockroachDB to run in Serializable isolation level, even if the database is massively distributed. Intriguing, right?

Installing CockroachDB

Using Docker is probably the easiest way to start working with CockroachDB.

So, you need to pull the latest image from Docker Hub:

> docker pull cockroachdb/cockroach:beta-20170406

Then, you need to create a new network bridge since you might want to run multiple nodes on your cluster:

> docker network create -d bridge roachnet

For our examples, we only need to run aa single node, so to create a new Docker container you need to run the following command:

docker run -d ^
 --name=roach1 ^
 --hostname=roach1 ^
 --net=roachnet ^
 -p 26257:26257 -p 8080:8080  ^
 cockroachdb/cockroach start --insecure

In the command above, I ussed the Windows-style ^ multi-line separator. For Linux, use \ instead.

If you run docker ps -a, you can now see the CockroachDB Docker container:

> docker ps -as

CONTAINER ID  IMAGE                         COMMAND                 CREATED         STATUS         PORTS                                             NAMES
753d8ebddd02  cockroachdb/cockroach         "/cockroach/cockro..."  28 seconds ago  Up 27 seconds  0.0.0.0:8080->8080/tcp, 0.0.0.0:26257->26257/tcp  roach1

No, we need to set up our database. For this, you need to start a bash process inside the Docker container we just created and started:

> docker exec -it roach1 /bin/bash

Once you get access to the CockroachDB Linux OS host, you can add a new database user as follows:

root@roach1:/cockroach# ./cockroach user set cockroach --password --insecure

Enter password:
Confirm password:
INSERT 1

You are prompted for a password after you type the ./cockroach user set cockroach --password command.

Now, you can see your newly created user as follows:

root@roach1:/cockroach# ./cockroach user ls --insecure

+-----------+
| username  |
+-----------+
| cockroach |
+-----------+
(1 row)

Now, you need to create a database like this:

root@roach1:/cockroach# ./cockroach sql -e 'CREATE DATABASE high_performance_java_persistence' --insecure

and grant all privileges to the cockroach user we previously created.

root@roach1:/cockroach# ./cockroach sql -e 'GRANT ALL ON DATABASE high_performance_java_persistence TO cockroach' --insecure

And we are done!

Using JDBC

Now, to make things really interesting, let me configure the High-Performance Java Persistence book GitHub repository to support CockroachDB.

Since it speaks the PostgreSQL protocol, we can use a PostgreSQL JDBC Driver with CockroachDB.

All I need to do is to create a new PGSimpleDataSource like this:

public DataSource dataSource() {
    PGSimpleDataSource dataSource = 
        new PGSimpleDataSource();
        
    dataSource.setDatabaseName( 
        "high_performance_java_persistence" 
    );
    dataSource.setServerName( host() );
    dataSource.setPortNumber( port() );
    dataSource.setUser( username() );
    dataSource.setPassword( password() );
    dataSource.setSsl( false );
    
    return dataSource;
}

I can even use HikariCP as a Connection Pooling solution for CockroachDB:

protected HikariDataSource connectionPoolDataSource(
    DataSource dataSource) {
    
    HikariConfig hikariConfig = new HikariConfig();
    int cpuCores = Runtime.getRuntime().availableProcessors();
    hikariConfig.setMaximumPoolSize(cpuCores * 4);
    hikariConfig.setDataSource(dataSource);
    return new HikariDataSource(hikariConfig);
}

Due to a known issue that will be fixed in the next Beta release, I had to set up the custom isolation level to make it work with HikariCP:

properties.put(
    "hibernate.hikari.transactionIsolation", 
    "TRANSACTION_SERIALIZABLE"
);

Using Hibernate

Not only that we can use JDBC with CockroachDB, but we can even use Hibernate with it. Although the official Hibernate tutorial uses the PostgreSQL94Dialect, for the moment, the PostgreSQL82Dialect is much closer to CockroachDB since it does not offer JSON types, SKIP_LOCKED, CTE (Common Table Expressions) or MATERIALIZED VIEWS to justify any newer PostgreSQL Dialect version.

In fact, even the PostgreSQL82Dialect is not a drop-in replacement since it comes with additional SQL functions (e.g. chr, str), and pessimistic locking support as well. However, for simplicity sake, we can justs extend the PostgreSQL82Dialect and simply replace the most common data types:

public class CockroachDBDialect extends PostgreSQL82Dialect {

    public CockroachDBDialect() {
        super();
        registerColumnType( Types.SMALLINT, "smallint" );
        registerColumnType( Types.TINYINT, "smallint" );
        registerColumnType( Types.INTEGER, "integer" );

        registerColumnType( Types.FLOAT, "double precision" );
        registerColumnType( Types.DOUBLE, "double precision" );

        registerColumnType( Types.BLOB, "blob" );
        registerColumnType( Types.OTHER, "interval" );
    }
}

Even if the CockroachDBDialect will get you started, in reality, a new Dialect needs to be created from scratch since and define the exact set of features that CockroachDB offers.

Crafting an entity

Although there is a Hibernate tutorial on the CockroachDB website, the entity mapping they offered is using an assigned identifier, which is more of an exception than a rule.

So, our entity looks like this:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    @GeneratedValue(
        strategy = GenerationType.IDENTITY
    )
    private Long id;

    @Column
    @Temporal(TemporalType.DATE)
    private Date createdOn;

    private String title;

    //Getters and setters omitted for brevity
}

Now, while using a database SEQUENCE is the best identifier strategy when using Hibernate, there is no such construct in CockroachDB. However, CockroachDB offers a SERIAL type, just like PostgreSQL, which uses a globally distributed generation strategy.

Unfortunately, due to a limitation of how Hibernate manages identifiers, the use of the IDENTITY generator can disable batch inserts. Read this article for more details on this topic. However, at some point we’d really need to provide a solution to this limitation.

The createdOn is a DATE type attribute which we will use to order entities in our queries.

For this entity, Hibernate generates the following database table:

CREATE TABLE post
(
  id bigserial NOT NULL,
  createdOn DATE,
  title VARCHAR(255),
  PRIMARY KEY ( id )
)

Adding some entities

Let’s now add 10 entities:

LocalDate startDate = LocalDate.of( 2016, 11, 2 );

for ( int offset = 0; offset < 10; offset++ ) {
    Post post = new Post();
    post.setTitle(
        String.format(
            "High-Performance Java Persistence, Review %d",
            offset
        )
    );
    post.setCreatedOn(
        Date.from( startDate
            .plusDays( offset )
            .atStartOfDay( ZoneId.of( "UTC" ) )
            .toInstant()
        )
    );
    entityManager.persist( post );
}

Which generate typical SQL INSERT statements:

INSERT INTO post ('2016-11-02', 'High-Performance Java Persistence, Review 0')
INSERT INTO post ('2016-11-03', 'High-Performance Java Persistence, Review 1')
INSERT INTO post ('2016-11-04', 'High-Performance Java Persistence, Review 2')
INSERT INTO post ('2016-11-05', 'High-Performance Java Persistence, Review 3')
INSERT INTO post ('2016-11-06', 'High-Performance Java Persistence, Review 4')
INSERT INTO post ('2016-11-07', 'High-Performance Java Persistence, Review 5')
INSERT INTO post ('2016-11-08', 'High-Performance Java Persistence, Review 6')
INSERT INTO post ('2016-11-09', 'High-Performance Java Persistence, Review 7')
INSERT INTO post ('2016-11-10', 'High-Performance Java Persistence, Review 8')
INSERT INTO post ('2016-11-11', 'High-Performance Java Persistence, Review 9')

Query entities

Assuming we run the following JPQL query:

List<Post> posts = entityManager.createQuery(
    "select p " +
    "from Post p " +
    "order by p.createdOn", Post.class )
.setMaxResults( 5 )
.getResultList();

assertEquals( 5, posts.size() );

Hibernate generates the proper PostgreSQL SELECT query:

SELECT p.id AS id1_0_,
       p.createdOn AS createdO2_0_,
       p.title AS title3_0_
FROM   post p
ORDER BY p.createdOn
LIMIT 5 

And CockroachDB provides the proper result set.

If you enjoyed this article, I bet you are going to love my book as well.

Conclusion

Compared to a NoSQL database like MongoDB or Cassandra, or even a NewSQL database like VoltDB, CockroachDB allows you to reuse all the frameworks that have been developed in the Java ecosystem. This is very convenient both for developers and for the business since the adoption process is going to be smoother than when the data access paradigm shifts considerably.

If you liked this article, you might want to subscribe to my newsletter too.

Advertisements

4 thoughts on “How to get started with CockroachDB

  1. By the way I managed to connect to Elasticsearch from Hibernate using sql4es driver, so now I am able to query data from ES by Criteria/JPQL/QueryDSL. Many other databases even non-relational like Cassandra or ClickHouse support SQL/JDBC too on some degree. It means that “good old”JPA-JDBC-SQL stack can be used to create polyglot persistence architecture quite simply. It gives kind of additional level of abstraction over data storage. No need to learn new APIs/query languages and pretty simple way to switch from one database to another one.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s