A beginner’s guide to YugabyteDB

Introduction

In this article, we are going to see what YugabyteDB is, how to install and manage it using PostgreSQL tools, and how you can connect to it using JDBC, JPA, or Hibernate.

I got curious about Yugabyte since Franck Pachot joined them as a Developer Advocate. Having followed Franck for a long time, I decided to investigate this new PostgreSQL-compatible database they are developing since I’ve been learning a lot of stuff about SQL and database systems from Franck.

What is YugabyteDB

YugabyteDB is an open-source distributed SQL database that combines the benefits of using a relational database (e.g., ACID transactions) with the advantages of globally-distributed auto-sharded stores (e.g., NoSQL document databases).

First of all, it’s an open-source database, and you can find it on GitHub. Only the cloud management part is proprietary, but the engine itself is community-driven.

Second, YugabyteDB builds on top of PostgreSQL, so tools that work with PostgreSQL generally work with Yugabyte as well. Not only will you be able to use PgAdmin to connect to Yugabyte, but you can also use any software framework or library that works with the PostgreSQL drivers. As you will see in this article, it’s extremely easy to make an existing PostgreSQL application work on Yugabyte.

Third, YugabyteDB is versatile when it comes to data and traffic volumes. Because it provides auto-scaling, auto-sharding, and auto-balancing, you won’t have to rearchitect your system the moment it becomes too successful for the initial architecture to cope with.

How to install YugabyteDB

Depending on your application needs, there are multiple ways to install Yugabyte.

However, in this article, I’m going to show you how to run Yugabyte in a Docker container.

The first step is to pull the Docker image:

> docker pull yugabytedb/yugabyte:2.15.1.0-b175

2.15.1.0-b175: Pulling from yugabytedb/yugabyte
2d473b07cdd5: Pull complete
5954b7a9c5ea: Pull complete
5b00001786bb: Pull complete
c43e6bd8eb6c: Pull complete
99ad07cc1c7c: Pull complete
b9331fac7e42: Pull complete
a7e3630fe335: Pull complete
05b42b4417c9: Pull complete
d97501a5f6ad: Pull complete
06158813861c: Pull complete
736eaefc97b2: Pull complete
c45ea0648626: Pull complete
2843bee931d8: Pull complete
808b5e86368d: Pull complete

Digest: sha256:b340163bdd55bf6b3653224460eb93f71782b331804d2f9655194e2b135ba72f
Status: Downloaded newer image for yugabytedb/yugabyte:2.15.1.0-b175
docker.io/yugabytedb/yugabyte:2.15.1.0-b175

Afterward, we can create a new container using the following docker run command:

> docker run -d --name yugabyte -p7000:7000 -p9000:9000 -p5433:5433 -p9042:9042 yugabytedb/yugabyte:2.15.1.0-b175 bin/yugabyted start --daemon=false --ui=false

If you are running macOS Monterey, you have to replace -p7000:7000 with -p7001:7000.

This is necessary because, by default, AirPlay listens on port 7000. This conflicts with YugabyteDB and causes yugabyted start to fail unless you forward the port as shown. Alternatively, you can disable AirPlay receiving, then start YugabyteDB normally, and then, optionally, re-enable AirPlay receiving.
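
Before choosing the port mapping, you could check whether host port 7000 is already taken. The following is a minimal sketch in plain Java and not part of the original setup: the PortCheck class and its method names are hypothetical, and the 7001 fallback simply mirrors the mapping suggested above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class PortCheck {

    // Returns true when no other process is listening on the given host port.
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            // Disable address reuse so a port held by another listener is reported as busy.
            socket.setReuseAddress(false);
            socket.bind(new InetSocketAddress(port));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Builds the docker -p argument, falling back to another host port
    // when the preferred one is busy.
    static String portMapping(int preferredHostPort, int fallbackHostPort, int containerPort) {
        int hostPort = isPortFree(preferredHostPort) ? preferredHostPort : fallbackHostPort;
        return "-p" + hostPort + ":" + containerPort;
    }

    public static void main(String[] args) {
        // Prints -p7000:7000, or -p7001:7000 when something (e.g., AirPlay) already uses port 7000.
        System.out.println(portMapping(7000, 7001, 7000));
    }
}
```

The check is only a snapshot: another process could still grab the port between the check and the docker run command, so treat the result as a hint rather than a guarantee.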

Notice that the newly created container is called yugabyte, and we can see it listed with the docker ps -a command:

> docker ps -a

CONTAINER ID   IMAGE                                                COMMAND                  CREATED          STATUS                       PORTS                                                                                                                                                                     NAMES
88feaa0a2942   yugabytedb/yugabyte:2.15.1.0-b175                    "/sbin/tini -- bin/y…"   27 seconds ago   Up 24 seconds                0.0.0.0:5433->5433/tcp, 6379/tcp, 7100/tcp, 0.0.0.0:7000->7000/tcp, 0.0.0.0:9000->9000/tcp, 7200/tcp, 9100/tcp, 10100/tcp, 11000/tcp, 0.0.0.0:9042->9042/tcp, 12000/tcp   yugabyte

Having the container in place, the next time we boot our system, we can start the Yugabyte database using the start Docker command:

> docker start yugabyte

That’s it!

How to connect to Yugabyte

Once the Yugabyte database server is started, you can connect to it using any PostgreSQL-compatible tool. For instance, I can use the PgAdmin UI tool to connect to both my local PostgreSQL server and the YugabyteDB server running on Docker:

Connecting to YugabyteDB using PgAdmin

From your favorite programming language, you can connect to Yugabyte just like you’d do for PostgreSQL. For instance, if you’re using Java, you can use the PGSimpleDataSource from the PostgreSQL JDBC Driver, as illustrated by the following example:

PGSimpleDataSource dataSource = new PGSimpleDataSource();
dataSource.setURL(
    "jdbc:postgresql://127.0.0.1:5433/high_performance_java_persistence"
);
dataSource.setUser("yugabyte");
dataSource.setPassword("admin");

Awesome, right?
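
To confirm that the DataSource actually points at YugabyteDB rather than a local PostgreSQL server, one option is to run SELECT version() through plain JDBC. This is a sketch and not code from the repository: the YugabyteVersionCheck class is hypothetical, and the "-YB-" marker is an assumption about how YugabyteDB formats its version string (for example, "PostgreSQL 11.2-YB-2.15.1.0-b0 ..."), so verify it against your own server.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

final class YugabyteVersionCheck {

    // Runs SELECT version() and returns the single-row, single-column result.
    static String serverVersion(DataSource dataSource) throws SQLException {
        try (Connection connection = dataSource.getConnection();
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery("SELECT version()")) {
            if (!resultSet.next()) {
                throw new SQLException("SELECT version() returned no rows");
            }
            return resultSet.getString(1);
        }
    }

    // Assumption: YugabyteDB embeds "-YB-" in the PostgreSQL-style version string.
    static boolean isYugabyte(String version) {
        return version != null && version.contains("-YB-");
    }
}
```

With the dataSource configured above, YugabyteVersionCheck.isYugabyte(YugabyteVersionCheck.serverVersion(dataSource)) should return true on port 5433 and false against a stock PostgreSQL instance, provided the version-string assumption holds.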

Running the High-Performance repository on Yugabyte

For me, the best way to test a database system that has a JDBC Driver and a Hibernate Dialect is to use the High-Performance Java Persistence GitHub repository since it provides a massive collection of integration tests that can verify tons of JPA, Hibernate, JDBC, and database features.

> find . -name '*Test.java' | wc -l
709

With 709 integration test classes available, I have a lot of ways I could test a given relational database, so I’m going to integrate Yugabyte into my High-Performance Java Persistence GitHub repository and test how it works using the existing PostgreSQL-compatible integration tests.

As illustrated by this commit, adding support for Yugabyte was just a matter of creating a new YugabyteDBDataSourceProvider.

I didn’t even have to add the Yugabyte-specific JDBC Driver because I’m using a single Docker database server instance. YugabyteDB does provide its own JDBC Driver, which is needed if you want to benefit from auto-balancing or enable other cool features they offer.
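
For comparison, here is a rough sketch of how the connection URL might change when switching to YugabyteDB’s own driver. The jdbc:yugabytedb:// prefix and the load-balance=true property are assumptions based on the YugabyteDB JDBC driver documentation rather than anything shown in this article, so check them against the driver version you use. The YugabyteUrls helper is hypothetical.

```java
final class YugabyteUrls {

    static final String POSTGRESQL_PREFIX = "jdbc:postgresql://";

    // Assumption: URL prefix expected by the YugabyteDB JDBC driver.
    static final String YUGABYTEDB_PREFIX = "jdbc:yugabytedb://";

    // Rewrites a PostgreSQL JDBC URL into its assumed YugabyteDB driver equivalent.
    static String toSmartDriverUrl(String postgresqlUrl) {
        if (!postgresqlUrl.startsWith(POSTGRESQL_PREFIX)) {
            throw new IllegalArgumentException("Not a PostgreSQL JDBC URL: " + postgresqlUrl);
        }
        String url = YUGABYTEDB_PREFIX + postgresqlUrl.substring(POSTGRESQL_PREFIX.length());
        // Use '&' when the URL already carries parameters, '?' otherwise.
        String separator = url.contains("?") ? "&" : "?";
        // Assumption: this property turns on connection load balancing.
        return url + separator + "load-balance=true";
    }

    public static void main(String[] args) {
        System.out.println(toSmartDriverUrl(
            "jdbc:postgresql://127.0.0.1:5433/high_performance_java_persistence"
        ));
    }
}
```

On a single Docker node, there is nothing to balance across, which is why the plain PostgreSQL URL is enough for the tests in this article.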

Testing time

Assuming we have the following JPA entity:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Column(name = "created_on")
    private LocalDateTime createdOn;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public Post setTitle(String title) {
        this.title = title;
        return this;
    }

    public LocalDateTime getCreatedOn() {
        return createdOn;
    }

    public Post setCreatedOn(LocalDateTime createdOn) {
        this.createdOn = createdOn;
        return this;
    }
}

When persisting three Post entities:

entityManager.persist(
    new Post()
        .setTitle("High-Performance Java Persistence, Part 1")
        .setCreatedOn(today.minusDays(2).atStartOfDay())
);

entityManager.persist(
    new Post()
        .setTitle("High-Performance Java Persistence, Part 2")
        .setCreatedOn(today.minusDays(1).atStartOfDay())
);

entityManager.persist(
    new Post()
        .setTitle("High-Performance Java Persistence, Part 3")
        .setCreatedOn(today.atStartOfDay())
);

Hibernate executes the following INSERT statements on YugabyteDB:

INSERT INTO post (
    created_on, 
    title, 
    id
) 
VALUES (
    '2022-09-05 00:00:00.0', 
    'High-Performance Java Persistence, Part 1', 
    1
)

INSERT INTO post (
    created_on, 
    title, 
    id
) 
VALUES (
    '2022-09-06 00:00:00.0', 
    'High-Performance Java Persistence, Part 2', 
    2
)

INSERT INTO post (
    created_on, 
    title, 
    id
) 
VALUES (
    '2022-09-07 00:00:00.0', 
    'High-Performance Java Persistence, Part 3', 
    3
)

And, querying works just like on any relational database system:

List<Post> posts = entityManager.createNativeQuery("""
    SELECT *
    FROM post
    WHERE
        created_on >= :startTimestamp AND
        created_on < :endTimestamp
    """, Post.class)
.setParameter("startTimestamp", today.minusDays(2))
.setParameter("endTimestamp", today)
.getResultList();

assertEquals(2, posts.size());
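
The expected count of 2 follows from the half-open range: the lower bound is inclusive and the upper bound is exclusive, so the post created today at midnight is left out. The same logic can be sketched in plain Java without a database. The HalfOpenRange class below is illustrative only and, as an assumption, treats both query parameters as start-of-day timestamps.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.List;
import java.util.stream.Collectors;

final class HalfOpenRange {

    // Keeps timestamps t where start <= t < end, mirroring the WHERE clause.
    static List<LocalDateTime> between(
            List<LocalDateTime> timestamps,
            LocalDateTime startInclusive,
            LocalDateTime endExclusive) {
        return timestamps.stream()
            .filter(t -> !t.isBefore(startInclusive) && t.isBefore(endExclusive))
            .collect(Collectors.toList());
    }

    // Rebuilds the three created_on values from the test and applies the range.
    static int matchingPosts(LocalDate today) {
        List<LocalDateTime> createdOn = List.of(
            today.minusDays(2).atStartOfDay(),
            today.minusDays(1).atStartOfDay(),
            today.atStartOfDay()
        );
        return between(
            createdOn,
            today.minusDays(2).atStartOfDay(),
            today.atStartOfDay()
        ).size();
    }

    public static void main(String[] args) {
        System.out.println(matchingPosts(LocalDate.of(2022, 9, 7))); // prints 2
    }
}
```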

Conclusion

This was the first time I ever used YugabyteDB, and I’m really impressed because it allows me to reuse lots of tools I’m already familiar with.

The fact that I didn’t have to do anything special to make it work with JPA and Hibernate is great because I can easily migrate an existing Spring Boot project from PostgreSQL to YugabyteDB and benefit from its auto-scaling capabilities.

4 Comments on “A beginner’s guide to YugabyteDB”

  1. Many thanks for the clarification.

    Regarding
    Why would existing tests fail on PostgreSQL? If they were failing, I would have fixed them.

    Some time ago, Hibernate had known bugs that were not fixed for quite a long time. I had something similar in mind regarding PostgreSQL, such as not being compliant with the SQL standard. In such a situation, it is PostgreSQL that should be fixed, not your test. And I wondered whether there are tests that Yugabyte passes even though PostgreSQL fails them.

  2. Thanks for the interesting post.

    How many PostgreSQL compatible tests do you have in your repo?
    How many of them did fail on Yugabyte, which pass on PostgreSQL?
    How many of them did pass on Yugabyte, which fail on PostgreSQL?

    What is nearest competitor, similar database, to Yugabyte? Is it Cassandra?

    Do you have any link to comparison with CassandraDB/Aws Keyspace?

    Regards and thanks in advance.

    • How many PostgreSQL compatible tests do you have in your repo?

      The repository is available online:

      https://github.com/vladmihalcea/high-performance-java-persistence

      The number of PostgreSQL tests is given by the total number of tests after you subtract the Oracle, SQL Server, and MySQL tests. I haven’t done the maths, but it’s surely over 500.

      How many of them did fail on Yugabyte, which pass on PostgreSQL?

      I didn’t have to run that because Yugabyte builds on top of PostgreSQL. It’s an extended PostgreSQL engine just like Percona is for MySQL.

      How many of them did pass on Yugabyte, which fail on PostgreSQL?

      Why would existing tests fail on PostgreSQL? If they were failing, I would have fixed them.

      What is nearest competitor, similar database, to Yugabyte? Is it Cassandra?

      Most likely CockroachDB, Google Spanner, TiDB. Cassandra is not an SQL database and has serious flaws when it comes to data integrity since its multi-master conflict resolution is based on the last-writer wins approach.

      Do you have any link to comparison with CassandraDB/Aws Keyspace?

      That would be an apples vs. oranges comparison. An SQL database that’s ACID compliant and globally distributed is an extremely convenient choice. This is not just for small and medium companies but for FAANG as well. Google Spanner is ACID compliant and globally distributed. Facebook runs on MySQL. If you need ACID and the convenience of SQL, you choose an RDBMS or a NewSQL database.

      Cassandra is not a replacement for SQL or NewSQL. As a column-oriented database, Cassandra is more suited for analytics and/or batch processing rather than for OLTP applications. For instance, Cassandra would not work for Google the way Spanner does because Spanner can be used for real-time ad bidding across the entire world, while Cassandra queries will never be able to deliver that. But a database like YugabyteDB is built for those kinds of use cases.
