A beginner’s guide to database multitenancy


Introduction

In software terminology, multitenancy is an architectural pattern that allows you to isolate customers even when they share the same hardware or software components. Multitenancy has become even more attractive with the widespread adoption of cloud computing.

A relational database system provides a hierarchical structure of objects which typically looks like this: catalog -> schema -> table. In this article, we are going to see how we can use each of these database object levels to accommodate a multitenancy architecture.
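To make the hierarchy concrete, here is a minimal sketch (the catalog, schema, and table names are made up for illustration) of how a fully qualified SQL object name is assembled from these three levels:

```java
public class QualifiedName {

    // A fully qualified SQL object name walks the hierarchy top-down:
    // catalog -> schema -> table (e.g. crm.public.customer).
    static String qualify(String catalog, String schema, String table) {
        return catalog + "." + schema + "." + table;
    }

    public static void main(String[] args) {
        // Hypothetical names, purely for illustration.
        System.out.println(qualify("crm", "public", "customer"));
        // -> crm.public.customer
    }
}
```

Each multitenancy strategy below pins the tenant to a different level of this hierarchy.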

Catalog-based multitenancy

In a catalog-based multitenancy architecture, each customer uses its own database catalog. Therefore, the tenant identifier is the database catalog itself.

Since each customer will only be granted access to its own catalog, it’s very easy to achieve customer isolation. Moreover, the data access layer is not even aware of the multitenancy architecture, meaning that the data access code can focus on business requirements only.

This strategy is useful for relational database systems like MySQL, where there is no distinction between a catalog and a schema.
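As a rough sketch of how this could look in code (the host name and the catalog naming convention below are assumptions, not a prescribed setup), catalog-based routing often boils down to resolving a per-tenant connection URL:

```java
public class TenantRouting {

    // Assumed convention: one MySQL catalog per tenant, named "tenant_<id>".
    // With catalog-per-tenant routing, the data access code stays tenant-agnostic;
    // only the connection resolution step knows about multitenancy.
    static String jdbcUrlFor(String tenantId) {
        return "jdbc:mysql://db.example.com:3306/tenant_" + tenantId;
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrlFor("acme"));
        // -> jdbc:mysql://db.example.com:3306/tenant_acme
    }
}
```

In a real application, this resolved URL would typically feed a per-tenant connection pool or something like Hibernate’s `MultiTenantConnectionProvider`.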

The disadvantage of this strategy is that it requires more work on the Ops side: monitoring, replication, and backups must be set up for every catalog. However, with automation in place, this problem can be mitigated.

For more details about catalog-based multitenancy, check out this article.

Schema-based multitenancy

In a schema-based multitenancy architecture, each customer uses its own database schema. Therefore, the tenant identifier is the database schema itself.

Since each customer will only be granted access to its own schema, it’s very easy to achieve customer isolation. Also, the data access layer is not even aware of the multitenancy architecture, meaning that, just like for catalog-based multitenancy, the data access code can focus on business requirements only.

This strategy is useful for relational database systems like PostgreSQL, which support multiple schemas per database (catalog). Replication, backups, and monitoring can be set up at the catalog level, so all schemas benefit from them.
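In PostgreSQL, scoping a session to a tenant schema usually means adjusting the `search_path`. Here is a minimal sketch, assuming a made-up schema naming convention of `tenant_<id>`:

```java
public class SchemaRouting {

    // Assumed convention: one PostgreSQL schema per tenant, named "tenant_<id>".
    // Running this statement on a connection before handing it to the data
    // access layer keeps the rest of the code schema-agnostic.
    // Note: the tenant id is interpolated into SQL, so it must come from a
    // trusted source (or be validated), never from raw user input.
    static String searchPathFor(String tenantId) {
        return "SET search_path TO tenant_" + tenantId;
    }

    public static void main(String[] args) {
        System.out.println(searchPathFor("acme"));
        // -> SET search_path TO tenant_acme
    }
}
```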

However, if schemas are colocated on the same hardware, one tenant running a resource-intensive job might cause latency spikes for other tenants. Therefore, although the data is isolated, sharing resources might make it difficult to honor the Service-Level Agreement.

For more details about schema-based multitenancy, check out this article.

Table-based multitenancy

In a table-based multitenancy architecture, multiple customers reside in the same database catalog and/or schema. To provide isolation, a tenant identifier column must be added to all tables that are shared between multiple clients.

While this strategy requires no additional work on the Ops side, the data access layer needs extra logic to ensure that each customer can see only its own data and to prevent data leaking from one tenant to another. Also, since multiple customers are stored together, tables and indexes might grow larger, putting pressure on SQL statement performance.
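As a rough sketch of that extra logic (the table and column names are made up; a real application would rather rely on a mechanism such as Hibernate filters or database row-level security than on hand-built strings), every statement must carry the tenant predicate:

```java
public class TenantFilter {

    // Appends the tenant predicate as a bind-parameter placeholder, so the
    // tenant identifier is always supplied via PreparedStatement binding,
    // never concatenated into the SQL string.
    static String scope(String baseQuery) {
        String glue = baseQuery.toLowerCase().contains(" where ") ? " AND " : " WHERE ";
        return baseQuery + glue + "tenant_id = ?";
    }

    public static void main(String[] args) {
        System.out.println(scope("SELECT * FROM orders"));
        // -> SELECT * FROM orders WHERE tenant_id = ?
        System.out.println(scope("SELECT * FROM orders WHERE status = ?"));
        // -> SELECT * FROM orders WHERE status = ? AND tenant_id = ?
    }
}
```

The risk with this approach is that one forgotten predicate leaks data across tenants, which is why centralizing the filtering logic (or pushing it into the database) matters so much here.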


Conclusion

As you can see, there are multiple strategies for implementing a multitenancy architecture on the database side. However, each one has its own advantages and disadvantages, so make sure you choose the right strategy according to your project’s DevOps requirements.


5 Comments on “A beginner’s guide to database multitenancy”

  1. Hi Vlad, good post! I have some doubts, but only one question for you about the catalog approach:

    The catalog approach: I’m deciding to use the catalog approach to manage thousands of customers. Each catalog will have around 20-30 tables, but I can see that the problem will be the filesystem and not the RDBMS (Postgres in my case). Have you had problems with thousands of catalogs?
    This looks like a limitation of the OS.

    The schema approach: the main problems I can see with this approach are related to exports. Whenever you use pg_dump (Postgres), it locks the rows, and with thousands of schemas that will take a long time. Another issue is tooling: I can imagine a tool trying to list all those thousands of schemas would never show them unless it used some windowing strategy.

    The table approach: this seems to be the perfect approach, but the problem is backup and restore, since we need to discriminate each row. In my requirements, both backup and restore are mandatory.

    My conclusion is that if there’s no limit in the filesystem, I can go with the catalog approach; I find it more flexible, isolated, and easy to manage, even with thousands of catalogs.

    Could you Vlad give me some insights?
    Thanks.

    • Every approach has advantages and disadvantages. If you don’t have a very large number of customers, the catalog and schema approaches are fine. Otherwise, the table approach is more suitable. That’s pretty much it.

  2. We use table-based multitenancy, and with a team of three doing the whole ERP, I don’t see how we could manage schema-based multitenancy, since we have about 10,000 tables. Working with data in stored procedures, for example, is in my experience much easier and faster than with any outside (Java, C++, …) program.

    • The problem I can see with the table approach is the backup: in my requirements, the database has to export its data, including associations, so a per-schema or per-catalog backup is the approach to pick, because it lets me do that without having to spend time discriminating which records belong to which customer.
      The downside I can see with an isolated catalog is the limit: in our case there are about 150,000 customers, and each customer catalog has around 20-30 tables. I found that the limits come from the filesystems, which have their own constraints.
      To summarize, if you are going to implement the catalog approach, make sure you don’t have more than 30,000 customers (databases).
