A beginner’s guide to database multitenancy

(Last Updated On: August 16, 2018)

Introduction

In software terminology, multitenancy is an architectural pattern which allows you to isolate customers even if they are using the same hardware or software components. Multitenancy has become even more attractive with the widespread adoption of cloud computing.

A relational database system provides a hierarchical structure of objects which typically looks like this: catalog -> schema -> table. In this article, we are going to see how each level of this hierarchy can be used to accommodate a multitenancy architecture.

Catalog-based multitenancy

In a catalog-based multitenancy architecture, each customer uses its own database catalog. Therefore, the tenant identifier is the database catalog itself.

Since each customer will only be granted access to its own catalog, it’s very easy to achieve customer isolation. Moreover, the data access layer is not even aware of the multitenancy architecture, meaning that the data access code can focus on business requirements only.

This strategy is useful for relational database systems like MySQL where there is no distinction between a catalog and a schema.

The disadvantage of this strategy is that it requires more work on the Ops side: monitoring, replication, backups. However, with automation in place, this problem could be mitigated.
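To make the idea concrete, here is a minimal sketch of catalog-based routing. It assumes Python's built-in sqlite3 module, with separate in-memory databases standing in for separate catalogs. The tenant names, the `post` table, and the helper functions are all hypothetical and only serve the illustration.

```python
import sqlite3

# Hypothetical tenant registry: tenant identifier -> that tenant's own database.
# Separate SQLite in-memory databases stand in for separate catalogs.
TENANT_DATABASES = {
    "acme": sqlite3.connect(":memory:"),
    "globex": sqlite3.connect(":memory:"),
}


def connection_for(tenant_id):
    """Resolve the tenant identifier to the tenant's own database."""
    try:
        return TENANT_DATABASES[tenant_id]
    except KeyError:
        raise LookupError(f"unknown tenant: {tenant_id}") from None


def create_tables(conn):
    conn.execute(
        "CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )


def add_post(conn, title):
    # Tenant-agnostic data access: no tenant column and no tenant filter.
    conn.execute("INSERT INTO post (title) VALUES (?)", (title,))


def list_titles(conn):
    rows = conn.execute("SELECT title FROM post ORDER BY id")
    return [row[0] for row in rows]


# Every tenant database gets the same table structure.
for tenant_conn in TENANT_DATABASES.values():
    create_tables(tenant_conn)

add_post(connection_for("acme"), "Acme roadmap")
add_post(connection_for("globex"), "Globex launch")
```

Only `connection_for` knows about tenants. The functions that read and write data are exactly what a single-tenant application would use, which is the property described above.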

Schema-based multitenancy

In a schema-based multitenancy architecture, each customer uses its own database schema. Therefore, the tenant identifier is the database schema itself.

Since each customer will only be granted access to its own schema, it’s very easy to achieve customer isolation. Also, the data access layer is not even aware of the multitenancy architecture, meaning that, just like for catalog-based multitenancy, the data access code can focus on business requirements only.

This strategy is useful for relational database systems like PostgreSQL which support multiple schemas per database (catalog). Replication, backups, and monitoring can be set up at the catalog level, so all schemas benefit from them.

However, if schemas are colocated on the same hardware, one tenant which runs a resource-intensive job might cause latency spikes for other tenants. Therefore, although data is isolated, sharing resources might make it difficult to honor the Service-Level Agreement.
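As a rough sketch of the schema-based approach, the example below uses Python's built-in sqlite3 module, where `ATTACH DATABASE` exposes an additional database under a schema name on a single connection. This is only a stand-in for real schemas: in PostgreSQL, one would more typically set the schema search path per connection so that the SQL statements stay unqualified. The tenant names, the `post` table, and the helper functions are hypothetical.

```python
import sqlite3

# Hypothetical allow-list of tenant schemas. Schema names are identifiers,
# so they cannot be bound as SQL parameters and are validated instead.
TENANT_SCHEMAS = ("acme", "globex")

# One shared connection; autocommit mode keeps ATTACH outside a transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)

for schema in TENANT_SCHEMAS:
    # Each ATTACH of ':memory:' creates a separate database that is
    # addressed through its own schema name.
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(
        f"CREATE TABLE {schema}.post (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )


def checked_schema(tenant_id):
    """Return the schema name for a tenant, rejecting unknown identifiers."""
    if tenant_id not in TENANT_SCHEMAS:
        raise LookupError(f"unknown tenant: {tenant_id}")
    return tenant_id


def add_post(tenant_id, title):
    schema = checked_schema(tenant_id)
    conn.execute(f"INSERT INTO {schema}.post (title) VALUES (?)", (title,))


def list_titles(tenant_id):
    schema = checked_schema(tenant_id)
    rows = conn.execute(f"SELECT title FROM {schema}.post ORDER BY id")
    return [row[0] for row in rows]


add_post("acme", "Acme roadmap")
add_post("globex", "Globex launch")
```

Both tenants share one connection and one set of operational tooling, yet each `post` table lives in its own schema, so a query issued for one tenant never touches the rows of the other.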

Table-based multitenancy

In a table-based multitenancy architecture, multiple customers reside in the same database catalog and/or schema. To provide isolation, a tenant identifier column must be added to all tables that are shared between multiple clients.

On the Ops side, this strategy requires no additional work. However, the data access layer needs extra logic to make sure that each customer can see only its own data and to prevent data from leaking from one tenant to another. Also, since multiple customers are stored together, tables and indexes might grow larger, putting pressure on SQL statement performance.
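The extra logic in the data access layer could look like the following minimal sketch, which again assumes Python's built-in sqlite3 module. The `post` table, the `tenant_id` column name, the index, and the `PostRepository` class are hypothetical. The point is that every statement carries the tenant filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A shared table: the tenant identifier is an ordinary column.
conn.execute(
    """
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        tenant_id TEXT NOT NULL,
        title TEXT NOT NULL
    )
    """
)
# Leading the index with tenant_id helps keep per-tenant lookups selective
# as the shared table grows.
conn.execute("CREATE INDEX idx_post_tenant ON post (tenant_id, id)")


class PostRepository:
    """Data access bound to one tenant; every statement filters by tenant."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant_id = tenant_id

    def add(self, title):
        cursor = self._conn.execute(
            "INSERT INTO post (tenant_id, title) VALUES (?, ?)",
            (self._tenant_id, title),
        )
        return cursor.lastrowid

    def titles(self):
        rows = self._conn.execute(
            "SELECT title FROM post WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return [row[0] for row in rows]

    def find(self, post_id):
        # Filtering by tenant_id as well as id prevents reading another
        # tenant's row by guessing its primary key.
        row = self._conn.execute(
            "SELECT title FROM post WHERE id = ? AND tenant_id = ?",
            (post_id, self._tenant_id),
        ).fetchone()
        return row[0] if row else None


acme = PostRepository(conn, "acme")
globex = PostRepository(conn, "globex")
acme_post_id = acme.add("Acme roadmap")
globex.add("Globex launch")
```

Binding the tenant identifier once, when the repository is created, means individual queries cannot forget the filter as long as they go through the repository. A statement written outside it has no such protection, which is the data leak risk mentioned above.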

Conclusion

As you can see, there are multiple strategies to implement a multitenancy architecture on the database side. However, each one has its own advantages and disadvantages, so you must make sure you choose the right strategy according to your project's DevOps requirements.

8 thoughts on “A beginner’s guide to database multitenancy”

  1. We use Table-based multitenancy and with a team of three doing the whole ERP I don’t see how we could manage Schema-based multitenancy, since we have about 10000 tables and working with data in stored procedures, for example, is in my experience much easier and faster than with any outside (Java, C++, …) program.

  2. How to manage the version update of the schemas among all different tenancies is another challenge if you are designing a SaaS application. Table based multi-tenancy makes it much easier for any DB schema change.
    Schema or Catalog based multi-tenancy requires a well-designed master-slave structure and a manageable non-static route to the database per tenant. At the data access layer, the connection pool also needs to be redesigned for multi-tenancy.
    So in most cases, I prefer table based multi-tenancy.

    1. If you use Flyway, managing the DB schema is straightforward. Also, replication is a necessity anyway, no matter what multitenancy strategy you choose. For monitoring connection pooling, you should use FlexyPool.

    1. We use schema-based multitenancy, and have really struggled to get it to work nicely with Hibernate. In our case, we use schemas to separate both logically different areas of the application domain and different customers. E.g. each customer has their own “projects”, “admin” and “orders” schemas, and we need datasources that are customised both by domain type, and by customer id, for each request. It would be great to see some tutorial on this kind of thing.

      1. Subscribe to my newsletter because I’ll certainly write about that in the coming weeks.
