How to extract change data events from MySQL to Kafka using Debezium

Introduction

As previously explained, CDC (Change Data Capture) is one of the best ways to interconnect an OLTP database system with other systems such as data warehouses, caches, Spark, or Hadoop.

Debezium is an open-source project developed by Red Hat that aims to simplify this process by allowing you to extract changes from various database systems (e.g. MySQL, PostgreSQL, MongoDB) and push them to Apache Kafka.

In this article, we are going to see how you can extract events from MySQL binary logs using Debezium.

Why you should always use hibernate.connection.provider_disables_autocommit for resource-local JPA transactions

Introduction

One of my major goals for Hibernate is to make sure we offer all sorts of performance improvements to reduce transaction response time and increase throughput. In Hibernate 5.2.10, we addressed the HHH-11542 Jira issue, which now allows you to delay the database connection acquisition for resource-local transactions as well.

In this article, I’m going to explain how Hibernate acquires connections and why you want it to delay this process as long as possible.

How does a relational database work

Introduction

While doing my High-Performance Java Persistence training, I came to realize that it’s worth explaining how a relational database works, as otherwise, it is very difficult to grasp many transaction-related concepts like atomicity, durability, and checkpoints.

In this post, I’m going to give a high-level explanation of how a relational database works internally while also hinting at some database-specific implementation details.

How does database pessimistic locking interact with INSERT, UPDATE, and DELETE SQL statements

Introduction

Relational database systems employ various Concurrency Control mechanisms to provide transactions with ACID property guarantees. While isolation levels are one way of choosing a given Concurrency Control mechanism, you can also use explicit locking whenever you want finer-grained control to prevent data integrity issues.

As previously explained, there are two types of explicit locking mechanisms: pessimistic (physical) and optimistic (logical). In this post, I’m going to explain how explicit pessimistic locking interacts with non-query DML statements (e.g. insert, update, and delete).

A beginner’s guide to the Phantom Read anomaly, and how it differs between 2PL and MVCC

Introduction

Unlike SQL Server, which, by default, relies on 2PL (Two-Phase Locking) to implement the SQL standard isolation levels, Oracle, PostgreSQL, and the MySQL InnoDB engine use MVCC (Multi-Version Concurrency Control).

However, providing a truly Serializable isolation level on top of MVCC is hard, and, in this post, I’ll demonstrate that it’s very difficult to prevent the Phantom Read anomaly without resorting to pessimistic locking.
