MongoDB Incremental Migration Scripts

Introduction An incremental software development process requires an incremental database migration strategy. I remember working on an enterprise application where hibernate.hbm2ddl.auto was the default data migration tool. Updating the production environment required intensive preparation, and the migration scripts were only created on the spot. An unforeseen error could have led to production data corruption. Incremental updates to the rescue The incremental database update is a technical feature that needs to be addressed in the very first application development iterations. We used to develop our own custom data migration implementations and spending time on writing/supporting… Read More

Integration testing done right with Embedded MongoDB

Introduction Unit testing requires isolating individual components from their dependencies. Dependencies are replaced with mocks, which simulate certain use cases. This way, we can validate the in-test component behavior across various external context scenarios. Web components can be unit tested using mock business logic services. Services can be tested against mock data access repositories. But the data access layer is not a good candidate for unit testing because database statements need to be validated against an actual running database system. Integration testing database options Ideally, our tests should run against a production-like… Read More

MongoDB 2.6 is $out

Introduction MongoDB is evolving rapidly. The 2.2 version introduced the aggregation framework as an alternative to the Map-Reduce query model. Generating aggregated reports is a recurrent requirement for enterprise systems and MongoDB shines in this regard. If you’re new to it you might want to check this aggregation framework introduction or the performance tuning and the data modelling guides.

MongoDB and the fine art of data modeling

Introduction This is the third part of our MongoDB time series tutorial, and this post will emphasize the importance of data modeling. You might want to check the first part of this series to get familiar with our virtual project requirements, and the second part, which covers common optimization techniques. When you first start using MongoDB, you’ll immediately notice its schema-less data model. But schema-less doesn’t mean skipping proper data modeling (satisfying your application business and performance requirements). As opposed to a SQL database, a NoSQL document model is more focused towards… Read More

A beginner’s guide to MongoDB performance turbocharging

Introduction This is the second part of our MongoDB time series tutorial, and this post will be dedicated to performance tuning. In my previous post, I introduced you to our virtual project requirements. In short, we have 50M time events, spanning from the 1st of January 2012 to the 1st of January 2013, with the following structure: We’d like to aggregate the minimum, the maximum, and the average value as well as the entries count for the following discrete time samples: all seconds in a minute all minutes in an hour all… Read More

MongoDB time series: Introducing the aggregation framework

In my previous posts, I talked about batch importing and the out-of-the-box MongoDB performance. Meanwhile, MongoDB was awarded DBMS of the year 2013, so I decided to offer a more thorough analysis of its real-life usage. Because a theory is better understood in a pragmatic context, I will first present our virtual project requirements. Introduction Our virtual project has the following requirements: it must store valued time events represented as v=f(t) it must aggregate the minimum, maximum, average and count records by: seconds in a minute minutes in an hour… Read More
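To make these requirements concrete, here is a minimal sketch of what such an aggregation computes, written in plain Python rather than with the MongoDB aggregation framework the post goes on to use. The `aggregate_by_bucket` helper and the `(timestamp, value)` event shape are assumptions made for illustration only; they are not the post's actual document structure.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_by_bucket(events, bucket_seconds):
    """Group (timestamp, value) pairs into fixed-size time buckets.

    Hypothetical helper: returns a dict mapping each bucket start (UTC)
    to the min, max, avg and count of the values that fall into it.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        epoch = int(ts.timestamp())
        # Round the timestamp down to the start of its bucket.
        bucket_start = epoch - epoch % bucket_seconds
        buckets[bucket_start].append(value)

    return {
        datetime.fromtimestamp(start, tz=timezone.utc): {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    }


# Three sample events: two in the first minute, one in the second.
sample_events = [
    (datetime(2012, 1, 1, 0, 0, 0, tzinfo=timezone.utc), 1.0),
    (datetime(2012, 1, 1, 0, 0, 30, tzinfo=timezone.utc), 3.0),
    (datetime(2012, 1, 1, 0, 1, 5, tzinfo=timezone.utc), 10.0),
]
per_minute = aggregate_by_bucket(sample_events, bucket_seconds=60)
```

Passing `bucket_seconds=1` gives the per-second samples and `bucket_seconds=60` the per-minute ones. In MongoDB itself, the same grouping would be expressed as an aggregation pipeline instead of being computed in application code.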

NoSQL is not just about BigData

Introduction There is so much debate on the SQL vs NoSQL subject, and probably this is our natural way of understanding and learning what’s the best way of storing data. After publishing the small experiment on the MongoDB aggregation framework, I was challenged by the JOOQ team to match my results against Oracle. Matching MongoDB and Oracle is simply honoring Mongo, as Oracle is probably the best SQL DB engine. Being a simple experiment, it’s dangerous to draw any conclusion, since I was only testing the out-of-the-box Mongo performance, without taking advantage of… Read More

MongoDB Facts: Lightning fast aggregation

In my previous post, I demonstrated how fast you can insert 50 million time-event entries with MongoDB. This time, we will make use of all that data to fuel our aggregation tests.

MongoDB Facts: 80000+ inserts/second on commodity hardware

Introduction While experimenting with some time series collections, I needed a large data set to check that our aggregation queries don’t become a bottleneck in case of increasing data loads. We settled for 50 million documents since beyond this number we would consider sharding anyway. Each time event looks like this: As we wanted to get random values, we thought of generating them using JavaScript or Python (we could have tried it in Java, but we wanted to write it as fast as possible). We didn’t know which one would be faster… Read More

Optimistic locking retry with MongoDB

In my previous post, I talked about the benefit of employing optimistic locking for MongoDB batch processors. As I wrote before, the optimistic locking exception is a recoverable one, as long as we fetch the latest Entity, update it, and save it. Because we are using MongoDB, we don’t have to worry about local or XA transactions. In a future post, I’ll demonstrate how you can build the same mechanism when using JPA. The Spring framework offers very good AOP support and, therefore, makes it easy to implement an automatic retry mechanism,… Read More
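The retry idea can be sketched without Spring at all. The example below is a minimal illustration in Python, where a decorator stands in for the AOP aspect and an in-memory store stands in for MongoDB. All names here (`OptimisticLockingError`, `InMemoryStore`, `retry_on_optimistic_lock`, `increment_counter`) are hypothetical and not taken from the post or from any library.

```python
import functools


class OptimisticLockingError(Exception):
    """Raised when a document changed since it was read (hypothetical)."""


class InMemoryStore:
    """Hypothetical stand-in for a collection of versioned documents."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, data):
        self._docs[doc_id] = {"version": 0, "data": dict(data)}

    def find(self, doc_id):
        doc = self._docs[doc_id]
        # Return a copy so callers cannot mutate the stored state directly.
        return {"version": doc["version"], "data": dict(doc["data"])}

    def save(self, doc_id, doc):
        # Reject the write if someone else saved a newer version meanwhile.
        if self._docs[doc_id]["version"] != doc["version"]:
            raise OptimisticLockingError(doc_id)
        self._docs[doc_id] = {
            "version": doc["version"] + 1,
            "data": dict(doc["data"]),
        }


def retry_on_optimistic_lock(times):
    """Re-run the wrapped function when an optimistic locking error occurs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except OptimisticLockingError:
                    if attempt == times - 1:
                        raise  # give up after the last allowed attempt
        return wrapper
    return decorator


@retry_on_optimistic_lock(times=3)
def increment_counter(store, doc_id, before_save=None):
    doc = store.find(doc_id)  # always fetch the latest version first
    doc["data"]["counter"] += 1
    if before_save is not None:
        before_save()  # hook used only to simulate a concurrent writer
    store.save(doc_id, doc)
    return doc["data"]["counter"]
```

The important detail is that the whole fetch, update, and save sequence sits inside the retried function, so every attempt starts from the latest stored version instead of replaying a stale copy.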