MongoDB and the fine art of data modeling

This is the third part of our MongoDB time series tutorial, and this post emphasizes the importance of data modeling. You might want to check the first part of this series to get familiar with our virtual project requirements, and the second part, which covers common optimization techniques. When you first start using MongoDB, you'll immediately notice its schema-less data model. But schema-less doesn't mean you can skip proper data modeling, which still has to satisfy your application's business and performance requirements. As opposed to a SQL database, a NoSQL document model is more focused towards…
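To make that trade-off concrete, here is a rough sketch of two candidate time series models, one document per event versus one document per minute with values embedded per second; the collection and field names are illustrative assumptions, not the schema the post settles on.

    // Candidate A (assumed): one document per time event
    db.values.insert({
        created_on: new Date(),   // event timestamp, t
        value: Math.random()      // measured value, v = f(t)
    });

    // Candidate B (assumed): one document per minute, one entry per second,
    // trading write simplicity for fewer documents and pre-grouped reads
    db.values_per_minute.insert({
        timestamp_minute: ISODate("2012-05-02T06:08:00Z"),
        values: {
            "0": 0.92,   // value recorded at second 0
            "1": 0.17    // value recorded at second 1, and so on
        }
    });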

MongoDB time series: Introducing the aggregation framework

In my previous posts I talked about batch importing and the out-of-the-box MongoDB performance. Meanwhile, MongoDB was awarded DBMS of the year 2013, so I decided to offer a more thorough analysis of its real-life usage. Because a theory is better understood in a pragmatic context, I will first present our virtual project requirements. Our virtual project has the following requirements: it must store valued time events represented as v=f(t), and it must aggregate the minimum, maximum, average, and count of records by: seconds in a minute, minutes in an hour…
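As a taste of the aggregation framework this post introduces, a pipeline covering the per-second portion of these requirements might look roughly like the sketch below; the collection name (randomData), the field names (created_on, value) and the chosen one-minute window are assumptions for illustration.

    // Hypothetical pipeline: min, max, average and count of values,
    // grouped by second, within one given minute
    db.randomData.aggregate([
        { $match: {
            created_on: {
                $gte: ISODate("2012-05-02T06:08:00Z"),
                $lt:  ISODate("2012-05-02T06:09:00Z")
            }
        }},
        { $group: {
            _id:   { $second: "$created_on" },
            min:   { $min: "$value" },
            max:   { $max: "$value" },
            avg:   { $avg: "$value" },
            count: { $sum: 1 }
        }},
        { $sort: { _id: 1 } }
    ]);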

MongoDB Facts: 80000+ inserts/second on commodity hardware

While experimenting with some time series collections I needed a large data set to check that our aggregation queries don't become a bottleneck as data loads increase. We settled on 50 million documents, since beyond this number we would consider sharding anyway. Each time event looks like the document sketched below. As we wanted to get random values, we thought of generating them using JavaScript or Python (we could have tried it in Java, but we wanted to write it as fast as possible). We didn't know which one would be faster…
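For reference, a time event of this shape and a naive loop for generating random test data might look like the sketch below; the collection name (randomData), the field names (created_on, value) and the document count are assumptions for illustration, and this one-insert-at-a-time loop is not the script behind the 80000+ inserts/second figure.

    // Assumed shape of one time event document: a timestamp and a value v = f(t)
    var sampleEvent = {
        _id: ObjectId(),                              // auto-generated identifier
        created_on: ISODate("2012-05-02T06:08:11Z"),  // event timestamp
        value: 0.9270193106494844                     // random measured value
    };

    // Naive mongo shell sketch for filling the collection with random events
    var minDate = new Date(2012, 0, 1).getTime();
    var maxDate = new Date(2013, 0, 1).getTime();
    var delta = maxDate - minDate;
    for (var i = 0; i < 1000; i++) {
        db.randomData.insert({
            created_on: new Date(minDate + Math.random() * delta),
            value: Math.random()
        });
    }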