Field Storage Tests with Drupal 7
I had some spare time this weekend and decided to run some tests against the field storage layer. My main goal was to reproduce the results Moshe Weitzman published a while back, and I also wanted to see what the best throughput I could get was.
Environment Details
The software and versions used for testing were:
- EC2 EBS-backed Large instance (8GB of memory) in the US-EAST availability zone
- Ubuntu 12.04 (ami-fd20ad94, as listed in the official Ubuntu AMIs)
- MySQL 5.5.28
- PostgreSQL 9.2
- MongoDB 2.0.4
- Drupal 7.17
- Drush 5.1
- Migrate 2.5
I ran tests against both MySQL and PostgreSQL with their default settings, and I also ran tests where I modified the configuration of each to be optimized for writes.
The configuration options I specified for MySQL when tuning it were:
```
innodb_flush_log_at_trx_commit=0
innodb_doublewrite=0
log-bin=0
innodb_support_xa=0
innodb_buffer_pool_size=6G
innodb_log_file_size=512M
```
The configuration options I specified for PostgreSQL when tuning it were:
```
fsync = off
synchronous_commit = off
wal_writer_delay = 10000ms
wal_buffers = 16MB
checkpoint_segments = 64
shared_buffers = 6GB
```
Dataset
The dataset used for the tests comes from the migrate_example_baseball module, which ships as part of the migrate module. It contains a box score for every Major League Baseball game from 2000 through 2009, with each year's data in its own CSV file. The different components of each box score are saved in separate fields, which stresses the field storage layer heavily.
Results
Average throughput numbers for the various configurations I tested are shown in the table below.
| Environment | Average Throughput |
| --- | --- |
| Default MySQL | 1932 nodes / minute |
| Default PostgreSQL | 1649 nodes / minute |
| Tuned MySQL | 3024 nodes / minute |
| Tuned PostgreSQL | 1772 nodes / minute |
| Default MySQL with MongoDB | 4609 nodes / minute |
| Default PostgreSQL with MongoDB | 4810 nodes / minute |
| Tuned MySQL with MongoDB | 7671 nodes / minute |
| Tuned PostgreSQL with MongoDB | 5911 nodes / minute |
The image below shows the results graphically for the different environments I tested. The Y axis is throughput (nodes per minute) and the X axis is the CSV file (corresponding to an MLB season) being imported.
Conclusion
It's pretty obvious from glancing at the results above that using MongoDB for field storage gives the best throughput. Tuned MySQL with MongoDB for field storage gave me the best results, which is consistent with what Moshe reported in his original article.
What was most interesting to me were the PostgreSQL numbers. The overhead of having a table per field with the default SQL field storage seems to be very high in PostgreSQL. It's also interesting to see how much better a tuned PostgreSQL does when MongoDB handles field storage.
After performing these tests, one experiment I really want to try is creating a field storage module for PostgreSQL that uses the JSON data type introduced in the 9.2 release. Hopefully I'll get some spare time in the coming week or two to work on that.
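To sketch the idea: instead of one `field_data_*` table per field, all field values for a node could live in a single JSON document in one column. The snippet below illustrates the data shape only (in Python rather than PHP, and with hypothetical field names, purely for illustration):

```python
import json

# Hypothetical field values for one node. With the default SQL field
# storage, each of these would be a row in its own field_data_* table.
node_fields = {
    "field_home_team": [{"value": "BOS"}],
    "field_away_team": [{"value": "NYA"}],
    "field_home_score": [{"value": 7}],
}

# With a JSON-based backend, the whole structure is serialized into a
# single column value: one row per entity instead of one row per field.
document = json.dumps(node_fields)

# Loading a node back requires no joins across per-field tables.
restored = json.loads(document)
print(restored["field_home_score"][0]["value"])
```

PostgreSQL 9.2's `json` type validates documents like this on input, so the write path becomes a single-row insert, which is exactly where the per-field table layout seemed to hurt in the numbers above.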