Akiban Server Progress with Drupal 7
The call for papers for DrupalCon Munich closed on Friday and I submitted a session related to the work I’m doing on developing a database module for the Akiban Server with Drupal 7. That work has not been open sourced yet but will be before August. We also plan on open sourcing and releasing the Akiban Server for public download by August as well. The end result of this work will be a database driver for the Akiban Server that will allow Drupal 7 to run on Akiban.
In this post, I wanted to briefly show the type of results I’ve been seeing from running Drupal on Akiban. To do this, I constructed a simple view using the Views module and benchmarked the query that resulted from this view.
Environment Setup
All results were gathered on EC2 instances. The base AMI used for these results is an official AMI of Ubuntu 10.04 provided by Canonical. The particular AMI used as the base image for the results gathered in this post was ami-0baf7662.
Images used were all launched in the US-EAST-1A availability zone and were large instance types. After launching this base image I installed MySQL 5.6 and Drupal 7.12. The steps I took to install these components along with the my.cnf
file I used for MySQL are outlined in this gist.
I also created an AMI from the running instance after all the steps outlined were performed. This AMI has MySQL 5.6 installed along with Drupal 7.12 and data generated with drush.
The Akiban AMI cannot be made available for general download yet since we have not open-sourced our stack as of this time. Once our stack has been open-sourced I will update this post with a link to an AMI that can be downloaded. However, if you are interested in seeing the results here for yourself, feel free to contact me and I should be able to grant access to an EC2 instance for testing.
Data Generation
I used drush and the devel modules to generate data so the view would be operating on some data. I generated the following data:
users50000
tags1000
vocabularies5000
menus5000
nodes100000
max comments per node10
View and SQL Query
I created a simple view that filters and sorts on content criteria. A screenshot of my view construction page can be seen here.
The resulting SQL query that gets executed by this view is:
<span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="n">node</span><span class="p">.</span><span class="n">title</span> <span class="k">AS</span> <span class="n">node_title</span><span class="p">,</span> <span class="n">node</span><span class="p">.</span><span class="n">nid</span> <span class="k">AS</span> <span class="n">nid</span><span class="p">,</span> <span class="n">node_comment_statistics</span><span class="p">.</span><span class="n">comment_count</span> <span class="k">AS</span> <span class="n">node_comment_statistics_comment_count</span><span class="p">,</span> <span class="n">node</span><span class="p">.</span><span class="n">created</span> <span class="k">AS</span> <span class="n">node_created</span> <span class="k">FROM</span> <span class="n">node</span> <span class="n">node</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="n">node_comment_statistics</span> <span class="n">node_comment_statistics</span> <span class="k">ON</span> <span class="n">node</span><span class="p">.</span><span class="n">nid</span> <span class="o">=</span> <span class="n">node_comment_statistics</span><span class="p">.</span><span class="n">nid</span> <span class="k">WHERE</span> <span class="p">((</span> <span class="p">(</span> <span class="n">node</span><span class="p">.</span><span class="n">status</span> <span class="o">=</span> <span class="s1">'1'</span> <span class="p">)</span> <span class="k">AND</span> <span class="p">(</span> <span class="n">node</span><span class="p">.</span><span class="k">comment</span> <span class="k">IN</span> <span class="p">(</span> <span class="s1">'2'</span> <span class="p">)</span> <span class="p">)</span> <span class="k">AND</span> <span class="p">(</span> <span class="n">node</span><span class="p">.</span><span class="n">nid</span> <span class="o">>=</span> <span class="s1">'111'</span> <span class="p">)</span> <span class="k">AND</span> <span class="p">(</span> <span class="n">node_comment_statistics</span><span class="p">.</span><span class="n">comment_count</span> <span class="o">>=</span> <span class="s1">'2'</span> <span class="p">)</span> <span class="p">))</span><span class="k">ORDER</span> <span class="k">BY</span> <span class="n">node_created</span> <span class="k">ASC</span> <span class="k">LIMIT</span> <span class="mi">50</span> <span class="k">offset</span> <span class="mi">0</span>
Performance Comparison
The response time of the query in Akiban versus MySQL is shown below.
As seen in the image above, Akiban can execute the query in question in 5 ms or less whereas MySQL consistently takes 1200 ms to execute the query. In the next section I’ll go into details of how Akiban executes this query.
Secondly, numbers were obtained using the mysqlslap benchmark tool from MySQL to demonstrate how Akiban performs versus MySQL with varying degrees of concurrency.
MySQL Execution Plan
Using Maatkit to visualize the MySQL execution plan, we get:
JOIN+- Filter with WHERE| +- Bookmark lookup| +- Table| | table node_comment_statistics| | possible_keys PRIMARY,comment_count| +- Unique index lookup| key node_comment_statistics->PRIMARY| possible_keys PRIMARY,comment_count| key_len 4| ref drupal.node.nid| rows 1+- Table scan +- TEMPORARY table temporary<span class="o">(</span>node<span class="o">)</span> +- Filter with WHERE +- Bookmark lookup +- Table | table node | possible_keys PRIMARY,node_status_type +- Index scan key node->node_created possible_keys PRIMARY,node_status_type key_len 4 rows 100
MySQL chooses to start from the node table and scans an index on the created column. A temporary table is then created in memory to store the results of this index scan. The items stored in the temporary table are then processed to eliminate duplicates (for the DISTINCT). For each distinct row in the temporary table, MySQL then performs a join to the node_comment_statistics
table by performing an index lookup using its primary key.
Akiban Execution Plan
The tables involved in the query fall into a single table group in Akiban - the node group. Grouping is explained by our CTO in this post and that post includes a grouping for Drupal where you can see the node group. For this query, it means all joins within the node group are executed with essentially zero cost. It also allows for the creation of Akiban group indexes. A group index is an index that can span multiple tables along a single branch within a table group.
A covering group index for this query is:
CREATE INDEX cvr_gi ON node<span class="o">(</span> node.status, node.comment, node.created, node.nid, node_comment_statistics.comment_count, node_comment_statistics.nid, node.title<span class="o">)</span> USING LEFT JOIN
Notice that the node.created
column is included in this index so a sort could be avoided.
The other large advantage Akiban brings when executing this query is the query optimizer is intelligent enough to determine that the DISTINCT is not required in the query due to the 1-to-1 mapping between node
and node_comment_statistics
and the fact that an INNER JOIN is being performed between these 2 tables.
Limit_Default<span class="o">(</span><span class="nv">limit</span><span class="o">=</span>50: project<span class="o">([</span>Field<span class="o">(</span>6<span class="o">)</span>, Field<span class="o">(</span>3<span class="o">)</span>, Field<span class="o">(</span>4<span class="o">)</span>, Field<span class="o">(</span>2<span class="o">)]))</span> project<span class="o">([</span>Field<span class="o">(</span>6<span class="o">)</span>, Field<span class="o">(</span>3<span class="o">)</span>, Field<span class="o">(</span>4<span class="o">)</span>, Field<span class="o">(</span>2<span class="o">)])</span> Select_HKeyOrdered<span class="o">(</span>Index<span class="o">(</span>cvr_gi<span class="o">(</span>BoolLogic<span class="o">(</span>AND -> Field<span class="o">(</span>3<span class="o">)</span> ><span class="o">=</span> Literal<span class="o">(</span>111<span class="o">)</span>, Field<span class="o">(</span>4<span class="o">)</span> ><span class="o">=</span> Literal<span class="o">(</span>2<span class="o">)</span> -> BOOL<span class="o">))</span> IndexScan_Default<span class="o">(</span>Index<span class="o">(</span>cvr_gi<span class="o">(</span>><span class="o">=</span>UnboundExpressions<span class="o">[</span>Literal<span class="o">(</span>1<span class="o">)</span>, Literal<span class="o">(</span>2<span class="o">)]</span>,<<span class="o">=</span>UnboundExpressions<span class="o">[</span>Literal<span class="o">(</span>1<span class="o">)</span>, Literal<span class="o">(</span>2<span class="o">)]))</span>
The above execution plan is in the Akiban format. In this format, you read the plan like a tree so we start from the leaf nodes. The above plan starts with a scan of the cvr_gi
index using the node.status
and node.comment
predicates. It then filters rows from this scan (the Select_HKeyOrdered
operator performs this filtering) before limiting the results to the size of the result set requested.
Conclusion
To wrap up, I briefly showed some of the performance benefits we are seeing when running Drupal 7 on the Akiban Server. In the not too distant future, we will be open sourcing our stack here at Akiban and providing downloads of the Akiban Server. I will also be making the database driver for the Akiban Server for Drupal 7 available for download on drupal.org once its complete.
If you are interested in trying this out yourself or want to verify the results before this work becomes publically available, feel free to contact me and I should be able to set you up with access to an EC2 instance so you try if for yourself.