Comparing PostgreSQL 9.1 vs. MySQL 5.6 using Drupal 7.x
It’s tough to find much information about running Drupal on PostgreSQL beyond the basics of installing it. In particular, I’m interested in comparisons of running Drupal on PostgreSQL versus MySQL. Previous posts such as this article from 2bits compare the performance of MySQL versus PostgreSQL on Drupal 5.x and seem a bit outdated. This post from the high performance Drupal group is also pretty dated and contains similar comparisons.
In this post, I wanted to run tests similar to those in the 2bits article, but on a more recent version of Drupal: 7.x. I also wanted to test a few of the more complex queries that the Views module can generate and see how they perform in MySQL versus PostgreSQL.
For this post, I used the latest GA version of PostgreSQL and, for kicks, went with an alpha release of MySQL: 5.6. I would expect to see similar results for 5.5 in tests like this. I didn’t use the default configurations after installation, since I didn’t see much benefit in testing those. The configurations I used for both systems are documented below.
Environment Setup
All results were gathered on EC2 instances. The base AMI used for these results is an official AMI of Ubuntu 10.04 provided by Canonical. The particular AMI used as the base image for the results gathered in this post was ami-0baf7662.
Images used were all launched in the US-EAST-1A availability zone and were large instance types. After launching this base image, I installed MySQL 5.6 and Drupal 7.12. The steps I took to install these components, along with the my.cnf file I used for MySQL, are outlined in this gist.
The PostgreSQL 9.1 setup was performed on a separate instance; the postgresql.conf settings I used are outlined in this gist.
APC was installed and its default configuration was used on both servers.
Data Generation
I used drush and the devel module to generate the following data:
users: 50000
tags: 1000
vocabularies: 5000
menus: 5000
nodes: 100000
max comments per node: 10
I generated this data in the MySQL installation first. The data was then migrated to the PostgreSQL instance using the dbtng_migrator module. This ensures the same data is used for all tests against MySQL and PostgreSQL. I covered how to perform this migration in a previous post.
pgbouncer
One additional setup item I performed for PostgreSQL was to install pgbouncer and configure Drupal to connect through pgbouncer instead of directly to PostgreSQL.
Installation and configuration on Ubuntu 10.04 is straightforward. The steps to install pgbouncer and the configuration I used are outlined in this gist.
The main reason for this change is that the ApacheBench-based test unfairly favors MySQL due to its process model: each new connection to MySQL results in a new thread being spawned, whereas each new connection to PostgreSQL results in a new process being forked. The overhead of forking a process is much larger than that of spawning a thread. I did collect numbers for PostgreSQL without pgbouncer, and I report them in the ApacheBench test section below.
pgbouncer maintains a connection pool that Drupal connects to, so in the settings.php file for my Drupal PostgreSQL instance, I modified my database settings to be:
$databases = array (
  'default' =>
  array (
    'default' =>
    array (
      'database' => 'drupal',
      'username' => 'drupal',
      'password' => 'drupal',
      'host' => 'localhost',
      'port' => '6432',
      'driver' => 'pgsql',
      'prefix' => '',
    ),
  ),
);
I performed this configuration step after I generated data in MySQL and migrated it to PostgreSQL.
Anonymous Users Testing with ApacheBench
First, I loaded the front page of each Drupal site with the devel module enabled and reporting query execution times. The following was reported:
Database                       Query Exec Times
MySQL                          Executed 65 queries in 31.69 ms
PostgreSQL (with pgbouncer)    Executed 66 queries in 49.84 ms
PostgreSQL                     Executed 66 queries in 95 ms
Straight out of the gate, we can see there is not much difference here; 31 versus 50 ms is not going to be felt by many end users. If pgbouncer is not used, though, query execution time is roughly 3 times slower than MySQL’s.
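To put those numbers in perspective, here is a small sketch (plain Python, using the execution times from the table above) computing each configuration’s slowdown relative to MySQL:

```python
# Front-page query execution times reported by the devel module, in ms.
times_ms = {
    "MySQL": 31.69,
    "PostgreSQL (with pgbouncer)": 49.84,
    "PostgreSQL (direct)": 95.0,
}

baseline = times_ms["MySQL"]
for db, t in times_ms.items():
    print(f"{db}: {t:.2f} ms ({t / baseline:.1f}x MySQL)")
```

This shows the direct-connection configuration at roughly 3.0x MySQL’s time and the pooled configuration at roughly 1.6x.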
Next, I ran some simple benchmarks using ApacheBench. The command used to run ab was as follows (the number of concurrent connections, X, was the only parameter varied):
ab -c X -n 100 http://drupal.url.com/
The ab command was always run from a separate EC2 instance in the same availability zone, never on the instance Drupal was running on.
Results obtained with default Drupal configuration (page cache disabled) but all other caching enabled are shown in the figure below. The raw numbers are presented in the table after the figure.
Database                        c = 1    c = 5    c = 10
MySQL                           11.71    16.53    16.28
PostgreSQL (using pgbouncer)     8.44    11.03    11.10
PostgreSQL                       4.81     7.32     7.22
The next test was run after all caches were cleared using drush. The command issued was:

drush cc

Option 1 was then chosen to clear all caches. This was done before each ab command was run. Results are shown in the figure with raw numbers presented in the table after the figure.
Database                        c = 1    c = 5    c = 10
MySQL                           10.50    14.08     6.28
PostgreSQL (using pgbouncer)     7.92     9.23     7.32
PostgreSQL                       5        7.04     6.79
Finally, the same test was run with Drupal’s page cache enabled. Results are shown in the figure below with raw numbers presented in the table after the figure.
Database                        c = 1    c = 5    c = 10
MySQL                           144      282      267
PostgreSQL (using pgbouncer)    120      205      202
PostgreSQL                       35       45       46
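The page-cache numbers make pgbouncer’s effect easy to quantify. Here is a small sketch (assuming the table reports requests per second, which is what ab measures) dividing pooled throughput by direct-connection throughput at each concurrency level:

```python
# Throughput with Drupal's page cache enabled, from the table above.
concurrency = [1, 5, 10]
pg_pooled = [120, 205, 202]  # PostgreSQL behind pgbouncer
pg_direct = [35, 45, 46]     # PostgreSQL with direct connections

for c, pooled, direct in zip(concurrency, pg_pooled, pg_direct):
    print(f"c={c}: pgbouncer yields {pooled / direct:.1f}x the throughput")
```

Connection pooling is worth roughly 3.4x to 4.6x here, because without it every ab connection forks a fresh PostgreSQL backend.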
Views Queries
The views module is known to sometimes generate queries that can cause performance problems for MySQL.
Image Gallery View
The first SQL query I want to look at is generated by one of the sample templates that ship with the Views module. If you click ‘Add view from template’ in the Views module, by default you will only have one template to choose from: the Image Gallery template. After creating a view from this template and not modifying anything about it, I see two problematic queries being generated.
The first query counts the number of rows in the result set for this view, since the view is paginated. The second query actually retrieves the results, with a LIMIT clause and the appropriate OFFSET depending on which page of the results the user is currently on. For this post, we’ll just look at the second query, the one that retrieves results. That query is:
SELECT taxonomy_index.tid AS taxonomy_index_tid,
       taxonomy_term_data.name AS taxonomy_term_data_name,
       COUNT(node.nid) AS num_records
FROM node node
LEFT JOIN users users_node ON node.uid = users_node.uid
LEFT JOIN field_data_field_image field_data_field_image
  ON node.nid = field_data_field_image.entity_id
  AND (field_data_field_image.entity_type = 'node'
       AND field_data_field_image.deleted = '0')
LEFT JOIN taxonomy_index taxonomy_index ON node.nid = taxonomy_index.nid
LEFT JOIN taxonomy_term_data taxonomy_term_data
  ON taxonomy_index.tid = taxonomy_term_data.tid
WHERE (((field_data_field_image.field_image_fid IS NOT NULL)
        AND (node.status = '1')))
GROUP BY taxonomy_term_data_name, taxonomy_index_tid
ORDER BY num_records ASC
LIMIT 24 OFFSET 0
The response time of the query in MySQL versus PostgreSQL is shown in the figure below.
As seen in the image above, PostgreSQL can execute the query in question in 300ms or less whereas MySQL consistently takes 2800 ms to execute the query.
The MySQL execution plan looks like:
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: field_data_field_image
         type: ref
possible_keys: PRIMARY,entity_type,deleted,entity_id,field_image_fid
          key: PRIMARY
      key_len: 386
          ref: const
         rows: 19165
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: node
         type: eq_ref
possible_keys: PRIMARY,node_status_type
          key: PRIMARY
      key_len: 4
          ref: drupal.field_data_field_image.entity_id
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: users_node
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: drupal.node.uid
         rows: 1
        Extra: Using where; Using index
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: taxonomy_index
         type: ref
possible_keys: nid
          key: nid
      key_len: 4
          ref: drupal.field_data_field_image.entity_id
         rows: 1
        Extra: NULL
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: taxonomy_term_data
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: drupal.taxonomy_index.tid
         rows: 1
        Extra: NULL
MySQL starts from the field_data_field_image table and, since there are no selective predicates in the query, chooses to scan the table using its PRIMARY key. It then filters the scanned rows using the field_image_fid IS NOT NULL predicate. Since MySQL has only one join algorithm, nested loops, it is used to perform the remainder of the joins. A temporary table is created in memory to store the results of these joins; this is then sorted and the result set limited to the 24 rows requested.
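To illustrate the join strategy MySQL is limited to, here is a minimal nested-loop join sketch in Python. The rows are made-up stand-ins for illustration, not Drupal’s actual schema:

```python
def nested_loop_join(outer, inner, key):
    """Nested-loop join: for each outer row, scan the inner input for
    matches. Cost grows with len(outer) * len(inner) unless the inner
    scan is replaced by an index lookup, which is what MySQL does here."""
    result = []
    for o in outer:
        for i in inner:
            if o[key] == i[key]:
                result.append({**o, **i})
    return result

nodes = [{"nid": 1, "title": "a"}, {"nid": 2, "title": "b"}]
images = [{"nid": 1, "fid": 7}, {"nid": 2, "fid": 8}]
print(nested_loop_join(nodes, images, "nid"))
```

With a 19165-row driving table, as in the plan above, the cost of re-probing the inner side for every outer row adds up quickly.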
The PostgreSQL execution plan looks drastically different.
Limit  (cost=11712.83..11712.89 rows=24 width=20)
  ->  Sort  (cost=11712.83..11829.24 rows=46564 width=20)
        Sort Key: (count(node.nid))
        ->  HashAggregate  (cost=9946.90..10412.54 rows=46564 width=20)
              ->  Hash Left Join  (cost=6174.69..9597.67 rows=46564 width=20)
                    Hash Cond: (taxonomy_index.tid = taxonomy_term_data.tid)
                    ->  Hash Right Join  (cost=6140.19..8922.92 rows=46564 width=12)
                          Hash Cond: (taxonomy_index.nid = node.nid)
                          ->  Seq Scan on taxonomy_index  (cost=0.00..1510.18 rows=92218 width=16)
                          ->  Hash  (cost=5657.14..5657.14 rows=38644 width=4)
                                ->  Hash Join  (cost=2030.71..5657.14 rows=38644 width=4)
                                      Hash Cond: (node.nid = field_data_field_image.entity_id)
                                      ->  Seq Scan on node  (cost=0.00..2187.66 rows=76533 width=8)
                                            Filter: (status = 1)
                                      ->  Hash  (cost=1547.66..1547.66 rows=38644 width=8)
                                            ->  Seq Scan on field_data_field_image  (cost=0.00..1547.66 rows=38644 width=8)
                                                  Filter: ((field_image_fid IS NOT NULL) AND ((entity_type)::text = 'node'::text) AND (deleted = 0::smallint))
                    ->  Hash  (cost=22.00..22.00 rows=1000 width=12)
                          ->  Seq Scan on taxonomy_term_data  (cost=0.00..22.00 rows=1000 width=12)
PostgreSQL has a number of other join algorithms available for use. In particular, for this query, the optimizer has decided that a hash join is the optimal choice.
PostgreSQL starts by scanning the tiny (1000 rows) taxonomy_term_data table and constructing an in-memory hash table (the build phase of a hash join). It then probes this hash table for matches of taxonomy_index.tid = taxonomy_term_data.tid for each row that results from a hash join of taxonomy_index and node. That hash join was itself the result of joining field_data_field_image and node, with field_data_field_image used to build a hash table and a sequential scan of node used to probe it. Aggregation is then performed, and the result set is sorted by the aggregated value (in this case a count of node ids). Finally, the result set is limited to 24.
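The hash join PostgreSQL favors can be sketched in a few lines: hash the smaller input once (the build phase), then probe that hash table once per row of the larger input. The table contents below are illustrative stand-ins, not real Drupal data:

```python
from collections import defaultdict

def hash_join(build_side, probe_side, key):
    # Build phase: hash the smaller input on the join key.
    buckets = defaultdict(list)
    for b in build_side:
        buckets[b[key]].append(b)
    # Probe phase: one O(1) lookup per probe row instead of a rescan.
    result = []
    for p in probe_side:
        for b in buckets.get(p[key], []):
            result.append({**p, **b})
    return result

terms = [{"tid": 1, "name": "tag-a"}, {"tid": 2, "name": "tag-b"}]
index = [{"tid": 1, "nid": 10}, {"tid": 1, "nid": 11}, {"tid": 2, "nid": 12}]
print(hash_join(terms, index, "tid"))
```

Each input is read only once; the experiment below of disabling hash joins shows what happens to this query when that advantage is taken away.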
One neat thing about PostgreSQL is that planner nodes can be disabled. So, to make PostgreSQL execute the query in a manner similar to MySQL’s, I did:
drupal=> set enable_hashjoin=off;
SET
drupal=> set enable_hashagg=off;
SET
drupal=> set enable_mergejoin=off;
SET
And the execution plan PostgreSQL chose then was:
Limit  (cost=52438.04..52438.10 rows=24 width=20)
  ->  Sort  (cost=52438.04..52552.82 rows=45913 width=20)
        Sort Key: (count(node.nid))
        ->  GroupAggregate  (cost=50237.67..51155.93 rows=45913 width=20)
              ->  Sort  (cost=50237.67..50352.45 rows=45913 width=20)
                    Sort Key: taxonomy_term_data.name, taxonomy_index.tid
                    ->  Nested Loop Left Join  (cost=0.00..46682.48 rows=45913 width=20)
                          ->  Nested Loop Left Join  (cost=0.00..33783.81 rows=45913 width=12)
                                ->  Nested Loop  (cost=0.00..18575.38 rows=38644 width=4)
                                      ->  Seq Scan on field_data_field_image  (cost=0.00..1547.66 rows=38644 width=8)
                                            Filter: ((field_image_fid IS NOT NULL) AND ((entity_type)::text = 'node'::text) AND (deleted = 0::smallint))
                                      ->  Index Scan using node_pkey on node  (cost=0.00..0.43 rows=1 width=8)
                                            Index Cond: (nid = field_data_field_image.entity_id)
                                            Filter: (status = 1)
                                ->  Index Scan using taxonomy_index_nid_idx on taxonomy_index  (cost=0.00..0.36 rows=3 width=16)
                                      Index Cond: (node.nid = nid)
                          ->  Index Scan using taxonomy_term_data_pkey on taxonomy_term_data  (cost=0.00..0.27 rows=1 width=12)
                                Index Cond: (taxonomy_index.tid = tid)
The above plan takes 2 seconds to execute against PostgreSQL. You can see it is very similar to the MySQL plan: it starts with the field_data_field_image table and performs nested-loop joins to join the remainder of the tables. In this case, an expensive sort must be performed before the aggregation. Using the HashAggregate operator, as PostgreSQL does by default, greatly reduces that cost.
So you can see that, out of the box, PostgreSQL performs much better on this query.
Simple View
I created a simple view that filters and sorts on content criteria. A screenshot of my view construction page can be seen here.
The resulting SQL query that gets executed by this view is:
SELECT DISTINCT node.title AS node_title,
       node.nid AS nid,
       node_comment_statistics.comment_count AS node_comment_statistics_comment_count,
       node.created AS node_created
FROM node node
INNER JOIN node_comment_statistics node_comment_statistics
  ON node.nid = node_comment_statistics.nid
WHERE (((node.status = '1')
        AND (node.comment IN ('2'))
        AND (node.nid >= '111')
        AND (node_comment_statistics.comment_count >= '2')))
ORDER BY node_created ASC
LIMIT 50 OFFSET 0
The response time of the query in MySQL versus PostgreSQL is shown in the figure below.
As seen in the image above, PostgreSQL can execute the query in question in 200 ms or less, whereas MySQL can take up to 1000 ms to execute it.
The MySQL execution plan looks like:
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: node
         type: index
possible_keys: PRIMARY,node_status_type
          key: node_created
      key_len: 4
          ref: NULL
         rows: 100
        Extra: Using where; Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: node_comment_statistics
         type: eq_ref
possible_keys: PRIMARY,comment_count
          key: PRIMARY
      key_len: 4
          ref: drupal.node.nid
         rows: 1
        Extra: Using where
MySQL chooses to start from the node table and scans an index on the created column. A temporary table is then created in memory to store the results of this index scan. The rows stored in the temporary table are then processed to eliminate duplicates (for the DISTINCT). For each distinct row in the temporary table, MySQL then performs a join to the node_comment_statistics table via an index lookup on its primary key.
The PostgreSQL execution plan for this query looks like:
Limit  (cost=6207.15..6207.27 rows=50 width=42)
  ->  Sort  (cost=6207.15..6250.75 rows=17441 width=42)
        Sort Key: node.created
        ->  HashAggregate  (cost=5453.36..5627.77 rows=17441 width=42)
              ->  Hash Join  (cost=1985.31..5278.95 rows=17441 width=42)
                    Hash Cond: (node.nid = node_comment_statistics.nid)
                    ->  Seq Scan on node  (cost=0.00..2589.32 rows=38539 width=34)
                          Filter: ((nid >= 111) AND (status = 1) AND (comment = 2))
                    ->  Hash  (cost=1546.22..1546.22 rows=35127 width=16)
                          ->  Seq Scan on node_comment_statistics  (cost=0.00..1546.22 rows=35127 width=16)
                                Filter: (comment_count >= 2::bigint)
PostgreSQL chooses to start by scanning the node_comment_statistics table and building an in-memory hash table. This hash table is then probed for matches of node.nid = node_comment_statistics.nid for each row that results from a sequential scan of the node table. The result of this hash join is then aggregated (for the DISTINCT) before being sorted and limited to 50 rows.
It’s worth noting that with out-of-the-box settings, the above query would do a disk-based sort (the sort method is viewable using EXPLAIN ANALYZE in PostgreSQL). When doing a disk-based sort, the query takes about 450 ms to execute. I ran all my tests with work_mem set to 4MB, though, which results in a top-N heapsort being used.
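The top-N heapsort PostgreSQL switches to when the LIMIT fits in work_mem can be illustrated with Python’s heapq, which keeps a small bounded heap instead of ordering the entire result set (the rows below are synthetic, not real node data):

```python
import heapq
import random

random.seed(0)
# 100,000 fake rows with a "created" timestamp, as in the node table.
rows = [{"nid": i, "created": random.randrange(10**9)} for i in range(100_000)]

# Full sort: orders every row, and can spill to disk in a database.
full = sorted(rows, key=lambda r: r["created"])[:50]

# Top-N heapsort: roughly O(n log 50) comparisons, bounded memory.
top = heapq.nsmallest(50, rows, key=lambda r: r["created"])

print([r["created"] for r in top] == [r["created"] for r in full])  # prints True
```

Both approaches return the same 50 rows; the heap-based one simply never materializes a full sorted copy, which is the difference between an in-memory top-N sort and a disk-based sort in PostgreSQL.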
Conclusion
In my opinion, the only issue with using PostgreSQL as your Drupal database is that some contributed modules will not work out of the box with that configuration.
Certainly, from a performance point of view, I see no issues with using PostgreSQL with Drupal. In fact, for Drupal sites using the Views module (probably the majority), I would say PostgreSQL is probably an even better option than MySQL due to its more advanced optimizer and execution engine. This does assume pgbouncer is being used and Drupal is not connecting directly to PostgreSQL. Users who do not use pgbouncer and perform simple benchmarks like the ones I did with ab are likely to see poor performance from PostgreSQL.
I’m working a lot with Drupal on PostgreSQL these days. I’ll be sure to share any interesting experiences I have here.