Digging into Drupal's Schema
I’m relatively new to Drupal internals and most of the work I do is on the database side. While searching for information on Drupal’s schema, I found very little. During my research, I put together an ER diagram of the schema installed by Drupal 7 (D8 is very similar with only 3 extra tables at time of writing) and decided to share my work. Note that the relationships I discuss here are based on the foreign key documentation that exists in core and my understanding of what I believe other relationships could be. Corrections and comments are very much welcome.
Overview
I’ll start off by showing my complete ER diagram below. You will see I grouped tables I found to be related in colored boxes. The image below is just meant to give a general overview of the schema. I will be diving into different parts of the schema in this post. I created this diagram using MySQL Workbench and the model can be downloaded from here if someone wishes to open this up in Workbench. This gist also shows the ALTER TABLE
SQL statements that would need to be issued to actually create these foreign keys in MySQL. I would not recommend doing this right now with Drupal as many things would break.
Without delving into the relationships and details of this diagram, lets first cover some basic details. A stock install of Drupal 7 results in 73 tables being created. 10 of those tables are used for caching purposes:
Caching TableDescription
cachecaches items not separated out into their own cache tables
cache_blockthe block modules can cache already built blocks here
cache_bootstrapdata required during the bootstrap process can be cached in this table
cache_fieldstores cached field values
cache_filtercaches already filtered pieces of text
cache_formcaches recently built forms and their storage data
cache_imagecaches information about image manipulations that are in progress
cache_menucaches router information as well as generated link trees
cache_pagecaches compressed pages served to anonymous users
cache_pathcaches path aliases
11 tables are created which do not relate to any other tables:
Table NameDescription
actionsstores action information
batchstores details about batches (processes that run in multiple HTTP requests)
blocked_ipsstores a list of blocked IP addresses
floodcontrols the threshold of events, such as the number of contact attempts
queuestores items in queues
rdf_mappingstores custom RDF mappings for user-defined content types
semaphorestores semaphores, locks, and flags
sequencesstores IDs
systemcontains a list of all modules, themes, and theme engines that are or have been installed
url_aliascontains a list of URL aliases for Drupal paths
variablestores variable/value pairs created by Drupal core or any other module or theme
The 21 tables listed above are self-explanatory and I’m not going to discuss them any further in this post. They also are independent in that these tables have no relationships with other tables.
Field Related Tables
There are 8 tables installed with core related to fields and field storage:
Table NameDescription
field_data_bodystores details about the body field of an entity
field_revision_bodystores information about revisions to body fields
field_data_comment_bodystores information about comments associated with an entity
field_revision_comment_bodystores information about revisions to comments
field_data_field_imagestores information about images associated with an entity
field_revision_field_imagestores information about revisions to images
field_data_field_tagsstores information about tags associated with an entity
field_revision_field_tagsstores information about revisions to taxonomy terms/tags associated with an entity
While I was initially tempted to have these tables related to node
, that would not really be correct since these tables are related to an entity. In D7, entities can be other objects besides nodes, such as users or comments. The entity_type
column in these tables reflects that reality. These tables can be stored in other storage systems such as MongoDB due to the field storage API introduced in Drupal 7.
There are 2 other tables related to fields: field_config
and field_config_instance
. These tables store field configuration information. I believe a row in field_config_instance
cannot (well at least should not) exist without the correspondong field_id
in the field_config
table. Hence, the one-to-many relationship from field_config
to field_config_instance
is an identifying relationship.
Small Groups of Tables
There are a number of groups you will notice in the full ER diagram that are made up of 2 to 3 tables. Zooming in on 4 of those groups, we can see those tables more clearly:
One thing you will notice is that some relationships are shown with a solid line whereas others use a dotted line. MySQL Workbench represents identifying relationships with a solid line and non-identifying relationships with a dotted line. If you are unfamiliar with those terms, the standard defintions are:
- identifying relationship - the foreign key attribute is part of the child’s primary key attribute.
- non-identifying relationship - the primary key attributes of the parent must not become primary key attributes of the child.
This stack overflow answer from Bill Karwin contains a good discussion on these topics.
Now lets discuss those groups in more detail.
Registry Group
I grouped the registry
and registry_file
tables together. These tables are used for implementing the code registry in Drupal. A one-to-many relationship exists from registry_file
to registry
and this relationship is an identifying relationship. A filename
should not appear in the registry
table that is not present in the registry_file
table.
Image Group
I grouped the image_styles
and image_effects
tables together. These tables store configuration options for image styles and effects. A one-to-many relationship exists from image_styles
to image_effects
and this relationship is a non-identifying relationship.
date_format Group
There are three tables about date formats in Drupal. date_format_type
is a lookup table that stores configured date format types. After a stock install of Drupal 7, three date format types exist:
- long
- medium
- short
A one-to-many relationship exists from this lookup table to both date_formats
and date_format_locale
.
In practice, this would be problematic. For example, a new date format can be created by an adminstrator. In D7, this results in the system_date_format_save
function being called. This function will insert a row in the date_formats
table that will not have a corresponding type (type will be listed as custom).
You will also notice the locked
column is redundant in the date_formats
table. I submitted a patch to change this.
File Group
I grouped the file_managed
and file_usage
tables into 1 group. These tables store information about uploaded files and information for tracking where a file is used.
I believe a 1-to-1 relationship exists from file_managed
to file_usage
and that this is an identifying relationship.
User Related Tables
There are quite a few tables that store user related information. Below is a figure where I zoom in on those tables.
As you can see, the tables directly associated with users are watchdog
, sessions
, and authmap
. These tables are in a one-to-many relationship from users
. The functionality these tables provide is:
Table NameDescription
authmapstores distributed authentication mapping
sessionsstores information about a users session
watchdogcontains logs of all system events
There are then two tables that are in a many-to-many relationship with users
that link this table with other groups. One of these is the users_roles
table. This table links users
with role
. The role
table is then in a one-to-many relationship with the role_permission
table. The other many-to-many table is shortcut_set_users
. This table links users
with shortcut_set
.
The tables for the menu system are not really related to users but I placed the group close by since the menu_links
table maintains a one-to-many relationship with the shortcut_set
table. While the tables for the menu system do not appear to be related, I do believe a relationship exists there. In particular, I think that the menu_link
table has relationships to both the menu_router
and menu_custom
tables. The router_path
column in menu_links
could reference the router
column in menu_router
and the menu_name
column in menu_links
could reference the menu_name
in the menu_custom
table. Right now however, after a stock install of D7, a row with a menu name that is not present in menu_custom
will be created in menu_links
.
The menu system tables and a description of what they do is below.
Table NameDescription
menu_customholds definitions for top-level custom menus
menu_linkscontains the individual links within a menu
menu_routermaps paths to various callbacks
Node Related Tables
Node is one of the most central concepts in Drupal so as you can imagine, many tables are related to that concept. First off, a high level overview of the tables related to the node
table are shown below.
Tables that are directly related to node
are node_revision
, node_access
, and node_type
. The node_type
table is in many-to-many relationship with node
and block_node_type
. node_revision
is in a many-to-one relationship with node
as is node_access
. The node_access
table has only 1 row upon initial installation and references a non-existent node. An issue has been created to address this.
The tables directly related to node
and a description of what they do is below.
Table NameDescription
node_accessidentifies which realm/grant pairs a user must possess in order to view, update, or delete specific nodes
node_revisionstores information about each saved version of a node
node_typestores information about all defined node types
Taxonomy Tables
Four tables in the stock schema are related to taxonomy. These tables are shown in the figure below.
First of all, the taxonomy_index
table is in a many-to-many relationship with the node
and taxonomy_term_data
tables. The taxonomy_vocabulary
table has a one-to-many relationship with the taxonomy_term_data
table. The taxonomy_term_data
table in turn has 2 1-to-many relationships with the taxonomy_term_hierarchy
table.
A description of the taxonomy tables is given below.
Table NameDescription
taxonomy_indexmaintains de-normalized information about node/term relationships
taxonomy_term_datastores term information
taxonomy_term_hierarchystores the hierarchical relationship between terms
taxonomy_vocabularystores vocabulary information
Block Tables
The main table in this group is block
. It has three directly related tables in one-to-many relationships: block_node_type
, block_role
, and block_custom
.
A description of the blocks tables is given below.
Table NameDescription
blocksstores block settings
block_customstores the contents of custom-made blocks
block_node_typestores information that sets up display criteria for blocks based on content type
block_rolestores access permissions for blocks based on user roles
Search Tables
The relationships for the search tables I am a little unsure of. I believe they are as shown in the figure below.
The relationship I’m most unsure of here are between search_total
and search_index
. I don’t think the one-to-many relationship I have in place from search_total
to search_index
is correct.
A description of the search tables is given below.
Table NameDescription
search_datasetstores items that will be searched
search_indexstores the search index and associates words, items, and scores
search_node_linksstores items that link to other nodes
search_totalstores search totals for words
Tables That Relate Nodes to Users
There are three tables in many-to-many relationships between node
and users
:
Table NameDescription
commentstores comments and associated data
historystores a record of which users have read which nodes
node_comment_statisticsmaintains statistics of nodes and comments posts to show new and updated flags
The comment
table could be in its own group. I decided against doing that in this ER diagram since I felt like it would have been a table by itself. Logically, I think of it as either being in the users
or node
group.
node_comment_statistics
does also maintain a relationship with comment
. This is a non-identifying relationship since a node can exist without any comments.
Conclusion
During this work, I noticed that the column definitions for many foreign key relationships are in-correct which would result in MySQL not allowing these constraints to actually be created. I created an issue and patch for this but it turns out Liam Morland is working on using foreign keys in core and also came across this around the same time as me.
Other issues I encountered have also been logged by Liam:
- the
node_access
table references a non-existent node (relevant issue) - a set name exists in
shortcut_set
that does not exist inmenu_links
(relevant issue)
I would vote for foreign keys being used in Drupal core for a number of reasons, not least of which foreign keys aid a newcomer when trying to understand the schema installed by Drupal.
As I mentioned at the beginning of this post, any comments or corrections are very much welcome. I hope this information can prove useful to someone else besides me!