MySQL - Foreign Keys

Foreign keys let you cross-reference related data across tables. This is primarily an excellent tool to maintain data integrity, but it can also help the query optimizer with complex queries. This relationship is always a parent -> child relationship where the foreign key is defined in the child table definition. For example, if you wanted to store the name of every major river in the world (of which there are 165), with the country it was in, then you might create a rivers table like so:

CREATE TABLE `rivers` (
    `id` TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `name` varchar(100) NOT NULL,
    `country_id` TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

and a countries reference table like so:

CREATE TABLE `countries` (
    `id` TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `name` varchar(50) NOT NULL,
    `population` INT UNSIGNED NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

We know that the country_id in the rivers table relates to the id of the country in the countries table, but there is nothing in the database that enforces this relationship. This is where foreign keys come in. This time, we will create the rivers table after we have created the countries table, and with the following definition:

CREATE TABLE `rivers` (
    `id` TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `name` varchar(100) NOT NULL,
    `country_id` TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (`id`),
    FOREIGN KEY (country_id) REFERENCES countries(id) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Now lets insert some data

INSERT INTO `countries`
(`id`, `name`, `population`)
VALUES
('1', 'China', 1303723332),
('2', 'India', 1656553632),
('3', 'United States', 439010253),
('4', 'Indonesia', 313020847),
('5', 'Brazil', 260692493);
INSERT INTO `rivers`
(`id`, `name`, `country_id`)
VALUES
(1, 'Yangtze', 1),
(2, 'Ganga', 2),
(3, 'Missouri River', 3),
(4, 'Kapuas River', 4),
(5, 'Amazon', 5);

Now if we wanted to see the data in a human readable form, we could perform the following select query:

SELECT
`rivers`.`id` as `id`,
`rivers`.`name` as `river`,
`countries`.`name` as `country`,
`countries`.`population`
FROM `rivers`
LEFT JOIN `countries`
on `rivers`.`country_id` = `countries`.`id`;

+----+----------------+---------------+------------+
| id | river          | country       | population |
+----+----------------+---------------+------------+
|  1 | Yangtze        | China         | 1303723332 |
|  2 | Ganga          | India         | 1656553632 |
|  3 | Missouri River | United States |  439010253 |
|  4 | Kapuas River   | Indonesia     |  313020847 |
|  5 | Amazon         | Brazil        |  260692493 |
+----+----------------+---------------+------------+

By having a foreign key, we cannot just insert a river with a random country_id. For example, try using the follwoing insert:

INSERT INTO `rivers`
(`id`, `name`, `country_id`)
VALUES
(6, 'Nile', 6);

It will fail with the message:

ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails...

This is because the system knows that there is no country with ID 6, so that data is erroneous. Once you have a country with ID 6, that query will succeed.

Likewise, if I was to try and update a river to change its country ID to 6, it would fail with the same message and for the same reason.

Referential Actions

Referential actions are actions the database can take to automatically maintain data integrity. You can specify an action to be performed when the parent data is updated (ON UPDATE), and another action to be take when the parent data is deleted (ON DELETE).

CASCADE

Because we created the foreign key with ON UPDATE CASCADE in the example above, if we were to update the ID of a country, the relevant country_id would change in the rivers table. To demonstrate this, run the following update statement:

UPDATE `countries` set `id`=6 where `id`=5;

Now if you select everything from the rivers table you will see that the Amazon's country_id changed from 5 to 6, even though we didn't mention that table in our update statement. Thus, data integrity is kept.

Likewise, if we were to delete a country, it would also delete any rivers that are assigned that country. This can often be undesirable. For example, if you were to have a user's ID be a foreign key for a table of answers they made on stack overflow, you probably wouldn't want all their answers to disappear just because they deleted their account. In such a situation I would not delete the relevant user row from the database when the user deletes their account, but have a column in the users table for the account_status which could be set to deleted, which the application could have special code for handling. Thus all their posts won't disappear.

This update and deleting relationship is one-way. Before we updated and deleted the countries (parent/master) which changed the data in the rivers table (child). If we were to delete a river then the relevant country would not also be deleted. Likewise, we can update a river to change the country_id so that it is assigned to a different country, not changing the ID of the country in the countries table instead.

RESTRICT

If the previous example where we talked about deleting a user causing all of their posts to be deleted scared you because you think this could accidentally occur in your database, then RESTRICT is a perfect solution for you! Essentially, RESTRICT would prevent the user from being deleted if the user has some posts. However if the user had no posts, then the user deletion would successfully go ahead. In this scenario, you might program your application to change the user_id of all the posts to the ID of a special user that represents deleted users (e.g. user with ID 1). Then when all the posts have been updated, you can delete the user without an error being thrown. To program this relationship, you would use the following:

CREATE TABLE `users` (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `first_name` varchar(100) NOT NULL,
    `last_name`  varchar(100) NOT NULL,
    `email`  varchar(150) NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `posts` (
    `id` TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `content` TEXT NOT NULL,
    `author_id` INT UNSIGNED NOT NULL,
    PRIMARY KEY (`id`),
    FOREIGN KEY (`author_id`) REFERENCES users(`id`) ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Now lets insert some data to play with.

INSERT INTO `users`
(`id`, `first_name`, `last_name`, `email`)
VALUES
('1', 'deleted', 'account', 'deleted.account@noreply.com'),
('2', 'programster', 'page', 'programster@programster.org'),
('3', 'Keano', 'Reeves', 'keanu.reeves@gmail.com');

INSERT INTO `posts`
(`author_id`, `content`)
VALUES
(2, 'hello world'),
(3, 'I am the one.');

Now if we try to delete user with ID 3, the query will fail:

DELETE FROM `users` where `user_id`='3';
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails

This will force us to program our application so that we remember to take care of the posts, either by changing the ownership of the posts to our deleted account user, or by running our own delete queries (in which case you could have just used cascade instead of RESTRICT).

RESTRICT is the default referential action when no action is specified. For example, the following would use RESTRICT for both updates and deletions.

CREATE TABLE `posts` (
    `id` TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `content` TEXT NOT NULL,
    `author_id` INT UNSIGNED NOT NULL,
    PRIMARY KEY (`id`),
    FOREIGN KEY (`author_id`) REFERENCES users(`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

SET NULL

When a parent row is deleted, the related child table(s) will have their foreign key column values set to null. This may be a better alternative than re-assigning posts to a special "deleted user" account in our previous example.

NO ACTION

In MySQL, NO ACTION is a synonym for RESTRICT so the follwing two definitions are the same:

CREATE TABLE `posts` (
    `id` TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `content` TEXT NOT NULL,
    `author_id` INT UNSIGNED NOT NULL,
    PRIMARY KEY (`id`),
    FOREIGN KEY (`author_id`) REFERENCES users(`id`) ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `posts` (
    `id` TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `content` TEXT NOT NULL,
    `author_id` INT UNSIGNED NOT NULL,
    PRIMARY KEY (`id`),
    FOREIGN KEY (`author_id`) REFERENCES users(`id`) ON DELETE NO ACTION ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Some database systems have deferred checks, and NO ACTION is a deferred check. In MySQL, foreign key constraints are checked immediately, which is why NO ACTION is the same as RESTRICT.

If you specify a SET NULL action, make sure that you have not declared the columns in the child table as NOT NULL.

Specifying Nothing

As mentioned previously, RESTRICT is the default referential action when no action is specified. This is important so I put it here in case you skipped past it and suggest you go back and read Restrict.

Conclusion

  • Foreign keys are extremely useful, without them you can take the R out of RDBMS.
  • Foreign keys are a parent -> child relationship.
    • The foreign key is always defined in the child table.
  • If you don't specify a referential action, RESTRICT will be the default.
  • Cascade will automatically delete/update the related table columns/rows to keep data integrity.
  • Restrict will just throw an error if a query were to cause data integrity issues, rather than change any other table's data.
  • SET NULL would update the related table columns to NULL. This is probably useful for ON DELETE, but probably not for ON UPDATE.

Drawbacks of Foreign Keys

Deleting Tables

Tables with foreign keys are nice to set up, but can be a pain in the butt to maintain. Once a table with a foreign key references another table (which may not have any foreign keys itself), you will not be able to drop or truncate that parent table unless you remove the foreign key from the child. You can usually "get around" this by running a delete query to delete all of the data from the table rather than using the truncate query.

Changing Column Type

Changing the type of a field that is a foreign key to another table is near impossible, even if it is something that should be straightforward like changing from a TINYINT to an INT field. To perform such an operation, you have to remove all the key constraints, alter all the relevant tables (finding all the children can be a pain), and then recreate the foreign key constraints.

REFERENCES

Author

Programster

Stuart is a software developer with a passion for Linux and open source projects.

comments powered by Disqus