MySQL - History Tables
Keeping a history of your data can be immensely useful, such as for reverting silly mistakes, or for auditing purposes. This tutorial will show you a really simple way to achieve this in a generic manner that can be applied to any table. You will be able to see what data and when, as well as return to any specific revision or point in time quickly and easily.
Preparation
I am assuming you already have a dev database to work with.
Run the following statements to create a table of data that we are going to demonstrate with throghout this tutorial.
CREATE TABLE `user_comments` (
`id` int UNSIGNED NOT NULL AUTO_INCREMENT,
`comment` text NOT NULL,
`author_id` int NOT NULL,
`modified_timestamp` timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `user_comments`
(`comment`, `author_id`) VALUES
("hello world", 1),
("foo bar", 2);
If your table's don't already have the timestamp
, or author
fields, then I recommend that you add them. Without the
timestamp
field, you will only be able to go back to a specific revision number rather than a certain point in time. Without
the author_id field
, you will not know who made the changes.
Steps
The first thing we need to do is clone the original table's schema to create our history table.
CREATE TABLE `user_comments_history` LIKE `user_comments`;
_history
rather than a prefix, you keep your history tables beside the ones they track in your database list.
From this point on, there are two main ways of structuring your history tables. For both of these options, a new
field will be added to the history table which will be called history_id
.
In option 1, the new history_id
field will be the primary key, and id
in the history table will just be a
data field containing the value of and referencing the ID in the main table. This has the advantage of being very simple
for copying data rows to/from the history table as the data in the columns remain exactly the same.
In option 2, the id
field in the history table will remain as the primary key, and a new field (history_id
) will
reference the id
in the original table. The advantage of this is that the schema doesn't change as id
remains the primary
key, but you now have to move data between the id and history_id fields when moving rows. You may find this option simpler
if you rename history_id
to row_id
or primary_table_id
. I am just keeping the name the same between the two options for this tutorial.
Option 1
Run the following steps to alter the history table to how we need it.
ALTER TABLE `user_comments_history`
MODIFY COLUMN `id` INT UNSIGNED NOT NULL;
ALTER TABLE `user_comments_history` DROP PRIMARY KEY;
ALTER TABLE `user_comments_history`
ADD COLUMN `history_id` INT UNSIGNED NOT NULL;
ALTER TABLE `user_comments_history`
ADD CONSTRAINT PRIMARY KEY (`history_id`);
ALTER TABLE `user_comments_history`
MODIFY `history_id` INT UNSIGNED NOT NULL AUTO_INCREMENT;
Adding A Foreign Key
It may be a good idea to add a foreign key to enforce the relationship between the history table and the rows in the original table.
ALTER TABLE `user_comments_history`
ADD CONSTRAINT fk_id FOREIGN KEY (id) REFERENCES `user_comments`(id) ON UPDATE CASCADE ON DELETE CASCADE;
Updating A Row
Now when we wish to update a row, we need to insert into the history table first. The code below shows how to do this in a
single transaction so that both have to go through or neither. We are going to change the first row's comment from
hello world
to hello earth
.
START TRANSACTION;
# Insert the current row into the history table
INSERT INTO `user_comments_history` (`id`, `comment`, `author_id`, `modified_timestamp`)
SELECT
`id` as `id`,
`comment` as `comment`,
`author_id` as `author_id`,
`modified_timestamp` as `modified_timestamp`
FROM `user_comments`
WHERE `comment` = 'hello world';
# Now run the update
UPDATE `user_comments`
SET `comment`='hello earth'
WHERE `comment` = 'hello world';
# Commit the transaction
COMMIT;
Restoring To A Point In Time
If you know that your data was fine on the 11th of July 2016 and you want to retrieve the data from that point in time, then just use the following query (assuming we want the history of row 1 in the primary table):
SELECT * FROM `user_comments_history`
WHERE `id`= 1
AND `modified_timestamp` < "2016-11-30"
LIMIT 1
ORDER BY `modified_timestamp` DESC
However, to change back to that point in time, then run the following transaction:
START TRANSACTION;
# Insert the current row into the history table before reverting
INSERT INTO `user_comments_history` (`id`, `comment`, `author_id`, `modified_timestamp`)
SELECT
`id` as `id`,
`comment` as `comment`,
`author_id` as `author_id`,
`modified_timestamp` as `modified_timestamp`
FROM `user_comments`
WHERE `id`= 1
# Now run the update
UPDATE `user_comments` dest,
(
SELECT * FROM `user_comments_history`
WHERE `id`= 1
AND `modified_timestamp` < "2016-11-30"
ORDER BY `modified_timestamp` DESC
LIMIT 1
) src
SET
dest.comment = src.comment,
dest.author_id = dest.author_id
WHERE dest.`id`= 1;
# Commit the transaction
COMMIT;
Option 2
If you believe that the id
of every table needs to be the primary key, you can do the following instead:
ALTER TABLE `user_comments_history`
ADD COLUMN `history_id` INT UNSIGNED NOT NULL;
It may be a good idea to add a foreign key to enforce the relationship between the history table and the rows in the original table.
ALTER TABLE `user_comments_history`
ADD CONSTRAINT fk_id FOREIGN KEY (history_id) REFERENCES `user_comments`(id);
Updating A Row
Now when we wish to update a row, we need to insert into the history table first. The code below shows how to do this in a single
transaction so that both have to go through or neither. We are going to change the first row's comment from hello world
to hello earth
.
START TRANSACTION;
# Insert the current row into the history table
INSERT INTO `user_comments_history` (`history_id`, `comment`, `author_id`, `modified_timestamp`)
SELECT `id` as `history_id`, `comment` as `comment`, `author_id` as `author_id`, `modified_timestamp` as `modified_timestamp`
FROM `user_comments`
WHERE `comment` = 'hello world'
;
# Update the row in the primary table
UPDATE `user_comments`
SET `comment`='hello earth'
WHERE `comment` = 'hello world';
# Commit the transaction
COMMIT;
Restoring To A Point In Time
If you know that your data was fine on the 11th of July 2016 and you want to retrieve the data from that point in time, then just use the following query (assuming we want the history of row 1 in the primary table):
SELECT * FROM `user_comments_history`
WHERE `history_id`= 1
AND `modified_timestamp` < "2016-11-30"
LIMIT 1
ORDER BY `modified_timestamp` DESC
However, to change back to that point in time, then run the following transaction:
START TRANSACTION;
# Insert the current row into the history table before reverting
INSERT INTO `user_comments_history` (`history_id`, `comment`, `author_id`, `modified_timestamp`)
SELECT
`id` as `history_id`,
`comment` as `comment`,
`author_id` as `author_id`,
`modified_timestamp` as `modified_timestamp`
FROM `user_comments`
WHERE `id`= 1
# Now run the update
UPDATE `user_comments` dest,
(
SELECT * FROM `user_comments_history`
WHERE `history_id`= 1
AND `modified_timestamp` < "2016-11-30"
ORDER BY `modified_timestamp` DESC
LIMIT 1
) src
SET
dest.comment = src.comment,
dest.author_id = dest.author_id
WHERE dest.`id`= 1;
# Commit the transaction
COMMIT;
References
Appendix
To Use A Foreign Key or Not
It may be a good idea to add a foreign key to enforce the relationship between the history table and the rows in the original table.
However if you do this, then it must be the case that if you DELETE
a row from the primary table, its entire history is also removed.
If you need to keep the history such a situation, then do not implement the foreign key, but your application layer will need to ensure
to cascade any updates that occur to the primary table's ID. Unlike the rest of the columns, you cannot let the IDs diverge because
otherwise you do not know which row in the primary table that the history table relates to. Based on my experience, there is usually
no reason for the ID of a row to change but its something to be aware of.
Depending on your circumstances, sometimes it is easier for rows to have a "state" field that can be altered to mark the row as "deleted" than to actually delete the row. For example if a user deletes their account, you may wish to put into a "deleted" state rather than actually removing their data. That way the data is there if the user changes their mind at a later date. Such a scenario would allow you to keep the foreign key.
First published: 16th August 2018