Programster's Blog

Tutorials focusing on Linux, programming, and open source

MySQL - History Tables

Keeping a history of your data can be immensely useful, such as for reverting silly mistakes, or for auditing purposes. This tutorial will show you a really simple way to achieve this in a generic manner that can be applied to any table. You will be able to see what data and when, as well as return to any specific revision or point in time quickly and easily.

Preparation

I am assuming you already have a dev database to work with.

Run the following statements to create a table of data that we are going to demonstrate with throghout this tutorial.

CREATE TABLE `user_comments` (
    `id` int UNSIGNED NOT NULL AUTO_INCREMENT,
    `comment` text NOT NULL,
    `author_id` int NOT NULL,
    `modified_timestamp` timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `user_comments`
(`comment`, `author_id`) VALUES
("hello world", 1),
("foo bar", 2);

If your table's don't already have the timestamp, or author fields, then I recommend that you add them. Without the timestamp field, you will only be able to go back to a specific revision number rather than a certain point in time. Without the author_id field, you will not know who made the changes.

Steps

The first thing we need to do is clone the original table's schema to create our history table.

CREATE TABLE `user_comments_history` LIKE `user_comments`;

By using a suffix of _history rather than a prefix, you keep your history tables beside the ones they track in your database list.

From this point on, there are two main ways of structuring your history tables. For both of these options, a new field will be added to the history table which will be called history_id.

In option 1, the new history_id field will be the primary key, and id in the history table will just be a data field containing the value of and referencing the ID in the main table. This has the advantage of being very simple for copying data rows to/from the history table as the data in the columns remain exactly the same.

In option 2, the id field in the history table will remain as the primary key, and a new field (history_id) will reference the id in the original table. The advantage of this is that the schema doesn't change as id remains the primary key, but you now have to move data between the id and history_id fields when moving rows. You may find this option simpler if you rename history_id to row_id or primary_table_id. I am just keeping the name the same between the two options for this tutorial.

Option 1

Run the following steps to alter the history table to how we need it.

ALTER TABLE `user_comments_history`
MODIFY COLUMN `id` INT UNSIGNED NOT NULL;

ALTER TABLE `user_comments_history` DROP PRIMARY KEY;

ALTER TABLE `user_comments_history`
ADD COLUMN `history_id` INT UNSIGNED NOT NULL;

ALTER TABLE `user_comments_history`
ADD CONSTRAINT PRIMARY KEY (`history_id`);

ALTER TABLE `user_comments_history`
MODIFY `history_id` INT UNSIGNED NOT NULL AUTO_INCREMENT;

It seems long-winded but unfortunately adding the primary key to a new column has to be done in that many steps.

Adding A Foreign Key

It may be a good idea to add a foreign key to enforce the relationship between the history table and the rows in the original table.

ALTER TABLE `user_comments_history`
ADD CONSTRAINT fk_id FOREIGN KEY (id) REFERENCES `user_comments`(id) ON UPDATE CASCADE ON DELETE CASCADE;

Updating A Row

Now when we wish to update a row, we need to insert into the history table first. The code below shows how to do this in a single transaction so that both have to go through or neither. We are going to change the first row's comment from hello world to hello earth.

START TRANSACTION;

# Insert the current row into the history table
INSERT INTO `user_comments_history` (`id`, `comment`, `author_id`, `modified_timestamp`)
    SELECT
      `id` as `id`,
      `comment` as `comment`,
      `author_id` as `author_id`,
      `modified_timestamp` as `modified_timestamp`
    FROM `user_comments`
    WHERE `comment` = 'hello world';

# Now run the update
UPDATE `user_comments`
SET `comment`='hello earth'
WHERE `comment` = 'hello world';

# Commit the transaction
COMMIT;

Restoring To A Point In Time

If you know that your data was fine on the 11th of July 2016 and you want to retrieve the data from that point in time, then just use the following query (assuming we want the history of row 1 in the primary table):

SELECT * FROM `user_comments_history`
WHERE `id`= 1
AND `modified_timestamp` < "2016-11-30"
LIMIT 1
ORDER BY `modified_timestamp` DESC

However, to change back to that point in time, then run the following transaction:

START TRANSACTION;

# Insert the current row into the history table before reverting
INSERT INTO `user_comments_history` (`id`, `comment`, `author_id`, `modified_timestamp`)
    SELECT
      `id` as `id`,
      `comment` as `comment`,
      `author_id` as `author_id`,
      `modified_timestamp` as `modified_timestamp`
    FROM `user_comments`
    WHERE `id`= 1

# Now run the update
UPDATE `user_comments` dest,
(
  SELECT * FROM `user_comments_history`
  WHERE `id`= 1
  AND `modified_timestamp` < "2016-11-30"
  ORDER BY `modified_timestamp` DESC
  LIMIT 1
) src
SET
dest.comment = src.comment,
dest.author_id = dest.author_id
WHERE dest.`id`= 1;

# Commit the transaction
COMMIT;

Option 2

If you believe that the id of every table needs to be the primary key, you can do the following instead:

ALTER TABLE `user_comments_history`
ADD COLUMN `history_id` INT UNSIGNED NOT NULL;

It may be a good idea to add a foreign key to enforce the relationship between the history table and the rows in the original table.

ALTER TABLE `user_comments_history`
ADD CONSTRAINT fk_id FOREIGN KEY (history_id) REFERENCES `user_comments`(id);

Updating A Row

Now when we wish to update a row, we need to insert into the history table first. The code below shows how to do this in a single transaction so that both have to go through or neither. We are going to change the first row's comment from hello world to hello earth.

START TRANSACTION;

# Insert the current row into the history table
INSERT INTO `user_comments_history` (`history_id`, `comment`, `author_id`, `modified_timestamp`)
    SELECT `id` as `history_id`, `comment` as `comment`, `author_id` as `author_id`, `modified_timestamp` as `modified_timestamp`
    FROM `user_comments`
    WHERE `comment` = 'hello world'
;

# Update the row int he primary table
UPDATE `user_comments`
SET `comment`='hello earth'
WHERE `comment` = 'hello world';

# Commit the transaction
COMMIT;

Restoring To A Point In Time

If you know that your data was fine on the 11th of July 2016 and you want to retrieve the data from that point in time, then just use the following query (assuming we want the history of row 1 in the primary table):

SELECT * FROM `user_comments_history`
WHERE `history_id`= 1
AND `modified_timestamp` < "2016-11-30"
LIMIT 1
ORDER BY `modified_timestamp` DESC

However, to change back to that point in time, then run the following transaction:

START TRANSACTION;

# Insert the current row into the history table before reverting
INSERT INTO `user_comments_history` (`history_id`, `comment`, `author_id`, `modified_timestamp`)
    SELECT
      `id` as `history_id`,
      `comment` as `comment`,
      `author_id` as `author_id`,
      `modified_timestamp` as `modified_timestamp`
    FROM `user_comments`
    WHERE `id`= 1

# Now run the update
UPDATE `user_comments` dest,
(
  SELECT * FROM `user_comments_history`
  WHERE `history_id`= 1
  AND `modified_timestamp` < "2016-11-30"
  ORDER BY `modified_timestamp` DESC
  LIMIT 1
) src
SET
dest.comment = src.comment,
dest.author_id = dest.author_id
WHERE dest.`id`= 1;

# Commit the transaction
COMMIT;

References

Appendix

To Use A Foreign Key or Not

It may be a good idea to add a foreign key to enforce the relationship between the history table and the rows in the original table. However if you do this, then it must be the case that if you DELETE a row from the primary table, its entire history is also removed. If you need to keep the history such a situation, then do not implement the foreign key, but your application layer will need to ensure to cascade any updates that occur to the primary table's ID. Unlike the rest of the columns, you cannot let the IDs diverge because otherwise you do not know which row in the primary table that the history table relates to. Based on my experience, there is usually no reason for the ID of a row to change but its something to be aware of.

Depending on your circumstances, sometimes it is easier for rows to have a "state" field that can be altered to mark the row as "deleted" than to actually delete the row. For example if a user deletes their account, you may wish to put into a "deleted" state rather than actually removing their data. That way the data is there if the user changes their mind at a later date. Such a scenario would allow you to keep the foreign key.