Programster's Blog

Tutorials focusing on Linux, programming, and open source

Database Normalization - Infinite Hierarchy and Linked Lists

In some situations you don't have a set structure and need to allow an infinite hierarchy, such as in a linked list. A real-world example of this is with people. There is no set grandparents and parents layer, but every person's parents is someone else's child. In terms of a database, this means you can't have the following tables.

  • grandparents
  • parents
  • children

Instead, you need one table for person, and to somehow create the relationship between them. The first thing to pop into your head might be to just add extra columns for these relationships as shown below:

  • id
  • name,
  • mother_id - ID of another row in this table
  • father_id - ID of another row in this table

Unfortunately, this leads to all sorts of issues with maintaining data integrity. A much better approach is to use two tables and foreign keys. Thus, a database update will never leave you with a mother or father ID that points to nobody.

CREATE TABLE `person` (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `name` varchar(255) NOT NULL,
    ...
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `relationships` (
    `id` int unsigned NOT NULL AUTO_INCREMENT,
    `child` INT UNSIGNED NOT NULL,
    `mother` INT UNSIGNED NOT NULL,
    `father` INT UNSIGNED NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY(`child`),
    FOREIGN KEY (`child`) REFERENCES person(id) ON DELETE RESTRICT ON UPDATE CASCADE,
    FOREIGN KEY (`mother`) REFERENCES person(id) ON DELETE RESTRICT ON UPDATE CASCADE,
    FOREIGN KEY (`father`) REFERENCES person(id) ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Since a child can have only one father and one mother (at least genetically until very recently), I set a unique key on the child column. If your situation allows for the data type (in this case person), to have any number of "parents" (multiple inheritance) then you would just remove this restriction. If the number of parents is always fixed, then you would keep the restriction and have that many columns in the relationships table.

Having this structure ensures that all changes to the IDs "ripple through" appropriately, and that you never end up with "junk IDs" because you deleted a person and forgot to update their parents or children. The only disadvantage of this structure is that there is no way to force a relationship row to exist for every row in the persons table. Thus you could have a person who has no parents specified, but you have to start somewhere, and I guess in this case they would be Adam and Eve.