istoria

January 10, 2016

Abstract

istoria, latin spelling of the Greek word “ιστορία” which means history, is a collection of database objects, packaged as a PostgreSQL extension, that implements non-linear multi-timeline undo/redo at the table level. It persists a table's history, and provides an API for viewing, traversing, and jumping to any point in the history. This jumping to any point in the history makes its undo/redo “non-linear”. New table transactions can continue after such jumps, creating multiple timelines (branches) in the history.



Requirements and Installation

istoria uses features first introduced with PostgreSQL 9.4.0 so your database installation must be at that version or higher. To install the extension, get it from GitHub, go to the istoria top level directory and do:

make ext
sudo make install

Login to PostgreSQL with a superuser account and do:

begin;
create schema istoria;
grant usage on schema istoria to public;
create extension istoria with schema istoria;
-- if all went well, commit;

You should now have the extension installed, with all its objects within the istoria schema. Log out of the PostgreSQL superuser account and log in with your normal account. Add the istoria schema to the end of your search_path. If you are not familiar with the search_path you probably want to read about it here, or set your search path for now with:

set search_path to public, istoria;

You are now ready for some history gymnastics.

Hello World

Time for a small test. Create a table called hello, making sure it has a primary key:

create table hello (id serial, msg text);
alter table hello add constraint hello_pk primary key (id);

Install the history trigger:

drop trigger if exists hello_history_tr on hello;
create trigger hello_history_tr after insert or update or delete on hello
for each row
when (pg_trigger_depth() < 1)
execute procedure history_trigger_func('[]');

Now do some inserts:

insert into hello(msg) values('hi from Athens');
insert into hello(msg) values('hi from New York');
insert into hello(msg) values('hi from Tokyo');
insert into hello(msg) values('hi from Paris');

select * from hello;
 id |       msg        
----+------------------
  1 | hi from Athens
  2 | hi from New York
  3 | hi from Tokyo
  4 | hi from Paris
(4 rows)

You can now undo a couple of inserts with:

select history_session_undo(1);
select history_session_undo(1);

select * from hello;
 id |       msg
----+------------------
  1 | hi from Athens
  2 | hi from New York
(2 rows)

Redo one insert with:

select history_session_redo(1);

select * from hello;
 id |       msg
----+------------------
  1 | hi from Athens
  2 | hi from New York
  3 | hi from Tokyo
(3 rows)

Sessions

Suppose we are working on an editor for designing products. Each product is a collection of drawings, and we want to have complete drawing edit history per product. Lets create the table:

create table drawing (drawing_id serial, product_id integer, drawing_geom text);
alter table drawing add constraint drawing_pk primary key (drawing_id);

Drawings in the drawing table with the same product id are said to define a session. Think about undo/redo operations–we wouldn't want them to jump across products, they must be confined within a given session. A session is defined by the base table name and by an array of column names. This information is fixed when we install the history trigger on the table:

drop trigger if exists drawing_history_tr on drawing;
create trigger drawing_history_tr after insert or update or delete on drawing
for each row
when (pg_trigger_depth() < 1)
execute procedure history_trigger_func('["product_id"]');

We have just installed the history trigger on the drawing table passing a json array containing one column name, product_id. istoria now knows that transactions to the drawing table are tracked per product_id. Time to test it:

-- user A is working on product_id 1
insert into drawing(product_id, drawing_geom) values(1, 'point(1 2, 4 5)');
insert into drawing(product_id, drawing_geom) values(1, 'point(3 2, 6 5)');

-- user B is working on product_id 2
insert into drawing(product_id, drawing_geom) values(2, 'point(1 7, 8 9)');
insert into drawing(product_id, drawing_geom) values(2, 'point(4 8, 3 1)');

select * from history_sessions;
 session_id | table_name |      session      | active_action_id 
------------+------------+-------------------+------------------
          1 | hello      | {}                |                3
          2 | drawing    | {"product_id": 1} |                6
          3 | drawing    | {"product_id": 2} |                8
(3 rows)

The history_sessions table shows us that there are two sessions on the drawing table, one for product 1 and one for product 2. The session column is a json object that stores key/value pairs. Notice the hello table has an empty session column (no key/value pair) because we didn't pass any column name when we installed the history trigger, thus undo/redo operations operate on the entire hello table.

Time for an undo on the drawing table:

-- user A decides to undo her last edit on product 1; this is session 2
select history_session_undo(2);

select * from drawing;
 drawing_id | product_id |  drawing_geom   
------------+------------+-----------------
          1 |          1 | point(1 2, 4 5)
          3 |          2 | point(1 7, 8 9)
          4 |          2 | point(4 8, 3 1)
(3 rows)

Timelines

We are now ready for a more complex example that demonstrates how we can jump anywhere in the history. Suppose we are now designing charts, with each chart comprised of several sectors.

create table sector (
    id bigserial,
    chart_id bigint not null,
    sector_name text not null,
    sector_geom text not null, -- a geometry object, in reality a PostGIS geometry object
    gui_editor_action text not null -- the editor can put a note here to describe what it did
);
alter table sector add constraint sector_pk primary key (id);

drop trigger if exists sector_history_tr on sector;
create trigger sector_history_tr after insert or update or delete on sector
for each row
when (pg_trigger_depth() < 1)
execute procedure history_trigger_func('["chart_id"]');

insert into sector (chart_id, sector_name, sector_geom, gui_editor_action)
values(1, 'S010', 'polygon(1 2, 3 4, 1 2)', 'add sector');

insert into sector (chart_id, sector_name, sector_geom, gui_editor_action)
values(1, 'S020', 'polygon(9 9, 4 5, 9 9)', 'add sector');

update sector
set sector_geom = 'polygon(1 2, 3 4, 4 5, 1 2)',
    gui_editor_action = 'add point'
where id = 1;

update sector
set sector_name = 'S011',
    gui_editor_action = 'sector renamed'
where id = 1;

select
action_id, ancestors, indent, parent_action_id, active,
timeline_id, timeline_root, action, new->'sector_name' as name,
new->'gui_editor_action' as editor_action
from history_session_actions(4);

 action_id | ancestors | indent | parent_action_id | active | timeline_id | timeline_root | action |  name  |  editor_action   
-----------+-----------+--------+------------------+--------+-------------+---------------+--------+--------+------------------
        10 | {}        |      0 |                  | f      |           5 |             5 | I      | "S010" | "add sector"
        11 | {}        |      0 |                  | f      |           5 |             5 | I      | "S020" | "add sector"
        12 | {}        |      0 |                  | f      |           5 |             5 | U      | "S010" | "add point"
        13 | {}        |      0 |                  | t      |           5 |             5 | U      | "S011" | "sector renamed"
(4 rows)

The call to the history_session_actions() function returns the history tree for chart 1 (session 4). We can now use the history_session_set_active_action() function to jump somewhere in the history.

-- go to action 11 (the second sector added)
select history_session_set_active_action(4, 11);

select * from sector;

 id | chart_id | sector_name |      sector_geom       | gui_editor_action 
----+----------+-------------+------------------------+-------------------
  2 |        1 | S020        | polygon(9 9, 4 5, 9 9) | add sector
  1 |        1 | S010        | polygon(1 2, 3 4, 1 2) | add sector
(2 rows)

Now we add a new sector:

insert into sector (chart_id, sector_name, sector_geom, gui_editor_action)
values(1, 'S100', 'polygon(3 3, 4 5, 9 9, 3 3)', 'add sector');

select                                                                                                                  
action_id, ancestors, indent, parent_action_id, active,                                                                 
timeline_id, timeline_root, action, new->'sector_name' as name,                                                         
new->'gui_editor_action' as editor_action                                                                               
from history_session_actions(4);

 action_id | ancestors | indent | parent_action_id | active | timeline_id | timeline_root | action |  name  |  editor_action   
-----------+-----------+--------+------------------+--------+-------------+---------------+--------+--------+------------------
        10 | {}        |      0 |                  | f      |           5 |             5 | I      | "S010" | "add sector"
        11 | {}        |      0 |                  | f      |           5 |             5 | I      | "S020" | "add sector"
        14 | {11}      |      1 |               11 | t      |           6 |             5 | I      | "S100" | "add sector"
        12 | {}        |      0 |                  | f      |           5 |             5 | U      | "S010" | "add point"
        13 | {}        |      0 |                  | f      |           5 |             5 | U      | "S011" | "sector renamed"
(5 rows)

Since we backtracked to sector S020 (action 11) before we inserted sector S100 (action 14), there is now a new timeline (a new branch) that contains the insert action for the sector S100. This new timeline (6) has as its parent action 11.

A new timeline is created for actions occuring anywhere other than at the tail end of a timeline. Note that history_session_undo() can jump across timelines when walking the history tree backwards, however, history_session_redo() always stays on the same timeline. The active column above indicates which action is active in the session. If you attemp to jump from the active action of one session to an action belonging to another session you will get an error:

-- 5 is an action from session 2
select history_session_set_active_action(4, 5);

ERROR:  the new active_history_action_id is from session 2
HINT:  the actions can not be from different sessions
CONTEXT:  SQL statement "update history_sessions
    set active_action_id = p_action
    where session_id = p_session
    returning active_action_id"
PL/pgSQL function history_session_set_active_action(bigint,bigint) line 5 at SQL statement

The information returned by the history_session_actions() function is mainly for populating a TreeView widget to display the history tree in a GUI, but you can also use it to get an idea what the history tree looks like.

API

istoria's user API is comprised of the following functions:

  • history_session_actions(session)
  • history_session_set_active_action(session, action)
  • history_session_undo(session)
  • history_session_redo(session)
Furthermore, you can examine the state of actions, timelines, and sessions from the following read-only tables:
  • history_actions
  • history_timelines
  • history_sessions


history_session_actions(session)

This function returns all actions in the history tree for a session, in an order that is appropriate for populating a TreeView widget. The columns returned are:

action_id
The action id.
ancestors
For an action on a root timeline (a timeline without a parent action) it returns an empty array. For an action X not on a root timeline, it returns an array with the action ids of the parent action of each timeline starting from the root timeline and ending at the timeline of action X. So if action X is 11 and it belongs to timeline 5 that has a parent action of 8, which is on timeline 4 that has a parent action of 3 which is on timeline 1 with no parent, then this column with have the array {3,8}.
indent
A number, starting at 0 for actions on root timelines, that a GUI application may use to calculate how much to indent or in which column to show actions, thus graphically relating actions to timelines.
parent_action_id
The parent action id of the timeline the action belongs to, possibly null if on a root timeline.
active
Boolean flag indicating the current (active = true) location in the history tree as reflected by the state of the base table. This is affected by the history walking functions history_session_set_active_action(session, action), history_session_undo(session), and history_session_redo(session).
timeline_id
The id of the timeline that the action belongs to.
timeline_root
The root timeline that the action belongs to. This can be found by walking the history tree backwards until arriving to a timeline with no parent action.
action
Letter indicating the type of operation that describes the action. I for insert, U for update, D for delete.
old
A json object representing the old row; null for insert operations.
new
A json object representing the new row; null for delete operations.


history_session_set_active_action(session, action)

This function allows jumping to a particular action within a given session, and returns that action's id.


history_session_undo(session)

This function resets the active action for the given session to be the parent action of the current active action, and returns the new active action's id. It may return null if there is nothing to undo (at the beginning of time.) This function can jump timelines.


history_session_redo(session)

This function resets the active action for the given session to be the child action of the current active action, and returns the new active action's id. At a leaf action, repeated calls of this function will return the same leaf action id. This function does not jump timelines.


history_actions

This table stores all actions taken for all tables whose history is tracked. Its columns are:

action_id
The action id.
timeline_id
The id of the timeline that the action belongs to.
action
Letter indicating the type of operation that describes the action. I for insert, U for update, D for delete.
old
A json object representing the old row; null for insert operations.
new
A json object representing the new row; null for delete operations.


history_timelines

This table stores all timelines. Its columns are:

timeline_id
The timeline id.
session_id
The session id.
parent_action_id
The id of the timeline's parent action, null for root timelines.


history_sessions

This table stores all sessions. Its columns are:

session_id
The session id.
table_name
The name of the base table that the session applies to.
session
A json object that has the columns and values that define the session.
active_action_id
The id of the current active action in the history tree as reflected by the state of the base table. This is affected by the history walking functions.

Limitations

istoria is designed to implement history tracking for an object that is represented by a single table. Since walking the history tree often causes records in the tracked table to be deleted, there should not be referential integrity constraints from other tables to the tracked table.