Construct an Undo Tree From a Linear Undo History

1. How does undo work in Emacs
2. Constructing the tree
3. Moving around the tree
4. Show me the code

Emacs comes with a powerful but arguably strange undo system, it considers the action of undo themselves undo-able, so instead of redo, you just undo a previous undo. This allows you to return to every previous buffer state, something a conventional undo system doesn’t guarantee. But Emacs’s undo history can easily get out of hand when you undo, then undo that undo, then undo the undo of undo… You lose your mental model of the undo history very quickly and end up holding the undo button until you see the desired buffer state.

One idea to take advantage of both worlds is to use an undo tree. An undo tree is easy to navigate and understand. The ubiquitous undo-tree.el is exactly for that. As a coding challenge, I have been thinking about how to construct a tree out of the linear undo record. That way we can avoid keeping an internal data structure of undos like undo-tree does. This post describes the way I figured out to do that.

1 How does undo work in Emacs

;; An example of `buffer-undo-list'.
(nil
 (11369 . 11377) ; Insertion from 11369 to 11377.
 nil
 (11332 . 11344) ; Insertion from 11332 to 11344.
 ("<li" . 11332) ; Deleteion of "<li" to 11322.
 nil
 (t 24653 33109 208947 953000))

Emacs keeps undo records in buffer-undo-list. Every time the buffer content changes, Emacs pushes multiple entries onto the list, each representing a change like insertion, deletion, etc. These entries are grouped by the nil entries as delimiters, so multiple actions can be undone at once. Here I’ll call a group of entries a “modification”.

For example, if I insert “ABCDE” into a buffer ¹, and undo twice, I’ll see “ABC” in the buffer, and two undo modifications would be pushed onto buffer-undo-list, one for deleting “E” and the other “D”. If we keep undoing, we will keep going back until all edits are undone.

To stop undo and start redo, we press C-g which breaks the undo chain ². All the undo records we just created are now considered ordinary modifications and further undo undoes these previous undo’s. So if we undo twice, we are back to “ABCDE”.

How does this “undo chain” work? When we invoke the first undo command, it sets pending-undo-list to the value of buffer-undo-list; further undo commands pop modifications from pending-undo-list and extend buffer-undo-list. And when we break the undo chain, the next undo command will once again set pending-undo-list to the value of buffer-undo-list.

Here’s what happens when branching occurs: if we undo to “ABC”, and insert “F”. Emacs simply pushes a new modification to buffer-undo-list, just like before.

Besides extending buffer-undo-list, Emacs also maps buffer states to their equivalents. After Emacs undid a modification, it maps the tip of buffer-undo-list to the tip of pending-undo-list in undo-equiv-table. This is the key to construct a tree from the linear undo list.

BTW, as shown in the figure, pending-undo-list and buffer-undo-list are really pointing to the same list object, just different cons cells in that list.

2 Constructing the tree

Here is an example undo tree and the corresponding undo list. The undo list can be viewed as “wrapping around” the tree. To “construct” the tree out of the undo list, we need to know:

Which node to show. In the list, node 5 and 6 are duplicates of earlier nodes and don’t need to appear in the tree.
Establish parent-child relationships between nodes. Starting with node 0, it needs to know node 1 is its child, node 1 needs to know 2 is its child, and so on.

Both are easy to figure out with the help of undo-equvi-table: For a modification m (the modification that creates buffer state m), if it is an ordinary change, buffer state m must be a child of m-1; if m is an undo change, then buffer state m must be equivalent to a previous node, say, n. m equivalent to n means 1) we don’t draw m in the tree, only n, and 2) children of m are children of n.

In this example, 2 is an ordinary modification, so 2 is a child of 1. 6 is an undo modification and is equivalent to 2, so we don’t draw 6, only 2; and 6’s child, 7, becomes 2’s child.

So it turns out that constructing the tree is simple: we first go over buffer-undo-list to generate a list of modifications. Then we go over the modification list, identify equivalent nodes and establish parent-child relationships. In the end, we can draw out the tree starting from the first node, either depth-first or breadth-first.

3 Moving around the tree

Drawing out the tree is only half the story, the undo tree isn’t of any use if we can’t go back and forth in time by moving around on the tree. Say we are at node m and want to move to node n. What should Emacs do to bring us back?

My immediate thought is to just repeatedly call undo until we are at node n. It works, but only within 10 yards: simple movements could easily explode the undo list. For example, suppose we are at node 1 and move back to node 0, what happens to the undo list?

We need to undo from 9 all the way back to 0, going back and forth between 1 and 3. Worse, if we now want to go from 0 to 1, we need to undo from 18 to 1. The undo list doubles every time we move back and forth between 0 and 1.

Hmm, if we are in an undo chain, undoing from 1 to 0 would be so much easier: pending-undo-list would point at 1 (instead of 9), and we simply undo modification 1. Why don’t we just do that, regardless of whether we are in an undo chain? During chained undo, undo pops modifications from pending-undo-list and feeds them into primitive-undo. We can similarly find the modification between 0 and 1 and just feed it to primitive-undo.

Let’s look at it more closely. If I want to go from buffer state 1 to 2, how do I find a list of modifications to feed to primitive-undo? We can feed it modification 5 to go 5–4, or modification 9 to go 9–8, both of them move us back to the buffer state at node 2. Or we can even go 9–8–7–6–5–4, or 5–4–3–2.

So, to find a valid “route” from m to n, it needs to satisfy: 1) the start is equivalent to m and the destination is equivalent to n, and 2) the start is older than the destination, i.e., start > end, because primitive-undo can only take us backwards in the undo list. Once we found all the valid routes, we pick the shortest one and feed it to primitive-undo, teleport!

3.1 A small problem

Even though we can move between nodes with minimum steps, it is still possible to extend the undo list indefinitely by simply moving around. For example, jiggling between nodes 0 and 1 in this tree keeps growing the undo list.

The previous example looks innocuous because there is only one modification between 0 and 1. It is another story, however, to move between two distant nodes. Consider this example: moving between the tip of the two branches, node 4 and node 387, appends hundreds of modifications to the undo list. If we move back and forth between them, buffer-undo-list quickly grows to thousands of modifications and slows down the processing of it. Worse, because the capacity of buffer-undo-list is limited, we might lose old records.

Let’s come back to this tree. Don’t all those modifications on the bottom look excessive? We don’t lose any information if we trim them off. The question is, how to identify nodes that can be trimmed off without damaging the undo tree?

We have mentioned that there are two types of modifications: ordinary modifications created by the user doing some edit, and undo modifications by undoing a previous modification. Among these two types, only ordinary modifications create new buffer states. Undo modifications, on the other hand, only bring us back to previous states. That means the complete undo tree is preserved as long as our undo list preserves all the ordinary modifications.

Consider this tree where black circles represent ordinary modifications and white circles represent undo modifications. The node m is the last ordinary modification in the undo list. We only need to preserve the undo list from the beginning to m to preserve the undo tree, plus the undo list from m to the green node to bring us to the current position. Any modifications beyond the green node is dispensable. In other words, we keep all the ordinary modifications in the undo list and trim after that at the earliest node corresponding to the current position.

Now we can move around the tree efficiently, and the length of buffer-undo-list is bound. My crystal ball tells me there is only one problem left.

3.2 Another small problem

As a bonus service for its loyal customers, the garbage collector trims buffer-undo-list automatically.

I’m flattered, does that mean some cons cells will quietly disappear when the lisp machine decides to collect some garbage in the middle of my code? Luckily, that’s not the case. The garbage collector doesn’t release cells in buffer-undo-list when there are references to them besides buffer-undo-list ³. Since the data structure we generated refers to cons cells in the undo list, the boys are safe as long as we hold on to our data structures.

Of course, we can’t let the undo tree grow forever. And when we let the garbage collector trim the undo list, it inevitably damages our precious little tree.

Consider this undo tree, if the garbage collector releases the first two modifications, then they don’t end up in our modification list. In this case, we don’t regard the last two modifications as undos anymore, they are now normal edits. This is a bit weird as we now have two branches, but that’s just the fact of life.

4 Show me the code

;; Simplified definition of `vundo-m'.
(cl-defstruct vundo-m
  ;; As a modification in the mod list:
  idx
  undo-list
  ;; A doubly-linked list of equivalent states:
  prev-eqv
  next-eqv
  ;; As a node in the tree:
  children
  parent
  point)

Github, and local backup. The package requires the current development branch of Emacs, i.e., Emacs 28. The flow of the program is roughly:

Kick-start the process in vundo--refresh-buffer. It determines if we are generating everything from scratch, or incrementally updating our data.
Generate a list of modifications from buffer-undo-list by vundo--mod-list-from.
Build the tree by vundo--build-tree
Draw the tree by vundo--draw-tree.
Move around by vundo--move-to-node. It also trims buffer-undo-list.

Each modification is stored in a vundo-m struct, it also represents the corresponding buffer state and the corresponding node in the tree.

Footnotes:

Normally when you insert “ABCDE”, the individual changes are amalgamated into one. Here, for demonstration’s sake, we assume each insertion creates a separate record.

Any command other than undo breaks the undo chain.

Although the cons cells are not released, buffer-undo-list does shrink. That’s fine because all the information we need are stored in our data structures, which don’t change.