BST

November 5, 2009

My friend Kototama and I are both learning functional languages, and exchanging or comparing code for some well-known algorithms. The last to date is an insertion in a binary search tree (BST).

The most intuitive algorithm is recursive, but not tail-recursive. One cannot assume a BST to be balanced, and it behaves as a simply linked list in the worst case. Having a tail-recursive algorithm will guarantee the insertion, even in a very deep tree.

Tail-recursive cat

Example of tail recursion

The data structure is quite simple: A BST containing elements of type a is either an empty leaf (Nil) or a Node with 3 items: an element of type a (elt), and 2 sub-nodes which are BSTs containing as: l for left and r for right.

data Bst a = Nil | Node {
    elt :: a,
    l :: Bst a,
    r :: Bst a} deriving Show

Intuitive algorithm

insert_ntr Nil x = Node x Nil Nil
insert_ntr n x | x < elt n = n {l = insert_ntr (l n) x}
insert_ntr n x | x > elt n = n {r = insert_ntr (r n) x} 
insert_ntr n x | otherwise = n 

The second line tells us that inserting x in n if x < n is the same node n with its left branch equal to the insertion of x in the left of n.

Let’s take the numbers 4, 3, 1, 2 and insert them in a BST of integers, with a root equal to Nil.

  1. insert Nil 4 → this gives us a new root
  2. insert (Node 4 Nil Nil) 3 → this inserts 3 under 4 on the left.
  3. insert (Node 4 (Node 3 Nil Nil) Nil) 1 → this inserts 1 under 3 on the left.
  4. insert (Node 4 (Node 3 (Node 1 Nil Nil) Nil) Nil) 2 → this inserts 2 under 1 on the right.

The resulting tree is the following:

Node {elt = 4, l = Node {elt = 3, l = Node {elt = 1, l = Nil,
     r = Node {elt = 2, l = Nil, r = Nil}}, r = Nil}, r = Nil}

Tail-recursive algorithm

Suppose we already have 3 of the previous nodes (4, 3, 1), and want to insert our final element, the number 2. We know that 2 is going to the left of 4, but where exactly is not known yet. So in effect, the root node is a function of its left branch. It is a function that takes a tree (not sure what it is yet), attaches it to the left of number 4, keeps the right branch intact, and returns the whole thing as a root.
In turn, this left branch is the number 3, with nothing to its right and something that is going to change to its left, since 2 < 3. So the node 3 is a function of its left branch.
We go like this all the way to the node with 1, which is a function of its right branch.

  1. root = λ a → Node 4 a Nil
  2. a = λ b → Node 3 b Nil
  3. b = λ c → Node 1 Nil c
  4. c = Node 2 Nil Nil

So there we have the general algorithm: if we compose these functions, we get a complete root The key is to consider the root as a function of one of its branch, which is in turn a function of one of its branch, etc. By composing those functions step by step, we create a root as a function of the node to insert, which is done at the leaf level. The first function is composed with id, which is the identity function.

The last step is to attach the new leaf:
root = (λ c → (Node 4 (Node 3 (Node 1 Nil c) Nil) Nil) (Node 2 Nil Nil)
root = (Node 4 (Node 3 (Node 1 Nil (Node 2 Nil Nil)) Nil) Nil)

In Haskell:

insert n x = recur n x id where
    recur Nil x fun = fun $ Node x Nil Nil
    recur n x fun | x < elt n = recur (l n) x $ fun . (\k -> n {l = k})
    recur n x fun | x > elt n = recur (r n) x $ fun . (\k -> n {r = k})
    recur n x fun | otherwise = n 

The whole code is on github’s gist; I find it very beautiful.