# BST

November 5, 2009My friend Kototama and I are both learning functional languages, and exchanging or comparing code for some well-known algorithms. The last to date is an insertion in a binary search tree (BST).

The most intuitive algorithm is recursive, but not *tail-recursive*. One cannot assume a BST to be balanced, and it behaves as a simply linked list in the worst case. Having a tail-recursive algorithm will guarantee the insertion, even in a very deep tree.

*Example of tail recursion*

The data structure is quite simple: A BST containing elements of type *a* is either an empty leaf (Nil) or a Node with 3 items: an element of type *a* (elt), and 2 sub-nodes which are BSTs containing *a*s: l for *left* and r for *right*.

```
data Bst a = Nil | Node {
elt :: a,
l :: Bst a,
r :: Bst a} deriving Show
```

### Intuitive algorithm ¶

```
insert_ntr Nil x = Node x Nil Nil
insert_ntr n x | x < elt n = n {l = insert_ntr (l n) x}
insert_ntr n x | x > elt n = n {r = insert_ntr (r n) x}
insert_ntr n x | otherwise = n
```

The second line tells us that inserting x in n if x < n is the same node n with its left branch equal to the insertion of x in the left of n.

Let’s take the numbers 4, 3, 1, 2 and insert them in a BST of integers, with a root equal to Nil.

- insert Nil 4 → this gives us a new root
- insert (Node 4 Nil Nil) 3 → this inserts 3 under 4 on the left.
- insert (Node 4 (Node 3 Nil Nil) Nil) 1 → this inserts 1 under 3 on the left.
- insert (Node 4 (Node 3 (Node 1 Nil Nil) Nil) Nil) 2 → this inserts 2 under 1 on the right.

The resulting tree is the following:

```
Node {elt = 4, l = Node {elt = 3, l = Node {elt = 1, l = Nil,
r = Node {elt = 2, l = Nil, r = Nil}}, r = Nil}, r = Nil}
```

### Tail-recursive algorithm ¶

Suppose we already have 3 of the previous nodes (4, 3, 1), and want to insert our final element, the number 2. We know that 2 is going to the left of 4, but where exactly is not known yet. So in effect, the root node is a function of its left branch. It is a function that takes a tree (not sure what it is yet), attaches it to the left of number 4, keeps the right branch intact, and returns the whole thing as a root.

In turn, this left branch is the number 3, with nothing to its right and something that is going to change to its left, since 2 < 3. So the node 3 is a function of its left branch.

We go like this all the way to the node with 1, which is a function of its right branch.

- root = λ a → Node 4 a Nil
- a = λ b → Node 3 b Nil
- b = λ c → Node 1 Nil c
- c = Node 2 Nil Nil

So there we have the general algorithm: if we compose these functions, we get a complete root The key is to consider the root as a function of one of its branch, which is in turn a function of one of its branch, etc. By composing those functions step by step, we create a root as a function of the node to insert, which is done at the leaf level. The first function is composed with *id*, which is the *identity function*.

The last step is to attach the new leaf:

root = (λ c → (Node 4 (Node 3 (Node 1 Nil c) Nil) Nil) (Node 2 Nil Nil)

root = (Node 4 (Node 3 (Node 1 Nil (Node 2 Nil Nil)) Nil) Nil)

### In Haskell: ¶

```
insert n x = recur n x id where
recur Nil x fun = fun $ Node x Nil Nil
recur n x fun | x < elt n = recur (l n) x $ fun . (\k -> n {l = k})
recur n x fun | x > elt n = recur (r n) x $ fun . (\k -> n {r = k})
recur n x fun | otherwise = n
```

The whole code is on github’s gist; I find it very beautiful.