Warning

This book is new. If you'd like to go through it, then join the Learn Code Forum to get help while you attempt it. I'm also looking for feedback on the difficulty of the projects. If you don't like public forums, then email help@learncodethehardway.org to talk privately.

# Exercise 20: Binary Search Trees

In this exercise I'm going to teach you to translate an English description of a data structure into working code. You already know how to analyze the code for an algorithm or data structure using the master copy method. You also know how to read a p-code (psuedo-code) description of an algorithm. Now you will combine the two and learn how to break down a rather loose English description of the Binary Search Tree.

I'm going to start off right away and warn you to *not* visit the Wikipedia page when you do this exercise. The Binary Search Tree description on Wikipedia has mostly working Python code for the data structure, so it would defeat the point of this exercise. If you get stuck then you'll be able to read any resources you can, but first try to do it from just my descriptions here.

# Binary Search Trees

In Exercise 16 you saw how a Merge Sort takes a flat, linked list and seems to convert it into a tree of sorted parts. It keeps cutting the list into smaller pieces and then assembles the pieces back together by sorting lesser valued parts on the "left" and greater valued parts on the right. In a way, a Binary Search Tree (`BSTree`) is a data structure that does this sorting right away and never tries to keep the items in a list. The main usage for a `BSTree` is to use a tree to organize pairs of `key=value` nodes in order *ahead* of time, as you insert or delete them.

A `BSTree` does this by starting the tree at a root `key=value`, and having a right or left path (link). If you insert a new `key=value`, then the `BSTree`'s job is to start at the root and compare the `key` to each node: going left if your new key is less-than and going right if your key is greater-than. Eventually the `BSTree` finds a position in the tree that--should you follow the original path--will find it by following the same process. All operations after that do the same thing by comparing any keys to each node, moving left and right until it either finds the node or reaches a dead end.

In this way a `BSTree` is an alternative to a `Dictionary` from Exercise 17, and so it should have the same operations. The basic `BSTreeNode` would need `left`, `right`, `key`, and `value` attributes to create the tree structure. You may also need a `parent` attribute depending on how you do this. The `BSTree` then needs the following operations on a `root` `BSTreeNode`:

- get
- Given a
`key`, walk the tree to find the node, or return`None`if you reach a dead end. You go`left`if the given`key`is less-than the node's`key`. You go right if the`key`is greater-than the node's`key`. If you read a node with no left or right, then you're done and the node does not exist. There is a way to do this using recursion and using a while loop. - set
- This is nearly the same as
`get`except once you reach that dead end node you simply attach a new`BSTreeNode`on the`left`or`right`, thus extending the tree down one more branch. - delete
- Deleting nodes from a
`BSTree`is a complex operation, so I have a whole section just on`delete`. The short version is you have three conditions: the node is a leaf (no children), has one child, or has two children. If it's a leaf then just remove it. If it has one child, then replace it with the child. If it has two children, then it gets really complicated so read the section on deleting below. - list
- Walk the tree and print everything out. The important piece to
`list`is that you can walk the tree in different ways to produce different output. If you walk the`left`, then the`right`paths, you get something different than if you do the inverse. If you go all the way to the bottom and then print as you come up the tree toward`root`, you get yet another kind of output. You can also print the nodes as you go down the tree, from`root`to the "leaves". Try different styles to see which one does what.

# Deleting

Remember we have three conditions to deal with when deleting a node (which I'll call `D`):

- The
`D`node is a "leaf" node because it has no children (not`left`or`right`). Just remove it from the parent. - The
`D`node has only one child (either`left`or`right`but not both). In that case you can simply move the`value`of this child to the`D`node, then delete the child. That effectively replaces the`D`node with the child (or, "moves the child up"). - The
`D`node has both a`left`and`right`child, which means it's time to do some major surgery. First, find the minimum child of the`D.right`node called the`successor`. Set the`D.key`to the`successor.key`, and then do the same delete on this successor's children using its key.

You will most likely also need an operation for `find_minimum` and for `replace_node_in_parent` to perform those two operations. I mention that you *might* need a parent attribute depending on how you implement it. I'd say go with a `parent` node as that's easier in most cases.

Note

Everyone hates deleting from a tree. It's a complicated operation and even my favorite reference, "The Algorithm Design Manual" by Steven S. Skiena skips over it because the the implementation "looks a little ghastly". Don't be discouraged if you have a hard time figuring `delete` out.

# Exercise Challenge

You are to implement your `BSTree` using this purposefully vague description. Try not to look at too many references while you make a first attempt, then when you get stuck go read how others have done it. The point of this exercise is to attempt to solve a complicated problem from an admittedly terrible description of it.

The trick to solving this is to first translate the paragraphs of English into rough p-code. Then translate the rough p-code to more exact p-code. Once you have a more exact p-code you can then translate that into Python. Pay special attention to specific words as a single word in English could mean a great many things in Python. Sometimes you just have to make a guess and run your tests to see if that's probably the right one.

Tests will also be very important, and it might be a good idea to apply the "test first" method to this problem. You know what each of these operations should do, so you can write a test for it, then make the test work.

# Study Drills

- Can you develop a pathological test that does inserts in such a way as to make the
`BSTree`be nothing more than a fancy linked list? - What happens when you try to delete from this "pole" of a
`BSTree`? - How does the speed of the
`BSTree`compare to the speed of your newly optimized`Dictionary`? - How fast can you make the
`BSTree`using your performance analysis and tuning process?