

Build the Forest in Python Series: Binary Tree Traversal

[Updated: July 3, 2021]

This article is the second in the Build the Forest series. This time, we are not going to build a tree; instead, we will implement functions to traverse binary trees. When we use a data structure to manage data, one important use case is going over all the data we manage. That's why tree traversal is essential.

Traversal is a method of examining the nodes of a tree systematically so that each node is visited exactly once. There are several ways to traverse a binary tree: in-order, pre-order, post-order, and level-order. The names come from the relative position of the root to its subtrees. Tree traversal is also a form of graph traversal: in-order, pre-order, and post-order traversals are types of depth-first traversal, whereas level-order traversal is a type of breadth-first traversal. This article walks through the implementation of these traversal functions in both recursive and iterative forms.


Project Setup

As in Build the Binary Search Tree, the implementation assumes Python 3.9 or newer. In addition, we add two modules to our project: traversal.py for the traversal functions and test_traversal.py for their unit tests. After adding these two files, our project layout becomes the following:

    forest-python
    ├── forest
    │   ├── __init__.py
    │   ├── binary_trees
    │   │   ├── __init__.py
    │   │   ├── binary_search_tree.py
    │   │   └── traversal.py
    │   └── tree_exceptions.py
    └── tests
        ├── __init__.py
        ├── conftest.py
        ├── test_binary_search_tree.py
        └── test_traversal.py

(The complete code is available at forest-python)

Function Interface

The traversal functions we implement here aim to be as generic as possible. In theory, we can apply them to any binary tree whose nodes have left and right fields. However, nodes of different binary tree types may carry additional information. For instance, a threaded binary tree node has information that tells us whether a pointer leads to a real child or to a thread. In that case, when we traverse a threaded binary tree, we need to check this extra information to determine whether we have reached a leaf; we cannot just check whether the node's left or right field is empty.

Instead of adding logic for each tree type's special conditions, we define the supported tree types, so the client (i.e., a human or a linter) knows whether a tree type is supported. Our supported type definition looks like the following:

    from typing import Union

    from forest.binary_trees import binary_search_tree

    SupportedNode = Union[None, binary_search_tree.Node]
    SupportedTree = Union[binary_search_tree.BinarySearchTree]

Currently, we support the types None and Node for nodes, and only the type BinarySearchTree for trees; we will add more when we implement other kinds of binary trees.

With these type definitions, we can use type annotations to define our traversal functions (using in-order traversal as an example):

    from typing import Any, Iterator, Optional

    Pairs = Iterator[tuple[Any, Any]]
    """An iterator of key-value pairs, yielded by the traversal functions. For type checking."""


    def inorder_traverse(tree: SupportedTree, recursive: bool = True) -> Pairs:
        ...

Note that we also define a custom type, Pairs, for the return value. Although this requires writing more code, as The Zen of Python suggests, "Explicit is better than implicit." This way, the client can clearly see the correct parameter types and the type the functions return, and a type-checking tool (like mypy) can prevent type-mismatch issues. Besides, we will implement the depth-first traversals in both recursive and non-recursive ways, which is why we have the second parameter, recursive.

Why not @overload?

Python is a dynamically typed language, so function overloading does not make much sense in Python. But we can use type annotations to prevent type-mismatch issues.

Since Python 3.5, PEP 484 has provided the @overload decorator. According to the official documentation, "The @overload decorator allows describing functions and methods that support multiple different combinations of argument types." It sounds great. However, it exists for the benefit of type checkers only; at runtime, client code can still pass any parameters to the function. The documentation also positions it as a tool for "an overload that gives a more precise type than can be expressed using a union or a type variable." Since a union expresses our case precisely, we define the SupportedTree type using Union, which is also less code than writing several @overload-decorated definitions.

Why is the return type Pairs an iterator?

The idea is to implement the traversal functions as generators. One main benefit of doing so is lazy evaluation: each time a generator is resumed, it picks up where it left off and produces only the next item. This can save a lot of memory when a tree is huge, since the client code can process one item at a time. Besides, the client code becomes simpler and easier to read.
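For instance, here is a sketch of lazy consumption (the keys are the same ones our basic_tree test fixture uses; itertools.islice is from the standard library and stops the traversal after three pairs without visiting the whole tree):

    import itertools

    from forest.binary_trees import binary_search_tree
    from forest.binary_trees import traversal

    tree = binary_search_tree.BinarySearchTree()
    for key in [23, 4, 30, 11, 7, 34, 20, 24, 22, 15, 1]:
        tree.insert(key=key, data=str(key))

    # Only the first three pairs are ever produced; the generator
    # pauses after each yield instead of materializing the whole traversal.
    for key, data in itertools.islice(traversal.inorder_traverse(tree), 3):
        print(key, data)  # 1 1, 4 4, 7 7: the three smallest keys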

In-Order Traversal

In-order traversal visits a binary tree in the following way, where each node can be treated as the root of a subtree.

  1. Traverse to the left subtree.
  2. Visit the root.
  3. Traverse to the right subtree.

The picture below demonstrates the idea of in-order traversal. The yellow part is the left subtree of the root 23, and the green part is root 23's right subtree. The small numbers (i.e., 1, 2, 3) next to the nodes indicate the traversal order, so in-order traversal visits the tree in this order: 1 -> 2 -> 3 -> 2 -> 1 -> 2 -> 3 (the colors map to the picture below).

If the binary tree is also a binary search tree, the binary-search-tree property lets in-order traversal produce the keys in sorted order, as the picture above shows.

Binary tree traversal can be done using either recursion or an auxiliary stack. As mentioned in the previous section, we implement both recursive and non-recursive versions.

    def inorder_traverse(tree: SupportedTree, recursive: bool = True) -> Pairs:
        if recursive:
            return _inorder_traverse(node=tree.root)
        return _inorder_traverse_non_recursive(root=tree.root)

Because we want to hide the implementation details, we name the actual implementation functions (_inorder_traverse and _inorder_traverse_non_recursive) with a leading underscore, which marks them as internal functions.

Recursive Implementation

Traversing recursively is the most natural way to traverse a tree, since it follows the definition of in-order traversal directly, and can be implemented as follows:

    def _inorder_traverse(node: SupportedNode) -> Pairs:
        if node:
            yield from _inorder_traverse(node.left)
            yield (node.key, node.data)
            yield from _inorder_traverse(node.right)

Note that we use yield from to call _inorder_traverse recursively. That's because _inorder_traverse is a generator; to let a generator delegate part of its operations to another generator, we need yield from.

Non-Recursive Implementation

In in-order traversal, we visit the left subtree first, then the root, and finally the right subtree. So we push the right subtree first and then the root before we visit the left subtree; this push order is because a stack is a first-in-last-out data structure. After we visit the left subtree, we pop the stack to visit the root, and then pop again to visit the right subtree. We repeat these steps until the stack is empty.

To implement an in-order traversal function, we need a stack to keep the roots of the subtrees that we will visit later. When doing non-recursive traversal, the key questions are: 1. when do we push a node to and pop a node from the stack, and 2. when do we produce (i.e., visit) a node while traversing?

  1. When a node we are visiting has a right child, we push its right child to the stack and then push the node itself.
  2. When a node is popped from the stack, we produce it if it has no right child or if its right child is the same as the node on top of the stack.

We use Python's built-in list as a stack to implement the non-recursive in-order traversal because the built-in list already has the first-in-last-out ability: list.append(x) adds an item to the end of the list, and list.pop() removes and returns the last item.

    def _inorder_traverse_non_recursive(root: SupportedNode) -> Pairs:
        if root is None:
            return  # An empty tree yields nothing (PEP 479: do not raise StopIteration inside a generator).

        stack = []
        if root.right:
            stack.append(root.right)
        stack.append(root)
        current = root.left

        while True:
            if current:
                if current.right:
                    stack.append(current.right)
                    stack.append(current)
                    current = current.left
                    continue
                stack.append(current)
                current = current.left
            else:  # current is None
                if len(stack) > 0:
                    current = stack.pop()

                    if current.right is None:
                        yield (current.key, current.data)
                        current = None
                        continue
                    else:  # current.right is not None
                        if len(stack) > 0:
                            if current.right == stack[-1]:
                                yield (current.key, current.data)
                                current = stack.pop() if len(stack) > 0 else None
                                continue
                            else:  # current.right != stack[-1]
                                # There are more nodes on the right;
                                # keep the current node and go back to add them.
                                continue
                        else:  # stack is empty
                            break
                else:  # stack is empty
                    break

Reverse In-Order Traversal

In-order traversal produces output sorted in ascending order when traversing a binary search tree. If the tree is traversed in reverse in-order, the sorted result becomes descending. To traverse in reverse in-order, we visit the right subtree first and the left subtree last.

  1. Traverse to the right subtree.
  2. Visit the root.
  3. Traverse to the left subtree.

Reverse in-order traversal can also be implemented in both recursive and non-recursive ways, and the implementation is symmetric to in-order traversal.

    def reverse_inorder_traverse(tree: SupportedTree, recursive: bool = True) -> Pairs:
        if recursive:
            return _reverse_inorder_traverse(node=tree.root)
        return _reverse_inorder_traverse_non_recursive(root=tree.root)

Recursive reverse in-order traversal

    def _reverse_inorder_traverse(node: SupportedNode) -> Pairs:
        if node:
            yield from _reverse_inorder_traverse(node.right)
            yield (node.key, node.data)
            yield from _reverse_inorder_traverse(node.left)

Non-recursive reverse in-order traversal

    def _reverse_inorder_traverse_non_recursive(root: SupportedNode) -> Pairs:
        if root is None:
            return  # An empty tree yields nothing.

        stack = []
        if root.left:
            stack.append(root.left)
        stack.append(root)
        current = root.right

        while True:
            if current:
                if current.left:
                    stack.append(current.left)
                    stack.append(current)
                    current = current.right
                    continue
                stack.append(current)
                current = current.right
            else:  # current is None
                if len(stack) > 0:
                    current = stack.pop()

                    if current.left is None:
                        yield (current.key, current.data)
                        current = None
                        continue
                    else:  # current.left is not None
                        if len(stack) > 0:
                            if current.left == stack[-1]:
                                yield (current.key, current.data)
                                current = stack.pop() if len(stack) > 0 else None
                                continue
                            else:  # current.left != stack[-1]
                                # There are more nodes on the left;
                                # keep the current node and go back to add them.
                                continue
                        else:  # stack is empty
                            break
                else:  # stack is empty
                    break

Pre-Order Traversal

Pre-order traversal visits a binary tree in the following way.

  1. Visit the root.
  2. Traverse to the left subtree.
  3. Traverse to the right subtree.

The picture below shows the idea of pre-order traversal. The yellow part is the left subtree of the root 23, and the green part is root 23's right subtree. As with in-order traversal, the small numbers (i.e., 1, 2, 3) next to the nodes indicate the traversal order, and pre-order traversal visits the tree in this order: 1 -> 1 -> 2 -> 3 -> 1 -> 2 -> 3 (the colors map to the picture below).

We also implement the pre-order traversal in recursive and non-recursive ways.

    def preorder_traverse(tree: SupportedTree, recursive: bool = True) -> Pairs:
        if recursive:
            return _preorder_traverse(node=tree.root)
        return _preorder_traverse_non_recursive(root=tree.root)

Recursive pre-order traversal

The recursive implementation is straightforward: just follow the traversal order.

    def _preorder_traverse(node: SupportedNode) -> Pairs:
        if node:
            yield (node.key, node.data)
            yield from _preorder_traverse(node.left)
            yield from _preorder_traverse(node.right)

Non-recursive pre-order traversal

Regarding the non-recursive implementation, the picture below demonstrates non-recursive pre-order traversal.

Non-recursive pre-order traversal is simpler than non-recursive in-order traversal. Since we visit the root first, the process can be summarized in two steps:

  1. When we visit a node, we push its right child to the stack (if it has one) and then push its left child to the stack (if it has one).
  2. When a node is popped from the stack, we produce the node.
    def _preorder_traverse_non_recursive(root: SupportedNode) -> Pairs:
        if root is None:
            return  # An empty tree yields nothing.

        stack = [root]
        while len(stack) > 0:
            temp = stack.pop()
            yield (temp.key, temp.data)

            # Because the stack is FILO, insert the right child before the left child.
            if temp.right:
                stack.append(temp.right)
            if temp.left:
                stack.append(temp.left)

Post-Order Traversal

Post-order traversal visits a binary tree in the following way.

  1. Traverse to the left subtree.
  2. Traverse to the right subtree.
  3. Visit the root.

The picture below shows the idea of post-order traversal. As with the previous traversals, the small numbers (i.e., 1, 2, 3) next to the nodes indicate the traversal order, and post-order traversal visits the tree in this order: 1 -> 2 -> 3 -> 1 -> 2 -> 3 -> 3 (the colors map to the picture below).

Once again, we build the post-order traversal with both recursive and non-recursive approaches.

    def postorder_traverse(tree: SupportedTree, recursive: bool = True) -> Pairs:
        if recursive:
            return _postorder_traverse(node=tree.root)
        return _postorder_traverse_non_recursive(root=tree.root)

Recursive post-order traversal

As with the in-order and pre-order traversals, the recursive implementation is simple.

    def _postorder_traverse(node: SupportedNode) -> Pairs:
        if node:
            yield from _postorder_traverse(node.left)
            yield from _postorder_traverse(node.right)
            yield (node.key, node.data)

Non-recursive post-order traversal

The non-recursive implementation of post-order traversal, however, is a bit more complicated. The picture below demonstrates post-order traversal done in a non-recursive way.

  1. As with non-recursive in-order traversal, when a node we visit has a right child, we push its right child to the stack and then push the node itself.
  2. We produce a node popped from the stack if it has no right child or if the stack has become empty. If the popped node has a right child that differs from the node on top of the stack, the right subtree has already been visited, so we can also produce the node. Otherwise, we swap the top node of the stack with the current node and traverse the right subtree.
    def _postorder_traverse_non_recursive(root: SupportedNode) -> Pairs:
        if root is None:
            return  # An empty tree yields nothing.

        stack = []
        if root.right:
            stack.append(root.right)
        stack.append(root)
        current = root.left

        while True:
            if current:
                if current.right:
                    stack.append(current.right)
                    stack.append(current)
                    current = current.left
                    continue
                else:  # current.right is None
                    if current.left:
                        stack.append(current)
                    else:
                        yield (current.key, current.data)
                    current = current.left
            else:  # current is None
                if len(stack) > 0:
                    current = stack.pop()

                    if current.right is None:
                        yield (current.key, current.data)
                        current = None
                    else:  # current.right is not None
                        if len(stack) > 0:
                            if current.right != stack[-1]:
                                yield (current.key, current.data)
                                current = None
                            else:  # current.right == stack[-1]
                                temp = stack.pop()
                                stack.append(current)
                                current = temp
                        else:  # stack is empty
                            yield (current.key, current.data)
                            break
                else:  # stack is empty
                    break

Level-Order Traversal

Unlike the previous depth-first traversals, level-order traversal is breadth-first: we visit every node on a level before going to the next lower level. The idea is shown in the picture below.

Instead of using a stack, we use a queue to implement the level-order traversal function. A queue is a first-in-first-out data structure: we dequeue a node, visit it, and then enqueue the node's children. We repeat this until the queue is empty.

We also use Python's built-in list, this time as a queue, because it has the first-in-first-out ability as well: list.append(x) adds an item at the end of the list, and list.pop(0) removes and returns the first item. (Note that list.pop(0) takes linear time; collections.deque offers a constant-time popleft, but a plain list keeps the example simple.)

    def levelorder_traverse(tree: SupportedTree) -> Pairs:
        queue = [tree.root]

        while len(queue) > 0:
            temp = queue.pop(0)
            if temp:
                yield (temp.key, temp.data)
                if temp.left:
                    queue.append(temp.left)
                if temp.right:
                    queue.append(temp.right)
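If the linear-time list.pop(0) ever matters, here is a minimal variant backed by collections.deque (same interface as levelorder_traverse; this variant is a sketch and not part of the original series code):

    from collections import deque


    def levelorder_traverse_deque(tree: SupportedTree) -> Pairs:
        """Level-order traversal backed by a deque; popleft() runs in O(1)."""
        queue = deque([tree.root])

        while queue:
            temp = queue.popleft()
            if temp:
                yield (temp.key, temp.data)
                if temp.left:
                    queue.append(temp.left)
                if temp.right:
                    queue.append(temp.right)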

Test

As always, we should cover our code with unit tests as much as possible. Here, we use the basic_tree fixture in conftest.py that we created in Build the Binary Search Tree to test our traversal functions. In test_traversal.py, we check that the traversal output matches the expected order (using postorder_traverse as an example):

    def test_binary_search_tree_traversal(basic_tree):
        """Test binary search tree traversal."""
        tree = binary_search_tree.BinarySearchTree()
        for key, data in basic_tree:
            tree.insert(key=key, data=data)

        assert [item for item in traversal.postorder_traverse(tree)] == [
            (1, "1"),
            (7, "7"),
            (15, "15"),
            (22, "22"),
            (20, "20"),
            (11, "11"),
            (4, "4"),
            (24, "24"),
            (34, "34"),
            (30, "30"),
            (23, "23"),
        ]

(Check test_traversal.py for the complete unit test)

Analysis

Depth-First Traversal: In-Order, Reverse In-Order, Pre-Order, and Post-Order

Run-time of In-Order, Reverse In-Order, Pre-Order, and Post-Order Traversals

In graph algorithms, the running time of depth-first search is $O(V+E)$, where $V$ is the number of vertices and $E$ is the number of edges (for why the run time is $O(V+E)$, see Introduction to Algorithms, Chapter 22.3). Since in-order, reverse in-order, pre-order, and post-order traversals are types of depth-first traversal, we can map them to depth-first search, which gives a run time of $O(V+E)$. A vertex in a binary tree is a node, and a binary tree with $n$ nodes has exactly $n-1$ edges. Therefore, we can rewrite the run time as $O(n+(n-1))=O(2n-1)=O(n)$, where $n$ is the number of nodes.

Regarding the space complexity, we need to analyze it based on our approach: recursive or non-recursive.

Space Complexity in Recursive Approach

In the recursive approach, the space usage is the call stack, i.e., how deep the function calls itself. We traverse a binary tree from the root down to the leaf level, so the tree height determines the recursion depth. Therefore, for a binary tree with $n$ nodes, the average case is $O(\log_2 n)$ when the tree is randomly built, and the worst case is $O(n)$ when the tree is a linear chain.
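A quick way to observe the worst case (a hypothetical demonstration; it assumes insert itself is iterative, as it is in the series code, so only the traversal recurses):

    from forest.binary_trees import binary_search_tree
    from forest.binary_trees import traversal

    tree = binary_search_tree.BinarySearchTree()
    for key in range(2000):  # Sorted insertions build a 2000-node linear chain.
        tree.insert(key=key, data=str(key))

    try:
        list(traversal.inorder_traverse(tree, recursive=True))
    except RecursionError:
        # Roughly 2000 nested generator frames exceed CPython's default limit of 1000.
        print("recursive traversal exceeded the call stack on the degenerate tree")

    # The stack-based version walks the same degenerate tree without trouble.
    assert len(list(traversal.inorder_traverse(tree, recursive=False))) == 2000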

Space Complexity in Non-Recursive Approach

In the non-recursive approach, the space usage is the size of the auxiliary stack. As with the recursive approach, the stack size is bounded by the height of the tree. Therefore, for a binary tree with $n$ nodes, the average case is $O(\log_2 n)$ when the tree is randomly built, and the worst case is $O(n)$ when the tree is a linear chain.

Breadth-First Traversal: Level-Order

A breadth-first search algorithm has time complexity $O(V+E)$, where $V$ is the number of vertices and $E$ is the number of edges (see Introduction to Algorithms, Chapter 22.2, for details). Therefore, the running time of level-order traversal is also $O(n)$, where $n$ is the number of nodes, for the same reason as the depth-first traversals discussed above.

Since we use a queue to implement level-order traversal, the space cost is the queue's size; the queue tracks the nodes of the level being visited. The queue is at its largest when it holds the widest level of the tree. In the worst case, the tree is a perfect binary tree whose bottom level contains about half of the nodes, so the queue grows to $O(2^{\log_2 n})=O(n)$, where $n$ is the number of nodes.

However, the queue size's best case happens when each level has only one node, i.e., the tree is a linear chain. In this case, the space complexity becomes constant, i.e., $O(1)$.

The table below summarizes the complexity of each traversal implementation.

Traversal | Approach | Time Complexity | Space Complexity
--- | --- | --- | ---
In-Order | Recursive | $O(n)$ | Average case: $O(\log_2 n)$; worst case: $O(n)$
In-Order | Non-Recursive (stack) | $O(n)$ | Average case: $O(\log_2 n)$; worst case: $O(n)$
Reverse In-Order | Recursive | $O(n)$ | Average case: $O(\log_2 n)$; worst case: $O(n)$
Reverse In-Order | Non-Recursive (stack) | $O(n)$ | Average case: $O(\log_2 n)$; worst case: $O(n)$
Pre-Order | Recursive | $O(n)$ | Average case: $O(\log_2 n)$; worst case: $O(n)$
Pre-Order | Non-Recursive (stack) | $O(n)$ | Average case: $O(\log_2 n)$; worst case: $O(n)$
Post-Order | Recursive | $O(n)$ | Average case: $O(\log_2 n)$; worst case: $O(n)$
Post-Order | Non-Recursive (stack) | $O(n)$ | Average case: $O(\log_2 n)$; worst case: $O(n)$
Level-Order | Non-Recursive (queue) | $O(n)$ | Best case: $O(1)$; worst case: $O(n)$

Examples

Now that we have implemented the traversal routines, we can use them to implement the __iter__ method of the Map class from Binary Search Tree: Example and iterate over the data of a map object.

    from typing import Any, Optional

    from forest.binary_trees import binary_search_tree
    from forest.binary_trees import traversal


    class Map:
        """Key-value Map implemented by a Binary Search Tree."""

        def __init__(self) -> None:
            self._bst = binary_search_tree.BinarySearchTree()

        def __setitem__(self, key: Any, value: Any) -> None:
            """Insert a (key, value) item into the map."""
            self._bst.insert(key=key, data=value)

        def __getitem__(self, key: Any) -> Optional[Any]:
            """Get the data by the given key."""
            node = self._bst.search(key=key)
            if node:
                return node.data
            return None

        def __delitem__(self, key: Any) -> None:
            """Remove a (key, value) pair from the map."""
            self._bst.delete(key=key)

        def __iter__(self) -> traversal.Pairs:
            """Iterate the data in the map."""
            return traversal.inorder_traverse(tree=self._bst)

        @property
        def empty(self) -> bool:
            """Return `True` if the map is empty; `False` otherwise."""
            return self._bst.empty


    if __name__ == "__main__":
        # Initialize the Map instance.
        contacts = Map()

        # Add some items.
        contacts["Mark"] = "[email protected]"
        contacts["John"] = "[email protected]"
        contacts["Luke"] = "[email protected]"

        # Retrieve an email.
        print(contacts["Mark"])

        # Delete one item.
        del contacts["John"]

        # Check the deleted item.
        print(contacts["John"])  # This will print None.

        # Iterate the items.
        for contact in contacts:
            print(contact)

(The complete example is available at bst_map.py)

Summary

Binary tree traversals are methods to visit each node of a binary tree systematically, with each node visited exactly once. They are special cases of graph search algorithms (depth-first search and breadth-first search) and can be implemented with different approaches, each with its pros and cons. The following articles will introduce a modified binary search tree, the threaded binary search tree, which allows us to traverse the tree without a stack or recursion, i.e., with constant space complexity.

Leaf It Up To Binary Trees, by Vaidehi Joshi (basecs)

Most things in software can be broken up into smaller parts. Large frameworks are really just small pieces of functionality that have been built up to create a heavyweight code infrastructure. Object-oriented programming is really just a bunch of classes that inherit from one another. And classical inheritance and the "class hierarchy" is really just a hierarchical tree structure. And trees are data structures that are really just a bunch of nodes and links that are connected to one another.

See? Everything is a bunch of little things put together. Where it gets really interesting, however, is when we start looking at how those little pieces function under the surface — that is to say, all of the different ways that they can be used to build larger abstractions.

Last week, we learned about tree data structures, which are usually abstracted away but are important pieces of how larger things — like the file system on our computers — actually work under the hood. There are a few different types of tree structures that are used in programming, but the most common (and, arguably, the most powerful) one is a binary search tree.

Binary search trees, sometimes abbreviated as BSTs, are tree data structures that adhere to a set of very specific rules. Just from their name, you might already be able to guess what those rules are. The reason these trees are so powerful and end up being so useful is because of these specific rules. But, before we gush over binary search trees any further, let’s dig a little deeper into what they are, and what makes them different from any other tree out there!

We’re already familiar with the term binary since we’ve talked about it in the context of the binary (or base 2) number system. However, binary also has a simpler meaning: anything relating to or composed of two things. In the context of trees, this might seem a little odd at first. A tree can only have one single root — that’s one of its defining characteristics! So how does one tree become two? How does a tree become…binary, exactly?

In order to understand what puts the “binary” in binary search tree, we have to think back to one other characteristic of the tree data structure: recursiveness. Trees are recursive data structures, which means that a single tree is made up of many others.

The child node of a tree structure could also very well be the parent node to many other child nodes — which would effectively make it the root node of a mini subtree of the larger tree structure.

The recursive characteristic of a tree is crucial to how a binary search actually functions (not to mention, one of the reasons that it is so powerful).

Binary trees get their name from the way that they are structured; more specifically, they get their name from the way that their subtrees are structured. Every binary tree has a root node, just as we might expect. But after that, things get a bit more narrow and strict. A binary tree can only ever have two links, connecting to two nodes. This means that every parent node can only ever have two possible child nodes, and never any more than that.

But where does recursion come into the mix? Well, if every parent node, including the root node, can only ever have two child nodes, this means that the root node can only point to two subtrees. Thus, every binary tree will contain two subtrees within it: a left subtree and a right subtree. And this rule keeps applying as we go down the tree: both the left subtree and the right subtree are binary trees in and of themselves because they are recursively part of the larger tree. So, the left subtree’s root will point to two more trees, which means that it contains its very own left subtree and right subtree!

Every binary tree will contain two subtrees within it: a left subtree and a right subtree.

The recursive aspect of a binary search tree is part of what makes it so powerful. The very fact that we know that one single BST can be split up and evenly divided into mini-trees within it will come in handy later when we start traversing down the tree!

But first, let’s take a look at one more important rule that makes binary trees so special.

If you haven’t fully grasped the power of binary search trees yet, that’s because I haven’t shared their most important characteristic: a binary search tree’s nodes must be sorted in a specific way. In fact, the way that a binary search tree is organized and sorted is what makes it “searchable”.

In order for a BST to be searchable, all of the nodes to the left of the root node must be less than the value of the root node. You might be able to guess, then, that this must mean that all of the values to the right of the root node have to be greater than the root node.

If we look at this ordering not just in theory but actually in practice, a pattern starts to reveal itself. All of the subtrees to the left of a node will always be smaller in value than the subtrees to the right of a node — and of course, because of the recursive nature of trees, this applies not just to the main overarching tree structure, but to every single nested subtree as well.

Nodes in a binary search tree are organized and ordered by value

In the example tree above, all of the nodes to the left of the root node, which has a value of 26, are smaller than 26. But, if we look at the green subtree on the left, we’ll notice that even though 24 is smaller than 26, it is still bigger than 19, and so it goes to the right of the node 19.

Any tree can be a binary tree if each node has only two child nodes. It’s the ordering of nodes that makes a binary tree searchable and, by extension, what makes it so powerful.

The explicit ordering of nodes is what makes a binary search tree so easy to, well, search. And this plays into all of the basic operations one might want to perform on a tree, too. For example, let’s say we wanted to perform an insert() operation, which we would expect to be able to take a value and insert it into the correct position in the tree.

How would we insert 21 into this binary search tree?

We can pseudocode an insert() function pretty easily simply because of the rules of ordering of a BST:

  1. We start at the root node and compare the value of the root node (26) to the item we want to insert.
  2. Since 21 is less than 26, we immediately determine that the item we want to insert is going to live somewhere within the left subtree of the larger binary search tree.
  3. We look at the new "root" node: 19. We know that the item we want to insert, 21, is greater than 19. So, we move to the right child of node 19, since we know the item we are inserting is larger and has to be in the right subtree.
  4. Now we come to a leaf of the subtree: the node 24, which is bigger than 21. We need to insert our item somewhere here, but we also need to make sure that the node with a value of 24 ends up in the correct place.
  5. We set the node we're inserting, 21, to point its right pointer reference to the pre-existing node 24, since 24 is greater than 21. And our insert is done! (See the sketch below.)
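The walk above translates almost directly into code. Here is a minimal sketch (hypothetical Node class and insert function, not from the article); note that this standard variant simply descends until it finds an empty slot, rather than re-linking the existing node 24 as step 5 describes:

    class Node:
        def __init__(self, value):
            self.value = value
            self.left = None
            self.right = None


    def insert(root, value):
        """Walk down from the root and attach value where the ordering rules allow it."""
        if root is None:
            return Node(value)  # Found an empty slot: the new value lives here.
        if value < root.value:
            root.left = insert(root.left, value)    # Smaller values go to the left subtree.
        else:
            root.right = insert(root.right, value)  # Larger values go to the right subtree.
        return root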

Binary search trees are really special in computer science. And the reason for this is simple: they allow you to leverage the power of the binary search algorithm. You might recall that not all binary trees are binary search trees — they have to be organized in a specific way in order for us to perform binary searches on them.

Hold up — what even is a binary search? Well, we already started dipping our toes into binary searches in the process of pseudo-coding that insert function earlier.

A binary search is an algorithm that simplifies and speeds up searching through a sorted collection by dividing the search set into two groups and comparing an element to one that is larger or smaller than the one you’re looking for.

That definition is a mouthful, but it’s actually a lot simpler to understand and grasp by looking at it in action. So, let’s make it easier on ourselves with a drawing! We’ll work with the same binary search tree from earlier, which had a root node of 26; instead of tree format though, let’s reorganize it into a linear structure, like an array, to make it a little easier to look at.

Divide and conquer with binary search

Here we have 7 elements (nodes), and we want to find just one: 12. In most situations, we wouldn't be able to easily see the values of all of our nodes, but in this example I wrote them all out so that it's easier to understand.

Okay — so how can we search for the node with a value of 12?

Well, we already know that our elements are all sorted by size. So let’s leverage that. We’ll start from the middle element, which is 26. Is 26 smaller or larger than the number we’re looking for? Well, it’s definitely larger. Which means we can be 100% certain that nothing to the right of this middle element could be home to the element that we’re looking for, 12. So, we can eliminate everything to the right of our array here.

Okay, so we have 3 elements to look through now! Let’s choose the middle of those, which happens to be 19. Is it bigger or smaller than 12? Definitely bigger; so we can eliminate everything to the right of this node, since we know it’ll be too big to be the right number.

Alright, only one number remains, and that's the one we'll check! And of course, it's the node we've been looking for: 12! Success!
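The elimination steps above are exactly what a binary search implementation does. Here is a minimal sketch over a sorted list (the three values above 26 are hypothetical placeholders, since the article only names 12, 19, 24, and 26):

    def binary_search(items, target):
        """Return the index of target in the sorted list items, or -1 if absent."""
        low, high = 0, len(items) - 1
        while low <= high:
            mid = (low + high) // 2
            if items[mid] == target:
                return mid
            if items[mid] > target:
                high = mid - 1  # Target is smaller: discard the right half.
            else:
                low = mid + 1   # Target is larger: discard the left half.
        return -1

    # The seven node values from the example, flattened into sorted order.
    values = [12, 19, 24, 26, 31, 40, 52]
    print(binary_search(values, 12))  # 0: found by halving, not by scanning.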

If we take a look at this example again, we’ll notice that with each check and comparison that we do, we actually eliminate half of the remaining elements that we need to check. Think about that for a second.

In the process of narrowing down the search set, we also remove half of the search space. That’s enormously powerful — and that’s exactly what makes binary searches so powerful.

We reorganized our BST into an array, but a binary search on a tree and a binary search on an array function the same — just as long as all of the elements are sorted. That part is key. In an array, the elements to the left of the middle element are the left “subarray”, just like how in a binary search tree, the elements to the left of the root node make up the left “subtree”.

This algorithm is incredibly powerful when you think about the fact that we can have tons of nodes in a tree, but not have to search through all of them (the way that we would have to if we were searching through all of these elements one by one, or in a linear search). A binary search is far more efficient if our data is ordered and sorted so that we can easily cut our search space in two.

Hopefully, by seeing a binary search in action, you can start to see where it gets its name from — and why it is so perfectly named! We can search through a huge dataset, but be much smarter about it by dividing our search space in half with each comparison that we make.

Abstractions are at their best when you can wrap your head around when they are used and why they are so important and what makes them so useful. Which begs the question: when on earth do binary searches and binary search trees actually get used in the larger world of computer science?

Well, the answer is: in a lot of places! To be a little more precise in my answer: in databases, for one. If you've ever seen the terms t-tree or b-tree when working with a database, these are tree data structures closely related to the binary search tree (a B-tree generalizes it by allowing more than two children per node). But wait, it gets better! A database uses indexing to search for the correct row to retrieve and return. How does it search for the correct index? You guessed it: by using binary search!

However, binary searches and binary search trees aren’t just used in a lower level context; some of us have probably already interacted with them directly.

If you’ve ever been working on a project and realized that, somewhere along the way, you introduced a bug or broke a test, you might already be familiar with the awesome git command that is git bisect. According to the git documentation, this command lets you tell git to search through a list of all of your commits:

You use it by first telling it a “bad” commit that is known to contain the bug, and a “good” commit that is known to be before the bug was introduced. Then git bisect picks a commit between those two endpoints and asks you whether the selected commit is "good" or "bad". It continues narrowing down the range until it finds the exact commit that introduced the change.

How does this magic even work!? Through the beauty of a binary search tree, of course!

The git bisect command uses binary search to find “bad” commits.

By telling git when the last “good” commit was, it searches through all of the commits from then to now. But it doesn’t search chronologically of course — instead, it starts from the middle commit, and if you confirm that the middle commit is also “bad”, then it will continue to the earlier commits before that one, and discard all the commits afterwards. It conducts a binary search, exactly like the ones that we’ve been dealing with today!
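The same narrowing is easy to simulate in Python. A sketch (hypothetical commit numbering; first_bad_commit is not a real git API, just an illustration of the search git bisect performs):

    def first_bad_commit(is_bad, n_commits):
        """Binary-search commits 0..n_commits-1 for the first one where is_bad is True.

        Assumes the history is "good" up to some point and "bad" ever after,
        which is the same assumption git bisect makes.
        """
        low, high = 0, n_commits - 1
        while low < high:
            mid = (low + high) // 2
            if is_bad(mid):
                high = mid        # The bug is at mid or earlier.
            else:
                low = mid + 1     # The bug came after mid.
        return low


    # Suppose the bug slipped in at commit 42 of a 100-commit history.
    print(first_bad_commit(lambda commit: commit >= 42, 100))  # 42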

Amazing, right? Who knew that so many of us were using such a simple concept to debug things on such a high level?! I don’t know about you, but I, for one, am convinced: we’re all better off when we leaf the traversing up to the binary search trees.


Harvard CS50 Course - Lecture: Binary Tree

You already know that binary search requires an array to be sorted. Thus, if we have an unsorted array in which we need to find a certain element, we have two options:

  1. Sort the list and then apply binary search.
  2. Keep the list sorted at all times while adding and removing elements.

One way to keep values stored in sorted order is a binary search tree. A binary search tree is a data structure that meets the following requirements:

  • It is a tree (a data structure that emulates a tree: it has a root and other nodes connected by "branches", or edges, without cycles).
  • Each node has 0, 1, or 2 children.
  • Both left and right subtrees are binary search trees.
  • All nodes in the left subtree of an arbitrary node X have key values less than the key value of node X itself.
  • All nodes in the right subtree of an arbitrary node X have key values greater than or equal to the key value of node X itself.

Note: the root of the "programmer's" tree, unlike a real one, is at the top. Each cell is called a vertex, and the vertices are connected by edges. The root of the tree is the cell with the number 13. The left subtree of node 3 is highlighted in the picture below:

Our tree meets all the requirements for binary search trees. This means that each left subtree contains only values less than the value of its vertex, while each right subtree contains only values greater than or equal to it. Both the left and right subtrees are themselves binary search trees.

This way of building the binary tree is not the only one: in the picture below you can see another option for the same set of numbers, and in practice there can be many.

This structure helps with searching: we find the minimum value by starting at the top and moving down and to the left at each step.

If you need to find the maximum number, go from the top down and to the right. Finding a number that is neither the minimum nor the maximum is also quite easy: we descend from the root to the left or to the right, depending on whether the current vertex is larger or smaller than the one we are looking for. So, if we need to find the number 89, we go like this:
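A sketch of this descent in code (hypothetical Node objects with value, left, and right fields, in the spirit of the lecture):

    def contains(node, target):
        """Descend left or right at each vertex until target is found or the tree ends."""
        while node is not None:
            if target == node.value:
                return True
            # Go left for smaller targets, right for larger ones.
            node = node.left if target < node.value else node.right
        return False

    # contains(root, 89) follows exactly the path described above.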

You can also output the numbers in sorted order. For example, if we need to display all the numbers in ascending order, we take the left subtree, start from the leftmost vertex, and go up.

Adding something to the tree is also easy; the main thing is to remember the structure. Let's say we need to add the value 7 to the tree. We start at the root and compare: 7 is less than 13, so we go to the left. There we see 5, and we go to the right, since 7 > 5. Then, since 7 < 8 and 8 has no children, we construct a branch from 8 to the left and attach 7 to it.

It is also possible to remove vertices from the tree, but this is somewhat more complicated.
There are three different deletion scenarios that we need to consider.

  1. The simplest option: we need to delete a vertex that has no children, for example, the number 7 that we just added. In this case, we simply follow the path to the vertex with that number and remove it.
  2. The node has one child. In this case, we can delete the desired vertex, but its descendant must be connected to the "grandfather", that is, to the vertex from which its immediate ancestor grew. For example, suppose the number 3 should be removed from the same tree. In this case its descendant, the 1, together with its entire subtree, is attached to 5. This is easy to do, since all vertices to the left of 5 are less than that number (and less than the removed 3).
  3. The most difficult case is when the vertex to be deleted has two children. In this case, the first thing we do is find the vertex holding the next larger value, swap the two, and then delete the target vertex. Note that the next-larger vertex cannot have two children: if it had a left child, that left child would be a better candidate for the next value. (A code sketch follows the example below.)

Let's remove the root node 13 from our tree. First we look for the number closest to 13 that is greater than it. This is 21. We swap 21 and 13 and then delete 13.
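A compact recursive sketch covering all three deletion scenarios (hypothetical Node objects with value/left/right fields; it copies up the in-order successor, matching the lecture's description):

    def delete(node, value):
        """Delete value from the subtree rooted at node; return the new subtree root."""
        if node is None:
            return None
        if value < node.value:
            node.left = delete(node.left, value)
        elif value > node.value:
            node.right = delete(node.right, value)
        else:
            # Scenarios 1 and 2: zero children or one child.
            if node.left is None:
                return node.right
            if node.right is None:
                return node.left
            # Scenario 3: two children. Find the next larger value
            # (the leftmost vertex of the right subtree), copy it up,
            # then delete that vertex from the right subtree.
            successor = node.right
            while successor.left is not None:
                successor = successor.left
            node.value = successor.value
            node.right = delete(node.right, successor.value)
        return node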

There are different ways to build binary trees, some good, some not so good. What happens if we try to build a binary tree from a sorted list?

All the numbers will simply be added to the right branch of the previous one. If we want to find a number, we will have no choice but to walk down the chain.

This is no better than a linear search. There are other, more complex data structures that handle this better, but we will not consider them for now. Let's just say that to work around the problem of building a tree from a sorted list, you can randomly shuffle the input data, as sketched below.
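A small sketch of that workaround (it reuses the hypothetical Node and insert helpers sketched earlier; random.shuffle breaks the sorted insertion order):

    import random


    def height(node):
        """Height of a subtree: the longest path from node down to a leaf."""
        if node is None:
            return 0
        return 1 + max(height(node.left), height(node.right))


    values = list(range(1, 128))  # a sorted list of 127 values

    # Inserting in sorted order produces a chain of height 127.
    chain = None
    for v in values:
        chain = insert(chain, v)

    # Shuffling first gives an expected height of O(log n).
    random.shuffle(values)
    shuffled = None
    for v in values:
        shuffled = insert(shuffled, v)

    print(height(chain), height(shuffled))  # e.g. 127 vs. roughly 12-16 on most runs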

Python binary tree traversal (code examples)

Yes, binary trees are not programmers' favorite topic. They are one of those old concepts whose usefulness for study is constantly debated: in your job, you rarely need to implement and traverse binary trees, so why focus on them so much in technical interviews?

Today we won't be flipping a binary tree (whoosh!), but we will look at a couple of methods to traverse one. By the end of the article, you will realize that binary trees are not as scary as they seem.

What is a binary tree?

We recently looked at the implementation of linked lists in Python. Each such list consists of a number of nodes pointing to other nodes. What if a node could point not to just one other node, but to several? This is what trees are: each parent node can have several child nodes. If each node has at most two children (left and right), the tree is called binary.

In the above example, the "root" of the tree, i.e. the topmost node, has the value 1. Its children are 2 and 3. Nodes 3, 4, and 5 are called "leaves": they have no child nodes.

Building a binary tree in Python

How do we build a tree in Python? The implementation is similar to the Node class from our linked list implementation. In this case, we'll name the class TreeNode.

    class TreeNode:
        pass

Define the __init__() method. As always, it takes self. We also pass it a value that will be stored in the node.

    class TreeNode:
        def __init__(self, value):
            ...

Set the value of the node, and then define the left and right pointers (setting them both to None at first).

    class TreeNode:
        def __init__(self, value):
            self.value = value
            self.left = None
            self.right = None

And… that's it! What, you thought trees were much more complicated? When it comes to a binary tree, the only thing that makes it different from a linked list is that instead of next we have left and right.

Let's build the tree shown in the diagram at the beginning of the article. The top node has a value of 1. Next, we set the left and right nodes until we get the desired tree.

    tree = TreeNode(1)
    tree.left = TreeNode(2)
    tree.right = TreeNode(3)
    tree.left.left = TreeNode(4)
    tree.left.right = TreeNode(5)

Binary tree traversal

So you've built a tree, and now you're probably wondering how to look at it. There is no command to display the entire tree, but we can traverse it, visiting each node. But in what order should the nodes be printed?

The traversals that are simplest to implement are pre-order, post-order, and in-order. You may also hear terms such as breadth-first search and depth-first search, but their implementation is more involved; we will look at them some other time.

So what are these three traversals? Let's look at our tree again.

In pre-order traversal, we visit parent nodes before visiting their child nodes. In the case of our tree, we will traverse the nodes in this order: 1, 2, 4, 5, 3.

Post-order traversal of a binary tree is when you first visit the child nodes and then their parents. In our case, the order of visiting nodes will be: 4, 5, 2, 3, 1.

With in-order traversal, we visit the nodes from left to right. The in-order traversal of our tree visits nodes 4, 2, 5, 1, 3.

Let's write traversal methods for our binary tree.

Pre-Order

Let's start by defining the pre_order() method. Our method takes one argument: the root node (the one located above all others).

    def pre_order(node):
        pass

Next, we need to check whether this node exists. You might argue that it would be better to check for the existence of the node's children before visiting them, but that would require two if-statements; this way, we get by with one.

    def pre_order(node):
        if node:
            pass

Writing the traversal itself is easy. Pre-order traversal visits the parent node and then each of its children. We will "visit" the parent node by printing it, and then "traverse" the children by calling this method recursively for each child node.

    # Prints a parent before all of its children
    def pre_order(node):
        if node:
            print(node.value)
            pre_order(node.left)
            pre_order(node.right)

Simple, right? We can test this code by traversing the tree we built earlier.

    pre_order(tree)

Result:

    1
    2
    4
    5
    3

Excellent.

Post-Order

Now let's move on to post-order traversal. You might think we need to write another method for this, but in fact we only need to change one line of the previous one.

Instead of "visiting" the parent node and then "traversing" the children, we first "traverse" the children and then "visit" the parent node. That is, we simply move the print to the last line! Don't forget to change the method name to post_order() in all the recursive calls.

    # Prints the children and then the parent
    def post_order(node):
        if node:
            post_order(node.left)
            post_order(node.right)
            print(node.value)

In the output we get:

    4
    5
    2
    3
    1

Each child node was visited before its parent was visited.

In-Order

Finally, we write the in-order traversal method. How do we traverse the left child, then the parent, and then the right child? Again, we just move the print!

    # Prints the left child, then the parent, then the right child
    def in_order(node):
        if node:
            in_order(node.left)
            print(node.value)
            in_order(node.right)
