# How to make a math tree diagram

## Probability Tree Diagrams Explained! — Mashup Math

### This quick introduction will teach you how to calculate probabilities using tree diagrams.

Figuring out probabilities in math can be confusing, especially since there are many rules and procedures involved. Luckily, there is a visual tool called a probability tree diagram that you can use to organize your thinking and make calculating probabilities much easier.

At first glance, a probability tree diagram may seem complicated, but this page will teach you how to read a tree diagram and how to use them to calculate probabilities in a simple way. Follow along step-by-step and you will soon become a master of reading and creating probability tree diagrams.

What is a Probability Tree Diagram?

Example 01: Probability of Tossing a Coin Once

Let’s start with a common probability event: flipping a coin that has heads on one side and tails on the other:

This simple probability tree diagram has two branches: one for each possible outcome heads or tails. Notice that the outcome is located at the end-point of a branch (this is where a tree diagram ends).

Also, notice that the probability of each outcome occurring is written as a decimal or a fraction on each branch. In this case, the probability for either outcome (flipping a coin and getting heads or tails) is fifty-fifty, which is 0.5 or 1/2.

Example 02: Probability of Tossing a Coin Twice

Now, let’s look at a probability tree diagram for flipping a coin twice!

Notice that this tree diagram is portraying two consecutive events (the first flip and the second flip), so there is a second set of branches.

And since there are four possible outcomes, there is a 0.25 (or ¼) probability of each outcome occurring. So, for example, there is a 0.25 probability of getting heads twice in a row.

How to Find Probability

The rule for finding the probability of a particular event in a probability tree diagram occurring is to multiply the probabilities of the corresponding branches.

For example, to prove that there is 0.25 probability of getting two heads in a row, you would multiply 0.5 x 0.5 (since the probability of getting a heads on the first flip is 0.5 and the probability of getting heads on the second flip is also 0.5).

0.5 x 0.5 = 0.25

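Repeat this process for the other three outcomes (heads then tails, tails then heads, and tails then tails each also have a probability of 0.5 x 0.5 = 0.25), and then add all of the outcome probabilities together: 0.25 + 0.25 + 0.25 + 0.25 = 1.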

Note that the sum of the probabilities of all of the outcomes should always equal one.

From this point, you can use your probability tree diagram to draw several conclusions such as:

·       The probability of getting heads first and tails second is 0.5 x 0.5 = 0.25

·       The probability of getting at least one tails from two consecutive flips is 0.25 + 0.25 + 0.25 = 0.75

·       The probability of getting both a heads and a tails is 0.25 + 0.25 = 0.5
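The multiply-then-add rule for the two-flip tree can be checked with a short Python sketch. The dictionary names (`flip`, `outcomes`) are illustrative choices rather than part of the lesson; each key in `outcomes` stands for one path through the tree.

```python
from itertools import product

# Probability written on each branch of a single fair-coin flip.
flip = {"H": 0.5, "T": 0.5}

# Every path through the two-flip tree is one outcome. Its probability is
# the product of the branch probabilities along that path.
outcomes = {
    first + second: flip[first] * flip[second]
    for first, second in product(flip, repeat=2)
}

print(outcomes)                # {'HH': 0.25, 'HT': 0.25, 'TH': 0.25, 'TT': 0.25}
print(sum(outcomes.values()))  # 1.0

# Adding outcome probabilities answers "or" questions.
at_least_one_tail = sum(p for name, p in outcomes.items() if "T" in name)
one_of_each = outcomes["HT"] + outcomes["TH"]
print(at_least_one_tail, one_of_each)  # 0.75 0.5
```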

Independent Events and Dependent Events

What is an independent event?

Notice that, in the coin toss tree diagram example, the outcome of each coin flip is independent of the outcome of the previous toss. That means that the outcome of the first toss had no effect on the probability of the outcome of the second toss. This situation is known as an independent event.

What is a dependent event?

Unlike an independent event, a dependent event is an outcome that depends on the event that happened before it. These kinds of situations are a bit trickier when it comes to calculating probability, but you can still use a probability tree diagram to help you.

Let’s take a look at an example of how you can use a tree diagram to calculate probabilities when dependent events are involved.

How to Make a Tree Diagram

Example 03:

Greg is a baseball pitcher who throws two kinds of pitches, a fastball, and a knuckleball. The probability of throwing a strike is different for each pitch:

·       The probability of throwing a fastball for a strike is 0.6

·       The probability of throwing a knuckleball for a strike is 0.2

Greg throws fastballs more frequently than he throws knuckleballs. On average, for every 10 pitches he throws, 7 of them are fastballs (0.7 probability) and 3 of them are knuckleballs (0.3 probability).

So, what is the probability that the pitcher will throw a strike on any given pitch?

To find the probability that Greg will throw a strike, start by drawing a tree diagram that shows the probability that he will throw a fastball or a knuckleball:

The probability of Greg throwing a fastball is 0.7 and the probability of him throwing a knuckleball is 0.3. Notice that the sum of the probabilities of the outcomes is 1 because 0.7 + 0.3 is 1.00.

Next, add branches for each pitch to show the probability for each pitch being a strike, starting with the fastball:

Remember that the probability of Greg throwing a fastball for a strike is 0.6, so the probability of him not throwing it for a strike is 0.4 (since 0.6 + 0.4 = 1.00)

Repeat this process for the knuckleball:

Remember that the probability of Greg throwing a knuckleball for a strike is 0.2, so the probability of him not throwing it for a strike is 0.8 (since 0.2 + 0.8 = 1.00)

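Now that the probability tree diagram has been completed, you can perform your outcome calculations by multiplying along each path: fastball and strike is 0.7 x 0.6 = 0.42, fastball and no strike is 0.7 x 0.4 = 0.28, knuckleball and strike is 0.3 x 0.2 = 0.06, and knuckleball and no strike is 0.3 x 0.8 = 0.24. Remember that the sum of the outcome probabilities has to equal one: 0.42 + 0.28 + 0.06 + 0.24 = 1.00.

Since you are trying to figure out the probability that Greg will throw a strike on any given pitch, you have to focus on the outcomes that result in him throwing a strike: fastball for a strike (0.42) or knuckleball for a strike (0.06).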

The last step is to add the strike outcome probabilities together:

0.42 + 0.06 = 0.48

The probability of Greg throwing a strike is 0.48 or 48%.
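As a cross-check, Greg's tree can be written out as a small Python sketch. The nested-dictionary layout and the use of `Fraction` (to keep the decimals exact) are choices made for this illustration, not part of the original example.

```python
from fractions import Fraction

# First level: which pitch is thrown. Second level: chance that pitch is a strike.
# Fractions keep the arithmetic exact (0.7 is 7/10, and so on).
tree = {
    "fastball":    {"p": Fraction(7, 10), "strike": Fraction(6, 10)},
    "knuckleball": {"p": Fraction(3, 10), "strike": Fraction(2, 10)},
}

# Multiply along each path to get the probability at every end-point.
leaves = {}
for pitch, node in tree.items():
    leaves[(pitch, "strike")] = node["p"] * node["strike"]
    leaves[(pitch, "no strike")] = node["p"] * (1 - node["strike"])

# All four end-points together must total one.
assert sum(leaves.values()) == 1

# Add the two paths that end in a strike.
p_strike = leaves[("fastball", "strike")] + leaves[("knuckleball", "strike")]
print(float(p_strike))  # 0.48
```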

Probability Tree Diagrams: Key Takeaways

·      A probability tree diagram is a handy visual tool that you can use to calculate probabilities for both dependent and independent events.

·      To calculate probability outcomes, multiply the probability values of the connected branches.

·      To calculate the probability of multiple outcomes, add the probabilities together.

·      The probability of all possible outcomes should always equal one. If you get any other value, go back and check for mistakes.


By Anthony Persico

Anthony is the content crafter and head educator for YouTube's MashUp Math. You can often find him happily developing animated math lessons to share on his YouTube channel, or spending way too much time at the gym or playing on his phone.

## Probability Tree Diagrams: Examples, How to Draw


Probability trees are useful for calculating combined probabilities for sequences of events. They help you map out the probabilities of many possibilities graphically, without the use of complicated probability formulas.


Why Use a probability tree?
Sometimes you don’t know whether to multiply or add probabilities. A probability tree makes it easier to figure out when to add and when to multiply. Plus, seeing a graph of your problem, as opposed to a bunch of equations and numbers on a sheet of paper, can help you see the problem more clearly.

### Parts of a Probability Tree Diagram

A probability tree has two main parts: the branches and the ends (sometimes called leaves). The probability of each branch is generally written on the branches, while the outcome is written on the ends of the branches.

Probability trees make the question of whether to multiply or add probabilities simple: multiply along the branches and add probabilities down the columns. In the following example (from Yale University), you can see how adding the far right column adds up to 1, which is what we would expect the sum total of all probabilities to be:
0.9860 + 0.0040 + 0.0001 + 0.0099 = 1

### Real Life Uses

Probability trees aren’t just a theoretical tool used in the classroom. They are used by scientists and statisticians in many branches of science, research and government. For example, the following tree was used by the Federal government as part of an early warning program to assess the risk of more eruptions on Mount Pinatubo, an active volcano in the Philippines.
Image: USGS.

### How to Use a Probability Tree or Decision Tree

Sometimes, you’ll be faced with a probability question that just doesn’t have a simple solution. Drawing a probability tree (or tree diagram) is a way for you to visually see all of the possible choices, and to avoid making mathematical errors. This how-to will show you the step-by-step process of using a decision tree.
How to Use a Probability Tree: Steps
Example question: An airplane manufacturer has three factories, A, B, and C, which produce 50%, 25%, and 25%, respectively, of a particular airplane. Seventy percent of the airplanes produced in factory A are passenger airplanes, 25% of those produced in factory B are passenger airplanes, and 25% of the airplanes produced in factory C are passenger airplanes. If an airplane produced by the manufacturer is selected at random, calculate the probability the airplane will be a passenger plane.

Step 1: Draw lines to represent the first set of options in the question (in our case, 3 factories). Label them: our question lists A, B, and C, so that’s what we’ll use here.

Step 2: Convert the percentages to decimals, and place those on the appropriate branch in the diagram. For our example, 50% = 0.5, and 25% = 0.25.

Step 3: Draw the next set of branches. In our case, we were told that 70% of factory A’s output was passenger. Converting to decimals, we have 0.7 P (“P” is just my own shorthand here for “Passenger”) and 0.3 NP (“NP” = “Not Passenger”).

Step 4: Repeat step 3 for as many branches as you are given.

Step 5: Multiply the probabilities along the first branch that produces the desired result. In our case, we want to know about the production of passenger planes, so we choose the first branch that leads to P: 0.5 * 0.7 = 0.35.

Step 6: Multiply along the remaining branches that give the desired result. In our example there are two more branches that lead to P, each with probability 0.25 * 0.25 = 0.0625.

Step 7: Add up all of the probabilities you calculated in steps 5 and 6. In our example, we had:

0.35 + 0.0625 + 0.0625 = 0.475

That’s it!
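The steps above can be sketched in a few lines of Python. The `factories` table and the variable names are made up for this illustration; each entry pairs a factory's share of production with the fraction of its output that is passenger planes.

```python
# factory -> (share of total production, fraction that are passenger planes)
factories = {
    "A": (0.50, 0.70),
    "B": (0.25, 0.25),
    "C": (0.25, 0.25),
}

# Multiply along each branch that ends in "passenger" (steps 5 and 6).
passenger_paths = {
    name: share * p_passenger_given_factory
    for name, (share, p_passenger_given_factory) in factories.items()
}

# Add the branch results together (the final step).
p_passenger = sum(passenger_paths.values())

for name, p in passenger_paths.items():
    print(f"{name}: {p:.4f}")
print(f"total: {p_passenger:.3f}")  # total: 0.475
```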

### Example 2

Example Question: If you toss a coin three times, what is the probability of getting 3 heads?

The first step is to figure out your probability of getting a heads by tossing the coin once. The probability is 0.5 (you have a 50% probability of tossing a heads and 50% probability of tossing a tails). Those probabilities are represented at the ends of each branch.

Next, add two more branches to each branch to represent the second coin toss. The probability of getting two heads is shown by the red arrow. To get the probability, multiply the branches:
0.5 * 0.5 = 0.25 (25%).
This makes sense because the possible results of two tosses are HH, HT, TT, or TH (each combination has a 25% probability).

Finally, add a third row (because we were trying to find the probability of throwing 3 heads). Multiplying across the branches for HHH we get:
0.5 * 0.5 * 0.5 = 0.125, or 12.5%.

In most cases, you will multiply across the branches to get probabilities. However, you may also want to add vertically to get probabilities. For example, if we wanted to find out our probability of getting HHH OR TTT, we would first calculate the probability of each (0.125) and then add those two probabilities: 0.125 + 0.125 = 0.250.

Tip: You can check that you drew the tree correctly by adding vertically: the probabilities at the ends of all the branches in a column should add up to 1.
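A minimal Python sketch of the three-toss tree, assuming a fair coin as in the example; the names `branch` and `paths` are illustrative. It multiplies across the branches for every path and then adds vertically.

```python
from itertools import product
from math import prod

# Probability on each branch of one toss.
branch = {"H": 0.5, "T": 0.5}

# Multiply across the branches for every path through the three-toss tree.
paths = {
    "".join(seq): prod(branch[side] for side in seq)
    for seq in product("HT", repeat=3)
}

print(paths["HHH"])                 # 0.125
print(paths["HHH"] + paths["TTT"])  # 0.25

# Adding vertically: all eight end-points should total 1.
print(sum(paths.values()))          # 1.0
```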


### References

Punongbayan, R. et al. USGS Repository: Eruption Hazard Assessments and Warnings.

CITE THIS AS:
Stephanie Glen. "Probability Tree Diagrams: Examples, How to Draw" From StatisticsHowTo.com: Elementary Statistics for the rest of us! https://www.statisticshowto.com/how-to-use-a-probability-tree-for-probability-questions/


## Decision trees - CART mathematical apparatus. Part 1

The general principle of constructing decision trees was given in the article "Decision Trees - Basic Principles of Operation".

This article will focus on the CART algorithm. CART, short for Classification And Regression Tree, is a binary decision tree algorithm first published by Breiman et al. in 1984 [1]. The algorithm is designed to solve classification and regression problems.

There are also several modified versions, the IndCART and DB-CART algorithms. The IndCART algorithm, which is part of the Ind package, differs from CART in that it handles missing values differently, does not implement the regression part of the CART algorithm, and uses different cutoff parameters. The DB-CART algorithm is based on the following idea: instead of using the training dataset to determine splits directly, use it to estimate the distribution of input and output values, and then use this estimate to determine splits. DB accordingly stands for "distribution based". This idea is claimed to result in a significant reduction in classification error compared to standard tree building methods.

The main differences between the CART algorithm and the ID3 family algorithms are:

• binary representation of the decision tree;
• partition quality evaluation function;
• tree pruning mechanism;
• algorithm for handling missing values;
• building regression trees.

### Binary representation of the decision tree

In the CART algorithm, each decision tree node has two children. At each step of building the tree, the rule formed in the node divides the given set of examples (the training set) into two parts: the part in which the rule is true (the right child) and the part in which the rule is not true (the left child). To select the optimal rule, a function that estimates the quality of the partition is used.
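The binary split described above can be sketched as follows. The function name `partition` and the idea of passing the rule as a Python predicate are assumptions made for illustration; the article does not prescribe a rule representation at this point.

```python
def partition(examples, rule):
    """Split examples in two: rule false goes left, rule true goes right."""
    left, right = [], []
    for example in examples:
        (right if rule(example) else left).append(example)
    return left, right


# Hypothetical data: customer ages, split by the rule "age > 30".
ages = [18, 25, 31, 40, 52]
left, right = partition(ages, lambda age: age > 30)
print(left)   # [18, 25]
print(right)  # [31, 40, 52]
```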

Proposed algorithmic solution.

The solution is fairly straightforward. Each node (a structure or class) must hold references to two descendants, Left and Right, which are structures of the same kind. The node must also contain a rule identifier (for more details on the rules, see below), describe in some way the right-hand side of the rule, record the number or ratio of examples of each class of the training sample that "passed" through the node, and carry a flag marking it as a terminal node (a leaf). These are the minimum requirements for the structure (class) of a tree node.
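One possible shape for such a node, sketched as a Python dataclass. The field names and the reading of the rule as "attribute <= threshold" are assumptions for this illustration; the article only lists what the node must contain.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    # Rule identifier and the right-hand side of the rule,
    # read here (as an assumption) as "attribute <= threshold".
    attribute: Optional[str] = None
    threshold: Optional[float] = None
    # Number of training examples of each class that passed through the node.
    class_counts: Dict[str, int] = field(default_factory=dict)
    # Descendants: rule false goes left, rule true goes right.
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Terminal-node flag.
    is_leaf: bool = True

    def majority_class(self) -> str:
        """Class seen most often among the examples at this node."""
        return max(self.class_counts, key=self.class_counts.get)


# Hypothetical two-level tree for the rule "age <= 30".
root = Node(attribute="age", threshold=30.0,
            class_counts={"yes": 6, "no": 4}, is_leaf=False)
root.right = Node(class_counts={"yes": 1, "no": 3})  # rule true
root.left = Node(class_counts={"yes": 5, "no": 1})   # rule false
print(root.right.majority_class(), root.left.majority_class())  # no yes
```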

### Selection of the final tree

We now have a sequence of trees and need to choose the best one, the tree we will use from here on. The most obvious approach is to evaluate each tree on a test sample: the tree with the lowest classification error is the best tree. However, this is not the only possible way.
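Selection by test error might look like the sketch below. Candidate trees are represented simply as prediction functions, and `error_rate` and `select_final_tree` are hypothetical helper names; how the sequence of trees is produced is outside this snippet.

```python
def error_rate(predict, test_set):
    """Fraction of test examples that the tree classifies incorrectly."""
    wrong = sum(1 for features, label in test_set if predict(features) != label)
    return wrong / len(test_set)


def select_final_tree(candidates, test_set):
    """Return the candidate with the lowest classification error on the test sample."""
    return min(candidates, key=lambda tree: error_rate(tree, test_set))


# Hypothetical test sample and three candidate trees of different sizes.
test_set = [({"x": 1}, "a"), ({"x": 2}, "a"), ({"x": 3}, "b"), ({"x": 4}, "b")]


def always_a(features):
    return "a"


def split_at_2(features):
    return "a" if features["x"] <= 2 else "b"


def split_at_3(features):
    return "a" if features["x"] <= 3 else "b"


best = select_final_tree([always_a, split_at_3, split_at_2], test_set)
print(best.__name__, error_rate(best, test_set))  # split_at_2 0.0
```

When several candidates tie, `min` returns the first one in the list, so the ordering of the sequence acts as the tie-breaker in this sketch.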

For a continuation of the description, see the article "Description of the CART algorithm. Part 2".

Literature

• L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, Belmont, California, 1984.
• J.R. Quinlan. C4.5 Programs for Machine Learning. Morgan Kaufmann, San Mateo, California, 1993.
• Machine Learning, Neural and Statistical Classification. Editors: D. Michie, D.J. Spiegelhalter, C.C. Taylor, 02/17/1994.

## Decision trees - C4.5 mathematical apparatus | Part 1

Deconstructing decision tree learning algorithm C4.5: requirements for training dataset and classification of new objects.

This article will consider the mathematical apparatus of the learning algorithm for decision trees C4.5. The algorithm was proposed by R. Quinlan as an improved version of the ID3 algorithm, which added the ability to work with missing data. The basic building ideas were described in the article Decision Trees: General Principles.

Before proceeding to the description of the algorithm, let's define the mandatory requirements for the structure of the training data set and for the data itself, under which the C4.5 algorithm will work and give correct results.

1. Data must be structured, i.e. be a table whose columns are attributes (features) that describe a subject area or business process, and rows are training examples that are classified objects for which a class label is given (since the algorithm uses supervised learning). All rows must contain the same set of attributes.
2. One of the attributes must be specified as the target, i.e. class attribute. Each training example must have a class label. Input attributes can be either continuous or discrete, while a class attribute can only be discrete, i.e. take a finite number of unique values.
3. Each instance of the training set must uniquely refer to the corresponding class. Probabilistic estimates of the degree of belonging of examples to a class are not used (such a formulation refers to fuzzy decision trees). The number of classes in the training set should be much less than the number of training examples.

### Description of the learning algorithm

Let a training set S be given, containing m attributes and n examples. For the set S, k classes C_1,C_2,…,C_k are defined. The task is to build a hierarchical classification model in the form of a decision tree based on the training set S.

The decision tree is built from top to bottom - from the root node to the leaves.

At the first step of training, an "empty" tree is formed, which consists only of the root node containing the entire training set. It is required to split the root node into subsets from which descendant nodes will be formed. To do this, one of the attributes is selected and rules are formed that break the training set into subsets, the number of which is equal to the number p of unique attribute values.

As a result of splitting, p (according to the number of attribute values) subsets are obtained and, accordingly, p descendants of the root node are formed, each of which is assigned its own subset. This procedure is then recursively applied to all subsets until the training stop condition is met.

The main problem in training decision trees is choosing the attribute that will provide the best split (according to some measure of quality) at the current node. Although some decision tree learning algorithms allow each attribute to be used only once, in our case this restriction will not apply - each attribute can be used for splitting an arbitrary number of times.

Let a partitioning rule be applied to the training set that uses the attribute A, which takes p values a_1,a_2,…,a_p. As a result, p subsets S_1,S_2,…,S_p will be created, where subset S_i receives the examples in which attribute A takes the value a_i.

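This raises the question: is the split by the selected attribute the best, or could we get a better split by choosing another attribute? To answer this question, we use information about the number of examples of all classes in the training set and in each resulting subset.

Let N(C_j,S) denote the number of examples of class C_j in the set S, and N(S) the total number of examples in S. Then the relative frequency of class C_j in S is

P=\frac{N(C_j,S)}{N(S)}, (1)

According to information theory, the average amount of information needed to identify the class of an example from S is given by the entropy

\text{Info}(S)=-\sum_{j=1}^{k} \frac{N(C_j,S)}{N(S)}\log_2\left(\frac{N(C_j,S)}{N(S)}\right), (2)

The same estimate, obtained after splitting S by attribute A into the subsets S_1,S_2,…,S_p, is

\text{Info}_A(S)=\sum_{i=1}^{p} \frac{N(S_i)}{N(S)}\text{Info}(S_i), (3)

Then, to choose the best branching attribute, you can use the following criterion:

\text{Gain}(A)=\text{Info}(S)- \text{Info}_A(S), (4)

Criterion (4) is called the information gain criterion (gain meaning "increase"). The criterion value is then calculated for all potential partition attributes, and the attribute that maximizes it is selected.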

The described procedure is applied to subsets S_i and further, until the values of the criterion cease to increase significantly with new partitions or another stop condition is met.

If during tree construction an “empty” node is formed, where no example has fallen, then it is converted into a leaf that is associated with the class most often found in the immediate ancestor of the node.

The information gain criterion is based on a property of entropy: entropy is largest when all classes are equally probable, i.e. the class choice is maximally uncertain, and is 0 when all instances in the node belong to the same class (in this case, the ratio under the logarithm is 1 and the logarithm is 0). Thus, the information gain reflects an increase in the class homogeneity of the resulting nodes.
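The gain criterion, Gain(A) = Info(S) - Info_A(S), can be sketched in Python as follows. Here Info is the entropy of the class labels and Info_A is the size-weighted entropy of the subsets produced by splitting on attribute A. The function names and the toy weather data are invented for this illustration.

```python
from collections import Counter
from math import log2


def info(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def info_after_split(rows, labels, attribute):
    """Size-weighted entropy of the subsets created by splitting on `attribute`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    n = len(labels)
    return sum(len(subset) / n * info(subset) for subset in subsets.values())


def gain(rows, labels, attribute):
    """Information gain from splitting on `attribute`."""
    return info(labels) - info_after_split(rows, labels, attribute)


# Hypothetical training set: four examples, two discrete attributes.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

print(gain(rows, labels, "outlook"))  # 1.0
print(gain(rows, labels, "windy"))    # 0.0
best_attribute = max(["outlook", "windy"], key=lambda a: gain(rows, labels, a))
print(best_attribute)                 # outlook
```

In this toy set, splitting on `outlook` yields two pure subsets, so the gain equals the full entropy of the labels, while `windy` leaves both subsets as mixed as the original set.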

The described procedure applies to discrete attributes. In the case of continuous attributes, the algorithm works a little differently: a threshold is selected against which all values will be compared. Let the numeric attribute X take on a finite set of values {x_1,x_2,…,x_p}. With the examples ordered by ascending attribute value, any value between x_i and x_{i+1} divides all examples into two subsets. The first subset will contain the attribute values {x_1,x_2,…,x_i}, and the second one will contain {x_{i+1},x_{i+2},…,x_p}.

The average of each pair of adjacent values can then be taken as a candidate threshold, T_i=(x_i+x_{i+1})/2, which gives p-1 potential thresholds {T_1,T_2,…,T_{p-1}}. Applying formulas (2), (3) and (4) in turn to every potential threshold, we choose the one that gives the maximum value of criterion (4). This value is then compared with the values of criterion (4) calculated for the other attributes. If it is the largest among all attributes, the attribute X with this threshold is selected as the test at the node.

It should be noted that all numerical tests are binary, i.e. divide the tree node into two branches.
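A sketch of the threshold search for one numeric attribute, assuming the candidate thresholds are the averages of adjacent distinct values as described above. The helper names and the temperature data are hypothetical, and this is an illustration rather than the reference C4.5 implementation.

```python
from collections import Counter
from math import log2


def info(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split, or (None, 0.0)."""
    n = len(labels)
    base = info(labels)
    distinct = sorted(set(values))
    best_t, best_gain = None, 0.0
    # Candidate thresholds are the midpoints of adjacent distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        split_info = len(left) / n * info(left) + len(right) / n * info(right)
        g = base - split_info
        if g > best_gain:
            best_t, best_gain = t, g
    return best_t, best_gain


# Hypothetical data: temperature readings with a class label each.
temperatures = [64, 68, 72, 80]
labels = ["play", "play", "stay", "stay"]
print(best_threshold(temperatures, labels))  # (70.0, 1.0)
```

The function returns `(None, 0.0)` when no candidate threshold improves on the unsplit node, for example when all values are identical.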

### Practical use of decision trees

After the decision tree is built on the training data set and a decision is made about its performance (the percentage of correctly recognized examples on the training set is quite large), you can start practical work with the tree - classifying new objects.

The new object to be classified first enters the root node of the tree, and then moves through the nodes, each of which checks if the attribute value matches the rule in this node, after which the object is redirected to one of the descendant nodes.
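The traversal can be sketched as below. The nested-dictionary tree format, with a `"label"` key marking a leaf and a `"threshold"` key marking a numeric test, is an assumption made for this example and not a format defined by the article.

```python
def classify(node, example):
    """Walk from the root to a leaf, following the rule at each node."""
    while "label" not in node:
        value = example[node["attribute"]]
        if "threshold" in node:
            # Numeric tests are binary: compare against the threshold.
            branch = "<=" if value <= node["threshold"] else ">"
        else:
            # Discrete tests have one branch per attribute value.
            branch = value
        node = node["children"][branch]
    return node["label"]


# Hypothetical tree: a discrete test at the root, a numeric test below it.
tree = {
    "attribute": "outlook",
    "children": {
        "overcast": {"label": "play"},
        "rain": {"label": "stay home"},
        "sunny": {
            "attribute": "humidity",
            "threshold": 75,
            "children": {"<=": {"label": "play"}, ">": {"label": "stay home"}},
        },
    },
}

print(classify(tree, {"outlook": "sunny", "humidity": 70}))  # play
print(classify(tree, {"outlook": "sunny", "humidity": 90}))  # stay home
print(classify(tree, {"outlook": "rain", "humidity": 70}))   # stay home
```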