How to make a syntax tree

Drawing Sentence Syntax Trees – Amy Reynolds

Now that you’ve learned about X-bar structure and determining constituency, you should be able to draw syntax trees. However, there are all sorts of different types of phrases and ways that they can connect, and you have a sentence you need to draw a tree for. What to do!? This page is designed to help guide you through drawing syntactic trees.

We will walk through how to make trees for the following sentences:

Amy bakes pies. 

Amy bakes pies in the summer.

Amy bakes pies for her friends.

Amy thinks that she will bake pies.

Step 1: The IP and CP phrases
There are two Phrases that are the basis of every clause: the Complementizer Phrase (CP) and the Inflectional Phrase (IP). Since we know that every sentence/clause must have these two phrases, we will start off our tree by drawing a beginning CP and IP structure. These two are assumed to combine the same way every time: 


As noted above, we automatically expect the head of the Inflection Phrase (IP) to contain the overall tense of the clause, denoted here by the (+/-) PAST feature. Other words that can appear in the head position of the IP include modal (e.g. could, should, would, might, etc.) auxiliaries. However, regardless of whether there is an actual word within the sentence, you should always show the complete IP structure, especially including I.

Step 2: Adding the Subject and Predicate

Within the IP structure, there are specific places which consistently are reserved for different parts of the sentence. Specifically, the specifier of IP is the subject of the clause (and hence, is always an NP), and the complement of IP is the predicate (i.e. verb) of the clause. This is shown in the following Tree structure: 


Looking at our sentence Amy bakes pies, we see that Amy is the subject and bakes is the predicate. This gives us the following tree structure so far: 


Note that now that we are beginning to actually use the structure for a sentence, I contains the feature -PAST because the sentence is in the present tense. Since there is no auxiliary present, just the I with the Tense feature is shown. Be sure to include all three levels of each phrase in your work — they are important for showing if you think that something is a specifier, complement, modifier, or head. Also, be sure that your Heads match up with the phrase that you are assuming that they head (e.g. a phrase cannot be the head of another phrase).

Step 3: Add other specifiers, complements and modifiers to the phrases

Now that we have the subject and predicate inserted into the structure, it is important to next consider what happens to the rest of the words and phrases left in the sentence. For instance, for the sentence Amy bakes pies, we have Amy and bakes covered, but how does pies attach to the sentence?

Consider the status of the predicate within the sentence. In this instance, the predicate is a transitive verb, which means that it requires a direct object. You must bake something. Pies, then is a complement to the VP because it is required by the head of the VP. Specifically, we know that pies is an NP, because it can be replaced with other nouns as well. This leads us to the following structure:

Now consider if the sentence was Amy bakes pies in the summer. We know that Amy, bakes, and pies should all appear in the same positions as they do in the tree above. But what about the phrase in the summer?

We would say that in the summer is a modifier of the VP. Why? Here, it is not adding additional information about the pies, instead, it is telling us when the baking is taking place. We specifically know that it must be a modifier rather than a complement, because the verb does not require that additional information about the baking — not like it requires the direct object pies. Since in the summer is a modifier of the VP, we add another higher up V’ node, so that it can be a sister to the lower V’. Hence, we get the following structure:

Note that within the PP, NP is a complement because it is required by the head in.

Now if we changed the sentence to Amy bakes pies for her friends, should the PP for her friends be in the same location? There are two possibilities here: either Amy is baking for her friends, and what she is baking is pies; or there are pies for her friends that Amy is baking. In the first instance, it is the action (baking), that is being modified, and for her friends in that instance would be a modifier of the VP, as in the summer was in the structure above. In the second instance, the object that Amy is baking is pies for her friends. If that were the case,  for her friends would not be a modifier of the VP and instead would be a modifier of the NP, as shown in the structure below:

The slight differences in meaning between the two possible structures of the same sentence can be captured if we think about corresponding questions that could be asked. If you asked What does Amy bake for her friends? (where the PP is modifying the VP), an appropriate answer could be Pies, not cupcakes, where you are answering simply with nouns, no additional phrases added (because nothing else is branching from that NP). On the other hand, if you asked What does Amy bake? for this sentence, an appropriate answer could be Pies for her friends, not cupcakes for her family, which shows that the prepositional phrases are acting as modifiers distinguishing who the objects in question are for, not who she is baking for. In that instance, the PP would be a modifier of the NP, not the VP.

Step 4: Add CPs if there are any

Within your sentence, there may be multiple clauses. If that is the case, then you can expect a Complementizer Phrase to show up. The basic structure for a CP that occurs lower in the sentences’ tree is exactly like that CP that contains the entire sentence, described above. There are two types of CPs that can occur within a larger CP phrase: CPs for complement clauses, and CPs for relative clauses. Depending on the CP type, it will attach to the larger sentence in different ways. Because CPs for relative clauses show movement, they will be covered in the Drawing Question Syntax Trees. For now, we are going to cover the CPs that are complement clauses. How these in particular attach to the larger tree should be easy to remember: the CP for a complement clause should always occur as a complement to the phrase it is attaching to. Let’s consider the sentence Amy thinks that she will bake her pies. We already know the basic structure for Amy thinks. What about that she will bake her pies? This is what we call a complement clause, which contains a ‘mini-sentence’ of sorts — this same clause could stand alone as the sentence She will bake her pies.

In this instance, the complement clause is required by the verb thinks, which reinforces the fact that the CP is a complement to the VP in this instance, giving us the structure below: 


Notice that the structure of that she will bake her pies is exactly like that of what we would make for the sentence she will bake her pies, except that the word that introduces the complement clause (that) occupies the head position of the CP. Overall, the entire CP attaches as a complement to the VP contained within the higher CP. In fact, it may be handy to remember that in English, at least, a complement clause will always attach to a VP as a complement — complement clauses do not attach to NPs. 

Now that you understand how to draw syntax trees for sentences, you are ready to learn how to draw trees for questions, as well.





Syntax Trees | Abstract Syntax Trees

Compiler Design

Syntax Trees-


Syntax trees are abstract or compact representation of parse trees.

They are also called as Abstract Syntax Trees.





Also Read- Parse Trees


Parse Trees Vs Syntax Trees-


Parse Tree Syntax Tree
Parse tree is a graphical representation of the replacement process in a derivation. Syntax tree is the compact form of a parse tree.
Each interior node represents a grammar rule.

Each leaf node represents a terminal.

Each interior node represents an operator.

Each leaf node represents an operand.

Parse trees provide every characteristic information from the real syntax. Syntax trees do not provide every characteristic information from the real syntax.
Parse trees are comparatively less dense than syntax trees. Syntax trees are comparatively more dense than parse trees.




Syntax trees are called as Abstract Syntax Trees because-

  • They are abstract representation of the parse trees.
  • They do not provide every characteristic information from the real syntax.
  • For example- no rule nodes, no parenthesis etc.






Considering the following grammar-

E → E + T | T

T → T x F | F

F → ( E ) | id


Generate the following for the string id + id x id

  1. Parse tree
  2. Syntax tree
  3. Directed Acyclic Graph (DAG)




Parse Tree-



Syntax Tree-



Directed Acyclic Graph-



Also Read- Directed Acyclic Graphs




Construct a syntax tree for the following arithmetic expression-

( a + b ) * ( c – d ) + ( ( e / f ) * ( a + b ))






We convert the given arithmetic expression into a postfix expression as-


( a + b ) * ( c – d ) + ( ( e / f ) * ( a + b ) )

ab+ * ( c – d ) + ( ( e / f ) * ( a + b ) )

ab+ * cd- + ( ( e / f ) * ( a + b ) )

ab+ * cd- + ( ef/ * ( a + b ) )

ab+ * cd- + ( ef/ * ab+ )

ab+ * cd- + ef/ab+*

ab+cd-* + ef/ab+*





We draw a syntax tree for the above postfix expression.


Steps Involved


Start pushing the symbols of the postfix expression into the stack one by one.

When an operand is encountered,

  • Push it into the stack.

When an operator is encountered

  • Push it into the stack.
  • Pop the operator and the two symbols below it from the stack.
  • Perform the operation on the two operands using the operator you have in hand.
  • Push the result back into the stack.

Continue in the similar manner and draw the syntax tree simultaneously.



The required syntax tree is-



To gain better understanding about Syntax Trees,

Watch this Video Lecture


Download Handwritten Notes Here-


Next Article- Shift Reduce Parsing


Get more notes and other study material of Compiler Design.

Watch video lectures by visiting our YouTube channel LearnVidFun.


Gate Vidyalay © 2020 Managed by MetaDiv Systems

Lecture G2. Abstract Syntax Tree (AST)

Home / Programming → Create a programming language → Lecture G2. Abstract Syntax Tree (AST)


Let's start right away with an example.

Take the arithmetic expression 1 + (2 * 3) .

Abstract syntax tree ( abstract syntax tree, AST ) for it would be:

Abstract syntax tree expression 1 + (2 * 3) .

Let's explain the phrase "abstract syntax tree" in parts.

Wood - see illustration.

“Syntactic” means that it reflects the structure (syntax) of the original expression. Namely, tree nodes correspond to operations (addition and multiplication), and leaves correspond to operands (numbers). Each tree node reflects a specific operation of the original expression (and vice versa).

"Abstract" - the tree is "cleared" of the auxiliary characters used in the source string (for this example, there are no brackets).

So, an abstract syntax tree is a tree that reflects the structural relationships between the essential elements of the original expression (the line of the language under consideration), but does not reflect the auxiliary language tools.

AST and natural language

Simplifying somewhat with an abstract syntax tree


"Abstract Syntax Tree" of a sentence in Russian

The predicate (that is, the verb) is in the node. Its referents (in mathematics we called them “operands”, and in linguistics “actants”: object and subject) depart from the predicate in leaves.

AST vs parse tree

It is important not to confuse "abstract syntax tree" and "parse tree" ( parse tree ).

Actually, they have one difference: the parsing tree is “concrete”, that is, it contains all auxiliary language tools.

A abstract (= cleaned) syntax tree contains that and only that which is essential for understanding the sentence (in programming languages, "understanding" means computation/execution).

For example, in Ruby language you can get "parse tree" with the following command:

 require 'ripper' code = "1 + (2 * 3)" pp Ripper.sexp(code) 

At the output we get the following array, which is a prefix expression ( S-expression ):

 [:program, [[:binary, [:@int, "1", [1, 0]], :+, [:paren, [[:binary, [:@int, "2", [1, 5]], :*, [:@int, "3", [1, 9]]]]]]]] 

In the form of a tree, this is what it looks like:

Ruby-specific parse tree expressions 1 + (2 * 3) .

Here we are interested in the following: parse tree reflects the fact that (2 * 3) was taken in brackets (we got the node :paren ).

Is this information needed to evaluate the expression? No, it is not needed, since in the tree it is already clear that the multiplication is performed first (it is at a lower level of the tree), and then the addition (which uses the multiplication result in the right branch).

The logic is as follows: the details of the language means of recording fell into the tree. So it's not an "abstract" tree. So it's not AST , but parse tree .

For AST see above.

How to depict AST

The sentence “mother washed the frame” can be written with chalk on a blackboard, it can be printed in a book, it can be written in a notebook with a pen. It will be the same offer.

In the same way, AST can be drawn as a picture (as here), can be written on paper (or in text) linearly as a prefix expression, or can be saved to a hard disk as a sequence of bytes.

And it is possible in the form of a structure in some programming language, for example:

 ["soap", ["mother", "frame"]] 

It does not create a conceptual difference. Without loss of generality, we will call all such options “abstract syntax trees”, although this may not be a literal drawing of a tree, but an array (data structure inside the parser development language), linear text, or a sequence of electron charges on an SSD disk.

AST and parsing

At the parsing stage, our task is to get exactly AST from the expression (from the program) (for further processing and / or calculation).

At the same time, parsing algorithms and parser auto-generation tools create, initially, parse tree . It turns out that there is an additional task to specifically mark those nodes parse tree that should fall into AST .

Consider an example. Let the following grammar be given:

  ::=  '('  ')'  ::=  | ε  ::= number ','  | number  ::= string 

Where string is the string that matches the regular expression [a-z] .

This grammar matches strings like f(1) , multiply(2,5) , now() , etc.

Consider the string multiply(2,5) . The parse tree corresponding to this string is:

The parse tree for the string “multiply(2,5)” in the given grammar0014 ) uniquely displays which inference rules were applied and in what order and which terminals of the source string were mapped.

In the recursive downward parsing method, the blue nodes correspond to calling a function that processes the specified non-terminal, and the white nodes correspond to “removing” ( shift ) a character from the input tape. Traversing the tree from top to bottom and from left to right corresponds to the order of operations for this parsing method.

The question arises: what the hell is a goat boyan why do we need such an excess amount of information, all the “offal” of parsing, if we just want to “understand” what we see in front of us “a function call with a list of such and such arguments”?

We want AST . For example, something like this:

Abstract syntax tree for the string “multiply(2,5)” of the selected structure AST.

Transform parse tree to AST

To transform parse tree into a pre-designed version AST it is required to perform a number of typical operations:

  • rename some nodes;
  • remove "service" nodes;
  • collapse the recursed subtree into a linear list/array.

Consider, for example, the following grammar (simplified assignment operator):

  ::= letter "=" number 

Where letter is a letter of the Latin alphabet, number is a number.

This grammar will correspond to strings like x=5 , y=21 and so on. ] , ["y", 21] etc. (In this case, when we say AST , we mean, first of all, the internal data structure of the development language, and not its graphical representation.)

Then we can add code (in curly braces) to this rule in the following manner:0003

  ::= letter "=" number ; { [$1, $3] } 

In automatic parser generation systems, a similar code gives the following instruction to the parser: “when you process the given inference rule (build the corresponding parse tree ), take the first character ( letter ) and the third ( number ) and execute the specified code (combine into an array of two elements) - this will be the AST.

← Lecture G1. Backus Normal Form (BNF) Lecture G3. Grammar classes [in preparation] →

Ask a question or report a typo

Building a tree of operations on a parse tree

  • Main
  • ->
  • State exam in the specialty POVTAS
  • ->
  • Theory of programming languages ​​and translation methods

The output of the CF-grammar recognizer of the input language is a sequence of grammar rules applied to construct the input string. Based on the found sequence, knowing the type of the recognizer, you can build an inference chain or an inference tree. In this case, the output tree acts as a parse tree and is the output of the parser in the compiler.
However, neither the output chain nor the parse tree is the goal of the compiler. The output tree contains a lot of redundant information that is not required for further work of the compiler. This information includes all non-terminal symbols contained in the nodes of the tree - after the tree is built, they do not carry any semantic load and are of no interest for further work.
For a complete understanding of the type and structure of the found and parsed syntactic construction of the input language, in principle, it is sufficient to know the sequence of numbers of the grammar rules used to construct it. However, the form of presentation of this sufficient information may be different depending on the implementation of the compiler itself, and on the compilation phase. This form is drunk with the internal representation of the program.
In a syntax tree, internal nodes (vertices) correspond to operations, and leaves represent operands. As a rule, the leaves of the syntactic tree are slapped with entries in the identifier table. The structure of the syntax tree reflects the syntax of the programming language in which our capa source program.
Syntax trees can be built by the compiler for any part of the input program. The syntax tree does not always have to correspond to a code fragment of the resulting program - for example, it is possible to build syntax trees for the declarative part of the language. In this case, the operations available in the tree do not require the generation of object code, but carry information about the actions that the compiler itself must perform on the corresponding elements. In the case when the syntax tree corresponds to a certain sequence of operations, which entails the generation of a fragment of the object code, one speaks of an operation tree.
The operation tree can be built directly from the output tree generated by the parser. To do this, it is enough to exclude chains of non-terminal symbols from the output tree, as well as nodes that do not carry a semantic load during code generation. An example of such nodes can be various brackets that change the order of execution of the operation and operators, but after the tree is built, they do not carry any semantic load, since no object code corresponds to them.
Which node in the tree is an operation and which is an operand cannot be determined in any way from the grammar describing the syntax of the input language. Also, it does not follow from anywhere which operations should correspond to the object code in the resulting program, and which should not. All this is determined only on the basis of semantics - "meaning" - the language of the input program. Therefore, only the compiler developer can clearly define how operands and operations themselves should be distinguished when building an operation tree, as well as which operations are semantically insignificant for generating object code.
The semantic parse tree transformation algorithm and the operation tree can be represented as follows.
Step 1. If the tree no longer contains nodes marked with non-terminal symbols, then the algorithm is terminated; otherwise, go to step 2
Step. 2. Select the leftmost deren node labeled with a non-terminal grammar symbol and make it current.

Learn more