DM546 - Compiler Construction

Weekly Notes

The weekly notes are listed in reverse chronological order. They can be shown and printed separately.

Note, DM546, fall 2019

Note 11, DM546, fall 2019

Exercises March 22

Cover the most important exercises left over from previous sessions. Then briefly talk about the discussion points below:
Consider possible targets for optimization other than time and space and describe plausible goals.
Try to find patterns which could be used in a high-level peep-hole optimization of for instance C-programs. Are there any possible dangers in writing a high-level optimizer for C-programs where simple but inefficient code is replaced by more complicated but also more efficient code?
Take a look at code generated by gcc (compile with gcc -S) for a few well-known and relatively short source files, such as factorial. Try to investigate the effect of the three options -O1, -O2, and -O3 with regard to code optimization.
Find more peephole patterns in the categories from the lecture or from new categories that you come up with.
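For the gcc experiment, a short source file along the following lines could serve as input. This is only a sketch: the file name factorial.c and the function name factorial are assumptions made here, not taken from the course material.

```c
#include <assert.h>

/* Recursive factorial; a small input for inspecting gcc -S output.
   The name `factorial` is an assumption made for this sketch. */
unsigned long factorial(unsigned int n)
{
    assert(n <= 12);  /* 12! fits in 32 bits, so the result cannot overflow */
    return n <= 1 ? 1UL : n * factorial(n - 1);
}
```

Saved as factorial.c, it can be compiled with gcc -S -O1 factorial.c, and likewise with -O2 and -O3, after which the generated factorial.s files can be compared. Adding -DNDEBUG removes the assert check from the generated code.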

Note 10, DM546, fall 2019

Lecture March 14

Introduction to optimization.
Peep-hole optimization.
General information about the oral exam.
Discussion of exam questions.
Course evaluation.

The core of the optimization techniques covered in the lecture is described in Supplementary Notes for DM546.

Exercises March 19

Appel 3.4, 3.5, 3.7, 3.14.

Note 9, DM546, fall 2019

Lecture March 13

Top-down parsing.

Background material: Appel Chapter 3.

Exercises March 14

Appel 13.1.
Simulate the example from Transparency 16 (page 283 in the book) by hand from start to end. Note that there is a potential problem in the book (and therefore also on the transparency) in that the first field is used for a pointer to to-space instead of the first pointer-field. Why is that a problem? In fact, records without pointers present a small problem here. What can you do about that?
Appel 13.2.
Appel 13.4.

Note 8, DM546, fall 2019

Lecture March 5

Garbage collection.

Background material: Appel Chapter 13.

Exercises March 8

Appel 10.1, 10.4, 10.5.
Estimate the asymptotic complexity of finding a fixed point as a function of the number of temporaries and the size of the control flow graph (with the lecture transparencies as the starting point). Consider the representation of sets carefully. Next, consider the complexity of building the conflict graph and the complexity of coloring by simplification, including which supporting data structures to employ.
Try to formulate how many temporary memory (or register) locations one needs to evaluate an expression. For instance a+b-c+d only requires one, whereas (a+b-c+d)*(e-f) requires two. What about (a-b)*c - d/(e+f-g*h)? In general, how can we compute how many are necessary?
We have seen in the lecture that it could be good to spill vertices with a large degree. Could it have some bad effects?
We have seen in the lecture that it could be good to always use the smallest possible color. Could it have some bad effects?
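For the exercise on counting temporary locations, one possible model can be sketched as follows. It rests on assumptions that are not stated in the exercise: variables can be read directly from memory as operands, every operator is binary, and the two subexpressions of an operator may be evaluated in either order. The names struct expr and temps_needed are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Expression tree node: a leaf is a variable (both children NULL),
   an inner node is a binary operator. */
struct expr {
    const struct expr *left;
    const struct expr *right;
};

/* Number of temporary locations needed under the assumptions above. */
int temps_needed(const struct expr *e)
{
    int l, r;

    assert(e != NULL);
    if (e->left == NULL && e->right == NULL)
        return 0;                   /* a variable is read in place */
    assert(e->left != NULL && e->right != NULL);
    l = temps_needed(e->left);
    r = temps_needed(e->right);
    if (l == r)
        return l + 1;               /* equal needs: one more location */
    return l > r ? l : r;           /* evaluate the costlier side first */
}
```

Under this model, the tree for a+b-c+d yields 1 and the tree for (a+b-c+d)*(e-f) yields 2, in agreement with the two examples in the exercise. Whether the model is appropriate when subexpressions have side effects is a separate question.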

Note 7, DM546, fall 2019

Lecture February 26

Liveness analysis and register allocation.

Background material: Appel Chapters 10 and 11. It is recommended that you first skim over the material in the book and read in detail later based on the focus in the lecture.

Supplementary literature: A note on lattices and fixed points (not part of the curriculum).

Exercises March 6

Consider code generation templates for the following constructions:
- for-loops (old Algol/Pascal style): for i:=7 to 42 do SOMETHING.
- for-loops (C style).
- loops with break/continue. The loop is infinite by default: it starts with the keyword loop and has no condition. Inside the body of the loop, break exits the loop and continue starts from the beginning. For both break and continue, this holds regardless of where in the loop body they occur.
- switch (C) and classic case constructions. The difference is that 'switch' allows for general expressions, whereas 'case' is like 'switch', except that the expression must be a variable name and the variable must be of a type with finite domain, such as char, an enumeration such as 0..99, etc.
- records.
- arrays.
- multi-dimensional arrays; this is different from arrays of arrays. You must find a layout such that you can efficiently compute the address of A[i,j,k], for instance. Thus, it must be different from following three pointers as you would logically do if you used A[i][j][k].
- conditional expressions in C style: ( exp ? exp : exp ).
What is the semantics of i:= 7; for i:=i+1 to i+2 do {i:=i+3; print i} according to your template above, i.e., what is printed? What are the reasonable behaviors?
What should be the behavior of the following three pieces of C-code:
- x = 3; r = x++ * x++ * x++;
- x = 3; r = ++x * ++x * ++x;
- y = 0; r = y++ + 2*y++ + 3*y++ + 4*y++;
Try it! You might be surprised...
Are there efficiency reasons to restrict switch/case expressions to be simple types and only use values from a small domain?
Some programming languages offer lists with random access (A[i]), append (A.append(42)), and insert (A.insert(pos,val)) which inserts val in between the positions pos and pos+1. Can this be implemented efficiently?
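For the multi-dimensional array item above, one layout that allows direct address computation is a single contiguous block in row-major order. This is an assumption made for illustration and not necessarily the layout intended in the lecture; the function name offset3 and its parameters are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Offset (in elements) of A[i,j,k] in an n1 x n2 x n3 array stored
   contiguously in row-major order. */
size_t offset3(size_t i, size_t j, size_t k,
               size_t n1, size_t n2, size_t n3)
{
    assert(i < n1 && j < n2 && k < n3);   /* bounds check */
    return (i * n2 + j) * n3 + k;         /* nested form: two multiplications */
}
```

The address of A[i,j,k] is then the base address plus the offset times the element size, so a code generation template needs only multiplications and additions and no pointer dereferences.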

Note 6, DM546, fall 2019

Lecture February 21

Type checking.
Code generation.

Background material: Appel Chapters 5, 7, 8, 9. In this course, we have a slightly different focus from that in the textbook. As a consequence, we will take a simpler and more direct approach to code generation. Thus, it is recommended that you merely skim over the material in the book before the lecture and read after the lecture to the extent that you need the material. You can find support for my approach at the lecture in the Supplementary Notes for DM546.

Exercises March 1

IMADA's Computer Lab has been reserved and is used for this exercise slot, so show up there.

As announced previously, this exercise session is a continuation of the previous one. Work on problems from that note, get things actually running, and implement variations to get hands-on experience with concepts you find unclear. Among other things, work on getting the example developed at the previous exercises on nested scope and static link running. You can also develop your own examples, for instance a variation of the factorial example where you place the factorial function in an inner scope of another function and have factorial modify a variable in its parent scope before returning a value.
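Before writing the nested factorial variation in assembler, it may help to see the idea in C. Standard C has no nested functions, so the sketch below passes the static link as an explicit extra argument. The names outer_frame, inner_factorial, and outer are invented for this illustration, and the layout is only one way to model the situation.

```c
#include <assert.h>
#include <stddef.h>

/* Frame of the enclosing function; in assembler this would be its
   stack frame, and the static link would be a pointer to it. */
struct outer_frame {
    int calls;      /* variable in the parent scope */
};

/* The "inner" factorial. The static link is an explicit argument. */
static int inner_factorial(struct outer_frame *static_link, int n)
{
    assert(static_link != NULL && n >= 0 && n <= 12);
    static_link->calls++;               /* modify the parent's variable */
    if (n <= 1)
        return 1;
    /* A recursive call stays in the same scope, so it passes on the
       static link it received, not a pointer to its own frame. */
    return n * inner_factorial(static_link, n - 1);
}

/* The enclosing function: calls the inner function with a link to its
   own frame and reports how many calls were made. */
int outer(int n, int *calls_out)
{
    struct outer_frame frame = { 0 };
    int result = inner_factorial(&frame, n);

    if (calls_out != NULL)
        *calls_out = frame.calls;
    return result;
}
```

In an assembler version, the access through static_link corresponds to loading the static link from the current frame and then addressing the parent's variable relative to it.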

Announcements

As also pointed out in the above, the exercises on February 27 and March 1 take place in IMADA's Computer Lab.

Note 5, DM546, fall 2019

Lecture February 19

A brief recap of Intel Pentium assembler.
Implementing advanced, high-level language constructs in assembler.

Background material: program examples, Appel Chapter 6.

Exercises February 27

IMADA's Computer Lab has been reserved and is used for this exercise slot, so show up there.

These are the exercises for this date and also for the next exercise session. It is important to get things to work in practice as a means of understanding all the underlying principles. Using two exercise sessions, you can try things out, get questions cleared up, and try again.

Assembly programs should be placed in a file with the suffix .s (lowercase s). An executable (a.out) can be produced using gcc by writing gcc -no-pie file.s on the command line; writing ./a.out then executes the program.

Run the factorial example by hand on paper and/or blackboard with the number 3 instead of 5. Keep detailed track of the stack and the content of all registers.
Make a debug function (in assembler) which prints the contents of the registers. It is most convenient if using this debugging feature requires as little as possible to be written in the interesting part of the code (the part to be debugged), so in this case, you may violate the call conventions.
With the GAS examples as a starting point, implement your favorite O(n²) sorting algorithm (bubble sort, insertion sort or selection sort). From the examples, you can see how to handle start and finish, interfacing the operating system, and how to allocate space for a number of integers.
Place data directly into the code and print the result (using repetitive calls to printf) after the sorting.
With the factorial example as the starting point, implement the computation of the n'th Fibonacci number. Be careful with the stack discipline.
In Appel Chapter 6, there is an illustration of a stack frame. Here, the static link and local variables (which were omitted at the lecture) are placed between the arguments and the return address. In our architecture, one or both of these may more naturally be placed after the return address instead of before.
With Appel Chapter 6 as a starting point, discuss the following points:
- Where should the static link point when calling a local function in one's own scope and a function further out, respectively? Remember recursive functions in this connection.
- When using a variable defined in a scope further out, which code must the compiler generate in order to find this variable and where does the compiler find the necessary information to find out which code to generate?
- What code must the compiler generate when a function is called such that static link will be set up correctly for the called function and how does the compiler find the necessary information to do this?
Decide yourself on a small program you want to implement in assembler to try things you may have had difficulties with in the above.
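For the sorting exercise above, a C reference version can be useful to have next to the assembler code, both as a template to translate by hand and as something to compare results against. The following is a minimal sketch of insertion sort; the function name insertion_sort is chosen here and does not come from the GAS examples.

```c
#include <assert.h>
#include <stddef.h>

/* Insertion sort, O(n^2) in the worst case; sorts a[0..n-1] in place. */
void insertion_sort(int *a, size_t n)
{
    size_t i, j;

    assert(a != NULL || n == 0);
    for (i = 1; i < n; i++) {
        int key = a[i];

        /* shift larger elements one position to the right */
        j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}
```

The two nested loops map fairly directly to two labels with conditional jumps, and the array accesses to indexed addressing.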

Background material: GAS program examples on the literature page.

Announcements

As also pointed out in the above, the exercises on February 27 and March 1 take place in IMADA's Computer Lab.

Note 4, DM546, fall 2019

Lecture February 14

Abstract Syntax Trees.
Weeding.
Symbol tables.

Background material: Appel Chapters 4 and 5. You can find support for my approach at the lecture with regards to symbol tables in the Supplementary Notes for DM546; see the literature page.

Exercises February 22

Extend tiny expressions with a modulo operator % and an absolute value function |_|. In the action parts associated with the rules in the Bison definition file, write out the expression again, but with enough parentheses that you can verify the parsing result.
Introduce unary minus, -x, into the language as so-called syntactic sugar for 0-x, i.e., you may write -x according to the new grammar, but internally (in the AST), it is just represented as 0-x. Again, verify your result.
Discuss in detail how you can avoid building lists backwards in an LALR(1) parser. Rewrite the lecture example to obtain this.
What happens if you switch neformals and formal in the grammar in order to grammatically avoid the backwards lists problem? Try to study that problem in a simple scenario by considering the two grammars:
1. S → N$ N → f N → f ; N
2. S → N$ N → f N → N ; f
where f is a terminal symbol. Draw the DFAs, translate to table representations, and run them on the input sequence "f ; f ; f ; f $". Can you see any reasons to choose left recursion instead of right recursion for LR (and LALR) parsing?
Appel 5.1 a; base this on your own hash-table-like implementation from an earlier weekly note.
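The unary minus exercise above treats -x as syntactic sugar for 0-x. A minimal sketch of what that could look like on the AST side is given below. The node layout and the names make_num, make_binop, make_neg, and print_expr are assumptions made for this illustration and are not the definitions used in the tiny expressions material; nodes are never freed here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal AST for this sketch: a number or a binary operation. */
struct node {
    char op;                 /* 0 for a number, otherwise '+', '-', ... */
    int value;               /* used when op == 0 */
    struct node *left, *right;
};

struct node *make_num(int value)
{
    struct node *n = malloc(sizeof *n);

    assert(n != NULL);
    n->op = 0; n->value = value; n->left = n->right = NULL;
    return n;
}

struct node *make_binop(char op, struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);

    assert(n != NULL && left != NULL && right != NULL);
    n->op = op; n->value = 0; n->left = left; n->right = right;
    return n;
}

/* Unary minus is syntactic sugar: -x is represented as 0-x. */
struct node *make_neg(struct node *x)
{
    return make_binop('-', make_num(0), x);
}

/* Print the expression with a pair of parentheses around every operation. */
void print_expr(const struct node *e, FILE *out)
{
    assert(e != NULL);
    if (e->op == 0) {
        fprintf(out, "%d", e->value);
    } else {
        fputc('(', out);
        print_expr(e->left, out);
        fprintf(out, " %c ", e->op);
        print_expr(e->right, out);
        fputc(')', out);
    }
}
```

In a Bison definition file, the action for the unary minus rule would then call make_neg, so no new node kind is needed. Calling print_expr(make_neg(make_num(7)), stdout) writes (0 - 7).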

With regards to literature and material on flex, bison, and tiny expressions, see the literature page.

Note 3, DM546, fall 2019

Lecture February 11

The tool bison.
Abstract Syntax Trees.

Background material: Appel Chapter 4, Bison documentation. With regards to literature and material on bison, see the literature page.

Exercises February 15

Appel 3.1, 3.3 a-c, 3.10, 3.15.
We have considered grammars with rules for '+', '-', '*', '/', id, num and parentheses. Expand these grammars with boolean operators (and, or, not) and comparisons (==, <=, etc.) in such a way that you get the usual precedence and associativity. Use the slide introducing an unambiguous grammar for expressions as your starting point.
Show that the grammar below is LR(1), but not LALR(1), i.e., it works making an LR(1) parser table, but something goes wrong when you make the transition to an LALR(1) table. Recall that the |-shorthand used below indicates four S-productions. Remember first to introduce a new nonterminal and an eof-symbol ($).
- S → a E a | b E b | a F b | b F a
- E → e
- F → e

Note 2, DM546, fall 2019

Lecture February 8

Syntax analysis (parsing).

Background material: Appel Chapter 3.

Exercises February 12

Appel 2.1 a-f.
Discuss the C programming style introduced in Appel, Chapter 1. Make changes in Program 1.5 corresponding to the addition of the following rules to Grammar 1.3:
- Exp → string
- Exp → sqrt ( Exp )
- Stm → if ( Exp ) then Stm
Read about Flex. Make an overview of the additional possibilities for specifying regular expressions which are available in the tool compared with what was discussed at the lecture.
What do the following regular expressions match?
- \"([^\"])*\"
- http:[^?]*\?
- [-+]?[0-9]*\.[0-9]+([eE][-+]?[0-9]+)?
Make and test the following three Flex scanners:
- Make texts politically correct. Replace "idiot" with "intellectually challenged person", etc.
- Remove all whitespace and produce lines in lengths of 80 characters.
- Remove all tags from an HTML document. For those who do not speak HTML fluently, HTML is just regular text with some extra interpreted constructions. A "tag" consists of a "less than" symbol followed by some text and closed by a "greater than" symbol (you can view the source of this page to see an example).
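To experiment with the regular expressions above outside Flex, a small C helper based on the POSIX regex interface (regcomp, regexec, regfree) can be used. This is a sketch under assumptions: the helper name matches_whole is invented here, it requires a POSIX system, and POSIX extended regular expressions are not identical to Flex patterns, so results carry over only for patterns that use the common syntax.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the POSIX extended regular expression `pattern`
   matches all of `text`, 0 otherwise. */
int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int ok;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;                      /* pattern did not compile */
    /* whole-string match: the leftmost-longest match must span the text */
    ok = regexec(&re, text, 1, &m, 0) == 0
         && m.rm_so == 0 && text[m.rm_eo] == '\0';
    regfree(&re);
    return ok;
}
```

In a C string literal, each backslash in a pattern must be doubled, so the third pattern above is written "[-+]?[0-9]*\\.[0-9]+([eE][-+]?[0-9]+)?".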

With regards to literature and material on C, flex, etc., see the literature page.

Note 1, DM546, fall 2019

Lecture February 5

Introduction to the course.
Overview of a compiler.
Lexical Analysis (scanning).

Background material: Appel Chapters 1 and 2, Flex documentation.

With regards to literature and material on flex, see the literature page. The tool flex is used in combination with the programming language C. If you have not programmed much in C before, then it would be a good idea to read about it and try to write a few small programs. References to material on C can be found via the literature page.

Exercises February 7

Only one hour is scheduled for the first exercise session. It will primarily be used to review material you already know and that we need in the course.

Review how to write programs in C by writing some small programs again, and ask questions at the exercises if there is anything you are in doubt about. If you have difficulties with the problems below, then pose your own programming problems. The most important thing is to get going with the language again. In particular, it is important to look at addresses, structs, and pointers. Pointers fill the role of references in languages such as Java and Python, for instance, but in C, pointers are a datatype. This gives extra possibilities as well as difficulties compared with reference-based languages.
Implement the following primitive hash table with the operations insert, delete, and lookup. The elements in the hash table are referred to as keys. They are integers and they are placed in an array. The size of the array must be a prime. The hash function, which is used to decide where a given key should be placed in the array, should be of the form a * x mod "table size", where a is a different prime. As a simplification compared with a normal hash table, you can assume that no keys are placed in the same entry in the array. The implementation should be made in ANSI-C. Use the compiler gcc. Make a few tests.
Implement binary search trees with the operations insert, delete, and lookup. You do not have to implement any kind of rebalancing. The implementation should be made in ANSI-C. Use the compiler gcc. Make a few tests.
Read about make and explain the fundamental principles and features. You do not have to go into detail with any advanced applications. As a start, read the entire introductory section and the section on syntax. Then familiarize yourself with the remaining content, e.g., by reading the introduction to each chapter.
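As a starting point for the primitive hash table exercise above, the following sketch shows one possible shape of the three operations. The concrete primes 11 and 7, the EMPTY marker (which restricts keys to non-negative integers), and the table_ prefixed names are assumptions made here and are not prescribed by the exercise.

```c
#include <assert.h>

#define TABLE_SIZE 11          /* a prime (assumed value) */
#define MULTIPLIER 7           /* a different prime (assumed value) */
#define EMPTY (-1)             /* marks a free slot; keys must be >= 0 */

static int table[TABLE_SIZE];

/* Hash function of the form a * x mod table size. */
static int hash(int key)
{
    assert(key >= 0);
    return (int)(((unsigned long)MULTIPLIER * (unsigned long)key)
                 % TABLE_SIZE);
}

/* Mark every slot as free; must be called before any other operation. */
void table_init(void)
{
    int i;

    for (i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key; relies on the simplifying assumption that no two keys
   hash to the same slot, and asserts it. */
void table_insert(int key)
{
    int slot = hash(key);

    assert(table[slot] == EMPTY || table[slot] == key);
    table[slot] = key;
}

/* Returns 1 if key is present, 0 otherwise. */
int table_lookup(int key)
{
    return table[hash(key)] == key;
}

/* Remove key if it is present; otherwise do nothing. */
void table_delete(int key)
{
    int slot = hash(key);

    if (table[slot] == key)
        table[slot] = EMPTY;
}
```

A natural next step is to replace the single global table with a struct that is passed to each operation, which also makes it easier to reuse the code in later exercises.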

With regards to literature and material on C and make, see the literature page.
