DM823 String Algorithms
Fall 2010, 1st quarter Rolf Fagerberg
Official Course Description
See the
course
description at the web pages of the faculty.
Time and Place
The course starts Monday, August 30.
Textbook
As textbook, we will use the following:
Jewels of
Stringology
By M. Crochemore and W. Rytter
Published by World Scientific, 2002
ISBN 9810247826
We will also use excerpts (given as handouts) of other books, including:
Algorithms On Strings,Trees, And Sequences
By D. Gusfield
Published by Cambridge University Press, 1997
ISBN 0521585198
Flexible Pattern Matching in Strings 
Practical online search algorithms for texts and biological
sequences
By G. Navarro and M. Raffinot
Published by Cambridge University Press, 2002
ISBN 0521813077
Data Structures and Algorithms in Java (1st ed.)
By M.T. Goodrich and R. Tamassia
Published by John Wiley & Sons, 1998
ISBN 0471193089
Examination
The exam is oral, with grades on the 7point marking scale. The exam date
is November 2. A programming project must be passed in
order to attend the exam. There is a separate
document describing the experiments to be performed on the
data sets provided.
There will be a spørgetime (session for asking questions on the exam and
the curriculum) Friday, October 29, at 14.15 in
U52, and again Monday,
November 1, at 12:15 in U69.
At the oral exam, you will draw an exam topic
delineating a part of curriculum which you are to present in the first half
of the examination. More details of the exam form are described at the
bottom of the list of exam topics.
The sequence of students can be seen here.
The room will be U14, with
preparation in U17.
The grades at the exam ended up with the following
distribution.
Lectures
Date 
Time 
Room 
Contents 
Reading 
Monday, August 30 
1012 
Imada seminar room 
Introduction to course. String
terminology. Borders. The KMP linear time algorithm for exact pattern
matching. 
Course introduction slides.
Pages 1011 in textbook. Pages 19 to 23 middle in textbook. The teachers
notes on borders and notes on
the KMP algorithm.

Friday, September 3 
1416 
Imada seminar room 
Computing the shift table for the KMP algorithm. The BM algorithm. 
The teachers notes on computing the KMP shift
table. Pages 26 middle to 29 middle and pages 4144 in
textbook. Further variant algorithms for computing the KMP shift
table can be found on pages 33 to 35 in the textbook (not curriculum).

Monday, September 6 
1012 
Imada seminar room 
End of proof of linearity (for finding first
match) of BM algorithm. Periods. Proposition on primitive words and cyclic
shifts. The
Galil variant of the BM algoritm having linear time for finding all
occurrences. 
Pages 4144 in textbook. Page 11 (lower part) in textbook.
The teachers notes on periods. Page 40 of the book
by Gusfield
(given as handout). Pages 2930 in textbook. The teachers rough notes on
the proof of linearity of the BM algorithm
and
the Galil variant of the BM algorithm.

Friday, September 10 
1416 
Imada seminar room 
The prefix table. Preprocessing for the BM algorithm.

Pages 810 and 1922 of the book
by Gusfield
(given as handout). The teachers rough notes on
finding the entries of the prefix table
and
preprocessing for the BM algorithm.

Monday, September 13 
1012 
Imada seminar room 
Horspool version of BM. The fastonaverage (factorbased) version of
BM. The ShiftAND algorithm. Speed in practice of algorithms for exact
pattern matching. A bit about the value of focusing on ideas, analyses, and
proofs. Note: the expected running time for the fastonaverage algorithm
meets a
lower
bound by Yao (not curriculum), and is thus optimal.

Pages 2627 in book by Navarro and Raffinot (given as handout). Page 31 in
textbook. Pages 7072 in book by Gusfield (given as handout). The teachers
notes on the ShiftAND algorithm. Pages 3839 in
book by Navarro and Raffinot (given as handout).
Page xv in book by Gusfield (given as handout).

Friday, September 17 
1416 
Imada seminar room 
Tries and suffix trees.

Pages 566576 of the book by Goodrich and Tamassia. Sections 4.0, 4.1 and
5.1 in the textbook.

Monday, September 20 
1112 
Imada seminar room 
Ukkonens linear time algorithm for online construction of suffix
trees.

Pages 15 in the teachers notes on Ukkonens
algorithm. Section 4.2 in the textbook (note: does not entirely follow
exposition in notes).

Friday, September 24 
1416 
Imada seminar room 
End of proof of Ukkonens linear time algorithm for online construction
of suffix trees.

Pages 615 in the teachers notes on Ukkonens
algorithm. Section 4.3 in the textbook (note: does not entirely follow
exposition in notes).

Monday, September 27 
1012 
Imada seminar room 
Suffix arrays.

Pages 149155 in book by Gusfield (given as handout), Section 7.3 and
figures 7.6, 7.7, and 7.8 in the textbook.

Thursday, September 30 
1214 
U144 
Building suffix arrays in linear time. Building the LCP array in
linear time from the suffix array.

Sections 1.0, 1.1, 2, and 3 in
Linear work suffix
array construction, J. Kärkkäinen, P. Sanders, S. Burkhardt G, Journal of
the ACM, vol 53 (6), 918 936, 2005 (pdflink available from an SDU
IPaddress).
Section 3 in
LinearTime
LongestCommonPrefix Computation in Suffix Arrays and Its Applications,
T. Kasai, G. Lee, H. Arimura, S. Arikawa and K. Park, in Proceedings of
Combinatorial Pattern Matching 2001, Lecture Notes in Computer Science,
vol 2089, 181192, Springer 2001 (pdflink available from an SDU
IPaddress,
note a few errors in the paper: In the statements of Fact 2, Fact 3, Lemma 3,
and Theorem 1, ">1" should be "=>1", and in the code in Fig. 3, the last
IFstatement should be executed at each iteration of the FORloop (so it
should be moved out of the first IFstatement)).
Section 7.4 in the textbook.

Monday, October 4 
1012 
Imada seminar room 
The KMR naming method, with applications to 2D pattern matching.

Sections 7.1 and 7.2 in the textbook.

Thursday, October 7 
1214 
U144 
Sorting and searching strings.

Fast Algorithms for
Sorting and Searching Strings, J. Bentley and R. Sedgewick, Proceedings
of the 8th annual ACMSIAM symposium on Discrete algorithms, 360369, 1997.

Monday, October 11 
1012 
Imada seminar room 
Sorting strings in external memory.

External String Sorting: Faster and CacheOblivious,
R. Fagerberg, A. Pagh, R. Pagh, Proceedings
of the 23rd International Symposium on Theoretical Aspects of Computer Science
(STACS), Lecture Notes in Computer Science, vol 3884, 6879, Springer
2006. Slides for paper (note error on page 13:
"P(no collision)" should be "P(one or more collisions)").

Thursday, October 14 
1214 
U49B 
End of sorting strings in external memory. Dynamic programming:
Hirschbergs trick for finding optimal solutions in linear space.

A Linear Space
Algorithm for Computing Maximal Common Subsequences, D.S. Hirschberg,
Communication of the ACM, vol. 18(6), 341343, 1975 (pdflink available
from an SDU IPaddress). Alternatively, see the teachers
rough notes.

Course Evaluation
As part of the Study Boards schedule of course evaluations, a course
evaluation has been carried out (after the exam). The
aggregated answers (with comments removed for
anonymity, as required by the Study Board) and the
teachers plan of actions are now available.
Maintained by Rolf Fagerberg
(rolf@imada.sdu.dk)

