DM823 String Algorithms
Fall 2010, 1st quarter Rolf Fagerberg
Official Course Description
See the
course
description at the web pages of the faculty.
Time and Place
The course starts Monday, August 30.
Textbook
As textbook, we will use the following:
Jewels of
Stringology
By M. Crochemore and W. Rytter
Published by World Scientific, 2002
ISBN 981-02-4782-6
We will also use excerpts (given as handouts) of other books, including:
Algorithms On Strings,Trees, And Sequences
By D. Gusfield
Published by Cambridge University Press, 1997
ISBN 0521585198
Flexible Pattern Matching in Strings -
Practical on-line search algorithms for texts and biological
sequences
By G. Navarro and M. Raffinot
Published by Cambridge University Press, 2002
ISBN 0521813077
Data Structures and Algorithms in Java (1st ed.)
By M.T. Goodrich and R. Tamassia
Published by John Wiley & Sons, 1998
ISBN 0471193089
Examination
The exam is oral, with grades on the 7-point marking scale. The exam date
is November 2. A programming project must be passed in
order to attend the exam. There is a separate
document describing the experiments to be performed on the
data sets provided.
There will be a spørgetime (session for asking questions on the exam and
the curriculum) Friday, October 29, at 14.15 in
U52, and again Monday,
November 1, at 12:15 in U69.
At the oral exam, you will draw an exam topic
delineating a part of curriculum which you are to present in the first half
of the examination. More details of the exam form are described at the
bottom of the list of exam topics.
The sequence of students can be seen here.
The room will be U14, with
preparation in U17.
The grades at the exam ended up with the following
distribution.
Lectures
Date |
Time |
Room |
Contents |
Reading |
Monday, August 30 |
10-12 |
Imada seminar room |
Introduction to course. String
terminology. Borders. The KMP linear time algorithm for exact pattern
matching. |
Course introduction slides.
Pages 10-11 in textbook. Pages 19 to 23 middle in textbook. The teachers
notes on borders and notes on
the KMP algorithm.
|
Friday, September 3 |
14-16 |
Imada seminar room |
Computing the shift table for the KMP algorithm. The BM algorithm. |
The teachers notes on computing the KMP shift
table. Pages 26 middle to 29 middle and pages 41-44 in
textbook. Further variant algorithms for computing the KMP shift
table can be found on pages 33 to 35 in the textbook (not curriculum).
|
Monday, September 6 |
10-12 |
Imada seminar room |
End of proof of linearity (for finding first
match) of BM algorithm. Periods. Proposition on primitive words and cyclic
shifts. The
Galil variant of the BM algoritm having linear time for finding all
occurrences. |
Pages 41-44 in textbook. Page 11 (lower part) in textbook.
The teachers notes on periods. Page 40 of the book
by Gusfield
(given as handout). Pages 29-30 in textbook. The teachers rough notes on
the proof of linearity of the BM algorithm
and
the Galil variant of the BM algorithm.
|
Friday, September 10 |
14-16 |
Imada seminar room |
The prefix table. Preprocessing for the BM algorithm.
|
Pages 8-10 and 19-22 of the book
by Gusfield
(given as handout). The teachers rough notes on
finding the entries of the prefix table
and
preprocessing for the BM algorithm.
|
Monday, September 13 |
10-12 |
Imada seminar room |
Horspool version of BM. The fast-on-average (factor-based) version of
BM. The Shift-AND algorithm. Speed in practice of algorithms for exact
pattern matching. A bit about the value of focusing on ideas, analyses, and
proofs. Note: the expected running time for the fast-on-average algorithm
meets a
lower
bound by Yao (not curriculum), and is thus optimal.
|
Pages 26-27 in book by Navarro and Raffinot (given as handout). Page 31 in
textbook. Pages 70-72 in book by Gusfield (given as handout). The teachers
notes on the Shift-AND algorithm. Pages 38-39 in
book by Navarro and Raffinot (given as handout).
Page xv in book by Gusfield (given as handout).
|
Friday, September 17 |
14-16 |
Imada seminar room |
Tries and suffix trees.
|
Pages 566-576 of the book by Goodrich and Tamassia. Sections 4.0, 4.1 and
5.1 in the textbook.
|
Monday, September 20 |
11-12 |
Imada seminar room |
Ukkonens linear time algorithm for online construction of suffix
trees.
|
Pages 1-5 in the teachers notes on Ukkonens
algorithm. Section 4.2 in the textbook (note: does not entirely follow
exposition in notes).
|
Friday, September 24 |
14-16 |
Imada seminar room |
End of proof of Ukkonens linear time algorithm for online construction
of suffix trees.
|
Pages 6-15 in the teachers notes on Ukkonens
algorithm. Section 4.3 in the textbook (note: does not entirely follow
exposition in notes).
|
Monday, September 27 |
10-12 |
Imada seminar room |
Suffix arrays.
|
Pages 149-155 in book by Gusfield (given as handout), Section 7.3 and
figures 7.6, 7.7, and 7.8 in the textbook.
|
Thursday, September 30 |
12-14 |
U144 |
Building suffix arrays in linear time. Building the LCP array in
linear time from the suffix array.
|
Sections 1.0, 1.1, 2, and 3 in
Linear work suffix
array construction, J. Kärkkäinen, P. Sanders, S. Burkhardt G, Journal of
the ACM, vol 53 (6), 918 -936, 2005 (pdf-link available from an SDU
IP-address).
Section 3 in
Linear-Time
Longest-Common-Prefix Computation in Suffix Arrays and Its Applications,
T. Kasai, G. Lee, H. Arimura, S. Arikawa and K. Park, in Proceedings of
Combinatorial Pattern Matching 2001, Lecture Notes in Computer Science,
vol 2089, 181-192, Springer 2001 (pdf-link available from an SDU
IP-address,
note a few errors in the paper: In the statements of Fact 2, Fact 3, Lemma 3,
and Theorem 1, ">1" should be "=>1", and in the code in Fig. 3, the last
IF-statement should be executed at each iteration of the FOR-loop (so it
should be moved out of the first IF-statement)).
Section 7.4 in the textbook.
|
Monday, October 4 |
10-12 |
Imada seminar room |
The KMR naming method, with applications to 2D pattern matching.
|
Sections 7.1 and 7.2 in the textbook.
|
Thursday, October 7 |
12-14 |
U144 |
Sorting and searching strings.
|
Fast Algorithms for
Sorting and Searching Strings, J. Bentley and R. Sedgewick, Proceedings
of the 8th annual ACM-SIAM symposium on Discrete algorithms, 360-369, 1997.
|
Monday, October 11 |
10-12 |
Imada seminar room |
Sorting strings in external memory.
|
External String Sorting: Faster and Cache-Oblivious,
R. Fagerberg, A. Pagh, R. Pagh, Proceedings
of the 23rd International Symposium on Theoretical Aspects of Computer Science
(STACS), Lecture Notes in Computer Science, vol 3884, 68-79, Springer
2006. Slides for paper (note error on page 13:
"P(no collision)" should be "P(one or more collisions)").
|
Thursday, October 14 |
12-14 |
U49B |
End of sorting strings in external memory. Dynamic programming:
Hirschbergs trick for finding optimal solutions in linear space.
|
A Linear Space
Algorithm for Computing Maximal Common Subsequences, D.S. Hirschberg,
Communication of the ACM, vol. 18(6), 341-343, 1975 (pdf-link available
from an SDU IP-address). Alternatively, see the teachers
rough notes.
|
Course Evaluation
As part of the Study Boards schedule of course evaluations, a course
evaluation has been carried out (after the exam). The
aggregated answers (with comments removed for
anonymity, as required by the Study Board) and the
teachers plan of actions are now available.
Maintained by Rolf Fagerberg
(rolf@imada.sdu.dk)
|
|