DM823 String Algorithms

Fall 2010, 1st quarter
Rolf Fagerberg


Official Course Description

See the course description at the web pages of the faculty.

Time and Place

The course starts Monday, August 30.

Textbook

As textbook, we will use the following:

Jewels of Stringology
By M. Crochemore and W. Rytter
Published by World Scientific, 2002
ISBN 981-02-4782-6

We will also use excerpts (given as handouts) of other books, including:

Algorithms On Strings,Trees, And Sequences
By D. Gusfield
Published by Cambridge University Press, 1997
ISBN 0521585198

Flexible Pattern Matching in Strings - Practical on-line search algorithms for texts and biological sequences
By G. Navarro and M. Raffinot
Published by Cambridge University Press, 2002
ISBN 0521813077

Data Structures and Algorithms in Java (1st ed.)
By M.T. Goodrich and R. Tamassia
Published by John Wiley & Sons, 1998
ISBN 0471193089

Examination

The exam is oral, with grades on the 7-point marking scale. The exam date is November 2. A programming project must be passed in order to attend the exam. There is a separate document describing the experiments to be performed on the data sets provided.

There will be a spørgetime (session for asking questions on the exam and the curriculum) Friday, October 29, at 14.15 in U52, and again Monday, November 1, at 12:15 in U69.

At the oral exam, you will draw an exam topic delineating a part of curriculum which you are to present in the first half of the examination. More details of the exam form are described at the bottom of the list of exam topics.

The sequence of students can be seen here. The room will be U14, with preparation in U17.

The grades at the exam ended up with the following distribution.

Lectures

Date Time Room Contents Reading
Monday, August 30 10-12 Imada seminar room Introduction to course. String terminology. Borders. The KMP linear time algorithm for exact pattern matching. Course introduction slides. Pages 10-11 in textbook. Pages 19 to 23 middle in textbook. The teachers notes on borders and notes on the KMP algorithm.
Friday, September 3 14-16 Imada seminar room Computing the shift table for the KMP algorithm. The BM algorithm. The teachers notes on computing the KMP shift table. Pages 26 middle to 29 middle and pages 41-44 in textbook. Further variant algorithms for computing the KMP shift table can be found on pages 33 to 35 in the textbook (not curriculum).
Monday, September 6 10-12 Imada seminar room End of proof of linearity (for finding first match) of BM algorithm. Periods. Proposition on primitive words and cyclic shifts. The Galil variant of the BM algoritm having linear time for finding all occurrences. Pages 41-44 in textbook. Page 11 (lower part) in textbook. The teachers notes on periods. Page 40 of the book by Gusfield (given as handout). Pages 29-30 in textbook. The teachers rough notes on the proof of linearity of the BM algorithm and the Galil variant of the BM algorithm.
Friday, September 10 14-16 Imada seminar room The prefix table. Preprocessing for the BM algorithm. Pages 8-10 and 19-22 of the book by Gusfield (given as handout). The teachers rough notes on finding the entries of the prefix table and preprocessing for the BM algorithm.
Monday, September 13 10-12 Imada seminar room Horspool version of BM. The fast-on-average (factor-based) version of BM. The Shift-AND algorithm. Speed in practice of algorithms for exact pattern matching. A bit about the value of focusing on ideas, analyses, and proofs. Note: the expected running time for the fast-on-average algorithm meets a lower bound by Yao (not curriculum), and is thus optimal. Pages 26-27 in book by Navarro and Raffinot (given as handout). Page 31 in textbook. Pages 70-72 in book by Gusfield (given as handout). The teachers notes on the Shift-AND algorithm. Pages 38-39 in book by Navarro and Raffinot (given as handout). Page xv in book by Gusfield (given as handout).
Friday, September 17 14-16 Imada seminar room Tries and suffix trees. Pages 566-576 of the book by Goodrich and Tamassia. Sections 4.0, 4.1 and 5.1 in the textbook.
Monday, September 20 11-12 Imada seminar room Ukkonens linear time algorithm for online construction of suffix trees. Pages 1-5 in the teachers notes on Ukkonens algorithm. Section 4.2 in the textbook (note: does not entirely follow exposition in notes).
Friday, September 24 14-16 Imada seminar room End of proof of Ukkonens linear time algorithm for online construction of suffix trees. Pages 6-15 in the teachers notes on Ukkonens algorithm. Section 4.3 in the textbook (note: does not entirely follow exposition in notes).
Monday, September 27 10-12 Imada seminar room Suffix arrays. Pages 149-155 in book by Gusfield (given as handout), Section 7.3 and figures 7.6, 7.7, and 7.8 in the textbook.
Thursday, September 30 12-14 U144 Building suffix arrays in linear time. Building the LCP array in linear time from the suffix array. Sections 1.0, 1.1, 2, and 3 in Linear work suffix array construction, J. Kärkkäinen, P. Sanders, S. Burkhardt G, Journal of the ACM, vol 53 (6), 918 -936, 2005 (pdf-link available from an SDU IP-address). Section 3 in Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications, T. Kasai, G. Lee, H. Arimura, S. Arikawa and K. Park, in Proceedings of Combinatorial Pattern Matching 2001, Lecture Notes in Computer Science, vol 2089, 181-192, Springer 2001 (pdf-link available from an SDU IP-address, note a few errors in the paper: In the statements of Fact 2, Fact 3, Lemma 3, and Theorem 1, ">1" should be "=>1", and in the code in Fig. 3, the last IF-statement should be executed at each iteration of the FOR-loop (so it should be moved out of the first IF-statement)). Section 7.4 in the textbook.
Monday, October 4 10-12 Imada seminar room The KMR naming method, with applications to 2D pattern matching. Sections 7.1 and 7.2 in the textbook.
Thursday, October 7 12-14 U144 Sorting and searching strings. Fast Algorithms for Sorting and Searching Strings, J. Bentley and R. Sedgewick, Proceedings of the 8th annual ACM-SIAM symposium on Discrete algorithms, 360-369, 1997.
Monday, October 11 10-12 Imada seminar room Sorting strings in external memory. External String Sorting: Faster and Cache-Oblivious, R. Fagerberg, A. Pagh, R. Pagh, Proceedings of the 23rd International Symposium on Theoretical Aspects of Computer Science (STACS), Lecture Notes in Computer Science, vol 3884, 68-79, Springer 2006. Slides for paper (note error on page 13: "P(no collision)" should be "P(one or more collisions)").
Thursday, October 14 12-14 U49B End of sorting strings in external memory. Dynamic programming: Hirschbergs trick for finding optimal solutions in linear space. A Linear Space Algorithm for Computing Maximal Common Subsequences, D.S. Hirschberg, Communication of the ACM, vol. 18(6), 341-343, 1975 (pdf-link available from an SDU IP-address). Alternatively, see the teachers rough notes.

Course Evaluation

As part of the Study Boards schedule of course evaluations, a course evaluation has been carried out (after the exam). The aggregated answers (with comments removed for anonymity, as required by the Study Board) and the teachers plan of actions are now available.


Maintained by Rolf Fagerberg (rolf@imada.sdu.dk)