Fall 2016 / DM847
Introduction to Bioinformatics

General Information

Course introduction

The purpose of this course is to give an understanding of computational problems in modern biomedical research. We will start with concrete medical questions, develop a formal problem description, set up an algorithmic/statistical model, solve it, and then derive real-world answers from the solved model. The course aims to give a basic understanding of which problems arise in modern molecular biology and clinical research, and how these problems can be solved with appropriate computational tools. The class requires regular attendance. Completing the exercise sheets and the course project is a precondition for admission to the exam.

Expected learning outcome

Explain and understand the central dogma of molecular biology, central aspects of gene regulation, the basic principle of epigenetic DNA modifications, and the special features of bacterial and phage genetics
Model ontologies for biomedical data dependencies
Design systems biology databases
Explain and implement DNA & amino acid sequence analysis methods (HMMs, scoring matrices, and efficient statistics with them on data structures like suffix arrays)
Explain and implement statistical learning methods on biological networks (network enrichment)
Explain the specialties of bacterial genetics (the operon prediction trick)
Explain and implement methods for suffix trees, suffix arrays, and the Burrows-Wheeler transformation
Explain de novo sequence pattern screening with the EM algorithm and entropy models
Explain and implement basic methods for supervised and unsupervised data mining, as well as their application to biomedical OMICS data sets

Topics Covered

The course covers the following main topics:

Central dogma of molecular genetics, epigenetics, and bacterial and phage genetics
Design of online databases for molecular biology content (ontologies, and example databases: NCBI, CoryneRegNet, ONDEX)
DNA and amino acid sequence pattern models (HMMs, scoring matrices, mixed models, efficient statistics with them on big data sets)
Specialties of bacterial genetics (sequence models and functional models for operon prediction)
De novo identification of transcription factor binding motifs (recursive expectation maximization, entropy-based models)
Analysis of next-generation DNA sequencing data sets (memory-aware mapping of short sequence reads with the Burrows-Wheeler transformation and suffix arrays, bi-modal peak calling)
Visualization of biological networks (graph layout: small but highly variable graphs vs. huge but rather static graphs)
Systems biology and statistics on networks (network enrichment with CUSP, jActiveModules and KeyPathwayMiner)
Basic supervised and unsupervised classification methods for OMICS data analysis
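
As a small illustration of the suffix array and Burrows-Wheeler topics listed above, here is a minimal Python sketch. It is not taken from the course material: the function names, the `$` sentinel, and the naive sorting-based construction are assumptions chosen for readability, and real read mappers use far more memory-efficient constructions.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction by sorting the suffixes directly; fine for toy
    inputs, far too slow for genome-scale data.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform of text, read off the suffix array.

    The sentinel (an assumption of this sketch) must not occur in text
    and must sort before every other character.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # For each sorted suffix, emit the character that precedes it;
    # index -1 wraps around for the suffix starting at position 0.
    return "".join(s[i - 1] for i in suffix_array(s))


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Recover the original text by repeated prepend-and-sort (naive)."""
    n = len(transformed)
    table = [""] * n
    for _ in range(n):
        table = sorted(transformed[i] + table[i] for i in range(n))
    # The row that ends with the sentinel is the original text.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


if __name__ == "__main__":
    print(suffix_array("banana$"))
    print(bwt("banana"))
    print(inverse_bwt(bwt("GATTACA")))
```

The transform tends to group identical characters into runs, which is what makes the transformed text compressible and, together with the suffix array, searchable in a memory-aware way.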

Requirements

During the course, students have to complete exercise sheets and participate in one large project at the end of the semester. The project is graded pass/fail and must be passed in order to be eligible for the oral exam at the end of the semester.

Evaluation

You can download the evaluation results without comments here and the corresponding action plan here. Thank you to all students who returned the evaluation form.

Lectures

#	Date	Content	Slides
1	Wed, 14.09.2016, 8-10	Introduction	here
2	Thu, 15.09.2016, 2-4	Databases	here
3	Wed, 21.09.2016, 10-12	Sequence Logos & Operon Prediction	here
4	Wed, 28.09.2016, 10-12	Transcription Factor Binding Sites	here
5	Wed, 05.10.2016, 10-12	De Novo Motif Discovery	here
6	Tue, 11.10.2016, 08-10	ChIP Data Analysis	here
7	Tue, 25.10.2016, 2-4	Network Enrichment	here
8	Tue, 01.11.2016, 4-6	Clustering	here
9	Tue, 08.11.2016, 4-6	Data Mining	here
NEW	Thu, 10.11.2016, 4-6	Excursion to the Hospital	location here
10	Tue, 15.11.2016	--
11	Tue, 22.11.2016, 4-6	--
11	Thu, 24.11.2016, 2-4	recap

Assignment

General Notes

Here you will find all necessary information for the mandatory course assignment. Please note that passing this assignment is required in order to be eligible to take the oral exam. Grading will be pass/fail with an internal censor.

There will be no extensions to the deadlines!

Deadline Intermediate Report: December 19.

Deadline Final Hand-In: January 9.

You are allowed and encouraged to work in teams of 5 students. When submitting your reports and code, make sure that the names of all team members are included.

Materials

Project description
Software PEAX or directly from the vendor: here
File 1: toydata - Raw data
File 2: toydata - Peak picking results
File 3: toydata - Peak alignment
File 4: toydata - Indicator matrix
File 5: toydata - Class labels
File 6: Real data - Training: Raw data with class labels
File 7: Real data - Unlabeled raw data

Additional Links

Help with R: http://www.statmethods.net
WEKA: http://www.cs.waikato.ac.nz/ml/weka/

Papers

Peak picking: here and here and here
Breath Data Mining: here and here

Exercises

#	TA Session	Topic	Hand-In Due	Download
1	Thu, 22.09.2016	Databases	Wed, 21.09.2016	here
2	Thu, 29.09.2016	Sequence Logos (corrected)	Wed, 28.09.2016	here
3	Thu, 06.10.2016	Transcription Factors	Wed, 05.10.2016	here
4	Thu, 13.10.2016	De Novo Motifs	Wed, 12.10.2016	here Upstream Sequences
5	Thu, 27.10.2016	ChIP Data	Wed, 26.10.2016	here Read Mappings
6	Thu, 03.11.2016	Network Enrichment	Wed, 02.11.2016	here Map of Qatar Map of Zimbabwe
7	Thu, 15.11.2016	Clustering	Wed, 09.11.2016	here Datasets and Helpers TransClust
8	Thu, 17.11.2016	Data Mining	Wed, 16.11.2016	here Datasets and Additional Information

Materials

All lecture slides are relevant for the exams.

Useful Links

TBA
