DM840, Autumn 2014

Daniel Merkle

DM840 Required Assignment 1: Chemical Spaces and Graph Grammars

In this assignment, your task is to define a chemical, and a non-chemical graph grammar and explore the compound space in a systematic way. The chemical graph grammar is based on the (autocatalytic) Formose cycle, the second graph grammar is based on the Catalan game.

Formose Reaction

The Formose cycle (figure from the German Wikipedia entry, don't use this for modeling your rules, it's not specific enough; rather look in the summary file provided):

The formose reaction, discovered by Aleksandr Butlerov in 1861, involves the formation of sugars from formaldehyde. As it is easy to see, the overall reaction of the pathway above is of the form 2A + B → 2B, where A refers to formaldehyde, and B refers to glycolaldehyde. As B can not be created of A only, but the existence of B leads to more than one B, we say that B is an autocatalytic compound.


In the Catalan game you are allowed to remove nodes and edges by clicking on a node n having exactly 3 neighbors (temporary copy from here). Applying such a move removes the 3 incident edges of n and collapses the three neighbor nodes n1, n2, and n3 and the clicked node n to one node. That node has all the neighbors that n1, n2, and n3 had before. The goal is to collapse the initial graph to one node only.


You have to design the graph grammar rules to model the i.) Formose cycle and ii.) the Catalan game. You need to design your graph grammar rules such that you can find an autocatalytic "flow" in the chemical space for the Formose reaction, and that you can find a solution to the Catalan game with graph grammars.

Attacking the Problem

In the lecture it was presented how to use the tool "MěD" to model the non-oxidative phase of the Pentose Phosphate Pathway. The overall reaction converts six ribulose-5-phosphate molecules to five Fructose 6-phosphate molecules. The graph grammar for PPP, the script to explore the chemical space and to find the PPP pathway with MěD is provided in the terminal room in the directory /home/daniel/dm840/assignment1/. You need to execute source /home/jlandersen/shared/ in order to set the required paths to find the binaries and libraries for MěD.

Before you try to design the graph grammar rules (in Graph Modeling Language (GML) format) for the Formose reaction you should study carefully how you define graph grammar rules, how you can put constraints onto atoms/bonds, how you can expand the chemical space for PPP in different ways, how you define the optimization problem, how you find solutions, and how you print products, solutions, and the chemical space. Besides that this was presented during the lecture: read carefully the README file for PPP and the Graph Grammar Rule Tutorial to understand the details of GML. Section 2.4 shows how to put constraints onto atoms/bonds.


In the directory /home/daniel/dm840/assignment1/formose you will find a file summary.pdf that is defines the necessary rules for the Formose cycle. Some of the rules have constraints that are defined similar to |{ e in outEdges(2) | label(target(e)) in { 'O' } }| = 1 in the summary file just below the rule. This constraint, for example, defines that exactly one neighbor node (i.e., the target of an incident edge) of the node with id 2 should have the label "O". In GML this can be implemented in the following way:

constrainAdj [
	id 2
	op =
	count 1
	nodeLabels [ label "O" ]

In this example the node with id 2 ( id 2 ) should have exactly ( op = ) 1 neighbor ( count 1 ) with the label O ( nodeLabels [ label "O" ] ) in order to allow the rule (not shown) being applied. Using nodeLabels [ label "O" label "N" ] would count the number of neighbors having label O or N. You can also add several constraints to a rule (see Section 2.4.4 of the tutorial). Note, that the node-id number in a depiction of a graph grammar rule in the auto-generated summary is in general different to the numbers used in the gml-file.

To solve this part you only need to provide the four rules


Templates for the four rules are provided. All four rules are depicted in the provided summary. Implement these rules in GML format. Furthermore the SMILES strings for glycolaldehyde and formaldehyde need to be defined in Use these and the rules to expand the derivation graph (see the provided file Within your expanded derivation graph you should be able to find i.) a solution that transforms two formaldehyde and one glycolaldehyde to two glycolaldehydes. Furthermore, the query for an autocatalytic solution should give a solution that shows that glycolaldehyde is an autocatalytic compound. The respective queries are also found in the file


Make sure you understood your solution to find the Formose cycle before you start to design a graph grammar for the Catalan game. Before you implement your rules (and probably a strategy to expand the derivation graph), make sure that you know which of your rules are meant for doing what. Note, that your rules are no longer rules to model chemical reactions. Therefore you can (and must) also remove nodes (which is due to mass conservation not possible in a chemical setting). In the directory /home/daniel/dm840/assignment1/catalan you will find a starting point with a small script that you can provide the level number and (optionally) the number of expansion steps:

./ 1 3

will indirectly start MěD and load the graph in the required format that corresponds to level 1 of the Catalan game, MěD will do three expansion steps based on the rules (that you have to define) and will then try to find a "flow" from the input graph to the graph having one node only. You are not allowed to use copy-paste rules (see Section 2.5 Copy-and-Paste operations of the tutorial). You should not use any other constraints than defined by constrainAdj.

Report / Submission

The report must be short and precise (maximum 10 pages, English language). It must include the following:

  • A small introduction.
  • A section for the Formose cycle: you need to provide i.) a depiction of the four rules you designed (as provided by MěD), ii.) the rules in textual form, iii.) the derivation graph in which you found the autocatalytic solution, iv.) the solution to the query for autocatalysis including the objective value.
  • A section for the Catalan game: you need to provide i.) a depiction of all the rules you designed (as provided by MěD) and (in contrast to the Formose cycle rules) the most important part: explain what the rational behind each of your designed rules is and, if necessary, why you chose to add any constraints on the rules. ii.) Motivate (or show) why your set of graph grammar rules can mimic a Catalan move. iii.) Discuss any problems that you might observe and discuss if a "flow" that you find corresponds to a sequence of moves in the Catalan game. iv.) What is the number of steps from the initial graph to the final graph for level 1? Which levels could you "solve" via the graph grammar approach?
  • In case you work as a group: a paragraph on who contributed with what.

For submitting the report and the sources proceed as follows: create the following directory structure


The formose and catalan directory are the directories that we provided including the modifications that you made, the rules that you defined, and so on. The report should be put in the corresponding directory. The assignment1 directory should not contain more than 10MB of data.

Change to the directory assignment1 and type aflever DM840

Additionally, you shall hand in a printout of your report (Department secretaries office).

The strict deadline for submission (electronically) is 10. October, 11am. Note, that the second mandatory assignment will be announced already before that deadline. The electronically submitted and the printout version need to be identical.

Frequently Asked Questions (FAQ)

  • Are we allowed to work in groups?
  • In case you are a PhD student: No. In case you are not a PhD student: You are allowed to works in groups of max 2 people. Write the members of your group on the front page of your report. In the report you have to write who contributed with what. PhD students need to work on their own.

  • Can I install MěD on a non-IMADA terminal room machine
  • No.

  • But everything runs so slow! Help me.
  • The problem is that access to the NFS server (the file server that stores your home directory) is at the time of writing very slow. This has several implications: i.) loading the binary (more than 20MB) takes some seconds. Running mod a second time will be much faster, as caching is used for the binaries. ii.) Any access to read or to write files in your home directory is slow. I recommend to work in the /tmp directory, this directory is on the local hard disk. However, you should be careful, because a reboot will delete this directory. This issue might change as a network infrastructure change is scheduled in the very near furture.

Design by | Modified by Daniel Merkle | CSS 2.0