Required Assignment 1 - Chemical Spaces and Graph Grammars
In this assignment, your task is to define a chemical, and two non-chemical graph grammar and explore the compound space in a systematic way. The chemical graph grammar is based on the (autocatalytic) Formose cycle, the second graph grammar is based on a small puzzle, and on the Catalan game.
The Formose cycle
(figure from the German Wikipedia entry, don’t use this for modeling your rules, it’s not specific enough; rather look in the summary file provided):
The formose reaction, discovered by Aleksandr Butlerov in 1861, involves the formation of sugars from formaldehyde. As it is easy to see, the overall reaction of the pathway above is of the form 2A + B → 2B, where A refers to formaldehyde, and B refers to glycolaldehyde. As B can not be created of A only, but the existence of B leads to more than one B, we say that B is an autocatalytic compound.
Reindeers
The puzzle involves seven areas and six reindeers, see the Figure above. The seven areas are laid out horizontally. The six reindeers are evenly divided into a black trio and a red trio. The black reindeers are located on the three left-most areas, the red reindeers are located at the three right-most areas. The middle space is vacant. The challenge is to transpose the trios, jumping the black reindeers to the right-most spaces and the red reindeers to the left-most spaces. Their movement is restricted. A reindeer can only jump forward, either hopping to a vacant space one place ahead or leaping over its neighbor reindeer to a vacant rock two places ahead.
Catalan
In the Catalan game you are allowed to remove nodes and edges by clicking on a node n having exactly 3 neighbors (temporary copy from here). Applying such a move removes the 3 incident edges of n and collapses the three neighbor nodes n1, n2, and n3 and the clicked node n to one node. That node has all the neighbors that n1, n2, and n3 had before. The goal is to collapse the initial graph to one node only.
Requirements
You have to design the graph transformation rules to model the i.) Formose cycle ii.) the Reindeer puzzle, and iii.) the Catalan game. You need to design your graph transformation rules such that you can find an autocatalytic “flow” in the chemical space for the Formose reaction, and that you can find a solution to the Catalan game with graph grammars.
Attacking the Problem
You can eitehr work with the VirtualBox applance that was provided (see email) or directly work on the Computer Lab machines. If you will/want to use a Computer Lab machine, here is a brief technical help of how to login to a machine, which can run mød. If you are physically at the machine, your login name should say SDU\username. If you are outside of the SDU network, you first have to either i.) use VPN to get an SDU-internal IP address (see here, or ii.) need to login to a jumphost via ssh SDU\\username@logon.sdu.dk (with your SDU password). As a second step you can then use ssh imada-1063XX.imada.sdu.dk, where XX is a number between 10 and 40. See here for the status of the Computer Lab machines. mød will not run on the logon machines, only on the Computer Lab machines.
In the lecture it will/was presented how to use the mød to model
the non-oxidative phase of
the Pentose Phosphate Pathway. The overall reaction converts
six ribulose-5-phosphate molecules to
five Fructose
6-phosphate. The graph grammar for PPP, the script to
explore the chemical space and to find the PPP pathway with mød is
provided in the VirtualBox appliance as well as in the terminal room in the
directory /mnt/Shared/daniel/dm840/assignment1/. You need to
execute source /mnt/Shared/jlandersen/shared/setEnv.sh 
in order to set the required paths to find the binaries and libraries
for mød.
If you want to use auto-completion in a python shell you need to have ipython3 installed. This is pre-installed in the VirtualBox appliance, on the Computer Lab you can easily do this by executing the command pip3 install --user ipython, and subsequently add the corresponding PATH, e.g., by adding the line export PATH=$HOME/.local/bin:$PATH to the file $HOME/.profile.
Before you try to design the graph transformation rules (in Graph Modeling
 Language (GML) format) for the Formose reaction you should study
 carefully how you define graph transformation rules, how you can put
 constraints onto atoms/bonds, how you can expand the chemical space
 for PPP in different ways, how you define the optimization problem,
 how you find solutions, and how you print products, solutions, and
 the chemical space. Besides that this was presented during the
 lecture: read carefully the README file for PPP and
 the Graph
 Grammar Rule Tutorial to understand the details of GML. Section
 2.4 shows how to put constraints onto atoms/bonds, this is however slightly outdated, details gill be given in the lecture and below.
Formose
In the directory /mnt/nfs/Shared/daniel/dm840/assignment1/formose you will
find a file summary.pdf that is defines the necessary
rules for the Formose cycle. Some of the rules have constraints that
are defined similar to |{ e in outEdges(2) | label(target(e)) in { 'O' } }| = 1
in the summary file just below the rule.
This constraint, for example, defines that exactly one neighbor node (i.e., the target of an incident edge) of the node with id 2 should have the label “O”. In GML this can be implemented in the following way:</p>
constrainAdj [
        id 2
        op "="
        count 1
        nodeLabels [ label "O" ]
]
In this example the node with id 2 ( id 2 ) should
have exactly ( op = ) 1 neighbor ( count 1 )
with the label O ( nodeLabels [ label "O" ] ) in order to
allow the rule (not shown) being applied. Using nodeLabels [
label "O" label "N" ] would count the number of neighbors
having label O or N. You can also add several constraints to a rule
(see Section 2.4.4 of the tutorial). Note, that the node-id number in
a depiction of a graph grammar rule in the auto-generated summary is in
general different to the numbers used in the gml-file.
To solve this part you only need to provide the four rules </p>
aldol_addition_backward.gml aldol_addition_forward.gml keto_enol_backward.gml keto_enol_forward.gml
Templates for the four rules are provided. All four rules are
depicted in the provided summary. Implement these rules in GML format.
Furthermore the SMILES strings for glycolaldehyde and formaldehyde
need to be defined in grammar.py. Use these and the
rules to expand the derivation graph (see the provided
file doStuff.py). Within your expanded derivation graph
you should be able to find i.) a solution that transforms two
formaldehyde and one glycolaldehyde to two
glycolaldehydes. Furthermore, the query for an autocatalytic solution
should give a solution that shows that glycolaldehyde is an
autocatalytic compound. The respective queries are also found in the
file doStuff.py.
Reindeer
The reindeer game exists im many different versions, the most prominent one with frogs (see for example here. The rules are straightforward, your goal is to make the three black and three red reindeer change their position. Their movement is restricted. A reindeer can only jump forward, either hopping to a vacant space one place ahead or leaping over its neighbor reindeer to a vacant place two places ahead. The goal is of course to use graph grammars based on the templated code you are provide, to solve the puzzle. Even though this is only a small thing after you modelled the whoe game, but what is the smallest number of moves?
Catalan
Make sure you understood your solution to find the Formose cycle,
and that you finished the Reindeer puzzle, before you start to design
a graph grammar for the Catalan game. Before you implement your rules
(and probably a strategy to expand the derivation graph), make sure
that you know which of your rules are meant for doing what. Note, that
your rules are no longer rules to model chemical reactions. Therefore
you can (and must) also remove nodes (which is due to mass
conservation not possible in a chemical setting). In the
directory /mnt/nfs/Shared/daniel/dm840/assignment1/catalan you will
find a starting point with a small script that you can provide the
level number and (optionally) the number of expansion steps:
 
./solve.sh 1 3
will indirectly start mød and load the graph in the required format
that corresponds to level 1 of the Catalan game, mød will do three
expansion steps based on the rules (that you have to define) and will
then try to find a “flow” from the input graph to the graph having one
node only. You are not allowed to use copy-paste rules (see Section
2.5 Copy-and-Paste operations of the tutorial). You should not use
any other constraints than defined by constrainAdj.
Report / Submission
The report must be short and precise (maximum 10 pages, English language). It must include the following:
- A small introduction.
- A section for the Formose cycle: you need to provide i.) a depiction of the four rules you designed (as provided by mød), ii.) the rules in textual form, iii.) the derivation graph in which you found the autocatalytic solution, iv.) the solution to the query for autocatalysis including the objective value.
- A section for the Catalan game: you need to provide i.) a depiction of all the rules you designed (as provided by mød) and (in contrast to the Formose cycle rules) the most important part: explain what the rational behind each of your designed rules is and, if necessary, why you chose to add any constraints on the rules. ii.) Motivate (or show) why your set of graph transformation rules can mimic a Catalan move. iii.) Discuss any problems that you might observe and discuss if a “flow” that you find corresponds to a sequence of moves in the Catalan game. iv.) What is the number of steps from the initial graph to the final graph for level 1? Which levels could you “solve” via the graph grammar approach?
- In case you work as a group: a paragraph on who contributed with what.
For submitting the report and the sources proceed as follows: create the following directory structure
assignment1/ assignment1/formose/ assignment1/catalan/ assignment1/report/
The formose and catalan directory are the directories that we
provided including the modifications that you made, the rules that you
defined, and so on. The report should be put in the corresponding
directory.  The assignment1 directory should not contain more than 10MB
of data.
The submission should be uploaded via blackboard.
The strict deadline for submission is 23. October, 14:00.
Frequently Asked Questions (FAQ
- Are we allowed to work in groups?
You are allowed to works in groups of max 2 people. Write the members of your group on the front page of your report. In the report you have to write who contributed with what.
- Can I install MØD on a non-IMADA terminal room machine?
You can use the VirtualBox appliance that was provided. There is also a docker version, but not all features are available in the public docker version.