DM832 Required Assignment 1: Chemical Spaces and Graph Grammars
In this assignment, your task is to define a chemical, and a non-chemical graph grammar and explore the compound space in a systematic way. The chemical graph grammar is based on the (autocatalytic) Formose cycle, the second graph grammar is based on the Catalan game.
Formose Reaction
The Formose cycle (figure from the German Wikipedia entry, don't use this for modeling your rules, it's not specific enough; rather look in the summary file provided):
The formose reaction, discovered by Aleksandr Butlerov in 1861, involves the formation of sugars from formaldehyde. As it is easy to see, the overall reaction of the pathway above is of the form 2A + B → 2B, where A refers to formaldehyde, and B refers to glycolaldehyde. As B can not be created of A only, but the existence of B leads to more than one B, we say that B is an autocatalytic compound.
Catalan
In the Catalan game you are allowed to remove nodes and edges by clicking on a node n having exactly 3 neighbors (temporary copy from here). Applying such a move removes the 3 incident edges of n and collapses the three neighbor nodes n1, n2, and n3 and the clicked node n to one node. That node has all the neighbors that n1, n2, and n3 had before. The goal is to collapse the initial graph to one node only.
Requirements
You have to design the graph grammar rules to model the i.) Formose cycle and ii.) the Catalan game. You need to design your graph grammar rules such that you can find an autocatalytic "flow" in the chemical space for the Formose reaction, and that you can find a solution to the Catalan game with graph grammars.
Attacking the Problem
In the lecture it was presented how to use the tool "MØD" to model
the non-oxidative phase of
the Pentose
Phosphate Pathway. The overall reaction converts
six ribulose-5-phosphate
molecules to
five Fructose
6-phosphate molecules. The graph grammar for PPP, the script to
explore the chemical space and to find the PPP pathway with MØD is
provided in the terminal room in the
directory /home/daniel/dm832/assignment1/
. You need to
execute source /home/jlandersen/shared/setPathsForMod.sh
in order to set the required paths to find the binaries and libraries
for MØD.
Before you try to design the graph grammar rules (in Graph Modeling
Language (GML) format) for the Formose reaction you should study
carefully how you define graph grammar rules, how you can put
constraints onto atoms/bonds, how you can expand the chemical space
for PPP in different ways, how you define the optimization problem,
how you find solutions, and how you print products, solutions, and
the chemical space. Besides that this was presented during the
lecture: read carefully the README
file for PPP and
the Graph
Grammar Rule Tutorial to understand the details of GML. Section
2.4 shows how to put constraints onto atoms/bonds.
Formose
In the
directory /home/daniel/dm832/assignment1/formose
you will
find a file summary.pdf
that is defines the necessary
rules for the Formose cycle. Some of the rules have constraints that
are defined similar to node 1: |{e | l(tar( e )) in { 'O' }}| =
1
in the summary file just below the rule. This constraint, for
example, defines that exactly one neighbor node (i.e., the target of
an incident edge) of the node with id 1 should have the label "O". In
GML this can be implemented in the following way:
constrainAdj [ id 2 op = count 1 nodeLabels [ label "O" ] ]
In this example the node with id 2 ( id 2
) should
have exactly ( op =
) 1 neighbor ( count 1
)
with the label O ( nodeLabels [ label "O" ]
) in order to
allow the rule (not shown) being applied. Using nodeLabels [
label "O" label "N" ]
would count the number of neighbors
having label O or N. You can also add several constraints to a rule
(see Section 2.4.4 of the tutorial). Note, that the node-id number in
a depiction of a graph grammar rule in the auto-generated summary is in
general different to the numbers used in the gml-file.
To solve this part you only need to provide the four rules
aldol_addition_backward.gml aldol_addition_forward.gml keto_enol_backward.gml keto_enol_forward.gml
Templates for the four rules are provided. All four rules are
depicted in the provided summary. Implement these rules in GML format.
Furthermore the SMILES strings for glycolaldehyde and formaldehyde
need to be defined in grammar.mod
. Use these and the
rules to expand the derivation graph (see the provided
file doStuff.mod
). Within your expanded derivation graph
you should be able to find i.) a solution that transforms two
formaldehyde and one glycolaldehyde to two
glycolaldehydes. Furthermore, the query for an autocatalytic solution
should give a solution that shows that glycolaldehyde is an
autocatalytic compound. The respective queries are also found in the
file doStuff.mod
.
Catalan
Make sure you understood your solution to find the Formose cycle
before you start to design a graph grammar for the Catalan
game. Before you implement your rules (and maybe a strategy to expand
the derivation graph), make sure that you know which of your rules are
meant for doing what. Note, that your rules are no longer rules to
model chemical reactions. Therefore you can (and must) also remove
nodes (which is due to mass conservation not possible in a chemical
setting). In the
directory /home/daniel/dm832/assignment1/catalan
you will
find a starting point with a small script that you can provide the
level number and (optionally) the number of expansion steps:
./solve.sh 1 3
will indirectly start MØD and load the graph in the required format
that corresponds to level 1 of the Catalan game, MØD will do three
expansion steps based on the rules (that you have to define) and will
then try to find a "flow" from the input graph to the graph having one
node only. You are not allowed to use copy-paste rules (see Section
2.5 Copy-and-Paste operations of the tutorial). You should not use
any other constraints than defined by constrainAdj
.
Report / Submission
The report must be short and precise (maximum 10 pages, English language). It must include the following:
- A small introduction.
- A section for the Formose cycle: you need to provide i.) a depiction of the four rules you designed (as provided by MØD), ii.) the rules in textual form, iii.) the derivation graph in which you found the autocatalytic solution, iv.) the solution to the query for autocatalysis including the objective value.
- A section for the Catalan game: you need to provide i.) a depiction of all the rules you designed (as provided by MØD) and (in contrast to the Formose cycle rules) the most important part: explain what the rational behind each of your designed rules is and, if necessary, why you chose to add any constraints on the rules. ii.) Motivate (or show) why your set of graph grammar rules can mimic a Catalan move. iii.) Discuss any problems that you might observe and discuss if a "flow" that you find corresponds to a sequence of moves in the Catalan game. iv.) What is the number of steps from the initial graph to the final graph for level 1? Which levels could you "solve" via the graph grammar approach?
- In case you work as a group: a paragraph on who contributed with what.
For submitting the report and the sources proceed as follows: create the following directory structure
assignment1/ assignment1/formose/ assignment1/catalan/ assignment1/report/
The formose and catalan directory are the directories that we
provided including the modifications that you made, the rules that you
defined, and so on. The report should be put in the corresponding
directory. The assignment1
directory should not contain more than 10MB
of data.
Change to the directory assignment1
and type aflever DM832 your@mail.com
.
Additionally, you shall hand in a printout of your report (Department secretaries office).
The strict deadline for submission (electronically) is 21. September, 11am. Note, that the second mandatory assignment will be announced earlier (probably end of week 37). The electronically submitted and the printout version need to be identical.
Frequently Asked Questions (FAQ)
- Are we allowed to work in groups?
- Can I install MØD on a non-IMADA terminal room machine
- But everything runs so slow! Help me.
In case you are a PhD student: No. In case you are not a PhD student: You are allowed to works in groups of max 2 people. Write the members of your group on the front page of your report. In the report you have to write who contributed with what. PhD students need to work on their own.
No.
The problem is that access to the NFS server (the file server that
stores your home directory) is at the time of writing very slow. This
has several implications: i.) loading the binary (more than 20MB)
takes some seconds. Running mod
a second time will be
much faster, as caching is used for the binaries. ii.) Any access to
read or to write files in your home directory is slow. I recommend to
work in the /tmp directory, this directory is on the
local hard disk. However, you should be careful, because a reboot will
delete this directory. This issue might change as a network
infrastructure change is scheduled in the very near furture.