12.1. Basics, Graphs, and Rules

12.1.1. Hello World

These examples use the software's Python 3 interface. After each run, a PDF summary is compiled; its content is specified from the Python script.

# Normal printing to the terminal:
print("Hello world")
# Make some headers in the summary:
postChapter("Hello")
postSection("World")
# Load a molecule from a SMILES string:
mol = smiles("Cn1cnc2c1c(=O)n(c(=O)n2C)C", name="Caffeine")
# Put a visualisation of the molecule in the summary:
mol.print()

12.1.2. Graph Loading

Molecules are encoded as attributed graphs. They can be loaded from SMILES strings, and in general any graph can be loaded from a GML specification, or from the SMILES-like format GraphDFS.

# Load a graph from a SMILES string (only for molecule graphs):
ethanol1 = smiles("CCO", name="Ethanol1")
# Load a graph from a SMILES-like format, called "GraphDFS", but for general graphs:
ethanol2 = graphDFS("[C]([H])([H])([H])[C]([H])([H])[O][H]", name="Ethanol2")
# The GraphDFS format also supports implicit hydrogens:
ethanol3 = graphDFS("CCO", name="Ethanol3")
# The basic graph format is GML:
ethanol4 = graphGMLString("""graph [
   node [ id 0 label "C" ]
   node [ id 1 label "C" ]
   node [ id 2 label "O" ]
   node [ id 3 label "H" ]
   node [ id 4 label "H" ]
   node [ id 5 label "H" ]
   node [ id 6 label "H" ]
   node [ id 7 label "H" ]
   node [ id 8 label "H" ]
   edge [ source 1 target 0 label "-" ]
   edge [ source 2 target 1 label "-" ]
   edge [ source 3 target 0 label "-" ]
   edge [ source 4 target 0 label "-" ]
   edge [ source 5 target 0 label "-" ]
   edge [ source 6 target 1 label "-" ]
   edge [ source 7 target 1 label "-" ]
   edge [ source 8 target 2 label "-" ]
]""", name="Ethanol4")
# They really are all loading the same graph into different objects:
assert ethanol1.isomorphism(ethanol2) == 1
assert ethanol1.isomorphism(ethanol3) == 1
assert ethanol1.isomorphism(ethanol4) == 1
# and they can be visualised:
ethanol1.print()
# All loaded graphs are added to a list 'inputGraphs':
for g in inputGraphs:
    g.print()
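
Since GML is the basic format, it can be convenient to generate the text from plain vertex and edge lists. The following is a minimal sketch in ordinary Python, independent of the software; the helper name to_gml and its argument layout are assumptions made for this illustration, not part of the interface.

```python
def to_gml(vertex_labels, edges):
    """Return a GML graph string (hypothetical helper, not part of the software).

    vertex_labels: list of label strings; the list index becomes the node id.
    edges: list of (source, target, label) tuples.
    """
    lines = ["graph ["]
    for i, label in enumerate(vertex_labels):
        lines.append('   node [ id %d label "%s" ]' % (i, label))
    for src, tar, label in edges:
        lines.append('   edge [ source %d target %d label "%s" ]' % (src, tar, label))
    lines.append("]")
    return "\n".join(lines)


# Ethanol with explicit hydrogens, numbered as in the GML example above.
labels = ["C", "C", "O"] + ["H"] * 6
bonds = [(1, 0, "-"), (2, 1, "-"),
         (3, 0, "-"), (4, 0, "-"), (5, 0, "-"),
         (6, 1, "-"), (7, 1, "-"), (8, 2, "-")]
ethanol_gml = to_gml(labels, bonds)
print(ethanol_gml)
```

In a script run through the software, the resulting string could then be passed to graphGMLString in place of the literal text.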

12.1.3. Printing Graphs/Molecules

The visualisation of graphs can be “prettified” using special printing options. The changes can make the graphs look like normal molecule visualisations.

# Our test graph, representing the molecule caffeine:
g = smiles('Cn1cnc2c1c(=O)n(c(=O)n2C)C')
# Make an object to hold our settings:
p = GraphPrinter()
# First try visualising without any prettifications:
p.disableAll()
g.print(p)
# Now make chemical edges look like bonds, and put colour on atoms.
# Also put the "charge" part of vertex labels in superscript:
p.edgesAsBonds = True
p.raiseCharges = True
p.withColour = True
g.print(p)
# We can also "collapse" normal hydrogen atoms into the neighbours,
# and just show a count:
p.collapseHydrogens = True
g.print(p)
# And finally we can make "internal" carbon atoms simple lines:
p.simpleCarbons = True
g.print(p)
# There are also options for adding indices to the vertices,
# and for modifying the rendering of labels and edges:
p2 = GraphPrinter()
p2.disableAll()
p2.withTexttt = True
p2.thick = True
p2.withIndex = True
# We can actually print two different versions at the same time:
g.print(p2, p)

12.1.4. Graph Interface

Graph objects have a full interface to access individual vertices and edges. The attributes of vertices and edges can be accessed both in their raw string form, and as their chemical counterpart (if they have one).

g = graphDFS("[R]{x}C([O-])CC=O")
print("|V| =", g.numVertices)
print("|E| =", g.numEdges)
for v in g.vertices:
    print("v%d: label='%s'" % (v.id, v.stringLabel), end="")
    print("\tas molecule: atomId=%d, charge=%d" % (v.atomId, v.charge), end="")
    print("\tis oxygen?", v.atomId == AtomIds.Oxygen)
    print("\td(v) =", v.degree)
    for e in v.incidentEdges:
        print("\tneighbour:", e.target.id)
for e in g.edges:
    print("(v%d, v%d): label='%s'" % (e.source.id, e.target.id, e.stringLabel), end="")
    try:
        bt = str(e.bondType)
    except LogicError:
        bt = "Invalid"
    print("\tas molecule: bondType=%s" % bt, end="")
    print("\tis double bond?", e.bondType == BondType.Double)

12.1.5. Graph Morphisms

Graph objects have methods for finding morphisms with the VF2 algorithms for isomorphism and monomorphism. We can therefore easily detect isomorphic graphs, count automorphisms, and search for substructures.

mol1 = smiles("CC(C)CO")
mol2 = smiles("C(CC)CO")
# Check whether the graphs are isomorphic. Without maxNumMatches the search
# reports at most one match, so the result is either 0 or 1:
isomorphic = mol1.isomorphism(mol2) == 1
print("Isomorphic?", isomorphic)
# Find the number of automorphisms in the graph,
# by explicitly enumerating all of them:
numAutomorphisms = mol1.isomorphism(mol1, maxNumMatches=2**30)
print("|Aut(G)| =", numAutomorphisms)
# Let's count the number of methyl groups:
methyl = smiles("[CH3]")
# The symmetry of the group itself should not be counted,
# so find the size of the automorphism group of methyl.
numAutMethyl = methyl.isomorphism(methyl, maxNumMatches=2**30)
print("|Aut(methyl)| =", numAutMethyl)
# Now find the number of methyl matches,
numMono = methyl.monomorphism(mol1, maxNumMatches=2**30)
print("#monomorphisms =", numMono)
# and divide by the symmetries of methyl (integer division, as this is a count).
print("#methyl groups =", numMono // numAutMethyl)
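
The counting argument above (monomorphisms divided by the automorphisms of the pattern) can be checked by hand on a small case. The sketch below is plain Python and does not use the software or VF2; it enumerates all injective vertex maps by brute force, which is only feasible for tiny graphs. The graph encoding and the names make_graph and count_monomorphisms are assumptions made for this illustration.

```python
from itertools import permutations


def make_graph(labels, bonds):
    """Encode a graph as (vertex labels, {frozenset({u, v}): edge label})."""
    return labels, {frozenset((u, v)): lab for u, v, lab in bonds}


def count_monomorphisms(pattern, host):
    """Count injective, label- and edge-preserving maps from pattern into host."""
    p_labels, p_edges = pattern
    h_labels, h_edges = host
    count = 0
    # Try every injective assignment of pattern vertices to host vertices.
    for image in permutations(range(len(h_labels)), len(p_labels)):
        # Vertex labels must agree.
        if any(p_labels[v] != h_labels[image[v]] for v in range(len(p_labels))):
            continue
        # Every pattern edge must map to a host edge with the same label.
        if all(h_edges.get(frozenset(image[v] for v in e)) == lab
               for e, lab in p_edges.items()):
            count += 1
    return count


methyl = make_graph(["C", "H", "H", "H"],
                    [(0, 1, "-"), (0, 2, "-"), (0, 3, "-")])
# Propane, C3H8: carbons are vertices 0-2, hydrogens are vertices 3-10.
propane = make_graph(
    ["C", "C", "C"] + ["H"] * 8,
    [(0, 1, "-"), (1, 2, "-"),
     (0, 3, "-"), (0, 4, "-"), (0, 5, "-"),
     (1, 6, "-"), (1, 7, "-"),
     (2, 8, "-"), (2, 9, "-"), (2, 10, "-")])

# Mapping a graph into itself injectively while preserving all edges
# is an automorphism, so the same function counts |Aut(methyl)|.
num_aut_methyl = count_monomorphisms(methyl, methyl)
num_mono = count_monomorphisms(methyl, propane)
print("|Aut(methyl)| =", num_aut_methyl)
print("#monomorphisms =", num_mono)
print("#methyl groups =", num_mono // num_aut_methyl)
```

Each end carbon of propane admits 3! = 6 placements of the three hydrogens, while the middle carbon has only two hydrogens and admits none, so the 12 matches divided by the 6 symmetries of methyl give 2 methyl groups.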

12.1.6. Rule Loading

Rules must be specified in GML format.

# A rule (L <- K -> R) is specified by three graph fragments:
# left, context, and right
destroyVertex = ruleGMLString("""rule [
   left [
      node [ id 1 label "A" ]
   ]
]""")
createVertex = ruleGMLString("""rule [
   right [
      node [ id 1 label "A" ]
   ]
]""")
identity = ruleGMLString("""rule [
   context [
      node [ id 1 label "A" ]
   ]
]""")
# A vertex/edge can change label:
labelChange = ruleGMLString("""rule [
   left [
      node [ id 1 label "A" ]
      edge [ source 1 target 2 label "A" ]
   ]
   # GML can have Python-style line comments too
   context [
      node [ id 2 label "Q" ]
   ]
   right [
      node [ id 1 label "B" ]
      edge [ source 1 target 2 label "B" ]
   ]
]""")
# A chemical rule should probably not destroy and create vertices:
ketoEnol = ruleGMLString("""rule [
   left [
      edge [ source 1 target 4 label "-" ]
      edge [ source 1 target 2 label "-" ]
      edge [ source 2 target 3 label "=" ]
   ]
   context [
      node [ id 1 label "C" ]
      node [ id 2 label "C" ]
      node [ id 3 label "O" ]
      node [ id 4 label "H" ]
   ]
   right [
      edge [ source 1 target 2 label "=" ]
      edge [ source 2 target 3 label "-" ]
      edge [ source 3 target 4 label "-" ]
   ]
]""")
# Rules can be printed, but label changing edges are not visualised in K:
ketoEnol.print()
# Rules can also be printed with custom options, like graphs:
p1 = GraphPrinter()
p2 = GraphPrinter()
p1.disableAll()
p1.withTexttt = True
p1.withIndex = True
p2.setReactionDefault()
for r in inputRules:
    r.print(p1, p2)
# Be careful with printing options and non-existing implicit hydrogens:
p1.disableAll()
p1.edgesAsBonds = True
p2.setReactionDefault()
p2.simpleCarbons = True # !!
ketoEnol.print(p1, p2)
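
To make the relation between the three GML fragments and the span L <- K -> R concrete, here is a minimal sketch in plain Python, independent of the software. It assumes the usual reading that L is the union of left and context, and R is the union of right and context; the set-based encoding and the name rule_span are assumptions made for this illustration. As a simplification, elements that change label appear only in left and right, and are simply absent from K.

```python
def rule_span(left, context, right):
    """Return (L, K, R) for a rule given as three sets of graph elements."""
    L = left | context   # everything that must be matched
    K = set(context)     # everything that is preserved
    R = right | context  # everything present after applying the rule
    return L, K, R


# The keto-enol rule from above:
# nodes are (id, label), edges are (source, target, label).
context = {(1, "C"), (2, "C"), (3, "O"), (4, "H")}
left = {(1, 4, "-"), (1, 2, "-"), (2, 3, "=")}
right = {(1, 2, "="), (2, 3, "-"), (3, 4, "-")}

L, K, R = rule_span(left, context, right)
print("only in L:", sorted(L - R))
print("only in R:", sorted(R - L))
```

For this rule the two sides share exactly the four context vertices, which reflects the remark above that a chemical rule should not create or destroy vertices.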

12.1.7. Rule Morphisms

Rule objects, like graph objects, have methods for finding morphisms with the VF2 algorithms for isomorphism and monomorphism. We can therefore easily detect isomorphic rules, and decide if one rule is at least as specific/general as another.

# A rule with no extra context:
small = ruleGMLString("""rule [
   ruleID "Small"
   left [
      node [ id 1 label "H" ]
      node [ id 2 label "O" ]
      edge [ source 1 target 2 label "-" ]
   ]
   right [
      node [ id 1 label "H+" ]
      node [ id 2 label "O-" ]
   ]
]""")
# The same rule, with a bit of context:
large = ruleGMLString("""rule [
   ruleID "Large"
   left [
      node [ id 1 label "H" ]
      node [ id 2 label "O" ]
      edge [ source 1 target 2 label "-" ]
   ]
   context [
      node [ id 3 label "C" ]
      edge [ source 2 target 3 label "-" ]
   ]
   right [
      node [ id 1 label "H+" ]
      node [ id 2 label "O-" ]
   ]
]""")
isomorphic = small.isomorphism(large) == 1
print("Isomorphic?", isomorphic)
atLeastAsGeneral = small.monomorphism(large) == 1
print("At least as general?", atLeastAsGeneral)

12.1.8. Formose Grammar

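This graph grammar models the formose chemistry. It consists of two input molecules and two rules, each loaded in both the forward and the inverse direction.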

formaldehyde = smiles("C=O", name="Formaldehyde")
glycolaldehyde = smiles("OCC=O", name="Glycolaldehyde")
ketoEnolGML = """rule [
   ruleID "Keto-enol isomerization"
   left [
      edge [ source 1 target 4 label "-" ]
      edge [ source 1 target 2 label "-" ]
      edge [ source 2 target 3 label "=" ]
   ]
   context [
      node [ id 1 label "C" ]
      node [ id 2 label "C" ]
      node [ id 3 label "O" ]
      node [ id 4 label "H" ]
   ]
   right [
      edge [ source 1 target 2 label "=" ]
      edge [ source 2 target 3 label "-" ]
      edge [ source 3 target 4 label "-" ]
   ]
]"""
ketoEnol_F = ruleGMLString(ketoEnolGML)
ketoEnol_B = ruleGMLString(ketoEnolGML, invert=True)
aldolAddGML = """rule [
   ruleID "Aldol Addition"
   left [
      edge [ source 1 target 2 label "=" ]
      edge [ source 2 target 3 label "-" ]
      edge [ source 3 target 4 label "-" ]
      edge [ source 5 target 6 label "=" ]
   ]
   context [
      node [ id 1 label "C" ]
      node [ id 2 label "C" ]
      node [ id 3 label "O" ]
      node [ id 4 label "H" ]
      node [ id 5 label "O" ]
      node [ id 6 label "C" ]
   ]
   right [
      edge [ source 1 target 2 label "-" ]
      edge [ source 2 target 3 label "=" ]
      edge [ source 5 target 6 label "-" ]

      edge [ source 4 target 5 label "-" ]
      edge [ source 6 target 1 label "-" ]
   ]
]"""
aldolAdd_F = ruleGMLString(aldolAddGML)
aldolAdd_B = ruleGMLString(aldolAddGML, invert=True)
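
Conceptually, loading a rule with invert=True yields the reverse reaction: the left and right fragments trade places while the context stays fixed. The sketch below illustrates only that idea in plain Python; the dictionary encoding and the function name invert are assumptions made for this illustration, and anything else the software may do when inverting is not modelled.

```python
def invert(rule):
    """Return a new rule dict with the left and right fragments swapped."""
    return {
        "left": rule["right"],
        "context": rule["context"],
        "right": rule["left"],
    }


# Keto-enol rule as plain data: edges are (source, target, label),
# context nodes are (id, label).
keto_enol = {
    "left": {(1, 4, "-"), (1, 2, "-"), (2, 3, "=")},
    "context": {(1, "C"), (2, "C"), (3, "O"), (4, "H")},
    "right": {(1, 2, "="), (2, 3, "-"), (3, 4, "-")},
}
enol_keto = invert(keto_enol)
print(sorted(enol_keto["left"]))
```

Inverting twice gives back the original rule, which is why a single GML text is enough to define both directions of each reaction in the grammar.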

12.1.9. Including Files

We can include other files (à la C/C++) to separate functionality.

include("0050_formoseGrammar.py")
postSection("Input Graphs")
for a in inputGraphs:
    a.print()
postSection("Input Rules")
for a in inputRules:
    a.print()