Ragel definitions for the OpenSMILES grammar [1]

This code is released under the "new-style BSD license".
For details see the file "LICENSE"

This distribution includes two example programs. Please note that the
programs only work with the token stream and do not check for balanced
parentheses, hanging ring closures, incorrect aromaticity, etc.

1) smiles_counts - print atom and bond count information

Print the number of explicit atoms and bonds in a SMILES string. (This
does not count implicit hydrogens or the bonds to them. Support for
that would not be hard. Perhaps in a future version.)

% smiles_counts 'O2C=2NP2.O=2.[U]'
atoms: 6 bonds: 5
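
As a rough illustration of what counting from the token stream involves,
here is a minimal Python sketch. It is not the Ragel-generated code in
this distribution. The tokenizer regex, the function name
count_atoms_and_bonds, and the counting rule (one bond per atom that
follows another atom in the same dot-separated part, plus one bond per
pair of ring-closure tokens) are all assumptions made for the example.
Like smiles_counts, it does not validate parentheses or ring closures.

```python
import re

# Hypothetical, simplified SMILES tokenizer (an assumption for this sketch,
# not the grammar used by smiles_counts).
TOKEN = re.compile(
    r"(?P<atom>\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|\*)"  # bracket or organic atom
    r"|(?P<closure>%\d\d|\d)"                             # ring closure
    r"|(?P<dot>\.)"                                       # no-bond separator
    r"|(?P<other>[-=#$:/\\()])"                           # bond symbols, branches
)


def count_atoms_and_bonds(smiles):
    """Return (atoms, bonds) for the explicit atoms and bonds in `smiles`."""
    atoms = chain_bonds = closures = 0
    has_previous_atom = False  # reset by '.', which starts a new part
    pos = 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError("unexpected character at position %d" % pos)
        kind = m.lastgroup
        if kind == "atom":
            atoms += 1
            if has_previous_atom:
                chain_bonds += 1  # bonded to some earlier atom in this part
            has_previous_atom = True
        elif kind == "closure":
            closures += 1
        elif kind == "dot":
            has_previous_atom = False
        pos = m.end()
    # Each ring bond appears as two closure tokens, one at each end.
    return atoms, chain_bonds + closures // 2


print("atoms: %d bonds: %d" % count_atoms_and_bonds("O2C=2NP2.O=2.[U]"))
# prints "atoms: 6 bonds: 5"
```

The sketch gives the same counts as the example above for this input,
but it has not been checked against smiles_counts on other inputs.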

Use the '--repeat <N>' option to make it repeat the count N times. This
is useful for timings.

% time smiles_counts 'C1=CC=C(C=C1)C2=NC(C(=O)NC3=C2C=C(C=C3)Cl)C(=O)O.[OH-].[K+].[K+]'
atoms: 25 bonds: 24
0.000u 0.001s 0:00.00 0.0%	0+0k 0+0io 0pf+0w

% time smiles_counts --repeat 10000000 'C1=CC=C(C=C1)C2=NC(C(=O)NC3=C2C=C(C=C3)Cl)C(=O)O.[OH-].[K+].[K+]'
atoms: 25 bonds: 24
2.386u 0.008s 0:02.43 97.9%	0+0k 0+0io 0pf+0w

Yes, that's counting about 4,200,000 SMILES per second (10,000,000
repeats in 2.386 seconds of user time).


If you do not list any command-line arguments then it reads SMILES from stdin:

% smiles_counts < tests/manual_counts.smi
atoms: 25 bonds: 31
atoms: 22 bonds: 24
atoms: 24 bonds: 27
atoms: 21 bonds: 24
atoms: 24 bonds: 27
atoms: 21 bonds: 25
atoms: 27 bonds: 31
 ...




2) smiles_terms - print a higher-level view of the SMILES token stream

For example:

% smiles_terms 'N#C[16O](C)[Na--]'
organic 0 element 7 (N)
triple bond (#)
organic 1 element 6 (C)
implicit bond
atom 2 isotope 16 element 8 (O) chiral (none) 0 hcount 0 charge 0 class 0
open branch '('
implicit bond
organic 3 element 6 (C)
close branch ')'
implicit bond
atom 4 isotope 0 element 11 (Na) chiral (none) 0 hcount 0 charge -2 class 0
end molecule
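
To show how the fields on the "atom" lines relate to the text of a
bracket atom, here is a small Python sketch. It is only an approximation
written for this README and is not how smiles_terms is implemented. The
regex, the function name parse_bracket_atom, and the dictionary keys are
assumptions. Missing fields are reported as 0 to mirror the output above,
and the element is returned as its symbol rather than its atomic number.

```python
import re

# Hypothetical pattern for one bracket atom; a simplification of OpenSMILES.
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|se|as|[bcnops]|\*)"
    r"(?P<chiral>@(?:TH|AL|SP|TB|OH)\d{1,2}|@@?)?"
    r"(?P<h>H\d?)?"
    r"(?P<charge>--|\+\+|[+-]\d{0,2})?"
    r"(?::(?P<cls>\d+))?\]"
)


def parse_bracket_atom(text):
    """Split a bracket atom such as '[16O]' into its fields."""
    m = BRACKET_ATOM.fullmatch(text)
    if m is None:
        raise ValueError("not a valid bracket atom: %r" % (text,))
    h = m.group("h")
    charge = m.group("charge")
    if charge is None:
        charge_value = 0
    elif charge in ("++", "--"):
        charge_value = 2 if charge == "++" else -2
    elif len(charge) == 1:
        charge_value = 1 if charge == "+" else -1
    else:
        charge_value = int(charge)  # e.g. "+2" or "-1"
    return {
        "isotope": int(m.group("isotope") or 0),
        "symbol": m.group("symbol"),
        "chiral": m.group("chiral"),
        # A bare 'H' means one hydrogen; 'H3' means three.
        "hcount": 0 if h is None else int(h[1:] or 1),
        "charge": charge_value,
        "class": int(m.group("cls") or 0),
    }


print(parse_bracket_atom("[16O]"))
print(parse_bracket_atom("[Na--]"))
```

With this sketch '[Na--]' yields charge -2 and '[16O]' yields isotope 16,
matching the corresponding fields printed by smiles_terms above.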

It will also print error information:

% smiles_terms '[N-+]'
Syntax error! Expecting charge digit, another '-', atom class, or ']' at position 3.


Without a command-line argument it reads SMILES from stdin:

% printf "C\nC(O)[C-]\n" | ./smiles_terms 
organic 0 element 6 (C)
end molecule
organic 0 element 6 (C)
open branch '('
implicit bond
organic 1 element 8 (O)
close branch ')'
implicit bond
atom 2 isotope 0 element 6 (C) chiral (none) 0 hcount 0 charge -1 class 0
end molecule




More details will come, given time (and money if you have some you
want to give me!)




[1] I made some modifications to the OpenSMILES grammar:

  - support the empty string as valid SMILES for an empty molecule
  - support [b] (an omission in the OpenSMILES spec)
  - support only @TB1 ... @TB20 (the range up to @TB30 is an error in the spec)
  - support '0' as the first digit of two-digit chiral indicators,
       that is, I support both @OH01 and @OH1.
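
The chirality rules in the list above can be sketched as a small Python
check. This is an illustration only, not code from this distribution.
The function name is_valid_chiral_class is made up for the example. The
upper limits for @TH, @AL, @SP and @OH are taken from the OpenSMILES
spec, and accepting a leading '0' only for the two-digit classes (@TB
and @OH) is an assumption about how the modification is applied.

```python
import re

CHIRAL_CLASS = re.compile(r"@(TH|AL|SP|TB|OH)(\d{1,2})")

# Largest allowed index per class. TB stops at 20, as in the list above.
MAX_INDEX = {"TH": 2, "AL": 2, "SP": 3, "TB": 20, "OH": 30}

# Assumption: only classes with two-digit indices may use a leading '0'.
TWO_DIGIT_CLASSES = ("TB", "OH")


def is_valid_chiral_class(text):
    """Return True for indicators such as '@TB20', '@OH1' or '@OH01'."""
    m = CHIRAL_CLASS.fullmatch(text)
    if m is None:
        return False
    kind, digits = m.groups()
    if len(digits) == 2 and digits[0] == "0" and kind not in TWO_DIGIT_CLASSES:
        return False
    return 1 <= int(digits) <= MAX_INDEX[kind]


for text in ("@OH1", "@OH01", "@TB20", "@TB21", "@TB30"):
    print(text, is_valid_chiral_class(text))
```

Here @OH1, @OH01 and @TB20 are accepted, while @TB21 and @TB30 are
rejected.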
