                    cyclops_mysql version 1.0
         Cheminformatics extensions to MySQL based on OEChem

              Andrew Dalke <dalke@dalkescientific.com>
         Andrew Dalke Scientific AB, Gothenburg, Sweden

ADVERTISING:
  I make my living from consulting, custom software development, and
  training computational chemists in how to be more effective at the
  software side of what they do. If you are interested in my services,
  want improvements to cyclops_mysql, or are otherwise interested in
  giving me money, please do contact me.


This package extends MySQL with a set of user-defined functions for
doing chemical informatics tasks in the database, including
substructure searches and similarity comparisons. The implementation
uses OpenEye's OEChem for the chemistry.

This package is made freely available under the MIT license. For
details see the file "COPYING". For installation directions see the
file "INSTALL". For benchmarking information see "README.benchmark".

This implementation is quite fast, due in large part to OEChem. The
per-call overhead of a brute-force MySQL extension function is
noticeable, so I've spent time optimizing the common cases where one
of the inputs (SMILES or SMARTS) is a constant string. The resulting
extension handles about 10,000 SMARTS match tests per second on a
modern machine, which suits small- to medium-sized data sets.

Many of the ideas for the implementation API are based on TJ
O'Donnell's book "Design and Use of Relational Databases in
Chemistry." While I have not followed the CHORD API exactly, my API is
derived from the functions and parameters he uses. If you have no
experience with SQL and want to learn how to use it in chemistry
databases, get that book.

There is much more that can be done. For details see the file IDEAS.

If you have any questions, send them to dalke@dalkescientific.com. If
you are interested in support, bug-fixes, or new features, please note
that I am a consultant and available for hire.


This MySQL extension adds the following new SQL functions:

 * oe_version([module])

The possible values for the optional 'module' argument are:

 cyclops - returns the version number of this MySQL extension.
    For this release it is "1.0".

 oechem - returns a string in the form '2010-08-09 1.7.4'
    where the first term derives from OEChemGetVersion() and the
    rest of the string is from OEChemGetRelease()

 oegraphsim - returns a string of the form '2010-08-09 1.0.0'
    where the first term derives from OEGraphSimGetVersion() and
    the rest of the string is from OEGraphSimGetRelease()


 * oe_licensed([module])

If module is not specified then return 1 if all needed modules are
licensed. Otherwise, return 0.

If the module name is specified then return 1 if the license for that
module is valid, otherwise 0. The possible values for module are:
'cyclops', 'oechem' and 'oegraphsim'. Note that 'cyclops' always
returns 1 because it needs no license.

 * oe_license_date(module)

Returns a string in the format 'YYYY-MM-DD' which is the expiration
date for the given module. An example date is "2009-08-22". The
possible values for module are: 'cyclops', 'oechem' and
'oegraphsim'. Note that 'cyclops' returns "9999-12-31" because it
needs no license.
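
As a sketch of how a client might use the 'YYYY-MM-DD' string, the
Python below computes the days remaining before a license expires.
The name days_until_expiry is hypothetical, and the date strings are
passed in directly rather than fetched from MySQL.

```python
from datetime import date


def days_until_expiry(expiry: str, today: date) -> int:
    """Days from `today` until an oe_license_date()-style date string."""
    return (date.fromisoformat(expiry) - today).days


# The example date from above, checked against a fixed "today".
print(days_until_expiry("2009-08-22", date(2009, 8, 1)))      # 21
# The 'cyclops' value still parses as an ordinary date.
print(days_until_expiry("9999-12-31", date(2009, 8, 1)) > 0)  # True
```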

 * oe_valid_smiles(smiles)

Returns 1 if the input string is a valid SMILES string, otherwise 0.

 * oe_cansmiles(smiles)

Convert the input SMILES string into a canonical (non-isomeric)
SMILES. It is an error if the input is not a valid SMILES string. This
uses the OpenEye aromaticity model.

 * oe_isosmiles(smiles)

Convert the input SMILES string into a canonical isomeric SMILES. It
is an error if the input is not a valid SMILES string. This uses the
OpenEye aromaticity model.


 * oe_keksmiles(smiles)

Convert the input SMILES string into a Kekule (non-canonical) isomeric
SMILES. It is an error if the input is not a valid SMILES string. This
uses the OpenEye aromaticity model.

 * oe_matches(smiles, smarts)

Returns 1 if the SMARTS pattern is found at least once in the SMILES,
else returns 0.

 * oe_count_matches(smiles, smarts)

Returns the number of times the SMARTS pattern is found in the SMILES,
up to 1024 matches. This returns 0 if there are no matches.

 * oe_count_umatches(smiles, smarts)

Returns the number of times the SMARTS pattern is uniquely found in
the SMILES, up to 1024 matches. This returns 0 if there are no
matches.

A match is unique if it matches a given subset of atoms only one
time. For example, "CC" has two matches against "CCO" but only one
unique match.
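
The difference between the two counts can be sketched in Python. The
match list below is written by hand for the "CC" against "CCO"
example and does not come from OEChem; count_unique is a hypothetical
helper, not part of this package.

```python
def count_unique(matches):
    """Count matches that cover distinct sets of target atoms.

    `matches` is a list of tuples of target atom indices, one tuple
    per match, listed in pattern-atom order.
    """
    return len({frozenset(m) for m in matches})


# "CCO" has atoms C0, C1, O2. The pattern "CC" maps onto the C0-C1
# bond in both directions, giving two matches over one atom set.
matches = [(0, 1), (1, 0)]
print(len(matches))           # 2
print(count_unique(matches))  # 1
```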

 * oe_lingosim(str1, str2)

Compute the Lingo similarity between the two strings as a real
value. In most cases 'str1' and 'str2' will be canonical SMILES
strings although other names are possible.

The Lingo similarity is based on substring similarity. See Grant et
al., JCIM, 46(5):1912 2006 and Vidal, Thormann and Pons, JCIM,
45(2):386 2005. The results are comparable to fingerprint similarities
but don't require the intermediate fingerprint calculation and
storage.
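
To give a feel for substring similarity, here is a simplified Python
sketch. It assumes overlapping 4-character substrings and a Tanimoto
score over the two substring multisets. It skips any SMILES
preprocessing described in the cited papers and is not OEChem's
implementation, so its values need not agree with oe_lingosim().

```python
from collections import Counter


def lingos(s, q=4):
    """Multiset of the overlapping q-character substrings of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def lingo_sim(a, b, q=4):
    """Tanimoto similarity over the two substring multisets."""
    la, lb = lingos(a, q), lingos(b, q)
    union = sum((la | lb).values())
    if union == 0:
        return 0.0  # both strings are shorter than q characters
    return sum((la & lb).values()) / union


print(lingo_sim("c1ccccc1O", "c1ccccc1O"))  # 1.0
print(lingo_sim("c1ccccc1O", "c1ccccc1N"))  # 5 shared of 7, about 0.714
```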


 * oe_maccs_fp(smiles)

Return a hex string representation of the 166-bit MACCS key
fingerprint for the SMILES.

For example, oe_maccs_fp('Nc1ccccc1O') is
   "0000000000000000000084040000102405488283b3"

Cyclops fingerprints are a multiple of 8 bits long, so the 166-bit
MACCS key is padded to 168 bits. The two excess bits will be 0.

Cyclops fingerprints have the same byte and bit order as OpenEye's
OEFingerPrint.ToHexString() but omit the trailing buffer.
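
The padding rule fixes the length of the hex string. A small Python
sketch of the arithmetic (fp_hex_length is a hypothetical helper, not
a function in this package):

```python
def fp_hex_length(num_bits):
    """Hex digits needed once num_bits is padded up to whole bytes."""
    num_bytes = (num_bits + 7) // 8  # round up to a multiple of 8 bits
    return num_bytes * 2             # two hex digits per byte


print(fp_hex_length(166))   # 42: 166 bits -> 21 bytes (168 bits)
print(fp_hex_length(4096))  # 1024
```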

 * oe_path_fp(smiles, num_bits=4096, min_bonds=0, max_bonds=5,
              atom_type=191, bond_type=3)

Return an OpenEye path fingerprint. The SMILES must be given but all
other fields are optional, except that if min_bonds is given then
max_bonds must also be given. Note that SQL does not have keyword
arguments, only positional ones.

The parameters are as defined for OEGraphSim's OEMakePathFP() except
that I prefer a different parameter naming scheme.

The atom_type and bond_type fields are bit-wise ORs of different
flags. These are not listed in the OpenEye documentation so I list
them here.

         AtomType               BondType
       =================     ================
       1 = AtomicNumber       1 = BondOrder
       2 = Aromaticity        2 = Chiral
       4 = Chiral             4 = InRing
       8 = FormalCharge
      16 = HvyDegree
      32 = Hybridization      3 = DefaultBond
      64 = InRing               = 1+2
     128 = EqHalogen
     256 = EqAromatic
     
     191 = DefaultAtom = 1+2+4+8+16+32+128 (omits 64 and 256)
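
Because SQL only takes the final integer, it can help to compute the
atom_type and bond_type values from named flags. The Python sketch
below copies the values from the table above; the dictionary and
function names are hypothetical.

```python
# Flag values copied from the table above.
ATOM = {"AtomicNumber": 1, "Aromaticity": 2, "Chiral": 4,
        "FormalCharge": 8, "HvyDegree": 16, "Hybridization": 32,
        "InRing": 64, "EqHalogen": 128, "EqAromatic": 256}
BOND = {"BondOrder": 1, "Chiral": 2, "InRing": 4}


def combine(flags, names):
    """Bit-wise OR of the named flags."""
    value = 0
    for name in names:
        value |= flags[name]
    return value


default_atom = combine(ATOM, ["AtomicNumber", "Aromaticity", "Chiral",
                              "FormalCharge", "HvyDegree",
                              "Hybridization", "EqHalogen"])
default_bond = combine(BOND, ["BondOrder", "Chiral"])
print(default_atom, default_bond)     # 191 3
print(default_atom | ATOM["InRing"])  # 255: the default plus InRing
```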


 * fp_valid(fp)

Returns 1 if the fingerprint is a valid hex fingerprint (contains only
the digits 0-9 and the characters A-F and a-f), otherwise returns 0.
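
The character test can be sketched in Python as follows. This only
mirrors the description above. The source does not say how the
extension treats an empty or odd-length string, and this sketch
accepts both. The name is_hex_fp is hypothetical.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f and A-F


def is_hex_fp(fp):
    """Return 1 if every character of fp is a hex digit, else 0."""
    return int(all(c in HEX_DIGITS for c in fp))


print(is_hex_fp("0000102405488283b3"))  # 1
print(is_hex_fp("00FFab"))              # 1
print(is_hex_fp("00fg"))                # 0
```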

 * fp_contains(superstructure_fp, substructure_fp)

Test if the first fingerprint contains the second fingerprint, that
is, if every bit which is set in the second fingerprint is also on in
the first fingerprint. The fingerprints are encoded as hex strings.

Returns 1 if superstructure_fp contains substructure_fp, otherwise
returns 0. It is an error if either fingerprint string contains a
non-hex character.
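
The containment test is a bit-wise AND. A minimal Python sketch,
assuming both fingerprints are hex strings of the same length (the
name contains is hypothetical, and Python's int() is more lenient
about its input than the extension is described to be):

```python
def contains(superstructure_fp, substructure_fp):
    """Return 1 if every bit set in the second is set in the first."""
    sup = int(superstructure_fp, 16)
    sub = int(substructure_fp, 16)
    return int((sup & sub) == sub)


print(contains("ff", "0f"))  # 1
print(contains("0f", "ff"))  # 0
print(contains("a5", "a5"))  # 1: a fingerprint contains itself
```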


 * fp_tanimoto(fp1, fp2)

Return the Tanimoto similarity between the two hex-encoded
fingerprints. The possible values range from 0.0 (not similar, or
neither fingerprint contains set bits) to 1.0 (identical).

It is an error if either string is not a hex-encoded value.
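
For reference, the Tanimoto score is the number of bits set in both
fingerprints divided by the number set in either. A minimal Python
sketch of that definition, including the 0.0 result when no bits are
set (the name tanimoto is hypothetical and this is not the extension's
code):

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two hex-encoded fingerprints."""
    a = int(fp1, 16)
    b = int(fp2, 16)
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # neither fingerprint has a bit set
    return bin(a & b).count("1") / union


print(tanimoto("0f", "0f"))  # 1.0
print(tanimoto("0f", "03"))  # 0.5: 2 shared bits out of 4
print(tanimoto("00", "00"))  # 0.0
```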


>> A note about errors. Where possible I report errors during query
   initialization. These errors should be detailed enough to figure
   out the cause of the problem. Other errors can only be detected
   at row evaluation time. An example is "oe_cansmiles(smiles)"
   when the SMILES comes from another column in the database. These
   errors are reported as a NULL result.