Commits

Andrew Dalke committed eb89be6

Added information about the new threshold option.

Files changed (1)

 usage: fmcs [-h] [--maximize {atoms,bonds}] [--min-num-atoms INT]
             [--compare {topology,elements,types}]
             [--atom-compare {any,elements,isotopes}]
-            [--bond-compare {any,bondtypes}] [--atom-class-tag TAG]
-            [--ring-matches-ring-only] [--complete-rings-only]
-            [--select SELECT] [--timeout SECONDS] [--output FILENAME]
+            [--bond-compare {any,bondtypes}] [--threshold THRESHOLD]
+            [--atom-class-tag TAG] [--ring-matches-ring-only]
+            [--complete-rings-only] [--select SELECT] [--timeout SECONDS]
+            [--output FILENAME]
             [--output-format {smarts,fragment-smiles,fragment-sdf,complete-sdf}]
             [--output-all] [--save-atom-class-tag TAG]
             [--save-counts-tag TAG] [--save-atom-indices-tag TAG]
             [--save-smarts-tag TAG] [--save-smiles-tag TAG] [--times] [-v]
             [--version]
             filename
-
 Find the maximum common substructure of a set of structures
 
 positional arguments:
                         bond matches every other bond. With 'bondtypes', bonds
                         are the same only if their bond types are the same.
                         (Default: bondtypes)
+  --threshold THRESHOLD
+                        Minimum structure match threshold. A value of 1.0
+                        means that the common substructure must be in all of
+                        the input structures. A value of 0.8 finds the largest
+                        substructure which is common to at least 80% of the
+                        input structures. (Default: 1.0)
   --atom-class-tag TAG  Use atom class assignments from the field 'TAG'. The
                         tag data must contain a space separated list of
                         integers in the range 1-10000, one for each atom.
 would be.
 
 
+Find the threshold MCS
+----------------------
+
+The normal MCS algorithm finds the largest substructure which is
+common to all of the input structures. The 'threshold MCS' finds the
+largest substructure which is common to at least a given fraction of
+the input structures. This is specified with the --threshold option,
+which must be a value between 0.0 and 1.0.
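
As a rough illustration of what a threshold means in terms of structure counts, here is a minimal Python sketch. It assumes that "at least 80% of the input structures" translates to rounding the product up to the next whole structure; the diff does not show how fmcs itself rounds, and the function name `min_matching_structures` is hypothetical, not part of fmcs.

```python
import math
from fractions import Fraction


def min_matching_structures(threshold, num_structures):
    """Smallest number of structures that must contain the substructure.

    Assumption (not confirmed by the fmcs source): the count is
    ceil(threshold * num_structures).
    """
    # Go through str() so that 0.9 is treated as exactly 9/10 and not as
    # its slightly larger binary floating-point approximation.
    fraction = Fraction(str(threshold))
    if not 0 <= fraction <= 1:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return math.ceil(fraction * num_structures)
```

Under this assumption, a threshold of 0.8 with 10 input structures requires a match in 8 of them, and a threshold of 0.95 requires all 10.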
+
+% fmcs sample_files/cox2.sdf
+[#6]:1:[#6]:[#6]:[#6](-[#7]:2:[#6]:[#6]:[#7]:[#6]:2-[#6]:2:[#6]:[#6]:[#6]:[#6]:[#6]:2):[#6]:[#6]:1 17 atoms 19 bonds (complete search)
+% fmcs sample_files/cox2.sdf --threshold 0.95
+[#6]-[#6]:1:[#6]:[#7](:[#6](-[#6]:2:[#6]:[#6]:[#6]:[#6]:[#6]:2):[#7]:1)-[#6]:1:[#6]:[#6]:[#6]:[#6]:[#6]:1 18 atoms 20 bonds (complete search)
+% fmcs sample_files/cox2.sdf --threshold 0.90
+[#6]-[#6]:1:[#6]:[#7](:[#6](-[#6]:2:[#6]:[#6]:[#6]:[#6]:[#6]:2):[#7]:1)-[#6]:1:[#6]:[#6]:[#6](-[#16](=[#8])=[#8]):[#6]:[#6]:1 21 atoms 23 bonds (complete search)
+
+This is useful when examining clusters produced by approximate
+similarity methods such as fingerprints. These tend to group
+structural neighbors together, but occasionally include compounds
+which aren't structurally related. You might use the threshold option
+to find the MCS which is in at least 90% of the structures, in the
+hope of ignoring the outliers. I conjecture that you could compare the
+MCS sizes at different threshold levels to judge overall group
+similarity.
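
One way to compare MCS sizes across thresholds is to parse the result lines that fmcs prints. The following Python sketch works only on the three output lines from the cox2.sdf example above. The helper name `parse_fmcs_line` is hypothetical, and the regular expression assumes every result line has the form "SMARTS N atoms M bonds (status)", which is all this document shows.

```python
import re

# Pattern for a line such as "<SMARTS> 17 atoms 19 bonds (complete search)".
# Assumption: the SMARTS contains no spaces and the status is parenthesized.
_RESULT = re.compile(
    r"^(?P<smarts>\S+) (?P<atoms>\d+) atoms (?P<bonds>\d+) bonds "
    r"\((?P<status>[^)]*)\)$"
)


def parse_fmcs_line(line):
    """Return (smarts, num_atoms, num_bonds, status) for one result line."""
    match = _RESULT.match(line.strip())
    if match is None:
        raise ValueError("unrecognized fmcs output line: %r" % (line,))
    return (match["smarts"], int(match["atoms"]),
            int(match["bonds"]), match["status"])


# Output lines copied from the cox2.sdf runs shown above, keyed by threshold.
outputs = {
    "1.0": "[#6]:1:[#6]:[#6]:[#6](-[#7]:2:[#6]:[#6]:[#7]:[#6]:2-[#6]:2:[#6]:[#6]:[#6]:[#6]:[#6]:2):[#6]:[#6]:1 17 atoms 19 bonds (complete search)",
    "0.95": "[#6]-[#6]:1:[#6]:[#7](:[#6](-[#6]:2:[#6]:[#6]:[#6]:[#6]:[#6]:2):[#7]:1)-[#6]:1:[#6]:[#6]:[#6]:[#6]:[#6]:1 18 atoms 20 bonds (complete search)",
    "0.90": "[#6]-[#6]:1:[#6]:[#7](:[#6](-[#6]:2:[#6]:[#6]:[#6]:[#6]:[#6]:2):[#7]:1)-[#6]:1:[#6]:[#6]:[#6](-[#16](=[#8])=[#8]):[#6]:[#6]:1 21 atoms 23 bonds (complete search)",
}

# Map each threshold to its (atoms, bonds) size.
sizes = {}
for threshold, line in outputs.items():
    _, atoms, bonds, _ = parse_fmcs_line(line)
    sizes[threshold] = (atoms, bonds)

for threshold, (atoms, bonds) in sizes.items():
    print(threshold, atoms, bonds)
```

For this data set the MCS grows from 17 atoms at the default threshold to 21 atoms at 0.90, so relaxing the threshold by 10% recovers only a few extra atoms.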
+
+
 Generating fragment SMILES
 --------------------------