SMILES: Simplified Molecular Input Line Entry System

SMILES
• Simplified Molecular Input Line Entry System (SMILES)
• Widely used and computationally efficient
• Uses atomic symbols and a set of intuitive rules
• Uses hydrogen-suppressed molecular graphs (HSMG)
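
To make "hydrogen-suppressed" concrete, here is a minimal sketch of how the omitted hydrogens could be recovered for unbracketed atoms. The valence table follows the usual Daylight convention for the organic subset and is an assumption, since the slides do not list valences; the names `NORMAL_VALENCES` and `implicit_hydrogens` are hypothetical.

```python
# Normal valences for organic-subset atoms (assumed, per the common
# Daylight SMILES convention; not stated in the slides).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum):
    """Return the hydrogens implied for an unbracketed organic-subset atom.

    bond_order_sum is the sum of the orders of the bonds written
    explicitly in the SMILES string (single = 1, double = 2, triple = 3).
    """
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            # Fill up to the lowest normal valence that fits.
            return valence - bond_order_sum
    return 0  # written bonds already exceed every normal valence


print(implicit_hydrogens("C", 0))  # lone C (methane)
print(implicit_hydrogens("C", 2))  # each carbon in C=C
print(implicit_hydrogens("O", 1))  # oxygen in CCO
```

Under these assumptions the three calls print 4, 2 and 1, which is why `C`, `C=C` and `CCO` can stand for methane, ethene and ethanol without any H being written.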

SMILES Bonds
Single*      -
Double       =
Triple       #
Aromatic*    :
* can be omitted

Butanols: 2-Butanol, iso-Butanol, tert-Butanol

SMILES Branches
• Represented by enclosure in parentheses
• Can be nested or stacked
• Examples:
  – CC(O)CC is 2-Butanol
  – OCC(C)C is iso-Butanol
  – OC(C)(C)C is tert-Butanol
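
One way to see how parentheses encode branching is to read the string with a stack: "(" remembers the current attachment atom and ")" returns to it. The sketch below is illustrative only and handles just uppercase organic-subset atoms, the bond symbols - = # and parentheses (no rings, brackets or aromatic atoms); `parse_branches` is a hypothetical name.

```python
import re

# Two-letter halogens must be tried before the one-letter atoms.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#()]")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def parse_branches(smiles):
    """Return (atoms, bonds) for a ring-free, bracket-free SMILES string.

    atoms is a list of symbols; bonds is a list of
    (atom_index, atom_index, bond_order) tuples.
    """
    atoms, bonds = [], []
    stack = []        # attachment atoms saved at each "("
    previous = None   # atom that the next atom will bond to
    order = 1         # a single bond is assumed when no symbol is written
    pos = 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unsupported character {smiles[pos]!r} at {pos}")
        token = match.group()
        pos = match.end()
        if token == "(":
            stack.append(previous)      # branch starts: remember where
        elif token == ")":
            if not stack:
                raise ValueError("unmatched ')'")
            previous = stack.pop()      # branch ends: go back
        elif token in BOND_ORDER:
            order = BOND_ORDER[token]
        else:
            atoms.append(token)
            index = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, index, order))
            previous = index
            order = 1
    if stack:
        raise ValueError("unmatched '('")
    return atoms, bonds


atoms, bonds = parse_branches("OC(C)(C)C")   # tert-Butanol
print(atoms)
print(bonds)
```

For tert-Butanol every bond in the result involves atom 1, the central carbon, which is how the two stacked branches `(C)(C)` end up attached to the same atom.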

SMILES Bonds
Ethene                    C=C
Chloroethene              ClC=C
1,1-Dichloroethene        ClC(Cl)=C
cis-1,2-Dichloroethene    ClC=CCl
Trichloroethene           ClC(Cl)=CCl
Perchloroethene           ClC(Cl)=C(Cl)Cl

SMILES Atoms
• Use normal chemical symbols
• Add punctuation symbols if necessary
• No super- or subscripts

SMILES Symbols
• String of alphanumeric characters and certain punctuation symbols
• Terminates at the first space encountered when read left to right
• The ORGANIC SUBSET: B, C, N, O, P, S, F, Cl, Br, I
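
The reading rules above can be sketched as a small tokenizer: stop at the first space, then split the notation into atoms and punctuation, trying the two-letter symbols Cl and Br before the one-letter ones. This is a minimal illustration, not a complete SMILES lexer; `SMILES_TOKEN` and `tokenize` are hypothetical names, and the pattern also accepts the bracket, ring-digit, aromatic and slash symbols that other slides in this deck describe.

```python
import re

SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"      # bracket atom such as [NH4+]
    r"|Cl|Br"          # two-letter organic-subset atoms come first
    r"|[BCNOPSFI]"     # one-letter organic-subset atoms
    r"|[bcnops]"       # aromatic atoms (lowercase)
    r"|[-=#:/\\]"      # bond symbols
    r"|[()]"           # branch delimiters
    r"|\d"             # ring-closure digits
    r"|\."             # separator for disconnected parts
)


def tokenize(text):
    """Split the SMILES notation at the start of text into tokens."""
    smiles = text.split(" ", 1)[0]   # the notation ends at the first space
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = SMILES_TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


print(tokenize("ClC(Cl)=C is 1,1-dichloroethene"))
```

Everything after the first space is ignored, so a name or comment can follow the notation on the same line without affecting the result.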

Other SMILES Atoms
• Aliphatic or nonaromatic carbon: C
• Atom in aromatic ring: lowercase letter
• Designate ring closure with pairs of matching digits, e.g. c1ccccc1 (or C1=CC=CC=C1) is Benzene, whereas C1CCCCC1 is Cyclohexane

SMILES Charges
• Specify attached hydrogens and charges in square brackets
• Number of attached hydrogens is the symbol H followed by an optional digit

SMILES Charges
[H+]      proton
[OH-]     hydroxyl anion
[OH3+]    hydronium cation
[Fe++]    iron(II) cation
[NH4+]    ammonium cation
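
The bracket notation can be read mechanically: element symbol, optional H with an optional count, then an optional charge. The following is a rough sketch under the assumption that only those three parts appear (no isotopes, chirality marks or aromatic lowercase symbols); `BRACKET_ATOM` and `parse_bracket_atom` are hypothetical names.

```python
import re

# symbol, optional "H" + count, optional charge ("+", "++", "-", "+2", ...)
BRACKET_ATOM = re.compile(r"\[([A-Z][a-z]?)(?:H(\d*))?(\++|-+|[+-]\d+)?\]")


def parse_bracket_atom(token):
    """Return (symbol, attached_hydrogens, charge) for a bracket atom."""
    match = BRACKET_ATOM.fullmatch(token)
    if match is None:
        raise ValueError(f"not a supported bracket atom: {token!r}")
    symbol, h_digits, charge_text = match.groups()

    if h_digits is None:
        hydrogens = 0                  # no H written
    elif h_digits == "":
        hydrogens = 1                  # bare H means one hydrogen
    else:
        hydrogens = int(h_digits)

    if not charge_text:
        charge = 0
    else:
        sign = 1 if charge_text[0] == "+" else -1
        if charge_text[1:].isdigit():
            charge = sign * int(charge_text[1:])   # e.g. "+2"
        else:
            charge = sign * len(charge_text)       # e.g. "++"
    return symbol, hydrogens, charge


for token in ["[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"]:
    print(token, parse_bracket_atom(token))
```

Run on the five examples from the slide, the sketch reports one hydrogen and charge -1 for `[OH-]`, three hydrogens and +1 for `[OH3+]`, and charge +2 with no hydrogens for `[Fe++]`.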

SMILES Cyclic Structures
• Break one single or one aromatic bond in each ring
• Number in any order
  – Designate ring-breaking atoms by the same digit following the atomic symbol

Cyclic Structures
• Numbers indicate start and stop of ring
• Same number indicates start and end of the ring, entered immediately following the start/end atoms
• Only numbers 1–9 are used
• A number should appear only twice
• An atom can be associated with two consecutive numbers, e.g., Naphthalene: c12ccccc1cccc2
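
The digit rules above can be checked with a short scan that pairs each ring digit with its partner. This is a sketch only: it follows the slide in accepting just the digits 1 to 9, skips digits inside square brackets (where they are hydrogen counts or charges), and uses the hypothetical name `ring_closures`.

```python
def ring_closures(smiles):
    """Pair ring-closure digits.

    Returns a list of (digit, open_position, close_position) tuples,
    where positions are character offsets in the SMILES string.
    """
    open_digits = {}     # digit -> position where the ring was opened
    pairs = []
    in_bracket = False
    for pos, ch in enumerate(smiles):
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif ch in "123456789" and not in_bracket:
            if ch in open_digits:
                # Second occurrence closes the ring opened earlier.
                pairs.append((ch, open_digits.pop(ch), pos))
            else:
                open_digits[ch] = pos
    if open_digits:
        raise ValueError(f"unclosed ring digits: {sorted(open_digits)}")
    return pairs


print(ring_closures("C1CCCCC1"))        # cyclohexane: one ring
print(ring_closures("c12ccccc1cccc2"))  # naphthalene: two rings
```

For naphthalene the first atom carries both digits, so the scan reports two closures that start on adjacent characters, matching the "two consecutive numbers" case in the last bullet.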

Naphthalene: c12ccccc1cccc2

SMILES Conventions
• Avoid two consecutive left parentheses if possible
• Strive for the fewest possible branches
• Tautomeric bonds are not designated; enter the appropriate form

Further Restrictions
• A branch cannot begin a SMILES notation
• A branch cannot immediately follow a double- or triple-bond symbol
• Example: C=(CC)C is invalid, but C(=CC)C or C(CC)=C are valid SMILES
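
Both restrictions are simple string checks, sketched below. The function name `check_branch_rules` is hypothetical, and the sketch tests only the two rules stated on this slide, not general SMILES validity.

```python
def check_branch_rules(smiles):
    """Return a list of violated branch restrictions (empty if none)."""
    problems = []
    if smiles.startswith("("):
        problems.append("a branch cannot begin a SMILES notation")
    for bond in ("=", "#"):
        # "=(" or "#(" means a branch opens right after the bond symbol.
        if bond + "(" in smiles:
            problems.append(
                f"a branch cannot immediately follow the bond symbol {bond!r}"
            )
    return problems


for candidate in ["C=(CC)C", "C(=CC)C", "C(CC)=C"]:
    problems = check_branch_rules(candidate)
    print(candidate, "valid" if not problems else problems)
```

The fix in the valid forms is to move the bond symbol inside the parentheses, as in `C(=CC)C`, so that the bond symbol is always followed directly by an atom.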

SMILES Fragments
Nitro              N(=O)(=O)
Nitrate            ON(=O)(=O)
Nitrite            ON(=O)
Sulfonic acid      S(=O)(=O)O
Cyanide/Nitrile    C#N
Azido              N=N#N or N=[N+]=[N-]

SMILES Metals
[Al] [As] [Au] [Be] [Bi] [Cd] [Ca] [Fe] [Hg] [K] [Li] [Mg] [Na] [Ni] [Pt] [Sb] [Sn] [Zr]

Disconnected Structures
• Indicated by a dot
• Tetramethylammonium bromide: C[N+](C)(C)C.[Br-]
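
Because the dot only separates parts, a salt can be split into its ions with a plain string split, and the charges written in brackets can be summed as a sanity check. A minimal sketch, using tetramethylammonium bromide written as C[N+](C)(C)C.[Br-]; `components` and `net_charge` are hypothetical helper names.

```python
import re


def components(smiles):
    """Split a SMILES string into its disconnected parts."""
    return smiles.split(".")


def net_charge(smiles):
    """Sum the charges written inside square brackets."""
    total = 0
    for body in re.findall(r"\[([^\]]+)\]", smiles):
        # The charge, if any, sits at the end: "+", "++", "-", "+2", ...
        match = re.search(r"(\++|-+)(\d*)$", body)
        if match:
            signs, digits = match.groups()
            sign = 1 if signs[0] == "+" else -1
            total += sign * (int(digits) if digits else len(signs))
    return total


salt = "C[N+](C)(C)C.[Br-]"
for part in components(salt):
    print(part, net_charge(part))
print("net charge:", net_charge(salt))
```

The cation and anion come out as separate strings with charges +1 and -1, and the whole notation sums to zero, as expected for a neutral salt.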

Isomeric and Chiral SMILES
• Isomeric configuration indicated by forward and backward slashes: / and \
• Examples:
  – trans-1,2-dibromoethene: Br/C=C/Br (direction of the slash continues)
  – cis-1,2-dibromoethene: Br/C=C\Br (direction of the slash reverses)
• Chirality indicated by the “@” symbol
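
The "continues versus reverses" rule can be illustrated with a tiny pattern match. This sketch assumes the simple linear form X/C=C/Y, with one slash directly before the first carbon and one directly after the second; branched or ring-containing notations need a real parser. `double_bond_geometry` is a hypothetical name.

```python
import re

# One slash before the C=C and one after it.
DOUBLE_BOND = re.compile(r"([/\\])C=C([/\\])")


def double_bond_geometry(smiles):
    """Classify a simple X/C=C/Y pattern as 'trans', 'cis' or 'unspecified'."""
    match = DOUBLE_BOND.search(smiles)
    if match is None:
        return "unspecified"
    before, after = match.groups()
    # Same direction on both sides: trans. Opposite directions: cis.
    return "trans" if before == after else "cis"


print(double_bond_geometry("Br/C=C/Br"))    # slash continues
print(double_bond_geometry("Br/C=C\\Br"))   # slash reverses
print(double_bond_geometry("BrC=CBr"))      # no slashes written
```

Note that the backslash has to be doubled inside an ordinary Python string literal; the SMILES text itself contains a single backslash.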

Some Applications
• JMDraw/SMILESViewer (Christoph Steinbeck)
• JME Molecular Editor (Peter Ertl)
• STN Express (SMILES as output)
• Tripos (dbtranslate: SMILES to MOL)
• Marvin (Ferenc Csizmadia) http://chemaxon.com/marvin/
• CACTVS http://www2.ccc.uni-erlangen.de/cactvs/

Another Application
• SMILESCAS Database http://www.syrres.com/esc/smilecas.htm
• Over 103,000 SMILES notations
• Input CAS Registry Number
• Leads to SMILES and thence to a structure search