Molecular descriptors
From DrugPedia: A Wikipedia for Drug discovery
A molecular descriptor is any molecular property used to characterize a molecule, for example to search a database or to calculate another molecular property.
"The molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment."
INTRODUCTION
Biologically active substances interact, in most cases, with biomolecules, triggering specific molecular mechanisms such as the activation of an enzyme cascade or the opening of an ion channel, which finally lead to a certain biological response. Quantitative structure-activity relationships (QSARs) correlate this response with molecular properties of the compounds of interest. Because the response depends both on the concentration of the active substance at the site of action and on the strength of its interaction with the biological macromolecule, both aspects must be modeled quantitatively by QSAR. In the meantime, a variety of descriptors of molecular properties have been developed. Computational approaches to lipophilicity are nearly as diverse as the QSAR methods themselves. Electronic properties in terms of point charges or molecular electrostatic potentials can be evaluated by quantum-chemical ab initio methods for molecules of up to 50 atoms. Using semiempirical methods such as AM1 or PM3, such properties can be calculated even for larger systems. Steric descriptors range from molecular surface and volume to connectivity and topological indices and Verloop parameters.
The descriptors (independent variables) are correlated to the biological activity (dependent variable) by means of statistical methods. Most commonly, multiple linear regression (MLR) is used, but partial least squares (PLS) and neural networks are also applied. In some QSAR approaches, genetic algorithms are employed to identify the relevant descriptors: a population of models is created and, step by step, models with a better "fitness score" (i.e. with better predictivity) are produced by "genetic operations" such as cross-over, point mutation or selection. In "classical" QSAR, both 2D (e.g. topological indices, indicator variables) and 3D (surface, volume, electronic properties) descriptors are correlated with the biological activity.
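As a rough illustration of the MLR step, the following sketch fits an intercept and one coefficient per descriptor by solving the least-squares normal equations in plain Python. The descriptor values and activities are synthetic, and the function names (`fit_mlr`, `solve_linear_system`) are hypothetical helpers introduced only for this example, not part of any QSAR package.

```python
# Minimal MLR sketch: activity ~ intercept + sum(coef_i * descriptor_i).
# All data below are synthetic and serve only to illustrate the fit.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations act on both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Choose the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(descriptors, activity):
    """Return [intercept, coef_1, ..., coef_k] from the normal equations."""
    rows = [[1.0] + list(d) for d in descriptors]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, activity)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Two synthetic descriptors per compound (independent variables).
descriptors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
# Synthetic activities generated exactly as 0.5 + 1.2*x1 - 0.3*x2.
activity = [0.5 + 1.2 * x1 - 0.3 * x2 for x1, x2 in descriptors]

coefficients = fit_mlr(descriptors, activity)
print(coefficients)
```

Because the synthetic activities are noise-free, the fit recovers the generating coefficients up to rounding; real QSAR data would additionally call for validation of predictivity, which is outside this sketch.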
- Molecular descriptors are numerical values that characterize properties of molecules
- Examples:
o Physicochemical properties (empirical)
o Values from algorithms, such as 2D fingerprints
- Vary in complexity of encoded information and in compute time
Types of Molecular Descriptors
Descriptors are partitioned into classes. Each class indicates what is assumed by the descriptor calculators about the molecule presented:
- 2D. 2D descriptors only use the atoms and connection information of the molecule for the calculation. 3D coordinates and individual conformations are not considered.
- i3D. Internal 3D descriptors use 3D coordinate information about each molecule; however, they are invariant to rotations and translations of the conformation.
- x3D. External 3D descriptors use 3D coordinate information and additionally require an absolute frame of reference (e.g., molecules docked into the same receptor).
2D Molecular Descriptors
2D molecular descriptors are defined to be numerical properties that can be calculated from the connection table representation of a molecule (e.g., elements, formal charges and bonds, but not atomic coordinates). 2D descriptors are, therefore, not dependent on the conformation of a molecule and are most suitable for large database studies.
Notation and Terminology
Many descriptors make use of several fundamental quantities that can be computed from a chemical structure. This section defines these fundamental quantities. For purposes of illustration, a reference chemical structure is used.
[Figure: reference chemical structure]
The fundamental quantities of a chemical structure depend solely on the structure as drawn, i.e., no modifications to the structure are implied, with the exception of the addition or subtraction of hydrogen atoms to fill valence.
Z denotes the atomic number of an atom; lone pair pseudo-atoms (LP) are given an atomic number of 0. Heavy atoms are atoms that have an atomic number strictly greater than 1 (neither H nor LP). A trivial atom is an LP pseudo-atom or a hydrogen with exactly one heavy neighbor. In the reference structure, H1, LP1 and LP2 are trivial.
The hydrogen count, h, of an atom is the number of hydrogens to which it is (or should be) attached. This count includes all hydrogen atoms that are necessary to fill valence. In the reference structure, F has h = 0, N has h = 1 and O1 has h = 1.
The heavy degree, d, of an atom is the number of heavy atoms to which it is bonded. That is, d is the number of bonded neighbors of the atom in the hydrogen suppressed graph. In the reference structure, F has d = 1, C6 has d = 3 and N has d = 2.
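The definitions of h and d can be sketched on a small connection table. The example below uses acetic acid with one explicit hydrogen as a stand-in molecule (it is not the reference structure of this article), and it assumes neutral atoms with the standard valences listed in `STANDARD_VALENCE`. The atom labels and helper names are hypothetical.

```python
# Sketch: hydrogen count h and heavy degree d from a connection table.
# Assumptions: neutral atoms, standard valences, acetic acid as the example.

ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9}
STANDARD_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}

# Connection table: atom label -> element, and bonds as (atom, atom, order).
# Only the hydroxyl hydrogen H1 is drawn explicitly; the rest are implied.
atoms = {"C1": "C", "C2": "C", "O1": "O", "O2": "O", "H1": "H"}
bonds = [("C1", "C2", 1), ("C2", "O1", 2), ("C2", "O2", 1), ("O2", "H1", 1)]


def is_heavy(label):
    """Heavy atoms have an atomic number strictly greater than 1."""
    return ATOMIC_NUMBER[atoms[label]] > 1


def heavy_neighbors(label):
    """Yield (neighbor, bond order) for every heavy atom bonded to label."""
    for a, b, order in bonds:
        if label in (a, b):
            other = b if a == label else a
            if is_heavy(other):
                yield other, order


def heavy_degree(label):
    """d: number of bonded neighbors in the hydrogen-suppressed graph."""
    return sum(1 for _ in heavy_neighbors(label))


def hydrogen_count(label):
    """h: hydrogens attached or needed to fill the standard valence."""
    used = sum(order for _, order in heavy_neighbors(label))
    return STANDARD_VALENCE[atoms[label]] - used


for label in ("C1", "C2", "O1", "O2"):
    print(label, "h =", hydrogen_count(label), "d =", heavy_degree(label))
```

Note that h is derived from the valence left over after bonds to heavy atoms, so it counts drawn and missing hydrogens alike; charged atoms or unusual valence states would need additional handling.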
Pharmacophore Feature Descriptors
The Pharmacophore Atom Type descriptors consider only the heavy atoms of a molecule and assign a type to each atom (using a rule-based system). That is, hydrogens are suppressed during the calculation. The feature set is Donor, Acceptor, Polar (both Donor and Acceptor), Positive (base), Negative (acid), Hydrophobe and Other. Assignments may take into account implied protonation, deprotonation, keto/enol considerations and tautomerism at a biologically relevant pH. For example, -COOH will be typed in its deprotonated form regardless of how the structure is stored.
Partial Charge Descriptors
Descriptors that depend on the partial charge of each atom of a chemical structure require calculation of those partial charges. A complication is that there are numerous methods of calculating partial charges. Rather than enforce a particular method, MOE (Molecular Operating Environment) provides several versions of most of the charge-dependent descriptors. The only difference between these variants is the source of the partial charges. The following variants are supported: PEOE, Q (described below).
PEOE. The Partial Equalization of Orbital Electronegativities (PEOE) method of calculating atomic partial charges [Gasteiger 1980] is a method in which charge is transferred between bonded atoms until equilibrium is reached. To guarantee convergence, the amount of charge transferred at each iteration is damped with an exponentially decreasing scale factor.
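The iterative charge transfer can be sketched as follows. This is a simplified illustration and not MOE's implementation: the electronegativity of each atom is modeled as a quadratic in its charge, the coefficients in `COEFFS` are illustrative values in the style of the original parameterization and should be checked against a reference before any real use, and the damping factor of 0.5 per iteration and the six iterations are assumptions. Methanol with explicit hydrogens serves as a hypothetical test molecule.

```python
# Simplified PEOE-style sketch: damped charge transfer along bonds.
# Coefficients, damping base and iteration count are assumptions.

# Electronegativity model chi(q) = a + b*q + c*q**2 per element (illustrative).
COEFFS = {
    "H": (7.17, 6.24, -0.56),
    "C": (7.98, 9.18, 1.88),   # treated as sp3 carbon
    "O": (14.18, 12.92, 1.39), # treated as sp3 oxygen
}
# Electronegativity of the cation (q = +1), used to scale each transfer.
CATION_CHI = {el: a + b + c for el, (a, b, c) in COEFFS.items()}
CATION_CHI["H"] = 20.02  # special-cased value for hydrogen (assumption)


def electronegativity(element, charge):
    a, b, c = COEFFS[element]
    return a + b * charge + c * charge * charge


def peoe_charges(atoms, bonds, iterations=6, damping=0.5):
    """Return partial charges after a fixed number of damped iterations."""
    charges = {name: 0.0 for name in atoms}
    for k in range(1, iterations + 1):
        # Electronegativities are evaluated once per iteration.
        chi = {n: electronegativity(atoms[n], charges[n]) for n in atoms}
        delta = {name: 0.0 for name in atoms}
        for a, b in bonds:
            # Electron density flows to the more electronegative atom.
            donor, acceptor = (a, b) if chi[a] < chi[b] else (b, a)
            transfer = (chi[acceptor] - chi[donor]) / CATION_CHI[atoms[donor]]
            transfer *= damping ** k  # exponentially decreasing scale factor
            delta[donor] += transfer
            delta[acceptor] -= transfer
        for name in atoms:
            charges[name] += delta[name]
    return charges


# Methanol, CH3-OH, with explicit hydrogens (hypothetical labels).
atoms = {"C": "C", "O": "O", "H1": "H", "H2": "H", "H3": "H", "H4": "H"}
bonds = [("C", "O"), ("C", "H1"), ("C", "H2"), ("C", "H3"), ("O", "H4")]

charges = peoe_charges(atoms, bonds)
for name, q in charges.items():
    print(f"{name}: {q:+.3f}")
```

Every transfer is added to one atom and subtracted from its bonded partner, so the total molecular charge is conserved; the damping term shrinks the transfers geometrically, which is what makes the procedure settle after a handful of iterations.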
3D Molecular Descriptors
There are two types of 3D molecular descriptors: those that depend on internal coordinates only and those that depend on absolute orientation. 3D molecular descriptors are classified as "i3D" for internal coordinate dependent 3D and "x3D" for external coordinate dependent. A good example is the dipole moment: the magnitude of the dipole moment does not depend on absolute orientation in space; however, the x component of the dipole moment does depend on absolute orientation.
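The dipole example can be checked numerically. The sketch below computes the dipole moment of a set of point charges, rotates the coordinates by 90 degrees about the z axis, and compares the magnitude (an i3D quantity) with the x component (an x3D quantity). The charges and coordinates are hypothetical and given in arbitrary units.

```python
import math

# Sketch: dipole magnitude is rotation-invariant, its x component is not.
# Point charges and coordinates are hypothetical, in arbitrary units.


def dipole(charges, coords):
    """Dipole vector mu = sum_i q_i * r_i."""
    return tuple(
        sum(q * r[axis] for q, r in zip(charges, coords)) for axis in range(3)
    )


def rotate_z(point, angle):
    """Rotate a point about the z axis by the given angle in radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def magnitude(vector):
    return math.sqrt(sum(component * component for component in vector))


# Net charge is zero, so the dipole does not depend on the choice of origin.
charges = [-0.4, 0.2, 0.2]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

mu = dipole(charges, coords)
rotated = [rotate_z(p, math.pi / 2) for p in coords]
mu_rotated = dipole(charges, rotated)

print("magnitude before/after:", magnitude(mu), magnitude(mu_rotated))
print("x component before/after:", mu[0], mu_rotated[0])
```

The two magnitudes agree to within floating-point error while the x components differ, which is why x3D descriptors are only comparable across molecules that share a common frame of reference.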
Descriptors for Large Data Sets
- Descriptors representing properties of complete molecules
o Examples: LogP, Molar Refractivity
- Descriptors calculated from 2D graphs
o Examples: Topological Indexes, 2D fingerprints
- Descriptors requiring 3D representations
o Example: Pharmacophore descriptors
DESCRIPTORS CALCULATED FROM 2D STRUCTURES
- Simple counts of features
o Lipinski Rule of Five (H bonds, MW, etc.)
o Number of ring systems
o Number of rotatable bonds
- Not likely to discriminate sufficiently when used alone
- Combined with other descriptors for best effect
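As an illustration of simple count-based descriptors, the sketch below counts Rule of Five violations from precomputed properties, using the commonly cited thresholds (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The function name and the two example compounds are hypothetical; computing the counts themselves from a structure is not shown.

```python
# Sketch: counting Lipinski Rule of Five violations from precomputed values.
# Thresholds are the commonly cited ones; example compounds are hypothetical.


def rule_of_five_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Return how many of the four Rule of Five criteria are violated."""
    checks = [
        mol_weight > 500.0,  # molecular weight above 500
        log_p > 5.0,         # logP above 5
        h_donors > 5,        # more than 5 hydrogen-bond donors
        h_acceptors > 10,    # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


# Hypothetical property values: (MW, logP, H-bond donors, H-bond acceptors).
small_compound = (350.4, 2.1, 2, 5)
large_compound = (712.9, 6.3, 7, 12)

small_violations = rule_of_five_violations(*small_compound)
large_violations = rule_of_five_violations(*large_compound)
print(small_violations, large_violations)
```

Many structurally different compounds yield the same violation count, which illustrates why such counts discriminate poorly on their own and are usually combined with other descriptors.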
Physicochemical Properties
- Hydrophobicity
o LogP – the logarithm of the partition coefficient between n-octanol and water
- ClogP (Leo and Hansch) – based on a small set of values derived from a small set of simple molecules
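The definition of LogP given above can be written as a one-line calculation. The sketch below assumes that equilibrium concentrations of the same solute species in the n-octanol and water phases are already known and expressed in the same units; the function name and the example concentrations are hypothetical.

```python
import math

# Sketch: LogP as log10 of the n-octanol/water partition coefficient.
# Concentrations are hypothetical and must share the same units.


def log_p(conc_octanol, conc_water):
    """Return log10(conc_octanol / conc_water) for positive concentrations."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


lipophilic = log_p(10.0, 0.1)   # solute prefers the octanol phase
hydrophilic = log_p(0.5, 5.0)   # solute prefers the water phase
print(lipophilic, hydrophilic)
```

Positive values indicate a preference for the octanol phase (more hydrophobic compounds) and negative values a preference for water.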