This paper presents molecular similarity-based approaches to the fast development of QSAR models, with the aim of evaluating the reliability of different similarity proposals. The inhibition of carbonic anhydrase by a dataset of 47 sulfonamides served as the chemical framework for building the QSAR models. An overview of different chemical isomorphism extraction methods, similarity metrics, and structural difference-based approaches is given in order to provide pharmacological chemists with fast modeling and predictive tools. Recent isomorphism detection approaches that account for dissimilarity between border atoms and bonds were also employed to analyze the differences at scaffold substitution positions. The characterization of the compounds according to several chemical criteria was studied by principal component analysis, and the inhibitory activity was predicted by partial least squares regression. Fingerprint-based and approximate similarities gave the best statistical results (Q2 > 0.70). The latter proved useful for detecting the anomalous behavior of very similar molecules that show very different activities. External validations (test r2 ≈ 0.90) were also carried out to evaluate the predictions for molecules that did not take part in the fitting stage.
Description and application of similarity-based methods for fast and simple QSAR model development
QSAR Comb. Sci. 2008, 27, 457-468.
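As an illustration of two quantities named in the abstract, the sketch below computes a fingerprint-based similarity and a cross-validated Q2. It is a minimal example under stated assumptions, not the authors' procedure: the abstract does not specify the similarity metric or fingerprint type, so the Tanimoto coefficient over sets of "on" bits is assumed here, and the function names `tanimoto` and `q2` and all toy values are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bit indices.

    Assumption: the paper's actual metric is not stated in the abstract;
    Tanimoto is used here only as a common fingerprint similarity.
    """
    if not bits_a and not bits_b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(bits_a & bits_b) / len(bits_a | bits_b)


def q2(observed, predicted_cv):
    """Cross-validated Q2 = 1 - PRESS / SS_total.

    `predicted_cv` holds predictions made for each compound while it was
    left out of the fit. SS_total uses the mean of all observed values;
    some definitions use the per-fold training mean instead.
    """
    mean_obs = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted_cv))
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - press / ss_total


# Toy data (hypothetical, not taken from the paper)
fp_a = {1, 2, 3}
fp_b = {2, 3, 4}
similarity = tanimoto(fp_a, fp_b)

activities = [1.0, 2.0, 3.0]
perfect = q2(activities, [1.0, 2.0, 3.0])
mean_only = q2(activities, [2.0, 2.0, 2.0])
```

A model that always predicts the mean activity gives Q2 = 0, so the reported threshold of Q2 > 0.70 indicates predictions substantially better than that baseline.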