List of Contributors |
|
xv | |
Preface |
|
xvii | |
About the Companion Website |
|
xix | |
Part 1 Chemical Databases |
|
1 | (82) |
|
|
3 | (34) |
|
Gilles Marcou and Alexandre Varnek |
|
|
|
3 | (4) |
|
|
5 | (2) |
|
Step-by-Step Instructions |
|
|
7 | (27) |
|
|
34 | (2) |
|
|
36 | (1) |
|
2 Relational Chemical Databases: Creation, Management, and Usage |
|
|
37 | (30) |
|
|
|
|
37 | (4) |
|
Step-by-Step Instructions |
|
|
41 | (24) |
|
|
65 | (1) |
|
|
65 | (2) |
|
3 Handling of Markush Structures |
|
|
67 | (8) |
|
|
|
|
|
67 | (1) |
|
Step-by-Step Instructions |
|
|
68 | (5) |
|
|
73 | (1) |
|
|
73 | (2) |
|
4 Processing of SMILES, InChI, and Hashed Fingerprints |
|
|
75 | (8) |
|
Joao Montargil Aires de Sousa |
|
|
|
75 | (1) |
|
|
76 | (2) |
|
Step-by-Step Instructions |
|
|
78 | (2) |
|
|
80 | (1) |
|
|
81 | (2) |
Part 2 Library Design |
|
83 | (20) |
|
5 Design of Diverse and Focused Compound Libraries |
|
|
85 | (18) |
|
Antonio de la Vega de Leon |
|
|
|
|
|
|
85 | (2) |
|
|
86 | (1) |
|
|
86 | (1) |
|
Compound Library Creation |
|
|
87 | (3) |
|
Compound Library Analysis |
|
|
90 | (5) |
|
Normalization of Descriptor Values |
|
|
91 | (1) |
|
Visualizing Descriptor Distributions |
|
|
92 | (2) |
|
Decorrelation and Dimension Reduction |
|
|
94 | (1) |
|
Partitioning and Diverse Subset Calculation |
|
|
95 | (3) |
|
|
95 | (2) |
|
|
97 | (1) |
|
|
98 | (3) |
|
Combinatorial Enumeration of Compounds |
|
|
98 | (1) |
|
Retrosynthetic Approaches to Library Design |
|
|
99 | (2) |
|
|
101 | (2) |
Part 3 Data Analysis and Visualization |
|
103 | (24) |
|
6 Hierarchical Clustering in R |
|
|
105 | (14) |
|
|
|
|
105 | (1) |
|
|
106 | (1) |
|
|
107 | (1) |
|
Hierarchical Clustering Using Fingerprints |
|
|
108 | (3) |
|
Hierarchical Clustering Using Descriptors |
|
|
111 | (2) |
|
Visualization of the Data Sets |
|
|
113 | (3) |
|
Alternative Clustering Methods |
|
|
116 | (1) |
|
|
117 | (1) |
|
|
118 | (1) |
|
7 Data Visualization and Analysis Using Kohonen Self-Organizing Maps |
|
|
119 | (8) |
|
Joao Montargil Aires de Sousa |
|
|
|
119 | (1) |
|
|
120 | (1) |
|
|
121 | (5) |
|
|
126 | (1) |
|
|
126 | (1) |
Part 4 Obtaining and Validation QSAR/QSPR Models |
|
127 | (114) |
|
8 Descriptors Generation Using the CDK Toolkit and Web Services |
|
|
129 | (6) |
|
Joao Montargil Aires de Sousa |
|
|
|
129 | (1) |
|
|
130 | (1) |
|
Step-by-Step Instructions |
|
|
131 | (2) |
|
|
133 | (1) |
|
|
134 | (1) |
|
9 QSPR Models on Fragment Descriptors |
|
|
135 | (28) |
|
|
|
|
135 | (1) |
|
|
136 | (25) |
|
|
137 | (2) |
|
Data Split Into Training and Test Sets |
|
|
139 | (1) |
|
Substructure Molecular Fragment (SMF) Descriptors |
|
|
139 | (3) |
|
|
142 | (1) |
|
Forward and Backward Stepwise Variable Selection |
|
|
142 | (1) |
|
Parameters of Internal Model Validation |
|
|
143 | (1) |
|
Applicability Domain (AD) of the Model |
|
|
143 | (1) |
|
Storage and Retrieval Modeling Results |
|
|
144 | (1) |
|
Analysis of Modeling Results |
|
|
144 | (4) |
|
Root-Mean Squared Error (RMSE) Estimation |
|
|
148 | (3) |
|
|
151 | (1) |
|
Analysis of n-Fold Cross-Validation Results |
|
|
151 | (2) |
|
Loading Structure-Data File |
|
|
153 | (1) |
|
Descriptors and Fitting Equation |
|
|
154 | (1) |
|
|
155 | (1) |
|
|
155 | (1) |
|
Model Applicability Domain |
|
|
155 | (1) |
|
n-Fold External Cross-Validation |
|
|
155 | (1) |
|
Saving and Loading of the Consensus Modeling Results |
|
|
155 | (1) |
|
Statistical Parameters of the Consensus Model |
|
|
156 | (1) |
|
Consensus Model Performance as a Function of Individual Models Acceptance Threshold |
|
|
157 | (1) |
|
Building Consensus Model on the Entire Data Set |
|
|
158 | (1) |
|
|
159 | (1) |
|
Loading Selected Models and Choosing their Applicability Domain |
|
|
160 | (1) |
|
Reporting Predicted Values |
|
|
160 | (1) |
|
Analysis of the Fragments Contributions |
|
|
161 | (1) |
|
|
161 | (2) |
|
10 Cross-Validation and the Variable Selection Bias |
|
|
163 | (12) |
|
|
|
|
|
|
163 | (2) |
|
Step-by-Step Instructions |
|
|
165 | (7) |
|
|
172 | (1) |
|
|
173 | (2) |
|
|
175 | (18) |
|
|
|
|
|
|
176 | (2) |
|
|
178 | (2) |
|
Step-by-Step Instructions |
|
|
180 | (11) |
|
|
191 | (1) |
|
|
192 | (1) |
|
|
193 | (16) |
|
|
|
|
|
|
194 | (3) |
|
Step-by-Step Instructions |
|
|
197 | (10) |
|
|
207 | (1) |
|
|
208 | (1) |
|
13 Benchmarking Machine-Learning Methods |
|
|
209 | (14) |
|
|
|
|
|
|
209 | (1) |
|
Step-by-Step Instructions |
|
|
210 | (12) |
|
|
222 | (1) |
|
|
222 | (1) |
|
14 Compound Classification Using the scikit-learn Library |
|
|
223 | (18) |
|
|
|
|
|
224 | (1) |
|
|
225 | (5) |
|
Step-by-Step Instructions |
|
|
230 | (8) |
|
|
230 | (1) |
|
|
231 | (3) |
|
|
234 | (3) |
|
|
237 | (1) |
|
|
238 | (1) |
|
|
239 | (2) |
Part 5 Ensemble Modeling |
|
241 | (38) |
|
15 Bagging and Boosting of Classification Models |
|
|
243 | (6) |
|
|
|
|
|
|
243 | (1) |
|
|
244 | (1) |
|
Step-by-Step Instructions |
|
|
245 | (2) |
|
|
247 | (1) |
|
|
247 | (2) |
|
16 Bagging and Boosting of Regression Models |
|
|
249 | (8) |
|
|
|
|
|
|
249 | (1) |
|
|
249 | (1) |
|
Step-by-Step Instructions |
|
|
250 | (5) |
|
|
255 | (1) |
|
|
255 | (2) |
|
17 Instability of Interpretable Rules |
|
|
257 | (6) |
|
|
|
|
|
|
257 | (1) |
|
|
258 | (1) |
|
Step-by-Step Instructions |
|
|
258 | (3) |
|
|
261 | (1) |
|
|
261 | (2) |
|
18 Random Subspaces and Random Forest |
|
|
263 | (8) |
|
|
|
|
|
|
264 | (1) |
|
|
264 | (1) |
|
Step-by-Step Instructions |
|
|
265 | (4) |
|
|
269 | (1) |
|
|
269 | (2) |
|
|
271 | (8) |
|
|
|
|
|
|
271 | (1) |
|
|
272 | (1) |
|
Step-by-Step Instructions |
|
|
273 | (4) |
|
|
277 | (1) |
|
|
278 | (1) |
Part 6 3D Pharmacophore Modeling |
|
279 | (32) |
|
20 3D Pharmacophore Modeling Techniques in Computer-Aided Molecular Design Using LigandScout |
|
|
281 | (30) |
|
|
|
|
|
|
|
281 | (2) |
|
Theory: 3D Pharmacophores |
|
|
283 | (1) |
|
Representation of Pharmacophore Models |
|
|
283 | (5) |
|
Hydrogen-Bonding Interactions |
|
|
285 | (1) |
|
|
285 | (1) |
|
Aromatic and Cation-pi Interactions |
|
|
286 | (1) |
|
|
286 | (1) |
|
|
286 | (1) |
|
|
287 | (1) |
|
|
288 | (1) |
|
Manual Pharmacophore Construction |
|
|
288 | (1) |
|
Structure-Based Pharmacophore Models |
|
|
289 | (1) |
|
Ligand-Based Pharmacophore Models |
|
|
289 | (2) |
|
3D Pharmacophore-Based Virtual Screening |
|
|
291 | (16) |
|
3D Pharmacophore Creation |
|
|
291 | (1) |
|
Annotated Database Creation |
|
|
291 | (1) |
|
Virtual Screening-Database Searching |
|
|
292 | (1) |
|
|
292 | (2) |
|
Tutorial: Creating 3D-Pharmacophore Models Using LigandScout |
|
|
294 | (1) |
|
Creating Structure-Based Pharmacophores From a Ligand-Protein Complex |
|
|
294 | (2) |
|
Description: Create a Structure-Based Pharmacophore Model |
|
|
296 | (1) |
|
Create a Shared Feature Pharmacophore Model From Multiple Ligand-Protein Complexes |
|
|
296 | (1) |
|
Description: Create a Shared Feature Pharmacophore and Align it to Ligands |
|
|
297 | (1) |
|
Create Ligand-Based Pharmacophore Models |
|
|
298 | (2) |
|
Description: Ligand-Based Pharmacophore Model Creation |
|
|
300 | (1) |
|
Tutorial: Pharmacophore-Based Virtual Screening Using LigandScout |
|
|
301 | (1) |
|
Virtual Screening, Model Editing, and Viewing Hits in the Target Active Site |
|
|
301 | (1) |
|
Description: Virtual Screening and Pharmacophore Model Editing |
|
|
302 | (1) |
|
Analyzing Screening Results with Respect to the Binding Site |
|
|
303 | (2) |
|
Description: Analyzing Hits in the Active Site Using LigandScout |
|
|
305 | (1) |
|
Parallel Virtual Screening of Multiple Databases Using LigandScout |
|
|
305 | (1) |
|
Virtual Screening in the Screening Perspective of LigandScout |
|
|
306 | (1) |
|
Description: Virtual Screening Using LigandScout |
|
|
306 | (1) |
|
|
307 | (1) |
|
|
307 | (1) |
|
|
307 | (4) |
Part 7 The Protein 3D-Structures in Virtual Screening |
|
311 | (42) |
|
21 The Protein 3D-Structures in Virtual Screening |
|
|
313 | (40) |
|
|
|
|
313 | (1) |
|
Description of the Example Case |
|
|
314 | (1) |
|
Thrombin and Blood Coagulation |
|
|
314 | (1) |
|
Active Thrombin and Inactive Prothrombin |
|
|
314 | (1) |
|
Thrombin as a Drug Target |
|
|
314 | (1) |
|
Thrombin Three-Dimensional Structure: The 1OYT PDB File |
|
|
315 | (1) |
|
|
315 | (1) |
|
Overall Description of the Input Data Available on the Editor Website |
|
|
315 | (1) |
|
Exercise 1: Protein Analysis and Preparation |
|
|
316 | (14) |
|
Step 1: Identification of Molecules Described in the 1OYT PDB File |
|
|
316 | (4) |
|
Step 2: Protein Quality Analysis of the Thrombin/Inhibitor PDB Complex Using MOE Geometry Utility |
|
|
320 | (1) |
|
Step 3: Preparation of the Protein for Drug Design Applications |
|
|
321 | (4) |
|
Step 4: Description of the Protein-Ligand Binding Mode |
|
|
325 | (3) |
|
Step 5: Detection of Protein Cavities |
|
|
328 | (2) |
|
Exercise 2: Retrospective Virtual Screening Using the Pharmacophore Approach |
|
|
330 | (11) |
|
Step 1: Description of the Test Library |
|
|
332 | (1) |
|
Step 2.1: Pharmacophore Design, Overview |
|
|
333 | (1) |
|
Step 2.2: Pharmacophore Design, Flexible Alignment of Three Thrombin Inhibitors |
|
|
334 | (1) |
|
Step 2.3: Pharmacophore Design, Query Generation |
|
|
335 | (2) |
|
Step 3: Pharmacophore Search |
|
|
337 | (4) |
|
Exercise 3: Retrospective Virtual Screening Using the Docking Approach |
|
|
341 | (9) |
|
Step 1: Description of the Test Library |
|
|
341 | (1) |
|
Step 2: Preparation of the Input |
|
|
341 | (1) |
|
Step 3: Re-Docking of the Crystallographic Ligand |
|
|
341 | (4) |
|
Step 4: Virtual Screening of a Database |
|
|
345 | (5) |
|
|
350 | (1) |
|
|
351 | (2) |
Part 8 Protein-Ligand Docking |
|
353 | (24) |
|
22 Protein-Ligand Docking |
|
|
355 | (22) |
|
|
|
|
|
355 | (1) |
|
Description of the Example Case |
|
|
356 | (1) |
|
|
356 | (4) |
|
|
359 | (1) |
|
|
359 | (1) |
|
|
360 | (1) |
|
Description of Input Data Available on the Editor Website |
|
|
360 | (2) |
|
|
362 | (10) |
|
A Quick Start with LeadIT |
|
|
362 | (1) |
|
Re-Docking of Tacrine into AChE |
|
|
362 | (1) |
|
Preparation of AChE From 1ACJ PDB File |
|
|
362 | (1) |
|
Docking of Neutral Tacrine, then of Positively Charged Tacrine |
|
|
363 | (2) |
|
Docking of Positively Charged Tacrine in AChE in Presence of Water |
|
|
365 | (1) |
|
Cross-Docking of Tacrine-Pyridone and Donepezil Into AChE |
|
|
366 | (1) |
|
Preparation of AChE From 1ACJ PDB File |
|
|
366 | (1) |
|
Cross-Docking of Tacrine-Pyridone Inhibitor and Donepezil in AChE in Presence of Water |
|
|
367 | (3) |
|
Re-Docking of Donepezil in AChE in Presence of Water |
|
|
370 | (2) |
|
|
372 | (1) |
|
Annex: Screen Captures of LeadIT Graphical Interface |
|
|
372 | (3) |
|
|
375 | (2) |
Part 9 Pharmacophorical Profiling Using Shape Analysis |
|
377 | (16) |
|
23 Pharmacophorical Profiling Using Shape Analysis |
|
|
379 | (14) |
|
|
|
|
|
|
|
379 | (1) |
|
Description of the Example Case |
|
|
380 | (1) |
|
|
380 | (1) |
|
Description of the Searched Data Set |
|
|
381 | (1) |
|
|
381 | (1) |
|
|
381 | (4) |
|
|
381 | (3) |
|
|
384 | (1) |
|
Other Programs for Shape Comparison |
|
|
384 | (1) |
|
Description of Input Data Available on the Editor Website |
|
|
385 | (2) |
|
|
387 | (3) |
|
Preamble: Practical Considerations |
|
|
387 | (1) |
|
|
387 | (1) |
|
What are ROCS Output Files? |
|
|
387 | (1) |
|
|
388 | (2) |
|
|
390 | (1) |
|
|
391 | (2) |
Part 10 Algorithmic Chemoinformatics |
|
393 | (56) |
|
24 Algorithmic Chemoinformatics |
|
|
395 | (54) |
|
|
Antonio de la Vega de Leon |
|
|
|
|
395 | (1) |
|
Similarity Searching Using Data Fusion Techniques |
|
|
396 | (1) |
|
Introduction to Virtual Screening |
|
|
396 | (1) |
|
The Three Pillars of Virtual Screening |
|
|
397 | (1) |
|
|
397 | (1) |
|
|
397 | (1) |
|
Search Strategy (Data Fusion) |
|
|
397 | (1) |
|
|
397 | (5) |
|
|
397 | (2) |
|
Fingerprint Representations |
|
|
399 | (1) |
|
|
399 | (1) |
|
|
399 | (1) |
|
Generation of Fingerprints |
|
|
399 | (3) |
|
|
402 | (2) |
|
|
404 | (1) |
|
Completed Virtual Screening Program |
|
|
405 | (1) |
|
Benchmarking VS Performance |
|
|
406 | (2) |
|
|
407 | (1) |
|
|
407 | (1) |
|
Multiple Runs and Reproducibility |
|
|
408 | (1) |
|
Adjusting the VS Program for Benchmarking |
|
|
408 | (6) |
|
Analyzing Benchmark Results |
|
|
410 | (4) |
|
|
414 | (1) |
|
Introduction to Chemoinformatics Toolkits |
|
|
415 | (1) |
|
|
415 | (1) |
|
|
416 | (1) |
|
Basic Usage: Creating and Manipulating Molecules in RDKit |
|
|
417 | (2) |
|
Creation of Molecule Objects |
|
|
417 | (1) |
|
|
418 | (1) |
|
|
418 | (1) |
|
|
419 | (1) |
|
An Example: Hill Notation for Molecules |
|
|
419 | (1) |
|
Canonical SMILES: The CANON Algorithm |
|
|
420 | (1) |
|
|
420 | (2) |
|
|
420 | (1) |
|
|
421 | (1) |
|
|
422 | (3) |
|
Canonicalization of SMILES |
|
|
425 | (3) |
|
|
427 | (1) |
|
|
428 | (3) |
|
|
431 | (1) |
|
Substructure Searching: The Ullmann Algorithm |
|
|
432 | (1) |
|
|
432 | (1) |
|
|
433 | (3) |
|
|
436 | (1) |
|
|
436 | (5) |
|
|
440 | (1) |
|
|
441 | (1) |
|
Atom Environment Fingerprints |
|
|
441 | (1) |
|
|
441 | (2) |
|
|
443 | (4) |
|
|
443 | (1) |
|
The Initial Atom Invariant |
|
|
444 | (1) |
|
|
444 | (3) |
|
|
447 | (1) |
|
|
447 | (2) |
Index |
|
449 | |