| Preface |
|
xi | |
| Acknowledgments |
|
xiii | |
| Biography |
|
xv | |
| Chapter 1 Introduction |
|
1 | |
| Chapter 2 Relational Database Fundamentals |
|
5 | |
|
|
|
5 | |
|
2.2 Tables, Rows, and Columns |
|
|
5 | |
|
2.3 External and Internal Representations of Data |
|
|
7 | |
|
2.4 Advantages over Spreadsheets |
|
|
8 | |
|
|
|
8 | |
|
|
|
8 | |
|
2.5 Relationships among Tables |
|
|
9 | |
|
2.5.1 One-to-Many Relationships |
|
|
9 | |
|
2.5.2 One-to-One Relationships |
|
|
11 | |
|
2.5.3 Many-to-Many Relationships |
|
|
12 | |
|
2.6 Entity Relationship Diagrams |
|
|
12 | |
|
|
|
14 | |
|
|
|
14 | |
|
|
|
15 | |
|
|
|
15 | |
|
|
|
15 | |
|
|
|
16 | |
|
|
|
16 | |
|
|
|
16 | |
|
|
|
17 | |
|
|
|
17 | |
|
2.13.2 Second Normal Form |
|
|
18 | |
|
|
|
19 | |
|
2.13.4 Summary of Normal Forms |
|
|
20 | |
|
|
|
20 | |
| Chapter 3 Structured Query Language (SQL) |
|
21 | |
|
|
|
21 | |
|
3.2 Databases, Schemas, Tables, Rows, and Columns |
|
|
21 | |
|
|
|
22 | |
|
|
|
23 | |
|
|
|
24 | |
|
|
|
25 | |
|
|
|
26 | |
|
|
|
26 | |
|
3.7.2 Aggregate Functions |
|
|
27 | |
|
3.8 Domains, Triggers, and Views |
|
|
28 | |
|
3.9 Unions, Intersections, and Differences |
|
|
29 | |
|
|
|
30 | |
| Chapter 4 Relational Database Management Systems |
|
31 | |
|
|
|
31 | |
|
|
|
32 | |
|
4.3 A Sampling of Differences |
|
|
32 | |
|
|
|
33 | |
|
|
|
35 | |
|
|
|
35 | |
| Chapter 5 Client and Web Applications |
|
37 | |
|
|
|
37 | |
|
5.2 Command Line Programs |
|
|
37 | |
|
5.3 Web-Based Applications |
|
|
38 | |
|
|
|
39 | |
|
5.5 SQL Interfaces in Various Languages |
|
|
41 | |
|
|
|
43 | |
|
|
|
44 | |
|
|
|
44 | |
|
|
|
45 | |
|
|
|
46 | |
| Chapter 6 Data Storage, Searching, and Manipulation |
|
47 | |
|
|
|
47 | |
|
6.2 General Schema Design Decisions |
|
|
47 | |
|
6.3 Sample Schema for Tracking Chemical Samples |
|
|
49 | |
|
6.4 Schemas for PubChem Data |
|
|
53 | |
|
|
|
54 | |
|
|
|
56 | |
|
|
|
58 | |
|
6.5 Data Constraints and Data Integrity |
|
|
60 | |
|
6.6 Developing Complex SQL |
|
|
63 | |
|
|
|
66 | |
|
|
|
67 | |
|
|
|
70 | |
| Chapter 7 Computer Representations of Molecular Structures |
|
71 | |
|
|
|
71 | |
|
7.2 SMILES Representation of Molecular Structure |
|
|
72 | |
|
7.3 Extensions to SQL for Chemical Structures |
|
|
72 | |
|
7.4 SMARTS Representation of Molecular Searches |
|
|
74 | |
|
7.5 SMILES and SMARTS Quirks |
|
|
76 | |
|
|
|
76 | |
|
|
|
77 | |
|
|
|
77 | |
|
|
|
80 | |
|
|
|
80 | |
|
|
|
81 | |
|
|
|
81 | |
|
7.5.8 InChI and Canonical SMILES |
|
|
82 | |
|
7.6 SMILES and Inorganic Structures |
|
|
82 | |
|
7.7 Other SMILES Extensions |
|
|
82 | |
|
7.8 Input and Output of Molecular Structures |
|
|
83 | |
|
7.9 Useful SQL Extensions |
|
|
85 | |
|
7.10 SMILES as an SQL Data Type |
|
|
86 | |
|
|
|
86 | |
|
|
|
87 | |
|
|
|
88 | |
|
|
|
88 | |
| Chapter 8 Molecular Fragments and Fingerprints |
|
91 | |
|
|
|
91 | |
|
|
|
91 | |
|
|
|
92 | |
|
8.2.2 MACCS Keys and Other Fragment Keys |
|
|
95 | |
|
|
|
95 | |
|
|
|
96 | |
|
8.5 Computing Fragment-Based Properties |
|
|
96 | |
|
|
|
98 | |
| Chapter 9 Reactions and Transformations |
|
99 | |
|
|
|
99 | |
|
|
|
99 | |
|
|
|
100 | |
|
9.3.1 Unimolecular Transformations |
|
|
101 | |
|
9.3.2 Multi-Component Transformations |
|
|
104 | |
|
9.4 Canonical Reaction SMILES |
|
|
106 | |
|
|
|
107 | |
| Chapter 10 PostgreSQL Extensions |
|
109 | |
|
|
|
109 | |
|
10.2 Composite Data Types |
|
|
109 | |
|
10.3 Composite Data Type for Experimental Values |
|
|
111 | |
|
10.4 Array Data Types for Two- and Three-Dimensional Coordinates |
|
|
115 | |
|
10.5 Functions in Other Languages |
|
|
117 | |
|
|
|
117 | |
|
10.5.2 Plperl, Plpython, Pltcl |
|
|
118 | |
|
10.5.3 Core Chemical Functions |
|
|
119 | |
|
10.5.4 C Language Functions |
|
|
120 | |
|
|
|
121 | |
|
|
|
121 | |
| Chapter 11 Three-Dimensional Molecular Structure Tables |
|
123 | |
|
|
|
123 | |
|
11.2 Using Tables Instead of Files |
|
|
123 | |
|
11.3 Molfile and Other Common File Formats |
|
|
124 | |
|
11.4 Processing SDF Files |
|
|
125 | |
|
11.5 Using Tables Instead of Files in Client Programs |
|
|
131 | |
|
11.6 File Import, Export, and Conversions |
|
|
132 | |
|
11.7 Functions Using Three-Dimensional Atomic Coordinates |
|
|
133 | |
|
|
|
135 | |
|
11.9 Other Representations of Three-Dimensional Molecular Structure |
|
|
136 | |
|
|
|
136 | |
| Chapter 12 More on Client and Web Interfaces to RDBMS |
|
137 | |
|
|
|
137 | |
|
12.2 Store All Possible Data in the RDBMS |
|
|
139 | |
|
12.3 Advanced SQL Techniques |
|
|
140 | |
|
12.3.1 Placeholders in SQL Statements |
|
|
141 | |
|
12.3.2 Bind Values in SQL Statements |
|
|
142 | |
|
|
|
143 | |
|
|
|
147 | |
|
12.5.1 Hierarchical Clustering |
|
|
147 | |
|
|
|
148 | |
|
|
|
153 | |
| Chapter 13 Applications |
|
155 | |
|
|
|
155 | |
|
13.2 Compound Registration |
|
|
155 | |
|
13.3 Experimental Chemical and Biological Data Integration |
|
|
162 | |
|
13.4 Data from External Sources |
|
|
164 | |
|
|
|
167 | |
|
|
|
168 | |
|
|
|
168 | |
|
|
|
169 | |
|
|
|
170 | |
|
|
|
170 | |
|
|
|
171 | |
|
|
|
171 | |
|
|
|
172 | |
| Appendix |
|
173 | |
|
|
|
173 | |
|
A.2 Symbols and Bonds from Simplified Molecular Input Line Entry System (SMILES) |
|
|
173 | |
|
|
|
175 | |
|
|
|
176 | |
|
|
|
176 | |
|
|
|
176 | |
|
|
|
176 | |
|
|
|
177 | |
|
|
|
177 | |
|
|
|
177 | |
|
|
|
177 | |
|
|
|
181 | |
|
A.5 Tables Used in Functions |
|
|
182 | |
|
|
|
183 | |
|
|
|
183 | |
|
|
|
183 | |
|
A.6 Core Function Implementation for PostgreSQL |
|
|
188 | |
|
|
|
188 | |
|
|
|
191 | |
|
|
|
197 | |
|
A.7 C Language PostgreSQL Functions |
|
|
203 | |
|
A.8 Database Utilities Dbutils |
|
|
205 | |
|
A.9 Loading Files into Simple Tables |
|
|
206 | |
|
|
|
207 | |
|
|
|
208 | |
|
|
|
210 | |
| Index |
|
211 | |