Preface |
|
xv | |
Nota Bene |
|
xxi | |
About the Author |
|
xxiii | |
|
Part I Fundamental Algorithms and Methods of Medical Informatics |
|
|
|
Chapter 1 Parsing and Transforming Text Files |
|
|
3 | (18) |
|
1.1 Peeking into Large Files |
|
|
3 | (2) |
|
|
3 | (2) |
|
|
5 | (1) |
|
1.2 Paging through Large Text Files |
|
|
5 | (2) |
|
|
5 | (2) |
|
|
7 | (1) |
|
1.3 Extracting Lines that Match a Regular Expression |
|
|
7 | (3) |
|
|
8 | (2) |
|
|
10 | (1) |
|
1.4 Changing Every File in a Subdirectory |
|
|
10 | (2) |
|
|
10 | (1) |
|
|
11 | (1) |
|
1.5 Counting the Words in a File |
|
|
12 | (2) |
|
|
12 | (2) |
|
|
14 | (1) |
|
1.6 Making a Word List with Occurrence Tally |
|
|
14 | (2) |
|
|
14 | (2) |
|
|
16 | (1) |
|
1.7 Using Printf Formatting Style |
|
|
16 | (5) |
|
|
17 | (1) |
|
|
18 | (3) |
|
Chapter 2 Utility Scripts |
|
|
21 | (16) |
|
|
21 | (1) |
|
|
21 | (1) |
|
|
22 | (1) |
|
2.2 Converting Non-ASCII to Base64 ASCII |
|
|
22 | (2) |
|
|
23 | (1) |
|
|
24 | (1) |
|
2.3 Creating a Universally Unique Identifier |
|
|
24 | (1) |
|
|
24 | (1) |
|
|
25 | (1) |
|
2.4 Splitting Text into Sentences |
|
|
25 | (2) |
|
|
26 | (1) |
|
|
26 | (1) |
|
2.5 One-Way Hash on a Name |
|
|
27 | (3) |
|
|
28 | (2) |
|
|
30 | (1) |
|
2.6 One-Way Hash on a File |
|
|
30 | (1) |
|
|
30 | (1) |
|
|
31 | (1) |
|
2.7 A Prime Number Generator |
|
|
31 | (6) |
|
|
32 | (2) |
|
|
34 | (3) |
|
Chapter 3 Viewing and Modifying Images |
|
|
37 | (16) |
|
|
37 | (3) |
|
|
38 | (1) |
|
|
39 | (1) |
|
3.2 Converting between Image Formats |
|
|
40 | (2) |
|
|
40 | (1) |
|
|
41 | (1) |
|
|
42 | (2) |
|
|
42 | (1) |
|
|
43 | (1) |
|
3.4 Drawing a Graph from List Data |
|
|
44 | (2) |
|
|
44 | (2) |
|
|
46 | (1) |
|
3.5 Drawing an Image Mashup |
|
|
46 | (7) |
|
|
46 | (4) |
|
|
50 | (3) |
|
|
53 | (28) |
|
4.1 Zipf Distribution of a Text File |
|
|
53 | (4) |
|
|
54 | (2) |
|
|
56 | (1) |
|
4.2 Preparing a Concordance |
|
|
57 | (3) |
|
|
57 | (2) |
|
|
59 | (1) |
|
|
60 | (3) |
|
|
61 | (2) |
|
|
63 | (1) |
|
|
63 | (6) |
|
|
65 | (3) |
|
|
68 | (1) |
|
4.5 Comparing Texts Using Similarity Scores |
|
|
69 | (12) |
|
|
69 | (7) |
|
|
76 | (5) |
|
Part II Medical Data Resources |
|
|
|
Chapter 5 The National Library of Medicine's Medical Subject Headings (MeSH) |
|
|
81 | (18) |
|
5.1 Determining the Hierarchical Lineage for MeSH Terms |
|
|
83 | (5) |
|
|
83 | (3) |
|
|
86 | (2) |
|
5.2 Creating a MeSH Database |
|
|
88 | (2) |
|
|
88 | (2) |
|
|
90 | (1) |
|
5.3 Reading the MeSH Database |
|
|
90 | (2) |
|
|
91 | (1) |
|
|
92 | (1) |
|
5.4 Creating an SQLite Database for MeSH |
|
|
92 | (4) |
|
|
93 | (3) |
|
|
96 | (1) |
|
5.5 Reading the SQLite MeSH Database |
|
|
96 | (3) |
|
|
96 | (1) |
|
|
97 | (2) |
|
Chapter 6 The International Classification of Diseases |
|
|
99 | (8) |
|
6.1 Creating the ICD Dictionary |
|
|
99 | (3) |
|
|
100 | (1) |
|
|
101 | (1) |
|
6.2 Building the ICD-O (Oncology) Dictionary |
|
|
102 | (5) |
|
|
103 | (1) |
|
|
104 | (3) |
|
Chapter 7 SEER: The Cancer Surveillance, Epidemiology, and End Results Program |
|
|
107 | (16) |
|
7.1 Parsing the SEER Data Files |
|
|
107 | (3) |
|
|
107 | (2) |
|
|
109 | (1) |
|
7.2 Finding the Occurrences of All Cancers in the SEER Data Files |
|
|
110 | (5) |
|
|
111 | (3) |
|
|
114 | (1) |
|
7.3 Finding the Age Distributions of the Cancers in the SEER Data Files |
|
|
115 | (8) |
|
|
115 | (4) |
|
|
119 | (4) |
|
Chapter 8 OMIM: The Online Mendelian Inheritance in Man |
|
|
123 | (8) |
|
8.1 Collecting the OMIM Entry Terms |
|
|
124 | (2) |
|
|
124 | (1) |
|
|
125 | (1) |
|
8.2 Finding Inherited Cancer Conditions |
|
|
126 | (5) |
|
|
126 | (2) |
|
|
128 | (3) |
|
|
131 | (12) |
|
9.1 Building a Large Text Corpus of Biomedical Information |
|
|
131 | (3) |
|
|
132 | (2) |
|
|
134 | (1) |
|
9.2 Creating a List of Doublets from a PubMed Corpus |
|
|
134 | (5) |
|
|
136 | (2) |
|
|
138 | (1) |
|
9.3 Downloading Gene Synonyms from PubMed |
|
|
139 | (1) |
|
9.4 Downloading Protein Synonyms from PubMed |
|
|
140 | (3) |
|
|
143 | (14) |
|
10.1 Finding a Taxonomic Hierarchy |
|
|
143 | (5) |
|
|
144 | (3) |
|
|
147 | (1) |
|
10.2 Finding the Restricted Classes of Human Infectious Pathogens |
|
|
148 | (9) |
|
|
148 | (5) |
|
|
153 | (4) |
|
Chapter 11 Developmental Lineage Classification and Taxonomy of Neoplasms |
|
|
157 | (20) |
|
11.1 Building the Doublet Hash |
|
|
158 | (3) |
|
|
158 | (3) |
|
|
161 | (1) |
|
11.2 Scanning the Literature for Candidate Terms |
|
|
161 | (6) |
|
|
161 | (5) |
|
|
166 | (1) |
|
11.3 Adding Terms to the Neoplasm Classification |
|
|
167 | (4) |
|
|
168 | (2) |
|
|
170 | (1) |
|
11.4 Determining the Lineage of Every Neoplasm Concept |
|
|
171 | (6) |
|
|
172 | (3) |
|
|
175 | (2) |
|
Chapter 12 U.S. Census Files |
|
|
177 | (16) |
|
12.1 Total Population of the United States |
|
|
177 | (5) |
|
|
177 | (4) |
|
|
181 | (1) |
|
12.2 Stratified Distribution for the U.S. Census |
|
|
182 | (3) |
|
|
182 | (2) |
|
|
184 | (1) |
|
|
185 | (8) |
|
|
186 | (3) |
|
|
189 | (4) |
|
Chapter 13 Centers for Disease Control and Prevention Mortality Files |
|
|
193 | (16) |
|
13.1 Death Certificate Data |
|
|
193 | (3) |
|
13.2 Obtaining the CDC Data Files |
|
|
196 | (1) |
|
13.3 How Death Certificates Are Represented in Data Records |
|
|
197 | (3) |
|
13.4 Ranking, by Number of Occurrences, Every Condition in the CDC Mortality Files |
|
|
200 | (9) |
|
|
200 | (4) |
|
|
204 | (5) |
|
Part III Primary Tasks of Medical Informatics |
|
|
|
|
209 | (10) |
|
14.1 A Neoplasm Autocoder |
|
|
209 | (7) |
|
|
210 | (5) |
|
|
215 | (1) |
|
|
216 | (3) |
|
Chapter 15 Text Scrubber for Deidentifying Confidential Text |
|
|
219 | (8) |
|
|
220 | (2) |
|
|
222 | (5) |
|
Chapter 16 Web Pages and CGI Scripts |
|
|
227 | (10) |
|
|
227 | (3) |
|
|
227 | (2) |
|
|
229 | (1) |
|
16.2 CGI Script for Searching the Neoplasm Classification |
|
|
230 | (7) |
|
|
231 | (4) |
|
|
235 | (2) |
|
Chapter 17 Image Annotation |
|
|
237 | (12) |
|
17.1 Inserting a Header Comment |
|
|
238 | (2) |
|
|
238 | (2) |
|
|
240 | (1) |
|
17.2 Extracting the Header Comment in a JPEG Image File |
|
|
240 | (2) |
|
|
240 | (1) |
|
|
241 | (1) |
|
17.3 Inserting IPTC Annotations |
|
|
242 | (1) |
|
17.4 Extracting Comment, EXIF, and IPTC Annotations |
|
|
242 | (1) |
|
|
242 | (1) |
|
|
242 | (1) |
|
|
243 | (1) |
|
17.6 Finding DICOM Images |
|
|
244 | (1) |
|
17.7 DICOM-to-JPEG Conversion |
|
|
245 | (4) |
|
|
245 | (1) |
|
|
246 | (3) |
|
Chapter 18 Describing Data with Data, Using XML |
|
|
249 | (20) |
|
|
250 | (4) |
|
|
250 | (2) |
|
|
252 | (1) |
|
18.1.3 Resource Description Framework (RDF) |
|
|
252 | (2) |
|
18.2 Dublin Core Metadata |
|
|
254 | (1) |
|
18.3 Insert an RDF Document into an Image File |
|
|
254 | (2) |
|
|
255 | (1) |
|
|
256 | (1) |
|
18.4 Insert an Image File into an RDF Document |
|
|
256 | (3) |
|
|
257 | (1) |
|
|
258 | (1) |
|
|
259 | (1) |
|
18.6 Visualizing an RDF Schema with GraphViz |
|
|
260 | (2) |
|
|
262 | (1) |
|
18.8 Converting a Data Structure to GraphViz |
|
|
263 | (6) |
|
|
263 | (2) |
|
|
265 | (4) |
|
Part IV Medical Discovery |
|
|
|
Chapter 19 Case Study: Emphysema Rates |
|
|
269 | (6) |
|
|
270 | (3) |
|
|
273 | (2) |
|
Chapter 20 Case Study: Cancer Occurrence Rates |
|
|
275 | (10) |
|
|
275 | (6) |
|
|
281 | (4) |
|
Chapter 21 Case Study: Germ Cell Tumor Rates Across Ethnicities |
|
|
285 | (10) |
|
|
286 | (7) |
|
|
293 | (2) |
|
Chapter 22 Case Study: Ranking the Death-Certifying Process, by State |
|
|
295 | (6) |
|
|
295 | (3) |
|
|
298 | (3) |
|
Chapter 23 Case Study: Data Mashups for Epidemics |
|
|
301 | (14) |
|
23.1 Tally of Coccidioidomycosis Cases by State |
|
|
302 | (5) |
|
|
303 | (3) |
|
|
306 | (1) |
|
23.2 Creating the Map Mashup |
|
|
307 | (8) |
|
|
307 | (4) |
|
|
311 | (4) |
|
Chapter 24 Case Study: Sickle Cell Rates |
|
|
315 | (6) |
|
|
315 | (3) |
|
|
318 | (3) |
|
Chapter 25 Case Study: Site-Specific Tumor Biology |
|
|
321 | (14) |
|
25.1 Anatomic Origins of Mesotheliomas |
|
|
321 | (2) |
|
25.2 Mesothelioma Records in the SEER Data Sets |
|
|
323 | (6) |
|
|
324 | (5) |
|
|
329 | (1) |
|
25.3 Graphic Representation |
|
|
329 | (6) |
|
|
330 | (3) |
|
|
333 | (2) |
|
Chapter 26 Case Study: Bimodal Tumors |
|
|
335 | (16) |
|
|
337 | (7) |
|
|
344 | (7) |
|
Chapter 27 Case Study: The Age of Occurrence of Precancers |
|
|
351 | (10) |
|
|
351 | (6) |
|
|
357 | (4) |
|
Epilogue for Healthcare Professionals and Medical Scientists |
|
|
361 | (6) |
|
Learn One or More Open Source Programming Languages |
|
|
361 | (1) |
|
Don't Agonize Over Which Language You Should Choose |
|
|
362 | (1) |
|
|
362 | (1) |
|
Unless You Are a Professional Programmer, Relax and Enjoy Being a Newbie |
|
|
363 | (1) |
|
Do Not Delegate Simple Programming Tasks to Others |
|
|
363 | (1) |
|
Break Complex Tasks into Simple Methods and Algorithms |
|
|
364 | (1) |
|
|
364 | (1) |
|
Concentrate on the Questions, Not the Answers |
|
|
365 | (2) |
|
|
367 | (10) |
|
|
367 | (1) |
|
|
367 | (1) |
|
|
367 | (1) |
|
|
368 | (1) |
|
|
369 | (1) |
|
How to Acquire the Public Data Files Used in This Book |
|
|
370 | (6) |
|
Other Publicly Available Files, Data Sets, and Utilities |
|
|
376 | (1) |
Index |
|
377 | |