Complete guidance for mastering the tools and techniques of the digital revolution
With the digital revolution opening up tremendous opportunities in many fields, there is a growing need for skilled professionals who can develop data-intensive systems and extract information and knowledge from them. This book frames for the first time a new systematic approach for tackling the challenges of data-intensive computing, providing decision makers and technical experts alike with practical tools for dealing with our exploding data collections.
Emphasizing data-intensive thinking and interdisciplinary collaboration, The Data Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business examines the essential components of knowledge discovery, surveys many of the current research efforts worldwide, and points to new areas for innovation. Complete with a wealth of examples and DISPEL-based methods demonstrating how to gain more from data in real-world systems, the book:
- Outlines the concepts and rationale for implementing data-intensive computing in organizations
- Covers problem-solving strategies for data analysis in a data-rich world from the ground up
- Introduces techniques for data-intensive engineering using the Data-Intensive Systems Process Engineering Language (DISPEL)
- Features in-depth case studies in customer relations, environmental hazards, seismology, and more
- Showcases successful applications in areas ranging from astronomy and the humanities to transport engineering
- Includes sample program snippets throughout the text as well as additional materials on a companion website
The Data Bonanza is a must-have guide for information strategists, data analysts, and engineers in business, research, and government, and for anyone wishing to be on the cutting edge of data mining, machine learning, databases, distributed systems, or large-scale computing.
CONTRIBUTORS
FOREWORD
PREFACE
THE EDITORS
PART I: STRATEGIES FOR SUCCESS IN THE DIGITAL-DATA REVOLUTION
1. The Digital-Data Challenge (Malcolm Atkinson and Mark Parsons)
1.1 The Digital Revolution
1.2 Changing How We Think and Behave
1.3 Moving Adroitly in this Fast-Changing Field
1.4 Digital-Data Challenges Exist Everywhere
1.5 Changing How We Work
1.6 Divide and Conquer Offers the Solution
1.7 Engineering Data-to-Knowledge Highways
References
2. The Digital-Data Revolution (Malcolm Atkinson)
2.1 Data, Information, and Knowledge
2.2 Increasing Volumes and Diversity of Data
2.3 Changing the Ways We Work with Data
References
3. The Data-Intensive Survival Guide (Malcolm Atkinson)
3.1 Introduction: Challenges and Strategy
3.2 Three Categories of Expert
3.3 The Data-Intensive Architecture
3.4 An Operational Data-Intensive System
3.5 Introducing DISPEL
3.6 A Simple DISPEL Example
3.7 Supporting Data-Intensive Experts
3.8 DISPEL in the Context of Contemporary Systems
3.9 Datascopes
3.10 Ramps for Incremental Engagement
3.11 Readers' Guide to the Rest of This Book
References
4. Data-Intensive Thinking with DISPEL (Malcolm Atkinson)
4.1 Processing Elements
4.2 Connections
4.3 Data Streams and Structure
4.4 Functions
4.5 The Three-Level Type System
4.6 Registry, Libraries, and Descriptions
4.7 Achieving Data-Intensive Performance
4.8 Reliability and Control
4.9 The Data-to-Knowledge Highway
References
PART II: DATA-INTENSIVE KNOWLEDGE DISCOVERY
5. Data-Intensive Analysis (Oscar Corcho and Jano van Hemert)
5.1 Knowledge Discovery in Telco Inc.
5.2 Understanding Customers to Prevent Churn
5.3 Preventing Churn Across Multiple Companies
5.4 Understanding Customers by Combining Heterogeneous Public and Private Data
5.5 Conclusions
References
6. Problem Solving in Data-Intensive Knowledge Discovery (Oscar Corcho and Jano van Hemert)
6.1 The Conventional Life Cycle of Knowledge Discovery
6.2 Knowledge Discovery Over Heterogeneous Data Sources
6.3 Knowledge Discovery from Private and Public, Structured and Nonstructured Data
6.4 Conclusions
References
7. Data-Intensive Components and Usage Patterns (Oscar Corcho)
7.1 Data Source Access and Transformation Components
7.2 Data Integration Components
7.3 Data Preparation and Processing Components
7.4 Data-Mining Components
7.5 Visualization and Knowledge Delivery Components
References
8. Sharing and Reuse in Knowledge Discovery (Oscar Corcho)
8.1 Strategies for Sharing and Reuse
8.2 Data Analysis Ontologies for Data Analysis Experts
8.3 Generic Ontologies for Metadata Generation
8.4 Domain Ontologies for Domain Experts
8.5 Conclusions
References
PART III: DATA-INTENSIVE ENGINEERING
9. Platforms for Data-Intensive Analysis (David Snelling)
9.1 The Hourglass Reprise
9.2 The Motivation for a Platform
9.3 Realization
References
10. Definition of the DISPEL Language (Paul Martin and Gagarine Yaikhom)
10.1 A Simple Example
10.2 Processing Elements
10.3 Data Streams
10.4 Type System
10.5 Registration
10.6 Packaging
10.7 Workflow Submission
10.8 Examples of DISPEL
10.9 Summary
References
11. DISPEL Development (Adrian Mouat and David Snelling)
11.1 The Development Landscape
11.2 Data-Intensive Workbenches
11.3 Data-Intensive Component Libraries
11.4 Summary
References
12. DISPEL Enactment (Chee Sun Liew, Amrey Krause, and David Snelling)
12.1 Overview of DISPEL Enactment
12.2 DISPEL Language Processing
12.3 DISPEL Optimization
12.4 DISPEL Deployment
12.5 DISPEL Execution and Control
References
PART IV: DATA-INTENSIVE APPLICATION EXPERIENCE
13. The Application Foundations of DISPEL (Rob Baxter)
13.1 Characteristics of Data-Intensive Applications
13.2 Evaluating Application Performance
13.3 Reviewing the Data-Intensive Strategy
14. Analytical Platform for Customer Relationship Management (Maciej Jarka and Mark Parsons)
14.1 Data Analysis in the Telecoms Business
14.2 Analytical Customer Relationship Management
14.3 Scenario 1: Churn Prediction
14.4 Scenario 2: Cross Selling
14.5 Exploiting the Models and Rules
14.6 Summary: Lessons Learned
References
15. Environmental Risk Management (Ladislav Hluchý, Ondrej Habala, Viet Tran, and Branislav Šimo)
15.1 Environmental Modeling
15.2 Cascading Simulation Models
15.3 Environmental Data Sources and Their Management
15.4 Scenario 1: ORAVA
15.5 Scenario 2: RADAR
15.6 Scenario 3: SVP
15.7 New Technologies for Environmental Data Mining
15.8 Summary: Lessons Learned
References
16. Analyzing Gene Expression Imaging Data in Developmental Biology (Liangxiu Han, Jano van Hemert, Ian Overton, Paolo Besana, and Richard Baldock)
16.1 Understanding Biological Function
16.2 Gene Image Annotation
16.3 Automated Annotation of Gene Expression Images
16.4 Exploitation and Future Work
16.5 Summary
References
17. Data-Intensive Seismology: Research Horizons (Michelle Galea, Andreas Rietbrock, Alessandro Spinuso, and Luca Trani)
17.1 Introduction
17.2 Seismic Ambient Noise Processing
17.3 Solution Implementation
17.4 Evaluation
17.5 Further Work
17.6 Conclusions
References
PART V: DATA-INTENSIVE BEACONS OF SUCCESS
18. Data-Intensive Methods in Astronomy (Thomas D. Kitching, Robert G. Mann, Laura E. Valkonen, Mark S. Holliman, Alastair Hume, and Keith T. Noddle)
18.1 Introduction
18.2 The Virtual Observatory
18.3 Data-Intensive Photometric Classification of Quasars
18.4 Probing the Dark Universe with Weak Gravitational Lensing
18.5 Future Research Issues
18.6 Conclusions
References
19. The World at One's Fingertips: Interactive Interpretation of Environmental Data (Jon Blower, Keith Haines, and Alastair Gemmell)
19.1 Introduction
19.2 The Current State of the Art
19.3 The Technical Landscape
19.4 Interactive Visualization
19.5 From Visualization to Intercomparison
19.6 Future Development: The Environmental Cloud
19.7 Conclusions
References
20. Data-Driven Research in the Humanities--the DARIAH Research Infrastructure (Andreas Aschenbrenner, Tobias Blanke, Christiane Fritze, and Wolfgang Pempe)
20.1 Introduction
20.2 The Tradition of Digital Humanities
20.3 Humanities Research Data
20.4 Use Case
20.5 Conclusion and Future Development
References
21. Analysis of Large and Complex Engineering and Transport Data (Jim Austin)
21.1 Introduction
21.2 Applications and Challenges
21.3 The Methods Used
21.4 Future Developments
21.5 Conclusions
References
22. Estimating Species Distributions--Across Space, Through Time, and with Features of the Environment (Steve Kelling, Daniel Fink, Wesley Hochachka, Ken Rosenberg, Robert Cook, Theodoros Damoulas, Claudio Silva, and William Michener)
22.1 Introduction
22.2 Data Discovery, Access, and Synthesis
22.3 Model Development
22.4 Managing Computational Requirements
22.5 Exploring and Visualizing Model Results
22.6 Analysis Results
22.7 Conclusion
References
PART VI: THE DATA-INTENSIVE FUTURE
23. Data-Intensive Trends (Malcolm Atkinson and Paolo Besana)
23.1 Reprise
23.2 Data-Intensive Applications
References
24. Data-Rich Futures (Malcolm Atkinson)
24.1 Future Data Infrastructure
24.2 Future Data Economy
24.3 Future Data Society and Professionalism
References
Appendix A: Glossary (Michelle Galea and Malcolm Atkinson)
Appendix B: DISPEL Reference Manual (Paul Martin)
Appendix C: Component Definitions (Malcolm Atkinson and Chee Sun Liew)
INDEX
MALCOLM ATKINSON, PhD, is Professor of e-Science in the School of Informatics at the University of Edinburgh in Scotland. He is also Data-Intensive Research Group leader, Director of the e-Science Institute, IT architect for the ADMIRE and VERCE EU projects, and UK e-Science Envoy. Professor Atkinson has led research projects for several decades and has served on many advisory bodies.