E-book: Data Mining Techniques in Grid Computing Environments [Wiley Online]

Edited by Werner Dubitzky (University of Ulster)
  • Format: 288 pages
  • Publication date: 14-Nov-2008
  • Publisher: John Wiley & Sons Inc
  • ISBN-10: 0470699906
  • ISBN-13: 9780470699904
  • Wiley Online
  • Price: 163,83 €*
  • * price that grants access for an unlimited number of simultaneous users for an unlimited period
Based on eleven international real-life case studies, with contributions from leading experts in the field, this book explores the need to grid-enable data mining applications and provides a comprehensive study of the technology, techniques and management skills necessary to create them. It serves at once as a design blueprint, a user guide, and a research agenda for current and future developments, and will appeal to a broad audience, from developers and users of data mining and grid technology to advanced undergraduate and postgraduate students interested in this field.
Preface xiii
List of Contributors xvii
Data mining meets grid computing: Time to dance?
1(16)
Alberto Sanchez
Jesus Montes
Werner Dubitzky
Julio J. Valdes
Maria S. Perez
Pedro de Miguel
Introduction
2(1)
Data mining
3(3)
Complex data mining problems
3(1)
Data mining challenges
4(2)
Grid computing
6(3)
Grid computing challenges
9(1)
Data mining grid - mining grid data
9(3)
Data mining grid: a grid facilitating large-scale data mining
9(2)
Mining grid data: analyzing grid systems with data mining techniques
11(1)
Conclusions
12(1)
Summary of Chapters in this Volume
13(4)
Data analysis services in the knowledge grid
17(20)
Eugenio Cesario
Antonio Congiusta
Domenico Talia
Paolo Trunfio
Introduction
17(1)
Approach
18(2)
Knowledge Grid services
20(9)
The Knowledge Grid architecture
21(3)
Implementation
24(5)
Data analysis services
29(2)
Design of Knowledge Grid applications
31(3)
The VEGA visual language
31(1)
UML application modelling
32(1)
Applications and experiments
33(1)
Conclusions
34(3)
GridMiner: An advanced support for e-science analytics
37(20)
Peter Brezany
Ivan Janciak
A. Min Tjoa
Introduction
37(2)
Rationale behind the design and development of GridMiner
39(1)
Use Case
40(1)
Knowledge discovery process and its support by the GridMiner
41(9)
Phases of knowledge discovery
42(3)
Workflow management
45(1)
Data management
46(1)
Data mining services and OLAP
47(2)
Security
49(1)
Graphical user interface
50(2)
Future developments
52(1)
High-level data mining model
52(1)
Data mining query language
52(1)
Distributed mining of data streams
52(1)
Conclusions
53(4)
ADaM services: Scientific data mining in the service-oriented architecture paradigm
57(14)
Rahul Ramachandran
Sara Graves
John Rushing
Ken Keyzer
Manil Maskey
Hong Lin
Helen Conover
Introduction
58(1)
ADaM system overview
58(2)
ADaM toolkit overview
60(1)
Mining in a service-oriented architecture
61(1)
Mining web services
62(4)
Implementation architecture
63(1)
Workflow example
64(1)
Implementation issues
64(2)
Mining grid services
66(3)
Architecture components
67(1)
Workflow example
68(1)
Summary
69(2)
Mining for misconfigured machines in grid systems
71(20)
Noam Palatin
Arie Leizarowitz
Assaf Schuster
Ran Wolff
Introduction
71(2)
Preliminaries and related work
73(2)
System misconfiguration detection
73(1)
Outlier detection
74(1)
Acquiring, pre-processing and storing data
75(2)
Data sources and acquisition
75(1)
Pre-processing
75(1)
Data organization
76(1)
Data analysis
77(3)
General approach
77(1)
Notation
78(1)
Algorithm
78(2)
Correctness and termination
80(1)
The GMS
80(2)
Evaluation
82(6)
Qualitative results
82(1)
Quantitative results
83(2)
Interoperability
85(3)
Conclusions and future work
88(3)
FAEHIM: Federated Analysis Environment for Heterogeneous Intelligent Mining
91(14)
Ali Shaikh Ali
Omer F. Rana
Introduction
91(2)
Requirements of a distributed knowledge discovery framework
93(1)
Knowledge discovery specific requirements
93(1)
Distributed framework specific requirements
94(1)
Workflow-based knowledge discovery
94(1)
Data mining toolkit
95(1)
Data mining service framework
96(3)
Distributed data mining services
99(1)
Data manipulation tools
100(1)
Availability
101(1)
Empirical experiments
101(3)
Evaluating the framework accuracy
102(1)
Evaluating the running time of the framework
103(1)
Conclusions
104(1)
Scalable and privacy preserving distributed data analysis over a service-oriented platform
105(14)
William K. Cheung
Introduction
105(1)
A service-oriented solution
106(1)
Background
107(2)
Types of distributed data analysis
107(1)
A brief review of distributed data analysis
108(1)
Data mining services and data analysis management systems
108(1)
Model-based scalable, privacy preserving, distributed data analysis
109(2)
Hierarchical local data abstractions
109(1)
Learning global models from local abstractions
110(1)
Modelling distributed data mining and workflow processes
111(1)
DDM processes in BPEL4WS
111(1)
Implementation details
112(1)
Lessons learned
112(2)
Performance of running distributed data analysis on BPEL
112(1)
Issues specific to service-oriented distributed data analysis
113(1)
Compatibility of Web services development tools
114(1)
Further research directions
114(2)
Optimizing BPEL4WS process execution
114(1)
Improved support of data analysis process management
115(1)
Improved support of data privacy preservation
115(1)
Conclusions
116(3)
Building and using analytical workflows in Discovery Net
119(22)
Moustafa Ghanem
Vasa Curin
Patrick Wendel
Yike Guo
Introduction
119(2)
Workflows on the grid
120(1)
Discovery Net system
121(5)
System overview
121(1)
Workflow representation in DPML
122(1)
Multiple data models
123(1)
Workflow-based services
123(1)
Multiple execution models
123(1)
Data flow pull model
124(1)
Streaming and batch transfer of data elements
124(1)
Control flow push model
125(1)
Embedding
125(1)
Architecture for Discovery Net
126(5)
Motivation for a new server architecture
126(1)
Management of hosting environments
127(1)
Activity management
127(1)
Collaborative workflow platform
127(1)
Architecture overview
127(2)
Activity service definition layer
129(1)
Activity services bus
130(1)
Collaboration and execution services
130(1)
Workflow Services Bus
130(1)
Prototyping and production clients
130(1)
Data management
131(2)
Example of a workflow study
133(3)
ADR studies
133(1)
Analysis overview
133(1)
Service for transforming event data into patient annotations
134(1)
Service for defining exclusions
134(1)
Service for defining exposures
135(1)
Service for building the classification model
135(1)
Validation service
135(1)
Summary
136(1)
Future directions
136(5)
Building workflows that traverse the bioinformatics data landscape
141(24)
Robert Stevens
Paul Fisher
Jun Zhao
Carole Goble
Andy Brass
Introduction
141(2)
The bioinformatics data landscape
143(1)
The bioinformatics experiment landscape
143(2)
Taverna for bioinformatics experiments
145(3)
Three-tiered enactment in Taverna
146(1)
The open-typing data models
147(1)
Building workflows in Taverna
148(2)
Designing a SCUFL workflow
149(1)
Workflow case study
150(9)
The bioinformatics task
152(1)
Current approaches and issues
153(1)
Constructing workflows
154(2)
Candidate genes involved in trypanosomiasis resistance
156(1)
Workflows and the systematic approach
157(2)
Discussion
159(6)
Specification of distributed data mining workflows with DataMiningGrid
165(14)
Dennis Wegener
Michael May
Introduction
165(2)
DataMiningGrid environment
167(2)
General architecture
167(1)
Grid environment
167(1)
Scalability
167(1)
Workflow environment
168(1)
Operations for workflow construction
169(2)
Chaining
169(1)
Looping
169(1)
Branching
170(1)
Shipping algorithms
170(1)
Shipping data
170(1)
Parameter variation
171(1)
Parallelization
171(1)
Extensibility
171(2)
Case studies
173(2)
Evaluation criteria and experimental methodology
173(1)
Partitioning data
173(2)
Classifier comparison scenario
175(1)
Parameter optimization
175(1)
Discussion and related work
175(1)
Open issues
176(1)
Conclusions
176(3)
Anteater: Service-oriented data mining
179(22)
Renato A. Ferreira
Dorgival O. Guedes
Wagner Meira Jr.
Introduction
179(2)
The architecture
181(2)
Runtime framework
183(6)
Labelled stream
185(1)
Global persistent storage
185(1)
Termination detection
186(1)
Application of the model
187(2)
Parallel algorithms for data mining
189(6)
Decision trees
189(4)
Clustering
193(2)
Visual metaphors
195(1)
Case studies
196(1)
Future developments
197(1)
Conclusions and future work
198(3)
DMGA: A generic brokering-based Data Mining Grid Architecture
201(20)
Alberto Sanchez
Maria S. Perez
Pierre Gueant
Jose M. Pena
Pilar Herrero
Introduction
201(1)
DMGA overview
202(2)
Horizontal composition
204(2)
Vertical composition
206(2)
The need for brokering
208(1)
Brokering-based data mining grid architecture
209(1)
Use cases: Apriori, ID3 and J4.8 algorithms
210(6)
Horizontal composition use case: Apriori
210(3)
Vertical composition use cases: ID3 and J4.8
213(3)
Related work
216(1)
Conclusions
217(4)
Grid-based data mining with the Environmental Scenario Search Engine (ESSE)
221(26)
Mikhail Zhizhin
Alexey Poyda
Dmitry Mishin
Dmitry Medvedev
Eric Kihn
Vassily Lyutsarev
Environmental data source: NCEP/NCAR reanalysis data set
222(1)
Fuzzy search engine
223(8)
Operators of fuzzy logic
224(2)
Fuzzy logic predicates
226(1)
Fuzzy states in time
227(2)
Relative importance of parameters
229(1)
Fuzzy search optimization
229(2)
Software architecture
231(6)
Database schema optimization
231(2)
Data grid layer
233(2)
ESSE data resource
235(1)
ESSE data processor
235(2)
Applications
237(6)
Global air temperature trends
238(1)
Statistics of extreme weather events
239(1)
Atmospheric fronts
239(4)
Conclusions
243(4)
Data pre-processing using OGSA-DAI
247(16)
Martin Swain
Neil P. Chue Hong
Introduction
247(1)
Data pre-processing for grid-enabled data mining
248(1)
Using OGSA-DAI to support data mining applications
248(7)
OGSA-DAI's activity framework
249(4)
OGSA-DAI workflows for data management and pre-processing
253(2)
Data pre-processing scenarios in data mining applications
255(3)
Calculating a data summary
255(1)
Discovering association rules in protein unfolding simulations
256(1)
Mining distributed medical databases
257(1)
State-of-the-art solutions for grid data management
258(1)
Discussion
259(1)
Open Issues
259(1)
Conclusions
260(3)
Index 263
Werner Dubitzky, PhD, is Chair of Bioinformatics at the Biomedical Sciences Research Institute in the Faculty of Life and Health Sciences at the University of Ulster. His research investigates systems biology, knowledge management in biology, grid computing, and data mining.

Krzysztof Kurowski, PhD, leads the Applications Department at Poznan Supercomputing and Networking Center in Poland. His research is focused on the modeling of advanced applications, scheduling, and resource management in networked environments.