Klienditugi: 7440010 (E-R 10-18)

High-Performance Parallel Database Processing and Grid Databases [Kõva köide]

4.50/5 (4 hinnangut Goodreads-ist)

David Taniar (Monash University), Sushant Goel, Clement H. C. Leung (Department of Computer Science, Victoria University), Wenny Rahayu ( La Trobe University)

Formaat: Hardback, 576 pages, kõrgus x laius x paksus: 239x158x31 mm, kaal: 907 g, Drawings: 143 B&W, 0 Color
Sari: Wiley Series on Parallel and Distributed Computing
Ilmumisaeg: 31-Oct-2008
Kirjastus: John Wiley & Sons Inc
ISBN-10: 0470107626
ISBN-13: 9780470107621

Teised raamatud teemal:

Kõva köide
Hind: 213,33 €
Raamatu kohalejõudmiseks kirjastusest kulub orienteeruvalt 2-4 nädalat
Kogus:
- - 1
  - 2
  - 3
  - 4
  - 5
  - 6
  - 7
  - 8
  - 9
  - 10
Lisa ostukorvi
Tasuta tarne
Tellimisaeg 2-4 nädalat
Lisa soovinimekirja
Raamatukogudele

Formaat: Hardback, 576 pages, kõrgus x laius x paksus: 239x158x31 mm, kaal: 907 g, Drawings: 143 B&W, 0 Color
Sari: Wiley Series on Parallel and Distributed Computing
Ilmumisaeg: 31-Oct-2008
Kirjastus: John Wiley & Sons Inc
ISBN-10: 0470107626
ISBN-13: 9780470107621

Teised raamatud teemal:

Püsilink: https://www.kriso.ee/db/9780470107621.html

Märksõnad:

Grid databases and parallel query processing are becoming important components in the development of database management systems, and this textbook covers the fundamentals of parallelism in data-intensive applications. Taniar (information technology, Monash U., Australia), Leung (computer science, Victoria U., Australia), Rahayu (database design, La Trobe U., Australia) and Goel (computer systems engineering, RMIT U., Australia) have created this book for researchers and practitioners working in parallel databases who need further training in grid concepts, algorithms, analytical models and transactions. Additional material is provided on parallel data mining, clustering, OLAP and replica management as well. Annotation ©2009 Book News, Inc., Portland, OR (booknews.com)

The latest techniques and principles of parallel and grid database processing

The growth in grid databases, coupled with the utility of parallel query processing, presents an important opportunity to understand and utilize high-performance parallel database processing within a major database management system (DBMS). This important new book provides readers with a fundamental understanding of parallelism in data-intensive applications, and demonstrates how to develop faster capabilities to support them. It presents a balanced treatment of the theoretical and practical aspects of high-performance databases to demonstrate how parallel query is executed in a DBMS, including concepts, algorithms, analytical models, and grid transactions.

High-Performance Parallel Database Processing and Grid Databases serves as a valuable resource for researchers working in parallel databases and for practitioners interested in building a high-performance database. It is also a much-needed, self-contained textbook for database courses at the advanced undergraduate and graduate levels.

Preface

Part I Introduction

1. Introduction

1.1. A Brief Overview: Parallel Databases and Grid Databases

1.2. Parallel Query Processing: Motivations

1.3. Parallel Query Processing: Objectives

1.3.1. Speed Up

1.3.2. Scale Up

1.3.3. Parallel Obstacles

1.4. Forms of Parallelism

1.4.1. Interquery Parallelism

1.4.2. Intraquery Parallelism

1.4.3. Intraoperation Parallelism

1.4.4. Interoperation Parallelism

1.4.5. Mixed Parallelism—A More Practical Solution

1.5. Parallel Database Architectures

1.5.1. Shared-Memory and Shared-Disk Architectures

1.5.2. Shared-Nothing Architecture

1.5.3. Shared-Something Architecture

1.5.4. Interconnection Networks

1.6. Grid Database Architecture

1.7. Structure of this Book

1.8. Summary

1.9. Bibliographical Notes

1.10. Exercises

2. Analytical Models

2.1. Cost Models

2.2. Cost Notations

2.2.1. Data Parameters

2.2.2. Systems Parameters

2.2.3. Query Parameters

2.2.4. Time Unit Costs

2.2.5. Communication Costs

2.3. Skew Model

2.4. Basic Operations in Parallel Databases

2.4.1. Disk Operations

2.4.2. Main Memory Operations

2.4.3. Data Computation and Data Distribution

2.5. Summary

2.6. Bibliographical Notes

2.7. Exercises

Part II Basic Query Parallelism

3. Parallel Search

3.1. Search Queries

3.1.1. Exact-Match Search

3.1.2. Range Search Query

3.1.3. Multiattribute Search Query

3.2. Data Partitioning

3.2.1. Basic Data Partitioning

3.2.2. Complex Data Partitioning

3.3. Search Algorithms

3.3.1. Serial Search Algorithms

3.3.2. Parallel Search Algorithms

3.4. Summary

3.5. Bibliographical Notes

3.6. Exercises

4. Parallel Sort and GroupBy

4.1. Sorting, Duplicate Removal, and Aggregate Queries

4.1.1. Sorting and Duplicate Removal

4.1.2. Scalar Aggregate

4.1.3. GroupBy

4.2. Serial External Sorting Method

4.3. Algorithms for Parallel External Sort

4.3.1. Parallel Merge-All Sort

4.3.2. Parallel Binary-Merge Sort

4.3.3. Parallel Redistribution Binary-Merge Sort

4.3.4. Parallel Redistribution Merge-All Sort

4.3.5. Parallel Partitioned Sort

4.4. Parallel Algorithms for GroupBy Queries

4.4.1. Traditional Methods (Merge-All and Hierarchical Merging)

4.4.2. Two-Phase Method

4.4.3. Redistribution Method

4.5. Cost Models for Parallel Sort

4.5.1. Cost Models for Serial External Merge-Sort

4.5.2. Cost Models for Parallel Merge-All Sort

4.5.3. Cost Models for Parallel Binary-Merge Sort

100

4.5.4. Cost Models for Parallel Redistribution Binary-Merge Sort

101

4.5.5. Cost Models for Parallel Redistribution Merge-All Sort

102

4.5.6. Cost Models for Parallel Partitioned Sort

103

4.6. Cost Models for Parallel GroupBy

104

4.6.1. Cost Models for Parallel Two-Phase Method

104

4.6.2. Cost Models for Parallel Redistribution Method

107

4.7. Summary

109

4.8. Bibliographical Notes

110

4.9. Exercises

110

5. Parallel Join

112

5.1. Join Operations

112

5.2. Serial Join Algorithms

114

5.2.1. Nested-Loop Join Algorithm

114

5.2.2. Sort-Merge Join Algorithm

116

5.2.3. Hash-Based Join Algorithm

117

5.2.4. Comparison

120

5.3. Parallel Join Algorithms

120

5.3.1. Divide and Broadcast-Based Parallel Join Algorithms

121

5.3.2. Disjoint Partitioning-Based Parallel Join Algorithms

124

5.4. Cost Models

128

5.4.1. Cost Models for Divide and Broadcast

128

5.4.2. Cost Models for Disjoint Partitioning

129

5.4.3. Cost Models for Local Join

130

5.5. Parallel Join Optimization

132

5.5.1. Optimizing Main Memory

132

5.5.2. Load Balancing

133

5.6. Summary

134

5.7. Bibliographical Notes

135

5.8. Exercises

136

Part III Advanced Parallel Query Processing

6. Parallel GroupBy-Join

141

6.1. Groupby-Join Queries

141

6.1.1. Groupby Before Join

142

6.1.2. Groupby After Join

142

6.2. Parallel Algorithms for Groupby-Before-Join Query Processing

143

6.2.1. Early Distribution Scheme

143

6.2.2. Early GroupBy with Partitioning Scheme

145

6.2.3. Early GroupBy with Replication Scheme

146

6.3. Parallel Algorithms for Groupby-After-Join Query Processing

148

6.3.1. Join Partitioning Scheme

148

6.3.2. GroupBy Partitioning Scheme

150

6.4. Cost Model Notations

151

6.5. Cost Model for Groupby-Before-Join Query Processing

153

6.5.1. Cost Models for the Early Distribution Scheme

153

6.5.2. Cost Models for the Early GroupBy with Partitioning Scheme

156

6.5.3. Cost Models for the Early GroupBy with Replication Scheme

158

6.6. Cost Model for "Groupby-After-Join" Query Processing

159

6.6.1. Cost Models for the Join Partitioning Scheme

159

6.6.2. Cost Models for the GroupBy Partitioning Scheme

161

6.7. Summary

163

6.8. Bibliographical Notes

164

6.9. Exercises

164

7. Parallel Indexing

167

7.1. Parallel Indexing–an Internal Perspective on Parallel Indexing Structures

168

7.2. Parallel Indexing Structures

169

7.2.1. Nonreplicated Indexing (NRI) Structures

169

7.2.2. Partially Replicated Indexing (PRI) Structures

171

7.2.3. Fully Replicated Indexing (FRI) Structures

178

7.3. Index Maintenance

180

7.3.1. Maintaining a Parallel Nonreplicated Index

182

7.3.2. Maintaining a Parallel Partially Replicated Index

182

7.3.3. Maintaining a Parallel Fully Replicated Index

188

7.3.4. Complexity Degree of Index Maintenance

188

7.4. Index Storage Analysis

188

7.4.1. Storage Cost Models for Uniprocessors

189

7.4.2. Storage Cost Models for Parallel Processors

191

7.5. Parallel Processing of Search Queries using Index

192

7.5.1. Parallel One-Index Search Query Processing

192

7.5.2. Parallel Multi-Index Search Query Processing

195

7.6. Parallel Index Join Algorithms

200

7.6.1. Parallel One-Index Join

200

7.6.2. Parallel Two-Index Join

203

7.7. Comparative Analysis

207

7.7.1. Comparative Analysis of Parallel Search Index

207

7.7.2. Comparative Analysis of Parallel Index Join

213

7.8. Summary

216

7.9. Bibliographical Notes

217

7.10. Exercises

217

8. Parallel Universal Qualification—Collection Join Queries

219

8.1. Universal Quantification and Collection Join

220

8.2. Collection Types and Collection Join Queries

222

8.2.1. Collection-Equi Join Queries

222

8.2.2. Collection–Intersect Join Queries

223

8.2.3. Subcollection Join Queries

224

8.3. Parallel Algorithms for Collection Join Queries

225

8.4. Parallel Collection-Equi Join Algorithms

225

8.4.1. Disjoint Data Partitioning

226

8.4.2. Parallel Double Sort-Merge Collection-Equi Join Algorithm

227

8.4.3. Parallel Sort-Hash Collection-Equi Join Algorithm

228

8.4.4. Parallel Hash Collection-Equi Join Algorithm

232

8.5. Parallel Collection-Intersect Join Algorithms

233

8.5.1. Non-Disjoint Data Partitioning

234

8.5.2. Parallel Sort-Merge Nested-Loop Collection-Intersect Join Algorithm

244

8.5.3. Parallel Sort-Hash Collection-Intersect Join Algorithm

245

8.5.4. Parallel Hash Collection-Intersect Join Algorithm

246

8.6. Parallel Subcollection Join Algorithms

246

8.6.1. Data Partitioning

247

8.6.2. Parallel Sort-Merge Nested-Loop Subcollection Join Algorithm

248

8.6.3. Parallel Sort-Hash Subcollection Join Algorithm

249

8.6.4. Parallel Hash Subcollection Join Algorithm

251

8.7. Summary

252

8.8. Bibliographical Notes

252

8.9. Exercises

254

9. Parallel Query Scheduling and Optimization

256

9.1. Query Execution Plan

257

9.2. Subqueries Execution Scheduling Strategies

259

9.2.1. Serial Execution Among Subqueries

259

9.2.2. Parallel Execution Among Subqueries

261

9.3. Serial vs. Parallel Execution Scheduling

264

9.3.1. Nonskewed Subqueries

264

9.3.2. Skewed Subqueries

265

9.3.3. Skewed and Nonskewed Subqueries

267

9.4. Scheduling Rules

269

9.5. Cluster Query Processing Model

270

9.5.1. Overview of Dynamic Query Processing

271

9.5.2. A Cluster Query Processing Architecture

272

9.5.3. Load Information Exchange

273

9.6. Dynamic Cluster Query Optimization

275

9.6.1. Correction

276

9.6.2. Migration

280

9.6.3. Partition

281

9.7. Other Approaches to Dynamic Query Optimization

284

9.8. Summary

285

9.9. Bibliographical Notes

286

9.10. Exercises

286

Part IV Grid Databases

10. Transactions in Distributed and Grid Databases

291

10.1. Grid Database Challenges

292

10.2. Distributed Database Systems and Multidatabase Systems

293

10.2.1. Distributed Database Systems

293

10.2.2. Multidatabase Systems

297

10.3. Basic Definitions on Transaction Management

299

10.4. Acid Properties of Transactions

301

10.5. Transaction Management in Various Database Systems

303

10.5.1. Transaction Management in Centralized and Homogeneous Distributed Database Systems

303

10.5.2. Transaction Management in Heterogeneous Distributed Database Systems

305

10.6. Requirements in Grid Database Systems

307

10.7. Concurrency Control Protocols

309

10.8. Atomic Commit Protocols

310

10.8.1. Homogeneous Distributed Database Systems

310

10.8.2. Heterogeneous Distributed Database Systems

313

10.9. Replica Synchronization Protocols

314

10.9.1. Network Partitioning

315

10.9.2. Replica Synchronization Protocols

316

10.10. Summary

318

10.11. Bibliographical Notes

318

10.12. Exercises

319

11. Grid Concurrency Control

321

11.1. A Grid Database Environment

321

11.2. An Example

322

11.3. Grid Concurrency Control

324

11.3.1. Basic Functions Required by GCC

324

11.3.2. Grid Serializability Theorem

325

11.3.3. Grid Concurrency Control Protocol

329

11.3.4. Revisiting the Earlier Example

333

11.3.5. Comparison with Traditional Concurrency Control Protocols

334

11.4. Correctness of GCC Protocol

336

11.5. Features of GCC Protocol

338

11.6. Summary

339

11.7. Bibliographical Notes

339

11.8. Exercises

339

12. Grid Transaction Atomicity and Durability

341

12.1. Motivation

342

12.2. Grid Atomic Commit Protocol (Grid-ACP)

343

12.2.1. State Diagram of Grid-ACP

343

12.2.2. Grid-ACP Algorithm

344

12.2.3. Early-Abort Grid-ACP

346

12.2.4. Discussion

348

12.2.5. Message and Time Complexity Comparison Analysis

349

12.2.6. Correctness of Grid-ACP

350

12.3. Handling Failure of Sites with Grid-ACP

351

12.3.1. Model for Storing Log Files at the Originator and Participating Sites

351

12.3.2. Logs Required at the Originator Site

352

12.3.3. Logs Required at the Participant Site

353

12.3.4. Failure Recovery Algorithm for Grid-ACP

353

12.3.5. Comparison of Recovery Protocols

359

12.3.6. Correctness of Recovery Algorithm

361

12.4. Summary

365

12.5. Bibliographical Notes

366

12.6. Exercises

366

13. Replica Management in Grids

367

13.1. Motivation

367

13.2. Replica Architecture

368

13.2.1. High-Level Replica Management Architecture

368

13.2.2. Some Problems

369

13.3. Grid Replica Access Protocol (GRAP)

371

13.3.1. Read Transaction Operation for GRAP

371

13.3.2. Write Transaction Operation for GRAP

372

13.3.3. Revisiting the Example Problem

375

13.3.4. Correctness of GRAP

377

13.4. Handling Multiple Partitioning

378

13.4.1. Contingency GRAP

378

13.4.2. Comparison of Replica Management Protocols

381

13.4.3. Correctness of Contingency GRAP

383

13.5. Summary

384

13.6. Bibliographical Notes

385

13.7. Exercises

385

14. Grid Atomic Commitment in Replicated Data

387

14.1. Motivation

388

14.1.1. Architectural Reasons

388

14.1.2. Motivating Example

388

14.2. Modified Grid Atomic Commitment Protocol

390

14.2.1. Modified Grid-ACP

390

14.2.2. Correctness of Modified Grid-ACP

393

14.3. Transaction Properties in Replicated Environment

395

14.4. Summary

397

14.5. Bibliographical Notes

397

14.6. Exercises

398

Part V Other Data-Intensive Applications

15. Parallel Online Analytic Processing (OLAP) and Business Intelligence

401

15.1. Parallel Multidimensional Analysis

402

15.2. Parallelization of ROLLUP Queries

405

15.2.1. Analysis of Basic Single ROLLUP Queries

405

15.2.2. Analysis of Multiple ROLLUP Queries

409

15.2.3. Analysis of Partial ROLLUP Queries

411

15.2.4. Parallelization Without Using ROLLUP

412

15.3. Parallelization of CUBE Queries

412

15.3.1. Analysis of Basic CUBE Queries

413

15.3.2. Analysis of Partial CUBE Queries

416

15.3.3. Parallelization Without Using CUBE

417

15.4. Parallelization of Top-N and Ranking Queries

418

15.5. Parallelization of Cume_Dist Queries

419

15.6. Parallelization of NTILE and Histogram Queries

420

15.7. Parallelization of Moving Average and Windowing Queries

422

15.8. Summary

424

15.9. Bibliographical Notes

424

15.10. Exercises

425

16. Parallel Data Mining—Association Rules and Sequential Patterns

427

16.1. From Databases To Data Warehousing To Data Mining: A Journey

428

16.2. Data Mining: A Brief Overview

431

16.2.1. Data Mining Tasks

431

16.2.2. Querying vs. Mining

433

16.2.3. Parallelism in Data Mining

436

16.3. Parallel Association Rules

440

16.3.1. Association Rules: Concepts

441

16.3.2. Association Rules: Processes

444

16.3.3. Association Rules: Parallel Processing

448

16.4. Parallel Sequential Patterns

450

16.4.1. Sequential Patterns: Concepts

452

16.4.2. Sequential Patterns: Processes

456

16.4.3. Sequential Patterns: Parallel Processing

459

16.5. Summary

461

16.6. Bibliographical Notes

461

16.7. Exercises

462

17. Parallel Clustering and Classification

464

17.1. Clustering and Classification

464

17.1.1. Clustering

464

17.1.2. Classification

465

17.2. Parallel Clustering

467

17.2.1. Clustering: Concepts

467

17.2.2. k-Means Algorithm

468

17.2.3. Parallel k-Means Clustering

471

17.3. Parallel Classification

477

17.3.1. Decision Tree Classification: Structures

477

17.3.2. Decision Tree Classification: Processes

480

17.3.3. Decision Tree Classification: Parallel Processing

488

17.4. Summary

495

17.5. Bibliographical Notes

498

17.6. Exercises

498

Permissions

501

List of Conferences and Journals

507

Bibliography

511

Index

541

David Taniar, PhD, lectures in information technology at Monash University, Australia. Dr. Taniar has published extensively in the field of high- performance parallel databases and is the Editor in Chief of the International Journal of Data Warehousing and Mining. Clement H. C. Leung, PhD, is Foundation Chair in Computer Science at Victoria University, Australia. Dr. Leung previously held the Established Chair in Computer Science at the University of London.

Wenny Rahayu, PhD, is Associate Professor at La Trobe University, Australia, and actively works in the areas of database design and implementation, covering object-relational databases and Web databases.

Sushant Goel, PhD, is a software consultant and holds a PhD in computer systems engineering from RMIT University, Australia. His research interests are in grid transaction management and software development processes, such as agile computing.

High-Performance Parallel Database Processing and Grid Databases [Kõva köide]

Konto & seaded

Otsing

Otsingu andmebaas

Filtreeri tulemusi

Teemad Ingliskeelsed raamatud

Vali ostukorv