
Frontiers in Massive Data Analysis [Paperback]

  • Format: Paperback / softback, 190 pages, height x width: 229x152 mm
  • Publication date: 03-Oct-2013
  • Publisher: National Academies Press
  • ISBN-10: 0309287782
  • ISBN-13: 9780309287784
Data mining of massive data sets is transforming the way we think about crisis response, marketing, entertainment, cybersecurity and national intelligence. Collections of documents, images, videos, and networks are being thought of not merely as bit strings to be stored, indexed, and retrieved, but as potential sources of discovery and knowledge, requiring sophisticated analysis techniques that go far beyond classical indexing and keyword counting, aiming to find relational and semantic interpretations of the phenomena underlying the data.



Frontiers in Massive Data Analysis examines the frontier of analyzing massive amounts of data, whether in a static database or streaming through a system. Data at that scale (terabytes and petabytes) is increasingly common in science (e.g., particle physics, remote sensing, genomics), Internet commerce, business analytics, national security, communications, and elsewhere. The tools that work to infer knowledge from data at smaller scales do not necessarily work, or work well, at such massive scale. New tools, skills, and approaches are necessary, and this report identifies many of them, plus promising research directions to explore. Frontiers in Massive Data Analysis discusses pitfalls in trying to infer knowledge from massive data, and it characterizes seven major classes of computation that are common in the analysis of massive data. Overall, this report illustrates the cross-disciplinary knowledge (from computer science, statistics, machine learning, and application disciplines) that must be brought to bear to make useful inferences from massive data.

Table of Contents



Front Matter
Summary
1 Introduction
  • The Challenge
  • What Has Changed in Recent Years?
  • Organization of This Report
  • References
2 Massive Data in Science, Technology, Commerce, National Defense, Telecommunications, and Other Endeavors
  • Where Are Massive Data Appearing?
  • Challenges to the Analysis of Massive Data
  • Trends in Massive Data Analysis
  • Examples
  • References
3 Scaling the Infrastructure for Data Management
  • Scaling the Number of Data Sets
  • Scaling Computing Technology through Distributed and Parallel Systems
  • Trends and Future Research
  • References
4 Temporal Data and Real-Time Algorithms
  • Introduction
  • Data Acquisition
  • Data Processing, Representation, and Inference
  • System and Hardware for Temporal Data Sets
  • Challenges
  • References
5 Large-Scale Data Representations
  • Overview
  • Goals of Data Representation
  • Challenges and Future Directions
  • References
6 Resources, Trade-offs, and Limitations
  • Introduction
  • Relevant Aspects of Theoretical Computer Science
  • Gaps and Opportunities
  • References
7 Building Models from Massive Data
  • Introduction to Statistical Models
  • Data Cleaning
  • Classes of Models
  • Model Tuning and Evaluation
  • Challenges
  • References
8 Sampling and Massive Data
  • Common Techniques of Statistical Sampling
  • Challenges When Sampling from Massive Data
  • References
9 Human Interaction with Data
  • Introduction
  • State of the Art
  • Hybrid Human/Computer Data Analysis
  • Opportunities, Challenges, and Directions
  • References
10 The Seven Computational Giants of Massive Data Analysis
  • Basic Statistics
  • Generalized N-Body Problems
  • Graph-Theoretic Computations
  • Linear Algebraic Computations
  • Optimizations
  • Integration
  • Alignment Problems
  • Discussion
  • References
11 Conclusions
Appendixes
  • A Acronyms
  • B Biographical Sketches of Committee Members