E-book: Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics

(Goldsmiths University of London, United Kingdom)
  • Format: PDF+DRM
  • Price: 59.79 €*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on mobile devices (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

"Data Science Foundations is most welcome and, indeed, a piece of literature that the field is very much in need of ... quite different from most data analytics texts which largely ignore foundational concepts and simply present a cookbook of methods ... a very useful text and I would certainly use it in my teaching." - Mark Girolami, Warwick University

Data Science encompasses the traditional disciplines of mathematics, statistics, data analysis, machine learning, and pattern recognition. This book is designed to provide a new framework for Data Science, based on a solid foundation in mathematics and computational science. It is written in an accessible style, for readers who are engaged with the subject but not necessarily experts in all aspects. It includes a wide range of case studies from diverse fields, and seeks to inspire and motivate the reader with respect to data, associated information, and derived knowledge.

Reviews

"Fionn Murtagh's new book is an advanced text in data science which is highly recommended for those seeking new directions in the field. From the use of ultrametric spaces for modeling the human mind to the study of narratives through hierarchical structures, this book is thought-provoking and intellectually challenging." - Prof. Y. Neuman, Ben-Gurion University of the Negev, author of Introduction to Computational Cultural Psychology

"Overall, I think this book brings new insights in data science. Many books can be found for the basics of data science. In this book, on the contrary, the approach which is discussed goes a step further. This book is quite technical in some parts and some mathematical background will help the readers to understand the details provided in some chapters. However, since R code is provided, as well as many illustrative examples, practitioners should also find their groove. The book contains many illustrative examples but also theory. It can thus be interesting for readers with different backgrounds. Theoretical-oriented readers will find cues on why it works while practical-oriented readers will find some ways and cues on how to handle their data to get the best of it." - Josiane Mothe, Université de Toulouse, IRIT-CNRS

"An intriguing book, and one that is set apart from the mainstream 'big data analytics' texts, Data Science Foundations is most welcome and, indeed, a piece of literature that the field is very much in need of. Murtagh presents the geometric ideas of metric and ultrametric spaces in a very innovative way, quite different to the more formal and dry mathematical presentations of these types of concepts. This book is also quite different from most data analytics texts which largely ignore foundational concepts and simply present a cookbook of methods. Geometry, topology, metric mapping, random projections, and applications to chemical analysis data challenge the reader out of the comfort zone of superficial data analytics methods. This is a very useful text and I would certainly use it in my teaching." - Mark Girolami, Warwick University

Preface xiii
I Narratives from Film and Literature, from Social Media and Contemporary Life 1(42)
1 The Correspondence Analysis Platform for Mapping Semantics
3(22)
1.1 The Visualization and Verbalization of Data
3(1)
1.2 Analysis of Narrative from Film and Drama
4(7)
1.2.1 Introduction
4(1)
1.2.2 The Changing Nature of Movie and Drama
4(1)
1.2.3 Correspondence Analysis as a Semantic Analysis Platform
5(1)
1.2.4 Casablanca Narrative: Illustrative Analysis
5(1)
1.2.5 Modelling Semantics via the Geometry and Topology of Information
6(2)
1.2.6 Casablanca Narrative: Illustrative Analysis Continued
8(1)
1.2.7 Platform for Analysis of Semantics
8(2)
1.2.8 Deeper Look at Semantics of Casablanca: Text Mining
10(1)
1.2.9 Analysis of a Pivotal Scene
10(1)
1.3 Application of Narrative Analysis to Science and Engineering Research
11(8)
1.3.1 Assessing Coverage and Completeness
12(2)
1.3.2 Change over Time
14(1)
1.3.3 Conclusion on the Policy Case Studies
15(4)
1.4 Human Resources Multivariate Performance Grading
19(2)
1.5 Data Analytics as the Narrative of the Analysis Processing
21(1)
1.6 Annex: The Correspondence Analysis and Hierarchical Clustering Platform
21(4)
1.6.1 Analysis Chain
21(1)
1.6.2 Correspondence Analysis: Mapping χ² Distances into Euclidean Distances
22(1)
1.6.3 Input: Cloud of Points Endowed with the Chi-Squared Metric
22(1)
1.6.4 Output: Cloud of Points Endowed with the Euclidean Metric in Factor Space
23(1)
1.6.5 Supplementary Elements: Information Space Fusion
23(1)
1.6.6 Hierarchical Clustering: Sequence-Constrained
24(1)
2 Analysis and Synthesis of Narrative: Semantics of Interactivity
25(18)
2.1 Impact and Effect in Narrative: A Shock Occurrence in Social Media
25(7)
2.1.1 Analysis
25(1)
2.1.2 Two Critical Tweets in Terms of Their Words
26(1)
2.1.3 Two Critical Tweets in Terms of Twitter Sub-narratives
26(6)
2.2 Analysis and Synthesis, Episodization and Narrativization
32(1)
2.3 Storytelling as Narrative Synthesis and Generation
33(2)
2.4 Machine Learning and Data Mining in Film Script Analysis
35(1)
2.5 Style Analytics: Statistical Significance of Style Features
36(1)
2.6 Typicality and Atypicality for Narrative Summarization and Transcoding
37(3)
2.7 Integration and Assembling of Narrative
40(3)
II Foundations of Analytics through the Geometry and Topology of Complex Systems 43(42)
3 Symmetry in Data Mining and Analysis through Hierarchy
45(24)
3.1 Analytics as the Discovery of Hierarchical Symmetries in Data
45(1)
3.2 Introduction to Hierarchical Clustering, p-Adic and m-Adic Numbers
45(3)
3.2.1 Structure in Observed or Measured Data
46(1)
3.2.2 Brief Look Again at Hierarchical Clustering
46(1)
3.2.3 Brief Introduction to p-Adic Numbers
47(1)
3.2.4 Brief Discussion of p-Adic and m-Adic Numbers
47(1)
3.3 Ultrametric Topology
48(4)
3.3.1 Ultrametric Space for Representing Hierarchy
48(1)
3.3.2 Geometrical Properties of Ultrametric Spaces
48(1)
3.3.3 Ultrametric Matrices and Their Properties
48(2)
3.3.4 Clustering through Matrix Row and Column Permutation
50(1)
3.3.5 Other Data Symmetries
51(1)
3.4 Generalized Ultrametric and Formal Concept Analysis
52(2)
3.4.1 Link with Formal Concept Analysis
52(2)
3.4.2 Applications of Generalized Ultrametrics
54(1)
3.5 Hierarchy in a p-Adic Number System
54(4)
3.5.1 p-Adic Encoding of a Dendrogram
54(3)
3.5.2 p-Adic Distance on a Dendrogram
57(1)
3.5.3 Scale-Related Symmetry
58(1)
3.6 Tree Symmetries through the Wreath Product Group
58(4)
3.6.1 Wreath Product Group for Hierarchical Clustering
58(1)
3.6.2 Wreath Product Invariance
59(1)
3.6.3 Wreath Product Invariance: Haar Wavelet Transform of Dendrogram
60(2)
3.7 Tree and Data Stream Symmetries from Permutation Groups
62(2)
3.7.1 Permutation Representation of a Data Stream
62(1)
3.7.2 Permutation Representation of a Hierarchy
63(1)
3.8 Remarkable Symmetries in Very High-Dimensional Spaces
64(1)
3.9 Short Commentary on This Chapter
65(4)
4 Geometry and Topology of Data Analysis: in p-Adic Terms
69(16)
4.1 Numbers and Their Representations
69(2)
4.1.1 Series Representations of Numbers
69(1)
4.1.2 Field
70(1)
4.2 p-Adic Valuation, p-Adic Absolute Value, p-Adic Norm
71(1)
4.3 p-Adic Numbers as Series Expansions
72(1)
4.4 Canonical p-Adic Expansion; p-Adic Integer or Unit Ball
73(1)
4.5 Non-Archimedean Norms as p-Adic Integer Norms in the Unit Ball
74(1)
4.5.1 Archimedean and Non-Archimedean Absolute Value Properties
74(1)
4.5.2 A Non-Archimedean Absolute Value, or Norm, is Less Than or Equal to One, and an Archimedean Absolute Value, or Norm, is Unbounded
74(1)
4.6 Going Further: Negative p-Adic Numbers, and p-Adic Fractions
75(1)
4.7 Number Systems in the Physical and Natural Sciences
76(1)
4.8 p-Adic Numbers in Computational Biology and Computer Hardware
77(1)
4.9 Measurement Requires a Norm, Implying Distance and Topology
78(1)
4.10 Ultrametric Topology
79(1)
4.11 Short Review of p-Adic Cosmology
80(1)
4.12 Unbounded Increase in Mass or Other Measured Quantity
81(1)
4.13 Scale-Free Partial Order or Hierarchical Systems
81(2)
4.14 p-Adic Indexing of the Sphere
83(1)
4.15 Diffusion and Other Dynamic Processes in Ultrametric Spaces
83(2)
III New Challenges and New Solutions for Information Search and Discovery 85(46)
5 Fast, Linear Time, m-Adic Hierarchical Clustering
87(16)
5.1 Pervasive Ultrametricity: Computational Consequences
87(2)
5.1.1 Ultrametrics in Data Analytics
87(1)
5.1.2 Quantifying Ultrametricity
88(1)
5.1.3 Pervasive Ultrametricity
88(1)
5.1.4 Computational Implications
89(1)
5.2 Applications in Search and Discovery using the Baire Metric
89(2)
5.2.1 Baire Metric
89(1)
5.2.2 Large Numbers of Observables
89(1)
5.2.3 High-Dimensional Data
90(1)
5.2.4 First Approach Based on Reduced Precision of Measurement
90(1)
5.2.5 Random Projections in High-Dimensional Spaces, Followed by the Baire Distance
91(1)
5.2.6 Summary Comments on Search and Discovery
91(1)
5.3 m-Adic Hierarchy and Construction
91(1)
5.4 The Baire Metric, the Baire Ultrametric
92(2)
5.4.1 Metric and Ultrametric Spaces
92(1)
5.4.2 Ultrametric Baire Space and Distance
93(1)
5.5 Multidimensional Use of the Baire Metric through Random Projections
94(1)
5.6 Hierarchical Tree Defined from m-Adic Encoding
95(1)
5.7 Longest Common Prefix and Hashing
96(1)
5.7.1 From Random Projection to Hashing
96(1)
5.8 Enhancing Ultrametricity through Precision of Measurement
97(2)
5.8.1 Quantifying Ultrametricity
97(1)
5.8.2 Pervasiveness of Ultrametricity
98(1)
5.9 Generalized Ultrametric and Formal Concept Analysis
99(1)
5.9.1 Generalized Ultrametric
99(1)
5.9.2 Formal Concept Analysis
99(1)
5.10 Linear Time and Direct Reading Hierarchical Clustering
100(1)
5.10.1 Linear Time, or O(N) Computational Complexity, Hierarchical Clustering
100(1)
5.10.2 Grid-Based Clustering Algorithms
100(1)
5.11 Summary: Many Viewpoints, Various Implementations
101(2)
6 Big Data Scaling through Metric Mapping
103(28)
6.1 Mean Random Projection, Marginal Sum, Seriation
104(4)
6.1.1 Mean of Random Projections as a Seriation
105(2)
6.1.2 Normalization of the Random Projections
107(1)
6.2 Ultrametric and Ordering of Rows, Columns
108(1)
6.3 Power Iteration Clustering
108(2)
6.4 Input Data for Eigenreduction
110(1)
6.4.1 Implementation: Equivalence of Iterative Approximation and Batch Calculation
110(1)
6.5 Inducing a Hierarchical Clustering from Seriation
111(1)
6.6 Short Summary of All These Methodological Underpinnings
112(1)
6.6.1 Trivial First Eigenvalue, Eigenvector in Correspondence Analysis
112(1)
6.7 Very High-Dimensional Data Spaces: Data Piling
113(1)
6.8 Recap on Correspondence Analysis for Following Applications
114(3)
6.8.1 Clouds of Points, Masses and Inertia
115(1)
6.8.2 Relative and Absolute Contributions
116(1)
6.9 Evaluation 1: Uniformly Distributed Data Cloud Points
117(1)
6.9.1 Computation Time Requirements
118(1)
6.10 Evaluation 2: Time Series of Financial Futures
118(2)
6.11 Evaluation 3: Chemistry Data, Power Law Distributed
120(4)
6.11.1 Data and Determining Power Law Properties
120(1)
6.11.2 Randomly Generating Power Law Distributed Data in Varying Embedding Dimensions
120(4)
6.12 Application 1: Quantifying Effectiveness through Aggregate Outcome
124(1)
6.12.1 Computational Requirements, from Original Space and Factor Space Identities
124(1)
6.13 Application 2: Data Piling as Seriation of Dual Space
125(1)
6.14 Brief Concluding Summary
126(1)
6.15 Annex: R Software Used in Simulations and Evaluations
126(7)
6.15.1 Evaluation 1: Dense, Uniformly Distributed Data
127(1)
6.15.2 Evaluation 2: Financial Futures
128(1)
6.15.3 Evaluation 3: Chemicals of Specified Marginal Distribution
129(2)
IV New Frontiers: New Vistas on Information, Cognition and the Human Mind 131(56)
7 On Ultrametric Algorithmic Information
133(14)
7.1 Introduction to Information Measures
133(1)
7.2 Wavelet Transform of a Set of Points Endowed with an Ultrametric
134(3)
7.3 An Object as a Chain of Successively Finer Approximations
137(2)
7.3.1 Approximation Chain using a Hierarchy
138(1)
7.3.2 Dendrogram Wavelet Transform of Spherically Complete Space
138(1)
7.4 Generating Faces: Case Study Using a Simplified Model
139(4)
7.4.1 A Simplified Model of Face Generation
139(4)
7.4.2 Discussion of Psychological and Other Consequences
143(1)
7.5 Complexity of an Object: Hierarchical Information
143(1)
7.6 Consequences Arising from This Chapter
144(3)
8 Geometry and Topology of Matte Blanco's Bi-Logic in Psychoanalytics
147(16)
8.1 Approaching Data and the Object of Study, Mental Processes
147(5)
8.1.1 Historical Role of Psychometrics and Mathematical Psychology
148(1)
8.1.2 Summary of Chapter Content
148(1)
8.1.3 Determining Depth of Emotion, and Tracking Emotion
148(4)
8.2 Matte Blanco's Psychoanalysis: A Selective Review
152(3)
8.3 Real World, Metric Space: Context for Asymmetric Mental Processes
155(1)
8.4 Ultrametric Topology, Background and Relevance in Psychoanalysis
156(3)
8.4.1 Ultrametric
156(1)
8.4.2 Inducing an Ultrametric through Agglomerative Hierarchical Clustering
157(1)
8.4.3 Transitions from Metric to Ultrametric Representation, and Vice Versa, through Data Transformation
157(1)
8.4.4 Practical Applications
158(1)
8.5 Conclusion: Analytics of Human Mental Processes
159(1)
8.6 Annex 1: Far Greater Computational Power of Unconscious Mental Processes
160(1)
8.7 Annex 2: Text Analysis as a Proxy for Both Facets of Bi-Logic
161(2)
9 Ultrametric Model of Mind: Application to Text Content Analysis
163(18)
9.1 Introduction
163(1)
9.2 Quantifying Ultrametricity
164(3)
9.2.1 Ultrametricity Coefficient of Lerman
164(1)
9.2.2 Ultrametricity Coefficient of Rammal, Toulouse and Virasoro
164(1)
9.2.3 Ultrametricity Coefficients of Treves and of Hartman
165(1)
9.2.4 Bayesian Network Modelling
165(1)
9.2.5 Our Ultrametricity Coefficient
165(1)
9.2.6 What the Ultrametricity Coefficient Reveals
166(1)
9.3 Semantic Mapping: Interrelationships to Euclidean, Factor Space
167(3)
9.3.1 Correspondence Analysis: Mapping χ² into Euclidean Distances
167(1)
9.3.2 Input: Cloud of Points Endowed with the Chi-Squared Metric
167(1)
9.3.3 Output: Cloud of Points Endowed with the Euclidean Metric in Factor Space
168(1)
9.3.4 Conclusions on Correspondence Analysis and Introduction to the Numerical Experiments to Follow
169(1)
9.4 Determining Ultrametricity through Text Unit Interrelationships
170(4)
9.4.1 Brothers Grimm
170(1)
9.4.2 Jane Austen
171(1)
9.4.3 Air Accident Reports
172(1)
9.4.4 DreamBank
172(2)
9.5 Ultrametric Properties of Words
174(3)
9.5.1 Objectives and Choice of Data
174(1)
9.5.2 General Discussion of Ultrametricity of Words
175(1)
9.5.3 Conclusions on the Word Analysis
175(2)
9.6 Concluding Comments on This Chapter
177(1)
9.7 Annex 1: Pseudo-Code for Assessing Ultrametric-Respecting Triplet
177(1)
9.8 Annex 2: Bradley Ultrametricity Coefficient
178(3)
10 Concluding Discussion on Software Environments
181(6)
10.1 Introduction
181(1)
10.2 Complementary Use with Apache Solr (and Lucene)
182(1)
10.3 In Summary: Treating Massive Data Sets with Correspondence Analysis
182(3)
10.3.1 Aggregating Similar or Identical Profiles Is Welcome
182(1)
10.3.2 Resolution Level of the Analysis Carried Out
183(1)
10.3.3 Random Projections in Order to Benefit from Data Piling in High Dimensions
183(1)
10.3.4 Massive Observation Cardinality, Moderate Sized Dimensionality
184(1)
10.4 Concluding Notes
185(2)
Bibliography 187(16)
Index 203
Fionn Murtagh's very first post after his PhD was in educational research at a national level, followed by nuclear energy risk assessment. He then worked for a dozen years on the Hubble Space Telescope as a European Space Agency Senior Scientist. After many positions as Professor of Computer Science, spanning teaching, research, and senior management, in Ireland, France, the USA and the UK, he is very happy now to be advancing data science as Professor of Data Science and Director of the Centre for Mathematics and Data Science at the University of Huddersfield.