E-book: Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics

Fionn Murtagh (Goldsmiths University of London, United Kingdom)

Format: 224 pages
Series: Chapman & Hall/CRC Computer Science & Data Analysis
Publication date: 22-Sep-2017
Publisher: Chapman & Hall/CRC
ISBN-13: 9781498763943

Format: PDF+DRM
Price: €59.79*
* The price is final, i.e. no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on mobile devices (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on Amazon Kindle.

"Data Science Foundations is most welcome and, indeed, a piece of literature that the field is very much in need of... quite different from most data analytics texts which largely ignore foundational concepts and simply present a cookbook of methods... a very useful text and I would certainly use it in my teaching." - Mark Girolami, Warwick University

Data Science encompasses the traditional disciplines of mathematics, statistics, data analysis, machine learning, and pattern recognition. This book is designed to provide a new framework for Data Science, based on a solid foundation in mathematics and computational science. It is written in an accessible style, for readers who are engaged with the subject but not necessarily experts in all aspects. It includes a wide range of case studies from diverse fields, and seeks to inspire and motivate the reader with respect to data, associated information, and derived knowledge.

Reviews

"Fionn Murtagh's new book is an advanced text in data science which is highly recommended for those seeking new directions in the field. From the use of ultrametric spaces for modeling the human mind to the study of narratives through hierarchical structures, this book is thought-provoking and intellectually challenging." - Prof. Y. Neuman, Ben-Gurion University of the Negev, author of Introduction to Computational Cultural Psychology

"Overall, I think this book brings new insights in data science. Many books can be found for the basics of data science. In this book, on the contrary, the approach which is discussed goes a step further. This book is quite technical in some parts and some mathematical background will help the readers to understand the details provided in some chapters. However, since R code is provided, as well as many illustrative examples, practitioners should also find their groove. The book contains many illustrative examples but also theory. It can thus be interesting for readers with different backgrounds. Theoretically oriented readers will find cues on why it works, while practically oriented readers will find some ways and cues on how to handle their data to get the best out of it." - Josiane Mothe, Université de Toulouse, IRIT-CNRS

"An intriguing book, and one that is set apart from the mainstream 'big data analytics' texts, Data Science Foundations is most welcome and, indeed, a piece of literature that the field is very much in need of. Murtagh presents the geometric ideas of metric and ultrametric spaces in a very innovative way, quite different to the more formal and dry mathematical presentations of these types of concepts. This book is also quite different from most data analytics texts which largely ignore foundational concepts and simply present a cookbook of methods. Geometry, topology, metric mapping, random projections, and applications to chemical analysis data challenge the reader out of the comfort zone of superficial data analytics methods. This is a very useful text and I would certainly use it in my teaching." - Mark Girolami, Warwick University

Table of Contents

Preface

I Narratives from Film and Literature, from Social Media and Contemporary Life

1 The Correspondence Analysis Platform for Mapping Semantics
1.1 The Visualization and Verbalization of Data
1.2 Analysis of Narrative from Film and Drama
1.2.1 Introduction
1.2.2 The Changing Nature of Movie and Drama
1.2.3 Correspondence Analysis as a Semantic Analysis Platform
1.2.4 Casablanca Narrative: Illustrative Analysis
1.2.5 Modelling Semantics via the Geometry and Topology of Information
1.2.6 Casablanca Narrative: Illustrative Analysis Continued
1.2.7 Platform for Analysis of Semantics
1.2.8 Deeper Look at Semantics of Casablanca: Text Mining
1.2.9 Analysis of a Pivotal Scene
1.3 Application of Narrative Analysis to Science and Engineering Research
1.3.1 Assessing Coverage and Completeness
1.3.2 Change over Time
1.3.3 Conclusion on the Policy Case Studies
1.4 Human Resources Multivariate Performance Grading
1.5 Data Analytics as the Narrative of the Analysis Processing
1.6 Annex: The Correspondence Analysis and Hierarchical Clustering Platform
1.6.1 Analysis Chain
1.6.2 Correspondence Analysis: Mapping χ² Distances into Euclidean Distances
1.6.3 Input: Cloud of Points Endowed with the Chi-Squared Metric
1.6.4 Output: Cloud of Points Endowed with the Euclidean Metric in Factor Space
1.6.5 Supplementary Elements: Information Space Fusion
1.6.6 Hierarchical Clustering: Sequence-Constrained

2 Analysis and Synthesis of Narrative: Semantics of Interactivity
2.1 Impact and Effect in Narrative: A Shock Occurrence in Social Media
2.1.1 Analysis
2.1.2 Two Critical Tweets in Terms of Their Words
2.1.3 Two Critical Tweets in Terms of Twitter Sub-narratives
2.2 Analysis and Synthesis, Episodization and Narrativization
2.3 Storytelling as Narrative Synthesis and Generation
2.4 Machine Learning and Data Mining in Film Script Analysis
2.5 Style Analytics: Statistical Significance of Style Features
2.6 Typicality and Atypicality for Narrative Summarization and Transcoding
2.7 Integration and Assembling of Narrative

II Foundations of Analytics through the Geometry and Topology of Complex Systems

3 Symmetry in Data Mining and Analysis through Hierarchy
3.1 Analytics as the Discovery of Hierarchical Symmetries in Data
3.2 Introduction to Hierarchical Clustering, p-Adic and m-Adic Numbers
3.2.1 Structure in Observed or Measured Data
3.2.2 Brief Look Again at Hierarchical Clustering
3.2.3 Brief Introduction to p-Adic Numbers
3.2.4 Brief Discussion of p-Adic and m-Adic Numbers
3.3 Ultrametric Topology
3.3.1 Ultrametric Space for Representing Hierarchy
3.3.2 Geometrical Properties of Ultrametric Spaces
3.3.3 Ultrametric Matrices and Their Properties
3.3.4 Clustering through Matrix Row and Column Permutation
3.3.5 Other Data Symmetries
3.4 Generalized Ultrametric and Formal Concept Analysis
3.4.1 Link with Formal Concept Analysis
3.4.2 Applications of Generalized Ultrametrics
3.5 Hierarchy in a p-Adic Number System
3.5.1 p-Adic Encoding of a Dendrogram
3.5.2 p-Adic Distance on a Dendrogram
3.5.3 Scale-Related Symmetry
3.6 Tree Symmetries through the Wreath Product Group
3.6.1 Wreath Product Group for Hierarchical Clustering
3.6.2 Wreath Product Invariance
3.6.3 Wreath Product Invariance: Haar Wavelet Transform of Dendrogram
3.7 Tree and Data Stream Symmetries from Permutation Groups
3.7.1 Permutation Representation of a Data Stream
3.7.2 Permutation Representation of a Hierarchy
3.8 Remarkable Symmetries in Very High-Dimensional Spaces
3.9 Short Commentary on This Chapter

4 Geometry and Topology of Data Analysis: in p-Adic Terms
4.1 Numbers and Their Representations
4.1.1 Series Representations of Numbers
4.1.2 Field
4.2 p-Adic Valuation, p-Adic Absolute Value, p-Adic Norm
4.3 p-Adic Numbers as Series Expansions
4.4 Canonical p-Adic Expansion; p-Adic Integer or Unit Ball
4.5 Non-Archimedean Norms as p-Adic Integer Norms in the Unit Ball
4.5.1 Archimedean and Non-Archimedean Absolute Value Properties
4.5.2 A Non-Archimedean Absolute Value, or Norm, is Less Than or Equal to One, and an Archimedean Absolute Value, or Norm, is Unbounded
4.6 Going Further: Negative p-Adic Numbers, and p-Adic Fractions
4.7 Number Systems in the Physical and Natural Sciences
4.8 p-Adic Numbers in Computational Biology and Computer Hardware
4.9 Measurement Requires a Norm, Implying Distance and Topology
4.10 Ultrametric Topology
4.11 Short Review of p-Adic Cosmology
4.12 Unbounded Increase in Mass or Other Measured Quantity
4.13 Scale-Free Partial Order or Hierarchical Systems
4.14 p-Adic Indexing of the Sphere
4.15 Diffusion and Other Dynamic Processes in Ultrametric Spaces

III New Challenges and New Solutions for Information Search and Discovery

5 Fast, Linear Time, m-Adic Hierarchical Clustering
5.1 Pervasive Ultrametricity: Computational Consequences
5.1.1 Ultrametrics in Data Analytics
5.1.2 Quantifying Ultrametricity
5.1.3 Pervasive Ultrametricity
5.1.4 Computational Implications
5.2 Applications in Search and Discovery using the Baire Metric
5.2.1 Baire Metric
5.2.2 Large Numbers of Observables
5.2.3 High-Dimensional Data
5.2.4 First Approach Based on Reduced Precision of Measurement
5.2.5 Random Projections in High-Dimensional Spaces, Followed by the Baire Distance
5.2.6 Summary Comments on Search and Discovery
5.3 m-Adic Hierarchy and Construction
5.4 The Baire Metric, the Baire Ultrametric
5.4.1 Metric and Ultrametric Spaces
5.4.2 Ultrametric Baire Space and Distance
5.5 Multidimensional Use of the Baire Metric through Random Projections
5.6 Hierarchical Tree Defined from m-Adic Encoding
5.7 Longest Common Prefix and Hashing
5.7.1 From Random Projection to Hashing
5.8 Enhancing Ultrametricity through Precision of Measurement
5.8.1 Quantifying Ultrametricity
5.8.2 Pervasiveness of Ultrametricity
5.9 Generalized Ultrametric and Formal Concept Analysis
5.9.1 Generalized Ultrametric
5.9.2 Formal Concept Analysis
5.10 Linear Time and Direct Reading Hierarchical Clustering
5.10.1 Linear Time, or O(N) Computational Complexity, Hierarchical Clustering
5.10.2 Grid-Based Clustering Algorithms
5.11 Summary: Many Viewpoints, Various Implementations

6 Big Data Scaling through Metric Mapping
6.1 Mean Random Projection, Marginal Sum, Seriation
6.1.1 Mean of Random Projections as a Seriation
6.1.2 Normalization of the Random Projections
6.2 Ultrametric and Ordering of Rows, Columns
6.3 Power Iteration Clustering
6.4 Input Data for Eigenreduction
6.4.1 Implementation: Equivalence of Iterative Approximation and Batch Calculation
6.5 Inducing a Hierarchical Clustering from Seriation
6.6 Short Summary of All These Methodological Underpinnings
6.6.1 Trivial First Eigenvalue, Eigenvector in Correspondence Analysis
6.7 Very High-Dimensional Data Spaces: Data Piling
6.8 Recap on Correspondence Analysis for Following Applications
6.8.1 Clouds of Points, Masses and Inertia
6.8.2 Relative and Absolute Contributions
6.9 Evaluation 1: Uniformly Distributed Data Cloud Points
6.9.1 Computation Time Requirements
6.10 Evaluation 2: Time Series of Financial Futures
6.11 Evaluation 3: Chemistry Data, Power Law Distributed
6.11.1 Data and Determining Power Law Properties
6.11.2 Randomly Generating Power Law Distributed Data in Varying Embedding Dimensions
6.12 Application 1: Quantifying Effectiveness through Aggregate Outcome
6.12.1 Computational Requirements, from Original Space and Factor Space Identities
6.13 Application 2: Data Piling as Seriation of Dual Space
6.14 Brief Concluding Summary
6.15 Annex: R Software Used in Simulations and Evaluations
6.15.1 Evaluation 1: Dense, Uniformly Distributed Data
6.15.2 Evaluation 2: Financial Futures
6.15.3 Evaluation 3: Chemicals of Specified Marginal Distribution

IV New Frontiers: New Vistas on Information, Cognition and the Human Mind

7 On Ultrametric Algorithmic Information
7.1 Introduction to Information Measures
7.2 Wavelet Transform of a Set of Points Endowed with an Ultrametric
7.3 An Object as a Chain of Successively Finer Approximations
7.3.1 Approximation Chain using a Hierarchy
7.3.2 Dendrogram Wavelet Transform of Spherically Complete Space
7.4 Generating Faces: Case Study Using a Simplified Model
7.4.1 A Simplified Model of Face Generation
7.4.2 Discussion of Psychological and Other Consequences
7.5 Complexity of an Object: Hierarchical Information
7.6 Consequences Arising from This Chapter

8 Geometry and Topology of Matte Blanco's Bi-Logic in Psychoanalytics
8.1 Approaching Data and the Object of Study, Mental Processes
8.1.1 Historical Role of Psychometrics and Mathematical Psychology
8.1.2 Summary of Chapter Content
8.1.3 Determining Depth of Emotion, and Tracking Emotion
8.2 Matte Blanco's Psychoanalysis: A Selective Review
8.3 Real World, Metric Space: Context for Asymmetric Mental Processes
8.4 Ultrametric Topology, Background and Relevance in Psychoanalysis
8.4.1 Ultrametric
8.4.2 Inducing an Ultrametric through Agglomerative Hierarchical Clustering
8.4.3 Transitions from Metric to Ultrametric Representation, and Vice Versa, through Data Transformation
8.4.4 Practical Applications
8.5 Conclusion: Analytics of Human Mental Processes
8.6 Annex 1: Far Greater Computational Power of Unconscious Mental Processes
8.7 Annex 2: Text Analysis as a Proxy for Both Facets of Bi-Logic

9 Ultrametric Model of Mind: Application to Text Content Analysis
9.1 Introduction
9.2 Quantifying Ultrametricity
9.2.1 Ultrametricity Coefficient of Lerman
9.2.2 Ultrametricity Coefficient of Rammal, Toulouse and Virasoro
9.2.3 Ultrametricity Coefficients of Treves and of Hartman
9.2.4 Bayesian Network Modelling
9.2.5 Our Ultrametricity Coefficient
9.2.6 What the Ultrametricity Coefficient Reveals
9.3 Semantic Mapping: Interrelationships to Euclidean, Factor Space
9.3.1 Correspondence Analysis: Mapping χ² into Euclidean Distances
9.3.2 Input: Cloud of Points Endowed with the Chi-Squared Metric
9.3.3 Output: Cloud of Points Endowed with the Euclidean Metric in Factor Space
9.3.4 Conclusions on Correspondence Analysis and Introduction to the Numerical Experiments to Follow
9.4 Determining Ultrametricity through Text Unit Interrelationships
9.4.1 Brothers Grimm
9.4.2 Jane Austen
9.4.3 Air Accident Reports
9.4.4 DreamBank
9.5 Ultrametric Properties of Words
9.5.1 Objectives and Choice of Data
9.5.2 General Discussion of Ultrametricity of Words
9.5.3 Conclusions on the Word Analysis
9.6 Concluding Comments on This Chapter
9.7 Annex 1: Pseudo-Code for Assessing Ultrametric-Respecting Triplet
9.8 Annex 2: Bradley Ultrametricity Coefficient

10 Concluding Discussion on Software Environments
10.1 Introduction
10.2 Complementary Use with Apache Solr (and Lucene)
10.3 In Summary: Treating Massive Data Sets with Correspondence Analysis
10.3.1 Aggregating Similar or Identical Profiles Is Welcome
10.3.2 Resolution Level of the Analysis Carried Out
10.3.3 Random Projections in Order to Benefit from Data Piling in High Dimensions
10.3.4 Massive Observation Cardinality, Moderate Sized Dimensionality
10.4 Concluding Notes

Bibliography
Index

Fionn Murtagh's very first post after his PhD was in educational research at a national level, followed by nuclear energy risk assessment. He then worked for a dozen years on the Hubble Space Telescope as a European Space Agency Senior Scientist. Following many positions as Professor of Computer Science (teaching and research), as well as senior management positions, in Ireland, France, the USA and the UK, he is now very happy to be advancing data science as Professor of Data Science and Director of the Centre for Mathematics and Data Science at the University of Huddersfield.

Permalink: https://www.kriso.ee/db/97814987639432e.html
