E-book: Advances in Data Science: Symbolic, Complex, and Network Data

Edited by Edwin Diday (Université de Paris IX - Dauphine, France), Rong Guan, Gilbert Saporta and Huiwen Wang
  • Format: PDF+DRM
  • Publication date: 23-Jan-2020
  • Publisher: ISTE Ltd and John Wiley & Sons Inc
  • Language: English
  • ISBN-13: 9781119695103
  • Price: 171,60 €*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on mobile devices (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Data science unifies statistics, data analysis and machine learning to achieve a better understanding of the masses of data which are produced today, and to improve prediction. Special kinds of data (symbolic, network, complex, compositional) are increasingly frequent in data science. These data require specific methodologies, but there is a lack of reference works in this field.

Advances in Data Science fills this gap. It presents a collection of up-to-date contributions by eminent scholars following two international workshops held in Beijing and Paris. The 10 chapters are organized into four parts: Symbolic Data, Complex Data, Network Data and Clustering. They include fundamental contributions, as well as applications to several domains, including business and the social sciences.
Preface
Part 1 Symbolic Data
Chapter 1 Explanatory Tools for Machine Learning in the Symbolic Data Analysis Framework
Edwin Diday
1.1 Introduction
1.2 Introduction to Symbolic Data Analysis
1.2.1 What are complex data?
1.2.2 What are "classes" and "class of complex data"?
1.2.3 Which kind of class variability?
1.2.4 What are "symbolic variables" and "symbolic data tables"?
1.2.5 Symbolic Data Analysis (SDA)
1.3 Symbolic data tables from Dynamic Clustering Method and EM
1.3.1 The "dynamical clustering method" (DCM)
1.3.2 Examples of DCM applications
1.3.3 Clustering methods by mixture decomposition
1.3.4 Symbolic data tables from clustering
1.3.5 A general way to compare results of clustering methods by the "explanatory power" of their associated symbolic data table
1.3.6 Quality criteria of classes and variables based on the cells of the symbolic data table containing intervals or inferred distributions
1.4 Criteria for ranking individuals, classes and their bar chart descriptive symbolic variables
1.4.1 A theoretical framework for SDA
1.4.2 Characterization of a category and a class by a measure of discordance
1.4.3 Link between a characterization by the criteria W and the standard Tf-Idf
1.4.4 Ranking the individuals, the symbolic variables and the classes of a bar chart symbolic data table
1.5 Two directions of research
1.5.1 Parametrization of concordance and discordance criteria
1.5.2 Improving the explanatory power of any machine learning tool by a filtering process
1.6 Conclusion
1.7 References
Chapter 2 Likelihood in the Symbolic Context
Richard Emilion
Edwin Diday
2.1 Introduction
2.2 Probabilistic setting
2.2.1 Description variable and class variable
2.2.2 Conditional distributions
2.2.3 Symbolic variables
2.2.4 Examples
2.2.5 Probability measures on (C, C), likelihood
2.3 Parametric models for p = 1
2.3.1 LDA model
2.3.2 BLS method
2.3.3 Interval-valued variables
2.3.4 Probability vectors and histogram-valued variables
2.4 Nonparametric estimation for p = 1
2.4.1 Multihistograms and multivariate polygons
2.4.2 Dirichlet kernel mixtures
2.4.3 Dirichlet Process Mixture (DPM)
2.5 Density models for p ≥ 2
2.6 Conclusion
2.7 References
Chapter 3 Dimension Reduction and Visualization of Symbolic Interval-Valued Data Using Sliced Inverse Regression
Han-Ming Wu
Chiun-How Kao
Chun-houh Chen
3.1 Introduction
3.2 PCA for interval-valued data and the sliced inverse regression
3.2.1 PCA for interval-valued data
3.2.2 Classic SIR
3.3 SIR for interval-valued data
3.3.1 Quantification approaches
3.3.2 Distributional approaches
3.4 Projections and visualization in DR subspace
3.4.1 Linear combinations of intervals
3.4.2 The graphical representation of the projected intervals in the 2D DR subspace
3.5 Some computational issues
3.5.1 Standardization of interval-valued data
3.5.2 The slicing schemes for iSIR
3.5.3 The evaluation of DR components
3.6 Simulation studies
3.6.1 Scenario 1: aggregated data
3.6.2 Scenario 2: data based on interval arithmetic
3.6.3 Results
3.7 A real data example: face recognition data
3.8 Conclusion and discussion
3.9 References
Chapter 4 On the "Complexity" of Social Reality. Some Reflections About the Use of Symbolic Data Analysis in Social Sciences
Frédéric Lebaron
4.1 Introduction
4.2 Social sciences facing "complexity"
4.2.1 The total social fact, a designation of "complexity" in social sciences
4.2.2 Two families of answers
4.2.3 The contemporary deepening of the two approaches, "reductionist" and "encompassing"
4.2.4 Issues of scale and heterogeneity
4.3 Symbolic data analysis in the social sciences: an example
4.3.1 Symbolic data analysis
4.3.2 An exploratory case study on European data
4.3.3 A sociological interpretation
4.4 Conclusion
4.5 References
Part 2 Complex Data
Chapter 5 A Spatial Dependence Measure and Prediction of Georeferenced Data Streams Summarized by Histograms
Rosanna Verde
Antonio Balzanella
5.1 Introduction
5.2 Processing setup
5.3 Main definitions
5.4 Online summarization of a data stream through CluStream for Histogram data
5.5 Spatial dependence monitoring: a variogram for histogram data
5.6 Ordinary kriging for histogram data
5.7 Experimental results on real data
5.8 Conclusion
5.9 References
Chapter 6 Incremental Calculation Framework for Complex Data
Huiwen Wang
Yuan Wei
Siyang Wang
6.1 Introduction
6.2 Basic data
6.2.1 The basic data space
6.2.2 Sample covariance matrix
6.3 Incremental calculation of complex data
6.3.1 Transformation of complex data
6.3.2 Online decomposition of covariance matrix
6.3.3 Adopted algorithms
6.4 Simulation studies
6.4.1 Functional linear regression
6.4.2 Compositional PCA
6.5 Conclusion
6.6 Acknowledgment
6.7 References
Part 3 Network Data
Chapter 7 Recommender Systems and Attributed Networks
Françoise Fogelman-Soulié
Lanxiang Mei
Jianyu Zhang
Yiming Li
Wen Ge
Yinglan Li
Qiaofei Ye
7.1 Introduction
7.2 Recommender systems
7.2.1 Data used
7.2.2 Model-based collaborative filtering
7.2.3 Neighborhood-based collaborative filtering
7.2.4 Hybrid models
7.3 Social networks
7.3.1 Non-independence
7.3.2 Definition of a social network
7.3.3 Properties of social networks
7.3.4 Bipartite networks
7.3.5 Multilayer networks
7.4 Using social networks for recommendation
7.4.1 Social filtering
7.4.2 Extension to use attributes
7.4.3 Remarks
7.5 Experiments
7.5.1 Performance evaluation
7.5.2 Datasets
7.5.3 Analysis of one-mode projected networks
7.5.4 Models evaluated
7.5.5 Results
7.6 Perspectives
7.7 References
Chapter 8 Attributed Networks Partitioning Based on Modularity Optimization
David Combe
Christine Largeron
Baptiste Jeudy
Françoise Fogelman-Soulié
Jing Wang
8.1 Introduction
8.2 Related work
8.3 Inertia based modularity
8.4 I-Louvain
8.5 Incremental computation of the modularity gain
8.6 Evaluation of I-Louvain method
8.6.1 Performance of I-Louvain on artificial datasets
8.6.2 Run-time of I-Louvain
8.7 Conclusion
8.8 References
Part 4 Clustering
Chapter 9 A Novel Clustering Method with Automatic Weighting of Tables and Variables
Rodrigo C. de Araujo
Francisco de Assis Tenorio de Carvalho
Yves Lechevallier
9.1 Introduction
9.2 Related Work
9.3 Definitions, notations and objective
9.3.1 Choice of distances
9.3.2 Criterion W measures the homogeneity of the partition P on the set of tables
9.3.3 Optimization of the criterion W
9.4 Hard clustering with automated weighting of tables and variables
9.4.1 Clustering algorithms MND-W and MND-WT
9.5 Applications: UCI data sets
9.5.1 Application I: Iris plant
9.5.2 Application II: multi-features dataset
9.6 Conclusion
9.7 References
Chapter 10 Clustering and Generalized ANOVA for Symbolic Data Constructed from Open Data
Simona Korenjak-Černe
Nataša Kejžar
Vladimir Batagelj
10.1 Introduction
10.2 Data description based on discrete (membership) distributions
10.3 Clustering
10.3.1 TIMSS – study of teaching approaches
10.3.2 Clustering countries based on age–sex distributions of their populations
10.4 Generalized ANOVA
10.5 Conclusion
10.6 References
List of Authors
Index
Edwin Diday is Emeritus Professor at Paris-Dauphine University-PSL. He helped to introduce the symbolic data analysis paradigm and the dynamic clustering method (opening the path to local models), as well as pyramidal clustering for spatial representation of overlapping clusters.

Rong Guan is Associate Professor at the School of Statistics and Mathematics, Central University of Finance and Economics, Beijing. Her research covers complex and symbolic data analysis and financial distress diagnosis.

Gilbert Saporta is Emeritus Professor at Conservatoire National des Arts et Métiers, France. His current research focuses on functional data analysis and clusterwise and sparse methods. He is Honorary President of the French Statistical Society.

Huiwen Wang is Professor at the School of Economics and Management, Beihang University, Beijing. Her research covers dimension reduction, PLS regression, symbolic data analysis, compositional data analysis, functional data analysis and statistical modeling methods for mixed data.