E-book: Gene Expression Data Analysis: A Statistical and Machine Learning Perspective [Taylor & Francis e-book]

(Tezpur Univ.), (University of Colorado), (Tezpur Univ.)
  • Format: 360 pages, 42 Tables, black and white; 68 Line drawings, black and white; 2 Halftones, black and white; 70 Illustrations, black and white
  • Publication date: 22-Nov-2021
  • Publisher: Chapman & Hall/CRC
  • ISBN-13: 9780429322655
  • Taylor & Francis e-book
  • Price: 207,73 €*
  • * price that provides access for an unlimited number of simultaneous users for an unlimited time
  • Regular price: 296,75 €
  • You save 30%
"The book introduces phenomenal growth of data generated by increasing numbers of genome sequencing projects and other throughput technology-led experimental efforts. It provides information about various sources of gene expression data, and pre-processing, analysis, and validation of such data"--

The development of high-throughput technologies in molecular biology during the last two decades has contributed to the production of tremendous amounts of data. Microarrays and RNA sequencing are two such widely used high-throughput technologies for monitoring the expression patterns of thousands of genes simultaneously. Data produced from such experiments are voluminous (both in dimensionality and in number of instances) and evolving in nature. Analyzing such huge amounts of data to identify interesting patterns that are relevant to a given biological question requires high-performance computational infrastructure as well as efficient machine learning algorithms. Cross-communication of ideas between biologists and computer scientists remains a big challenge.

Gene Expression Data Analysis: A Statistical and Machine Learning Perspective has been written with a multi-disciplinary audience in mind. The book discusses gene expression data analysis from molecular biology, machine learning and statistical perspectives. Readers will be able to acquire both theoretical and practical knowledge of methods for identifying novel patterns of high biological significance. To measure the effectiveness of such algorithms, we discuss statistical and biological performance metrics that can be used in real life or in a simulated environment. This book discusses a large number of benchmark algorithms, tools, systems and repositories that are commonly used in analyzing gene expression data and validating results. This book will benefit students, researchers and practitioners in biology, medicine, and computer science by enabling them to acquire in-depth knowledge of statistical and machine learning based methods for analyzing gene expression data.

Key features:

  • An introduction to the Central Dogma of molecular biology and information flow in biological systems.
  • A systematic overview of the methods for generating gene expression data.
  • Background knowledge on statistical modeling and machine learning techniques.
  • A detailed methodology for analyzing gene expression data, with an example case study.
  • Clustering methods for finding co-expression patterns from microarray, bulk RNA and scRNA data.
  • A large number of practical tools, systems and repositories that are useful for computational biologists to create, analyze and validate biologically relevant gene expression patterns.
  • Suitable for multi-disciplinary researchers and practitioners in computer science and biological sciences.


Acknowledgements
Authors
Preface
1 Introduction
1.1 Introduction
1.2 Central Dogma
1.3 Measuring Gene Expression
1.4 Representation of Gene Expression Data
1.5 Gene Expression Data Analysis: Applications
1.6 Machine Learning
1.7 Statistical and Biological Evaluation
1.8 Gene Expression Analysis Approaches
1.8.1 Preprocessing in Microarray and RNAseq Data
1.8.2 Co-Expressed Pattern-Finding Using Machine Learning
1.8.3 Co-Expressed Pattern-Finding Using Network-Based Approaches
1.9 Differential Co-Expression Analysis
1.10 Differential Expression Analysis
1.11 Tools and Systems for Gene Expression Data Analysis
1.11.1 (Diff) Co-Expression Analysis Tools and Systems
1.11.2 Differential Expression Analysis Tools and Systems
1.12 Contribution of This Book
1.13 Organization of This Book
2 Information Flow in Biological Systems
2.1 Concept of Systems Theory
2.1.1 A Brief History of Systems Thinking
2.1.2 Areas of Application of Systems Theory in Biology
2.2 Complexity in Biological Systems
2.2.1 Hierarchical Organization of Biological Systems from Macroscopic Levels to Microscopic Levels
2.2.2 Information Flow in Biological Systems
2.2.3 Top-Down and Bottom-Up Flow
2.3 Central Dogma of Molecular Biology
2.3.1 DNA Replication
2.3.2 Transcription
2.3.3 Translation
2.4 Ambiguity in Central Dogma
2.4.1 Reverse Transcription
2.4.2 RNA Replication
2.5 Discussion
2.5.1 Biological Information Flow from a Computer Science Perspective
2.5.2 Future Perspective
3 Gene Expression Data Generation
3.1 History of Gene Expression Data Generation
3.2 Low-Throughput Methods
3.2.1 Northern Blotting
3.2.2 Ribonuclease Protection Assay
3.2.3 qRT-PCR
3.2.4 SAGE
3.3 High-Throughput Methods
3.3.1 Microarray
3.3.2 RNA-Seq
3.3.3 Types of RNA-Seq
3.3.4 Gene Expression Data Repositories
3.3.5 Standards in Gene Expression Data
3.4 Chapter Summary
4 Statistical Foundations and Machine Learning
4.1 Introduction
4.2 Statistical Background
4.2.1 Statistical Modeling
4.2.2 Probability Distributions
4.2.3 Hypothesis Testing
4.2.4 Exact Tests
4.2.5 Common Data Distributions
4.2.6 Multiple Testing
4.2.7 False Discovery Rate
4.2.8 Maximum Likelihood Estimation
4.3 Machine Learning Background
4.3.1 Significance of Machine Learning
4.3.2 Machine Learning and Its Types
4.3.3 Supervised Learning Methods
4.3.4 Unsupervised Learning Methods
4.3.5 Outlier Mining
4.3.6 Association Rule Mining
4.4 Chapter Summary
4.4.1 Statistical Modeling
4.4.2 Supervised Learning: Classification and Regression Analysis
4.4.3 Proximity Measures
4.4.4 Unsupervised Learning: Clustering
4.4.5 Unsupervised Learning: Biclustering
4.4.6 Unsupervised Learning: Triclustering
4.4.7 Outlier Mining
4.4.8 Unsupervised Learning: Association Mining
5 Co-Expression Analysis
5.1 Introduction
5.2 Gene Co-Expression Analysis
5.2.1 Types of Gene Co-Expression
5.2.2 An Example
5.3 Measures to Identify Co-Expressed Patterns
5.4 Co-Expression Analysis Using Clustering
5.4.1 CEA Using Clustering: A Generic Architecture
5.4.2 Co-Expressed Pattern Finding Using 1-Way Clustering
5.4.3 Subspace or 2-way Clustering in Co-Expression Mining
5.4.4 Co-Expressed Pattern-Finding Using 3-Way Clustering
5.5 Network Analysis for Co-Expressed Pattern-Finding
5.5.1 Definition of CEN
5.5.2 Analyzing CENs: A Generic Architecture
5.6 Chapter Summary and Recommendations
6 Differential Expression Analysis
6.1 Introduction
6.1.1 Importance of DE Analysis
6.2 Differential Expression (DE) of a Gene
6.2.1 Differential Expression of a Gene: An Example
6.3 Differential Expression Analysis (DEA)
6.3.1 A Generic Framework
6.3.2 Preprocessing
6.3.3 DE Genes Identification
6.3.4 DE Gene Analysis
6.3.5 Statistical Validation
6.3.6 Discussion
6.4 Biomarker Identification Using DEA: A Case Study
6.4.1 Problem Definition
6.4.2 Dataset Used
6.4.3 Preprocessing
6.4.4 Framework of Analysis Used
6.4.5 Results
6.4.6 Discussion
6.5 Summary and Recommendations
7 Tools and Systems
7.1 Introduction
7.1.1 Generic Characteristics of a Systems Biology Tool
7.1.2 Target Systems Biology Activities
7.2 Systems Biology Tools
7.2.1 A Taxonomy
7.2.2 Pre-Processing Tools
7.3 Gene Expression Data Analysis Tools
7.3.1 Co-Expression Analysis
7.3.2 Differential Co-Expression Analysis
7.3.3 Differential Expression Analysis
7.4 Visualization
7.5 Validation
7.5.1 Statistical Validation
7.6 Biological Validation
7.7 Chapter Summary and Concluding Remarks
8 Concluding Remarks and Research Challenges
8.1 Concluding Remarks
8.2 Some Issues and Research Challenges
Bibliography
Glossary
Index
Pankaj Barah is an Assistant Professor in Molecular Biology and Biotechnology at Tezpur University. He received his M.Sc. degree in Bioinformatics (2006) from the University of Madras in India and his PhD in Computational Systems Biology (2013) from the Norwegian University of Science and Technology (NTNU), Trondheim, Norway. He worked as a bioinformatics scientist in the Division of Theoretical Bioinformatics at the German Cancer Research Center (DKFZ) in Heidelberg, Germany, during 2015-2017. His research areas include computational systems biology, bioinformatics, evolutionary systems biology, next-generation sequencing (NGS), big data analytics and biological networks. He has authored 20 research articles, edited two books and written 5 book chapters. He is a recipient of the Ramalingaswami Re-entry Fellowship from the Department of Biotechnology, Government of India. Dr. Barah is currently a member of the Indian National Young Academy of Sciences.

Dhruba Kumar Bhattacharyya is a Professor in Computer Science and Engineering at Tezpur University, where he teaches machine learning, network security, cryptography and computational biology in UG, PG and PhD classes. Professor Bhattacharyya's research areas include machine learning, network security, and bioinformatics. He has published more than 280 research articles in leading international journals and peer-reviewed conference proceedings. Dr. Bhattacharyya has authored 5 technical reference books and edited 9 technical volumes. Under his guidance, twenty students have successfully completed their Ph.D. degrees in the areas of machine learning, bioinformatics and network security. He is the PI of several major research grants, including the Centre of Excellence of the Ministry of HRD of the Government of India under FAST, instituted at Tezpur University. Professor Bhattacharyya is a Fellow of IETE and IE, India. He is also a Senior Member of IEEE. More details about Dr Bhattacharyya can be found at http://agnigarh.tezu.ernet.in/_dkb/index.html.

Jugal Kumar Kalita teaches computer science at the University of Colorado, Colorado Springs. He received M.S. and Ph.D. degrees in computer and information science from the University of Pennsylvania in Philadelphia in 1988 and 1990, respectively. Before that, he received an M.Sc. from the University of Saskatchewan in Saskatoon, Canada, in 1984 and a B.Tech. from the Indian Institute of Technology, Kharagpur, in 1982. His expertise is in artificial intelligence and machine learning, and in the application of machine learning techniques to network security, natural language processing and bioinformatics. He has published 130 papers in journals and refereed conferences. He is the author of a book on Perl titled "On Perl: Perl for Students and Professionals". He is also a coauthor of a book titled "Network Anomaly Detection: A Machine Learning Perspective" with Dr Dhruba K Bhattacharyya. He received the Chancellor's Award at the University of Colorado, Colorado Springs, in 2011, in recognition of lifelong excellence in teaching, research and service. More details about Dr. Kalita can be found at http://www.cs.uccs.edu/_kalita.