E-book: Text Mining for Information Professionals: An Uncharted Territory

  • Format: PDF+DRM
  • Publication date: 21-Apr-2022
  • Publisher: Springer Nature Switzerland AG
  • Language: English
  • ISBN-13: 9783030850852
  • Price: 86,44 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on mobile devices (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books. It should not be confused with Adobe Reader, which is most likely already installed on your computer.)

    This e-book cannot be read on an Amazon Kindle.

This book presents a basic theoretical framework for the problems, solutions, and applications of text mining and its various facets, in the practical form of case studies, use cases, and stories.

The book contains 11 chapters with 14 case studies showing 8 different text mining and visualization approaches, and 17 stories. In addition, a website and a GitHub account are maintained for the book. They contain the code, data, and notebooks for the case studies; a summary of all the stories shared by the librarians/faculty; and hyperlinks to open an interactive virtual RStudio/Jupyter Notebook environment. The interactive virtual environment runs case studies based on the R programming language for hands-on practice in the cloud without installing any software.

From understanding different types and forms of data to case studies showing the application of each text mining approach to data retrieved from various resources, this book is a must-read for all library professionals interested in text mining and its application in libraries. This book will also be helpful to archivists, digital curators, and other humanities and social science professionals who want to understand the basic theory behind text data, text mining, and the various tools and techniques available to solve and visualize their research problems.
1 The Computational Library
1.1 Computational Thinking
1.2 Genealogy of Text Mining in Libraries
1.3 What Is Text Mining?
1.3.1 Text Characteristics
1.3.2 Different Text Mining Tasks
1.3.3 Supervised vs. Unsupervised Learning Methods
1.3.4 Cost, Benefits, and Barriers
1.3.5 Limitations
1.4 Case Study: Clustering of Documents Using Two Different Tools
References
2 Text Data and Where to Find Them?
2.1 Data
2.1.1 Digital Trace Data
2.2 Different Types of Data
2.3 Data File Types
2.3.1 Plain Text
2.3.2 CSV
2.3.3 JSON
2.3.4 XML
2.3.5 Binary Files
2.4 Metadata
2.4.1 What Is a Metadata Standard?
2.4.2 Steps to Create Quality Metadata
2.5 Digital Data Creation
2.6 Different Ways of Getting Data
2.6.1 Downloading Digital Data
2.6.2 Downloading Data from Online Repositories
2.6.3 Downloading Data from Relational Databases
2.6.4 Web APIs
2.6.5 Web Scraping/Screen Scraping
References
3 Text Pre-Processing
3.1 Introduction
3.1.1 Level of Text Representation
3.2 Text Transformation
3.2.1 Corpus Creation
3.2.2 Dictionary Creation
3.3 Text Pre-Processing
3.3.1 Case Normalization
3.3.2 Morphological Normalization
3.3.3 Tokenization
3.3.4 Stemming
3.3.5 Lemmatization
3.3.6 Stopwords
3.3.7 Object Standardization
3.4 Feature Engineering
3.4.1 Semantic Parsing
3.4.2 Bag of Words (BOW)
3.4.3 N-Grams
3.4.4 Creation of Matrix
3.4.5 Term Frequency-Inverse Document Frequency (TF-IDF)
3.4.6 Syntactical Parsing
3.4.7 Parts-of-Speech Tagging (POS)
3.4.8 Named Entity Recognition (NER)
3.4.9 Similarity Computation Using Distances
3.4.10 Word Embedding
3.5 Case Study: An Analysis of Tolkien's Books
References
4 Topic Modeling
4.1 What Is Topic Modeling?
4.1.1 Topic Evolution
4.1.2 Application and Visualization
4.1.3 Available Tools and Packages
4.1.4 When to Use Topic Modeling
4.1.5 When Not to Use Topic Modeling
4.2 Methods and Algorithms
4.3 Topic Modeling and Libraries
4.3.1 Use Cases
4.4 Case Study: Topic Modeling of Documents Using Three Different Tools
References
5 Network Text Analysis
5.1 What Is Network Text Analysis?
5.1.1 Two-Mode Networks
5.1.2 Centrality Measures
5.1.3 Graph Algorithms
5.1.4 Comparison of Network Text Analysis with Others
5.1.5 How to Perform Network Text Analysis?
5.1.6 Available Tools and Packages
5.1.7 Applications
5.1.8 Advantages
5.1.9 Limitations
5.2 Topic Maps
5.2.1 Constructs of Topic Maps
5.2.2 Topic Map Software Architecture
5.2.3 Typical Uses
5.2.4 Advantages of Topic Maps
5.2.5 Disadvantages of Topic Maps
5.3 Network Text Analysis and Libraries
5.3.1 Use Cases
5.4 Case Study: Network Text Analysis of Documents Using Two Different R Packages
References
6 Burst Detection
6.1 What Is Burst Detection?
6.1.1 How to Detect a Burst?
6.1.2 Comparison of Burst Detection with Others
6.1.3 How to Perform Burst Detection?
6.1.4 Available Tools and Packages
6.1.5 Applications
6.1.6 Advantages
6.1.7 Limitations
6.2 Burst Detection and Libraries
6.2.1 Use Cases
6.2.2 Marketing
6.2.3 Reference Desk Service
6.3 Case Study: Burst Detection of Documents Using Two Different Tools
References
7 Sentiment Analysis
7.1 What Is Sentiment Analysis?
7.1.1 Levels of Granularity
7.1.2 Approaches for Sentiment Analysis
7.1.3 How to Perform Sentiment Analysis?
7.1.4 Available Tools and Packages
7.1.5 Applications
7.1.6 Advantages
7.1.7 Limitations
7.2 Sentiment Analysis and Libraries
7.2.1 Use Cases
7.3 Case Study: Sentiment Analysis of Documents Using Two Different Tools
References
8 Predictive Modeling
8.1 What Is Predictive Modeling?
8.1.1 Why Use Machine Learning?
8.1.2 Machine Learning Methods
8.1.3 Feature Selection and Representation
8.1.4 Machine Learning Algorithms
8.1.5 Classification Task
8.1.6 How to Perform Predictive Modeling on Text Documents?
8.1.7 Available Tools and Packages
8.1.8 Advantages
8.1.9 Limitations
8.2 Machine Learning and Libraries
8.2.1 Challenges
8.2.2 Use Cases
8.3 Case Study: Predictive Modeling of Documents Using RapidMiner
References
9 Information Visualization
9.1 What Is Information Visualization?
9.1.1 Information Visualization Framework
9.1.2 Data Scale Types
9.1.3 Graphic Variable Types
9.1.4 Types of Datasets
9.1.5 Attribute Semantics
9.1.6 What Is an Appropriate Visual Representation for a Given Dataset?
9.1.7 Graphical Decoding
9.1.8 How Does One Know How Good a Visual Encoding Is?
9.1.9 Main Purpose of Visualization
9.1.10 Modes of Visualization
9.1.11 Methods of Graphic Visualization
9.2 Fundamental Graphs
9.3 Networks and Trees
9.4 Advanced Graphs
9.5 Rules on Visual Design
9.6 Text Visualization
9.7 Document Visualization
9.8 Information Visualization and Libraries
9.8.1 Use Cases
9.8.2 Information Visualization Skills for Librarians
9.8.3 Conclusion
9.9 Case Study: To Build a Dashboard Using R
References
10 Tools and Techniques for Text Mining and Visualization
10.1 Introduction
10.2 Text Mining Tools
10.2.1 R
10.2.2 Topic-Modeling-Tool
10.2.3 RapidMiner
10.2.4 Waikato Environment for Knowledge Analysis (WEKA)
10.2.5 Orange
10.2.6 Voyant Tools
10.2.7 Science of Science (Sci2) Tool
10.2.8 LancsBox
10.2.9 ConText
10.2.10 Overview Docs
10.3 Visualization Tools
10.3.1 Gephi
10.3.2 Tableau Public
10.3.3 Infogram
10.3.4 Microsoft Power BI
10.3.5 Datawrapper
10.3.6 RAWGraphs
10.3.7 WORDij
10.3.8 Palladio
10.3.9 Chart Studio
References
11 Text Data and Mining Ethics
11.1 Text Data Management
11.1.1 Plan
11.1.2 Lifecycle
11.1.3 Citation
11.1.4 Sharing
11.1.5 Need of Data Management for Text Mining
11.1.6 Benefits of Data Management for Text Mining
11.1.7 Ethical and Legal Rules Related to Text Data
11.2 Social Media Ethics
11.2.1 Framework for Ethical Research with Social Media Data
11.3 Ethical and Legal Issues Related to Text Mining
11.3.1 Copyright
11.3.2 License Conditions
11.3.3 Algorithmic Confounding/Biasness
References
Index
Manika Lamba is a Ph.D. candidate at the Department of Library and Information Science, University of Delhi, India. She is currently serving as the Editor-in-Chief of the International Journal of Library and Information Services (IJLIS), the Elected Standing Committee Member for IFLA Science and Technology Libraries Section, and Newsletter Officer for ASIS&T South Asia Chapter. She was Editor-at-large for dh+lib (an ACRL Digital Humanities Interest Group project) and was featured in the "Information Professionals Share their Top Tips for 2019" blog post by the Copyright Clearance Center (CCC). She is an active reviewer for more than 17 international journals, including IEEE Access, Scientometrics, Library Hi-Tech, and the Journal of Information Science. Her scholarship focuses on the intersections of computational social science, social informatics, information retrieval, services, and management.

Margam Madhusudhan is currently working as a Professor in the Department of Library and Information Science, University of Delhi, India. He has worked as Deputy Dean Academics and Member of Academic Council at the University of Delhi. He is a member of many academic bodies and of the editorial boards of national and international LIS journals. He is the recipient of the "Award for Excellence" (Highly Commended) in 2019, Excellence in Research in 2017, and the P.V. Verghese Award in 2013. He has 22 years of teaching, administration, and research experience at the university level.