E-book: Web Information Retrieval

  • Format: PDF+DRM
  • Price: 55.56 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on mobile devices (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

With the proliferation of huge amounts of (heterogeneous) data on the Web, the importance of information retrieval (IR) has grown considerably over the last few years. Big players in the computer industry, such as Google, Microsoft and Yahoo!, are the primary contributors of technology for fast access to Web-based information; and searching capabilities are now integrated into most information systems, ranging from business management software and customer relationship systems to social networks and mobile phone applications.

Ceri and his co-authors aim at taking their readers from the foundations of modern information retrieval to the most advanced challenges of Web IR. To this end, their book is divided into three parts. The first part addresses the principles of IR and provides a systematic and compact description of basic information retrieval techniques (including binary, vector space and probabilistic models as well as natural language search processing) before focusing on its application to the Web. Part two addresses the foundational aspects of Web IR by discussing the general architecture of search engines (with a focus on the crawling and indexing processes), describing link analysis methods (specifically PageRank and HITS), addressing recommendation and diversification, and finally presenting advertising in search (the main source of revenue for search engines). The third and final part describes advanced aspects of Web search, each chapter providing a self-contained, up-to-date survey on current Web research directions. Topics in this part include meta-search and multi-domain search, semantic search, search in the context of multimedia data, and crowd search.

The book is ideally suited to courses on information retrieval, as it covers all Web-independent foundational aspects. Its presentation is self-contained and does not require prior background knowledge. It can also be used in the context of classic courses on data management, allowing the instructor to cover both structured and unstructured data in various formats. Its classroom use is facilitated by a set of slides, which can be downloaded from www.search-computing.org.

Reviews

From the reviews:

The book covers not only a wide range, but everything that is essential to the topic of Web information retrieval. ... this book is an excellent starting point into the field of Web information retrieval, and can be recommended for classroom use. (Gottfried Vossen, zbMATH, Vol. 1283, 2014)

... this book is a valuable resource for students and instructors in web IR, primarily as a reference to supplement course teaching. Researchers and practitioners should find the book a useful quick reference guide for key concepts, techniques, and recent trends in web IR. (Wingyan Chung, ACM Computing Reviews, July 2014)

Part I Principles of Information Retrieval
1 An Introduction to Information Retrieval
1.1 What Is Information Retrieval?
1.1.1 Defining Relevance
1.1.2 Dealing with Large, Unstructured Data Collections
1.1.3 Formal Characterization
1.1.4 Typical Information Retrieval Tasks
1.2 Evaluating an Information Retrieval System
1.2.1 Aspects of Information Retrieval Evaluation
1.2.2 Precision, Recall, and Their Trade-Offs
1.2.3 Ranked Retrieval
1.2.4 Standard Test Collections
1.3 Exercises
2 The Information Retrieval Process
2.1 A Bird's Eye View
2.1.1 Logical View of Documents
2.1.2 Indexing Process
2.2 A Closer Look at Text
2.2.1 Textual Operations
2.2.2 Empirical Laws About Text
2.3 Data Structures for Indexing
2.3.1 Inverted Indexes
2.3.2 Dictionary Compression
2.3.3 B and B+ Trees
2.3.4 Evaluation of B and B+ Trees
2.4 Exercises
3 Information Retrieval Models
3.1 Similarity and Matching Strategies
3.2 Boolean Model
3.2.1 Evaluating Boolean Similarity
3.2.2 Extensions and Limitations of the Boolean Model
3.3 Vector Space Model
3.3.1 Evaluating Vector Similarity
3.3.2 Weighting Schemes and tf x idf
3.3.3 Evaluation of the Vector Space Model
3.4 Probabilistic Model
3.4.1 Binary Independence Model
3.4.2 Bootstrapping Relevance Estimation
3.4.3 Iterative Refinement and Relevance Feedback
3.4.4 Evaluation of the Probabilistic Model
3.5 Exercises
4 Classification and Clustering
4.1 Addressing Information Overload with Machine Learning
4.2 Classification
4.2.1 Naive Bayes Classifiers
4.2.2 Regression Classifiers
4.2.3 Decision Trees
4.2.4 Support Vector Machines
4.3 Clustering
4.3.1 Data Processing
4.3.2 Similarity Function Selection
4.3.3 Cluster Analysis
4.3.4 Cluster Validation
4.3.5 Labeling
4.4 Application Scenarios for Clustering
4.4.1 Search Results Clustering
4.4.2 Database Clustering
4.5 Exercises
5 Natural Language Processing for Search
5.1 Challenges of Natural Language Processing
5.1.1 Dealing with Ambiguity
5.1.2 Leveraging Probability
5.2 Modeling Natural Language Tasks with Machine Learning
5.2.1 Language Models
5.2.2 Hidden Markov Models
5.2.3 Conditional Random Fields
5.3 Question Answering Systems
5.3.1 What Is Question Answering?
5.3.2 Question Answering Phases
5.3.3 Deep Question Answering
5.3.4 Shallow Semantic Structures for Text Representation
5.3.5 Answer Reranking
5.4 Exercises
Part II Information Retrieval for the Web
6 Search Engines
6.1 The Search Challenge
6.2 A Brief History of Search Engines
6.3 Architecture and Components
6.4 Crawling
6.4.1 Crawling Process
6.4.2 Architecture of Web Crawlers
6.4.3 DNS Resolution and URL Filtering
6.4.4 Duplicate Elimination
6.4.5 Distribution and Parallelization
6.4.6 Maintenance of the URL Frontier
6.4.7 Crawling Directives
6.5 Indexing
6.5.1 Distributed Indexing
6.5.2 Dynamic Indexing
6.5.3 Caching
6.6 Exercises
7 Link Analysis
7.1 The Web Graph
7.2 Link-Based Ranking
7.3 PageRank
7.3.1 Random Surfer Interpretation
7.3.2 Managing Dangling Nodes
7.3.3 Managing Disconnected Graphs
7.3.4 Efficient Computation of the PageRank Vector
7.3.5 Use of PageRank in Google
7.4 Hypertext-Induced Topic Search (HITS)
7.4.1 Building the Query-Induced Neighborhood Graph
7.4.2 Computing the Hub and Authority Scores
7.4.3 Uniqueness of Hub and Authority Scores
7.4.4 Issues in HITS Application
7.5 On the Value of Link-Based Analysis
7.6 Exercises
8 Recommendation and Diversification for the Web
8.1 Pruning Information
8.2 Recommendation Systems
8.2.1 User Profiling
8.2.2 Types of Recommender Systems
8.2.3 Content-Based Recommendation Techniques
8.2.4 Collaborative Filtering Techniques
8.3 Result Diversification
8.3.1 Scope
8.3.2 Diversification Definition
8.3.3 Diversity Criteria
8.3.4 Balancing Relevance and Diversity
8.3.5 Diversification Approaches
8.3.6 Multi-domain Diversification
8.4 Exercises
9 Advertising in Search
9.1 Web Monetization
9.2 Advertising on the Web
9.3 Terminology of Online Advertising
9.4 Auctions
9.4.1 First-Price Auctions
9.4.2 Second-Price Auctions
9.5 Pragmatic Details of Auction Implementation
9.6 Federated Advertising
9.7 Exercises
Part III Advanced Aspects of Web Search
10 Publishing Data on the Web
10.1 Options for Publishing Data on the Web
10.2 The Deep Web
10.3 Web APIs
10.4 Microformats
10.5 RDFa
10.6 Linked Data
10.7 Conclusion and Outlook
10.8 Exercises
11 Meta-search and Multi-domain Search
11.1 Introduction and Motivation
11.2 Top-k Query Processing over Data Sources
11.2.1 OID-Based Problem
11.2.2 Attribute-Based Problem
11.3 Meta-search
11.4 Multi-domain Search
11.4.1 Service Registration
11.4.2 Processing Multi-domain Queries
11.4.3 Exploratory Search
11.4.4 Data Visualization
11.5 Exercises
12 Semantic Search
12.1 Understanding Semantic Search
12.2 Semantic Model
12.3 Resources
12.3.1 System Perspective
12.3.2 User Perspective
12.4 Queries
12.4.1 User Perspective
12.4.2 System Perspective
12.4.3 Query Translation and Presentation
12.5 Semantic Matching
12.6 Constructing the Semantic Model
12.7 Semantic Resources Annotation
12.8 Conclusions and Outlook
12.9 Exercises
13 Multimedia Search
13.1 Motivations and Challenges of Multimedia Search
13.1.1 Requirements and Applications
13.1.2 Challenges
13.2 MIR Architecture
13.2.1 Content Process
13.2.2 Query Process
13.3 MIR Metadata
13.4 MIR Content Processing
13.5 Research Projects and Commercial Systems
13.5.1 Research Projects
13.5.2 Commercial Systems
13.6 Exercises
14 Search Process and Interfaces
14.1 Search Process
14.2 Information Seeking Paradigms
14.3 User Interfaces for Search
14.3.1 Query Specification
14.3.2 Result Presentation
14.3.3 Faceted Search
14.4 Exercises
15 Human Computation and Crowdsearching
15.1 Introduction
15.1.1 Background
15.2 Applications
15.2.1 Games with a Purpose
15.2.2 Crowdsourcing
15.2.3 Human Sensing and Mobilization
15.3 The Human Computation Framework
15.3.1 Phases of Human Computation
15.3.2 Human Performers
15.3.3 Examples of Human Computation
15.3.4 Dimensions of Human Computation Applications
15.4 Research Challenges and Projects
15.4.1 The CrowdSearcher Project
15.4.2 The CUbRIK Project
15.5 Open Issues
15.6 Exercises
References
Index

Stefano Ceri is a professor of Database Systems at the Politecnico di Milano and the director of Alta Scuola Politecnica. He is the recipient of the 2013 SIGMOD Edgar F. Codd Innovation Award for a series of influential contributions to several areas of database management, including distributed databases, rule-based systems, web-based application design, and search computing.

Alessandro Bozzon is an assistant professor of Information Retrieval at the Delft University of Technology. His research is on information management on the Web, with specific focus on Information Retrieval and human- and social-computation.

Marco Brambilla is an assistant professor of Software Engineering at Politecnico di Milano and shareholder at WebRatio. His research is on Web modeling tools and methods, spanning crowdsourcing, social networks, search engines, BPM, SOA and enterprise architectures.

Emanuele Della Valle is an assistant professor of Software Project Management at Politecnico di Milano. His research is on Intelligent Web Information Systems and includes Semantic Web, Search Engines, Data Stream Processing, Rank-aware Databases and Crowdsourcing.

Piero Fraternali is a professor of Web Technologies at Politecnico di Milano, co-inventor of the Web Modeling Language, the basis of the WebRatio tool company and of the recent OMG Interaction Flow Modeling Language (IFML). His research focuses on Web development tools and on social-human computation.

Silvia Quarteroni is a senior consultant at Elca Informatique, Switzerland. She holds a Computer Science PhD on Question Answering systems and her main research interests concern statistical approaches to natural language processing.