E-book: Crowdsourced Data Management: Hybrid Machine-Human Computing

  • Format: EPUB+DRM
  • Publication date: 12-Oct-2018
  • Publisher: Springer Verlag, Singapore
  • Language: English
  • ISBN-13: 9789811078477
  • Price: 110,53 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you need to install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer)

    This e-book cannot be read on an Amazon Kindle.

This book provides an overview of crowdsourced data management. Covering all aspects including the workflow, algorithms and research potential, it particularly focuses on the latest techniques and recent advances. The authors identify three key aspects in determining the performance of crowdsourced data management: quality control, cost control and latency control. By surveying and synthesizing a wide spectrum of studies on crowdsourced data management, the book outlines important factors that need to be considered to improve crowdsourced data management. It also introduces a practical crowdsourced-database-system design and presents a number of crowdsourced operators. Self-contained and covering theory, algorithms, techniques and applications, it is a valuable reference resource for researchers and students new to crowdsourced data management with a basic knowledge of data structures and databases.

1 Introduction
1.1 Motivation
1.2 Crowdsourcing Overview
1.3 Crowdsourced Data Management
References
2 Crowdsourcing Background
2.1 Crowdsourcing Overview
2.2 Crowdsourcing Workflow
2.2.1 Workflow from Requester Side
2.2.2 Workflow from Worker Side
2.2.3 Workflow from Platform Side
2.3 Crowdsourcing Platforms
2.3.1 Amazon Mechanical Turk (AMT)
2.3.2 CrowdFlower
2.3.3 Other Platforms
2.4 Existing Surveys, Tutorials, and Books
2.5 Optimization Goal of Crowdsourced Data Management
References
3 Quality Control
3.1 Overview of Quality Control
3.2 Truth Inference
3.2.1 Truth Inference Problem
3.2.2 Unified Solution Framework
3.2.3 Comparisons of Existing Works
3.2.4 Extensions of Truth Inference
3.3 Task Assignment
3.3.1 Task Assignment Setting
3.3.2 Worker Selection Setting
3.4 Summary of Quality Control
References
4 Cost Control
4.1 Overview of Cost Control
4.2 Task Pruning
4.2.1 Difficulty Measurement
4.2.2 Threshold Selection
4.2.3 Pros and Cons
4.3 Answer Deduction
4.3.1 Iterative Workflow
4.3.2 Presentation Order
4.3.3 Pros and Cons
4.4 Task Selection
4.4.1 Model-Driven
4.4.2 Problem-Driven
4.4.3 Pros and Cons
4.5 Sampling
4.5.1 Crowdsourced Aggregation
4.5.2 Data Cleaning
4.5.3 Pros and Cons
4.6 Task Design
4.6.1 User Interface Design
4.6.2 Non-monetary Incentives
4.6.3 Pros and Cons
4.7 Summary of Cost Control
References
5 Latency Control
5.1 Overview of Latency Control
5.2 Single-Task Latency Control
5.2.1 Recruitment Time
5.2.2 Qualification Test Time
5.2.3 Work Time
5.3 Single-Batch Latency Control
5.3.1 Statistical Model
5.3.2 Straggler Mitigation
5.4 Multi-batch Latency Control
5.4.1 Motivation of Multiple Batches
5.4.2 Two Basic Ideas
5.5 Summary of Latency Control
References
6 Crowdsourcing Database Systems and Optimization
6.1 Overview of Crowdsourcing Database Systems
6.2 Crowdsourcing Query Language
6.2.1 CrowdDB
6.2.2 Qurk
6.2.3 Deco
6.2.4 CDAS
6.2.5 CDB
6.3 Crowdsourcing Query Optimization
6.3.1 CrowdDB
6.3.2 Qurk
6.3.3 Deco
6.3.4 CDAS
6.3.5 CDB
6.4 Summary of Crowdsourcing Database Systems
References
7 Crowdsourced Operators
7.1 Crowdsourced Selection
7.1.1 Crowdsourced Filtering
7.1.2 Crowdsourced Find
7.1.3 Crowdsourced Search
7.2 Crowdsourced Collection
7.2.1 Crowdsourced Enumeration
7.2.2 Crowdsourced Fill
7.3 Crowdsourced Join (Crowdsourced Entity Resolution)
7.3.1 Background
7.3.2 Candidate Set Generation
7.3.3 Candidate Set Verification
7.3.4 Human Interface for Join
7.3.5 Other Approaches
7.4 Crowdsourced Sort, Top-k, and Max/Min
7.4.1 Workflow
7.4.2 Pairwise Comparisons
7.4.3 Result Inference
7.4.4 Task Selection
7.4.5 Crowdsourced Max
7.5 Crowdsourced Aggregation
7.5.1 Crowdsourced Count
7.5.2 Crowdsourced Median
7.5.3 Crowdsourced Group By
7.6 Crowdsourced Categorization
7.7 Crowdsourced Skyline
7.7.1 Crowdsourced Skyline on Incomplete Data
7.7.2 Crowdsourced Skyline with Comparisons
7.8 Crowdsourced Planning
7.8.1 General Crowdsourced Planning Query
7.8.2 An Application: Route Planning
7.9 Crowdsourced Schema Matching
Guoliang Li is an associate professor at the Department of Computer Science, Tsinghua University, Beijing, China. His research interests include crowdsourced data management, big spatio-temporal data analytics, and large-scale data cleaning and integration. He has published more than 100 papers at leading conferences and in journals, such as SIGMOD, VLDB, ICDE, SIGKDD, SIGIR, TODS, VLDB Journal, and TKDE. He was a PC co-chair of WAIM 2014, WebDB 2014, and NDBC 2016. He serves as an associate editor for IEEE Transactions on Knowledge and Data Engineering, the VLDB Journal, Big Data Research, and the IEEE Data Engineering Bulletin. He has regularly served as a PC member for several conferences, such as SIGMOD, VLDB, KDD, ICDE, WWW, IJCAI, and AAAI. His papers have been cited more than 4500 times. He received the VLDB 2017 Early Research Contribution Award, the IEEE TCDE Early Career Award 2014, the National Youth Talent Support Program 2016, the Young ChangJiang Scholar 2016, the NSFC Excellent Young Scholars Award 2014, and the CCF Young Scientist Award 2014.

Prof. Michael J. Franklin is the inaugural holder of the Liew Family Chair of Computer Science at the University of Chicago. An authority on databases, data analytics, data management and distributed systems, he also serves as senior advisor to the provost on computation and data science. Most recently he was the Thomas M. Siebel Professor of Computer Science and chair of the Computer Science Division of the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley, where he is currently an adjunct professor. He co-founded and directs Berkeley's Algorithms, Machines and People Laboratory (AMPLab), a leading academic big data analytics research center. The AMPLab won a National Science Foundation CISE "Expeditions in Computing" award, which was announced as part of the White House Big Data Research initiative in March 2012, and has received support from over 30 industrial sponsors. AMPLab has created industry-changing open source big data software including Apache Spark and BDAS, the Berkeley Data Analytics Stack. At Berkeley, Professor Franklin also served as an executive committee member for the Berkeley Institute for Data Science, a campus-wide initiative to advance data science environments. He is a fellow of the Association for Computing Machinery and two-time recipient of the ACM SIGMOD.