CROWDSOURCING AND ITS APPLICATIONS ON SCIENTIFIC RESEARCH

Crowdsourcing = Crowd + Outsourcing “soliciting solutions via open calls to large-scale communities”

Some Examples
  • Call for professional help
  • Award 50,000 to 1,000 for each task
  • Office work platform
  • Microtask platform
  • Over 30,000 tasks at the same time

What Tasks are crowdsourceable?

Software Development Reward: 25,000 USD

Data Entry Reward: 4.4 USD/hour

Image Tagging Reward: 0.04 USD

Trip Advice Reward: points on Yahoo! Answers

The impact of crowdsourcing on scientific research?

Amazon Mechanical Turk
  • A micro-task marketplace
  • Task prices are usually between 0.01 and 1 USD
  • Easy-to-use interface

Amazon Mechanical Turk
  • Human Intelligence Task (HIT): tasks that are hard for computers
  • Developer: prepay the money, publish HITs, get results
  • Worker: complete the HITs, get paid

Who are the workers?

A Survey of Mechanical Turk
  • Survey of 1000 Turkers (Turk workers)
  • Two identical surveys (Oct. 2008 and Dec. 2008)
  • Consistent results
  • Blog post: A Computer Scientist in a Business School

Survey results (figure panels): Gender, Education, Age, Annual Income

Compare with Internet Demographics
  • Uses data from comScore
  • In summary, Turkers are
    • younger: 51% are 21-35 years old vs. 22% of Internet users
    • mainly female: 70% female vs. 50% female
    • lower income: 65% of Turkers have income < 60k/year vs. 45% of Internet users
    • in smaller families: 55% of Turkers have no children vs. 40% of Internet users

How Much Do Turkers Earn?

Why Do Turkers Turk?

Research Applications

Dataset Collection
  • Datasets are important in computer science!
  • In multimedia analysis
    • Is there X in the image?
    • Where is Y in the image?
  • In natural language processing
    • What is the emotion of this sentence?
  • And in lots of other applications

Dataset Collection
  • Utility Annotation, by Sorokin and Forsyth at UIUC
  • Image analysis
    • Type keyword
    • Select examples
    • Click on landmarks
    • Outline figures

0.01 USD/task

0.02 USD/task

0.01 USD/task

0.01 USD/task

Dataset Collection
  • Linguistic annotations (Snow et al. 2008)
  • Word similarity: USD 0.2 to label 30 word pairs

Dataset Collection
  • Linguistic annotations (Snow et al. 2008)
  • Affect recognition: USD 0.4 to label 20 headlines (140 labels)
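
To make the prices on the linguistic annotation slides easier to compare, the short calculation below converts them to a cost per labeled unit. It takes the slide figures at face value (USD 0.2 for 30 word pairs, USD 0.4 for 140 labels); the dictionary layout and variable names are illustrative assumptions, not part of the original study.

```python
# Figures as stated on the slides: (reward in USD, number of labeled units).
# The task names and this data layout are illustrative assumptions.
tasks = {
    "word similarity": (0.2, 30),      # 30 word pairs
    "affect recognition": (0.4, 140),  # 140 labels over 20 headlines
}

# Cost of one labeled unit for each task.
cost_per_unit = {name: reward / units for name, (reward, units) in tasks.items()}

for name, cost in cost_per_unit.items():
    print(f"{name}: {cost:.4f} USD per labeled unit")
```

Under these figures, both tasks cost well under one cent per labeled unit.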

Dataset Collection
  • Linguistic annotations (Snow et al. 2008)
  • Textual entailment
    • If “Microsoft was established in Italy in 1985”, then “Microsoft was established in 1985”?
  • Word sense disambiguation
    • “a bass on the line” vs. “a funky bass line”
  • Temporal annotation
    • Ran happens before fell: “The horse ran past the barn fell”

Dataset Collection
  • Document relevance evaluation: Alonso et al. (2008)
  • User rating collection: Kittur et al. (2008)
  • Noun compound paraphrasing: Nakov (2008)
  • Name resolution: Su et al. (2007)

Data Characteristics
  • Cost?
  • Efficiency?
  • Quality?

Cost and Efficiency: in image annotation (Sorokin and Forsyth, 2008)

Cost and Efficiency: in linguistic annotation (Snow et al., 2008)

Cheap and fast! Is it good?

Quality
  • Multiple non-experts can beat experts (Chinese proverb: "three cobblers together outdo one Zhuge Liang")
  • Black line: agreement among Turkers
  • Green line: single expert
  • Golden result: agreement among multiple experts
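
One simple way to combine several non-expert answers into a single label is a majority vote, sketched below. The slide does not say which aggregation rule was used, so majority voting is an assumption here, and the items, labels, and gold answers are made up for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical data: five non-expert labels per item.
turker_labels = {
    "item1": ["pos", "pos", "neg", "pos", "neg"],
    "item2": ["neg", "neg", "neg", "pos", "neg"],
    "item3": ["pos", "neg", "pos", "pos", "pos"],
}
# Hypothetical gold standard (e.g., agreement among multiple experts).
gold = {"item1": "pos", "item2": "neg", "item3": "pos"}

aggregated = {item: majority_vote(votes) for item, votes in turker_labels.items()}
accuracy = sum(aggregated[item] == gold[item] for item in gold) / len(gold)
```

In this toy data every item has at least one wrong individual label, yet the aggregated labels all match the gold standard.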

In addition to Dataset Collection

QoE Measurement
  • QoE (Quality of Experience): subjective measure of user perception
  • Traditional approach: user studies by MOS ratings (Bad -> Excellent)
  • Crowdsourcing with paired comparison
    • Diverse user input
    • Easy to understand
    • Interval-scale scores can be calculated
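
The slides do not name the model used to turn paired comparisons into interval-scale scores. One common choice for this kind of data is the Bradley-Terry model, sketched below as an assumption, fitted with a simple iterative (minorization-maximization) update. The stimuli and win counts are hypothetical.

```python
import math


def bradley_terry(wins, iterations=200):
    """Fit Bradley-Terry strengths from a matrix of pairwise win counts.

    wins[i][j] is how often stimulus i was preferred over stimulus j.
    Returns log-strengths, which serve as interval-scale scores.
    Assumes every stimulus wins at least one comparison.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iterations):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom)
        norm = sum(new_p)
        p = [value / norm for value in new_p]  # keep strengths normalized
    return [math.log(value) for value in p]


# Hypothetical votes for three audio clips; each pair was compared 10 times.
stimuli = ["high bit rate", "medium bit rate", "low bit rate"]
wins = [
    [0, 7, 9],
    [3, 0, 8],
    [1, 2, 0],
]
scores = bradley_terry(wins)
```

Only differences between scores are meaningful: a larger gap between two stimuli means a stronger preference for one over the other.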

Acoustic QoE Evaluation

Acoustic QoE Evaluation
  • Which one is better?
  • Simple paired comparison

Optical QoE Evaluation

Interactive QoE Evaluation

Acoustic QoE
  • MP3 compression rate
  • VoIP loss rate

Optical QoE
  • Video codec
  • Packet loss rate

Iterative Task

Iterative Tasks
  • TurKit: tools for iterative tasks on MTurk
  • Imperative programming paradigm
  • Basic elements
    • Variable (a = b)
    • Control (if-else statement)
    • Loop (for, while statements)
  • Turns MTurk into a programming platform that integrates human brainpower

Iterative Text Improvement
  • A Wikipedia-like scenario
  • One Turker improves the text
  • Other Turkers vote on whether the improvement is valid

Iterative Text Improvement: Image description
  • Instructions for the improve-HIT
    • Please improve the description for this image
    • People will vote whether to approve your changes
    • Use no more than 500 characters
  • Instructions for the vote-HIT
    • Please select the better description for this image
    • Your vote must agree with the majority to be approved
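
The improve-then-vote cycle can be written as an ordinary loop, which is the idea behind treating MTurk as a programming platform. The sketch below is not TurKit code and does not call MTurk: `iterative_improve`, `fake_improve`, and `fake_vote` are hypothetical names, and the two worker functions are local stand-ins for publishing an improve-HIT and a vote-HIT.

```python
def iterative_improve(text, improve, vote, iterations=4, voters=3, max_chars=500):
    """Alternate one improve step with a majority vote on the result."""
    for _ in range(iterations):
        # Stand-in for an improve-HIT; enforce the 500-character limit.
        candidate = improve(text)[:max_chars]
        # Stand-in for vote-HITs; each vote is True if the candidate is better.
        approvals = sum(vote(text, candidate) for _ in range(voters))
        if approvals * 2 > voters:  # keep the change only on a majority
            text = candidate
    return text


# Hypothetical simulated workers, used only to make the sketch runnable.
details = iter(["a calculator,", "a pen,", "and six coins."])


def fake_improve(text):
    """Append the next detail, or return the text unchanged when none remain."""
    return (text + " " + next(details, "")).strip()


def fake_vote(old, new):
    """Prefer the longer description (a crude stand-in for human judgment)."""
    return len(new) > len(old)


result = iterative_improve("A photo of", fake_improve, fake_vote)
```

With these stand-ins the first three rounds are accepted and the fourth, which changes nothing, is voted down.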

Iterative Text Improvement: Image description
  • A partial view of a pocket calculator together with some coins and a pen.
  • A view of personal items a calculator, and some gold and copper coins, and a round tip pen, these are all pocket and wallet sized item used for business, writing, calculating prices or solving math problems and purchasing items.
  • A close-up photograph of the following items: * A CASIO multi-function calculator * A ball point pen, uncapped * Various coins, apparently European, both copper and gold
  • …Various British coins; two of £1 value, three of 20p value and one of 1p value. …

Iterative Text Improvement: Image description
  • A close-up photograph of the following items: A CASIO multi-function, solar powered scientific calculator. A blue ball point pen with a blue rubber grip and the tip extended. Six British coins; two of £1 value, three of 20p value and one of 1p value. Seems to be a theme illustration for a brochure or document cover treating finance - probably personal finance.

Iterative Text Improvement: Handwriting Recognition
  • Version 1: You (?) (work). (?) work (not) (time). I (?) a few grammatical mistakes. Overall your writing style is a bit too (phoney). You do (?) have good (points), but they got lost amidst the (writing). (signature)

Iterative Text Improvement: Handwriting Recognition
  • Version 6: “You (misspelled) (several) (words). Please spell-check your work next time. I also notice a few grammatical mistakes. Overall your writing style is a bit too phoney. You do make some good (points), but they got lost amidst the (writing). (signature)”

Cost and Efficiency

More on Methodology

Repeated Labeling
  • Crowdsourcing -> multiple imperfect labelers
    • Each worker is a labeler
    • Labels are not always correct
  • Repeated labeling
    • Improves supervised induction
    • Increases the single-label accuracy
    • Decreases the cost of acquiring training data

Repeated Labeling
  • Repeated labeling helps improve the overall quality when the accuracy of a single labeler is low.
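
A back-of-the-envelope calculation shows why repeating labels can help. The sketch below assumes binary labels, independent labelers who are each correct with the same probability p, an odd number of labels per item, and aggregation by majority vote; none of these assumptions is stated on the slide.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a majority of n independent labelers is correct.

    Each labeler is correct with probability p; n is assumed to be odd.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Accuracy of the majority label for a noisy and a reliable labeler pool.
for p in (0.6, 0.9):
    for n in (1, 3, 5):
        print(f"p={p}, n={n}: {majority_accuracy(p, n):.3f}")
```

With p = 0.6 the majority label improves steadily as labels are added, while with p = 0.9 a single label is already close to the ceiling. Under this model, labelers no better than chance (p = 0.5) gain nothing from repetition.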

Selected Repeated Labeling
  • Repeat-label the most uncertain points
  • Label uncertainty (LU)
    • Whether the label distribution is stable
    • Calculated from a beta distribution
  • Model uncertainty (MU)
    • Whether the model has high confidence in the label
    • Calculated from model predictions
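
The slide only says that label uncertainty is calculated from a beta distribution. One plausible reading, sketched below as an assumption, is to put a uniform prior on the true positive rate of an item, form the posterior Beta(pos + 1, neg + 1) from its observed labels, and score the item by the posterior mass on the minority side of 0.5. The function names and the example counts are hypothetical.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def label_uncertainty(pos, neg):
    """Posterior mass on the minority side of 0.5 (0 = certain, 0.5 = coin flip)."""
    cdf_half = beta_cdf_int(0.5, pos + 1, neg + 1)
    return min(cdf_half, 1 - cdf_half)


# Hypothetical (positive, negative) label counts collected so far.
counts = {"a": (5, 0), "b": (3, 3), "c": (2, 1)}

# Items to relabel first: most uncertain at the front.
ranked = sorted(counts, key=lambda item: label_uncertainty(*counts[item]), reverse=True)
```

Here the evenly split item "b" is relabeled first and the unanimous item "a" last.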

Selected Repeated Labeling
  • Selected repeated labeling improves the overall quality of the crowdsourcing approach.
  • GRR: no selected repeated labeling
  • MU: model uncertainty
  • LU: label uncertainty
  • LMU: integrates label and model uncertainty

Incentive vs. Performance
  • High financial incentive -> high performance?
  • User studies (Mason and Watts 2009)
    • Order images (e.g., choose the busiest image)
    • Solve word puzzles

Incentive vs. Performance High incentive -> high quantity, not high quality

Incentive vs. Performance
  • How much workers think they deserve
    • Workers always want more
    • Users are influenced by the amount they are paid
  • Pay little at first, and incrementally increase the payment

Conclusion
  • Crowdsourcing provides a new paradigm and a new platform for computer science research.
  • New applications, new methodologies, and new businesses are quickly developing with the aid of crowdsourcing.