Please feel free to comment on my previous post about data contamination on MTurk below.
23 thoughts on “Comments on MTurk Data integrity.”
Nathan Favero
Thanks for sharing this. Very helpful. Two quick thoughts (one of which I mentioned briefly on Twitter already):
1. If doing experiments on MTurk, I would expect that having some bots/repeat respondents in your data should generally bias estimates of treatment effects towards 0. Thus, tests of treatment effects will tend to be conservative. This is definitely something to be aware of, and if we care about precisely estimating effect sizes (e.g., comparing to effect sizes from other studies), we should acknowledge that our estimates are probably biased towards zero. This isn’t all that different from a case where there’s (simple) measurement error. (A small simulation sketch of this attenuation appears after this list.)
2. This isn’t so different from the problem of inattentive subjects, except for the scale of it. Good manipulation checks should catch this kind of thing, from what I can tell.
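A minimal simulation sketch of the attenuation point in (1) above, assuming bot responses are pure noise that ignores treatment assignment; the sample sizes, effect size, and function name are illustrative and not from the original comment.

```python
# Sketch: random "bot" responses added to a two-arm experiment shrink the
# estimated treatment effect toward zero, much like classical measurement error.
import numpy as np

rng = np.random.default_rng(0)

def estimated_effect(n_real=1000, n_bots=0, true_effect=0.5):
    n = n_real + n_bots
    treat = rng.integers(0, 2, n)                         # random assignment
    outcome = treat * true_effect + rng.normal(0, 1, n)   # real respondents
    bot_rows = rng.choice(n, n_bots, replace=False)
    outcome[bot_rows] = rng.normal(0, 1, n_bots)          # bots ignore treatment
    return outcome[treat == 1].mean() - outcome[treat == 0].mean()

print("no bots: ", np.mean([estimated_effect() for _ in range(200)]))            # ~0.50
print("20% bots:", np.mean([estimated_effect(n_bots=250) for _ in range(200)]))  # ~0.40
```

With roughly 20% of responses coming from bots, the difference-in-means estimate shrinks by about that fraction, even though the true effect among real respondents is unchanged.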
billy
Too bad you did not post the original qualifications. It is obvious you are skilled with statistics, but have little experience on MTurk.
It is quite a broad stroke to say MTurk gives you bad data when you did not solicit high-quality participants to begin with.
Sean Dennis
My co-authors and I have just posted the following working paper to SSRN that investigates the root cause of this issue. Importantly, we find no evidence of bots.
https://ssrn.com/abstract=3233954
Thanks for posting this Sean! Looks like a helpful addition to the conversation.
I think some of the parties in the conversation might be talking past each other when it comes to “bots.” For some people, “bots” seems to mean completely automated responses. For other people, “bots” means something more like computer-assisted or subsidized responses.
I look at responses that have identical text and that came in *at the same time,* and I think, “Look! A bot!” Maybe that’s a misuse of the term, but what I really mean is that it’s clear that one entity is filling out the survey multiple times–and in a way such that computers are helping the effort. Maybe that’s having two browser windows open simultaneously–each logged in from a different account–and using copy/paste to reproduce the same text. And maybe you wouldn’t call that kind of activity a bot. But in any event, it’s fraudulent and damaging to the dataset.
In your paper, you also note receiving verbatim repeated text in open-ended responses, though I missed it if you said whether any of these came in simultaneously. But I think we’re looking at the same phenomenon here, and I think it’s clear that computers are helping a single individual fill out a survey thoughtlessly, and multiple times. Whether that counts as a “bot” or not might just be a matter of semantics.
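A minimal sketch of the kind of check described above (verbatim-identical open-ended text arriving at nearly the same time), assuming a pandas DataFrame with hypothetical columns open_text, start_time, and worker_id; none of these names or the file name come from the original post.

```python
# Hypothetical sketch: flag identical open-ended answers that were
# submitted within a short window of each other.
import pandas as pd

def flag_duplicate_bursts(df, window="5min"):
    df = df.sort_values("start_time")
    suspicious = []
    for text, grp in df.groupby("open_text"):
        if len(grp) < 2:
            continue
        gaps = grp["start_time"].diff().dropna()
        if (gaps <= pd.Timedelta(window)).any():   # same text, near-simultaneous
            suspicious.append(grp)
    return pd.concat(suspicious) if suspicious else df.iloc[0:0]

# Example usage (placeholder file and column names):
# df = pd.read_csv("responses.csv", parse_dates=["start_time"])
# print(flag_duplicate_bursts(df)[["worker_id", "start_time", "open_text"]])
```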
Tim, love the analysis. I run Lucid which is the largest marketplace for survey responses.
Fraudulent responses via bots or bot-like behavior started in earnest in the summer of 2016. MTurk is just one source in the highly active survey sample marketplace. As systems have become more automated and programmatic, fraud at scale has become more profitable. Simply put: we saw the peak of this two years ago. I’m slightly surprised it took this long to show up in MTurk responses.
What did we have to do to combat it? 1) spend millions of dollars on fraud detection and security software, 2) build technical integrations between sample sources and survey software, and 3) train buyers and sellers of respondents. There’s no silver bullet; it’s a long grind to reduce fraud and error in the survey process. Over two quarters we were able to cut the fraud rate in half.
I’m passionate about quality of responses. Check out more info here: https://luc.id/quality/
swiv
If your university is in the habit of offering particularly well-paying surveys, you can also be assured that approximately the same 50 people are taking part in a good number of them:
https://www.reddit.com/r/mturk/comments/98ko4r/this_automatic_accept_script/
https://www.reddit.com/r/mturk/comments/98j1d6/turkerhub_has_a_secret_script_members_pay_for/
Looking now at some old data where I had quality problems: participants with duplicate locations were more likely to fail the open-ended questions. Also, my age question asked respondents to select their birth year from a drop-down menu, and interestingly there were disproportionate answers for 1987, 1988, and 1989 (so around the age of 30, as you found). Most of the text fails are versions of “good” and “nice”.
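A minimal sketch of those two checks (duplicate locations and pile-ups on particular birth years), assuming hypothetical latitude, longitude, and birth_year columns and a placeholder file name:

```python
# Hypothetical sketch of the two data-quality checks described above.
import pandas as pd

df = pd.read_csv("responses.csv")  # placeholder file name

# 1) Responses that share an exact latitude/longitude pair with another response
dupe_loc = df[df.duplicated(subset=["latitude", "longitude"], keep=False)]
print(f"{len(dupe_loc)} responses share a location with at least one other response")

# 2) Birth-year distribution from the drop-down; a spike on a few adjacent years
#    (e.g., 1987-1989) suggests careless or scripted answering
print(df["birth_year"].value_counts().head(10))
```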
Thank you for the detailed information.
Good reading. Nice article
For AWS training, the best training institutes are available in Hyderabad, with expert faculty.
Thank you for the nice and informative blog. Education helps develop personal skills and the capability to survive in the world.
The blog is great, nice info.
thanks for sharing!
I want to say thanks to you. I have bookmarked your site for future updates.
r for data science
thanks for sharing, really very useful
Thank you for sharing this content, it was really useful.
Thanks for the thoughts. I totally agree!
Will share.
Nice post…keep sharing with us
Nice post…keep sharing with us
Very nice blog and content, thanks for sharing your knowledge and wisdom with us,
best regards
Alex from Vender Carro
take care
Excellent analysis.
Do you have an update on this research? I think you and Yanna did a great job!
thanks