23 thoughts on “Comments on MTurk Data integrity.”

  1. Nathan Favero

    Thanks for sharing this. Very helpful. Two quick thoughts (one of which I mentioned briefly on Twitter already):

    1. If doing experiments on MTurk, I would expect that having some bots/repeat respondents in your data should generally bias estimates of treatment effects towards 0. Thus, tests of treatment effects will tend to be conservative. This is definitely something to be aware of, and if we care about precisely estimating effect sizes (e.g., comparing to effect sizes from other studies), we should acknowledge that our estimates are probably biased towards zero. This isn’t all that different from a case where there’s (simple) measurement error.

    2. This isn’t so different from the problem of inattentive subjects, except for the scale of it. Good manipulation checks should catch this kind of thing, from what I can tell.
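
The attenuation argument in point 1 can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions, not anything from the original post: it assumes a share `bot_share` of respondents in each arm answer without regard to treatment (so their mean response is the same in both arms), and that the remaining respondents show the true effect. The function name and parameters are hypothetical.

```python
def attenuated_effect(true_effect: float, bot_share: float) -> float:
    """Expected difference in means when some respondents ignore treatment.

    Assumptions (illustrative only):
      - a fraction `bot_share` of each arm gives responses unrelated to
        treatment, so their contribution cancels across arms;
      - the remaining (1 - bot_share) show `true_effect`.
    Under those assumptions:
      E[Y | treated] - E[Y | control] = (1 - bot_share) * true_effect
    """
    if not 0.0 <= bot_share <= 1.0:
        raise ValueError("bot_share must be between 0 and 1")
    return (1.0 - bot_share) * true_effect


# Example: a true effect of 0.5 with 20% treatment-blind respondents
# is expected to show up as roughly 0.4 in the observed data.
observed = attenuated_effect(0.5, 0.2)
```

The estimate shrinks toward zero in proportion to the contaminated share, which is why significance tests stay conservative while effect-size comparisons across studies become harder to interpret.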

  2. billy

    Too bad you did not post the original qualifications. It is obvious you are skilled with statistics, but have little experience on MTurk.
    It is quite a broad stroke to say MTurk gives you bad data when you did not solicit high quality participants to begin with.

  3. Timothy Ryan Post author

    Thanks for posting this Sean! Looks like a helpful addition to the conversation.

    I think some of the parties in the conversation might be talking past each other as concerns “bots.” For some people, “bots” seems to mean “completely automated responses.” For other people, “bots” means something more like computer-assisted or subsidized responses.

    I look at responses that have identical text and that came in *at the same time,* and I think, “Look! A bot!” Maybe that’s a misuse of the term, but what I really mean is that it’s clear that one entity is filling out the survey multiple times–and in a way such that computers are helping the effort. Maybe that’s having two browser windows open simultaneously–each logged in from a different account–and using copy/paste to reproduce the same text. And maybe you wouldn’t call that kind of activity a bot. But in any event, it’s fraudulent and damaging to the dataset.

    In your paper, you also note receiving verbatim repeated text in open-ended responses, though I missed it if you said whether any of these came in simultaneously. But I think we’re looking at the same phenomenon here, and I think it’s clear that computers are helping a single individual fill out a survey thoughtlessly, and multiple times. Whether that counts as a “bot” or not might just be a matter of semantics.
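
The pattern described above (identical open-ended text arriving at nearly the same time) can be screened for mechanically. What follows is a minimal sketch, not the method used in the post or the paper: the input layout `(respondent_id, submitted_at_seconds, text)`, the function name, and the 10-minute window are all assumptions chosen for illustration.

```python
from collections import defaultdict


def flag_simultaneous_duplicates(responses, window_seconds=600):
    """Return sorted ids of respondents whose open-ended text matches
    another response submitted within `window_seconds` of it.

    `responses` is an iterable of (respondent_id, submitted_at, text),
    with `submitted_at` in seconds. Both the layout and the default
    window are hypothetical choices, not taken from the source.
    """
    by_text = defaultdict(list)
    for respondent_id, submitted_at, text in responses:
        # Normalize case and whitespace so trivial variations still match.
        key = " ".join(text.lower().split())
        if key:  # skip empty answers
            by_text[key].append((submitted_at, respondent_id))

    flagged = set()
    for group in by_text.values():
        group.sort()  # order by submission time
        # Comparing neighbors in time order is enough to catch any pair
        # that falls inside the window.
        for (t_prev, id_prev), (t_next, id_next) in zip(group, group[1:]):
            if t_next - t_prev <= window_seconds:
                flagged.update((id_prev, id_next))
    return sorted(flagged)


sample = [
    ("A", 0, "Good survey"),
    ("B", 30, " good  survey "),
    ("C", 5000, "good survey"),
    ("D", 10, "I liked it"),
]
suspects = flag_simultaneous_duplicates(sample)
```

A screen like this says nothing about whether the responses were fully automated; it only identifies the "one entity, multiple submissions" signature, which is the part that damages the dataset either way.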

  4. Patrick Comer

    Tim, love the analysis. I run Lucid, which is the largest marketplace for survey responses.

    Fraudulent responses via bot or bot-like behavior started in earnest in the summer of 2016. MTurk is just one source in the highly active survey sample marketplace. As systems have become more automated and programmatic, fraud at scale has become more profitable. Simply put: we saw the peak of this two years ago. Slightly surprised that it took this long to discover in MTurk responses.

    What did we have to do to combat it? 1) millions of dollars spent on fraud detection and security software, 2) technical integrations between sources and survey software, and 3) training for buyers and sellers of respondents. It’s not a silver bullet answer but rather a long grind to reduce fraud and error in the survey process. Over two quarters we were able to cut the fraud rate in half.

    I’m passionate about quality of responses. Check out more info here: https://luc.id/quality/

  5. Beatriz

    Looking now at some old data where I had quality problems. Participants with duplicate locations were more likely to fail the open-ended questions. Also, my age question was to select the year from a drop-down menu. Interestingly, disproportionate answers for 1987, 1988, and 1989 (so around the 30’s you found). Most of the text fails are versions of “good” and “nice”.
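
The comparison described here (open-ended failure rates for duplicate versus unique locations) could be tabulated along the following lines. This is a rough sketch under assumptions: the row layout `(respondent_id, latitude, longitude, open_text)` is hypothetical, and treating an answer as a "fail" only when it exactly matches a short low-effort word is a simplification of "versions of good and nice".

```python
from collections import Counter

# Assumption: exact matches to these words count as a failed open-ended answer.
LOW_EFFORT = {"good", "nice"}


def failure_rate_by_duplicate_location(rows):
    """Compare open-ended failure rates for duplicate vs. unique locations.

    `rows` is a list of (respondent_id, latitude, longitude, open_text).
    Returns {True: rate among duplicate locations,
             False: rate among unique locations},
    with None for a group that has no respondents.
    """
    location_counts = Counter((lat, lon) for _, lat, lon, _ in rows)
    totals = {True: 0, False: 0}
    fails = {True: 0, False: 0}
    for _, lat, lon, text in rows:
        is_duplicate = location_counts[(lat, lon)] > 1
        totals[is_duplicate] += 1
        if text.strip().lower() in LOW_EFFORT:
            fails[is_duplicate] += 1
    return {
        group: (fails[group] / totals[group] if totals[group] else None)
        for group in totals
    }


rows = [
    ("a", 1.0, 2.0, "good"),
    ("b", 1.0, 2.0, "Nice "),
    ("c", 3.0, 4.0, "I thought the questions were clear"),
    ("d", 5.0, 6.0, "good"),
]
rates = failure_rate_by_duplicate_location(rows)
```

In real data the location match would probably need some tolerance (shared coordinates can also come from a university or an ISP default), so a gap between the two rates is a prompt for closer inspection rather than proof of fraud.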

