Tips For Scholars: Detecting Bots During Social Media Recruitment

As part of the follow-up study to the Media & Teen Mental Health project, our team leveraged social media to recruit participants, posting digital flyers on platforms such as Twitter, Facebook, Instagram, TikTok, and Reddit. After sharing the recruitment survey on Facebook, we received a substantial surge of interest in our research study. While this was initially exciting, further inspection revealed that over 90% of these new responses came from bots (computer programs that automatically complete surveys). Although these responses were discarded, they provided valuable insight into bot behavior and activity.

We noticed questionable patterns and developed guidelines to gauge the likelihood that a response came from a bot. We decided to discard a participant’s response if it met at least 3 of the following criteria: location, name/email address, timing, demographics, and repeat answers (a sketch of how this screening rule might be automated follows the lists below).

Location:

  • Since the study was intended only for U.S. participants, IP addresses located outside of the U.S. were red flags

  • Multiple submissions from the same IP address were another sign of fraudulent activity

  • Identical latitude and longitude coordinates on multiple unrelated responses were suspicious

  • Location data that did not match the respondent’s self-reported location (IP address, area code, and/or state) raised suspicion

Name/email address:

  • Mismatches between respondent name and email indicated potential fraud (for example: John Smith had the email address “chadroberts123@gmail.com”)

  • Too many numbers or random letters in the handle seemed dubious (for example: “jd14780791@gmail.com”) 

  • Mismatches between respondent name, gender, and email suggested possible deceit (for example: Christina Le who reported as female had the email address “zackarymaradv49@gmail.com”)

  • Inconsistencies with parent/child names and corresponding email addresses generated skepticism (for example: Jared Murray signed up as a parent with the email address “abinbayaravichandrann@gmail.com” and his child Alen Henderson’s email address was “glenlishn@gmail.com”)

Timing:

  • Genuine survey responses took 5 minutes on average, and surveys completed too quickly or too slowly were flagged for additional investigation (for example: some surveys were completed in 12 seconds and some took over 50 minutes)

  • Multiple completed surveys in a row with the same timing were marked for further scrutiny

  • Note: bots can learn over time, so be on the lookout for long response times that may indicate a single bot refining its answers

Demographics:

  • Improbable demographics revealed potential imposters (for example: the percentage of Native American respondents was substantially higher than the national average)

  • Note: keep in mind the characteristics of the local population that you are recruiting from

Repeat answers:

  • Identical phrasing across different survey responses signaled possible bot activity
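As an illustration, the screening rule above might be implemented along the lines of the following Python sketch. All field names, thresholds, and checks here are hypothetical stand-ins; the real criteria (IP geolocation, name/email comparison, demographic benchmarks) would depend on your survey platform’s export and the population you are recruiting from.

```python
import re
from collections import Counter

# Hypothetical timing bounds: genuine responses averaged ~5 minutes,
# and 12-second or 50-minute completions were flagged. These cutoffs
# are illustrative, not validated.
MIN_SECONDS = 60
MAX_SECONDS = 50 * 60
DIGIT_RUN = re.compile(r"\d{5,}")  # long digit runs in an email handle

def suspicion_score(resp, ip_counts, coord_counts, answer_counts):
    """Count how many of the five criteria a response triggers."""
    score = 0

    # 1. Location: non-U.S. IP, repeated IP, or repeated coordinates.
    if (resp["ip_country"] != "US"
            or ip_counts[resp["ip"]] > 1
            or coord_counts[(resp["lat"], resp["lon"])] > 1):
        score += 1

    # 2. Name/email: long digit runs in the handle, or a handle sharing
    #    nothing with the reported name (a crude stand-in for the manual
    #    mismatch checks described above).
    handle = resp["email"].split("@")[0].lower()
    name_parts = [p.lower() for p in resp["name"].split()]
    if DIGIT_RUN.search(handle) or not any(p in handle for p in name_parts):
        score += 1

    # 3. Timing: completed implausibly quickly or slowly.
    if not MIN_SECONDS <= resp["duration_sec"] <= MAX_SECONDS:
        score += 1

    # 4. Demographics: flagged at the sample level when a group is far
    #    overrepresented relative to a benchmark; precomputed per response.
    if resp.get("demographic_outlier"):
        score += 1

    # 5. Repeat answers: identical free-text answer in other responses.
    if answer_counts[resp["free_text"].strip().lower()] > 1:
        score += 1

    return score

def screen(responses):
    """Keep only responses that trigger fewer than 3 of the 5 criteria."""
    ip_counts = Counter(r["ip"] for r in responses)
    coord_counts = Counter((r["lat"], r["lon"]) for r in responses)
    answer_counts = Counter(r["free_text"].strip().lower() for r in responses)
    return [r for r in responses
            if suspicion_score(r, ip_counts, coord_counts, answer_counts) < 3]
```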

While online study recruitment has become increasingly popular, it carries a substantial risk of bot interference, and researchers need to be aware of the problem. Jennifer Doty, PhD, CFLE, from the Department of Family, Youth, and Community Sciences at the University of Florida describes her experience with bots:

“Last spring, we collected prescreener data online to interview youth from a variety of racial and ethnic groups. We launched our internet search via Facebook and online listservs. At first, we had a trickle of interest, but on April 20th we had an explosion of interest. Upon examination, we could see that the data was generated by a bot—the emails were strange and some were repeated, and we had about 100 times the number of American Indians we would expect. This was also the day that Derek Chauvin’s jury reached a verdict. We suspect that bots were especially active on a day where racial tensions were high. After this, we researched strategies to identify mischievous responders and included them in our next grant proposal. To ensure validity of participants and avoid mischievous responders, in our next project, we will include ReCAPTCHA technology screening and track IP addresses. In addition, we will include open-ended questions, which bots often leave empty. We will also include up to four screening questions that will help us flag mischievous responders. For example, we will require youth to match the year to the age they report and validate their age at the time of a recent event in history. Another strategy is including questions like, ‘Does the earth move around the sun?’ These strategies have been used in previous studies to validate online samples.”
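Consistency screeners like the ones Dr. Doty describes can be scored automatically. Below is a minimal sketch of the age/birth-year match, the factual check, and the empty open-ended response check; the field names and survey year are hypothetical, not her team’s actual instrument.

```python
SURVEY_YEAR = 2021  # hypothetical year the prescreener was fielded

def passes_screeners(resp):
    """Return True only if a response clears basic consistency checks."""
    # Reported age must match the reported birth year (allowing one
    # year of slack for respondents whose birthday hasn't occurred yet).
    implied_age = SURVEY_YEAR - resp["birth_year"]
    if abs(implied_age - resp["age"]) > 1:
        return False

    # A factual question with one obviously correct answer,
    # e.g., "Does the earth move around the sun?"
    if resp["earth_sun_answer"].strip().lower() != "yes":
        return False

    # Bots often leave open-ended questions empty.
    if not resp["open_ended"].strip():
        return False

    return True
```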

Celeste Campos-Castillo, PhD, an Associate Professor in the Department of Sociology at the University of Wisconsin-Milwaukee, also shares strategies for identifying bots:

“Other useful tips come from the world of researchers using crowdsourcing platforms, such as Amazon Mechanical Turk, to recruit respondents. These platforms provide researchers access to thousands of workers who complete tasks, including surveys, in exchange for payment. Unfortunately, the average payment is notoriously below minimum wage standards, leading workers to seek ways to complete as many tasks as possible with little effort. This includes using bots to complete tasks automatically and virtual private servers (VPS) to sidestep IP address requirements (e.g., the survey prevents multiple submissions from the same address or requires that the address comes from a specific geographic region). Numerous papers document the problem with these platforms and provide solutions, so here are a few. One set of solutions plants questions that only a human responder who is paying close attention could complete correctly. Examples include questions directing the respondent to select a specific response option (e.g., ‘Select the neither agree nor disagree option in order to proceed’) and asking for confirmation of statements that could not possibly be true of anyone (e.g., ‘I have conducted business with the country of Latveria’). Another set embeds technology within the survey to aid in detection, such as protocols to detect a VPS.”
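Planted attention checks of the kind Dr. Campos-Castillo mentions are also straightforward to score in code. Here is a brief sketch; the item keys and expected answers are hypothetical illustrations, not a standard instrument.

```python
# Hypothetical planted items and their only acceptable answers.
ATTENTION_CHECKS = {
    # "Select the 'neither agree nor disagree' option in order to proceed."
    "directed_item": "neither agree nor disagree",
    # "I have conducted business with the country of Latveria."
    # (Latveria is fictional, so no truthful respondent can confirm it.)
    "latveria_item": "no",
}

def fails_attention_checks(resp):
    """Return True if any planted item has an unexpected answer."""
    return any(
        resp.get(key, "").strip().lower() != expected
        for key, expected in ATTENTION_CHECKS.items()
    )

# Usage: drop flagged responses before analysis.
# clean = [r for r in responses if not fails_attention_checks(r)]
```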

Ultimately, bots threaten the study sample and data quality. When recruiting research participants online, if you notice a sudden spike in recruitment numbers, make sure to thoroughly check for inconsistent responses, improbable answers, and unusual comments. Including a reCAPTCHA at the beginning of the recruitment survey, or adding common-sense questions, is also a recommended technique to deter and identify bots. Before posting recruitment materials on an online platform, make sure to conduct a quick search for recent bot activity.

It is critical to detect and prevent bots from infiltrating participant responses. Bots harm research design and methodology by creating inaccuracies and skewing data. Utilizing the methods listed above could help prevent unreliable and invalid research findings and further your efforts to generate meaningful data.

Sisi Peng

CSS Fellow

Neeku Salehi

CSS Intern
