A Closer Exploration of Residential Proxies and CAPTCHA-Breaking Services
By Dr. Fyodor Yarochkin, Philippe Lin, Bakuei Matsukawa, Ryan Flores, and Chen-yu Dai
Chen-yu Dai leads the threat intelligence team at the global CSIRT of an internet conglomerate with versatile e-commerce, fintech, and telecom businesses.
In our previous releases, “Abusing Web Services Using Automated CAPTCHA-Breaking Services and Residential Proxies” and “Agents of Abuse: Residential Proxies and CAPTCHA-Solving Services,” we covered what proxyware and CAPTCHA-breaking services are and how malicious actors use these services to enable bots, scrapers, and stuffers. This entry focuses on the details of our technical findings and analyses of select residential proxies and CAPTCHA-solving services. Included in this report are relevant indicators of compromise (IOCs) and security recommendations on how organizations can detect and thwart malicious traffic.
This post is divided into two parts: The first part will tackle our findings and observations of select residential proxies, while the second part will discuss our observations on some CAPTCHA-defeating services. Our investigation took place between the months of January and September 2022.
By sharing these observations, we intend to provide website owners, administrators, and antifraud and security personnel additional perspectives that can help them validate or correlate their own observations and experiences.
Residential proxies
Residential proxies are a relatively new service that only gained popularity over the past few years. Some academic papers and blog entries have tackled residential proxies’ ecosystems, geographic distributions, and the malicious activities they enable:
- “Resident Evil: Understanding Residential IP Proxy as a Dark Service” by Mi et al
- “Understanding the Proxy Ecosystem: A Comparative Analysis of Residential and Open Proxies on the Internet” by Choi et al
- “Purpose Built Criminal Proxy Services and the Malicious Activity They Enable” by DomainTools Research
This report, which is part two of a two-part series, aims to add more insights to these comprehensive reports by investigating more residential proxy providers.
The difference between residential proxy providers and proxyware
Before we proceed with our findings, it’s important to first differentiate between these two terms:
Residential proxy providers provide paying customers access to residential IP addresses, usually for the nominal purposes of web localization testing, advertisement survey, marketing survey, research, and anonymity. According to Mi et al’s 2019 report, five of the leading residential proxy providers are Luminati (now called Bright Data), Proxies Online, Geosurf, IAPS Security and ProxyRack. One of their sources for residential IP is proxyware.
Proxyware is the software running on the exit nodes, either voluntarily installed by users for some “passive income” or involuntarily by malicious drive-by downloads. We examined the marketing campaigns for “passive income” software that includes proxyware functionality and identified the top seven most marketed proxyware: Honeygain, IPRoyal Pawns, PacketStream, EarnApp (a Luminati app), Peer2Profit, Income by Spider, and Traffmonetizer. Proxyware can also be bundled with shady shareware or repacked software.
Provider | Price (USD) | Notes |
SmartProxy | $12.5/GB | Exit nodes purchased from various proxyware. Also backed by AdwareSDK |
DoveIP | $12/3GB | Backed by Guerilla Android malware botnet |
IPRoyal Pawns | $4/GB | Backed by Pawns.app |
PIA S5 Proxy | $36 for 150 IPs | The successor of 911[.]re |
GeoNode | $14/GB | |
GeoSurf | $300/month | |
Shifter[.]io | $149.99/month | |
Oxylab | $300/25GB/month | |
Infatica | $96/8GB/month | Backed by Infatica peer-to-business (P2B) and software development kit (SDK) |
Table 1. Service rates of popular residential proxy and proxyware service providers in 2022
Other residential proxy and proxyware services available in the market are ASocks, AstroProxy, Bright Data (formerly Luminati), Dichvusocks[.]us, Froxy, HydraProxy, IPBurger, Leaf Proxies, NetNut, Proxy6[.]net, Proxy Bonanza, Proxy Cheap, ProxyEmpire, ProxyGuys, Proxy LTE, Proxyrack, Proxy-Seller, Rayobyte (formerly BlazingSEO), SOAX, Storm Proxies, The Social Proxy, and Webshare.
Limited by time, resources, and ease of payment options, we have chosen to focus on three residential proxy services for this report, namely, SmartProxy, IPRoyal Pawns, and Infatica. A simple comparison is summarized in Table 2.
Service characteristic/Provider | SmartProxy | IPRoyal | Infatica |
Distinct IP ratio (Worldwide) | 83.43% | 67.64% | ~85% |
Infrastructure | Sourced from proxyware | It is very likely that they rely on their own Pawns.app infrastructure with or without additional exit nodes. | Own infrastructure |
KYC | No | No | Yes |
Super exit nodes | No | Few | Unknown/unverified |
Consecutive class C networks | Yes | Yes | Unknown/unverified |
Rotating IP | Always | Always | Configurable |
Table 2. A summary of residential proxy services we’ve focused on in this report
Definition of terms:
- Infrastructure denotes how a residential proxy provider obtains the exit nodes in someone else’s home. SmartProxy does not have its own infrastructure and purchases exit nodes from other exit-node providers. We believe that Infatica has its own infrastructure.
- KYC (know your customer) pertains to whether the proxy provider verifies its users’ identities by asking for a government-issued ID to validate users’ legal names or otherwise.
- Super exit node is an exit node that handles a disproportionately large amount of traffic.
- Consecutive class C means one or several consecutive chunks of class C networks (each a set of 256 IP addresses) are observed in the proxy traffic.
- Rotating IP means a user’s exit node is changed per request (via HTTP GET, POST, or others). Infatica lets users configure a “sticky IP,” wherein a session can last for five minutes to an hour, and the traffic comes from the same exit node within the configured time.
Table 3 shows the countries in which the three residential proxy providers’ exit nodes are located. Each distribution is discussed in detail in its respective subsection.
SmartProxy | Turkey, Poland, Spain, France, Romania, Egypt, India, Italy, Ukraine, Russia |
IPRoyal | United States, Nigeria, Taiwan, Brazil, Russia, India, China, Canada, Indonesia, Vietnam |
Infatica | Brazil, China, Russia, Vietnam, Turkey, Ukraine, United States, Taiwan, Poland, Japan |
Table 3. SmartProxy, IPRoyal, and Infatica exit node country locations
We discuss SmartProxy, IPRoyal, and Infatica residential proxy services in detail in succeeding subsections.
Technical analyses of select residential proxy services
SmartProxy
SmartProxy is a reseller of residential IP provided by proxyware and libraries. Their package is quite affordable, given their claims to have over 40 million residential IPs. This service does not have a KYC policy, which means anyone can simply pay and use the network.
Figure 1. SmartProxy service cost
During our experiments, we collected 140,889 entries, 117,551 of which are distinct IPs, which is 83.43% of all entries. There are no prominent super exit nodes. By cross-referencing the exit nodes’ IPs with the Trend Micro™ Smart Protection Network™, it is highly likely that SmartProxy is using the infrastructure provided by the following proxyware services and passive income software: Bright (Luminati), Decacopy, EasyAsVPNgo, Honeygain, IPRoyal, Peer2Profit, RelevantKnowledge, RestMinder, SoftEther3, Taskbar, tuxlerVPN, Urban VPN and Walliant.
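The distinct IP ratio used throughout this report is simply the number of distinct exit IPs divided by the number of logged requests. A minimal sketch of that arithmetic follows; the function name and the sample addresses are illustrative assumptions and not part of our tooling.

```python
def distinct_ip_ratio(exit_ips):
    """Return the share of distinct IPs among all observed exit IPs (0.0 to 1.0)."""
    if not exit_ips:
        raise ValueError("no observations")
    return len(set(exit_ips)) / len(exit_ips)


# Toy log (made-up documentation addresses): five requests, four distinct exits.
sample = ["198.51.100.7", "203.0.113.9", "198.51.100.7", "192.0.2.44", "192.0.2.45"]
print(f"{distinct_ip_ratio(sample):.2%}")

# The same division applied to the counts reported above.
print(f"{117_551 / 140_889:.4f}")
```

A ratio close to 1 suggests that almost every request left through a different exit node, while a low ratio points to a small or heavily reused pool.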
We have also observed several consecutive IP chunks from the following exit nodes:
- 89[.]106[.]102.0 to 89[.]106[.]105.0 UNICS Ltd. in Bulgaria (89[.]106[.]101.0 to 89[.]106[.]105.255)
- 45[.]159.140.0 to 45[.]159[.]143.0 AS212635 in the Netherlands
- 212[.]58.102/103/114/119/121.0 Magticom in Georgia (212.58.96.0 to 212[.]58[.]127.255)
We are not certain why there are consecutive IP chunks. It is possible that the IPs are leased to some companies that run proxyware, or that the IPs are used by dedicated exit nodes, or that someone in the ISP wanted to make extra money, potentially illicitly. The latter is a possibility that Akamai discussed in a recent blog entry. The rest of the IPs are dispersed and look very residential.
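One way to surface such consecutive chunks from a list of observed exit IPs is to bucket the addresses by /24 and look for numerically adjacent buckets. The sketch below is only an illustration of that idea: the function name and both thresholds (`min_hits`, `min_run`) are assumptions chosen for the example, and the sample addresses are made up.

```python
import ipaddress
from collections import defaultdict


def consecutive_slash24_runs(exit_ips, min_hits=2, min_run=2):
    """Group exit IPs by /24 and return runs of numerically adjacent /24 networks.

    min_hits: distinct IPs a /24 needs before it is counted (hypothetical threshold).
    min_run:  number of adjacent /24 networks that make up a reportable run.
    """
    hits = defaultdict(set)
    for ip in exit_ips:
        addr = ipaddress.IPv4Address(ip)
        hits[int(addr) >> 8].add(addr)  # the upper 24 bits identify the /24

    busy = sorted(prefix for prefix, addrs in hits.items() if len(addrs) >= min_hits)

    runs, current = [], []
    for prefix in busy:
        # Close the current run when the next /24 is not directly adjacent.
        if current and prefix != current[-1] + 1:
            if len(current) >= min_run:
                runs.append(current)
            current = []
        current.append(prefix)
    if len(current) >= min_run:
        runs.append(current)

    # Report each run as (first address, last address).
    return [
        (str(ipaddress.IPv4Address(run[0] << 8)),
         str(ipaddress.IPv4Address((run[-1] << 8) | 0xFF)))
        for run in runs
    ]


# Made-up sample: three adjacent /24 networks plus two unrelated addresses.
observed = ["10.0.4.1", "10.0.4.2", "10.0.5.7", "10.0.5.9",
            "10.0.6.3", "10.0.6.200", "10.0.9.1", "192.0.2.10"]
print(consecutive_slash24_runs(observed))
```

Dispersed residential addresses rarely pass both thresholds, so any run this reports is a candidate for a closer look rather than proof of a leased or hosted block.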
According to data gathered from Maxmind GeoIP Databases, SmartProxy exit nodes are distributed in the following countries:
Figure 2. The distribution of countries wherein SmartProxy’s exit nodes are located as observed in our investigation
It should also be noted that we are not certain about how SmartProxy pays proxyware providers and how their businesses are coordinated.
IPRoyal Network
We first learned about IPRoyal through their passive-income proxyware service, Pawns.app (which is possibly why it is also called IPRoyal Pawns). IPRoyal also sells residential, datacenter, and cellular IPs, and even sneaker bots. Based on what we’ve observed, they clearly know their target audience and what they are doing.
Figure 3. The services provided by IPRoyal Network
Initially, we thought IPRoyal had a KYC policy. However, we were able to register as a paying customer with a burner phone number and email account. With a service that costs US$4 per 1GB of traffic at the time of writing, IPRoyal is currently one of the cheapest providers in the market. From the value-added tax (VAT) invoice, we were able to determine that IPRoyal’s limited liability company (LLC) is registered in the UAE.
Figure 4. An invoice that shows that IPRoyal is registered in the UAE
Just like most residential proxy providers, they ask customers to build a proxy list with specified regions and countries, through which customers can route traffic to the exit nodes via SOCKS4, SOCKS5, HTTP, or HTTPS. It is surprising that a customer can specify a region, a country, and even a city for just US$4.
Figure 5. Users can specify the region, country, and even state when using IPRoyal’s residential proxy service
As shown in Figure 5, after specifying the country (Japan), we were given the option to choose one of several prefectures or just any random IP. The way to use IPRoyal is as simple as calling cURL.
$ curl -v --socks5 user:pass@geo.iproyal.com:port -L https://target-web-host/index.php
* Trying 5.161.134.33:port...
* SOCKS5 connect to IPv4 111.222.33.44:443 (locally resolved)
* SOCKS5 request granted.
* Connected to (nil) (5.161.134.33) port port (#0)
During our experiments, we observed a super exit node 180[.]243[.]2.108 in Indonesia, which got 95 hits per 1,000 requests. IPRoyal gave us 19,196 worldwide distinct IPs over 28,381 requests, which comes up to a distinct IP ratio of 67.64%.
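A super exit node can be spotted by counting how often each exit IP appears and flagging any address that carries an outsized share of the requests. The following is a minimal sketch; the function name and the 5% cutoff are assumptions made for illustration, and the sample data is synthetic.

```python
from collections import Counter


def super_exit_nodes(exit_ips, min_share=0.05):
    """Return (ip, share) pairs for exit nodes carrying at least min_share of requests.

    min_share is a hypothetical cutoff; tune it to the size of the sample.
    """
    total = len(exit_ips)
    counts = Counter(exit_ips)
    return [(ip, n / total) for ip, n in counts.most_common() if n / total >= min_share]


# Synthetic sample of 1,000 requests: one address answers 95 of them,
# and the remaining 905 requests each come from a different address.
requests_seen = ["10.9.9.9"] * 95 + [f"10.1.{i // 250}.{i % 250 + 1}" for i in range(905)]
print(super_exit_nodes(requests_seen))
```

With a rate of 95 hits per 1,000 requests, as in the Indonesian node described above, the flagged share is 0.095.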
The top five countries in which exit nodes are distributed are the US, Nigeria, Taiwan, Brazil, and Russia. However, we observed that huge chunks of IPs in the US and Nigeria are hosted by NL Hosting Solutions and AT&T. The IPs in Taiwan look dispersed enough for us to believe that they are really installed in many machines throughout the country.
Figure 6. The distribution of countries in which IPRoyal’s exit nodes are located, as observed in our investigation
To better understand IPRoyal’s service, we also tested its exit nodes in the Asia-Pacific (APAC) region.
Figure 7. The distribution of countries in the APAC region in which IPRoyal exit nodes are located
It is worth noting that despite China’s Great Firewall and its strict regulations on foreign currency exchanges, there are quite a few IPRoyal exit nodes in the country. There are also several residential proxy providers that have advertised their service in China. Based on the data we’ve gathered, the distinct IP ratio in APAC is at 34.3%. We also tested IPRoyal’s exit nodes that are specifically located in Japan. However, the distinct IP ratio drops to as low as 22.3%, indicating that they are likely to have only hundreds of exit nodes in the country.
IPRoyal uses a large chunk of consecutive IPs in Nigeria and in the US, namely:
- 41[.]71[.]172.0 to 41[.]71[.]187.255 Visafone Communications (Lagos, Nigeria)
- 99[.]93[.]63.129 to 99[.]93[.]63.255 Lauderdale AT&T (US)
- 94[.]103[.]184.0 to 94[.]103[.]184.255 Ashburn Jy-mobile (US)
- 102[.]129[.]176.0 to 102[.]129[.]176.255 Greenwood AT&T (US)
According to data from Maxmind GeoIP Databases, the IP chunks in Nigeria are operated by NL Hosting Solutions, Ltd.
We are curious to know whether IPRoyal is using their own infrastructure (one that is provided by Pawns.app), or if they are also buying IPs from other proxyware services. Unfortunately, without a significant number of endpoints, we cannot draw a meaningful conclusion. According to Trend Micro Smart Protection Network data, among Trend Micro’s customers, only 26 endpoints were running Pawns.app in December 2022. Eight users who run Pawns.app are also running the following proxyware services: Honeygain, PacketStream, Peer2Profit, and Traffmonetizer.
During our almost two months of observation, we have seen IPRoyal Network customers engage in the following suspicious or malicious behaviors via the service:
- Registering multiple TikTok accounts
- Crawling ThetaScan[.]io, a crypto-ledger, for account balances
- Crawling Brazilians’ Cadastro de Pessoas Físicas (CPF) or the natural persons registry, for personally identifiable information (PII) such as name, gender, and date of birth from a hosted IP address
- Sending SMTP spam messages, especially to AT&T mail servers
- Crawling Steam user profiles, friend lists, assets, and badges
- Accessing OVH Cloud, a virtual private server (VPS) hosting provider
- Attempting to buy FIFA tickets
- Attempting to buy tickets from TicketMaster
- Bruteforcing Google API accounts
- Bruteforcing Venmo, an online payment system
- Bruteforcing Standard Bank, a financial institution in South Africa
- Bruteforcing TikTok with Facebook tokens
Figure 8. Bots trying to buy FIFA tickets via an IPRoyal residential IP, but the tickets were sold out
Figure 9. Bots trying to brute force a Venmo account that likely has two-factor authentication (2FA) enabled
These observed behaviors give us a better idea of what IPRoyal Network’s customers are doing via the network and allow us to better understand the types of victims their actions affect.
Infatica
We could not find similar passive-income software from Infatica. However, Infatica has an SDK monetization program for app authors, who could include the Infatica SDK in their Windows, macOS, and Android apps to make money.
Figure 10. Infatica provides a tool that estimates how much app owners can earn annually from the number of monthly active users
As seen in Figure 10, one monthly active user can be monetized for US$0.04 to US$0.06. However, we have found that the Infatica SDK is not merely used by the original author of an app. There are malicious actors who repacked freeware and shareware written by other people to conduct drive-by downloads of the Infatica peer-to-business (P2B) service, an app that secretly runs in the background without an icon, in order to make money out of the freeware. We checked a repacked Infatica SDK hash (0c71619bf4d9b2edeaf07936800c51735e49c7baf3d7ba2e3a5583bb7aa20607) on VirusTotal and discovered that it’s flagged as a malicious file by one security vendor.
Infatica has a KYC policy. A paying customer must pass Veriff’s “real human inspection” process by taking a picture of one’s government-issued ID and by taking a selfie. After the verification, a user can purchase a US$1.99 trial plan for 100MB of traffic.
Figure 11. Infatica’s pricing plans
After paying for a trial plan, we received a VAT invoice that states that Infatica is registered as a company in Singapore. The address appears to be associated with a Russian-owned organization and a Russian-Asian Sports Academy. Based on our investigation, Infatica’s bank account is with the Community Federal Savings Bank, which is located in the US. It’s also possible that their banking service is proxied by TransferWise.
Infatica has a similar interface for customers to create proxy lists. Unlike with IPRoyal Network, we could not specify countries and cities on the light plan. However, Infatica provides a unique feature that lets a user keep using the same exit node for 5, 10, 30, or 60 minutes, for online activities that require the user to stay at the same IP address, such as signing up on Facebook. A proxy list in Infatica looks like this:
- username:password@185.130.105.109:10000
- username:password@185.130.105.109:10001
- username:password@185.130.105.109:10002
- username:password@185.130.105.109:10003
- username:password@185.130.105.109:10004
When connecting to port 10000, the traffic goes out from the same exit node within five minutes (if the rotation time is set to 5). Every HTTPS request takes four to five seconds to finish, because Infatica uses a command-and-control mechanism to pass the traffic. This mechanism might look inefficient at first glance, but it also allows us to simultaneously use up to 1,000 exit nodes, which is something we could not do with SmartProxy or IPRoyal Network.
We have composed two lists for the experiments and obtained 7,722 Asian IPs and 8,306 worldwide IPs. The two IP lists’ distribution per country is quite different from IPRoyal and SmartProxy, indicating that Infatica uses a different infrastructure.
Figure 12. The distribution of countries wherein Infatica exit nodes are located as observed in our investigation
Figure 13. The distribution of APAC countries wherein Infatica exit nodes are located
According to Trend Micro Smart Protection Network data, there are 5,285 endpoints running Infatica in December 2022.
Among the IP addresses we have obtained during the trial period, we did not observe any consecutive IP chunks.
By using VirusTotal, we were able to observe several pieces of unwanted or repacked software that bundle the Infatica SDK, including BurnAware, Curso Mecanet, JewelVPN, iTop VPN, Ninja VPN, and Soft Cleaner. Based on our observation, some of these apps are also victims of malicious actors. For example, the Curso Mecanet downloaded from the author’s original website is clean, but many repacked versions on the internet contain the Infatica DLL.
During our one month of observation, we have seen Infatica proxy customers engage in the following suspicious or malicious behaviors via the service:
- Bruteforcing of Simperium, a cross-site data synchronization service
- Bruteforcing of Bitwarden
- Scraping of house prices
- Scraping of Lazada and Walmart prices
- Creating accounts on Live.com, Instagram, and Mail.RU
An overview of the underground proxy market
Proxy services are in high demand on the underground market. Traditionally, proxy services have been sold and marketed in the same group as bulletproof systems. However, the practical use of proxy services in the underground is different. Figure 14 is a screenshot of a comment from a forum that illustrates one of the common proxy service use cases:
Figure 14. A post from xss[.]is forum
Traditionally, proxy services were often used for a variety of click-fraud monetization schemes. However, proxy services are also commonly used by carders to bypass antifraud system checks by matching the ZIP code of the connecting IP address with the US address associated with a credit card.
Advanced crime groups also use proxies to minimize the risk of detection. For example, it is common for some APT threat actors to use proxy services to match source IP with the location of a potential victim when accessing compromised resources, such as accessing corporate email accounts, a tactic first reported in 2021.
Many online services check the country information of connecting IP addresses and may either block or attempt to match the country to other attributes, such as a mobile phone number provided by the user. Proxy services are widely used to bypass such checks. For example, bulk social media platform account sellers would use proxies along with SMS PVA (SMS Phone Verified Accounts) services to match a phone number to the connecting IP address when creating accounts for sale.
Proxyware software could also be used to monetize compromised machines when no other monetization schemes are available by simply installing proxyware clients on compromised systems. It should be noted, though, that this type of deployment wouldn’t yield substantial amounts of profit unless it is done at very large scale.
It’s also interesting to note that many of the proxy service providers, even the ones on Russian-speaking forums, are likely originating from China. There are multiple hints pointing to this, such as posts showing awkward Russian translations, and users having Chinese names written in Chinese pinyin.
Figure 15. A user with a Chinese name sells proxy services using machine-translated Russian
Figure 16. An image of a proxy advertisement with a filename that includes Chinese characters posted in Russian forums
All these indicators point to a relatively large market of proxy services in the Chinese-speaking underground. And this comes as no surprise, as proxy services are highly sought-after by Chinese-speaking internet users who use such services to access sites and applications that are not accessible from China.
Access brokers are also often customers of various proxy services for understandable reasons. Access brokers run large amounts of network scans and credential brute force attacks, which require them to proxy requests using third-party systems so as not to be blocked by targeted networks.
Provisioning of residential exit nodes
So how are residential proxies provisioned? It could be argued that a residential proxy provider rents multiple machine pools to provide enough proxies. However, during our investigation, we observed multiple techniques that providers use to build residential proxy services. It’s also important to keep in mind that some proxy service providers may not even have their own infrastructure in place, and instead, have acquired the infrastructure from other sellers that they would go on to resell.
In our investigation, we’ve found that it’s very common to find proxy services that use compromised systems and networks in the underground market.
Figure 17. DoveIP and bullet-proxy proxy services, a frontend backed by the compromised devices of a threat actor group that we dubbed Lemon Group
We have seen several threat actors advertising proxy services, which were installed on hacked machines. In our SMS PVA paper, we examined the proxy service infrastructure, such as the ones used by doveip[.]com and bullet-proxy[.]com. We’ve found that these services were built on top of compromised Android devices: the devices’ firmware is pre-loaded with a plugin loader, and one of its plugins could simply turn the mobile device into a SOCKS5 proxy node, which communicates with the backend proxy provision system.
The provisioning of residential and mobile proxy services through proxyware is also very common. The proxyware software commonly bundles an SDK or a component that could turn a machine into a proxy network exit node. This behavior is often mentioned in an end-user license agreement (EULA) for certain freeware, such as Walliant, Global Hop SDK, and RestMinder, but unfortunately, not many users of free software read license agreements.
CAPTCHA-solving services
Figure 18 provides an overview of the distribution of CAPTCHA-solving services based on Trend Micro Smart Protection Network data from January to September 2022. We also attempted to make an estimate of the demographics of CAPTCHA-solving services’ workers.
Figure 18. Distribution of CAPTCHA-solving services observed by Trend Micro Smart Protection Network from January to December 2022
Here are the definitions of some important terms:
- CAPTCHA providers – Vendors who generate CAPTCHAs, such as Google reCAPTCHA, GeeTest, and FunCaptcha
- CAPTCHA solvers – Vendors who use automated systems and/or hire workers to solve CAPTCHAs, such as 2Captcha
- CAPTCHA workers – Human workers who manually solve CAPTCHAs for money
To conduct the research in an ethical and legal manner, we have set up a proper website with CA-signed (certificate authority-signed) SSL (secure sockets layer) certificates and several login pages with CAPTCHAs that are provided by free or commercial CAPTCHA providers. We have also signed up for several CAPTCHA solver services to log in to our own website by using automated scripts. Figure 19 shows what our IP capture-the-flag (CTF) login page looks like.
Figure 19. Simulated login page used in our experiment
When using a browser, the login procedure is as follows:
- The user browses https://TEST-SITE, types the token, and selects the “I am not a robot” checkbox.
- The user clicks on the “Submit” button and the page is redirected to real_login.php, where the CAPTCHA is validated and logged in our database.
When using a bot, the login procedure is changed to the following:
- The API visits https://TEST-SITE/members.html and extracts parameters, including the site key, nonce, and timestamp.
- The API calls the CAPTCHA-solving service to pass the parameters and receive a token.
- The API polls the CAPTCHA-solving service with the token until the CAPTCHA is solved and a very long HASH string is returned.
- The API submits the very long HASH to https://TEST-SITE/real_login.php and gets validated.
For our first batch of experiments, we chose Google reCAPTCHA, a very popular and affordable CAPTCHA provider that is therefore targeted by most CAPTCHA solvers.
To obtain more insights on CAPTCHA workers’ demographics, we need to know the workers’ IP addresses. We have only found two CAPTCHA providers that reveal a solver’s IP address, namely FunCaptcha and GeeTest v4. Although FunCaptcha looked very promising, we did not manage to obtain a quotation from Arkose Labs and were unable to test it. GeeTest provided a trial license, and their customer service was helpful.
Since most of the CAPTCHA services ask potential customers to contact their sales teams for price information, the prices listed in Table 4 are merely for reference and may be different from quotations made to other customers:
CAPTCHA service | Price |
reCAPTCHA | First 1 million CAPTCHAs are free, US$2.00 for every 1,000 CAPTCHAs after first 1 million |
hCAPTCHA | US$0.99 for every 1,000 CAPTCHAs; Pro plan: US$99 per month for every 100,000 CAPTCHAs |
GeeTest v3/v4 | US$4.16 for every 1,000 CAPTCHAs; Basic plan: US$6,000 per month for every 144,000 CAPTCHAs |
Table 4. CAPTCHA services and price quotations from the services’ sales teams
CAPTCHA solvers, on the other side, use algorithms and human workers to solve CAPTCHAs. All of the CAPTCHA-solving services that we have tested accept a long list of third-party payment methods. However, none of these services accepted our US-based prepaid debit card. Some CAPTCHA solvers have a KYC policy, which, ethically speaking, prevented us from enrolling with the services. We have thus chosen to focus on three popular solvers, the first three services in Table 5, that we discuss in detail in the next section.
Table 5. Popular CAPTCHA solvers, the first three of which are the focus of our study
CAPTCHA-solving services: How to identify the netflow
To have a clear understanding of how shady users use APIs to solve CAPTCHAs, we have signed up with several CAPTCHA-solving services. Some of them accept a wide range of payment methods, while some, such as Death By Captcha, only have Dutch and Ukrainian payment gateways and do not accept US customers at all. All CAPTCHA-solving services accept cryptocurrencies to some extent. Some take most major cryptocurrencies, such as ETH, BTC, XMR, USDT, and BNB, while some only accept USDT with a minimum equivalent of US$10 and a high commission. Overall, there is one thing in common with all the CAPTCHA-solving services we’ve checked: they don’t take US anonymous prepaid cards, unless the cards are linked to certified PayPal accounts. The prices in Table 6 are merely provided as a reference.
CAPTCHA-solving service | Per 1,000 normal CAPTCHAs (USD) | Per 1,000 reCAPTCHAs (USD) |
2Captcha | $1.00 | $2.99 |
AZCaptcha | $0.40 | $1.00 |
Anti-Captcha | $0.50 | $1.89 |
Death By Captcha | $1.39 | $2.89 |
Table 6. CAPTCHA-solving services’ prices for 1,000 normal CAPTCHAs and reCAPTCHAs
In this section, we would like to provide some details about and propose ways to identify a CAPTCHA-solving service user, so that the website owners, administrators, and security personnel can block related traffic.
2Captcha
2Captcha is a very popular CAPTCHA-solving service. The only known IP, according to 2Captcha’s API documentation, is 138.201.188.166. However, we have never observed this IP in our tests. 2Captcha has two API entry points, namely:
- hxxp://2captcha.com/in.php, which is used to submit a CAPTCHA
- hxxp://2captcha.com/res.php, which is used to get the CAPTCHA solution
The API documentation is well-written, and its patterns match what we’ve observed in Trend Micro’s telemetry data. A step-by-step example of solving Google reCAPTCHA v2 is as follows:
- Call the 2Captcha API at hxxp://2captcha.com/in.php?key=APIKEY&method=userrecaptcha&googlekey=DATA-SITEKEY&pageurl=SITE-URL
- It returns a job ID, such as OK|12345678901
- The API caller polls hxxp://2captcha.com/res.php?key=APIKEY&action=get&id=12345678901 every three to five seconds until the CAPTCHA is solved or the request times out
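Because the two entry points and their query parameters are so regular, outbound proxy or netflow logs can be screened for them. The sketch below labels a URL as a 2Captcha submit or poll call based only on the patterns listed above; the function names are hypothetical, and the rule set is an assumption that will miss clients using other hostnames or parameters.

```python
from urllib.parse import urlsplit, parse_qs


def classify_2captcha_call(url):
    """Label a URL as a 2Captcha 'submit' or 'poll' call, or return None."""
    # Reports often defang schemes as hxxp/hxxps; undo that before parsing.
    if url.lower().startswith("hxxp"):
        url = "http" + url[4:]
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != "2captcha.com" and not host.endswith(".2captcha.com"):
        return None
    query = parse_qs(parts.query)
    # in.php receives the CAPTCHA job, identified here by key and method.
    if parts.path == "/in.php" and "key" in query and "method" in query:
        return "submit"
    # res.php is polled with action=get and the job ID.
    if parts.path == "/res.php" and query.get("action") == ["get"] and "id" in query:
        return "poll"
    return None


def parse_job_id(body):
    """Extract the job ID from an 'OK|<id>' response body, or return None."""
    status, _, job_id = body.strip().partition("|")
    return job_id if status == "OK" and job_id else None


print(classify_2captcha_call(
    "hxxp://2captcha.com/res.php?key=APIKEY&action=get&id=12345678901"))
```

A client that emits one submit call followed by poll calls every few seconds matches the flow described in the steps above.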
If the CAPTCHA is solved, a very long string is returned and can be POST-ed to the login form. We have observed two interesting phenomena in the experiment.
First, 2Captcha caches the page at SITE-URL after a couple of API calls from Amazon Web Services (AWS) IP segments. The user-agent is fixed during a certain period (which is not always identical to the user-agent shown below, as they update the user-agent string from time to time) across all websites that 2Captcha caches. For example, we got:
34.221.5.223 - - 0.000 - [22/Nov/2022:15:00:12 +0000] "GET /members.html HTTP/1.1" 200 988 - "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" "-"
18.236.196.233 - - 0.000 - [22/Nov/2022:15:00:12 +0000] "GET /members.html HTTP/1.1" 200 988 - "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" "-"
54.162.120.217 - - 0.000 - [22/Nov/2022:15:00:15 +0000] "GET /members.html HTTP/1.1" 200 988 - "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" "-"
After the first day of using 2Captcha, we observed that such caching behavior took place on random days and not every day.
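A website owner could look for this caching pattern in access logs by searching for the same page being fetched with an identical user-agent from several different IPs within a few seconds. The sketch below assumes the log layout shown in the sample lines above; the regular expression, the function name, and both thresholds are illustrative assumptions and would need adjusting for other log formats.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches the access-log layout of the sample lines (an assumption about the format).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) .*?\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d+ \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def same_ua_bursts(lines, window_seconds=5, min_ips=3):
    """Find pages fetched with one user-agent by min_ips distinct IPs within the window."""
    grouped = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not follow the expected layout
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        grouped[(m["path"], m["ua"])].append((ts, m["ip"]))

    window = timedelta(seconds=window_seconds)
    bursts = []
    for (path, ua), hits in grouped.items():
        hits.sort()
        for i, (start, _) in enumerate(hits):
            ips = {ip for ts, ip in hits[i:] if ts - start <= window}
            if len(ips) >= min_ips:
                bursts.append((path, ua, sorted(ips)))
                break  # one report per (path, user-agent) pair is enough
    return bursts
```

Run against the three sample lines, this reports a single burst for /members.html. A match only indicates coordinated fetching; checking whether the source IPs belong to a cloud provider is a separate step.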
Second, the POST-after-polling causes some significant delays, which we discovered after comparing the timestamp provided by Google when the CAPTCHA was solved. A very consistent behavior is observed, either by using our own code or the Python-binding library provided by 2Captcha, namely:
- Initial access (t = 0)
- members.html not accessed
- Captcha solved according to Google: (t + 23s)
- POST at real_login (t + 51s to t + 1m16s)
The CAPTCHA’s solved time can vary with high deviation because some workers are faster while others are slower. There is a significant delay between the solved time and the polling API (res.php) returning “OK,” which is always around one minute. We suspect that the behavior is relevant to 2Captcha’s working pipeline. It’s possible that an immediate return might cause some sort of congestion. The time difference between the solved timestamp provided by Google and the POST to the next page indicates that the visitor is not a human. Moreover, if the automation script is not well-written and submits directly to the POST-ed page without visiting members.html, we can assume that the behavior is likely generated by a bot.
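The two signals described above, a long gap between the solve timestamp and the POST, and a POST without a prior visit to the form page, can be combined into a simple server-side check. This is a minimal sketch: the function name, the signal labels, and the 20-second threshold are assumptions chosen for illustration, and a real deployment would need to tune the threshold against legitimate user behavior.

```python
from datetime import datetime, timedelta


def automation_signals(visited_form_page, solved_at, posted_at,
                       max_human_delay=timedelta(seconds=20)):
    """Return a list of reasons why a login attempt looks automated.

    visited_form_page: whether the session fetched the form page before posting.
    solved_at:         solve timestamp reported by the CAPTCHA provider.
    posted_at:         time the login POST reached the server.
    max_human_delay:   hypothetical upper bound for a human to press "Submit".
    """
    signals = []
    if not visited_form_page:
        signals.append("POST without prior visit to the form page")
    if posted_at - solved_at > max_human_delay:
        signals.append("long gap between CAPTCHA solve and POST")
    return signals


# Timeline from the experiment: solved at t + 23s, POST at t + 51s, form page never fetched.
t0 = datetime(2022, 11, 22, 15, 0, 0)
print(automation_signals(False, t0 + timedelta(seconds=23), t0 + timedelta(seconds=51)))
```

Returning the list of reasons, rather than a single yes or no, lets the caller weigh each signal separately or combine it with other antifraud checks.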
When we tested 2Captcha against GeeTest v3 and v4, the delay disappeared. GeeTest v3 requires an automated script to fetch a challenge token before the CAPTCHA can be solved. Therefore, it is not easy to tell a human from an automation script. With regard to GeeTest v4, we received a couple of ERROR_CAPTCHA_UNSOLVABLE messages during the experiments. For some unknown reason, 2Captcha workers apparently do not always want to earn money by solving GeeTest v4, even though we don’t think it takes more time to solve than reCAPTCHA v2.
2Captcha claims that it typically has thousands of workers online each day. For undisclosed reasons, the service, allegedly, also has thousands of banned CAPTCHA workers.
Figure 20. Real-time statistics provided by 2Captcha
There are abundant third-party scripts on GitHub that work for both 2Captcha and RUCaptcha, indicating that they share similar backgrounds and are compatible at the API level.
AZCaptcha
Compared to 2Captcha, AZCaptcha offers more affordable CAPTCHA-solving fees, with a minimum top-up of US$10. During our investigation, we were able to take advantage of AZCaptcha’s US$0.02 promo for all trial accounts. Ironically, every CAPTCHA-solving service we studied, including AZCaptcha, asks users to solve a CAPTCHA when logging in. AZCaptcha uses Google reCAPTCHA v2.
Figure 21. Ironically, AZCaptcha asks us to solve a CAPTCHA before we can log in to their service
Even though AZCaptcha’s real-time statistics appear to feature a modest success rate of 76.26%, based on what we’ve observed, it’s possible that the service’s actual success rate is even lower, clocking in at around 50%.
Figure 22. Real-time statistics provided by AZCaptcha
According to their API documentation, it is possible to add an image or text instruction that CAPTCHA workers can follow to solve a CAPTCHA. We tested this feature by adding an external image and a “click here for instruction” text in hopes of getting more insights on their workers. However, by the end of the experiment, we had not received any clicks on the image we attached, either because nobody bothered to read the instruction or because the instruction was not shown on the workers’ screens.
AZCaptcha’s API documentation is almost identical to 2Captcha’s. For example, the DATA-SITEKEY and the screenshots for Google reCAPTCHA are the same. AZCaptcha’s API calls are also compatible with 2Captcha’s. AZCaptcha also has the mysterious caching mechanism from AWS IP blocks, but it did not take place until after the API was called three times. Both services also use the same user-agent.
54.185.164.103 - - 0.000 - [25/Nov/2022:14:00:06 +0000] "GET /members.html HTTP/1.1" 200 988 - "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" "-"
34.220.239.237 - - 0.000 - [25/Nov/2022:14:00:06 +0000] "GET /members.html HTTP/1.1" 200 988 - "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" "-"
34.210.236.209 - - 0.000 - [25/Nov/2022:14:16:51 +0000] "GET /members.html HTTP/1.1" 200 988 - "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" "-"
AZCaptcha’s Google reCAPTCHA v2 solving behavior is similar to 2Captcha’s, just slower:
- Initial access (t = 0)
- members.html not accessed
- Captcha solved according to Google: (t + 6s, t + 10s, etc.)
- POST at real_login (t + 1m15s to t + 2m13s)
Overall, AZCaptcha’s API calls, business model, and delays are highly similar to those of 2Captcha, only slower and of lower quality. The netflow and the detection methods are also alike.
Anti-Captcha
Anti-Captcha is also a popular CAPTCHA-solving service. It is infamous for this “We will remove bad workers” image.
Figure 23. Anti-captcha's homepage that features a message about eliminating “cheaters” or CAPTCHA workers who are not working hard or producing accurate results
The minimum top-up credit differs per payment gateway. OCR Data Solutions asks for a minimum of US$5 and makes 3-D Secure, a bank-imposed security measure, mandatory. PayPro Global asks for a minimum of US$10 and does not accept prepaid debit cards. The payment gateways support various payment methods.
Anti-Captcha's API documentation is more comprehensive than those of 2Captcha and AZCaptcha. Their documentation informs API users whether a CAPTCHA service discloses the solver’s IP addresses and suggests whether they should use ProxyOn (via a proxy) or ProxyOff (direct access from API users and CAPTCHA workers). For example, “Use this type of task [ProxyOn] to solve ReCAPTCHAs in Google services. In all other cases, use RecaptchaV2TaskProxyless to solve ReCAPTCHA in proxy-off mode. Google's API does not disclose the solver's IP address to website owners.” In our investigation, we tested the ProxyOn option. However, there were no incoming connections to the proxy server we were monitoring, regardless of which protocol we chose among SOCKS5, HTTP, and HTTPS.
Anti-Captcha is, in general, much faster than 2Captcha and AZCaptcha. We did not observe the one-minute delay when solving reCAPTCHAs. In fact, it was so fast, it seemed as if they were running it on a very clean IP, and Google did not ask for a CAPTCHA at all. When solving GeeTest v4, we have seen an interesting error message:
{'errorCode': 'ERROR_CAPTCHA_UNSOLVABLE',
'errorDescription': 'Captcha could not be solved by 5 different workers',
'errorId': 12}
We were not sure whether the CAPTCHA was really sent to five different workers or whether there was simply no one who wanted to solve it. Since the trial version of GeeTest v4 only asks users to slide a puzzle piece into the right position, the CAPTCHA is not unsolvable.
Based on our observation, we were not able to see a way to distinguish Anti-Captcha from a human user if the automation script is properly written.
Death By Captcha
We were not able to test Death By Captcha because their payment gateways, InterKassa and Payeers, do not accept US cards. It appears that Death By Captcha’s servers are located in the UTC-4 time zone, but we are not totally certain. The complete service fees are shown to registered users only, as indicated on Death By Captcha’s homepage.
Figure 24. Service fees on Death By Captcha’s homepage
Insights on CAPTCHA Workers
“I Was a Human CAPTCHA Solver,” an interesting and comprehensive feature by Dan Woods of F5 Labs, boosted our curiosity regarding CAPTCHA workers and where they are located. Thanks to GeeTest v4, which discloses a CAPTCHA solver’s IP address, we can take a closer look at their geolocations.
Given the small amount of money that a CAPTCHA worker can earn (US$2 for an 11-hour work day, according to Dan Woods), it is surprising to see some of them using a paid VPN service. There are VPN-using CAPTCHA workers in both 2Captcha and Anti-Captcha. Based on our observation, 7% of 2Captcha workers use a VPN, while for Anti-Captcha workers, the percentage is as high as 31%.
Metric | 2Captcha | Anti-Captcha |
Tested CAPTCHAs (n) | 1,032 | 1,000 |
No. of distinct IPs | 445 | 270 |
VPN users | 30 (7%) | 84 (31%) |
TOR (The Onion Router) nodes | 0 | 0 |
Datacenter IPs | 35 (8%) | 102 (38%) |
Residential/cellular IPs | 380 | 84 |
Table 7. VPN-using CAPTCHA workers in 2Captcha and Anti-Captcha
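The percentages in Table 7 are shares of distinct IPs, not of tested CAPTCHAs. The short sketch below recomputes them from the table's counts; the `counts` dictionary and the `shares` helper are hypothetical names, and the residential/cellular share is derived here rather than stated in the table.

```python
# Counts copied from Table 7; dictionary layout and names are illustrative.
counts = {
    "2Captcha": {"distinct_ips": 445, "vpn": 30, "datacenter": 35, "residential": 380},
    "Anti-Captcha": {"distinct_ips": 270, "vpn": 84, "datacenter": 102, "residential": 84},
}

def shares(row):
    """Return each category as a whole-number percentage of distinct IPs."""
    total = row["distinct_ips"]
    categories = ("vpn", "datacenter", "residential")
    # The three categories should account for every distinct IP.
    if sum(row[c] for c in categories) != total:
        raise ValueError("categories do not add up to the distinct IP count")
    return {c: round(100 * row[c] / total) for c in categories}

for service, row in counts.items():
    print(service, shares(row))
```

Dividing by the number of tested CAPTCHAs instead (1,032 and 1,000) would give much smaller figures, which is why the denominator matters when comparing the two services.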
There are only two overlapping IPs between 2Captcha and Anti-Captcha workers, which makes sense because a worker can hardly work for both solvers at the same time. We also plotted the distribution of the workers’ originating countries (each VPN is counted under the country in which its exit node is located).
Figure 25. Countries where the most CAPTCHA workers are located and the corresponding numbers of solved CAPTCHAs
Figure 26. Countries where the most CAPTCHA workers are located and the corresponding distinct IPs
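The overlap count and the two views in Figures 25 and 26 (per solved CAPTCHA and per distinct IP) come down to a set intersection and two tallies. The sketch below illustrates this under stated assumptions: the function names are hypothetical, the IPs are documentation-range placeholders rather than observed solver addresses, and the `ip_country` mapping stands in for whatever geolocation lookup is used.

```python
from collections import Counter

def overlapping_ips(ips_a, ips_b):
    """Return the IPs that appear in both services' solver lists."""
    return set(ips_a) & set(ips_b)

def country_distribution(solver_ips, ip_country):
    """solver_ips holds one IP per solved CAPTCHA (repeats allowed).

    Returns (per_solve, per_ip): countries tallied once per solved CAPTCHA
    and once per distinct IP, respectively.
    """
    per_solve = Counter(ip_country[ip] for ip in solver_ips)
    per_ip = Counter(ip_country[ip] for ip in set(solver_ips))
    return per_solve, per_ip

# Placeholder data only: documentation-range IPs with made-up country codes.
ip_country = {"192.0.2.1": "BD", "192.0.2.2": "BD", "198.51.100.7": "VE"}
service_a = ["192.0.2.1", "192.0.2.1", "192.0.2.2", "198.51.100.7"]
service_b = ["198.51.100.7", "198.51.100.7"]

print(overlapping_ips(service_a, service_b))
print(country_distribution(service_a, ip_country))
```

Keeping both tallies matters because a single prolific worker can dominate the per-CAPTCHA view while counting only once in the per-IP view.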
We can see that both services have quite a few workers with Bangladeshi and Venezuelan IPs. 2Captcha has more Indonesian, Indian, Filipino, and Vietnamese workers than Anti-Captcha, while the latter has more workers with IPs in the US and Singapore, which are highly likely to be VPN exit nodes.
Workers for each CAPTCHA-solving service use different VPNs and hosting providers. 2Captcha workers use IPVanish, Mullvad, ProtonVPN, VPN Gate, and NordVPN, while Anti-Captcha workers use TunnelBear, DigitalOcean (hosting), OVH (hosting), M247 (used by PIA and ProtonVPN), VPN Unlimited, and Surfshark.
Conclusion
Residential proxies and CAPTCHA-defeating services are two emerging services that allow abuse and criminal behavior to continue by circumventing the IP filtering and CAPTCHA protection mechanisms that websites commonly use. These services are often built using questionable means. Residential proxies can be built on top of misleading profit-sharing schemes and can piggyback on repackaged apps or pre-infected Android phones. CAPTCHA-defeating services are powered by human CAPTCHA solvers, who often come from low-income countries where many people are willing to do the work for relatively small pay, even if it means helping to violate a website’s terms of use and service. Interestingly, we have also seen some human solvers boost their CAPTCHA-solving rates, and thus their earnings from anti-CAPTCHA services, by employing rapidly evolving AI technologies.
We hope that by sharing these technical findings, website and platform operators can use the fingerprints and clues we’ve indicated to sift through their site traffic and separate valid transactions from abuse- or crime-related ones. We also hope that this report makes people aware and vigilant so that they avoid participating in jobs or programs designed to circumvent the mechanisms that ensure our online interactions and activities come only from authentic human traffic.