We’ve already written about CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), the mechanism used to protect web sites, forums, and mailing systems against the automatic creation of accounts and contents. As my colleague Tad Heppner wrote in his November 2007 post, most common CAPTCHA systems work by generating distorted characters, text, or pictures that can be easily recognized by the human brain but present significant difficulty for computer-based optical character recognition or other image-recognition systems.

It should come as no surprise, however, that spammers continue to try to crack CAPTCHA. We’ve now seen a new version of a professional spammer tool on the web. XRumer 5 sells for $520 and promises advanced CAPTCHA decoding methods.

For a long time spammers have searched to defeat CAPTCHA mechanisms to create fake email accounts to send spam. Before telling you more about this new crooked utility, let’s review some older techniques used by spammers.

As shown in the following image (source XMCO), the most common CAPTCHA methods can be broken.

The first method of cracking is manual. People from developing countries offer services. The competition is intense. On some dedicated forums, proposals surge in from Vietnam or Bangladesh. They claim that lots of people are ready to work 24 hours a day to process hundred of thousands of CAPTCHA. Rates vary from $8 to $1 per 1,000 CAPTCHA.

A less expensive solution consists in using private individuals to do the work free of charge. I am sure some readers remember this unusual offer, in which it was possible to undress “Melissa” in exchange for some CAPTCHA work. This allowed a spammer to create fake Yahoo Mail accounts.

It is also possible to find free web services. The CAPTCHA Killer web site offers such services. Its designer claims the offer “is 100% focused on increasing accessibility on the Internet” for the “1 Million Americans that suffer from blindness.” The service makes available an API to automate the process. However, I was not surprised to read a cross-reference on that site saying they have been notified that using CAPTCHA Killer with Myspace was against the latter’s Terms of Service.

A very technical approach uses rainbow tables, in which each CAPTCHA image is associated with its character string. In March 2008, someone nicknamed Maluc created PHP scripts to download, extract, and save thousands CAPTCHA images from Yahoo, Google, and Hotmail. When finished, each collection will help spammers create new recognition tables or verify the accuracy of its OCR algorithm. When successful, only one millisecond is needed to compare a new footprint with the ones included in the database. You have to pay between $1,500 and $5,000 for such algorithms, which suppress the noise, create a black-and-white picture, break it into segments (one letter per segment), and identify the character.

A programmer called Wangrun in the Chinese province of Anhui says he developed software to decode CAPTCHA systems. Depending on the complexity of the CAPTCHA image, he charges between $500 and $6,000 per decoder. No price is quoted for the most difficult images but, in a comment, he writes it is feasible. Wangrun declines to say what his customers use the decoders for, but says he has “very many” of them.

Spammers can also use zombie machines to help them crack CAPTCHA. We’ve read on the Virus Bulletin web site that compromised systems making up a large botnet were recently used to help in the registration process for Windows Live Mail accounts. When the bot (detected by VirusScan as Generix.dx) asked for registration, it received a CAPTCHA and immediately presented its image to a central server that attempted to decode it and returned the result. The decipher technique was successful only around 35 percent of the time, VB said, but a new idea was launched. The fact that large numbers of infected systems were running repeated attempts suggests a high number of new accounts for spamming were created at that time.

Finally, turnkey tools are another method for defeating CAPTCHA defenses. XRumer 5 is one of them. It can flood message and links forums, guestbooks, blogs, wikis, etc. It automatically finds and fills in required fields with no need of a browser. If the forum requires registration, the program will register, log in, and post the spammer text. XRumer goes beyond JavaScript protection, pictocode protection (typing a number displayed in a box), and protection by e-mail activation. If a CAPTCHA image is detected, the program automatically downloads it, analyzes it, and fills in the form.

Version 5 can work on most recent versions of popular engines such as VBulletin, IPB, and phpBB, according to its creator. XRumer can also create accounts on gmail.com for posting. And its clients seem happy. One of them wrote last week on a forum “all that for only $500? It’s very cheap! I’d easily charge 2k for that. Solving gmail captcha is no joke. I paid 4k just for that from an OCR developer. …”

XRumer is also able to solve the “pick the cat captchas” presented in picture below.

On October 3, XRumer’s maker explained he analyzed many forums and discovered that most of this type of CAPTCHA used identical pictures. Thus XRumer can distinguish them by their sizes in bytes. And it concludes: “It’s so easy, isn’t it? Oh, they can make some distortion on images? Well, we have a time to improve our algorithm. We analyze forums, blogs, guestbooks permanently, and there is one important thing: that type of captchas used not more than 0,01% of resources (1 of 10,000 sites).”

Once again, we are reminded that malware design is a business. And once again, my searches drive me to Russia, where criminals create and employ malicious software as well as engage in identity theft and virtual prostitution. The company or individual behind XRumer appears to be the same as that which proposed an automated sex-talk service called CyberLover.ru in 2007. One name I got from a whois request today is Alexander Ryabchenko. When the media pointed the finger at him in 2007, Ryabchenko emailed to Reuters that he could not be accused of identity theft with the CyberLover concept. He explained “the program can find no more information than the user is prepared to provide.”

If anyone should ask Ryabchenko why he commercializes XRumer, I suggest he repeat the CAPTCHA Killer web site argument: to help the million people suffering from blindness.