Linux.com

Feature: Open Source

CAPTCHA your blog comments with FOSS utilities

By Donald W. McArthur on August 23, 2006 (8:00:00 AM)

Bloggers hate automated comment spammers. One way to foil these vermin is a system called CAPTCHA, which is described in Wikipedia as "an acronym for 'completely automated public Turing test to tell computers and humans apart'" -- in other words, a challenge-response system designed to determine whether a site visitor is human or a bot. Here's how I implemented a CAPTCHA system for blog comments using free and open source (FOSS) command-line utilities and PHP, the Web server scripting language.

I've implemented this system under two Linux distributions -- Ubuntu Dapper Drake and CentOS 4 -- both running the current versions of the Apache Web server and PHP. The CLI utilities are from GNU Enscript and ImageMagick.

Ubuntu required that I download, compile, and install Enscript. I made no changes to the default configuration settings, and the install was without incident. ImageMagick is in the Dapper Drake repositories, so the command sudo apt-get install imagemagick got that installed quickly.

On the CentOS box, neither utility was installed by default, nor available via yum update, so I had to install both applications manually. Again, I made no changes to the default configuration settings, and the installation went without incident.

The blog comments script

My plan for the PHP script for the blog comments page was as follows:

  • Design the overall system such that session state could be ignored. I wanted to avoid the complexity of keeping track on the server of which image had been sent to which blog commenter.
  • Generate a random six-character string
  • Create a PostScript file using the string
  • Create a .png image file from the PostScript file
  • Name the image file using a limited pool of filenames
  • Encrypt the original string
  • Return the image file and encrypted string to the user

This is the relevant portion of the PHP code I used to accomplish those goals:

I create an array of the characters from which six are randomly selected. For image clarity, I avoid the numerals one and zero and the upper- and lowercase letters "L" and "O" (the array below also leaves out "i", "I", and lowercase "q"):

$arr_chars = array ('a','b','c','d','e','f','g','h','j','k','m','n','p','r','s','t','u',
'v','w','x','y','z','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','J','K','M',
'N','P','Q','R','S','T','U','V','W','X','Y','Z');
(all on one line)

The function array_rand() takes two arguments: an array, and the number of randomly chosen keys you want returned. The resulting keys are stored in another array:

$arr_rand_keys = array_rand ($arr_chars, 6);

I concatenate the six characters into a string:

$captcha_cleartext = $arr_chars[$arr_rand_keys[0]] . $arr_chars[$arr_rand_keys[1]] . $arr_chars[$arr_rand_keys[2]] . $arr_chars[$arr_rand_keys[3]] . $arr_chars[$arr_rand_keys[4]] . $arr_chars[$arr_rand_keys[5]];
(all on one line)
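
One property of array_rand() is worth knowing here: it returns distinct keys, so no character can repeat within a CAPTCHA, and according to the PHP manual's changelog the returned keys are no longer shuffled as of PHP 5.2.10, so the characters may come out in array order. The following is a minimal sketch of an alternative that draws each character independently. It assumes PHP 7 or later (for random_int() and scalar type declarations), and the function name random_captcha_string is hypothetical, not part of the original script:

```php
<?php
// Hypothetical helper, not part of the original script: picks $length
// characters independently, so repeats and any ordering are possible.
function random_captcha_string(array $chars, int $length = 6): string
{
    $out = '';
    $max = count($chars) - 1;
    for ($i = 0; $i < $length; $i++) {
        // random_int() (PHP 7+) draws from the system's secure random source.
        $out .= $chars[random_int(0, $max)];
    }
    return $out;
}

// The same character pool as above, written compactly with str_split().
$arr_chars = str_split('abcdefghjkmnprstuvwxyz23456789ABCDEFGHJKMNPQRSTUVWXYZ');
$captcha_cleartext = random_captcha_string($arr_chars);
```

The result can be used wherever $captcha_cleartext appears in the rest of the script.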

I encrypt the string for use as a hidden input field. The crypt() function takes two arguments: the cleartext string, and a "salt" used to "seed" the encryption process. If no salt is provided, the system supplies a random one, which would foil our efforts because the form handler could not reproduce the same result. Substitute something reasonably complex for 'salt_value'; note that with the traditional DES-based algorithm, crypt() uses only the first two characters of the salt and places them at the start of its output. You will need this value again to compare the user's input in the PHP script that handles this form's submission, so it is essential that 'salt_value' be the same in both PHP scripts.

$captcha_encrypted = crypt ($captcha_cleartext, 'salt_value');

To prevent an automated attack from repeatedly executing this script and filling the hard drive with CAPTCHA images, I limit the pool of possible image filenames to a reasonable number and then reuse them. I start with a text file named captcha_filenum that contains the single numeral zero. The file() function reads it into an array, one element per line:

$arr_filenum = file ('/var/www/html/captcha_filenum');

Then I concatenate a CAPTCHA image filename using the number:

// trim() removes the trailing newline that file() keeps on each line.
$captcha_filename = 'captcha_' . trim ($arr_filenum[0]) . '.png';

I limit the pool of filenames to 100 (numbered 0 through 99). I increment the filenum value unless it is greater than 98, in which case it wraps back to zero:

if ($arr_filenum[0] > 98) {
$new_value = 0;
} else {
$new_value = $arr_filenum[0] + 1;
}

Next, I write the new filenum value back to the file captcha_filenum. On CentOS, the Apache Web server runs as the system account apache (on Ubuntu, www-data); that account must have write permission for the file.

$fh = fopen ('/var/www/html/captcha_filenum', 'w');
fwrite ($fh, $new_value);
fclose ($fh);
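
When two requests arrive at the same moment, both can read the same number before either writes the new one. The sketch below shows one way to guard against that with an advisory lock. It is an illustration under stated assumptions rather than the code I run: the function name next_captcha_filenum and its parameters are hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns the current counter value and stores the next
// one, holding an exclusive lock so concurrent requests cannot interleave.
function next_captcha_filenum(string $counter_file, int $pool_size = 100): int
{
    // 'c+' opens for reading and writing, creating the file if needed,
    // without truncating it first.
    $fh = fopen($counter_file, 'c+');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $counter_file");
    }
    if (!flock($fh, LOCK_EX)) {
        fclose($fh);
        throw new RuntimeException("Cannot lock $counter_file");
    }
    // trim() drops a trailing newline; an empty file counts as 0.
    $current = (int) trim((string) stream_get_contents($fh));
    if ($current < 0 || $current >= $pool_size) {
        $current = 0;
    }
    // Wrap around so only $pool_size distinct values are ever used.
    $next = ($current + 1) % $pool_size;

    ftruncate($fh, 0);
    rewind($fh);
    fwrite($fh, (string) $next);
    fflush($fh);
    flock($fh, LOCK_UN);
    fclose($fh);
    return $current;
}
```

A call such as next_captcha_filenum('/var/www/html/captcha_filenum') would then supply the number used to build the image filename.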

I want to put the CAPTCHA images in a directory I can exclude from the backup process:

$path = "/var/www/html/captchas/";

Now I concatenate the path and filename:

$full_path = $path . $captcha_filename;

I use the command-line utilities enscript and convert (supplied by ImageMagick) to turn the randomly generated six-character string first into a PostScript file and then into an image file. The shell's pipe operator (|) feeds the output of one command into the input of the next. I use the font value Courier-BoldOblique20, but you can use any font, as long as it is listed in your /usr/local/share/enscript/afm/font.map file. Since Apache doesn't know where the command-line utilities are located, I provide full paths to them:

$command = "echo '$captcha_cleartext' | /usr/local/bin/enscript -o - -B -f 'Courier-BoldOblique20' | /usr/local/bin/convert -trim +repage - $full_path";
(all on one line)

Now execute the command:

exec ($command);
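
The generated string comes from a fixed pool of letters and digits, so shell quoting is not strictly needed here, but quoting every interpolated value is a cheap safeguard if the pool or the paths ever change. A minimal sketch, assuming a Unix-like shell; the helper name build_captcha_command is hypothetical, while the binary paths and font are the ones used above:

```php
<?php
// Hypothetical helper that only builds the command string; the binary paths
// and the default font mirror the values used in the article.
function build_captcha_command(string $text, string $png_path,
                               string $font = 'Courier-BoldOblique20'): string
{
    // escapeshellarg() quotes each value so the shell sees it as one argument.
    return 'echo ' . escapeshellarg($text)
        . ' | /usr/local/bin/enscript -o - -B -f ' . escapeshellarg($font)
        . ' | /usr/local/bin/convert -trim +repage - ' . escapeshellarg($png_path);
}

$command = build_captcha_command('aB3xY7', '/var/www/html/captchas/captcha_0.png');

// exec() can report the exit status of the pipeline (that of its last
// command) through its third argument:
// exec($command, $output, $status);
// if ($status !== 0) { /* image generation failed; handle it here */ }
```

Checking the exit status makes it possible to avoid showing a form whose CAPTCHA image was never written.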

After I display the blog comment form elements (text boxes and textarea boxes) I display the CAPTCHA image. I also include, as a hidden field, the encrypted string representing the original randomly chosen six-character string. This encrypted string will be returned to the server with the user's CAPTCHA submission. The user can do no harm by having the encrypted string, and by using this technique I don't have to keep track of the session state.

print "Prove you're not a bot. Enter this: <img src=\"/captchas/$captcha_filename\" /> here: <input type=\"text\" name=\"captcha_test\" size=\"10\" />";
(all on one line)

print "<input type=\"hidden\" name=\"captcha_encrypted\" value=\"$captcha_encrypted\" />";
(all on one line)

The CAPTCHA image file and text entry box will be displayed like this:

[Image: the CAPTCHA image followed by a text entry box]

And the hidden field entry will look like this:

<input type="hidden" name="captcha_encrypted" value="tojt1Xx62dqSA" />

The form handling script

Now that we've displayed a CAPTCHA image and asked the user for input we have to handle the submitted data in another PHP script. The second script will:

  • Encrypt the user's entry using the same "salt" used to encrypt the original randomly generated string
  • Compare that encrypted string with the encrypted string returned in the hidden field
  • If the two don't match, reject the comment submission

This is the relevant portion of the PHP code I use to accomplish those goals:

I trim the user's submission to remove leading and trailing whitespace:

$comment_test = trim ($_POST['captcha_test']);

Then I gather the encrypted string from the hidden field.

$returned_encrypt = $_POST['captcha_encrypted'];

There is no decrypt() function -- crypt() is a one-way process. I use crypt() and the 'salt_value' to encrypt the user's submission. I can then compare the result with the encrypted CAPTCHA string returned from the hidden field in the comments page. If the user accurately entered what was displayed on the CAPTCHA image, the two should be equivalent.

$test_encrypt = crypt ($comment_test, 'salt_value');

I don't do anything else unless the CAPTCHA has been entered correctly. The strict comparison (===) avoids PHP's type juggling when comparing the two strings:

if ($returned_encrypt === $test_encrypt) {
// Add the comment to the database.
} else {
// Display a rejection notice.
}
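
Because the traditional DES form of crypt() places the salt in the first two characters of its output, anyone who can see the hidden field can also see the salt. As a different technique from the one used in this article, a keyed hash (HMAC) ties the hidden value to a secret that never leaves the server. The sketch below is an illustration under stated assumptions: the constant CAPTCHA_SECRET and both function names are hypothetical, and hash_equals() requires PHP 5.6 or later (the type declarations assume PHP 7). On its own it does not stop a valid pair of values from being resubmitted.

```php
<?php
// Assumption: a long random secret kept only on the server (placeholder value).
const CAPTCHA_SECRET = 'replace-with-a-long-random-server-side-secret';

// Hypothetical helper: derives the hidden-field token from the cleartext.
function captcha_token(string $cleartext): string
{
    return hash_hmac('sha256', $cleartext, CAPTCHA_SECRET);
}

// Hypothetical helper: recomputes the token from the user's entry and
// compares it in constant time with the token returned by the form.
function captcha_matches(string $user_entry, string $returned_token): bool
{
    return hash_equals(captcha_token(trim($user_entry)), $returned_token);
}

$token = captcha_token('aB3xY7');             // issued with the form
$ok    = captcha_matches(' aB3xY7 ', $token); // true: entry matches after trim()
$bad   = captcha_matches('aB3xY8', $token);   // false: one character differs
```

Without the secret, a visitor cannot compute a matching token for a string of their own choosing.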

That's all there is to it. The CLI utilities enscript and convert create image files on the fly, and the PHP crypt() function allows us to safely test blog comment submissions for human origin.

Update -- Someone just visited my site and defeated my CAPTCHA by using a script to resend input cleartext and encrypted values. The exploit involved repeatedly submitting comments using the same encrypted and cleartext versions of the CAPTCHA.

To solve the problem, I created a new database table to store the CAPTCHA as it is issued, mark it as "used" when it is returned with a comment, and accept no more comments utilizing that CAPTCHA. Also, having been issued, the CAPTCHA is no longer available for issuance for an arbitrary time period.
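
The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the code running on my site: the class name CaptchaLedger and its methods are hypothetical, a PHP array stands in for the database table, the time-based part of the scheme is left out, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical stand-in for the database table: maps each issued CAPTCHA
// token to a "used" flag. A real version would use INSERT/UPDATE queries.
class CaptchaLedger
{
    private $rows = array();

    // Record a token when the form is displayed. Returns false if the token
    // is already on record, so the caller can generate a different CAPTCHA.
    public function issue(string $token): bool
    {
        if (isset($this->rows[$token])) {
            return false;
        }
        $this->rows[$token] = false; // false means "not yet used"
        return true;
    }

    // Accept a token exactly once: it must have been issued and be unused.
    public function redeem(string $token): bool
    {
        if (!isset($this->rows[$token]) || $this->rows[$token] === true) {
            return false;
        }
        $this->rows[$token] = true;
        return true;
    }
}

$ledger = new CaptchaLedger();
$ledger->issue('tojt1Xx62dqSA');
$first  = $ledger->redeem('tojt1Xx62dqSA'); // true: first use
$second = $ledger->redeem('tojt1Xx62dqSA'); // false: already used
```

The form handler would call redeem() only after the CAPTCHA comparison succeeds, and reject the comment if it returns false.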

Whew. That'll take you down a peg.

Comments

on CAPTCHA your blog comments with FOSS utilities

Note: Comments are owned by the poster. We are not responsible for their content.

Disability act

Posted by: Anonymous Coward on August 24, 2006 08:23 PM
.. which illegally bars poorly sighted people from posting comments. Excellent.

#

Other options?

Posted by: alandd on August 25, 2006 12:17 AM
Sincerely, I want to know.

On the one hand, as a person without vision problems, I find captchas annoying anyway. I can imagine that those with vision problems find them a complete blockade to participation.

On the other hand, unfettered spam in website comments completely destroys the value of the site and its comments.

So, what alternatives to captcha are there that allow the sight-impaired to participate while still hindering spam-bots?

 

#

Re:Other options?

Posted by: Anonymous Coward on August 28, 2006 09:07 PM
Here is an excellent article about accessibility for CAPTCHA facilities:
http://www.standards-schmandards.com/index.php?2005/01/01/11-captcha

#

Well, that would be good...

Posted by: Anonymous Coward on August 24, 2006 01:55 AM
... if it actually worked, and wasn't quite so easy to circumvent.

At least, he's noticed.. (http://www.mcarthurweb.com/archive.php?item=210)

#

Great article!

Posted by: Anonymous Coward on August 24, 2006 02:04 AM
Most CMSes already have such functionality built in, but it is great to see something like this in action nonetheless. It shows that the author was smart and used simple logic plus some nifty utilities of the operating system to create a way to validate input on his/her website.

And as long as salt_value remains unknown to outsiders, he really doesn't have to track session variables or worry about people figuring out the encryption hash used. Brilliant!

#

Re:Great article!

Posted by: Anonymous Coward on August 24, 2006 02:06 AM
Actually, one of the reasons it doesn't work is that the user can make up their own challenge and salt...

#

Great...

Posted by: Anonymous Coward on August 24, 2006 06:20 AM
...but how will you allow blind users like myself who use accessibility solutions to post comments on your blog?

#

Re:Great...

Posted by: Anonymous Coward on August 24, 2006 04:17 PM
I agree (even though I am not blind). I don't use captchas on my site (wordpress-based), but manage to capture all spams in my moderation queue without too much hassle.

The best, accessibility-aware, solution I have seen is to include plain-text maths questions (ie. What is 8+4? ). This would still be scriptable, but not nearly as simple to do.

mrben http://www.jedimoose.org/

#

Irony?

Posted by: Anonymous Coward on August 24, 2006 07:17 AM
Okay, is it ironic that I can post this comment without a captcha?

By the way, your member is too small, your long lost ancestor has died and left you millions, and you need to invest in the best new stock!

#

Updates

Posted by: Anonymous Coward on August 24, 2006 11:44 AM
After adding a server-side screening for re-use of issued CAPTCHA strings, another reader ran the images through a software OCR program and declared them easy to read. So I made some changes to the ImageMagick script, distorted the images, and did some testing of OCR capabilities against them. The script now looks like this:

$command = "echo '$str_captcha' | /usr/local/bin/enscript -o - -B -f 'Courier-BoldOblique50' | /usr/local/bin/convert -spread 2 -trim +repage - $absolute_path";

You can view what an image now looks like at:

http://www.mcarthurweb.com/archive.php?item=210

#

sorry to be so negative about this, but..

Posted by: Anonymous Coward on August 24, 2006 03:49 PM
To me, these generated images that have popped up everywhere to verify human users are a sign of complete and utter defeat in our effort to defeat the bots. I *hate* typing in those codes, it's just busy work and it slows me down on every site that is now doing this. It's a statement coders make that is saying "we couldn't find a way to distinguish between users and bots and so we've had to fall back on this method".

#

your logic is at fault

Posted by: Anonymous Coward on August 27, 2006 04:07 PM

It's a statement coders make that is saying "we couldn't find a way to distinguish between users and bots and so we've had to fall back on this method".


But they have found a way. This is it.


Given that a webserver can't telepathically detect a human at the other end of a TCP connection, because telepathy doesn't exist outside SF stories, any "way to distinguish" is going to require some kind of response from the human, which you will hate, whatever it is.

#

Hmm

Posted by: Anonymous Coward on August 24, 2006 04:58 PM
You could require the CAPTCHA only for anonymous posts, and not for people who have registered accounts on the website.

There is also an attribute you can add to links posted in comments (and forums) that makes some search engines ignore the link and not count it as a reference to the site, so it won't get a higher site rank.
If I remember correctly, it is something like:
<a href="http://www.example.com/" rel="nofollow">

#

Big hole in your suggestion

Posted by: Anonymous Coward on August 27, 2006 04:10 PM

but not be needed by people who have registered accounts on the website.


But there's nothing to stop a bot from registering an account! (or several accounts.)

#

It's "yum install"

Posted by: Anonymous Coward on August 24, 2006 08:10 PM
enscript and imagemagick are both standard packages with CentOS4.

If they are not already installed, the correct yum command is:

yum install enscript ImageMagick

#

Filenames on the Fly

Posted by: Anonymous Coward on August 24, 2006 09:03 PM
Nice idea. What about creating files on the fly named $captcha_encrypted.png and after submitting the comment, delete them via unlink($_POST['captcha_encrypted'].".png")?
Of course, you will have to find a method of deleting the old images that have accumulated over time. Maybe with

foreach (glob("*.png") as $filename) {
    echo "$filename size " . filesize($filename) . "\n";
    unlink($filename);
}

at the beginning of the script.

#

Re:Filenames on the Fly

Posted by: Anonymous Coward on August 25, 2006 11:13 PM
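A better way to achieve this would be a function like

function captcha_destroy_img() {
  // Directory that holds the CAPTCHA images (note the trailing slash).
  $dir = "/dir/to/captcha/images/";
  $directory = opendir($dir);
  // readdir() can return a name that evaluates to false, so compare strictly.
  while (($file = readdir($directory)) !== false) {
    // Delete only files whose names end in ".png".
    if (preg_match('/\.png$/', $file)) unlink($dir . $file);
  }
  closedir($directory);
}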

#

Re:Filenames on the Fly

Posted by: Anonymous Coward on October 20, 2006 09:15 PM
The PHP/PEAR package Text_CAPTCHA (http://pear.php.net/manual/en/package.text.text-captcha.php) does something like this, ensuring unique image names by hashing the session ID as in
md5(session_id()) . '.png'
The image is deleted on validating the CAPTCHA. However, I wonder what happens to CAPTCHAs which never get validated because the form wasn't submitted. Also, the CAPTCHA_test.php example in the package adds a timestamp to the SRC attribute of the image, as in 4b9b92da1c76ee5ac9ade60b5447514c.png?1161349897 for linking to the image 4b9b92da1c76ee5ac9ade60b5447514c.png. I'd appreciate if anyone could explain these two issues for me.

#

You'd think

Posted by: Anonymous Coward on August 30, 2006 09:34 AM
that a techie group like Linux.com could figure out a way to produce web pages that don't require the reader to scroll horizontally in order to read them.

#

Not bad, but...

Posted by: Administrator on March 11, 2007 11:28 PM
There are better solutions, such as a mathematical question, which many phpBB boards use.

#

This story has been archived. Comments can no longer be posted.



 