Comparing images with PHP + GD

Some time ago I stumbled upon a website that, as a sort of human check, made visitors pick out the image that appeared twice. I’ve always liked captchas that aren’t the usual “type the text in the image” kind, but this particular one looked breakable to me. I thought, “Hey, I can do this!”, and in fact I did.

I wrote a function that takes two image URLs as its arguments and tells you whether they match. More precisely, it returns a number I named the “Dissimilarity Index”: the higher the number, the less similar the two images are.
If you’re still learning the basics of PHP this may be a bit much, but it’s worth reading if you’re interested.

First I’ll show you the function, then I will dissect it as usual, line by line, skipping the most basic stuff that hopefully you already know.

function dissimilarityIndexCalculator($str_img_url, $str_match_url){
    //Try to make images from the urls, on fail return false.
    $str_source_data = @file_get_contents($str_img_url);
    $str_match_data  = @file_get_contents($str_match_url);
    if(!$str_source_data || !$str_match_data) return false;

    $img_source = @imagecreatefromstring($str_source_data);
    $img_match  = @imagecreatefromstring($str_match_data);
    if(!$img_source || !$img_match) return false;

    //Get image sizes.
    list($int_img_source_width, $int_img_source_height) = getimagesize($str_img_url);
    list($int_img_match_width, $int_img_match_height)   = getimagesize($str_match_url);

    //Resample to 16x16 px each
    $img_16source = imagecreatetruecolor(16, 16);
    $img_16match  = imagecreatetruecolor(16, 16);

    //dst x, dst y, src x, src y, dst width, dst height, src width, src height
    imagecopyresampled($img_16source, $img_source,
                       0, 0, 0, 0, 16, 16,
                       $int_img_source_width, $int_img_source_height);
    imagecopyresampled($img_16match, $img_match,
                       0, 0, 0, 0, 16, 16,
                       $int_img_match_width, $int_img_match_height);

    $difference = 0;
    $arr_img_source_color = array();
    $arr_img_match_color  = array();
    for($x = 0; $x < 16; $x++){
        for($y = 0; $y < 16; $y++){

            //Get the color of each pixel of the 16x16 copies
            $arr_img_source_color[$x][$y] =
                imagecolorsforindex($img_16source, imagecolorat($img_16source, $x, $y));
            $arr_img_match_color[$x][$y]  =
                imagecolorsforindex($img_16match, imagecolorat($img_16match, $x, $y));

            //Add this pixel's red, green and blue differences to the total
            $arr_src_px   = $arr_img_source_color[$x][$y];
            $arr_match_px = $arr_img_match_color[$x][$y];
            $difference  += abs($arr_src_px['red']   - $arr_match_px['red'])   +
                            abs($arr_src_px['green'] - $arr_match_px['green']) +
                            abs($arr_src_px['blue']  - $arr_match_px['blue']);
        }
    }

    //Average over the 256 pixels: 0 (identical) up to 765 (black vs. white)
    $difference = $difference/256;

    //Return an array with the information
    $arr_return = array("dissimilarityIndex" => $difference,
                        "sourceImage" => array("url"    => $str_img_url,
                                               "width"  => $int_img_source_width,
                                               "height" => $int_img_source_height,
                                               "colors" => $arr_img_source_color
                                         ),
                        "matchImage"  => array("url"    => $str_match_url,
                                               "width"  => $int_img_match_width,
                                               "height" => $int_img_match_height,
                                               "colors" => $arr_img_match_color
                                         )
                  );
    return $arr_return;
}

The function takes two parameters: $str_img_url, the URL of the first image in the comparison, and $str_match_url, the URL of the second one. An example call would be dissimilarityIndexCalculator("imageOne.jpg", "imageTwo.jpg");, which returns false if either image can’t be loaded, or otherwise the following information inside an array:

array("dissimilarityIndex" => "An index reflecting the combined color differences",
      "sourceImage"        => array("url"    => "First Image Url",
                                    "width"  => "First Image Width",
                                    "height" => "First Image Height",
                                    "colors" => "RGB values for each pixel in the 16x16 matrix"
                              ),
      "matchImage"         => array("url"    => "Second Image Url",
                                    "width"  => "Second Image Width",
                                    "height" => "Second Image Height",
                                    "colors" => "RGB values for each pixel in the 16x16 matrix"
                              )
);

There’s a lot going on in this function, but in a nutshell: it makes sure both images can be loaded and, if they can, copies and resizes each of them with resampling to 16x16 pixels. With any luck, each of those 256 pixels holds the average colour of the part of the original image it covers. For every pair of corresponding pixels (one from each image) we add up the absolute differences of their three colour values, which gives a number from 0 to 765 (255+255+255). Averaging that over the 256 pixels gives the dissimilarity index, which is therefore also a number from 0 to 765.
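To make the arithmetic concrete, here’s a GD-free sketch of the index calculation on plain arrays of pixels. It assumes both arrays have the same number of pixels, each with 'red', 'green' and 'blue' keys like the arrays imagecolorsforindex() returns; sketchDissimilarity is a hypothetical name, not part of the function above.

```php
<?php
// Hypothetical helper: average of |dR| + |dG| + |dB| over all pixels.
// Assumes $source and $match have the same length and pixel shape.
function sketchDissimilarity(array $source, array $match): float
{
    if (count($source) === 0) {
        return 0.0; // nothing to compare
    }
    $total = 0;
    foreach ($source as $i => $pixel) {
        // Per-pixel difference: between 0 and 255 + 255 + 255 = 765.
        $total += abs($pixel['red']   - $match[$i]['red'])
                + abs($pixel['green'] - $match[$i]['green'])
                + abs($pixel['blue']  - $match[$i]['blue']);
    }
    // Averaging over the pixels keeps the result between 0 and 765.
    return $total / count($source);
}

// 256 black pixels vs. 256 white pixels: the worst possible case.
$black = array_fill(0, 256, ['red' => 0,   'green' => 0,   'blue' => 0]);
$white = array_fill(0, 256, ['red' => 255, 'green' => 255, 'blue' => 255]);

echo sketchDissimilarity($black, $black), "\n"; // 0
echo sketchDissimilarity($black, $white), "\n"; // 765
```

Dividing by the number of pixels is what makes the index independent of the 16x16 grid size: every pixel counts the same, and the maximum stays at 765.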

Let’s get started dissecting the code:

$str_source_data = @file_get_contents($str_img_url);
$str_match_data  = @file_get_contents($str_match_url);
if(!$str_source_data || !$str_match_data) return false;

$img_source = @imagecreatefromstring($str_source_data);
$img_match  = @imagecreatefromstring($str_match_data);
if(!$img_source || !$img_match) return false;

On these lines we create two image objects from the URLs passed as the function’s arguments. When pointed at an image, file_get_contents returns a string containing the raw image data; if the file can’t be read it returns false, so we stop right there (recent PHP versions throw an error when imagecreatefromstring is given an empty string, so this check matters). I chose this approach instead of building the image directly from the URL since it appears to be less resource hungry for the server, and imagecreatefromstring also works out the format (JPEG, PNG, GIF…) by itself. imagecreatefromstring then turns the string back into an image object. Once again, the @ hides any error messages from the functions.

The last line is a conditional that returns false, ending the function, in the event that either image failed to be created. If there are no images, we can’t compare them, can we?
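If you want the loading step on its own, here’s a minimal sketch of a loader that returns false instead of failing loudly. loadImageOrFalse is a hypothetical name, and actually decoding an image needs the GD extension:

```php
<?php
// Hypothetical helper: returns a GD image, or false if the URL/path can't be
// read or doesn't contain an image format GD understands.
function loadImageOrFalse(string $url)
{
    // @ silences the warning file_get_contents() emits on failure.
    $data = @file_get_contents($url);
    if ($data === false || $data === '') {
        return false; // nothing to decode, don't even call GD
    }
    // imagecreatefromstring() detects the format itself and returns
    // false if the data isn't a recognised image.
    return @imagecreatefromstring($data);
}

var_dump(loadImageOrFalse('/no/such/file.jpg')); // bool(false)
```

Checking for an empty string before calling imagecreatefromstring is the important part: on PHP 8 an empty string raises an error that @ can’t hide.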

list($int_img_source_width, $int_img_source_height) = getimagesize($str_img_url);
list($int_img_match_width, $int_img_match_height)   = getimagesize($str_match_url);

This one is self-explanatory: getimagesize returns an array of information whose first two elements are the width and the height of the image, in that order. By using list we can assign both values to variables in a single statement.
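A side note: getimagesize() on a URL reads the file a second time. Since the images are already in memory at this point, imagesx() and imagesy() can give the same width and height without another download. A minimal sketch, where a blank 120x80 image stands in for one loaded with imagecreatefromstring() (needs GD; the variable names are just for illustration):

```php
<?php
// Stand-in for an image that has already been loaded.
$img = imagecreatetruecolor(120, 80);

// Width and height read straight from the in-memory GD image.
$width  = imagesx($img);
$height = imagesy($img);

echo "$width x $height\n"; // 120 x 80
```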

$img_16source = imagecreatetruecolor(16,16);
$img_16match  = imagecreatetruecolor(16,16);

We create the image objects (empty for now) that will contain the resized copy of the original ones.

//dst x, dst y, src x, src y, dst width, dst height, src width, src height
imagecopyresampled($img_16source, $img_source,
                   0, 0, 0, 0, 16, 16,
                   $int_img_source_width, $int_img_source_height);
imagecopyresampled($img_16match, $img_match,
                   0, 0, 0, 0, 16, 16,
                   $int_img_match_width, $int_img_match_height);

This is where the magic happens. What we are saying is: “Resize the image from its original width and height to 16x16 px, and store that copy in $img_16source”. So we now have, for each of the images we want to compare, a 16x16 pixel image in which every pixel holds roughly the average colour of the area of the original it covers. All that’s left is a simple calculation to get the index, but first we need the colour of each of those pixels. We get it inside two nested loops with imagecolorat, which returns the colour of the pixel at (x, y), and imagecolorsforindex, which splits that colour into its red, green, blue and alpha components.
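To see the resampling step in isolation, here’s a small sketch (it needs the GD extension, like everything else in this post). The helper name shrinkTo16 and the red/blue test image are made up for illustration; the point is the argument order of imagecopyresampled, whose last two arguments are the source width and the source height. Reading those with imagesx()/imagesy() is this sketch’s choice; the function above uses getimagesize() instead.

```php
<?php
// Hypothetical helper: shrink any GD image to a 16x16 copy.
function shrinkTo16($img)
{
    $thumb = imagecreatetruecolor(16, 16);
    imagecopyresampled(
        $thumb, $img,
        0, 0,            // destination x, y
        0, 0,            // source x, y
        16, 16,          // destination width, height
        imagesx($img),   // source width
        imagesy($img)    // source height
    );
    return $thumb;
}

// A 200x100 test image: left half red, right half blue.
$img = imagecreatetruecolor(200, 100);
imagefilledrectangle($img, 0, 0, 99, 99, imagecolorallocate($img, 255, 0, 0));
imagefilledrectangle($img, 100, 0, 199, 99, imagecolorallocate($img, 0, 0, 255));

$thumb = shrinkTo16($img);
$left  = imagecolorsforindex($thumb, imagecolorat($thumb, 0, 8));  // pixel from the red half
$right = imagecolorsforindex($thumb, imagecolorat($thumb, 15, 8)); // pixel from the blue half
print_r($left);
print_r($right);
```

Passing the width twice (as an earlier version of this post did) only happens to work for square images; for anything else the copy is taken from the wrong area.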

$difference = 0;
for($x = 0; $x < 16; $x++){
    for($y = 0; $y < 16; $y++){

        //Get the color of each pixel of the 16x16 copies
        $arr_img_source_color[$x][$y] =
            imagecolorsforindex($img_16source, imagecolorat($img_16source, $x, $y));
        $arr_img_match_color[$x][$y]  =
            imagecolorsforindex($img_16match, imagecolorat($img_16match, $x, $y));

        //Add this pixel's red, green and blue differences to the total
        $arr_src_px   = $arr_img_source_color[$x][$y];
        $arr_match_px = $arr_img_match_color[$x][$y];
        $difference  += abs($arr_src_px['red']   - $arr_match_px['red'])   +
                        abs($arr_src_px['green'] - $arr_match_px['green']) +
                        abs($arr_src_px['blue']  - $arr_match_px['blue']);
    }
}

//Average over the 256 pixels
$difference = $difference/256;

Finally, we calculate the dissimilarity index by subtracting the red, green and blue values of each pixel of the match image from those of the corresponding pixel of the source image. For those who don’t know what RGB is, it stands for Red, Green and Blue; each value goes from 0 (none of that colour) to 255 (the maximum amount of that colour). Adding up the absolute values of those subtractions for all 256 pixels and dividing by 256 gives the average difference per pixel, and there we have the index!
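For truecolor images like our 16x16 copies, imagecolorat() actually returns the colour packed into a single integer (0xRRGGBB, with the alpha value in the bits above), which is what imagecolorsforindex() splits up for us. Here’s a GD-free sketch of that unpacking plus the per-pixel difference used for the index; unpackRgb and pixelDifference are hypothetical helper names, not part of GD:

```php
<?php
// Hypothetical helper: unpack a truecolor value (0xRRGGBB) into channels.
// The alpha bits above blue/green/red are ignored here.
function unpackRgb(int $rgb): array
{
    return [
        'red'   => ($rgb >> 16) & 0xFF,
        'green' => ($rgb >> 8) & 0xFF,
        'blue'  => $rgb & 0xFF,
    ];
}

// Hypothetical helper: |dR| + |dG| + |dB| for two packed colours (0 to 765).
function pixelDifference(int $a, int $b): int
{
    $p = unpackRgb($a);
    $q = unpackRgb($b);
    return abs($p['red'] - $q['red'])
         + abs($p['green'] - $q['green'])
         + abs($p['blue'] - $q['blue']);
}

echo pixelDifference(0x336699, 0x336699), "\n"; // 0
echo pixelDifference(0x000000, 0xFFFFFF), "\n"; // 765
echo pixelDifference(0x102030, 0x000000), "\n"; // 16 + 32 + 48 = 96
```

The bit-shift version avoids building an array per pixel, but imagecolorsforindex() is easier to read and also works for palette images, which is presumably why it’s used in the function.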

Let’s put it to the test by comparing a few images; the first image will be compared against the other two:

When comparing the first image against the first match, we got a result of [dissimilarityIndex] => 38.640625.
When comparing the first image against the second match, we got a result of [dissimilarityIndex] => 57.22265625.

Note that when comparing images with very similar colours, it may give inaccurate results. It was designed more as a way of finding the closest match among different images. Still, it performs pretty well in most circumstances and should do the job. If you read this post right when it was written (not like anyone reads my blog) you might have seen an old version of this function that resized to 1px instead of 16x16 px. This improved version is a lot more accurate: even when the source image has been slightly modified, it still finds the closest match.
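Since the lower index wins, finding the closest match among several candidates just means picking the smallest value. A minimal sketch, assuming the indices have already been computed (for instance with dissimilarityIndexCalculator()); closestMatch and the file names are hypothetical, and the two values are the results reported above:

```php
<?php
// Hypothetical helper: return the key with the LOWEST dissimilarity index,
// or null if there are no candidates.
function closestMatch(array $indices): ?string
{
    if ($indices === []) {
        return null;
    }
    asort($indices); // ascending by index, keys preserved
    return (string) array_key_first($indices);
}

$indices = [
    'firstMatch.jpg'  => 38.640625,
    'secondMatch.jpg' => 57.22265625,
];
echo closestMatch($indices), "\n"; // firstMatch.jpg
```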

About Juan

Hi! My name is Juan and I’m a 23-year-old programmer from Madrid. This is my personal blog, dedicated to programming in general, although with a certain fondness for PHP and Java.