regexps in PHP, again

People keep insisting that preg_match is better than mb_ereg for non-Unicode lookups. So here are actual benchmarks to make it clear.

Here are the results (seconds for 100,000 iterations with two patterns per iteration; lower is better):

preg_match:      19.8039090633
mb_ereg:         15.9386620522
mb_ereg_search:  1.24934506416

Here is the source:

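<?php
// Pattern 1 matches an address such as qwe@test.com inside $text.
$regexp = '[\w]+@[\w]+\.com';
$pcre_regexp = '/'.$regexp.'/';

// Pattern 2 requires whitespace on both sides of "@", so it never matches $text.
$regexp2 = '[\s]+@[\s]+\.com';
$pcre_regexp2 = '/'.$regexp2.'/';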

$text = 'blabla bla blbaaasdajkln dsfkl klewnjklfnjkne qwe123@gg.net adkljaskdlnkljnasdljk qwe@test.comasdjlajnsdklnasdklnjl';

$t1 = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    $res1 = preg_match($pcre_regexp, $text);
    $res2 = preg_match($pcre_regexp2, $text);
}
$t2 = microtime(true);

$t3 = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    $res3 = mb_ereg($regexp, $text);
    $res4 = mb_ereg($regexp2, $text);
}
$t4 = microtime(true);

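// mb_ereg_search: the subject string is registered once, outside the loop.
// Note: the search position inside $text is kept between calls and is not reset here.
$t5 = microtime(true);
mb_ereg_search_init($text);
for ($i = 0; $i < 100000; $i++) {
    $res5 = mb_ereg_search($regexp);
    $res6 = mb_ereg_search($regexp2);
}
$t6 = microtime(true);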

echo 'preg_match:      '.($t2 - $t1)."\n";
echo 'mb_ereg:         '.($t4 - $t3)."\n";
echo 'mb_ereg_search:  '.($t6 - $t5)."\n";

Feel free to check it out yourself.
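
One thing worth checking when you rerun it: as far as I understand, mb_ereg_search continues from the position where the previous search stopped, so in the loop above later iterations may scan only the tail of $text (or nothing at all). Below is a minimal sketch of a variant that rewinds the position with mb_ereg_search_setpos(0) before every search. It assumes the mbstring extension is loaded; $iterations and the output label are illustrative choices, not part of the original benchmark, and I have no numbers for it here.

```php
<?php
// Same patterns and text as in the benchmark above.
$regexp  = '[\w]+@[\w]+\.com';
$regexp2 = '[\s]+@[\s]+\.com';
$text = 'blabla bla blbaaasdajkln dsfkl klewnjklfnjkne qwe123@gg.net adkljaskdlnkljnasdljk qwe@test.comasdjlajnsdklnasdklnjl';

// Illustrative iteration count (assumption); raise it for steadier timings.
$iterations = 1000;

// Register the subject string once, as in the original loop.
mb_ereg_search_init($text);

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    // Rewind to the start of $text so every search scans the whole string.
    mb_ereg_search_setpos(0);
    $res5 = mb_ereg_search($regexp);

    mb_ereg_search_setpos(0);
    $res6 = mb_ereg_search($regexp2);
}
$elapsed = microtime(true) - $start;

echo 'mb_ereg_search (reset):  '.$elapsed."\n";
```

With the reset in place, $res5 should be true and $res6 false on every iteration, which is the same outcome preg_match and mb_ereg report for these two patterns.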

  • Unfortunately, I have seen CMSs (not too spherical, by the way) that used regexps a lot, and:

    a) they applied them to larger texts
    b) they had a lot of regexps to apply

    P.S. I should probably make a more real-life comparison, though.
  • hex
    1. I strongly doubt that any serious application (i.e., not a spherical CMS in a vacuum) uses regexps that often; a hundred calls I can imagine, but not hundreds of thousands. This takes the comparison to a different scale: 0.019 vs 0.0012, which is not as significant as...

    2. Doesn't it depend on an extension that is not necessarily loaded? I've seen hosting providers without mbstring.