Introduction to Regular Expressions in Perl

What Are Regular Expressions And What Are They Good For?

In a nutshell, regular expressions identify patterns in text.

Manipulating strings can be one of the more complex topics to deal with. Regular expressions make it easy and efficient to perform complex searches and replacements of text in a string, given pretty much any search criteria. The following example shows how easy it is to count the tags in an HTML file using a regular expression in Perl, compared with equivalent C code using a character array. (The Perl line matches against the default variable $_, which is assumed to hold the contents of the file.)

Perl:

$tags++ while /<.+?>/g;

Equivalent in C (using character arrays):

int i = 0, tags = 0;
while (string[i] != '\0'){
   /* skip ahead to the next '<' */
   while (string[i] != '<' && string[i] != '\0') i++;
   if (string[i] == '\0') break;

   /* skip ahead to the closing '>' */
   while (string[i] != '>' && string[i] != '\0') i++;
   if (string[i] == '\0') break;

   tags = tags + 1;
}

Granted, the C code is fairly easy to understand and wouldn't be bad to write, but string manipulation can get far more complex than counting the tags in an HTML file. Say you want to search a Web page for all sentences that contain a word that ends with "ing" and doesn't contain the letter "m", with the added stipulation that the word must be preceded by either the name "Romeo" or "Juliet" in the sentence. I don't even want to attempt this in C, but the following Perl regular expression will do the trick. (Note: it assumes that sentences end with a period, but could be easily modified to handle ?, ;, :, etc.)

/\.?\s*([^\.]*(Romeo|Juliet)[^\.]*\b[^\WmM]*ing\b[^\.]*\.)/;

"Aaack! That is way too complicated!"

Don't worry! I don't expect you to understand that. I was simply demonstrating that regular expressions can handle complex searches without a lot of code. And once you become accustomed to using regular expressions it shouldn't be too difficult to understand what that code is doing.
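
To see an expression like this in action, here is a minimal sketch that pulls the matching sentences out of a short sample text. The sample text and variable names are made up for illustration, and the pattern is a slightly tightened variant of the idea above: it limits the "ing" word to word characters other than "m" and escapes the final period. Treat it as one possible way to write the search, not the only one.

```perl
use strict;
use warnings;

# Hypothetical sample text; every sentence ends with a period.
my $text = "Romeo was coming home. Juliet kept waiting by the window. "
         . "The king was sleeping. Romeo heard singing.";

my @sentences;
# \.?\s*           skip the previous sentence's period and any whitespace
# [^\.]*           any run of characters that are not a period
# (Romeo|Juliet)   one of the two names
# \b[^\WmM]*ing\b  a whole word ending in "ing" with no "m" in it
# \.               the period that ends the sentence
while ($text =~ /\.?\s*([^\.]*(Romeo|Juliet)[^\.]*\b[^\WmM]*ing\b[^\.]*\.)/g) {
    push @sentences, $1;    # $1 holds the text of the outer parentheses
}

print "$_\n" for @sentences;
```

With this sample text, the sketch should print the "Juliet kept waiting" and "Romeo heard singing" sentences. The first sentence is skipped because "coming" contains an "m", and the third because it names neither Romeo nor Juliet.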

Getting Started

To begin with, we must have some text to search. We'll place this in a string:

my $string = "This is the text Phil wishes to search.";

Now, we need to tell Perl what string we're looking for. We "bind" our search terms to the string with the =~ operator. The search terms appear to the right of this as follows:

$string =~ m/Phil/;

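The 'm' stands for match. The two forward slashes are delimiters that set off the search string. In place of the two slashes //, we could use [], {}, ##, <>, !!, or almost any other character that is neither alphanumeric nor whitespace. (Avoid ?? as delimiters: m?...? has a special meaning in Perl and matches only once.) When the delimiters are slashes, the 'm' is optional, so /Phil/ means the same thing as m/Phil/.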
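
As a small sketch of why alternative delimiters are handy, the following compares a few equivalent matches. The $path value and the variable names are hypothetical, chosen only for illustration; the backslash used to escape the slashes is the escape character described under Reserved Characters.

```perl
use strict;
use warnings;

my $string = "This is the text Phil wishes to search.";
my $path   = "/usr/local/bin/perl";   # hypothetical path, for illustration only

# These three matches are equivalent; only the delimiters differ.
my $with_slashes = ($string =~ m/Phil/) ? "yes" : "no";
my $with_braces  = ($string =~ m{Phil}) ? "yes" : "no";
my $with_bangs   = ($string =~ m!Phil!) ? "yes" : "no";

# With slashes as delimiters, every literal slash must be escaped...
my $escaped   = ($path =~ m/\/local\/bin/) ? "yes" : "no";
# ...whereas braces let the slashes stand as they are.
my $unescaped = ($path =~ m{/local/bin})   ? "yes" : "no";

print "$with_slashes $with_braces $with_bangs $escaped $unescaped\n";
```

All five tests should print "yes". Choosing a delimiter that does not appear in the search string simply keeps the expression easier to read.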

The above code returns true (1) if a match is found, and false (the empty string, which counts as 0 in a numeric context) if it is not. Thus, the following will print "The condition is false":

if ($string =~ m/Bill/){
   print "The condition is true";
}else{
   print "The condition is false";
}

The contents of the search string can also be a string variable. Thus the following will return true:

my $string = "This is the text Phil wants to do the searching on.";
my $strToFind = "Phil";
$string =~ m/$strToFind/;

Metacharacters

The above technique is good if we know exactly what we're trying to find. But it won't work if the text we want to find varies. We may want to find a 5-digit number, the first word of a particular sentence, words ending in "ing", etc. We can do this with metacharacters. Metacharacters represent individual characters or combinations of characters.

An example of a metacharacter is the wildcard character, which is the period (.). It represents any single character (including whitespace, but by default not a newline). So the following will return true:

my $word = "fall";
$word =~ m/fa.l/;

In this case, the regular expression would have returned true if $word had been "falling", "failed", "unfailing", etc. However, it would not have returned true if $word was "fal", "fill", "faail", etc.
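
The words listed above can be checked with a short loop. This is only a sketch: the array and variable names are made up for illustration, and the last two tests show how the wildcard treats a space and a newline.

```perl
use strict;
use warnings;

# Words taken from the paragraph above.
my @words = ("fall", "falling", "failed", "unfailing", "fal", "fill", "faail");

my @matched;
for my $word (@words) {
    # "fa", then any one character, then "l"
    push @matched, $word if $word =~ m/fa.l/;
}

# The wildcard matches a space, but by default it does not match a newline.
my $space_ok   = ("fa l"   =~ m/fa.l/) ? 1 : 0;
my $newline_ok = ("fa\nl"  =~ m/fa.l/) ? 1 : 0;

print "Matched: @matched\n";
print "space: $space_ok, newline: $newline_ok\n";
```

Only the first four words should end up in @matched.
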
An (incomplete!) list of some other common metacharacters follows:

[abc] - Any one of a, b, or c. Example:

#Returns true if $word is "ton", "cone", "prison", etc, but not if it is "non", "won", etc.
$word =~ m/[sct]on/;

[a-d], [4-9] - Any character or number in the range. Example:

#Returns true if $string contains the strings "513", "514", or "515"
$string =~ m/51[3-5]/;

? - The preceding character or group may or may not be present. Note: Groups are enclosed in parentheses (). Example:

#Returns true if $name is "Phil Lanier" or "Philip Lanier"
$name =~ m/Phil(ip)? Lanier/;

* - The preceding character or group is present 0 or more times.
+ - The preceding character or group is present 1 or more times. Example:

#Returns true if $word is "bal", "ball", "balll", etc, but not "ba"
$word =~ m/bal+/;

\s,\S - A whitespace character (including \n, \t, \r); a non-whitespace character. Example:

#Returns true if $string is "two words", but not if it is "two  words", "twowords", etc
$string =~ m/two\swords/;

#Returns true if $string is "one-word", "oneXword", etc but not if it is "one word"
$string =~ m/one\Sword/;

\d,\D - A digit; a non-digit.
\w,\W - A word character (a letter, a digit, or an underscore: [a-zA-Z0-9_]); a non-word character.
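
Since \d, \D, \w, and \W have no examples above, here is a minimal sketch of each. The sample strings and variable names are hypothetical, chosen only to make the results easy to check.

```perl
use strict;
use warnings;

# Hypothetical sample strings, for illustration only.
my $with_number    = "Order 12345 has shipped";
my $without_number = "Order pending";

# \d\d\d\d\d : five digits in a row
my $found_digits   = ($with_number    =~ m/\d\d\d\d\d/) ? 1 : 0;
my $missing_digits = ($without_number =~ m/\d\d\d\d\d/) ? 1 : 0;

# \w also matches the underscore, so "user_name" fits user\wname
my $underscore_ok    = ("user_name" =~ m/user\wname/) ? 1 : 0;

# \W matches a character that is NOT a word character, such as a space
my $space_is_nonword = ("user name" =~ m/user\Wname/) ? 1 : 0;

# \D matches any non-digit; a string made only of digits contains none
my $has_nondigit     = ("12345" =~ m/\D/) ? 1 : 0;

print "$found_digits $missing_digits $underscore_ok $space_is_nonword $has_nondigit\n";
```

This should print "1 0 1 1 0".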

For a complete list of metacharacters, see a Perl regular expression reference.

Reserved Characters
You should note that the following special characters are reserved, and you must use the escape character (\) if you want to use them literally in a regular expression:
. * ? + [ ] ( ) { } ^ $ | \
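
The difference an escape makes can be sketched as follows. The sample text and variable names are hypothetical. The last test uses \Q...\E, a standard Perl feature not covered elsewhere in this tutorial, which escapes every reserved character in the text between them; it is useful when the search string comes from a variable.

```perl
use strict;
use warnings;

my $price = "Total: 3x14 dollars";   # hypothetical text with no literal "3.14"

# Unescaped, the period is a wildcard, so "3x14" matches.
my $unescaped = ($price =~ m/3.14/)  ? 1 : 0;

# Escaped, the period must be a literal ".", so there is no match.
my $escaped   = ($price =~ m/3\.14/) ? 1 : 0;

# \Q...\E escapes the reserved characters in an interpolated variable.
my $strToFind = "3.14";
my $quoted    = ($price =~ m/\Q$strToFind\E/) ? 1 : 0;

print "unescaped: $unescaped, escaped: $escaped, quoted: $quoted\n";
```

Only the unescaped test should succeed, which is exactly the kind of accidental match that escaping prevents.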

Modifiers
The search strings are case sensitive. Therefore, with the $string from the earlier examples, $string =~ /phil/ will return false. We can turn the case sensitivity off, however, using a modifier. Modifiers are single characters that go to the right of the last delimiter. The modifier to make the search case-insensitive is /i. So if $string = "DeVhoOD", the following will return true:

$string =~ /devhood/i;

The example I gave you at the beginning of this tutorial (for counting the number of HTML tags in a string) uses the modifier /g. This is for a global match, which means that each time the regular expression is evaluated, the "cursor" does not start back at the beginning of the string but picks up where the previous match ended. If it were left out of that example, we would get an infinite loop. The /g modifier is also useful for substitution, but I will cover this in a later tutorial. Additionally, the +? is a special sequence of metacharacters that I will cover in the next tutorial. Consider it a teaser.
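
Here is a runnable sketch of the tag counter that binds the expression to a named string rather than the default variable. The HTML snippet and variable names are made up for illustration. It also shows a second standard use of /g: in list context, the match returns every matching substring at once.

```perl
use strict;
use warnings;

# Hypothetical HTML snippet, for illustration only.
my $html = "<html><body><p>Hello, <b>world</b>!</p></body></html>";

# Scalar context: each evaluation resumes where the previous match ended,
# and the loop stops when no further match is found.
my $tags = 0;
$tags++ while $html =~ /<.+?>/g;

# List context: /g returns all of the matches in one go.
my @all_tags = $html =~ /<.+?>/g;

print "Counted $tags tags: @all_tags\n";
```

Both approaches should find the same eight tags in this snippet.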

There are several other modifiers as well. As with the metacharacters, I would suggest you look at a Perl reference for a complete list.

Conclusion
That's a basic introduction to regular expressions in Perl. It should be enough to get you started writing a few regular expressions on your own and reading some of the simpler expressions that others have written (now you finally know what that crazy =~ thing is!). The next tutorial will cover some of the other main topics you need to become comfortable using regular expressions for all of your text manipulation needs.

Reader's Comments

Not bad Phil. I remember the days when you didn't know any perl. Good to know you've turned to the dark side ;).
-- parker thompson, April 30, 2002

Excellent tutorial. Nicely written. 5 Stars: *****
-- Michael S, April 30, 2002

I agree, good tutorial Phil. I give it a 5.
-- Brian Anderson, May 01, 2002

I must say, this is a really good tutorial. If I had read this tutorial, my perl projects wouldn't have been bad and I wouldn't have had to learn all of this by myself. I must say, perl is a complicated language, but it's really good at searching. I remember after I did my project, I hardly knew what I was doing, since it is all those /'s ['s...etc.
-- Yee Wa Lau, May 03, 2002

This is a good tutorial for guys like me who have never even dealt with PERL before. Now I can at least get the feel for what you would use the language in.
-- Anthony Johnson, May 13, 2002

The Perl regular expression reference that I linked to above has moved. You can now find it . The author of the reference was kind enough to email me to let me know the link had changed. Unfortunately, I can't edit the tutorial like I can other posts.
-- Philip Lanier, June 14, 2002

This is a good, simple, and easy to understand introduction to regular expressions in Perl. Another great reference for perl is . It's maintained by O'Reilly, the publisher of a lot of great books on Perl (Programming Perl, Mastering Regular Expressions, Perl In A Nutshell...). There are other languages, like JavaScript (see 's JavaScript guide for more details), that support regular expressions as well.
-- Brent Bishop, August 08, 2002

Thank you, I've been looking for a decent tutorial about regular expressions, and this sure helped.
-- Matthew Hildebrand, August 24, 2002
