Tutorial
Regular expressions are often used to make sure that a string
matches a certain pattern: for example, that a string looks like a zip
code, a phone number, or an e-mail address.
The simplest regular expression is just a substring. For
example, the regular expression ther matches the string
hello there, because the string contains the regular expression.
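As a minimal sketch of the substring case in .NET (assuming C# 9 or later for top-level statements; the second sample string is illustrative), Regex.IsMatch reports whether the pattern occurs anywhere in the input:

```csharp
using System;
using System.Text.RegularExpressions;

// Regex.IsMatch returns true when the pattern is found anywhere in the input.
bool found = Regex.IsMatch("hello there", "ther");
bool missing = Regex.IsMatch("hello world", "ther");

Console.WriteLine(found);   // True
Console.WriteLine(missing); // False
```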
If you're familiar with JavaScript regular expressions, then
you already know most of this. .NET regular expressions are largely
a superset of JavaScript regular expressions.
Start and End of Line
You can easily match strings that start or end with certain
characters. The ^ character matches the start of the
string. For example:
^hello
Matches hello there, hello sam, hellotopical
To match the end
of the string, use the $ character. For example:
ere$
Matches Where and There
The ^
and $ characters are known as "atomic zero-width assertions",
in case you were wondering.
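The two anchors can be tried out with a short sketch like the following (top-level statements, C# 9 or later; the sample strings beyond those in the text are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

string[] samples = { "hello there", "say hello", "Where", "There", "Thereafter" };

foreach (string s in samples)
{
    // ^hello must match at the start; ere$ must match at the end.
    bool startsWithHello = Regex.IsMatch(s, "^hello");
    bool endsWithEre = Regex.IsMatch(s, "ere$");
    Console.WriteLine(s + " | starts with hello: " + startsWithHello + " | ends with ere: " + endsWithEre);
}
```

Only "hello there" satisfies ^hello, while "say hello" and "Thereafter" satisfy neither pattern because the text they contain is not at the required position.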
Character Classes
Character classes
allow you to specify sets of characters or ranges. For example:
[aeiou]
Matches Hey and Hi, but not Zzz
In other words,
the string must contain at least one of the characters in the character class.
You can also exclude characters. For example:
[^aeiou]
Matches Zzz, but not Hey or Hi.
When the ^
character is the first character in the character class, it means "anything
but the following characters".
Putting this together,
we could create a pattern that matches strings that start with a vowel:
^[aeiou]
Or, strings that
don't start with a vowel:
^[^aeiou]
With character
classes, you can also specify ranges. For example:
[0-9]
Matches 0, 5, 8, or any other single digit from 0 to 9.
[0-9][0-9]
Matches any two-digit number (04, 13, 87, etc.), but there's
a better way to do this.
[a-zA-Z0-9_]
Matches characters typically found in words. A shorthand syntax
for this is \w
Some other built-in
classes are \W for anything other than a word character
([^a-zA-Z0-9_]), \s for any whitespace character,
\S for any non-whitespace character, \d
for any decimal digit ([0-9]), and \D for any non-digit
([^0-9]).
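A short sketch of character classes and the shorthand classes in C# (top-level statements, C# 9 or later; the sample words are illustrative). The last two lines show a .NET detail worth knowing: by default \w also accepts non-ASCII letters, and it is limited to [a-zA-Z0-9_] only when RegexOptions.ECMAScript is passed.

```csharp
using System;
using System.Text.RegularExpressions;

// ^[aeiou] : the first character is a lowercase vowel.
bool appleStartsWithVowel = Regex.IsMatch("apple", "^[aeiou]");   // True
bool pearStartsWithVowel = Regex.IsMatch("pear", "^[aeiou]");     // False

// ^[^aeiou] : the first character is anything except a lowercase vowel.
bool pearStartsWithOther = Regex.IsMatch("pear", "^[^aeiou]");    // True

// \s and \d shorthand classes; a verbatim string (@"...") avoids doubling backslashes.
bool spaceThenTwoDigits = Regex.IsMatch("room 42", @"\s\d\d");    // True

// \w is Unicode-aware by default; ECMAScript mode restricts it to [a-zA-Z0-9_].
bool unicodeWord = Regex.IsMatch("\u00e9", @"^\w$");                          // True
bool asciiOnlyWord = Regex.IsMatch("\u00e9", @"^\w$", RegexOptions.ECMAScript); // False

Console.WriteLine(appleStartsWithVowel);
Console.WriteLine(pearStartsWithVowel);
Console.WriteLine(pearStartsWithOther);
Console.WriteLine(spaceThenTwoDigits);
Console.WriteLine(unicodeWord);
Console.WriteLine(asciiOnlyWord);
```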
Quantifiers
Sometimes you want
to specify a certain number of characters that match a certain pattern.
For example, a zip code is 5 digits. This is written as:
^[0-9]{5}$
This says: match the beginning of the string, followed by five digits, followed
by the end of the string. This matches 97211, 01293, 88460.
^[0-9]{5}-[0-9]{4}$
Matches 9-digit zip codes, such as 97211-0165.
This could also be written as ^\d{5}-\d{4}$, where \d
matches any decimal digit (the same as [0-9]).
You can also specify
minimum and maximum occurrences.
For example:
^\d{1,3}$
matches 1, 15, 987. In other words, any number that
is 1 to 3 digits in length.
You can also specify
open-ended ranges:
^\d{1,}$
matches 1 or more digits. This can also be written ^\d+$
^\d{0,}$
matches 0 or more digits. This can also be written ^\d*$
^[+-]{0,1}\d{0,}$
matches 123, +123, and -123.
This could also be written as ^[+-]?\d*$ where ?
means 0 or 1 occurrences.
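The quantifiers above can be checked with a sketch like this one (top-level statements, C# 9 or later; the variable names are illustrative). It also shows a side effect of \d*: because both parts of the signed-number pattern are optional, the empty string matches too, so \d+ is the safer choice when at least one digit is required.

```csharp
using System;
using System.Text.RegularExpressions;

string zip = @"^\d{5}$";
string zipPlus4 = @"^\d{5}-\d{4}$";
string signedLoose = @"^[+-]?\d*$";   // zero or more digits
string signedStrict = @"^[+-]?\d+$";  // one or more digits

Console.WriteLine(Regex.IsMatch("97211", zip));            // True
Console.WriteLine(Regex.IsMatch("9721", zip));             // False: only four digits
Console.WriteLine(Regex.IsMatch("97211-0165", zipPlus4));  // True
Console.WriteLine(Regex.IsMatch("-123", signedLoose));     // True
Console.WriteLine(Regex.IsMatch("", signedLoose));         // True: everything is optional
Console.WriteLine(Regex.IsMatch("", signedStrict));        // False: a digit is required
```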
Options
The regular expression
syntax contains a number of options that you can toggle. For example,
you can enable case-insensitive matching with (?i:):
(?i:[aeiou])
matches hello and HELLO, but not
Zzzz
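A brief sketch comparing the inline option with the RegexOptions.IgnoreCase flag, which applies the same behavior to a whole pattern (top-level statements, C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

// Inline option: case-insensitivity applies only inside the (?i: ... ) group.
bool inlineUpper = Regex.IsMatch("HELLO", "(?i:[aeiou])");   // True
bool plainUpper = Regex.IsMatch("HELLO", "[aeiou]");         // False: class is lowercase only
bool inlineNoVowel = Regex.IsMatch("Zzzz", "(?i:[aeiou])");  // False: no vowel at all

// Whole-pattern equivalent: pass RegexOptions.IgnoreCase.
bool flagUpper = Regex.IsMatch("HELLO", "[aeiou]", RegexOptions.IgnoreCase); // True

Console.WriteLine(inlineUpper);
Console.WriteLine(plainUpper);
Console.WriteLine(inlineNoVowel);
Console.WriteLine(flagUpper);
```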
Examples:
US Phone Number:
^\(?\d{3}\)?(\s|-)\d{3}-\d{4}$
matches (555) 555-5555.
Improved US Phone
Number (by Jesse Sweetland).
^1?\s*-?\s*(\d{3}|\(\s*\d{3}\s*\))\s*-?\s*\d{3}\s*-?\s*\d{4}$
This recognizes 1 (123) 456 7980, 1 123 456
7890, (123) 456-7890, and so on, and makes sure that if one
paren is present both must be present.
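The paired-parenthesis behavior can be checked with a sketch like this (top-level statements, C# 9 or later; the two rejected inputs are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

var phone = new Regex(@"^1?\s*-?\s*(\d{3}|\(\s*\d{3}\s*\))\s*-?\s*\d{3}\s*-?\s*\d{4}$");

Console.WriteLine(phone.IsMatch("1 (123) 456 7890")); // True
Console.WriteLine(phone.IsMatch("1 123 456 7890"));   // True
Console.WriteLine(phone.IsMatch("(123) 456-7890"));   // True
Console.WriteLine(phone.IsMatch("(123 456-7890"));    // False: opening paren only
Console.WriteLine(phone.IsMatch("123) 456-7890"));    // False: closing paren only
```

The area code is written as an alternation between bare digits and digits wrapped in both parentheses, which is what rules out a lone paren.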
International Phone
Number
^\d(\d|-){7,20}
matches 1-12-3123-4141.
E-Mail Address
(by Lucadean)
^([a-zA-Z0-9_\-])([a-zA-Z0-9_\-\.]*)@(\[((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}|((([a-zA-Z0-9\-]+)\.)+))([a-zA-Z]{2,}|(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\])$
5 Digit Zipcode
^\d{5}$
matches 12879, 97211
9 Digit Zipcode
^\d{5}-\d{4}$
matches 97211-1234
5 or 9 Digit Zipcode
from Bradley M. Handy
^\d{5}(-?\d{4})?$
This expression will match 12345, 123451234, or 12345-1234.
Date
(as in MM-DD-YYYY or MM/DD/YYYY, by Chow). Accepts 1 or 2 digits for month
and day.
^\d{1,2}(/|-)\d{1,2}(/|-)\d{4}$
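In a pattern like this, the choice between / and - has to be grouped, because | has the lowest precedence in a regular expression and otherwise splits the whole pattern into separate alternatives. The following sketch (top-level statements, C# 9 or later; the variable names and the "1/not a date" input are illustrative) compares an ungrouped and a grouped form:

```csharp
using System;
using System.Text.RegularExpressions;

// Ungrouped: parsed as three alternatives: ^\d{1,2}/  OR  -\d{1,2}/  OR  -\d{4}$
string ungrouped = @"^\d{1,2}/|-\d{1,2}/|-\d{4}$";
// Grouped: each separator is a single choice between "/" and "-".
string grouped = @"^\d{1,2}(/|-)\d{1,2}(/|-)\d{4}$";

Console.WriteLine(Regex.IsMatch("1/not a date", ungrouped)); // True: first alternative matches
Console.WriteLine(Regex.IsMatch("1/not a date", grouped));   // False
Console.WriteLine(Regex.IsMatch("12-31-1999", grouped));     // True
Console.WriteLine(Regex.IsMatch("1/2/2003", grouped));       // True
```

A character class such as [/-] would work equally well in place of (/|-).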
A more sophisticated
date pattern that accepts dates from 1/1/0001 to 12/31/9999 (mm/dd/yyyy) and validates
leap years (2/29/2000 is valid, but 2/29/2001 is not), by Mike Akins, based
on work by Michael Ash
^(?:(?:(?:0?[13578]|1[02])(\/|-)31)|(?:(?:0?[1,3-9]|1[0-2])(\/|-)(?:29|30)))(\/|-)(?:[1-9]\d\d\d|\d[1-9]\d\d|\d\d[1-9]\d|\d\d\d[1-9])$|^(?:(?:0?[1-9]|1[0-2])(\/|-)(?:0?[1-9]|1\d|2[0-8]))(\/|-)(?:[1-9]\d\d\d|\d[1-9]\d\d|\d\d[1-9]\d|\d\d\d[1-9])$|^(0?2(\/|-)29)(\/|-)(?:(?:0[48]00|[13579][26]00|[2468][048]00)|(?:\d\d)?(?:0[48]|[2468][048]|[13579][26]))$
Here's a version of the above
date expression that matches UK dates (dd/mm/yyyy), by Adam Carless
^(?:(?:0?[1-9]|1\d|2[0-8])(\/|-)(?:0?[1-9]|1[0-2]))(\/|-)(?:[1-9]\d\d\d|\d[1-9]\d\d|\d\d[1-9]\d|\d\d\d[1-9])$|^(?:(?:31(\/|-)(?:0?[13578]|1[02]))|(?:(?:29|30)(\/|-)(?:0?[1,3-9]|1[0-2])))(\/|-)(?:[1-9]\d\d\d|\d[1-9]\d\d|\d\d[1-9]\d|\d\d\d[1-9])$|^(29(\/|-)0?2)(\/|-)(?:(?:0[48]00|[13579][26]00|[2468][048]00)|(?:\d\d)?(?:0[48]|[2468][048]|[13579][26]))$
IP Address (by
Lucadean)
^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$
matches 255.255.255.255, and 0.0.0.0, but
doesn't match 256.1.1.1 or 999.1.1.1.
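A sketch of how this pattern might be used in C# (top-level statements, C# 9 or later). The octet sub-pattern is stored once in a hypothetical variable and concatenated, which yields the same expression as above:

```csharp
using System;
using System.Text.RegularExpressions;

// One octet: 250-255, 200-249, 100-199, 10-99, or 0-9.
string octet = "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])";

// Three "octet." groups followed by a final octet.
var ip = new Regex("^(" + octet + @"\.){3}" + octet + "$");

Console.WriteLine(ip.IsMatch("255.255.255.255")); // True
Console.WriteLine(ip.IsMatch("0.0.0.0"));         // True
Console.WriteLine(ip.IsMatch("256.1.1.1"));       // False
Console.WriteLine(ip.IsMatch("999.1.1.1"));       // False
```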
Make sure a string
doesn't contain certain characters (by Chris Venus):
^[^ab]*$
matches hello, eye, fred (any string that doesn't have "a"
or "b" in it), but doesn't match bye.
UK Postal Codes
by John Dyke.
Their format
is an outer part of 1 or 2 letters + 1 or 2 digits + an optional letter (sometimes, mainly
in London), followed by an inner part of 1 digit and two letters.
The code is normally
written in capital letters with a space between the outer and inner parts;
it is still understandable if the space is omitted.
This regular
expression validates upper or lower case with or without the space:
^[A-Za-z]{1,2}[\d]{1,2}([A-Za-z])?\s?[\d][A-Za-z]{2}$
CF1 2AA matches,
as does cf564fg (= CF56 4FG), but a1234d and A12 77Y would not.
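The four sample codes can be run through the pattern with a sketch like this (top-level statements, C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

var postcode = new Regex(@"^[A-Za-z]{1,2}[\d]{1,2}([A-Za-z])?\s?[\d][A-Za-z]{2}$");

Console.WriteLine(postcode.IsMatch("CF1 2AA")); // True
Console.WriteLine(postcode.IsMatch("cf564fg")); // True: lower case, no space
Console.WriteLine(postcode.IsMatch("a1234d"));  // False: too many digits
Console.WriteLine(postcode.IsMatch("A12 77Y")); // False: inner part needs two letters
```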
Extract all the
HTML tags from a web page:
In conjunction
with a little .NET code that extracts all the matches, this can be used to
extract every HTML tag from a page.
<[^>]*>
Or, if you just
want image tags, for example:
<img[^>]*>
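The "little .NET code" could look something like the following sketch, which uses Regex.Matches with the patterns <[^>]*> and <img[^>]*> (top-level statements, C# 9 or later; the HTML snippet is a made-up example, not a real page):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical page content used only for illustration.
string html = "<html><body><img src=\"a.png\"><p>Hi</p></body></html>";

// Every tag: "<", then any run of characters other than ">", then ">".
MatchCollection allTags = Regex.Matches(html, "<[^>]*>");
foreach (Match m in allTags)
{
    Console.WriteLine(m.Value);
}

// Image tags only.
MatchCollection imgTags = Regex.Matches(html, "<img[^>]*>");
Console.WriteLine(imgTags.Count); // 1
```

This is a simple scan rather than a real HTML parser, so a ">" inside an attribute value would cut a tag short.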
To get the values
of a CSV (updated by Arnold Bailey), you can use this expression:
,(?=(?:[^"]*"[^"]*")*(?![^"]*"))
In conjunction
with this code:
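using System;
using System.Text.RegularExpressions;

// Split on commas that are followed by an even number of quotes,
// i.e. commas that are not inside a quoted field.
Regex r = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
string s = "\"a\",b,\"c, d, e\",,f";
string[] sAry = r.Split(s);
for (int i = 0; i < sAry.Length; i++)
{
    // Prints five fields; the fourth one is empty.
    Console.WriteLine(sAry[i]);
}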
Percentage (by
Andres Garcia)
^(0*100{1,1}\.?((?<=\.)0*)?%?$)|(^0*\d{0,2}\.?((?<=\.)\d*)?%?)$
Matches 0, 0.0, 99.9, 100.0, but excludes -1, 100.1, etc.