Using Regular Expressions
  Rating: 4.07 out of 5 by 15 users
  Submitted: 10/04/01
Edmund Chou

 

Regular expressions are a powerful tool for manipulating strings, often used to extract or modify data in complicated strings. In a sense, regular expressions are a separate language that lets you specify what type of data you're attempting to match. Instead of focusing on the details of regular expression syntax, we'll discuss how to use regular expressions from C# and ASP.NET pages. A useful reference and brief overview of regular expressions can be found at the web page.

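To begin using regular expressions, you'll need access to the System.Text.RegularExpressions namespace. In the released .NET Framework this namespace lives in the System assembly (System.dll), which Visual Studio.NET references by default in new projects, so usually all that is required is a using System.Text.RegularExpressions; directive at the top of your source file. If the reference is missing, right click on the References folder in the Solution Explorer and select "Add Reference". A window will appear listing several components you can reference. Select the System.dll component and click "OK".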

Matching

One simple goal of regular expressions is to find relevant data. The .NET Framework supports regular expression operations with the Regex class. Since Regex contains static methods, regular expressions can be used without instantiating a Regex object. To test if a data string matches against a regular expression, we simply call the Regex.IsMatch method as follows:

string strData   = "Sally sells sea shells by the seashore";
string strSearch = "s\\w{6,}";
if (Regex.IsMatch(strData, strSearch)) { ... } 


The code above attempts to match the regular expression s\w{6,} (the letter 's' followed by at least 6 word characters) against the string "Sally sells sea shells by the seashore". Notice that the construction of strSearch required a double backslash (\\) to denote a single backslash. This is because a single backslash begins an escape sequence in a regular C# string literal.
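
As a minimal sketch of the above, the following self-contained program checks two strings against the same pattern. The class name IsMatchExample and the helper HasLongSWord are hypothetical names chosen for illustration. The pattern is written as a verbatim string (prefixed with @), which is equivalent to the doubled-backslash form.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical example class; the names are illustrative only.
public static class IsMatchExample
{
    // Returns true when the input contains a lowercase 's' followed by
    // at least six word characters.
    public static bool HasLongSWord(string input)
    {
        // @"s\w{6,}" is the same pattern as "s\\w{6,}"; the verbatim form
        // avoids doubling the backslash.
        return Regex.IsMatch(input, @"s\w{6,}");
    }

    public static void Main()
    {
        // Both literals produce the same string.
        Console.WriteLine("s\\w{6,}" == @"s\w{6,}");                                 // True
        Console.WriteLine(HasLongSWord("Sally sells sea shells by the seashore"));  // True
        Console.WriteLine(HasLongSWord("Sally sells sea shells"));                  // False
    }
}
```

Only "seashore" is long enough to satisfy the pattern, which is why the second string does not match.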

Now we can determine if a match was made, but what if we wanted to know what the match was? We can continue the code by calling the Regex.Match method, which returns an object of type Match. A Match object represents a single match: its Success property tells us whether a match was found, and its NextMatch method returns the next match in the string, so we can loop until Success is false.

string strData    = "Sally sells sea shells by the seashore";
string strSearch  = "s\\w{6,}";
string strMatches = "";

// Regex.Match returns the first match; Success is false if there is none.
Match match = Regex.Match(strData, strSearch);

while (match.Success) {
    // Append each matched substring, separated by an HTML line break.
    strMatches = strMatches + match.Value + "<br>";
    match = match.NextMatch();
}
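
As an alternative to stepping through the matches one at a time, the Regex.Matches method returns a MatchCollection that can be walked with foreach. The sketch below assumes a hypothetical helper named FindAll; it is one possible way to collect the matched text, not the only one.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical example class; the names are illustrative only.
public static class MatchLoopExample
{
    // Collects the text of every non-overlapping match of pattern in input.
    public static List<string> FindAll(string input, string pattern)
    {
        List<string> results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        string strData = "Sally sells sea shells by the seashore";

        // With the tutorial's pattern only one word is long enough.
        Console.WriteLine(string.Join(", ", FindAll(strData, @"s\w{6,}").ToArray()));
        // prints: seashore

        // Loosening the quantifier to {3,} picks up the shorter words too.
        Console.WriteLine(string.Join(", ", FindAll(strData, @"s\w{3,}").ToArray()));
        // prints: sells, shells, seashore
    }
}
```

Note that the match is case-sensitive by default, so the capital 'S' in "Sally" never starts a match.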


Replacing

Another large part of regular expressions is manipulating your string data by specifying search and replace strings. One common usage is the replacement of the characters '<' and '>' with &lt; and &gt; respectively to display HTML tags. This can be accomplished with the Regex.Replace method, a static method that returns the result of the search and replace.

string strData = "<b>Sally sells sea shells by the seashore</b>";

strData = Regex.Replace(strData, "<", "&lt;");
strData = Regex.Replace(strData, ">", "&gt;");



Finally, you'll probably want some more complicated examples of replacing strings with Regex.Replace. For example, we can rewrite the code above to make a single Regex.Replace call as follows:

string strData    = "<b>Sally sells sea shells by the seashore</b>";

string strSearch  = "<(?<tagname>[/\\w]*?)>";
string strReplace = "&lt;${tagname}&gt;";

strData = Regex.Replace(strData, strSearch, strReplace);


This last example demonstrates several useful features of regular expressions. First, notice that the text matched between the angle brackets was given the name "tagname" by declaring ?<tagname> at the start of the group, before the match condition. This allowed the replacement string to reuse the matched text by specifying ${tagname}. The search string also used other regular expression syntax, such as a character class ([/\\w] matches either / or a \w word character) and non-greedy matching (the ? after * means the * operator should match the smallest possible number of characters).
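
To see the named group at work, here is a small sketch that runs the same search and replace strings and also reads the captured name directly from the Match. The class NamedGroupExample and its two helper methods are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical example class; the names are illustrative only.
public static class NamedGroupExample
{
    // Same pattern as the tutorial, written as a verbatim string.
    private const string TagPattern = @"<(?<tagname>[/\w]*?)>";

    // Replaces <tag> with &lt;tag&gt; in a single Regex.Replace call.
    public static string EscapeTags(string html)
    {
        return Regex.Replace(html, TagPattern, "&lt;${tagname}&gt;");
    }

    // Returns the text captured by the "tagname" group of the first match,
    // or an empty string when nothing matches.
    public static string FirstTagName(string html)
    {
        Match m = Regex.Match(html, TagPattern);
        return m.Success ? m.Groups["tagname"].Value : "";
    }

    public static void Main()
    {
        string strData = "<b>Sally sells sea shells by the seashore</b>";

        Console.WriteLine(EscapeTags(strData));
        // prints: &lt;b&gt;Sally sells sea shells by the seashore&lt;/b&gt;

        Console.WriteLine(FirstTagName(strData));
        // prints: b
    }
}
```

The same captured text is available both in the replacement string, as ${tagname}, and in code, through the Groups indexer.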

Reader's Comments
 
I have found that when writing regular expressions it is easiest to use C#'s '@' "operator." What it does is it causes the string to be interpreted literally... so you don't have to double up the '\' characters in the regular expression.

So you would get something that looks like this:
string regexp = @"s\w{6,}";

For really long regular expressions this really cleans them up.
-- John Gallardo, October 05, 2001
 
I actually agree with John's comments. I wrote this tutorial awhile ago, so I forgot to update it with literal strings. Thanks for the comment!
-- Edmund Chou, October 10, 2001
 
Somebody asked me the following question:

Let's say I have a string like this: s = "[url=http://www.testingonly.com]C# Tutorial[/url]". How can I replace this string with <a href="http://www.testingonly.com">C# Tutorial</a>?

So to benefit the readers who would like another example, I thought I'd post the response here.

You can construct your search and replace strings as follows:

String strMatch = @"\[url=(?<url>(.*?))](?<link>(.*?))\[/url\]";
String strReplace = "<a href=\"${url}\">${link}</a>";

So strMatch first looks for the string "[url=" followed by any number of arbitrary characters before our next ]. The . matches any non-whitespace character, the * means to repeat many times, and the ? makes the match non-greedy. By default the * operator matches as much as possible, but \[url=.*?] will match only as far as the first ]. If I didn't use the ? operator, then our match string would actually go all the way until the last ].

We place a ?<url> before (.*?) so that we can identify the matched string as the target url. So at this point, we've matched [url=http://www.testingonly.com] where we know the url is http://www.testingonly.com.

Now the next part of the match expression is (?<link>(.*?)), which identifies the name of the link, once again by matching the next few characters. We want this match to be non-greedy too, so that it stops at the ending, where we match [/url].

For our replace string, we construct the HTML tag as we'd expect, which is simple enough since we know the url and the name of the link.

Hope you found this useful!
-- Edmund Chou, October 26, 2001
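
For readers who want to try the [url=...] example from the comment above, the following is a rough sketch that wraps the two strings in a method. The class UrlTagExample and the method ConvertUrlTags are hypothetical names, and the pattern is simplified slightly by dropping the inner parentheses and escaping the middle ], which should not change what it matches.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical example class; the names are illustrative only.
public static class UrlTagExample
{
    // Converts [url=TARGET]TEXT[/url] into <a href="TARGET">TEXT</a>.
    public static string ConvertUrlTags(string text)
    {
        string strMatch   = @"\[url=(?<url>.*?)\](?<link>.*?)\[/url\]";
        string strReplace = "<a href=\"${url}\">${link}</a>";
        return Regex.Replace(text, strMatch, strReplace);
    }

    public static void Main()
    {
        Console.WriteLine(ConvertUrlTags("[url=http://www.testingonly.com]C# Tutorial[/url]"));
        // prints: <a href="http://www.testingonly.com">C# Tutorial</a>

        // Non-greedy matching keeps two tags on one line separate.
        Console.WriteLine(ConvertUrlTags("[url=a]A[/url] and [url=b]B[/url]"));
        // prints: <a href="a">A</a> and <a href="b">B</a>
    }
}
```

With greedy .* in place of .*?, the second input would be treated as a single tag spanning from the first [url= to the last [/url].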
 
It looks like the page I had originally linked has been moved. I meant to link the following reference:
-- Edmund Chou, October 28, 2001
 
ugh, regular expressions are hard if you are just starting out. This is a good point to help clear things up, Thanks
-- Ben Ratzlaff, January 25, 2002
 
I had a tough time with regular expressions in the beginning and this is a good tutorial if you're just getting started.

Is there any more advanced regular expression tutorials/topics?
-- Victor Vuong, January 26, 2002
 
Thank you - I found this information very helpful!!!
-- Brad Rider, February 04, 2002
 
An interesting tutorial.
-- Brian Simoneau, March 01, 2002
 
Thanks! This is very helpful with my Theory Computation class!
-- Kuniaki Tran, March 03, 2002
 
Thanks this clears up some thing on expressions for me
-- Brian Gall, March 05, 2002
 
Wow, I didn't know that these types of powerful regular expressions were possible in C#. This is great!
-- Jared Betteridge, March 27, 2002
 
It's always nice to know about regular expressions no matter what language you're using. Good topic.
-- Laurent Vauthrin, April 16, 2002
 
In the third comment, Edmund's explanation of the [url=... to <a href=... example states that "The . matches any non-whitespace character". I am sure this is just a slip on his part, but it actually matches any character except \n. If the expression is modified by the Singleline option, it will match any character (including \n).
-- Philip Lanier, May 31, 2002
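
The behavior of . described in the comment above can be checked with a short sketch. The class DotExample and the method DotMatches are hypothetical names; RegexOptions.Singleline is the option the comment refers to.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical example class; the names are illustrative only.
public static class DotExample
{
    // Tests whether 'a', any one character matched by '.', then 'b'
    // occurs in the input under the given options.
    public static bool DotMatches(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, "a.b", options);
    }

    public static void Main()
    {
        // '.' matches a space, so it is not limited to non-whitespace.
        Console.WriteLine(DotMatches("a b", RegexOptions.None));         // True

        // By default '.' does not match a newline.
        Console.WriteLine(DotMatches("a\nb", RegexOptions.None));        // False

        // With Singleline, '.' matches a newline as well.
        Console.WriteLine(DotMatches("a\nb", RegexOptions.Singleline));  // True
    }
}
```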
 
Copyright © 2001 DevHood® All Rights Reserved