Regular expressions in \.NET

The String class in .NET provides plenty of methods for dealing with text. But when they fall short we can turn to the Regex class.

bool valid = Regex.IsMatch(identifier, @"^[a-z][a-z0-9]*$");

string episode = Regex.Match(filename, @"S\d{2}E\d{2}").Groups[0].Value;

Match match = Regex.Match(url, @"^http://(?<domain>[^:]+):(?<port>[0-9]+)");
if (match.Success)
{
    string domain = match.Groups["domain"].Value;
    string port = match.Groups["port"].Value;
}

MatchCollection matches = Regex.Matches(users, @" (\w+),?");
foreach (Match match in matches)
{
    string user = match.Groups[1].Value;
}

Match match = Regex.Match(users, @"Users: ((\w+)(, )?)*");
foreach (Capture capture in match.Groups[2].Captures)
{
    string user = capture.Value;
}

The syntax for regular expressions in .NET is fairly standard. Character classes uses Unicode unless you pass the ECMAScript flag. Remember to escape backslashes or use verbatim string literals. The last example uses captures, which is a way to get history when multiple matches are made.

Regex regex = new Regex(@"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (.+)$",
    RegexOptions.ECMAScript | RegexOptions.Compiled);

foreach (string line in log)
{
    Match match = regex.Match(line);
    if (!match.Success) { continue; }
    string message = match.Groups[1].Value;
}

The Regex methods can be called statically, like in the earlier examples, or the class can be instantiated as in the latter example. Since a regular expression have to be processed before it can be used it makes sense to reuse it. However, when used statically the Regex class uses an internal cache, so the static usage isn’t as bad as it may seem.

Regex objects can also be processed further by passing the Compiled flag. The regular expression is then turned into an assembly. This makes string matching faster but comes at a significant startup and memory cost. You can even go all out and turn a regular expression into a permanent assembly.

Regular expressions in Visual Studio

image

As useful as regular expressions are in code it can be even more useful as a tool when writing code. But for reasons unknown the syntax used in Visual Studio is very different from that in .NET. For example, tags are enclosed with { } instead of ( ), character classes are prefixed with : instead of \ and repetitions are expressed as ^n instead of {n}.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s