Understanding The JavaScript RegExp Object

Find out all about the JavaScript RegExp object and its methods.

The Real World

Most Real Programmers treat JavaScript like the poor cousin from the country - useful in certain situations, but not very important. Real Programmers aren't interested in a language whose primary application seems to be swapping one image with another, or drawing mouse trails across a Web page. Real Programmers have better things to do.

Well, all you Real Programmers out there better hang on to your hats - JavaScript may be confined to the client side of a Web transaction, but in that territory, it's the undisputed king. No other language is so easy to learn, or allows you to do quite so many things with minimum fuss. And one of the things it lets you do - and quite well too - is use regular expressions in your code.

Over the course of this article, I'm going to give you a gentle introduction to the concept of regular expressions in the context of JavaScript. I'll be showing you how to use JavaScript's String object for basic string matching and replacing capabilities, as well as more complex string manipulation. And I'll introduce you to the RegExp object, which provides a handy way to create more efficient code for common client-side input validation. So come on in - regardless of whether you're a Real Programmer or just trying to be one, you're sure to find something useful inside.

Enter The Matrix

Regular expressions, known as "regex" in the geek community, are a powerful tool used in pattern-matching and substitution. They are commonly associated with almost all *NIX-based tools, including editors like vi, scripting languages like Perl and PHP, and shell programs like awk and sed.

A regular expression lets you build patterns using a set of special characters; these patterns can then be compared with text in a file, data entered into an application, or input from a form filled in by users on a Web site. Depending on whether or not there's a match, appropriate action can be taken, and appropriate program code executed.

For example, one of the most common applications of regular expressions is to check whether or not a user's email address, as entered into an online form, is in the correct format; if it is, the form is processed, whereas if it's not, a warning message pops up asking the user to correct the error. Regular expressions thus play an important role in the decision-making routines of Web applications - although, as you'll see, they can also be used to great effect in complex find-and-replace operations.

A regular expression usually looks something like this:

/matrix/

All this does is match the pattern "matrix" in the text it's applied to. Like many other things in life, it's simpler to get your mind around the pattern than the concept - but then, that's neither here nor there...

How about something a little more complex? Try this:

/mat+/

This would match the words "matting" and "mattress", and also "matrix", but not "map" or "main". Why? Because the "+" character is used to match one or more occurrences of the preceding character - in the example above, the characters "ma" followed by one or more occurrences of the letter "t".

Similar to the "+" meta-character (that's the official term), we have "*" and "?" - these are used to match zero or more occurrences of the preceding character, and zero or one occurrence of the preceding character, respectively. So,

/eg*/

would match "easy", "egocentric" and "egg"

while

/Wil?/

would match "Winnie", "Wimpy", "Wilson" and "William", though not "Wendy" or "Wolf".

In case all this seems a little too imprecise, you can also specify a range for the number of matches. For example, the regular expression

/jim{2,6}/

would match "jimmy" and "jimmmmmy!", but not "jim". The numbers in the curly braces represent the lower and upper values of the range to match; you can leave out the upper limit for an open-ended range match.
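The behavior of these quantifiers is easy to check with the test() method of a pattern, which is covered later in this article. The following is only a minimal sketch: the checks array and the sample words are made up for illustration, and console.log() is used instead of alert() so that the code can also run outside a browser.

```javascript
// each entry: [pattern, word, whether a match is expected]
var checks = [
    [/mat+/, "mattress", true],
    [/mat+/, "matrix", true],    // a single "t" is enough for "+"
    [/mat+/, "map", false],      // no "t" after "ma"
    [/eg*/, "easy", true],       // zero "g"s is fine for "*"
    [/Wil?/, "Winnie", true],    // the "l" is optional
    [/Wil?/, "Wendy", false],    // no "Wi" at all
    [/jim{2,6}/, "jimmy", true],
    [/jim{2,6}/, "jim", false]   // only one "m"
];

var results = [];
for (var i = 0; i < checks.length; i++) {
    var matched = checks[i][0].test(checks[i][1]);
    results.push(matched);
    console.log(checks[i][0] + " on \"" + checks[i][1] + "\": " + matched);
}
```

Each line of output should agree with the third value in the corresponding entry.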

Two To Tango

Now that you know what a regular expression is, let's look at using it in a script. JavaScript's String object exposes a number of methods that support regular expressions. The first of these is the search() method, used to search a string for a match to the supplied regular expression. Take a look at the next example, which illustrates:

<script language="JavaScript">

// define string to be searched
var str = "The Matrix";

// define search pattern
var pattern = /trinity/;

// search and return result
if(str.search(pattern) == -1) {
    alert("Sorry, Trinity is not in The Matrix.");
} else {
    alert("Trinity located in The Matrix at character " + str.search(pattern));
}

</script>

When you run this script, you should see the following:

Sorry, Trinity is not in The Matrix.

The search() method returns the position of the substring matching the regular expression, or -1 if no match exists. In the example above, it is clear that the pattern "trinity" does not exist in the string "The Matrix"; hence, the error message.

Now, look what happens when I update the regular expression so that it results in a positive match:

<script language="JavaScript">

// define string to be searched
var str = " The Matrix";

// define search pattern
var pattern = /tri/;

// search and return result
if(str.search(pattern) == -1) {
    alert("Sorry, Trinity is not in The Matrix.");
} else {
    alert("Trinity located in The Matrix at character " + str.search(pattern));
}

</script>

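This time round, the JavaScript interpreter will return a match, along with the location where it found the match. Positions are counted from zero, and the string in this listing begins with a space, which is why the match is reported at character 7 rather than 6. Here's the output: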

Trinity located in The Matrix at character 7
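Both listings above call search() twice, once for the test and once for the message. A small variation stores the result first. This is only a sketch: the locate() helper is a hypothetical name, not something built into JavaScript, and it uses console.log() rather than alert() so that it can run outside a browser.

```javascript
// hypothetical helper: report where a pattern first matches, if at all
function locate(str, pattern) {
    var pos = str.search(pattern);   // zero-based index, or -1 for no match
    if (pos == -1) {
        return "No match for " + pattern;
    }
    return "Match for " + pattern + " at character " + pos;
}

console.log(locate("The Matrix", /trinity/));
console.log(locate("The Matrix", /tri/));
```

Since the string here has no leading space, the second call reports the match at character 6.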

Game, Set, Match

The String object also comes with a match() method, which can be considered a close cousin of the search() method above. What's the difference? Well, you've already seen that the search() method returns the position where a match was found. The match() method does things a little differently - it applies a regex pattern to a string and returns the values matched in an array.

Confused? Take a look at the next example:

<script language="JavaScript">

// define string
var str = "Mississippi";

// define search pattern
var pattern = /is./;

// check for matches
// place result in array
var result = str.match(pattern);

// display matches
for(var i = 0; i < result.length; i++) {
    alert("Match #" + (i+1) + ": " + result[i]);
}

</script>

View this example in a browser, and you'll get an alert message displaying the first matching result, as shown below:

Match #1: iss

In the example above, I have defined a regular expression "is.". This will match the string "is", followed by any one other character (the "." metacharacter at the end of the pattern matches any single character other than a line break). If you look at the string to be searched, you'll see that there are two occurrences of this pattern. However, the code above only returns 1. Why?

The answer is simple - I've "forgotten" to add the "g" (for "global") modifier to the pattern. As a result, searching stops after the first match. Consider the next example, which revises the previous code listing to add this modifier:

<script language="JavaScript">

// define string
var str = "Mississippi";

// define search pattern
// add global modifier
var pattern = /is./g;

// check for matches
// place result in array
var result = str.match(pattern);

// display matches
for(var i = 0; i < result.length; i++) {
    alert("Match #" + (i+1) + ": " + result[i]);
}

</script>

And now, when you try out this example, you should see two alert boxes, indicating that two matches to the specified pattern were found in the string. The additional "g" modifier ensures that all occurrences of a pattern in a string are matched, and stored in the return array. I'll show you a few other useful modifiers as we proceed through this tutorial.
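One caveat worth keeping in mind: match() returns null rather than an empty array when nothing matches, so reading result.length in the loops above would fail for a pattern with no match. A minimal sketch of a guard follows; allMatches() is a hypothetical helper name, and console.log() stands in for alert().

```javascript
// hypothetical helper: always hand back an array, even with no match
function allMatches(str, pattern) {
    var result = str.match(pattern);
    return result === null ? [] : result;
}

var found = allMatches("Mississippi", /is./g);   // two matches
var none = allMatches("Mississippi", /xy./g);    // no match, empty array

for (var i = 0; i < found.length; i++) {
    console.log("Match #" + (i + 1) + ": " + found[i]);
}
console.log("Matches for /xy./g: " + none.length);
```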

Search and Destroy

The previous set of examples highlighted the search capabilities of the String object. But that's not all - you can also perform a search-and-replace operation with the replace() method, which accepts both a regular expression and the value to replace it with. Here's how:

<script language="JavaScript">

// set string
var str = "Welcome to the Matrix, Mr. Anderson";

// uncomment to check initial value
// alert(str);

// replace a string with another string
// The One turns into Smith
str = str.replace(/Anderson/,"Smith");

// display new string
alert(str)

</script>

If you load this example in a browser, you will see that the string "Anderson" has been replaced with the string "Smith". The following output illustrates:

Welcome to the Matrix, Mr. Smith

Remember how I used the "g" modifier to search for multiple instances of a pattern within a string? Take it one step further - you can even use it to replace multiple instances of a pattern within the string:

<script language="JavaScript">

// set string
var str = "yo ho ho and a bottle of gum";

// returns "yoo hoo hoo and a bottle of gum"
alert(str.replace(/o\s/g, "oo "));

</script>

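Here, the pattern matches each "o" that is followed by a whitespace character (that's what the \s metacharacter stands for), and replace() substitutes "oo " for each match - which is how "yo" and "ho" become "yoo" and "hoo".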

You can also use case-insensitive pattern matching - simply add the "i" modifier (for "insensitive") at the end of the pattern. The next example shows you how:

<script language="JavaScript">

// set string
var str = "he He hE HE";

// returns ho ho ho ho
alert(str.replace(/he/gi, "ho"));

</script>
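To see what each modifier contributes, it can help to run the same replacement with and without it. The snippet below is a small sketch along those lines; the sample string is made up for the purpose, and console.log() is used in place of alert().

```javascript
var str = "HE he He he";

var first = str.replace(/he/, "ho");      // first lowercase "he" only
var allLower = str.replace(/he/g, "ho");  // every lowercase "he"
var firstAny = str.replace(/he/i, "ho");  // first match in any case
var all = str.replace(/he/gi, "ho");      // every match in any case

console.log(first);      // HE ho He he
console.log(allLower);   // HE ho He ho
console.log(firstAny);   // ho he He he
console.log(all);        // ho ho ho ho
```

Note that replace() returns a new string each time; the original value of str is left untouched.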

In Splits

The String object also comes with a split() method, which can be used to decompose a single string into separate units on the basis of a particular separator value; these units are then placed into an array for further processing. Consider the following example, which demonstrates:

<script language="Javascript">

// set string
var friends = "Joey, Rachel, Monica, Chandler, Ross, Phoebe";

// split into array using commas
var arr = friends.split(", ");

// iterate through array and print each value
for (var x=0; x<arr.length; x++)
{
    alert("Hiya, " + arr[x]);
}

</script>

Up until JavaScript 1.1, you could only use string values as separators. JavaScript 1.2 changed all that - now, you can even split a string on the basis of a regular expression.

To understand this better, consider the following string, which illustrates a common problem - unequal whitespace between separated values:

Neo| Trinity   |Morpheus    |  Smith|  Tank

Here, the | character is used to separate the various names. However, the space between the various | is unequal - which means that before you can use the individual elements of the string, you will need to trim the additional space around them. Splitting by using a regular expression as the separator is an elegant solution to the problem - as you can see from the updated listing below:

<script language="JavaScript">
// define string
var str = "Neo| Trinity   |Morpheus    |  Smith|  Tank";

// define pattern
var pattern = /\s*\|\s*/;

// split the string using the regular expression as the separator
var result = str.split(pattern);

// iterate over result array
for(var i = 0; i < result.length; i++) {
    alert("Character #" + (i+1) + ": " + result[i]);
}

</script>

The output of the call to split() above will be an array containing the names, without any leading or trailing spaces.
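One thing to watch: the pattern only swallows whitespace next to a | separator, so spaces at the very start or end of the string would survive the split. The following sketch, with a made-up input string that has such spaces, trims the whole string before splitting. It assumes an environment that provides the trim() method of the String object (ECMAScript 5 and later).

```javascript
var str = "  Neo| Trinity   |Morpheus    |  Smith|  Tank  ";

// split on | plus any whitespace around it
var untrimmed = str.split(/\s*\|\s*/);

// trim the ends of the string first, then split
var names = str.trim().split(/\s*\|\s*/);

console.log(JSON.stringify(untrimmed));
console.log(JSON.stringify(names));
```

The first array keeps the leading spaces on "Neo" and the trailing spaces on "Tank"; the second does not.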

Objects In The Rear-View Mirror

So far, all the examples in this article have piggybacked on the String object to demonstrate the power of the regex implementation in JavaScript. But JavaScript also comes with a core object of its own for this purpose, the RegExp object, whose sole raison d'etre is to match patterns in strings.

This RegExp object comes with three useful methods - take a look:

test() - test a string for a match to a pattern

exec() - returns an array of the matches found in the string, and also permits advanced regex manipulation

compile() - alter the regular expression associated with a RegExp object

Let's look at a simple example:

<script language="JavaScript">

// define str
var str = "The Matrix";

// define RegExp object
var character = new RegExp("tri");

// search for pattern in string
if(character.test(str)) {
    alert("User located in The Matrix.");
} else {
    alert("Sorry, user is not in The Matrix.");
}

</script>

This is similar to one of the very first examples in this tutorial. However, as you can see, I've adopted a completely different approach here.

The primary difference here lies in my creation of a RegExp object for my regular expression search. This is accomplished with the "new" keyword, followed by a call to the object constructor. By definition, this constructor takes two parameters: the pattern to be searched for, and modifiers if any (I've conveniently skipped these in the example above).

Once the RegExp object has been created, the next step is to use it. Here, I've used the test() method to look for a match to the pattern. By default, this method accepts a string variable as a parameter and compares it against the pattern passed to the RegExp object constructor. If it finds a match, it returns true; if it does not, it returns false. Obviously, this is a more logical implementation than the search() feature of the String object.
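The second constructor parameter, skipped in the listing above, takes the same modifier letters that follow the closing slash of a pattern such as /is./g. A minimal sketch, with illustrative variable names and console.log() in place of alert():

```javascript
var str = "The Matrix";

// no modifiers: the match is case-sensitive
var strict = new RegExp("matrix");

// "i" modifier passed as the second constructor argument
var relaxed = new RegExp("matrix", "i");

console.log(strict.test(str));    // false, "Matrix" has a capital M
console.log(relaxed.test(str));   // true
```

Because the pattern is passed to the constructor as a string, any backslash in it has to be doubled: new RegExp("\\s+") is the equivalent of /\s+/.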

One Mississippi, Two Mississippi...

The next method I'll show you is the exec() method. The behavior of this method is similar to that of the String object's match() method. Take a look:

<script language="JavaScript">

// define string
var place = "Mississippi";

// define pattern
var obj = /is./;

// search for match
// place result in array
var result = obj.exec(place);

// display result
if(result != null) {
    alert("Found " + result[0] + " at " + result.index);
}
</script>

The exec() method returns a match to the supplied regular expression, if one exists, as an array (or null if there is no match); you can access the first element of the array to retrieve the matching substring, and the array's index property to retrieve the location of that substring.

The main difference between the match() and exec() methods lies in the parameters passed - the former requires a pattern as argument, while the latter requires the string variable to be tested.

However, that's not all. When the pattern carries the "g" modifier, the exec() method has the ability to continue searching within the string for the same pattern, returning one match per call. Let me tweak the above example to demonstrate this feature:

<script language="JavaScript">

// define string
var place = "Mississippi";

// define pattern
// the global modifier lets exec() resume after each match
var obj = /is./g;

// search for all matches in string
// display result
var result;
while((result = obj.exec(place)) != null) {
    alert("Found " + result[0] + " at " + result.index);
}

</script>

So what do we have here? For starters, I have used a "while" loop to call the exec() method repeatedly, until it reaches the end of the string (at which point the method will return null and the loop will terminate). This is possible because every time you call exec() on a pattern with the "g" modifier, the RegExp object records where the match ended in its lastIndex property, and the next call continues the search from there.

Be careful to include that modifier, though - without it, lastIndex is never updated, every call to exec() starts again from the beginning of the string and returns the first match, and the loop above never terminates.
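The position at which exec() resumes is kept in the lastIndex property of the RegExp object, and it only advances when the pattern carries the "g" modifier. The sketch below wraps the loop in a hypothetical findAll() helper (not a built-in function) that refuses patterns without that modifier, and collects each match with its position instead of displaying alerts.

```javascript
// hypothetical helper: collect every match and its position
function findAll(str, pattern) {
    if (!pattern.global) {
        // without "g", exec() would return the first match forever
        throw new Error("pattern needs the g modifier");
    }
    var hits = [];
    var result;
    pattern.lastIndex = 0;   // start from the beginning of the string
    while ((result = pattern.exec(str)) !== null) {
        hits.push(result[0] + " at " + result.index);
        // an empty match does not move lastIndex, so nudge it along
        if (result[0] === "") {
            pattern.lastIndex++;
        }
    }
    return hits;
}

var hits = findAll("Mississippi", /is./g);
console.log(hits.join(", "));   // iss at 1, iss at 4
```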

Another interesting point to note in the example above is my definition of the RegExp object. Unlike the earlier test() example, you will notice that I have not used the constructor or the "new" keyword to create the object; instead, I've simply assigned a pattern enclosed in slashes to a variable. Think of this literal notation as a shortcut technique for creating a new RegExp object.

Changing Things Around

You may have noticed from the previous examples that when using a RegExp object, you have to specify the regular expression at the time of constructing the object. So you might be wondering to yourself, what happens if I need to change the pattern at a later time?

Well, the guys at JavaScript HQ have you covered. The compile() method allows a user to update the regular expression used by the RegExp object in its searches. Take a look:

<script language="JavaScript">

// define string
var str = "The Matrix";

// define pattern
var pattern = "trinity";

// define object
var character = new RegExp(pattern);

// look for match
if(character.test(str)) {
    alert("Looking for " + pattern + "...User located in The Matrix");
} else {
    alert("Looking for " + pattern + "...Sorry, user is not in The Matrix");
}

// change the pattern associated with the RegExp object
pattern = "tri";
character.compile(pattern);

// look for match and display result
if(character.test(str)) {
    alert("Looking for " + pattern + "...User located in The Matrix");
} else {
    alert("Looking for " + pattern + "...Sorry, user is not in The Matrix");
}

</script>

Notice the use of the compile() method to dynamically update the pattern associated with the RegExp object.
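Bear in mind that compile() is nowadays documented as a deprecated, legacy feature of the language. An alternative that avoids it is simply to assign a new RegExp object to the same variable, as in this sketch (console.log() stands in for alert(), and the before and after variables exist only for the illustration):

```javascript
var str = "The Matrix";
var pattern = "trinity";
var character = new RegExp(pattern);

var before = character.test(str);   // false, "trinity" is not in the string

// replace the object instead of recompiling it
pattern = "tri";
character = new RegExp(pattern);

var after = character.test(str);    // true, "tri" is in "Matrix"

console.log("trinity: " + before + ", tri: " + after);
```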

Working with Forms

Now that you know how it all works, let's look at a practical example of how you can put this knowledge to good use. Consider the following example, which displays an HTML form asking the user for credit card and email information to complete a purchase.

<html>
<head>
<script language="Javascript">
<!-- start hiding -->

// requires regex to be passed as a parameter
function checkField(theForm, theField, theFieldDisplay, objRegex) {

    // look up the field by form name and field name, without eval()
    var objField = document.forms[theForm].elements[theField];

    if(!objRegex.test(objField.value))  {
        alert ("Please enter a valid " + theFieldDisplay);
        objField.select();
        objField.focus();
        return (false);
    }

    return (true);
}

// regex for the various form fields

// credit card number
// must contain 13 to 19 digits, nothing more, nothing less
objPatCCNum = /^[0-9]{13,19}$/;

// credit card date of expiry
// must be month between 01 - 12 and year between 2003 to 2010
objPatCCDOE = /^(0[1-9]|1[0-2])\/20(0[3-9]|10)$/;

// credit card PIN code
// must be numeric
objPatCCPin = /^[0-9]+$/;

// email address
// must be in user@host format
objPatCCEmail = /^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+$/;

// check the various form fields
function checkForm(theForm)
{

    if(checkField(theForm, "cc_num", "Credit card number", objPatCCNum) && checkField(theForm, "cc_doe", "Date of expiry", objPatCCDOE) && checkField(theForm, "cc_pin", "PIN code", objPatCCPin) && checkField(theForm, "cc_email", "Email address", objPatCCEmail)) {
        return true;
    } else {
        return false;
    }
}
// stop hiding -->
</script>
</head>

<body>

<h2>Credit Card Information</h2>
<form name="frmCCValidation" onSubmit="return checkForm('frmCCValidation');">

Credit card number <br>
<input name="cc_num" type="text">

<p>

Credit card type <br>
<select name="cc_type">
<option value="Visa">Visa</option>
<option value="Mastercard">Mastercard</option>
<option value="AmericanExpress">American Express</option>
</select>

<p>

Credit card expiry date (mm/yyyy) <br>
<input name="cc_doe" type="text">

<p>

PIN code of card billing address <br>
<input name="cc_pin" type="text">

<p>

Email address <br>
<input name="cc_email" type="text">

<p>

<input type="submit" value="Send">

</form>

</body>
</html>

You'll notice, in the example above, that I've used numerous regular expressions to verify that the data being entered into the form by the user is of the correct format. This type of client-side input validation is a useful first line of defense on the Web, since it catches obvious mistakes before the form is even submitted. Since scripts can be switched off or bypassed in the browser, though, the same checks should be repeated on the server.
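Before wiring patterns like these to a form, it is worth running them against a few known-good and known-bad values. The sketch below does this for an expiry-date pattern covering months 01 to 12 and years 2003 to 2010, and for an email pattern anchored at both ends. Both patterns and all the sample values are written for this illustration, and should be treated as assumptions rather than as copies of the listing above.

```javascript
// month 01-12, year 2003-2010 (illustrative pattern)
var patExpiry = /^(0[1-9]|1[0-2])\/20(0[3-9]|10)$/;

// user@host.domain, anchored at both ends (illustrative pattern)
var patEmail = /^[a-zA-Z0-9]+[.a-zA-Z0-9_-]*@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$/;

// each entry: [pattern, sample value, whether it should pass]
var samples = [
    [patExpiry, "10/2005", true],
    [patExpiry, "13/2005", false],   // no such month
    [patExpiry, "06/2011", false],   // year out of range
    [patEmail, "neo@matrix.example.com", true],
    [patEmail, "neo@matrix", false],          // no dot in the host part
    [patEmail, "neo@matrix.com!!", false]     // trailing junk
];

var failures = 0;
for (var i = 0; i < samples.length; i++) {
    var ok = samples[i][0].test(samples[i][1]) === samples[i][2];
    if (!ok) {
        failures++;
    }
    console.log(samples[i][1] + ": " + (ok ? "as expected" : "UNEXPECTED"));
}
```

A table like this is also a convenient starting point for the library of validation patterns mentioned at the end of this article.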

Over And Out

After reading this tutorial, I'm pretty sure you're going to look at JavaScript in a different light. The language you just saw wasn't the one most commonly associated with image swaps and browser detection. Rather, it was a powerful tool to help you execute pattern-matching tasks in the client quickly and efficiently.

I started off with a simple introduction to regular expressions and quickly moved to the search() and replace() methods of the JavaScript String object. These functions take a regular expression as parameter and allow you to carry out smart "search-and-replace" operations on string values. This was followed by an introduction to the hero behind the scenes: the core JavaScript RegExp object. This object comes with a host of methods and properties that allow ordinary programmers to leverage the power of regular expressions in JavaScript.

To close this article, I developed a simple example that demonstrates the use of complex regular expressions to validate form input - a routine task in all Web-based applications. If you do this often, it makes sense for you to build a good library of regular expressions for common validations (if you already have one, send me some mail and tell me all about it).

Here are some additional URLs to help you understand the concept of regular expressions further:

Stringing Things Along, at http://www.melonfire.com/community/columns/trog/article.php?id=173

Pattern Matching and Regular Expressions, at http://www.webreference.com/js/column5/

Regular Expressions for client-side JavaScript, a free online quick reference by VisiBone at http://www.visibone.com/regular-expressions/

That's all for this article. See you soon!

Note: Examples are illustrative only, and are not meant for a production environment. Melonfire provides no warranties or support for the source code described in this article. YMMV!

This article was first published on 18 Dec 2003.