Regular expressions: Greedy regex versus lazy regex

Here’s a typical regex scenario: you’ve got a string from which you need to capture the HTML tags. Let’s say our string is:

[code]
This is a <em>first</em> test
[/code]

Typically, your first attempt at a regular expression to capture the tag would look like this:

[code]
var re = new RegExp("<(.+)>", "");
[/code]

Unexpectedly, however, the captured group you get back is:

[code]
"em>first</em"
[/code]

The reason for this is explained on the Regex Tutorial website:

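“The first token in the regex is <. [...] You should see the problem by now. The dot matches the >, and the engine continues repeating the dot. The dot will match all remaining characters in the string. The dot fails when the engine has reached the void after the end of the string. Only at this point does the regex engine continue with the next token: >.”

In other words, the plus is greedy: it makes the dot consume as many characters as it can. Since there is no > left after the end of the string, the engine backtracks one character at a time until the > token can match, which happens at the last > in the string. The group therefore captures everything between the first < and the last >.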
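To see the greedy behaviour in action, here is a minimal sketch that runs the pattern against our example string and prints both the whole match and the captured group. The variable names (`input`, `greedy`, `greedyMatch`) are only illustrative, and the regex literal is assumed to be equivalent to the `new RegExp(...)` call used in this post.

```javascript
// Our example string from above.
const input = "This is a <em>first</em> test";

// Greedy pattern: .+ consumes as much as it can, then backtracks
// only as far as needed for the final > to match.
const greedy = /<(.+)>/;

const greedyMatch = greedy.exec(input);

console.log(greedyMatch[0]);    // "<em>first</em>" (the whole match)
console.log(greedyMatch[1]);    // "em>first</em"  (capture group 1)
console.log(greedyMatch.index); // 10 (position of the first <)
```

Note that the match starts at the first < but ends at the last >, which is why the group swallows both tags and the text between them.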

What we need to do instead is make the quantifier lazy by adding a question mark after the plus sign (the same works after a star or after a count in curly braces):

[code]
var re = new RegExp("<(.+?)>", "");
[/code]

This time, we’ll get back:

[code]
"em"
[/code]

Reference: Regex Tutorial – Repetition with Star and Plus
