Regular Expressions:

Using capturing groups

So far, we have used regular expressions to either check for the occurrence of a search pattern, or to extract an entire string based on this search pattern. However, sometimes you need to only extract a part of the search pattern, and sometimes you need to extract more than one part. For these purposes, we have something really clever called capturing groups.

The technique works by defining capturing groups within the regular expression. A group can be defined by putting a part of the pattern inside a set of parentheses, for instance like this:

[^@]+@(.+)

This captures the domain part of an e-mail address (everything after the @ character) in a separate group, and the JavaScript interpreter makes this group available when we do the matching. Here is an example of it in use:

let testString = "john.doe@gmail.com";
let regex = new RegExp("[^@]+@(.+)");
let result = testString.match(regex);
alert("Mail: " + result[0]);     // shows "Mail: john.doe@gmail.com"
alert("Domain: " + result[1]);   // shows "Domain: gmail.com"

So, a bit of explanation is needed here. First of all, the regular expression I use is an EXTREMELY basic e-mail search pattern - it should in no way be used for real code, as it is way too simple. Its only quality is that it's short and will help illustrate this example as well as the next.

Notice that I use the match() function on the String object. As we talked about in the previous article, it will return an array of the matches that could be found. Now pay special attention to the last part of the regular expression: (.+)

The dot will match any character (except line breaks), and the plus sign after it means "one or more". I have put a set of parentheses around this part to define a capturing group. Because of this, I can access this specific part of the match in the result. The first place in the array (0) is always the entire match, while the following items are the values found for the capturing groups, in the order the groups appear in the pattern. So, in this way, I can extract the domain part of an e-mail address.
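One detail worth keeping in mind: match() returns null when the pattern doesn't match at all, so reading result[1] directly would throw an error in that case. Below is a minimal sketch of a guarded version. The function name getDomain is made up for this illustration, and console.log() is used instead of alert() so that the snippet also runs outside of a browser.

```javascript
// Hypothetical helper (not part of any library): returns the domain part
// of an e-mail-like string, or null if the pattern does not match.
function getDomain(text) {
  const regex = new RegExp("[^@]+@(.+)");
  const result = text.match(regex);
  if (result === null) {
    return null; // no match at all, so there are no groups to read
  }
  return result[1]; // index 0 is the entire match, index 1 the first group
}

console.log(getDomain("john.doe@gmail.com")); // gmail.com
console.log(getDomain("no at-sign here"));    // null
```

The null check is the only real addition here; the pattern is the same deliberately simple one used above and is still not suitable for validating real e-mail addresses.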

With that in mind, we can of course capture both parts of the e-mail address - simply turn the first part into a capturing group as well, by surrounding it with a set of parentheses:

let testString = "john.doe@gmail.com";
let regex = new RegExp("([^@]+)@(.+)");
let result = testString.match(regex);
alert("Mail: " + result[0]);     // shows "Mail: john.doe@gmail.com"
alert("User: " + result[1]);     // shows "User: john.doe"
alert("Domain: " + result[2]);   // shows "Domain: gmail.com"

Notice how easily we can now get the specific parts we want with the capturing groups, while still having access to the entire matched string.
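Since the result is an ordinary array, array destructuring is one compact way to give each position a readable name. The following is only a sketch: splitEmail is a hypothetical helper name, and console.log() stands in for alert() so the code can run outside of a browser.

```javascript
// Hypothetical helper: splits an e-mail-like string into its parts
// using the two numbered capturing groups from the pattern above.
function splitEmail(text) {
  const regex = new RegExp("([^@]+)@(.+)");
  const result = text.match(regex);
  if (result === null) {
    return null; // the pattern did not match
  }
  // Position 0 is the entire match, 1 and 2 are the capturing groups.
  const [mail, user, domain] = result;
  return { mail, user, domain };
}

const parts = splitEmail("john.doe@gmail.com");
console.log(parts.mail);   // john.doe@gmail.com
console.log(parts.user);   // john.doe
console.log(parts.domain); // gmail.com
```

The order of the names in the destructuring has to follow the order of the groups in the pattern, which is exactly the bookkeeping that gets harder as a pattern grows.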

Named groups

If we only have a couple of capturing groups, it's not a big problem to remember which index they can be found in within the resulting array, but sometimes you can have many capturing groups inside a much more complex regular expression. When that is the case, it would be a lot easier to have a label/name for each of the groups. This will also make it a lot easier to understand what the regular expression does when you come back to it later on.

Fortunately for us, regular expressions support named capturing groups, and they can be accessed from JavaScript as well. To define a name for a capturing group, simply add it to the capturing group using the special syntax, like this:

(?<domain>.+)

As you can see, you can create a named capturing group by following the starting parenthesis with a question mark and then the name inside a set of angle brackets. So let's rewrite our example from above to use this technique:

let testString = "john.doe@gmail.com";
let regex = new RegExp("(?<user>[^@]+)@(?<domain>.+)");
let result = testString.match(regex);
alert("Mail: " + result[0]);                 // shows "Mail: john.doe@gmail.com"
alert("User: " + result.groups.user);        // shows "User: john.doe"
alert("Domain: " + result.groups.domain);    // shows "Domain: gmail.com"

Notice how we can now access the value of each capturing group by the name we have given it, using the special groups property of the result. It's much easier, especially when your regular expression becomes more complex.
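Two things are useful to know about named groups. A named group still counts as a numbered group, so result[1] and result.groups.user hold the same value. Also, the groups property is undefined when the pattern contains no named groups at all. Here is a small sketch of both points, with parseEmail as a hypothetical helper name and console.log() used in place of alert():

```javascript
// Hypothetical helper: reads the named groups via object destructuring.
function parseEmail(text) {
  const regex = new RegExp("(?<user>[^@]+)@(?<domain>.+)");
  const result = text.match(regex);
  if (result === null) {
    return null; // the pattern did not match
  }
  const { user, domain } = result.groups;
  // Named groups are numbered too, so index 1 is the same as "user".
  return { user, domain, sameAsIndex: result[1] === user };
}

const parsed = parseEmail("john.doe@gmail.com");
console.log(parsed.user);        // john.doe
console.log(parsed.domain);      // gmail.com
console.log(parsed.sameAsIndex); // true

// Without any named groups in the pattern, groups is undefined.
const plain = "john.doe@gmail.com".match(new RegExp("([^@]+)@(.+)"));
console.log(plain.groups);       // undefined
```

Unlike the array destructuring approach, the names here come from the pattern itself, so reordering the groups in the regular expression doesn't break the code that reads them.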

Summary

Capturing groups are a very powerful part of regular expressions, allowing you to extract multiple parts of a string based on your pattern. By adding names to your capturing groups, you can make your regex more readable and easier to understand the next time you visit your code.

Capturing groups can also be used when doing search/replace operations, which we'll show in the next article.

