Regular Expressions:

Search/replace with Regular expressions

In the last few articles, we discussed all the great possibilities for doing search/match operations with regular expressions. This can be extremely useful on its own, but we have not yet discussed one of the most powerful things you can do with regular expressions: search/replace operations. So, instead of just looking for a string inside another string, we can replace the matched string with another one.

replace() and replaceAll()

To do replace operations on a string, we will use either replace() or replaceAll(), which are methods found on the built-in String object. As we talked about previously, these methods can take a simple string as their first argument, to do basic search/replace operations, or they can take a regular expression, to do a regex-based search/replace operation.
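As a quick reminder of the difference between the two kinds of arguments, here is a minimal sketch. The sample string and variable names are made up for illustration, and console.log() is used instead of alert() so the snippet also runs outside a browser:

```javascript
// Hypothetical sample string, used only for illustration
const s = "a-b-c";

// A plain string with replace() only replaces the first occurrence
const firstOnly = s.replace("-", "+");

// A plain string with replaceAll() replaces every occurrence
const allByString = s.replaceAll("-", "+");

// A regular expression with the g flag also replaces every occurrence
const allByRegex = s.replace(/-/g, "+");

console.log(firstOnly);   // a+b-c
console.log(allByString); // a+b+c
console.log(allByRegex);  // a+b+c
```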

As mentioned previously, the replace() method is case-sensitive when using simple strings as arguments, so to get us started using the regex version of this method, let me show you how we can make it case-insensitive with a regular expression:

let s = "Hello, wOrLd - what a crazy WoRlD indeed!";
let regex = new RegExp("world", "i");

alert(s.replace(regex, "universe"));
// Result: Hello, universe - what a crazy WoRlD indeed!

Notice how we used some silly casing for the word "world" in the test string, and yet we still managed to replace it with "universe", ignoring the casing by using a regular expression with the i (ignore case) flag.

Also notice that only the first occurrence is replaced. That's the default for the replace() method: if you want to replace all occurrences, you need to specify the g (global) flag. With that flag set, you can call either replace() or replaceAll() and get the same result. The one difference is that replaceAll() insists on the flag: calling it with a regular expression that lacks g throws a TypeError, while replace() will simply replace the first match. Here's the modified example:

let s = "Hello, wOrLd - what a crazy WoRlD indeed!";
let regex = new RegExp("world", "ig");

alert(s.replaceAll(regex, "universe"));
// Result: Hello, universe - what a crazy universe indeed!
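To see what happens when the g flag is left out, here is a small sketch. The variable names are illustrative, and console.log() stands in for alert() so it can run outside a browser:

```javascript
const text = "Hello, wOrLd - what a crazy WoRlD indeed!";

// replaceAll() refuses a regular expression without the g flag
let errorName = null;
try {
	text.replaceAll(/world/i, "universe");
} catch (err) {
	errorName = err.name;
}
console.log(errorName); // TypeError

// replace() with the g flag replaces every match, just like replaceAll()
const replaced = text.replace(/world/gi, "universe");
console.log(replaced); // Hello, universe - what a crazy universe indeed!
```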

Using capture groups

In a previous article, we saw the power of using capture groups in regular expressions when doing string matching. They can be used when doing search/replace operations as well, which can be extremely useful in a lot of situations.

As an example, let's create a small piece of code which will add emphasis to numbers in a string. So, whenever a number is encountered, we replace it with a string containing the number surrounded by HTML tags. Here's how easily we can accomplish that:

let s = "42 cats, 17 dogs and 11 rabbits";
let regex = new RegExp("([0-9]+)", "ig");

alert(s.replaceAll(regex, "<b>$1</b>"));
// Result: 
// <b>42</b> cats, <b>17</b> dogs and <b>11</b> rabbits

You will notice that I use a special notation in the replacement string. The $1 simply specifies that I want to use the value of capture group number 1 here. Since we have only specified one capture group in the regex, this will be the number matched in the string.
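The replacement string supports a few other special patterns besides $1. Two that are handy to know are $&, which inserts the entire match (so no capture group is needed), and $$, which inserts a literal dollar sign. The sketch below reuses the string from the example above; the variable names are only for illustration:

```javascript
const animals = "42 cats, 17 dogs and 11 rabbits";

// $& inserts the whole match, so the parentheses can be left out
const bold = animals.replace(/[0-9]+/g, "<b>$&</b>");
console.log(bold); // <b>42</b> cats, <b>17</b> dogs and <b>11</b> rabbits

// $$ inserts a literal dollar sign, here followed by the whole match ($&)
const priced = animals.replace(/[0-9]+/g, "$$$&");
console.log(priced); // $42 cats, $17 dogs and $11 rabbits
```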

Multiple capture groups

Another very common example used to demonstrate search/replace with capture groups is the reversal of a first and last name, like "John Doe". Most people in the western world will write their first name first and then their last name, but sometimes it's more practical to show the last name first, e.g. when listing authors of books. The format for this is typically "Doe, John", and we can write a bit of code utilizing a simple regular expression to accomplish this:

let authors = 
`William Shakespeare
Charles Dickens
Agatha Christie`;

let regex = /^(\w+) (\w+)$/img;
alert(authors.replace(regex, "$2, $1"));
/* Result:
Shakespeare, William
Dickens, Charles
Christie, Agatha
*/ 

Allow me to quickly go through this example. First, we have a string containing several author names. We then define a regular expression which will match two words per line and put them in separate capture groups (as denoted by the surrounding parentheses). When we do the replace operation, we simply specify that we want the value of capture group 2 first (the last name) followed by a comma and then the first name (capture group 1).

The result is a list of names where the first and last names have been reversed, as promised.

You may notice that I have added an extra flag to the regex: the m (multiline) flag. It makes the line anchors of regex match at the start (the ^ operator) and the end (the $ operator) of each line, instead of only at the start and end of the entire string. Again, this is not a regex tutorial, so for more information about the regex part of this, please consult another tutorial - I just wanted to mention it for now.
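To illustrate what the m flag changes, here is a small comparison. It is only a sketch, using a shortened version of the author list and illustrative variable names:

```javascript
const names = "William Shakespeare\nCharles Dickens";

// Without the m flag, ^ and $ only match at the start and end of the
// entire string, so the pattern never matches and nothing is replaced
const withoutM = names.replace(/^(\w+) (\w+)$/g, "$2, $1");
console.log(withoutM === names); // true

// With the m flag, ^ and $ match at the start and end of every line
const withM = names.replace(/^(\w+) (\w+)$/gm, "$2, $1");
console.log(withM);
/* Result:
Shakespeare, William
Dickens, Charles
*/
```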

Named capture groups

We can of course use named capture groups in search/replace operations as well. Let's try rewriting the previous example to illustrate that:

let authors = 
`William Shakespeare
Charles Dickens
Agatha Christie`;

let regex = /^(?<firstName>\w+) (?<lastName>\w+)$/img;
alert(authors.replace(regex, "$<lastName>, $<firstName>"));
/* Result:
Shakespeare, William
Dickens, Charles
Christie, Agatha
*/ 

As you can see, instead of using numbered capture groups ($1, $2 and so on) we can refer to the capture groups using the names we have given them in the regular expression, surrounded by a set of angle brackets. This will often make it a lot easier to read and understand the code later on.
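Note that named capture groups are still numbered, so both notations work against the same regular expression. A minimal sketch, with a single name and illustrative variable names:

```javascript
const author = "Agatha Christie";
const nameRegex = /^(?<firstName>\w+) (?<lastName>\w+)$/;

// Referring to the groups by name...
const byName = author.replace(nameRegex, "$<lastName>, $<firstName>");

// ...or by number gives the same result
const byNumber = author.replace(nameRegex, "$2, $1");

console.log(byName);   // Christie, Agatha
console.log(byNumber); // Christie, Agatha
```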

Replacer functions

Using regular expressions for search/replace operations can be very powerful, especially when you use capture groups, as you can see from the examples above. However, for the ultimate flexibility, JavaScript offers an even more powerful feature: The ability to use a function to generate the replacement value.

So, instead of specifying a static string as the replacement, you specify a function, which will be called for each match. With that in mind, let's try changing the example above to use this technique, to solve a small problem I just introduced: All the names have been entered by someone a bit confused about the Shift-key of the keyboard, resulting in some pretty strange casing-issues in the list of names:

let authors = 
`wiLLiam shakeSPEARE
charleS DIckenS
agATha christiE`;

To fix this problem, I have written a function called FormalName() which will first fix the casing issue, and then return the name parts in the reverse order, just like we did in the examples from earlier in this article:

function FormalName(match, firstName, lastName)
{
	firstName = firstName.charAt(0).toUpperCase() + firstName.slice(1).toLowerCase();
	lastName = lastName.charAt(0).toUpperCase() + lastName.slice(1).toLowerCase();
	return lastName + ", " + firstName;
}

let authors = 
`wiLLiam shakeSPEARE
charleS DIckenS
agATha christiE`;

let regex = /^(?<firstName>\w+) (?<lastName>\w+)$/img;
alert(authors.replace(regex, FormalName));
/* Result:
Shakespeare, William
Dickens, Charles
Christie, Agatha
*/ 

First off, we have the FormalName() function. You will see that it accepts three parameters: the match, which is simply the full matched string, and then two parameters that I have called firstName and lastName. JavaScript fills these in automatically when calling the function, passing the values of the capture groups in order. The parameters are positional, so you are free to name them as you like - I have reused the names of the capture groups to keep the code easy to read.

Inside the FormalName() function, we simply apply proper casing to the name parts: First character is forced into uppercase, while the remaining characters are forced into lowercase. Then we return both parts in reverse order, separated by a comma.

When doing the actual replacement, in the last line of the example, we simply specify the name of the function that should be called (FormalName) - JavaScript will take care of the rest for us. The replacer function can receive even more parameters, if you need them. For the full list, please see the specification.
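As an example of those extra parameters: after the capture group values, the replacer function also receives the offset of the match and the full string, and, when the regular expression uses named capture groups, an object holding the named groups as the last argument. The sketch below uses that object instead of positional parameters. The function name formalNameFromGroups and the helper fixCasing are hypothetical, made up for this illustration:

```javascript
// Hypothetical helper: first character uppercase, the rest lowercase
function fixCasing(word)
{
	return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
}

// Hypothetical replacer function: since the regex below uses named
// capture groups, the last argument is an object with the named groups
function formalNameFromGroups(...args)
{
	const groups = args[args.length - 1];
	return fixCasing(groups.lastName) + ", " + fixCasing(groups.firstName);
}

const messy = "wiLLiam shakeSPEARE\ncharleS DIckenS";
const nameLine = /^(?<firstName>\w+) (?<lastName>\w+)$/gm;
const tidy = messy.replace(nameLine, formalNameFromGroups);
console.log(tidy);
/* Result:
Shakespeare, William
Dickens, Charles
*/
```

With this approach, the function no longer depends on the order of the capture groups in the regular expression, only on their names.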

Summary

Doing search/replace operations with the help of regular expressions is a very powerful tool which you will likely need at some point. In this article, we have seen several examples of how they work, and how you can use capture groups and replacer functions to make this awesome tool even more flexible.

