Introduction to Regex - Regular Expression

Regex in programming and Web development

by Tibor Kopca Date: 23-10-2020 website web webdev code


Today we are going to write about regular expressions, known as regex or regexp for short, a very useful way of describing search patterns.
You have surely been in a situation where you needed to replace one word or character with another. Regular expressions handle this and much more. Here you can read how they work and how you can use them to improve searching in programming and web development. Let's get started.

So what is REGEX?

A regular expression is a sequence of characters that can search through text and validate it against defined conditions or rules.
That sequence forms a pattern, which is used to match character combinations in a string of text.
So its purpose is to perform simple or more complex matching of text. It allows you to search for specific characters, words, punctuation and so on. The search results have many uses: data validation, web scraping, or advanced find-and-replace operations, for example changing certain characters or extracting only the email addresses from a document, and much more.

Regular expressions are used in search engines, and many programming languages provide regex functionality (a regex engine), either built in or through libraries.
We will focus on regular expressions in JavaScript, where a regex is an object (an instance of the RegExp class).

How to use Regex?

As we mentioned, a regular expression can be a single character or a more complicated combination of characters in a pattern.
Let's look at the characters that define the search.
Each character in a pattern is either a metacharacter, which has a special meaning, or a regular character, which has a literal meaning. Together, metacharacters and literal characters identify text. Pattern matches may vary from very precise to very general, controlled by the metacharacters.
For example, in the regex /m./, m is a literal character that matches only 'm', and '.' is a metacharacter that matches any single character except a line break. Therefore, if we have the text "m0, me, mX", our regex m. matches all three pairs.
The dot . is a very general pattern, [a-z] (matches any lowercase letter from 'a' to 'z') is less general, and a is a precise pattern (matches just 'a').
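
To make the difference between general and precise patterns concrete, here is a minimal sketch that runs the three kinds of pattern against the sample text from the paragraph above. The variable names are only illustrative.

```javascript
// Sample text taken from the paragraph above.
const sampleText = "m0, me, mX";

// "m" is literal, "." matches any single character except a line break.
const generalMatches = sampleText.match(/m./g);
console.log(generalMatches); // ["m0", "me", "mX"]

// [a-z] is narrower: only "me" has a lowercase letter after the "m".
const narrowerMatches = sampleText.match(/m[a-z]/g);
console.log(narrowerMatches); // ["me"]

// A fully literal pattern matches exactly that text and nothing else.
const preciseMatches = sampleText.match(/me/g);
console.log(preciseMatches); // ["me"]
```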


Regex Syntax and Metacharacters

A regular expression literal is composed of delimiters, a pattern and optional modifiers.
/pattern/modifiers;
For example, /ma-no.org/i is a regular expression, ma-no.org is the pattern, and i is a modifier (it makes the search case-insensitive).
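
As a small illustration of the pattern and the modifier working together, the sketch below tests the example regex against a few strings. It also shows a detail that is easy to miss: the dot in ma-no.org is unescaped, so it acts as a wildcard. The test strings and variable names are made up for this example.

```javascript
// The example regex from the text: pattern "ma-no.org", modifier "i".
const siteRegex = /ma-no.org/i;

// The i modifier makes the match case-insensitive.
console.log(siteRegex.test("Visit MA-NO.ORG today")); // true

// The unescaped dot is a wildcard, so any character is accepted in its place.
console.log(siteRegex.test("ma-noXorg")); // true

// Escaping the dot with a backslash makes it match a literal period only.
const strictSiteRegex = /ma-no\.org/i;
console.log(strictSiteRegex.test("ma-noXorg")); // false
console.log(strictSiteRegex.test("Ma-No.org")); // true
```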


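In JavaScript, the delimiter of a regex literal is always the forward slash "/", so a literal / inside the pattern has to be escaped as \/. Some other languages, such as PHP, accept any delimiter that is not a letter, number, backslash or space, so when you need to search for / there, you can use delimiters like # or ~ instead.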


The pattern is what is being searched for, and the modifiers change how the search is performed, for example whether it is case-sensitive or whether it stops after the first match.
We can construct complex expressions by combining simpler ones, much like in arithmetic.


The idea is to make a small pattern of characters stand for a large number of possible strings, rather than compiling a large list of all the literal possibilities.
With this done, let's go to the next chapter.

Set of flags

What are regex flags? They are the modifiers placed after the closing delimiter. They change how the expression is interpreted.

  g     Global, performs a global match (continues after the first match through the whole string).

  i     Makes the whole pattern case-insensitive. For example, /AbC/i would match aBc, ABC, abc, etc.

  m     Multiline, ^ and $ match at the beginning and end of each line, instead of only at the beginning and end of the whole string.

  u     Unicode, treats the pattern as a sequence of Unicode code points and enables extended Unicode escapes such as \u{1F600}.

  y     Sticky, the pattern matches only at its lastIndex position and ignores the global (g) flag.

  s     dotAll, the period or dot (.) matches any character, including a newline.
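
The effect of the most common flags can be seen in a short sketch like the one below. It uses a made-up two-line string, and the variable names are only illustrative.

```javascript
// A two-line sample string (the \n is a line break).
const flagText = "Cat\ncat";

// Without g, match() stops at the first hit; i makes it ignore case.
const firstOnly = flagText.match(/cat/i);
console.log(firstOnly[0]); // "Cat"

// With g, match() returns every hit.
const allHits = flagText.match(/cat/gi);
console.log(allHits); // ["Cat", "cat"]

// With m, ^ also matches right after each line break.
const lineStarts = flagText.match(/^cat/gim);
console.log(lineStarts); // ["Cat", "cat"]

// Without m, ^ matches only at the very start of the string.
const stringStart = flagText.match(/^cat/gi);
console.log(stringStart); // ["Cat"]

// With s, the dot also matches the line break itself.
const dotWithoutS = /Cat.cat/.test(flagText);
const dotWithS = /Cat.cat/s.test(flagText);
console.log(dotWithoutS, dotWithS); // false true
```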

Regex for simple matching

Before we continue: not every regex works the same way in every programming language, so check the documentation of the one you use.
Here are the metacharacters and their definitions, with examples for JavaScript.

Period or Dot .

A wildcard: matches any single character except a line break.
For example, /a.b/ matches "a3b" but also "acb", etc.
Within [ ] the period or dot is literal.

Escape character \

Used when you want to match special characters literally, such as '+', '\' or the period.
Example
/.\./     finds any character followed by a literal period; the first period is a wildcard, the escaped one is literal.
/\./      searches for a literal period instead of the wildcard.
/\(?a/    finds the character a, optionally preceded by the special character "(".
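
The three escape examples above can be tried out with a sketch like this one. The sample string is invented for the purpose, and the g flag is added so that match() lists every hit.

```javascript
// Invented sample text containing a period, parentheses and a plus sign.
const escapeText = "a.b (a) 3+4";

// Wildcard followed by an escaped (literal) period.
const charThenPeriod = escapeText.match(/.\./g);
console.log(charThenPeriod); // ["a."]

// Escaped period only.
const periodsOnly = escapeText.match(/\./g);
console.log(periodsOnly); // ["."]

// The letter a, optionally preceded by a literal opening parenthesis.
const optionalParen = escapeText.match(/\(?a/g);
console.log(optionalParen); // ["a", "(a"]

// Without the backslash, + would be a quantifier; escaped, it is a plain plus sign.
const plusSigns = escapeText.match(/\+/g);
console.log(plusSigns); // ["+"]
```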

Character classes match any one of a set of characters. They are written with a backslash.

\w   matches any word character (a letter, a digit or an underscore)
\W   matches anything that IS NOT a word character
\s   matches any whitespace character, such as a space or a tab
\S   matches anything that IS NOT whitespace
\d   matches any digit
\D   matches any character that IS NOT a digit
\b   matches a word boundary, the position between a word character and a non-word character (such as a space, dash, comma or semicolon) or the start or end of the string
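
Here is a brief sketch of these classes in action, written with the backslash form that JavaScript expects (\d, \w, \s, \W, \b). The sample string and variable names are made up for illustration.

```javascript
// Invented sample text with letters, digits, spaces, a comma and an underscore.
const classText = "Room 42, floor_3";

// \d+ collects runs of digits.
const digitRuns = classText.match(/\d+/g);
console.log(digitRuns); // ["42", "3"]

// \w+ collects runs of word characters (letters, digits, underscore).
const wordRuns = classText.match(/\w+/g);
console.log(wordRuns); // ["Room", "42", "floor_3"]

// \s matches each whitespace character.
const whitespaceCount = classText.match(/\s/g).length;
console.log(whitespaceCount); // 2

// \W is the opposite of \w, so removing it strips spaces and punctuation.
const wordCharsOnly = classText.replace(/\W/g, "");
console.log(wordCharsOnly); // "Room42floor_3"

// \b matches a position, not a character: "loor" never starts at a word boundary here.
const startsWord = /\bfloor/.test(classText);
const midWord = /\bloor/.test(classText);
console.log(startsWord, midWord); // true false
```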

Pipe character(vertical bar) |  

Works like OR in programming: matches either of the alternatives it separates.
For example, /m|mouse/ finds either the letter "m" OR the letters "mouse".
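
A short sketch of alternation follows. One detail worth knowing is that JavaScript tries the alternatives from left to right, so the order in which they are written can change the result. The sample strings are invented.

```javascript
// Either alternative may match.
const petText = "cat, dog, bird";
const catsAndDogs = petText.match(/cat|dog/g);
console.log(catsAndDogs); // ["cat", "dog"]

// Alternatives are tried left to right, so "m" wins before "mouse" is tried.
const shortFirst = "mouse".match(/m|mouse/)[0];
console.log(shortFirst); // "m"

// Putting the longer alternative first matches the whole word.
const longFirst = "mouse".match(/mouse|m/)[0];
console.log(longFirst); // "mouse"

// Parentheses limit the alternation to one part of the pattern.
const greyOrGray = /gr(a|e)y/.test("grey");
console.log(greyOrGray); // true
```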

Exclamation mark !

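Negates, but only as part of a lookaround such as (?!...) or (?<!...), described below. On its own, ! is an ordinary literal character.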


Caret symbol ^

Beginning of the line or text string in which we are searching


Dollar symbol $

End of the line or text string in which we are searching
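
The two anchors are often combined to check a whole string rather than a part of it. A minimal sketch, with illustrative variable names:

```javascript
const anchorText = "Ma-no.org";

// ^ anchors the pattern to the start of the string.
const startsWithMa = /^Ma/.test(anchorText);
const startsWithNo = /^no/.test(anchorText);
console.log(startsWithMa, startsWithNo); // true false

// $ anchors the pattern to the end of the string.
const endsWithOrg = /org$/.test(anchorText);
console.log(endsWithOrg); // true

// Using both anchors validates the entire string: digits only, nothing else.
const onlyDigits = /^\d+$/.test("12345");
const digitsAndLetter = /^\d+$/.test("123a5");
console.log(onlyDigits, digitsAndLetter); // true false
```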

Quantifiers

These symbols act as repeaters: they specify how many times the preceding character or group may occur.


Question mark ?

Makes the preceding character optional, so in '-?' the dash is optional.
For example, /ab?c/ will match 'ac' but also 'abc'.

Asterisk *

Zero or more occurrences of the preceding character.
Examples:
/a.*b/ matches any string that contains 'a' followed later by 'b', because the wildcard period may occur zero or more times between them.
/ab*c/ matches ‘ac’, ‘abc’, ‘abbbc’, etc.
/[xyz]*/ matches ‘’, ‘x’, ‘y’, ‘z’, ‘zx’, ‘zyx’, ‘xyzzy’, and so on.
/(ab)*/ matches ‘’, ‘ab’, ‘abab’, ‘ababab’, and so on.
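
The question mark and asterisk examples can be checked with a sketch like the following. The anchors ^ and $ are added here so that each test covers the whole string; that is a choice made for this example, not part of the patterns above.

```javascript
// ? makes the b optional: zero or one b.
const optionalB = ["ac", "abc", "abbc"].map((s) => /^ab?c$/.test(s));
console.log(optionalB); // [true, true, false]

// * allows zero or more b characters.
const anyNumberOfB = ["ac", "abc", "abbbc"].map((s) => /^ab*c$/.test(s));
console.log(anyNumberOfB); // [true, true, true]

// .* matches any run of characters between the a and the b.
const betweenAandB = "xa123by".match(/a.*b/)[0];
console.log(betweenAandB); // "a123b"

// A group can be repeated as a unit.
const repeatedGroup = "ababab!".match(/(ab)*/)[0];
console.log(repeatedGroup); // "ababab"
```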


Plus symbol +

Indicates one or more occurrences of the preceding character.
Example: /[a]+/g will match both 'a' characters in the word 'Palma'.


Curly braces { }

Delimit a minimum and maximum number of repetitions for the character or group before the braces. For example, /o{2,3}/ finds the two o's in "school", and /(c|r|a){2,3}/ finds "ra" in "rat" and "ca" in "cat".
{n} → the preceding character must occur exactly n times, for example /a{3}/ matches "aaa".
{min,} → the preceding character may occur min or more times.
{min,max} → the preceding character may occur at least min times, but not more than max times.
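
The plus sign and the curly-brace forms can be compared side by side in a sketch such as this one. The sample words come from the text; the variable names are illustrative.

```javascript
// + needs at least one a; each run of a's is one match.
const runsInPalma = "Palma".match(/a+/g);
const runsInLongWord = "Paaalma".match(/a+/g);
console.log(runsInPalma, runsInLongWord); // ["a", "a"] ["aaa", "a"]

// {2,3} accepts two or three repetitions.
const doubleO = "school".match(/o{2,3}/)[0];
console.log(doubleO); // "oo"

// {3} means exactly three.
const exactlyThree = ["aa", "aaa"].map((s) => /^a{3}$/.test(s));
console.log(exactlyThree); // [false, true]

// {2,} means two or more.
const twoOrMoreDigits = ["1", "12345"].map((s) => /^\d{2,}$/.test(s));
console.log(twoOrMoreDigits); // [false, true]

// The braces apply to the whole group: two or three characters out of c, r, a.
const groupRepeats = "rat cat".match(/(c|r|a){2,3}/g);
console.log(groupRepeats); // ["ra", "ca"]
```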

Character grouping


Brackets [ ]

Inside we put the characters we want to match in a search. A hyphen between two characters specifies a range, and single characters and ranges can be mixed, like [abcX-Z].
/[cmt]at/g will find "cat" and "mat" but nothing in "that".
[a-z]   a range of characters, in this case the lowercase letters.
[^ ]   matches a single character that is not contained within the brackets. For example, /[^a-z]/ matches any single character that is not a lowercase letter from a to z.
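
The bracket forms described above can be tried with a sketch like this. The sample strings are invented for the example.

```javascript
// One character out of c, m or t, followed by "at".
const atWords = "cat mat that hat".match(/[cmt]at/g);
console.log(atWords); // ["cat", "mat"]

// Single characters and a range mixed in one set.
const mixedSet = "abcdWXYZ".match(/[abcX-Z]/g);
console.log(mixedSet); // ["a", "b", "c", "X", "Y", "Z"]

// A leading ^ inside the brackets negates the set.
const notLowercase = "Hello World 42".match(/[^a-z]/g).join("");
console.log(notLowercase); // "H W 42"

// Inside brackets the dot is literal, so only the real period matches.
const literalDots = "a.b".match(/[.]/g);
console.log(literalDots); // ["."]
```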


Pattern group ()

The text matched inside the parentheses can be recalled later. A marked subexpression is also called a block or capturing group.
Example:
/(p|P)/   searches for a lowercase p OR an uppercase P.
(?<name>...)   names the group for later use.
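
To show what "recalled later" means in practice, here is a minimal sketch that captures the parts of a date, first by position and then by name. The date string and the group names day, month and year are assumptions made for this example.

```javascript
const dateText = "Date: 23-10-2020";

// Numbered groups: index 0 is the whole match, 1..3 are the groups in order.
const dateMatch = dateText.match(/(\d{2})-(\d{2})-(\d{4})/);
console.log(dateMatch[0]); // "23-10-2020"
console.log(dateMatch[1], dateMatch[3]); // "23" "2020"

// Named groups are available on the groups property of the match.
const namedMatch = dateText.match(/(?<day>\d{2})-(?<month>\d{2})-(?<year>\d{4})/);
console.log(namedMatch.groups.year); // "2020"

// In replace(), $1, $2 and $3 recall the captured text.
const isoDate = "23-10-2020".replace(/(\d{2})-(\d{2})-(\d{4})/, "$3-$2-$1");
console.log(isoDate); // "2020-10-23"
```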

Look aheads and look behinds


With these patterns we can find characters that come before or after something. Don't be put off by the names.
/(?<=)./   Positive lookbehind. Matches any character that directly follows the characters defined in the lookbehind. Example: /(?<=[Tt]he)./g will find any character that comes right after The or the. /(?<=[h])a/g will find an a that comes right after the letter h, for example the a in the word That.
/(?<!)./   Negative lookbehind. Matches any character that does not directly follow the characters defined in the lookbehind.

/.(?=)/   Positive lookahead. Matches any character that comes right before the characters defined. /.(?=[tT])/g will find every character that is directly followed by T or t.
/.(?!)/   Negative lookahead. Matches any character that is not directly followed by the characters defined. /.(?![ ])/g will find every character that is not directly followed by a space.
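
The four lookaround forms can be compared in one sketch. The text inside a lookaround is checked but never becomes part of the match. The sample strings are invented, and lookbehind needs a reasonably modern JavaScript engine (it was added in ES2018).

```javascript
// Positive lookbehind: an a that directly follows an h.
const aAfterH = "That hat".match(/(?<=h)a/g);
console.log(aAfterH); // ["a", "a"]

// Negative lookbehind: an a that does NOT directly follow an h.
const aNotAfterH = "cat hat".match(/(?<!h)a/g);
console.log(aNotAfterH); // ["a"]

// Positive lookahead: any character directly followed by t or T.
const beforeT = "cat hat".match(/.(?=[tT])/g);
console.log(beforeT); // ["a", "a"]

// Negative lookahead: any character not directly followed by a space.
const notBeforeSpace = "ab cd".match(/.(?![ ])/g);
console.log(notBeforeSpace); // ["a", " ", "c", "d"]
```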

Let's go deeper into Regex


Now we can use what we have learned, for example to check for a phone number in a text.
Here are some advanced examples.

^[ \t]+|[ \t]+$   Matches excess whitespace (spaces and tabs) at the beginning or end of a line
[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)? Matches any numeral
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$ Matches email address
^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,5})|(\(?\d{2,6}\)?))(-| )?(\d{3,4})(-| )?(\d{4})(( x| ext)\d{1,5}){0,1}$   Matches a 9-digit EU phone number with a country prefix, for example in the format 0034123456789.

^((\+1)[ -])?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})$ Matches a US phone number with international prefix +1.
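
To see some of these patterns at work, the sketch below wraps them in JavaScript regex literals and runs them on made-up inputs. The whitespace pattern is written with a space and a tab inside the brackets and with the g flag so that both ends are trimmed; the test values such as info@example.com are placeholders, and the email pattern is a simplification rather than a full validator.

```javascript
// Strip spaces and tabs from both ends of a line.
const trimRegex = /^[ \t]+|[ \t]+$/g;
const trimmedLine = "  \thello world \t".replace(trimRegex, "");
console.log(trimmedLine); // "hello world"

// Numeral with optional sign, decimals and exponent.
const numberRegex = /[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?/;
const foundNumber = "value: -3.14e10".match(numberRegex)[0];
console.log(foundNumber); // "-3.14e10"

// Simplified email check.
const emailRegex = /^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$/;
const emailChecks = ["info@example.com", "not-an-email"].map((s) => emailRegex.test(s));
console.log(emailChecks); // [true, false]

// US phone number with optional +1 prefix.
const usPhoneRegex = /^((\+1)[ -])?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})$/;
const phoneChecks = ["+1 (555) 123-4567", "555-1234"].map((s) => usPhoneRegex.test(s));
console.log(phoneChecks); // [true, false]
```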

Regex and JavaScript

Strings have several methods that accept a regular expression. Two common ones are search() and replace().
search() looks for a match of the expression and returns the position of the match (note the case-insensitive modifier).

var string = "Ma-no.org!";
var result = string.search(/NO/i); //result = 3


replace() returns a modified string where the pattern is replaced (note the positive lookbehind).

var string2 = "Visit Palma!";
var result2 = string2.replace(/(?<=[ ])[a-zA-Z]{1,5}/g, "Pizzeria");  
//result2 = "Visit Pizzeria!"

There are two ways to construct Regex as follows:

let myRegex = /ab+c/;
let myRegexObject = new RegExp('ab+c');
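
The two forms produce equivalent objects, but the constructor takes the pattern as a string, which matters in two situations: backslashes have to be doubled, and the pattern can be assembled at run time. A minimal sketch, where searchWord stands in for a hypothetical value such as user input:

```javascript
// Literal and constructor forms describe the same regex.
const literalRegex = /ab+c/i;
const constructedRegex = new RegExp("ab+c", "i");
const sameRegex =
  literalRegex.source === constructedRegex.source &&
  literalRegex.flags === constructedRegex.flags;
console.log(sameRegex); // true

// Inside a string, each backslash must be doubled: "\\d+" is the pattern \d+.
const digitsFromString = new RegExp("\\d+");
const foundDigits = "order 66".match(digitsFromString)[0];
console.log(foundDigits); // "66"

// The constructor is handy when part of the pattern is only known at run time.
// Note: a value containing metacharacters would need escaping first.
const searchWord = "palma"; // hypothetical input
const wordRegex = new RegExp("\\b" + searchWord + "\\b", "i");
const wordFound = wordRegex.test("Visit Palma!");
console.log(wordFound); // true
```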

In JavaScript, regular expressions are objects and are often used with the two predefined object methods.
test() searches a string for a pattern, and returns true or false

<p id="demo">Ma-no.org</p>
<script>
var inputOutput = document.getElementById("demo").innerHTML;
document.getElementById("demo").innerHTML = /o/.test(inputOutput);  //true
</script>

exec() searches a string for a specified pattern and returns the found text as an object. If nothing is found, it returns null.

var obj = /M/.exec(inputOutput);
document.getElementById("demo").innerHTML = "Found " + obj[0] + " in position " + obj.index + " in the text: " + obj.input;
//Found M in position 0 in the text: Ma-no.org

match() returns an array of matches, or null if no match is found. With the g flag the array contains every match; without it, it contains the first match followed by its capturing groups.

const paragraph = 'Ma-No.org';
const regex = /[A-Z]/g;
document.getElementById("demo").innerHTML = paragraph.match(regex); //M, N

matchAll() Returns an iterator containing all of the matches, including capturing groups.
split() Uses a regular expression or a fixed string to break a string into an array of substrings.
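
Since matchAll() and split() have no example above, here is a minimal sketch of both. The input strings are invented; note that matchAll() requires a regex with the g flag.

```javascript
// split() with a regex: separators are a comma or semicolon with optional spaces around.
const listText = "one, two;three ,four";
const listItems = listText.split(/\s*[,;]\s*/);
console.log(listItems); // ["one", "two", "three", "four"]

// matchAll() returns an iterator; each item is a match with its capturing groups.
const pairText = "width=100 height=250";
const parsedPairs = [...pairText.matchAll(/(\w+)=(\d+)/g)].map(
  (match) => [match[1], Number(match[2])]
);
console.log(parsedPairs); // [["width", 100], ["height", 250]]
```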

There are also quite useful properties like this one.

var pattern = /Ma-no.org/g;
document.getElementById("demo").innerHTML = pattern.source; //Ma-no.org
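
Besides source, a RegExp object also exposes flags, global and lastIndex. The sketch below, written without the DOM so it can run anywhere, shows how lastIndex moves forward when a regex with the g flag is reused, which is a common source of surprises with test(). The variable names are illustrative.

```javascript
const propRegex = /o/g;

// Read-only descriptions of the regex.
console.log(propRegex.source); // "o"
console.log(propRegex.flags);  // "g"
console.log(propRegex.global); // true

// With g, each test() continues from lastIndex and updates it.
const positions = [];
const firstTest = propRegex.test("Ma-no.org");
positions.push(propRegex.lastIndex); // 5, just past the first o
const secondTest = propRegex.test("Ma-no.org");
positions.push(propRegex.lastIndex); // 7, just past the second o
const thirdTest = propRegex.test("Ma-no.org");
positions.push(propRegex.lastIndex); // 0, no more matches so it resets

console.log(firstTest, secondTest, thirdTest); // true true false
console.log(positions); // [5, 7, 0]
```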


Conclusion


Regexes are useful in a wide variety of text processing tasks, and in string processing more generally.
Finally, there are online regex playgrounds, which are great for practicing and testing.


We can only recommend these sites:
https://regex101.com/
https://regexr.com/
Javascript Regex cheatsheet
And the last one, which we recommend for checking browser compatibility:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions

Image by CyberHades.

 