Introduction to REGEX - Regular Expression

Regex in programming and Web development

by Date: 23-10-2020 website web webdev code

Today we are going to write about regular expressions, also known as regex or regexp: a very useful way to describe search patterns.
You have surely been in a situation where you needed to replace one word with another, or one character with something else. Regular expressions handle this and much more. Here you can read what you need to know about how they work and how you can use them to improve your searches in programming and web development. Let's get started.

So what is REGEX?

A regular expression is a sequence of characters that defines a search pattern. Such patterns are used to find character combinations in strings of text, or to validate text against defined rules.
Matches can be simple or quite complex: you can search for specific characters, words, punctuation and so on. The results have many uses, such as data validation, web scraping, or advanced find-and-replace operations, for example changing certain characters or extracting only the email addresses from a document, and much more.

Regular expressions are used in search engines, and many programming languages support them, either built in or through libraries (the component that runs them is called a regex engine).
We will focus on regular expressions in JavaScript, where they are objects of the built-in RegExp class.

How to use Regex?

As we mentioned, a regular expression can be a single character or a more complicated combination of characters.
Let's look at the characters that define the search.
Each character in a pattern is either a metacharacter, which has a special meaning, or a regular character, which has its literal meaning. Together they identify the text to match, and the metacharacters control whether the pattern is very precise or very general.
For example, in the regex m., m is a literal character that matches the letter 'm', and . is a metacharacter that matches any single character except a line break. So in the text "m0, me, mX", the regex m. matches all three pairs.
The dot . is a very general pattern, [a-z] (any lowercase letter from 'a' to 'z') is less general, and a is a precise pattern (it matches only 'a').
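The three patterns from the paragraph above can be compared directly. This is a minimal sketch, runnable in Node.js or a browser console; "Palma" is just a sample word added for the last case.

```javascript
// Sample text from the paragraph above
const sample = "m0, me, mX";

// "m" is a literal, "." is a metacharacter matching any one character
const general = sample.match(/m./g);
console.log(general); // ["m0", "me", "mX"]

// [a-z] only accepts a lowercase letter after the "m"
const narrower = sample.match(/m[a-z]/g);
console.log(narrower); // ["me"]

// A plain literal is the most precise: it matches only "a"
const precise = "Palma".match(/a/g);
console.log(precise); // ["a", "a"]
```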


Regex Syntax and Metacharacters

A regular expression literal consists of delimiters, a pattern and optional modifiers (flags):
/pattern/modifiers;
For example, /ma-no.org/i is a regular expression in which ma-no.org is the pattern and i is a modifier that makes the search case-insensitive.


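The most common delimiter is the forward slash "/". Some engines, such as PHP's preg_* functions, accept any delimiter that is not a letter, digit, backslash or whitespace, so when the pattern itself contains / you can switch to a delimiter like # or ~. JavaScript literals always use /, so a literal slash inside the pattern has to be escaped as \/ (or you can build the pattern with the RegExp constructor).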
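In JavaScript the delimiter question comes down to escaping, and the same snippet shows the effect of the i modifier on the example pattern. This is a small sketch; the "/api/" path and the test strings are made up for illustration.

```javascript
// A literal slash inside a JavaScript regex literal must be escaped as "\/"
const apiLiteral = /\/api\//;
// With the RegExp constructor the pattern is a plain string, so "/" needs no escape
const apiObject = new RegExp("/api/");

console.log(apiLiteral.test("/api/users")); // true
console.log(apiObject.test("/api/users"));  // true

// The i modifier makes the match case-insensitive
const site = /ma-no.org/i;
console.log(site.test("MA-NO.ORG")); // true
// Careful: the unescaped "." is a wildcard, so this matches as well
console.log(site.test("ma-noXorg")); // true
```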


The pattern is what is being searched for, and the modifiers change how the search is performed, for example whether it is case-sensitive or whether it stops after the first match.
As in arithmetic, simple expressions can be combined into more complex ones.


The idea is to make a small pattern of characters stand for a large number of possible strings, rather than compiling a large list of all the literal possibilities.
With this done, let's go to the next chapter.

Set of flags

What are regex flags? They are modifiers written after the closing delimiter, and they change how the expression is interpreted.

  g     Global: find all matches in the string instead of stopping after the first one.

  i     Makes the whole pattern case-insensitive. For example, /AbC/i would match aBc, ABC, abc, etc.

  m     Multiline: ^ and $ match at the beginning and end of each line, instead of only at the beginning and end of the whole string.

  u     Unicode: treats the pattern as a sequence of Unicode code points and enables escapes such as \u{1F600} and \p{...}.

  y     Sticky: the pattern only matches starting exactly at the regex's lastIndex position, without searching further ahead.

  s     dotAll: the period or dot (.) matches any character, including line breaks.
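Here is a sketch of each flag in action, under the assumption of a modern engine (Node.js 10+ or a current browser); the sample strings are invented for the demo.

```javascript
const lines = "Cat\ncat\nCAT";

const caseSensitive = lines.match(/cat/g);      // ["cat"]
const caseInsensitive = lines.match(/cat/gi);   // ["Cat", "cat", "CAT"]
const startOfString = lines.match(/^cat/gi);    // ["Cat"]: without m, ^ is the start of the string
const startOfEachLine = lines.match(/^cat/gim); // ["Cat", "cat", "CAT"]

const dotStopsAtNewline = /a.b/.test("a\nb"); // false
const dotAll = /a.b/s.test("a\nb");           // true

// Sticky: the match must start exactly at lastIndex
const sticky = /foo/y;
sticky.lastIndex = 3;
const stickyAt3 = sticky.test("barfoo"); // true ("foo" starts at index 3)
sticky.lastIndex = 0;
const stickyAt0 = sticky.test("barfoo"); // false (no match at index 0, no search ahead)

// u: "." matches a whole code point instead of a single UTF-16 code unit
const emoji = "\u{1F600}";
const withoutU = emoji.match(/./g).length; // 2
const withU = emoji.match(/./gu).length;   // 1

console.log(caseSensitive, caseInsensitive, startOfString, startOfEachLine);
console.log(dotStopsAtNewline, dotAll, stickyAt3, stickyAt0, withoutU, withU);
```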

Regex for simple matching

Before we continue, note that regex syntax differs between languages and engines, so not every pattern works everywhere; check the documentation for the one you use.
Here are the metacharacters and their meanings, with examples in JavaScript.

Period or Dot .

Wildcard: matches any single character except a line break (unless the s flag is set).
For example, /a.b/ matches "a3b" but also "acb", etc.
Within [ ] the period or dot is literal.
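A quick check of the wildcard versus the literal dot inside brackets; the candidate strings are made up for illustration.

```javascript
const candidates = ["a3b", "acb", "ab", "a.b"];

// "." needs exactly one character between "a" and "b"
const anyMiddle = candidates.filter((s) => /a.b/.test(s));
console.log(anyMiddle); // ["a3b", "acb", "a.b"]

// Inside [ ] the dot only matches a literal "."
const literalDot = candidates.filter((s) => /a[.]b/.test(s));
console.log(literalDot); // ["a.b"]
```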

Escape character \

Used to match special characters literally, such as +, \ or the period.
Examples:
/.\./     matches any character followed by a literal period; the first period is a wildcard, the escaped one is not.
/\./      searches for a literal period instead of the wildcard.
/\(?a/    matches the character a, optionally preceded by a literal "(".
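The escaping rules above can be tried on a few sample strings (chosen here purely for illustration). Note that in a JavaScript string literal a single backslash is itself written as "\\".

```javascript
// ".\." : any character followed by a literal period
const beforePeriod = "ma-no.org".match(/.\./)[0];
console.log(beforePeriod); // "o."

// "+" is a quantifier unless escaped
const escapedPlus = /x\+y/.test("x+y");   // true
const quantifierPlus = /x+y/.test("x+y"); // false: means "one or more x, then y"

// A backslash is matched with "\\" (the string below contains one backslash)
const backslash = /\\/.test("C:\\temp"); // true

// "\(?a" : an "a", optionally preceded by a literal "("
const withParen = "(a)".match(/\(?a/)[0];   // "(a"
const withoutParen = "a)".match(/\(?a/)[0]; // "a"

console.log(escapedPlus, quantifierPlus, backslash, withParen, withoutParen);
```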

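Character classes match any one character from a set of characters.

\w   matches any word character (a letter, a digit or the underscore _)
\W   matches any character that IS NOT a word character
\s   matches any whitespace character, such as a space, tab or line break
\S   matches any character that IS NOT whitespace
\d   matches any digit (0-9)
\D   matches any character that IS NOT a digit
\b   matches a word boundary: the position between a word character and a non-word character (or the start or end of the string). It matches a position, not a character such as a space or comma.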
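A short sketch of the character classes on made-up text, showing in particular that \b matches a position rather than a character.

```javascript
const order = "Order 66 shipped on 2020-10-23";

const numbers = order.match(/\d+/g); // ["66", "2020", "10", "23"]
const words = order.match(/\w+/g);   // ["Order", "66", "shipped", "on", "2020", "10", "23"]
const digitsOnly = "a1 b2".replace(/\D/g, "");               // "12"
const oneSpace = "too    many   spaces".replace(/\s+/g, " "); // "too many spaces"

// \b matches the position at the edge of a word, so "cat" must stand alone
const wholeWord = /\bcat\b/.test("the cat sat");  // true
const insideWord = /\bcat\b/.test("concatenate"); // false

console.log(numbers, words, digitsOnly, oneSpace, wholeWord, insideWord);
```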

Pipe character(vertical bar) |  

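Works like OR in programming: it matches any one of the alternatives it separates, and each alternative can be longer than one character.
For example, /m|mouse/ finds text that is either the letter "m" OR the word "mouse". Alternatives are tried from left to right, though, so in the word "mouse" this regex matches only the "m"; write /mouse|m/ to prefer the longer alternative.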
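The left-to-right behaviour of the pipe is easy to see in a couple of lines (sample strings invented for the demo):

```javascript
const pets = "I have a cat and a dog".match(/cat|dog/g); // ["cat", "dog"]

// At each position the alternatives are tried in order, and the first that fits wins
const shortFirst = "mouse".match(/m|mouse/)[0]; // "m"
const longFirst = "mouse".match(/mouse|m/)[0];  // "mouse"

console.log(pets, shortFirst, longFirst);
```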

Exclamation mark !

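On its own, ! is a literal character in JavaScript regular expressions. It negates only as part of a negative lookahead (?!...) or negative lookbehind (?<!...), described below.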


Caret symbol ^

Matches the beginning of the string being searched (or of each line when the m flag is set).


Dollar symbol $

Matches the end of the string being searched (or of each line when the m flag is set).
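Anchors are what turn "contains" into "is exactly". A minimal sketch, using the site name and a few made-up inputs:

```javascript
// ^ and $ together force the whole string to consist of digits
const onlyDigits = /^\d+$/;
const allDigits = onlyDigits.test("12345"); // true
const mixed = onlyDigits.test("123a");      // false: $ requires the end right after the digits

const startsWithMa = /^Ma/.test("Ma-no.org"); // true
const startsWithNo = /^no/.test("Ma-no.org"); // false: "no" is not at the start
const endsWithOrg = /org$/.test("Ma-no.org"); // true

console.log(allDigits, mixed, startsWithMa, startsWithNo, endsWithOrg);
```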

Quantifiers

Quantifiers act as repeaters: they specify how many times the preceding character, character class or group may occur.


Question mark ?

Makes the preceding character optional: in '-?' the dash is optional.
For example, /ab?c/ matches 'ac' but also 'abc'.
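The ? quantifier can be checked by filtering a few candidate strings (made up for illustration); the patterns are anchored with ^ and $ so that the whole string has to match.

```javascript
const spellings = ["color", "colour", "colouur"].filter((s) => /^colou?r$/.test(s));
console.log(spellings); // ["color", "colour"]

const dashes = ["email", "e-mail", "e--mail"].filter((s) => /^e-?mail$/.test(s));
console.log(dashes); // ["email", "e-mail"]

const optionalB = ["ac", "abc", "abbc"].filter((s) => /^ab?c$/.test(s));
console.log(optionalB); // ["ac", "abc"]
```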

Asterisk *

Zero or more occurrences of the preceding character.
Examples:
/a.*b/ matches an 'a' followed later by a 'b', because .* matches zero or more occurrences of any character.
/ab*c/ matches ‘ac’, ‘abc’, ‘abbbc’, etc.
/[xyz]*/ matches ‘’, ‘x’, ‘y’, ‘z’, ‘zx’, ‘zyx’, ‘xyzzy’, and so on.
/(ab)*/ matches ‘’, ‘ab’, ‘abab’, ‘ababab’, and so on.
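The * examples above can be reproduced as follows; the input strings are invented for the demo. The last line shows that "zero occurrences" still counts as a (empty) match.

```javascript
const abStar = ["ac", "abc", "abbbc", "adc"].filter((s) => /^ab*c$/.test(s)); // ["ac", "abc", "abbbc"]
const aThenB = "xxaYYbzz".match(/a.*b/)[0];   // "aYYb"
const xyzRun = "xyzzy!".match(/^[xyz]*/)[0];  // "xyzzy"
const abRepeats = "ababx".match(/^(ab)*/)[0]; // "abab"
const zeroTimes = "hello".match(/^(ab)*/)[0]; // "" (zero repetitions is still a match)

console.log(abStar, aThenB, xyzRun, abRepeats, JSON.stringify(zeroTimes));
```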


Plus symbol +

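Indicates one or more occurrences of the preceding character.
For example, /a+/g matches both 'a' characters in the word 'Palma', and in 'baaad' it matches the whole run 'aaa'. Without the g flag, only the first match is returned.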
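A sketch of + with and without the g flag, on the words from the text plus a couple of made-up ones:

```javascript
const palma = "Palma".match(/a+/g);    // ["a", "a"]
const firstOnly = "Palma".match(/a+/); // without g: only the first "a", at index 1
const run = "baaad".match(/a+/g);      // ["aaa"]
const noneFound = "xyz".match(/a+/g);  // null: + needs at least one "a"

console.log(palma, firstOnly[0], firstOnly.index, run, noneFound);
```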


Curly braces { }

Curly braces set a minimum and maximum number of repetitions for the character or group before them. For example, /o{2,3}/ finds the two o's in "school", and /(c|r|a){2,3}/ finds "ca" in "cat" and "ra" in "rat".
{n} → the preceding character occurs exactly n times; for example, /a{3}/ matches "aaa".
{min,} → the preceding character occurs min or more times.
{min,max} → the preceding character occurs at least min times, but not more than max times.
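The three brace forms can be compared on a few sample strings (chosen only for illustration):

```javascript
const school = "school".match(/o{2,3}/)[0];      // "oo"
const pairs = "cat rat".match(/(c|r|a){2,3}/g);   // ["ca", "ra"]: "t" is not in the group
const exactlyThree = /^a{3}$/.test("aaa");        // true
const notThree = /^a{3}$/.test("aa");             // false
const twoOrMore = "baaaaad".match(/a{2,}/)[0];    // "aaaaa"

console.log(school, pairs, exactlyThree, notThree, twoOrMore);
```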

Character grouping


Brackets [ ]

Brackets enclose a set of characters, and the set matches any one of them. A hyphen between two characters defines a range, and ranges and single characters can be mixed, like [abcX-Z].
/[cmt]at/g will find "cat" and "mat", but not "that".
[a-z]   range of characters, in this case lowercase letters.
[^ ]   matches a single character that is not contained within the brackets; for example, /[^a-z]/ matches any single character that is not a lowercase letter from a to z.
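The bracket examples from the list, run on made-up text:

```javascript
const rhymes = "cat mat that".match(/[cmt]at/g);        // ["cat", "mat"]
const capitals = "Hello World 42".match(/[A-Z]/g);      // ["H", "W"]
const notLowercase = "Hello World 42".match(/[^a-z]/g); // ["H", " ", "W", " ", "4", "2"]
const mixedSet = "aXbYz".match(/[abcX-Z]/g);            // ["a", "X", "b", "Y"] (lowercase z is not in X-Z)

console.log(rhymes, capitals, notLowercase, mixedSet);
```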


Pattern group ()

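The part of the pattern inside parentheses is a group, and the text it matched can be recalled later. A marked subexpression like this is also called a block or capturing group.
Examples:
/(p|P)/   searches for lowercase p OR uppercase P.
(?<name>...)   gives the group a name, so its match can be read later as match.groups.name or referenced inside the pattern with \k<name>.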
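Here is a sketch of recalling groups later: by number, by name, in a replacement string, and as a backreference inside the pattern. The date string and the repeated-word sentence are invented examples.

```javascript
const text = "Published 2020-10-23";

// Numbered groups: index 1, 2, 3 of the result array
const numbered = text.match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(numbered[1], numbered[2], numbered[3]); // 2020 10 23

// Named groups are available on result.groups
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const named = text.match(datePattern);
console.log(named.groups.year); // 2020

// Named groups can be reused in a replacement string with $<name>
const reordered = text.replace(datePattern, "$<day>-$<month>-$<year>");
console.log(reordered); // Published 23-10-2020

// \1 refers back to whatever group 1 matched: here, a doubled word
const repeated = "this is is a test".match(/\b(\w+) \1\b/)[0];
console.log(repeated); // is is

// The (p|P) example from the list
const letterP = "Palma pizza".match(/(p|P)/g); // ["P", "p"]
console.log(letterP);
```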

Lookaheads and lookbehinds


With these patterns we can match something only if it is (or is not) preceded or followed by something else, without including that context in the match. Below, X stands for whatever you write inside the group. Just don't freak out because of the naming.
/(?<=X)./   Positive lookbehind: matches a character only if it is preceded by X. For example, /(?<=[Tt]he)./g finds the character right after "The" or "the", and /(?<=h)a/g finds an "a" that follows an "h", such as the a in the word That.
/(?<!X)./   Negative lookbehind: matches a character only if it is not preceded by X.

/.(?=X)/   Positive lookahead: matches a character only if it is followed by X. /.(?=[tT])/g finds every character that is immediately followed by T or t.
/.(?!X)/   Negative lookahead: matches a character only if it is not followed by X. /.(?![ ])/g finds every character that is not immediately followed by a space.
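One example per lookaround is sketched below, using invented strings such as prices and CSS-like sizes. Lookbehind requires a reasonably recent engine (Node.js 10+ or a current browser); older Safari versions do not support it.

```javascript
// Positive lookbehind: digits preceded by "$" (the "$" is not part of the match)
const prices = "Price: $42, tax: $7".match(/(?<=\$)\d+/g); // ["42", "7"]

// Negative lookbehind: a digit NOT preceded by "$"
const plain = "price $5, qty 7".match(/(?<!\$)\d/g); // ["7"]

// Positive lookahead: digits followed by "px"
const pixels = "100px 2em 30px".match(/\d+(?=px)/g); // ["100", "30"]

// Negative lookahead: a "q" NOT followed by "u"
const loneQ = "Iraq quiz".search(/q(?!u)/); // 3

// The text's own example: an "a" that follows an "h"
const afterH = "That".search(/(?<=h)a/); // 2

console.log(prices, plain, pixels, loneQ, afterH);
```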

Let's go deeper into regex


Now we can combine what we have learned into larger patterns, for example to check for a phone number in a text.
Let us show you some advanced examples.

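^[ \t]+|[ \t]+$   Matches excess whitespace (spaces and tabs) at the beginning or end of a line.
[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?   Matches a number, optionally signed, with decimals or an exponent, such as -3.14 or 6.02e23.
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$   Matches a simple email address. It is only a rough check: for example, it rejects valid addresses whose top-level domain is longer than five letters.
^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,5})|(\(?\d{2,6}\)?))(-| )?(\d{3,4})(-| )?(\d{4})(( x| ext)\d{1,5}){0,1}$   Matches phone numbers in several international formats, such as a 9-digit European number with the country prefix written as 0034123456789, with an optional extension.

^((\+1)[ -])?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})$   Matches a US phone number, with an optional +1 international prefix.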
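The advanced patterns can be tried in JavaScript as follows. This is a sketch: the test inputs are made up, and the ^...$ anchors around the number pattern are an addition here so that test() checks the whole string rather than any number inside it.

```javascript
// Strip leading/trailing spaces and tabs
const trimmed = "  hello\t".replace(/^[ \t]+|[ \t]+$/g, ""); // "hello"

// Number pattern from the list, anchored for whole-string validation (assumption)
const numeral = /^[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?$/;
const numeralResults = ["-3.14", "6.02e23", ".5", "1.", "abc"].map((s) => numeral.test(s));
// [true, true, true, false, false]

const email = /^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$/;
const validEmail = email.test("info@ma-no.org"); // true
const invalidEmail = email.test("not-an-email"); // false

const intlPhone = /^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,5})|(\(?\d{2,6}\)?))(-| )?(\d{3,4})(-| )?(\d{4})(( x| ext)\d{1,5}){0,1}$/;
const spanishNumber = intlPhone.test("0034123456789"); // true

const usPhone = /^((\+1)[ -])?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})$/;
const usResults = ["+1 (555) 123-4567", "555-123-4567", "12345"].map((s) => usPhone.test(s));
// [true, true, false]

console.log(trimmed, numeralResults, validEmail, invalidEmail, spanishNumber, usResults);
```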

Regex and JavaScript

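Strings have several methods that accept a regular expression. Two of the most common are search() and replace().
search() looks for a match and returns the position of the first one, or -1 if there is none (note the case-insensitive i modifier):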

var string = "Ma-no.org!";
var result = string.search(/NO/i); //result = 3


replace() returns a new string in which the matched text is replaced (note the positive lookbehind, which requires a space before the word):

var string2 = "Visit Palma!";
var result2 = string2.replace(/(?<=[ ])[a-zA-Z]{1,5}/g, "Pizzeria");  
//result2 = "Visit Pizzeria!"

There are two ways to create a regular expression: a literal, or the RegExp constructor:

let myRegex = /ab+c/;
let myRegexObject = new RegExp('ab+c');

In JavaScript, regular expressions are objects, and they are often used with two RegExp methods.
test() searches a string for a pattern and returns true or false:

<p id="demo">Ma-no.org</p>
<script>
// Read the paragraph text ("Ma-no.org") and test it for the letter "o"
const inputOutput = document.getElementById("demo").innerHTML;
document.getElementById("demo").innerHTML = /o/.test(inputOutput);  //true
</script>

exec() searches a string for a pattern and returns an array describing the first match (with extra index and input properties), or null if nothing is found:

var obj = /M/.exec(inputOutput);
document.getElementById("demo").innerHTML = "Found " + obj[0] + " in position " + obj.index + " in the text: " + obj.input;
//Found M in position 0 in the text: Ma-no.org

match() returns an array of matches, or null if no match is found. With the g flag the array contains every match; without it, it contains only the first match together with its capturing groups, like exec():

const paragraph = 'Ma-No.org';
const regex = /[A-Z]/g;
document.getElementById("demo").innerHTML = paragraph.match(regex); //M, N

matchAll() returns an iterator over all matches, including capturing groups; the regex must have the g flag, otherwise it throws a TypeError.
split() uses a regular expression or a fixed string to break a string into an array of substrings.
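A short sketch of matchAll() and split(), runnable in Node.js 12+ or a current browser; the input strings are invented for the demo.

```javascript
const text = "2020-10-23, 2021-01-05";

// Each item from matchAll() is a full match result, with groups and index
const dates = [...text.matchAll(/(\d{4})-(\d{2})-(\d{2})/g)];
const years = dates.map((m) => m[1]);        // ["2020", "2021"]
const positions = dates.map((m) => m.index); // [0, 12]

// Without the g flag, matchAll() throws a TypeError
let nonGlobalError = null;
try {
  text.matchAll(/\d{4}/);
} catch (err) {
  nonGlobalError = err;
}

// Split on any run of commas, semicolons or whitespace
const parts = "a, b;c  d".split(/[,;\s]+/); // ["a", "b", "c", "d"]

console.log(years, positions, nonGlobalError instanceof TypeError, parts);
```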

Regular expression objects also have useful properties, such as source, which returns the text of the pattern:

var pattern = /Ma-no.org/g;
document.getElementById("demo").innerHTML = pattern.source; //Ma-no.org
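A few more properties are sketched below without the DOM, so it runs in Node.js too. With the g flag, lastIndex records where the next exec() call will continue.

```javascript
const pattern = /Ma-no.org/gi;
console.log(pattern.source); // Ma-no.org
console.log(pattern.flags);  // gi
console.log(pattern.global, pattern.ignoreCase, pattern.multiline); // true true false

// With g, exec() resumes from lastIndex on each call
const word = /\w+/g;
const text = "Ma-no.org";
const first = word.exec(text)[0];   // "Ma"
const afterFirst = word.lastIndex;  // 2
const second = word.exec(text)[0];  // "no"
const afterSecond = word.lastIndex; // 5

console.log(first, afterFirst, second, afterSecond);
```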


Conclusion


Regexes are useful in a wide variety of text and string processing tasks.
Finally, there are web apps and online playgrounds that are great for practicing and testing your patterns.


We can recommend these sites:
https://regex101.com/
https://regexr.com/
JavaScript Regex cheatsheet
And finally, the MDN guide, which we recommend for checking browser compatibility:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions

Image by CyberHades.

 