Basic Regex...

Krishnakanth G
4 min read · Dec 8, 2022

Hi, I hope everyone is doing well. It's been a long time. Today I have an interesting topic: regex.

https://d2h1bfu6zrdxog.cloudfront.net/wp-content/uploads/2022/04/coderpad-regex-the-complete-guide.jpg

A regex, or regular expression, is a string of characters that defines a search pattern. Regular expressions form a small pattern language for parsing and manipulating text. They are frequently used to perform complex search-and-replace operations and to validate text data.

Regular expressions are now included in most programming languages, as well as many scripting languages, applications, and command-line tools.

In this article, I will explain regex using Python. No need to worry about the language, because the core regex syntax is largely the same across languages (libraries, functions, and a few advanced features vary). First, let’s learn the basics of regular expressions.

Basic regex characters:

Special Regex Characters: These characters have special meanings in regex: ., +, *, ?, ^, $, (, ), [, ], {, }, |, \.

Character: All characters, except those with special meaning in regex, match themselves.

Escape Sequences (\): To match a character that has a special meaning in regex, prefix it with a backslash (\). E.g., the regex \+ matches “+”.

Strings: Strings can be matched by combining a sequence of characters. E.g., the regex Krishna matches “Krishna”.

OR (|): To combine alternative patterns in one regex, use OR. E.g., the regex one|1 matches the strings “one” or “1”.
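Here is a small sketch of the three ideas above (literal strings, escaping, and OR) using Python's built-in re module. The sample strings and variable names are my own and are only for illustration.

```python
import re

# A literal pattern matches itself.
literal = re.findall(r"Krishna", "Hi Krishna, meet Krishna")

# "+" is special, so it is escaped with a backslash to match a literal plus sign.
plus_signs = re.findall(r"\+", "1+2+3")

# "|" accepts either alternative.
ones = re.findall(r"one|1", "one plus 1")

print(literal)     # ['Krishna', 'Krishna']
print(plus_signs)  # ['+', '+']
print(ones)        # ['one', '1']
```

The r prefix makes a raw string, so Python passes the backslash to the regex engine unchanged.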

Bracket List ([]):

  • […] -> Accept any of the characters enclosed by the square bracket, for example, [abcd] matches “a”, “b”, “c”, or “d”.
  • [.-.] -> Accept any of the characters in the range enclosed by a square bracket. E.g., [A-Za-z] matches any uppercase or lowercase letters.
  • [^…] -> Accept any one character that is not in the square bracket. E.g., [^0-9] matches any non-digit.
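The three bracket forms can be tried out with a quick sketch like the one below. The sample strings are made up for illustration.

```python
import re

# [abcd] matches any single one of the listed characters.
any_of = re.findall(r"[abcd]", "a bad cab")

# [A-Za-z] matches any single uppercase or lowercase letter.
letters = re.findall(r"[A-Za-z]", "R2-D2")

# [^0-9] matches any single character that is NOT a digit.
non_digits = re.findall(r"[^0-9]", "R2-D2")

print(any_of)      # ['a', 'b', 'a', 'd', 'c', 'a', 'b']
print(letters)     # ['R', 'D']
print(non_digits)  # ['R', '-', 'D']
```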

Repetition Operators: specify how many times the preceding item may repeat.

  • +: one or more. E.g., [0-9]+ matches one or more digits.
  • *: zero or more. E.g., [0-9]* matches zero or more digits.
  • ?: zero or one. E.g., [18]? matches an optional “1” or “8”, i.e., “1”, “8”, or an empty string.
  • {m,n}: m to n (both inclusive)
  • {m}: exactly m times
  • {m,}: m or more (m+)
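To see the repetition operators in action, here is a minimal sketch. The helper name matches is hypothetical (it is not part of the re module); it simply wraps re.fullmatch so that the whole string has to fit the pattern.

```python
import re

def matches(pattern, text):
    """Return True when the whole text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"[0-9]+", "2022"))      # True:  one or more digits
print(matches(r"[0-9]+", ""))          # False: "+" needs at least one digit
print(matches(r"[0-9]*", ""))          # True:  "*" also accepts zero digits
print(matches(r"[18]?", "8"))          # True:  a single optional "1" or "8"
print(matches(r"[18]?", "18"))         # False: "?" allows at most one
print(matches(r"[0-9]{2,4}", "123"))   # True:  between 2 and 4 digits
print(matches(r"[0-9]{3}", "1234"))    # False: exactly 3 digits required
print(matches(r"[0-9]{2,}", "12345"))  # True:  2 or more digits
```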

Metacharacters: each matches a single character of a given kind.

  • . (dot): any character except a newline.
  • \d: any digit character.
  • \D: any non-digit character.
  • \w: any word character (a letter, digit, or underscore).
  • \W: any non-word character.
  • \s: any whitespace character (space, tab, newline, and so on).
  • \S: any non-whitespace character.
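A short sketch of the metacharacters on a made-up sample string (the text and variable names are only for illustration):

```python
import re

text = "Room 42, floor_7!"

digits = re.findall(r"\d", text)         # every single digit
words = re.findall(r"\w+", text)         # runs of letters, digits, underscores
non_words = re.findall(r"\W", text)      # everything that is not a word character
spaces = re.findall(r"\s", "a b\tc\n")   # space, tab, and newline all count
dots = re.findall(r".", "a\nb")          # the dot skips the newline

print(digits)     # ['4', '2', '7']
print(words)      # ['Room', '42', 'floor_7']
print(non_words)  # [' ', ',', ' ', '!']
print(spaces)     # [' ', '\t', '\n']
print(dots)       # ['a', 'b']
```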

Position Anchors: Match positions of the text such as start-of-line, end-of-line, start-of-word, and end-of-word.

  • ^: start-of-line
  • $: end-of-line.
  • \b: boundary of the word, i.e., start-of-word or end-of-word.
  • \B: Inverse of \b, i.e., non-start-of-word or non-end-of-word.
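  • \<, \>: start-of-word and end-of-word respectively, like \b. These are supported by some tools (e.g., GNU grep and Vim) but not by Python’s re module, where \b is used instead.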
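The anchors can be sketched in Python as follows. One thing to keep in mind: in Python's re module, ^ and $ anchor to the whole string by default and only work line by line when the re.MULTILINE flag is passed. The sample text is my own.

```python
import re

text = "cat concatenate cat"

whole_words = re.findall(r"\bcat\b", text)  # "cat" as a whole word
inside_word = re.findall(r"\Bcat\B", text)  # "cat" strictly inside a longer word

# By default ^ anchors to the start of the whole string.
first_word = re.findall(r"^\w+", "one\ntwo")

# With re.MULTILINE, ^ anchors to the start of every line.
line_starts = re.findall(r"^\w+", "one\ntwo", flags=re.MULTILINE)

print(whole_words)  # ['cat', 'cat']
print(inside_word)  # ['cat']
print(first_word)   # ['one']
print(line_starts)  # ['one', 'two']
```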

Parenthesized Back-References:

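  • Use parentheses ( ) to create a capturing group; a back-reference refers to the text that the group matched.
  • Use $1, $2, … (Java, Perl, JavaScript) or \1, \2, … (Python) in a replacement string to retrieve the captured groups in sequential order. Inside the pattern itself, \1, \2, … refer back to the groups.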
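As a rough sketch of back-references in Python: \1 inside the pattern means "the same text that group 1 matched", and \1, \2 in the replacement string of re.sub insert the captured groups. The sample sentences are made up for illustration.

```python
import re

# Inside the pattern: find a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")

# Inside the replacement: swap the two captured words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

print(doubled.group(0))  # is is
print(doubled.group(1))  # is
print(swapped)           # world hello
```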

Examples:

Extract the age from the text

Text: I am 23 years old
Regex: \d+ (The regex says one or more digits in a row)
Python code: re.findall(r'\d+', 'I am 23 years old')
Python output: ['23']

Extract the text between quotes

Text: “Well, the door is already open,” said the boy
Regex: “.*” (The regex says anything between the opening and closing quotes)
Python code: re.findall('“.*”', '“Well, the door is already open,” said the boy')
Python output: ['“Well, the door is already open,”']

So far we have seen simple regexes, but we can also write more complex ones to find more complex patterns in a text.

Extract mail id from text

Text: my mail is something@noth-ing.edu.in
Regex: ([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)
(The regex says the address starts with one or more characters from [a-z, A-Z, 0-9, +, ., _, -], then “@” appears, then one or more characters from [a-z, A-Z, 0-9, ., _, -], followed by a literal “.”, and it ends with one or more characters from [a-z, A-Z, 0-9, _, -].)

Python code: re.findall(r'([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)', 'my mail is something@noth-ing.edu.in')
Python output: ['something@noth-ing.edu.in']
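A long pattern like this one can be easier to read when it is split into commented pieces. The sketch below uses the same character classes with Python's re.VERBOSE flag, which ignores whitespace and allows comments inside the pattern. The name EMAIL and the longer sample text are my own, and this is only a simple illustration, not a complete email validator.

```python
import re

# Same pattern as above, split into commented pieces with re.VERBOSE.
EMAIL = re.compile(
    r"""
    [a-zA-Z0-9+._-]+   # local part: letters, digits, plus, dot, underscore, hyphen
    @                  # the @ separator
    [a-zA-Z0-9._-]+    # domain, which may itself contain dots
    \.                 # a literal dot before the last label
    [a-zA-Z0-9_-]+     # last label, for example: in
    """,
    re.VERBOSE,
)

text = "my mail is something@noth-ing.edu.in, not something(at)nothing"
emails = EMAIL.findall(text)

print(emails)  # ['something@noth-ing.edu.in']
```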

Conclusion:

Regular expressions are a powerful and versatile tool for text manipulation and pattern matching in Python. They can be used to search, extract, and replace text based on complex patterns, and they provide a concise way to work with text data.

Happy learning …. !!!!!
