Format

'{0}, {1}, {2}'.format('a', 'b', 'c')
# 'a, b, c'
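
As a small sketch of what else str.format can do, the snippet below reorders positional fields, uses named fields, and applies a format spec. The field names and sample values are made up for illustration.

```python
# Positional fields can be reordered (or repeated) by index.
reordered = '{2}, {1}, {0}'.format('a', 'b', 'c')
print(reordered)  # c, b, a

# Named fields; "first" and "last" are hypothetical names for this example.
named = '{first} {last}'.format(first='Ada', last='Lovelace')
print(named)  # Ada Lovelace

# A format spec after the colon sets alignment, width and precision.
padded = '{0:>6.2f}'.format(3.14159)
print(repr(padded))  # '  3.14'
```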

Regular Expressions

The aim of this chapter of our Python tutorial is to present a detailed and descriptive introduction to regular expressions. This introduction explains the theoretical aspects of regular expressions and shows you how to use them in Python scripts.

Regular expressions are used in programming languages to filter texts or strings. With them you can check whether a text or a string matches a given pattern.
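
To make the idea of filtering concrete, here is a minimal sketch that keeps only the strings containing a four-digit number. The sample lines and the variable names are assumptions made for this illustration.

```python
import re

# Hypothetical sample data for this illustration.
lines = ["released in 1991", "no date here", "Python 3 arrived in 2008"]

# \b\d{4}\b matches exactly four digits standing alone as a "word".
year = re.compile(r'\b\d{4}\b')

# search() returns a match object (truthy) or None (falsy).
with_year = [line for line in lines if year.search(line)]
print(with_year)  # ['released in 1991', 'Python 3 arrived in 2008']
```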

One aspect of regular expressions shouldn't go unmentioned: the core syntax is largely the same across programming and scripting languages, e.g. Python, Perl, Java, SED, AWK and even X#, although the individual dialects differ in some details.

Functions

match function

This function attempts to match the RE pattern at the beginning of the string, with optional flags. It returns a match object on success and None otherwise.

re.match(pattern, string, flags=0)

Example

import re

line = "Cats are smarter than dogs"

matched_object = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matched_object:
    print("matched_object.group()  : ", matched_object.group())
    print("matched_object.group(1) : ", matched_object.group(1))
    print("matched_object.group(2) : ", matched_object.group(2))
else:
    print("No match!!")

When the code is executed, it produces the following result:

matched_object.group()  :  Cats are smarter than dogs
matched_object.group(1) :  Cats
matched_object.group(2) :  smarter
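
One detail worth seeing in isolation is that re.match only succeeds when the pattern matches at the very start of the string. The sketch below reuses the sample sentence; the variable names are chosen for illustration only.

```python
import re

line = "Cats are smarter than dogs"

# "cats" is at the start of the string, so match() succeeds (re.I ignores case).
at_start = re.match(r'cats', line, re.I)
print(at_start.group())  # Cats

# "dogs" occurs later in the string, so match() returns None.
not_at_start = re.match(r'dogs', line, re.I)
print(not_at_start)  # None

# groups() returns all captured groups as a tuple.
groups = re.match(r'(.*) are (.*?) .*', line).groups()
print(groups)  # ('Cats', 'smarter')
```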

search function

This function searches for the first occurrence of the RE pattern anywhere within the string, with optional flags.

re.search(pattern, string, flags=0)

Example

#!/usr/bin/env python3
import re

line = "Cats are smarter than dogs"

# Unlike re.match, re.search scans the whole string for the pattern
search_object = re.search(r'dogs', line, re.M | re.I)
if search_object:
    print("search --> search_object.group() : ", search_object.group())
else:
    print("Nothing found!!")

When the code is executed, it produces the following result:

search --> search_object.group() :  dogs
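
As a short sketch of what the returned match object offers, the snippet below reads the position of the hit and shows the effect of the re.I flag. The variable names are illustrative.

```python
import re

line = "Cats are smarter than dogs"

# With re.I the upper-case pattern still finds the lower-case word.
found = re.search(r'DOGS', line, re.I)
print(found.span())  # (22, 26)

# start() and end() can be used to slice the original string.
hit = line[found.start():found.end()]
print(hit)  # dogs

# Without the flag the search is case-sensitive and finds nothing.
case_sensitive = re.search(r'DOGS', line)
print(case_sensitive)  # None
```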

sub function

This function replaces occurrences of the RE pattern in string with repl. All occurrences are replaced unless count is given, in which case at most count replacements are made. The function returns the modified string.

re.sub(pattern, repl, string, count=0, flags=0)

Example

#!/usr/bin/env python3
import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)

When the code is executed, it produces the following result:

Phone Num :  2004-959-559
Phone Num :  2004959559
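
The following sketch shows three further uses of substitution: limiting the number of replacements with count, referring to captured groups in the replacement string, and counting replacements with re.subn. The phone number is the sample value from above without its comment; the variable names are illustrative.

```python
import re

phone = "2004-959-559"

# count=1 replaces only the first occurrence.
first_only = re.sub(r'-', ' ', phone, count=1)
print(first_only)  # 2004 959-559

# \1, \2, \3 in the replacement refer to the captured groups.
swapped = re.sub(r'(\d+)-(\d+)-(\d+)', r'\3-\2-\1', phone)
print(swapped)  # 559-959-2004

# subn() returns the new string together with the number of replacements.
digits, replaced = re.subn(r'-', '', phone)
print(digits)    # 2004959559
print(replaced)  # 2
```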

Tokens Cheatsheet

Character Classes
.    any character except newline
    /go.gle/    google goggle gogle
\w \d \s    word, digit, whitespace
    /\w/    AaYyz09 ?!
    /\d/    012345 aZ?
    /\s/    0123456789 abcd?/
\W \D \S    not word, digit, whitespace
    /\W/    abcded   1234 ?>
    /\D/    abc 12345 ?<.
    /\S/    abc   123?  <.
[abc]    any of a, b or c
    /analy[sz]e/    analyse analyze analyxe
[^abc]    not a, b or c
    /analy[^sz]e/    analyse analyze analyxe
[a-g]    any character between a and g
    /[2-4]/    demo1 demo2 demo3 demo4 demo5
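
The character-class rows above can be tried out in Python with re.findall, which returns all non-overlapping matches. This is only a sketch that reuses the sample strings from the cheatsheet; the variable names are illustrative.

```python
import re

words = "analyse analyze analyxe"

# [sz] accepts either s or z at that position.
either = re.findall(r'analy[sz]e', words)
print(either)  # ['analyse', 'analyze']

# [^sz] accepts any character except s or z.
neither = re.findall(r'analy[^sz]e', words)
print(neither)  # ['analyxe']

# [2-4] is a range: any one of 2, 3 or 4.
in_range = re.findall(r'[2-4]', "demo1 demo2 demo3 demo4 demo5")
print(in_range)  # ['2', '3', '4']

# The dot stands for exactly one character, so "gogle" is too short.
dot = re.findall(r'go.gle', "google goggle gogle")
print(dot)  # ['google', 'goggle']
```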
Quantifiers & Alternation
a* a+ a?    0 or more, 1 or more, 0 or 1
    /go*gle/    ggle gogle google gooooogle hgle
    /go+gle/    ggle gogle google gooooogle hgle
    /go?gle/    ggle gogle google gooooogle hgle
a{5} a{2,}    exactly five, two or more
    /go{5}gle/    gogle gogle google gooooogle hgle
    /go{2,}gle/    gogle gogle google gooooogle hgle
a{1,3}    between one and three
    /go{1,3}gle/    gogle gogle google gooogle gooooogle hgle
a+? a{2,}?    match as few as possible
    /a+?/    a aa aaaaaa
    /a{2,}?/    a aa aaaaaa
ab|cd    match ab or cd
    /demo|example/    demo example example1
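
A quick way to see how the quantifiers differ is to run them against the same sample string. The sketch below does this with re.findall and also contrasts greedy and lazy matching; the variable names are illustrative.

```python
import re

sample = "ggle gogle google gooooogle hgle"

# * allows zero or more o's, + needs at least one, ? allows at most one.
star = re.findall(r'go*gle', sample)
plus = re.findall(r'go+gle', sample)
optional = re.findall(r'go?gle', sample)
print(star)      # ['ggle', 'gogle', 'google', 'gooooogle']
print(plus)      # ['gogle', 'google', 'gooooogle']
print(optional)  # ['ggle', 'gogle']

# {2,} needs two or more o's.
two_or_more = re.findall(r'go{2,}gle', sample)
print(two_or_more)  # ['google', 'gooooogle']

# Greedy a+ takes as many a's as possible, lazy a+? as few as possible.
greedy = re.findall(r'a+', 'aaa')
lazy = re.findall(r'a+?', 'aaa')
print(greedy)  # ['aaa']
print(lazy)    # ['a', 'a', 'a']

# | matches either alternative.
alternation = re.findall(r'demo|example', "demo example example1")
print(alternation)  # ['demo', 'example', 'example']
```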
Anchors
^abc$    start / end of the string
    /^abc$/    abc
    /^abc/    abc abc
    /abc$/    abc abc
\b \B    word boundary, not a word boundary
    /\bis\b/    This island is beautiful.
    /\Bcat\B/    cat certificate
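
The sketch below shows the anchors and word boundaries at work in Python, including the effect of re.M (MULTILINE) on ^ and $. The two-line test string and the variable names are assumptions made for this illustration.

```python
import re

text = "This island is beautiful."

# \bis\b matches "is" only as a whole word, not inside "This" or "island".
whole_word = [m.start() for m in re.finditer(r'\bis\b', text)]
print(whole_word)  # [12]

# \Bcat\B matches "cat" only inside a word (here: certifiCATe).
inside = [m.start() for m in re.finditer(r'\Bcat\B', "cat certificate")]
print(inside)  # [11]

# By default ^ and $ anchor to the whole string ...
two_lines = "abc\nabc"
whole_string = re.findall(r'^abc$', two_lines)
print(whole_string)  # []

# ... with re.M they anchor to each line.
per_line = re.findall(r'^abc$', two_lines, re.M)
print(per_line)  # ['abc', 'abc']
```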
Escaped characters
\. \* \\    escaped special characters
    /\./    username@example.com 300.000 USD
    /\*/    abc@&%$*123
    /\\/    abc@&%$\\123
\t \n \r    tab, linefeed, carriage return
    /\t/    abc def
    /ab\n/    ab
    /\r/    abc@&%$\\123
\u00A9    Unicode escape for ©
    /\u00A9/    Copyright©2017 - All rights reserved
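
In Python, escaped characters are easiest to write in raw strings (r'...'), so that the backslash reaches the regex engine unchanged. The following sketch also uses re.escape, which escapes special characters in a literal for you; the sample strings and variable names are illustrative.

```python
import re

# An escaped dot matches a literal dot only.
dots = re.findall(r'\.', "300.000 USD")
print(dots)  # ['.']

# re.escape() builds a pattern that matches the given text literally.
literal = re.escape("a.b*c")
print(literal)  # a\.b\*c
is_literal = re.search(literal, "xx a.b*c xx") is not None
print(is_literal)  # True

# \t in the pattern matches a tab character in the text.
columns = re.split(r'\t', "abc\tdef")
print(columns)  # ['abc', 'def']

# \u00A9 in the pattern matches the copyright sign.
copyright_sign = re.search(r'\u00A9', "Copyright\u00A92017")
print(copyright_sign.start())  # 9
```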
Groups and Lookaround
(abc)    capture group
    /(demo|example)[0-9]/    demo1example4demo
\1    backreference to group #1
    /(abc|def)=\1/    abc=abc def=defabc=def
(?:abc)    non-capturing group
    /(?:abc){3}/    abcabcabc abcabc
(?=abc)    positive lookahead
    /t(?=s)/    tttssstttss
(?!abc)    negative lookahead
    /t(?!s)/    tttssstttss
(?<=abc)    positive lookbehind
    /(?<=foo)bar/    foobar fuubar
(?<!abc)    negative lookbehind
    /(?<!foo)bar/    foobar fuubar
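
Groups and lookarounds are easier to follow with match positions printed out. The sketch below runs the cheatsheet samples in Python; lookarounds check the surrounding text without including it in the match. The variable names are illustrative.

```python
import re

# With one capture group, findall() returns the group contents.
captured = re.findall(r'(demo|example)[0-9]', "demo1example4demo")
print(captured)  # ['demo', 'example']

# \1 must repeat exactly what group 1 matched.
backref = re.search(r'(abc|def)=\1', "abc=def def=def")
print(backref.group())  # def=def

# Lookahead: a "t" that is (or is not) followed by "s".
followed = [m.start() for m in re.finditer(r't(?=s)', "tttssstttss")]
not_followed = [m.start() for m in re.finditer(r't(?!s)', "tttssstttss")]
print(followed)      # [2, 8]
print(not_followed)  # [0, 1, 6, 7]

# Lookbehind: "bar" that is (or is not) preceded by "foo".
after_foo = [m.start() for m in re.finditer(r'(?<=foo)bar', "foobar fuubar")]
not_after_foo = [m.start() for m in re.finditer(r'(?<!foo)bar', "foobar fuubar")]
print(after_foo)      # [3]
print(not_after_foo)  # [10]
```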

Related Readings