[Tut] Python Regex Sub - Printable Version

[Tut] Python Regex Sub - xSicKxBot - 01-26-2020

Python Regex Sub

<div>Do you want to replace all occurrences of a pattern in a string? You’re in the right place! This article is all about the re.sub(pattern, repl, string) method of Python’s <a rel="noreferrer noopener" target="_blank" href="https://docs.python.org/3/library/re.html">re library</a>. 
Let’s answer the following question:
<h2>How Does re.sub() Work in Python?</h2>
The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl.
Here’s a minimal example:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = 'C++ is the best language. C++ rocks!'
>>> re.sub(r'C\+\+', 'Python', text)
'Python is the best language. Python rocks!'</pre>
The text contains two occurrences of the string ‘C++’. The re.sub() method finds all of those occurrences and replaces each one with the new string ‘Python’ (Python is the best language after all).
Note that you must escape each ‘+’ symbol in ‘C++’ with a backslash. Otherwise the regex engine would treat it as the one-or-more quantifier instead of a literal plus sign. 
You can also see that the sub() method replaces all matched patterns in the string—not only the first one.
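If you would rather not escape special characters by hand, the standard library function re.escape() can do it for you. Here is a minimal sketch that reuses the text from the example above (the variable names are only illustrative):

```python
import re

text = 'C++ is the best language. C++ rocks!'

# re.escape() puts a backslash in front of regex metacharacters such as '+',
# so the literal string 'C++' can be used safely as a pattern.
pattern = re.escape('C++')

result = re.sub(pattern, 'Python', text)
print(result)
# Python is the best language. Python rocks!
```

This is handy when the string you search for comes from user input and may contain characters that have a special meaning in a regex.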
But there’s more! Let’s have a look at the formal definition of the sub() method.
Specification
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">re.sub(pattern, repl, string, count=0, flags=0)</pre>
The method has five arguments, two of which are optional.
<ul>
<li>pattern: the regular expression pattern to search for strings you want to replace. </li>
<li>repl: the replacement string or function. If it’s a function, it needs to take one argument (the <a href="https://blog.finxter.com/python-regex-match/">match object</a>) which is passed for each occurrence of the pattern. The return value of the replacement function is a string that replaces the matching substring. </li>
<li>string: the text in which you want to replace the matches.</li>
<li>count (optional argument): the maximum number of replacements you want to perform. By default, count=0, which means that all occurrences of the pattern are replaced. </li>
<li>flags (optional argument): a more advanced modifier that allows you to customize the behavior of the method. By default, you don’t use any flags. Want to know <a href="https://blog.finxter.com/python-regex-flags/">how to use those flags? Check out this detailed article</a> on the Finxter blog.</li>
</ul>
The initial three arguments are required. The remaining two arguments are optional. 
You’ll learn about those arguments in more detail later. 
Return Value:
A new string in which the leftmost non-overlapping matches of the pattern are replaced by repl: at most count of them if count is positive, or all of them if count=0. If the pattern doesn’t match, the string is returned unchanged.
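To make the specification concrete, here is a small sketch that exercises the optional arguments and the return value. The sample text and variable names are made up for illustration:

```python
import re

text = 'one two three two one'

# count=1 replaces only the first match; re.IGNORECASE lets 'TWO' match 'two'.
first_only = re.sub(r'TWO', '2', text, count=1, flags=re.IGNORECASE)
print(first_only)   # one 2 three two one

# With the default count=0, every match is replaced.
all_matches = re.sub(r'two', '2', text)
print(all_matches)  # one 2 three 2 one

# If nothing matches, you get the text back unchanged.
no_match = re.sub(r'four', '4', text)
print(no_match)     # one two three two one

# Strings are immutable, so the original text is never modified.
print(text)         # one two three two one
```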
<h2>Regex Sub Minimal Example</h2>
Let’s study some more examples—from simple to more complex.
The easiest use is with only three arguments: the pattern ‘sing’, the replacement string ‘program’, and the string you want to modify (<code>text</code> in our example). 
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = 'Learn to sing because singing is fun.'
>>> re.sub('sing', 'program', text)
'Learn to program because programing is fun.'</pre>
Just ignore the grammar mistake for now. You get the point: we don’t sing, we program.
But what if you want to actually fix this grammar mistake? After all, it’s programming, not programing. In this case, we need to substitute ‘sing’ with ‘program’ in some cases and ‘sing’ with ‘programm’ in other cases. 
You see where this leads us: the repl argument must be a function! So let’s try this:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

def sub(matched):
    # group(0) is the whole substring that matched the pattern
    if matched.group(0) == 'singing':
        return 'programming'
    else:
        return 'program'

text = 'Learn to sing because singing is fun.'
print(re.sub('sing(ing)?', sub, text))
# Learn to program because programming is fun.</pre>
In this example, you first define a substitution function sub. The function takes the match object as an input and returns a string. If the pattern matched the longer form ‘singing’, it returns ‘programming’. Otherwise the pattern matched the shorter form ‘sing’, so it returns the shorter replacement string ‘program’ instead. 
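When the number of cases grows, the if/else chain can get long. One possible alternative, sketched here under the assumption that every matched string has an entry in the table, is to look up the replacement in a dictionary (the name replacements is only illustrative):

```python
import re

# Maps each string the pattern can match to its replacement.
replacements = {
    'sing': 'program',
    'singing': 'programming',
}

def sub(matched):
    # Look up the replacement for the full matched substring.
    return replacements[matched.group(0)]

text = 'Learn to sing because singing is fun.'
result = re.sub(r'sing(ing)?', sub, text)
print(result)
# Learn to program because programming is fun.
```

The behavior is the same as in the example above. Only the way the replacement is chosen differs.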
<h2>How to Use the count Argument of the Regex Sub Method?</h2>
What if you don’t want to substitute all occurrences of a pattern but only a limited number of them? Just use the count argument! Here’s an example:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> s = 'xxxxxxhelloxxxxxworld!xxxx'
>>> re.sub('x+', '', s, count=2)
'helloworld!xxxx'
>>> re.sub('x+', '', s, count=3)
'helloworld!'</pre>
In the first substitution operation, you replace only two occurrences of the pattern ‘x+’. In the second, you replace all three.
You can also pass count as a positional argument to save some characters:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.sub('x+', '', s, 3)
'helloworld!'</pre>
But many coders don’t know about the count argument, so you should prefer the keyword form for readability. In addition, passing count and flags as positional arguments is deprecated since Python 3.13.
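A common special case is replacing only the first occurrence. The sketch below also uses re.subn(), a sibling of re.sub() in the standard library that additionally reports how many replacements were made. The string s is the same as in the examples above:

```python
import re

s = 'xxxxxxhelloxxxxxworld!xxxx'

# count=1 removes only the first run of 'x' characters.
first_removed = re.sub('x+', '', s, count=1)
print(first_removed)  # helloxxxxxworld!xxxx

# re.subn() returns a tuple: (new_string, number_of_replacements).
new_string, replacements_made = re.subn('x+', '', s)
print(new_string)         # helloworld!
print(replacements_made)  # 3
```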
<h2>How to Use the Optional Flag Argument?</h2>
As you’ve seen in the specification, the re.sub() method comes with an optional fifth argument, flags:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">re.sub(pattern, repl, string, count=0, flags=0)</pre>
What’s the purpose of the <a href="https://blog.finxter.com/python-regex-flags/">flags argument</a>?
Flags allow you to control the regular expression engine. Because regular expressions are so powerful, flags are a useful way of switching certain features on and off (for example, whether to ignore capitalization when matching your regex). 
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td>Syntax</td>
<td>Meaning</td>
</tr>
<tr>
<td> re.ASCII</td>
<td>If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S will match Unicode characters. If you use this flag, those special symbols will match only ASCII characters, as the name suggests. </td>
</tr>
<tr>
<td> re.A </td>
<td>Same as re.ASCII </td>
</tr>
<tr>
<td> re.DEBUG </td>
<td>If you use this flag, Python will print some useful information to the shell that helps you debug your regex. </td>
</tr>
<tr>
<td> re.IGNORECASE </td>
<td>If you use this flag, the regex engine will perform case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z]. </td>
</tr>
<tr>
<td> re.I </td>
<td>Same as re.IGNORECASE </td>
</tr>
<tr>
<td> re.LOCALE </td>
<td>Avoid this flag. Its use is discouraged: the idea was to make \w, \W, \b, \B and case-insensitive matching depend on your current locale. But the locale mechanism isn’t reliable, and the flag works only with bytes patterns. </td>
</tr>
<tr>
<td> re.L </td>
<td>Same as re.LOCALE </td>
</tr>
<tr>
<td> re.MULTILINE </td>
<td>This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’ that now matches also at the end of each line in a multi-line string. </td>
</tr>
<tr>
<td> re.M </td>
<td>Same as re.MULTILINE </td>
</tr>
<tr>
<td> re.DOTALL </td>
<td>Without using this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character. </td>
</tr>
<tr>
<td> re.S </td>
<td>Same as re.DOTALL </td>
</tr>
<tr>
<td> re.VERBOSE </td>
<td>To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: whitespace in the regex is ignored (except inside a character class or when escaped), and so is everything from an unescaped ‘#’ to the end of the line. </td>
</tr>
<tr>
<td> re.X </td>
<td>Same as re.VERBOSE </td>
</tr>
</tbody>
</table>
</figure>
Here’s how you’d use it in a minimal example:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> s = 'xxxiiixxXxxxiiixXXX'
>>> re.sub('x+', '', s)
'iiiXiiiXXX'
>>> re.sub('x+', '', s, flags=re.I)
'iiiiii'</pre>
In the second substitution operation, you ignore the capitalization by using the flag re.I which is short for re.IGNORECASE. That’s why it substitutes even the uppercase ‘X’ characters that now match the regex ‘x+’, too.
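The other flags from the table work the same way. Here is a short sketch with made-up sample strings that shows re.MULTILINE, re.DOTALL, and how to combine several flags with the | operator:

```python
import re

lines = 'first line\nsecond line\nthird line'

# Without re.MULTILINE, '^' matches only at the very start of the string.
only_first = re.sub(r'^', '> ', lines)
# With re.MULTILINE, '^' matches at the start of every line.
every_line = re.sub(r'^', '> ', lines, flags=re.MULTILINE)

# Without re.DOTALL, '.' stops at the newline, so nothing is removed.
comment = 'a<!-- x\ny -->b'
not_removed = re.sub(r'<!--.*?-->', '', comment)
removed = re.sub(r'<!--.*?-->', '', comment, flags=re.DOTALL)

# Combine flags with the bitwise OR operator.
combined = re.sub(r'^hello', 'bye', 'Hello\nhello', flags=re.I | re.M)

print(every_line)
print(removed)   # ab
print(combined)
```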
<h2>What’s the Difference Between Regex Sub and String Replace? </h2>
In a way, the re.sub() method is the more powerful variant of the <a href="https://blog.finxter.com/python-string-replace/">string.replace() method which is described in detail on this Finxter blog article</a>. 
Why? Because you can replace all occurrences of a regex pattern rather than only all occurrences of a fixed string in another string.
So with re.sub() you can do everything you can do with string.replace(), and more!
Here’s an example:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> 'Python is python is PYTHON'.replace('python', 'fun')
'Python is fun is PYTHON'
>>> re.sub('(Python)|(python)|(PYTHON)', 'fun', 'Python is python is PYTHON')
'fun is fun is fun'</pre>
The string.replace() method only replaces the lowercase word ‘python’ while the re.sub() method replaces all occurrences of uppercase or lowercase variants.
Note that you can accomplish the same thing even more easily with the flags argument.
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.sub('python', 'fun', 'Python is python is PYTHON', flags=re.I)
'fun is fun is fun'</pre>
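Case-insensitivity is only one thing a regex can express that a plain string cannot. As a further sketch with made-up sample strings, word boundaries (\b) and character classes such as \s also have no direct equivalent in string.replace():

```python
import re

text = 'cat concat cat'

# str.replace() replaces every occurrence, even inside other words.
plain = text.replace('cat', 'dog')
print(plain)       # dog condog dog

# \b matches at word boundaries, so only the standalone word is replaced.
whole_words = re.sub(r'\bcat\b', 'dog', text)
print(whole_words) # dog concat dog

# \s+ matches any run of whitespace, so runs collapse to a single space.
collapsed = re.sub(r'\s+', ' ', 'Python   is \t fun')
print(collapsed)   # Python is fun
```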
<h2>How to Remove Regex Pattern in Python?</h2>
Nothing simpler than that. Just use the empty string as a replacement string:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.sub('p', '', 'Python is python is PYTHON', flags=re.I)
'ython is ython is YTHON'</pre>
You replace all occurrences of the pattern <code>'p'</code> with the empty string <code>''</code>. In other words, you remove all occurrences of <code>'p'</code>. As you use the <code>flags=re.I</code> argument, you ignore capitalization.
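The same idea works for whole classes of characters. The following sketch, with illustrative sample strings, removes all digits from one string and all punctuation from another:

```python
import re

# \d+ matches runs of digits; replacing them with '' deletes them.
no_digits = re.sub(r'\d+', '', 'abc123def45')
print(no_digits)       # abcdef

# [^\w\s] matches any character that is neither a word character
# nor whitespace, which covers the punctuation in this sample.
no_punctuation = re.sub(r'[^\w\s]', '', 'Hello, world! It works.')
print(no_punctuation)  # Hello world It works
```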
<h2>Related Re Methods</h2>
There are six important regular expression methods which you should master:
<ul>
<li>The re.findall(pattern, string) method returns a list of string matches. Read more in <a href="https://blog.finxter.com/python-re-findall/">our blog tutorial</a>.</li>
<li>The re.search(pattern, string) method returns a match object of the first match. Read more in <a href="https://blog.finxter.com/python-regex-search/">our blog tutorial</a>.</li>
<li>The re.match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Read more in <a href="https://blog.finxter.com/python-regex-match/">our blog tutorial</a>.</li>
<li>The re.fullmatch(pattern, string) method returns a match object if the regex matches the whole string. Read more in <a href="https://blog.finxter.com/python-regex-fullmatch/">our blog tutorial</a>.</li>
<li>The re.compile(pattern) method prepares the regular expression pattern—and returns a regex object which you can use multiple times in your code. Read more in <a href="https://blog.finxter.com/python-regex-compile/">our blog tutorial</a>.</li>
<li>The re.split(pattern, string) method returns a list of strings by matching all occurrences of the pattern in the string and dividing the string along those. Read more in <a href="https://blog.finxter.com/python-regex-split/">our blog tutorial</a>.</li>
</ul>
These six methods are 80% of what you need to know to get started with Python’s regular expression functionality.
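To see how the methods listed above differ, here is a compact sketch that runs each of them on the same made-up sample string:

```python
import re

text = 'cat bat rat'
pattern = r'[a-z]at'

# findall: list of all matching substrings
found = re.findall(pattern, text)            # ['cat', 'bat', 'rat']

# search: match object for the first match anywhere in the string
first_bat = re.search(r'bat', text).start()  # 4

# match: only matches at the beginning of the string
at_start = re.match(r'cat', text).group(0)   # 'cat'
not_at_start = re.match(r'bat', text)        # None

# fullmatch: the pattern must match the whole string
whole = re.fullmatch(pattern, 'cat').group(0)  # 'cat'
not_whole = re.fullmatch(pattern, text)        # None

# compile: reusable regex object with the same methods, including sub()
regex = re.compile(pattern)
replaced = regex.sub('pet', text)            # 'pet pet pet'

# split: divide the string at every match of the pattern
parts = re.split(r'\s+', text)               # ['cat', 'bat', 'rat']

print(found, first_bat, at_start, whole, replaced, parts)
```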
<h2>Where to Go From Here?</h2>
You’ve learned the re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl.
</div>

https://www.sickgaming.net/blog/2020/01/25/python-regex-sub/