[Tut] Python Regex – How to Match the Start of Line (^) and End of Line ($)


<div><p>This article is all about the <strong>start-of-line (^) and end-of-line ($) anchors in Python’s <a rel="noreferrer noopener" target="_blank" href="https://docs.python.org/3/library/re.html">re library</a>.</strong> These two anchors are fundamental to regular expressions, even outside the Python world. So invest 5 minutes now and master them once and for all!</p>
<h2>Python Re Start-of-String (^) Regex</h2>
<p>You can use the caret operator ^ to match the beginning of the string. For example, this is useful if you want to ensure that a pattern appears at the beginning of a string. Here’s an example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.findall('^PYTHON', 'PYTHON is fun.')
['PYTHON']</pre>
<p>The re.findall(pattern, string) function finds all non-overlapping occurrences of the pattern in the string. The caret at the beginning of the pattern ‘^PYTHON’ ensures that you match the word ‘PYTHON’ only at the beginning of the string. In the previous example, this doesn’t make any difference. But in the next example, it does:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.findall('^PYTHON', 'PYTHON! PYTHON is fun')
['PYTHON']</pre>
<p>Although there are two occurrences of the substring ‘PYTHON’, there’s only one matching substring—at the beginning of the string.</p>
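<p>As a minimal sketch of the same idea in script form: the sample strings below are made up for illustration, and the comparison with str.startswith() only holds for a fixed, literal prefix like this one.</p>

```python
import re

# Hypothetical sample strings, chosen only for illustration.
samples = ['PYTHON is fun.', 'I like PYTHON']

# re.search() scans the whole string, so the caret is what pins the match to index 0.
regex_hits = [bool(re.search('^PYTHON', s)) for s in samples]

# For a fixed literal prefix, str.startswith() gives the same answer without a regex.
prefix_hits = [s.startswith('PYTHON') for s in samples]

print(regex_hits)   # [True, False]
print(prefix_hits)  # [True, False]
```

<p>The caret becomes more useful once the rest of the pattern is not a plain literal, for example when it is followed by a character class or a quantifier.</p>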
<p>But what if you want to match not only at the beginning of the string but at the beginning of each line in a multi-line string? In other words:</p>
<h3>Python Re Start-of-Line (^) Regex</h3>
<p>By default, the caret operator only applies to the start of a string. So if you’ve got a multi-line string, for example when reading a text file, it will still only match once: at the beginning of the string.</p>
<p>However, you may want to match at the beginning of each line. For example, you may want to find all lines that start with ‘Python’ in a given string.</p>
<p>You can specify that the caret operator matches the beginning of each line via the re.MULTILINE flag. Here’s an example showing both usages—without and with setting the re.MULTILINE flag:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = '''
Python is great.
Python is the fastest growing
major programming language in
the world.
Pythonistas thrive.'''
>>> re.findall('^Python', text)
[]
>>> re.findall('^Python', text, re.MULTILINE)
['Python', 'Python', 'Python']
>>> </pre>
<p>The first output is the empty list because the string ‘Python’ does not appear at the beginning of the string. </p>
<p>The second output is the list of three matching substrings because the string ‘Python’ appears three times at the beginning of a line.</p>
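<p>If you want the whole line rather than just the matched prefix, one possible approach is to extend the pattern with .* so that it consumes the rest of the line. The sample text below is an assumption made for this sketch:</p>

```python
import re

# Hypothetical sample text with one line that does not start with 'Python'.
text = '''Python is great.
Java is verbose.
Pythonistas thrive.'''

# With re.MULTILINE, ^ anchors at the start of each line.
# .* then consumes the rest of that line (the dot does not match a newline by default).
lines = re.findall(r'^Python.*', text, flags=re.MULTILINE)

print(lines)  # ['Python is great.', 'Pythonistas thrive.']
```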
<h3>Python re.sub()</h3>
<p><strong>The re.sub(pattern, repl, string, count=0, flags=0)</strong> method returns a new string where all occurrences of the pattern in the old string are replaced by repl. Read more in <a href="https://blog.finxter.com/python-regex-sub/">the Finxter blog tutorial</a>.</p>
<p>You can use the caret operator to substitute wherever some pattern appears at the beginning of the string:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.sub('^Python', 'Code', 'Python is \nPython')
'Code is \nPython'</pre>
<p>Only the beginning of the string matches the regex pattern so you’ve got only one substitution.</p>
<p>Again, you can use the re.MULTILINE flag to match the beginning of each line with the caret operator:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.sub('^Python', 'Code', 'Python is \nPython', flags=re.MULTILINE)
'Code is \nCode'</pre>
<p>Now, you replace both appearances of the string ‘Python’.</p>
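<p>The caret can also be used on its own, because it matches a position rather than a character. The following sketch (with a made-up three-line string) inserts a quote marker at the start of every line:</p>

```python
import re

# Hypothetical input text for illustration.
text = 'first line\nsecond line\nthird line'

# ^ matches the empty position at the start of each line when re.MULTILINE is set,
# so the replacement is inserted there and no characters are consumed.
quoted = re.sub(r'^', '> ', text, flags=re.MULTILINE)

print(quoted)
# > first line
# > second line
# > third line
```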
<h3>Python re.match(), re.search(), re.findall(), and re.fullmatch()</h3>
<p>Let’s quickly recap the most important regex methods in Python:</p>
<ul>
<li>The <strong>re.findall(pattern, string, flags=0)</strong> method returns a list of string matches. Read more in <a href="https://blog.finxter.com/python-re-findall/">our blog tutorial</a>.</li>
<li>The <strong>re.search(pattern, string, flags=0)</strong> method returns a match object of the first match. Read more in <a href="https://blog.finxter.com/python-regex-search/">our blog tutorial</a>.</li>
<li>The <strong>re.match(pattern, string, flags=0)</strong> method returns a match object if the regex matches at the beginning of the string. Read more in <a href="https://blog.finxter.com/python-regex-match/">our blog tutorial</a>.</li>
<li>The <strong>re.fullmatch(pattern, string, flags=0)</strong> method returns a match object if the regex matches the whole string. Read more in <a href="https://blog.finxter.com/python-regex-fullmatch/">our blog tutorial</a>.</li>
</ul>
<p>You can see that all four methods search for a pattern in a given string. You can use the caret operator ^ within each pattern to match the beginning of the string. Here’s one example per method:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = 'Python is Python'
>>> re.findall('^Python', text)
['Python']
>>> re.search('^Python', text)
&lt;re.Match object; span=(0, 6), match='Python'>
>>> re.match('^Python', text)
&lt;re.Match object; span=(0, 6), match='Python'>
>>> re.fullmatch('^Python', text)
>>> </pre>
<p>So you can use the caret operator to match at the beginning of the string. However, you should note that it doesn’t make a lot of sense to use it for the match() and fullmatch() methods as they, by definition, start by trying to match the first character of the string.</p>
<p>You can also use the re.MULTILINE flag to match the beginning of each line (rather than only the beginning of the string):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> text = '''Python is
Python'''
>>> re.findall('^Python', text, flags=re.MULTILINE)
['Python', 'Python']
>>> re.search('^Python', text, flags=re.MULTILINE)
&lt;re.Match object; span=(0, 6), match='Python'>
>>> re.match('^Python', text, flags=re.MULTILINE)
&lt;re.Match object; span=(0, 6), match='Python'>
>>> re.fullmatch('^Python', text, flags=re.MULTILINE)
>>> </pre>
<p>Again, it’s questionable whether this makes sense for the re.match() and re.fullmatch() methods as they only look for a match at the beginning of the string.</p>
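<p>To make the difference concrete, here is a small sketch with a string whose second line, but not its first, starts with ‘Python’. The string itself is invented for this example:</p>

```python
import re

# Hypothetical text: 'Python' starts the second line only.
text = 'Is\nPython'

# re.match() only ever tries position 0, even with re.MULTILINE.
match_result = re.match('^Python', text, flags=re.MULTILINE)

# re.search() scans forward and finds the line start after the newline.
search_result = re.search('^Python', text, flags=re.MULTILINE)

print(match_result)          # None
print(search_result.span())  # (3, 9)
```

<p>So if you want to find a pattern at the start of any line, re.search() or re.findall() is usually the better fit.</p>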
<h2>Python Re End of String ($) Regex</h2>
<p>Similarly, you can use the dollar-sign operator $ to match the end of the string. Here’s an example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.findall('fun$', 'PYTHON is fun')
['fun']</pre>
<p>The findall() method finds all occurrences of the pattern in the string, but the trailing dollar sign $ ensures that the regex matches only at the end of the string.</p>
<p>This can significantly alter the meaning of your regex as you can see in the next example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.findall('fun$', 'fun fun fun')
['fun']</pre>
<p>Although there are three occurrences of the substring ‘fun’, there’s only one matching substring: the one at the end of the string.</p>
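<p>One subtlety worth knowing: in Python, $ also matches just before a single newline at the very end of the string. If you need the strict end of the string, the \Z anchor can be used instead. A short sketch with an invented string that ends in a newline:</p>

```python
import re

# Hypothetical string with a trailing newline, as you often get when reading files.
text = 'Python is fun\n'

# $ matches at the end of the string or just before a trailing newline.
dollar = re.findall(r'fun$', text)

# \Z matches only at the very end of the string, so the newline blocks the match.
strict = re.findall(r'fun\Z', text)

print(dollar)  # ['fun']
print(strict)  # []
```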
<p>But what if you want to match not only at the end of the string but at the end of each line in a multi-line string?</p>
<h3>Python Re End of Line ($)</h3>
<p>By default, the dollar-sign operator only applies to the end of the string (or to the position just before a single trailing newline). So if you’ve got a multi-line string, for example when reading a text file, it will not match at the end of every line, only at the end of the string.</p>
<p>However, you may want to match at the end of each line. For example, you may want to find all lines that end with ‘.py’.</p>
<p>To achieve this, you can specify that the dollar-sign operator matches the end of each line via the re.MULTILINE flag. Here’s an example showing both usages—without and with setting the re.MULTILINE flag:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = '''
Coding is fun
Python is fun
Games are fun
Agreed?'''
>>> re.findall('fun$', text)
[]
>>> re.findall('fun$', text, flags=re.MULTILINE)
['fun', 'fun', 'fun']
>>> </pre>
<p>The first output is the empty list because the string ‘fun’ does not appear at the end of the string. </p>
<p>The second output is the list of three matching substrings because the string ‘fun’ appears three times at the end of a line.</p>
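<p>The ‘.py’ use case mentioned earlier could look roughly like the following sketch. The file listing is made up, and the pattern assumes one file name per line:</p>

```python
import re

# Hypothetical listing with one file name per line.
files = 'main.py\nREADME.md\nutils.py\nnotes.pyc'

# ^ and $ anchor to each line with re.MULTILINE.
# The dot before 'py' is escaped so that it matches a literal dot only.
py_files = re.findall(r'^.*\.py$', files, flags=re.MULTILINE)

print(py_files)  # ['main.py', 'utils.py']
```

<p>Note that ‘notes.pyc’ is not matched because $ requires the line to end right after ‘.py’.</p>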
<h3>Python re.sub()</h3>
<p><strong>The re.sub(pattern, repl, string, count=0, flags=0)</strong> method returns a new string where all occurrences of the pattern in the old string are replaced by repl. Read more in <a href="https://blog.finxter.com/python-regex-sub/">the Finxter blog tutorial</a>.</p>
<p>You can use the dollar-sign operator to substitute wherever some pattern appears at the end of the string:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.sub('Python$', 'Code', 'Is Python\nPython')
'Is Python\nCode'</pre>
<p>Only the end of the string matches the regex pattern so there’s only one substitution.</p>
<p>Again, you can use the re.MULTILINE flag to match the end of each line with the dollar-sign operator:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.sub('Python$', 'Code', 'Is Python\nPython', flags=re.MULTILINE)
'Is Code\nCode'</pre>
<p>Now, you replace both appearances of the string ‘Python’.</p>
<h3>Python re.match(), re.search(), re.findall(), and re.fullmatch()</h3>
<p>All four methods—re.findall(), re.search(), re.match(), and re.fullmatch()—search for a pattern in a given string. You can use the dollar-sign operator $ within each pattern to match the end of the string. Here’s one example per method:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = 'Python is Python'
>>> re.findall('Python$', text)
['Python']
>>> re.search('Python$', text)
&lt;re.Match object; span=(10, 16), match='Python'>
>>> re.match('Python$', text)
>>> re.fullmatch('Python$', text)
>>></pre>
<p>So you can use the dollar-sign operator to match at the end of the string. However, you should note that it doesn’t make a lot of sense to use it for the fullmatch() method as it, by definition, already requires that the last character of the string is part of the matching substring.</p>
<p>You can also use the re.MULTILINE flag to match the end of each line (rather than only the end of the whole string):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> text = '''Is Python
Python'''
>>> re.findall('Python$', text, flags=re.MULTILINE)
['Python', 'Python']
>>> re.search('Python$', text, flags=re.MULTILINE)
&lt;re.Match object; span=(3, 9), match='Python'>
>>> re.match('Python$', text, flags=re.MULTILINE)
>>> re.fullmatch('Python$', text, flags=re.MULTILINE)
>>></pre>
<p>As the pattern doesn’t match at the beginning of the string, both re.match() and re.fullmatch() return None, which is why the interpreter prints nothing.</p>
<h2>How to Match the Caret (^) or Dollar ($) Symbols in Your Regex?</h2>
<p>You know that the caret and dollar symbols have a special meaning in Python’s regular expression module: they match the beginning or end of each string/line. But what if you search for the caret (^) or dollar ($) symbols themselves? How can you match them in a string?</p>
<p>The answer is simple: escape the caret or dollar symbols in your regular expression using the backslash. In particular, use ‘\^’ instead of ‘^’ and ‘\$’ instead of ‘$’. Here’s an example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = 'The product ^^^ costs $3 today.'
>>> re.findall(r'\^', text)
['^', '^', '^']
>>> re.findall(r'\$', text)
['$']</pre>
<p>By escaping the special symbols ^ and $, you tell the regex engine to ignore their special meaning.</p>
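<p>Two alternatives to manual backslashes, shown here as a sketch on the same sample sentence: re.escape() builds an escaped pattern from a literal string, and inside a character class both symbols lose their special meaning (as long as the caret is not the first character in the class).</p>

```python
import re

text = 'The product ^^^ costs $3 today.'

# re.escape() adds the backslashes for you, which helps when the literal
# text comes from a variable or user input.
price_pattern = re.escape('$3')
prices = re.findall(price_pattern, text)

# Inside [...], $ is literal, and ^ is literal when it is not the first character.
symbols = re.findall(r'[$^]', text)

print(price_pattern)  # \$3
print(prices)         # ['$3']
print(symbols)        # ['^', '^', '^', '$']
```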
<h2>Where to Go From Here?</h2>
<p>You’ve learned everything you need to know about the caret operator ^ and the dollar-sign operator $ in this regex tutorial. </p>
<p><strong>Summary</strong>: <em>The caret operator ^ matches at the beginning of a string. The dollar-sign operator $ matches at the end of a string. If you want to match at the beginning or end of each line in a multi-line string, you can set the re.MULTILINE flag in all the relevant re methods.</em></p>
</div>


https://www.sickgaming.net/blog/2020/02/...d-of-line/