Python Regex – How to Count the Number of Matches?

<div><p>I just wrote a <a rel="noreferrer noopener" aria-label="regular expression in Python (opens in a new tab)" href="https://blog.finxter.com/python-regex/" target="_blank">regular expression in Python</a> that matches multiple times in the text and wondered: <strong>how to count the number of matches?</strong></p>
<p>Consider the example where you match one or more lowercase letters with the pattern <code>'[a-z]+'</code> in the sentence <code>'python is the best programming language in the world'</code>.</p>
<p>You can watch my explainer video as you read over the tutorial:</p>
<figure class="wp-block-embed-youtube wp-block-embed is-type-rich is-provider-embed-handler wp-embed-aspect-16-9 wp-has-aspect-ratio">
<div class="wp-block-embed__wrapper">
<div class="ast-oembed-container"><iframe title="Python Regex - How to Count the Number of Matches?" width="1100" height="619" src="https://www.youtube.com/embed/QD4REDMqEmI?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></div>
</div>
</figure>
<p>How many matches are there in the string? To count the number of matches, you can use multiple methods:</p>
<h2>1. Python re.findall() </h2>
<p>Use the <a href="https://blog.finxter.com/python-re-findall/" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">re.findall(pattern, string) method</a>, which returns a list of matching substrings, and then take the length of the returned list. Here’s an example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> pattern = '[a-z]+'
>>> text = 'python is the best programming language in the world'
>>> len(re.findall(pattern, text))
9</pre>
<p>Why is the result 9? Because there are nine matching substrings in the returned list of the <code>re.findall()</code> method:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.findall(pattern, text)
['python', 'is', 'the', 'best', 'programming', 'language', 'in', 'the', 'world']</pre>
<p>Note that <code>re.findall()</code> returns only non-overlapping matches, so this count is correct as long as the matches you care about do not overlap.</p>
<h2>2. Python re.finditer()</h2>
<p>You can also count the number of times a given <code>pattern</code> matches in a <code>text</code> by using the <code>re.finditer(pattern, text)</code> method:</p>
<p><strong>Specification</strong>: <code>re.finditer(<em>pattern</em>, <em>text</em>, <em>flags=0</em>)</code></p>
<p><strong>Definition</strong>: returns an iterator that goes over all non-overlapping matches of the <code>pattern</code> in the <code>text</code>. </p>
<p>The <code>flags</code> argument allows you to customize some advanced properties of the regex engine such as whether capitalization of characters should be ignored. You can learn more about the <a rel="noreferrer noopener" aria-label="flags argu (opens in a new tab)" href="https://blog.finxter.com/python-regex-flags/" target="_blank">flags argument in my detailed blog tutorial</a>. </p>
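<p>As a minimal sketch of how a flag can change the count, the snippet below compares a case-sensitive and a case-insensitive search. The sample sentence is an assumption made up for this illustration; <code>re.IGNORECASE</code> is a standard flag of Python's <code>re</code> module.</p>

```python
import re

pattern = '[a-z]+'
# Hypothetical sample text with some uppercase letters.
text = 'Python is the BEST language'

# Without flags, [a-z] matches lowercase letters only,
# so 'P' is skipped and 'BEST' produces no match at all.
case_sensitive = re.findall(pattern, text)

# With re.IGNORECASE, [a-z] also matches uppercase letters.
case_insensitive = [m.group() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]

print(len(case_sensitive), case_sensitive)
# 4 ['ython', 'is', 'the', 'language']
print(len(case_insensitive), case_insensitive)
# 5 ['Python', 'is', 'the', 'BEST', 'language']
```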
<p><strong>Example</strong>: You can use the iterator to count the number of matches. In contrast to the <code>re.findall()</code> method described above, this has the advantage that you can inspect the match objects themselves, which carry much more information than just the matching substring.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re
pattern = '[a-z]+'
text = 'python is the best programming language in the world'
for match in re.finditer(pattern, text):
    print(match)

'''
<re.Match object; span=(0, 6), match='python'>
<re.Match object; span=(7, 9), match='is'>
<re.Match object; span=(10, 13), match='the'>
<re.Match object; span=(14, 18), match='best'>
<re.Match object; span=(19, 30), match='programming'>
<re.Match object; span=(31, 39), match='language'>
<re.Match object; span=(40, 42), match='in'>
<re.Match object; span=(43, 46), match='the'>
<re.Match object; span=(47, 52), match='world'> '''</pre>
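<p>To illustrate the extra information a match object carries, here is a small sketch that collects the start index, end index, and substring of every match. The variable names <code>spans</code> and <code>longest</code> are chosen for this illustration only.</p>

```python
import re

pattern = '[a-z]+'
text = 'python is the best programming language in the world'

# Collect (start, end, substring) for every match object.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

# The number of matches is the length of the collected list.
print(len(spans))
# 9
print(spans[0])
# (0, 6, 'python')

# Use the positions to find the longest match and where it starts.
longest = max(spans, key=lambda s: s[1] - s[0])
print(longest)
# (19, 30, 'programming')
```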
<p>If you want to count the number of matches, you can use a simple <code>count</code> variable:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re
pattern = '[a-z]+'
text = 'python is the best programming language in the world'

count = 0
for match in re.finditer(pattern, text):
    count += 1

print(count)
# 9</pre>
<p>Or a more Pythonic solution:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re
pattern = '[a-z]+'
text = 'python is the best programming language in the world'

print(len([i for i in re.finditer(pattern, text)]))
# 9</pre>
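<p>Like <code>re.findall()</code>, <code>re.finditer()</code> yields only non-overlapping matches, so this count is correct as long as the matches you care about do not overlap.</p>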
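<p>If the text is large, you may not want to build a list of match objects just to count them. One possible variant, sketched here, feeds the iterator into <code>sum()</code> with a generator expression so that only one match object is held at a time:</p>

```python
import re

pattern = '[a-z]+'
text = 'python is the best programming language in the world'

# Add 1 for every match the iterator yields, without storing the matches.
count = sum(1 for _ in re.finditer(pattern, text))

print(count)
# 9
```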
<h2>3. Overlapping Matches</h2>
<p>The above two methods work well as long as the matches do not overlap. If matches do overlap, the regex engine misses them because it “consumes” each matching substring and resumes the search only after the end of the previous match.</p>
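<p>A short sketch makes this behavior visible. In the string <code>'99999'</code> (an example chosen for this illustration), the pattern <code>'99'</code> could start at four different positions, but both methods report only two matches:</p>

```python
import re

pattern = '99'
text = '99999'

# Each match consumes two characters, so the next search starts at its end.
spans = [m.span() for m in re.finditer(pattern, text)]

print(spans)
# [(0, 2), (2, 4)]
print(len(re.findall(pattern, text)))
# 2
```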
<p>So if you need to find the number of overlapping matches, you need to use a different approach. </p>
<p>The idea is to keep track of the start position of the previous match and resume the search one character after it:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re
pattern = '99'
text = '999 ways of writing 99 - 99999'

left = 0
count = 0
while True:
    match = re.search(pattern, text[left:])
    if not match:
        break
    count += 1
    left += match.start() + 1
print(count)
# 7 </pre>
<p>By storing the position right after the start of the previous match in the <code>left</code> variable, we control where to look for the next match in the string. Note that we use Python’s <a rel="noreferrer noopener" aria-label="slicing (opens in a new tab)" href="https://blog.finxter.com/introduction-to-slicing-in-python/" target="_blank">slicing operation</a> <code>text[left:]</code> to skip all characters to the left of that position, which previous iterations have already covered. In each loop iteration, we find the next occurrence of the pattern in the remaining text. This works even if the occurrences overlap.</p>
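<p>Two possible variations on this idea are sketched below; neither is part of the original tutorial, and the function names are hypothetical. The first passes a start position to the compiled pattern's <code>search()</code> method instead of slicing the string on every iteration. The second wraps the pattern in a zero-width lookahead <code>(?=...)</code>, so a match consumes no characters and <code>re.finditer()</code> reports one match per possible start position. Both assume a pattern without anchors such as <code>^</code>, which can behave differently on a slice than on the full string.</p>

```python
import re


def count_overlapping_search(pattern, text):
    """Count overlapping matches by restarting one character after each match start."""
    regex = re.compile(pattern)
    count = 0
    pos = 0
    # The bound on pos guards against looping forever on empty matches.
    while pos <= len(text):
        match = regex.search(text, pos)
        if match is None:
            break
        count += 1
        pos = match.start() + 1
    return count


def count_overlapping_lookahead(pattern, text):
    """Count overlapping matches with a zero-width lookahead around the pattern."""
    return sum(1 for _ in re.finditer(f'(?={pattern})', text))


text = '999 ways of writing 99 - 99999'
print(count_overlapping_search('99', text))
# 7
print(count_overlapping_lookahead('99', text))
# 7
```

<p>Both functions count the distinct positions at which the pattern can start, which is the same quantity the loop above computes.</p>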
<h2>Where to Go From Here</h2>
<p>You’ve learned three ways of finding the number of matches of a given pattern in a string. </p>
<p>If you struggle with regular expressions, check out our free 20,000-word <a rel="noreferrer noopener" aria-label="regex tutorial (opens in a new tab)" href="https://blog.finxter.com/python-regex/" target="_blank">regex tutorial</a> on the Finxter blog! It’ll give you <strong>regex superpowers</strong>!</p>
</div>

https://www.sickgaming.net/blog/2020/02/...f-matches/

xSicKxBot