Every computer scientist knows the asterisk quantifier of regular expressions. But many non-techies know a close relative of it, too: each time you search your computer for text files with *.txt, you use the asterisk as a wildcard.
This article is all about the asterisk * quantifier in Python’s re library. Study it carefully and master this important piece of knowledge once and for all!
What’s the Python Re * Quantifier?
When applied to regular expression A, Python’s A* quantifier matches zero or more occurrences of A. The * quantifier is called asterisk operator and it always applies only to the preceding regular expression. For example, the regular expression ‘yes*’ matches strings ‘ye’, ‘yes’, and ‘yesssssss’. But it does not match the empty string because the asterisk quantifier * does not apply to the whole regex ‘yes’ but only to the preceding regex ‘s’.
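As a quick check of this claim, here is a minimal sketch that uses re.fullmatch() to test whether a whole string matches ‘yes*’. The sample strings come from the paragraph above; the variable names are only for illustration.

```python
import re

# The * applies only to the preceding 's', not to the whole 'yes'.
pattern = 'yes*'

candidates = ['ye', 'yes', 'yesssssss', '']
results = {s: bool(re.fullmatch(pattern, s)) for s in candidates}

print(results)
# {'ye': True, 'yes': True, 'yesssssss': True, '': False}
```

The empty string fails because the leading ‘ye’ is not optional; only the trailing ‘s’ may be repeated zero or more times.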
Let’s study three basic examples to help you gain a deeper understanding. Do you get all of them?
>>> import re
>>> text = 'finxter for fast and fun python learning'
>>> re.findall('f.* ', text)
['finxter for fast and fun python ']
>>> re.findall('f.*? ', text)
['finxter ', 'for ', 'fast ', 'fun ']
>>> re.findall('f[a-z]*', text)
['finxter', 'for', 'fast', 'fun']
Don’t worry if you had problems understanding those examples. You’ll learn about them next. Here’s the first example:
Greedy Asterisk Example
>>> re.findall('f.* ', text)
['finxter for fast and fun python ']
You use the re.findall() method. In case you don’t know it, here’s the definition from the Finxter blog article:
The re.findall(pattern, string) method finds all occurrences of the pattern in the string and returns a list of all matching substrings.
The first argument is the regular expression pattern ‘f.* ’. The second argument is the string to be searched for the pattern. In plain English, you want to find all substrings that start with the character ‘f’, followed by an arbitrary number of arbitrary characters, followed by a space.
The findall() method returns only one matching substring: ‘finxter for fast and fun python ‘. The asterisk quantifier * is greedy. This means that it tries to match as many occurrences of the preceding regex as possible. So in our case, it wants to match as many arbitrary characters as possible so that the pattern is still matched. Therefore, the regex engine “consumes” the whole sentence.
Non-Greedy Asterisk Example
But what if you want to find all words starting with an ‘f’? In other words: how do you match the text with a non-greedy asterisk operator?
In this example, ‘f.*? ’, you’re looking at a similar pattern with only one difference: you use the non-greedy asterisk operator *?. You want to find all occurrences of character ‘f’ followed by an arbitrary number of characters (but as few as possible), followed by a space.
Therefore, the regex engine finds four matches: the strings ‘finxter ‘, ‘for ‘, ‘fast ‘, and ‘fun ‘.
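To see the difference side by side, here is a small sketch that runs the greedy and the non-greedy pattern on the sentence from the beginning of the article. The variable names greedy and non_greedy are just illustrative.

```python
import re

text = 'finxter for fast and fun python learning'

# Greedy: .* grabs as much as possible, up to the last space in the text.
greedy = re.findall('f.* ', text)

# Non-greedy: .*? stops at the first space after each 'f'.
non_greedy = re.findall('f.*? ', text)

print(greedy)      # ['finxter for fast and fun python ']
print(non_greedy)  # ['finxter ', 'for ', 'fast ', 'fun ']
```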
The third regex, ‘f[a-z]*’, achieves almost the same thing: finding all words starting with f. But you use the asterisk quantifier in combination with a character class that defines specifically which characters are valid matches.
Within the character class, you can define character ranges. For example, the character range [a-z] matches one lowercase character in the alphabet while the character range [A-Z] matches one uppercase character in the alphabet.
But note that the space is not part of the character class, so it won’t be matched if it appears in the text. Thus, the result is the same list of words that start with character f, this time without the trailing spaces: ‘finxter’, ‘for’, ‘fast’, and ‘fun’.
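Here is a short sketch of the character-class variant. The second sentence (‘a leaf falls’) is a made-up example, added only to show why the pattern finds words starting with f "almost" but not exactly: an ‘f’ in the middle of a word starts a match, too.

```python
import re

text = 'finxter for fast and fun python learning'

# 'f' followed by zero or more lowercase letters; the space ends each match.
words = re.findall('f[a-z]*', text)
print(words)  # ['finxter', 'for', 'fast', 'fun']

# Caveat: the pattern does not require the 'f' to start a word.
mid_word = re.findall('f[a-z]*', 'a leaf falls')
print(mid_word)  # ['f', 'falls']
```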
What If You Want to Match the Asterisk Character Itself?
You know that the asterisk quantifier matches an arbitrary number of the preceding regular expression. But what if you search for the asterisk (or star) character itself? How can you search for it in a string?
The answer is simple: escape the asterisk character in your regular expression using the backslash. In particular, use ‘\*’ instead of ‘*’. Here’s an example:
You find all occurrences of the star symbol in the text by using the regex ‘\*’. Consequently, if you use the regex ‘\**’, you search for an arbitrary number of occurrences of the asterisk symbol (including zero occurrences). And if you would like to find all maximal runs of consecutive asterisk symbols in a text, you’d use the regex ‘\*+’.
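The following sketch illustrates the three escaped patterns on a made-up sample string. It uses raw strings (the r prefix) so that Python passes the backslash to the regex engine unchanged; that choice is mine, not something the article prescribes.

```python
import re

text = 'python *** is ** great *'  # hypothetical sample text

# '\*' matches one literal asterisk at a time.
stars = re.findall(r'\*', text)
print(len(stars))  # 6

# '\*+' matches each maximal run of consecutive asterisks.
runs = re.findall(r'\*+', text)
print(runs)  # ['***', '**', '*']

# '\**' also accepts zero asterisks, so it even matches the empty string.
matches_empty = bool(re.fullmatch(r'\**', ''))
print(matches_empty)  # True
```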
[Collection] What Are The Different Python Re Quantifiers?
The asterisk quantifier—Python re *—is only one of many regex operators. If you want to use (and understand) regular expressions in practice, you’ll need to know all of them by heart!
So let’s dive into the other operators:
A regular expression is a decades-old concept in computer science. It was invented in the 1950s by the famous mathematician Stephen Cole Kleene, and the decades of evolution since then have brought a huge variety of operations. Collecting all operations and writing up a comprehensive list would result in a very thick and unreadable book by itself.
Fortunately, you don’t have to learn all regular expressions before you can start using them in your practical code projects. Next, you’ll get a quick and dirty overview of the most important regex operations and how to use them in Python. In follow-up chapters, you’ll then study them in detail — with many practical applications and code puzzles.
Here are the most important regex quantifiers:
| Quantifier | Description | Example |
|---|---|---|
| . | The wild-card (‘dot’) matches any character in a string except the newline character ‘\n’. | Regex ‘...’ matches all words with three characters such as ‘abc’, ‘cat’, and ‘dog’. |
| * | The zero-or-more asterisk matches an arbitrary number of occurrences (including zero occurrences) of the immediately preceding regex. | Regex ‘cat*’ matches the strings ‘ca’, ‘cat’, ‘catt’, ‘cattt’, and ‘catttttttt’. |
| ? | The zero-or-one matches (as the name suggests) either zero or one occurrences of the immediately preceding regex. | Regex ‘cat?’ matches both strings ‘ca’ and ‘cat’, but not ‘catt’, ‘cattt’, and ‘catttttttt’. |
| + | The at-least-one matches one or more occurrences of the immediately preceding regex. | Regex ‘cat+’ does not match the string ‘ca’ but matches all strings with at least one trailing character ‘t’ such as ‘cat’, ‘catt’, and ‘cattt’. |
| ^ | The start-of-string matches the beginning of a string. | Regex ‘^p’ matches the strings ‘python’ and ‘programming’ but not ‘lisp’ and ‘spying’ where the character ‘p’ does not occur at the start of the string. |
| $ | The end-of-string matches the end of a string. | Regex ‘py$’ would match the strings ‘main.py’ and ‘pypy’ but not the strings ‘python’ and ‘pypi’. |
| A\|B | The OR matches either the regex A or the regex B. Note that the intuition is quite different from the standard interpretation of the or operator that can also satisfy both conditions. | Regex ‘(hello)\|(hi)’ matches strings ‘hello world’ and ‘hi python’. It wouldn’t make sense to try to match both of them at the same time. |
| AB | The AND matches first the regex A and second the regex B, in this sequence. | We’ve already seen it trivially in the regex ‘ca’ that matches first regex ‘c’ and second regex ‘a’. |
Note that I gave the above operators some more meaningful names (in bold) so that you can immediately grasp the purpose of each regex. For example, the ‘^’ operator is usually denoted as the ‘caret’ operator. Those names are not descriptive so I came up with more kindergarten-like words such as the “start-of-string” operator.
We’ve already seen many examples but let’s dive into even more!
import re

text = '''
Ha! let me see her: out, alas! he's cold:
Her blood is settled, and her joints are stiff;
Life and these lips have long been separated:
Death lies on her like an untimely frost
Upon the sweetest flower of all the field.
'''

print(re.findall('.a!', text))
'''
Finds all occurrences of an arbitrary character that is
followed by the character sequence 'a!'.
['Ha!']
'''

print(re.findall('is.*and', text))
'''
Finds all occurrences of the word 'is',
followed by an arbitrary number of characters
and the word 'and'.
['is settled, and']
'''

print(re.findall('her:?', text))
'''
Finds all occurrences of the word 'her',
followed by zero or one occurrences of the colon ':'.
['her:', 'her', 'her']
'''

print(re.findall('her:+', text))
'''
Finds all occurrences of the word 'her',
followed by one or more occurrences of the colon ':'.
['her:']
'''

print(re.findall('^Ha.*', text))
'''
Finds all occurrences where the string starts with
the character sequence 'Ha', followed by an arbitrary
number of characters except for the new-line character.
Can you figure out why Python doesn't find any?
[]
'''

print(re.findall('\n$', text))
'''
Finds all occurrences where the new-line character '\n'
occurs at the end of the string.
['\n']
'''

print(re.findall('(Life|Death)', text))
'''
Finds all occurrences of either the word 'Life' or the
word 'Death'.
['Life', 'Death']
'''
In these examples, you’ve already seen the special symbol ‘\n’ which denotes the new-line character in Python (and most other languages). There are many special characters, specifically designed for regular expressions. Next, we’ll discover the most important special symbols.
What’s the Difference Between Python Re * and ? Quantifiers?
You can read the Python Re A? quantifier as zero-or-one regex: the preceding regex A is matched either zero times or exactly once. But it’s not matched more often.
Analogously, you can read the Python Re A* operator as the zero-or-more regex (I know it sounds a bit clunky): the preceding regex A is matched an arbitrary number of times.
The regex ‘ab?’ matches the character ‘a’ in the string, followed by the character ‘b’ if it exists.
The regex ‘ab*’ matches the character ‘a’ in the string, followed by as many characters ‘b’ as possible.
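A minimal sketch of this difference, assuming a sample string that consists of one ‘a’ followed by several ‘b’ characters (the string is my choice for illustration):

```python
import re

text = 'abbbbbbb'  # assumed sample string

# 'b?' takes at most one 'b' after the 'a'.
zero_or_one = re.findall('ab?', text)
print(zero_or_one)  # ['ab']

# 'b*' takes as many 'b' characters as it can get.
zero_or_more = re.findall('ab*', text)
print(zero_or_more)  # ['abbbbbbb']
```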
What’s the Difference Between Python Re * and + Quantifiers?
You can read the Python Re A* quantifier as zero-or-more regex: the preceding regex A is matched an arbitrary number of times.
Analogously, you can read the Python Re A+ operator as the at-least-once regex: the preceding regex A is matched an arbitrary number of times too—but at least once.
The regex ‘ab*’ matches the character ‘a’ in the string, followed by an arbitrary number of occurrences of character ‘b’. The substring ‘a’ perfectly matches this formulation. Therefore, in a string of eight ‘a’ characters such as ‘aaaaaaaa’, you find that the regex matches eight times.
The regex ‘ab+’ matches the character ‘a’, followed by as many characters ‘b’ as possible, but at least one. However, the character ‘b’ does not exist in that string, so there’s no match.
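You can reproduce this behavior with a short sketch. It assumes a string of eight ‘a’ characters without any ‘b’, which fits the description above but is my own choice of sample.

```python
import re

text = 'aaaaaaaa'  # assumed sample: eight 'a' characters, no 'b'

# 'b*' is satisfied by zero 'b' characters, so every 'a' is a match.
star_matches = re.findall('ab*', text)
print(len(star_matches))  # 8

# 'b+' needs at least one 'b', so nothing matches.
plus_matches = re.findall('ab+', text)
print(plus_matches)  # []
```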
What are Python Re *?, +?, ?? Quantifiers?
You’ve learned about the three quantifiers:
The quantifier A* matches an arbitrary number of patterns A.
The quantifier A+ matches at least one pattern A.
The quantifier A? matches zero-or-one pattern A.
Those three are all greedy: they match as many occurrences of the pattern as possible. Here’s an example that shows their greediness:
The code shows that all three quantifiers *, +, and ? match as many ‘a’ characters as possible.
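Here is one way to observe the greediness yourself. This sketch uses re.match() on an assumed sample string of four ‘a’ characters, so that you see exactly how much each quantifier consumes from the start of the string.

```python
import re

text = 'aaaa'  # assumed sample string

# Each greedy quantifier consumes as many 'a' characters as it is allowed to.
greedy_star = re.match('a*', text).group()
greedy_plus = re.match('a+', text).group()
greedy_question = re.match('a?', text).group()

print(greedy_star)      # aaaa
print(greedy_plus)      # aaaa
print(greedy_question)  # a  (the ? quantifier allows at most one)
```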
So, the logical question is: how to match as few as possible? We call this non-greedy matching. You can append the question mark after the respective quantifiers to tell the regex engine that you intend to match as few patterns as possible: *?, +?, and ??.
Here’s the same example but with the non-greedy quantifiers:
In this case, the code shows that all three quantifiers *?, +?, and ?? match as few ‘a’ characters as possible.
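The non-greedy behavior can be sketched in the same way. Again, re.match() and the sample string of four ‘a’ characters are assumptions made for illustration.

```python
import re

text = 'aaaa'  # assumed sample string

# Each non-greedy quantifier consumes as few 'a' characters as it is allowed to.
lazy_star = re.match('a*?', text).group()
lazy_plus = re.match('a+?', text).group()
lazy_question = re.match('a??', text).group()

print(repr(lazy_star))      # ''   (zero occurrences are enough)
print(repr(lazy_plus))      # 'a'  (at least one is required)
print(repr(lazy_question))  # ''   (zero occurrences are enough)
```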
Related Re Methods
There are seven important regular expression methods which you should master:
The re.findall(pattern, string) method returns a list of string matches. Read more in our blog tutorial.
The re.search(pattern, string) method returns a match object of the first match. Read more in our blog tutorial.
The re.match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Read more in our blog tutorial.
The re.fullmatch(pattern, string) method returns a match object if the regex matches the whole string. Read more in our blog tutorial.
The re.compile(pattern) method prepares the regular expression pattern—and returns a regex object which you can use multiple times in your code. Read more in our blog tutorial.
The re.split(pattern, string) method returns a list of strings by matching all occurrences of the pattern in the string and dividing the string along those. Read more in our blog tutorial.
The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl. Read more in our blog tutorial.
These seven methods are 80% of what you need to know to get started with Python’s regular expression functionality.
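To get a feeling for how the seven methods differ, here is a compact sketch that applies each of them to the sentence used earlier. The patterns and variable names are illustrative choices, not taken from the linked tutorials.

```python
import re

text = 'finxter for fast and fun python learning'

found = re.findall('f[a-z]*', text)          # list of all matching substrings
first = re.search('fa[a-z]*', text)          # match object of the first match
at_start = re.match('fin', text)             # matches only at the beginning
whole = re.fullmatch('f.*g', text)           # must match the whole string
compiled = re.compile('f[a-z]*')             # reusable regex object
parts = re.split(' ', text)                  # split the string along matches
replaced = re.sub('f[a-z]*', 'X', text)      # replace all matches

print(found)                   # ['finxter', 'for', 'fast', 'fun']
print(first.group())           # fast
print(at_start.group())        # fin
print(whole is not None)       # True
print(compiled.findall(text))  # ['finxter', 'for', 'fast', 'fun']
print(len(parts))              # 7
print(replaced)                # X X X and X python learning
```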
Where to Go From Here?
You’ve learned everything you need to know about the asterisk quantifier * in this regex tutorial.
Summary: When applied to regular expression A, Python’s A* quantifier matches zero or more occurrences of A. The * quantifier is called asterisk operator and it always applies only to the preceding regular expression. For example, the regular expression ‘yes*’ matches strings ‘ye’, ‘yes’, and ‘yesssssss’. But it does not match the empty string because the asterisk quantifier * does not apply to the whole regex ‘yes’ but only to the preceding regex ‘s’.
Want to earn money while you learn Python? Average Python programmers earn more than $50 per hour. You can certainly become average, can’t you?
What keeps you going day after day? No matter what, you already know that your motivation is the most important building block of your success. In the following, I’d like to give you some fact-based motivation why creating your coding business online can easily be the most rewarding decision in your life.
Yet, motivation is not everything. If you want to make your business work, you must show some persistence. You need to keep working on it for many months, even years.
There’s no quick and easy way to create a successful and lasting business. It takes time, discipline, and focused effort.
The truth is that creating a successful business is a straightforward endeavor if you have the right mindset, habits, and motivation. Using the words of legendary speaker Jim Rohn: “it’s easy to do, but it’s also easy not to do.”
This tutorial intends to give you all the motivation you need to keep working daily on your new online coding business over a long time (say, one or two years).
In particular, you’ll find an answer to these questions:
Why should you even consider working from home on your online coding business?
What are the advantages?
What are the disadvantages?
What can you expect to happen after you decided not to follow the herd by working for a big corporation or the government?
And, last but not least, what can you expect to earn as a freelance developer?
Let’s take a high-level perspective analyzing some major trends in society.
The Workforce Disruption of the 21st Century
Massive change is the only constant in today’s world. One aspect of those changes is the nature of employment in a globalized economy. It becomes more and more evident that freelancing is the most suitable way of organizing, managing, and delivering talents to small businesses and creators in the 21st century.
Say, you’re a small business owner, and you need to get some editing done for an ebook project. Would you hire a new employee for this project? Or would you just visit an online freelancing platform and hire the best editor you can get for a fair price?
You may find the answer obvious, but I don’t think that most people have already realized the second-order consequences: online freelancing is not a niche idea but has the power to transform and, ultimately, dominate the organization of the world’s talent. It’s accessible to billions of creators and business owners. And it’ll become even more efficient in the future.
When I discuss the evolution of the traditional “job market” to a project-driven “freelancer market”, I often end up debating the ethical implications of this. Yes, it means that there will be less job security in the future. It also means that there will be massive global competition for skill. The ones who deliver excellent work will get paid much better than their lazy, low-quality competition. You may not like this trend. But this doesn’t mean that it is not happening right now. This tutorial is not about whether we should or should not enter this area. It’s about how you can benefit from this global trend. But to take a stand on this: I find it a highly positive development towards a more efficient workforce where you can simply focus on the work you like and are good at, and outsource everything else.
To me, freelancing already is an integral ingredient of my existence. Here’s how freelancing impacts every aspect of my professional life today:
By working as a freelancer myself, I funded and grew my passion online business Finxter.com.
I hire freelancers for Finxter. The more Finxter grows, the more I rely on freelancers to create more value for my users.
I host the most comprehensive Python freelancer course in the world. This is my way of centralizing and sharing (but also learning from) the expertise of professionals across the globe.
My online business would have never been possible in its current form (and scale) without leveraging the efficiency gains of freelancing.
This is great because before freelancing became popular, large corporations practically owned the monopoly for exploiting the benefits of globalized labor.
Today, every small business owner can access the global pool of talents. This way, new arbitrage opportunities open up for every small business owner who seizes them.
Both business owners and freelancers benefit from this trend (as well as the people who, like me, work on both sides).
So how can you benefit from the global freelancing trend? You can benefit by becoming an arbitrage trader: buy and sell freelancing services at the same time! You purchase the services you’re not good at. You sell the services you’re good at. This way, you’re continually increasing your hourly rate. Can you see why? A bit of napkin math will highlight the fundamental arithmetic of outsourcing.
Why Outsourcing is Genius [Alice Example]
Say, you’re a fast coder: you write ten lines of code per minute. But you suck at customer service: you write 0.1 emails per minute. But you need to do both in your current position. To write 100 lines of code and answer ten emails, you need 10 + 100 = 110 minutes. Most of the time, you’ll be answering emails.
Let’s assume further that Alice has the exact opposite skills: she writes only one line of code per minute (10x slower than you) but answers one email per minute (10x faster than you). To write 100 lines of code and answer ten emails, she’d need 100 + 10 = 110 minutes, too. Most of the time, she’ll be writing code.
Both of you spend most of your time doing work you suck at.
But what if you decide to hire each other? You hire Alice to answer your emails, and Alice hires you to do her coding. Now, you have to write 200 lines of code instead of 100, which takes you only 20 minutes. Alice now answers 20 emails instead of 10, which takes her 20 minutes. In total, you two finish your work in 20 + 20 = 40 minutes instead of 110 + 110 = 220 minutes! Together, you saved 220 – 40 = 180 minutes, that is, 3 hours per day!
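If you want to check the napkin math, the following sketch recomputes it. The speeds and workloads are the numbers from the example; the helper function minutes() and the use of Fraction (to keep the arithmetic exact) are my own additions.

```python
from fractions import Fraction

# Speeds in units per minute, taken from the example above.
you = {'code': Fraction(10), 'email': Fraction(1, 10)}
alice = {'code': Fraction(1), 'email': Fraction(1)}

LINES, EMAILS = 100, 10  # workload per person


def minutes(worker, lines, emails):
    """Time a worker needs for the given lines of code and emails."""
    return lines / worker['code'] + emails / worker['email']


# Everyone does their own coding and their own emails.
solo = minutes(you, LINES, EMAILS) + minutes(alice, LINES, EMAILS)

# You write all the code, Alice answers all the emails.
traded = minutes(you, 2 * LINES, 0) + minutes(alice, 0, 2 * EMAILS)

saved = solo - traded
print(int(solo), int(traded), int(saved))  # 220 40 180
```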
It’s a stupid idea to do everything by yourself! You’ll leave vast amounts of money on the table if you’re guilty of this.
The freelancer disruption will make the world much more efficient. So let’s get some clarity: is freelancing for you?
Python Freelancer: To Be Or Not To Be?
Becoming a freelancer is an exciting way of growing your business skills, participating in the new economy, learning new technologies, practicing your communication expertise, learning how to sell and market your skills, and earning more and more money on the side. Technology and globalization have opened up this opportunity. And now it’s up to you to seize it.
But what can you expect from this new path of becoming a freelance developer (e.g., focusing on the Python programming language)?
First and foremost, freelancing is a path of personal growth, learning new skills, and earning money in the process. But in today’s digital economy, becoming a Python freelancer is – above everything else – a lifestyle choice. It can give you fulfillment, flexibility, and endless growth opportunities. Additionally, it offers you a unique way of connecting with other people, learning about their exciting projects, and finding friends and acquaintances on the road.
While this sounds nice, becoming a Python freelancer can also be a struggle with the potential to make your life miserable and stressful if you approach it with the wrong strategies and tactics. But no worries, this book is all about teaching you the right ones.
So is being a Python freelancer for you? Let’s discuss the pros and cons of becoming a Python freelancer. The list is based not only on my personal experience as a Python freelancer — working for diverse projects in science, data analytics, and even law enforcement — but I have also assembled the experiences of some of the top experts in the field.
The Good Things
There are many advantages to being a Python freelancer. Here are the most important of them:
Flexibility: You are flexible in time and space. I am living in a large German city (Stuttgart) where rent prices are growing rapidly, year after year. However, since I am working full-time in the Python industry, being self-employed, and 100% digital, I have the freedom to move to the countryside. Outside large cities, housing is exceptionally cheap, and living expenses are genuinely affordable. I am earning good money matched only by a few employees in my home town — while I don’t have to compete for housing to live close to my employers. That’s a huge advantage which can make your life wonderfully peaceful and efficient. Taken to an extreme, you can move to countries with minimal living expenses: earn Dollars and pay Rupees. As a Python freelancer, you are 100% flexible, and this flexibility opens up new possibilities for your life and work.
Independence: Do you hate working for your boss? Being a Python freelancer injects a dose of true independence into your life. While you are not free from influences (after all, you are still working for clients), you can theoretically get rid of any single client while not losing your profession. Firing your bad clients is even a smart thing to do because they demand more of your time, drain your energy, pay you badly (if at all), and don’t value your work in general. In contrast, good clients will treat you with respect, pay well and on time, come back, refer you to other clients, and make working with them a pleasant and productive experience. As an employee, you don’t have this freedom of firing your boss until you find a good one. This is a unique advantage of being a Python freelancer compared to being a Python employee.
Tax advantages: As a freelancer, you start your own business. Please note that I’m not an accountant, and tax laws are different in different countries. But in Germany and many other developed nations, your small Python freelancing business usually comes with a lot of tax advantages. You can deduct a lot of things from the taxes you pay, like your notebook, your car, your living expenses, working environment, eating out with clients or partners, your smartphone, and so on. At the end of the year, many freelancers enjoy tax benefits worth tens of thousands of dollars.
Business expertise: This advantage is maybe the most important one. As a Python freelancer, you gain a tremendous amount of experience in the business world. You learn to offer and sell your skills in the marketplace, you learn how to acquire clients and keep them happy, you learn how to solve problems, and you learn how to keep your books clean, invest, and manage your money. Being a Python freelancer gives you a lot of valuable business experiences. And even if you plan to start a more scalable business system, being a Python freelancer is a great first step towards your goal.
Paid learning: While you have to pay to learn at University, being a Python freelancer flips this situation upside down. You are getting paid for your education. As a bonus, the things you are learning are as practical as they can be. Instead of coding toy projects in University, you are coding (more or less) exciting projects with an impact on the real world.
Save commuting time: Commuting is one of the major time killers in modern life. Every morning, people are rushing to their jobs, offices, factories, schools, or universities. Every evening, people are rushing back home. On the way, they leave 1-2 hours of their valuable time on the streets, every single day, 200 days a year. During a ten-year period, you’ll waste 2000-4000 hours, enough to become a master in a new topic of your choice, or to write more than ten full books and sell them on the marketplace. Commute time to work is one of the greatest inefficiencies in our society. And you, as a Python freelancer, can eliminate it. This will make your life easier, and you have an unfair advantage compared to any employee. You can spend the time on learning, recreation, or building more side businesses. You don’t even need a car (I don’t have one), which will save you hundreds of thousands of dollars throughout your lifetime (the average German employee spends 300,000 € for cars).
Family time: During the last 12 months of being self-employed with Python, I watched my 1-year old son walking his first steps and speaking his first words. Many fathers who work at big companies as employees may have missed their sons and daughters growing up. In my environment, most fathers do not have time to spend with their kids during their working days. But I have, and I’m very grateful for this.
Are you already convinced that becoming a Python freelancer is the way to go for you? You are not alone. To help you with your quest, I have created the only Python freelancer course on the web, which pushes you to Python freelancer level in a few months — starting as a beginner coder. The course is designed to pay for itself because it will instantly increase your hourly rate on diverse freelancing platforms such as Upwork or Freelancer.com.
The Bad Things
But it’s not all fun and easy being a Python freelancer. There are a few severe disadvantages which you have to consider before starting your own freelancing business. Let’s dive right into them!
No stability: It’s hard to reach a stable income as a Python freelancer. If you only feel safe when you know exactly how much income you bring home every month, you’ll be terrified as a Python freelancer, especially if you live from paycheck to paycheck and have not yet developed the valuable habit of saving money every month. In this case, being a Python freelancer can be very dangerous because a few bad months can push you out of business. You need to buffer the lack of stability by means of a rigorous savings plan. There is no way around that.
Bad clients: Yes, they exist. If you commit to becoming a Python freelancer, you will get those bad clients for sure. They expect a lot, are never satisfied, give you a bad rating, and don’t even pay you. You might as well already accept this fact and write 10% of your income off as insurance for freeing yourself from any of those bad clients. I’m not kidding — set apart a fraction of your income so that you can always fire the bad clients immediately. You save yourself a lot of time, energy, and ultimately money (time is money in the freelancing business).
Procrastination: Are you a procrastinator? It may be difficult for you to start a freelancing business because this requires that you stay disciplined. No boss kicks your ass if you don’t perform. All initiative is on you. Of course, if you have established a thriving freelancing business, new clients will line up to do business with you. In this case, it may be easier to overcome procrastination. But especially in the early days when you have to make a name for yourself, you must show the discipline which this job profile requires. Make a clear plan for how you acquire clients. For example, if you are a Python freelancer at Upwork, make it a habit to apply for ten projects every day. Yes, you’ve heard this right. Commit first, figure out later. You can always hire other freelancers to help you with this if you have more projects than you can handle. Or even withdraw your services. But doing this will ensure that you never run out of clients, which will practically guarantee your success as a freelancer in the long run.
Legacy code: Kenneth, an experienced Python freelancer, describes this disadvantage as follows: “Python has been around for 25+ years, so, needless to say, there are some projects that have a lot of really old code that might not be up to modern standards. Legacy code presents its own fun challenge. You can’t usually refactor it, at least not easily, because other, equally old, code depends on it. That means you get to remember that this one class with a lowercase name and camel-case methods acts in its own special way. This is another place where you thank your lucky stars if there are docs and tests. Or write them as quickly as possible if there’s not!” [1]
Competition: Python is a very well documented language. Although the number of Python code projects is snowballing, so is the international competition. Many coders are attracted to Python because of its excellent documentation and suitability for machine learning and data science. Thus, the significant advantage that writing Python code is fun can sometimes also be the biggest curse. Competition can be fierce. However, this is usually only a problem if you are just starting and have not yet made a name for yourself. If you are doing good work, and focus on one sought-after area (e.g., machine learning nowadays), you have good chances to have plenty of clients competing for your valued time!
Solitude: If you are working as an employee at a company, you always have company, quite literally. You will meet your buddies at the coffee corner, you’ll attend seminars and conferences, you’ll present your work to your group, and you’ll generally get a lot of external input regarding upcoming trends and technology. As a freelancer, you cannot count on these advantages. You have to structure your day well, read books, attend conferences, and meet new people. Otherwise, you will quickly fall out of shape with both your coding and communication skills because you regularly work on your own. The ambitious way out is to continually grow your freelancing business by hiring more and more employees.
What’s unique in Python freelancing compared to general IT or coding freelancing?
Python is a unique language in many ways. The code is clean; there are strict rules (PEP standards), and “writing Pythonic code” is a globally accepted norm of expressing yourself in code. This has the big advantage that usually, you will work on clean and standardized code projects which are easily understandable. This is in stark contrast to languages such as C, where it’s hard to find common ground from time to time.
The Python ecosystem is also incredibly active and vibrant, and you’ll find tons of resources about every single aspect. As mentioned previously, the documentation is excellent. Many languages such as COBOL (wtf, I know), Go, Haskell and C# are documented poorly in comparison to Python. This helps you a lot when trying to figure out the nasty bugs in your code (or your clients’).
The barrier of entry is also low, which is partly a result of the great documentation, and partly a result of the easy to understand language design. Python is clean and concise — no doubt about that.
Finally, if you plan to start your career in the area of machine learning or data science, Python is the 800-pound gorilla in the room. The library support is stunning: more and more people are migrating from Matlab or R to Python because of its generality and the rise of new machine learning frameworks such as TensorFlow.
Knowing about those, let’s dive into the more worldly benefits of becoming a freelance developer.
What’s the Hourly Rate of a Python Freelancer?
Today, many Python freelance developers earn six figures.
How much can you expect to earn as a Python freelancer?
The short answer is: the average Python developer makes between $51 and $61 per hour (worldwide).
This data is based on various sources:
Codementor argues that the average freelancer earns between $61 and $80 in 2019: source
This Subreddit gives a few insights about what some random freelancers earn per hour (it’s usually more than $30 per hour): source
Ziprecruiter finds that the average Python freelancer earns $52 per hour in the US—the equivalent of $8,980 per month or $107,000 per year: source
Payscale is more pessimistic and estimates the average hourly rate around $29 per hour: source
As a Python developer, you can expect to earn between $10 and $80 per hour, with an average salary of $51 (source). I know the variation of the earning potential is high, but so is the quality of the Python freelancers in the wild. Take the average salary as a starting point and add +/- 50% to account for your expertise.
If you work on the side, let’s make it 8 hours each Saturday, you will earn $400 extra per week – or $1600 per month (before taxes). Your hourly rate will be a bit lower because you have to invest time finding freelancing clients – up to 20% of your total time. (source)
1.1 Million USD — How Much You Are Worth as an Average Python Coder?
What’s your market value as a Python developer?
I base this calculation on a standard way of evaluating businesses. In a way, you’re a one-person business when selling your coding skills to the marketplace (whether you’re an employee or a freelancer). When estimating the value of a company, analysts often use multiples of its yearly earnings. Let’s take this approach to come up with a rough estimate of how much your Python skills are worth.
Say we take a low multiple of 10x of your (potential) yearly earnings as a Python freelancer.
As an AVERAGE Python freelancer, you’ll earn about $60 per hour.
So the market value of being an average Python coder is:
Yearly Earnings: $60 / hour x 40 hours/week x 46 weeks/year = $110,400 / year (roughly $110,000)
Market Value: Yearly Earnings x 10 = roughly $1.1 Million
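The arithmetic above is easy to check with a few lines of Python. This sketch uses only the figures from the text ($60 per hour, 40 hours per week, 46 weeks per year, a 10x multiple); the variable names are purely illustrative.

```python
# Figures taken from the text above; variable names are illustrative.
hourly_rate = 60        # USD per hour
hours_per_week = 40
weeks_per_year = 46
multiple = 10           # low earnings multiple used for the estimate

yearly_earnings = hourly_rate * hours_per_week * weeks_per_year
market_value = yearly_earnings * multiple

print(yearly_earnings)  # 110400, i.e. roughly $110,000 per year
print(market_value)     # 1104000, i.e. roughly $1.1 million
```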
As it turns out, Python is a Million-Dollar Skill (even for an average coder)!
And the value of a top 5% coder can easily be 10x or 100x of the average coder:
“A great lathe operator commands several times the wage of an average lathe operator, but a great writer of software code is worth 10,000 times the price of an average software writer.”
Bill Gates
So if you want to thrive with your own coding business, you need to think strategically.
Being cheap costs you hundreds of thousands of dollars. You simply cannot invest too much time, energy, and even money in the right learning material.
Here’s another quote from a billionaire:
“Ultimately, there’s one investment that supersedes all others: Invest in yourself. Nobody can take away what you’ve got in yourself, and everybody has potential they haven’t used yet.”
Warren Buffett
Do you want to know how to go from beginner to average Python freelancer — and even move beyond average?
Then join my Python freelancer program. It’s the world’s most in-depth Python freelancer program — distilling thousands of hours of real-market experience of professional Python freelancers in various industries.
I guarantee that you will earn your first dollars on a freelancer platform within weeks — otherwise, you’ll get your money back.
But one warning: the Python freelancer program is only for those who commit now to invest 1-2 hours every day into their new coding business online. It’s not for the weak players who would rather watch 3.5 hours of Netflix in the evening.
If you fully commit, joining this new venture will be one of the most profitable investments in your life.
Code From Home! How to Be Happier & Earn More Money
What is the number one reason why you should consider working from home?
The number one reason is commute time. It’s healthy and makes you happier to skip commute time altogether.
Commute time is a huge productivity killer and drains your energy. Even if you use the time productively by listening to audiobooks or reading — it’s still a waste of your time.
When I became self-employed, my work productivity skyrocketed. At the same time, work became easier and less stressful. When I analyzed my days to find out about the reason for this, it struck me: No commute time.
Suddenly, I had a lot more time and more energy to create more content. Skipping commute time simply gave me more resources.
Working from home means that you don’t have these enormous drains of energy every day, even more so if you’re involved in a lot of office politics.
Many scientific research studies show that having a long commute time makes you less happy. It’s one of the top ten influential factors for your happiness — even more important than making a lot of money with your job.
Working from home is one of the best advantages of being a Python freelancer.
You save 1-2 hours of commute time per day. Invest this time into your dream project every day, and you’ll be wildly successful in a few years.
You could write 2-3 books per year, finish ten small web projects per year, or learn and master an entirely new skill such as business or marketing.
What Does it Take to Be a Freelancer?
Surprisingly, many people are afraid to take the first steps towards freelance development. They are hesitant because they believe that they don’t have enough knowledge, skill, or expertise.
But this is far from the truth. If anything, it’s a limiting belief that harms their ability to make progress towards their dream life.
The only thing it takes for sure to become a freelancer is to be human (and this may not even be a requirement in the decades to come). Everything else you already have in more — or less — rudimentary form:
Communication skills. You need to ask and respond to questions, figure out what your clients want, be responsive, positive, enthusiastic, and helpful.
Technical skills. There’s always an underlying set of technical skills for which clients hire you. They may want you to develop their next website, write their copy and ads, create valuable content, or solve any other problem. Before being able to deliver the solution, you first need to have the technical skills required to develop this solution.
The ability and ambition to learn. You won’t know everything you need to know to solve the client’s problems. So you need to learn. There’s no way around it. If you are willing to learn, you can solve any problem; it’s just a matter of time. And each time you learn more in your area of expertise, the next freelancer gig will become a little bit easier.
Time. All of us have the same number of hours every day. You already have enough time to become a freelancer. You just need to focus your effort—and maybe even skip the Netflix episode this evening.
You see, there’s nothing special about what you need to have to become a freelancer. You already have everything you need to get started. Now, it’s just a matter of your persistence.
Are You Good Enough to Start Earning Money?
André, one of my early students at my “Coffee Break Python” email series, asked me the following question:
“How much do I have to learn to become a Python freelancer?”
My answer is straightforward: start right away — no matter your current skill level.
But I know that for many new Python coders, it’s tough to start right away. Why? Because they don’t have the confidence, yet, to start taking on projects.
And the reason is that they have never quite finished a Python project and, of course, they are full of doubts and low self-esteem. They fear not being able to follow through with the freelancer project and earning the criticism of their clients.
If you have to overcome this fear first, then I would recommend that you start doing some archived freelancer projects. I always recommend a great resource where you can find these archived freelancer projects (at Freelancer.com). On this resource, you’ll find not only a few but all the freelancer projects in different areas — such as Python, data science, and machine learning — that have ever been published at the Freelancer.com platform. There are thousands of such projects.
Unfortunately, many projects published there are crappy, and it’ll take a lot of time finding suitable projects. To relieve you from this burden, I have compiled a list of 10 suitable Python projects (and published a blog article about that), which you can start doing today to improve your skill level and gain some confidence. Real freelancers have earned real money solving these projects — so they are as practical as they can be.
I recommend that you invest 70% of your learning time finishing these projects. First, you select the project. Second, you finish this project. No matter your current skill level. If you are a complete beginner, it may take you weeks to finish a project that earned the freelancer 20 dollars. So what? Then you have worked weeks to make $20 (time you would have invested in learning anyway), and you have improved your skill level a lot. But now you know you can solve the freelancer project.
The next projects will be much easier then. This time, it’ll take you not weeks but a week to finish a similar project. And the next project will take you only three days. And this is how your hourly rate increases exponentially in the beginning until you reach some convergence, and your hourly rate flattens out. At this point, you must specialize even further. Select the skills that interest you and focus on those skills first. Always play to your strengths.
Start early
If you want to know how much you can earn and get the overall picture of the state of Python freelancing in 2019, then check out my free webinar: How to earn $3000/M as a Python freelancer. It’ll take you only 30-40 minutes, and I’ll explain to you in detail the state of the art in freelancing, future outlooks and hot skills, and how much you can earn compared to employees and other professions.
Can I Start Freelancing as an Intermediate-Level Python Programmer?
For sure! You should have started much earlier. Have a look at the income distribution of Python freelancers:
It resembles a Gaussian distribution around the average value of $51 per hour. So if you are an average Python freelancer, you can earn $51 per hour in the US!
I have gained a lot of experience at the freelancing platform Upwork.com. Many beginner-level Python coders earn great money finishing smaller code projects. If you are an intermediate-level Python coder and interested in freelancing, you should start earning money ASAP.
The significant benefit is not only that you get paid to learn and improve your Python skills even further. It’s also about learning the right skill sets that will make you successful online: communication, marketing, and also coding (the essential practical stuff).
Only practice can push you to the next level. And working as a Python freelancer online will give you a lot of practice for sure!
Are You too Old to Become a Python Freelancer?
The short answer is no. You are not too old.
The older you are, the better your communication skills tend to be. Having excellent communication skills is the main factor for your success in the Python freelancing space.
Just to make this point crystal clear: there are plenty of successful freelancers with limited technical skills who earn even more than highly-skilled employees. They are successful because they are responsive, positive, upbeat, and committed to making the lives of their clients easier. That’s what matters most as a freelancer.
As you see, there’s no age barrier here. Just double down on your advantages rather than focus too much on your disadvantages.
Are You too Young to Become a Python Freelancer?
The short answer is no. You are not too young.
Was Warren Buffett too young when buying his first stocks at the age of 11? Was Magnus Carlsen, the world’s best chess player, too young when he started playing chess at age 5? Was Mark Zuckerberg too young when he started Facebook?
If anything, a young age is an advantage, and you should use this advantage by relentlessly pursuing maximal value for your clients. If you do just that, you have a good chance to build yourself a thriving business within a few years.
If you are young, you learn quickly. By focusing your learning on highly practical tasks such as solving problems for clients by using Python code, you create a well-rounded personality and skillset.
Just to make this point crystal clear: there are plenty of successful freelancers with very limited technical skills who earn more than employees. They are successful because they are responsive, positive, upbeat, and committed to making the lives of their clients easier. That’s what matters most as a freelancer.
As you see, there’s no age barrier here—just double down on your advantages rather than focus too much on your disadvantages.
Where to Go From Here
If you want to become a Python freelance developer (and create your coding business online), check out my free webinar “How to Build Your High-Income Skill Python”. Just click the link, register, and watch the webinar immediately. It’s a replay so you won’t have to wait even a minute to watch it. The webinar is an in-depth PowerPoint presentation that will give you a detailed overview of the Python freelancing space.
You’re about to learn one of the most frequently used regex operators: the dot regex . in Python’s re library.
What’s the Dot Regex in Python’s Re Library?
The dot regex . matches all characters except the newline character. For example, the regular expression ‘...’ matches strings ‘hey’ and ‘tom’. But it does not match the string ‘yo\ntom’ which contains the newline character ‘\n’.
Let’s study some basic examples to help you gain a deeper understanding.
>>> import re
>>> text = '''But then I saw no harm, and then I heard
Each syllable that breath made up between them.'''
>>> re.findall('B..', text)
['But']
>>> re.findall('heard.Each', text)
[]
>>> re.findall('heard\nEach', text)
['heard\nEach']
>>>
You first import Python’s re library for regular expression handling. Then, you create a multi-line text using the triple string quotes.
Let’s dive into the first example:
>>> re.findall('B..', text)
['But']
You use the re.findall() method. Here’s the definition from the Finxter blog article:
The re.findall(pattern, string) method finds all occurrences of the pattern in the string and returns a list of all matching substrings.
The first argument is the regular expression pattern ‘B..’. The second argument is the string to be searched for the pattern. You want to find all patterns starting with the ‘B’ character, followed by two arbitrary characters except the newline character.
The findall() method finds only one such occurrence: the string ‘But’.
The second example shows that the dot operator does not match the newline character:
>>> re.findall('heard.Each', text)
[]
In this example, you’re looking at the simple pattern ‘heard.Each’. You want to find all occurrences of the string ‘heard’ followed by an arbitrary character (except the newline), followed by the string ‘Each’.
But such a pattern does not exist! Many coders intuitively read the dot regex as an arbitrary character. You must be aware that the correct definition of the dot regex is an arbitrary character except the newline. This is a source of many bugs in regular expressions.
The third example shows you how to explicitly match the newline character ‘\n’ instead:
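As a runnable sketch, the third call from the overview above looks like this (the text variable is the same two-line string, and the dot pattern is repeated only for contrast):

```python
import re

# The same two-line text as in the examples above.
text = '''But then I saw no harm, and then I heard
Each syllable that breath made up between them.'''

# The explicit newline in the pattern matches the line break
# between 'heard' and 'Each', which the dot does not.
with_newline = re.findall('heard\nEach', text)
with_dot = re.findall('heard.Each', text)

print(with_newline)  # ['heard\nEach']
print(with_dot)      # []
```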
Naturally, the following relevant question arises:
How to Match an Arbitrary Character (Including Newline)?
The dot regex . matches a single arbitrary character—except the newline character. But what if you do want to match the newline character, too? There are two main ways to accomplish this.
The first way is to set the re.DOTALL flag. Suppose you create a multi-line string and try to find the regex pattern ‘o.p’ in it. There’s no match because the dot operator does not match the newline character by default. However, if you define the flag re.DOTALL, the newline character will also be a valid match.
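A minimal sketch of this behavior, assuming a two-line string such as 'hello' followed by 'python' (the string is a hypothetical stand-in, since the original snippet is not shown here):

```python
import re

# Hypothetical two-line string; the 'o' ending the first line and the
# 'p' starting the second line are separated only by a newline.
s = '''hello
python'''

default_result = re.findall('o.p', s)
dotall_result = re.findall('o.p', s, flags=re.DOTALL)

print(default_result)  # []
print(dotall_result)   # ['o\np']
```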
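An alternative is to use the slightly more complicated regex pattern [\s\S]. The square brackets enclose a character class, which is a set of characters that are all a valid match. Think of a character class as an OR operation: exactly one character must match. Here, the class combines all whitespace characters (\s) with all non-whitespace characters (\S), so it matches any character, including the newline. Note that the pattern [.\n] does not work for this purpose: inside a character class, the dot loses its special meaning and matches only a literal period.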
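The sketch below checks two candidate character classes against hypothetical test strings. One caveat worth knowing: inside square brackets the dot is a literal period, so the class [.\n] only matches a period or a newline. The class [\s\S] (any whitespace or any non-whitespace character) is used here as the any-character class.

```python
import re

# Hypothetical test strings: one with a newline between 'o' and 'p',
# one with an ordinary letter.
samples = ['o\np', 'oxp']

# Inside a character class the dot is literal: '[.\n]' means
# "a period or a newline", not "any character".
literal_class = [bool(re.fullmatch(r'o[.\n]p', s)) for s in samples]

# '[\s\S]' combines whitespace and non-whitespace, so it matches
# every character, including the newline.
any_class = [bool(re.fullmatch(r'o[\s\S]p', s)) for s in samples]

print(literal_class)  # [True, False]
print(any_class)      # [True, True]
```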
What If You Actually Want to Match a Dot?
If you use the character ‘.’ in a regular expression, Python assumes that it’s the dot operator you’re talking about. But what if you actually want to match a dot—for example to match the period at the end of a sentence?
Nothing simpler than that: escape the dot regex by using the backslash: ‘\.’. The backslash nullifies the meaning of the special symbol ‘.’ in the regex. The regex engine now knows that you’re actually looking for the dot character, not an arbitrary character except newline.
Here’s an example:
>>> import re
>>> text = 'Python. Is. Great. Period.'
>>> re.findall(r'\.', text)
['.', '.', '.', '.']
The findall() method returns all four periods in the sentence as matching substrings for the regex ‘\.’.
In this example, you’ll learn how you can combine it with other regular expressions:
>>> re.findall(r'\.\s', text)
['. ', '. ', '. ']
Now, you’re looking for a period character followed by an arbitrary whitespace. There are only three such matching substrings in the text.
In the next example, you learn how to combine this with a character class:
>>> re.findall(r'[st]\.', text)
['s.', 't.']
You want to find either character ‘s’ or character ‘t’ followed by the period character ‘.’. Two substrings match this regex.
Note that the backslash is required. If you forget it, you can get strange behavior:
>>> re.findall('[st].', text)
['th', 's.', 't.']
As an arbitrary character is allowed after the character class, the substring ‘th’ also matches the regex.
[Collection] What Are The Different Python Re Quantifiers?
If you want to use (and understand) regular expressions in practice, you’ll need to know the most important quantifiers that can be applied to any regex (including the dot regex)!
So let’s dive into the other regexes:
Quantifier | Description | Example
. | The wild-card (‘dot’) matches any character in a string except the newline character ‘\n’. | Regex ‘...’ matches all words with three characters such as ‘abc’, ‘cat’, and ‘dog’.
* | The zero-or-more asterisk matches an arbitrary number of occurrences (including zero occurrences) of the immediately preceding regex. | Regex ‘cat*’ matches the strings ‘ca’, ‘cat’, ‘catt’, ‘cattt’, and ‘catttttttt’.
? | The zero-or-one matches (as the name suggests) either zero or one occurrences of the immediately preceding regex. | Regex ‘cat?’ matches both strings ‘ca’ and ‘cat’ — but not ‘catt’, ‘cattt’, and ‘catttttttt’.
+ | The at-least-one matches one or more occurrences of the immediately preceding regex. | Regex ‘cat+’ does not match the string ‘ca’ but matches all strings with at least one trailing character ‘t’ such as ‘cat’, ‘catt’, and ‘cattt’.
^ | The start-of-string matches the beginning of a string. | Regex ‘^p’ matches the strings ‘python’ and ‘programming’ but not ‘lisp’ and ‘spying’ where the character ‘p’ does not occur at the start of the string.
$ | The end-of-string matches the end of a string. | Regex ‘py$’ would match the strings ‘main.py’ and ‘pypy’ but not the strings ‘python’ and ‘pypi’.
A|B | The OR matches either the regex A or the regex B. Note that the intuition is quite different from the standard interpretation of the or operator that can also satisfy both conditions. | Regex ‘(hello)|(hi)’ matches strings ‘hello world’ and ‘hi python’. It wouldn’t make sense to try to match both of them at the same time.
AB | The AND matches first the regex A and second the regex B, in this sequence. | We’ve already seen it trivially in the regex ‘ca’ that matches first regex ‘c’ and second regex ‘a’.
Note that I gave the above operators some more meaningful names (in bold) so that you can immediately grasp the purpose of each regex. For example, the ‘^’ operator is usually denoted as the ‘caret’ operator. Those names are not descriptive so I came up with more kindergarten-like words such as the “start-of-string” operator.
We’ve already seen many examples but let’s dive into even more!
import re

text = '''
Ha! let me see her: out, alas! he's cold:
Her blood is settled, and her joints are stiff;
Life and these lips have long been separated:
Death lies on her like an untimely frost
Upon the sweetest flower of all the field.
'''

print(re.findall('.a!', text))
'''
Finds all occurrences of an arbitrary character that is
followed by the character sequence 'a!'.
['Ha!']
'''

print(re.findall('is.*and', text))
'''
Finds all occurrences of the word 'is',
followed by an arbitrary number of characters
and the word 'and'.
['is settled, and']
'''

print(re.findall('her:?', text))
'''
Finds all occurrences of the word 'her',
followed by zero or one occurrences of the colon ':'.
['her:', 'her', 'her']
'''

print(re.findall('her:+', text))
'''
Finds all occurrences of the word 'her',
followed by one or more occurrences of the colon ':'.
['her:']
'''

print(re.findall('^Ha.*', text))
'''
Finds all occurrences where the string starts with
the character sequence 'Ha', followed by an arbitrary
number of characters except for the new-line character.
Can you figure out why Python doesn't find any?
[]
'''

print(re.findall('\n$', text))
'''
Finds all occurrences where the new-line character '\n'
occurs at the end of the string.
['\n']
'''

print(re.findall('(Life|Death)', text))
'''
Finds all occurrences of either the word 'Life' or the
word 'Death'.
['Life', 'Death']
'''
In these examples, you’ve already seen the special symbol ‘\n’ which denotes the new-line character in Python (and most other languages). There are many special characters, specifically designed for regular expressions.
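One possible answer to the question posed in the example above: the text starts with a newline, and by default ^ only matches at the very beginning of the string. The sketch below illustrates this on a shortened, hypothetical version of the text. The re.MULTILINE flag is not part of the original example and is added here only to show the contrast.

```python
import re

# Shortened stand-in for the text above; note the leading newline
# right after the opening triple quotes.
text = '''
Ha! let me see her:
Her blood is settled
'''

# By default, '^' matches only at the very start of the string,
# where this text has a newline instead of 'Ha'.
default_result = re.findall('^Ha.*', text)

# With re.MULTILINE, '^' also matches right after each newline.
multiline_result = re.findall('^Ha.*', text, flags=re.MULTILINE)

print(default_result)    # []
print(multiline_result)  # ['Ha! let me see her:']
```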
Related Re Methods
There are seven important regular expression methods which you should master:
The re.findall(pattern, string) method returns a list of string matches. Read more in our blog tutorial.
The re.search(pattern, string) method returns a match object of the first match. Read more in our blog tutorial.
The re.match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Read more in our blog tutorial.
The re.fullmatch(pattern, string) method returns a match object if the regex matches the whole string. Read more in our blog tutorial.
The re.compile(pattern) method prepares the regular expression pattern—and returns a regex object which you can use multiple times in your code. Read more in our blog tutorial.
The re.split(pattern, string) method returns a list of strings by matching all occurrences of the pattern in the string and dividing the string along those. Read more in our blog tutorial.
The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl. Read more in our blog tutorial.
These seven methods are 80% of what you need to know to get started with Python’s regular expression functionality.
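To see the methods listed above side by side, here is a small sketch that applies each of them to one sample string. The sample string and the patterns are made up for illustration only.

```python
import re

# Hypothetical sample string for illustration.
text = 'python is fun, python is clean'

# findall: list of all matching substrings.
all_matches = re.findall('python', text)

# search: match object for the first match anywhere in the string.
first = re.search('fun', text)

# match: match object only if the pattern matches at the beginning.
at_start = re.match('python', text)
not_at_start = re.match('fun', text)

# fullmatch: match object only if the pattern matches the whole string.
whole = re.fullmatch('python.*clean', text)

# compile: reusable regex object with the same methods.
regex = re.compile('is')
compiled_matches = regex.findall(text)

# split: divide the string along all matches of the pattern.
parts = re.split(', ', text)

# sub: replace all matches of the pattern with the replacement string.
replaced = re.sub('python', 'regex', text)

print(all_matches)       # ['python', 'python']
print(first.span())      # (10, 13)
print(at_start.group())  # python
print(not_at_start)      # None
print(whole is not None) # True
print(compiled_matches)  # ['is', 'is']
print(parts)             # ['python is fun', 'python is clean']
print(replaced)          # regex is fun, regex is clean
```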
Where to Go From Here?
You’ve learned everything you need to know about the dot regex . in this regex tutorial.
Summary: The dot regex . matches all characters except the newline character. For example, the regular expression ‘...’ matches strings ‘hey’ and ‘tom’. But it does not match the string ‘yo\ntom’ which contains the newline character ‘\n’.
Want to earn money while you learn Python? Average Python programmers earn more than $50 per hour. You can certainly become average, can’t you?
Join the free webinar that shows you how to become a thriving coding business owner online!
Congratulations, you’re about to learn one of the most frequently used regex operators: the question mark quantifier A?.
In particular, this article is all about the ? quantifier in Python’s re library.
What’s the Python Re ? Quantifier
When applied to regular expression A, Python’s A? quantifier matches either zero or one occurrences of A. The ? quantifier always applies only to the preceding regular expression. For example, the regular expression ‘hey?’ matches both strings ‘he’ and ‘hey’. But it does not match the empty string because the ? quantifier does not apply to the whole regex ‘hey’ but only to the preceding regex ‘y’.
Let’s study two basic examples to help you gain a deeper understanding. Do you get all of them?
The first argument is the regular expression pattern ‘aa[cde]?’. The second argument is the string to be searched for the pattern. In plain English, you want to find all patterns that start with two ‘a’ characters, followed by one optional character—which can be either ‘c’, ‘d’, or ‘e’.
The findall() method returns three matching substrings:
First, string ‘aac’ matches the pattern. After Python consumes the matched substring, the remaining substring is ‘de aa aadcde’.
Second, string ‘aa’ matches the pattern. Python consumes it which leads to the remaining substring ‘ aadcde’.
Third, string ‘aad’ matches the pattern in the remaining substring. What remains is ‘cde’ which doesn’t contain a matching substring anymore.
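The three steps above can be reproduced with the following sketch. The input string 'aacde aa aadcde' is reconstructed from the remaining substrings mentioned in the steps, so treat it as an assumption.

```python
import re

# Input string reconstructed from the step-by-step description above.
text = 'aacde aa aadcde'

# Two 'a' characters, optionally followed by one of 'c', 'd', or 'e'.
matches = re.findall('aa[cde]?', text)

print(matches)  # ['aac', 'aa', 'aad']
```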
In this example, you’re looking at the simple pattern ‘aa?’. You want to find all occurrences of character ‘a’ followed by an optional second ‘a’. But be aware that the optional second ‘a’ is not needed for the pattern to match.
Therefore, the regex engine finds three matches: the characters ‘a’.
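A sketch of this case, using a hypothetical input string in which every 'a' stands alone (the string is an assumption, chosen so that three single-character matches result). The second call shows that a doubled 'a' is consumed as one match.

```python
import re

# Hypothetical input: three isolated 'a' characters.
text = 'accccacccac'
single_matches = re.findall('aa?', text)

# For contrast: two adjacent 'a' characters form a single match.
double_matches = re.findall('aa?', 'aa a')

print(single_matches)  # ['a', 'a', 'a']
print(double_matches)  # ['aa', 'a']
```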
This regex pattern looks complicated: ‘[cd]?[cde]?’. But is it really?
Let’s break it down step-by-step:
The first part of the regex [cd]? defines a character class [cd] which reads as “match either c or d”. The question mark quantifier indicates that you want to match either one or zero occurrences of this pattern.
The second part of the regex [cde]? defines a character class [cde] which reads as “match either c, d, or e”. Again, the question mark indicates the zero-or-one matching requirement.
As both parts are optional, the empty string matches the regex pattern. However, the Python regex engine attempts to match as much as possible.
Thus, the regex engine performs the following steps:
The first match in the string ‘ccc dd ee’ is ‘cc’. The regex engine consumes the matched substring, so the string ‘c dd ee’ remains.
The second match in the remaining string is the character ‘c’. The empty space ‘ ’ does not match the regex so the second part of the regex [cde] does not match. Because of the question mark quantifier, this is okay for the regex engine. The remaining string is ‘ dd ee’.
The third match is the empty string ‘’. Of course, Python does not attempt to match the same position twice. Thus, it moves on to process the remaining string ‘dd ee’.
The fourth match is the string ‘dd’. The remaining string is ‘ ee’.
The fifth match is the string ‘’. The remaining string is ‘ee’.
The sixth match is the string ‘e’. The remaining string is ‘e’.
The seventh match is the string ‘e’. The remaining string is ‘’.
The eighth match is the string ‘’. Nothing remains.
This was the most complicated of our examples. Congratulations if you understood it completely!
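The eight steps above can be checked with a short sketch. The input string 'ccc dd ee' is taken from the first step; the output shown is what a recent Python 3 interpreter returns.

```python
import re

text = 'ccc dd ee'

# Both character classes are optional, so empty matches appear
# wherever neither class can match (at the spaces and at the end).
matches = re.findall('[cd]?[cde]?', text)

print(matches)       # ['cc', 'c', '', 'dd', '', 'e', 'e', '']
print(len(matches))  # 8
```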
[Collection] What Are The Different Python Re Quantifiers?
The question mark quantifier—Python re ?—is only one of many regex operators. If you want to use (and understand) regular expressions in practice, you’ll need to know all of them by heart!
So let’s dive into the other operators:
A regular expression is a decades-old concept in computer science. Regular expressions were invented in the 1950s by the famous mathematician Stephen Cole Kleene, and the decades of evolution since then have brought a huge variety of operations. Collecting all operations and writing up a comprehensive list would result in a very thick and unreadable book by itself.
Fortunately, you don’t have to learn all regular expressions before you can start using them in your practical code projects. Next, you’ll get a quick and dirty overview of the most important regex operations and how to use them in Python. In follow-up chapters, you’ll then study them in detail — with many practical applications and code puzzles.
Here are the most important regex quantifiers:
Quantifier | Description | Example
. | The wild-card (‘dot’) matches any character in a string except the newline character ‘\n’. | Regex ‘...’ matches all words with three characters such as ‘abc’, ‘cat’, and ‘dog’.
* | The zero-or-more asterisk matches an arbitrary number of occurrences (including zero occurrences) of the immediately preceding regex. | Regex ‘cat*’ matches the strings ‘ca’, ‘cat’, ‘catt’, ‘cattt’, and ‘catttttttt’.
? | The zero-or-one matches (as the name suggests) either zero or one occurrences of the immediately preceding regex. | Regex ‘cat?’ matches both strings ‘ca’ and ‘cat’ — but not ‘catt’, ‘cattt’, and ‘catttttttt’.
+ | The at-least-one matches one or more occurrences of the immediately preceding regex. | Regex ‘cat+’ does not match the string ‘ca’ but matches all strings with at least one trailing character ‘t’ such as ‘cat’, ‘catt’, and ‘cattt’.
^ | The start-of-string matches the beginning of a string. | Regex ‘^p’ matches the strings ‘python’ and ‘programming’ but not ‘lisp’ and ‘spying’ where the character ‘p’ does not occur at the start of the string.
$ | The end-of-string matches the end of a string. | Regex ‘py$’ would match the strings ‘main.py’ and ‘pypy’ but not the strings ‘python’ and ‘pypi’.
A|B | The OR matches either the regex A or the regex B. Note that the intuition is quite different from the standard interpretation of the or operator that can also satisfy both conditions. | Regex ‘(hello)|(hi)’ matches strings ‘hello world’ and ‘hi python’. It wouldn’t make sense to try to match both of them at the same time.
AB | The AND matches first the regex A and second the regex B, in this sequence. | We’ve already seen it trivially in the regex ‘ca’ that matches first regex ‘c’ and second regex ‘a’.
Note that I gave the above operators some more meaningful names (in bold) so that you can immediately grasp the purpose of each regex. For example, the ‘^’ operator is usually denoted as the ‘caret’ operator. Those names are not descriptive so I came up with more kindergarten-like words such as the “start-of-string” operator.
We’ve already seen many examples but let’s dive into even more!
import re

text = '''
Ha! let me see her: out, alas! he's cold:
Her blood is settled, and her joints are stiff;
Life and these lips have long been separated:
Death lies on her like an untimely frost
Upon the sweetest flower of all the field.
'''

print(re.findall('.a!', text))
'''
Finds all occurrences of an arbitrary character that is
followed by the character sequence 'a!'.
['Ha!']
'''

print(re.findall('is.*and', text))
'''
Finds all occurrences of the word 'is',
followed by an arbitrary number of characters
and the word 'and'.
['is settled, and']
'''

print(re.findall('her:?', text))
'''
Finds all occurrences of the word 'her',
followed by zero or one occurrences of the colon ':'.
['her:', 'her', 'her']
'''

print(re.findall('her:+', text))
'''
Finds all occurrences of the word 'her',
followed by one or more occurrences of the colon ':'.
['her:']
'''

print(re.findall('^Ha.*', text))
'''
Finds all occurrences where the string starts with
the character sequence 'Ha', followed by an arbitrary
number of characters except for the new-line character.
Can you figure out why Python doesn't find any?
[]
'''

print(re.findall('\n$', text))
'''
Finds all occurrences where the new-line character '\n'
occurs at the end of the string.
['\n']
'''

print(re.findall('(Life|Death)', text))
'''
Finds all occurrences of either the word 'Life' or the
word 'Death'.
['Life', 'Death']
'''
In these examples, you’ve already seen the special symbol ‘\n’ which denotes the new-line character in Python (and most other languages). There are many special characters, specifically designed for regular expressions. Next, we’ll discover the most important special symbols.
What’s the Difference Between Python Re ? and * Quantifiers?
You can read the Python Re A? quantifier as zero-or-one regex: the preceding regex A is matched either zero times or exactly once. But it’s not matched more often.
Analogously, you can read the Python Re A* operator as the zero-or-multiple-times regex (I know it sounds a bit clunky): the preceding regex A is matched an arbitrary number of times.
The regex ‘ab?’ matches the character ‘a’ in the string, followed by character ‘b’ if it exists (which it does in the code).
The regex ‘ab*’ matches the character ‘a’ in the string, followed by as many characters ‘b’ as possible.
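To see the difference in action, here is a minimal sketch. The sample string 'abbbbbbb' is an assumption chosen for illustration; it is not necessarily the string used in the original code.

```python
import re

text = 'abbbbbbb'  # hypothetical sample string

# 'ab?': the character 'a' followed by at most one 'b'
optional = re.findall('ab?', text)

# 'ab*': the character 'a' followed by as many 'b' characters as possible
repeated = re.findall('ab*', text)

print(optional)  # ['ab']
print(repeated)  # ['abbbbbbb']
```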
What’s the Difference Between Python Re ? and + Quantifiers?
You can read the Python Re A? quantifier as zero-or-one regex: the preceding regex A is matched either zero times or exactly once. But it’s not matched more often.
Analogously, you can read the Python Re A+ operator as the at-least-once regex: the preceding regex A is matched an arbitrary number of times but at least once.
The regex ‘ab?’ matches the character ‘a’ in the string, followed by character ‘b’ if it exists—but it doesn’t in the code.
The regex ‘ab+’ matches the character ‘a’ in the string, followed by as many characters ‘b’ as possible—but at least one. However, the character ‘b’ does not exist so there’s no match.
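The following sketch reproduces this behavior on an assumed sample string that contains no 'b' at all (the string is made up for illustration):

```python
import re

text = 'aaaaaaaa'  # hypothetical sample string without any 'b'

# 'ab?': the 'b' is optional, so every single 'a' is a match
optional = re.findall('ab?', text)

# 'ab+': at least one 'b' is required, so nothing matches
required = re.findall('ab+', text)

print(optional)  # ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a']
print(required)  # []
```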
What are Python Re *?, +?, ?? Quantifiers?
You’ve learned about the three quantifiers:
The quantifier A* matches an arbitrary number of patterns A.
The quantifier A+ matches at least one pattern A.
The quantifier A? matches zero-or-one pattern A.
Those three are all greedy: they match as many occurrences of the pattern as possible. Here’s an example that shows their greediness:
The code shows that all three quantifiers *, +, and ? match as many ‘a’ characters as possible.
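A minimal sketch of this greediness, assuming the sample string 'aaaa' and using re.match() so that only the first match at the start of the string is reported:

```python
import re

text = 'aaaa'  # hypothetical sample string

# Each greedy quantifier consumes as many 'a' characters as it is allowed to.
star = re.match('a*', text).group()      # zero or more: takes all four
plus = re.match('a+', text).group()      # one or more: takes all four
question = re.match('a?', text).group()  # zero or one: takes one

print(repr(star))      # 'aaaa'
print(repr(plus))      # 'aaaa'
print(repr(question))  # 'a'
```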
So, the logical question is: how to match as few as possible? We call this non-greedy matching. You can append the question mark after the respective quantifiers to tell the regex engine that you intend to match as few patterns as possible: *?, +?, and ??.
Here’s the same example but with the non-greedy quantifiers:
In this case, the code shows that all three quantifiers *?, +?, and ?? match as few ‘a’ characters as possible.
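The non-greedy behavior can be sketched in the same way. The sample string 'aaaa' is again an assumption, and re.match() is used so that only the first match at the start of the string is reported:

```python
import re

text = 'aaaa'  # hypothetical sample string

# Each non-greedy quantifier consumes as few 'a' characters as it is allowed to.
lazy_star = re.match('a*?', text).group()      # zero or more: takes none
lazy_plus = re.match('a+?', text).group()      # one or more: takes exactly one
lazy_question = re.match('a??', text).group()  # zero or one: takes none

print(repr(lazy_star))      # ''
print(repr(lazy_plus))      # 'a'
print(repr(lazy_question))  # ''
```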
Related Re Methods
There are seven important regular expression methods which you should master:
The re.findall(pattern, string) method returns a list of string matches. Read more in our blog tutorial.
The re.search(pattern, string) method returns a match object of the first match. Read more in our blog tutorial.
The re.match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Read more in our blog tutorial.
The re.fullmatch(pattern, string) method returns a match object if the regex matches the whole string. Read more in our blog tutorial.
The re.compile(pattern) method prepares the regular expression pattern—and returns a regex object which you can use multiple times in your code. Read more in our blog tutorial.
The re.split(pattern, string) method returns a list of strings by matching all occurrences of the pattern in the string and dividing the string along those. Read more in our blog tutorial.
The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl. Read more in our blog tutorial.
These seven methods are 80% of what you need to know to get started with Python’s regular expression functionality.
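As a quick orientation, the following sketch calls each of the seven methods once. The sample string and the patterns are made up for illustration.

```python
import re

text = 'python is fun'  # hypothetical sample string

found = re.findall('[a-z]+', text)           # list of all matching substrings
first = re.search('is', text)                # match object of the first match
at_start = re.match('python', text)          # matches only at the beginning
whole = re.fullmatch('python is fun', text)  # must match the whole string
words = re.compile('[a-z]+')                 # reusable regex object
parts = re.split(' ', text)                  # split along all matches
replaced = re.sub('fun', 'great', text)      # replace all matches

print(found)            # ['python', 'is', 'fun']
print(first.span())     # (7, 9)
print(at_start.group()) # python
print(whole is not None)# True
print(words.findall(text) == found)  # True
print(parts)            # ['python', 'is', 'fun']
print(replaced)         # python is great
```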
Where to Go From Here?
You’ve learned everything you need to know about the question mark quantifier ? in this regex tutorial.
Summary: When applied to regular expression A, Python’s A? quantifier matches either zero or one occurrences of A. The ? quantifier always applies only to the preceding regular expression. For example, the regular expression ‘hey?’ matches both strings ‘he’ and ‘hey’. But it does not match the empty string because the ? quantifier does not apply to the whole regex ‘hey’ but only to the preceding regex ‘y’.
Do you want to replace all occurrences of a pattern in a string? You’re in the right place! This article is all about the re.sub(pattern, repl, string) method of Python’s re library.
Let’s answer the following question:
How Does re.sub() Work in Python?
The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl.
Here’s a minimal example:
>>> import re
>>> text = 'C++ is the best language. C++ rocks!'
>>> re.sub(r'C\+\+', 'Python', text)
'Python is the best language. Python rocks!'
The text contains two occurrences of the string ‘C++’. You use the re.sub() method to search all of those occurrences. Your goal is to replace all those with the new string ‘Python’ (Python is the best language after all).
Note that you must escape the ‘+’ symbol in ‘C++’ as otherwise it would mean the at-least-one regex.
You can also see that the sub() method replaces all matched patterns in the string—not only the first one.
But there’s more! Let’s have a look at the formal definition of the sub() method.
Specification
re.sub(pattern, repl, string, count=0, flags=0)
The method has five arguments, two of which are optional.
pattern: the regular expression pattern to search for strings you want to replace.
repl: the replacement string or function. If it’s a function, it needs to take one argument (the match object) which is passed for each occurrence of the pattern. The return value of the replacement function is a string that replaces the matching substring.
string: the text you want to replace.
count (optional argument): the maximum number of replacements you want to perform. By default, count=0, which means: replace all occurrences of the pattern.
flags (optional argument): a more advanced modifier that allows you to customize the behavior of the method. Per default, you don’t use any flags. Want to know how to use those flags? Check out this detailed article on the Finxter blog.
The initial three arguments are required. The remaining two arguments are optional.
You’ll learn about those arguments in more detail later.
Return Value:
A new string in which the leftmost count matches of the pattern (or all matches if count=0) are replaced with the value defined in the repl argument.
Regex Sub Minimal Example
Let’s study some more examples—from simple to more complex.
The easiest use is with only three arguments: the pattern ‘sing’, the replacement string ‘program’, and the string you want to modify (text in our example).
>>> import re
>>> text = 'Learn to sing because singing is fun.'
>>> re.sub('sing', 'program', text)
'Learn to program because programing is fun.'
Just ignore the grammar mistake for now. You get the point: we don’t sing, we program.
But what if you want to actually fix this grammar mistake? After all, it’s programming, not programing. In this case, we need to substitute ‘sing’ with ‘program’ in some cases and ‘sing’ with ‘programm’ in other cases.
You see where this leads us: the repl argument must be a function! So let’s try this:
import re

def sub(matched):
    if matched.group(0) == 'singing':
        return 'programming'
    else:
        return 'program'

text = 'Learn to sing because singing is fun.'
print(re.sub('sing(ing)?', sub, text))
# Learn to program because programming is fun.
In this example, you first define a substitution function sub. The function takes the matched object as an input and returns a string. If it matches the longer form ‘singing’, it returns ‘programming’. Else it matches the shorter form ‘sing’, so it returns the shorter replacement string ‘program’ instead.
How to Use the count Argument of the Regex Sub Method?
What if you don’t want to substitute all occurrences of a pattern but only a limited number of them? Just use the count argument! Here’s an example:
>>> import re
>>> s = 'xxxxxxhelloxxxxxworld!xxxx'
>>> re.sub('x+', '', s, count=2)
'helloworld!xxxx'
>>> re.sub('x+', '', s, count=3)
'helloworld!'
In the first substitution operation, you replace only two occurrences of the pattern ‘x+’. In the second, you replace all three.
You can also pass count as a positional argument to save some characters:
>>> re.sub('x+', '', s, 3)
'helloworld!'
But as many coders don’t know about the count argument, you should probably use the keyword argument for readability. Note also that Python 3.13 deprecates passing count and flags as positional arguments.
How to Use the Optional Flag Argument?
As you’ve seen in the specification, the re.sub() method comes with an optional flags argument (its fifth parameter).
Flags allow you to control the regular expression engine. Because regular expressions are so powerful, flags are a useful way of switching certain features on and off (for example, whether to ignore capitalization when matching your regex).
Syntax | Meaning
--- | ---
re.ASCII | If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S will match Unicode characters. If you use this flag, those special symbols will match only ASCII characters, as the name suggests.
re.A | Same as re.ASCII
re.DEBUG | If you use this flag, Python will print some useful information to the shell that helps you debug your regex.
re.IGNORECASE | If you use this flag, the regex engine will perform case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z].
re.I | Same as re.IGNORECASE
re.LOCALE | Don’t use this flag, ever. It’s discouraged: the idea was to perform case-insensitive matching depending on your current locale. But it isn’t reliable.
re.L | Same as re.LOCALE
re.MULTILINE | This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’ that now also matches at the end of each line in a multi-line string.
re.M | Same as re.MULTILINE
re.DOTALL | Without using this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character.
re.S | Same as re.DOTALL
re.VERBOSE | To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: whitespace in the pattern and everything from the character ‘#’ to the end of the line are ignored in the regex.
re.X | Same as re.VERBOSE
Here’s how you’d use it in a minimal example:
>>> import re
>>> s = 'xxxiiixxXxxxiiixXXX'
>>> re.sub('x+', '', s)
'iiiXiiiXXX'
>>> re.sub('x+', '', s, flags=re.I)
'iiiiii'
In the second substitution operation, you ignore the capitalization by using the flag re.I which is short for re.IGNORECASE. That’s why it substitutes even the uppercase ‘X’ characters that now match the regex ‘x+’, too.
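Two other flags from the table, re.MULTILINE and re.VERBOSE, can be sketched in the same spirit. The sample strings and the date pattern below are made up for illustration.

```python
import re

text = 'first line\nsecond line'  # hypothetical two-line string

# Without re.MULTILINE, '^' matches only at the very start of the string.
single = re.sub('^', '> ', text)

# With re.MULTILINE, '^' also matches at the start of every line.
multi = re.sub('^', '> ', text, flags=re.MULTILINE)

# re.VERBOSE lets you spread a pattern over several lines and comment it.
date = re.compile(r'''
    \d{4}   # year
    -       # separator
    \d{2}   # month
''', re.VERBOSE)
masked = date.sub('DATE', 'released 2019-10')

print(repr(single))  # '> first line\nsecond line'
print(repr(multi))   # '> first line\n> second line'
print(masked)        # released DATE
```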
What’s the Difference Between Regex Sub and String Replace?
In a nutshell, re.sub() is more powerful than string.replace(). Why? Because you can replace all occurrences of a regex pattern rather than only all occurrences of a fixed string in another string.
So with re.sub() you can do everything you can do with string.replace(), and more!
Here’s an example:
>>> 'Python is python is PYTHON'.replace('python', 'fun')
'Python is fun is PYTHON'
>>> re.sub('(Python)|(python)|(PYTHON)', 'fun', 'Python is python is PYTHON')
'fun is fun is fun'
The string.replace() method only replaces the lowercase word ‘python’ while the re.sub() method replaces all occurrences of uppercase or lowercase variants.
Note, you can accomplish the same thing even easier with the flags argument.
>>> re.sub('python', 'fun', 'Python is python is PYTHON', flags=re.I)
'fun is fun is fun'
How to Remove Regex Pattern in Python?
Nothing simpler than that. Just use the empty string as a replacement string:
>>> re.sub('p', '', 'Python is python is PYTHON', flags=re.I)
'ython is ython is YTHON'
You replace all occurrences of the pattern 'p' with the empty string ''. In other words, you remove all occurrences of 'p'. As you use the flags=re.I argument, you ignore capitalization.
Related Re Methods
There are six important regular expression methods which you should master:
The re.findall(pattern, string) method returns a list of string matches. Read more in our blog tutorial.
The re.search(pattern, string) method returns a match object of the first match. Read more in our blog tutorial.
The re.match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Read more in our blog tutorial.
The re.fullmatch(pattern, string) method returns a match object if the regex matches the whole string. Read more in our blog tutorial.
The re.compile(pattern) method prepares the regular expression pattern—and returns a regex object which you can use multiple times in your code. Read more in our blog tutorial.
The re.split(pattern, string) method returns a list of strings by matching all occurrences of the pattern in the string and dividing the string along those. Read more in our blog tutorial.
These six methods are 80% of what you need to know to get started with Python’s regular expression functionality.
Where to Go From Here?
You’ve learned that the re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl.
Why have regular expressions survived seven decades of technological disruption? Because coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!
This article is all about the re.split(pattern, string) method of Python’s re library.
Let’s answer the following question:
How Does re.split() Work in Python?
The re.split(pattern, string, maxsplit=0, flags=0) method returns a list of strings by matching all occurrences of the pattern in the string and dividing the string along those.
The string contains four words that are separated by whitespace characters (in particular: the empty space ‘ ‘ and the tabular character ‘\t’). You use the regular expression ‘\s+’ to match all occurrences of a positive number of subsequent whitespaces. The matched substrings serve as delimiters. The result is the string divided along those delimiters.
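A minimal sketch of such a call follows. The sample string is an assumption made for illustration; it contains four words separated by spaces and tab characters.

```python
import re

text = 'learn   python\t\tand regex'  # hypothetical sample string

# r'\s+' matches one or more subsequent whitespace characters;
# each match serves as a delimiter.
words = re.split(r'\s+', text)

print(words)  # ['learn', 'python', 'and', 'regex']
```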
But that’s not all! Let’s have a look at the formal definition of the split method.
Specification
re.split(pattern, string, maxsplit=0, flags=0)
The method has four arguments—two of which are optional.
pattern: the regular expression pattern you want to use as a delimiter.
string: the text you want to break up into a list of strings.
maxsplit (optional argument): the maximum number of split operations; the returned list then has at most maxsplit + 1 elements. By default, the maxsplit argument is 0, which means that there is no limit and the string is split at every match.
flags (optional argument): a more advanced modifier that allows you to customize the behavior of the function. Per default the regex module does not consider any flags. Want to know how to use those flags? Check out this detailed article on the Finxter blog.
The first and second arguments are required. The third and fourth arguments are optional.
You’ll learn about those arguments in more detail later.
Return Value:
The regex split method returns a list of substrings obtained by using the regex as a delimiter.
Regex Split Minimal Example
Let’s study some more examples—from simple to more complex.
The easiest use is with only two arguments: the delimiter regex and the string to be split.
You use an arbitrary number of ‘f’ or ‘g’ characters as regular expression delimiters. How do you accomplish this? By combining the character class regex [A] and the one-or-more regex A+ into the following regex: [fg]+. The strings in between are added to the return list.
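Here is a minimal sketch of this idea, assuming a made-up sample string in which runs of 'f' and 'g' characters separate three words:

```python
import re

text = 'fgffffgfgPythonfgisfffawesomefgffg'  # hypothetical sample string

# '[fg]+' matches one or more characters that are either 'f' or 'g'.
parts = re.split('[fg]+', text)

# The string starts and ends with a delimiter, so the list
# starts and ends with an empty string.
print(parts)  # ['', 'Python', 'is', 'awesome', '']
```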
How to Use the maxsplit Argument?
What if you don’t want to split the whole string but only a limited number of times? Just use the maxsplit argument.
We use the simple delimiter regex ‘-‘ to divide the string into substrings. In the first method call, we set maxsplit=5 to obtain six list elements. In the second method call, we set maxsplit=3 to obtain four list elements. Can you see the pattern?
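The two calls can be sketched as follows. The sample string is an assumption made for illustration; with maxsplit=n, the list has at most n + 1 elements.

```python
import re

text = 'a-bunch-of-stuff-to-be-split'  # hypothetical sample string

# At most five splits: six list elements
five = re.split('-', text, maxsplit=5)

# At most three splits: four list elements
three = re.split('-', text, maxsplit=3)

print(five)   # ['a', 'bunch', 'of', 'stuff', 'to', 'be-split']
print(three)  # ['a', 'bunch', 'of', 'stuff-to-be-split']
```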
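You can also pass maxsplit as a positional argument to save some characters, although the keyword form is more readable and Python 3.13 deprecates passing maxsplit and flags positionally.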
Flags allow you to control the regular expression engine. Because regular expressions are so powerful, flags are a useful way of switching certain features on and off (for example, whether to ignore capitalization when matching your regex).
Syntax | Meaning
--- | ---
re.ASCII | If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S will match Unicode characters. If you use this flag, those special symbols will match only ASCII characters, as the name suggests.
re.A | Same as re.ASCII
re.DEBUG | If you use this flag, Python will print some useful information to the shell that helps you debug your regex.
re.IGNORECASE | If you use this flag, the regex engine will perform case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z].
re.I | Same as re.IGNORECASE
re.LOCALE | Don’t use this flag, ever. It’s discouraged: the idea was to perform case-insensitive matching depending on your current locale. But it isn’t reliable.
re.L | Same as re.LOCALE
re.MULTILINE | This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’ that now also matches at the end of each line in a multi-line string.
re.M | Same as re.MULTILINE
re.DOTALL | Without using this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character.
re.S | Same as re.DOTALL
re.VERBOSE | To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: whitespace in the pattern and everything from the character ‘#’ to the end of the line are ignored in the regex.
Although your regex is lowercase, we ignore the capitalization by using the flag re.I which is short for re.IGNORECASE. If we didn’t do it, the result would be quite different:
As the character class [xy] only contains the lowercase characters ‘x’ and ‘y’, their uppercase variants appear in the returned list rather than being used as delimiters.
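Both variants can be sketched as follows. The sample string is an assumption made for illustration; only the character class [xy] and the flag re.I come from the text above.

```python
import re

text = 'xxxHelloYYyworldXxy'  # hypothetical sample string

# With re.I, uppercase 'X' and 'Y' also count as delimiters.
ignore_case = re.split('[xy]+', text, flags=re.I)

# Without the flag, only lowercase 'x' and 'y' are delimiters,
# so the uppercase variants stay in the result.
case_sensitive = re.split('[xy]+', text)

print(ignore_case)     # ['', 'Hello', 'world', '']
print(case_sensitive)  # ['', 'HelloYY', 'worldX', '']
```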
What’s the Difference Between re.split() and string.split() Methods in Python?
The method re.split() is much more powerful. The re.split(pattern, string) method can split a string along all occurrences of a matched pattern. The pattern can be arbitrarily complicated. This is in contrast to the string.split(delimiter) method which also splits a string into substrings along the delimiter. However, the delimiter must be a normal string.
An example where the more powerful re.split() method is superior is in splitting a text along any whitespace characters:
import re

text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely Frost
    Upon the sweetest flower of all the field.
'''

print(re.split(r'\s+', text))
'''
['', 'Ha!', 'let', 'me', 'see', 'her:', 'out,', 'alas!', "he's", 'cold:', 'Her', 'blood', 'is', 'settled,', 'and', 'her', 'joints', 'are', 'stiff;', 'Life', 'and', 'these', 'lips', 'have', 'long', 'been', 'separated:', 'Death', 'lies', 'on', 'her', 'like', 'an', 'untimely', 'Frost', 'Upon', 'the', 'sweetest', 'flower', 'of', 'all', 'the', 'field.', '']
'''
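The re.split() method divides the string along any positive number of whitespace characters. You couldn’t achieve such a result with string.split(delimiter) because an explicit delimiter must be a fixed string. (Calling string.split() without any argument does split along runs of whitespace, but that special case doesn’t extend to other patterns.)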
Related Re Methods
There are five important regular expression methods which you should master:
The re.findall(pattern, string) method returns a list of string matches. Read more in our blog tutorial.
The re.search(pattern, string) method returns a match object of the first match. Read more in our blog tutorial.
The re.match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Read more in our blog tutorial.
The re.fullmatch(pattern, string) method returns a match object if the regex matches the whole string. Read more in our blog tutorial.
The re.compile(pattern) method prepares the regular expression pattern—and returns a regex object which you can use multiple times in your code. Read more in our blog tutorial.
These five methods are 80% of what you need to know to get started with Python’s regular expression functionality.
Where to Go From Here?
You’ve learned about the re.split(pattern, string) method that divides the string along the matched pattern occurrences and returns a list of substrings.
Why have regular expressions survived seven decades of technological disruption? Because coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!
This article is all about the re.compile(pattern) method of Python’s re library. Before we dive into re.compile(), let’s get an overview of the four related methods you must understand:
The findall(pattern, string) method returns a list of string matches. Read more in our blog tutorial.
The search(pattern, string) method returns a match object of the first match. Read more in our blog tutorial.
The match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Read more in our blog tutorial.
The fullmatch(pattern, string) method returns a match object if the regex matches the whole string. Read more in our blog tutorial.
Equipped with this quick overview of the most critical regex methods, let’s answer the following question:
How Does re.compile() Work in Python?
The re.compile(pattern) method returns a regular expression object (see next section).
You then use the object to call important regex methods such as search(string), match(string), fullmatch(string), and findall(string).
In short: You compile the pattern first. You search the pattern in a string second.
This two-step approach is more efficient than calling, say, search(pattern, string) at once. That is, IF you call the search() method multiple times on the same pattern. Why? Because you can reuse the compiled pattern multiple times.
Here’s an example:
import re

# These two lines ...
regex = re.compile('Py...n')
match = regex.search('Python is great')

# ... are equivalent to ...
match = re.search('Py...n', 'Python is great')
In both instances, the match variable contains the following match object:
<re.Match object; span=(0, 6), match='Python'>
But in the first case, we can find the pattern not only in the string ‘Python is great’ but also in other strings—without any redundant work of compiling the pattern again and again.
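Reusing the compiled pattern on several strings could look like the following sketch. The list of strings is made up for illustration.

```python
import re

# Compile the pattern once ...
regex = re.compile('Py...n')

# ... and reuse it on several strings (hypothetical examples).
lines = ['Python is great', 'I like Pyston too', 'no snake here']
found = []
for line in lines:
    match = regex.search(line)
    found.append(match.group() if match else None)

print(found)  # ['Python', 'Pyston', None]
```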
Specification:
re.compile(pattern, flags=0)
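The method has up to two arguments.
pattern: the regular expression pattern that you want to match.
flags (optional argument): a more advanced modifier that allows you to customize the behavior of the method. By default, no flags are used.
We’ll explore those arguments in more detail later.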
Return Value:
The re.compile(pattern, flags) method returns a regular expression object. You may ask (and rightly so):
What’s a Regular Expression Object?
Python internally creates a regular expression object (from the Pattern class) to prepare the pattern matching process. You can call the following methods on the regex object:
Method | Description
--- | ---
Pattern.search(string[, pos[, endpos]]) | Searches the regex anywhere in the string and returns a match object or None. You can define start and end positions of the search.
Pattern.match(string[, pos[, endpos]]) | Searches the regex at the beginning of the string and returns a match object or None. You can define start and end positions of the search.
Pattern.fullmatch(string[, pos[, endpos]]) | Matches the regex with the whole string and returns a match object or None. You can define start and end positions of the search.
Pattern.split(string, maxsplit=0) | Divides the string into a list of substrings. The regex is the delimiter. You can define a maximum number of splits.
Pattern.findall(string[, pos[, endpos]]) | Searches the regex anywhere in the string and returns a list of matching substrings. You can define start and end positions of the search.
Pattern.finditer(string[, pos[, endpos]]) | Returns an iterator that goes over all matches of the regex in the string (returns one match object after another). You can define the start and end positions of the search.
Pattern.sub(repl, string, count=0) | Returns a new string by replacing the first count occurrences of the regex in the string (from left to right) with the replacement string repl.
Pattern.subn(repl, string, count=0) | Replaces the first count occurrences of the regex in the string (from left to right) with the replacement string repl, like Pattern.sub(). However, it returns a tuple with the new string as the first and the number of successful replacements as the second tuple value.
If you’re familiar with the most basic regex methods, you’ll realize that all of them appear in this table. But there’s one distinction: you don’t have to define the pattern as an argument. For example, the regex method re.search(pattern, string) will internally compile a regex object p and then call p.search(string).
def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a Match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)
The re.search(pattern, string) method is a mere wrapper for compiling the pattern first and calling the p.search(string) function on the compiled regex object p.
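The methods of a compiled regex object listed in the table above can be tried out with a short sketch. The pattern and the sample string are made up for illustration; the optional pos and endpos arguments restrict the part of the string that is searched.

```python
import re

p = re.compile('[a-z]+')
text = 'abc 123 def'  # hypothetical sample string

first = p.search(text).group()               # 'abc'
later = p.search(text, 4).group()            # pos=4 skips 'abc', finds 'def'
at_pos = p.match(text, 8).group()            # match starting at index 8
prefix = p.fullmatch(text, 0, 3).group()     # only text[0:3] must match
all_words = p.findall(text)                  # ['abc', 'def']
spans = [m.span() for m in p.finditer(text)] # [(0, 3), (8, 11)]
pieces = p.split(text, maxsplit=1)           # ['', ' 123 def']
once = p.sub('X', text, count=1)             # 'X 123 def'
counted = p.subn('X', text)                  # ('X 123 X', 2)

print(first, later, at_pos, prefix)
print(all_words, spans, pieces)
print(once, counted)
```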
Is It Worth Using Python’s re.compile()?
No, in the vast majority of cases, it’s not worth the extra line.
Consider the following example:
import re

# These two lines ...
regex = re.compile('Py...n')
match = regex.search('Python is great')

# ... are equivalent to ...
match = re.search('Py...n', 'Python is great')
Don’t get me wrong. Compiling a pattern once and using it many times throughout your code (e.g., in a loop) comes with a big performance benefit. In some anecdotal cases, compiling the pattern first led to a 10x to 50x speedup compared to compiling it again and again.
But the reason it is often not worth the extra line is that Python’s re library ships with an internal cache. At the time of this writing, the cache has a limit of up to 512 compiled regex objects. So as long as your code uses no more than 512 different patterns, only the first call of re.search(pattern, string) with a given pattern has to compile it; every later call finds the compiled pattern in the cache.
# --------------------------------------------------------------------
# internals

_cache = {}  # ordered!

_MAXCACHE = 512
def _compile(pattern, flags):
    # internal: compile pattern
    if isinstance(flags, RegexFlag):
        flags = flags.value
    try:
        return _cache[type(pattern), pattern, flags]
    except KeyError:
        pass
    if isinstance(pattern, Pattern):
        if flags:
            raise ValueError(
                "cannot process flags argument with a compiled pattern")
        return pattern
    if not sre_compile.isstring(pattern):
        raise TypeError("first argument must be string or compiled pattern")
    p = sre_compile.compile(pattern, flags)
    if not (flags & DEBUG):
        if len(_cache) >= _MAXCACHE:
            # Drop the oldest item
            try:
                del _cache[next(iter(_cache))]
            except (StopIteration, RuntimeError, KeyError):
                pass
        _cache[type(pattern), pattern, flags] = p
    return p
Can you find the spots where the cache is initialized and used?
While in most cases, you don’t need to compile a pattern, in some cases, you should. These follow directly from the previous implementation:
You’ve got more than _MAXCACHE patterns in your code.
You’ve got more than _MAXCACHE different patterns between two uses of the same pattern. Only in this case will you see “cache misses”, where the cache has already evicted the seemingly stale pattern to make room for newer ones.
You reuse the pattern multiple times. If you don’t, it makes no sense to spend scarce memory on keeping the compiled pattern around.
(Even then, it may only be useful if the patterns are relatively complicated. Otherwise, you won’t see a lot of performance benefits in practice.)
To summarize, compiling the pattern first and storing the compiled pattern in a variable for later use is often nothing but “premature optimization”—one of the deadly sins of beginner and intermediate programmers.
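If you want to check whether precompiling pays off in your own code, you can measure it. The following is only a rough sketch with a made-up pattern and string: the absolute numbers depend on your machine and Python version, so no expected timings are shown.

```python
import re
import timeit

pattern = 'Py...n'
text = 'Python is great'
regex = re.compile(pattern)

# Module-level call: looks the pattern up in the internal cache on every call.
t_module = timeit.timeit(lambda: re.search(pattern, text), number=100_000)

# Precompiled object: calls the search method directly.
t_compiled = timeit.timeit(lambda: regex.search(text), number=100_000)

print(f're.search(): {t_module:.3f}s, compiled regex: {t_compiled:.3f}s')
```

Both variants return the same match. Any difference you measure here comes from the cache lookup and the extra function call, not from recompiling the pattern.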
What Does re.compile() Really Do?
It doesn’t seem like a lot, does it? My intuition was that the real work is in finding the pattern in the text, which happens after compilation. And, of course, matching the pattern is the hard part. But a sensible compilation helps a lot in preparing the pattern to be matched efficiently by the regex engine: work that would otherwise have to be done by the regex engine.
Regex’s compile() method does a lot of things such as:
Combine two subsequent characters in the regex if they together indicate a special symbol such as certain Greek symbols.
Prepare the regex to ignore uppercase and lowercase.
Check for certain (smaller) patterns in the regex.
Analyze matching groups in the regex enclosed in parentheses.
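One way to get a glimpse of this preparation work is the re.DEBUG flag, which prints information about the parsed pattern while it is compiled. This is a minimal sketch with a made-up pattern; the printed output is omitted here because its format differs between Python versions.

```python
import re

# Compiling with re.DEBUG prints the parsed structure of the pattern
# (groups, repetitions, literals) to the shell.
regex = re.compile('(ab)+c', re.DEBUG)

# The compiled object works like any other regex object.
result = regex.search('xxababc').group()
print(result)  # ababc
```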
Here’s the implementation of the compile() method. It looks more complicated than expected, no?
def _compile(code, pattern, flags):
    # internal: compile a (sub)pattern
    emit = code.append
    _len = len
    LITERAL_CODES = _LITERAL_CODES
    REPEATING_CODES = _REPEATING_CODES
    SUCCESS_CODES = _SUCCESS_CODES
    ASSERT_CODES = _ASSERT_CODES
    iscased = None
    tolower = None
    fixes = None
    if flags & SRE_FLAG_IGNORECASE and not flags & SRE_FLAG_LOCALE:
        if flags & SRE_FLAG_UNICODE:
            iscased = _sre.unicode_iscased
            tolower = _sre.unicode_tolower
            fixes = _ignorecase_fixes
        else:
            iscased = _sre.ascii_iscased
            tolower = _sre.ascii_tolower
    for op, av in pattern:
        if op in LITERAL_CODES:
            if not flags & SRE_FLAG_IGNORECASE:
                emit(op)
                emit(av)
            elif flags & SRE_FLAG_LOCALE:
                emit(OP_LOCALE_IGNORE[op])
                emit(av)
            elif not iscased(av):
                emit(op)
                emit(av)
            else:
                lo = tolower(av)
                if not fixes:  # ascii
                    emit(OP_IGNORE[op])
                    emit(lo)
                elif lo not in fixes:
                    emit(OP_UNICODE_IGNORE[op])
                    emit(lo)
                else:
                    emit(IN_UNI_IGNORE)
                    skip = _len(code); emit(0)
                    if op is NOT_LITERAL:
                        emit(NEGATE)
                    for k in (lo,) + fixes[lo]:
                        emit(LITERAL)
                        emit(k)
                    emit(FAILURE)
                    code[skip] = _len(code) - skip
        elif op is IN:
            charset, hascased = _optimize_charset(av, iscased, tolower, fixes)
            if flags & SRE_FLAG_IGNORECASE and flags & SRE_FLAG_LOCALE:
                emit(IN_LOC_IGNORE)
            elif not hascased:
                emit(IN)
            elif not fixes:  # ascii
                emit(IN_IGNORE)
            else:
                emit(IN_UNI_IGNORE)
            skip = _len(code); emit(0)
            _compile_charset(charset, flags, code)
            code[skip] = _len(code) - skip
        elif op is ANY:
            if flags & SRE_FLAG_DOTALL:
                emit(ANY_ALL)
            else:
                emit(ANY)
        elif op in REPEATING_CODES:
            if flags & SRE_FLAG_TEMPLATE:
                raise error("internal: unsupported template operator %r" % (op,))
            if _simple(av[2]):
                if op is MAX_REPEAT:
                    emit(REPEAT_ONE)
                else:
                    emit(MIN_REPEAT_ONE)
                skip = _len(code); emit(0)
                emit(av[0])
                emit(av[1])
                _compile(code, av[2], flags)
                emit(SUCCESS)
                code[skip] = _len(code) - skip
            else:
                emit(REPEAT)
                skip = _len(code); emit(0)
                emit(av[0])
                emit(av[1])
                _compile(code, av[2], flags)
                code[skip] = _len(code) - skip
                if op is MAX_REPEAT:
                    emit(MAX_UNTIL)
                else:
                    emit(MIN_UNTIL)
        elif op is SUBPATTERN:
            group, add_flags, del_flags, p = av
            if group:
                emit(MARK)
                emit((group-1)*2)
            # _compile_info(code, p, _combine_flags(flags, add_flags, del_flags))
            _compile(code, p, _combine_flags(flags, add_flags, del_flags))
            if group:
                emit(MARK)
                emit((group-1)*2+1)
        elif op in SUCCESS_CODES:
            emit(op)
        elif op in ASSERT_CODES:
            emit(op)
            skip = _len(code); emit(0)
            if av[0] >= 0:
                emit(0)  # look ahead
            else:
                lo, hi = av[1].getwidth()
                if lo != hi:
                    raise error("look-behind requires fixed-width pattern")
                emit(lo)  # look behind
            _compile(code, av[1], flags)
            emit(SUCCESS)
            code[skip] = _len(code) - skip
        elif op is CALL:
            emit(op)
            skip = _len(code); emit(0)
            _compile(code, av, flags)
            emit(SUCCESS)
            code[skip] = _len(code) - skip
        elif op is AT:
            emit(op)
            if flags & SRE_FLAG_MULTILINE:
                av = AT_MULTILINE.get(av, av)
            if flags & SRE_FLAG_LOCALE:
                av = AT_LOCALE.get(av, av)
            elif flags & SRE_FLAG_UNICODE:
                av = AT_UNICODE.get(av, av)
            emit(av)
        elif op is BRANCH:
            emit(op)
            tail = []
            tailappend = tail.append
            for av in av[1]:
                skip = _len(code); emit(0)
                # _compile_info(code, av, flags)
                _compile(code, av, flags)
                emit(JUMP)
                tailappend(_len(code)); emit(0)
                code[skip] = _len(code) - skip
            emit(FAILURE)  # end of branch
            for tail in tail:
                code[tail] = _len(code) - tail
        elif op is CATEGORY:
            emit(op)
            if flags & SRE_FLAG_LOCALE:
                av = CH_LOCALE[av]
            elif flags & SRE_FLAG_UNICODE:
                av = CH_UNICODE[av]
            emit(av)
        elif op is GROUPREF:
            if not flags & SRE_FLAG_IGNORECASE:
                emit(op)
            elif flags & SRE_FLAG_LOCALE:
                emit(GROUPREF_LOC_IGNORE)
            elif not fixes:  # ascii
                emit(GROUPREF_IGNORE)
            else:
                emit(GROUPREF_UNI_IGNORE)
            emit(av-1)
        elif op is GROUPREF_EXISTS:
            emit(op)
            emit(av[0]-1)
            skipyes = _len(code); emit(0)
            _compile(code, av[1], flags)
            if av[2]:
                emit(JUMP)
                skipno = _len(code); emit(0)
                code[skipyes] = _len(code) - skipyes + 1
                _compile(code, av[2], flags)
                code[skipno] = _len(code) - skipno
            else:
                code[skipyes] = _len(code) - skipyes + 1
        else:
            raise error("internal: unsupported operand type %r" % (op,))
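Don’t worry, you don’t need to understand the code. Just note that all this work has to be done before the regex engine can match anything. If you compile the pattern yourself, the work is done only once and you keep the resulting regex object. That’s a low-hanging fruit for performance optimizations, especially for long regular expression patterns that you use many times. (The re module also caches recently compiled patterns internally, so the gain is largest in tight loops or when you work with many different patterns.)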
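To make the compile-once idea concrete, here is a minimal sketch. The sample log lines and the names error_regex and first_words are made up for illustration and do not come from the original article:

```python
import re

# Hypothetical sample data, invented for this sketch.
log_lines = ['error: disk full', 'ok', 'error: timeout', 'warning: slow']

# Compile the pattern once, outside the loop ...
error_regex = re.compile(r'error: (\w+)')

# ... then reuse the resulting regex object for every line.
first_words = []
for line in log_lines:
    m = error_regex.match(line)
    if m:
        first_words.append(m.group(1))

print(first_words)
# ['disk', 'timeout']
```

The regex object offers the same methods as the re module (match(), search(), findall(), and so on), just without the pattern argument.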
How to Use the Optional Flag Argument?
As you’ve seen in the specification, the compile() method comes with an optional second ‘flags’ argument:
re.compile(pattern, flags=0)
Flags allow you to control the regular expression engine. Because regular expressions are so powerful, they are a useful way of switching on and off certain features (for example, whether to ignore capitalization when matching your regex).
| Syntax | Meaning |
| --- | --- |
| re.ASCII | If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S match Unicode characters. If you use this flag, those special symbols match only ASCII characters, as the name suggests. |
| re.A | Same as re.ASCII |
| re.DEBUG | If you use this flag, Python prints debugging information about the compiled regex to the shell. |
| re.IGNORECASE | If you use this flag, the regex engine performs case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z]. |
| re.I | Same as re.IGNORECASE |
| re.LOCALE | Avoid this flag. It makes \w, \W, \b, \B and case-insensitive matching depend on your current locale, it works only with bytes patterns, and the locale mechanism isn’t reliable. |
| re.L | Same as re.LOCALE |
| re.MULTILINE | This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’, which now also matches at the end of each line in a multi-line string. |
| re.M | Same as re.MULTILINE |
| re.DOTALL | Without this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character. |
| re.S | Same as re.DOTALL |
| re.VERBOSE | To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: whitespace in the pattern is ignored (unless it is escaped or inside a character class), and so is everything from an unescaped ‘#’ to the end of the line. |
| re.X | Same as re.VERBOSE |
Here’s how you’d use it in a practical example:
import re
text = 'Python is great (python really is)'
regex = re.compile('Py...n', flags=re.IGNORECASE)
matches = regex.findall(text)
print(matches)
# ['Python', 'python']
Although your regex ‘Py...n’ starts with an uppercase ‘P’, it also matches the lowercase ‘python’ because the flag re.IGNORECASE makes the matching case-insensitive.
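The other flags from the table are passed to re.compile() in the same way. As a hedged sketch, here is re.VERBOSE in action; the phone-number pattern and the sample string are assumptions chosen only for illustration:

```python
import re

# re.VERBOSE lets you spread the pattern over several lines and comment it.
phone = re.compile(r'''
    (\d{3})   # first group: three digits
    -         # a literal hyphen
    (\d{4})   # second group: four digits
''', flags=re.VERBOSE)

m = phone.search('call 555-1234 now')
print(m.groups())
# ('555', '1234')
```

To combine several flags, join them with the bitwise OR operator, for example flags=re.VERBOSE | re.IGNORECASE.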
Where to Go From Here?
You’ve learned about the re.compile(pattern) method that prepares the regular expression pattern—and returns a regex object which you can use multiple times in your code.
Why have regular expressions survived seven decades of technological disruption? Because coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!
This article is all about the re.fullmatch(pattern, string) method of Python’s re library. There are three similar methods to help you use regular expressions:
The findall(pattern, string) method returns a list of string matches. Check out our blog tutorial.
The search(pattern, string) method returns a match object of the first match. Check out our blog tutorial.
The match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Check out our blog tutorial.
So how does the re.fullmatch() method work? Let’s study the specification.
How Does re.fullmatch() Work in Python?
The re.fullmatch(pattern, string) method returns a match object if the pattern matches the whole string.
Specification:
re.fullmatch(pattern, string, flags=0)
The re.fullmatch() method has up to three arguments.
pattern: the regular expression pattern that you want to match.
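string: the string that you want to match against the pattern.
flags (optional): flags that modify how the regex engine matches, for example re.IGNORECASE. The default value 0 means that no flag is set.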
The re.fullmatch() method returns a match object. You may ask (and rightly so):
What’s a Match Object?
If a regular expression matches a part of your string, there’s a lot of useful information that comes with it: what’s the exact position of the match? Which regex groups were matched—and where?
The match object is a simple wrapper for this information. Some regex methods of the re package in Python—such as fullmatch()—automatically create a match object upon the first pattern match.
At this point, you don’t need to explore the match object in detail. Just know that we can access the start and end positions of the match in the string by calling the methods m.start() and m.end() on the match object m:
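A minimal sketch of such a call, kept consistent with the description that follows (the variable name m and the string 'hello' are taken from that description):

```python
import re

# Match the whole string 'hello' against the pattern 'h...o'.
m = re.fullmatch('h...o', 'hello')

print(m.start())
# 0
print(m.end())
# 5
```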
First, you create a match object m by using the re.fullmatch() method. The pattern ‘h...o’ matches in the string ‘hello’ at start position 0 and end position 5. But note that as the fullmatch() method always attempts to match the whole string, the m.start() method will always return zero.
Now, you know the purpose of the match object in Python. Let’s check out a few examples of re.fullmatch()!
A Guided Example for re.fullmatch()
First, you import the re module and create the text string to be searched for the regex patterns:
>>> import re
>>> text = '''
Call me Ishmael. Some years ago--never mind how long precisely
--having little or no money in my purse, and nothing particular
to interest me on shore, I thought I would sail about a little
and see the watery part of the world. '''
Let’s say you want to match the full text with this regular expression:
>>> re.fullmatch('Call(.|\n)*', text)
>>>
The first argument is the pattern to be found: 'Call(.|\n)*'. The second argument is the text to be analyzed. You stored the multi-line string in the variable text—so you take this as the second argument. The third argument flags of the fullmatch() method is optional and we skip it in the code.
There’s no output! This means that the re.fullmatch() method did not return a match object. Why? Because at the beginning of the string, there’s no match for the ‘Call’ part of the regex. The string starts with a newline character, so its first line is empty!
So how can we fix this? Simple, by matching a new line character ‘\n’ at the beginning of the string.
>>> re.fullmatch('\nCall(.|\n)*', text)
<re.Match object; span=(0, 229), match='\nCall me Ishmael. Some years ago--never mind how>
The regex (.|\n)* matches an arbitrary number of characters (new line characters or not) after the prefix ‘\nCall’. This matches the whole text, so the result is a match object. Note that the match spans all 229 characters of the string, but the printed match object shows only the first few characters of the matched text. This fact is often overlooked by beginner coders.
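To see that only the printed representation is shortened, you can ask the match object for the matched text directly. The following sketch rebuilds the text with explicit newline characters (an assumption about the exact whitespace of the original string) and also shows re.DOTALL as an alternative to the (.|\n)* construct:

```python
import re

# The sample text, written with explicit '\n' so the whitespace is unambiguous.
text = ('\nCall me Ishmael. Some years ago--never mind how long precisely\n'
        '--having little or no money in my purse, and nothing particular\n'
        'to interest me on shore, I thought I would sail about a little\n'
        'and see the watery part of the world. ')

m = re.fullmatch('\nCall(.|\n)*', text)

# group(0) returns the complete matched string, not the shortened repr.
print(m.group(0) == text)
# True
print(m.span() == (0, len(text)))
# True

# With re.DOTALL, the dot also matches '\n', so a plain .* is enough.
m2 = re.fullmatch('\nCall.*', text, flags=re.DOTALL)
print(m2.group(0) == text)
# True
```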
What’s the Difference Between re.fullmatch() and re.match()?
The methods re.fullmatch() and re.match(pattern, string) both return a match object. Both attempt to match at the beginning of the string. The only difference is that re.fullmatch() also requires the match to reach the end of the string: it wants to match the whole string!
You can see this difference in the following code:
>>> text = 'More with less'
>>> re.match('More', text)
<re.Match object; span=(0, 4), match='More'>
>>> re.fullmatch('More', text)
>>>
The re.match(‘More’, text) method matches the string ‘More’ at the beginning of the string ‘More with less’. But the re.fullmatch(‘More’, text) method does not match the whole text. Therefore, it returns the None object—nothing is printed to your shell!
What’s the Difference Between re.fullmatch() and re.findall()?
There are two differences between the re.fullmatch(pattern, string) and re.findall(pattern, string) methods:
re.fullmatch(pattern, string) returns a match object while re.findall(pattern, string) returns a list of matching strings.
re.fullmatch(pattern, string) can only match the whole string, while re.findall(pattern, string) can return multiple matches in the string.
Both can be seen in the following example:
>>> text = 'the 42th truth is 42'
>>> re.fullmatch('.*?42', text)
<re.Match object; span=(0, 20), match='the 42th truth is 42'>
>>> re.findall('.*?42', text)
['the 42', 'th truth is 42']
Note that the regex .*? matches an arbitrary number of characters but it attempts to consume as few characters as possible. This is called “non-greedy” match (the *? operator). The fullmatch() method only returns a match object that matches the whole string. The findall() method returns a list of all occurrences. As the match is non-greedy, it finds two such matches.
What’s the Difference Between re.fullmatch() and re.search()?
The methods re.fullmatch() and re.search(pattern, string) both return a match object. However, re.fullmatch() attempts to match the whole string while re.search() matches anywhere in the string.
You can see this difference in the following code:
>>> text = 'Finxter is fun!'
>>> re.search('Finxter', text)
<re.Match object; span=(0, 7), match='Finxter'>
>>> re.fullmatch('Finxter', text)
>>>
The re.search() method retrieves the match of the ‘Finxter’ substring as a match object. But the re.fullmatch() method returns None because the pattern ‘Finxter’ does not match the whole string ‘Finxter is fun!’.
How to Use the Optional Flag Argument?
As you’ve seen in the specification, the fullmatch() method comes with an optional third ‘flag’ argument:
re.fullmatch(pattern, string, flags=0)
What’s the purpose of the flags argument?
Flags allow you to control the regular expression engine. Because regular expressions are so powerful, they are a useful way of switching on and off certain features (for example, whether to ignore capitalization when matching your regex).
| Syntax | Meaning |
| --- | --- |
| re.ASCII | If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S match Unicode characters. If you use this flag, those special symbols match only ASCII characters, as the name suggests. |
| re.A | Same as re.ASCII |
| re.DEBUG | If you use this flag, Python prints debugging information about the compiled regex to the shell. |
| re.IGNORECASE | If you use this flag, the regex engine performs case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z]. |
| re.I | Same as re.IGNORECASE |
| re.LOCALE | Avoid this flag. It makes \w, \W, \b, \B and case-insensitive matching depend on your current locale, it works only with bytes patterns, and the locale mechanism isn’t reliable. |
| re.L | Same as re.LOCALE |
| re.MULTILINE | This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’, which now also matches at the end of each line in a multi-line string. |
| re.M | Same as re.MULTILINE |
| re.DOTALL | Without this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character. |
| re.S | Same as re.DOTALL |
| re.VERBOSE | To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: whitespace in the pattern is ignored (unless it is escaped or inside a character class), and so is everything from an unescaped ‘#’ to the end of the line. |
| re.X | Same as re.VERBOSE |
Here’s how you’d use it in a practical example:
>>> text = 'Python is great!'
>>> re.search('PYTHON', text, flags=re.IGNORECASE)
<re.Match object; span=(0, 6), match='Python'>
Although your regex ‘PYTHON’ is all-caps, we ignore the capitalization by using the flag re.IGNORECASE.
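The same flag works with re.fullmatch() itself. A small sketch, with a sample string made up for illustration:

```python
import re

text = 'Python is GREAT!'

# Without the flag, the lowercase pattern does not match the whole string.
print(re.fullmatch('python is great!', text))
# None

# With re.IGNORECASE, the whole string matches.
m = re.fullmatch('python is great!', text, flags=re.IGNORECASE)
print(m.span())
# (0, 16)
```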
Where to Go From Here?
This article has introduced the re.fullmatch(pattern, string) method that attempts to match the whole string—and returns a match object if it succeeds or None if it doesn’t.
Why have regular expressions survived seven decades of technological disruption? Because coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!
This article is all about the match() method of Python’s re library. There are two similar methods to help you use regular expressions:
The easy-to-use but less powerful findall() method returns a list of string matches. Check out our blog tutorial.
The search() method returns a match object of the first match. Check out our blog tutorial.
So how does the re.match() method work? Let’s study the specification.
How Does re.match() Work in Python?
The re.match(pattern, string) method matches the pattern at the beginning of the string and returns a match object.
Specification:
re.match(pattern, string, flags=0)
The re.match() method has up to three arguments.
pattern: the regular expression pattern that you want to match.
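string: the string which you want to search for the pattern.
flags (optional): flags that modify how the regex engine matches, for example re.IGNORECASE. The default value 0 means that no flag is set.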
The re.match() method returns a match object. You may ask (and rightly so):
What’s a Match Object?
If a regular expression matches a part of your string, there’s a lot of useful information that comes with it: what’s the exact position of the match? Which regex groups were matched—and where?
The match object is a simple wrapper for this information. Some regex methods of the re package in Python—such as match()—automatically create a match object upon the first pattern match.
At this point, you don’t need to explore the match object in detail. Just know that we can access the start and end positions of the match in the string by calling the methods m.start() and m.end() on the match object m:
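A minimal sketch of such a call, kept consistent with the description that follows (the variable name m and the string 'hello world' are taken from that description):

```python
import re

text = 'hello world'
m = re.match('h...o', text)

print(m.start())
# 0
print(m.end())
# 5

# Slice the original string with the match boundaries.
print(text[m.start():m.end()])
# hello
```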
First, you create a match object m by using the re.match() method. The pattern ‘h...o’ matches in the string ‘hello world’ at start position 0. You use the start and end position to access the substring that matches the pattern (using the popular Python technique of slicing). But note that as the match() method always attempts to match only at the beginning of the string, the m.start() method will always return zero.
Now, you know the purpose of the match object in Python. Let’s check out a few examples of re.match()!
A Guided Example for re.match()
First, you import the re module and create the text string to be searched for the regex patterns:
>>> import re
>>> text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''
Let’s say you want to match the string ‘lips’ in the text:
>>> re.match('lips', text)
>>>
The first argument is the pattern to be found: the string ‘lips’. The second argument is the text to be analyzed. You stored the multi-line string in the variable text—so you take this as the second argument. The third argument flags of the match() method is optional.
There’s no output! This means that the re.match() method did not return a match object. Why? Because at the beginning of the string, there’s no match for the regex pattern ‘lips’.
So how can we fix this? Simple, by matching all the characters that precede the string ‘lips’ in the text:
>>> re.match('(.|\n)*lips', text)
<re.Match object; span=(0, 122), match="\n    Ha! let me see her: out, alas! he's cold:\n>
The regex (.|\n)*lips matches all prefixes (an arbitrary number of characters including new lines) followed by the string ‘lips’. This results in a new match object that matches a huge substring from position 0 to position 122. Note that the match object doesn’t print the whole substring to the shell. If you access the matched substring, you’ll get the following result:
>>> m = re.match('(.|\n)*lips', text)
>>> text[m.start():m.end()]
"\n    Ha! let me see her: out, alas! he's cold:\n    Her blood is settled, and her joints are stiff;\n    Life and these lips"
Interestingly, you can also achieve the same thing by specifying the third flag argument as follows:
>>> m = re.match('.*lips', text, flags=re.DOTALL)
>>> text[m.start():m.end()]
"\n    Ha! let me see her: out, alas! he's cold:\n    Her blood is settled, and her joints are stiff;\n    Life and these lips"
The re.DOTALL flag ensures that the dot operator . matches all characters including the new line character.
What’s the Difference Between re.match() and re.findall()?
There are two differences between the re.match(pattern, string) and re.findall(pattern, string) methods:
re.match(pattern, string) returns a match object while re.findall(pattern, string) returns a list of matching strings.
re.match(pattern, string) returns only the first match in the string—and only at the beginning—while re.findall(pattern, string) returns all matches in the string.
Both can be seen in the following example:
>>> text = 'Python is superior to Python'
>>> re.match('Py...n', text)
<re.Match object; span=(0, 6), match='Python'>
>>> re.findall('Py...n', text)
['Python', 'Python']
The string ‘Python is superior to Python’ contains two occurrences of ‘Python’. The match() method only returns a match object of the first occurrence. The findall() method returns a list of all occurrences.
What’s the Difference Between re.match() and re.search()?
The methods re.search(pattern, string) and re.match(pattern, string) both return a match object of the first match. However, re.match() attempts to match at the beginning of the string while re.search() matches anywhere in the string.
You can see this difference in the following code:
>>> text = 'Slim Shady is my name'
>>> re.search('Shady', text)
<re.Match object; span=(5, 10), match='Shady'>
>>> re.match('Shady', text)
>>>
The re.search() method retrieves the match of the ‘Shady’ substring as a match object. But if you use the re.match() method, there is no match and the return value is None because the substring ‘Shady’ does not occur at the beginning of the string ‘Slim Shady is my name’.
How to Use the Optional Flag Argument?
As you’ve seen in the specification, the match() method comes with an optional third ‘flag’ argument:
re.match(pattern, string, flags=0)
What’s the purpose of the flags argument?
Flags allow you to control the regular expression engine. Because regular expressions are so powerful, they are a useful way of switching on and off certain features (for example, whether to ignore capitalization when matching your regex).
| Syntax | Meaning |
| --- | --- |
| re.ASCII | If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S match Unicode characters. If you use this flag, those special symbols match only ASCII characters, as the name suggests. |
| re.A | Same as re.ASCII |
| re.DEBUG | If you use this flag, Python prints debugging information about the compiled regex to the shell. |
| re.IGNORECASE | If you use this flag, the regex engine performs case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z]. |
| re.I | Same as re.IGNORECASE |
| re.LOCALE | Avoid this flag. It makes \w, \W, \b, \B and case-insensitive matching depend on your current locale, it works only with bytes patterns, and the locale mechanism isn’t reliable. |
| re.L | Same as re.LOCALE |
| re.MULTILINE | This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’, which now also matches at the end of each line in a multi-line string. |
| re.M | Same as re.MULTILINE |
| re.DOTALL | Without this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character. |
| re.S | Same as re.DOTALL |
| re.VERBOSE | To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: whitespace in the pattern is ignored (unless it is escaped or inside a character class), and so is everything from an unescaped ‘#’ to the end of the line. |
| re.X | Same as re.VERBOSE |
Here’s how you’d use it in a practical example:
>>> text = 'Python is great!'
>>> re.search('PYTHON', text, flags=re.IGNORECASE)
<re.Match object; span=(0, 6), match='Python'>
Although your regex ‘PYTHON’ is all-caps, we ignore the capitalization by using the flag re.IGNORECASE.
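One flag deserves a special note for re.match(): even with re.MULTILINE, re.match() only tries the very beginning of the string, not the beginning of each line. A minimal sketch with a made-up two-line string:

```python
import re

text = 'first line\nsecond line'

# re.match() is anchored at position 0, even in multi-line mode.
print(re.match('^second', text, flags=re.MULTILINE))
# None

# re.search() can use '^' to find the start of a later line.
m = re.search('^second', text, flags=re.MULTILINE)
print(m.span())
# (11, 17)
```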
Where to Go From Here?
This article has introduced the re.match(pattern, string) method that attempts to match the first occurrence of the regex pattern at the beginning of a given string—and returns a match object.
When I first learned about regular expressions, I didn’t appreciate their power. But there’s a reason regular expressions have survived seven decades of technological disruption: coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!
This article is all about the search() method of Python’s re library. To learn about the easy-to-use but less powerful findall() method that returns a list of string matches, check out our article about the similar findall() method.
So how does the re.search() method work? Let’s study the specification.
How Does re.search() Work in Python?
The re.search(pattern, string) method matches the first occurrence of the pattern in the string and returns a match object.
Specification:
re.search(pattern, string, flags=0)
The re.search() method has up to three arguments.
pattern: the regular expression pattern that you want to match.
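string: the string which you want to search for the pattern.
flags (optional): flags that modify how the regex engine matches, for example re.IGNORECASE. The default value 0 means that no flag is set.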
The re.search() method returns a match object. You may ask (and rightly so):
What’s a Match Object?
If a regular expression matches a part of your string, there’s a lot of useful information that comes with it: what’s the exact position of the match? Which regex groups were matched—and where?
The match object is a simple wrapper for this information. Some regex methods of the re package in Python—such as search()—automatically create a match object upon the first pattern match.
At this point, you don’t need to explore the match object in detail. Just know that we can access the start and end positions of the match in the string by calling the methods m.start() and m.end() on the match object m:
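A minimal sketch of such a call, kept consistent with the description that follows (the variable name m and the string 'hello world' are taken from that description; the second pattern 'w...d' is an added illustration):

```python
import re

text = 'hello world'
m = re.search('h...o', text)

print(m.start(), m.end())
# 0 5

# Slice the original string with the match boundaries.
print(text[m.start():m.end()])
# hello

# Unlike match(), search() can also find a match later in the string.
m2 = re.search('w...d', text)
print(m2.start(), m2.end())
# 6 11
```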
First, you create a match object m by using the re.search() method. The pattern ‘h...o’ matches in the string ‘hello world’ at start position 0. You use the start and end position to access the substring that matches the pattern (using the popular Python technique of slicing).
Now, you know the purpose of the match object in Python. Let’s check out a few examples of re.search()!
A Guided Example for re.search()
First, you import the re module and create the text string to be searched for the regex patterns:
>>> import re
>>> text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''
Let’s say you want to search the text for the string ‘her’:
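A sketch of that search, written as a small script. The text is rebuilt with explicit newline characters and four spaces of indentation per line, which is an assumption about the whitespace of the original string:

```python
import re

# The sample text with explicit '\n' and four-space indentation (assumed).
text = ("\n    Ha! let me see her: out, alas! he's cold:"
        "\n    Her blood is settled, and her joints are stiff;"
        "\n    Life and these lips have long been separated:"
        "\n    Death lies on her like an untimely frost"
        "\n    Upon the sweetest flower of all the field.\n")

# Find the first occurrence of 'her' anywhere in the text.
m = re.search('her', text)
print(m)
# <re.Match object; span=(20, 23), match='her'>
```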
The first argument is the pattern to be found. In our case, it’s the string ‘her’. The second argument is the text to be analyzed. You stored the multi-line string in the variable text—so you take this as the second argument. You don’t need to define the optional third argument flags of the search() method because you’re fine with the default behavior in this case.
Look at the output: it’s a match object! The match object gives the span of the match, that is, the start and end indices of the match. We can also directly access those boundaries by using the start() and end() methods of the match object:
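As a hedged illustration, the following sketch uses a shortened, single-line stand-in for the sample text (an assumption made to keep the example small), so the indices differ from those of the full multi-line string:

```python
import re

# A shortened stand-in for the sample text (hypothetical).
text = "Ha! let me see her: out, alas! he's cold:"

m = re.search('her', text)

print(m.start())
# 15
print(m.end())
# 18

# span() returns both boundaries as a tuple.
print(m.span())
# (15, 18)
```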
The problem is that the search() method only retrieves the first occurrence of the pattern in the string. If you want to find all matches in the string, you may want to use the findall() method of the re library.
What’s the Difference Between re.search() and re.findall()?
There are two differences between the re.search(pattern, string) and re.findall(pattern, string) methods:
re.search(pattern, string) returns a match object while re.findall(pattern, string) returns a list of matching strings.
re.search(pattern, string) returns only the first match in the string while re.findall(pattern, string) returns all matches in the string.
Both can be seen in the following example:
>>> text = 'Python is superior to Python'
>>> re.search('Py...n', text)
<re.Match object; span=(0, 6), match='Python'>
>>> re.findall('Py...n', text)
['Python', 'Python']
The string ‘Python is superior to Python’ contains two occurrences of ‘Python’. The search() method only returns a match object of the first occurrence. The findall() method returns a list of all occurrences.
What’s the Difference Between re.search() and re.match()?
The methods re.search(pattern, string) and re.match(pattern, string) both return a match object of the first match. However, re.match() attempts to match at the beginning of the string while re.search() matches anywhere in the string.
You can see this difference in the following code:
>>> text = 'Slim Shady is my name'
>>> re.search('Shady', text)
<re.Match object; span=(5, 10), match='Shady'>
>>> re.match('Shady', text)
>>>
The re.search() method retrieves the match of the ‘Shady’ substring as a match object. But if you use the re.match() method, there is no match and the return value is None because the substring ‘Shady’ does not occur at the beginning of the string ‘Slim Shady is my name’.
How to Use the Optional Flag Argument?
As you’ve seen in the specification, the search() method comes with an optional third ‘flag’ argument:
re.search(pattern, string, flags=0)
What’s the purpose of the flags argument?
Flags allow you to control the regular expression engine. Because regular expressions are so powerful, they are a useful way of switching on and off certain features (for example, whether to ignore capitalization when matching your regex).
| Syntax | Meaning |
| --- | --- |
| re.ASCII | If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S match Unicode characters. If you use this flag, those special symbols match only ASCII characters, as the name suggests. |
| re.A | Same as re.ASCII |
| re.DEBUG | If you use this flag, Python prints debugging information about the compiled regex to the shell. |
| re.IGNORECASE | If you use this flag, the regex engine performs case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z]. |
| re.I | Same as re.IGNORECASE |
| re.LOCALE | Avoid this flag. It makes \w, \W, \b, \B and case-insensitive matching depend on your current locale, it works only with bytes patterns, and the locale mechanism isn’t reliable. |
| re.L | Same as re.LOCALE |
| re.MULTILINE | This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’, which now also matches at the end of each line in a multi-line string. |
| re.M | Same as re.MULTILINE |
| re.DOTALL | Without this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character. |
| re.S | Same as re.DOTALL |
| re.VERBOSE | To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: whitespace in the pattern is ignored (unless it is escaped or inside a character class), and so is everything from an unescaped ‘#’ to the end of the line. |
| re.X | Same as re.VERBOSE |
Here’s how you’d use it in a practical example:
>>> text = 'Python is great!'
>>> re.search('PYTHON', text, flags=re.IGNORECASE)
<re.Match object; span=(0, 6), match='Python'>
Although your regex ‘PYTHON’ is all-caps, we ignore the capitalization by using the flag re.IGNORECASE.
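The re.ASCII flag from the table can be demonstrated in the same way. In this sketch, the sample string with two Arabic-Indic digits is an assumption chosen for illustration:

```python
import re

# Two Arabic-Indic digits, written as escapes to avoid encoding issues.
text = 'price: \u0663\u0664'

# By default, \d matches any Unicode decimal digit.
m = re.search(r'\d+', text)
print(m.span())
# (7, 9)

# With re.ASCII, \d matches only the characters 0-9, so nothing is found.
print(re.search(r'\d+', text, flags=re.ASCII))
# None
```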
Where to Go From Here?
This article has introduced the re.search(pattern, string) method that attempts to match the first occurrence of the regex pattern in a given string—and returns a match object.