[Tut] Python Character Set [Regex Tutorial]


This tutorial makes you a master of character sets in Python. (I know, I know, it feels awesome to see your deepest desires finally come true.)



While writing this article, I came across several terms for the same concept, such as “character class”, “character range”, and “character group”. The official Python regex docs speak of a “set of characters”, so I’ll use the term “character set” throughout this tutorial.

Python Regex – Character Set


So, what is a character set in regular expressions?

The character set is (surprise) a set of characters: if you use a character set in a regular expression pattern, you tell the regex engine to match exactly one character, which can be any character from the set. As you may know, a set is an unordered collection of unique elements. Accordingly, listing a character twice has no additional effect, and the order of the characters doesn’t matter (with a few minor exceptions, such as the position of the special characters ‘^’ and ‘-’).

Here’s an example of a character set as used in a regular expression:

>>> import re
>>> re.findall('[abcde]', 'hello world!')
['e', 'd']

You use the re.findall(pattern, string) function to match the pattern '[abcde]' in the string 'hello world!'. You can think of the characters a, b, c, d, and e as being in an OR relation: any one of them is a valid match.

The regex engine scans the string 'hello world!' from left to right and tries to match the (character set) pattern at each position. Two characters from the text, 'e' and 'd', are in the character set. They are valid matches, so re.findall() returns them in the order in which they appear.
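To see that neither the order nor a repeated character inside the brackets changes the result, here is a minimal sketch. The variable names are only illustrative:

```python
import re

text = 'hello world!'

# Three spellings of the same character set: reordered, and with a repeated 'a'.
patterns = ['[abcde]', '[edcba]', '[aabcde]']

results = [re.findall(p, text) for p in patterns]
for p, r in zip(patterns, results):
    print(p, r)

# Every spelling returns the same matches, in the order they occur in the text.
print(results[0] == results[1] == results[2])  # True
```

The matches follow the order of the text, not the order in which you list the characters in the set.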

You can simplify many character sets by using the range symbol ‘-’, which has a special meaning within square brackets: [a-z] reads “match any character from a to z”, while [0-9] reads “match any character from 0 to 9”.

Here’s the previous example, simplified:

>>> re.findall('[a-e]', 'hello world!')
['e', 'd']

You can even combine multiple character ranges in a single character set:

>>> re.findall('[a-eA-E0-4]', 'hello WORLD 42!')
['e', 'D', '4', '2']

Here, you match three ranges: lowercase letters from a to e, uppercase letters from A to E, and digits from 0 to 4. Note that the ranges are inclusive, so both the start and the end character belong to the range.
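If you want to convince yourself that ranges are inclusive, a short check like the following may help. It is just a sketch with made-up variable names:

```python
import re

digits = '0123456789'

# Both endpoints of the range, '0' and '4', are part of the set.
low_digits = re.findall('[0-4]', digits)
print(low_digits)  # ['0', '1', '2', '3', '4']

# A range is shorthand for listing every character in between.
same_as_listed = re.findall('[01234]', digits)
print(low_digits == same_as_listed)  # True
```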

Python Regex Negative Character Set


But what if you want to match all characters—except some? You can achieve this with a negative character set!

The negative character set works just like a character set, but with one difference: it matches all characters that are not in the character set.

Here’s an example where you match all sequences of characters that do not contain characters a, b, c, d, or e:

>>> import re
>>> re.findall('[^a-e]+', 'hello world')
['h', 'llo worl']

We use the “at-least-once quantifier +” in the example that matches at least one occurrence of the preceding regex (if you’re unsure about how it works, check out my detailed Finxter tutorial about the plus operator).

There are only two such sequences: the one-character sequence ‘h’ and the eight-character sequence ‘llo worl’. Note that even the space character matches the negative character set.

Summary: the negative character set matches all characters that are not enclosed in the brackets.
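Two details are easy to miss here, so here is a small sketch of both: what the negative set returns without the + quantifier, and the fact that the caret only negates when it comes first inside the brackets. The example strings are my own:

```python
import re

text = 'hello world'

# Without '+', each character outside a-e is returned as its own match.
single = re.findall('[^a-e]', text)
print(single)  # ['h', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l']

# The caret negates only as the first character in the brackets;
# anywhere else it is an ordinary character to match.
literal_caret = re.findall('[a-e^]', 'a^b')
print(literal_caret)  # ['a', '^', 'b']
```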

How to Fix “re.error: unterminated character set at position”?


Now that you know character sets, you can probably fix this error easily: it occurs if your pattern contains an opening bracket ‘[’ without a matching closing bracket ‘]’. (A closing bracket on its own does not trigger it; Python treats it as a literal character.) Maybe you want to match the character ‘[’ in your string?

But Python assumes that you’ve just opened a character set and forgot to close it.

Here’s an example:

>>> re.findall('[', 'hello [world]')
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    re.findall('[', 'hello [world]')
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\re.py", line 223, in findall
    return _compile(pattern, flags).findall(string)
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\re.py", line 286, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\sre_parse.py", line 930, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\sre_parse.py", line 426, in _parse_sub
    not nested and not items))
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\sre_parse.py", line 532, in _parse
    source.tell() - here)
re.error: unterminated character set at position 0

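The error happens because you used the bracket character ‘[’ as if it were a normal symbol.

So, how do you fix it? Escape the special bracket character with a single backslash, ‘\[’, and write the pattern as a raw string so that Python passes the backslash through to the regex engine unchanged:

>>> re.findall(r'\[', 'hello [world]')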
['[']

This removes the “special” meaning of the bracket symbol.
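Beyond the backslash, there are a couple of related options you might find handy. The sketch below shows escaped brackets inside a character set, re.escape() for text you don’t control, and catching re.error. The names brackets, escaped, and compiled_ok are just for illustration:

```python
import re

text = 'hello [world]'

# Inside a character set, escaped brackets are matched literally.
brackets = re.findall(r'[\[\]]', text)
print(brackets)  # ['[', ']']

# re.escape() adds the backslashes for you, which helps with dynamic input.
escaped = re.escape('[')
print(escaped)  # \[
print(re.findall(escaped, text))  # ['[']

# An unescaped '[' still raises re.error, which you can catch.
try:
    re.compile('[')
    compiled_ok = True
except re.error as err:
    compiled_ok = False
    print('invalid pattern:', err)
```

Catching re.error is mostly useful when the pattern comes from user input; for patterns you write yourself, escaping is the simpler fix.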

Related Re Methods


There are seven important regular expression methods which you must master:

  • The re.findall(pattern, string) method returns a list of string matches. Read more in our blog tutorial.
  • The re.search(pattern, string) method returns a match object of the first match. Read more in our blog tutorial.
  • The re.match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Read more in our blog tutorial.
  • The re.fullmatch(pattern, string) method returns a match object if the regex matches the whole string. Read more in our blog tutorial.
  • The re.compile(pattern) method prepares the regular expression pattern—and returns a regex object which you can use multiple times in your code. Read more in our blog tutorial.
  • The re.split(pattern, string) method returns a list of strings by matching all occurrences of the pattern in the string and dividing the string along those. Read more in our blog tutorial.
  • The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl. Read more in our blog tutorial.

These seven methods are 80% of what you need to know to get started with Python’s regular expression functionality. If you want to learn more, check out the most comprehensive Python regex tutorial in the world!
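As a quick tour, here is a sketch that runs all seven functions with a character set pattern. The example text and variable names are my own choice, not taken from the linked tutorials:

```python
import re

text = 'hello world'
vowels = '[aeiou]'

found = re.findall(vowels, text)        # ['e', 'o', 'o']
first = re.search(vowels, text)         # match object for the first vowel, 'e'
at_start = re.match(vowels, text)       # None, because 'h' is not in the set
whole = re.fullmatch('[a-z ]+', text)   # match object covering the whole string
compiled = re.compile(vowels)           # reusable pattern object
parts = re.split(vowels, text)          # ['h', 'll', ' w', 'rld']
replaced = re.sub(vowels, '*', text)    # 'h*ll* w*rld'

print(found, first.group(), at_start, whole.group())
print(compiled.findall(text), parts, replaced)
```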

Where to Go From Here?


You’ve learned everything you need to know about the Python Regex Character Set Operator.

Summary:

If you use a character set [XYZ] in a regular expression pattern, you tell the regex engine to match exactly one character from the set: X, Y, or Z.





https://www.sickgaming.net/blog/2020/02/...-tutorial/