
5 Expert-Approved Ways to Remove Unicode Characters from a Python Dict


The best way to remove Unicode characters from a Python dictionary is a recursive function that iterates over each key and value, checking their type.

✅ If a value is a dictionary, the function calls itself.
✅ If a value is a string, it’s encoded to ASCII, ignoring non-ASCII characters, and then decoded back to a string, effectively removing every character outside the ASCII range.

This ensures a thorough cleansing of the entire dictionary.

Here’s a minimal example you can copy and paste:

def remove_unicode(obj):
    if isinstance(obj, dict):
        return {remove_unicode(key): remove_unicode(value) for key, value in obj.items()}
    elif isinstance(obj, str):
        return obj.encode('ascii', 'ignore').decode('ascii')
    return obj

# Example usage
my_dict = {'key': 'valüe', 'këy2': {'kêy3': 'vàlue3'}}
cleaned_dict = remove_unicode(my_dict)
print(cleaned_dict)

In this example, remove_unicode is a recursive function that traverses the dictionary. If it encounters a dictionary, it recursively cleans each key-value pair. If it encounters a string, it encodes the string to ASCII, ignoring non-ASCII characters, and then decodes it back to a string. The example usage shows a nested dictionary with Unicode characters, which are removed in the cleaned_dict.
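One caveat worth checking before cleaning keys: two different keys can become identical once their non-ASCII characters are stripped, and the later entry then silently overwrites the earlier one. The sketch below is one possible way to detect this up front; `strip_non_ascii` and `find_key_collisions` are hypothetical helper names, not part of the article's code.

```python
from collections import defaultdict


def strip_non_ascii(text):
    """Drop every character outside the ASCII range."""
    return text.encode("ascii", "ignore").decode("ascii")


def find_key_collisions(d):
    """Map each cleaned key to the original keys that collapse into it."""
    groups = defaultdict(list)
    for key in d:
        if isinstance(key, str):
            groups[strip_non_ascii(key)].append(key)
    # Only cleaned keys shared by two or more originals are collisions
    return {clean: originals for clean, originals in groups.items() if len(originals) > 1}


# "caf\u00e9" is 'café' and "caf\u00e8" is 'cafè'; both reduce to 'caf'
sample = {"caf\u00e9": 1, "caf\u00e8": 2, "tea": 3}
collisions = find_key_collisions(sample)
print(collisions)
```

If the result is non-empty, you may want to rename or merge those keys yourself before cleaning the dictionary.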


Understanding Unicode and Dictionaries in Python

You may come across dictionaries containing Unicode values. These Unicode values can be a hurdle when using the data in specific formats or applications, such as JSON editors. To overcome these challenges, you can use various methods to remove the Unicode characters from your dictionaries.

One popular method to remove Unicode characters from a dictionary is to call encode() on the keys and values with the ASCII codec and the 'ignore' error handler, and then decode the result back to a string. Encoding to UTF-8 does not help here, because UTF-8 can represent every Unicode character and nothing is dropped. Note that the 'u' prefix seen in Python 2 output (as in u'text') is only part of how a unicode string is displayed, not part of the data; Python 3 strings are always Unicode and print without it. Similarly, you can use external libraries, like Unidecode, that provide functions to transliterate Unicode strings into the closest possible ASCII representation (source).

💡 Recap: Python dictionaries are a flexible data structure that allows you to store key-value pairs. They enable you to organize and access your data more efficiently. A dictionary can hold a variety of data types, including Unicode strings. Unicode is a widely-used character encoding standard that includes a huge range of characters from different scripts and languages.

When working with dictionaries in Python, you might encounter Unicode strings as keys or values. For example, a dictionary might have keys or values in various languages or contain special characters like emojis (🙈🙉🙊). This diversity is because Python supports Unicode characters to allow for broader text representation and internationalization.

To create a dictionary containing Unicode strings, you simply define key-value pairs with the appropriate Unicode characters. In some cases, you might also have nested dictionaries, where a dictionary’s value is another dictionary. Nested dictionaries can also contain Unicode strings as keys or values.

Consider the following example:

my_dictionary = {
    "name": "François",
    "languages": {
        "primary": "Français",
        "secondary": "English"
    },
    "hobbies": ["music", "فنون-القتال"]
}

In this example, the dictionary represents a person’s information, including their name, languages, and hobbies. Notice that both the name and primary language contain non-ASCII characters, and one of the items in the hobbies list is written in Arabic script.

When working with dictionary data that contains Unicode characters, you might need to remove or replace these characters for various purposes, such as preprocessing text for machine learning applications or ensuring compatibility with ASCII-only systems. Several methods can help you achieve this, such as using Python’s built-in encode() and decode() methods or leveraging third-party libraries like Unidecode.

Now that you have a better understanding of Unicode and dictionaries in Python, you can confidently work with dictionary data containing Unicode characters and apply appropriate techniques to remove or replace them when necessary.

Challenges with Unicode in Dictionaries

Your data may contain special characters from different languages. These characters can lead to display, sorting, and searching problems, especially when your goal is to process the data in a way that is language-agnostic.

One of the main challenges with Unicode characters in dictionaries is that they can cause compatibility issues when interacting with certain libraries, APIs, or external tools. For instance, JSON editors may struggle to handle Unicode properly, potentially resulting in malformed data. Additionally, some libraries may not be specifically designed to handle Unicode, and even certain text editors may not display these characters correctly.

💡 Note: Another issue arises when attempting to remove Unicode characters from a dictionary. You may initially assume that calling .encode() alone would be sufficient, but in Python 3 it returns a bytes object, which prints with a b prefix (for example b'vale'), so you need to call .decode() on the result to get a string back. The 'u' prefix, by contrast, only appears in Python 2 output, where it marks a unicode string; it is part of how the value is displayed, not part of the data.

To address these challenges, various methods can be employed to remove Unicode characters from dictionaries:

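  1. Method 1: You could serialize your dictionary to JSON text with the json library using ensure_ascii=True. The resulting text contains only ASCII, because every non-ASCII character is written as a \uXXXX escape. Note that loading that text back with json.loads() restores the original characters, so this approach makes the serialized form ASCII-safe rather than deleting anything.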
  2. Method 2: Alternatively, you can use a library like unidecode to convert Unicode to ASCII characters, which can be helpful in cases where you need to interact with systems or APIs that only accept ASCII text.
  3. Method 3: Another option is to use list or dict comprehensions to iterate over your data and apply the .encode() and .decode() methods, effectively stripping the unicode characters from your dictionary.

Below are minimal code snippets for each of the three approaches:

Method 1: Using JSON Library

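import json

my_dict = {'key': 'valüe'}
# Serialize with ensure_ascii=True: non-ASCII characters become \uXXXX escapes
json_text = json.dumps(my_dict, ensure_ascii=True)
print(json_text)
# Parsing the JSON text turns the escapes back into the original characters
restored_dict = json.loads(json_text)
print(restored_dict == my_dict)

In this example, json.dumps() produces ASCII-only JSON text in which ü appears as the escape \u00fc. json.loads() then decodes that escape, so the round trip returns the original dictionary, ü included. Use this approach when a consumer needs ASCII-only JSON text; to drop the characters from the data itself, use Method 2 or Method 3.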

Method 2: Using Unidecode Library

from unidecode import unidecode

my_dict = {'key': 'valüe'}
# Use unidecode to convert Unicode to ASCII
cleaned_dict = {k: unidecode(v) for k, v in my_dict.items()}
print(cleaned_dict)

Here, the unidecode library is used to convert each Unicode string value to ASCII, iterating over the dictionary with a dict comprehension.

Method 3: Using List or Dict Comprehensions

my_dict = {'key': 'valüe'}
# Use .encode() and .decode() to remove Unicode characters
cleaned_dict = {k.encode('ascii', 'ignore').decode(): v.encode('ascii', 'ignore').decode() for k, v in my_dict.items()}
print(cleaned_dict)

In this example, a dict comprehension is used to iterate over the dictionary. The .encode() and .decode() methods are applied to each key and value to strip Unicode characters.

💡 Recommended: Python Dictionary Comprehension: A Powerful One-Liner Tutorial

Fundamentals of Removing Unicode

When working with dictionaries in Python, you may sometimes encounter Unicode characters that need to be removed. In this section, you’ll learn the fundamentals of removing Unicode characters from dictionaries using various techniques.

Firstly, it’s important to understand that Unicode characters can be present in both keys and values of a dictionary. A common scenario that may require you to remove Unicode characters is when you need to convert your dictionary into a JSON object.

One of the simplest ways to remove Unicode characters is by using the str.encode() and str.decode() methods. You can loop through the dictionary, and for each key-value pair, apply these methods to remove any unwanted Unicode characters:

new_dict = {}
for key, value in old_dict.items():
    new_key = key.encode('ascii', 'ignore').decode('ascii')
    if isinstance(value, str):
        new_value = value.encode('ascii', 'ignore').decode('ascii')
    else:
        new_value = value
    new_dict[new_key] = new_value

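Another option is a character filter built on str.isalnum(). On its own, isalnum() returns True for letters and digits in any script, so characters such as 'ü' or 'ę' would pass the filter. Pairing it with str.isascii() (Python 3.7+) keeps only ASCII letters, digits, and whitespace. Note that this filter also removes punctuation:

def clean_unicode(string):
    # Keep ASCII letters and digits plus ASCII whitespace; drop everything else
    return "".join(c for c in string if c.isascii() and (c.isalnum() or c.isspace()))

new_dict = {}
for key, value in old_dict.items():
    new_key = clean_unicode(key)
    if isinstance(value, str):
        new_value = clean_unicode(value)
    else:
        new_value = value
    new_dict[new_key] = new_value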

As you can see, removing Unicode characters from a dictionary in Python can be achieved using these techniques.

Using json and ast for Unicode Removal

The json and ast modules can be combined to get rid of the u'' prefix that Python 2 shows when it prints unicode strings. The ast module provides ast.literal_eval(), which safely evaluates a string containing a Python literal (it is not a general-purpose text parser). In this section, you will follow a step-by-step guide to this technique and its limits.

First, you need to import the necessary libraries. In your Python script, add the following lines to import json and ast:

import json
import ast

The next step is to define your dictionary containing Unicode strings. Let’s use the following example dictionary:

my_dict = {u'Apple': [u'A', u'B'], u'orange': [u'C', u'D']}

Now, you can utilize the json.dumps() function and ast.literal_eval() for the Unicode removal process. The json.dumps() function converts the dictionary into a JSON-formatted string, in which keys and values are written as plain double-quoted strings without any 'u' prefix. After that, ast.literal_eval() parses that string as a Python literal and returns a dictionary. This works only when the JSON text is also valid Python literal syntax, so it fails for values that JSON writes as true, false, or null.

Here’s how to perform these steps:

json_string = json.dumps(my_dict)
cleaned_dict = ast.literal_eval(json_string)

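After executing these lines, cleaned_dict holds the same data. In Python 2 its strings are plain str objects, so the 'u' prefix no longer shows when you print it; in Python 3 the u'' prefix has no effect to begin with, and the result equals my_dict. It looks like this: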

{'Apple': ['A', 'B'], 'orange': ['C', 'D']}

By using the json and ast modules, you can remove the 'u' prefix from the printed form of a dictionary. Note that this changes how the strings are represented, not which characters they contain; to drop non-ASCII characters, use encode() and decode() as shown in the next section.

Replacing Unicode Characters with Empty String

When working with dictionaries in Python, you might come across cases where you need to remove Unicode characters. One efficient way to do this is by replacing Unicode characters with empty strings.

To achieve this, you can make use of the encode() and decode() string methods available in Python. First, you need to loop through your dictionary and access the strings. Here’s how you can do it:

cleaned_dict = {}
for key, value in your_dict.items():
    cleaned_key = key.encode("ascii", "ignore").decode()
    cleaned_value = value.encode("ascii", "ignore").decode()
    cleaned_dict[cleaned_key] = cleaned_value

In this code snippet, encode() converts the string to ASCII bytes, and the ‘ignore’ error handler drops every character that ASCII cannot represent. decode() then converts those bytes back to a string. The results go into a new dictionary, because adding keys to your_dict while looping over it can raise a RuntimeError and would leave the original keys in place.

💡 Note: This method assumes your dictionary contains only string keys and values. If your dictionary has nested values, such as lists or other dictionaries, you’ll need to adjust the code to handle those cases as well.

If you want to perform this operation on a single string instead, you can do this:

cleaned_string = original_string.encode("ascii", "ignore").decode()

Applying Encode and Decode Methods

When you need to remove Unicode characters from a dictionary, applying the encode() and decode() methods is a straightforward and effective approach. In Python, these built-in methods help you encode a string into a different character representation and decode byte strings back to Unicode strings.

To remove Unicode characters from a dictionary, you can iterate through its keys and values, applying the encode() and decode() methods. First, encode the Unicode string to ASCII, specifying the 'ignore' error handling mode. This mode omits any Unicode characters that do not have an ASCII representation. After encoding the string, decode it back to a regular string.

Here’s an example:

input_dict = {"𝕴𝖗𝖔𝖓𝖒𝖆𝖓": "𝖙𝖍𝖊 𝖍𝖊𝖗𝖔", "location": "𝕬𝖛𝖊𝖓𝖌𝖊𝖗𝖘 𝕿𝖔𝖜𝖊𝖗"}
output_dict = {}
for key, value in input_dict.items():
    encoded_key = key.encode("ascii", "ignore")
    decoded_key = encoded_key.decode()
    encoded_value = value.encode("ascii", "ignore")
    decoded_value = encoded_value.decode()
    output_dict[decoded_key] = decoded_value

In this example, the styled letters are mathematical Fraktur symbols rather than ASCII letters, so they are dropped entirely instead of being converted to plain letters. Only the spaces and the ASCII key "location" survive:

{'': ' ', 'location': ' '}

Keep in mind that the encode() and decode() methods may not always produce an accurate representation of the original Unicode characters, especially when dealing with complex scripts or diacritic marks.

If you need to handle a wide range of Unicode characters and preserve their meaning in the output string, consider using libraries like Unidecode. This library can transliterate any Unicode string into the closest possible representation in ASCII text, providing better results in some cases.
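If the goal is to get plain letters back from styled or accented input, and installing a third-party package is not an option, one standard-library alternative is to normalize the text with unicodedata before encoding. The sketch below assumes NFKD normalization is acceptable for your data; `to_ascii` is a hypothetical helper name. It is not a replacement for Unidecode: characters that have no decomposition (for example 'ø' or 'ß') are still dropped.

```python
import unicodedata


def to_ascii(text):
    """Decompose characters (NFKD), then drop whatever is still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


input_dict = {
    "𝕴𝖗𝖔𝖓𝖒𝖆𝖓": "𝖙𝖍𝖊 𝖍𝖊𝖗𝖔",    # mathematical Fraktur letters
    "name": "Fran\u00e7ois",  # 'François' with a precomposed ç
}
cleaned = {to_ascii(key): to_ascii(value) for key, value in input_dict.items()}
print(cleaned)
```

Here the Fraktur symbols decompose to their base letters and the ç splits into c plus a combining cedilla, which the ASCII step then discards.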

Utilizing JSON Dumps and Literal Eval

When dealing with dictionaries containing Unicode characters, you might want an efficient and user-friendly way to remove or bypass the characters. Two useful techniques for this purpose are using json.dumps from the json module and ast.literal_eval from the ast module.

To begin, import both the json and ast modules in your Python script:

import json
import ast

The json.dumps method is quite handy for converting dictionaries with Unicode values into strings. This method takes a dictionary and returns a JSON formatted string. For instance, if you have a dictionary containing Unicode characters, you can use json.dumps to obtain a string version of the dictionary:

original_dict = {"key": "value with unicode: \u201Cexample\u201D"}
json_string = json.dumps(original_dict, ensure_ascii=False)

The ensure_ascii=False parameter tells json.dumps to leave non-ASCII characters in the output as they are instead of escaping them as \uXXXX sequences, making the JSON string more human-readable.

Next, you can use ast.literal_eval to parse the JSON string back into a dictionary. Because this particular JSON text is also a valid Python literal, the call succeeds and returns a dictionary equal to the original, curly quotes included:

cleaned_dict = ast.literal_eval(json_string)

Keep in mind that ast.literal_eval is more secure than the traditional eval() function, as it only evaluates literals and doesn’t execute any arbitrary code.

By using both json.dumps and ast.literal_eval in tandem, you can convert a dictionary to readable text and back. Keep in mind that this round trip preserves the characters; it does not remove them. To strip non-ASCII characters, apply encode() and decode() to the strings, and prefer json.loads over ast.literal_eval for parsing JSON, since it also handles true, false, and null.
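A quick way to see where ast.literal_eval stops working on JSON text is to include a boolean and a null value. The sketch below uses a made-up `record` dictionary for illustration; it shows literal_eval rejecting the JSON spelling of those values while json.loads parses the same text.

```python
import ast
import json

record = {"name": "Fran\u00e7ois", "active": True, "nickname": None}
json_string = json.dumps(record)
print(json_string)

try:
    ast.literal_eval(json_string)
    literal_eval_ok = True
except ValueError as exc:
    # JSON spells these values true and null, which are not Python literals
    literal_eval_ok = False
    print("literal_eval failed:", exc)

# json.loads understands the JSON spellings and decodes \uXXXX escapes
round_tripped = json.loads(json_string)
print(round_tripped == record)
```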

Managing Unicode in Nested Dictionaries

Dealing with Unicode characters in nested dictionaries can sometimes be challenging. However, you can efficiently manage this by following a few simple steps.

First and foremost, you need to identify any Unicode content within your nested dictionary. If you’re working with large dictionaries, consider looping through each key-value pair and checking for the presence of Unicode.
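The identification step can be sketched as a small generator that walks the structure and reports where non-ASCII text occurs. `find_non_ascii` is a hypothetical helper name, and the sketch assumes Python 3.7 or later for str.isascii().

```python
def find_non_ascii(obj, path=()):
    """Yield (path, text) for every string key or value containing non-ASCII characters."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if isinstance(key, str) and not key.isascii():
                yield path + (key,), key
            yield from find_non_ascii(value, path + (key,))
    elif isinstance(obj, (list, tuple)):
        for index, item in enumerate(obj):
            yield from find_non_ascii(item, path + (index,))
    elif isinstance(obj, str) and not obj.isascii():
        yield path, obj


profile = {
    "name": "Fran\u00e7ois",
    "languages": {"primary": "Fran\u00e7ais", "secondary": "English"},
    "hobbies": ["music"],
}
hits = list(find_non_ascii(profile))
for path, text in hits:
    print(path, repr(text))
```

Each hit pairs the path of keys and list indexes with the offending string, which makes it easier to decide whether to strip, transliterate, or leave a given field alone.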

One approach to remove Unicode characters from nested dictionaries is to use the Unidecode library. This library transliterates any Unicode string into the closest possible ASCII representation. To use Unidecode, you’ll need to install it first:

pip install Unidecode

Now, you can begin working with the Unidecode library. Import the library and create a function to process each value in the dictionary. Here’s a sample function that handles nested dictionaries:

from unidecode import unidecode

def remove_unicode_from_dict(dictionary):
    new_dict = {}
    for key, value in dictionary.items():
        if isinstance(value, dict):
            new_value = remove_unicode_from_dict(value)
        elif isinstance(value, list):
            # Clean dicts and strings inside lists; leave other items unchanged
            new_value = [
                remove_unicode_from_dict(item) if isinstance(item, dict)
                else unidecode(item) if isinstance(item, str)
                else item
                for item in value
            ]
        elif isinstance(value, str):
            new_value = unidecode(value)
        else:
            new_value = value
        new_dict[key] = new_value
    return new_dict

This function recursively iterates through the dictionary, transliterating string values (including strings inside lists) to ASCII and maintaining the original structure. Keys are left unchanged. Use this function on your nested dictionary:

cleaned_dict = remove_unicode_from_dict(your_nested_dictionary)
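If you also need to clean keys, tuples, and lists nested inside lists, one possible generalization is to pass the cleaning function in as a parameter. The sketch below uses only the standard library; `clean_nested` and `strip_non_ascii` are hypothetical names, and the default cleaner drops non-ASCII characters rather than transliterating them.

```python
def strip_non_ascii(text):
    """Drop every character outside the ASCII range."""
    return text.encode("ascii", "ignore").decode("ascii")


def clean_nested(obj, clean=strip_non_ascii):
    """Apply `clean` to every string key and value in nested dicts, lists, and tuples."""
    if isinstance(obj, dict):
        return {clean_nested(k, clean): clean_nested(v, clean) for k, v in obj.items()}
    if isinstance(obj, list):
        return [clean_nested(item, clean) for item in obj]
    if isinstance(obj, tuple):
        return tuple(clean_nested(item, clean) for item in obj)
    if isinstance(obj, str):
        return clean(obj)
    return obj  # numbers, None, and other types pass through unchanged


# "n\u00e4me" is 'näme', "caf\u00e9" is 'café', "K\u00f6ln" is 'Köln'
nested = {"n\u00e4me": ["caf\u00e9", {"city": "K\u00f6ln"}], "count": 3}
cleaned = clean_nested(nested)
print(cleaned)
```

With Unidecode installed, calling clean_nested(nested, clean=unidecode) would transliterate instead of dropping characters.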

Handling Special Cases with Regular Expressions

When working with dictionaries in Python, you may come across special characters or Unicode characters that need to be removed or replaced. Using the re module in Python, you can leverage the power of regular expressions to effectively handle such cases.

Let’s say you have a dictionary with keys and values containing various Unicode characters. One efficient way to remove them is by combining the re.sub() function and ord() function. First, import the required re module:

import re

To remove special characters, you can use the re.sub() function, which takes a pattern, replacement, and a string as arguments, and returns a new string with the specified pattern replaced:

string_with_special_chars = "𝓣𝓱𝓲𝓼 𝓲𝓼 𝓪 𝓽𝓮𝓼𝓽 𝓼𝓽𝓻𝓲𝓷𝓰."
clean_string = re.sub(r"[^\x00-\x7F]+", "", string_with_special_chars)

ord() is a useful built-in function that returns the Unicode code point of a given character. You can create a custom function utilizing ord() to check if a character is alphanumeric:

def is_alphanumeric(char):
    code_point = ord(char)
    # ASCII digits 0-9, uppercase A-Z, lowercase a-z
    return 48 <= code_point <= 57 or 65 <= code_point <= 90 or 97 <= code_point <= 122

Now you can use this custom function in a character filter to clean up your dictionary (this variant filters character by character and does not need re.sub()):

def clean_dict_item(item):
    return "".join([char for char in item if is_alphanumeric(char) or char.isspace()])

original_dict = {"𝓽𝓮𝓼𝓽1": "𝓗𝓮𝓵𝓵𝓸 𝓦𝓸𝓻𝓵𝓭!", "𝓽𝓮𝓼𝓽2": "𝓘 𝓵𝓸𝓿𝓮 𝓟𝔂𝓽𝓱𝓸𝓷!"}
cleaned_dict = {clean_dict_item(key): clean_dict_item(value) for key, value in original_dict.items()}
print(cleaned_dict)
# {'1': ' ', '2': '  '}

Frequently Asked Questions

How can I eliminate non-ASCII characters from a Python dictionary?

To eliminate non-ASCII characters from a Python dictionary, you can use a dictionary comprehension with the str.encode() method, the ascii codec, and the 'ignore' error handler, followed by decode(). This drops non-ASCII characters entirely (the 'backslashreplace' error handler would replace them with escape codes instead). Here’s an example:

original_dict = {"key": "value with non-ASCII character: ę"}
cleaned_dict = {k: v.encode("ascii", "ignore").decode() for k, v in original_dict.items()}

What is the best way to remove hex characters from a string in Python?

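One way to remove non-printable characters written as hex escapes (such as \x00) from a string in Python is the re (regex) module. Inside a normal string literal, "\x00" is a single control character, so the pattern has to match the character itself rather than the text backslash-x-0-0. Here’s a short example that removes all ASCII control characters, including newlines and tabs:

import re
text = "Hello \x00World!"
# Remove ASCII control characters (0x00-0x1F) and DEL (0x7F)
clean_text = re.sub(r"[\x00-\x1f\x7f]", "", text)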

How to replace Unicode characters with ASCII in a Python dict?

To replace Unicode characters with their corresponding ASCII characters in a Python dictionary, you can use the unidecode library. Install it using pip install unidecode, and then use it like this:

from unidecode import unidecode
original_dict = {"key": "value with non-ASCII character: ę"}
ascii_dict = {k: unidecode(v) for k, v in original_dict.items()}

How can I filter out non-ascii characters in a dictionary?

To filter out non-ASCII characters in a Python dictionary, you can use a dictionary comprehension along with a generator expression that rebuilds each string from only its ASCII characters.

original_dict = {"key": "value with non-ASCII character: ę"}
filtered_dict = {k: "".join(char for char in v if ord(char) < 128) for k, v in original_dict.items()}

What method should I use to remove ‘u’ from a list in Python?

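In Python 3, the ‘u’ prefix is optional and has no effect: u"example1" and "example1" are the same str, so there is nothing to remove. In Python 2, where the prefix appears when you print a list of unicode strings, you can convert each element to a regular string using a list comprehension (this raises UnicodeEncodeError if an element contains non-ASCII characters):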

unicode_list = [u"example1", u"example2"]
string_list = [str(element) for element in unicode_list]

How do I handle and remove special characters from a dictionary?

Handling and removing special characters from a dictionary can be accomplished using the re module to replace unwanted characters with an empty string or a suitable replacement. Here’s an example:

import re
original_dict = {"key": "value with special character: #!"}
cleaned_dict = {k: re.sub(r"[^A-Za-z0-9\s]+", "", v) for k, v in original_dict.items()}

This will remove any character that is not an alphanumeric character or whitespace from the dictionary values.
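The explicit character class above also removes accented letters, because they fall outside A-Z and a-z. If you swap it for the \w shorthand, the behavior changes: in Python 3, \w is Unicode-aware for str patterns unless you pass re.ASCII. The short sketch below illustrates the difference on a made-up sample string.

```python
import re

text = "na\u00efve caf\u00e9 #1!"  # 'naïve café #1!'

# In Python 3, \w matches letters and digits in any script for str patterns,
# so only the punctuation is removed here
unicode_aware = re.sub(r"[^\w\s]+", "", text)

# re.ASCII restricts \w and \s to their ASCII meaning,
# so the accented letters are removed as well
ascii_only = re.sub(r"[^\w\s]+", "", text, flags=re.ASCII)

print(unicode_aware)
print(ascii_only)
```

Keep in mind that \w also matches the underscore, which the explicit class in the example above does not.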



The post 5 Expert-Approved Ways to Remove Unicode Characters from a Python Dict appeared first on Be on the Right Side of Change.


GPT-4 with Vision (GPT-4V) Is Out! 32 Fun Examples with Screenshots


💡 TLDR: GPT-4 with vision (GPT-4V) is now out for many ChatGPT Plus users in the US and some other regions! You can instruct GPT-4 to analyze image inputs. GPT-4V incorporates additional modalities such as image inputs into large language models (LLMs). Multimodal LLMs will expand the reach of AI from mainly language-based applications to a broad range of brand-new application categories that go beyond language user interfaces (UIs).

👆 GPT-4V could explain why a picture was funny by talking about different parts of the image and their connections. The meme in the picture has words on it, which GPT-4V read to help make its answer. However, it made an error. It wrongly said the fried chicken in the image was called “NVIDIA BURGER” instead of “GPU”.

Still impressive! 🤯 OpenAI’s GPT-4 with Vision (GPT-4V) represents a significant advancement in artificial intelligence, enabling the analysis of image inputs alongside text.

Let’s dive into some additional examples I and others encountered:

More Examples

Prompting GPT-4V with "How much money do I have?" and a photo of some foreign coins:

GPT-4V was even able to identify that these are Polish zloty coins, a task many people would struggle with:

It can also identify locations from photos and give you information about plants you photograph. In this way, it’s similar to Google Lens but much better and more interactive, with a higher level of image understanding.

It can do optical character recognition (OCR) almost flawlessly:

Now here’s why many teachers and professors will lose their sleep over GPT-4V: it can even solve math problems from photos (source):

GPT-4V can do object detection, a crucial field in AI and ML: one model to rule them all!

GPT-4V can even help you play poker ♠♥

A Twitter/X user gave it a screenshot of a day planner and asked it to code a digital UI of it. The Python code worked!

Speaking of coding, here’s a fun example by another creative developer, Matt Shumer:

"The first GPT-4V-powered frontend engineer agent. Just upload a picture of a design, and the agent autonomously codes it up, looks at a render for mistakes, improves the code accordingly, repeat. Utterly insane." (source)

I’ve even seen GPT-4V analyzing financial data like Bitcoin indicators:

source

I could go on forever. Here are 20 more ideas of how to use GPT-4V that I found extremely interesting, fun, and even visionary:

  1. Visual Assistance for the Blind: GPT-4V can describe the surroundings or read out text from images to assist visually impaired individuals.
  2. Educational Tutor: It can analyze diagrams and provide detailed explanations, helping students understand complex concepts.
  3. Medical Imaging: Assist doctors by providing preliminary observations from medical images (though not for making diagnoses).
  4. Recipe Suggestions: Users can show ingredients they have, and GPT-4V can suggest possible recipes.
  5. Fashion Advice: Offer fashion tips by analyzing pictures of outfits.
  6. Plant or Animal Identification: Identify and provide information about plants or animals in photos.
  7. Travel Assistance: Analyze photos of landmarks to provide historical and cultural information.
  8. Language Translation: Read and translate text in images from one language to another.
  9. Home Decor Planning: Provide suggestions for home decor based on pictures of users’ living spaces.
  10. Art Creation: Offer guidance and suggestions for creating art by analyzing images of ongoing artwork.
  11. Fitness Coaching: Analyze workout or yoga postures and offer corrections or enhancements.
  12. Event Planning: Assist in planning events by visualizing and organizing space, decorations, and layouts.
  13. Shopping Assistance: Help users in making purchasing decisions by analyzing product images and providing information.
  14. Gardening Advice: Provide gardening tips based on pictures of plants and their surroundings.
  15. DIY Project Guidance: Offer step-by-step guidance for DIY projects by analyzing images of the project at various stages.
  16. Safety Training: Analyze images of workplace environments to offer safety recommendations.
  17. Historical Analysis: Provide historical context and information for images of historical events or figures.
  18. Real Estate Assistance: Analyze images of properties to provide insights and information for buyers or sellers.
  19. Wildlife Research: Assist researchers by analyzing images of wildlife and their habitats.
  20. Meme Creation: Help users create memes by suggesting text or edits based on the image provided.

These are truly mind-boggling times. Most of those ideas are million-dollar startup ideas. Some ideas (like the real estate assistance app #18) could become billion-dollar businesses that are mostly built on GPT-4V’s functionality and are easy to implement for coders like you and me.

If you’re interested, feel free to read my other article on the Finxter blog:

📈 Recommended: Startup.ai – Eight Steps to Start an AI Subscription Biz

What About SaFeTY?

GPT-4V is a multimodal large language model that incorporates image inputs, expanding the impact of language-only systems by solving new tasks and providing novel experiences for users. It builds upon the work done for GPT-4, employing a similar training process and reinforcement learning from human feedback (RLHF) to produce outputs preferred by human trainers.

Why RLHF? Mainly to avoid jailbreaking 😢😅 like so:

You can see that the “refusal rate” went up significantly:

For everyday users who aren’t trying to harm anyone, the "Sorry I cannot do X" reply will unfortunately remain one of the more annoying parts of LLM tech.

However, the race is on! People have still reported jailbroken queries like this: 😂

I hope you had fun reading this compilation of GPT-4V ideas. Thanks for reading! ♥ If you’re not already subscribed, feel free to join our popular Finxter Academy with dozens of state-of-the-art LLM prompt engineering courses for next-level exponential coders. It’s an all-you-can-learn inexpensive way to remain on the right side of change.

For example, this is one of our recent courses:

Prompt Engineering with Llama 2

💡 The Llama 2 Prompt Engineering course helps you stay on the right side of change. Our course is meticulously designed to provide you with hands-on experience through genuine projects.

You’ll delve into practical applications such as book PDF querying, payroll auditing, and hotel review analytics. These aren’t just theoretical exercises; they’re real-world challenges that businesses face daily.

By studying these projects, you’ll gain a deeper comprehension of how to harness the power of Llama 2 using 🐍 Python, 🔗🦜 Langchain, 🌲 Pinecone, and a whole stack of highly ⚒🛠 practical tools of exponential coders in a post-ChatGPT world.

The post GPT-4 with Vision (GPT-4V) Is Out! 32 Fun Examples with Screenshots appeared first on Be on the Right Side of Change.


4 Best Ways to Remove Unicode Characters from JSON


To remove all non-ASCII characters from a JSON string in Python, load the JSON data into a dictionary using json.loads(). Traverse the dictionary and use the re.sub() method from the re module to substitute any run of non-ASCII characters (matched by the regular expression pattern r'[^\x00-\x7F]+') with an empty string. Convert the updated dictionary back to a JSON string with json.dumps().

import json
import re

# Original JSON string with emojis and other Unicode characters
json_str = '{"text": "I love 🍕 and 🍦 on a ☀ day! \u200b \u1234"}'

# Load JSON data
data = json.loads(json_str)

# Remove all non-ASCII characters from the value
data['text'] = re.sub(r'[^\x00-\x7F]+', '', data['text'])

# Convert back to JSON string
new_json_str = json.dumps(data)
print(new_json_str)
# {"text": "I love  and  on a  day!  "}

The text "I love 🍕 and 🍦 on a ☀ day! \u200b \u1234" contains various Unicode characters including emojis and other non-ASCII characters. The code will output {"text": "I love  and  on a  day!  "}, removing all the non-ASCII characters and leaving only the ASCII characters. The spaces that surrounded each removed character remain, which is why doubled spaces appear.
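The example above cleans a single known key. For JSON with nested objects and arrays, the traversal step could be sketched as a small recursive function; `strip_strings` and the sample JSON below are made up for illustration.

```python
import json
import re

NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_strings(node):
    """Return a copy of a parsed JSON value with non-ASCII characters removed from all strings."""
    if isinstance(node, dict):
        return {strip_strings(k): strip_strings(v) for k, v in node.items()}
    if isinstance(node, list):
        return [strip_strings(item) for item in node]
    if isinstance(node, str):
        return NON_ASCII.sub("", node)
    return node  # numbers, booleans, and null pass through unchanged


# The JSON text uses \uXXXX escapes; json.loads turns them into real characters
json_str = '{"user": {"name": "Zo\\u00eb"}, "tags": ["caf\\u00e9", "ok"], "visits": 3}'
cleaned_json = json.dumps(strip_strings(json.loads(json_str)))
print(cleaned_json)
```

Because the cleaning happens after json.loads(), it catches characters that were written as escapes in the JSON text as well as those written directly.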

This is only one method. Keep reading to learn about alternative ones and detailed explanations! 👇


Occasionally, you may encounter unwanted Unicode characters in your JSON files, leading to problems with parsing and displaying the data. Removing these characters ensures clean, well-formatted JSON data that can be easily processed and analyzed.

In this article, we will explore some of the best practices to achieve this, providing you with the tools and techniques needed to clean up your JSON data efficiently.

Understanding Unicode Characters

Unicode is a character encoding standard that includes characters from most of the world’s writing systems. It allows for consistent representation and handling of text across different languages and platforms. In this section, you’ll learn about Unicode characters and how they relate to JSON.

💡 JSON is natively designed to support Unicode, which means it can store and transmit information in various languages without any issues. When you store a string in JSON, it can include any valid Unicode character, making it easy to work with multilingual data. However, certain Unicode characters might cause problems in specific scenarios, such as when using older software or transmitting data over a limited bandwidth connection.

In JSON, certain characters must be escaped, like quotation marks, reverse solidus, and control characters (U+0000 through U+001F). These characters must be represented using escape sequences in order for the JSON to be properly parsed.

🔗 You can find more information about escaping characters in JSON through this Stack Overflow discussion.

There might be times when you need to remove or replace Unicode characters in your JSON data. One way to achieve this is by using encoding and decoding techniques. For example, you can encode a string to ASCII while ignoring non-ASCII characters, and then decode the resulting bytes back to a string.

🔗 This method can be found in this Stack Overflow example.

The Basics of JSON

💡 JSON (JavaScript Object Notation) is a lightweight, text-based data interchange format that is easy to read and write. It has become one of the most popular data formats for exchanging information on the web. When dealing with JSON data, you may encounter situations where you need to remove or modify Unicode characters.

JSON is built on two basic structures: objects and arrays.

  • An object is an unordered collection of key-value pairs, while
  • an array represents an ordered list of values.

A JSON file typically consists of a single object or array, containing different types of data such as strings, numbers, and other objects.

When working with JSON data, it is important to ensure that the text is properly formatted. This includes using appropriate escape characters for special characters, such as double quotes and backslashes, as well as handling any Unicode characters in the text. Keep in mind that JSON is a human-readable format, so a well-formatted JSON file should be easy to understand.

Since JSON data is text-based, you can easily manipulate it using standard text-processing techniques. For example, to remove unwanted Unicode characters from a JSON file, you can use a combination of encoding and decoding methods, like this:

json_data = json_data.encode("ascii", "ignore").decode("utf-8")

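This process removes every non-ASCII character from the JSON text and returns a new, cleaned-up version of it. Note that escape sequences such as \u00e9 consist only of ASCII characters, so they survive this step unchanged.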

How Unicode Characters Interact within JSON

In JSON, most Unicode characters can be freely placed within the string values. However, there are certain characters that must be escaped (i.e., replaced by a special sequence of characters) to be part of your JSON string. These characters include the quotation mark (U+0022), the reverse solidus (U+005C), and control characters ranging from U+0000 to U+001F.

When you encounter escaped Unicode characters in your JSON, they typically appear in a format like \uXXXX, where XXXX represents a 4-digit hexadecimal code. For example, the acute é character can be represented as \u00E9. JSON parsers can understand this format and interpret it as the intended Unicode character.
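To make the escape format concrete, here is a small sketch using Python's standard json module. The sample key and value are made up for illustration; the point is only that the parser turns the escape into the real character, and that serializing with default settings escapes it again.

```python
import json

# The JSON text contains the six characters \u00E9, not the letter é itself.
escaped_json = '{"name": "Caf\\u00E9"}'

parsed = json.loads(escaped_json)
print(parsed["name"])      # Café

# Serializing again with the default ensure_ascii=True re-escapes the character.
print(json.dumps(parsed))  # {"name": "Caf\u00e9"}
```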

Sometimes, you might need or want to remove these Unicode characters from your JSON data. This can be done in various ways, depending on the programming language you are using. In Python, for instance, you could leverage the encode and decode functions to remove unwanted Unicode characters:

cleaned_string = original_string.encode("ascii", "ignore").decode("utf-8")

In this code snippet, the encode function converts the original string to ASCII bytes. The ignore parameter specifies that any non-ASCII characters should be dropped rather than replaced with an equivalent. Finally, the decode function transforms the bytes back into a string.

Method 1: Encoding and Decoding JSONs

JSON text is Unicode. Older versions of the specification allowed UTF-8, UTF-16, and UTF-32 encodings, while the current one (RFC 8259) requires UTF-8 for JSON exchanged between systems. UTF-8 is the most commonly used encoding for JSON texts and is well supported across programming languages and platforms.

If you come across unwanted Unicode characters in your JSON data while parsing, you can use the built-in encoding and decoding functions provided by most languages. For example, in Python, the json.dumps() and json.loads() functions allow you to encode and decode JSON data respectively. To remove unwanted Unicode characters, you can use the encode() and decode() functions available in string objects:

import json

json_data = '{"quote_text": "This is an example of a JSON file with unicode characters like \\u201c and \\u201d."}'
decoded_data = json.loads(json_data)
cleaned_text = decoded_data['quote_text'].encode("ascii", "ignore").decode('utf-8')

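In this example, json.loads() first turns the \u201c and \u201d escapes into the actual curly-quote characters. The encode() method is then called with the "ascii" codec and the "ignore" error handler, which drops every character outside the ASCII range. The decode() method converts the resulting bytes object back to a string.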

When dealing with JSON APIs and web services, be aware that different programming languages and libraries may have specific methods for encoding and decoding JSON data. Always consult the documentation for the language or library you are working with to ensure proper handling of Unicode characters.

Method 2: Python Regex to Remove Unicode from JSON

A second approach is to use a regex pattern before loading the JSON data. By applying a regex pattern, you can remove specific Unicode characters. For example, in Python, you can implement this with the re module as follows:

import json
import re

def remove_unicode(input_string):
    # Delete every \uXXXX escape sequence from the raw JSON text
    return re.sub(r'\\u([0-9a-fA-F]{4})', '', input_string)

json_string = '{"text": "Welcome to the world of \\u2022 and \\u2019"}'
json_string = remove_unicode(json_string)
parsed_data = json.loads(json_string)

This code uses the remove_unicode function to strip any \uXXXX escape sequences from the raw text before loading the JSON string. Once you have clean JSON data, you can continue with further processing.

Method 3: Replace Non-ASCII Characters

Another approach to removing Unicode characters is to replace non-ASCII characters after decoding the JSON data. This method is useful when dealing with specific character sets. Here’s an example using Python:

import json

def remove_non_ascii(input_string):
    return ''.join(char for char in input_string if ord(char) < 128)

json_string = '{"text": "Welcome to the world of \\u2022 and \\u2019"}'
parsed_data = json.loads(json_string)

cleaned_data = {}
for key, value in parsed_data.items():
    cleaned_data[key] = remove_non_ascii(value)

print(cleaned_data)
# {'text': 'Welcome to the world of  and '}

In this example, the remove_non_ascii function iterates over each character in the input string and retains only the ASCII characters. By applying this to each value in the JSON data, you can efficiently remove any unwanted Unicode characters.
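The loop above only handles a flat object whose values are all strings. As a rough sketch of how the same filter could be extended to nested objects and arrays, the hypothetical clean() helper below walks the parsed data recursively. It also cleans keys, which is an assumption on my part; two keys that differ only in non-ASCII characters would collide after cleaning.

```python
import json

def remove_non_ascii(text):
    """Keep only characters whose code point is below 128."""
    return ''.join(ch for ch in text if ord(ch) < 128)

def clean(obj):
    """Recursively apply remove_non_ascii to every string in parsed JSON."""
    if isinstance(obj, dict):
        return {clean(k): clean(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [clean(item) for item in obj]
    if isinstance(obj, str):
        return remove_non_ascii(obj)
    return obj  # numbers, booleans, and None pass through unchanged

nested = json.loads('{"items": ["caf\\u00e9", {"note": "na\\u00efve"}], "count": 2}')
print(clean(nested))
# {'items': ['caf', {'note': 'nave'}], 'count': 2}
```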

When working with languages like JavaScript, you can utilize external libraries to remove Unicode characters from JSON data. For instance, in a Node.js environment, you can use the lodash library for cleaning Unicode characters:

const _ = require('lodash');
const json = {"text": "Welcome to the world of • and ’"};

const removeUnicode = (obj) => {
  return _.mapValues(obj, (value) => _.replace(value, /[\u2022\u2019]/g, ''));
};

const cleanedJson = removeUnicode(json);

In this example, the removeUnicode function leverages Lodash’s mapValues and replace functions to remove specific Unicode characters from the JSON object.

Handling Specific Unicode Characters in JSON

Dealing with Control Characters

Control characters are special non-printing characters in Unicode, such as carriage returns, linefeeds, and tabs. JSON requires that these characters be escaped in strings. When dealing with JSON data that contains control characters, it’s essential to escape them properly to avoid potential errors when parsing the data.

For instance, you can use the json.dumps() function in Python to output a JSON string with control characters escaped:

import json

data = {
    "text": "This is a string with a newline character\nin it."
}

json_string = json.dumps(data)
print(json_string)

This would output the following JSON string with the newline character escaped:

{"text": "This is a string with a newline character\nin it."}

When you parse this JSON string, the control character will be correctly interpreted, and you’ll be able to access the data as expected.
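A quick round trip can confirm this behavior. The sketch below uses only the standard json module, with a made-up sample string: the serialized text contains the two-character escapes rather than raw control characters, and parsing restores the original value.

```python
import json

data = {"text": "line one\nline two\ttabbed"}
json_string = json.dumps(data)

# The serialized text holds the two-character escapes \n and \t,
# so it contains no raw newline or tab characters.
has_raw_controls = "\n" in json_string or "\t" in json_string
print(has_raw_controls)   # False

restored = json.loads(json_string)
print(restored == data)   # True
```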

Addressing Non-ASCII Characters

JSON strings can also contain non-ASCII Unicode characters, such as those from other languages. These characters may sometimes cause problems when processing JSON data in applications that don’t handle Unicode well.

One option is to escape non-ASCII characters when encoding the JSON data. You can do this with the ensure_ascii parameter of the json.dumps() function, which is True by default and is set explicitly here for clarity:

import json

data = {
    "text": "こんにちは、世界!"  # Japanese for "Hello, World!"
}

json_string = json.dumps(data, ensure_ascii=True)
print(json_string)

This will output the JSON string with the non-ASCII characters escaped:

{"text": "\u3053\u3093\u306b\u3061\u306f\u3001\u4e16\u754c!"}

However, if you’d rather preserve the original non-ASCII characters in the JSON output, you can set ensure_ascii to False:

json_string = json.dumps(data, ensure_ascii=False)
print(json_string)

In this case, the output would be:

{"text": "こんにちは、世界!"}

Keep in mind that when working with non-ASCII characters in JSON, it’s essential to use tools and libraries that support Unicode. This ensures that the data is correctly processed and displayed in your application.

Examples: Implementing the Unicode Removal

Before starting with the examples, make sure you have your JSON object ready for manipulation. In this section, you’ll explore different methods to remove unwanted Unicode characters from JSON objects, focusing on JavaScript implementation.

First, let’s look at a simple example using JavaScript’s replace() function and a regular expression. The following code showcases how to remove Unicode characters from a JSON string:

const jsonString = '{"message": "𝕴 𝖆𝖒 𝕴𝖗𝖔𝖓𝖒𝖆𝖓! I have some unicode characters."}';
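const withoutUnicode = jsonString.replace(/[\u{0080}-\u{10FFFF}]/gu, "");
console.log(withoutUnicode);

In the code above, the regular expression [\u{0080}-\u{10FFFF}] matches every code point outside the ASCII range. The upper bound must be \u{10FFFF} rather than \u{FFFF}: with the u flag the pattern matches whole code points, and the stylized letters in this example lie above U+FFFF. By using the replace() function, you can replace those characters with an empty string ("").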

Next, for more complex scenarios involving nested JSON objects, consider using a recursive function to traverse and clean up Unicode characters from the JSON data:

function cleanUnicode(jsonData) {
  if (Array.isArray(jsonData)) {
    return jsonData.map(item => cleanUnicode(item));
  } else if (typeof jsonData === "object" && jsonData !== null) {
    const cleanedObject = {};
    for (const key in jsonData) {
      cleanedObject[key] = cleanUnicode(jsonData[key]);
    }
    return cleanedObject;
  } else if (typeof jsonData === "string") {
    // Match every code point outside the ASCII range, including those above U+FFFF
    return jsonData.replace(/[\u{0080}-\u{10FFFF}]/gu, "");
  } else {
    return jsonData;
  }
}

const jsonObject = {
  message: "𝕴 𝖆𝖒 𝕴𝖗𝖔𝖓𝖒𝖆𝖓! I have some unicode characters.",
  nested: { text: "𝕾𝖔𝖒𝖊 𝖚𝖓𝖎𝖈𝖔𝖉𝖊 𝖈𝖍𝖆𝖗𝖆𝖈𝖙𝖊𝖗𝖘 𝖍𝖊𝖗𝖊 𝖙𝖔𝖔!" }
};

const cleanedJson = cleanUnicode(jsonObject);
console.log(cleanedJson);

This cleanUnicode function processes arrays, objects, and strings, making it ideal for nested JSON data.

In conclusion, use the simple replace() method for single JSON strings, and consider a recursive approach for nested JSON data. Utilize these examples to confidently, cleanly, and effectively remove Unicode characters from your JSON data in JavaScript.

Common Errors and How to Resolve Them

When working with JSON data involving Unicode characters, you might encounter a few common errors that can easily be resolved. In this section, we will discuss these errors and provide solutions to overcome them.

One commonly observed issue is the presence of invalid Unicode characters in the JSON data. This can lead to decoding errors while parsing. To overcome this, you can employ a Python library called unidecode to remove accents and normalize the Unicode string into the closest possible representation in ASCII text. For example, using the unidecode library, you can transform a word like “François” into “Francois”:

from unidecode import unidecode
unidecode('François') # Output: 'Francois'
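If installing a third-party package is not an option, a rough approximation is possible with the standard library's unicodedata module. This is a different technique from what unidecode does: the hypothetical strip_accents() helper below only removes combining marks after NFKD decomposition, so characters without a decomposition (such as "ß" or "ø") are left untouched.

```python
import unicodedata

def strip_accents(text):
    """Decompose accented letters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("François"))  # Francois
```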

Another common error arises due to the presence of special characters in JSON data, which leads to parsing issues. Proper escaping of special characters is essential for building valid JSON strings. You can use the json.dumps() function in Python to automatically escape special characters in JSON strings. For instance:

import json
raw_data = {"text": "A string with special characters: \\, \", \'"}
json_string = json.dumps(raw_data)

Remember, it’s crucial to produce only 100% compliant JSON, as defined in the current JSON specification, RFC 8259, which replaces the older RFC 4627. Following these guidelines will help you avoid most of the common errors while handling Unicode characters in JSON.

Lastly, if a text file contains characters that were saved in the wrong encoding, you can re-save it with an explicit encoding in a text editor such as Notepad. For instance, saving the file in a Unicode format such as UTF-8 instead of the ANSI format helps preserve the integrity of the Unicode characters.

By addressing these common errors, you’ll be able to effectively handle and process JSON data containing Unicode characters.

Conclusion

In summary, removing Unicode characters from JSON can be achieved using various methods. One approach is to encode the JSON string to ASCII and then decode it back to UTF-8. This method allows you to eliminate all Unicode characters in one go. For example, you can use the .encode("ascii", "ignore").decode('utf-8') technique to accomplish this, as explained on Stack Overflow.

Another option is applying regular expressions to target specific unwanted Unicode characters, as discussed in this Stack Overflow post. Employing regular expressions enables you to fine-tune your removal of specific Unicode characters from JSON strings.

Frequently Asked Questions

How to eliminate UTF-8 characters in Python?

To eliminate UTF-8 characters in Python, you can use the encode() and decode() methods. First, encode the string using ascii encoding with the ignore option, and then decode it back to utf-8. For example:

text = "Hello 你好"
sanitized_text = text.encode("ascii", "ignore").decode("utf-8")

What are the methods to remove non-ASCII characters in Python?

There are several methods to remove non-ASCII characters in Python:

  1. Using the encode() and decode() methods as mentioned above.
  2. Using a regular expression to filter out non-ASCII characters: re.sub(r'[^\x00-\x7F]+', '', text)
  3. Using a list comprehension to create a new string with only ASCII characters: ''.join(c for c in text if ord(c) < 128)
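The three methods in this list can be compared side by side. The sample string below is made up for illustration; on it, all three produce the same result.

```python
import re

text = "Hello 你好, café!"

via_encode = text.encode("ascii", "ignore").decode("ascii")
via_regex = re.sub(r'[^\x00-\x7F]+', '', text)
via_filter = ''.join(c for c in text if ord(c) < 128)

print(via_encode)                              # Hello , caf!
print(via_encode == via_regex == via_filter)   # True
```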

How can Pandas be used to remove Unicode characters?

To remove Unicode characters in a Pandas dataframe, you can pass a function that combines the encode() and decode() methods to the apply() method of a column:

import pandas as pd

def sanitize(text):
    return text.encode("ascii", "ignore").decode("utf-8")

df = pd.DataFrame({"text": ["Hello 你好", "Pandas rocks!"]})
df["sanitized_text"] = df["text"].apply(sanitize)

What is the process to replace Unicode in JSON?

To replace Unicode characters in a JSON object, you can first convert the JSON object to a string using the json.dumps() method. Pass ensure_ascii=False so that the non-ASCII characters appear literally in the string; with the default setting they are written as \uXXXX escapes, which an ASCII-based filter would not touch. Then, replace the Unicode characters using one of the methods mentioned earlier. Finally, parse the sanitized string back to a JSON object using the json.loads() method:

import json
import re

json_data = {"text": "Hello 你好"}
json_str = json.dumps(json_data, ensure_ascii=False)
sanitized_str = re.sub(r'[^\x00-\x7F]+', '', json_str)
sanitized_json = json.loads(sanitized_str)

How to convert Unicode to JSON format in Python?

If you have a Python object containing Unicode strings and want to convert it to JSON format, use the json.dumps() method:

import json

data = {"text": "Hello 你好"}
json_data = json.dumps(data, ensure_ascii=False)

This will preserve the Unicode characters in the JSON output.

How can special characters be removed from a JSON file?

To remove special characters from a JSON file, first read the file and parse its content to a Python object using the json.load() function. Then, iterate through the object and sanitize the strings, removing special characters using one of the mentioned methods. Finally, write the sanitized object back to a JSON file using the json.dump() function:

import json
import re

with open("input.json", "r", encoding="utf-8") as f:
    json_data = json.load(f)

# sanitize your JSON object here; this placeholder passes the data through unchanged
sanitized_json_data = json_data

with open("output.json", "w", encoding="utf-8") as f:
    json.dump(sanitized_json_data, f)
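One possible way to fill in the "sanitize your JSON object here" step is sketched below. The sanitize() helper and the sample data are assumptions for illustration, and the files are created in a temporary directory so the example can run on its own.

```python
import json
import os
import tempfile

def sanitize(obj):
    """Recursively strip non-ASCII characters from every string value."""
    if isinstance(obj, dict):
        return {key: sanitize(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [sanitize(item) for item in obj]
    if isinstance(obj, str):
        return obj.encode("ascii", "ignore").decode("ascii")
    return obj

with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "input.json")    # hypothetical file names
    out_path = os.path.join(tmp, "output.json")

    # Create a sample input file so the sketch is self-contained
    with open(in_path, "w", encoding="utf-8") as f:
        json.dump({"text": "Hello 你好", "tags": ["naïve"]}, f, ensure_ascii=False)

    with open(in_path, "r", encoding="utf-8") as f:
        json_data = json.load(f)

    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(sanitize(json_data), f)

    with open(out_path, "r", encoding="utf-8") as f:
        result = json.load(f)

print(result)  # {'text': 'Hello ', 'tags': ['nave']}
```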

The post 4 Best Ways to Remove Unicode Characters from JSON appeared first on Be on the Right Side of Change.


Prompt Engineering with Llama 2 (Full Course)


💡 This Llama 2 Prompt Engineering course helps you stay on the right side of change. Our course is meticulously designed to provide you with hands-on experience through genuine projects.


🔗 Prompt Engineering with Llama 2: Four Practical Projects using Python, Langchain, and Pinecone

You’ll delve into practical applications such as book PDF querying, payroll auditing, and hotel review analytics.

These aren’t just theoretical exercises; they’re real-world challenges that businesses face daily.

By studying these projects, you’ll gain a deeper comprehension of how to harness the power of Llama 2 using 🐍 Python, 🔗🦜 Langchain, 🌲 Pinecone, and a whole stack of highly ⚒🛠 practical tools of exponential coders in a post-ChatGPT world.

Specifically, you’ll learn these topics (ToC):

This knowledge can be your foundation in creating solutions that have tangible value for real people. Equip yourself with the expertise to keep pace with technological change and be a proactive force in shaping it.

The post Prompt Engineering with Llama 2 (Full Course) appeared first on Be on the Right Side of Change.


Python BS4 – How to Scrape Absolute URL Instead of Relative Path


Summary: Use urllib.parse.urljoin() to join the base URL and the relative path into the complete/absolute URL. You can also concatenate the base URL and the relative path manually to derive the absolute URL, but take care of error cases such as an extra forward slash.

Quick Answer

When web scraping with BeautifulSoup in Python, you may encounter relative URLs (e.g., /page2.html) instead of absolute URLs (e.g., http://example.com/page2.html). To convert relative URLs to absolute URLs, you can use the urljoin() function from the urllib.parse module.

Below is an example of how to extract absolute URLs from the a tags on a webpage using BeautifulSoup and urljoin:

from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin

# URL of the webpage you want to scrape
url = 'http://example.com'

# Send an HTTP request to the URL
response = requests.get(url)
response.raise_for_status()  # Raise an error for bad responses

# Parse the webpage content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all the 'a' tags on the webpage
for a_tag in soup.find_all('a'):
    # Get the href attribute from the 'a' tag
    href = a_tag.get('href')
    # Use urljoin to convert the relative URL to an absolute URL
    absolute_url = urljoin(url, href)
    # Print the absolute URL
    print(absolute_url)

In this example:

  • url is the URL of the webpage you want to scrape.
  • response is the HTTP response obtained by sending an HTTP GET request to the URL.
  • soup is a BeautifulSoup object that contains the parsed HTML content of the webpage.
  • soup.find_all('a') finds all the a tags on the webpage.
  • a_tag.get('href') gets the href attribute from an a tag, which is the relative URL.
  • urljoin(url, href) converts the relative URL to an absolute URL by joining it with the base URL.
  • absolute_url is the absolute URL, which is printed to the console.

Now that you have a quick overview let’s dive into the specific problem more deeply and discuss various methods to solve this easily and effectively. 👇

Problem Formulation

Problem: How do you extract all the absolute URLs from an HTML page?

Example: Consider the following webpage which has numerous links:

Now, when you try to scrape the links as highlighted above, you find that only the relative links/paths are extracted instead of the entire absolute path. Let us have a look at the code given below, which demonstrates what happens when you try to extract the 'href' elements normally.

from bs4 import BeautifulSoup
import urllib.request
from urllib.parse import urljoin
import requests

web_url = 'https://sayonshubham.github.io/'
headers = {"User-Agent": "Mozilla/5.0 (CrKey armv7l 1.5.16041) AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/31.0.1650.0 Safari/537.36"}
# get() Request
response = requests.get(web_url, headers=headers)
# Store the webpage contents
webpage = response.content
# Check Status Code (Optional)
# print(response.status_code)
# Create a BeautifulSoup object out of the webpage content
soup = BeautifulSoup(webpage, "html.parser")
for i in soup.find_all('nav'):
    for url in i.find_all('a'):
        print(url['href'])

Output:

/
/about
/blog
/finxter
/

The above output is not what you desired. You wanted to extract the absolute paths as shown below:

https://sayonshubham.github.io/
https://sayonshubham.github.io/about
https://sayonshubham.github.io/blog
https://sayonshubham.github.io/finxter
https://sayonshubham.github.io/

Without further delay, let us go ahead and try to extract the absolute paths instead of the relative paths.

Method 1: Using urllib.parse.urljoin()

The easiest solution to our problem is to use the urllib.parse.urljoin() method.

According to the Python documentation: urllib.parse.urljoin() is used to construct a full/absolute URL by combining the “base URL” with another URL. The advantage of using the urljoin() is that it properly resolves the relative path, whether BASE_URL is the domain of the URL, or the absolute URL of the webpage.

from urllib.parse import urljoin

URL_1 = 'http://www.example.com'
URL_2 = 'http://www.example.com/something/index.html'

print(urljoin(URL_1, '/demo'))
print(urljoin(URL_2, '/demo'))

Output:

http://www.example.com/demo
http://www.example.com/demo
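The example above only uses a path with a leading slash. As a small additional sketch (the relative paths and the second domain are made up for illustration), urljoin() also resolves paths without a leading slash, parent-directory references, and links that are already absolute:

```python
from urllib.parse import urljoin

base = 'http://www.example.com/something/index.html'

print(urljoin(base, 'demo'))     # http://www.example.com/something/demo
print(urljoin(base, '../demo'))  # http://www.example.com/demo
print(urljoin(base, '/demo'))    # http://www.example.com/demo
# A link that is already absolute is returned unchanged
print(urljoin(base, 'https://other.example.org/x'))  # https://other.example.org/x
```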

Now that we have an idea about urljoin, let us have a look at the following code which successfully resolves our problem and helps us to extract the complete/absolute paths from the HTML page.

Solution:

from bs4 import BeautifulSoup
import urllib.request
from urllib.parse import urljoin
import requests

web_url = 'https://sayonshubham.github.io/'
headers = {"User-Agent": "Mozilla/5.0 (CrKey armv7l 1.5.16041) AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/31.0.1650.0 Safari/537.36"}
# get() Request
response = requests.get(web_url, headers=headers)
# Store the webpage contents
webpage = response.content
# Check Status Code (Optional)
# print(response.status_code)
# Create a BeautifulSoup object out of the webpage content
soup = BeautifulSoup(webpage, "html.parser")
for i in soup.find_all('nav'):
    for url in i.find_all('a'):
        print(urljoin(web_url, url.get('href')))

Output:

https://sayonshubham.github.io/
https://sayonshubham.github.io/about
https://sayonshubham.github.io/blog
https://sayonshubham.github.io/finxter
https://sayonshubham.github.io/

Method 2: Concatenate The Base URL And Relative URL Manually

Another workaround to our problem is to concatenate the base part of the URL and the relative URLs manually, just like two ordinary strings. The problem, in this case, is that manually adding the strings can easily produce malformed URLs. Try to spot the extra forward slash character / below:

URL_1 = 'http://www.example.com/'
print(URL_1+'/demo') # Output --> http://www.example.com//demo

Therefore to ensure proper concatenation, you have to modify your code accordingly such that any extra character that might lead to errors is removed. Let us have a look at the following code that helps us to concatenate the base and the relative paths without the presence of any extra forward slash.

Solution:

from bs4 import BeautifulSoup
import urllib.request
from urllib.parse import urljoin
import requests

web_url = 'https://sayonshubham.github.io/'
headers = {"User-Agent": "Mozilla/5.0 (CrKey armv7l 1.5.16041) AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/31.0.1650.0 Safari/537.36"}
# get() Request
response = requests.get(web_url, headers=headers)
# Store the webpage contents
webpage = response.content
# Check Status Code (Optional)
# print(response.status_code)
# Create a BeautifulSoup object out of the webpage content
soup = BeautifulSoup(webpage, "html.parser")
for i in soup.find_all('nav'):
    for url in i.find_all('a'):
        # extract the href string
        x = url['href']
        # remove the extra forward-slash if present
        # (startswith also copes with an empty href, unlike x[0])
        if x.startswith('/'):
            print(web_url + x[1:])
        else:
            print(web_url + x)

Output:

https://sayonshubham.github.io/
https://sayonshubham.github.io/about
https://sayonshubham.github.io/blog
https://sayonshubham.github.io/finxter
https://sayonshubham.github.io/

⚠ Caution: This is not the recommended way of extracting the absolute path from a given HTML page. If you have an automated script that needs to resolve URLs, but you don’t know at the time of writing which website it will visit, this method won’t serve your purpose, and your go-to method should be urljoin. Nevertheless, this method deserves to be mentioned because, in our case, it successfully serves the purpose and helps us to extract the absolute URLs.
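To illustrate why the caution applies, the sketch below compares the concatenation approach with urljoin() on a page that does not sit at the site root. The manual_join() helper, the page URL, and the sample links are hypothetical and only mirror the logic of Method 2.

```python
from urllib.parse import urljoin

def manual_join(base, href):
    """The concatenation approach: strip one leading slash and append."""
    return base + (href[1:] if href.startswith('/') else href)

page = 'https://example.com/blog/post.html'  # hypothetical page URL

for href in ['/about', 'next.html', 'https://other.example.org/']:
    print(manual_join(page, href), '|', urljoin(page, href))
# https://example.com/blog/post.htmlabout | https://example.com/about
# https://example.com/blog/post.htmlnext.html | https://example.com/blog/next.html
# https://example.com/blog/post.htmlhttps://other.example.org/ | https://other.example.org/
```

Manual concatenation only gives correct results when the base is the site root with a trailing slash and every link is root-relative, which happens to be true for the page scraped above.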

Conclusion

In this article, we learned how to extract the absolute links from a given HTML page using BeautifulSoup. If you want to master Python’s BeautifulSoup library and dive deeper with examples and video lessons, have a look at the following link and follow the articles one by one, where every aspect of BeautifulSoup is explained in great detail.


🔗 Recommended: Web Scraping With BeautifulSoup In Python

With that, we come to the end of this tutorial! Please stay tuned and subscribe for more interesting content in the future.

The post Python BS4 – How to Scrape Absolute URL Instead of Relative Path appeared first on Be on the Right Side of Change.


Python Int to String with Trailing Zeros


To add trailing zeros to a string up to a certain length in Python, convert the number to a string and use the ljust(width, '0') method. Call this method on the string, specifying the total desired width and the padding character '0'. This will append zeros to the right of the string until the specified width is achieved.

Challenge: Given an integer, how do you convert it to a string with trailing zeros so that the string has a fixed number of positions?

Example: For integer 42, you want to fill it up with trailing zeros to the following string with 5 characters: '42000'.

In all methods, we assume that the integer has fewer than 5 digits.

Method 1: string.ljust()

In Python, you can use the str.ljust() method to pad zeros (or any other character) to the right of a string. The ljust() method returns the string left-justified in a field of a given width, padded with a specified character (default is space).

Below is an example of how to use ljust() to add trailing zeros to a number:

# Integer value to be converted
i = 42

# Convert the integer to a string
s = str(i)

# Use ljust to add trailing zeros, specifying the total width and the padding character ('0')
s_padded = s.ljust(5, '0')

print(s_padded)
# Output: 42000

In this example:

  • str(i) converts the integer i to a string.
  • s.ljust(5, '0') pads the string s with zeros to the right to make the total width 5 characters.

This is the most Pythonic way to accomplish this challenge.
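As a brief sketch of how ljust() behaves at the edges, the hypothetical helper pad_trailing_zeros() below wraps the two steps. Note that ljust() never truncates: a string that is already at least as wide as the requested width is returned unchanged.

```python
def pad_trailing_zeros(number, width):
    """Hypothetical helper: right-pad the decimal string of number with zeros."""
    return str(number).ljust(width, '0')

print(pad_trailing_zeros(42, 5))       # 42000
print(pad_trailing_zeros(1234567, 5))  # 1234567 (already wider, returned unchanged)
print(pad_trailing_zeros(-7, 4))       # -700 (the minus sign counts toward the width)
```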

Method 2: Format String

The second method uses Python’s formatted string literals, called f-strings, which are available in Python 3.6 and later.

💡 Info: In Python, f-strings allow for the embedding of expressions within strings by prefixing a string with the letter "f" or "F" and enclosing expressions within curly braces {}. The expressions within the curly braces in the f-string are evaluated, and their values are inserted into the resulting string. This allows for a concise and readable way to include variable values or complex expressions within string literals.

The following f-string converts an integer i to a string while adding trailing zeros to a given integer:

# Integer value to be converted
i = 42

# Convert the integer to a string and then use format to add trailing zeros
s1 = f'{str(i):<5}'
s1 = s1.replace(" ", "0")  # replace spaces with zeros

print(s1)
# 42000

The code f'{str(i):<5}' first converts the integer i to a string. The :<5 format specifier aligns the string to the left and pads with spaces to make the total width 5. Then we replace the padded spaces with zeros using the str.replace() method.
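The format specification also accepts a fill character in front of the alignment symbol, so the padding can be done in one step without replace(). A minimal sketch of that variant, shown with three equivalent spellings:

```python
i = 42

# '0' is the fill character, '<' left-aligns, 5 is the total width
print(f'{i:0<5}')          # 42000
print(format(i, '0<5'))    # 42000
print('{:0<5}'.format(i))  # 42000
```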

Method 3: String Concatenation

If you prefer not to use the ljust() method or f-strings shown in Methods 1 and 2, you can also use a more basic approach based on string concatenation and string repetition.

# Method 3: String Concatenation
s3 = str(42)
s3 = s3 + '0' * (5 - len(s3))
print(s3)
# 42000

You first convert the integer to a string. Then, you concatenate a string of zeros to it so that the result is filled up to 5 characters. The asterisk operator creates a string of 5-len(s3) zeros here.

Programmer Humor

“Real programmers set the universal constants at the start such that the universe evolves to contain the disk with the data they want.” (xkcd)

🔗 Recommended: Python Int to String with Leading Zeros

The post Python Int to String with Trailing Zeros appeared first on Be on the Right Side of Change.


How to Install xlrd in Python?


The Python xlrd library reads data and formatting information from Excel files in the historical .xls format. Note that it won’t read anything other than .xls files.

pip install xlrd

The Python xlrd library is among the top 100 Python libraries, with more than 17,375,582 downloads. This article will show you everything you need to install this in your Python environment.

🔗 Library Link
🔗 Other Excel Python Libraries

Alternatively, you may use any of the following commands to install xlrd, depending on your concrete environment. One is likely to work!

💡 If you have only one version of Python installed:
pip install xlrd

💡 If you have Python 3 (and, possibly, other versions) installed:
pip3 install xlrd

💡 If you don't have PIP or it doesn't work:
python -m pip install xlrd
python3 -m pip install xlrd

💡 If you have Linux and you need to fix permissions (any one):
sudo pip3 install xlrd
pip3 install xlrd --user

💡 If you have Linux with apt:
sudo apt install python3-xlrd

💡 If you have Windows and you have set up the py alias:
py -m pip install xlrd

💡 If you have Anaconda:
conda install -c anaconda xlrd

💡 If you have Jupyter Notebook:
!pip install xlrd
!pip3 install xlrd

Let’s dive into the installation guides for the different operating systems and environments!

How to Install xlrd on Windows?

  1. Type "cmd" in the search bar and hit Enter to open the command line.
  2. Type “pip install xlrd” (without quotes) in the command line and hit Enter again. This installs xlrd for your default Python installation.
  3. The previous command may not work if you have both Python versions 2 and 3 on your computer. In this case, try "pip3 install xlrd" or “python -m pip install xlrd“.
  4. Wait for the installation to terminate successfully. It is now installed on your Windows machine.

Here’s how to open the command line on a (German) Windows machine:

Open CMD in Windows

First, try the following command to install xlrd on your system:

pip install xlrd

Second, if this leads to an error message, try this command to install xlrd on your system:

pip3 install xlrd

Third, if both do not work, use the following long-form command:

python -m pip install xlrd

The difference between pip and pip3 is that pip3 is the pip command tied to a Python 3 installation. Depending on what’s first in the PATH variable, pip may refer to your Python 2 or Python 3 installation, and you cannot know which without checking (for example, by running pip --version). To resolve this uncertainty, you can use pip3, which refers to a Python 3 installation, or python -m pip, which uses the pip of exactly the interpreter you invoke.

How to Install xlrd on Linux?

You can install xlrd on Linux in four steps:

  1. Open your Linux terminal or shell
  2. Type “pip install xlrd” (without quotes), hit Enter.
  3. If it doesn’t work, try "pip3 install xlrd" or “python -m pip install xlrd“.
  4. Wait for the installation to terminate successfully.

The package is now installed on your Linux operating system.

How to Install xlrd on macOS?

Similarly, you can install xlrd on macOS in four steps:

  1. Open your macOS terminal.
  2. Type “pip install xlrd” without quotes and hit Enter.
  3. If it doesn’t work, try "pip3 install xlrd" or “python -m pip install xlrd“.
  4. Wait for the installation to terminate successfully.

The package is now installed on your macOS.

How to Install xlrd in PyCharm?

Suppose you have a PyCharm project and want to install the xlrd library, either within a virtual environment or globally. Here’s a solution that works in most setups:

  • Open File > Settings > Project from the PyCharm menu.
  • Select your current project.
  • Click the Python Interpreter tab within your project tab.
  • Click the small + symbol to add a new library to the project.
  • Now type in the library to be installed, in this example "xlrd" without quotes, and click Install Package.
  • Wait for the installation to terminate and close all pop-ups.

Here’s the general package installation process as a short animated video—it works analogously for xlrd if you type in “xlrd” in the search field instead:

Make sure to select only “xlrd” because there may be other packages that are not required but also contain the same term (false positives):

How to Install xlrd in a Jupyter Notebook?

To install a package in a Jupyter notebook, prefix the pip install command with an exclamation mark "!" so that the cell runs it as a shell command. This works for the xlrd library too:

!pip install xlrd

This automatically installs the xlrd library when the cell is first executed.

How to Resolve ModuleNotFoundError: No module named ‘xlrd’?

Say you try to import the xlrd package into your Python script without installing it first:

import xlrd
# ... ModuleNotFoundError: No module named 'xlrd'

Because you haven’t installed the package, Python raises a ModuleNotFoundError: No module named 'xlrd'.

To fix the error, install the xlrd library using “pip install xlrd” or “pip3 install xlrd” in your operating system’s shell or terminal first.
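If you are unsure whether xlrd is visible to the interpreter you are actually running, a quick check from Python itself can help. The following is a minimal sketch: the helper name is_installed is made up for this illustration, while importlib.util.find_spec and sys.executable are part of the standard library.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if this interpreter can find a top-level module with that name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # sys.executable is the interpreter that needs the package installed.
    print("Interpreter:", sys.executable)
    print("xlrd installed:", is_installed("xlrd"))
```

If the second line prints False, run pip with exactly the interpreter shown in the first line (that path followed by -m pip install xlrd), so the package lands in the right installation.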

See above for the different ways to install xlrd in your environment. Also check out my detailed article:

🔗 Recommended: [Fixed] ModuleNotFoundError: No module named ‘xlrd’

Improve Your Python Skills

If you want to keep improving your Python skills and learn about new and exciting technologies such as Blockchain development, machine learning, and data science, check out the Finxter free email academy with cheat sheets, regular tutorials, and programming puzzles.

Join us, it’s fun! 🙂

✅ Recommended: Python Excel – Basic Worksheet Operations

The post How to Install xlrd in Python? appeared first on Be on the Right Side of Change.


5 Best Open-Source LLMs in 2023 (Two-Minute Guide)


Open-source research on large language models (LLMs) is crucial for democratizing this powerful technology.

Although open-source LLMs are now widely used and studied, they faced initial challenges and criticism. Early attempts at creating open-source LLMs like OPT and BLOOM had poor performance compared to closed-source models.

This led researchers to realize the need for higher-quality base models pre-trained on larger datasets with trillions (!) of tokens!

  • OPT: 180 billion tokens
  • BLOOM: 341 billion tokens
  • LLaMA: 1.4 trillion tokens
  • MPT: 1 trillion tokens
  • Falcon: 1.5 trillion tokens
  • LLaMA 2: 2 trillion tokens

However, pre-training these models is expensive and requires organizations with sufficient funding to make them freely available to the community.

This article focuses on high-performing open-source base models that have significantly improved the field. A great graphic of the historical context of open-source LLMs is presented on the LangChain page.

How can we determine the best of those? One easy way is to consult chatbot leaderboards such as the one on Hugging Face.

At the time of writing, the best non-commercial LLM is Vicuna-33B. Of course, the closed-source models GPT-4 by OpenAI and Claude by Anthropic are still the best overall.

By the way, feel free to check out my article on Claude 2, which has proven to be one of the most powerful free but closed-source LLMs:

🔗 Recommended: Claude 2 LLM Reads Ten Papers in One Prompt with Massive 200k Token Context

The introduction of LLaMA 1 and 2 was a significant step in improving the quality of open-source LLMs. LLaMA is a suite of different LLMs with sizes ranging from 7 billion to 65 billion parameters. These models strike a balance between performance and inference efficiency.

LLaMA models are pre-trained on a corpus containing over 1.4 trillion tokens of text, making it one of the largest open-source datasets available. The release of LLaMA models sparked an explosion of open-source research and development in the LLM community.

Here are a few open-source LLMs that were kicked off after the release of LLaMA: Alpaca, Vicuna, Koala, and GPT4All.

LLaMA-2, the latest release, sets a new state-of-the-art among open-source LLMs. These models are pre-trained on 2 trillion tokens of publicly available data and utilize a novel approach called Grouped Query Attention (GQA) to improve inference efficiency.

MPT, another commercially-usable open-source LLM suite, was released by MosaicML. MPT-7B and MPT-30B models gained popularity due to their performance and ability to be used in commercial applications. While these models perform slightly worse than proprietary models like GPT-based variants, they outperform other open-source models.

Falcon, an open-source alternative to proprietary models, was the first to match the quality of closed-source LLMs. The Falcon-7B and Falcon-40B models are licensed for commercial use and perform exceptionally well. They are pre-trained on a custom-curated corpus called RefinedWeb, which contains over 5 trillion tokens of text.

You can currently try the Falcon-180B Demo here.

📈 TLDR: Open-source LLMs include OPT, BLOOM, LLaMA, MPT, and Falcon, each pre-trained on hundreds of billions to trillions of tokens. LLaMA-2 and Falcon stand out for their innovative approaches and extensive training data.

👉 For the best open-source LLM, consider using Vicuna-33B for its superior performance among non-commercial options.

Also, make sure to check out my other article on the Finxter blog: 👇

🔗 Recommended: Six Best Private & Secure LLMs in 2023

The post 5 Best Open-Source LLMs in 2023 (Two-Minute Guide) appeared first on Be on the Right Side of Change.


Zero to Ph.D. – How to Accomplish Any Task By Going Small First


A couple of years ago, I watched a TED talk that changed my life. 

I had just finished my computer science master’s degree and was starting out as a fresh Ph.D. student in the department of distributed systems…

… and I was overwhelmed.

There are many computer science students reading the Finxter blog so I hope to find a few encouraging words in this article.

Not only was I overwhelmed, but I seriously doubted my ability to finish the doctoral research program successfully.

I was so impressed by my colleagues, who were much smarter, wittier, and better coders.

So what were (some of) the things that were bothering me?

  • Reading and understanding code.
  • Reading and understanding research papers.
  • Designing algorithms.
  • Maths.
  • Presenting stuff.
  • English.
  • Writing scientifically.
  • “Selling” my approaches to my supervisors.

The list goes on and on — and I really felt like an imposter not worthy to contribute to the scientific community.

~~~

Then I watched the TED talk from a former investment banker who claimed to possess the formula to achieve anything.


The formula: break the big task into a series of small tasks. Then just keep doing the small tasks (and don’t stop).

I know it sounds lame, but it really resonated with me. So I approached my problem from first principles: What must I do to finish my dissertation within four years?

  • I need to publish at least four research papers.
  • I need to submit at least ten times to top conferences — maybe even more often.
  • I need to create a 10,000-word research paper every three months or so.
  • I need to write (or edit) 300 words every day.

So my output was clear: if I just do this one thing (it’s really easy to write 300 words) — I will have enough written content for my dissertation.

Quality comes as a byproduct of massive quantity. 😉

But to produce output, any system needs input. To brew tasty coffee, put in the right ingredients: high-quality beans and pure water. To produce better outputs, just feed the system with better inputs.

  • Question: What’s the input that helps me produce excellent 300-word written output?
  • Answer: Read papers from top conferences.

So the formula boils down to:

  • INPUT: read (at least skim over) one paper a day from a top conference in my research area.
  • OUTPUT: generate 300 words for the current paper project.

That’s it. After I developed this formula, the remaining three and a half years were simple: follow this straightforward recipe to the best of my abilities, even with serious distractions, doubts, highs, and lows.

The day before I published this article originally (in 2019), I delivered my defense. Based on my sample size of one, the system works! 😉

So what is your BIG TASK that is overwhelming you? How can you break it down into a series of small outputs that guarantees your success? What is the input that helps you generate this kind of output?

📈 Recommended: How to Overcome the Imposter Syndrome as a Doctoral Computer Science Researcher (and Thrive)

The post Zero to Ph.D. – How to Accomplish Any Task By Going Small First appeared first on Be on the Right Side of Change.


Python Async IO – The Ultimate Guide in a Single Post


As a Python developer, you might have come across the concept of asynchronous programming. Asynchronous programming, or async I/O, is a concurrent programming design that has received dedicated support in Python, evolving rapidly from Python 3.4 through 3.7 and beyond. With async I/O, you can manage multiple tasks concurrently without the complexities of parallel programming, making it a perfect fit for I/O bound and high-level structured network code.

In the Python world, the asyncio library is your go-to tool for implementing asynchronous I/O. This library provides various high-level APIs to run Python coroutines concurrently, giving you full control over their execution. It also enables you to perform network I/O, Inter-process Communication (IPC), control subprocesses, and synchronize concurrent code using tasks and queues.

Understanding Asyncio

In the world of Python programming, asyncio plays a crucial role in designing efficient and concurrent code without using threads. It is a library that helps you manage tasks, event loops, and coroutines. To fully benefit from asyncio, you must understand some key components.

First, let’s start with coroutines. They are special functions that can pause their execution at specified points without completely terminating it. In Python, you declare a coroutine using the async def syntax.

For instance:

async def my_coroutine():
    # Your code here
    ...

Next, the event loop is a core feature of asyncio and is responsible for executing tasks concurrently and managing I/O operations. An event loop runs tasks one after the other and can pause a task when it is waiting for external input, such as reading data from a file or from the network. It also listens for other tasks that are ready to run, switches to them, and resumes the initial task when it receives the input.

Tasks are coroutines wrapped in an object and managed by the event loop. They are used to run multiple coroutines concurrently. You can create a task using the asyncio.create_task() function. It needs a running event loop, so call it from inside a coroutine, like this:

async def my_coroutine():
    ...  # Your code here

async def main():
    task = asyncio.create_task(my_coroutine())
    await task

Finally, the sleep function in asyncio is used to simulate I/O bound tasks or a delay in the code execution. It works differently than the standard time.sleep() function as it is non-blocking and allows other coroutines to run while one is paused. You can use await asyncio.sleep(delay) to add a brief pause in your coroutine execution.

Putting it all together, you can use asyncio to efficiently manage multiple coroutines concurrently:

import asyncio

async def task_one():
    print('Starting task one')
    await asyncio.sleep(3)
    print('Finished task one')

async def task_two():
    print('Starting task two')
    await asyncio.sleep(1)
    print('Finished task two')

async def main():
    task1 = asyncio.create_task(task_one())
    task2 = asyncio.create_task(task_two())
    await task1
    await task2

# Run the event loop
asyncio.run(main())

In this example, the event loop will start running both tasks concurrently, allowing task two to complete while task one is paused during the sleep period. This allows you to handle multiple tasks in a single-threaded environment.


Async/Await Syntax

In Python, the async/await syntax is a powerful tool to create and manage asynchronous tasks without getting lost in callback hell or making your code overly complex.

The async/await keywords are at the core of asynchronous code in Python. You can use the async def keyword to define an asynchronous function. Inside this function, you can use the await keyword to pause the execution of the function until some asynchronous operation is finished.

For example:

import asyncio

async def main():
    print("Start")
    await asyncio.sleep(2)
    print("End")

asyncio.run(main())

yield and yield from are related to asynchronous code through generators, which provide a way to iterate through a collection of items without loading all of them into memory at once. yield from, introduced in Python 3.3, delegates part of a generator’s operation to another generator, and the first version of asyncio in Python 3.4 built its coroutines on top of it. Python 3.5 then added the dedicated async/await syntax, and yield from is now rarely used for asynchronous code.

For example, a plain generator can delegate to another generator with yield from like this:

def generator_a():
    for i in range(3):
        yield i

def generator_b():
    yield from generator_a()

for item in generator_b():
    print(item)

With the introduction of async/await, asynchronous tasks can be written more consistently and readably. You can convert the previous example to use async/await as follows:

import asyncio

async def async_generator_a():
    for i in range(3):
        yield i
        await asyncio.sleep(1)

async def async_generator_b():
    async for item in async_generator_a():
        print(item)

# A top-level await only works in the asyncio REPL or a notebook
asyncio.run(async_generator_b())

Working with Tasks and Events

In asynchronous programming with Python, you’ll often work with tasks and events to manage the execution of simultaneous IO-bound operations. To get started with this model, you’ll need to understand the event loop and the concept of tasks.

The event loop is a core component of Python’s asyncio module. It’s responsible for managing and scheduling the execution of tasks. A task, created using asyncio.create_task(), represents a coroutine that runs independently of other tasks in the same event loop.

To create tasks, first, define an asynchronous function using the async def syntax. Then, you can use the await keyword to make non-blocking calls within this function. The await keyword allows the event loop to perform other tasks while waiting for an asynchronous operation to complete.

Here’s an example:

import asyncio

async def my_async_function():
    print("Task started")
    await asyncio.sleep(2)
    print("Task finished")

# get_event_loop() is deprecated when no loop is running, so create one explicitly
event_loop = asyncio.new_event_loop()
task = event_loop.create_task(my_async_function())
event_loop.run_until_complete(task)
event_loop.close()

In this example, my_async_function is an asynchronous function, and await asyncio.sleep(2) represents an asynchronous operation. The event_loop.create_task() method wraps the coroutine into a task, allowing it to run concurrently within the event loop.

To execute tasks and manage their output, you can use asyncio.gather(). This function takes awaitables (coroutines, tasks, or futures) as separate positional arguments and returns their results as a list in the same order they were provided. If your awaitables are in a list, unpack it with *. Here’s an example of how you can use asyncio.gather():

import asyncio

async def async_task_1():
    await asyncio.sleep(1)
    return "Task 1 completed"

async def async_task_2():
    await asyncio.sleep(2)
    return "Task 2 completed"

async def main():
    tasks = [async_task_1(), async_task_2()]
    results = await asyncio.gather(*tasks)
    print(results)

asyncio.run(main())

In this example, asyncio.gather() awaits the completion of both tasks and then collects their output in a list, which is printed at the end.

Working with tasks and events in Python’s asynchronous IO model helps improve the efficiency of your code when dealing with multiple IO operations, ensuring smoother and faster execution. Remember to use asyncio.create_task(), await, and asyncio.gather() when handling tasks within your event loop.

Coroutines and Futures

In Python, async IO is powered by coroutines and futures. Coroutines are functions that can be paused and resumed at specific points, allowing other tasks to run concurrently. They are declared with the async keyword and used with await. Asyncio coroutines are the preferred way to write asynchronous code in Python.

On the other hand, futures represent the result of an asynchronous operation that hasn’t completed yet. They are primarily used for interoperability between callback-based code and the async/await syntax. With asyncio, Future objects should be created using loop.create_future().
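To make the bridging role of futures more tangible, here is a minimal sketch. The function legacy_fetch is a made-up stand-in for a callback-style API and is not part of any library. Only loop.create_future(), loop.call_later(), and future.set_result() are real asyncio calls.

```python
import asyncio


def legacy_fetch(loop, on_done):
    """Hypothetical callback-style API: calls on_done(value) a little later."""
    loop.call_later(0.05, on_done, "payload")


async def fetch_with_future():
    loop = asyncio.get_running_loop()
    future = loop.create_future()          # an empty result slot owned by this loop
    legacy_fetch(loop, future.set_result)  # the callback fills the slot
    return await future                    # suspends until set_result() is called


if __name__ == "__main__":
    print(asyncio.run(fetch_with_future()))
```

The coroutine never polls. It simply awaits the future, and the event loop resumes it once the callback has delivered the value.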

To execute multiple coroutines concurrently, you can use the gather function. asyncio.gather() is a high-level function that takes one or more awaitable objects (coroutines or futures) and schedules them to run concurrently. Here’s an example:

import asyncio

async def foo():
    await asyncio.sleep(1)
    return "Foo"

async def bar():
    await asyncio.sleep(2)
    return "Bar"

async def main():
    results = await asyncio.gather(foo(), bar())
    print(results)

asyncio.run(main())

In this example, both foo() and bar() coroutines run concurrently, and the gather() function returns a list of their results.

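Error handling in asyncio mostly works like in synchronous code: an exception raised inside a coroutine propagates to whoever awaits it, so you can catch it with try/except around the await. At the lower level of futures, the set_exception() method plays the same role. If callback-based code fails, you can attach the exception to the associated future using future.set_exception(). This allows coroutines waiting for the same future to handle the exception gracefully.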
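Here is a short sketch of how exceptions travel in practice. It shows three situations side by side: a plain try/except around an await, asyncio.gather() with return_exceptions=True, and a low-level future that received an exception through set_exception(). The coroutine names flaky and ok are invented for this illustration.

```python
import asyncio


async def flaky():
    await asyncio.sleep(0.01)
    raise ValueError("boom")


async def ok():
    await asyncio.sleep(0.01)
    return "fine"


async def main():
    # 1) An exception raised in a coroutine surfaces at the await.
    try:
        await flaky()
    except ValueError as exc:
        caught = str(exc)

    # 2) gather() can hand exceptions back as results instead of raising.
    results = await asyncio.gather(ok(), flaky(), return_exceptions=True)

    # 3) A low-level future re-raises whatever set_exception() stored.
    future = asyncio.get_running_loop().create_future()
    future.set_exception(RuntimeError("failed in a callback"))
    try:
        await future
    except RuntimeError as exc:
        from_future = str(exc)

    return caught, results, from_future


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In everyday application code the first two patterns are usually all you need. The third one matters mainly when you wrap callback-based libraries.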

In summary, working with coroutines and futures helps you write efficient, asynchronous code in Python. Use coroutines along with the async/await syntax for defining asynchronous tasks, and futures for interacting with low-level callback-based code. Utilize functions like gather() for running multiple coroutines concurrently, and handle errors with try/except around await or, at the future level, with future.set_exception().

Threading and Multiprocessing

In the world of Python, you have multiple options for concurrent execution and managing concurrency. Two popular approaches to achieve this are threading and multiprocessing.

Threading can be useful when you want to improve the performance of your program by making better use of time that would otherwise be spent waiting. It allows you to run multiple threads concurrently within a single process. Threads share memory and resources, which makes them lightweight and well suited to I/O-bound tasks. However, because of the Global Interpreter Lock (GIL) in CPython, only one thread can execute Python bytecode at a time, limiting the benefits of threading for CPU-bound tasks. You can explore the threading module for building multithreaded applications.

Multiprocessing overcomes the limitations of threading by using multiple processes working independently. Each process has its own Python interpreter, memory space, and resources, effectively bypassing the GIL. This approach is better for CPU-bound tasks, as it allows you to utilize multiple cores to achieve true parallelism. To work with multiprocessing, you can use Python’s multiprocessing module.

While both threading and multiprocessing help manage concurrency, it is essential to choose the right approach based on your application’s requirements. Threading is more suitable when your tasks are I/O-bound, and multiprocessing is advisable for CPU-bound tasks. When dealing with a mix of I/O-bound and CPU-bound tasks, using a combination of the two might be beneficial.
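As a rough illustration of the I/O-bound case, here is a minimal sketch using a thread pool from the standard library. The function fake_download is a placeholder that just sleeps. It is an assumption standing in for a real network or disk call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name: str) -> str:
    """Stand-in for an I/O-bound call; time.sleep releases the GIL while waiting."""
    time.sleep(0.2)
    return f"{name} done"


def run_in_threads(names):
    # With four workers the four waits overlap instead of adding up.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fake_download, names))


if __name__ == "__main__":
    start = time.perf_counter()
    print(run_in_threads(["a", "b", "c", "d"]))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

For CPU-bound work, concurrent.futures.ProcessPoolExecutor offers the same interface but runs the function in separate processes, which is how you get around the GIL.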

Async I/O offers another approach for handling concurrency and might be a better fit in some situations. However, understanding threading and multiprocessing remains crucial to make informed decisions and efficiently handle concurrent execution in Python.

Understanding Loops and Signals

In the world of Python async IO, working with loops and signals is an essential skill to grasp. As a developer, you must be familiar with these concepts to harness the power of asynchronous programming.

Event loops are at the core of asynchronous programming in Python. They provide a foundation for scheduling and executing tasks concurrently. The asyncio library helps you create and manage these event loops. You can experiment with event loops using Python’s asyncio REPL, which can be started by running python -m asyncio in your command line.

Signals, on the other hand, are a way for your program to receive notifications about certain events, like a user interrupting the execution of the program. A common use case for handling signals in asynchronous programming involves stopping the event loop gracefully when it receives a termination signal like SIGINT or SIGTERM.

A useful method for running synchronous or blocking functions in an asynchronous context is the loop.run_in_executor() method. This allows you to offload the execution of such functions to a separate thread or process, preventing them from blocking the event loop. For example, if you have a blocking call or a CPU-bound operation that cannot be implemented using asyncio’s native coroutines, you can use loop.run_in_executor() with a thread pool or a process pool to keep the event loop responsive.

Here’s a simple outline of using loops and signals together in your asynchronous Python code:

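  1. Create an event loop using asyncio.new_event_loop() (asyncio.get_event_loop() is deprecated for this purpose in recent Python versions).
  2. Register your signal handlers with the event loop, typically by using the loop.add_signal_handler() method, which is available on Unix only.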
  3. Schedule your asynchronous tasks and coroutines in the event loop.
  4. Run the event loop using loop.run_forever(), which will keep running until you interrupt it with a signal or a coroutine stops it explicitly.
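The four steps above can be sketched as follows. This is a minimal illustration for Unix-like systems, not a production shutdown routine. The function name run_until_signal and its self_test flag are invented for this example, and the sketch uses asyncio.new_event_loop() for step 1 because asyncio.get_event_loop() is deprecated for creating loops in recent Python versions.

```python
import asyncio
import os
import signal


def run_until_signal(sig=signal.SIGTERM, self_test=False):
    """Run an event loop until `sig` arrives; return how many ticks the worker ran."""
    ticks = 0

    async def worker():
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    loop = asyncio.new_event_loop()            # step 1: create the loop
    loop.add_signal_handler(sig, loop.stop)    # step 2: stop the loop on the signal (Unix only)
    task = loop.create_task(worker())          # step 3: schedule work
    if self_test:
        # Demo only: send the signal to our own process after 0.1 seconds.
        loop.call_later(0.1, os.kill, os.getpid(), sig)
    try:
        loop.run_forever()                     # step 4: run until stopped
    finally:
        # Cancel the worker and let it finish unwinding before closing the loop.
        task.cancel()
        try:
            loop.run_until_complete(task)
        except asyncio.CancelledError:
            pass
        loop.close()
    return ticks


if __name__ == "__main__":
    print("ticks before SIGTERM:", run_until_signal(self_test=True))
```

In a real service you would leave self_test off and let the operating system or a process manager deliver the signal.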

Managing I/O Operations

When working with I/O-bound tasks in Python, it’s essential to manage I/O operations efficiently. Using asyncio can help you handle these tasks concurrently, resulting in more performant and scalable code.

I/O-bound tasks are operations where the primary bottleneck is fetching data from input/output sources like files, network requests, or databases. To improve the performance of your I/O-bound tasks, you can use asynchronous programming techniques. In Python, this often involves using the asyncio library and writing non-blocking code.

Typically, you’d use blocking code for I/O operations, which means waiting for the completion of an I/O task before continuing with the rest of the code execution. This blocking behavior can lead to inefficient use of resources and poor performance, especially in larger programs with multiple I/O-bound tasks.

Non-blocking code, on the other hand, allows your program to continue executing other tasks while waiting for the I/O operation to complete. This can significantly improve the efficiency and performance of your program. When using Python’s asyncio library, you write non-blocking code with coroutines.

For I/O-bound tasks involving file operations, you can use libraries like aiofiles to perform asynchronous file I/O. Just like with asyncio, aiofiles provides an API to work with files using non-blocking code, improving the performance of your file-based tasks.

When dealing with network I/O, the asyncio library provides APIs to perform tasks such as asynchronous reading and writing operations for sockets and other resources. This enables you to manage multiple network connections concurrently, efficiently utilizing your system resources.
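As a small illustration of these network APIs, the sketch below starts a TCP server on localhost with asyncio.start_server() and lets three clients talk to it concurrently through asyncio.open_connection(). The handler and helper names are made up, and binding to port 0 (let the OS pick a free port) is a choice made for the demo.

```python
import asyncio


async def handle_echo(reader, writer):
    """Send one line back to the client, upper-cased."""
    line = await reader.readline()
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def ask(port, message):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message.encode() + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()


async def main():
    # Port 0 asks the OS for any free port (an assumption made for this demo).
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Three clients are served concurrently on a single thread.
        return await asyncio.gather(*(ask(port, word) for word in ["a", "b", "c"]))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Every await on a reader or writer is a point where the event loop can switch to another connection, which is what makes a single thread sufficient here.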

In summary, when managing I/O operations in Python:

  • Identify I/O-bound tasks in your program
  • Utilize the asyncio library to write non-blocking code using coroutines
  • Consider using aiofiles for asynchronous file I/O
  • Utilize asyncio APIs to manage network I/O efficiently

Handling Transports and Timeouts

When working with Python’s Async IO, you might need to handle transports and timeouts effectively. Transports and protocols are low-level event loop APIs for implementing network or IPC protocols such as HTTP. They help improve the performance of your application by using callback-based programming style. You can find more details in the Python 3.11.4 documentation.

Timeouts are often useful when you want to prevent your application from waiting indefinitely for a task to complete. To handle timeouts in asyncio, you can use the asyncio.wait_for function. This allows you to set a maximum time that your function can run. If the function doesn’t complete within the specified time, an asyncio.TimeoutError is raised.

import asyncio

async def some_function():
    await asyncio.sleep(5)

async def main():
    try:
        await asyncio.wait_for(some_function(), timeout=3)
    except asyncio.TimeoutError:
        print("Task took too long.")

asyncio.run(main())

In this example, some_function takes 5 seconds to complete, but we set a timeout of 3 seconds. As a result, an asyncio.TimeoutError is raised, and the program prints “Task took too long.”

Another concept to be familiar with is the executor, which allows you to run synchronous functions in an asynchronous context. You can use the loop.run_in_executor() method, where loop is an instance of the event loop. This method takes three arguments: the executor, the function you want to run, and any arguments for that function. The executor can be a custom one or None for the default ThreadPoolExecutor.

Here’s an example:

import asyncio
import time

def sync_function(seconds):
    time.sleep(seconds)
    return "Slept for {} seconds".format(seconds)

async def main():
    # Inside a coroutine, get_running_loop() returns the loop that runs it
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(None, sync_function, 3)
    print(result)

asyncio.run(main())

In this example, we run the synchronous sync_function inside the async main() function using the loop.run_in_executor() method.

Dealing with Logging and Debugging

When working with Python’s asyncio library, properly handling logging and debugging is essential for ensuring efficient and smooth development. As a developer, it’s crucial to stay confident and knowledgeable when dealing with these tasks.

To begin logging in your asynchronous Python code, you need a logger object. Import the logging module, configure a handler, and obtain a Logger instance with logging.getLogger(), like this:

import logging

logging.basicConfig(level=logging.DEBUG)  # attach a handler that shows debug output
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

This configuration sets up a logger object that will capture debug-level log messages. To log a message, simply call the appropriate method like logger.debug, logger.info, or logger.error:

async def my_async_function():
    logger.debug("Debug message")
    logger.info("Info message")
    logger.error("Error message")
    await some_async_operation()  # placeholder for your own coroutine

Keep in mind that Python’s logging module is not inherently asynchronous. However, there are ways to work around this issue. One approach is to use a ThreadPoolExecutor, which executes logging methods in a separate thread:

import concurrent.futures
import logging

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def log_info(msg, *args):
    executor.submit(logging.info, msg, *args)

async def my_async_function():
    log_info("Info message")
    await some_async_operation()
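A different technique from the thread pool shown above is the standard library’s QueueHandler and QueueListener pair. The logger only puts records on a queue, and a background listener thread passes them to the real handler. The following is a minimal sketch of that alternative. The function name setup_queue_logging and the logger name "async_demo" are assumptions made for this example.

```python
import asyncio
import logging
import logging.handlers
import queue


def setup_queue_logging(target_handler):
    """Route records through a queue; a listener thread does the actual output."""
    log_queue = queue.Queue()
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    logger = logging.getLogger("async_demo")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.propagate = False
    listener.start()
    return logger, listener


async def work(logger):
    logger.info("before await")  # only enqueues the record, so it returns quickly
    await asyncio.sleep(0.01)
    logger.info("after await")


if __name__ == "__main__":
    demo_logger, demo_listener = setup_queue_logging(logging.StreamHandler())
    asyncio.run(work(demo_logger))
    demo_listener.stop()  # processes the remaining records, then stops the thread
```

Because slow handlers such as file or network handlers run in the listener thread, a logging call inside a coroutine does not hold up the event loop.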

For debugging your asynchronous code, it’s possible to enable the debug mode in asyncio by calling the loop.set_debug() method. Additionally, consider setting the log level of the asyncio logger to logging.DEBUG and configuring the warnings module to display ResourceWarning warnings. Check the official Python documentation for more information and best practices.
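As a compact sketch of these settings, the snippet below turns on debug mode through the debug argument of asyncio.run(), raises the asyncio logger to DEBUG, and makes ResourceWarning visible. The function name run_in_debug_mode is invented for this example.

```python
import asyncio
import logging
import warnings


async def main():
    # In debug mode the loop reports slow callbacks and never-awaited coroutines.
    return asyncio.get_running_loop().get_debug()


def run_in_debug_mode():
    logging.getLogger("asyncio").setLevel(logging.DEBUG)
    warnings.simplefilter("always", ResourceWarning)
    return asyncio.run(main(), debug=True)


if __name__ == "__main__":
    print("debug mode active:", run_in_debug_mode())
```

Passing debug=True to asyncio.run() has the same effect as calling loop.set_debug(True) on a loop you manage yourself.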

Understanding Virtual Environments and Resources

When working with Python, you’ll often encounter the need for a virtual environment. A virtual environment is an isolated environment for your Python applications, which allows you to manage resources and dependencies efficiently. It helps ensure that different projects on your computer do not interfere with each other in terms of dependencies and versions, maintaining the availability of the required resources for each project.

To create a virtual environment, you can use built-in Python libraries such as venv or third-party tools like conda. Once created, you’ll activate the virtual environment and install the necessary packages needed for your project. This ensures that the resources are available for your application without causing conflicts with other Python packages or applications on your computer.
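Most people create environments from the command line, but the venv module can also be driven from Python, which makes the mechanics easy to inspect. Here is a minimal sketch. The helper name create_env is made up, and with_pip=False is chosen only to keep the demo fast and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_env(target: Path) -> Path:
    """Create a virtual environment at `target` and return the path of its config file."""
    # with_pip=False keeps the demo fast and offline; use True for real projects.
    venv.create(target, with_pip=False)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / ".venv")
        print(cfg.read_text())
```

The pyvenv.cfg file that gets printed records which base interpreter the environment belongs to.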

🔗 For a more detailed explanation of virtual environments, check out this complete guide to Python virtual environments.

When working with async IO in Python, it’s crucial to manage resources effectively, especially when dealing with asynchronous operations like networking requests or file I/O. asyncio itself ships with the standard library, so its version is tied to your Python interpreter. A virtual environment lets you pin that interpreter together with the versions of third-party async libraries such as aiohttp, so your code behaves the same on every machine.

A virtual environment contains only the packages and libraries you install into it. This keeps the environment small and consistent across development machines. It also lets you keep track of your project’s dependencies, making it easier to maintain and share your project with others without compatibility issues.

Optimizing Asynchronous Programs

When working with Python, you may often encounter situations where an asynchronous program can significantly improve the performance and responsiveness of your application. This is especially true when dealing with I/O-bound tasks or high-level structured network code, where asyncio can be your go-to library for writing concurrent code.

Before diving into optimization techniques, it’s crucial to understand the difference between synchronous and asynchronous programs. In a synchronous program, tasks are executed sequentially, blocking other tasks from running. Conversely, an asynchronous program allows you to perform multiple tasks concurrently without waiting for one to complete before starting another. This cooperative multitasking approach can make an I/O-bound program finish much sooner, because the waiting times overlap instead of adding up.
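To see the difference in numbers, here is a small sketch that runs the same three simulated I/O waits one after another and then concurrently. All function names are invented for this illustration, and asyncio.sleep() stands in for real network or disk waits.

```python
import asyncio
import time


async def io_call(delay: float) -> float:
    await asyncio.sleep(delay)  # stands in for a network or disk wait
    return delay


async def run_sequentially(delays):
    return [await io_call(d) for d in delays]  # the waits add up


async def run_concurrently(delays):
    return await asyncio.gather(*(io_call(d) for d in delays))  # the waits overlap


def timed(coro_fn, delays):
    """Return how many seconds coro_fn(delays) takes to complete."""
    start = time.perf_counter()
    asyncio.run(coro_fn(delays))
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.1, 0.1, 0.1]
    print(f"sequential: {timed(run_sequentially, delays):.2f}s")
    print(f"concurrent: {timed(run_concurrently, delays):.2f}s")
```

The sequential version takes roughly the sum of the delays, while the concurrent version takes roughly as long as the longest single delay.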

To make the most of your asynchronous program, consider applying the following techniques:

  1. Use async/await syntax: Employing the async and await keywords when defining asynchronous functions and awaiting their results ensures proper execution and responsiveness.
  2. Implement an event loop: The event loop is the core of an asyncio-based application. It schedules, executes, and manages tasks within the program, so it’s crucial to utilize one effectively.
  3. Leverage libraries: Many asynchronous frameworks, such as web servers and database connection libraries, have been built on top of asyncio. Take advantage of these libraries to simplify and optimize your asynchronous program.
  4. Avoid blocking code: Blocking code can slow down the execution of your asynchronous program. Ensure your program is entirely non-blocking by avoiding time-consuming operations or synchronous APIs.

It’s essential to remember that while asynchronous programming has its advantages, it might not always be the best solution. In situations where your tasks are CPU-bound or require a more straightforward processing flow, a synchronous program might be more suitable.

Exploring Asyncio Libraries and APIs

When working with asynchronous programming in Python, it’s essential to explore the available libraries you can use. One such library is aiohttp. It allows you to make asynchronous HTTP requests efficiently using asyncio. You can find more details about this library from the aiohttp documentation.

To get started with aiohttp, you’ll first need to install the library:

pip install aiohttp

In your Python code, you can now import aiohttp and use it with the asyncio library. For example, if you want to make an asynchronous GET request, you can use the following code:

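import aiohttp
import asyncio


async def fetch_data(url):
    # Open a session, send the GET request, and return the body as text
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()


async def main():
    url = 'https://api.example.com/data'
    data = await fetch_data(url)
    print(data)


# In a script, start the event loop explicitly. A bare top-level
# "await main()" only works in an async REPL or a Jupyter notebook.
asyncio.run(main())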

In the example above, the fetch_data function is defined as an async function using the async def syntax. This makes it a coroutine function: calling it returns a coroutine object, which other asynchronous functions run with an await expression.

The pathlib library provides classes for working with filesystem paths. While it is not directly related to async IO, it can be useful when working with file paths in your async projects. The pathlib.Path class offers a more Pythonic way to handle file system paths, making it easier to manipulate file and directory paths across different operating systems. You can read more about this library in the official Python documentation on pathlib.

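When you call async functions in your code, remember to await them. Calling a coroutine function only creates a coroutine object; its body runs when you await it or wrap it in a task. Note that awaiting calls one after another still runs them sequentially. To overlap them, schedule them together with asyncio.gather() or asyncio.create_task(). By combining aiohttp, asyncio, and other async-compatible libraries, you can efficiently perform multiple tasks concurrently in your Python projects.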
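
To make the role of await more concrete, here is a minimal sketch that contrasts awaiting coroutines one after another with scheduling them together. The job coroutine and its delays are made up for illustration:

```python
import asyncio
import inspect


async def job(name, delay, log):
    log.append(f"{name} start")
    await asyncio.sleep(delay)  # placeholder for real I/O
    log.append(f"{name} done")
    return name


async def main():
    # Calling a coroutine function only creates a coroutine object.
    coro = job("solo", 0, [])
    created_only = inspect.iscoroutine(coro)
    await coro  # the body actually runs here

    sequential, concurrent = [], []
    # Awaited one after the other: "b" starts only after "a" is done.
    await job("a", 0.02, sequential)
    await job("b", 0.01, sequential)
    # gather() schedules both first, so they overlap while sleeping.
    await asyncio.gather(
        job("a", 0.05, concurrent),
        job("b", 0.01, concurrent),
    )
    return created_only, sequential, concurrent


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In the sequential log the two jobs never interleave, while in the concurrent log "b" starts before "a" has finished.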

Understanding Queues and Terminals

With Python’s asyncio module, you can write concurrent, asynchronous code that works efficiently on I/O-bound tasks and network connections. In this context, queues become helpful tools for coordinating the execution of multiple tasks and managing shared resources.

Queues in asyncio are similar to standard Python queues, but their get() and put() methods are coroutines that you await. When the queue is empty, get() waits until an item becomes available, and when a bounded queue is full, put() waits until a slot is free. This gives you flow control between tasks: producers cannot run arbitrarily far ahead of consumers, and consumers do not spin while waiting for work.
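
A minimal producer/consumer sketch of this behavior might look as follows. Using None as an end-of-stream sentinel is an assumption of this example, not something asyncio.Queue requires:

```python
import asyncio


async def producer(queue, items):
    for item in items:
        await queue.put(item)  # waits here while the bounded queue is full
    await queue.put(None)      # sentinel chosen for this sketch: no more items


async def consumer(queue):
    received = []
    while True:
        item = await queue.get()  # waits here while the queue is empty
        if item is None:
            break
        received.append(item)
    return received


async def main():
    queue = asyncio.Queue(maxsize=2)  # small bound to force back-pressure
    _, received = await asyncio.gather(
        producer(queue, [1, 2, 3, 4]),
        consumer(queue),
    )
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))
```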

Terminals, on the other hand, are interfaces for interacting with your system – either through command-line or graphical user interfaces. When working with async tasks in Python, terminals play a crucial role in tracking the progress and execution of your tasks. You can use terminals to initiate and monitor the state of your async tasks by entering commands and viewing the output.

Queues are particularly handy when tasks stand in a parent-child relationship. Consider a scenario where a parent task is responsible for launching multiple child tasks that operate concurrently. In this case, a queue can handle the communication and synchronization between parent and children by passing work items and results between them. Keep in mind that asyncio.Queue is not thread-safe, so it is meant for tasks on the same event loop rather than for exchanging data between threads.
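
The parent-child scenario could be sketched like this, assuming a parent that squares numbers with the help of three workers. The names parent and child and the squaring task are illustrative only:

```python
import asyncio


async def child(name, queue, results):
    while True:
        number = await queue.get()
        try:
            await asyncio.sleep(0)  # placeholder for real I/O
            results.append((name, number * number))
        finally:
            queue.task_done()  # tell the queue this item is finished


async def parent(numbers, child_count=3):
    queue = asyncio.Queue()
    results = []
    children = [
        asyncio.create_task(child(f"child-{i}", queue, results))
        for i in range(child_count)
    ]
    for n in numbers:
        queue.put_nowait(n)
    await queue.join()  # resumes once every item has been marked done
    for task in children:
        task.cancel()   # the children loop forever, so stop them explicitly
    await asyncio.gather(*children, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(parent([1, 2, 3, 4, 5])))
```

The parent never talks to a specific child. It only fills the queue and waits on join(), which keeps the coordination logic in one place.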

Here are a few tips to keep in mind while working with queues and terminals in asynchronous Python programming:

  • Use asyncio.Queue() to create a queue suitable for async tasks; it offers functionality similar to a standard Python queue.
  • For managing timeouts, remember to use the asyncio.wait_for() function in conjunction with queue operations, since the methods of asyncio queues don’t have a built-in timeout parameter.
  • When working with terminals, be mindful of potential concurrency issues. Make sure you avoid race conditions by properly synchronizing your async tasks’ execution using queues, locks, and other synchronization primitives provided by the asyncio module.
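
The timeout tip can be sketched as below. The helper name get_with_timeout and the choice to return None on timeout are assumptions made for this example:

```python
import asyncio


async def get_with_timeout(queue, timeout):
    # Queue.get() has no timeout parameter, so wrap it in wait_for().
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout)
    except asyncio.TimeoutError:
        return None  # convention of this sketch: None means "timed out"


async def main():
    queue = asyncio.Queue()
    missing = await get_with_timeout(queue, 0.05)  # queue is empty
    queue.put_nowait("job")
    found = await get_with_timeout(queue, 0.05)    # item is available
    return missing, found


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If None is a legitimate queue item in your program, let the TimeoutError propagate or return a dedicated marker object instead.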

Frequently Asked Questions

How does asyncio compare to threading in Python?

Asyncio is a concurrency model that uses a single thread and an event loop to interleave tasks. Threading, by contrast, runs tasks on multiple OS threads that the operating system switches between preemptively. For I/O-bound work with many simultaneous connections, asyncio is often preferred: tasks are cheaper than threads, and switching happens only at await points, which avoids thread overhead and many locking problems. Threads remain the better fit when you must call blocking libraries that have no async equivalent.

What are the main components of the asyncio event loop?

The asyncio event loop is responsible for managing asynchronous tasks in Python. Its main responsibilities include:

  1. Scheduling tasks: The event loop accepts coroutines (wrapped in tasks) and plain callbacks and queues them for execution.
  2. Managing I/O operations: The event loop monitors I/O operations and receives notifications when the operations are complete.
  3. Executing asynchronous tasks: The event loop runs each scheduled task until its next await, then switches to another ready task, so no single waiting task blocks the rest.
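
These responsibilities can be observed in a small sketch that schedules one plain callback and two tasks on the running loop and records what happens. The worker coroutine and the log list are illustrative:

```python
import asyncio


async def worker(name, log):
    log.append(f"{name} started")
    await asyncio.sleep(0.01)  # hands control back to the loop while waiting
    log.append(f"{name} finished")


async def main():
    log = []
    loop = asyncio.get_running_loop()
    # Scheduling a plain callback on the loop
    loop.call_soon(log.append, "callback ran")
    # Scheduling coroutines as tasks
    tasks = [asyncio.create_task(worker(n, log)) for n in ("t1", "t2")]
    # While main() waits here, the loop runs the callback and both tasks.
    await asyncio.gather(*tasks)
    return log


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Both workers start before either one finishes, which shows the loop switching between tasks at their await points.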

How do I use asyncio with pip?

To use asyncio in your Python projects, no additional installation is needed, as it is included in the Python Standard Library from Python version 3.4 onwards. Simply import asyncio in your Python code and make use of its features.

What is the difference between asyncio.run() and run_until_complete()?

asyncio.run(), added in Python 3.7, is the newer and more convenient way to run a coroutine until it completes. It creates an event loop, runs the passed coroutine, and closes the event loop when the coroutine is finished. run_until_complete() is an older, lower-level method of the event loop object, so you must create or obtain a loop yourself before calling it and close it afterwards.

Here’s an example of how to use asyncio.run():

import asyncio

async def example_coroutine():
    await asyncio.sleep(1)
    print("Coroutine has completed")

asyncio.run(example_coroutine())
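
For comparison, the older run_until_complete() style could look roughly like the sketch below. The two wrapper functions are hypothetical helpers, and the coroutine returns its message instead of printing it so the results are easy to compare:

```python
import asyncio


async def example_coroutine():
    await asyncio.sleep(0.01)
    return "Coroutine has completed"


def run_with_explicit_loop():
    # Older style: create the loop yourself and close it when done.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(example_coroutine())
    finally:
        loop.close()


def run_with_asyncio_run():
    # Newer style: asyncio.run() handles loop creation and cleanup.
    return asyncio.run(example_coroutine())


if __name__ == "__main__":
    print(run_with_explicit_loop())
    print(run_with_asyncio_run())
```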

How can I resolve the ‘asyncio.run() cannot be called from a running event loop’ error?

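This error occurs when you try to call asyncio.run() inside an already running event loop, for example from within a coroutine or in a Jupyter notebook cell. Instead of calling asyncio.run() there, await the coroutine directly, or use asyncio.create_task() or asyncio.gather() to schedule coroutines to run concurrently within the existing loop. Reserve asyncio.run() for the single entry point of the program: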

import asyncio

async def example_coroutine():
    await asyncio.sleep(1)
    print("Coroutine has completed")

async def main():
    task = asyncio.create_task(example_coroutine())
    await task

asyncio.run(main())
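
To see the error and the fix side by side, the following sketch deliberately triggers the RuntimeError inside a running loop and then awaits the coroutine instead. The inner coroutine is a made-up placeholder:

```python
import asyncio


async def inner():
    return "done"


async def main():
    error = None
    coro = inner()
    try:
        # Wrong: a loop is already running, because we are inside main().
        asyncio.run(coro)
    except RuntimeError as exc:
        error = str(exc)
        coro.close()  # discard the unused coroutine to avoid a warning
    # Right: await the coroutine within the existing loop.
    result = await inner()
    return error, result


if __name__ == "__main__":
    print(asyncio.run(main()))
```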

Can you provide an example of using async/await in Python?

Here’s a simple example demonstrating the use of async/await in Python:

import asyncio

async def async_function():
    print("Function starting")
    await asyncio.sleep(2)
    print("Function completed")

async def main():
    await asyncio.gather(async_function(), async_function())

asyncio.run(main())

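This example runs two calls to async_function() concurrently. The main() function uses asyncio.gather() to run both at the same time, and asyncio.run(main()) starts the event loop to execute them. Both "Function starting" lines print immediately and both "Function completed" lines follow about two seconds later, so the whole run takes roughly two seconds instead of four.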


The post Python Async IO – The Ultimate Guide in a Single Post appeared first on Be on the Right Side of Change.