
Disruptive Innovation – A Friendly Guide for Small Coding Startups


Disruptive innovation, a concept introduced in 1995, has become a wildly popular explanation of innovation-driven growth.

The Disruptive Innovation Model

Clayton Christensen’s “Disruptive Innovation Model” refers to a theory that explains how smaller companies can successfully challenge established incumbent businesses. Here’s a detailed breakdown:

📈 Disruptive Innovation refers to a new technology, process, or business model that disrupts an existing market. Disruptive innovations often start as simpler, cheaper, and lower-quality solutions compared to existing offerings. They often target an underserved or new market segment. They often create a different value network within the market. However, truly disruptive innovation companies improve over time and eventually displace existing market participants.

In fact, there are two general types of disruptive innovation models:

  • Low-End Disruption: Targets the least profitable customers who are typically overserved by the incumbent’s existing offering.
  • New-Market Disruption: Targets customers with needs previously unserved by existing incumbents. You may have heard of the “blue ocean strategy”.

Low-end disruption is exemplified by Southwest Airlines and BIC Disposable Razors. Southwest Airlines disrupted the aviation industry by focusing on providing basic, reliable, and cost-effective air travel, appealing to price-sensitive customers and those who might opt for alternative transportation. BIC, on the other hand, introduced affordable disposable razors, offering a satisfactory solution for customers unwilling to pay a premium for high-end razors, thereby securing a substantial market share.

In terms of new-market disruption, Tesla Motors and Coursera stand out. Tesla targeted environmentally conscious consumers, offering electric vehicles that didn’t compromise on performance or luxury, creating a new market for high-performance electric vehicles and prompting other manufacturers to expedite their EV programs. After introducing the high-end luxury cars, Tesla subsequently moved down market and even announced in the “Master Plan Part 3” that they plan to release a $25k electric car. Coursera disrupted the traditional educational model by providing online courses from renowned universities to a global audience, creating a new market for online education.

The Blue Ocean Strategy, which is somewhat related to new-market disruption, emphasizes innovating and creating new demand in unexplored market areas, or “Blue Oceans”, instead of competing in saturated markets, or “Red Oceans”. An example of this strategy is the Nintendo Wii, which carved out a new market space by targeting casual gamers with simpler, family-friendly games and innovative controllers, thereby reaching an entirely new demographic of consumers and avoiding direct competition with powerful gaming consoles like Xbox and PlayStation.

The disruptive innovation process often plays out like so:

  • Introduction: The innovation is introduced, often with skepticism from established players.
  • Evolution: The innovation evolves and improves, gradually becoming more appealing to a wider customer base.
  • Disruption: The innovation becomes good enough to meet the needs of most customers, disrupting the status quo.
  • Domination: The innovators often come to dominate the market, replacing the previous incumbents.

Technological advancements typically undergo an S-curve progression, as seen with smartphones, which experienced slow initial adoption, followed by rapid uptake, and eventually, market saturation.
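For readers who think in code, the S-curve can be sketched with a logistic function, which is one common way to model this shape. The function name and the `midpoint` and `steepness` parameters below are illustrative assumptions, not values taken from any real adoption data.

```python
import math


def adoption_share(t, midpoint=5.0, steepness=1.0):
    """Logistic curve: fraction of the market that has adopted by time t."""
    return 1.0 / (1.0 + math.exp(-steepness * (t - midpoint)))


# Slow start, rapid uptake around the midpoint, then saturation near 1.0
for year in range(0, 11, 2):
    print(year, round(adoption_share(year), 3))
```

The curve reaches exactly one half of the market at the midpoint and flattens out on both sides, which mirrors the slow start and the eventual saturation described above.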

Companies often align innovations with their existing value networks, ensuring new products resonate with their established customer base, like how Apple’s product ecosystem is meticulously designed to ensure customer retention and continuous engagement.

The implications of disruptive innovation are profound, with established companies, such as Kodak, often facing dilemmas and organizational inertia in adopting new technologies due to a deep-rooted focus on existing offerings and customer bases.

To navigate through disruptive waters, incumbents might employ strategies like establishing separate units dedicated to innovation, akin to how Google operates Alphabet to explore varied ventures, adopting agile methodologies for nimble operations, and maintaining a relentless focus on evolving customer needs to stay relevant and competitive in the market.

📈🧑‍💻 Here’s my personal key take-away (not financial advice):

It is tough to create a huge disruptive startup. It is easy to disrupt a tiny niche.

A great strategy that I found extremely profitable is to focus on a tiny niche within your career, keep optimizing daily, and invest your income in star businesses, i.e., disruptive innovation companies in high-growth markets (>10% per year) that are also market leaders.

Only invest in companies or opportunities that are both in a high-growth market and the leader of that market.

Bitcoin, for example, is the leader of a high-growth market (=digital store of value). Tesla, another example, is the leader of a high-growth market (=autonomous electric vehicles).

A Short Primer on the Star Principle — And How It’ll Make You Rich

The Star Principle, articulated by Richard Koch, underscores the potency of investing in or creating a ‘star venture’ to amass wealth and success in business.

A star venture is characterized by two pivotal attributes: (1) it is the leader in its market niche and (2) that niche is expanding rapidly.

The allure of a star business emanates from its ability to combine niche leadership with high niche growth, enabling it to potentially command price premiums, lower costs, and subsequently, attain higher profits and cash flow.

The principle asserts that positioning is the key to success, provided that the positioning is truly exceptional and the venture is a star business. However, it’s imperative to note that star ventures are not devoid of risks; the primary pitfall being the loss of leadership within its niche, which can drastically diminish its value.

While star ventures are relatively rare, with perhaps one in twenty startups being a star, they are not so scarce that they cannot be discovered or created with thoughtful consideration and patience.

The principle emphasizes that whether you are an employee, an aspiring venture leader, or an investor, aligning yourself with a star venture can pave the way to a prosperous and enriched life.

Here’s a list of 20 example star businesses from the past (some are still stars ⭐):

  1. Apple: Dominates various tech niches, offering premium products that command higher prices.
  2. Amazon: A leader in e-commerce and cloud computing, consistently expanding into new markets.
  3. Google (Alphabet): Dominates the search engine market and has successful ventures like YouTube.
  4. Facebook (Meta): Leads in social media through platforms like Facebook, Instagram, and WhatsApp.
  5. Microsoft: A leader in software, cloud services, and hardware, with a vast, growing ecosystem.
  6. Tesla: Revolutionizing the electric vehicle market and autonomous technologies. The bot!
  7. Netflix: A dominant player in the streaming service industry, with a massive global subscriber base.
  8. Alibaba: A leader in e-commerce, cloud computing, and various other sectors in China and globally.
  9. Shopify: A giant in the e-commerce platform space, enabling myriad online stores globally.
  10. Zoom: Became essential for virtual communication, especially during the pandemic, and continues to grow.
  11. Spotify: Leading the music streaming industry with a vast library and substantial subscriber base.
  12. PayPal: A major player in the digital payments space, facilitating global e-commerce.
  13. Adobe: Dominates several software niches, including graphic design and document management.
  14. Salesforce: Leads in customer relationship management (CRM) software and platform technology.
  15. NVIDIA: A dominant force in GPUs, expanding into AI, machine learning, and autonomous vehicles.
  16. Airbnb: Revolutionized the hospitality industry, becoming a go-to platform for home-sharing.
  17. Square: Innovating in the financial and mobile payment sectors, providing solutions for small businesses.
  18. Uber: Despite controversies, it remains a significant player in ride-hailing and has expanded into food delivery.
  19. Tencent: A conglomerate leader in various sectors, including social media, gaming, and fintech, particularly in China.
  20. Samsung: A leader in various tech niches, including smartphones, semiconductors, and consumer electronics.

These businesses have demonstrated leadership in their respective niches and have experienced significant growth, aligning with the Star Principle’s criteria of operating in high-growth markets and being a leader in those markets.

Let’s dive into some practical strategies you can use as a small coding business owner to become more innovative, possibly disruptive in a step-by-step manner:

9-Step Guide to Leverage the Disruptive Innovation Model for a Small Coding Business

Imagine embarking on a journey to create a startup named “ChatHealer,” an online platform that uses Large Language Models (LLMs) and the OpenAI API to provide instant, empathetic, and anonymous conversational support for individuals experiencing stress or emotional challenges.

Step 1: Identify Underserved Needs

In the initial phase, identifying underserved needs is crucial. Thorough market research might reveal a gap: immediate, non-clinical emotional support on a highly accessible and non-judgmental platform.

Step 2: Define Your Value Proposition

⭐ The unique value proposition of ChatHealer would be its ability to offer instant, 24/7 emotional support through intelligent and empathetic conversational agents, ensuring user anonymity and privacy.

Step 3: Develop a Minimum Viable Product (MVP) to Validate and Iterate

The development of a Minimum Viable Product (MVP) would involve creating a basic version of ChatHealer, focusing on core functionalities like user authentication, basic conversational abilities, and ensuring data security. The MVP would be introduced to a select group of users, and their feedback would be paramount in validating and iterating the product, ensuring it aligns with user expectations and experiences.

💡 Recommended: Minimum Viable Product (MVP) in Software Development – Why Stealth Sucks

Step 4: Utilize LLMs and AI to Scale Labor and Find a Business Model

Leveraging LLMs and AI, ChatHealer could enhance its conversational agents to understand and respond to user inputs more empathetically and contextually, providing a semblance of genuine human interaction.

📈 The business model might adopt a freemium approach, offering basic conversational support for free while providing a premium subscription that includes additional features like personalized emotional support journeys, and perhaps, priority access to human professionals.

Step 5: Focus on Customer Experience and Scale Gradually

Ensuring a seamless and supportive customer experience would be pivotal, as the nature of ChatHealer demands a safe and nurturing environment. As the platform gains traction, gradual scaling would involve introducing ChatHealer to wider demographics and possibly integrating multilingual support to cater to a global audience.

Step 6: Continuous Improvements

Continuous improvement would be embedded in ChatHealer’s operations, ensuring that the platform evolves with technological advancements and user needs. Building partnerships, perhaps with mental health professionals and organizations, could enhance its credibility and provide a pathway for users to access further support if needed.

Step 7: Manage Finances Wisely

Prudent financial management would ensure that funds are judiciously utilized, maintaining a balance between technological development, marketing, and operations. Cultivating a culture of innovation within the team ensures that ChatHealer remains at the forefront of technological and therapeutic advancements, always exploring new ways to provide support to its users.

📈 Recommended: The Math of Becoming a Millionaire in 13 Years

Step 8: Adaptability and Compliance

Adaptability would be key, as ChatHealer would need to be ready to pivot its strategies and offerings in response to user needs, technological advancements, and market trends. Ensuring that all operations, especially data handling and user interactions, adhere to legal and compliance standards would be paramount to maintain user trust and regulatory adherence.

Step 9: Measure and Analyze Throughout the Process

Lastly, employing analytics to measure and analyze user engagement, subscription conversions, and user feedback would be instrumental in shaping ChatHealer’s future strategies and innovations, ensuring that it not only remains a disruptive innovation but also a sustained, valuable service in the emotional support domain.

Case Study: Is Uber a Disruptive Innovation?

In this section, we will explore whether Uber is a disruptive innovation by examining its origins and how its quality compares to the mainstream market expectations.

Disruptive Innovations Start with Low-End or New-Market Footholds

Disruptive innovations typically begin in low-end or new-market footholds, as incumbents often focus on their most profitable and demanding customers. This focus can lead to less attention being paid to less-demanding customers, allowing disruptors to introduce products that cater to these neglected market segments.

However, Uber did not originate with either a low-end or new-market foothold. It did not start by targeting non-consumers or finding a low-end opportunity. Instead, Uber was launched in San Francisco, which already had a well-established taxi market. Its primary customers were individuals who already had the habit of hiring rides. Therefore, Uber did not follow the typical pattern of disruptive innovations that begin with low-end or new-market footholds.

Quality Must Align with Mainstream Expectations in Disruptive Innovations

Disruptive innovations are initially perceived as inferior in comparison to the offerings by established companies. Mainstream customers are hesitant to adopt these new, typically cheaper, alternatives until their quality satisfies their expectations.

In the case of Uber, most elements of its strategy appear to be sustaining innovations. Its service is often regarded as equal or superior to existing taxi services, with convenient booking, cashless payments, and a passenger rating system. Additionally, Uber generally offers competitive pricing and reliable service. In response to Uber, established taxi companies have implemented similar technologies and challenged the legality of some of Uber’s offerings.

Based on these factors, Uber cannot be considered a true disruptive innovation. While it has certainly impacted the taxi market and incited changes among traditional taxi companies, it did not originate from classic low-end or new-market footholds, and its service quality aligns with mainstream expectations rather than being perceived as initially inferior.

Frequently Asked Questions

What makes disruptive innovation different from regular innovations?

Disruptive innovation refers to a process where a smaller company with fewer resources challenges established businesses by entering at the bottom of the market and moving up-market. This is different from traditional or incremental innovations, which usually improve existing products or services for existing customers.

Can you give some examples of disruptive innovation in the healthcare sector?

Some examples of disruptive innovation in healthcare include:

  • Telemedicine: Remote consultations through video calls, making healthcare services more accessible.
  • Wearable health technology: Wearable devices that monitor and track health data, empowering individuals to take control of their health.
  • Electronic health records (EHR): Digitizing patient records for more efficient and secure management of information.

Which companies have successfully implemented disruptive innovation?

Some well-known companies that are commonly described as disruptors include the following (as the Uber case study above shows, not all of them meet Christensen’s strict definition):

  • Netflix (transforming the way we consume video content)
  • Uber (redefining transportation services)
  • Airbnb (disrupting the hospitality industry)
  • Slack (changing team communication and collaboration)

Could you share some low-end disruptive innovation examples?

Low-end disruption refers to innovations targeting customers who are not well-served by the incumbent companies due to high prices or complex products. Examples include:

  • IKEA (providing affordable and stylish furniture)
  • Southwest Airlines (offering low-cost air travel)
  • Xiaomi (manufacturing and selling high-quality smartphones at affordable prices)

What is the process for introducing disruptive innovations?

Launching disruptive innovations typically involves the following steps:

  1. Identify an underserved market segment or new niche.
  2. Develop a cost-effective, simple, and efficient solution targeting this segment.
  3. Iterate and improve the product or service offering as you learn more about customers and the market.
  4. Gradually move up-market, improving the product or service as it gains traction and market share.

Can you provide examples of new market disruptions?

New market disruptions typically create entirely new markets that did not exist before. Examples include:

  • E-commerce platforms like Amazon (creating a massive online marketplace)
  • Social media platforms like Facebook (connecting people worldwide and creating an advertising market)
  • Streaming music services like Spotify (transforming how individuals listen to music and generating revenue through subscriptions and ads)

If you want to keep learning about disruptive technologies, why not become an expert prompt engineer with our Finxter Academy courses (all-you-can-learn)?

The post Disruptive Innovation – A Friendly Guide for Small Coding Startups appeared first on Be on the Right Side of Change.


5 Expert-Approved Ways to Remove Unicode Characters from a Python Dict


The best way to remove Unicode characters from a Python dictionary is a recursive function that iterates over each key and value, checking their type.

✅ If a value is a dictionary, the function calls itself.
✅ If a value is a string, it’s encoded to ASCII, ignoring non-ASCII characters, and then decoded back to a string, effectively removing any non-ASCII characters.

This ensures a thorough cleansing of the entire dictionary.

Here’s a minimal example for copy and paste:

def remove_unicode(obj):
    if isinstance(obj, dict):
        return {remove_unicode(key): remove_unicode(value) for key, value in obj.items()}
    elif isinstance(obj, str):
        return obj.encode('ascii', 'ignore').decode('ascii')
    return obj

# Example usage
my_dict = {'key': 'valüe', 'këy2': {'kêy3': 'vàlue3'}}
cleaned_dict = remove_unicode(my_dict)
print(cleaned_dict)

In this example, remove_unicode is a recursive function that traverses the dictionary. If it encounters a dictionary, it recursively cleans each key-value pair. If it encounters a string, it encodes the string to ASCII, ignoring non-ASCII characters, and then decodes it back to a string. The example usage shows a nested dictionary with Unicode characters, which are removed in the cleaned_dict.
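Note that this function only descends into nested dictionaries, so strings stored inside lists or tuples pass through unchanged. Below is a minimal sketch of one way to extend the idea, assuming you also want those containers cleaned. The name `remove_non_ascii` and the sample data are hypothetical and not part of the original example.

```python
def remove_non_ascii(obj):
    """Recursively drop non-ASCII characters from strings in nested containers."""
    if isinstance(obj, dict):
        return {remove_non_ascii(k): remove_non_ascii(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Rebuild the container with the same type (list stays list, tuple stays tuple)
        return type(obj)(remove_non_ascii(item) for item in obj)
    if isinstance(obj, str):
        return obj.encode("ascii", "ignore").decode("ascii")
    return obj  # numbers, None, booleans, etc. are returned unchanged


data = {"náme": "François", "tags": ["café", 42, ("naïve",)]}
print(remove_non_ascii(data))
# {'nme': 'Franois', 'tags': ['caf', 42, ('nave',)]}
```

The output also shows the main drawback of this technique: accented letters are deleted rather than converted to their plain counterparts.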


Understanding Unicode and Dictionaries in Python

You may come across dictionaries containing Unicode values. These Unicode values can be a hurdle when using the data in specific formats or applications, such as JSON editors. To overcome these challenges, you can use various methods to remove the Unicode characters from your dictionaries.

One popular method to remove such characters from a dictionary is to call encode() on the keys and values with the ASCII codec and errors='ignore', and then decode() the result back to a string. Note that encoding to UTF-8 would not remove anything, because UTF-8 can represent every Unicode character. The 'u' prefix you may see in older code (for example u'text') marks a Unicode string literal in Python 2; it is not part of the data, and Python 3 does not display it because every str is Unicode. Similarly, you can use external libraries, like Unidecode, that provide functions to transliterate Unicode strings into the closest possible ASCII representation.

💡 Recap: Python dictionaries are a flexible data structure that allows you to store key-value pairs. They enable you to organize and access your data more efficiently. A dictionary can hold a variety of data types, including Unicode strings. Unicode is a widely-used character encoding standard that includes a huge range of characters from different scripts and languages.

When working with dictionaries in Python, you might encounter Unicode strings as keys or values. For example, a dictionary might have keys or values in various languages or contain special characters like emojis (🙈🙉🙊). This diversity is because Python supports Unicode characters to allow for broader text representation and internationalization.

To create a dictionary containing Unicode strings, you simply define key-value pairs with the appropriate Unicode characters. In some cases, you might also have nested dictionaries, where a dictionary’s value is another dictionary. Nested dictionaries can also contain Unicode strings as keys or values.

Consider the following example:

my_dictionary = {
    "name": "François",
    "languages": {
        "primary": "Français",
        "secondary": "English"
    },
    "hobbies": ["music", "فنون-القتال"]
}

In this example, the dictionary represents a person’s information, including their name, languages, and hobbies. Notice that both the name and primary language contain Unicode characters, and one of the items in the hobbies list is also represented using Unicode characters.

When working with dictionary data that contains Unicode characters, you might need to remove or replace these characters for various purposes, such as preprocessing text for machine learning applications or ensuring compatibility with ASCII-only systems. Several methods can help you achieve this, such as using Python’s built-in encode() and decode() methods or leveraging third-party libraries like Unidecode.
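To make the difference between removing and replacing concrete, here is a small sketch using the `errors` argument of `str.encode()`. The sample string is only an illustration.

```python
text = "Français"

# errors="ignore" silently drops characters that ASCII cannot represent
removed = text.encode("ascii", "ignore").decode("ascii")

# errors="replace" substitutes a "?" for each such character instead
replaced = text.encode("ascii", "replace").decode("ascii")

print(removed)   # Franais
print(replaced)  # Fran?ais
```

Replacing keeps the string length and makes the loss visible, which can be helpful when you later need to spot where data was altered.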

Now that you have a better understanding of Unicode and dictionaries in Python, you can confidently work with dictionary data containing Unicode characters and apply appropriate techniques to remove or replace them when necessary.

Challenges with Unicode in Dictionaries

Your data may contain special characters from different languages. These characters can lead to display, sorting, and searching problems, especially when your goal is to process the data in a way that is language-agnostic.

One of the main challenges with Unicode characters in dictionaries is that they can cause compatibility issues when interacting with certain libraries, APIs, or external tools. For instance, JSON editors may struggle to handle Unicode properly, potentially resulting in malformed data. Additionally, some libraries may not be specifically designed to handle Unicode, and even certain text editors may not display these characters correctly.

💡 Note: Another source of confusion is the 'u' prefix. In Python 2, the printed representation of a unicode string starts with u (for example u'text'), and calling .encode() or .decode() does not always make it disappear from the output. The prefix only marks the type of the string; it is not a character in your data. In Python 3, every str is Unicode, the prefix is never displayed, and .encode() returns a bytes object (shown with a b prefix) rather than a cleaned string.
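A quick way to see how prefixes behave is to inspect the values in a Python 3 interpreter. The following sketch assumes Python 3; the sample strings are arbitrary.

```python
s = u"Apple"          # the u prefix is still accepted in Python 3 source code
print(repr(s))        # 'Apple'  (no prefix in the representation)
print(s == "Apple")   # True: both literals produce the same str

encoded = "valüe".encode("utf-8")
print(type(encoded).__name__)  # bytes
print(encoded)                 # b'val\xc3\xbce'
```

In other words, a prefix tells you which type you are looking at. It is never something you need to strip from the data itself.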

To address these challenges, various methods can be employed to remove Unicode characters from dictionaries:

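  1. Method 1: You could serialize your dictionary to a JSON string with the help of the json library. With ensure_ascii=True (the default), every non-ASCII character is written as a \uXXXX escape, so the JSON text itself is pure ASCII. Loading that text back with json.loads() restores the original characters, so this approach makes the serialized data ASCII-safe rather than deleting anything.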
  2. Method 2: Alternatively, you can use a library like unidecode to convert Unicode to ASCII characters, which can be helpful in cases where you need to interact with systems or APIs that only accept ASCII text.
  3. Method 3: Another option is to use list or dict comprehensions to iterate over your data and apply the .encode() and .decode() methods, effectively stripping the unicode characters from your dictionary.

Below are minimal code snippets for each of the three approaches:

Method 1: Using JSON Library

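import json

my_dict = {'key': 'valüe'}

# Serialize to an ASCII-only JSON string: ü becomes the escape \u00fc
json_string = json.dumps(my_dict, ensure_ascii=True)
print(json_string)

# Loading the string back restores the original character
restored_dict = json.loads(json_string)
print(restored_dict)

In this example, the dictionary is serialized to a JSON string with ASCII-only output, so the ü is written as the escape sequence \u00fc. Converting the string back with json.loads() returns the original dictionary, including the ü, so the round trip by itself does not remove any characters.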

Method 2: Using Unidecode Library

from unidecode import unidecode

my_dict = {'key': 'valüe'}
# Use unidecode to convert Unicode to ASCII
cleaned_dict = {k: unidecode(v) for k, v in my_dict.items()}
print(cleaned_dict)

Here, the unidecode library is used to convert each Unicode string value to ASCII, iterating over the dictionary with a dict comprehension.

Method 3: Using List or Dict Comprehensions

my_dict = {'key': 'valüe'}
# Use .encode() and .decode() to remove Unicode characters
cleaned_dict = {k.encode('ascii', 'ignore').decode(): v.encode('ascii', 'ignore').decode() for k, v in my_dict.items()}
print(cleaned_dict)

In this example, a dict comprehension is used to iterate over the dictionary. The .encode() and .decode() methods are applied to each key and value to strip Unicode characters.

💡 Recommended: Python Dictionary Comprehension: A Powerful One-Liner Tutorial

Fundamentals of Removing Unicode

When working with dictionaries in Python, you may sometimes encounter Unicode characters that need to be removed. In this section, you’ll learn the fundamentals of removing Unicode characters from dictionaries using various techniques.

Firstly, it’s important to understand that Unicode characters can be present in both keys and values of a dictionary. A common scenario that may require you to remove Unicode characters is when you need to convert your dictionary into a JSON object.

One of the simplest ways to remove Unicode characters is by using the str.encode() and str.decode() methods. You can loop through the dictionary, and for each key-value pair, apply these methods to remove any unwanted Unicode characters:

old_dict = {'këy': 'valüe', 'count': 3}  # hypothetical example input

new_dict = {}
for key, value in old_dict.items():
    new_key = key.encode('ascii', 'ignore').decode('ascii')
    if isinstance(value, str):
        new_value = value.encode('ascii', 'ignore').decode('ascii')
    else:
        new_value = value
    new_dict[new_key] = new_value

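Another useful building block for cleaning strings is the isalnum() method. Be aware that isalnum() also returns True for non-ASCII letters and digits such as ü, so on its own it only strips punctuation and symbols (including emojis). Combining it with isascii() (available since Python 3.7) keeps only ASCII letters, digits, and whitespace:

def clean_unicode(string):
    # Keep ASCII characters that are alphanumeric or whitespace; drop everything else
    return "".join(c for c in string if c.isascii() and (c.isalnum() or c.isspace()))

new_dict = {}
for key, value in old_dict.items():
    new_key = clean_unicode(key)
    if isinstance(value, str):
        new_value = clean_unicode(value)
    else:
        new_value = value
    new_dict[new_key] = new_value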

As you can see, removing Unicode characters from a dictionary in Python can be achieved using these techniques.
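One pitfall when cleaning keys is that two different keys can become identical once their non-ASCII characters are gone, in which case one entry silently overwrites the other. The sketch below shows one possible safeguard. The helper names `strip_non_ascii` and `clean_keys_checked` are hypothetical.

```python
def strip_non_ascii(text):
    return text.encode("ascii", "ignore").decode("ascii")


def clean_keys_checked(d):
    """Clean string keys, raising if two keys become identical after cleaning."""
    cleaned = {}
    for key, value in d.items():
        new_key = strip_non_ascii(key) if isinstance(key, str) else key
        if new_key in cleaned:
            raise ValueError(f"key collision after cleaning: {new_key!r}")
        cleaned[new_key] = value
    return cleaned


print(clean_keys_checked({"café": 1, "tea": 2}))  # {'caf': 1, 'tea': 2}

try:
    clean_keys_checked({"naïve": 1, "nave": 2})
except ValueError as exc:
    print(exc)  # key collision after cleaning: 'nave'
```

Whether to raise, merge, or rename on a collision depends on your data. Raising is simply the most conservative choice for a sketch.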

Using JSON and Ast for Unicode Removal

The json and ast modules from Python’s standard library can be combined to get rid of the 'u' prefixes that Python 2 displays for Unicode strings in a dictionary. The ast module offers literal_eval(), which safely evaluates a string containing a Python literal such as a dictionary, list, or string. In this section, you will follow a step-by-step guide to using these tools.

First, you need to import the necessary libraries. In your Python script, add the following lines to import json and ast:

import json
import ast

The next step is to define your dictionary containing Unicode strings. Let’s use the following example dictionary:

my_dict = {u'Apple': [u'A', u'B'], u'orange': [u'C', u'D']}

Now, you can utilize the json.dumps() function and ast.literal_eval() for the Unicode removal process. The json.dumps() function converts the dictionary into a JSON-formatted string. JSON has no 'u' prefix, so the prefixes that Python 2 shows for the keys and values do not appear in this string; any non-ASCII characters are written as \uXXXX escapes rather than removed. After that, you can employ ast.literal_eval() to parse the string back into a Python dictionary. This works here because the JSON text of this dictionary is also a valid Python literal.

Here’s how to perform these steps:

json_string = json.dumps(my_dict)
cleaned_dict = ast.literal_eval(json_string)

After executing these lines, you will obtain a new dictionary called cleaned_dict without the 'u' prefixes. Simply put, it should look like this:

{'Apple': ['A', 'B'], 'orange': ['C', 'D']}

By using the json and ast modules, you can remove the 'u' prefixes from dictionaries that were created under Python 2. In Python 3, the prefix is not displayed in the first place, and non-ASCII characters in the data survive this round trip unchanged.
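One caveat is worth a short sketch: JSON and Python literals are not the same language, so ast.literal_eval() cannot parse every JSON string, while json.loads() can. The sample data below is made up for illustration.

```python
import ast
import json

data = {"name": "Zoë", "active": True, "nickname": None}
json_string = json.dumps(data)
print(json_string)  # {"name": "Zo\u00eb", "active": true, "nickname": null}

# json.loads understands JSON's true/false/null and restores the original data
print(json.loads(json_string) == data)  # True

# ast.literal_eval expects Python literals, so JSON's true and null are rejected
try:
    ast.literal_eval(json_string)
except ValueError as exc:
    print("literal_eval failed:", type(exc).__name__)
```

If you only need to turn a JSON string back into a dictionary, json.loads() is the more robust choice.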

Replacing Unicode Characters with Empty String

When working with dictionaries in Python, you might come across cases where you need to remove Unicode characters. One efficient way to do this is by replacing Unicode characters with empty strings.

To achieve this, you can make use of the encode() and decode() string methods available in Python. First, you need to loop through your dictionary and access the strings. Here’s how you can do it:

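cleaned_dict = {}
for key, value in your_dict.items():
    cleaned_key = key.encode("ascii", "ignore").decode()
    cleaned_value = value.encode("ascii", "ignore").decode()
    # Write to a new dictionary instead of changing your_dict during iteration
    cleaned_dict[cleaned_key] = cleaned_value

In this code snippet, the encode() method encodes the string as ASCII bytes with the error-handling mode set to 'ignore', which drops every character that ASCII cannot represent. The decode() method then converts those bytes back to a string, now without the non-ASCII characters. The results are stored in a new dictionary, cleaned_dict, because inserting cleaned keys into your_dict while looping over it can raise a RuntimeError and would leave the original keys in place.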

💡 Note: This method assumes your dictionary contains only string keys and values. If your dictionary has nested values, such as lists or other dictionaries, you’ll need to adjust the code to handle those cases as well.

If you want to perform this operation on a single string instead, you can do this:

cleaned_string = original_string.encode("ascii", "ignore").decode()
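As an alternative for single strings, a regular expression can replace every non-ASCII character with an empty string directly. This is a sketch of a different technique that gives the same result as the encode/decode one-liner. The pattern name `NON_ASCII` and the sample text are assumptions for illustration.

```python
import re

# Matches one or more characters outside the ASCII range (code points above 127)
NON_ASCII = re.compile(r"[^\x00-\x7F]+")

original_string = "Crème brûlée ☕"
cleaned_string = NON_ASCII.sub("", original_string)
print(repr(cleaned_string))  # 'Crme brle '
```

The regex variant is handy when you want a different replacement, for example a space or an underscore, instead of an empty string.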

Applying Encode and Decode Methods

When you need to remove Unicode characters from a dictionary, applying the encode() and decode() methods is a straightforward and effective approach. In Python, these built-in methods help you encode a string into a different character representation and decode byte strings back to Unicode strings.

To remove Unicode characters from a dictionary, you can iterate through its keys and values, applying the encode() and decode() methods. First, encode the Unicode string to ASCII, specifying the 'ignore' error handling mode. This mode omits any Unicode characters that do not have an ASCII representation. After encoding the string, decode it back to a regular string.

Here’s an example:

input_dict = {"𝕴𝖗𝖔𝖓𝖒𝖆𝖓": "𝖙𝖍𝖊 𝖍𝖊𝖗𝖔", "location": "𝕬𝖛𝖊𝖓𝖌𝖊𝖗𝖘 𝕿𝖔𝖜𝖊𝖗"}
output_dict = {}
for key, value in input_dict.items():
    encoded_key = key.encode("ascii", "ignore")
    decoded_key = encoded_key.decode()
    encoded_value = value.encode("ascii", "ignore")
    decoded_value = encoded_value.decode()
    output_dict[decoded_key] = decoded_value

In this example, the stylized letters are not ASCII characters, so the 'ignore' mode drops them entirely instead of converting them to plain letters. Only the ASCII parts of input_dict, such as the key "location" and the spaces between words, survive in output_dict:

{'': ' ', 'location': ' '}

Keep in mind that the encode() and decode() methods may not always produce an accurate representation of the original Unicode characters, especially when dealing with complex scripts or diacritic marks.

If you need to handle a wide range of Unicode characters and preserve their meaning in the output string, consider using libraries like Unidecode. This library can transliterate any Unicode string into the closest possible representation in ASCII text, providing better results in some cases.
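To make the difference concrete, here is a minimal sketch that uses only the standard library. It is not Unidecode: it swaps in unicodedata.normalize() with the NFKD form, which handles accents and stylized letters but, unlike Unidecode, does not transliterate scripts such as Chinese or Cyrillic. The helper name to_ascii is an assumption made for this illustration.

```python
import unicodedata

def to_ascii(text):
    # NFKD splits accented letters into base letter + combining mark and
    # maps compatibility characters (e.g. stylized math letters) to plain ones
    decomposed = unicodedata.normalize("NFKD", text)
    # Whatever still has no ASCII form (combining marks, emojis, ...) is dropped
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("François"))   # Francois
print(to_ascii("𝕴𝖗𝖔𝖓𝖒𝖆𝖓"))    # Ironman
# Plain encode/decode loses the letter instead of simplifying it
print("François".encode("ascii", "ignore").decode())   # Franois
```

Normalizing first keeps the readable base letters, which is usually what you want for names and labels.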

Utilizing JSON Dumps and Literal Eval

When dealing with dictionaries containing Unicode characters, you might want an efficient and user-friendly way to remove or bypass the characters. Two useful techniques for this purpose are using json.dumps from the json module and ast.literal_eval from the ast module.

To begin, import both the json and ast modules in your Python script:

import json
import ast

The json.dumps method is quite handy for converting dictionaries with Unicode values into strings. This method takes a dictionary and returns a JSON formatted string. For instance, if you have a dictionary containing Unicode characters, you can use json.dumps to obtain a string version of the dictionary:

original_dict = {"key": "value with unicode: \u201Cexample\u201D"}
json_string = json.dumps(original_dict, ensure_ascii=False)

The ensure_ascii=False parameter tells json.dumps to write non-ASCII characters as they are instead of escaping them as \uXXXX sequences, making the JSON string more human-readable.

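Next, you can use ast.literal_eval to evaluate the JSON string and convert it back to a dictionary. This works only while the JSON text is also a valid Python literal, so values such as true, false, or null will raise an error. Note that the round trip by itself keeps every character, so to drop non-ASCII characters you need to clean json_string (for example with encode() and decode()) before evaluating it: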

cleaned_dict = ast.literal_eval(json_string)

Keep in mind that ast.literal_eval is more secure than the traditional eval() function, as it only evaluates literals and doesn’t execute any arbitrary code.

By using json.dumps and ast.literal_eval in tandem, you can move between a dictionary and its human-readable text form, which is a convenient place to strip or replace unwanted characters before further processing and editing.
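As a hedged sketch of how such a round trip can actually strip characters, the example below serializes the dictionary with ensure_ascii=False, drops the non-ASCII characters from the text, and parses the result with json.loads(). That function is swapped in for ast.literal_eval here because it also accepts true, false, and null. The helper name strip_non_ascii is an assumption for illustration.

```python
import json

def strip_non_ascii(data):
    # Serialize with the real characters (no \uXXXX escapes) so they can be dropped
    text = json.dumps(data, ensure_ascii=False)
    # JSON's structural characters are all ASCII, so the text stays valid JSON
    cleaned_text = text.encode("ascii", "ignore").decode("ascii")
    return json.loads(cleaned_text)

original_dict = {"key": "value with unicode: \u201Cexample\u201D", "ok": True}
print(strip_non_ascii(original_dict))
# {'key': 'value with unicode: example', 'ok': True}
```

Because the whole structure is cleaned as text, nested lists and dictionaries are covered too. Keep in mind that keys differing only in non-ASCII characters would collide after cleaning.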

Managing Unicode in Nested Dictionaries

Dealing with Unicode characters in nested dictionaries can sometimes be challenging. However, you can efficiently manage this by following a few simple steps.

First and foremost, you need to identify any non-ASCII content within your nested dictionary (in Python 3 every string is Unicode, so the real question is which strings go beyond ASCII). If you’re working with large dictionaries, consider looping through each key-value pair and checking each string.
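One way to do that check is sketched below. It assumes Python 3.7 or later for str.isascii(); the function name find_non_ascii and the sample data are made up for this illustration.

```python
def find_non_ascii(data, path=()):
    """Yield (path, value) for every string that contains non-ASCII characters."""
    if isinstance(data, dict):
        for key, value in data.items():
            # Keys can carry non-ASCII characters too
            if isinstance(key, str) and not key.isascii():
                yield path + (key,), key
            yield from find_non_ascii(value, path + (key,))
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from find_non_ascii(item, path + (index,))
    elif isinstance(data, str) and not data.isascii():
        yield path, data

sample = {"name": "Zoë", "tags": ["plain", "café"], "meta": {"city": "Berlin"}}
hits = list(find_non_ascii(sample))
print(hits)
# [(('name',), 'Zoë'), (('tags', 1), 'café')]
```

Each hit carries the path of keys and list indexes that leads to the offending string, so you can decide case by case whether to strip or transliterate it.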

One approach to remove Unicode characters from nested dictionaries is to use the Unidecode library. This library transliterates any Unicode string into the closest possible ASCII representation. To use Unidecode, you’ll need to install it first:

pip install Unidecode

Now, you can begin working with the Unidecode library. Import the library and create a function to process each value in the dictionary. Here’s a sample function that handles nested dictionaries:

from unidecode import unidecode
def remove_unicode_from_dict(dictionary):
    new_dict = {}
    for key, value in dictionary.items():
        if isinstance(value, dict):
            new_value = remove_unicode_from_dict(value)
        elif isinstance(value, list):
            # Clean dictionaries and strings inside lists; leave other items as they are
            new_value = [
                remove_unicode_from_dict(item) if isinstance(item, dict)
                else unidecode(item) if isinstance(item, str)
                else item
                for item in value
            ]
        elif isinstance(value, str):
            new_value = unidecode(value)
        else:
            new_value = value
        new_dict[key] = new_value
    return new_dict

This function recursively iterates through the dictionary, transliterating string values to ASCII and maintaining the original structure. Keys are left unchanged. Use this function on your nested dictionary:

cleaned_dict = remove_unicode_from_dict(your_nested_dictionary)

Handling Special Cases with Regular Expressions

When working with dictionaries in Python, you may come across special characters or Unicode characters that need to be removed or replaced. Using the re module in Python, you can leverage the power of regular expressions to effectively handle such cases.

Let’s say you have a dictionary with keys and values containing various Unicode characters. One efficient way to remove them is by combining the re.sub() function and ord() function. First, import the required re module:

import re

To remove special characters, you can use the re.sub() function, which takes a pattern, replacement, and a string as arguments, and returns a new string with the specified pattern replaced:

string_with_special_chars = "𝓣𝓱𝓲𝓼 𝓲𝓼 𝓪 𝓽𝓮𝓼𝓽 𝓼𝓽𝓻𝓲𝓷𝓰."
clean_string = re.sub(r"[^\x00-\x7F]+", "", string_with_special_chars)

ord() is a useful built-in function that returns the Unicode code point of a given character. You can create a custom function utilizing ord() to check if a character is alphanumeric:

def is_alphanumeric(char):
    code_point = ord(char)
    return (code_point >= 48 and code_point <= 57) or (code_point >= 65 and code_point <= 90) or (code_point >= 97 and code_point <= 122)

Now you can use this custom function to clean up your dictionary, keeping only ASCII letters, digits, and whitespace:

def clean_dict_item(item):
    # Keep only ASCII letters, digits, and whitespace
    return "".join([char for char in item if is_alphanumeric(char) or char.isspace()])
original_dict = {"𝓽𝓮𝓼𝓽1": "𝓗𝓮𝓵𝓵𝓸 𝓦𝓸𝓻𝓵𝓭!", "𝓽𝓮𝓼𝓽2": "𝓘 𝓵𝓸𝓿𝓮 𝓟𝔂𝓽𝓱𝓸𝓷!"}
cleaned_dict = {clean_dict_item(key): clean_dict_item(value) for key, value in original_dict.items()}
print(cleaned_dict)
# {'1': ' ', '2': '  '}

Frequently Asked Questions

How can I eliminate non-ASCII characters from a Python dictionary?

To eliminate non-ASCII characters from a Python dictionary, you can use a dictionary comprehension with the str.encode() method and the ascii codec. With the "ignore" error handler, this drops every non-ASCII character from the values. Here’s an example:

original_dict = {"key": "value with non-ASCII character: ę"}
cleaned_dict = {k: v.encode("ascii", "ignore").decode() for k, v in original_dict.items()}

What is the best way to remove hex characters from a string in Python?

One efficient way to remove hex escape sequences (such as a literal \x00) from a string in Python is using the re (regex) module. You can create a pattern that matches a backslash, an x, and two hexadecimal digits, and replace each match with nothing. Here’s a short example:

import re
text = r"Hello \x00World!"  # raw string: the text contains a literal backslash sequence
clean_text = re.sub(r"\\x[0-9a-fA-F]{2}", "", text)

How to replace Unicode characters with ASCII in a Python dict?

To replace Unicode characters with their corresponding ASCII characters in a Python dictionary, you can use the unidecode library. Install it using pip install unidecode, and then use it like this:

from unidecode import unidecode
original_dict = {"key": "value with non-ASCII character: ę"}
ascii_dict = {k: unidecode(v) for k, v in original_dict.items()}

How can I filter out non-ascii characters in a dictionary?

To filter out non-ASCII characters in a Python dictionary, you can use a dictionary comprehension along with a generator expression to create new strings containing only ASCII characters.

original_dict = {"key": "value with non-ASCII character: ę"}
filtered_dict = {k: "".join(char for char in v if ord(char) < 128) for k, v in original_dict.items()}

What method should I use to remove ‘u’ from a list in Python?

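The ‘u’ prefix only shows up in Python 2, where it marks Unicode strings. If you want to remove it from a list of strings there, you can convert each element to a regular string using a list comprehension (this works as long as the strings contain only ASCII characters). In Python 3, every string is already Unicode and the prefix has no effect: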

unicode_list = [u"example1", u"example2"]
string_list = [str(element) for element in unicode_list]

How do I handle and remove special characters from a dictionary?

Handling and removing special characters from a dictionary can be accomplished using the re module to replace unwanted characters with an empty string or a suitable replacement. Here’s an example:

import re
original_dict = {"key": "value with special character: #!"}
cleaned_dict = {k: re.sub(r"[^A-Za-z0-9\s]+", "", v) for k, v in original_dict.items()}

This will remove any character that is not an alphanumeric character or whitespace from the dictionary values.


If you learned something new today, feel free to join my free email academy. We have cheat sheets too! ✅

The post 5 Expert-Approved Ways to Remove Unicode Characters from a Python Dict appeared first on Be on the Right Side of Change.

GPT-4 with Vision (GPT-4V) Is Out! 32 Fun Examples with Screenshots

💡 TLDR: GPT-4 with vision (GPT-4V) is now out for many ChatGPT Plus users in the US and some other regions! You can instruct GPT-4 to analyze image inputs. GPT-4V incorporates additional modalities such as image inputs into large language models (LLMs). Multimodal LLMs will expand the reach of AI from mainly language-based applications to a broad range of brand-new application categories that go beyond language user interfaces (UIs).

👆 GPT-4V could explain why a picture was funny by talking about different parts of the image and their connections. The meme in the picture has words on it, which GPT-4V read to help make its answer. However, it made an error. It wrongly said the fried chicken in the image was called “NVIDIA BURGER” instead of “GPU”.

Still impressive! 🤯 OpenAI’s GPT-4 with Vision (GPT-4V) represents a significant advancement in artificial intelligence, enabling the analysis of image inputs alongside text.

Let’s dive into some additional examples I and others encountered:

More Examples

Prompting GPT-4V with "How much money do I have?" and a photo of some foreign coins:

GPT-4V was even able to identify these as Polish zloty coins, a task most people would struggle with:

It can also identify locations from photos and give you information about plants you photograph. In this way, it’s similar to Google Lens but much better and more interactive with a higher level of image understanding.

It can do optical character recognition (OCR) almost flawlessly:

Now here’s why many teachers and professors will lose their sleep over GPT-4V: it can even solve math problems from photos (source):

GPT-4V can do object detection, a crucial field in AI and ML: one model to rule them all!

GPT-4V can even help you play poker ♠♥

A Twitter/X user gave it a screenshot of a day planner and asked it to code a digital UI of it. The Python code worked!

Speaking of coding, here’s a fun example by another creative developer, Matt Shumer:

"The first GPT-4V-powered frontend engineer agent. Just upload a picture of a design, and the agent autonomously codes it up, looks at a render for mistakes, improves the code accordingly, repeat. Utterly insane." (source)

I’ve even seen GPT-4V analyzing financial data like Bitcoin indicators:

source

I could go on forever. Here are 20 more ideas of how to use GPT-4V that I found extremely interesting, fun, and even visionary:

  1. Visual Assistance for the Blind: GPT-4V can describe the surroundings or read out text from images to assist visually impaired individuals.
  2. Educational Tutor: It can analyze diagrams and provide detailed explanations, helping students understand complex concepts.
  3. Medical Imaging: Assist doctors by providing preliminary observations from medical images (though not for making diagnoses).
  4. Recipe Suggestions: Users can show ingredients they have, and GPT-4V can suggest possible recipes.
  5. Fashion Advice: Offer fashion tips by analyzing pictures of outfits.
  6. Plant or Animal Identification: Identify and provide information about plants or animals in photos.
  7. Travel Assistance: Analyze photos of landmarks to provide historical and cultural information.
  8. Language Translation: Read and translate text in images from one language to another.
  9. Home Decor Planning: Provide suggestions for home decor based on pictures of users’ living spaces.
  10. Art Creation: Offer guidance and suggestions for creating art by analyzing images of ongoing artwork.
  11. Fitness Coaching: Analyze workout or yoga postures and offer corrections or enhancements.
  12. Event Planning: Assist in planning events by visualizing and organizing space, decorations, and layouts.
  13. Shopping Assistance: Help users in making purchasing decisions by analyzing product images and providing information.
  14. Gardening Advice: Provide gardening tips based on pictures of plants and their surroundings.
  15. DIY Project Guidance: Offer step-by-step guidance for DIY projects by analyzing images of the project at various stages.
  16. Safety Training: Analyze images of workplace environments to offer safety recommendations.
  17. Historical Analysis: Provide historical context and information for images of historical events or figures.
  18. Real Estate Assistance: Analyze images of properties to provide insights and information for buyers or sellers.
  19. Wildlife Research: Assist researchers by analyzing images of wildlife and their habitats.
  20. Meme Creation: Help users create memes by suggesting text or edits based on the image provided.

These are truly mind-boggling times. Most of those ideas are million-dollar startup ideas. Some ideas (like the real estate assistance app #18) could become billion-dollar businesses that are mostly built on GPT-4V’s functionality and are easy to implement for coders like you and me.

If you’re interested, feel free to read my other article on the Finxter blog:

📈 Recommended: Startup.ai – Eight Steps to Start an AI Subscription Biz

What About SaFeTY?

GPT-4V is a multimodal large language model that incorporates image inputs, expanding the impact of language-only systems by solving new tasks and providing novel experiences for users. It builds upon the work done for GPT-4, employing a similar training process and reinforcement learning from human feedback (RLHF) to produce outputs preferred by human trainers.

Why RLHF? Mainly to avoid jailbreaking 😢😅 like so:

You can see that the “refusal rate” went up significantly:

For everyday users who aren’t trying to harm anyone, the "Sorry I cannot do X" reply will unfortunately remain one of the more annoying parts of LLM tech.

However, the race is on! People have still reported jailbroken queries like this: 😂

I hope you had fun reading this compilation of GPT-4V ideas. Thanks for reading! ♥ If you’re not already subscribed, feel free to join our popular Finxter Academy with dozens of state-of-the-art LLM prompt engineering courses for next-level exponential coders. It’s an all-you-can-learn inexpensive way to remain on the right side of change.

For example, this is one of our recent courses:

Prompt Engineering with Llama 2

💡 The Llama 2 Prompt Engineering course helps you stay on the right side of change. Our course is meticulously designed to provide you with hands-on experience through genuine projects.

You’ll delve into practical applications such as book PDF querying, payroll auditing, and hotel review analytics. These aren’t just theoretical exercises; they’re real-world challenges that businesses face daily.

By studying these projects, you’ll gain a deeper comprehension of how to harness the power of Llama 2 using 🐍 Python, 🔗🦜 Langchain, 🌲 Pinecone, and a whole stack of highly ⚒🛠 practical tools of exponential coders in a post-ChatGPT world.

The post GPT-4 with Vision (GPT-4V) Is Out! 32 Fun Examples with Screenshots appeared first on Be on the Right Side of Change.

4 Best Ways to Remove Unicode Characters from JSON

To remove all Unicode characters from a JSON string in Python, load the JSON data into a dictionary using json.loads(). Traverse the dictionary and use the re.sub() method from the re module to substitute any Unicode character (matched by the regular expression pattern r'[^\x00-\x7F]+') with an empty string. Convert the updated dictionary back to a JSON string with json.dumps().

import json
import re
# Original JSON string with emojis and other Unicode characters
json_str = '{"text": "I love 🍕 and 🍦 on a ☀ day! \u200b \u1234"}'
# Load JSON data
data = json.loads(json_str)
# Remove all non-ASCII characters from the value
data['text'] = re.sub(r'[^\x00-\x7F]+', '', data['text'])
# Convert back to JSON string
new_json_str = json.dumps(data)
print(new_json_str)
# {"text": "I love  and  on a  day!  "}

The text "I love 🍕 and 🍦 on a ☀ day! \u200b \u1234" contains various Unicode characters including emojis and other non-ASCII characters. The code will output {"text": "I love  and  on a  day!  "}, removing all the non-ASCII characters and leaving only the ASCII characters (note the doubled spaces where characters were removed).

This is only one method. Keep reading to learn about alternative ones and detailed explanations! 👇


Occasionally, you may encounter unwanted Unicode characters in your JSON files, leading to problems with parsing and displaying the data. Removing these characters ensures clean, well-formatted JSON data that can be easily processed and analyzed.

In this article, we will explore some of the best practices to achieve this, providing you with the tools and techniques needed to clean up your JSON data efficiently.

Understanding Unicode Characters

Unicode is a character encoding standard that includes characters from most of the world’s writing systems. It allows for consistent representation and handling of text across different languages and platforms. In this section, you’ll learn about Unicode characters and how they relate to JSON.

💡 JSON is natively designed to support Unicode, which means it can store and transmit information in various languages without any issues. When you store a string in JSON, it can include any valid Unicode character, making it easy to work with multilingual data. However, certain Unicode characters might cause problems in specific scenarios, such as when using older software or transmitting data over a limited bandwidth connection.

In JSON, certain characters must be escaped, like quotation marks, reverse solidus, and control characters (U+0000 through U+001F). These characters must be represented using escape sequences in order for the JSON to be properly parsed.

🔗 You can find more information about escaping characters in JSON through this Stack Overflow discussion.

There might be times where you need to remove or replace Unicode characters from your JSON data. One way to achieve this is by using encoding and decoding techniques. For example, you can encode a string to ASCII while ignoring non-ASCII characters, and then decode it back to UTF-8.

🔗 This method can be found in this Stack Overflow example.

The Basics of JSON

💡 JSON (JavaScript Object Notation) is a lightweight, text-based data interchange format that is easy to read and write. It has become one of the most popular data formats for exchanging information on the web. When dealing with JSON data, you may encounter situations where you need to remove or modify Unicode characters.

JSON is built on two basic structures: objects and arrays.

  • An object is an unordered collection of key-value pairs, while
  • an array represents an ordered list of values.

A JSON file typically consists of a single object or array, containing different types of data such as strings, numbers, and other objects.

When working with JSON data, it is important to ensure that the text is properly formatted. This includes using appropriate escape characters for special characters, such as double quotes and backslashes, as well as handling any Unicode characters in the text. Keep in mind that JSON is a human-readable format, so a well-formatted JSON file should be easy to understand.

Since JSON data is text-based, you can easily manipulate it using standard text-processing techniques. For example, to remove unwanted Unicode characters from a JSON file, you can use a combination of encoding and decoding methods, like this:

json_data = json_data.encode("ascii", "ignore").decode("utf-8")

This process will remove all non-ASCII characters from the JSON data and return a new, cleaned-up version of the text.

How Unicode Characters Interact within JSON

In JSON, most Unicode characters can be freely placed within the string values. However, there are certain characters that must be escaped (i.e., replaced by a special sequence of characters) to be part of your JSON string. These characters include the quotation mark (U+0022), the reverse solidus (U+005C), and control characters ranging from U+0000 to U+001F.
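The escaping rules above can be observed directly with Python’s json module. The following is a small sketch; the sample string is made up for illustration.

```python
import json

# A string containing a quotation mark, a backslash, a tab, and a bell character
raw = 'She said "hi" \\ tab:\t bell:\x07'
encoded = json.dumps(raw)
print(encoded)
# "She said \"hi\" \\ tab:\t bell:\u0007"

# Parsing the JSON text undoes the escapes again
decoded = json.loads(encoded)
print(decoded == raw)   # True
```

Common control characters such as the tab get short escapes, while the others are written in the \uXXXX form.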

When you encounter escaped Unicode characters in your JSON, they typically appear in a format like \uXXXX, where XXXX represents a 4-digit hexadecimal code. For example, the acute é character can be represented as \u00E9. JSON parsers can understand this format and interpret it as the intended Unicode character.
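A quick way to see this in Python is sketched below. It also shows that characters above U+FFFF, such as most emojis, are written as two \uXXXX escapes (a surrogate pair), which matters if you later try to strip escapes from raw JSON text.

```python
import json

# A \uXXXX escape and the literal character parse to the same Python string
same_string = json.loads('"\\u00e9"') == "é"
print(same_string)   # True

# With the default ensure_ascii=True, an emoji becomes a surrogate pair
escaped_pizza = json.dumps("🍕")
print(escaped_pizza)   # "\ud83c\udf55"

# The parser joins the pair back into one character
print(json.loads(escaped_pizza) == "🍕")   # True
```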

Sometimes, you might need or want to remove these Unicode characters from your JSON data. This can be done in various ways, depending on the programming language you are using. In Python, for instance, you could leverage the encode and decode functions to remove unwanted Unicode characters:

cleaned_string = original_string.encode("ascii", "ignore").decode("utf-8")

In this code snippet, the encode function converts the original string to ASCII bytes. The ignore parameter specifies that any non-ASCII characters should be left out rather than replaced. Finally, the decode function transforms the bytes back into a string.

Method 1: Encoding and Decoding JSONs

JSON text can be encoded in UTF-8, UTF-16, or UTF-32, although the current specification (RFC 8259) requires UTF-8 for JSON exchanged between systems. UTF-8 is the most commonly used encoding for JSON texts and it is well-supported across different programming languages and platforms.

If you come across unwanted Unicode characters in your JSON data while parsing, you can use the built-in encoding and decoding functions provided by most languages. For example, in Python, the json.dumps() and json.loads() functions allow you to encode and decode JSON data respectively. To remove unwanted Unicode characters, you can use the encode() and decode() functions available in string objects:

import json
json_data = '{"quote_text": "This is an example of a JSON file with unicode characters like \\u201c and \\u201d."}'
decoded_data = json.loads(json_data)
cleaned_text = decoded_data['quote_text'].encode("ascii", "ignore").decode('utf-8')

In this example, the encode() function is called with the "ascii" encoding and the "ignore" error handler, which drops any characters outside the ASCII range. The decode() function then converts the encoded bytes object back to a string.

When dealing with JSON APIs and web services, be aware that different programming languages and libraries may have specific methods for encoding and decoding JSON data. Always consult the documentation for the language or library you are working with to ensure proper handling of Unicode characters.

Method 2: Python Regex to Remove Unicode from JSON

A second approach is to use a regex pattern before loading the JSON data. By applying a regex pattern, you can remove specific Unicode characters. For example, in Python, you can implement this with the re module as follows:

import json
import re
def remove_unicode(input_string):
    # Strip literal \uXXXX escape sequences from the raw JSON text
    return re.sub(r'\\u([0-9a-fA-F]{4})', '', input_string)
json_string = '{"text": "Welcome to the world of \\u2022 and \\u2019"}'
json_string = remove_unicode(json_string)
parsed_data = json.loads(json_string)

This code uses the remove_unicode function to strip away any escaped Unicode sequences before loading the JSON string. Once you have clean JSON data, you can continue with further processing.

Method 3: Replace Non-ASCII Characters

Another approach to removing Unicode characters is to replace non-ASCII characters after decoding the JSON data. This method is useful when dealing with specific character sets. Here’s an example using Python:

import json
def remove_non_ascii(input_string):
    return ''.join(char for char in input_string if ord(char) < 128)
json_string = '{"text": "Welcome to the world of \\u2022 and \\u2019"}'
parsed_data = json.loads(json_string)
cleaned_data = {}
for key, value in parsed_data.items():
    cleaned_data[key] = remove_non_ascii(value)
print(cleaned_data)
# {'text': 'Welcome to the world of  and '}

In this example, the remove_non_ascii function iterates over each character in the input string and retains only the ASCII characters. By applying this to each value in the JSON data, you can efficiently remove any unwanted Unicode characters.

When working with languages like JavaScript, you can utilize external libraries to remove Unicode characters from JSON data. For instance, in a Node.js environment, you can use the lodash library for cleaning Unicode characters:

const _ = require('lodash');
const json = {"text": "Welcome to the world of • and ’"};
const removeUnicode = (obj) => {
  return _.mapValues(obj, (value) => _.replace(value, /[\u2022\u2019]/g, ''));
};
const cleanedJson = removeUnicode(json);

In this example, the removeUnicode function leverages Lodash’s mapValues and replace functions to remove specific Unicode characters from the JSON object.

Handling Specific Unicode Characters in JSON

Dealing with Control Characters

Control characters are special non-printing characters in Unicode, such as carriage returns, linefeeds, and tabs. JSON requires that these characters be escaped in strings. When dealing with JSON data that contains control characters, it’s essential to escape them properly to avoid potential errors when parsing the data.

For instance, you can use the json.dumps() function in Python to output a JSON string with control characters escaped:

import json
data = {
    "text": "This is a string with a newline character\nin it."
}
json_string = json.dumps(data)
print(json_string)

This would output the following JSON string with the newline character escaped:

{"text": "This is a string with a newline character\nin it."}

When you parse this JSON string, the control character will be correctly interpreted, and you’ll be able to access the data as expected.
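The opposite case, JSON text that arrives with a raw, unescaped control character, is sketched below. By default json.loads() rejects such input; passing strict=False is one way to accept it. The sample text is made up for illustration.

```python
import json

# A raw newline inside a JSON string is invalid unless it is escaped
bad_json = '{"text": "line one\nline two"}'

try:
    json.loads(bad_json)
    strict_failed = False
except json.JSONDecodeError as error:
    strict_failed = True
    print("strict parsing failed:", error.msg)

# strict=False tells the decoder to accept control characters inside strings
data = json.loads(bad_json, strict=False)
print(data["text"].splitlines())   # ['line one', 'line two']
```

Dumping data again with json.dumps() then produces properly escaped output.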

Addressing Non-ASCII Characters

JSON strings can also contain non-ASCII Unicode characters, such as those from other languages. These characters may sometimes cause problems when processing JSON data in applications that don’t handle Unicode well.

One option is to escape non-ASCII characters when encoding the JSON data. You can do this by setting the ensure_ascii parameter of the json.dumps() function to True:

import json
data = {
    "text": "こんにちは、世界！"  # Japanese for "Hello, World!"
}
json_string = json.dumps(data, ensure_ascii=True)
print(json_string)

This will output the JSON string with the non-ASCII characters escaped:

{"text": "\u3053\u3093\u306b\u3061\u306f\u3001\u4e16\u754c\uff01"}

However, if you’d rather preserve the original non-ASCII characters in the JSON output, you can set ensure_ascii to False:

json_string = json.dumps(data, ensure_ascii=False)
print(json_string)

In this case, the output would be:

{"text": "こんにちは、世界！"}

Keep in mind that when working with non-ASCII characters in JSON, it’s essential to use tools and libraries that support Unicode. This ensures that the data is correctly processed and displayed in your application.

Examples: Implementing the Unicode Removal

Before starting with the examples, make sure you have your JSON object ready for manipulation. In this section, you’ll explore different methods to remove unwanted Unicode characters from JSON objects, focusing on JavaScript implementation.

First, let’s look at a simple example using JavaScript’s replace() function and a regular expression. The following code showcases how to remove Unicode characters from a JSON string:

const jsonString = '{"message": "𝕴 𝖆𝖒 𝕴𝖗𝖔𝖓𝖒𝖆𝖓! I have some unicode characters."}';
const withoutUnicode = jsonString.replace(/[^\x00-\x7F]/gu, "");
console.log(withoutUnicode);

In the code above, the regular expression [^\x00-\x7F] matches every character outside the ASCII range. The u flag makes the pattern operate on whole code points, so characters above U+FFFF (such as the stylized letters in this example) are matched too. By using the replace() function, you can replace those characters with an empty string ("").

Next, for more complex scenarios involving nested JSON objects, consider using a recursive function to traverse and clean up Unicode characters from the JSON data:

function cleanUnicode(jsonData) {
  if (Array.isArray(jsonData)) {
    return jsonData.map(item => cleanUnicode(item));
  } else if (typeof jsonData === "object" && jsonData !== null) {
    const cleanedObject = {};
    for (const key in jsonData) {
      cleanedObject[key] = cleanUnicode(jsonData[key]);
    }
    return cleanedObject;
  } else if (typeof jsonData === "string") {
    // Remove every code point outside the ASCII range
    return jsonData.replace(/[^\x00-\x7F]/gu, "");
  } else {
    return jsonData;
  }
}
const jsonObject = {
  message: "𝕴 𝖆𝖒 𝕴𝖗𝖔𝖓𝖒𝖆𝖓! I have some unicode characters.",
  nested: { text: "𝕾𝖔𝖒𝖊 𝖚𝖓𝖎𝖈𝖔𝖉𝖊 𝖈𝖍𝖆𝖗𝖆𝖈𝖙𝖊𝖗𝖘 𝖍𝖊𝖗𝖊 𝖙𝖔𝖔!" }
};
const cleanedJson = cleanUnicode(jsonObject);
console.log(cleanedJson);

This cleanUnicode function processes arrays, objects, and strings, making it ideal for nested JSON data.
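For readers following along in Python, the same recursive idea might look like the sketch below. The names clean_unicode and NON_ASCII and the sample JSON text are assumptions for illustration; like the JavaScript version, it cleans values and leaves keys untouched.

```python
import json
import re

NON_ASCII = re.compile(r"[^\x00-\x7F]+")

def clean_unicode(data):
    if isinstance(data, list):
        return [clean_unicode(item) for item in data]
    if isinstance(data, dict):
        return {key: clean_unicode(value) for key, value in data.items()}
    if isinstance(data, str):
        return NON_ASCII.sub("", data)
    return data  # numbers, booleans, and None pass through unchanged

json_text = '{"message": "Caf\\u00e9 ☀ open", "nested": {"tags": ["d\\u00e9j\\u00e0 vu", 42]}}'
cleaned = clean_unicode(json.loads(json_text))
print(json.dumps(cleaned))
# {"message": "Caf  open", "nested": {"tags": ["dj vu", 42]}}
```

Cleaning after json.loads() means escaped sequences such as \u00e9 have already been turned into real characters, so one pattern covers both escaped and literal input.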

In conclusion, use the simple replace() method for single JSON strings, and consider a recursive approach for nested JSON data. Utilize these examples to confidently, cleanly, and effectively remove Unicode characters from your JSON data in JavaScript.

Common Errors and How to Resolve Them

When working with JSON data involving Unicode characters, you might encounter a few common errors that can easily be resolved. In this section, we will discuss these errors and provide solutions to overcome them.

One commonly observed issue is the presence of invalid Unicode characters in the JSON data. This can lead to decoding errors while parsing. To overcome this, you can employ a Python library called unidecode to remove accents and normalize the Unicode string into the closest possible representation in ASCII text. For example, using the unidecode library, you can transform a word like “François” into “Francois”:

from unidecode import unidecode
unidecode('François') # Output: 'Francois'

Another common error arises due to the presence of special characters in JSON data, which leads to parsing issues. Proper escaping of special characters is essential for building valid JSON strings. You can use the json.dumps() function in Python to automatically escape special characters in JSON strings. For instance:

import json
raw_data = {"text": "A string with special characters: \\, \", \'"}
json_string = json.dumps(raw_data)

Remember, it’s crucial to produce only 100% compliant JSON, as specified in RFC 8259 (which obsoletes the earlier RFC 4627). Ensuring that you follow these guidelines will help you avoid most of the common errors while handling Unicode characters in JSON.

Lastly, if a text file shows garbled characters, the cause is often an encoding mismatch rather than bad data. Re-saving the file in a Unicode encoding such as UTF-8 instead of the default ANSI format (for example, through the Save As dialog of a text editor like Notepad) helps preserve the integrity of the Unicode characters.

By addressing these common errors, you’ll be able to effectively handle and process JSON data containing Unicode characters.

Conclusion

In summary, removing Unicode characters from JSON can be achieved using various methods. One approach is to encode the JSON string to ASCII and then decode it back to UTF-8. This method allows you to eliminate all Unicode characters in one go. For example, you can use the .encode("ascii", "ignore").decode('utf-8') technique to accomplish this, as explained on Stack Overflow.

Another option is applying regular expressions to target specific unwanted Unicode characters, as discussed in this Stack Overflow post. Employing regular expressions enables you to fine-tune your removal of specific Unicode characters from JSON strings.

Frequently Asked Questions

How to eliminate UTF-8 characters in Python?

To eliminate UTF-8 characters in Python, you can use the encode() and decode() methods. First, encode the string using ascii encoding with the ignore option, and then decode it back to utf-8. For example:

text = "Hello 你好"
sanitized_text = text.encode("ascii", "ignore").decode("utf-8")

What are the methods to remove non-ASCII characters in Python?

There are several methods to remove non-ASCII characters in Python:

  1. Using the encode() and decode() methods as mentioned above.
  2. Using a regular expression to filter out non-ASCII characters: re.sub(r'[^\x00-\x7F]+', '', text)
  3. Using a list comprehension to create a new string with only ASCII characters: ''.join(c for c in text if ord(c) < 128)
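The three methods above can be compared side by side. The following is a minimal sketch; the sample text and the variable names are illustrative assumptions, and the non-ASCII characters are written as escape sequences so the snippet does not depend on the file encoding.

```python
import re

# Sample text (assumption): "Hello 你好, café!" written with escape sequences
text = "Hello \u4f60\u597d, caf\u00e9!"

# Method 1: encode to ASCII, silently dropping anything that cannot be encoded
via_encode = text.encode("ascii", "ignore").decode("ascii")

# Method 2: regular expression matching every character outside the ASCII range
via_regex = re.sub(r"[^\x00-\x7F]+", "", text)

# Method 3: keep only characters whose code point is below 128
via_filter = "".join(c for c in text if ord(c) < 128)

print(via_encode)  # Hello , caf!
print(via_encode == via_regex == via_filter)  # True
```

All three produce the same result here; note that they drop characters such as é entirely instead of replacing them with a close ASCII equivalent.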

How can Pandas be used to remove Unicode characters?

To remove Unicode characters in a Pandas dataframe, you can apply a sanitizing function based on the encode() and decode() methods to a column using the apply() method:

import pandas as pd

def sanitize(text):
    return text.encode("ascii", "ignore").decode("utf-8")

df = pd.DataFrame({"text": ["Hello 你好", "Pandas rocks!"]})
df["sanitized_text"] = df["text"].apply(sanitize)

What is the process to replace Unicode in JSON?

To replace Unicode characters in a JSON object, you can first convert the JSON object to a string using the json.dumps() method. Then, replace the Unicode characters using one of the methods mentioned earlier. Finally, parse the sanitized string back to a JSON object using the json.loads() method:

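import json
import re

json_data = {"text": "Hello 你好"}
# ensure_ascii=False keeps the characters as they are, so the regex can find them
# (by default, json.dumps() would turn them into ASCII-only \uXXXX escapes)
json_str = json.dumps(json_data, ensure_ascii=False)
sanitized_str = re.sub(r'[^\x00-\x7F]+', '', json_str)
sanitized_json = json.loads(sanitized_str)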

How to convert Unicode to JSON format in Python?

If you have a Python object containing Unicode strings and want to convert it to JSON format, use the json.dumps() method:

import json

data = {"text": "Hello 你好"}
json_data = json.dumps(data, ensure_ascii=False)

This will preserve the Unicode characters in the JSON output.
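To make the effect of ensure_ascii visible, here is a small sketch that serializes the same object both ways. The variable names are illustrative assumptions; only json.dumps() and json.loads() from the standard library are used.

```python
import json

data = {"text": "Hello \u4f60\u597d"}  # the same string as "Hello 你好"

# Default behavior (ensure_ascii=True): non-ASCII characters become \uXXXX escapes
escaped = json.dumps(data)

# ensure_ascii=False: the characters are written to the output unchanged
preserved = json.dumps(data, ensure_ascii=False)

print(escaped.isascii())    # True
print(preserved.isascii())  # False

# Both strings are valid JSON and decode to the same Python object
print(json.loads(escaped) == json.loads(preserved) == data)  # True
```

Either form is valid JSON; the choice mainly affects readability and whether the output must stay ASCII-only.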

How can special characters be removed from a JSON file?

To remove special characters from a JSON file, first read the file and parse its content to a Python object using the json.load() method. Then, iterate through the object and sanitize the strings, removing special characters using one of the mentioned methods. Finally, write the sanitized object back to a JSON file using the json.dump() method:

import json
import re

with open("input.json", "r") as f:
    json_data = json.load(f)

# sanitize your JSON object here (placeholder: replace with your own logic)
sanitized_json_data = json_data

with open("output.json", "w") as f:
    json.dump(sanitized_json_data, f)
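The sanitizing step itself depends on how your data is nested. One possible sketch, assuming the parsed JSON consists only of dicts, lists, strings, numbers, booleans, and None, is a small recursive helper. The name sanitize and the sample data are hypothetical and not part of any library.

```python
import re

NON_ASCII = re.compile(r"[^\x00-\x7F]+")

def sanitize(obj):
    """Return a copy of obj with non-ASCII characters removed from all strings."""
    if isinstance(obj, str):
        return NON_ASCII.sub("", obj)
    if isinstance(obj, list):
        return [sanitize(item) for item in obj]
    if isinstance(obj, dict):
        return {sanitize(key): sanitize(value) for key, value in obj.items()}
    return obj  # numbers, booleans, and None pass through unchanged

# Sample data (assumption), with non-ASCII characters written as escapes
json_data = {"name": "Fran\u00e7ois", "tags": ["caf\u00e9", "ok"], "count": 3}
sanitized_json_data = sanitize(json_data)
print(sanitized_json_data)
# {'name': 'Franois', 'tags': ['caf', 'ok'], 'count': 3}
```

Because the helper returns a new object, the original data stays untouched and can be compared with the sanitized version.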

The post 4 Best Ways to Remove Unicode Characters from JSON appeared first on Be on the Right Side of Change.

Prompt Engineering with Llama 2 (Full Course)

💡 This Llama 2 Prompt Engineering course helps you stay on the right side of change. Our course is meticulously designed to provide you with hands-on experience through genuine projects.

🔗 Prompt Engineering with Llama 2: Four Practical Projects using Python, Langchain, and Pinecone

You’ll delve into practical applications such as book PDF querying, payroll auditing, and hotel review analytics.

These aren’t just theoretical exercises; they’re real-world challenges that businesses face daily.

By studying these projects, you’ll gain a deeper comprehension of how to harness the power of Llama 2 using 🐍 Python, 🔗🦜 Langchain, 🌲 Pinecone, and a whole stack of highly ⚒🛠 practical tools of exponential coders in a post-ChatGPT world.

Specifically, you’ll learn these topics (ToC):

This knowledge can be your foundation in creating solutions that have tangible value for real people. Equip yourself with the expertise to keep pace with technological change and be a proactive force in shaping it.

The post Prompt Engineering with Llama 2 (Full Course) appeared first on Be on the Right Side of Change.

Python BS4 – How to Scrape Absolute URL Instead of Relative Path

Summary: Use urllib.parse.urljoin() to join the base URL and the scraped relative path into the complete/absolute URL. You can also concatenate the base URL and the relative path manually to derive the absolute URL, but make sure to handle error-prone cases such as an extra forward slash.

Quick Answer

When web scraping with BeautifulSoup in Python, you may encounter relative URLs (e.g., /page2.html) instead of absolute URLs (e.g., http://example.com/page2.html). To convert relative URLs to absolute URLs, you can use the urljoin() function from the urllib.parse module.

Below is an example of how to extract absolute URLs from the a tags on a webpage using BeautifulSoup and urljoin:

from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin

# URL of the webpage you want to scrape
url = 'http://example.com'

# Send an HTTP request to the URL
response = requests.get(url)
response.raise_for_status()  # Raise an error for bad responses

# Parse the webpage content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all the 'a' tags on the webpage
for a_tag in soup.find_all('a'):
    # Get the href attribute from the 'a' tag
    href = a_tag.get('href')
    # Use urljoin to convert the relative URL to an absolute URL
    absolute_url = urljoin(url, href)
    # Print the absolute URL
    print(absolute_url)

In this example:

  • url is the URL of the webpage you want to scrape.
  • response is the HTTP response obtained by sending an HTTP GET request to the URL.
  • soup is a BeautifulSoup object that contains the parsed HTML content of the webpage.
  • soup.find_all('a') finds all the a tags on the webpage.
  • a_tag.get('href') gets the href attribute from an a tag, which is the relative URL.
  • urljoin(url, href) converts the relative URL to an absolute URL by joining it with the base URL.
  • absolute_url is the absolute URL, which is printed to the console.

Now that you have a quick overview, let’s dive into the specific problem more deeply and discuss various methods to solve this easily and effectively. 👇

Problem Formulation

Problem: How do you extract all the absolute URLs from an HTML page?

Example: Consider the following webpage which has numerous links:

Now, when you try to scrape the links as highlighted above, you find that only the relative links/paths are extracted instead of the entire absolute path. Let us have a look at the code given below, which demonstrates what happens when you try to extract the 'href' elements normally.

from bs4 import BeautifulSoup
import urllib.request
from urllib.parse import urljoin
import requests

web_url = 'https://sayonshubham.github.io/'
headers = {"User-Agent": "Mozilla/5.0 (CrKey armv7l 1.5.16041) AppleWebKit/537.36 (KHTML, like Gecko) " "Chrome/31.0.1650.0 Safari/537.36"}
# get() Request
response = requests.get(web_url, headers=headers)
# Store the webpage contents
webpage = response.content
# Check Status Code (Optional)
# print(response.status_code)
# Create a BeautifulSoup object out of the webpage content
soup = BeautifulSoup(webpage, "html.parser")
for i in soup.find_all('nav'):
    for url in i.find_all('a'):
        print(url['href'])

Output:

/
/about
/blog
/finxter
/

The above output is not what you desired. You wanted to extract the absolute paths as shown below:

https://sayonshubham.github.io/
https://sayonshubham.github.io/about
https://sayonshubham.github.io/blog
https://sayonshubham.github.io/finxter
https://sayonshubham.github.io/

Without further delay, let us go ahead and try to extract the absolute paths instead of the relative paths.

Method 1: Using urllib.parse.urljoin()

The easiest solution to our problem is to use the urllib.parse.urljoin() method.

According to the Python documentation: urllib.parse.urljoin() is used to construct a full/absolute URL by combining the “base URL” with another URL. The advantage of using urljoin() is that it properly resolves the relative path, whether BASE_URL is the domain of the URL or the absolute URL of the webpage.

from urllib.parse import urljoin

URL_1 = 'http://www.example.com'
URL_2 = 'http://www.example.com/something/index.html'

print(urljoin(URL_1, '/demo'))
print(urljoin(URL_2, '/demo'))

Output:

http://www.example.com/demo
http://www.example.com/demo
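Beyond paths that start with a slash, urljoin() also resolves other kinds of links you may find in href attributes. The following sketch shows a few of them; the base URL and the link targets are made-up examples.

```python
from urllib.parse import urljoin

# Hypothetical page URL used as the base for all links below
base = 'http://www.example.com/something/index.html'

# Path without a leading slash: resolved relative to the page's directory
print(urljoin(base, 'demo'))
# http://www.example.com/something/demo

# '..' moves one directory up before resolving
print(urljoin(base, '../demo'))
# http://www.example.com/demo

# Scheme-relative link: only the scheme is taken from the base
print(urljoin(base, '//cdn.example.com/x.js'))
# http://cdn.example.com/x.js

# Already absolute link: returned unchanged
print(urljoin(base, 'https://other.example/p'))
# https://other.example/p
```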

Now that we have an idea about urljoin, let us have a look at the following code which successfully resolves our problem and helps us to extract the complete/absolute paths from the HTML page.

Solution:

from bs4 import BeautifulSoup
import urllib.request
from urllib.parse import urljoin
import requests

web_url = 'https://sayonshubham.github.io/'
headers = {"User-Agent": "Mozilla/5.0 (CrKey armv7l 1.5.16041) AppleWebKit/537.36 (KHTML, like Gecko) " "Chrome/31.0.1650.0 Safari/537.36"}
# get() Request
response = requests.get(web_url, headers=headers)
# Store the webpage contents
webpage = response.content
# Check Status Code (Optional)
# print(response.status_code)
# Create a BeautifulSoup object out of the webpage content
soup = BeautifulSoup(webpage, "html.parser")
for i in soup.find_all('nav'):
    for url in i.find_all('a'):
        print(urljoin(web_url, url.get('href')))

Output:

https://sayonshubham.github.io/
https://sayonshubham.github.io/about
https://sayonshubham.github.io/blog
https://sayonshubham.github.io/finxter
https://sayonshubham.github.io/

Method 2: Concatenate The Base URL And Relative URL Manually

Another workaround to our problem is to concatenate the base part of the URL and the relative URLs manually, just like two ordinary strings. The problem, in this case, is that manually adding the strings might lead to subtle errors. Try to spot the extra forward slash character / below:

URL_1 = 'http://www.example.com/'
print(URL_1+'/demo') # Output --> http://www.example.com//demo

Therefore to ensure proper concatenation, you have to modify your code accordingly such that any extra character that might lead to errors is removed. Let us have a look at the following code that helps us to concatenate the base and the relative paths without the presence of any extra forward slash.

Solution:

from bs4 import BeautifulSoup
import urllib.request
from urllib.parse import urljoin
import requests

web_url = 'https://sayonshubham.github.io/'
headers = {"User-Agent": "Mozilla/5.0 (CrKey armv7l 1.5.16041) AppleWebKit/537.36 (KHTML, like Gecko) " "Chrome/31.0.1650.0 Safari/537.36"}
# get() Request
response = requests.get(web_url, headers=headers)
# Store the webpage contents
webpage = response.content
# Check Status Code (Optional)
# print(response.status_code)
# Create a BeautifulSoup object out of the webpage content
soup = BeautifulSoup(webpage, "html.parser")
for i in soup.find_all('nav'):
    for url in i.find_all('a'):
        # extract the href string
        x = url['href']
        # remove the extra forward-slash if present
        # (startswith() also works for an empty href, unlike x[0])
        if x.startswith('/'):
            print(web_url + x[1:])
        else:
            print(web_url + x)

Output:

https://sayonshubham.github.io/
https://sayonshubham.github.io/about
https://sayonshubham.github.io/blog
https://sayonshubham.github.io/finxter
https://sayonshubham.github.io/

⚠ Caution: This is not the recommended way of extracting the absolute path from a given HTML page. If you have an automated script that needs to resolve a URL, but you don’t know at the time of writing which website the script will visit, this method won’t serve your purpose and your go-to method should be urljoin. Nevertheless, this method deserves to be mentioned because, in our case, it successfully serves the purpose and helps us to extract the absolute URLs.
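To illustrate the caution, here is a small sketch that compares manual concatenation with urljoin() on links the manual approach was not written for. The helper name naive_join and all URLs are hypothetical.

```python
from urllib.parse import urljoin

def naive_join(base, href):
    """Manual concatenation in the spirit of Method 2 (hypothetical helper)."""
    if href.startswith('/'):
        return base + href[1:]
    return base + href

# Hypothetical page that is not the site root
page_url = 'https://example.com/blog/post.html'

for href in ['/about', 'https://other.example/x', '../index.html']:
    print(href)
    print('  manual: ', naive_join(page_url, href))
    print('  urljoin:', urljoin(page_url, href))
```

Manual concatenation only gives correct results when the base is the site root ending in a slash and every link is a simple path, which is exactly the situation in the example above.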

Conclusion

In this article, we learned how to extract the absolute links from a given HTML page using BeautifulSoup. If you want to master Python’s BeautifulSoup library and dive deep into its concepts with examples and video lessons, please have a look at the following link and follow the articles one by one. There you will find every aspect of BeautifulSoup explained in great detail.

🔗 Recommended: Web Scraping With BeautifulSoup In Python

With that, we come to the end of this tutorial! Please stay tuned and subscribe for more interesting content in the future.

The post Python BS4 – How to Scrape Absolute URL Instead of Relative Path appeared first on Be on the Right Side of Change.

Python Int to String with Trailing Zeros

To add trailing zeros to a string up to a certain length in Python, convert the number to a string and use the ljust(width, '0') method. Call this method on the string, specifying the total desired width and the padding character '0'. This will append zeros to the right of the string until the specified width is achieved.

Challenge: Given an integer, how do you convert it to a string with trailing zeros so that the string has a fixed number of positions?

Example: For integer 42, you want to fill it up with trailing zeros to the following string with 5 characters: '42000'.

In all methods, we assume that the integer has fewer than 5 digits.

Method 1: str.ljust()

In Python, you can use the str.ljust() method to pad zeros (or any other character) to the right of a string. The ljust() method returns the string left-justified in a field of a given width, padded with a specified character (default is space).

Below is an example of how to use ljust() to add trailing zeros to a number:

# Integer value to be converted
i = 42

# Convert the integer to a string
s = str(i)

# Use ljust to add trailing zeros, specifying the total width and the padding character ('0')
s_padded = s.ljust(5, '0')

print(s_padded)
# Output: 42000

In this example:

  • str(i) converts the integer i to a string.
  • s.ljust(5, '0') pads the string s with zeros to the right to make the total width 5 characters.

This is the most Pythonic way to accomplish this challenge.
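If you need this in several places, the conversion can be wrapped in a small helper. The following is a minimal sketch; the function name pad_trailing_zeros is a hypothetical choice. It also shows what ljust() does when the assumption of a short, non-negative integer does not hold.

```python
def pad_trailing_zeros(number, width):
    """Convert number to a string and pad it on the right with zeros."""
    return str(number).ljust(width, '0')

print(pad_trailing_zeros(42, 5))      # 42000
print(pad_trailing_zeros(123456, 5))  # 123456 (longer input is returned unchanged)
print(pad_trailing_zeros(-7, 5))      # -7000 (the minus sign counts toward the width)
```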

Method 2: Format String

The second method uses Python’s formatted string literals, called f-strings, which are available in Python 3.6 and later.

💡 Info: In Python, f-strings allow for the embedding of expressions within strings by prefixing a string with the letter "f" or "F" and enclosing expressions within curly braces {}. The expressions within the curly braces in the f-string are evaluated, and their values are inserted into the resulting string. This allows for a concise and readable way to include variable values or complex expressions within string literals.

The following code uses an f-string to convert an integer i to a string with trailing zeros:

# Integer value to be converted
i = 42

# Convert the integer to a string and then use format to add trailing zeros
s1 = f'{str(i):<5}'
s1 = s1.replace(" ", "0")  # replace spaces with zeros

print(s1)
# 42000

The code f'{str(i):<5}' first converts the integer i to a string. The :<5 format specifier aligns the string to the left and pads with spaces to make the total width 5. Then we replace the padded spaces with zeros using the str.replace() method.
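As a possible shortcut, the format specification also accepts a fill character in front of the alignment symbol, so the padding can be done in a single step without replace(). This is a sketch of that variant, not the method used above; the variable names are illustrative.

```python
i = 42
width = 5

# '0' is the fill character, '<' left-aligns, 5 is the total width
s_direct = f'{i:0<5}'
print(s_direct)  # 42000

# The width can also be supplied through a nested replacement field
s_dynamic = f'{i:0<{width}}'
print(s_dynamic)  # 42000
```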

Method 3: String Concatenation

Many Python coders don’t quite get the f-strings and the ljust() method shown in Methods 1 and 2. If you don’t have time to learn them, you can also use a more basic way based on string concatenation and the * repetition operator.

# Method 3: String Concatenation
s3 = str(42)
n = 5  # desired total width
s3 = s3 + '0' * (n - len(s3))
print(s3)
# 42000

You first convert the integer to a basic string. Then, you concatenate the integer’s string representation to the string of 0s, filled up to n=5 characters. The asterisk operator creates a string of 5-len(s3) zeros here.

Programmer Humor

“Real programmers set the universal constants at the start such that the universe evolves to contain the disk with the data they want.” (xkcd)

🔗 Recommended: Python Int to String with Leading Zeros

The post Python Int to String with Trailing Zeros appeared first on Be on the Right Side of Change.

How to Install xlrd in Python?

The Python xlrd library reads data and formatting information from Excel files in the historical .xls format. Note that it won’t read anything other than .xls files.

pip install xlrd

The Python xlrd library is among the top 100 Python libraries, with more than 17,375,582 downloads. This article will show you everything you need to install this in your Python environment.

🔗 Library Link
🔗 Other Excel Python Libraries

Alternatively, you may use any of the following commands to install xlrd, depending on your concrete environment. One is likely to work!

💡 If you have only one version of Python installed:
pip install xlrd

💡 If you have Python 3 (and, possibly, other versions) installed:
pip3 install xlrd

💡 If you don't have PIP or it doesn't work
python -m pip install xlrd
python3 -m pip install xlrd

💡 If you have Linux and you need to fix permissions (any one):
sudo pip3 install xlrd
pip3 install xlrd --user

💡 If you have Linux with apt
sudo apt install python3-xlrd

💡 If you have Windows and you have set up the py alias
py -m pip install xlrd

💡 If you have Anaconda
conda install -c anaconda xlrd

💡 If you have Jupyter Notebook
!pip install xlrd
!pip3 install xlrd

Let’s dive into the installation guides for the different operating systems and environments!

How to Install xlrd on Windows?

  1. Type "cmd" in the search bar and hit Enter to open the command line.
  2. Type “pip install xlrd” (without quotes) in the command line and hit Enter again. This installs xlrd for your default Python installation.
  3. The previous command may not work if you have both Python versions 2 and 3 on your computer. In this case, try “pip3 install xlrd” or “python -m pip install xlrd”.
  4. Wait for the installation to terminate successfully. xlrd is now installed on your Windows machine.

Here’s how to open the command line on a (German) Windows machine:

Open CMD in Windows

First, try the following command to install xlrd on your system:

pip install xlrd

Second, if this leads to an error message, try this command to install xlrd on your system:

pip3 install xlrd

Third, if both do not work, use the following long-form command:

python -m pip install xlrd

The difference between pip and pip3 is that pip3 always belongs to a Python 3 installation, whereas pip belongs to whichever Python installation comes first in your PATH variable. Depending on your setup, pip may therefore refer to Python 2 or Python 3, and you cannot know which without checking, for example with pip --version. To resolve this uncertainty, you can use pip3, which refers to your default Python 3 installation.
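Another way to avoid the ambiguity from inside Python is to build the install command from the interpreter that is currently running. The sketch below only constructs and prints the command; the helper name is hypothetical, and actually running the command (for example with subprocess.run) is left to you.

```python
import sys

def pip_command_for_current_interpreter(package):
    """Build a pip command bound to the interpreter running this script."""
    # sys.executable is the absolute path of the running Python binary
    return [sys.executable, "-m", "pip", "install", package]

command = pip_command_for_current_interpreter("xlrd")
print(" ".join(command))
```

Using the -m pip form ties the installation to one specific interpreter, no matter what pip or pip3 resolve to on the PATH.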

How to Install xlrd on Linux?

You can install xlrd on Linux in four steps:

  1. Open your Linux terminal or shell
  2. Type “pip install xlrd” (without quotes), hit Enter.
  3. If it doesn’t work, try “pip3 install xlrd” or “python -m pip install xlrd”.
  4. Wait for the installation to terminate successfully.

The package is now installed on your Linux operating system.

How to Install xlrd on macOS?

Similarly, you can install xlrd on macOS in four steps:

  1. Open your macOS terminal.
  2. Type “pip install xlrd” without quotes and hit Enter.
  3. If it doesn’t work, try “pip3 install xlrd” or “python -m pip install xlrd”.
  4. Wait for the installation to terminate successfully.

The package is now installed on your macOS.

How to Install xlrd in PyCharm?

Given a PyCharm project, how do you install the xlrd library within a virtual environment or globally? Here’s a solution that always works:

  • Open File > Settings > Project from the PyCharm menu.
  • Select your current project.
  • Click the Python Interpreter tab within your project tab.
  • Click the small + symbol to add a new library to the project.
  • Now type in the library to be installed, in this example “xlrd” without quotes, and click Install Package.
  • Wait for the installation to terminate and close all pop-ups.

Here’s the general package installation process as a short animated video. It works analogously for xlrd if you type in “xlrd” in the search field instead:

Make sure to select only “xlrd” because there may be other packages that are not required but also contain the same term (false positives):

How to Install xlrd in a Jupyter Notebook?

To install any package in a Jupyter notebook, you can prefix the pip install my_package statement with the exclamation mark "!". This works for the xlrd library too:

!pip install xlrd

This automatically installs the xlrd library when the cell is first executed.

How to Resolve ModuleNotFoundError: No module named ‘xlrd’?

Say you try to import the xlrd package into your Python script without installing it first:

import xlrd
# ... ModuleNotFoundError: No module named 'xlrd'

Because you haven’t installed the package, Python raises a ModuleNotFoundError: No module named 'xlrd'.

To fix the error, install the xlrd library using “pip install xlrd” or “pip3 install xlrd” in your operating system’s shell or terminal first.
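If you want your script to check for the library before importing it, one possible sketch uses importlib.util.find_spec() from the standard library. The helper name is_installed is a hypothetical choice.

```python
import importlib.util

def is_installed(module_name):
    """Return True if the current interpreter can find the module."""
    return importlib.util.find_spec(module_name) is not None

if is_installed("xlrd"):
    print("xlrd is available")
else:
    print("xlrd is missing; install it with: python -m pip install xlrd")
```

The check runs against the interpreter that executes the script, so it also reveals cases where xlrd was installed for a different Python installation.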

See above for the different ways to install xlrd in your environment. Also check out my detailed article:

🔗 Recommended: [Fixed] ModuleNotFoundError: No module named ‘xlrd’

Improve Your Python Skills

If you want to keep improving your Python skills and learn about new and exciting technologies such as Blockchain development, machine learning, and data science, check out the Finxter free email academy with cheat sheets, regular tutorials, and programming puzzles.

Join us, it’s fun! 🙂

✅ Recommended: Python Excel – Basic Worksheet Operations

The post How to Install xlrd in Python? appeared first on Be on the Right Side of Change.

5 Best Open-Source LLMs in 2023 (Two-Minute Guide)

Open-source research on large language models (LLMs) is crucial for democratizing this powerful technology.

Although open-source LLMs are now widely used and studied, they faced initial challenges and criticism. Early attempts at creating open-source LLMs like OPT and BLOOM had poor performance compared to closed-source models.

This led researchers to realize the need for higher-quality base models pre-trained on larger datasets with trillions (!) of tokens!

  • OPT: 180 billion tokens
  • BLOOM: 341 billion tokens
  • LLaMA: 1.4 trillion tokens
  • MPT: 1 trillion tokens
  • Falcon: 1.5 trillion tokens
  • LLaMA 2: 2 trillion tokens

However, pre-training these models is expensive and requires organizations with sufficient funding to make them freely available to the community.

This article focuses on high-performing open-source base models significantly improving the field. A great graphic of the historic context of open-source LLMs is presented on the Langchain page:

How can we determine the best of those? Easy, with chatbot leaderboards like this one on Hugging Face:

At the time of writing, the best non-commercial LLM is Vicuna-33B. Overall, the closed-source models GPT-4 by OpenAI and Claude by Anthropic still rank highest.

By the way, feel free to check out my article on Claude 2, which has proven to be one of the most powerful free but closed-source LLMs:

🔗 Recommended: Claude 2 LLM Reads Ten Papers in One Prompt with Massive 200k Token Context

The introduction of LLaMA 1 and 2 was a significant step in improving the quality of open-source LLMs. LLaMA is a suite of different LLMs with sizes ranging from 7 billion to 65 billion parameters. These models strike a balance between performance and inference efficiency.

LLaMA models are pre-trained on a corpus containing over 1.4 trillion tokens of text, making it one of the largest open-source datasets available. The release of LLaMA models sparked an explosion of open-source research and development in the LLM community.

Here are a few open-source LLMs that were kicked off after the release of LLaMA: Alpaca, Vicuna, Koala, and GPT4All.

LLaMA-2, the latest release, sets a new state-of-the-art among open-source LLMs. These models are pre-trained on 2 trillion tokens of publicly available data and utilize a novel approach called Grouped Query Attention (GQA) to improve inference efficiency.

MPT, another commercially-usable open-source LLM suite, was released by MosaicML. MPT-7B and MPT-30B models gained popularity due to their performance and ability to be used in commercial applications. While these models perform slightly worse than proprietary models like GPT-based variants, they outperform other open-source models.

Falcon, an open-source alternative to proprietary models, was the first to match the quality of closed-source LLMs. The Falcon-7B and Falcon-40B models are licensed for commercial use and perform exceptionally well. They are pre-trained on a custom-curated corpus called RefinedWeb, which contains over 5 trillion tokens of text.

You can currently try the Falcon-180B Demo here.

📈 TLDR: Open-source LLMs include OPT, BLOOM, LLaMA, MPT, and Falcon, each pre-trained on a large number of tokens. LLaMA 2 and Falcon stand out for their innovative approaches and extensive training data.

👉 For the best open-source LLM, consider using Vicuna-33B for its superior performance among non-commercial options.

Also, make sure to check out my other article on the Finxter blog: 👇

🔗 Recommended: Six Best Private & Secure LLMs in 2023

The post 5 Best Open-Source LLMs in 2023 (Two-Minute Guide) appeared first on Be on the Right Side of Change.

Zero to Ph.D. – How to Accomplish Any Task By Going Small First

A couple of years ago, I watched a TED talk that changed my life.

I had just finished my computer science master’s degree and was starting out as a fresh Ph.D. student in the department of distributed systems…

… and I was overwhelmed.

There are many computer science students reading the Finxter blog so I hope to find a few encouraging words in this article.

Not only was I overwhelmed, but I seriously doubted my ability to finish the doctoral research program successfully.

I was so impressed by my colleagues, who were much smarter, wittier, and better coders.

So what were (some of) the things that were bothering me?

  • Reading and understanding code.
  • Reading and understanding research papers.
  • Designing algorithms.
  • Maths.
  • Presenting stuff.
  • English.
  • Writing scientifically.
  • “Selling” my approaches to my supervisors.

The list goes on and on — and I really felt like an imposter not worthy to contribute to the scientific community.

~~~

Then I watched the TED talk from a former investment banker who claimed to possess the formula to achieve anything.

The formula: break the big task into a series of small tasks. Then just keep doing the small tasks (and don’t stop).

I know it sounds lame, but it really resonated with me. So I approached my problem from first principles: What must I do to finish my dissertation within four years?

  • I need to publish at least four research papers.
  • I need to submit at least ten times to top conferences — maybe even more often.
  • I need to create a 10,000-word research paper every three months or so.
  • I need to write (or edit) 300 words every day.

So my output was clear: if I just do this one thing (it’s really easy to write 300 words) — I will have enough written content for my dissertation.

Quality comes as a byproduct of massive quantity. 😉

But to produce output, any system needs input. To brew tasty coffee, put in the right ingredients: high-quality beans and pure water. To produce better outputs, just feed the system with better inputs.

  • Question: What’s the input that helps me produce excellent 300-word written output?
  • Answer: Read papers from top conferences.

So the formula boils down to:

  • INPUT: read (at least skim over) one paper a day from a top conference in my research area.
  • OUTPUT: generate 300 words for the current paper project.

That’s it. After I developed this formula, the remaining three and a half years were simple: follow this straightforward recipe to the best of my abilities, even with serious distractions, doubts, highs, and lows.

The day before I published this article originally (in 2019), I delivered my defense. Based on my sample size of one, the system works! 😉

So what is your BIG TASK that is overwhelming you? How can you break it into a series of small outputs that guarantees your success? What is the input that helps you generate this kind of output?

📈 Recommended: How to Overcome the Imposter Syndrome as a Doctoral Computer Science Researcher (and Thrive)

The post Zero to Ph.D. – How to Accomplish Any Task By Going Small First appeared first on Be on the Right Side of Change.