[Tut] Python Unicode Encode Error


<div><p class="has-luminous-vivid-amber-background-color has-background"><strong>Summary: </strong>A <code>UnicodeEncodeError</code> occurs when Python encodes a Unicode string into an encoding that cannot represent one or more of its characters. Many encodings, such as ASCII or cp1252, cover only a limited subset of Unicode, so any character outside that subset makes the encoding fail and raises <code>UnicodeEncodeError</code>. To avoid this <a href="https://blog.finxter.com/how-to-resolve-unboundlocalerror-on-local-variable-when-reassigned-after-the-first-use/" target="_blank" rel="noreferrer noopener">error</a>, encode and decode with an encoding that covers all of Unicode, for example <code>encode('utf-8')</code> and <code>decode('utf-8')</code>.</p>
<p>You might be handling application code that deals with multilingual data, or web content full of emojis and special symbols. In such situations you are likely to run into problems with Unicode data. Python has well-defined options for dealing with Unicode characters, and we discuss them in this article.</p>
<h2>What is Unicode?</h2>
<p>Unicode is a standard that assigns a unique number, called a code point, to every character. Encodings such as UTF-8 and UTF-16 then store those code points as bytes, using a variable number of bytes per character. If you are into computer programming, you have probably heard of ASCII. ASCII defines 128 characters, while the Unicode code space has 1,114,112 code points (U+0000 to U+10FFFF). The first 128 Unicode code points are identical to ASCII, so Unicode can be regarded as a superset of ASCII. If you are interested in an in-depth look at Unicode, please follow this <a href="https://docs.python.org/3/howto/unicode.html" target="_blank" rel="noreferrer noopener">link</a>.<br />Click on Unicode: <a href="https://blog.finxter.com/wp-content/uploads/2020/09/finxPythond.png" target="_blank" rel="noreferrer noopener">U+1F40D</a> to find out what it represents! (Try it! 😉)</p>
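<p>The relationship between ASCII and Unicode can be checked directly in the interpreter. The following is a minimal sketch, assuming Python 3.3 or later; the variable names are only illustrative.</p>

```python
import sys
import unicodedata

# ASCII characters keep the same code points in Unicode (0 to 127),
# so their ASCII and UTF-8 encodings are byte-for-byte identical.
assert ord("A") == 65
assert "A".encode("ascii") == "A".encode("utf-8")

# Every code point has an official Unicode name; U+00E9 is the letter é.
e_name = unicodedata.name("\u00e9")
print(e_name)

# The highest code point a Python 3 string can hold is U+10FFFF.
max_code_point = sys.maxunicode
print(hex(max_code_point), max_code_point + 1)
```

<p>Here <code>sys.maxunicode + 1</code> gives the total size of the code space, not the number of characters that have actually been assigned.</p>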
<h2>What is a UnicodeEncodeError?</h2>
<p>The best way to grasp any concept is to visualize it with an example. So let us have a look at an example of the <code>UnicodeEncodeError</code>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">u = 'é'
print("Integer value (code point) of é:", ord(u))
print("Character for code point 233:", chr(233))
print("UTF-8 representation of é:", u.encode('utf-8'))
print("ASCII representation of é:", u.encode('ascii'))</pre>
<p><strong>Output</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Integer value (code point) of é: 233
Character for code point 233: é
UTF-8 representation of é: b'\xc3\xa9'
Traceback (most recent call last):
  File "main.py", line 5, in &lt;module&gt;
    print("ASCII representation of é:", u.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 0: ordinal not in range(128)</pre>
<p>In the above code, encoding the character <strong>é</strong> as UTF-8 produced a byte string, but encoding it as ASCII raised an error. ASCII is a 7-bit encoding and can only represent code points in the range 0 to 127; é has code point 233, which lies outside that range.</p>
<p>You now have a sense of what a <code>UnicodeEncodeError</code> looks like. Before discussing how to avoid such errors, it helps to review the following concepts:</p>
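<p>The exception object itself carries details about the failure, and <code>str.encode()</code> accepts an <code>errors</code> argument that substitutes unencodable characters instead of raising. The sketch below is only illustrative: the sample string is an assumption, and the handlers shown lose or alter data, so they are a fallback and not a replacement for choosing a suitable encoding.</p>

```python
text = "caf\u00e9"  # "café", written with an escape to keep the source ASCII

details = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # The exception records the codec name and the failing slice of the string.
    details = (exc.encoding, exc.start, exc.end)
    print(details, exc.reason)

# Error handlers trade fidelity for robustness.
replaced = text.encode("ascii", errors="replace")            # é becomes ?
ignored = text.encode("ascii", errors="ignore")              # é is dropped
escaped = text.encode("ascii", errors="backslashreplace")    # é becomes \xe9
xml_ref = text.encode("ascii", errors="xmlcharrefreplace")   # é becomes &#233;
print(replaced, ignored, escaped, xml_ref)
```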
<h2>Encoding and Decoding</h2>
<p>Encoding is the process of converting text (a sequence of Unicode characters) into a sequence of bytes according to a specified format, so that it can be stored or transmitted. Decoding is the opposite: it converts the encoded bytes back into text.</p>
<p>In Python,&nbsp;</p>
<ul>
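<li><code>str.encode()</code> is a built-in method that encodes a string into <code>bytes</code>. If no encoding is specified, UTF-8 is used by default.</li>
<li><code>bytes.decode()</code> is a built-in method that decodes <code>bytes</code> back into a string. It also defaults to UTF-8.</li>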
</ul>
<p><strong>Example:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">u = 'Πύθωνος'
print("UTF-8 representation of Πύθωνος:", u.encode('utf-8'))</pre>
<p><strong>Output:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">UTF-8 representation of Πύθωνος: b'\xce\xa0\xcf\x8d\xce\xb8\xcf\x89\xce\xbd\xce\xbf\xcf\x82'</pre>
<p>The following diagram should make things a little easier:</p>
<figure class="wp-block-image size-large"><img src="https://blog.finxter.com/wp-content/uploads/2020/09/encode-decode.png" alt="" class="wp-image-13352" srcset="https://blog.finxter.com/wp-content/uploads/2020/09/encode-decode.png 665w, https://blog.finxter.com/wp-content/uplo...00x156.png 300w, https://blog.finxter.com/wp-content/uplo...150x78.png 150w" sizes="(max-width: 665px) 100vw, 665px" /></figure>
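<p>The round trip from text to bytes and back can be sketched in a few lines. This is a minimal illustration; the variable names are assumptions made for the example.</p>

```python
text = "Πύθωνος"

data = text.encode("utf-8")       # str -> bytes
restored = data.decode("utf-8")   # bytes -> str

# Decoding with the codec used for encoding returns the original text.
assert restored == text

# Each of these Greek letters takes two bytes in UTF-8.
print(type(data).__name__, len(text), len(data))

# Decoding with a different codec may not raise, but it yields garbled text.
wrong = data.decode("latin-1")
assert wrong != text
```

<p>The last step shows why the reader and the writer have to agree on the encoding: <code>latin-1</code> accepts any byte sequence, so the mistake goes unnoticed until someone looks at the text.</p>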
<h2>Codepoint</h2>
<p>Unicode maps each code point to its respective character. So, what do we mean by a code point?</p>
<ul>
<li>A code point is a numerical value (an integer) that Unicode assigns to a character.</li>
<li>The Unicode code point for é is <code>U+00E9</code>, which is the integer 233. When you encode a character and print the resulting <code>bytes</code> object, Python shows the non-ASCII bytes as hexadecimal escapes such as <code>\xc3</code> rather than in binary (as seen in the examples above).</li>
<li>The byte sequence of a code point differs between encoding schemes. For example, the byte sequence for é in <code>UTF-8</code> is <code>\xc3\xa9</code>, while in <code>UTF-16</code> it is <code>\xff\xfe\xe9\x00</code>, where the first two bytes are a byte order mark (BOM).</li>
</ul>
<p>Please have a look at the following program to get a better grip on this concept:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">u = 'é'
# print() already separates its arguments with one space
print("INTEGER value for é:", ord(u))
print("ENCODED Representation of é in UTF-8:", u.encode('utf-8'))
print("ENCODED Representation of é in UTF-16:", u.encode('utf-16'))</pre>
<p><strong>Output</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">INTEGER value for é: 233
ENCODED Representation of é in UTF-8: b'\xc3\xa9'
ENCODED Representation of é in UTF-16: b'\xff\xfe\xe9\x00'</pre>
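<p>The UTF-16 result starts with a two-byte byte order mark (BOM) that tells a decoder which byte order follows. The sketch below, using only the standard <code>codecs</code> constants, separates the BOM from the character itself. Which BOM the plain <code>utf-16</code> codec emits depends on the machine's byte order, so the code checks for either.</p>

```python
import codecs

u = "\u00e9"  # é

# The explicit-endian codecs emit no BOM, only the 16-bit code unit.
little = u.encode("utf-16-le")
big = u.encode("utf-16-be")
print(little, big)

# The generic codec prepends a BOM in the platform's native byte order.
with_bom = u.encode("utf-16")
assert with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# On decoding, the BOM is consumed and does not appear in the text.
assert with_bom.decode("utf-16") == u
```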
<p>Now that we have an overview of Unicode and <code>UnicodeEncodeError</code>, let us discuss how we can deal with the error and avoid it in our program.</p>
<p><strong>Problem: </strong>Given a string that has to be written to a text file, how can we avoid the <code>UnicodeEncodeError</code> and write the text successfully?</p>
<p><strong>Example:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">f = open('demo.txt', 'w')
f.write('να έχεις μια όμορφη μέρα')
f.close()</pre>
<p><strong>Output</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Traceback (most recent call last):
  File "uniError.py", line 2, in &lt;module&gt;
    f.write('να έχεις μια όμορφη μέρα')
  File "C:\Users\Shubham-PC\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to &lt;undefined&gt;</pre>
<p><strong>Desired Output</strong></p>
<p><strong><img loading="lazy" width="365" height="205" src="https://lh3.googleusercontent.com/ZutNsD_Rtv-es6KRfsfB5mbzHT5TH0k2-Y6vONV8HUaf8Nl39jecGXSKROunOnVULUZehc2L5BifvkVBiswM64PrMFWvZVxgi6sgwX56PgElibWpZD1TMSJenRhbRd8zspe_qyx7"></strong></p>
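<p>The traceback mentions <code>cp1252.py</code> because a file opened in text mode without an explicit encoding falls back to the platform's preferred encoding, which was cp1252 on the Windows machine used above. The following sketch reproduces the same failure on any platform by encoding to cp1252 explicitly; it is an illustration of the cause, not part of the original program.</p>

```python
import locale

text = "να έχεις μια όμορφη μέρα"

# This is the encoding open() falls back to when none is given.
print(locale.getpreferredencoding(False))

failure = None
try:
    # cp1252 has no Greek letters, so the first two characters already fail.
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    failure = (exc.encoding, exc.start, exc.end)
    print(failure)

# UTF-8 can represent every Unicode character, so this always succeeds.
encoded = text.encode("utf-8")
```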
<h2>Solution 1: Encode String Before Writing To File And Decode While Reading</h2>
<p>When a file is opened in text mode without an explicit encoding, Python uses the platform's default encoding (cp1252 in the example above), and writing characters that this encoding cannot represent raises a <code>UnicodeEncodeError</code>. One way to avoid this is to encode the Unicode <a href="https://blog.finxter.com/string-formatting-vs-format-vs-formatted-string-literal/" target="_blank" rel="noreferrer noopener">string</a> yourself using the <code>encode()</code> method and write the resulting bytes to a file opened in binary mode, as shown in the program below:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">text = 'να έχεις μια όμορφη μέρα'
# Write in binary mode, because encode() returns bytes, not str
with open('demo.txt', 'wb') as f:
    f.write(text.encode('utf-8'))
# Read the bytes back and decode them to recover the original text
with open('demo.txt', 'rb') as f:
    print(f.read().decode('utf-8'))</pre>
<p><strong>Output</strong></p>
<p><strong><img loading="lazy" width="365" height="205" src="https://lh5.googleusercontent.com/AFAQe4yYbeWvX5_uGOAKwkT3HEOSAVvmFc37uR6eJ0tewle3Gfq8oPXctJ53LfLpOwVNla6pv78Tk7KILWRCSvERgpzzebJG_yOZ7GoQCjsDtg86nnVvnu91sGJyNQuIbyAmWgL-"></strong></p>
<h2>Solution 2: Open File In utf-8</h2>
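<p>If you are using Python 3, all you need to do is pass <code>encoding="utf-8"</code> to <code>open()</code>. Python 3 strings are Unicode already, and the file object then encodes them as <code>utf-8</code> on write instead of using the platform's default encoding.</p>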
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">text = 'να έχεις μια όμορφη μέρα'
f = open('demo2.txt', 'w', encoding="utf-8")
f.write(text)
f.close()</pre>
<p><strong>Output</strong></p>
<p><strong><img loading="lazy" width="362" height="171" src="https://lh5.googleusercontent.com/BxvWjGc4iTZs1rJHfB182woKfFlFviQc-BSGGxOH7XVhB847OjystraNXrEKBse2CBDumls55oRZlcL233wZdEXs5ZFDcO7TqZ9byjqCQRZp_21kMZCdmQY0nKQWASWUVyfegh0b"></strong></p>
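<p>Reading the file back needs the same <code>encoding</code> argument, otherwise the platform default is used for decoding. Below is a minimal sketch of the full round trip. It writes to a temporary directory (an assumption made to keep the example self-contained) and uses <code>with</code> so the files are closed automatically.</p>

```python
import os
import tempfile

text = "να έχεις μια όμορφη μέρα"

# A throwaway location for the demo file.
path = os.path.join(tempfile.mkdtemp(), "demo2.txt")

# Text mode with an explicit encoding: Python encodes on write...
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# ...and decodes on read, independent of the platform default.
with open(path, encoding="utf-8") as f:
    read_back = f.read()

assert read_back == text
```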
<h2>Solution 3: Using The Codecs <a href="https://blog.finxter.com/the-complete-python-library-guide/" target="_blank" rel="noreferrer noopener">Module</a></h2>
<p>Another approach to deal with the <code>UnicodeEncodeError</code> is the <a href="https://docs.python.org/2/library/codecs.html#codecs.open" target="_blank" rel="noreferrer noopener">codecs</a> module, whose <code>codecs.open()</code> function opens a file with a given encoding. In Python 3 the built-in <code>open()</code> with an <code>encoding</code> argument is usually sufficient, so this approach is mostly of interest for code that also has to run on Python 2.</p>
<p>Let us have a look at the following code to understand how we can use the codecs module:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import codecs

f = codecs.open("demo3.txt", "w", encoding='utf-8')
f.write("να έχεις μια όμορφη μέρα")
f.close()</pre>
<p><strong>Output</strong></p>
<p><strong><img loading="lazy" width="364" height="135" src="https://lh6.googleusercontent.com/mcnV8RvtK9zp_uTvQCRF074kCOrI7cSnDsdnV9bgWrn3NbseOtGyqvYjyhCJH1ZSP0qzLwhIdsCNqZ5xGubKEVCfnuRqqyb7dbuhzx06o6ut0GRWRAQf3vpc54KUcke-cZLFyA9N"></strong></p>
<h2>Solution 4: Using Python’s unicodecsv Module</h2>
<p>If you are dealing with Unicode data and using a <a href="https://blog.finxter.com/how-to-read-a-csv-file-into-a-python-list/" target="_blank" rel="noreferrer noopener"><code>csv</code> file</a> for managing your data, then the <code>unicodecsv</code> module can be really helpful. It is an extended <a href="https://blog.finxter.com/how-to-check-your-python-version/" target="_blank" rel="noreferrer noopener">version of Python 2’s</a> <code>csv</code> module and helps the user to handle Unicode data without any hassle. </p>
<p>Since the <code>unicodecsv </code>module is not a part of Python’s standard library, you have to install it before using it. Use the following command to install this module:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">$ pip install unicodecsv</pre>
<p>Let us have a look at the following example to get a better grip on the <code>unicodecsv</code> module:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import unicodecsv as csv

with open('example.csv', 'wb') as f:
    writer = csv.writer(f, encoding='utf-8')
    writer.writerow(('English', 'Japanese'))
    writer.writerow((u'Hello', u'こんにちは'))</pre>
<p><strong>Output</strong></p>
<p><strong><img loading="lazy" width="624" height="71" src="https://lh3.googleusercontent.com/SzzRpINyK8xFZBExZc0kR8sPv05onsAmAZVbbOkVOCxgFr4DOzcTSb5RLE4MOLVpCVx_bV068Pm91pBMCQGtwiFcaBLqb6cDRacJyQV03MVm44T288E0Ph9Hy8DJjhtetQ6iJYjZ"></strong></p>
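<p>On Python 3, the standard library <code>csv</code> module works with Unicode strings directly, so a similar result can be had without a third-party package. The sketch below is an alternative to the <code>unicodecsv</code> example and is not taken from it. It assumes a temporary output path, and it opens the file in text mode with <code>newline=""</code>, as the <code>csv</code> documentation recommends.</p>

```python
import csv
import os
import tempfile

rows = [("English", "Japanese"), ("Hello", "こんにちは")]

# A throwaway location for the demo file.
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# Text mode plus an explicit encoding; csv handles the quoting and newlines.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the rows back with the same encoding to verify the round trip.
with open(path, encoding="utf-8", newline="") as f:
    loaded = [tuple(row) for row in csv.reader(f)]

assert loaded == rows
```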
<h2>Conclusion</h2>
<p>In this article, we discussed some important concepts regarding Unicode characters, then looked at the <code>UnicodeEncodeError</code>, and finally discussed the methods we can use to avoid it. I hope that by the end of this article you can handle Unicode characters in your Python code with ease.</p>
</div>


https://www.sickgaming.net/blog/2020/09/...ode-error/