Sick Gaming
[Tut] How to Read an XLS File in Python? - Printable Version

+- Sick Gaming (https://www.sickgaming.net)
+-- Forum: Programming (https://www.sickgaming.net/forum-76.html)
+--- Forum: Python (https://www.sickgaming.net/forum-83.html)
+--- Thread: [Tut] How to Read an XLS File in Python? (/thread-99288.html)



[Tut] How to Read an XLS File in Python? - xSicKxBot - 04-26-2022

How to Read an XLS File in Python?

<div><h2 class="wp-embed-aspect-16-9 wp-has-aspect-ratio">Problem Formulation and Solution Overview</h2>
<p class="wp-embed-aspect-16-9 wp-has-aspect-ratio">In this article, you’ll learn how to read an XML file and format the output in Python. </p>
<p class="wp-embed-aspect-16-9 wp-has-aspect-ratio">To make it more fun, we have the following running scenario:</p>
<p><em>Arman, a Music Appreciation student at the <a href="https://www.rcmusic.com/learning/oscar-peterson-school-of-music/music-appreciation" data-type="URL" data-id="https://www.rcmusic.com/learning/oscar-peterson-school-of-music/music-appreciation" target="_blank" rel="noreferrer noopener">Royal Conservatory of Music</a>, has been given course materials in an XML file format. Arman needs to purchase these books immediately. He’s into music, not computers. He needs you to format the output into a readable format.</em></p>
<p><em>Navigate to the Appendix Data</em> section<em> and download the XML file to follow along. Then, move this file to the current working directory.</em></p>
<p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f4ac.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Question</strong>: How would we read in ax XML file and format the output?</p>
<p>We can accomplish this task by one of the following options:</p>
<ul>
<li><strong>Method 1</strong>: Use <a rel="noreferrer noopener" href="https://blog.finxter.com/installing-beautiful-soup/" data-type="URL" data-id="https://blog.finxter.com/installing-beautiful-soup/" target="_blank"><em>Beautiful Soup</em></a></li>
<li><strong>Method 2</strong>: Use <a href="https://docs.python.org/3/library/xml.etree.elementtree.html" data-type="URL" data-id="https://docs.python.org/3/library/xml.etree.elementtree.html">XML eTree</a></li>
<li><strong>Method 3</strong>: Use <a rel="noreferrer noopener" href="https://docs.python.org/3/library/xml.dom.minidom.html" data-type="URL" data-id="https://docs.python.org/3/library/xml.dom.minidom.html" target="_blank">Minidom</a></li>
<li><strong>Method 4</strong>: Use <a href="https://blog.finxter.com/category/pandas-library/" data-type="URL" data-id="https://blog.finxter.com/category/pandas-library/" target="_blank" rel="noreferrer noopener">Pandas</a></li>
</ul>
<hr class="wp-block-separator"/>
<p class="wp-embed-aspect-16-9 wp-has-aspect-ratio">Before any data manipulation can occur, two (2) new libraries will require installation.</p>
<ul>
<li>The <em><a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-quickstart/" data-type="URL" data-id="https://blog.finxter.com/pandas-quickstart/" target="_blank">Pandas</a> </em>library enables access to/from a DataFrame.</li>
<li>The <a rel="noreferrer noopener" href="https://blog.finxter.com/installing-beautiful-soup/" data-type="URL" data-id="https://blog.finxter.com/installing-beautiful-soup/" target="_blank"><em>Beautiful Soup</em></a> library enables the parsing of XML and HTML files.</li>
</ul>
<p>To install these libraries, navigate to an <a rel="noreferrer noopener" href="https://blog.finxter.com/best-python-ide/" data-type="post" data-id="8106" target="_blank">IDE</a> terminal. At the command prompt (<code>$</code>), execute the code below. For the terminal used in this example, the command prompt is a dollar sign (<code>$</code>). Your terminal prompt may be different.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">$ pip install pandas</pre>
<p>Hit the <code>&lt;Enter&gt;</code> key on the keyboard to start the installation process.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">$ pip install bs4</pre>
<p>Hit the <code>&lt;Enter&gt;</code> key on the keyboard to start the installation process.</p>
<p>If the installations were successful, a message displays in the terminal indicating the same.</p>
<hr class="wp-block-separator"/>
<p>Feel free to view the PyCharm installation guide for the required libraries.</p>
<ul>
<li><a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-install-a-library-on-pycharm/" data-type="URL" data-id="https://blog.finxter.com/how-to-install-a-library-on-pycharm/" target="_blank">How to install</a><a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-install-pandas-in-python/" target="_blank"> Pandas on PyCharm</a></li>
<li><a href="https://blog.finxter.com/how-to-install-beautifulsoup-on-pycharm/" data-type="URL" data-id="https://blog.finxter.com/how-to-install-beautifulsoup-on-pycharm/" target="_blank" rel="noreferrer noopener">How to install Beautiful Soup on PyCharm</a></li>
</ul>
<hr class="wp-block-separator"/>
<p id="block-3b5c9c73-276c-4b84-a1bc-d5ba1de38d56">Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.</p>
<pre class="EnlighterJSRAW wp-embed-aspect-16-9 wp-has-aspect-ratio" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd from bs4 import BeautifulSoup
import xml.etree.ElementTree as ET
import base64, xml.dom.minidom
from xml.dom.minidom import Node</pre>
<p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f4a1.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" />&nbsp;<strong>Note</strong>: The additional libraries indicated above do not require installation as they come built-in to Python.</p>
<hr class="wp-block-separator"/>
<h2>Method 1: Use Beautiful Soup</h2>
<p class="has-global-color-8-background-color has-background">A clean, compact way to read an XML file is to use Python’s <a rel="noreferrer noopener" href="https://blog.finxter.com/installing-beautiful-soup/" data-type="URL" data-id="https://blog.finxter.com/installing-beautiful-soup/" target="_blank">Beautiful Soup</a> library. A “go-to” tool for web scraping and XML data extraction.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="1" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">all_books = BeautifulSoup(open('books.xml'), 'xml')
pretty_xml = all_books.prettify()
print(pretty_xml)</pre>
<ul>
<li>The <em>books.xml</em> file is read and parsed using Beautiful Soup’s <a rel="noreferrer noopener" href="https://blog.finxter.com/parsing-xml-using-beautifulsoup-in-python/" data-type="URL" data-id="https://blog.finxter.com/parsing-xml-using-beautifulsoup-in-python/" target="_blank">XML parser</a>. The results are saved to <code>all_books</code>.</li>
<li>Next, Beautiful Soup’s <a rel="noreferrer noopener" href="https://blog.finxter.com/parsing-xml-using-beautifulsoup-in-python/" data-type="URL" data-id="https://blog.finxter.com/parsing-xml-using-beautifulsoup-in-python/" target="_blank"><code>prettify()</code></a> method is used to improve the appearance of the output.</li>
<li>Finally, the formatted output is sent to the terminal.</li>
</ul>
<p><strong>Output (snippet)</strong></p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td><code>&lt;?xml version="1.0"?><br />&lt;catalog>   <br /> &lt;book>      <br /> &lt;isbn>978-0393050714&lt;/isbn><br />      &lt;title>Johann Sebastian Bach&lt;/title><br />      &lt;price>$5.99&lt;/price><br />   &lt;/book><br /> ......<br />&lt;/catalog></code></td>
</tr>
</tbody>
</table>
</figure>
<hr class="wp-block-separator"/>
<h2>Method 2: Use XML eTree</h2>
<p class="has-global-color-8-background-color has-background">The ElementTree library is built-in to Python and contains functions to read and parse XML and XML-like data structures. The hierarchical data format is based on a tree structure: a <em>root </em>representing the tree and <em>elements </em>representing the nodes. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="1" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">all_books = ET.parse('books.xml').getroot() for b in all_books.findall('book'): print ("{:&lt;20} {:&lt;30} {:&lt;30}".format(b.find('isbn').text, b.find('title').text, b.find('price').text))</pre>
<ul>
<li>The <em>books.xml</em> file is read in and parsed using the <code><a rel="noreferrer noopener" href="https://docs.python.org/3/library/xml.etree.elementtree.html" data-type="URL" data-id="https://docs.python.org/3/library/xml.etree.elementtree.html" target="_blank">eTree parse()</a></code> function. The results are saved to <code>all_books</code>.</li>
<li>Next, a <a rel="noreferrer noopener" href="https://blog.finxter.com/python-loops/" data-type="URL" data-id="https://blog.finxter.com/python-loops/" target="_blank">For</a> loop is instantiated. It traverses through each book in <code>all_books</code>.
<ul>
<li>Each book’s details are formatted into columns and output to the terminal.</li>
</ul>
</li>
</ul>
<p><strong>Output (snippet)</strong></p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td>978-0393050714</td>
<td>Johann Sebastian Bach</td>
<td>$5.99</td>
</tr>
<tr>
<td>978-1721260522</td>
<td>Ludwig van Beethoven</td>
<td>$9.99</td>
</tr>
<tr>
<td>978-0679745822</td>
<td>Johannes Brahms</td>
<td>$7.99</td>
</tr>
<tr>
<td>979-8653086533</td>
<td>Frederic Chopin</td>
<td>$7.99</td>
</tr>
<tr>
<td>978-1580469036</td>
<td>Claude Debussy</td>
<td>$13.99</td>
</tr>
</tbody>
</table>
</figure>
<hr class="wp-block-separator"/>
<h2>Method 3: Use minidom</h2>
<p class="has-global-color-8-background-color has-background">Minidom is a smaller version of DOM and comes with an API similar to other programming languages. However, feedback indicates this method is slow and a memory hogger. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="5" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">with open("books.xml",'r') as fp: data = fp.read() i = 0
all_books = xml.dom.minidom.parseString(data)
for book in all_books.getElementsByTagName('book'): isbn = all_books.getElementsByTagName('isbn')[i].firstChild.nodeValue title = all_books.getElementsByTagName('title')[i].firstChild.nodeValue price = all_books.getElementsByTagName('price')[i].firstChild.nodeValue print ("{:&lt;20} {:&lt;25} {:&lt;20}".format(isbn, title, price)) i +=1</pre>
<ul>
<li>The <em>books.xml</em> file is opened, and a file object, <code>fp</code> is created.
<ul>
<li>The contents of this file are read in and saved to <code>data</code>.</li>
</ul>
</li>
<li>A counter variable <code>i</code> is created to loop through <code>all_books</code> and is assigned the value 0.</li>
<li>Then <code>data</code> is read and parsed. The results save to <code>all_books</code>.</li>
<li>A <a rel="noreferrer noopener" href="https://blog.finxter.com/python-loops/" data-type="URL" data-id="https://blog.finxter.com/python-loops/" target="_blank">For</a> loop is instantiated. It traverses through each book in <code>all_books</code>.
<ul>
<li>Four (4) variables are used to locate and save the appropriate values.</li>
<li>They are formatted and output to the terminal in columns.</li>
<li>The counter variable is increased by one (1).</li>
</ul>
</li>
</ul>
<p><strong>Output (snippet)</strong></p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td>978-0393050714</td>
<td>Johann Sebastian Bach</td>
<td>$5.99</td>
</tr>
<tr>
<td>978-1721260522</td>
<td>Ludwig van Beethoven</td>
<td>$9.99</td>
</tr>
<tr>
<td>978-0679745822</td>
<td>Johannes Brahms</td>
<td>$7.99</td>
</tr>
<tr>
<td>979-8653086533</td>
<td>Frederic Chopin</td>
<td>$7.99</td>
</tr>
<tr>
<td>978-1580469036</td>
<td>Claude Debussy</td>
<td>$13.99</td>
</tr>
</tbody>
</table>
</figure>
<hr class="wp-block-separator"/>
<h2>Method 4: Use Pandas read_xml()</h2>
<p class="has-global-color-8-background-color has-background">The <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-quickstart/" data-type="URL" data-id="https://blog.finxter.com/pandas-quickstart/" target="_blank">Pandas</a> library has an option to read in an XML file and convert it to a DataFrame in one easy step.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="1" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df = pd.read_xml('books.xml')
print(df)</pre>
<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio">
<div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Reading and writing XML with Pandas" width="780" height="439" src="https://www.youtube.com/embed/zIzH5GfHHD8?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
</figure>
<ul>
<li>The <em>books.xml</em> file is read in and saved to the DataFrame <code>df</code>.</li>
<li>The output automatically formats into columns (including a header row) and is output to the terminal.</li>
</ul>
<p><strong>Output (snippet)</strong></p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td></td>
<td class="has-text-align-right" data-align="right">isbn</td>
<td class="has-text-align-right" data-align="right">title</td>
<td class="has-text-align-right" data-align="right">price</td>
</tr>
<tr>
<td>0</td>
<td class="has-text-align-right" data-align="right">978-0393050714</td>
<td class="has-text-align-right" data-align="right">Johann Sebastian Bach</td>
<td class="has-text-align-right" data-align="right">$5.99</td>
</tr>
<tr>
<td>1</td>
<td class="has-text-align-right" data-align="right">978-1721260522</td>
<td class="has-text-align-right" data-align="right">Ludwig van Beethoven</td>
<td class="has-text-align-right" data-align="right">$9.99</td>
</tr>
<tr>
<td>2</td>
<td class="has-text-align-right" data-align="right">978-0679745822</td>
<td class="has-text-align-right" data-align="right">Johannes Brahms</td>
<td class="has-text-align-right" data-align="right">$7.99</td>
</tr>
<tr>
<td>3</td>
<td class="has-text-align-right" data-align="right">979-8653086533</td>
<td class="has-text-align-right" data-align="right">Frederic Chopin</td>
<td class="has-text-align-right" data-align="right">$7.99</td>
</tr>
<tr>
<td>4</td>
<td class="has-text-align-right" data-align="right">978-1580469036</td>
<td class="has-text-align-right" data-align="right">Claude Debussy</td>
<td class="has-text-align-right" data-align="right">$13.99</td>
</tr>
</tbody>
</table>
</figure>
<hr class="wp-block-separator"/>
<h2>Appendix Data</h2>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">&lt;?xml version="1.0"?>
&lt;catalog>
   &lt;book>
      &lt;isbn>978-0393050714&lt;/isbn>
      &lt;title>Johann Sebastian Bach&lt;/title>
      &lt;price>$5.99&lt;/price>
   &lt;/book>
   &lt;book>
      &lt;isbn>978-1721260522&lt;/isbn>
      &lt;title>Ludwig van Beethoven&lt;/title>
      &lt;price>$9.99&lt;/price>
   &lt;/book>
   &lt;book>
      &lt;isbn>978-0679745822&lt;/isbn>
      &lt;title>Johannes Brahms&lt;/title>
      &lt;price>$7.99&lt;/price>
   &lt;/book>
   &lt;book>
      &lt;isbn>979-8653086533&lt;/isbn>
      &lt;title>Frederic Chopin&lt;/title>
      &lt;price>$7.99&lt;/price>
   &lt;/book>
   &lt;book>
      &lt;isbn>978-1580469036&lt;/isbn>
      &lt;title>Claude Debussy&lt;/title>
      &lt;price>$13.99&lt;/price>
   &lt;/book>
   &lt;book>
      &lt;isbn>978-0520043176&lt;/isbn>
      &lt;title>Joseph Haydn&lt;/title>
      &lt;price>$25.99&lt;/price>
   &lt;/book>
   &lt;book>
      &lt;isbn>978-1981659968&lt;/isbn>
      &lt;title>Wolfgang Amadeus Mozart&lt;/title>
      &lt;price>$8.99&lt;/price>
   &lt;/book>
   &lt;book>
      &lt;isbn>978-1482379990&lt;/isbn>
      &lt;title>Franz Schubert&lt;/title>
      &lt;price>$26.99&lt;/price>
   &lt;/book>
   &lt;book>
      &lt;isbn>978-0486257488&lt;/isbn>
      &lt;title>Robert Schumann&lt;/title>
      &lt;price>$14.99&lt;/price>
   &lt;/book>
   &lt;book>
      &lt;isbn>978-0486442723&lt;/isbn>
      &lt;title>Peter Tchaikovsky&lt;/title>
      &lt;price>$12.95&lt;/price>
   &lt;/book>
&lt;/catalog></pre>
<hr class="wp-block-separator"/>
<h2>Summary</h2>
<p>After reviewing the above methods in conjunction with Arman’s requirements, we decide that Method 4 best meets his needs.</p>
<p>Problem Solved! Happy Coding!</p>
<hr class="wp-block-separator"/>
</div>


https://www.sickgaming.net/blog/2022/04/22/how-to-read-an-xls-file-in-python/