Create an account


Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
[Tut] How to Filter Data from an Excel File in Python with Pandas

#1
How to Filter Data from an Excel File in Python with Pandas

<div>
<div class="kk-star-ratings kksr-auto kksr-align-left kksr-valign-top" data-payload="{&quot;align&quot;:&quot;left&quot;,&quot;id&quot;:&quot;841107&quot;,&quot;slug&quot;:&quot;default&quot;,&quot;valign&quot;:&quot;top&quot;,&quot;ignore&quot;:&quot;&quot;,&quot;reference&quot;:&quot;auto&quot;,&quot;class&quot;:&quot;&quot;,&quot;count&quot;:&quot;1&quot;,&quot;readonly&quot;:&quot;&quot;,&quot;score&quot;:&quot;5&quot;,&quot;best&quot;:&quot;5&quot;,&quot;gap&quot;:&quot;5&quot;,&quot;greet&quot;:&quot;Rate this post&quot;,&quot;legend&quot;:&quot;5\/5 - (1 vote)&quot;,&quot;size&quot;:&quot;24&quot;,&quot;width&quot;:&quot;142.5&quot;,&quot;_legend&quot;:&quot;{score}\/{best} - ({count} {votes})&quot;,&quot;font_factor&quot;:&quot;1.25&quot;}">
<div class="kksr-stars">
<div class="kksr-stars-inactive">
<div class="kksr-star" data-star="1" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="2" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="3" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="4" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="5" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
</p></div>
<div class="kksr-stars-active" style="width: 142.5px;">
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
</p></div>
</div>
<div class="kksr-legend" style="font-size: 19.2px;"> 5/5 – (1 vote) </div>
</div>
<h2 class="wp-embed-aspect-16-9 wp-has-aspect-ratio">Problem Formulation and Solution Overview</h2>
<p class="wp-embed-aspect-16-9 wp-has-aspect-ratio">This article will show different ways to read and filter an Excel file in Python.</p>
<p>To make it more interesting, we have the following scenario:</p>
<p class="has-text-align-justify wp-embed-aspect-16-9 wp-has-aspect-ratio"><em>Sven is a Senior Coder at K-Paddles. K-Paddles manufactures Kayak Paddles made of Kevlar for the White Water Rafting Community. Sven has been asked to read an Excel file and run reports. This Excel file contains two (2) worksheets, Employees and Sales. </em></p>
<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large"><img loading="lazy" width="1024" height="550" src="https://blog.finxter.com/wp-content/uploads/2022/10/ww-rafting-1-1024x550.jpg" alt="" class="wp-image-841323" srcset="https://blog.finxter.com/wp-content/uploads/2022/10/ww-rafting-1-1024x550.jpg 1024w, https://blog.finxter.com/wp-content/uplo...00x161.jpg 300w, https://blog.finxter.com/wp-content/uplo...68x412.jpg 768w, https://blog.finxter.com/wp-content/uplo...36x824.jpg 1536w, https://blog.finxter.com/wp-content/uplo...8x1099.jpg 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
</div>
<p>To follow along, download the <code>kp_data.xlsx</code> file and place it into the current working directory.</p>
<hr class="wp-block-separator has-alpha-channel-opacity wp-embed-aspect-16-9 wp-has-aspect-ratio"/>
<p class="wp-embed-aspect-16-9 wp-has-aspect-ratio has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4ac.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Question</strong>: How would we write code to filter an Excel file in Python?</p>
<p class="wp-embed-aspect-16-9 wp-has-aspect-ratio">We can accomplish this task by one of the following options:</p>
<ul type="video" class="wp-embed-aspect-16-9 wp-has-aspect-ratio">
<li><strong>Method </strong>1: Use <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-read-excel/" data-type="URL" data-id="https://blog.finxter.com/pandas-read-excel/" target="_blank"><code>read_excel()</code></a> and the <a rel="noreferrer noopener" href="https://blog.finxter.com/python-and-operator/" data-type="URL" data-id="https://blog.finxter.com/python-and-operator/" target="_blank"><code>&amp;</code></a> operator</li>
<li><strong>Method </strong>2: Use <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-read-excel/" data-type="URL" data-id="https://blog.finxter.com/pandas-read-excel/" target="_blank"><code>read_excel()</code></a> and <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" data-type="URL" data-id="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"><code>loc[]</code></a></li>
<li><strong>Method </strong>3: Use <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-read-excel/" data-type="URL" data-id="https://blog.finxter.com/pandas-read-excel/" target="_blank"><code>read_excel()</code></a> and <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" data-type="URL" data-id="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"><code>iloc[]</code></a></li>
<li><strong>Method 4</strong>: Use <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-read-excel/" data-type="URL" data-id="https://blog.finxter.com/pandas-read-excel/" target="_blank"><code>read_excel()</code></a>, <a rel="noreferrer noopener" href="https://pandas.pydata.org/docs/reference/api/pandas.Index.html" data-type="URL" data-id="https://pandas.pydata.org/docs/reference/api/pandas.Index.html" target="_blank"><code>index[]</code></a> and <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" data-type="URL" data-id="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"><code>loc[]</code></a> </li>
<li><strong>Method 5</strong>: Use <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-read-excel/" data-type="URL" data-id="https://blog.finxter.com/pandas-read-excel/" target="_blank"><code>read_excel()</code></a> and <a rel="noreferrer noopener" href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html" data-type="URL" data-id="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html" target="_blank"><code>isin()</code></a></li>
</ul>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Method 1: Use read_excel() and the &amp; operator</h2>
<p class="has-global-color-8-background-color has-background">This method uses the <a href="https://blog.finxter.com/pandas-read-excel/" data-type="URL" data-id="https://blog.finxter.com/pandas-read-excel/" target="_blank" rel="noreferrer noopener"><code>read_excel()</code></a> function to read an XLSX file into a DataFrame and an expression to filter the results.</p>
<p>This example imports the above-noted Excel file into a DataFrame. The <code>Employees</code> worksheet is accessed, and the following filter is applied:</p>
<p class="has-contrast-color has-base-background-color has-text-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f440.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Give me the DataFrame rows for all employees who work in the Sales Department, and earn more than $55,000/annum.</p>
<p>Let’s convert this to Python code.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="5-6" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd cols = ['First', 'Last', 'Dept', 'Salary']
df_emps = pd.read_excel('kp_data.xlsx', sheet_name='Employees', usecols=cols)
df_salary = df_emps[(df_emps['Dept'] == 'Sales') &amp; (df_emps['Salary'] > 55000)] df_salary.to_excel('sales_55.xlsx', sheet_name='Sales Salaries Greater Than 55K') </pre>
<p>The first line in the above code snippet imports the <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-quickstart/" data-type="URL" data-id="https://blog.finxter.com/pandas-quickstart/" target="_blank">Pandas</a> library. This allows access to and manipulation of the XLSX file. Just so you know, the <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-install-openpyxl-in-python" data-type="URL" data-id="https://blog.finxter.com/how-to-install-openpyxl-in-python" target="_blank"><code>openpyxl</code></a> library must be installed before continuing.</p>
<p>The following line defines the four (4) columns to retrieve from the XLSX file and saves them to the variable <code>cols</code> as a <a href="https://blog.finxter.com/python-lists/" data-type="URL" data-id="https://blog.finxter.com/python-lists/" target="_blank" rel="noreferrer noopener">List</a>.</p>
<p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4a1.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /><strong>Note</strong>: Open the Excel file and review the data to follow along.</p>
<h3><span style="text-decoration: underline">Import the Excel File to Python</span></h3>
<p>On the next line in the code snippet, <a href="https://blog.finxter.com/pandas-read-excel/" data-type="URL" data-id="https://blog.finxter.com/pandas-read-excel/" target="_blank" rel="noreferrer noopener"><code>read_excel()</code></a> is called and passed three (3) arguments:</p>
<ul>
<li>The name of the Excel file to import (<code>kp_data.xlsx</code>). </li>
<li>The worksheet name. The first worksheet in the Excel file is always read unless stated otherwise. For this example, our Excel file contains two (2) worksheets: <code>Employees</code> and <code>Sales</code>. The Employees worksheet can be referenced using <code>sheet_name=0</code> or <code>sheet_name='Employees'</code>. Both produce the same result.</li>
<li>The columns to retrieve from the Excel workheet (<code>usecols=cols</code>).</li>
</ul>
<p>The results save to <code>df_emps</code>.</p>
<h3><span style="text-decoration: underline">Filter the DataFrame</span></h3>
<p>The highlighted line applies a filter that references the DataFrame columns to base the filter on and the <a rel="noreferrer noopener" href="https://blog.finxter.com/python-and-operator/" data-type="URL" data-id="https://blog.finxter.com/python-and-operator/" target="_blank"><code>&amp;</code></a> operator to allow for more than one (1) filter criteria.</p>
<p>In Python, this filter:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df_salary = df_emps[(df_emps['Dept'] == 'Sales') &amp; (df_emps['Salary'] > 55000)]</pre>
<p>Equates to this:</p>
<p class="has-contrast-color has-base-background-color has-text-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f440.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Give me the DataFrame rows for all employees who work in the Sales Department, and earn more than $55,000/annum.</p>
<p>These results save to <code>sales_55.xlsx</code> with a worksheet ‘<code>Sales Salaries Greater Than 55K</code>‘ and placed into the current working directory.</p>
<h3><span style="text-decoration: underline">Contents of Filtered Excel File</span></h3>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" width="832" height="266" src="https://blog.finxter.com/wp-content/uploads/2022/10/km_sales_02.png" alt="" class="wp-image-844841" srcset="https://blog.finxter.com/wp-content/uploads/2022/10/km_sales_02.png 832w, https://blog.finxter.com/wp-content/uplo...300x96.png 300w, https://blog.finxter.com/wp-content/uplo...68x246.png 768w" sizes="(max-width: 832px) 100vw, 832px" /></figure>
</div>
<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube"><a href="https://blog.finxter.com/how-to-filter-data-from-an-excel-file-in-python-with-pandas/"><img src="https://blog.finxter.com/wp-content/plugins/wp-youtube-lyte/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2FUs_53azk4Hg%2Fhqdefault.jpg" alt="YouTube Video"></a><figcaption></figcaption></figure>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Method 2: Use read_excel() and loc[]</h2>
<p class="has-global-color-8-background-color has-background">This method uses the <a href="https://blog.finxter.com/pandas-read-excel/" data-type="URL" data-id="https://blog.finxter.com/pandas-read-excel/" target="_blank" rel="noreferrer noopener"><code>read_excel()</code></a> function to read an XLSX file into a DataFrame and <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" data-type="URL" data-id="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"><code>loc[]</code></a> to filter the results. The <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" data-type="URL" data-id="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"><code>loc[]</code></a> function can access either a group of rows or columns based on their label names.</p>
<p>This example imports the above-noted Excel file into a DataFrame. The <code>Employees</code> worksheet is accessed, and the following filter is applied:</p>
<p class="has-contrast-color has-base-background-color has-text-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f440.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Give me the DataFrame rows for all employees who work in the IT Department, and live in the United States.</p>
<p>Let’s convert this to Python code.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="4-6, 8-13" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd
from openpyxl import load_workbook cols = ['First', 'Last', 'Dept', 'Country']
df_emps = pd.read_excel('kp_data.xlsx', sheet_name='Employees', usecols=cols)
df_it = df_emps.loc[(df_emps.Dept == 'IT') &amp; (df_emps.Country == 'United States')] book = load_workbook('kp_data.xlsx')
writer = pd.ExcelWriter('kp_data.xlsx', engine='openpyxl')
writer.book = book
df_it.to_excel(writer, sheet_name = 'IT - US')
writer.save()
writer.close()</pre>
<p>The first line in the above code snippet imports the <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-quickstart/" data-type="URL" data-id="https://blog.finxter.com/pandas-quickstart/" target="_blank">Pandas</a> library. This allows access to and manipulation of the XLSX file. Just so you know, the <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-install-openpyxl-in-python" data-type="URL" data-id="https://blog.finxter.com/how-to-install-openpyxl-in-python" target="_blank"><code>openpyxl</code></a> library must be installed before continuing. </p>
<p>The following line imports <a href="https://openpyxl.readthedocs.io/en/stable/" data-type="URL" data-id="https://openpyxl.readthedocs.io/en/stable/" target="_blank" rel="noreferrer noopener"><code>openpyxl</code></a>. This is required, in this case, to save the filtered results to a new worksheet in the same Excel file.</p>
<p>The following line defines the four (4) columns to retrieve from the XLSX file and saves them to the variable <code>cols</code> as a <a href="https://blog.finxter.com/python-lists/" data-type="URL" data-id="https://blog.finxter.com/python-lists/"><code>List</code></a>.</p>
<p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4a1.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /><strong>Note</strong>: Open the Excel file and review the data to follow along.</p>
<h3><span style="text-decoration: underline">Import the Excel File to Python</span></h3>
<p>On the next line in the code snippet, <a href="https://blog.finxter.com/pandas-read-excel/" data-type="URL" data-id="https://blog.finxter.com/pandas-read-excel/"><code>read_excel()</code></a> is called and passed three (3) arguments:</p>
<ul>
<li>The name of the Excel file to import (<code>kp_data.xlsx</code>). </li>
<li>The worksheet name. The first worksheet in the Excel file is always read unless stated otherwise. For this example, our Excel file contains two (2) worksheets: <code>Employees</code> and <code>Sales</code>. The Employees worksheet can be referenced using <code>sheet_name=0</code> or <code>sheet_name='Employees'</code>. Both produce the same result.</li>
<li>The columns to retrieve from the Excel worksheet (<code>usecols=cols</code>).</li>
</ul>
<p>The results save to <code>df_it</code>.</p>
<h3><span style="text-decoration: underline">Filter the DataFrame</span></h3>
<p>The highlighted line applies a filter using <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" data-type="URL" data-id="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"><code>loc[]</code></a> and passes the filter to return specific rows from the DataFrame.</p>
<p>In Python, this filter:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df_it = df_emps.loc[(df_emps.Dept == 'IT') &amp; (df_emps.Country == 'United States')]</pre>
<p>Equates to this:</p>
<p class="has-contrast-color has-base-background-color has-text-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f440.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Give me the DataFrame rows for all employees who work in the IT Department, and live in the United States.</p>
<h3><span style="text-decoration: underline">Saves Results to Worksheet in Same Excel File</span></h3>
<p>In the bottom highlighted section of the above code, the Excel file is re-opened using <a rel="noreferrer noopener" href="https://blog.finxter.com/python-working-with-excel-getting-started/" data-type="URL" data-id="https://blog.finxter.com/python-working-with-excel-getting-started/" target="_blank"><code>load_workbook()</code></a>. Then, a writer object is declared, the results filtered, and written to a new worksheet, called <code>IT - US</code> and the file is saved and closed.</p>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" width="770" height="294" src="https://blog.finxter.com/wp-content/uploads/2022/10/km_wwf_04.png" alt="" class="wp-image-849366" srcset="https://blog.finxter.com/wp-content/uploads/2022/10/km_wwf_04.png 770w, https://blog.finxter.com/wp-content/uplo...00x115.png 300w, https://blog.finxter.com/wp-content/uplo...68x293.png 768w" sizes="(max-width: 770px) 100vw, 770px" /></figure>
</div>
<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube"><a href="https://blog.finxter.com/how-to-filter-data-from-an-excel-file-in-python-with-pandas/"><img src="https://blog.finxter.com/wp-content/plugins/wp-youtube-lyte/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2FJQBOpbhxQrM%2Fhqdefault.jpg" alt="YouTube Video"></a><figcaption></figcaption></figure>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Method 3: Use read_excel() and iloc[]</h2>
<p class="has-global-color-8-background-color has-background">This method uses the <a href="https://blog.finxter.com/pandas-read-excel/" data-type="URL" data-id="https://blog.finxter.com/pandas-read-excel/" target="_blank" rel="noreferrer noopener"><code>read_excel()</code></a> function to read an XLSX file into a DataFrame and <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" data-type="URL" data-id="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"></a><code><a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"><code>iloc[]</code></a></code> to filter the results. The<a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" data-type="URL" data-id="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"> </a><a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" data-type="URL" data-id="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"></a><code><code><a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"><code>iloc[]</code></a></code></code> function accesses either a group of rows or columns based on their location (integer value).</p>
<p>This example imports required <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-quickstart/" data-type="URL" data-id="https://blog.finxter.com/pandas-quickstart/" target="_blank"><code>Pandas</code></a> library and the above-noted Excel file into a DataFrame. The <code>Sales </code>worksheet is then accessed. </p>
<p>This worksheet contains the yearly sale totals for K-Paddles paddles. These results are filtered to the first six (6) rows in the DataFrame and columns shown below.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="5" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd cols = ['Month', 'Aspire', 'Adventurer', 'Maximizer']
df_sales = pd.read_excel('kp_data.xlsx', sheet_name='Sales', usecols=cols)
df_aspire = df_sales.iloc[0:6]
print(df_aspire)</pre>
<p>The results are output to the terminal.</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td></td>
<td>Month</td>
<td>Aspire</td>
<td>Adventurer </td>
<td>Maximizer</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>2500 </td>
<td>5200</td>
<td>21100</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>2630 </td>
<td>5100</td>
<td>18330</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>2140</td>
<td>4550</td>
<td>22470</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>3400 </td>
<td>5870</td>
<td>22270</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>3600 </td>
<td>4560</td>
<td>20960</td>
</tr>
<tr>
<td>5</td>
<td>6</td>
<td>2760 </td>
<td>4890 </td>
<td>20140</td>
</tr>
</tbody>
</table>
</figure>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Method 4: Use read_excel(), index[] and loc[]</h2>
<p class="has-global-color-8-background-color has-background">This method uses the <a href="https://blog.finxter.com/pandas-read-excel/" data-type="URL" data-id="https://blog.finxter.com/pandas-read-excel/"><code>read_excel()</code></a> function to read an XLSX file into a DataFrame in conjunction with <a rel="noreferrer noopener" href="https://pandas.pydata.org/docs/reference/api/pandas.Index.html" data-type="URL" data-id="https://pandas.pydata.org/docs/reference/api/pandas.Index.html" target="_blank"><code>index[]</code></a> and <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" data-type="URL" data-id="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"><code>loc[]</code></a> to filter the results. The <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" data-type="URL" data-id="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank"><code>loc[]</code></a> function can access either a group of rows or columns based on their label names.</p>
<p>This example imports the required <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-quickstart/" data-type="URL" data-id="https://blog.finxter.com/pandas-quickstart/" target="_blank"><code>Pandas</code></a> library and the above-noted Excel file into a DataFrame. The <code>Sales </code>worksheet is then accessed. This worksheet contains the yearly sale totals for K-Paddles paddles. </p>
<p>These results are filtered to view the results for the Pinnacle paddle using <a rel="noreferrer noopener" href="https://pandas.pydata.org/docs/reference/api/pandas.Index.html" data-type="URL" data-id="https://pandas.pydata.org/docs/reference/api/pandas.Index.html" target="_blank"><code>index[]</code></a> and passing it a start and stop position (stop-1).</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd cols = ['Month', 'Pinnacle']
df_pinnacle = pd.read_excel('kp_data.xlsx', sheet_name='Sales', usecols=cols)
print(df_pinnacle.loc[df_pinnacle.index[0:5], ['Month', 'Pinnacle']])</pre>
<p>The results are output to the terminal.</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td></td>
<td>Month</td>
<td>Pinnacle </td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1500</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>1200</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>1340</td>
</tr>
<tr>
<td>3</td>
<td>4</td>
<td>1130</td>
</tr>
<tr>
<td>4</td>
<td>5</td>
<td>1740</td>
</tr>
</tbody>
</table>
</figure>
<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube"><a href="https://blog.finxter.com/how-to-filter-data-from-an-excel-file-in-python-with-pandas/"><img src="https://blog.finxter.com/wp-content/plugins/wp-youtube-lyte/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2F-JKVy_HliQE%2Fhqdefault.jpg" alt="YouTube Video"></a><figcaption></figcaption></figure>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Method 5: Use read_excel() and isin()</h2>
<p class="has-global-color-8-background-color has-background">This method uses the <a href="https://blog.finxter.com/pandas-read-excel/" data-type="URL" data-id="https://blog.finxter.com/pandas-read-excel/"><code>read_excel()</code></a> function to read an XLSX file into a DataFrame using <a rel="noreferrer noopener" href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html" data-type="URL" data-id="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html" target="_blank"><code>isin()</code></a> to filter the results. The <a rel="noreferrer noopener" href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html" data-type="URL" data-id="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html" target="_blank"><code>isin()</code></a> function filters the results down to the records that match the criteria passed as an argument.</p>
<p>This example imports required <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-quickstart/" data-type="URL" data-id="https://blog.finxter.com/pandas-quickstart/" target="_blank"><code>Pandas</code></a> library and the above-noted Excel file into a DataFrame. The <code>Employees </code>worksheet is then accessed. </p>
<p>These results are filtered to view the results for all employees who reside in Chicago.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd cols = ['First', 'Last', 'City']
df_emps = pd.read_excel('kp_data.xlsx', sheet_name='Employees', usecols=cols)
print(df_emps[df_emps.City.isin(['Chicago'])])</pre>
<p>The results are output to the terminal.</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td></td>
<td>First</td>
<td>Last</td>
<td>City</td>
</tr>
<tr>
<td>2</td>
<td>Luna</td>
<td>Sanders</td>
<td>Chicago</td>
</tr>
<tr>
<td>3</td>
<td>Penelope</td>
<td>Jordan</td>
<td>Chicago</td>
</tr>
<tr>
<td>9</td>
<td>Madeline</td>
<td>Walker</td>
<td>Chicago</td>
</tr>
<tr>
<td>34</td>
<td>Caroline</td>
<td>Jenkins</td>
<td>Chicago</td>
</tr>
</tbody>
</table>
</figure>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Summary</h2>
<p>This article has provided five (5) ways to filter data from an Excel file using Python to select the best fit for your coding requirements.</p>
<p>Good Luck &amp; Happy Coding!</p>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Programmer Humor – Blockchain</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" width="280" height="394" src="https://blog.finxter.com/wp-content/uploads/2022/07/image-31.png" alt="" class="wp-image-457795" srcset="https://blog.finxter.com/wp-content/uploads/2022/07/image-31.png 280w, https://blog.finxter.com/wp-content/uplo...13x300.png 213w" sizes="(max-width: 280px) 100vw, 280px" /><figcaption><em>“Blockchains are like grappling hooks, in that it’s extremely cool when you encounter a problem for which they’re the right solution, but it happens way too rarely in real life.”</em> <strong>source </strong> – <a href="https://imgs.xkcd.com/comics/blockchain.png" data-type="URL" data-id="https://imgs.xkcd.com/comics/blockchain.png" target="_blank" rel="noreferrer noopener">xkcd</a></figcaption></figure>
</div>
</div>


https://www.sickgaming.net/blog/2022/10/...th-pandas/
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

Forum software by © MyBB Theme © iAndrew 2016