[Tut] How to Calculate the Column Standard Deviation of a DataFrame in Python Pandas?

<div><p>Want to calculate the standard deviation of a column in your <a rel="noreferrer noopener" href="https://pandas.pydata.org/" target="_blank">Pandas</a> DataFrame?</p>
<p>In case it has been a few years since your last statistics course, let’s quickly recap the definitions. The <strong>variance</strong> is the <em>average squared deviation of the list elements from the average value.</em> The <strong>standard deviation</strong> is the square root of the variance.</p>
<figure class="wp-block-image size-large is-resized"><img src="https://blog.finxter.com/wp-content/uploads/2020/04/image.png" alt="" class="wp-image-7490" width="185" height="66" srcset="https://blog.finxter.com/wp-content/uploads/2020/04/image.png 305w, https://blog.finxter.com/wp-content/uplo...00x106.png 300w" sizes="(max-width: 185px) 100vw, 185px" /></figure>
<div class="wp-block-image">
<figure class="aligncenter size-large"><img src="https://blog.finxter.com/wp-content/uploads/2020/04/image-1.png" alt="" class="wp-image-7491"/></figure>
</div>
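<p>For reference, the usual textbook definitions can be written out as follows. This is a sketch of the population form, which divides by n; the symbols n, x_i, and the mean x̄ are introduced here for illustration and are not taken from the original text.</p>

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\qquad
\sigma = \sqrt{\sigma^2}
```

<p>The sample form replaces the factor 1/n with 1/(n − 1).</p>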
<p><strong>You can do this by calling the <code>df.std()</code> method on your DataFrame, which calculates the standard deviation of each numeric column. You can then select the column you’re interested in from the result.</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd

# Create your Pandas DataFrame
d = {'username': ['Alice', 'Bob', 'Carl'], 'age': [18, 22, 43], 'income': [100000, 98000, 111000]}
df = pd.DataFrame(d)
print(df)</pre>
<p>Your DataFrame looks like this:</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td></td>
<td>username</td>
<td>age</td>
<td>income</td>
</tr>
<tr>
<td>0</td>
<td>Alice</td>
<td>18</td>
<td>100000</td>
</tr>
<tr>
<td>1</td>
<td>Bob</td>
<td>22</td>
<td>98000</td>
</tr>
<tr>
<td>2</td>
<td>Carl</td>
<td>43</td>
<td>111000</td>
</tr>
</tbody>
</table>
</figure>
<p>Here’s how you can calculate the standard deviation of all numeric columns:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># numeric_only=True skips the text column 'username'
print(df.std(numeric_only=True))</pre>
<p>The output is the sample standard deviation of each numeric column (pandas divides by n − 1 by default):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">age 13.428825
income 7000.000000
dtype: float64</pre>
<p>To get the standard deviation of an individual column, access it using simple indexing:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">print(df.std(numeric_only=True)['age'])
# approximately 13.4288 (the variance would be 180.33333333333334)</pre>
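<p>If you only need one column, you can also select it first and call <code>std()</code> on the resulting Series. The following is a minimal sketch that reuses the example data; the variable names <code>age_std</code> and <code>age_std_pop</code> are chosen here for illustration.</p>

```python
import pandas as pd

# Same example data as in the tutorial
d = {'username': ['Alice', 'Bob', 'Carl'],
     'age': [18, 22, 43],
     'income': [100000, 98000, 111000]}
df = pd.DataFrame(d)

# Select the column first, then compute its standard deviation.
age_std = df['age'].std()            # sample std: ddof=1 is the pandas default
age_std_pop = df['age'].std(ddof=0)  # population std: divides by n

print(round(age_std, 2))      # 13.43
print(round(age_std_pop, 2))  # 10.96
```

<p>Selecting the column first avoids computing the standard deviation of columns you don’t need and never touches the non-numeric <code>username</code> column.</p>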
<p>Together, the code looks as follows. Use the interactive shell to play with it!</p>
<p> <iframe src="https://repl.it/@finxter/pandasstddev?lite=true" scrolling="no" allowtransparency="true" allowfullscreen="true" sandbox="allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals" width="100%" height="700px" frameborder="no"></iframe> </p>
<h2>Standard Deviation in NumPy Library</h2>
<p>Python’s package for data science computation <a rel="noreferrer noopener" href="https://blog.finxter.com/numpy-tutorial/" target="_blank">NumPy</a> also has great statistics functionality. You can calculate all basic statistics functions such as <a rel="noreferrer noopener" href="https://blog.finxter.com/python-list-average/" target="_blank">average</a>, median, <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-calculate-variance-numpy-array/" target="_blank">variance</a>, and <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-calculate-column-standard-deviation-2d-numpy-array/" target="_blank">standard deviation</a> on NumPy arrays. Simply import the NumPy library and use the <code>np.std(a)</code> function to calculate the standard deviation of NumPy array <code>a</code>.</p>
<p>Here’s the code:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import numpy as np

a = np.array([1, 2, 3])
print(np.std(a))
# 0.816496580927726
</pre>
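<p>One thing to watch out for: NumPy and pandas use different defaults. <code>np.std</code> divides by n (<code>ddof=0</code>), while the pandas <code>std()</code> method divides by n − 1 (<code>ddof=1</code>), so the same data can give different results. A short sketch of the difference, with variable names chosen here for illustration:</p>

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 3])

pop_std = np.std(a)              # ddof=0 (NumPy default): divides by n
sample_std = np.std(a, ddof=1)   # ddof=1: divides by n - 1
pandas_std = pd.Series(a).std()  # pandas default is ddof=1

print(pop_std)     # 0.816496580927726
print(sample_std)  # 1.0
print(pandas_std)  # 1.0
```

<p>Pass <code>ddof</code> explicitly in both libraries if you need the two results to match.</p>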
<h2>Where to Go From Here?</h2>
<p>Before you can become a data science master, you first need to master Python. <a rel="noreferrer noopener" href="https://blog.finxter.com/subscribe/" target="_blank">Join my free Python email course </a>and receive your daily Python lesson directly in your INBOX. It’s fun!</p>
<p><a rel="noreferrer noopener" href="https://blog.finxter.com/subscribe/" target="_blank">Join The World’s #1 Python Email Academy [+FREE Cheat Sheets as PDF]</a></p>
</div>


https://www.sickgaming.net/blog/2020/04/...on-pandas/