Calculating Entropy with SciPy

Problem: How do you calculate the entropy with the SciPy library?

Solution: Import the entropy() function from the scipy.stats module and pass the list of probabilities and the base of the logarithm into it.
    from scipy.stats import entropy

    p = [0.5, 0.25, 0.125, 0.125]
    e = entropy(p, base=2)
    print(e)
    # 1.75
Try It Yourself: Run this code in the interactive code shell at https://repl.it/@finxter/ViciousStudiousOpengl

Exercise: Change the probabilities. How does the entropy change?
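For instance (a quick sketch that goes beyond the original snippet), comparing a uniform distribution with a heavily skewed one shows how the entropy reacts:

    from scipy.stats import entropy

    # Uniform distribution over four outcomes: maximum uncertainty
    print(entropy([0.25, 0.25, 0.25, 0.25], base=2))  # 2.0

    # Skewed distribution: one outcome dominates, so the entropy drops
    print(entropy([0.9, 0.05, 0.03, 0.02], base=2))   # roughly 0.62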
Let's start slowly! You're going to learn the most relevant background about entropy next.
Entropy Introduction

In thermodynamics, entropy is explained as a state of uncertainty or randomness.

In statistics, we borrow this concept, as it easily applies to calculating probabilities.

When we calculate statistical entropy, we are quantifying the amount of information in an event, variable, or distribution. Understanding this measurement is useful in machine learning (https://blog.finxter.com/cheat-sheet-6-pillar-machine-learning-algorithms/) in many cases, such as building decision trees (https://blog.finxter.com/decision-tree-learning-in-one-line-python/) or choosing the best classifier model (https://blog.finxter.com/random-forest-classifier-made-simple/).

We will discuss the applications of entropy later in this article, but first we will dig into the theory of entropy and how to calculate it with SciPy.
Calculating the Entropy

The method for calculating the information of a variable was developed by Claude Shannon, whose approach answers the question: how many "yes" or "no" questions would you expect to ask to get the correct answer?

Consider flipping a coin. Assuming the coin is fair, you have a 1 in 2 chance of predicting the outcome. You would guess either heads or tails, and whether you are correct or incorrect, you need just one question to determine the outcome.

Now, say we have a bag with four equally sized disks, each a different color: Blue, Red, Green, and Gray.
<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img loading="lazy" src="https://blog.finxter.com/wp-content/uploads/2020/09/image-42.png" alt="" class="wp-image-13154" width="581" height="113" srcset="https://blog.finxter.com/wp-content/uploads/2020/09/image-42.png 775w, https://blog.finxter.com/wp-content/uplo...300x58.png 300w, https://blog.finxter.com/wp-content/uplo...68x149.png 768w, https://blog.finxter.com/wp-content/uplo...150x29.png 150w" sizes="(max-width: 581px) 100vw, 581px" /></figure>
</div>
To guess which disk has been drawn from the bag, one of the better strategies is to eliminate half of the colors. For example, start by asking if it is Blue or Red. If the answer is yes, then only one more question is required, since the answer must be Blue or Red. If the answer is no, then you can assume it is Green or Gray, so only one more question is needed to correctly predict the outcome, bringing the total to two questions regardless of whether the answer is Green or Gray.

We can see that when an event is less likely to occur, choosing 1 in 4 compared to 1 in 2, there is more information to learn, i.e., two questions are needed instead of one.

Shannon wrote his calculation this way:
<pre class="wp-block-preformatted">Information(x) = -log(p(x))</pre>
In this formula, log() is a base-2 logarithm (because the answer to each question is either yes or no), and p(x) is the probability of x.

The higher the information value, the less predictable the outcome.

When an outcome is certain (e.g., a two-headed coin flip coming up heads), its probability is 1.0, which yields an information value of 0.

We can run Shannon's calculation in Python using the math library:
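The original article shows this step as a screenshot; here is a minimal sketch of the same calculation (the variable names are assumptions):

    import math

    p = 0.5                      # probability of calling a fair coin flip correctly
    information = -math.log2(p)  # Shannon information in bits
    print(information)
    # 1.0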
When we change the probability to 0.25, as in the case of choosing the correct color of the disk, we get this result:
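Again as a sketch in place of the original screenshot, the same calculation with p = 0.25 yields two bits of information:

    import math

    print(-math.log2(0.25))
    # 2.0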
While it appears that the increase in information is linear, what happens when we calculate the roll of a single die, or ask someone to guess a number between 1 and 10? Here is a visual of the information calculations for a list of probabilities from less certain (p = 0.1) to certain (p = 1.0):
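The original plot is an image and is not reproduced here; a minimal matplotlib sketch that generates an equivalent visual (the labels are assumptions) looks like this:

    import math
    import matplotlib.pyplot as plt

    probs = [i / 10 for i in range(1, 11)]   # 0.1, 0.2, ..., 1.0
    info = [-math.log2(p) for p in probs]    # Shannon information in bits

    plt.plot(probs, info, marker="o")
    plt.xlabel("Probability p")
    plt.ylabel("Information -log2(p) in bits")
    plt.title("Information vs. probability")
    plt.show()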
The graph shows that as an outcome becomes less likely, the information grows sub-linearly (logarithmically), not linearly: guessing 1 of 10 equally likely numbers takes about 3.32 bits of information, not 10.
Unequal Probabilities

Going back to the colored disks example, what if we now have 8 disks in the bag, and they are not equally distributed? Look at this breakout by color:
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td><strong>Color</strong></td>
<td><strong>Quantity</strong></td>
</tr>
<tr>
<td>Blue</td>
<td>1</td>
</tr>
<tr>
<td>Green</td>
<td>1</td>
</tr>
<tr>
<td>Red</td>
<td>2</td>
</tr>
<tr>
<td>Gray</td>
<td>4</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td><strong>8</strong></td>
</tr>
</tbody>
</table>
</figure>
If we use the original strategy of eliminating half of the colors by asking if the disk is Blue or Green, we become less efficient, since there is only a combined 0.25 probability of either color being correct in this scenario.

We know that Gray has the highest probability. Using a slightly different strategy, we first ask if Gray is correct (1st question), then move on to the next highest probability, Red (2nd question), and then check if it is Blue or Green (3rd question).

In this new scenario, weighting our guesses leads to less information being required. The tables below compare the two methods. The Info column is the product of the Prob and Q's columns.
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td><strong>Equal Guesses</strong></td>
</tr>
<tr>
<td><strong>Color</strong></td>
<td><strong>Prob</strong></td>
<td><strong>Q’s</strong></td>
<td><strong>Info</strong></td>
</tr>
<tr>
<td>Blue</td>
<td>0.25</td>
<td>2</td>
<td>0.50</td>
</tr>
<tr>
<td>Green</td>
<td>0.25</td>
<td>2</td>
<td>0.50</td>
</tr>
<tr>
<td>Red</td>
<td>0.25</td>
<td>2</td>
<td>0.50</td>
</tr>
<tr>
<td>Gray</td>
<td>0.25</td>
<td>2</td>
<td>0.50</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td>1</td>
<td>8</td>
<td>2.00</td>
</tr>
</tbody>
</table>
</figure>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td><strong>Weighted Guesses</strong></td>
</tr>
<tr>
<td><strong>Color</strong></td>
<td><strong>Prob</strong></td>
<td><strong>Q’s</strong></td>
<td><strong>Info</strong></td>
</tr>
<tr>
<td>Blue</td>
<td>0.125</td>
<td>3</td>
<td>0.375</td>
</tr>
<tr>
<td>Green</td>
<td>0.125</td>
<td>3</td>
<td>0.375</td>
</tr>
<tr>
<td>Red</td>
<td>0.25</td>
<td>2</td>
<td>0.50</td>
</tr>
<tr>
<td>Gray</td>
<td>0.5</td>
<td>1</td>
<td>0.50</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td>1</td>
<td>9</td>
<td>1.75</td>
</tr>
</tbody>
</table>
</figure>
The equal-guess method takes an average of 2 questions per draw, while the weighted-guess method takes an average of only 1.75.
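As a quick sanity check (a sketch, not part of the original article), the expected number of questions under the weighted strategy can be computed directly from the table:

    # Probabilities and question counts from the Weighted Guesses table
    probs = [0.125, 0.125, 0.25, 0.5]   # Blue, Green, Red, Gray
    questions = [3, 3, 2, 1]

    expected = sum(p * q for p, q in zip(probs, questions))
    print(expected)
    # 1.75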
We can use the SciPy library to perform the entropy calculation. SciPy's stats sub-module has an entropy() function that we can use. Here is the code to calculate the entropy for the scenario where the four disks have different probabilities:
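The original code is shown as a screenshot; a minimal equivalent sketch is:

    from scipy.stats import entropy

    # Probabilities for Blue, Green, Red, Gray
    p = [0.125, 0.125, 0.25, 0.5]
    e = entropy(p, base=2)
    print(e)
    # 1.75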
The entropy() function takes two arguments here: the list of probabilities and the base. base=2 is the choice, since we are using a binary logarithm for the calculation.

We get the same result as in the table shown above. With minimal code, the SciPy library allows us to quickly calculate Shannon's entropy.
Further Uses

Entropy calculation is successfully used in real-world applications in machine learning. Here are some examples.
Decision Trees

A Decision Tree is based on a set of binary decisions (True or False, Yes or No). It is constructed from a series of nodes where each node is a question: Does color == blue? Is the test score > 90? Each node splits into two, decomposing the data into smaller and smaller subsets as you move through the tree.

Accuracy with your Decision Tree is maximized by reducing your loss, and using entropy as your loss function is a good choice here. At each step moving through the branches, the entropy is calculated before and after the step. If the entropy decreases, the step is validated; otherwise, you must try another branch. A sketch of this before-and-after comparison follows below.
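To make that concrete, here is a small sketch (not from the original article) that scores a candidate binary split by the weighted entropy of its two child nodes; the class counts are made-up example data:

    from scipy.stats import entropy

    def split_entropy(parent_counts, left_counts, right_counts):
        """Return the label entropy before and after a binary split (counts per class)."""
        def h(counts):
            total = sum(counts)
            return entropy([c / total for c in counts], base=2)

        n_left, n_right = sum(left_counts), sum(right_counts)
        n = n_left + n_right
        before = h(parent_counts)
        after = (n_left / n) * h(left_counts) + (n_right / n) * h(right_counts)
        return before, after

    # Example: 20 samples (10 per class) split into two child nodes
    before, after = split_entropy([10, 10], [8, 2], [2, 8])
    print(before, after)  # keep the split if the entropy decreases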
Classification with Logistic Regression

The key to a logistic regression is minimizing the loss or error for the best model fit. Cross-entropy is the standard loss function for logistic regression and neural networks.
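As an illustration (a sketch, not from the original article), the binary cross-entropy, or log loss, averages the information of the predicted probabilities assigned to the true labels:

    import math

    def binary_cross_entropy(y_true, y_pred):
        """Average binary cross-entropy (log loss), using the natural logarithm."""
        eps = 1e-12  # avoid log(0)
        losses = [
            -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
            for y, p in zip(y_true, y_pred)
        ]
        return sum(losses) / len(losses)

    print(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.8]))  # about 0.18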
Code Sample

While there are several choices for using entropy as your loss function in machine learning, here is a snippet of code to show how the selection is made during model compilation:
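The original snippet is an image; a minimal Keras-style sketch of selecting a cross-entropy loss at compile time (the model architecture is a placeholder) could look like this:

    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(10,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

    # The loss function is chosen here: binary cross-entropy for a two-class problem
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])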
Conclusion

The purpose of this article was to shed some light on the use of entropy in machine learning and how it can be calculated with Python.

