[Tut] Matplotlib Boxplot – A Helpful Illustrated Guide


<div><p><em>Do you want to plot numerical data? And do it in a beautiful, engaging, and scientifically sound way? And do all of this in a few simple lines of code? You’re in the right place!</em></p>
<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-4-3 wp-has-aspect-ratio">
<div class="wp-block-embed__wrapper">
<div class="ast-oembed-container"><iframe title="boxplot video" width="1333" height="1000" src="https://www.youtube.com/embed/6sChkHAjBvg?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></div>
</div>
</figure>
<p>A great way to plot numerical data is the matplotlib<strong> boxplot</strong>. It displays the median, the interquartile range, and outliers of the data.</p>
<p><strong>How can you visualize your data with the boxplot?</strong></p>
<ol>
<li><strong>Get that data into an array-like object – list, <a href="https://blog.finxter.com/numpy-tutorial/" target="_blank" rel="noreferrer noopener">NumPy array</a>, pandas series, etc.</strong></li>
<li><strong>Pass it to <code>plt.boxplot()</code>.</strong></li>
<li><strong>Call <code>plt.show()</code>.</strong></li>
</ol>
<p><strong>As a result, matplotlib will draw a lovely boxplot for you.</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import matplotlib.pyplot as plt

# data is any array-like of numbers
plt.boxplot(data)
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img0.png" alt=""/></figure>
<p>The boxplot clearly shows the median of the data (orange line), the upper and lower quartiles (top and bottom parts of the box) and outliers (the circles at the top and/or bottom of the ‘whiskers’ of the plot).</p>
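<p>To make those terms concrete, here is a minimal sketch that computes the same quantities by hand with NumPy. The sample data is made up for illustration, and the 1.5 × IQR cutoff for outliers assumes matplotlib's default <code>whis=1.5</code> setting.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample: 200 roughly normal values plus two hand-placed outliers
rng = np.random.default_rng(seed=0)
data = np.concatenate([rng.normal(loc=50, scale=10, size=200), [5.0, 110.0]])

# The numbers a boxplot draws
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                  # height of the box
lower_fence = q1 - 1.5 * iqr   # points below this are drawn as outliers
upper_fence = q3 + 1.5 * iqr   # points above this are drawn as outliers
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"median={median:.1f}, Q1={q1:.1f}, Q3={q3:.1f}, outliers={len(outliers)}")

fig, ax = plt.subplots()
ax.boxplot(data)
# fig.savefig("boxplot_basic.png")  # or plt.show() with an interactive backend
```

<p>The whiskers end at the most extreme data points that still lie inside the two fences, so everything in <code>outliers</code> shows up as a circle.</p>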
<p>There are quite a few things we can do to improve this plot – we don’t even know what the data represents! – so let’s dive into a more detailed example.</p>
<p><strong>Try It Yourself</strong>:</p>
<p>You can play with a simple example here in our interactive Python shell online. The resulting plot will be stored in a .png file in the online project (just click on “files”):</p>
<p> <iframe height="400px" width="100%" src="https://repl.it/@finxter/matplotlibboxplotarticle?lite=true" scrolling="no" frameborder="no" allowtransparency="true" allowfullscreen="true" sandbox="allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals"></iframe> </p>
<h2>Matplotlib Boxplot Example</h2>
<p>The boxplot is an essential tool you should use when exploring datasets. The <a href="https://matplotlib.org/api/_as_gen/matplotlib.pyplot.boxplot">matplotlib boxplot</a> function accepts a lot of keyword arguments, so it can seem quite intimidating if you look at the docs. I’ll cover the ones you will use most often.</p>
<p>Boxplots show the distribution of numerical data. In particular, they show whether it is <a href="https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch12/5214889-eng.htm">skewed and whether there are unusual observations/outliers</a>. They are very helpful if you are dealing with a large amount of data and want to see a visual summary – in this way, they are similar to <a href="https://blog.finxter.com/matplotlib-histogram/">histograms</a>. They also give you the ability to compare multiple distributions at the same time because you can plot many boxplots on one Figure. This is not really possible with histograms – with more than 3, the plot starts to look crowded.</p>
<p>As this is an article about how to best work with boxplots, I will not go into detail about how I generated the datasets. However, if you want to follow along, I am using Seaborn’s tips dataset and you can find more info <a href="https://seaborn.pydata.org/generated/seaborn.boxplot.html">here</a>.</p>
<p>Let’s assume you are a waiter/waitress at a restaurant and you have recorded the total bill in USD for each table you waited on from Thursday to Sunday last week. You want to visualize this data to understand which days, if any, are the best to work. The total bill for all the days is stored in <code>total_bill</code> and the total bill for each day is stored in the variables <code>thu</code>, <code>fri</code>, <code>sat</code> and <code>sun</code> respectively.</p>
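<p>If you want to follow along without downloading the tips dataset, the sketch below builds stand-in variables with the same names. The values are randomly generated, so your plots will not match the figures in this article exactly. The helper <code>fake_bills</code>, the group sizes, and the distribution parameters are all assumptions made for illustration.</p>

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def fake_bills(n, scale):
    """Return n right-skewed bill amounts in USD (hypothetical stand-in data)."""
    return np.round(3 + rng.gamma(shape=2.0, scale=scale, size=n), 2)

# One array per day; sizes and scales are arbitrary choices
thu = fake_bills(62, 7.0)
fri = fake_bills(19, 7.0)
sat = fake_bills(87, 9.0)
sun = fake_bills(76, 9.0)

# All days combined
total_bill = np.concatenate([thu, fri, sat, sun])
print(len(total_bill), "bills in total")
```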
<p>Let’s plot total bill and add some info to the axes and a title. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.boxplot(total_bill)
plt.title('Total Bill ($) for All Days Last Week')
plt.ylabel('Total Bill ($)')
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img1.png" alt=""/></figure>
<p>This looks much better and it is now easy to understand what the boxplot is showing. We can see that the median bill for each table is about 17 USD and that the interquartile range (upper quartile – lower quartile) is about 24 – 14 = 10 USD. There are about 8 outliers where the bill was more than 40 USD and the lowest bill was about 3 USD.</p>
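<p>Reading values off the plot only gives estimates. If you need the exact numbers, one option is <code>matplotlib.cbook.boxplot_stats</code>, which returns the statistics a boxplot is drawn from. The sketch below uses randomly generated stand-in data rather than the real bills, so the printed values will differ from the ones quoted above.</p>

```python
import numpy as np
from matplotlib import cbook

# Stand-in data (an assumption); replace with your own total_bill values
rng = np.random.default_rng(seed=1)
total_bill = np.round(3 + rng.gamma(2.0, 8.0, size=244), 2)

# boxplot_stats returns one dictionary per dataset
stats = cbook.boxplot_stats(total_bill)[0]

print("median:  ", stats["med"])
print("Q1, Q3:  ", stats["q1"], stats["q3"])
print("IQR:     ", stats["iqr"])
print("whiskers:", stats["whislo"], stats["whishi"])
print("outliers:", len(stats["fliers"]))
```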
<h2>Matplotlib Boxplot Multiple</h2>
<p>Boxplots let you compare the distributions of different datasets. So, you will almost always want to plot more than one boxplot on a figure. To do this, pass the data you want to plot to <code>plt.boxplot()</code> as a list of lists.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Create list of lists
all_days = [thu, fri, sat, sun]

# Pass to plt.boxplot()
plt.boxplot(all_days)
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img2.png" alt=""/></figure>
<p>Here I combined all the individual datasets into a list of lists <code>all_days</code> and passed that to <code>plt.boxplot()</code>. Matplotlib automatically places the four boxplots a nice distance apart but does not label the x-axis for us. Let’s do that now.</p>
<h2>Matplotlib Boxplot Labels</h2>
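<p>To label each boxplot, pass a list of strings to the <code>labels</code> keyword argument. If you have several labels, I recommend you create the list first and then pass it to <code>plt.boxplot()</code>. Note that Matplotlib 3.9 renamed this argument to <code>tick_labels</code>, so on newer versions the old name may raise a deprecation warning.</p>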
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Create data and labels first
all_days = [thu, fri, sat, sun]
labels = ['Thu', 'Fri', 'Sat', 'Sun']

# Plot data and labels
plt.boxplot(all_days, labels=labels)
plt.ylabel('Total Bill ($)')
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img3.png" alt=""/></figure>
<p>Great, now we can see that each boxplot represents the total bill for each day of the week and which day is which.</p>
<p>Make sure your list of labels is the same length as the number of boxplots and that you pass them in the order you want them to appear. If you don’t want to label a particular boxplot, pass an empty string <code>''</code>. Finally, you can also pass ints and floats if you desire. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">all_days = [thu, fri, sat, sun]

# Second label is an empty string, fourth is a float
labels = ['Thu', '', 'Sat', 999.9]
plt.boxplot(all_days, labels=labels)
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img4.png" alt=""/></figure>
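<p>An alternative that does not depend on the name of the label argument is to set the tick labels on the axes yourself. This is a sketch under the assumption that the boxes sit at the default positions 1, 2, 3, 4. The data is randomly generated stand-in data.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data for the four days (an assumption)
rng = np.random.default_rng(seed=2)
all_days = [3 + rng.gamma(2.0, 8.0, size=n) for n in (60, 20, 85, 75)]
day_names = ["Thu", "Fri", "Sat", "Sun"]

fig, ax = plt.subplots()
ax.boxplot(all_days)

# Boxes are placed at x = 1, 2, 3, 4 by default, so label those positions
ax.set_xticks(range(1, len(all_days) + 1))
ax.set_xticklabels(day_names)
ax.set_ylabel("Total Bill ($)")
```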
<p>Your boxplots look much better now but the matplotlib default settings are quite boring. It’s important to make your visualizations engaging and one of the best ways to do this is to add some color.</p>
<h2>Matplotlib Boxplot Fill Color</h2>
<p>To just fill the color of the box, you first need to set <code>patch_artist=True</code>. Why is this?</p>
<p>Under the hood, <code>plt.boxplot()</code> returns a dictionary containing each part of the boxplot and these parts are <code>Line2D</code> objects. However, by definition, these do not have an <code>edgecolor</code> or <code>facecolor</code> – lines just have one color.</p>
<p>To color inside the box, you must turn it into a <a href="https://matplotlib.org/3.2.0/api/_as_gen/matplotlib.patches.Patch.html"><code>Patch</code> object</a> which, by definition, has a <code>facecolor</code>.</p>
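<p>You can check this yourself by inspecting the dictionary that the boxplot call returns. The following sketch draws the same made-up data twice and prints the type of the box artist in each case.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=3)
values = rng.normal(size=100)  # made-up sample data

fig, (ax1, ax2) = plt.subplots(1, 2)
plain = ax1.boxplot(values)                      # default: box drawn as a line
filled = ax2.boxplot(values, patch_artist=True)  # box drawn as a patch

print(sorted(plain.keys()))                  # the parts of the boxplot
print(type(plain["boxes"][0]).__name__)      # type of the default box
print(type(filled["boxes"][0]).__name__)     # type of the box with patch_artist=True
```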
<p>To modify the box, use the <code>boxprops</code> (box properties) keyword argument. It accepts a dictionary and the key-value pair you need is <code>'facecolor'</code> plus a color. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Turn box into a Patch so that it has a facecolor property
plt.boxplot(total_bill, patch_artist=True,
            # Set facecolor to red
            boxprops=dict(facecolor='r'))
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img5.png" alt=""/></figure>
<p>Note that if you don’t set <code>patch_artist=True</code>, you will get an error.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Not setting patch_artist=True gives an error
plt.boxplot(total_bill,
            # Set facecolor to red
            boxprops=dict(facecolor='r'))
plt.show()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
&lt;ipython-input-97-d28bb5a14c71> in &lt;module>
      2 plt.boxplot(total_bill,
      3             # Set facecolor to red
----> 4             boxprops=dict(facecolor='r'))
      5 plt.show()

AttributeError: 'Line2D' object has no property 'facecolor'</pre>
<p>If you also want to change the color of the line surrounding the box, pass the additional argument <code>color=c</code> for some color <code>c</code> to <code>boxprops</code>. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Turn box into a Patch so that it has a facecolor property
plt.boxplot(total_bill, patch_artist=True,
            # Set facecolor and surrounding line to red
            boxprops=dict(facecolor='r', color='r'))
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img6.png" alt=""/></figure>
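<p>The <code>boxprops</code> argument applies one style to every box. If you plot several boxplots and want a different fill for each, one possible approach is to loop over the box patches in the returned dictionary and call <code>set_facecolor()</code> on each. The sketch below uses stand-in data and an arbitrary choice of colors.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data for the four days (an assumption)
rng = np.random.default_rng(seed=4)
all_days = [3 + rng.gamma(2.0, 8.0, size=n) for n in (60, 20, 85, 75)]

fig, ax = plt.subplots()
# patch_artist=True is required so that each box has a facecolor
bp = ax.boxplot(all_days, patch_artist=True)

# One color per box, in plotting order (arbitrary choice)
colors = ["tab:blue", "tab:orange", "tab:green", "tab:red"]
for box, color in zip(bp["boxes"], colors):
    box.set_facecolor(color)
```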
<p>Perfect, now you know how to change the box’s color, let’s look at changing the other parts.</p>
<h2>Matplotlib Boxplot Color</h2>
<p>You can change any part of a boxplot to any color you want.</p>
<p>There are 6 parts you can color:</p>
<ol>
<li>box – the main body of the boxplot</li>
<li>median – horizontal line illustrating the median of the distribution</li>
<li>whiskers – vertical lines extending to the most extreme (non-outlier) data points</li>
<li>caps – horizontal lines at the ends of the whiskers</li>
<li>fliers – points above/below the caps representing outliers</li>
<li>mean – horizontal line illustrating the mean of the distributions (by default not included)</li>
</ol>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img7.png" alt="" width="371" height="251"/></figure>
</div>
<p>In the above image, I’ve labelled the first 5 parts but have not included the mean as it is not often used with boxplots.</p>
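<p>If you do want to show the mean, here is a minimal sketch using the <code>showmeans</code>, <code>meanline</code> and <code>meanprops</code> keyword arguments. The data is made up, and the green dashed style is just one possible choice.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=5)
values = 3 + rng.gamma(2.0, 8.0, size=200)  # made-up, right-skewed sample

fig, ax = plt.subplots()
bp = ax.boxplot(
    values,
    showmeans=True,   # draw the mean (off by default)
    meanline=True,    # draw it as a line across the box instead of a marker
    meanprops=dict(color="green", linestyle="--"),
)

# The mean line is stored under the 'means' key of the returned dictionary
mean_line = bp["means"][0]
print("mean drawn at y =", mean_line.get_ydata()[0])
```

<p>For skewed data like this, the mean line sits away from the median line, which is a quick visual hint of the skew.</p>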
<p>Each of the parts can be modified by a <code>&lt;part&gt;props</code> keyword argument, similar to the <code>boxprops</code> one above.</p>
<p>The available keyword arguments are:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">boxprops, medianprops, whiskerprops, capprops, flierprops, meanprops
</pre>
<p>For example, write this to set the color of the median line to red:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">medianprops=dict(color='red')
</pre>
<p>They all accept the <code>color</code> keyword argument and the value can be any matplotlib color string. The only different one is <code>flierprops</code> which also accepts <code>markeredgecolor</code> to color the line around the outliers.</p>
<p>Finally, remember to set <code>patch_artist=True</code> if you want to change the fill color of the box.</p>
<p>Let’s look at an example where I turn the entire boxplot red. Since there are so many keyword arguments to pass, I will first create a dictionary and use the <a href="https://blog.finxter.com/what-is-asterisk-in-python/"><code>**</code> operator</a> to unpack it in my <code>plt.boxplot()</code> call. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Set color to red
c = 'r'

# Create dictionary of keyword arguments to pass to plt.boxplot
red_dict = {'patch_artist': True,
            'boxprops': dict(color=c, facecolor=c),
            'capprops': dict(color=c),
            'flierprops': dict(color=c, markeredgecolor=c),
            'medianprops': dict(color=c),
            'whiskerprops': dict(color=c)}

# Pass dictionary to boxplot using ** operator to unpack it
plt.boxplot(total_bill, **red_dict)
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img8.png" alt=""/></figure>
<p>First I created a variable <code>c</code> to hold the color string. This means that if I want to change the color to green, I only have to change one line of code – <code>c = 'g'</code> – and the color changes everywhere.</p>
<p>Then I created <code>red_dict</code>, where each key is a string and each value is either a boolean or a dictionary. The first key is <code>'patch_artist'</code>, set to <code>True</code>, and the other keys are the <code>&lt;part&gt;props</code> keyword arguments. Finally, I created a boxplot of <code>total_bill</code> and colored it red by unpacking <code>red_dict</code> with the <code>**</code> operator.</p>
<p>If you want to brush up on your dictionary knowledge, check out my article <a href="https://blog.finxter.com/python-dictionary/">the ultimate guide to dictionaries</a>.<img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img9.png" width="400"> </p>
<p>The red plot is much more engaging than the standard matplotlib colors. But because the median line is the same color as everything else, you lose the information it was showing. One way to rectify this is to set the median line to black with <code>'medianprops': dict(color='k')</code> in <code>red_dict</code>. The result is shown above.</p>
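<p>If you color boxplots often, you could wrap the dictionary in a small helper. The function <code>color_kwargs</code> below is a hypothetical name of my own, not part of matplotlib. It is a sketch that builds the same keyword arguments for any color and keeps the median line in a contrasting color. The data is stand-in data.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

def color_kwargs(color, median_color="k"):
    """Build boxplot keyword arguments that paint every part in `color`.

    The median line gets `median_color` so it stays visible inside the box.
    """
    return {
        "patch_artist": True,
        "boxprops": dict(color=color, facecolor=color),
        "capprops": dict(color=color),
        "flierprops": dict(color=color, markeredgecolor=color),
        "medianprops": dict(color=median_color),
        "whiskerprops": dict(color=color),
    }

# Stand-in data (an assumption)
rng = np.random.default_rng(seed=6)
total_bill = 3 + rng.gamma(2.0, 8.0, size=244)

fig, ax = plt.subplots()
bp = ax.boxplot(total_bill, **color_kwargs("g"))  # green plot, black median
```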
<h2>Matplotlib Boxplot Width</h2>
<p>To change the width of a boxplot, pass a float to the <code>widths</code> keyword argument in <code>plt.boxplot()</code>. The value is measured in x-axis units. Because matplotlib places boxes one unit apart by default, it works like the fraction of its allotted space that each box takes up.</p>
<p>If you have one boxplot, the scalar represents the fraction of the plot’s width that the box takes up. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.boxplot(total_bill, widths=1)
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img10.png" alt=""/></figure>
<p>Here the box takes up 100% of the width as <code>widths=1</code>. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.boxplot(total_bill, widths=0.1)
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img11.png" alt=""/></figure>
<p>Here the box only takes up 10% of the space as <code>widths=0.1</code>.</p>
<p>If you plot multiple boxplots on the same figure and pass a float to <code>widths</code>, all boxes will be resized to take up that fraction of space in their area of the plot. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Boxes take up 100% of their allocated space
plt.boxplot(all_days, widths=1)
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img12.png" alt=""/></figure>
<p>Here each boxplot takes up 100% of the space allocated as <code>widths=1</code>. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Boxes take up 80% of their allocated space
plt.boxplot(all_days, widths=0.8)
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img13.png" alt=""/></figure>
<p>Here each boxplot takes up 80% of the space allocated to them as <code>widths=0.8</code>.</p>
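<p>To see where the boxes actually end up, you can read the x-coordinates of each box from the returned dictionary. This sketch assumes the default positions 1, 2, 3, 4 and uses stand-in data. With <code>widths=0.8</code>, each box should extend 0.4 units to either side of its position.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data for the four days (an assumption)
rng = np.random.default_rng(seed=8)
all_days = [3 + rng.gamma(2.0, 8.0, size=n) for n in (60, 20, 85, 75)]

fig, ax = plt.subplots()
bp = ax.boxplot(all_days, widths=0.8)

# Collect the left and right edge of every box
edges = []
for position, box in enumerate(bp["boxes"], start=1):
    xs = box.get_xdata()
    edges.append((float(min(xs)), float(max(xs))))
    print(f"box {position}: x from {min(xs):.1f} to {max(xs):.1f}")
```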
<p>You can set the width of each boxplot individually by passing a <a href="https://blog.finxter.com/python-list-methods/" target="_blank" rel="noreferrer noopener">list</a> to <code>widths</code> instead of a scalar.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.boxplot(all_days, widths=[0.1, 0.9, 0.5, 0.8], labels=['10%', '90%', '50%', '80%'])
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img14.png" alt=""/></figure>
<p>Here I have labelled the amount of horizontal space each box takes up. Although it is possible to do this, I do not recommend it. It adds another dimension to your boxplot but isn’t showing any new information. I personally think that <code>widths=0.8</code> looks best, but you are free to choose any size you want. Just make sure that your boxplots are the same width so as not to confuse your reader.</p>
<h2>Matplotlib Boxplot Horizontal</h2>
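<p>To create a horizontal boxplot in matplotlib, set the <code>vert</code> keyword argument to <code>False</code>. Matplotlib 3.10 and later also accept <code>orientation='horizontal'</code>, which is the preferred spelling on those versions. </p>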
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.boxplot(total_bill, vert=False)
plt.show()
</pre>
<figure class="wp-block-image"><img src="https://raw.githubusercontent.com/theadammurphy/matplotlib_articles/master/boxplot/final_html/img/img15.png" alt=""/></figure>
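<p>The same works for several boxplots at once. The sketch below tries the newer <code>orientation</code> argument first and falls back to <code>vert=False</code> on older matplotlib versions that do not know it. The data is stand-in data, and the labels are set on the y-axis because that is where the boxes now sit.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data for the four days (an assumption)
rng = np.random.default_rng(seed=7)
all_days = [3 + rng.gamma(2.0, 8.0, size=n) for n in (60, 20, 85, 75)]
day_names = ["Thu", "Fri", "Sat", "Sun"]

fig, ax = plt.subplots()
try:
    # Newer matplotlib versions
    bp = ax.boxplot(all_days, orientation="horizontal")
except TypeError:
    # Older versions reject the unknown keyword, so use vert instead
    bp = ax.boxplot(all_days, vert=False)

# In a horizontal plot the boxes sit at y = 1, 2, 3, 4
ax.set_yticks(range(1, len(all_days) + 1))
ax.set_yticklabels(day_names)
ax.set_xlabel("Total Bill ($)")
```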
<h2>Conclusion</h2>
<p>That’s it, you now know all the basics of boxplots in matplotlib!</p>
<p>You’ve learned how to plot single and multiple boxplots on one figure. You can label them whatever you want and change the color of any of the 6 parts to anything you can imagine. Finally, you’ve learned to customize the width of your plots and plot horizontal ones as well.</p>
<p>There is still more to be learned about boxplots such as changing the outlier marker, adding legends, sorting them by groups and even working with them and the pandas library. But I’ll leave that for another article.</p>
</div>


https://www.sickgaming.net/blog/2020/03/...ted-guide/