[Tut] Matplotlib Scatter Plot – Simple Illustrated Guide

Matplotlib Scatter Plot – Simple Illustrated Guide

<div><p>Scatter plots are a key tool in any Data Analyst’s arsenal. If you want to see the relationship between two variables, you are usually going to make a scatter plot.&nbsp;</p>
<figure class="wp-block-embed is-type-rich is-provider-embed-handler wp-block-embed-embed-handler wp-embed-aspect-4-3 wp-has-aspect-ratio">
<div class="wp-block-embed__wrapper">
<div class="ast-oembed-container"><iframe title="Scatter Plot in Python Matplotlib" width="1333" height="1000" src="https://www.youtube.com/embed/R1PjbmucE8U?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></div>
</div>
</figure>
<p>In this article, you’ll learn the basic and intermediate concepts to create stunning matplotlib scatter plots.</p>
<h2>Matplotlib Scatter Plot Example</h2>
<p>Let’s imagine you work in a restaurant. You get paid a small wage and so make most of your money through tips. You want to make as much money as possible and so want to maximize the amount of tips. In the last month, you waited 244 tables and collected data about them all.</p>
<p>We’re going to explore this data using scatter plots. We want to see if there are any relationships between the variables. If there are, we can use them to earn more in future. </p>
<ul>
<li><strong>Note</strong>: this dataset comes built-in as part of the <code>seaborn</code> library. </li>
</ul>
<p>First, let’s import the modules we’ll be using and load the dataset.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import matplotlib.pyplot as plt
import seaborn as sns

# Optional step
# Seaborn's default settings look much nicer than matplotlib
sns.set()

tips_df = sns.load_dataset('tips')

total_bill = tips_df.total_bill.to_numpy()
tip = tips_df.tip.to_numpy()</pre>
<p>The variable <em><code>tips_df</code></em> is a pandas <a href="https://blog.finxter.com/how-to-create-a-dataframe-in-pandas/" target="_blank" rel="noreferrer noopener" title="How to Create a DataFrame in Pandas?">DataFrame</a>. Don’t worry if you don’t understand what this is just yet. The variables <code>total_bill</code> and <code>tip</code> are both <a href="https://blog.finxter.com/numpy-tutorial/" target="_blank" rel="noreferrer noopener" title="NumPy Tutorial – Everything You Need to Know to Get Started">NumPy arrays</a>. </p>
<p>Let’s make a scatter plot of <code>total_bill</code> against tip. It’s very easy to do in matplotlib – use the <code>plt.scatter()</code> function. First, we pass the x-axis variable, then the y-axis one. We call the former the <em><strong>independent variable</strong></em> and the latter the <em><strong>dependent variable</strong></em>. A scatter graph shows what happens to the dependent variable (<em>y</em>) when we change the independent variable (<em>x</em>). </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.scatter(total_bill, tip)
plt.show()</pre>
<figure class="wp-block-image"><img src="https://lh6.googleusercontent.com/8fRRBmsXSlLAMdwgcTjMnmkj3kB8NXpXfPtXcn_0lmWCvtiWXCHDgLQCpJYLIOWTt5d2NIzTs35WBWZjTOx5AWA9DST4EfXv66wuSpooCc1MqS6RblZrswMZNfY02glNgmvo1ul5" alt=""/></figure>
<p>Nice! It looks like there is a <em><strong>positive correlation</strong></em> between <em><code>total_bill</code></em> and <em><code>tip</code></em>. This means that as the bill increases, so does the tip. So we should try and get our customers to spend as much as possible. </p>
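<p>To put a number on the trend, one option is the Pearson correlation coefficient. The sketch below uses <code>np.corrcoef()</code> on a small made-up sample. The arrays <code>bills</code> and <code>tips</code> are hypothetical stand-ins for <code>total_bill</code> and <code>tip</code>, not values from the real dataset.</p>

```python
import numpy as np

# Hypothetical stand-ins for total_bill and tip (not the real data)
bills = np.array([10.0, 15.5, 20.0, 24.5, 31.0, 48.0])
tips = np.array([1.5, 2.0, 3.0, 3.5, 4.0, 6.5])

# np.corrcoef returns a 2x2 correlation matrix;
# the off-diagonal entry is the correlation between the two arrays
r = np.corrcoef(bills, tips)[0, 1]

# A value close to +1 indicates a strong positive linear relationship
print(f"Pearson r = {r:.2f}")
```

<p>With the real arrays, the same call would be <code>np.corrcoef(total_bill, tip)[0, 1]</code>.</p>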
<h2>Matplotlib Scatter Plot with Labels</h2>
<p>Labels are the text on the axes. They tell us more about the plot and it is essential that you include them on every plot you make.</p>
<p>Let’s add some axis labels and a title to make our scatter plot easier to understand.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.scatter(total_bill, tip)
plt.title('Total Bill vs Tip')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip ($)')
plt.show()</pre>
<figure class="wp-block-image"><img src="https://lh3.googleusercontent.com/zj93CTeaL7BP6tCS2C5egmgsWZ6EX-67vFgFw2NC8UMYUzUVIpTQe0sWgUfoON-nBrNJP7gfcvhWz2XrWkF8Vf2MaCiGyG3oRlGgZG6cLIOL2jysD-u2jbPo0yCFqLSkVXoE6TIv" alt=""/></figure>
<p>Much better. To save space, we won’t include the label or title code from now on, but make sure you do.</p>
<p>This looks nice but the markers are quite large. It’s hard to see the relationship in the $10-$30 total bill range. </p>
<p>We can fix this by changing the marker size.</p>
<h2>Matplotlib Scatter Marker Size</h2>
<p>The <em><code>s</code></em> keyword argument controls the <em><strong>size</strong></em> of markers in <code>plt.scatter()</code>. It accepts a scalar or an array. </p>
<h3>Matplotlib Scatter Marker Size – Scalar</h3>
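<p>In <em><code>plt.scatter()</code></em>, the default marker size is <code>s=36</code>, that is <code>rcParams['lines.markersize'] ** 2</code> with the default <code>lines.markersize</code> of 6.</p>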
<p>The<a href="https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.scatter.html"> docs</a> define <em><code>s</code></em> as:</p>
<p><em><strong>    The marker size in points**2.</strong></em></p>
<p>This means that if we want a marker to be about 5 points wide, we must write <em><code>s=5**2</code></em>, which is an area of 25 square points. </p>
<p>Other matplotlib functions do not define marker size in this way. For example, <code>plt.plot()</code> takes the marker width in points, so for markers 5 points wide you write <code>markersize=5</code>. We’re not sure why <code>plt.scatter()</code> defines this differently. </p>
<p>One way to remember this syntax is that graphs are made up of square regions. Markers color certain areas of those regions. To get the area of a square region, we do <em><code>length**2</code></em>.  For more info, check out<a href="https://stackoverflow.com/questions/14827650/pyplot-scatter-plot-marker-size"> this</a> Stack Overflow answer.</p>
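<p>The points**2 convention can be checked with a short sketch. It draws markers of the same nominal width once with <code>plt.scatter()</code> and once with <code>plt.plot()</code>. The data and the value of <code>width</code> are arbitrary choices for illustration.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

width = 10  # desired marker width in points (an arbitrary example value)

# plt.scatter() takes the marker AREA in points**2 ...
scatter = plt.scatter([1, 2, 3], [1, 2, 3], s=width**2)

# ... while plt.plot() takes the marker WIDTH in points
(line,) = plt.plot([1, 2, 3], [2, 3, 4], 'o', markersize=width)

# Inspect what each artist stored
print(scatter.get_sizes())     # sizes kept by scatter, in points**2
print(line.get_markersize())   # size kept by plot, in points
```

<p>Both sets of circles should come out at roughly the same size on screen, even though the numbers passed in differ by a square.</p>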
<p>To set the best marker size for a scatter plot, draw it a few times with different <em><code>s</code></em> values. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Small s
plt.scatter(total_bill, tip, s=1)
plt.show()</pre>
<figure class="wp-block-image"><img src="https://lh6.googleusercontent.com/a3ZRIgZrLXfFEFkJ6LhXCdfPv-otQIjB9fD53vd7wYHcUkYN5roQ5g2v2v9GM4gbvRqOTaTNP6aK5iNzUqSNQ32oGlB__Gra4XiYPw_uANODO64aXMyAvsA-T31UktF-H0_pU4m1" alt=""/></figure>
<p>A small number makes each marker small. Setting <em><code>s=1</code></em> is too small for this plot and makes it hard to read. For some plots with a lot of data, setting <code>s</code> to a very small number makes it much easier to read. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Big s
plt.scatter(total_bill, tip, s=100)
plt.show()</pre>
<figure class="wp-block-image"><img src="https://lh5.googleusercontent.com/PYXx-2K4Eb7nsgqDAyE9otkXw5eDV0MDJ1LRmgWTzAUChQlCQsGNQ7vs4VSEk4zLWtJLrhZjpnNtkHXIBBJWvHOMj82JGxSUW0tnS4t8oKZ8sQM_iSuF8zioqgvkGPCuAM2xtjcq" alt=""/></figure>
<p>Alternatively, a large number makes the markers bigger. This is too big for our plot and obscures a lot of the data.</p>
<p>We think that <code>s=20</code> strikes a nice balance for this particular plot.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Just right
plt.scatter(total_bill, tip, s=20)
plt.show()</pre>
<figure class="wp-block-image"><img src="https://lh6.googleusercontent.com/hBNdm1PprOLpt9ESIkJwJOuz-9n2i0CmpI95ayP7n_MuJO4Q1Dejp87OPL5suPNir7wm5gpQmPuAkgYyGhxmOpLnoPCFfenjiMarDppauPBrlRoJ2IfeifIMVWv7BzbB2B4nU9cG" alt=""/></figure>
<p>There is still some overlap between points but it is easier to spot. And unlike for <code>s=1</code>, you don’t have to strain to see the different markers. </p>
<h3>Matplotlib Scatter Marker Size – Array</h3>
<p>If we pass an array to <em><code>s</code></em>, we set the size of each point individually. This is incredibly useful as it lets us show more data on our scatter plot. We can use it to modify the size of our markers based on another variable. </p>
<p>You also recorded the size of each table you waited. This is stored in the NumPy array <code>size_of_table</code>. It contains integers in the range 1-6, representing the number of people you served.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Select column 'size' and turn into a numpy array
size_of_table = tips_df['size'].to_numpy()

# Increase marker size to make plot easier to read
size_of_table_scaled = [3*s**2 for s in size_of_table]

plt.scatter(total_bill, tip, s=size_of_table_scaled)
plt.show()</pre>
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh4.googleusercontent.com/z711Y09U401alARBkbnb0bC0QwTtjH3maqhGOoZL56g1lvzrW_DCLdb3gTO3jCOUw_dLMPdHUUP43-yIdYd3ao5y1djplJ6V_RwhWOc1NmaYiC5E8AFVExdACt1gIp42EKf5YO2R" alt="" width="593" height="431"/></figure>
<p>Not only does the tip increase when total bill increases, but serving more people leads to a bigger tip as well. This is in line with what we’d expect and it’s great our data fits our assumptions.</p>
<p>Why did we scale the <code>size_of_table</code> values before passing them to <em><code>s</code></em>? Because the change in size isn’t visible if we set <code>s=1</code>, …, <code>s=6</code> as shown below.</p>
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh4.googleusercontent.com/gm1H8z66gyycxN72ygrGy38KGSQkXLnKxhY8Umq8d0DiYljtdPRU-T7bSwu_K9_Bn6i1ydJQplEtptdAWFNY4bTPmgwdcveKhHNwswEdNgG1vMPSo2vNAkbpVEDNUb-VCBOUikXr" alt="" width="586" height="426"/></figure>
<p>So we first square each value and multiply it by 3 to make the size difference more pronounced.&nbsp;</p>
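<p>Since <code>size_of_table</code> is a NumPy array, the same scaling can also be written without a list comprehension. The sketch below uses a short hypothetical array in place of <code>tips_df['size'].to_numpy()</code>.</p>

```python
import numpy as np

# Hypothetical party sizes standing in for tips_df['size'].to_numpy()
size_of_table = np.array([2, 3, 1, 4, 2, 6, 5, 2])

# Element-wise arithmetic on the array replaces the list comprehension
size_of_table_scaled = 3 * size_of_table**2

# The distinct marker areas that will appear on the plot
print(np.unique(size_of_table_scaled))
```

<p>The result is a NumPy array rather than a list, which <code>plt.scatter()</code> accepts for <code>s</code> just the same.</p>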
<p>We should label everything on our graphs, so let’s add a legend.</p>
<h2>Matplotlib Scatter Legend</h2>
<p>To add a legend we use the <em><code>plt.legend()</code></em> function. This is easy to use with line plots. If we draw multiple lines on one graph, we label them individually using the <em><code>label</code></em> keyword. Then, when we call <em><code>plt.legend()</code></em>, matplotlib draws a legend with an entry for each line. </p>
<p>But we have a problem. We’ve only got one set of data here. We cannot label the points individually using the <em><code>label</code></em> keyword.</p>
<p><strong>How do we solve this problem?</strong></p>
<p>We could create 6 different datasets, plot them on top of each other and give each a different size and label. But this is time-consuming and not scalable.</p>
<p>Fortunately, matplotlib has a scatter plot method we can use. It’s called the <em><code>legend_elements()</code></em> method because we want to label the different elements in our scatter plot. </p>
<p>The elements in this scatter plot are different sizes. We have 6 different sized points to represent the 6 different sized tables. So we want <code>legend_elements()</code> to split our plot into 6 sections that we can label on our legend.</p>
<p>Let’s figure out how <code>legend_elements()</code> works. First, what happens when we call it without any arguments?</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># legend_elements() is a method so we must name our scatter plot
scatter = plt.scatter(total_bill, tip, s=size_of_table_scaled)

legend = scatter.legend_elements()

print(legend)
# ([], [])</pre>
<p>Calling <code>legend_elements()</code> without any parameters returns a tuple of length 2. It contains two empty lists.</p>
<p>The<a href="https://matplotlib.org/3.1.1/api/collections_api.html#matplotlib.collections.PathCollection.legend_elements"> docs</a> tell us <code>legend_elements()</code> returns the tuple <code>(handles, labels)</code>. Handles are the parts of the plot you want to label. Labels are the names that will appear in the legend. For our plot, the handles are the different sized markers and the labels are the numbers 1-6.</p>
<p>The <a href="https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html" target="_blank" rel="noreferrer noopener"><code>plt.legend()</code></a> function can be called with exactly these two arguments: <code>plt.legend(handles, labels)</code>. As <code>scatter.legend_elements()</code> returns a tuple of length 2, we have two options. We can either use the<a href="https://blog.finxter.com/what-is-asterisk-in-python/" target="_blank" rel="noreferrer noopener"> asterisk <code>*</code> operator</a> to unpack it or we can unpack it ourselves.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Method 1 - unpack tuple using *
legend = scatter.legend_elements()
plt.legend(*legend)

# Method 2 - unpack tuple into 2 variables
handles, labels = scatter.legend_elements()
plt.legend(handles, labels)</pre>
<p>Both produce the same result. The matplotlib docs use method 1. Yet method 2 gives us more flexibility. If we don’t like the labels matplotlib creates, we can overwrite them ourselves (as we will see in a moment).&nbsp;</p>
<p>Currently, <code>handles</code> and <code>labels</code> are empty lists. Let’s change this by passing some arguments to <code>legend_elements()</code><em>.</em></p>
<p>There are<a href="https://matplotlib.org/3.1.1/api/collections_api.html#matplotlib.collections.PathCollection.legend_elements"> 4 optional arguments</a> but let’s focus on the most important one: <code>prop</code>.</p>
<p><code>prop</code> – the <em>property</em> of the scatter graph you want to highlight in your legend. Default is <code>'colors'</code>, the other option is <code>'sizes'</code>.</p>
<p>We will look at different colored scatter plots in the next section. As our plot contains 6 different sized markers, we set <code>prop='sizes'</code>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">scatter = plt.scatter(total_bill, tip, s=size_of_table_scaled)

handles, labels = scatter.legend_elements(prop='sizes')</pre>
<p>Now let’s look at the contents of <code>handles</code> and <code>labels</code><em>.</em></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> type(handles)
list
>>> len(handles)
6
>>> handles
[&lt;matplotlib.lines.Line2D object at 0x1a2336c650>,
&lt;matplotlib.lines.Line2D object at 0x1a2336bd90>,
&lt;matplotlib.lines.Line2D object at 0x1a2336cbd0>,
&lt;matplotlib.lines.Line2D object at 0x1a2336cc90>,
&lt;matplotlib.lines.Line2D object at 0x1a2336ce50>,
&lt;matplotlib.lines.Line2D object at 0x1a230e1150>]</pre>
<p>Handles is a list of length 6. Each element in the list is a <code>matplotlib.lines.Line2D</code> object. You don’t need to understand exactly what that is. Just know that if you pass these objects to <code>plt.legend()</code>, matplotlib renders an appropriate <code>'picture'</code>. For colored lines, it’s a short line of that color. In this case, it’s a single point and each of the 6 points will be a different size. </p>
<p>It is possible to<a href="https://matplotlib.org/tutorials/intermediate/legend_guide.html" target="_blank" rel="noreferrer noopener"> create custom handles</a> but this is out of the scope of this article. Now let’s look at <code>labels</code>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> type(labels)
list
>>> len(labels)
6
>>> labels
['$\\mathdefault{3}$', '$\\mathdefault{12}$', '$\\mathdefault{27}$', '$\\mathdefault{48}$', '$\\mathdefault{75}$', '$\\mathdefault{108}$']</pre>
<p>Again, we have a list of length 6. Each element is a string. Each string is written using LaTeX notation <code>'$...$'</code>. So the labels are the numbers 3, 12, 27, 48, 75 and 108. </p>
<p>Why these numbers? Because they are the unique values in the list <code>size_of_table_scaled</code><em>.</em> This list defines the marker size. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import numpy as np
>>> np.unique(size_of_table_scaled)
array([  3,  12,  27,  48,  75, 108])</pre>
<p>We used these numbers because using 1-6 is not enough of a size difference for humans to notice.&nbsp;</p>
<p>However, for our legend, we want to use the numbers 1-6 as this is the actual table size. So let’s overwrite <code>labels</code>. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">labels = ['1', '2', '3', '4', '5', '6']</pre>
<p>Note that each element must be a string.</p>
<p>We now have everything we need to create a legend. Let’s put this together.&nbsp;</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Increase marker size to make plot easier to read
size_of_table_scaled = [3*s**2 for s in size_of_table]

# Scatter plot with marker sizes proportional to table size
scatter = plt.scatter(total_bill, tip, s=size_of_table_scaled)

# Generate handles and labels using legend_elements method
handles, labels = scatter.legend_elements(prop='sizes')

# Overwrite labels with the numbers 1-6 as strings
labels = ['1', '2', '3', '4', '5', '6']

# Add a title to legend with title keyword
plt.legend(handles, labels, title='Table Size')
plt.show()</pre>
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh5.googleusercontent.com/XRcuRUqU_TakWX5m6HdsKJF4S5jK4nuZAj6SuOk4FWUsOIVCZjzG5dVe7gX0_lCLWP1M_bqyEbMbc61SMVmQWlHR_TwyvFYzBlzbnWttOtehhIVLfyPezXMQrPxGrLejeq-SduaJ" alt="" width="599" height="436"/></figure>
<p>Perfect, we have a legend that shows the reader exactly what the graph represents. It is easy to understand and adds a lot of value to the plot.</p>
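<p>As an alternative to overwriting <code>labels</code> by hand, <code>legend_elements()</code> also takes a <code>func</code> argument, one of the optional arguments mentioned earlier. Passing the inverse of the scaling lets matplotlib compute the labels itself. The following is a minimal sketch, assuming synthetic data in place of the tips dataset; the arrays below are made up for illustration.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the tips data (assumed values, not the real dataset)
rng = np.random.default_rng(0)
size_of_table = np.tile(np.arange(1, 7), 10)  # party sizes 1-6
total_bill = 8 * size_of_table + rng.uniform(0, 10, size_of_table.size)
tip = 0.15 * total_bill

size_of_table_scaled = 3 * size_of_table**2
scatter = plt.scatter(total_bill, tip, s=size_of_table_scaled)

# func maps each marker area back to the value it represents:
# s = 3 * n**2  =>  n = sqrt(s / 3)
handles, labels = scatter.legend_elements(
    prop='sizes', func=lambda s: np.sqrt(s / 3)
)
plt.legend(handles, labels, title='Table Size')
```

<p>This keeps the legend in sync with the data if the scaling formula changes, as long as <code>func</code> is updated to match.</p>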
<p>Now let’s look at another way to represent multiple variables on our scatter plot: color.</p>
<h2>Matplotlib Scatter Plot Color</h2>
<p>Color is an incredibly important part of plotting. It could be an entire article in itself. Check out the Seaborn<a href="https://seaborn.pydata.org/tutorial/color_palettes.html" target="_blank" rel="noreferrer noopener"> docs</a> for a great overview. </p>
<p>Color can make or break your plot. Some color schemes make it ridiculously easy to understand the data. Others make it impossible.&nbsp;</p>
<p>Another reason to change the color is purely for aesthetics.&nbsp;</p>
<p>We choose the color of points in <code>plt.scatter()</code> with the keyword <code>c</code> or <code>color</code>. </p>
<p>You can set <a href="https://matplotlib.org/3.1.0/tutorials/colors/colors.html#xkcd-colors" target="_blank" rel="noreferrer noopener">any color you want</a> using an RGB or RGBA tuple (red, green, blue, alpha). Each element of these tuples is a float in <code>[0.0, 1.0]</code>. You can also pass a hex RGB or RGBA string such as <code>'#1f1f1f'</code>. However, most of the time you’ll use one of the 50+ built-in<a href="https://matplotlib.org/3.1.0/gallery/color/named_colors.html" target="_blank" rel="noreferrer noopener"> named colors</a>. The most common are:</p>
<ul>
<li><code>'b'</code> or <code>'blue'</code></li>
<li><code>'r'</code> or <code>'red'</code></li>
<li><code>'g'</code> or <code>'green'</code></li>
<li><code>'k'</code> or <code>'black'</code></li>
<li><code>'w'</code> or <code>'white'</code></li>
</ul>
<p>Here’s the plot of <code>total_bill</code> vs <code>tip</code> using different colors:</p>
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh6.googleusercontent.com/hr89-gpGxu9B35dN3lxH60mDv-atJO_Ov7QKham16bNZXeyifADK4Pni3rfW1p6xEAXwDb5oMNRZEIKbxkcGECtPjGIubBAYm4KglfSP4qwdibSYZ9WOJqPBqxURhWx4o3BlHH5N" alt="" width="639" height="543"/></figure>
<p>For each plot, call <code>plt.scatter()</code> with <code>total_bill</code> and <code>tip</code> and set <code>color</code> (or <code>c</code>) to your choice:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Blue (the default value)
plt.scatter(total_bill, tip, color='b')

# Red
plt.scatter(total_bill, tip, color='r')

# Green
plt.scatter(total_bill, tip, c='g')

# Black
plt.scatter(total_bill, tip, c='k')</pre>
<p><strong>Note</strong>: we put the plots on one figure to save space. We’ll cover how to do this in another article (hint: use <code><a href="https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.subplots.html" target="_blank" rel="noreferrer noopener">plt.subplots()</a></code>)</p>
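<p>For the curious, here is a rough sketch of how such a grid could be drawn with <code>plt.subplots()</code>. It is not the exact code behind the figure above, and the data is synthetic rather than the real <code>total_bill</code> and <code>tip</code> arrays.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for total_bill and tip (assumed values)
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, 50)
tip = 0.15 * total_bill + rng.normal(0, 1, 50)

colors = ['b', 'r', 'g', 'k']

# A 2x2 grid of axes; axes is a 2D array, so iterate over its flat view
fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
for ax, color in zip(axes.flat, colors):
    ax.scatter(total_bill, tip, color=color)
    ax.set_title(f"color='{color}'")

fig.tight_layout()
```

<p>Each <code>ax.scatter()</code> call behaves like <code>plt.scatter()</code> but draws on one specific subplot.</p>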
<h2>Matplotlib Scatter Plot Different Colors</h2>
<p>Our restaurant has a smoking area. We want to see if a group sitting in the smoking area affects the amount they tip.</p>
<p>We could show this by changing the size of the markers like above. But it doesn’t make much sense to do so. A bigger group logically implies a bigger marker. But marker size and being a smoker don’t have any connection and may be confusing for the reader.&nbsp;</p>
<p>Instead, we will color our markers differently to represent smokers and non-smokers.&nbsp;</p>
<p>We have split our data into four NumPy arrays: </p>
<ul>
<li>x-axis – non_smoking_total_bill, smoking_total_bill</li>
<li>y-axis – non_smoking_tip, smoking_tip</li>
</ul>
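<p>One way to produce these four arrays is with a boolean mask. The sketch below assumes a <code>'Yes'</code>/<code>'No'</code> flag per table, which is how the <code>smoker</code> column of the tips dataset is stored as far as we know. The name <code>smoker_flag</code> and the sample values are hypothetical.</p>

```python
import numpy as np

# Hypothetical sample; smoker_flag stands in for tips_df['smoker'].to_numpy()
total_bill = np.array([16.99, 10.34, 38.01, 11.24, 20.29])
tip = np.array([1.01, 1.66, 3.00, 1.76, 3.21])
smoker_flag = np.array(['No', 'No', 'Yes', 'Yes', 'No'])

# Boolean mask: True where the table was in the smoking area
is_smoker = smoker_flag == 'Yes'

# Index with the mask (and its negation, ~) to split each array in two
smoking_total_bill = total_bill[is_smoker]
smoking_tip = tip[is_smoker]
non_smoking_total_bill = total_bill[~is_smoker]
non_smoking_tip = tip[~is_smoker]

print(len(smoking_tip), len(non_smoking_tip))
```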
<p>If you draw multiple scatter plots at once, matplotlib colors them differently. This makes it easy to recognize the different datasets.</p>
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh3.googleusercontent.com/HUzLQu2HhluHtzxUoQuXTp-p7I9wGTMOJNDSam8Zs0SjLVMcyggYE15rA8fd7ULFfHUQJgLd82Egss7EPboD0PRWp9rWp_24LBNGmvSwI-CU6CStxYGCD0nso5cnd7tSehoVeG9S" alt="" width="628" height="508"/></figure>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.scatter(non_smoking_total_bill, non_smoking_tip)
plt.scatter(smoking_total_bill, smoking_tip)
plt.show()</pre>
<p>This looks great. It’s very easy to tell the orange and blue markers apart. The only problem is that we don’t know which is which. Let’s add a legend.&nbsp;</p>
<p>As we have 2 <code>plt.scatter()</code> calls, we can label each one and then call <code>plt.legend()</code>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Add label names to each scatter plot
plt.scatter(non_smoking_total_bill, non_smoking_tip, label='Non-smoking')
plt.scatter(smoking_total_bill, smoking_tip, label='Smoking')

# Put legend in upper left corner of the plot
plt.legend(loc='upper left')
plt.show()</pre>
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh4.googleusercontent.com/T7dTAEypyAhjFwUzdY0dl_V_eBvIc0FH44mLGBkMI3MF9yFeq6tyWnD9tT0uO5xu-mi6lg8bE65CS0F3dWpWe3bxdQIzQmaruNak5-coc0j_l7VuSSNYmWV0ft_K_U-bGpLQncTf" alt="" width="631" height="510"/></figure>
<p>Much better. It seems that the smoking data is more spread out and flatter than the non-smoking data. This suggests that smokers tip about the same regardless of their bill size. Let’s try to serve fewer smoking tables and more non-smoking ones.</p>
<p>This method works fine if we have separate data. But most of the time we don’t and separating it can be tedious.&nbsp;</p>
<p>Thankfully, like with <code>s</code>, we can pass <code>c</code> an array/sequence.</p>
<p>Let’s say we have a list <code>smoker</code> that contains 1 if the table smoked and 0 if they didn’t.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.scatter(total_bill, tip, c=smoker)
plt.show()</pre>
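<p>A list like <code>smoker</code> could be built from a <code>'Yes'</code>/<code>'No'</code> column in one line. This is a sketch under that assumption; <code>smoker_flag</code> is a hypothetical stand-in for the column and the values are made up.</p>

```python
import numpy as np

# Hypothetical 'Yes'/'No' flags standing in for tips_df['smoker'].to_numpy()
smoker_flag = np.array(['No', 'No', 'Yes', 'Yes', 'No'])

# The comparison gives booleans; astype(int) turns True/False into 1/0
smoker = (smoker_flag == 'Yes').astype(int)

print(smoker)
```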
<p><strong>Note</strong>: if we pass an array/sequence of numbers, we must use the keyword <code>c</code> instead of <code>color</code>. Matplotlib raises a <code>ValueError</code> if you use the latter.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">ValueError: 'color' kwarg must be an mpl color spec or sequence of color specs.
For a sequence of values to be color-mapped, use the 'c' argument instead.</pre>
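<p>The difference between the two keywords can be reproduced with a few made-up points. In this sketch, <code>x</code>, <code>y</code> and <code>smoker</code> are arbitrary illustration values, and the exact wording of the error may vary between matplotlib versions.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]
smoker = [0, 1, 0, 1, 1]  # made-up 0/1 flags

# c accepts a sequence of numbers and maps them to colors via the colormap
plt.scatter(x, y, c=smoker)

# color expects actual color specs, so a sequence of plain numbers is rejected
try:
    plt.scatter(x, y, color=smoker)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```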
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh4.googleusercontent.com/QZTqG--_YygrmU14lYODRiSyZVBAYqxdbwwyRBZ-V56I20Yko7mKF059ZjQWCfVo6ZiGtOWWhtPWHbrzue1V_ZgNR9D2K5_1nzZQSjoQBwXVNT2TRXIMIEzL486ETUlNMR-vgFWn" alt="" width="683" height="553"/></figure>
<p>Great, now we have a plot with two different colors in 2 lines of code. But the colors are hard to see.&nbsp;</p>
<h2>Matplotlib Scatter Colormap</h2>
<p>A colormap is a range of colors matplotlib uses to shade your plots. We set a colormap with the <code>cmap</code> argument. All possible colormaps are listed<a href="https://matplotlib.org/3.1.1/tutorials/colors/colormaps.html"> here</a>. </p>
<p>We’ll choose <code>'bwr'</code> which stands for blue-white-red. For two datasets, it chooses just blue and red.</p>
<p>If color theory interests you, we highly recommend this<a href="https://cfwebprod.sandia.gov/cfdocs/CompResearch/docs/ColorMapsExpanded.pdf" target="_blank" rel="noreferrer noopener"> paper</a>. In it, the author designs a blue-to-red diverging colormap similar to <code>bwr</code> (matplotlib ships it as <code>'coolwarm'</code>). Then he argues it should be the default color scheme for all scientific visualizations. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.scatter(total_bill, tip, c=smoker, cmap='bwr')
plt.show()</pre>
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh4.googleusercontent.com/rLwVJpIJXxluciptRkrLWwvT4F2U1AAeM-vYoGNXL4lwYXGUhb7hy2DCrSKLAuRkQCn2f0TCYPiFRJyJwc1uqqibn0Fb9Yy6RqC09FkKjL8uUTBS2l3zor7ra93ykb2uyFZR23R1" alt="" width="640" height="518"/></figure>
<p>Much better. Now let’s add a legend.</p>
<p>As we have one <code>plt.scatter()</code> call, we must use <code>scatter.legend_elements()</code> like we did earlier. This time, we’ll set <code>prop='colors'</code><em>.</em> But since this is the default setting, we call <code>legend_elements()</code> without any arguments. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># legend_elements() is a method so we must name our scatter plot
scatter = plt.scatter(total_bill, tip, c=smoker, cmap='bwr')

# No arguments necessary, default is prop='colors'
handles, labels = scatter.legend_elements()

# Print out labels to see which appears first
print(labels)
# ['$\\mathdefault{0}$', '$\\mathdefault{1}$']</pre>
<p>We unpack our legend into <code>handles</code> and <code>labels</code> like before. Then we print labels to see the order matplotlib chose. It uses an ascending ordering. So 0 (non-smokers) is first. </p>
<p>Now we overwrite <code>labels</code> with descriptive strings and pass everything to <code>plt.legend()</code>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Re-name labels to something easier to understand
labels = ['Non-Smokers', 'Smokers']

plt.legend(handles, labels)
plt.show()</pre>
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh6.googleusercontent.com/OY-HBwPgPvIhrIsBpfTUPrFuuTtN7AsPNc59uzy4nXgtWehryFpZpyBnQzOsD-ipNMD-x64OObXbXxjRl_Uuw1duZIU30c5MYISwEMLsdgwHhSd5uTNr6s7XHH1YhldXJEsVJUub" alt="" width="634" height="461"/></figure>
<p>This is a great scatter plot. It’s easy to distinguish between the colors and the legend tells us what they mean. As smoking is unhealthy, it’s also nice that this is represented by red as it suggests <code>'danger'</code>. </p>
<p>What if we wanted to swap the colors?&nbsp;</p>
<p>Do the same as above but make the <code>smoker</code> list 0 for smokers and 1 for non-smokers. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">smokers_swapped = [1 - x for x in smoker]</pre>
<p>Finally, as 0 comes first, we overwrite <code>labels</code> in the opposite order to before.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">labels = ['Smokers', 'Non-Smokers']</pre>
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh4.googleusercontent.com/E9Dh9Ti8n_uY3eYUnNJzHUMNlAfR2K5DwZyvP-uXx_FnxfKr0OIjTaa-YCIHgYarpgySxBcSpSH5IbwzOOv1tN0-CSI61sx07hIupbouYGB3h2CHA8jXgfVpg1FbFDtjgKoTVtJX" alt="" width="633" height="460"/></figure>
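<p>Another way to swap the colors, without touching the data, is a reversed colormap. Matplotlib’s built-in colormaps each have a reversed twin whose name ends in <code>_r</code>. A minimal sketch with made-up points:</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
smoker = [0, 1, 0, 1, 1]  # made-up flags: 1 = smoker, 0 = non-smoker

# 'bwr_r' runs red-white-blue, so 0 is drawn red and 1 is drawn blue
scatter = plt.scatter(x, y, c=smoker, cmap='bwr_r')

# The data is unchanged, so the label order stays the same as before
handles, labels = scatter.legend_elements()
plt.legend(handles, ['Non-Smokers', 'Smokers'])
```

<p>Which approach is clearer depends on taste. Reversing the colormap leaves the meaning of 0 and 1 alone, so the labels need no reordering.</p>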
<h2>Matplotlib Scatter Marker Types</h2>
<p>Instead of using color to represent smokers and non-smokers, we could use different marker types.</p>
<p>There are over 30<a href="https://matplotlib.org/3.1.1/api/markers_api.html#module-matplotlib.markers"> built-in markers</a> to choose from. Plus you can use any LaTeX expressions and even define your own shapes. We’ll cover the most common built-in types you’ll see. Thankfully, the syntax for choosing them is intuitive.&nbsp;</p>
<p>In our <code>plt.scatter()</code> call, use the <code>marker</code> keyword argument to set the marker type. Usually, the shape of the string reflects the shape of the marker. Otherwise, the string is a single letter matching the first letter of the shape. </p>
<p>Here are the most common examples:</p>
<ul>
<li><code>'o'</code> – circle (default)</li>
<li><code>'v'</code> – triangle down</li>
<li><code>'^'</code> – triangle up</li>
<li><code>'s'</code> – square</li>
<li><code>'+'</code> – plus</li>
<li><code>'D'</code> – diamond</li>
<li><code>'d'</code> – thin diamond</li>
<li><code>'$...$'</code> – LaTeX syntax e.g. <code>'$\pi$'</code> makes each marker the Greek letter π. </li>
</ul>
<p>Let’s see some examples.</p>
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh5.googleusercontent.com/RO_j7X5ULATImdDclrZxCU_3qDzBUtkCtYuxMJ57kEnHl0YXyOztDyY8I0z0P57Id7q2cS8X1eQjA4s1Al2eY9RtpcKYf581O-_wJQbcYkG1Ioatod3KuyN_KciJ0UC8DnYfFGgK" alt="" width="695" height="589"/></figure>
<p>For each plot, call <code>plt.scatter()</code> with <code>total_bill</code> and <code>tip</code>, and set <code>marker</code> to your choice.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Circle
plt.scatter(total_bill, tip, marker='o')
# Plus
plt.scatter(total_bill, tip, marker='+')
# Diamond
plt.scatter(total_bill, tip, marker='D')
# Triangle Up
plt.scatter(total_bill, tip, marker='^')</pre>
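<p>To compare several markers side by side, one option is to give each marker its own subplot. The sketch below is a minimal illustration that assumes a 2x2 grid and a few made-up values in place of the real <code>total_bill</code> and <code>tip</code> data. The <code>markers</code> dictionary and the figure size are hypothetical choices, not part of the original example.</p>

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up stand-in values (not the real tips dataset)
total_bill = [10.5, 22.0, 15.3, 30.1, 18.7, 25.4]
tip = [1.5, 3.0, 2.2, 5.0, 2.8, 4.1]

# Panel title mapped to the marker string passed to scatter()
markers = {'Circle': 'o', 'Plus': '+', 'Diamond': 'D', 'Triangle Up': '^'}

# One subplot per marker type, arranged in a 2x2 grid
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))
for ax, (name, marker) in zip(axes.flat, markers.items()):
    ax.scatter(total_bill, tip, marker=marker)
    ax.set_title(name)

fig.tight_layout()  # keep the titles from overlapping neighbouring panels
```

<p>In an interactive session, drop the <code>matplotlib.use("Agg")</code> line and finish with <code>plt.show()</code> to display the grid.</p>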
<p>At the time of writing, you cannot pass an array to <code>marker</code> like you can with <code>color</code> or <code>size</code>. There is an <a href="https://github.com/matplotlib/matplotlib/issues/11155">open GitHub issue</a> requesting that this feature be added. But for now, to plot two datasets with different markers, you need to do it manually.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Square marker
plt.scatter(non_smoking_total_bill, non_smoking_tip, marker='s', label='Non-smoking')
# Plus marker
plt.scatter(smoking_total_bill, smoking_tip, marker='+', label='Smoking')
plt.legend(loc='upper left')
plt.show()</pre>
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh6.googleusercontent.com/FVEU4r5Surhb5QDYgwqYvY9fh02cA6pAwMlxbiRAyEImbphl00dt30oz_SMK-7K9ZK5GX15cJxtoyUV5yEW8_9IN5gEe5juMWdK_He_HsNERGorOzZAiTm8Ji6zg-tHyFWROtEr9" alt="" width="663" height="483"/></figure>
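<p>Since <code>marker</code> takes a single value per call, the manual approach generalizes to a loop with one <code>scatter()</code> call per category. The sketch below assumes a made-up <code>smoker</code> list of <code>'Yes'</code>/<code>'No'</code> strings alongside stand-in bill and tip values. The <code>marker_for</code> and <code>label_for</code> dictionaries are hypothetical helpers introduced only for this illustration.</p>

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up stand-in values (not the real tips dataset)
total_bill = [10.5, 22.0, 15.3, 30.1, 18.7, 25.4]
tip = [1.5, 3.0, 2.2, 5.0, 2.8, 4.1]
smoker = ['Yes', 'No', 'No', 'Yes', 'No', 'Yes']

# Hypothetical lookup tables: category -> marker and category -> legend label
marker_for = {'No': 's', 'Yes': '+'}
label_for = {'No': 'Non-smoking', 'Yes': 'Smoking'}

fig, ax = plt.subplots()
for category, marker in marker_for.items():
    # Keep only the points that belong to the current category
    xs = [b for b, s in zip(total_bill, smoker) if s == category]
    ys = [t for t, s in zip(tip, smoker) if s == category]
    ax.scatter(xs, ys, marker=marker, label=label_for[category])

ax.legend(loc='upper left')
```

<p>Adding a third category then only needs one more entry in each dictionary rather than another hand-written <code>scatter()</code> call.</p>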
<p>Remember that if you draw multiple scatter plots at once, matplotlib colors them differently. This makes it easy to recognise the different datasets. So there is little value in also changing the marker type.&nbsp;</p>
<p>To get a plot in one color with different marker types, set the same color for each plot and change each marker.&nbsp;</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Square marker, blue color
plt.scatter(non_smoking_total_bill, non_smoking_tip, marker='s', c='b', label='Non-smoking')
# Plus marker, blue color
plt.scatter(smoking_total_bill, smoking_tip, marker='+', c='b', label='Smoking')
plt.legend(loc='upper left')
plt.show()</pre>
<figure class="wp-block-image is-resized"><img loading="lazy" src="https://lh6.googleusercontent.com/Zr_QtgOpayASFtjolSB2_gdIEdu6lCq02yMLRXNOsoFsG6lxkmE3wTGOBw0k9AQrzqvEjbXSd8d0TMrQmb13B75JJ_LCRUmESO4fg5gyQC2OJODC44lrgfNR-gAK_6J4Xbgc852m" alt="" width="669" height="541"/></figure>
<p>Most would agree that different colors are easier to distinguish than different markers. But now you have the ability to choose.</p>
<h2>Summary</h2>
<p>You now know the 4 most important things to make excellent scatter plots.&nbsp;</p>
<p>You can make basic matplotlib scatter plots. You can change the marker size to make the data easier to understand. And you can change the marker size based on another variable.&nbsp;</p>
<p>You’ve learned how to choose any color imaginable for your plot. Plus you can change the color based on another variable.&nbsp;</p>
<p>To add personality to your plots, you can use a custom marker type.</p>
<p>Finally, you can do all of this with an accompanying legend (something most Pythonistas don’t know how to use!).&nbsp;</p>
<p>The post <a href="https://blog.finxter.com/matplotlib-scatter-plot/" target="_blank" rel="noopener noreferrer">Matplotlib Scatter Plot – Simple Illustrated Guide</a> first appeared on <a href="https://blog.finxter.com/" target="_blank" rel="noopener noreferrer">Finxter</a>.</p>
</div>


https://www.sickgaming.net/blog/2020/11/...ted-guide/