Sick Gaming
[Tut] How to Create a DataFrame From Lists? - Printable Version




[Tut] How to Create a DataFrame From Lists? - xSicKxBot - 12-17-2022

How to Create a DataFrame From Lists?

<div>
<p>Pandas is a great library for data analysis in Python. With Pandas, you can create visualizations, filter rows or columns, add new columns, and save the data in a wide range of formats. The workhorse of Pandas is the <strong>DataFrame</strong>. </p>
<p class="has-base-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended</strong>: <a href="https://blog.finxter.com/pandas-quickstart/" data-type="post" data-id="16511" target="_blank" rel="noreferrer noopener">10 Minutes to Pandas (in 5 Minutes)</a></p>
<p>The first step in working with Pandas is often getting our data into a DataFrame. If we have data stored in <a href="https://blog.finxter.com/python-lists/" data-type="post" data-id="7332" target="_blank" rel="noreferrer noopener">lists</a>, how can we create this all-powerful DataFrame? </p>
<p>There are 4 basic strategies:</p>
<ol type="1">
<li>Create a <a href="https://blog.finxter.com/python-dictionary/" data-type="post" data-id="5232" target="_blank" rel="noreferrer noopener">dictionary</a> with column names as keys and your lists as values. Pass this dictionary as an argument when creating the DataFrame.</li>
<li>Pass your lists into the <code><a href="https://blog.finxter.com/python-ziiiiiiip-a-helpful-guide/" data-type="post" data-id="1938" target="_blank" rel="noreferrer noopener">zip()</a></code> function. As with strategy 1, your lists will become columns in the DataFrame.</li>
<li>Put your lists into a list instead of a dictionary. In this case, your lists become rows instead of columns.</li>
<li><a href="https://blog.finxter.com/how-to-create-a-dataframe-in-pandas/" data-type="post" data-id="16764" target="_blank" rel="noreferrer noopener">Create an empty DataFrame</a> and add columns one by one.</li>
</ol>
<h2>Create a DataFrame using a dictionary</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="1010" height="645" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-237.png" alt="" class="wp-image-985155" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-237.png 1010w, https://blog.finxter.com/wp-content/uploads/2022/12/image-237-300x192.png 300w, https://blog.finxter.com/wp-content/uploads/2022/12/image-237-768x490.png 768w" sizes="(max-width: 1010px) 100vw, 1010px" /></figure>
</div>
<p>The first step is to import pandas. If you haven’t already, <a href="https://blog.finxter.com/how-to-install-pandas-in-python/" data-type="post" data-id="35926" target="_blank" rel="noreferrer noopener">install pandas</a> first.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd</pre>
<p>Let’s say you have employee data stored as lists.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># if your data is stored like this
employee = ['Betty', 'Veronica', 'Archie', 'Jughead']
salary = [110_000, 20_000, 80_000, 70_000]
bonus = [1000, 500, 2500, 400]
tax_rate = [.1, .25, .17, .4]
absences = [0, 1, 0, 52]
</pre>
<p>Build a dictionary using column names as keys and your lists as values.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># a dictionary mapping column names to lists defines the DataFrame
emp_data = {
    'name': employee,
    'salary': salary,
    'bonus': bonus,
    'tax_rate': tax_rate,
    'absences': absences,
}

# pass the dictionary to the DataFrame constructor
emp_df = pd.DataFrame(emp_data)
emp_df
</pre>
<p>Your lists will become columns in the resulting DataFrame.</p>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="367" height="164" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-230.png" alt="" class="wp-image-985144" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-230.png 367w, https://blog.finxter.com/wp-content/uploads/2022/12/image-230-300x134.png 300w" sizes="(max-width: 367px) 100vw, 367px" /></figure>
</div>
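<p>As a quick check of the dictionary approach, here is a minimal sketch using a shortened version of the employee lists. It also shows what happens when the lists differ in length: pandas raises a <code>ValueError</code>. The variable name <code>mismatch_error</code> is only for illustration.</p>

```python
import pandas as pd

# Shortened sample data (two of the lists from above)
employee = ['Betty', 'Veronica', 'Archie', 'Jughead']
salary = [110_000, 20_000, 80_000, 70_000]

# Dictionary keys become column names; each list becomes a column
emp_df = pd.DataFrame({'name': employee, 'salary': salary})
print(emp_df.shape)            # (rows, columns)
print(list(emp_df.columns))

# Lists of different lengths are rejected by the constructor
try:
    pd.DataFrame({'name': employee, 'salary': salary[:3]})
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
print(mismatch_error)
```

<p>So with this strategy, every list needs the same number of items.</p>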
<h2>Create a DataFrame using the zip function</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="1010" height="668" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-238.png" alt="" class="wp-image-985156" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-238.png 1010w, https://blog.finxter.com/wp-content/uploads/2022/12/image-238-300x198.png 300w, https://blog.finxter.com/wp-content/uploads/2022/12/image-238-768x508.png 768w" sizes="(max-width: 1010px) 100vw, 1010px" /></figure>
</div>
<p>Pass each list as a separate argument to the <code><a rel="noreferrer noopener" href="https://blog.finxter.com/python-ziiiiiiip-a-helpful-guide/" data-type="post" data-id="1938" target="_blank">zip()</a></code> function. You can specify the column names using the <code>columns</code> parameter or by setting the <code>columns</code> property on a separate line.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">emp_df = pd.DataFrame(zip(employee, salary, bonus, tax_rate, absences))
emp_df.columns = ['name', 'salary', 'bonus', 'tax_rate', 'absences']
</pre>
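<p>The example above sets the <code>columns</code> property on a separate line. A minimal sketch of the other option, passing the <code>columns</code> parameter to the constructor, could look like this (shortened to three lists; the name <code>rows</code> is just for illustration):</p>

```python
import pandas as pd

employee = ['Betty', 'Veronica', 'Archie', 'Jughead']
salary = [110_000, 20_000, 80_000, 70_000]
bonus = [1000, 500, 2500, 400]

# list(...) materializes the zip iterator into a list of row tuples
rows = list(zip(employee, salary, bonus))

# Column names are supplied directly to the constructor
emp_df = pd.DataFrame(rows, columns=['name', 'salary', 'bonus'])
print(emp_df)
```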
<p>The <code>zip()</code> function creates an <a href="https://blog.finxter.com/iterators-iterables-and-itertools/" data-type="post" data-id="29507" target="_blank" rel="noreferrer noopener">iterator</a>. For the first iteration, it grabs every value at index 0 from each list. This becomes the first row in the DataFrame. Next, it grabs every value at index 1 and this becomes the second row. This continues until it exhausts the shortest list.</p>
<p>We can loop through the iterator to see how this works.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># enumerate() supplies the running index for each zipped tuple
for i, value in enumerate(zip(employee, salary, bonus, tax_rate, absences)):
    print(f'zipped value at index {i}: {value}')
</pre>
<p>Each of these values becomes a row in the DataFrame:</p>
<pre class="wp-block-preformatted"><code>zipped value at index 0: ('Betty', 110000, 1000, 0.1, 0)
zipped value at index 1: ('Veronica', 20000, 500, 0.25, 1)
zipped value at index 2: ('Archie', 80000, 2500, 0.17, 0)
zipped value at index 3: ('Jughead', 70000, 400, 0.4, 52)</code>
</pre>
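<p>Because <code>zip()</code> stops at the shortest list, a list that is missing a value can silently drop a row. The sketch below assumes a hypothetical case where one salary is missing, and contrasts <code>zip()</code> with <code>itertools.zip_longest()</code> from the standard library, which pads the shorter list instead.</p>

```python
from itertools import zip_longest

import pandas as pd

employee = ['Betty', 'Veronica', 'Archie', 'Jughead']
salary = [110_000, 20_000, 80_000]   # hypothetical: Jughead's salary is missing

# zip() stops at the shortest list, so Jughead is silently dropped
short_df = pd.DataFrame(list(zip(employee, salary)), columns=['name', 'salary'])

# zip_longest() pads the shorter list with None, keeping every employee
full_df = pd.DataFrame(list(zip_longest(employee, salary)), columns=['name', 'salary'])

print(len(short_df), len(full_df))
print(full_df)
```

<p>Which behavior is preferable depends on the data; the padded version at least makes the gap visible as a missing value.</p>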
<h2>Create a DataFrame using a list of lists</h2>
<p>What if you have a separate list for each employee? In this case, we can just create a <a href="https://blog.finxter.com/python-list-of-lists/" data-type="post" data-id="7890" target="_blank" rel="noreferrer noopener">list of lists</a>. Each of the inner lists becomes a row in the DataFrame.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># lists for employees instead of features
betty = ['Betty', 110000, 1000, 0.1, 0]
veronica = ['Veronica', 20000, 500, 0.25, 1]
archie = ['Archie', 80000, 2500, 0.17, 0]
jughead = ['Jughead', 70000, 400, 0.4, 52]

emp_df = pd.DataFrame([betty, veronica, archie, jughead])
emp_df.columns = ['name', 'salary', 'bonus', 'tax_rate', 'absences']
emp_df
</pre>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="380" height="158" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-231.png" alt="" class="wp-image-985145" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-231.png 380w, https://blog.finxter.com/wp-content/uploads/2022/12/image-231-300x125.png 300w" sizes="(max-width: 380px) 100vw, 380px" /></figure>
</div>
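<p>A related case is worth a small sketch: inner lists of unequal length. Assuming a hypothetical record where Veronica's bonus was never entered, pandas pads the shorter row with a missing value rather than raising an error.</p>

```python
import pandas as pd

betty = ['Betty', 110000, 1000]
veronica = ['Veronica', 20000]      # hypothetical: bonus not recorded

# Each inner list is one row; the shorter row is padded on the right
emp_df = pd.DataFrame([betty, veronica], columns=['name', 'salary', 'bonus'])
print(emp_df)

# The padded cell shows up as a missing value
print(emp_df['bonus'].isna().tolist())
```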
<h2>Create a DataFrame using a list of dictionaries</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="856" height="863" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-239.png" alt="" class="wp-image-985157" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-239.png 856w, https://blog.finxter.com/wp-content/uploads/2022/12/image-239-298x300.png 298w, https://blog.finxter.com/wp-content/uploads/2022/12/image-239-150x150.png 150w, https://blog.finxter.com/wp-content/uploads/2022/12/image-239-768x774.png 768w" sizes="(max-width: 856px) 100vw, 856px" /></figure>
</div>
<p>If the employee data is stored in dictionaries instead of lists, we use a list of dictionaries.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">betty = {'name': 'Betty', 'salary': 110000, 'bonus': 1000, 'tax_rate': 0.1, 'absences': 0}
veronica = {'name': 'Veronica', 'salary': 20000, 'bonus': 500, 'tax_rate': 0.25, 'absences': 1}
archie = {'name': 'Archie', 'salary': 80000, 'bonus': 2500, 'tax_rate': 0.17, 'absences': 0}
jughead = {'name': 'Jughead', 'salary': 70000, 'bonus': 400, 'tax_rate': 0.4, 'absences': 52}

pd.DataFrame([betty, veronica, archie, jughead])</pre>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="374" height="159" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-232.png" alt="" class="wp-image-985146" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-232.png 374w, https://blog.finxter.com/wp-content/uploads/2022/12/image-232-300x128.png 300w" sizes="(max-width: 374px) 100vw, 374px" /></figure>
</div>
<p>The columns are determined by the keys in the dictionaries. What if the dictionaries don’t all have the same keys?</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">betty = {'name': 'Betty', 'salary': 110000, 'bonus': 1000, 'tax_rate': 0.1, 'absences': 0, 'hire_date': '2001-01-01'}
veronica = {'name': 'Veronica', 'salary': 20000, 'bonus': 500, 'tax_rate': 0.25, 'absences': 1}
archie = {'name': 'Archie', 'salary': 80000, 'bonus': 2500, 'tax_rate': 0.17, 'absences': 0, 'title': 'Vice Chief Leader'}
jughead = {'name': 'Jughead', 'salary': 70000, 'bonus': 400, 'tax_rate': 0.4, 'absences': 52, 'rank': 'yes'}

pd.DataFrame([betty, veronica, archie, jughead])
</pre>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="624" height="151" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-233.png" alt="" class="wp-image-985147" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-233.png 624w, https://blog.finxter.com/wp-content/uploads/2022/12/image-233-300x73.png 300w" sizes="(max-width: 624px) 100vw, 624px" /></figure>
</div>
<p>All of the keys are used as columns. Whenever pandas encounters a dictionary that lacks one of the keys, it fills the missing value with <code>NaN</code>, which stands for ‘not a number’.</p>
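<p>To see where those placeholders ended up, and to replace them, something like the following sketch may help. It uses two shortened dictionaries; the replacement value <code>'unknown'</code> is an arbitrary choice for illustration.</p>

```python
import pandas as pd

betty = {'name': 'Betty', 'salary': 110000, 'hire_date': '2001-01-01'}
veronica = {'name': 'Veronica', 'salary': 20000}   # no 'hire_date' key

emp_df = pd.DataFrame([betty, veronica])

# isna() flags the cells that pandas filled in for missing keys
missing = emp_df['hire_date'].isna().tolist()
print(missing)

# fillna() with a dict replaces missing values column by column
filled = emp_df.fillna({'hire_date': 'unknown'})
print(filled)
```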
<h2>Create an empty DataFrame and add columns one by one</h2>
<p>This method can be preferable if you need to create many calculated columns. Here we add a new column for after-tax income.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">emp_df = pd.DataFrame()
emp_df['name'] = employee
emp_df['salary'] = salary
emp_df['bonus'] = bonus
emp_df['tax_rate'] = tax_rate
emp_df['absences'] = absences

# total income before tax
income = emp_df['salary'] + emp_df['bonus']
emp_df['after_tax'] = income * (1 - emp_df['tax_rate'])
</pre>
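<p>As an alternative sketch (not the approach used above), a calculated column can also be added with <code>DataFrame.assign()</code>, which returns a new DataFrame instead of modifying the existing one. The data here is a shortened two-row version of the employee lists.</p>

```python
import pandas as pd

emp_df = pd.DataFrame({
    'name': ['Betty', 'Veronica'],
    'salary': [110_000, 20_000],
    'bonus': [1000, 500],
    'tax_rate': [0.1, 0.25],
})

# assign() returns a copy with the calculated column added;
# the lambda receives the DataFrame being extended
emp_df = emp_df.assign(
    after_tax=lambda df: (df['salary'] + df['bonus']) * (1 - df['tax_rate'])
)
print(emp_df[['name', 'after_tax']])
```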
<h2>How to add a list to an existing DataFrame</h2>
<p>Here is a neat trick. If you want to edit a row in a DataFrame, you can use the handy <code><a href="https://blog.finxter.com/slicing-data-from-a-pandas-dataframe-using-loc-and-iloc/" data-type="post" data-id="230997" target="_blank" rel="noreferrer noopener">loc</a></code> indexer. Although it is often called a method, <code>loc</code> is used with square brackets rather than parentheses. It lets you access rows and columns by their index labels.</p>
<p>To access a row:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">emp_df.loc[3]</pre>
<p>Output is the row with index value 3 as a Series:</p>
<pre class="wp-block-preformatted"><code>name Jughead
salary 70000
bonus 400
tax_rate 0.4
absences 52
Name: 3, dtype: object</code>
</pre>
<p>To access a column, pass the column name as the column index. Note that we have to specify both the row and column indexes. The format is <code>[rows, columns]</code>. If you want all rows, you can use “<code>:</code>” as we do here. The <code>:</code> also works if you want all columns.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">emp_df.loc[:, 'salary']</pre>
<p>The output is also a Series:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">0    110000
1     20000
2     80000
3     70000
Name: salary, dtype: int64
</pre>
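<p>The <code>[rows, columns]</code> format also accepts a single label, a slice of labels, or a list of labels. A minimal sketch with a shortened DataFrame (the names <code>archie_salary</code> and <code>subset</code> are just for illustration):</p>

```python
import pandas as pd

emp_df = pd.DataFrame({
    'name': ['Betty', 'Veronica', 'Archie', 'Jughead'],
    'salary': [110_000, 20_000, 80_000, 70_000],
    'bonus': [1000, 500, 2500, 400],
})

# A single cell: row label 2, column 'salary'
archie_salary = emp_df.loc[2, 'salary']

# Label slices include both endpoints, so 1:2 returns rows 1 and 2
subset = emp_df.loc[1:2, ['name', 'bonus']]

print(archie_salary)
print(subset)
```

<p>Note that, unlike ordinary Python slicing, a <code>loc</code> slice includes its end label.</p>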
<p>So how do we use <code>loc</code> to add a new row? If we use a row index that doesn’t exist in the DataFrame, it will create a new row for us.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">new_emp = ['Fonzie', 200000, 30000, .05, 112]
emp_df.loc[4] = new_emp
emp_df
</pre>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="366" height="183" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-234.png" alt="" class="wp-image-985148" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-234.png 366w, https://blog.finxter.com/wp-content/uploads/2022/12/image-234-300x150.png 300w" sizes="(max-width: 366px) 100vw, 366px" /></figure>
</div>
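<p>Hard-coding the new index works when you know how many rows there are. If the DataFrame still has its default 0, 1, 2, ... index, one possible sketch is to use <code>len()</code> to find the next unused label. This assumes the index has not been changed or filtered; otherwise <code>len()</code> may point at a label that already exists and overwrite that row.</p>

```python
import pandas as pd

emp_df = pd.DataFrame({
    'name': ['Betty', 'Veronica'],
    'salary': [110_000, 20_000],
})

# With the default 0..n-1 index, len(emp_df) is the first unused label.
# The list must have one item per column.
emp_df.loc[len(emp_df)] = ['Fonzie', 200_000]
print(emp_df)
```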
<p>You can also update existing data with <code>loc</code>. Let’s lower Fonzie’s salary. It looks a bit excessive.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">emp_df.loc[4, 'salary'] = 105000
emp_df
</pre>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="376" height="183" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-235.png" alt="" class="wp-image-985149" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-235.png 376w, https://blog.finxter.com/wp-content/uploads/2022/12/image-235-300x146.png 300w" sizes="(max-width: 376px) 100vw, 376px" /></figure>
</div>
<p>That’s more like it.</p>
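<p>If you don't know the row label, <code>loc</code> also accepts a boolean mask in the row position. The following sketch, on a shortened DataFrame, updates the salary by looking up the employee's name instead of the index value.</p>

```python
import pandas as pd

emp_df = pd.DataFrame({
    'name': ['Betty', 'Fonzie'],
    'salary': [110_000, 200_000],
})

# The boolean mask selects rows by value, so the row label need not be known
emp_df.loc[emp_df['name'] == 'Fonzie', 'salary'] = 105_000
print(emp_df)
```

<p>Every row matching the mask is updated, so this is best used with a column whose values are unique.</p>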
<h2><strong>Conclusion</strong></h2>
<p>There are many different ways of creating a DataFrame. We looked at several methods using data stored in lists. Each will get the job done. </p>
<p>The most convenient method will depend on what your lists represent. </p>
<p>If each of your lists would best be represented as a column, then a dictionary of lists might be the easiest way to go. </p>
<p>If each of your lists would best be represented as a row, then a list of lists would be a good choice. </p>
<p>To add data in a list as a new row in an existing DataFrame, the <code>loc</code> indexer comes in handy. It is also useful for updating existing data.</p>
</div>


https://www.sickgaming.net/blog/2022/12/17/how-to-create-a-dataframe-from-lists/