Create an account


Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
[Tut] How to Read a CSV File Into a Python List?

#1
How to Read a CSV File Into a Python List?

<div><p><em><strong>6 min read </strong></em></p>
<p>How to read data from a <code>.csv</code> file and add its column or row to the list? There are three main ways: </p>
<ul>
<li><strong>Option 1</strong> (the quickest): use the standard library </li>
<li><strong>Option 2</strong> (the most preferred): use <code>pandas.read_csv()</code> </li>
<li><strong>Option 3</strong> (optional): use <code>csv.reader()</code> </li>
</ul>
<h2>Short answer&nbsp;</h2>
<p>The simplest option to read a <code>.csv</code> file into a list is to use it with <code>open(“file”) as f:</code> and apply the actions you need. You should also remember that such an approach has its limitations, as you’ll see in the tutorial. </p>
<h2>Prerequisites&nbsp;</h2>
<p>To read the <code>.csv</code> file, I used the following tools: </p>
<ul>
<li>Python 3.8</li>
<li><a href="https://blog.finxter.com/best-python-ide/" target="_blank" rel="noreferrer noopener">PyCharm IDE</a> for convenient coding experience </li>
<li>Sublime Text Editor for a manual check of the <code>.csv</code> file </li>
</ul>
<p>By default, you can read CSV files using other tools or even default, pre-installed programs you have on your machine, so it is just a matter of choice what tools to use. The codebase can be executed anywhere with the same results.&nbsp;</p>
<h2>What is the CSV Format? </h2>
<p>Nowadays, three main data formats are used for passing data from one machine to another: CSV, XML, and JSON.&nbsp;</p>
<p>The abbreviation CSV stands for “comma-separated values”. As the name implies, it is just a list of elements separated by commas. It is the most straightforward format to transfer data and should be used if </p>
<ol>
<li>you need the most compact file size, or </li>
<li>you have a flat data structure. </li>
</ol>
<p>Keep in mind that CSV files do not give you such flexibility in presenting the data as the other two options. </p>
<p><strong>Related articles:</strong></p>
<ul>
<li><a href="https://blog.finxter.com/pandas-to_csv/">Pandas DataFrame to CSV</a></li>
<li><a href="https://blog.finxter.com/how-to-convert-a-list-of-lists-to-a-csv-file-in-python/">How to Convert a List of Lists to a CSV File in Python?</a></li>
</ul>
<h2>Example Task&nbsp;</h2>
<p>This is a real-world task in a simplified form.&nbsp;<strong>The goal is to read data from CSV file (70 KB) and form a list of all series codes present in the second line of it.&nbsp;</strong></p>
<p>The provided data is an open statistical data from the European Central Bank (ECB) in CSV format and present financial flows during the period. The file consist of three main fields: </p>
<ol>
<li>series code </li>
<li>observed date (period, e.g., 2019Q4, 2020Q1, etc.) </li>
<li>observed value (data point, float number) </li>
</ol>
<p><a href="http://sdw.ecb.europa.eu/export.do? mergeFilter=&amp;removeItem=L&amp;REF_AREA.252=I8&amp;COUNTERPART_AREA.252= W0&amp;rc=&amp;ec=&amp;legendPub=published&amp;oc=&amp;df=true&amp;DATASET=0&amp;dc=&amp;ACCOU NTING_ENTRY.252=A&amp;node=9689710&amp;showHide=&amp;removedItemList=&amp;pb=&amp;l egendNor=&amp;activeTab=&amp;STO.252=F&amp;STO.252=K7&amp;STO.252=KA&amp;STO.252=LE &amp;legendRef=reference&amp;REF_SECTOR.252=S1&amp;exportType=csv&amp;ajaxTab=true" target="_blank" rel="noreferrer noopener">Direct download link. </a></p>
<h2>Data Preparation </h2>
<p>To focus on the parsing option, I suggest you download and extract a file beforehand. In the examples, the file will be placed on the Desktop, but you can put it anywhere you like. </p>
<p>Script: </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os import wget link = "http://sdw.ecb.europa.eu/export.do? mergeFilter=&amp;removeItem=L&amp;REF_AREA.252=I8&amp;COUNTERPART_AREA.252=W0 &amp;rc=&amp;ec=&amp;legendPub=published&amp;oc=&amp;df=true&amp;DATASET=0&amp;dc=&amp;ACCOUNTING _ENTRY.252=A&amp;node=9689710&amp;showHide=&amp;removedItemList=&amp;pb=&amp;legendNo r=&amp;activeTab=&amp;STO.252=F&amp;STO.252=K7&amp;STO.252=KA&amp;STO.252=LE&amp;legendRe f=reference&amp;REF_SECTOR.252=S1&amp;exportType=csv&amp;ajaxTab=true" path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv" wget.download(link, path) </pre>
<p>Script breakdown:&nbsp;</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os import wget</pre>
<p>Import statements are used to install code base which was written by someone else before and is ready to use just by referring to it. Some them (e.g. <code>wget</code>) should be additionally installed using similar command: </p>
<p>The following command will install the latest version of a module and its dependencies from the Python Packaging Index:&nbsp;</p>
<p><code>python -m pip install SomePackage </code></p>
<p><code>os</code> package is used to perform basic operation with files and folders in your operating system. </p>
<p><code>wget</code> package is used to download files from websites. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">link = "http://sdw.ecb.europa.eu/export.do? mergeFilter=&amp;removeItem=L&amp;REF_AREA.252=I8&amp;COUNTERPART_AREA.252=W0 &amp;rc=&amp;ec=&amp;legendPub=published&amp;oc=&amp;df=true&amp;DATASET=0&amp;dc=&amp;ACCOUNTING _ENTRY.252=A&amp;node=9689710&amp;showHide=&amp;removedItemList=&amp;pb=&amp;legendNo r=&amp;activeTab=&amp;STO.252=F&amp;STO.252=K7&amp;STO.252=KA&amp;STO.252=LE&amp;legendRe f=reference&amp;REF_SECTOR.252=S1&amp;exportType=csv&amp;ajaxTab=true" </pre>
<p>The string variable <code>link</code> is created which represents a direct download link. This link can be easily tested in any web-browser. </p>
<p><code>path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv" </code></p>
<p>string variable <code>path</code> is created which represents a path in your system where files will be downloaded later. </p>
<p>The prefix “<code>f</code>” before the string makes it an “f-string” which means that you can use other variables in the string by using <code>{placeholders}</code>. In this case, variable <code>os.environ[‘HOMEPATH’]</code> refers to system variable (declared in the Windows system by default, not in your python script) and puts it into a string we just created. By default, <code>HOMEPATH</code> refers to the current user by <code>C:\Users\%user%</code> (you). </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">wget.download(link, path)</pre>
<p>The function call <code>wget.download()</code> triggers the file download from previously specified link and saves it by previously specified path. </p>
<p>The result of this step is a ready-to-use CSV file on your Desktop. Now we can parse data from CSV file and extract series codes to list.&nbsp;</p>
<h2>Data Exploration </h2>
<p>It is a good practice to explore data before you start parsing it. In this case, you can see that series codes are present in the second row of <code>data.csv</code>. </p>
<h3>Option 1 (the Fastest): Use the Standard Library </h3>
<p>This is the fastest option of reading a file using the standard library. Assuming the file is prepared and located on your Desktop, you can use the script below. This is the easiest way of getting data on the list. However, it has its drawbacks.&nbsp;</p>
<p>Input: </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv" with open(path, "r") as f: print(list(f.readlines()[1].split(","))[1:]) </pre>
<p>Output:&nbsp;</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">[‘QSA.Q.N.I8.W0.S1.S1.N.A.F.F._Z._Z.XDC._T.S.V.N._T’,... ‘QSA.Q.N.I8.W0.S1.S1.N.A.LE.F89.T._Z.XDC._T.S.V.N._T’]</pre>
<p>Script breakdown: </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">with open(path, "r") as f: print(list(f.readlines()[1].split(","))[1:]) </pre>
<p>The import statement and variable assignment is skipped as it was described previously and the main attention is given to the last statements. </p>
<p>This is a combined statement of three parts:&nbsp;</p>
<ol>
<li>The “with” statement, in the general meaning, allows us to define what code block (actions) we want to do with the object while it is “active”. In this case, we want to tell python that it has to do some actions while the file is open, and when all statements are completed, close it. </li>
<li>The “open” statement allows us to open a file and place it is into Python memory. In this case, we open the previously given file (“path” variable) in “r” mode, which stands for “read”-only mode. </li>
<li>The “print” statement allows you to see output on your screen. In this case we
<ol>
<li>take file object <code>f</code> with <code>open(path, 'r') as f</code>, </li>
<li>read second line with <code>f.readlines()[1]</code>, </li>
<li>split the line by the <code>,</code> separator in <code>f.readlines()[1].split(“,”)</code>, </li>
<li>convert the to list <code>list(f.readlines()[1].split(“,”))</code>, </li>
<li>return the list starting from second element as the first one is empty in <code>list(f.readlines()[1].split(“,”))[1:]</code>, and </li>
<li>print the result in <code>print(list(f.readlines()[1].split(“,”))[1:])</code>.</li>
</ol>
</li>
</ol>
<p>There is no specific documentation as this code base uses standard library which is built-in in Python. </p>
<p><strong>Pros/Cons</strong>: Such an approach allows the user to get an instant view of the CSV file and select the required data. You can use this for spot checks and simple transformations. It is important to remember that such an approach has the lowest amount of adjustable settings, and it requires lots of workarounds when transformations are complicated. </p>
<h3>Option 2 (the Most Preferred): Use pandas.read_csv() </h3>
<p>The most preferred option of reading <code>.csv</code> file is using the Pandas library (<a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-cheat-sheets/" target="_blank">Pandas</a><a href="https://blog.finxter.com/pandas-cheat-sheets/" target="_blank" rel="noreferrer noopener"> </a><a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-cheat-sheets/" target="_blank">cheat sheets here</a>). Pandas is a fast, powerful, flexible, and easy to use open-source data analysis and manipulation tool, built on top of the <a href="https://blog.finxter.com/python-crash-course/" target="_blank" rel="noreferrer noopener">Python programming</a> language. </p>
<p><a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-calculate-the-column-standard-deviation-of-a-dataframe-in-python-pandas/" target="_blank">Pandas </a>is usually used for more advanced data analysis where data is stored in a “DataFrame” which is basically like a table in excel. A DataFrame has a header row and an index column so that you can refer to table values by “column x row” intersection. </p>
<p>Script: </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os import pandas as pd path = f”C:{os.environ[‘HOMEPATH’]}\\Desktop\\data.csv” df = pd.read_csv(path, delimiter=”,”, skiprows=[0]) list = df.columns.to_list()[1:] print(list) </pre>
<p>Output:&nbsp;</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">[‘QSA.Q.N.I8.W0.S1.S1.N.A.F.F._Z._Z.XDC._T.S.V.N._T’,... ‘QSA.Q.N.I8.W0.S1.S1.N.A.LE.F89.T._Z.XDC._T.S.V.N._T’] </pre>
<p>Script breakdown:&nbsp;</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df = pd.read_csv(path, delimiter=”,”, skiprows=[0])</pre>
<p>In this dataframe, variable <code>df</code> is created from the <code>.csv</code> file by executing pandas method <code>read_csv</code>. In this case the method requires several arguments: <code>file</code>, <code>delimiter</code> and <code>skiprows</code>. </p>
<p>The file is the same as used before. The delimiter is “,” which is a default option for .csv files, and it might be skipped. But it’s good to know that you can use any other delimiter.&nbsp;</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">list = df.columns.to_list()[1:] print(list)</pre>
<p>This line selects the column headers and puts them into a list starting from the second element going forward. The result is printed.&nbsp;</p>
<p><strong>Pros/Cons</strong>: Such an approach is relatively fast, visually appealing to the reader, and is fully adjustable using a consistent approach. Comparing to the first option when standard libraries are used, it requires additional packages to be installed. I personally believe that it is not a problem, and such a drawback can be neglected. But another point should not be skipped — the amount of data. This approach is inefficient when you need lots of “side” data, which is useless for your purpose. </p>
<p>Full documentation is available <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html" target="_blank" rel="noreferrer noopener">here </a>with more guides and instructions on how to use it. </p>
<h3>Option 3 (optional): use csv.reader()&nbsp;</h3>
<p>There is also another way how to read <code>.csv</code> files, which might be useful in certain circumstances. The <code>csv</code> module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats. </p>
<p>Script: </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os import csv path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv" with open(path, 'r') as f: wines = list(csv.reader(f, delimiter=","))[1][1:] </pre>
<p>Output:&nbsp;</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">[‘QSA.Q.N.I8.W0.S1.S1.N.A.F.F._Z._Z.XDC._T.S.V.N._T’,... ‘QSA.Q.N.I8.W0.S1.S1.N.A.LE.F89.T._Z.XDC._T.S.V.N._T’]</pre>
<p>Script breakdown: </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">with open(path, 'r') as f: wines = list(csv.reader(f, delimiter=","))[1][1:] </pre>
<p><code>csv.reader()</code> is a method which allows you to parse <code>.csv</code> file with specified delimiter. </p>
<p>After that, we select the second row using first brackets “<code>[1]</code>” and after that, select all elements from that list starting from second “<code>[1:]</code>” using <a href="https://blog.finxter.com/introduction-to-slicing-in-python/" target="_blank" rel="noreferrer noopener">slicing</a>. </p>
<p>As this is a standard package, there is documentation at the <a href="https://docs.python.org/3/library/csv.html" target="_blank" rel="noreferrer noopener">official python website</a>:&nbsp;</p>
<p><strong>Pros/Cons</strong>: Such an approach is relatively simple and has just a few lines of code. On the other hand, it requires an additional package to be installed. </p>
<h2>Summary&nbsp;</h2>
<p>You should remember that there are different ways of reading data from CSV files. Select the one which suits your needs most or has the best performance and runtime. </p>
<h2>Where to Go From Here?</h2>
<p>Enough theory, let’s get some practice!</p>
<p>To become successful in coding, you need to get out there and solve real problems for real people. That’s how you can become a six-figure earner easily. And that’s how you polish the skills you really need in practice. After all, what’s the use of learning theory that nobody ever needs?</p>
<p><strong>Practice projects is how you sharpen your saw in coding!</strong></p>
<p>Do you want to become a code master by focusing on practical code projects that actually earn you money and solve problems for people?</p>
<p>Then become a Python freelance developer! It’s the best way of approaching the task of improving your Python skills—even if you are a complete beginner.</p>
<p>Join my free webinar <a rel="noreferrer noopener" href="https://blog.finxter.com/webinar-freelancer/" target="_blank">“How to Build Your High-Income Skill Python”</a> and watch how I grew my coding business online and how you can, too—from the comfort of your own home.</p>
<p><a href="https://blog.finxter.com/webinar-freelancer/" target="_blank" rel="noreferrer noopener">Join the free webinar now!</a></p>
</div>


https://www.sickgaming.net/blog/2020/05/...thon-list/
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

Forum software by © MyBB Theme © iAndrew 2016