Table of Contents

Problem: Given a CSV file, how to read only specific column(s) from the csv file? (Reading a specific column from a csv file will yield all the row values pertaining to that column.)
Example: Consier the following csv file (countries.csv):
Country,Capital,Population,Area
Germany,Berlin,"84,267,549","348,560"
France,Paris,"65,534,239","547,557"
Spain,Madrid,"46,787,468","498,800"
Italy,Rome,"60,301,346","294,140"
India,Delhi,"1,404,495,187","2,973,190"
USA,Washington,"334,506,463","9,147,420"
China,Beijing,"1,449,357,022","9,388,211"
Poland,Warsaw,"37,771,789","306,230"
Russia,Moscow,"146,047,418","16,376,870"
England,London,"68,529,747","241,930"
Question: How will you read the above csv file and display the following columns
Countrycolumn along with theCapitalcolumn?- All values in the
populationcolumn?
Method 1: Using Pandas
Using the Pandas library is probably the best option if you are dealing with csv files. You can easily read a csv file and store an entire column within a variable.
Code:
import pandas as pd df = pd.read_csv("countries.csv")
country = df['Country']
# or
# country = df.Country
capital = df['Capital']
# or
# capital = df.Capital # displaying selected columns (Country and Capital)
for x, y in zip(country, capital): print(f"{x} {y}") # displaying a single column (Country)
print()
print(df['Population'])
Output:
Germany Berlin
France Paris
Spain Madrid
Italy Rome
India Delhi
USA Washington
China Beijing
Poland Warsaw
Russia Moscow
England London 0 84,267,549
1 65,534,239
2 46,787,468
3 60,301,346
4 1,404,495,187
5 334,506,463
6 1,449,357,022
7 37,771,789
8 146,047,418
9 68,529,747
Name: Population, dtype: object
Explanation:
- Read the csv file using
pd.read_csv()Pandas function. - Save all the information of the columns Country and Capital within independent variables using
country = df['Country']
- Alternatively, you can also use
country = df.Country
- Alternatively, you can also use
capital = df['Capital']
- Alternatively, you can also use
capital = df.Capital
- Alternatively, you can also use
- To display the country names and their capitals simultaneously, you can bind the two columns,
countryandcapital, using the zip() function and then display each country along with its capital using a for loop upon the zipped object. - To display all the values in the population column, you can simply use
df['Population'].
TRIVIAzip() is a built-in function in Python that takes an arbitrary number of iterables and binds them into a single iterable, a zip object. It combines the n-th value of each iterable argument into a tuple.
Read more about zip() here.
➤ List-Based Indexing of a DataFrame
In case you are not comfortable with using zip() to display multiple columns at once, you have another option. You can simply use list-based indexing to accomplish your goal.
List-based indexing is a technique that allows you to pass multiple column names as a list within the square-bracket selector.
Example:
import pandas as pd df = pd.read_csv("countries.csv")
print()
print(df[['Country', 'Capital']])
Output:
Country Capital
0 Germany Berlin
1 France Paris
2 Spain Madrid
3 Italy Rome
4 India Delhi
5 USA Washington
6 China Beijing
7 Poland Warsaw
8 Russia Moscow
9 England London
Method 2: Integer Based Indexing with iloc
Approach: The idea here is to use the df.iloc[rows, columns].values to access individual columns from the DataFrame using indexing. Note that the first column always has the index 0, while the second column has index 1, and so on.
rowsis used to select individual rows. Use the slicing colon:to ensure all rows have been selected.columnsis used to select individual columns.
- Use
country = data.iloc[:, 0].valuesto save the values of the Country column. capital = data.iloc[:, 1].valuesto save the values of the Capital column.population = data.iloc[:, 2].valuesto save the values of the Population column.
- Use
import pandas as pd data = pd.read_csv('countries.csv')
country = data.iloc[:, 0].values
capital = data.iloc[:, 1].values
population = data.iloc[:, 2].values
# displaying selected columns
print(data[['Country', 'Capital']])
print()
# displaying a single column (Population)
print(population)
Output:
Country Capital
0 Germany Berlin
1 France Paris
2 Spain Madrid
3 Italy Rome
4 India Delhi
5 USA Washington
6 China Beijing
7 Poland Warsaw
8 Russia Moscow
9 England London ['84,267,549' '65,534,239' '46,787,468' '60,301,346' '1,404,495,187' '334,506,463' '1,449,357,022' '37,771,789' '146,047,418' '68,529,747']
Method 3: Name-Based Indexing with loc()
Instead of selecting the columns by their index, you can also select them by their name using the df.loc[] selecter.
The following example shows how to select the columns Country and Capital from the given DataFrame.
import pandas as pd data = pd.read_csv('countries.csv')
val = data.loc[:, ['Country', 'Capital']]
print(val)
Output:
Country Capital
0 Germany Berlin
1 France Paris
2 Spain Madrid
3 Italy Rome
4 India Delhi
5 USA Washington
6 China Beijing
7 Poland Warsaw
8 Russia Moscow
9 England London
Related Tutorial: Slicing Data from a Pandas DataFrame using .loc and .iloc
Method 4: Using csv Module
csv module is yet another spectacular option in Python that allows you to play with csv files. Let us have a look at the code that helps us to read the given csv file and then read specific columns from it:
import csv population = []
with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile: data = csv.DictReader(csvfile) for r in data: print("Country", ":", "Capital") # append values from population column to population list population.append(r['Population']) # displaying specific columns (Country and Capital) print(r['Country'], ":", r['Capital']) # display the population list print(population)
Output:
Country : Capital
Germany : Berlin
Country : Capital
France : Paris
Country : Capital
Spain : Madrid
Country : Capital
Italy : Rome
Country : Capital
India : Delhi
Country : Capital
USA : Washington
Country : Capital
China : Beijing
Country : Capital
Poland : Warsaw
Country : Capital
Russia : Moscow
Country : Capital
England : London
['84,267,549', '65,534,239', '46,787,468', '60,301,346', '1,404,495,187', '334,506,463', '1,449,357,022', '37,771,789', '146,047,418', '68,529,747']
Explanation:
- Import the
csvmodule and open up the csv file. Ensure that you feed in theencodingargument as it helps to eliminate any unreadable characters that may occur in the given csv file.
with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile
- Allow Python to read the csv file as a dictionary using
csv.Dictreaderobject. - Once the file has been read in the form of a dictionary, you can easily fetch the values from respective columns by using the keys within square bracket notation from the dictionary. Here each column represents the key within the given dictionary.
Bonus: Here’s a quick look at how the DictReader() class looks like:
import csv population = []
with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile: data = csv.DictReader(csvfile) for row in data: print(row)
Output:
{'Country': 'Germany', 'Capital': 'Berlin', 'Population': '84,267,549', 'Area': '348,560'}
{'Country': 'France', 'Capital': 'Paris', 'Population': '65,534,239', 'Area': '547,557'}
{'Country': 'Spain', 'Capital': 'Madrid', 'Population': '46,787,468', 'Area': '498,800'}
{'Country': 'Italy', 'Capital': 'Rome', 'Population': '60,301,346', 'Area': '294,140'}
{'Country': 'India', 'Capital': 'Delhi', 'Population': '1,404,495,187', 'Area': '2,973,190'}
{'Country': 'USA', 'Capital': 'Washington', 'Population': '334,506,463', 'Area': '9,147,420'}
{'Country': 'China', 'Capital': 'Beijing', 'Population': '1,449,357,022', 'Area': '9,388,211'}
{'Country': 'Poland', 'Capital': 'Warsaw', 'Population': '37,771,789', 'Area': '306,230'}
{'Country': 'Russia', 'Capital': 'Moscow', 'Population': '146,047,418', 'Area': '16,376,870'}
{'Country': 'England', 'Capital': 'London', 'Population': '68,529,747', 'Area': '241,930'}
It is evident from the output that csv.DictReader() returns a dictionary for each row such that the column header is the key while the value in the row is the associated value in the dictionary.
Conclusion
To sum things up, there are majorly four different ways of accessing specific columns from a given csv file:
- List-Based Indexing.
- Integer-Based Indexing.
- Name-Based Indexing.
- Using csv modules
DictReaderclass.
Feel free to use the one that suits you best. I hope this tutorial helped you. Please subscribe and stay tuned for more interesting tutorials. Happy learning!
Learn Pandas the Fun Way by Solving Code Puzzles
If you want to boost your Pandas skills, consider checking out my puzzle-based learning book Coffee Break Pandas (Amazon Link).
It contains 74 hand-crafted Pandas puzzles including explanations. By solving each puzzle, you’ll get a score representing your skill level in Pandas. Can you become a Pandas Grandmaster?
Coffee Break Pandas offers a fun-based approach to data science mastery—and a truly gamified learning experience.
https://www.sickgaming.net/blog/2022/05/...in-python/


