How to Get the Last N Rows of a Pandas DataFrame?

In this tutorial we will unearth the solutions to three commonly asked questions that users come across while dealing with huge sets of data.

Problem Formulation

Given: Consider the following CSV file, loaded into a Pandas DataFrame.

import pandas as pd

df = pd.read_csv('countries.csv')

print(df)

 Country Capital Population Area

0 Germany Berlin 84,267,549 348,560

1 France Paris 65,534,239 547,557

2 Spain Madrid 46,787,468 498,800

3 Italy Rome 60,301,346 294,140

4 India Delhi 1,404,495,187 2,973,190

5 USA Washington 334,506,463 9,147,420

6 China Beijing 1,449,357,022 9,388,211

7 Poland Warsaw 37,771,789 306,230

8 Russia Moscow 146,047,418 16,376,870

9 England London 68,529,747 241,930
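
If you do not have countries.csv at hand, the same DataFrame can be built directly in code. This is a minimal sketch that copies the values from the table above; keeping the numbers as strings is an assumption made here so that the printed output matches the article, commas included.

```python
import pandas as pd

# Sample data copied from the table above. The numeric columns are stored
# as strings (an assumption) so that the thousands separators are preserved.
data = {
    "Country": ["Germany", "France", "Spain", "Italy", "India",
                "USA", "China", "Poland", "Russia", "England"],
    "Capital": ["Berlin", "Paris", "Madrid", "Rome", "Delhi",
                "Washington", "Beijing", "Warsaw", "Moscow", "London"],
    "Population": ["84,267,549", "65,534,239", "46,787,468", "60,301,346",
                   "1,404,495,187", "334,506,463", "1,449,357,022",
                   "37,771,789", "146,047,418", "68,529,747"],
    "Area": ["348,560", "547,557", "498,800", "294,140", "2,973,190",
             "9,147,420", "9,388,211", "306,230", "16,376,870", "241,930"],
}
df = pd.DataFrame(data)

print(df)
```

Calling df.to_csv('countries.csv', index=False) afterwards would write a file that the read_csv() calls in this tutorial can load.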

Here’s the list of the questions that we will be focusing upon in this article:

How to get the last N rows of a Pandas DataFrame?
How to get last N rows from last N columns of a Pandas DataFrame?
How to read last N rows of a large csv file in Pandas?

Recommended Read: How to Select Rows From a DataFrame Based on Column Values?

Without further delay, let us dive into the solutions to the first question and learn how to get the last N rows of a Pandas DataFrame.

Method 1: Using iloc

Approach: Use the iloc property as pandas.DataFrame.iloc[-n:].

The iloc property gets or sets values by integer position. Select the last n rows by passing the slice [-n:] to iloc. Here, -n is the position of the n-th row counted from the end, so the slice runs from that row through the last row of the DataFrame.

Code:

import pandas as pd

df = pd.read_csv('countries.csv')

rows = df.iloc[-5:]

print(rows)

Output:

 Country Capital Population Area

5 USA Washington 334,506,463 9,147,420

6 China Beijing 1,449,357,022 9,388,211

7 Poland Warsaw 37,771,789 306,230

8 Russia Moscow 146,047,418 16,376,870

9 England London 68,529,747 241,930
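
Two details of the iloc approach are easy to miss: a slice that asks for more rows than exist is clipped rather than raising an error, and the selected rows keep their original index labels. The sketch below shows both on a small made-up DataFrame (the column name and values are hypothetical).

```python
import pandas as pd

# Hypothetical toy data with the default integer index 0..4.
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

last_two = df.iloc[-2:]     # rows at positions 3 and 4
too_many = df.iloc[-100:]   # the slice is clipped, so all 5 rows come back

# The original index labels are kept; reset them if you want 0-based labels.
renumbered = last_two.reset_index(drop=True)

print(last_two.index.tolist())    # [3, 4]
print(len(too_many))              # 5
print(renumbered.index.tolist())  # [0, 1]
```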

Method 2: Using tail()

Approach: Use the pandas.DataFrame.tail(n) to select the last n rows of the given DataFrame.

The tail(n) method returns the last n rows of the DataFrame. Here, n is an integer that specifies how many rows to fetch from the bottom end of the DataFrame.

Code:

import pandas as pd

df = pd.read_csv('countries.csv')

rows = df.tail(5)

print(rows)

Output:

 Country Capital Population Area

5 USA Washington 334,506,463 9,147,420

6 China Beijing 1,449,357,022 9,388,211

7 Poland Warsaw 37,771,789 306,230

8 Russia Moscow 146,047,418 16,376,870

9 England London 68,529,747 241,930
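
A few edge cases of tail() are worth knowing: n defaults to 5, a negative n returns every row except the first |n|, and n = 0 returns an empty DataFrame. The last case is where tail() and iloc differ, because iloc[-0:] is the same as iloc[0:] and returns everything. A short sketch on made-up data (the column name is hypothetical):

```python
import pandas as pd

# Hypothetical toy data with the default integer index 0..9.
df = pd.DataFrame({"value": range(10)})

default_tail = df.tail()        # n defaults to 5
all_but_first_3 = df.tail(-3)   # negative n: drop the first 3 rows
empty = df.tail(0)              # no rows, columns preserved
not_empty = df.iloc[-0:]        # -0 == 0, so this is the whole DataFrame

print(default_tail.index.tolist())     # [5, 6, 7, 8, 9]
print(all_but_first_3.index.tolist())  # [3, 4, 5, 6, 7, 8, 9]
print(len(empty), len(not_empty))      # 0 10
```

If n comes from user input or a computed value, tail(n) is therefore the safer choice when n may be zero.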

Well, that brings us to the next question in line – “How to get the last N rows from last N columns of a Pandas DataFrame?”

Method 1: Integer Based Indexing

Approach: Call pandas.DataFrame.iloc[-n:, -m:] to display last n rows from the last m columns of the given DataFrame.

Code: In the following code snippet we will fetch the last 5 rows from the last 2 columns, i.e., Population and Area.

import pandas as pd

df = pd.read_csv('countries.csv')

rows = df.iloc[-5:, -2:]

print(rows)

Output:

 Population Area

5 334,506,463 9,147,420

6 1,449,357,022 9,388,211

7 37,771,789 306,230

8 146,047,418 16,376,870

9 68,529,747 241,930
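
The row count n and the column count m are independent of each other. The following sketch uses a small made-up DataFrame (column names a, b, c are hypothetical) and also shows that chaining tail() with a column-only iloc gives the same result.

```python
import pandas as pd

# Hypothetical toy data: 4 rows, 3 columns.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [5, 6, 7, 8],
    "c": [9, 10, 11, 12],
})

n, m = 2, 2
subset = df.iloc[-n:, -m:]            # last n rows of the last m columns
chained = df.tail(n).iloc[:, -m:]     # same selection in two steps

print(subset)
print(subset.equals(chained))         # True
```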

Method 2: Name Based Indexing

If you know the names of the columns you need and want the last N records from those columns, you can follow a two-step process.

Use the pandas.DataFrame.loc[:, 'start_column_name':'end_column_name'] selector. It lets you slice by column name instead of integer position, which can be more convenient.
.loc is for label-based indexing, so a negative number is treated as a label rather than as a position counted from the end, and df.loc[-n:] does not select the last N rows. To deal with this, use the tail() method to extract the last N records from the selected columns.

Code: The following code snippet shows how you can use the column names and fetch the corresponding values from the last 5 rows of the given DataFrame.

import pandas as pd

df = pd.read_csv('countries.csv')

rows = df.loc[:, 'Population':'Area']

print(rows.tail(5))

Output:

 Population Area

5 334,506,463 9,147,420

6 1,449,357,022 9,388,211

7 37,771,789 306,230

8 146,047,418 16,376,870

9 68,529,747 241,930
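
Note that a label slice with .loc includes both endpoints, and that a slice only works for adjacent columns. For columns that are not next to each other, a list of names can be used instead. The sketch below illustrates both on the first four rows of the sample data; the variable names are chosen here for illustration.

```python
import pandas as pd

# First four rows of the sample data, built inline so the snippet is self-contained.
df = pd.DataFrame({
    "Country": ["Germany", "France", "Spain", "Italy"],
    "Capital": ["Berlin", "Paris", "Madrid", "Rome"],
    "Population": ["84,267,549", "65,534,239", "46,787,468", "60,301,346"],
    "Area": ["348,560", "547,557", "498,800", "294,140"],
})

# Label slice: both 'Capital' and 'Area' are included in the result.
by_slice = df.loc[:, "Capital":"Area"].tail(2)

# List of names: selects non-adjacent columns in the order given.
by_list = df[["Country", "Area"]].tail(2)

print(by_slice)
print(by_list)
```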

Last but not least, let us solve the third and final problem of today’s tutorial – “How to read last N rows of a large csv file in Pandas?”

Unfortunately, read_csv() has no parameter that reads only the last N lines of a file directly. This becomes a problem with large datasets, where loading the whole file just to keep its tail wastes time and memory.

Thus, a workaround to this problem is to first find out the total number of lines/records in the file. Then use the skiprows parameter to directly jump to the row/line from which you want to select the records.

Code: In the following code snippet we will fetch the last 5 rows from the CSV file into our DataFrame.

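import pandas as pd

def num_of_lines(fname):
    # Stream the file line by line so it is never held in memory all at once.
    # Returns 0 for an empty file.
    with open(fname) as f:
        return sum(1 for _ in f)

# Total line count, including the header line.
num_lines = num_of_lines("countries.csv")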

n = 5

df = pd.read_csv("countries.csv", skiprows=range(1, num_lines - n))

print(df)

Output:

 Country Capital Population Area

0 USA Washington 334,506,463 9,147,420

1 China Beijing 1,449,357,022 9,388,211

2 Poland Warsaw 37,771,789 306,230

3 Russia Moscow 146,047,418 16,376,870

4 England London 68,529,747 241,930
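
The workaround above can be wrapped in a reusable helper. The following is a sketch, not a definitive implementation: the function name read_last_n_rows and the demo file are hypothetical, and it assumes the file has exactly one header line, no trailing blank lines, and no quoted fields that contain line breaks. It also guards against n being larger than the number of rows.

```python
import os
import tempfile

import pandas as pd


def read_last_n_rows(path, n):
    """Read the header plus the last n data rows of a CSV file (hypothetical helper)."""
    # First pass: count physical lines without loading the file into memory.
    with open(path) as f:
        num_lines = sum(1 for _ in f)
    num_rows = max(num_lines - 1, 0)   # exclude the header line
    to_skip = max(num_rows - n, 0)     # nothing to skip if n >= num_rows
    # Line 0 is the header, so the skipped data lines are numbered 1..to_skip.
    return pd.read_csv(path, skiprows=range(1, to_skip + 1))


# Demo on a small made-up file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    demo = pd.DataFrame({"id": range(1, 11), "sq": [i * i for i in range(1, 11)]})
    demo.to_csv(demo_path, index=False)

    last3 = read_last_n_rows(demo_path, 3)
    everything = read_last_n_rows(demo_path, 50)

print(last3)
```

The file is still scanned twice (once to count lines, once to parse), but only the requested rows are parsed into the DataFrame.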

Conclusion

Phew! We have successfully solved all the problems that were presented to us in this tutorial. I hope this tutorial helped you to sharpen your coding skills. Please stay tuned and subscribe for more interesting coding problems.

Learn Pandas the Fun Way by Solving Code Puzzles

If you want to boost your Pandas skills, consider checking out my puzzle-based learning book Coffee Break Pandas (Amazon Link).

It contains 74 hand-crafted Pandas puzzles including explanations. By solving each puzzle, you’ll get a score representing your skill level in Pandas. Can you become a Pandas Grandmaster?

Coffee Break Pandas offers a fun-based approach to data science mastery—and a truly gamified learning experience.

https://www.sickgaming.net/blog/2022/07/...dataframe/
