[Tut] How to Get the Last N Rows of a Pandas DataFrame?


In this tutorial, we will answer three questions that commonly come up when working with large sets of data.

Problem Formulation


Given: Consider the following CSV file, loaded as a Pandas DataFrame.


import pandas as pd

df = pd.read_csv('countries.csv')
print(df)

   Country     Capital     Population        Area
0  Germany      Berlin     84,267,549     348,560
1   France       Paris     65,534,239     547,557
2    Spain      Madrid     46,787,468     498,800
3    Italy        Rome     60,301,346     294,140
4    India       Delhi  1,404,495,187   2,973,190
5      USA  Washington    334,506,463   9,147,420
6    China     Beijing  1,449,357,022   9,388,211
7   Poland      Warsaw     37,771,789     306,230
8   Russia      Moscow    146,047,418  16,376,870
9  England      London     68,529,747     241,930
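
The tutorial does not show the raw contents of countries.csv, so the following is only a sketch under an assumption: that the numbers are stored with thousands separators inside quoted fields. It reads a three-row excerpt from an in-memory string and shows that the default parsing keeps those values as text, while the thousands parameter of read_csv() turns them into integers.

```python
import io

import pandas as pd

# Assumed raw contents of countries.csv (excerpt). Numbers that contain
# commas must be quoted, otherwise each comma would act as a field separator.
csv_text = '''Country,Capital,Population,Area
Germany,Berlin,"84,267,549","348,560"
France,Paris,"65,534,239","547,557"
Spain,Madrid,"46,787,468","498,800"
'''

# Default parsing keeps the quoted numbers as strings.
as_text = pd.read_csv(io.StringIO(csv_text))

# thousands=',' asks pandas to strip the separators and parse integers.
as_numbers = pd.read_csv(io.StringIO(csv_text), thousands=',')

print(as_text['Population'].tolist())
print(as_numbers['Population'].tolist())
```

The row-selection methods in this tutorial work the same way on either version; the numeric form only matters if you later want to sort or do arithmetic on these columns.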

Here’s the list of the questions that we will be focusing upon in this article:

  • How to get the last N rows of a Pandas DataFrame?
  • How to get the last N rows from the last M columns of a Pandas DataFrame?
  • How to read the last N rows of a large CSV file in Pandas?

Recommended Read: How to Select Rows From a DataFrame Based on Column Values?

Without further delay, let us dive into the solutions to the first question and learn how to get the last N rows of a Pandas DataFrame.

Method 1: Using iloc


Approach: Use the iloc property as pandas.DataFrame.iloc[-n:].

The iloc property selects rows and columns by integer position. Negative positions count from the end of the DataFrame, so the slice [-n:] starts n rows before the end and runs through the last row, which selects the last n rows.

Code:

import pandas as pd

df = pd.read_csv('countries.csv')
# Slice from the fifth-to-last position through the end
rows = df.iloc[-5:]
print(rows)

Output:

   Country     Capital     Population        Area
5      USA  Washington    334,506,463   9,147,420
6    China     Beijing  1,449,357,022   9,388,211
7   Poland      Warsaw     37,771,789     306,230
8   Russia      Moscow    146,047,418  16,376,870
9  England      London     68,529,747     241,930

Method 2: Using tail()


Approach: Use the pandas.DataFrame.tail(n) to select the last n rows of the given DataFrame.

The tail(n) method returns the last n rows of the DataFrame. Here, n is an integer giving the number of rows to fetch from the bottom; if it is omitted, tail() returns the last 5 rows.

Code:

import pandas as pd

df = pd.read_csv('countries.csv')
# Fetch the last five rows
rows = df.tail(5)
print(rows)

Output:

   Country     Capital     Population        Area
5      USA  Washington    334,506,463   9,147,420
6    China     Beijing  1,449,357,022   9,388,211
7   Poland      Warsaw     37,771,789     306,230
8   Russia      Moscow    146,047,418  16,376,870
9  England      London     68,529,747     241,930
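
The two methods are interchangeable for a positive n, but they behave differently in one edge case. The sketch below uses a small stand-in DataFrame (an assumption, so that it runs without countries.csv) to compare them when n exceeds the number of rows and when n is 0.

```python
import pandas as pd

# Small stand-in DataFrame so the example runs without countries.csv.
df = pd.DataFrame({
    'Country': ['Germany', 'France', 'Spain', 'Italy'],
    'Capital': ['Berlin', 'Paris', 'Madrid', 'Rome'],
})

# For a positive n, both spellings select the same rows.
assert df.iloc[-2:].equals(df.tail(2))

# Asking for more rows than exist returns the whole DataFrame in both cases.
print(len(df.iloc[-10:]), len(df.tail(10)))

# n = 0 is the case where they differ: -0 is just 0, so iloc[-0:] is
# iloc[0:] and returns every row, while tail(0) returns an empty DataFrame.
n = 0
print(len(df.iloc[-n:]), len(df.tail(n)))
```

If n comes from user input or a calculation and may be 0, tail(n) is therefore the safer choice.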

Well, that brings us to the next question in line – “How to get the last N rows from last N columns of a Pandas DataFrame?”

Method 1: Integer Based Indexing


Approach: Call pandas.DataFrame.iloc[-n:, -m:] to display last n rows from the last m columns of the given DataFrame.

Code: In the following code snippet we will fetch the last 5 rows from the last 2 columns, i.e., Population and Area.

import pandas as pd

df = pd.read_csv('countries.csv')
# Last 5 rows, last 2 columns
rows = df.iloc[-5:, -2:]
print(rows)

Output:

      Population        Area
5    334,506,463   9,147,420
6  1,449,357,022   9,388,211
7     37,771,789     306,230
8    146,047,418  16,376,870
9     68,529,747     241,930

Method 2: Name Based Indexing


If you know the names of the columns you need and want the last N records from those columns, you can follow a two-step process.

  • Use the pandas.DataFrame.loc[:, 'start_column_name':'end_column_name'] selector. It lets you slice by column name instead of integer position, which can be more convenient. Unlike a positional slice, a label slice includes both the start and the end column.
  • .loc is for label-based indexing, so a negative number is treated as a row label rather than as a position counted from the end, and a slice such as [-n:] does not select the last n rows. Use the tail() method instead to extract the last N records from the selected columns.

Code: The following code snippet shows how you can use the column names and fetch the corresponding values from the last 5 rows of the given Dataframe.

import pandas as pd

df = pd.read_csv('countries.csv')
# Select the columns by name, then take the last 5 rows
rows = df.loc[:, 'Population':'Area']
print(rows.tail(5))

Output:

      Population        Area
5    334,506,463   9,147,420
6  1,449,357,022   9,388,211
7     37,771,789     306,230
8    146,047,418  16,376,870
9     68,529,747     241,930

Last but not least, let us solve the third and final problem of today’s tutorial – “How to read last N rows of a large csv file in Pandas?”

Unfortunately, read_csv() does not provide a parameter for reading only the last N lines of a file. This can be troublesome when you are dealing with large datasets.

A workaround is to first count the total number of lines in the file. Then pass the skiprows parameter a range of line numbers to skip, starting at 1 so that the header on line 0 is kept, which jumps directly to the row from which you want to read.

Code: In the following code snippet we will fetch the last 5 rows from the csv file into our DataFrame.

import pandas as pd


def num_of_lines(fname):
    # Count the lines in the file, including the header line
    with open(fname) as f:
        return sum(1 for _ in f)


num_lines = num_of_lines("countries.csv")
n = 5
# Skip every data line except the last n; line 0 (the header) is kept
df = pd.read_csv("countries.csv", skiprows=range(1, num_lines - n))
print(df)

Output:

   Country     Capital     Population        Area
0      USA  Washington    334,506,463   9,147,420
1    China     Beijing  1,449,357,022   9,388,211
2   Poland      Warsaw     37,771,789     306,230
3   Russia      Moscow    146,047,418  16,376,870
4  England      London     68,529,747     241,930
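
The approach above reads the file twice: once to count lines and once to parse. A possible alternative, which is not part of the original tutorial, is to keep only the last n lines in a bounded deque during a single pass and parse just those. The helper name read_last_rows and the sample file are hypothetical, and the sketch assumes the first line is the header and that no field contains an embedded line break.

```python
import io
import os
import tempfile
from collections import deque

import pandas as pd


def read_last_rows(path, n):
    """Return the last n data rows of a CSV file as a DataFrame.

    Hypothetical helper: assumes the first line is the header and that
    no field contains an embedded line break.
    """
    with open(path) as f:
        header = f.readline()
        # A deque with maxlen=n discards older lines as new ones arrive,
        # so at most n data lines are held in memory at any time.
        last_lines = deque(f, maxlen=n)
    return pd.read_csv(io.StringIO(header + ''.join(last_lines)))


# Build a small throwaway file to try the helper on.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.csv')
    with open(path, 'w') as f:
        f.write('Country,Capital\n')
        f.write('Germany,Berlin\nFrance,Paris\nSpain,Madrid\nItaly,Rome\n')
    tail_df = read_last_rows(path, 2)

print(tail_df)
```

As with the skiprows version, the resulting DataFrame gets a fresh index starting at 0 instead of the original row numbers.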

Conclusion


Phew! We have successfully solved all the problems that were presented to us in this tutorial.  I hope this tutorial helped you to sharpen your coding skills. Please stay tuned and subscribe for more interesting coding problems.

Learn Pandas the Fun Way by Solving Code Puzzles


If you want to boost your Pandas skills, consider checking out my puzzle-based learning book Coffee Break Pandas (Amazon Link).

It contains 74 hand-crafted Pandas puzzles including explanations. By solving each puzzle, you’ll get a score representing your skill level in Pandas. Can you become a Pandas Grandmaster?

Coffee Break Pandas offers a fun-based approach to data science mastery—and a truly gamified learning experience.



https://www.sickgaming.net/blog/2022/07/...dataframe/