Create an account


Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
[Tut] Pandas melt() – A Simple Guide with Video

#1
Pandas melt() – A Simple Guide with Video

In this tutorial, we will learn how to use the Pandas melt() function which turns a wide data frame into a long one. The function unpivots selected parts of the data frame, so these columns get turned into rows.




Here are the parameters from the official documentation:


Parameter Type Description
id_vars tuple, list, or ndarray, optional Column(s) to use as identifier variables.
value_vars tuple, list, or ndarray, optional Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
var_name scalar Name to use for the variable column. If None it uses frame.columns.name or ‘variable’.
value_name scalar, default ‘value’ Name to use for the value column.
col_level int or str, optional If columns are a MultiIndex then use this level to melt.
ignore_index bool, default True If True, original index is ignored. If False, the original index is retained. Index labels will be repeated as necessary.

Return Value: The pd.melt() function returns an unpivoted DataFrame.

Basic Example


We will start with an introductory example to get a basic understanding of the Pandas melt() function:

import pandas as pd
df = pd.DataFrame({'Student': ['Alice', 'Mary', 'Bob'], 'Major': ['Biology', 'CS', 'CS'], 'Age': [21, 20, 25]})
df

Student Major Age
0 Alice Biology 21
1 Mary CS 20
2 Bob CS 25

First, we import the Pandas library. Then we create a Pandas data frame that contains information about students. Finally, we output the data frame. We see each student’s name, major, and age.

Now, we apply the Pandas melt() function:

pd.melt(df, id_vars=['Student'], value_vars=['Major'])

Student variable value
0 Alice Major Biology
1 Mary Major CS
2 Bob Major CS

Inside the function, we put in the data frame and we apply the parameters “id_vars” and “value_vars“.

  • The “id_vars” parameter expects a list of the columns to use as the identifier variables. These are the columns that remain unchanged. For this example, we choose the “Student” column.
  • The “value_vars” parameter expects a list of the columns to be unpivoted. Or in other words: the columns that get transformed into rows. We select the “Major” column to be converted into rows.

The output shows a modified data frame. The “Student” column remains the same as before. But now, we neither have the “Major” column, nor the “Age” column and instead, we have the columns “variable” and “value“.

The “variable” column’s elements all say “Major” because we unpivoted the “Major” column and turned it into row values. The “value” column contains the respective “Major” value for each student.

Compared to the initial data frame, we see that the “Age” column is now completely missing. That’s because we did not incorporate it into the melt() function.

Unpivot Multiple Columns


In the previous example, we did not include the “Age” column because we did not assign it to the “id_vars” parameter or the “value_vars” parameter. All columns that are not assigned to one of these parameters are dropped.

But since the “value_vars” parameter expects a list of columns and not necessarily one column only, we can apply multiple columns to the parameter:

pd.melt(df, id_vars=['Student'], value_vars=['Major', 'Age'])

Student variable value
0 Alice Major Biology
1 Mary Major CS
2 Bob Major CS
3 Alice Age 21
4 Mary Age 20
5 Bob Age 25

Here, we run the melt() function the same way as before. But this time, we assign two columns to the “value_vars” parameter, namely the “Major” column and the “Age” column. This way, we unpivot two columns instead of one.

As we can see, the data frame now has twice as many rows as before because the “variable” column now contains two unpivoted columns, the “Major” column, and the “Age” column. Thus, the “value” column now contains values for both unpivoted columns.

The column that we use as the identifier variable (the “Student” column) holds every student’s name two times. That’s because for each student we now have a major and age in two separate rows since the “Major” column and the “Age” column were unpivoted.

For example, Alice majors in Biology and is 21 years old. These are two separate rows in the new data frame, whereas this was only one row in the initial one.

Ignore the Index


In the last section, we created a data frame that was twice as long as the initial one. The longer data frame has its own indexes as we can see here:


Student variable value
0 Alice Major Biology
1 Mary Major CS
2 Bob Major CS
3 Alice Age 21
4 Mary Age 20
5 Bob Age 25

But maybe, we want to keep the original indexes to keep track of the size of the original data frame.

We achieve that by applying the “ignore_index” parameter and assigning it “False“. In the section above this parameter was set to “True” by default thus creating new indexes and representing the length of the new data frame.

If we set it to “False“, this is what we get:

pd.melt(df, id_vars=['Student'], value_vars=['Major', 'Age'], ignore_index=False)

Student variable value
0 Alice Major Biology
1 Mary Major CS
2 Bob Major CS
0 Alice Age 21
1 Mary Age 20
2 Bob Age 25

The rest of the melt() function remains the same. We only change the “ignore_index” parameter. Now, we keep the original indexes.

For example, both “Alice” rows now have the index “0” because, in the original data frame, the “Alice” row’s index was “0” as well, whereas in the previous data frame the first “Alice” row’s index was set to “0” and the second “Alice” row’s index was set to “3”.

Multiple Identifier Columns


We already saw how we can unpivot multiple columns. We achieved that by applying multiple column names to the “value_vars” parameter which were then unpivoted.

The “id_vars” parameter also expects a list of columns, hence we can apply multiple columns here, too.

pd.melt(df, id_vars=['Student', 'Major'], value_vars=['Age'])

Student Major variable value
0 Alice Biology Age 21
1 Mary CS Age 20
2 Bob CS Age 25

We assign two columns to the “id_vars” parameter: The “Student” column and the “Major” column. So, these columns remain unchanged.

The “value_vars” parameter gets assigned the column “Age“. That means, this column gets unpivoted and thus converted into row values. So, the “variable” column’s elements all say “Age” and the “value” column contains the respective “Age” values of each student.

Applying multiple columns to both the “id_vars” parameter as well as the “value_vars” parameter makes the melt() function very flexible since we can unpivot our data frame in lots of different ways.

Label the “variable” and “value” columns


By now, when using the melt() function, the resulting data frame always contained two new columns, “variable” and “value“. These names can be confusing since in some data frames it might not be obvious what the columns contain.

Luckily, the melt() function provides us with the parameters “var_name” and “value_name“. Using these parameters, we can label the resulting “variable” and “value” columns by assigning them the desired name as a string value.

pd.melt(df, id_vars=['Student', 'Major'], value_vars=['Age'], var_name='Age column', value_name='Age values')

Student Major Age column Age values
0 Alice Biology Age 21
1 Mary CS Age 20
2 Bob CS Age 25

We apply the same melt() function as in the previous section. But this time, we use the naming parameters and assign the “var_name” parameter the string "Age column" and the “value_name” parameter the value "Age values". The resulting data frame contains these new column labels.

In this example, it was relatively easy to name the columns since we only had one unpivoted column (the “Age” column). So, we simply named the new columns after the “Age” column.

However, when we have multiple unpivoted columns, the naming process is not so straightforward. We have to find comprehensible labels.

Let’s have a look at the melt() function example from above with multiple columns to unpivot:

pd.melt(df, id_vars=['Student'], value_vars=['Major', 'Age'])

Student variable value
0 Alice Major Biology
1 Mary Major CS
2 Bob Major CS
3 Alice Age 21
4 Mary Age 20
5 Bob Age 25

The “variable” column contains the values “Major” and “Age“. And the “value” parameter contains the respective values for these variables. Consequently, we need to find names that fit here properly:

pd.melt(df, id_vars=['Student'], value_vars=['Major', 'Age'], var_name='Major/Age column', value_name='Major/Age values')

Student Major/Age column Major/Age values
0 Alice Major Biology
1 Mary Major CS
2 Bob Major CS
3 Alice Age 21
4 Mary Age 20
5 Bob Age 25

We name the “variable” column "Major/Age column" and the “value” column "Major/Age values".

It is critical to label these columns properly, so we know what they contain.

Summary


All in all, we learned how to use the melt() function and how to apply its various parameters. We saw how we can unpivot one or more columns, how to handle indexes, how to set multiple identifier columns, and how to label the newly created columns.

The melt() function allows us to unpivot our data frames in several ways.

For more tutorials about Pandas, Python libraries, Python in general, or other computer science-related topics, check out the Finxter blog page.

Happy Coding!



https://www.sickgaming.net/blog/2021/11/...ith-video/
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

[-]
Discord

[-]
Active Threads
(Indie Deal) Start saving early with Gea...
Last Post: xSicKxBot
Today 05:23 PM
» Replies: 0
» Views: 6
Mobile - Suda 51 has had meetings with M...
Last Post: xSicKxBot
Today 05:23 PM
» Replies: 0
» Views: 1
AppleInsider - Apple fined $5.6M for fai...
Last Post: xSicKxBot
Today 05:23 PM
» Replies: 0
» Views: 1
Xbox Wire - Hitman Trilogy Brings the Wo...
Last Post: xSicKxBot
Today 05:22 PM
» Replies: 0
» Views: 2
News - Godot 4 First Alpha Release
Last Post: xSicKxBot
Today 05:22 PM
» Replies: 0
» Views: 1
News - Wordle Fans, Beware: A Twitter Bo...
Last Post: xSicKxBot
Today 05:22 PM
» Replies: 0
» Views: 1
News - Japanese Magazine Nintendo Dream ...
Last Post: xSicKxBot
Today 05:47 AM
» Replies: 0
» Views: 35
News - Magnum Quest tier list and reroll...
Last Post: xSicKxBot
Today 05:47 AM
» Replies: 0
» Views: 4
News - Watch Dogs: Legion Has No More Up...
Last Post: xSicKxBot
Today 05:47 AM
» Replies: 0
» Views: 5
[Tut] Find Index of Last Substring Occur...
Last Post: xSicKxBot
Yesterday 02:21 PM
» Replies: 0
» Views: 38

[-]
Twitter



Discord Server © SickGaming.net 2012-2021