4 minute read

In this post we will see how to convert the column into rows, rows into columns, transpose one column into multiple columns and how to Pivot/Unpivot the dataframe using the following useful pandas functions:

  • Melt
  • Pivot and Pivot_table
  • stack and Unstack
  • Wide_to_long

The above functions are useful to massage a DataFrame and rotates data from rows to columns or vice-versa, possibly aggregating multiple source values into the same target row and column intersection

This post will be helpful for someone who want to format the data from wide to long or long to wide for any reporting purpose or computing the custom aggrgations

Convert column into rows

Let’s create a dataframe of Quarterly sales figure of different locations

import pandas as pd
import numpy as np

df = pd.DataFrame({
'location': ['AL', 'TX', 'MO'],
'area': ['JN', 'AU', 'JC'],
'q1': [44, 25, 36],
'q2': [29, 32, 39],
'q3': [34, 43, 24],})

df
  location Area Q1 Q2 Q3
0 AL JN 44 29 34
1 TX AU 25 32 43
2 MO JC 36 39 24

a) pd.melt()

We want to unpivot the above dataframe and convert the columns into rows, Basically we want to format this dataframe from wide to long, We can use pandas.melt() function to achive this

df.melt(id_vars=["location", "area"], 
        var_name="quarter", 
        value_name="sales")
  .sort_values(['location', 'quarter'])
  .reset_index(drop=True)

What are the parameters of the melt function?

Id_vars: It is a list of column names that will repeated for all the quarterly sales and that’s why they are got ID variables that represent the sales figure i.e. location and area in this case

Var_name: The new column name that will be created after reshaping the dataframe, In this case we want to name it quarter and it will have the values as q1, q2 and q3

Value_name: The name of the value column, we want to name this column as sales for this dataframe

Out:

Here is the reshaped output of the dataframe:

  location area quarter sales
0 AL JN q1 44
1 AL JN q2 29
2 AL JN q3 34
3 MO JC q1 36
4 MO JC q2 39
5 MO JC q3 24
6 TX AU q1 25
7 TX AU q2 32
8 TX AU q3 43

b) df.stack()

We could also use stack to get the same result as above, the stack function stack the prescribed level(s) from columns to index.

It returns a reshaped DataFrame having a multi-level index with one or more new inner-most levels compared to the original dataFrame

(df.set_index(["location", "area"])
         .stack()
         .reset_index(name='sales')
         .rename(columns={'level_2':'quarter'}))

c) pd.wide_to_long()

we could also use wide_to_long() function to Unpivot a DataFrame from wide to long format. It is less flexible but more user-friendly than melt

(pd.wide_to_long(df, stubnames=['q'], i='location', j='quarter')
.reset_index().sort_values(['location', 'quarter'])
.rename(columns={'q':'sales'}))

What are the parameters of the wide_to_long function?

stubnames: The name of the new column that represent one or more group of columns with format in the original dataframe, In our case the group of columns are Q1,Q2,Q3

i: Each row of these wide variables are assumed to be uniquely identified by i, It can be a single or list of column names

j: It is the name of the column that specify what you want to call the suffix in the stubnames, In this case Q1, Q2, Q3 represent the Quarter

Out:

Here is the resulting dataframe, we’ve renamed the column q to sales to make the dataframe looks similar to the above output dataframe

  location quarter area Sales
0 AL 1 JN 44
3 AL 2 JN 29
6 AL 3 JN 34
2 MO 1 JC 36
5 MO 2 JC 39
8 MO 3 JC 24
1 TX 1 AU 25
4 TX 2 AU 32
7 TX 3 AU 43

Transpose one column into multiple columns

We want to pivot the resulting dataframes above back to their original dataframes, which is converting the rows into columns.

Let’s create a dataframe similar to the original dataframe above

df = pd.DataFrame({
'location': ['TX', 'TX', 'TX'],
'area': ['HOU', 'PAL', 'DAL'],
'sales': [72, 59, 63],})
  location area sales
0 TX HOU 72
1 TX PAL 59
2 TX DAL 63

a) pd.pivot()

we can pivot this dataframe using pivot_table or pivot function that will create a spreadsheet-style pivot table as a DataFrame and the data is transformed and stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame

df.pivot_table(index='location', columns='area', values='sales')

#OR

df.pivot(index='location', columns='area', values='sales')

What are the parameters of the pivot() function?

Index: grouper or the column which you want to aggregate

Columns: grouper or the column, to be converted from rows to columns

Value: The value of the Index and Column that you want to display

Out:

Output Dataframe

area DAL HOU PAL
location      
TX 63 72 59

b) pd.unstack()

we can pivot at the level of index labels using unstack() function. It returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels

df.set_index(['location','area']).unstack()

Out:

Output Dataframe

  sales    
area DAL HOU PAL
location      
TX 63 72 59

Tags: ,

Categories: ,

Updated: