Sort Pandas Dataframe and Series

7 minute read

Sorting a dataframe by row and column values or by index is easy a task if you know how to do it using the pandas and numpy built-in functions

However sometimes you may find it confusing on how to sort values by two columns, a list of values or reset the index after sorting.

In this post we will learn sorting a dataframe and Series using the following functions

a) sort_values b) sort_index c) Categorical Series d) numpy sort and argsort e) Reindex f) And Sorted() function

Let’s create a dataframe of 11 counties with their CO2 emission and population and a column for the continent they belong to

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'Country':['USA','China','India','Russia','Switzerland','Japan','Sweden','Singapore','South Korea','UK','Australia'],
    'Co2_emission':[5107.393,10877.218,2454.774,1764.866,39.738,1320.776,np.NaN,55.018,673.324,379.150,764.866],
    'Population_million':[329,1433,1366,145,8.5,126,10,5.8,51,67,np.NaN],
    'Continent': ['NA','Asia','Asia','EU','EU','Asia','EU','Asia','Asia','EU','AU']
                 })

	Country	Co2_emission	Population_million	Continent
0	USA	5107.393	329.0	NA
1	China	10877.218	1433.0	Asia
2	India	2454.774	1366.0	Asia
3	Russia	1764.866	145.0	EU
4	Switzerland	39.738	8.5	EU
5	Japan	1320.776	126.0	Asia
6	Sweden	NaN	10.0	EU
7	Singapore	55.018	5.8	Asia
8	South Korea	673.324	51.0	Asia
9	UK	379.150	67.0	EU
10	Australia	764.866	NaN	AU

Sort dataframe by column Values

Here we are sorting the dataframe by column values i.e. population_in_million

sort_values functions sorts the dataframe by the column values provided as the first argument and setting ascending as True/False will display the results in that order

df.sort_values('Population_million',ascending = False)

	Country	Co2_emission	Population_million	Continent
1	China	10877.218	1433.0	Asia
2	India	2454.774	1366.0	Asia
0	USA	5107.393	329.0	NA
3	Russia	1764.866	145.0	EU
5	Japan	1320.776	126.0	Asia
9	UK	379.150	67.0	EU
8	South Korea	673.324	51.0	Asia
6	Sweden	NaN	10.0	EU
4	Switzerland	39.738	8.5	EU
7	Singapore	55.018	5.8	Asia
10	Australia	764.866	NaN	AU

Sort Dataframe with NaN on Top or Bottom

The column contains the NaN value and you can choose to display the NaN values either on Top or Bottom row of the sorted dataframe

na_position parameters you can set as first or last to push those values on the top or the bottom of dataframe

df.sort_values('Population_million',ascending = False,na_position = 'first')

	Country	Co2_emission	Population_million	Continent
10	Australia	764.866	NaN	AU
1	China	10877.218	1433.0	Asia
2	India	2454.774	1366.0	Asia
0	USA	5107.393	329.0	NA
3	Russia	1764.866	145.0	EU
5	Japan	1320.776	126.0	Asia
9	UK	379.150	67.0	EU
8	South Korea	673.324	51.0	Asia
6	Sweden	NaN	10.0	EU
4	Switzerland	39.738	8.5	EU
7	Singapore	55.018	5.8	Asia

Sort dataframe rows by multiple columns

You can also provide the list of columns to be sorted and their order also a list of boolean

Here we are sorting the dataframe first by column CO2_emission and then by Population_million and we want the order of the CO2_emission in ascending and population_million in Descending

df.sort_values(by = ['Co2_emission','Population_million'],ascending = [True,False])

	Country	Co2_emission	Population_million	Continent
4	Switzerland	39.738	8.5	EU
7	Singapore	55.018	5.8	Asia
9	UK	379.150	67.0	EU
8	South Korea	673.324	51.0	Asia
10	Australia	764.866	NaN	AU
5	Japan	1320.776	126.0	Asia
3	Russia	1764.866	145.0	EU
2	India	2454.774	1366.0	Asia
0	USA	5107.393	329.0	NA
1	China	10877.218	1433.0	Asia
6	Sweden	NaN	10.0	EU

Sort text column by Alphabetical order

if there is a text column and you want to sort it just alphabetically then applying sort_values on that column will do so

df.sort_values(by='Country')

	Country	Co2_emission	Population_million	Continent
10	Australia	764.866	NaN	AU
1	China	10877.218	1433.0	Asia
2	India	2454.774	1366.0	Asia
5	Japan	1320.776	126.0	Asia
3	Russia	1764.866	145.0	EU
7	Singapore	55.018	5.8	Asia
8	South Korea	673.324	51.0	Asia
6	Sweden	NaN	10.0	EU
4	Switzerland	39.738	8.5	EU
9	UK	379.150	67.0	EU
0	USA	5107.393	329.0	NA

Sort by Custom list or Dictionary using Categorical Series

Here we wanted to sort the dataframe by the continent column but in a particular custom order and not alphabetically

That custom order can be a list or the keys of a dictionary

[ “NA”, “Asia”, “AU”, “EU”])

{“NA”: 0, “Asia”: 1, “AU”: 2, “EU”: 3}

Pandas has an excellent feature called Categorical Series using which we can sort the column values in the pre-defined order

First make the Continent column a categorical and specify the ordering to use by passing a list value as the categories argument

df [ 'Continent'] = pd.Categorical(df['Continent'], categories=["NA", "Asia", "AU", "EU"])

Finally, sort the values by Continent column and it will order the rows as per the custom list passed in the categories column

df.sort_values(by='Continent')

Country	Co2_emission	Population_million	Continent
USA	5107.393	329.0	NA
China	10877.218	1433.0	Asia
India	2454.774	1366.0	Asia
Japan	1320.776	126.0	Asia
Singapore	55.018	5.8	Asia
South Korea	673.324	51.0	Asia
Australia	764.866	NaN	AU
Russia	1764.866	145.0	EU
Switzerland	39.738	8.5	EU
Sweden	NaN	10.0	EU
UK	379.150	67.0	EU

Sort dataframe based on a custom list

It’s not necessary that everytime you use the sort_values function for sorting

Here is a unique case if you have list of countries and want to arrange the dataframe rows based on the elements inside this list

First set the index of the dataframe to Country column and then pass the list as argument to the reindex function and that will Conform DataFrame to new index as per the list

reorderlist = ['China','Russia','Australia','India','Japan','Singapore','South Korea','UK','USA','Sweden','Switzerland']
df.set_index('Country',inplace=True)
df.reindex(reorderlist)

Country	Co2_emission	Population_million	Continent
China	10877.218	1433.0	Asia
Russia	1764.866	145.0	EU
Australia	764.866	NaN	AU
India	2454.774	1366.0	Asia
Japan	1320.776	126.0	Asia
Singapore	55.018	5.8	Asia
South Korea	673.324	51.0	Asia
UK	379.150	67.0	EU
USA	5107.393	329.0	NA
Sweden	NaN	10.0	EU
Switzerland	39.738	8.5	EU

reset index after sorting dataframe

When you sort the dataframe the index are moved up and down and to re-arrange the index you can use reset_index function and set drop as True to drop the index column which will be inserted as a new column

df.sort_values('Population_million',ascending = False).reset_index(drop=True)

	Country	Co2_emission	Population_million	Continent
0	China	10877.218	1433.0	Asia
1	India	2454.774	1366.0	Asia
2	USA	5107.393	329.0	NA
3	Russia	1764.866	145.0	EU
4	Japan	1320.776	126.0	Asia
5	UK	379.150	67.0	EU
6	South Korea	673.324	51.0	Asia
7	Sweden	NaN	10.0	EU
8	Switzerland	39.738	8.5	EU
9	Singapore	55.018	5.8	Asia
10	Australia	764.866	NaN	AU

Sort dataframe by date column

sort_values functions let you sort the dataframe by dates also

Let’s create a column with date range

df['Date']=pd.date_range(start='1/12/2019', end='1/22/2019', freq='D')

Sort the column by Date in descending order

df.sort_values('Date',ascending=False,inplace=True)

	Country	Co2_emission	Population_million	Continent	Date
10	Australia	764.866	NaN	AU	2019-01-22
9	UK	379.150	67.0	EU	2019-01-21
8	South Korea	673.324	51.0	Asia	2019-01-20
7	Singapore	55.018	5.8	Asia	2019-01-19
6	Sweden	NaN	10.0	EU	2019-01-18
5	Japan	1320.776	126.0	Asia	2019-01-17
4	Switzerland	39.738	8.5	EU	2019-01-16
3	Russia	1764.866	145.0	EU	2019-01-15
2	India	2454.774	1366.0	Asia	2019-01-14
1	China	10877.218	1433.0	Asia	2019-01-13
0	USA	5107.393	329.0	NA	2019-01-12

Sort dataframe by datetime index using sort_index

You can also sort the df by datetime index using the sort_index function

First set the date column created above as an index and then sort the index

df.set_index('Date').sort_index()

|Date|Country|Co2_emission|Population_million|Continent| |— |— |— |— |— | |2019-01-12|USA|5107.393|329.0|NA| |2019-01-13|China|10877.218|1433.0|Asia| |2019-01-14|India|2454.774|1366.0|Asia| |2019-01-15|Russia|1764.866|145.0|EU| |2019-01-16|Switzerland|39.738|8.5|EU| |2019-01-17|Japan|1320.776|126.0|Asia| |2019-01-18|Sweden|NaN|10.0|EU| |2019-01-19|Singapore|55.018|5.8|Asia| |2019-01-20|South Korea|673.324|51.0|Asia| |2019-01-21|UK|379.150|67.0|EU| |2019-01-22|Australia|764.866|NaN|AU|

Sort dataframe by row values

Here we will see how to sort the dataframe by a specific row values

Let’s create a dataframe of numbers first

df = pd.DataFrame(data={'x':[28,10,90], 'y':[45,58,67],'z':[19,82,37]}, index=['a', 'b', 'c'])

	x	y	z
a	28	45	19
b	10	58	82
c	90	67	37

Sort row values using sort_values

We are sorting the above dataframe based on the values in row a and by default it is ascending order

df.sort_values(by='a', axis=1)

	z	x	y
a	19	28	45
b	82	10	58
c	37	90	67

Note: In all the below cases the output will be same as the above table

Sort row values using numpy

We can also use numpy sort and argsort to achieve the same result as above

df.iloc[:, np.argsort(df.loc['a'])]

 df.apply(np.sort, axis = 1)

Alternatively, if you want to reverse the order then

df[['x', 'y', 'z']] = np.sort(df)[:, ::-1]

Sort row values using reindex

We can also use reindex and sorted function with a lambda as a key to sort the dataframe based on values in row a

df.reindex(sorted(df.columns, key=lambda x: df[x]['a']), axis=1)

Conclusion:

So we have reached to the end of this blog post and just wanted to highlight few key points that we have learn in this blog post

Sorting of the dataframe by single, multiple column values and arranging the sorted columns in ascending and descending order
How to use sort_values functions and the arguments like na_position, ascending etc.
Sorting columns based on a custom list or dictionary and using Pandas Categorical Series and reindex
How to Sort with date column and datetime index using sort_index
Sorting based on a specific row values by different methods like numpy argsort, sort_values and reindex

if there is any additional function which you know can be used to sort the rows and column values in a dataframe then please leave your comments in the comment section below

Share on

Twitter Facebook LinkedIn