Pandas select rows and columns in MultiIndex dataframe
We want to select or slice the rows and columns of a MultiIndex dataframe. In this post we will take a look on how to slice the dataframe using the index at all levels of a row and column
A MultiIndex dataframe can have multi index for both rows and columns. The MultiIndex keeps all the defined levels of an index, even if they are not actually used.
There are four methods that could be used to select the rows/columns of MultiIndex dataframe
- Using loc and iloc, which are used to access group of rows and columns by labels and integers respectively
- Dataframe.xs, it returns cross-section from the Series/DataFrame.
- Dataframe.Query, we can query the columns of a DataFrame with a boolean expression.
- IndexSlice, It creates an object to more easily perform multi-index slicing.
We will be following the steps in this order to select rows and columns from a multiindex dataframe
- Create a MultiIndex Dataframe
- Use loc to slice the dataframe using labels
- Use iloc to slice the dataframe based on integer position of Indexes
- Using Slicers, It slice a MultiIndex by providing multiple indexers
- Use xs method, it takes a key argument to select data at a particular level of a MultiIndex
- query could be used to select rows based on conditions with help of boolean expression
- IndexSlice with default slice command to perform MultiIndex Slicing
Create MultiIndex Dataframe
We have first created a MultiIndex from the cartesian product of list of rows and columns and after that a dataframe is built using the multiIndex rows and columns.
The dataframe row index has two levels: Region and Division and similarly columns has two levels as well: Quarter(Q1 &Q2) and Buy & Sell.
import pandas as pd
import numpy as np
x = np.round(np.random.uniform(1, 5, size=(9, 4)), 2)
rowIndx = pd.MultiIndex.from_product(
[["East", "North", "South"], ["A", "B", "C"]],
names=["Region", "Division"],
)
colIndex = pd.MultiIndex.from_product(
[["Q1", "Q2"], ["Buy", "Sell"]]
)
multidf = pd.DataFrame(data=x, index=rowIndx, columns=colIndex)
multidf
Q1 | Q2 | ||||
---|---|---|---|---|---|
Region | Division | Buy | Sell | Buy | Sell |
East | A | 3.93 | 3.88 | 4.67 | 1.74 |
B | 3.88 | 1.05 | 2.95 | 1.06 | |
C | 3.77 | 4.82 | 2.79 | 1.07 | |
North | A | 4.89 | 1.57 | 2.74 | 1.30 |
B | 3.57 | 3.08 | 4.04 | 3.52 | |
C | 1.27 | 4.34 | 4.18 | 4.12 | |
South | A | 4.07 | 1.90 | 1.45 | 2.84 |
B | 4.60 | 1.27 | 2.81 | 3.96 | |
C | 3.25 | 1.85 | 2.47 | 1.46 |
Using loc and iloc to slice MultiIndex Dataframe
We want to slice the above dataframe using loc and iloc, let’s start with slicing just one single row by Region
Slice Row at level=0
The row label at level=0 i.e. Region is passed as list to slice the Region East row
multidf.loc[['East']]
This will just slice the row with Region East at level=0 and all rows at level=1
Q1 | Q2 | ||||
---|---|---|---|---|---|
Region | Division | Buy | Sell | Buy | Sell |
East | A | 3.93 | 3.88 | 4.67 | 1.74 |
B | 3.88 | 1.05 | 2.95 | 1.06 | C | 3.77 | 4.82 | 2.79 | 1.07 |
Slice Row and Column at level=0
We can slect all the rows starting Region North and just column Q1
multidf.loc['NORTH':, :'Q1':]
Q1 | |||
---|---|---|---|
Region | Division | Buy | Sell |
North | A | 4.89 | 1.57 |
B | 3.57 | 3.08 | C | 1.27 | 4.34 |
South | A | 4.07 | 1.90 |
B | 4.60 | 1.27 | |
C | 3.25 | 1.85 |
Slice Column at level=1
We want to slice with column index at level=1, so in this case we would like to see all the rows and just two columns Q1-Sell and Q2-Buy
multidf.loc[:,('Q1', 'Sell'):('Q2', 'Buy')]
Q1 | Q2 | ||
---|---|---|---|
Region | Division | Sell | Buy |
East | A | 3.88 | 4.67 |
B | 1.05 | 2.95 | C | 4.82 | 2.79 |
North | A | 1.57 | 2.74 |
B | 3.08 | 4.04 | C | 4.34 | 4.18 |
South | A | 1.90 | 1.45 |
B | 1.27 | 2.81 | C | 1.85 | 2.47 |
Slice row at level=1
Using iloc, we can select the rows and columns based on the integer position at all the levels of rows and columns, Let’s see how to do that
We want every second row at level=1 of row index Region and one column at level=1 i.e. Q2-Sell
multidf.iloc[::2,3:4]
This will skip every other row and will select the column at 3rd position at level=1. You can see the row East-B is skipped and thereafter North A & C is skipped.
Q2 | ||
---|---|---|
Region | Division | Sell |
East | A | 1.74 |
C | 1.07 | |
North | B | 3.52 |
South | A | 2.84 |
C | 1.46 |
Using slicers to slice a MultiIndex by multiple indexers
We can provide any of the selectors as if we are indexing by label, and we can use slice(None) to select all the content of that level and don’t have to specify all the deeper levels
Let’s understand how to use slicer with an example, we want to slice the rows with Region East & North at level=0 and rows B & C at level=1
multidf.loc[(slice('East', 'North'), slice('B', 'C')), :]
Q1 | Q2 | ||||
---|---|---|---|---|---|
Region | Division | Buy | Sell | Buy | Sell |
East | |||||
B | 3.88 | 1.05 | 2.95 | 1.06 | C | 3.77 | 4.82 | 2.79 | 1.07 |
North | |||||
B | 3.57 | 3.08 | 4.04 | 3.52 | |
C | 1.27 | 4.34 | 4.18 | 4.12 |
Another Example, here we want to slice the East region and column Q1-Buy
multidf.loc[(slice('East')), (slice('Q1'),slice('Buy'))]
Q1 | |||
---|---|---|---|
Region | Division | Buy | |
East | A | 3.93 | |
B | 3.88 | ||
C | 3.77 |
If you want both Q1 & Q2 buy columns
multidf.loc[(slice('East')), (slice('Q1','Q2'),slice('Buy'))]
Q1 | Q2 | ||
---|---|---|---|
Region | Division | Buy | Buy |
East | A | 3.93 | 4.67 |
B | 3.88 | 2.95 | C | 3.77 | 2.79 |
North | A | 4.89 | 2.74 |
B | 3.57 | 4.04 | |
C | 1.27 | 4.18 |
Using xs to slice a MultiIndex dataframe
This method(xs) returns a cross-section from the Series/DataFrame. It takes a key argument to select data at a particular level of a MultiIndex.
we can also specify the level to indicate which levels are used. Levels can be referred by label or position
Select rows for Region East, we can also specify the level=0 in this case
multidf.xs('East', level=0)
Q1 | Q2 | |||
---|---|---|---|---|
Division | Buy | Sell | Buy | Sell |
A | 3.93 | 3.88 | 4.67 | 1.74 |
B | 3.88 | 1.05 | 2.95 | 1.06 | C | 3.77 | 4.82 | 2.79 | 1.07 |
we can also slice by multiple levels by passing levels as list, here we want to select the rows with Region North at level=0 and Division C at level=1
multidf.xs(('North', 'C'), level=[0,1])
Q1 | Q2 | ||
---|---|---|---|
Region | Division | Sell | Buy |
North | C | 4.34 | 4.18 |
We can also slice columns, Here we want only Q1-Buy so the column index level is passed as [0, 1] and axis=1
multidf.xs(('Q1', 'Buy'), level=[0,1], axis=1)
Q1 | |||
---|---|---|---|
Region | Division | Buy | |
East | A | 3.93 | |
B | 3.88 | C | 3.77 |
North | A | 4.89 | |
B | 3.57 | C | 1.27 |
South | A | 4.07 | |
B | 4.60 | ||
C | 3.25 |
Using query to slice a MultiIndex dataframe based on condition
We want to query the columns of a DataFrame with a boolean expression. It takes an expression to be evaluated as parameter and returns the DataFrame resulting from the provided query expression.
We want to get all the rows where division is ‘A’
multidf.query("Division == 'A'")
Q1 | Q2 | ||||
---|---|---|---|---|---|
Region | Division | Buy | Sell | Buy | Sell |
East | A | 3.93 | 3.88 | 4.67 | 1.74 |
North | A | 4.89 | 1.57 | 2.74 | 1.30 |
South | A | 4.07 | 1.90 | 1.45 | 2.84 |
Select all the rows in Division A & B
multidf.query("Division.isin(['A', 'B'])")
Q1 | Q2 | ||||
---|---|---|---|---|---|
Region | Division | Buy | Sell | Buy | Sell |
East | A | 3.93 | 3.88 | 4.67 | 1.74 |
B | 3.88 | 1.05 | 2.95 | 1.06 | |
North | A | 4.89 | 1.57 | 2.74 | 1.30 |
B | 3.57 | 3.08 | 4.04 | 3.52 | |
South | A | 4.07 | 1.90 | 1.45 | 2.84 |
Using get_level_values()
This method returns an Index of values for requested level and useful to get an individual level of values from a MultiIndex
Select all rows where division is A, you can pass the label as parameter to get_level_values() method
multidf[multidf.index.get_level_values('Division') == 'A']
Q1 | Q2 | ||||
---|---|---|---|---|---|
Region | Division | Buy | Sell | Buy | Sell |
East | A | 3.93 | 3.88 | 4.67 | 1.74 |
North | A | 4.89 | 1.57 | 2.74 | 1.30 |
South | A | 4.07 | 1.90 | 1.45 | 2.84 |
We can also filter the column index, here we want all the Buy columns in the dataframe, you can also pass the index integer position as parameter
multidf.loc[:, multidf.columns.get_level_values(1) == 'Buy']
Q1 | Q2 | ||
---|---|---|---|
Region | Division | Buy | Buy |
East | A | 3.93 | 4.67 |
B | 3.88 | 2.95 | C | 3.77 | 2.79 |
North | A | 4.89 | 2.74 |
B | 3.57 | 4.04 | C | 1.27 | 4.18 |
South | A | 4.07 | 1.45 |
B | 4.60 | 2.81 | |
C | 3.25 | 2.47 |
Select rows based on conditions
Let’s select the rows based on conditions, we will use get_level_values() and loc methods to filter the dataframe
We will first define our MultiIndex condition and save it in a variable, here we want Q1-Sell>2.5 and Q1-Buy>4 and Region is North and South
condition = ((multidf[( 'Q1', 'Sell')]>2.5)
&(multidf[( 'Q1', 'Buy')]>4)
&(multidf.index.get_level_values(0).isin(['South', 'North'])))
multidf[condition]
Q1 | Q2 | ||||
---|---|---|---|---|---|
Region | Division | Buy | Sell | Buy | Sell |
North | A | 4.89 | 1.57 | 2.74 | 1.30 |
South | C | 3.25 | 1.85 | 2.47 | 1.46 |
Alternatively we can also use np.where() to filter the rows based on condition.
The numpy.where() function just returns the index of all the matching rows which are evaluated true for the condition.
multidf.iloc[np.where(condition)]
Another example where we want row where Q-Sell is maximum. we are using argmax that returns the index of the maximum value along an axis
multidf.iloc[np.argmax(multidf[( 'Q1', 'Sell')])]
Out:
Q1 Buy 1.08
Sell 4.54
Q2 Buy 4.73
Sell 2.19
Name: (North, C), dtype: float64
Using IndexSlice
It creates an object to perform multi-index slicing, which means we don’t have to construct the slices on our own.
All usages of colons : are converted into slice object. If multiple arguments are passed to the index operator, the arguments are turned into n-tuples
idx = pd.IndexSlice
multidf.loc[idx[:, 'A':'B'], :]
It returns all the Division rows A and B for all Regions
Q1 | Q2 | ||||
---|---|---|---|---|---|
Region | Division | Buy | Sell | Buy | Sell |
East | A | 3.93 | 3.88 | 4.67 | 1.74 |
B | 3.88 | 1.05 | 2.95 | 1.06 | |
North | A | 4.89 | 1.57 | 2.74 | 1.30 |
B | 3.57 | 3.08 | 4.04 | 3.52 | |
South | A | 4.07 | 1.90 | 1.45 | 2.84 |
B | 4.60 | 1.27 | 2.81 | 3.96 |
So IndexSlice requires you to specify enough levels of the MultiIndex to remove an ambiguity
Here we are slicing by Region and Division both
idx = pd.IndexSlice
multidf.loc[idx['East':'North', 'A':'B'], :]
Q1 | Q2 | ||||
---|---|---|---|---|---|
Region | Division | Buy | Sell | Buy | Sell |
East | A | 3.93 | 3.88 | 4.67 | 1.74 |
B | 3.88 | 1.05 | 2.95 | 1.06 | |
North | A | 4.89 | 1.57 | 2.74 | 1.30 |
B | 3.57 | 3.08 | 4.04 | 3.52 |
we can also slice the column, here we want only Q1 at level=0 for all the Regions and Divisions
idx = pd.IndexSlice
multidf.loc[:, :'Q1']
Q1 | |||
---|---|---|---|
Region | Division | Buy | Sell |
East | A | 3.93 | 3.88 |
B | 3.88 | 1.05 | C | 3.77 | 4.82 |
North | A | 4.89 | 1.57 |
B | 3.57 | 3.08 | C | 1.27 | 4.34 |
South | A | 4.07 | 1.90 |
B | 4.60 | 1.27 | |
C | 3.25 | 1.85 |