4 minute read

In this post we will see how to apply a function along the axis of a dataframe using apply and applymap and how to map the values of a Series from one domain to another using map

When to use apply, applymap and map?

Apply: It is used when you want to apply a function along the axis of a dataframe, it accepts a Series whose index is either column (axis=0) or row (axis=1). For example: df.apply(np.square), it will give a dataframe with number squared

applymap: It is used for element wise operation across one or more rows and columns of a dataframe. It has been optimized and some cases work faster than apply but it’s good to compare it with apply before going for any heavier operation . Example: df.applymap(np.square), it will give a dataframe with number squared

map: It can be used only for a Series object and helps to substitutes the series value from the lookup dictionary, Series or a function and missing value will be substituted as NaN. Since it works only with series or dictionary so you can expect a better and optimized performance. Example: df[‘Col1’].map({‘Trenton’:’New Jersey’, ‘NYC’:’New York’, ‘Los Angeles’:’California’})

How to use Pandas apply?

First create a dataframe with three rows a,b and c and indexes A1,B1,C1

Create Dataframe

import pandas as pd
import numpy as np
df = pd.DataFrame({'a' : [ 10,20,30],'b' : [5,10,15],'c' : [10,100,1000]},index=['A1','B1','C1'])
df

Ouput:

Define functions

We will define two functions to apply to this dataframe, function multiply_by_2 will be applied across the column and multiply_col1_col2 will be applied across the rows of dataframe

def multiply_by_2(col):
    return col*2

def multiply_col1_col2(col):
    return col['a']*col['b']

Apply function across dataframe columns(axis=0)

Now apply the function multiply_by_2 across the columns, so by default the value of axis=0, so we have to just pass the function without axis parameter

df.apply(multiply_by_2)

Output:

All the cell values are doubled

Apply function across dataframe rows(axis=1)

Now apply the function multiply_col1_col2 across the rows of the dataframe. Here we have set the axis parameter as 1 (axis=1)

df.apply(multiply_col1_col2,axis=1)

It will return a series object with values obtained by multiplying col1 and col2 with the same indexes

A1     50
B1    200
C1    450
dtype: int64

Create a new Column col1xcol2 with the above series

Now we will create a new column using the above Series called as ‘Col1XCol2”

df['col1Xcol2'] = df.apply(multiply_col1_col2,axis=1)

Pandas apply function with Result_type parameter

It’s a parameter set to {expand, reduce or broadcast} to get the desired type of result. the default value is None

In the above scenario if result_type is set to broadcast then the output will be a dataframe substituted by the Col1xCol2 value

df.apply(multiply_col1_col2,axis=1,result_type='broadcast')

The results is broadcasted to the original shape of the frame, the original index and columns is retained

To understand result_type as expand and reduce we will first create a function that returns a list value

def multi_and_list(col):
    return [col['a']*2,col['b']*2,col['c']*2]

Now apply this function across the dataframe column with result_type as expand

df.apply(multi_and_list,axis=1,result_type='expand')

if result_type is set as expand then It returns a dataframe though the function returns a list.

result_type reduce is just opposite of expand and returns a Series if possible rather than expanding list-like results

How to use lambda with apply?

you can also use lambda expression with pandas apply function.

We will multiply the values at Col1 and Col2 as shown above using the lambda function. Since we have to apply this for each row so we will use axis=1

df.apply(lambda x: x['a']*x['b'],axis=1)

Output:

A1 50 B1 200 C1 450 dtype: int64

Pandas apply function with arguments

Many a times we have to pass an additional argument to a function and it’s a good news that You can also pass a positional argument and keyword argument to apply function.

Create a Function with argument

This function calculates the haversine distance between two geo-coordinates and takes a series of origin and dest lat and long and an additional argument radius(rad)

In the following section we will see how to pass this radius as an argument

from math import radians, cos, sin, asin, sqrt

def haversine(row,rad):

    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [row['dest_long'], row['dest_lat'],row['orig_long'], row['orig_lat']])

    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = rad # Radius of earth in kilometers. Use 3956 for miles
    return c * r

Create dataframe with origin and destination city latitude and longitude

df_coords = pd.DataFrame({'orig_city':['New York','Charlotte','Boston','Bridgewater'],
'orig_lat':[40.7128,35.2271,42.36,40.594],
'orig_long':[74.006,80.843,71.0589,74.604],
'dest_city':['Trenton','Texas','Sunnyvale','San Jose'],
'dest_lat':[40.2206,31.9686,37.3688,37.3382],
'dest_long':[40.2206,31.9686,37.3688,37.3382]})
df_coords

Apply function with arguments

Now we will find haversine distance between origin and destination city in the above dataframe. So we will apply the haversine function defined above using the apply function.

In haversine function above rad is a required argument and the dataframe doesn’t have any radius column.

We will pass the radius values as args=(3956,) in the apply function as a positional argument. to calculate distance in miles

df['haversine_dist]=df_coords.apply(haversine,axis=1,args=(3956,))

We will add the calculated haversine distance as a new column

Apply function with keyword arguments (kwds)

We will pass the radius values as keyword argument as rad=3956 in the apply function to calculate distance in kilometer

df_coords['orig_dest_haver_dist']=df_coords.apply(haversine,axis=1,rad=3956)
df_coords

How to use Pandas applymap?

As defined above, it is used for element wise operation of a dataframe and a scalar value is returned for every elements

We will square each number in the above dataframe using lambda expression with applymap function

df.applymap(lambda x: x**2)

There are more vectorized way of doing this operation is available like df *2 which is much faster and optimized

How to use Pandas map?

Maps are used to map or substitute a value from a lookup table i.e. a dictionary, function or a series here.

I would suggest to read this detailed post on how to use pandas map function


Tags: , ,

Categories: , ,

Updated: