
We have a dataframe with date or timestamp columns, and we would like to filter its rows by month, hour, or day, or by the last N days counted back from today’s date.

Pandas provides the dt accessor for the datetime-like properties of a Series. The same properties are available directly on a Timestamp or on a collection of timestamps such as a DatetimeIndex.

If you need a refresher on working with datetime in pandas, check this out.

Here are some of the DatetimeIndex properties that we will check out in this post:

  • week
  • weekday
  • day_name
  • is_month_end
  • is_month_start
  • is_year_end
  • is_year_start
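To make these properties concrete, here is a minimal sketch that evaluates them on three hand-picked dates. The dates and the name props are illustrative assumptions, not part of the dataframe built later in this post. On recent pandas versions the week attribute may not be available on a DatetimeIndex, so the sketch uses isocalendar().week instead.

```python
import pandas as pd

# Three hand-picked dates: a year start, a month end, and a year end.
idx = pd.DatetimeIndex(["2020-01-01", "2020-01-31", "2020-12-31"])

props = pd.DataFrame(
    {
        "weekday": idx.weekday,        # Monday=0 ... Sunday=6
        "day_name": idx.day_name(),    # full English day name
        # ISO week number (assumes a pandas version that has isocalendar())
        "iso_week": idx.isocalendar().week.astype(int).tolist(),
        "is_month_start": idx.is_month_start,
        "is_month_end": idx.is_month_end,
        "is_year_start": idx.is_year_start,
        "is_year_end": idx.is_year_end,
    },
    index=idx,
)
print(props)
```

Each boolean column can be used directly as a row mask, which is the pattern the rest of this post relies on.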

Let’s get started. We will first build a dataframe with the date as its index.

Create Dataframe with DatetimeIndex

We first create a series of dates with pd.date_range(), which returns a fixed-frequency DatetimeIndex of equally spaced time points.

The start date is 1-JAN-2020 and the end date is 24-DEC-2020 13:27:00, with a frequency of 37 minutes, so the series has 13955 entries and the dataframe will have that many rows.

The second column in the dataframe is value, which has the same length as the date column.

We then set the date column as the DatetimeIndex of the dataframe, which is an immutable, ndarray-like container of datetime64 data.

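import pandas as pd
import numpy as np

# Fixed-frequency DatetimeIndex: one timestamp every 37 minutes
dates = pd.date_range(start='01-JAN-2020', end='24-DEC-2020 13:27:00', freq='37min')

# Random values in [0, 100), rounded to 2 decimals to match the sample output below
df = pd.DataFrame({'date': dates, 'value': np.random.uniform(0, 100, len(dates)).round(2)})
df.set_index('date', inplace=True)
df.sample(5)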

This is what our dataframe looks like:

date value
2020-08-20 08:31:00 46.53
2020-08-15 07:02:00 81.78
2020-07-04 22:12:00 84.84
2020-01-29 23:36:00 31.49
2020-01-19 08:55:00 35.61
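As a quick sanity check on the row count quoted above, the sketch below rebuilds the same date range and inspects its length and last timestamp. The names n_rows and last_stamp are only illustrative.

```python
import pandas as pd

# Same range as in the post: every 37 minutes from 1 Jan 2020 up to the end bound.
dates = pd.date_range(start="2020-01-01", end="2020-12-24 13:27:00", freq="37min")

n_rows = len(dates)      # number of rows the dataframe will have
last_stamp = dates[-1]   # last grid point that does not exceed the end bound
print(n_rows, last_stamp)
```

The end bound itself (13:27) is not a multiple of 37 minutes from the start, so the last timestamp in the index falls slightly before it.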

Filter by time - Hour and Minutes

We want to filter the above dataframe by an hour and minute condition: the hour is greater than or equal to 10 and less than or equal to 14, and the minute is greater than or equal to 10 and less than or equal to 30.

We will use the DatetimeIndex properties hour and minute to get the hour and minute from each timestamp.

Since the index is a DatetimeIndex, we use df.index.hour and df.index.minute. If your dates are in a regular datetime column instead, use the dt accessor, for example df.date.dt.hour.

df[((df.index.hour >= 10) & (df.index.hour <= 14)) &
   (df.index.minute >= 10) & (df.index.minute <= 30)].sort_index(ascending=True).sample(5)

After the rows are filtered by the hour and minute condition, I sort them in ascending order and grab a random sample of 5. Note that sample() returns rows in random order, so the output below is not sorted.

date value
2020-11-02 13:27:00 86.03
2020-11-06 14:16:00 52.47
2020-07-04 12:20:00 87.32
2020-09-27 12:16:00 83.01
2020-11-09 13:11:00 91.82
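If the dates live in a regular column rather than in the index, the same filter can be written with the dt accessor. The following is a minimal sketch under that assumption; df_col, hours, minutes and mask are illustrative names, and Series.between is used here as a shorthand for the pair of inclusive comparisons.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2020-01-01", end="2020-12-24 13:27:00", freq="37min")

# Same data as in the post, but 'date' stays a regular column (default integer index).
df_col = pd.DataFrame({"date": dates, "value": np.random.uniform(0, 100, len(dates))})

hours = df_col["date"].dt.hour
minutes = df_col["date"].dt.minute

# between() is inclusive on both ends by default.
mask = hours.between(10, 14) & minutes.between(10, 30)
filtered = df_col[mask]
print(filtered.head())
```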

Filter by Month

Here we want to filter by month and find the dates where the month number is 2, 6, 7 or 11.

We will again use a DatetimeIndex property, this time month, to get the month from each date.

df[df.index.month.isin([7, 6, 11, 2])].sort_index(ascending=True).sample(5)
date value
2020-07-16 01:13:00 79.10
2020-07-19 06:55:00 72.70
2020-07-30 17:20:00 87.90
2020-06-09 04:18:00 87.91
2020-11-07 15:33:00 14.82

We can also filter the rows by month using the month names instead of the integer values.

df[df.index.month_name().isin(['July', 'June', 'November', 'February'])].sort_index(ascending=True).sample(5)
date value
2020-07-08 13:09:00 79.71
2020-02-07 17:16:00 49.45
2020-02-15 15:12:00 10.02
2020-07-10 05:14:00 64.78
2020-02-07 04:19:00 72.80

We can also filter by abbreviated month name by slicing the first 3 letters of each month name.

df[df.index.month_name().str[:3].isin(['Jul', 'Jun', 'Nov', 'Feb'])].sort_index(ascending=True).sample(5)

Filter last 3 months

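We can also filter by the last N months. Since pd.Timedelta has no months argument, the workaround is to take the month of the last date and subtract N from it. This compares month numbers only, so it assumes that all the data falls within a single calendar year and that the result of the subtraction stays at 1 or above.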

df[df.index.month >= (df.index.date[-1].month-3)]
date value
2020-09-01 00:29:00 94.28
2020-09-01 01:06:00 67.04
2020-09-01 01:43:00 54.11
2020-09-01 02:20:00 96.92
2020-09-01 02:57:00 97.31
…. ….
2020-12-24 10:30:00 35.17
2020-12-24 11:07:00 82.86
2020-12-24 11:44:00 61.94
2020-12-24 12:21:00 49.61
2020-12-24 12:58:00 85.49

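Alternatively, we can use an offset such as pd.offsets.MonthBegin. Comparing df.index.date (an array of datetime.date objects) with the resulting Timestamp triggers a warning in older pandas versions that comparison of Timestamp with datetime.date is deprecated, and newer versions may raise an error instead. Comparing the DatetimeIndex itself against the Timestamp avoids the problem.

df[df.index > (df.index[-1] - pd.offsets.MonthBegin(3))]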
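Another option, not used in the original snippets, is pd.DateOffset(months=N), which subtracts calendar months from a Timestamp and keeps working when the window crosses a year boundary. The sketch below uses a small illustrative dataframe (demo) that spans two years; the names demo, cutoff and recent are assumptions made for this example.

```python
import pandas as pd

# Small daily dataframe that crosses a year boundary (illustrative data only).
idx = pd.date_range("2019-11-01", "2020-02-15", freq="D")
demo = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Go back three calendar months from the last timestamp in the index.
cutoff = demo.index[-1] - pd.DateOffset(months=3)

# Keep only the rows strictly after the cutoff.
recent = demo[demo.index > cutoff]
print(cutoff, recent.index.min(), len(recent))
```

Here a month-number comparison would compute 2 - 3 = -1 and match every row, while the offset-based cutoff lands in November of the previous year.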

Filter by day of week

We want to filter the dates by day of the week. The attributes that can be used for this are day_name and weekday.

Here we will keep only the rows that fall on a Monday, Wednesday or Sunday.

df[df.index.day_name().isin(['Monday','Wednesday','Sunday'])].sort_index(ascending=True).head(5)
date value
2020-01-01 00:00:00 12.60
2020-01-01 00:37:00 27.79
2020-01-01 01:14:00 52.52
2020-01-01 01:51:00 67.40
2020-01-01 02:28:00 95.25

We can also filter by abbreviated weekday name by slicing the first 3 letters of each day name.

df[df.index.day_name().str[:3].isin(['Mon','Wed','Sun'])].sort_index(ascending=True).head(5)

Alternatively, you can filter by the index of the weekday. It starts from Monday as 0 and ends with Sunday as 6, so for Mon, Wed and Sun it would be 0, 2, 6.

df[df.index.weekday.isin([0, 2, 6])].sort_index(ascending=True).head(5)
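Rather than remembering the weekday numbering, it can be derived from a known week. This is a small sketch: it builds one week starting on Monday, 6 January 2020, and maps each day name to its weekday number. The names week, name_to_number and wanted are illustrative.

```python
import pandas as pd

# One full week, starting on a Monday.
week = pd.date_range("2020-01-06", periods=7, freq="D")

# Map each day name to the integer pandas reports for it.
name_to_number = {name: int(num) for name, num in zip(week.day_name(), week.weekday)}

# Look up the numbers for the days we want to keep.
wanted = [name_to_number[day] for day in ["Monday", "Wednesday", "Sunday"]]
print(name_to_number)
print(wanted)
```

The resulting list can be passed straight to df.index.weekday.isin(...).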

Filter by Month start and end

Filtering by month start and end can be done with the properties is_month_start and is_month_end.

We want all the dates in the dataframe that are either the start or the end of a month.

df[(df.index.is_month_end)|(df.index.is_month_start)].sort_index(ascending=True).tail()
date value
2020-04-30 14:38:00 37.32
2020-10-01 03:13:00 69.76
2020-09-30 03:47:00 48.22
2020-05-01 00:30:00 83.94
2020-09-01 10:58:00 97.83

Filter by Quarter start and end

Similar to month start and end, we can use the properties is_quarter_start and is_quarter_end.

We want all the dates in the dataframe that are either the start or the end of a quarter.
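The code for this step is not shown in the text, so here is a minimal sketch by analogy with the month filter. The descending sort is an assumption inferred from the order of the rows in the output, and the dataframe is rebuilt only so that the snippet runs on its own; quarter_edges is an illustrative name.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2020-01-01", end="2020-12-24 13:27:00", freq="37min")
df = pd.DataFrame({"value": np.random.uniform(0, 100, len(dates))}, index=dates)
df.index.name = "date"

# Keep rows whose date is the first or the last day of a quarter.
quarter_edges = df[df.index.is_quarter_start | df.index.is_quarter_end]

# Assumed ordering: newest rows first.
quarter_edges = quarter_edges.sort_index(ascending=False)
print(quarter_edges)
```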

date value
2020-10-01 23:34:00 96.62
2020-10-01 22:57:00 64.27
2020-10-01 22:20:00 56.88
2020-10-01 21:43:00 57.75
2020-10-01 21:06:00 14.60
…. ….
2020-01-01 02:28:00 95.25
2020-01-01 01:51:00 67.40
2020-01-01 01:14:00 52.52
2020-01-01 00:37:00 27.79
2020-01-01 00:00:00 12.60

Filter last N days, Weeks, hours, seconds

We want to filter by the last N days, weeks, hours or seconds. pd.Timedelta represents a duration, the difference between two dates or times.

So for the last N days, we take the date at the bottom of the index, which is the last date in the dataframe, and subtract an N-day timedelta from it.

Here we subtract 200 days, which gives us the rows from the last 200 days in the dataframe.

df[df.index.date > (df.index.date[-1]- pd.Timedelta(days=200))]
date value
2020-06-08 00:33:00 24.88
2020-06-08 01:10:00 31.50
2020-06-08 01:47:00 46.71
2020-06-08 02:24:00 50.60
2020-06-08 03:01:00 30.39
…. ….
2020-12-24 10:30:00 35.17
2020-12-24 11:07:00 82.86
2020-12-24 11:44:00 61.94
2020-12-24 12:21:00 49.61
2020-12-24 12:58:00 85.49

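Similarly, to filter rows by the last N weeks, hours or seconds, we construct a Timedelta from the corresponding keyword argument. For sub-day units such as hours or seconds, compare against df.index rather than df.index.date, because a date carries no time component. The following keywords are allowed: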

days, seconds, microseconds, milliseconds, minutes, hours, weeks

Filter last 8 weeks:

We want to filter the last 8 weeks, so we construct a timedelta and subtract it from the last date in the dataframe.

df[df.index.date > (df.index.date[-1]- pd.Timedelta(weeks=8))]
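For windows shorter than a day, comparing dates is too coarse, since subtracting a few hours from a datetime.date leaves the date unchanged. A sketch of the same idea using full timestamps follows. It assumes the index is sorted so that its last entry is the most recent one, and the names last_stamp and last_6_hours are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2020-01-01", end="2020-12-24 13:27:00", freq="37min")
df = pd.DataFrame({"value": np.random.uniform(0, 100, len(dates))}, index=dates)
df.index.name = "date"

# Most recent timestamp (assumes the index is sorted in ascending order).
last_stamp = df.index[-1]

# Compare the DatetimeIndex itself, so the time of day is taken into account.
last_6_hours = df[df.index > last_stamp - pd.Timedelta(hours=6)]
print(last_6_hours)
```

The same pattern works for minutes or seconds by changing the keyword passed to pd.Timedelta.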
