Pandas select rows between two dates
We want to select/filter rows between two dates of a dataframe which has a date as column/index
There are various ways in pandas to select rows between two dates, we will discuss about each of them in the following sections
- pandas.series.between()
- Boolean Indexing
- Pandas dataframe query
- Dataframe loc
- Dataframe truncate
How to Select rows between date?
1. Select rows using pandas.series.between():
Let’s create a dataframe with date as column and values in a date range between 10-Sep-2022 thru 2-Oct-2022
df=pd.DataFrame(pd.date_range('10-sep-2022','2-oct-2022',freq='9h'),columns=['timestamp'])
df.head()
| date | |
|---|---|
| 0 | 2022-09-10 00:00:00 |
| 1 | 2022-09-10 09:00:00 |
| 2 | 2022-09-10 18:00:00 |
| 3 | 2022-09-11 03:00:00 |
| 4 | 2022-09-11 12:00:00 |
We want to filter the rows in this dataframe(df) where rows are between 15-SEP-2022 and 2-OCT-2022
pandas.series.between() function returns a boolean vector containing True wherever the corresponding Series element is between the boundary values left and right. NA values are treated as False
Note: If your datetime column have the Pandas datetime type (e.g. datetime64[ns]), for proper filtering you need the pd.Timestamp object,
df[df.date.between('15-sep-2022','21-sep-2022')]
Out:
| date | |
|---|---|
| 14 | 2022-09-15 06:00:00 |
| 15 | 2022-09-15 15:00:00 |
| 16 | 2022-09-16 00:00:00 |
| 17 | 2022-09-16 09:00:00 |
| 18 | 2022-09-16 18:00:00 |
| 19 | 2022-09-17 03:00:00 |
| 20 | 2022-09-17 12:00:00 |
| 21 | 2022-09-17 21:00:00 |
| 22 | 2022-09-18 06:00:00 |
| 23 | 2022-09-18 15:00:00 |
| 24 | 2022-09-19 00:00:00 |
| 25 | 2022-09-19 09:00:00 |
| 26 | 2022-09-19 18:00:00 |
| 27 | 2022-09-20 03:00:00 |
| 28 | 2022-09-20 12:00:00 |
| 29 | 2022-09-20 21:00:00 |
2. Filter rows using Boolean Indexing:
We could use boolean indexing to select the rows between the dates, you can pass the dates as datetime.datetime,np.datetime64, pd.Timestamp, or even datetime strings
df[(df['date'] > '2022-09-15') & (df['date'] < '2022-09-21')]
Let’s suppose your date column is type of datetime64[ns] then you can use dt accessor to filter dates with specific format
df[(df['date'].dt.strftime('%Y-%m-%d') >= '2022-09-15')&(df['date'].dt.strftime('%Y-%m-%d') <= '2022-09-20')]
Out:
| date | |
|---|---|
| 14 | 2022-09-15 06:00:00 |
| 15 | 2022-09-15 15:00:00 |
| 16 | 2022-09-16 00:00:00 |
| 17 | 2022-09-16 09:00:00 |
| 18 | 2022-09-16 18:00:00 |
| 19 | 2022-09-17 03:00:00 |
| 20 | 2022-09-17 12:00:00 |
| 21 | 2022-09-17 21:00:00 |
| 22 | 2022-09-18 06:00:00 |
| 23 | 2022-09-18 15:00:00 |
| 24 | 2022-09-19 00:00:00 |
| 25 | 2022-09-19 09:00:00 |
| 26 | 2022-09-19 18:00:00 |
| 27 | 2022-09-20 03:00:00 |
| 28 | 2022-09-20 12:00:00 |
| 29 | 2022-09-20 21:00:00 |
3. Select rows using Dataframe query
We could also use the query to select the rows between the dates
df.query('20220915 < date < 20220921')
Out:
| date | |
|---|---|
| 14 | 2022-09-15 06:00:00 |
| 15 | 2022-09-15 15:00:00 |
| 16 | 2022-09-16 00:00:00 |
| 17 | 2022-09-16 09:00:00 |
| 18 | 2022-09-16 18:00:00 |
| 19 | 2022-09-17 03:00:00 |
| 20 | 2022-09-17 12:00:00 |
| 21 | 2022-09-17 21:00:00 |
| 22 | 2022-09-18 06:00:00 |
| 23 | 2022-09-18 15:00:00 |
| 24 | 2022-09-19 00:00:00 |
| 25 | 2022-09-19 09:00:00 |
| 26 | 2022-09-19 18:00:00 |
| 27 | 2022-09-20 03:00:00 |
| 28 | 2022-09-20 12:00:00 |
| 29 | 2022-09-20 21:00:00 |
You could also use pd.Timestamp to perform a query as shown below
ts = pd.Timestamp
df.query('@ts("20220915T100000") <= date < @ts("20220921T120000")')
Out:
| date | |
|---|---|
| 15 | 2022-09-15 15:00:00 |
| 16 | 2022-09-16 00:00:00 |
| 17 | 2022-09-16 09:00:00 |
| 18 | 2022-09-16 18:00:00 |
| 19 | 2022-09-17 03:00:00 |
| 20 | 2022-09-17 12:00:00 |
| 21 | 2022-09-17 21:00:00 |
| 22 | 2022-09-18 06:00:00 |
| 23 | 2022-09-18 15:00:00 |
| 24 | 2022-09-19 00:00:00 |
| 25 | 2022-09-19 09:00:00 |
| 26 | 2022-09-19 18:00:00 |
| 27 | 2022-09-20 03:00:00 |
| 28 | 2022-09-20 12:00:00 |
| 29 | 2022-09-20 21:00:00 |
| 30 | 2022-09-21 06:00:00 |
4. Select rows using loc
For using the .loc, we need to first set the date column as index
df=df.set_index('date')
After that sort the index
df.sort_index(inplace=True, ascending=True)
and eventually select the rows between the dates
df.loc['2022-09-15':'2022-09-20']
Out:
| date |
|---|
| 2022-09-15 06:00:00 |
| 2022-09-15 15:00:00 |
| 2022-09-16 00:00:00 |
| 2022-09-16 09:00:00 |
| 2022-09-16 18:00:00 |
| 2022-09-17 03:00:00 |
| 2022-09-17 12:00:00 |
| 2022-09-17 21:00:00 |
| 2022-09-18 06:00:00 |
| 2022-09-18 15:00:00 |
| 2022-09-19 00:00:00 |
| 2022-09-19 09:00:00 |
| 2022-09-19 18:00:00 |
| 2022-09-20 03:00:00 |
| 2022-09-20 12:00:00 |
| 2022-09-20 21:00:00 |
###
5. Select rows using dataframe truncate
It truncates a Series or DataFrame before and after some index value.
This is a useful shorthand for boolean indexing based on index values above or below certain thresholds
Because the index is a DatetimeIndex containing only dates, we can specify before and after as strings. They will be coerced to Timestamps before truncation
df.set_index('date').truncate(before='2022-09-15',
after='2022-09-20')
Out:
| date |
|---|
| 2022-09-15 06:00:00 |
| 2022-09-15 15:00:00 |
| 2022-09-16 00:00:00 |
| 2022-09-16 09:00:00 |
| 2022-09-16 18:00:00 |
| 2022-09-17 03:00:00 |
| 2022-09-17 12:00:00 |
| 2022-09-17 21:00:00 |
| 2022-09-18 06:00:00 |
| 2022-09-18 15:00:00 |
| 2022-09-19 00:00:00 |
| 2022-09-19 09:00:00 |
| 2022-09-19 18:00:00 |
| 2022-09-20 03:00:00 |
| 2022-09-20 12:00:00 |
| 2022-09-20 21:00:00 |
You can read this post to see how to filter dates by month, hour, day or last n days of weeks