Pandas DataFrame Filtering: Using Datetime Methods

This blog will teach you how to use datetime methods in Pandas to filter data based on dates and times. You will learn how to create a datetime index and how to filter data by date, time, date range, time range, day of week, month, or year.

1. Introduction

Pandas is a popular Python library for data analysis and manipulation. One of the most common tasks in data analysis is filtering data based on certain criteria. For example, you might want to filter data by a specific value, a range of values, a condition, or a function. Filtering data can help you focus on the relevant subset of data and perform further analysis or visualization.

In this tutorial, you will learn how to use datetime methods in Pandas to filter data based on dates and times. Datetime methods are the functions, methods, and attributes that Pandas provides for working with date and time data. You will learn how to create a datetime index and how to filter data by date, time, date range, time range, day of week, month, or year. You will also learn some useful tips and tricks to make your filtering process easier and faster.

To follow this tutorial, you will need a basic understanding of Pandas and Python. You will also need to install Pandas and Jupyter Notebook on your computer; installation instructions are available on the official Pandas and Jupyter websites. You will also need the sample data set that we will use for this tutorial, saved as energy_consumption.csv. The data set contains the hourly energy consumption of a building from January 1, 2017 to December 31, 2017.

Are you ready to learn how to use datetime methods in Pandas to filter data based on dates and times? Let’s get started!

2. What are Datetime Methods in Pandas?

Datetime methods are special functions that allow you to work with date and time data in Pandas. Date and time data are very common in data analysis, as they often represent important events, trends, or patterns. For example, you might want to analyze the energy consumption of a building over time, the sales of a product by month, or the traffic of a website by hour. Datetime methods can help you manipulate, format, and filter date and time data in Pandas.

Some of the most useful datetime methods in Pandas are:

  • to_datetime(): This function converts a string, a list, a series, or a dataframe column to datetime values. A series converted this way gets the datetime64 data type, which represents a date and a time. You can use this function to parse date and time data from different formats, such as ‘YYYY-MM-DD’, ‘MM/DD/YYYY’, ‘DD-MM-YYYY’, etc.
  • dt accessor: This is an attribute that allows you to access the datetime properties of a series or a dataframe column that holds datetime values. You can use it to extract the year, month, day, hour, minute, second, weekday, quarter, etc. A DatetimeIndex does not have a dt accessor; it exposes the same properties directly, for example df.index.year.
  • resample(): This method allows you to change the frequency of data that has a datetime index. A datetime index is an index that consists of datetime values. You can use this method to aggregate or downsample data to a coarser time interval, such as daily, weekly, monthly, or quarterly. You can also use it to upsample data to a finer time interval, such as hourly or every minute, and then fill or interpolate the new rows.
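
As a rough sketch of these three tools side by side, the snippet below parses a few strings, reads properties through the dt accessor, and resamples to daily means. The three timestamps and the values 10, 20, and 30 are made up for illustration and do not come from the tutorial's data set.

```python
import pandas as pd

# Made-up timestamps for illustration only; not taken from the tutorial's data set.
raw = pd.Series(["2017-01-01 00:00:00", "2017-01-01 12:00:00", "2017-01-02 06:00:00"])

# to_datetime(): parse strings into datetime values.
stamps = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S")

# dt accessor: read datetime properties from a series.
hours = stamps.dt.hour.tolist()          # [0, 12, 6]
weekdays = stamps.dt.dayofweek.tolist()  # [6, 6, 0] (Monday is 0, Sunday is 6)

# resample(): aggregate a series that has a datetime index to one value per day.
values = pd.Series([10.0, 20.0, 30.0], index=pd.DatetimeIndex(stamps))
daily_mean = values.resample("D").mean()

print(hours, weekdays)
print(daily_mean)
```

The first two values fall on the same day, so the daily mean for 2017-01-01 is 15.0, and the single value on 2017-01-02 stays 30.0.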

These are just some of the datetime methods in Pandas. There are many more that you can explore in the Pandas documentation. In the next section, you will learn how to create a datetime index in Pandas, which is the first step toward using datetime methods to filter by dates and times.

3. How to Create a Datetime Index in Pandas?

A datetime index is an index that consists of datetime objects. An index is a special data structure that labels the rows or columns of a dataframe. A datetime index can help you access, manipulate, and filter data based on dates and times in Pandas. To create a datetime index, you need to use the to_datetime() method to convert a string, a list, a series, or a dataframe column to a datetime object. Then, you need to use the set_index() method to set the datetime object as the index of the dataframe.

Let’s see how to create a datetime index in Pandas with an example. First, you need to import Pandas and read the sample data set mentioned in the introduction. You can use the read_csv() function to read the data set from a CSV file. The data set contains the hourly energy consumption of a building from January 1, 2017 to December 31, 2017. The data set has two columns: date and consumption.

# Import Pandas
import pandas as pd

# Read the data set
df = pd.read_csv('energy_consumption.csv')

# Print the first five rows of the data set
df.head()

The output of the code above is:

date                 consumption
2017-01-01 00:00:00  29.68
2017-01-01 01:00:00  28.32
2017-01-01 02:00:00  27.20
2017-01-01 03:00:00  26.24
2017-01-01 04:00:00  25.60

As you can see, the date column contains strings that represent the date and time of each observation. To create a datetime index, you need to convert this column to datetime values using the to_datetime() function. You can pass the column as the argument of the function. You can also specify the format of the date and time using the format parameter. In this case, the format is ‘%Y-%m-%d %H:%M:%S’, which means year-month-day hour:minute:second. You can find more information about the format codes in the Python documentation for strftime() and strptime().

# Convert the date column to a datetime object
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S')

# Print the data type of the date column
df['date'].dtype

The output of the code above is:

datetime64[ns]

This means that the date column is now a datetime object with nanosecond precision. To create a datetime index, you need to use the set_index() method to set the date column as the index of the dataframe. You can pass the column name as the argument of the method. You can also use the inplace parameter to modify the dataframe in place, without creating a new copy.

# Set the date column as the index of the dataframe
df.set_index('date', inplace=True)

# Print the first five rows of the dataframe
df.head()

The output of the code above is:

date                 consumption
2017-01-01 00:00:00  29.68
2017-01-01 01:00:00  28.32
2017-01-01 02:00:00  27.20
2017-01-01 03:00:00  26.24
2017-01-01 04:00:00  25.60

As you can see, the date column is now the index of the dataframe, and it is a datetime index. You can verify this by printing the type of the index using the type() function.

# Print the type of the index
type(df.index)

The output of the code above is:

pandas.core.indexes.datetimes.DatetimeIndex

This means that the index is a DatetimeIndex object, which is a subclass of the Index object. You can find more information about the DatetimeIndex object in the Pandas API reference.
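
The steps above can be sketched end to end as follows. To keep the sketch runnable without the CSV file, it reads the five rows shown earlier from an in-memory string; the comma delimiter and the header line are assumptions about the file's layout. It also shows a shortcut, the parse_dates and index_col parameters of read_csv(), which is an alternative to the step-by-step approach rather than something this tutorial relies on.

```python
import io
import pandas as pd

# The five rows shown above, held in memory so the sketch runs without the CSV file.
# The comma-separated layout with a header line is an assumption.
csv_text = """date,consumption
2017-01-01 00:00:00,29.68
2017-01-01 01:00:00,28.32
2017-01-01 02:00:00,27.20
2017-01-01 03:00:00,26.24
2017-01-01 04:00:00,25.60
"""

# Step by step: read, convert, then set the index (returning a new dataframe).
df = pd.read_csv(io.StringIO(csv_text))
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S")
df = df.set_index("date")

# Shortcut: let read_csv() parse the column and use it as the index in one call.
df_short = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

print(type(df.index).__name__)
print(type(df_short.index).__name__)
```

Both dataframes end up with a DatetimeIndex. The exact unit shown in the dtype, such as the ns in datetime64[ns], can differ between Pandas versions.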

Congratulations! You have successfully created a datetime index in Pandas. In the next section, you will learn how to use the datetime index to filter data by date in Pandas.

4. How to Filter Data by Date in Pandas?

One of the advantages of having a datetime index in Pandas is that you can easily filter data by date. Filtering data by date means selecting the rows of the dataframe that match a specific date or a list of dates. You can use the loc or the iloc attributes to filter data by date in Pandas. The loc attribute allows you to filter data by label, while the iloc attribute allows you to filter data by position. Filtering by a range of dates is covered in Section 6.

Let’s see how to filter data by date in Pandas with an example. We will use the same dataframe that we created in the previous section, which has a datetime index and a consumption column. Suppose you want to filter the data by the date ‘2017-01-01’. You can use the loc attribute and pass the date as a string in the format ‘YYYY-MM-DD’ as the argument. Because the string names a whole day rather than a single timestamp, Pandas returns every row recorded on that day, which is 24 rows for hourly data. You can also use the iloc attribute and pass the positions of those rows. The position of a row is its row number in the dataframe, starting from zero. For example, the first row of ‘2017-01-01’ is at position 0, the first row of ‘2017-01-02’ is at position 24, and so on, so the rows of ‘2017-01-01’ are at positions 0 to 23.

# Filter the data by the date '2017-01-01' using the loc attribute
df.loc['2017-01-01']

# Select the same 24 rows by position using the iloc attribute
df.iloc[0:24]

The output of the code above begins as follows:

date                 consumption
2017-01-01 00:00:00  29.68
2017-01-01 01:00:00  28.32
2017-01-01 02:00:00  27.20
2017-01-01 03:00:00  26.24
2017-01-01 04:00:00  25.60
...

As you can see, both methods return the same result, which is a dataframe with the 24 hourly consumption values for the date ‘2017-01-01’. If you pass a single position instead, such as df.iloc[0], you get only the first row, and the output is a series, not a dataframe. If you want a single row as a dataframe, you need to use double square brackets around the argument, like this:

# Select the row at midnight on '2017-01-01' using the loc attribute and get the output as a dataframe
# (a date string inside a list is matched as an exact timestamp, not as a whole day)
df.loc[['2017-01-01']]

# Select the first row using the iloc attribute and get the output as a dataframe
df.iloc[[0]]

The output of the code above is:

date        consumption
2017-01-01  29.68

Now suppose you want to filter the data by multiple dates, such as ‘2017-01-01’, ‘2017-01-15’, and ‘2017-01-31’. You can use the same methods, but instead of passing a single date or position, you need to pass a list of dates or positions as the argument. For example, you can use the loc attribute and pass the list of dates as strings in the format ‘YYYY-MM-DD’. Each string in a list is matched as an exact timestamp, so each date selects only its row at midnight. You can also use the iloc attribute and pass the list of positions as integers. The positions are the row numbers of the dataframe, starting from zero. For example, the position of midnight on ‘2017-01-01’ is 0, on ‘2017-01-15’ it is 336, and on ‘2017-01-31’ it is 720.

# Select the midnight rows of '2017-01-01', '2017-01-15', and '2017-01-31' using the loc attribute
df.loc[['2017-01-01', '2017-01-15', '2017-01-31']]

# Select the same rows by position using the iloc attribute
df.iloc[[0, 336, 720]]

The output of the code above is:

date        consumption
2017-01-01  29.68
2017-01-15  32.64
2017-01-31  28.16

As you can see, both methods return the same result, which is the consumption value at midnight on each of the dates ‘2017-01-01’, ‘2017-01-15’, and ‘2017-01-31’. Note that the output is a dataframe, not a series. This is because we passed a list, not a single argument.
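
The difference between selecting a whole day and selecting a single timestamp is easy to miss, so here is a minimal sketch of it. It assumes a stand-in dataframe with made-up values (the row number of each hour) instead of the real energy_consumption.csv, and it uses normalize() with isin() to pick whole days for several dates, which is one possible approach among others.

```python
import pandas as pd

# Stand-in for the tutorial's dataframe: hourly rows for 2017 with made-up values.
index = pd.date_range("2017-01-01 00:00", "2017-12-31 23:00", freq="h", name="date")
df = pd.DataFrame({"consumption": range(len(index))}, index=index)

# A date string passed to loc selects the whole day (24 hourly rows).
one_day = df.loc["2017-01-01"]

# The same rows by position: hours 0 to 23 of the first day.
same_rows = df.iloc[0:24]

# Exact timestamps only: one row (midnight) per date.
wanted = pd.to_datetime(["2017-01-01", "2017-01-15", "2017-01-31"])
midnights = df.loc[wanted]

# Whole days for several dates: compare the date part of each timestamp.
several_days = df[df.index.normalize().isin(wanted)]

print(len(one_day), len(midnights), len(several_days))  # 24 3 72
```

Here normalize() resets every timestamp to midnight, so isin() matches all 24 rows of each wanted date instead of only the row at 00:00:00.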

You have learned how to filter data by date in Pandas using the loc and the iloc attributes. In the next section, you will learn how to filter data by time in Pandas.

5. How to Filter Data by Time in Pandas?

Another advantage of having a datetime index in Pandas is that you can easily filter data by time. Filtering data by time means selecting the rows of the dataframe that match a specific time of day or a list of times, on every date. The loc attribute does not help here, because Pandas interprets a string label such as ‘12:00:00’ as a date and time, not as a time of day. Instead, you can use the at_time() method for a single time and the isin() method for a list of times. You can also use the between_time() method to filter data by a range of times, which we will cover in Section 7.

Let’s see how to filter data by time in Pandas with an example. We will use the same dataframe that we created in Section 3, which has a datetime index and a consumption column. Suppose you want to filter the data by the time ‘12:00:00’. You can use the at_time() method and pass the time as a string in the format ‘HH:MM:SS’ as the argument. The method returns the row at that time on every day in the index. You can also use the iloc attribute and pass the position of a row as an integer, but this selects a single row only. The position is the row number of the dataframe, starting from zero. For example, on the first day the position of ‘00:00:00’ is 0, the position of ‘01:00:00’ is 1, and so on.

# Filter the data by the time '12:00:00' on every day using the at_time() method
df.at_time('12:00:00')

# Select only the 12:00:00 row of the first day using the iloc attribute
df.iloc[12]

The output of the iloc call is:

consumption  36.00

As you can see, iloc returns the consumption value at ‘12:00:00’ on the first day only, and the output is a series, not a dataframe. The at_time() call returns a dataframe with one such row for every day. If you want the single row as a dataframe, you need to use double square brackets around the position, like this:

# Select the 12:00:00 row of the first day using the iloc attribute and get the output as a dataframe
df.iloc[[12]]

The output of the code above is:

date                 consumption
2017-01-01 12:00:00  36.00

Now suppose you want to filter the data by multiple times, such as ‘06:00:00’, ‘12:00:00’, and ‘18:00:00’. Because the data is hourly, you can compare the hour of each timestamp with a list of hours using the isin() method, and then use the resulting boolean array to filter the dataframe. You can also use the iloc attribute and pass a list of positions as integers, which again selects rows of the first day only. For example, on the first day the position of ‘06:00:00’ is 6, the position of ‘12:00:00’ is 12, and the position of ‘18:00:00’ is 18.

# Filter the data by the times '06:00:00', '12:00:00', and '18:00:00' on every day using the isin() method
df[df.index.hour.isin([6, 12, 18])]

# Select the same three times on the first day only using the iloc attribute
df.iloc[[6, 12, 18]]

The output of the iloc call, which is also the first three rows of the isin() result, is:

date                 consumption
2017-01-01 06:00:00  30.40
2017-01-01 12:00:00  36.00
2017-01-01 18:00:00  38.40

As you can see, the output contains the consumption values for the times ‘06:00:00’, ‘12:00:00’, and ‘18:00:00’. Note that the output is a dataframe, not a series. This is because we passed a list, not a single argument.
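
If your timestamps do not all fall exactly on the hour, comparing hours alone would also match rows such as 06:30:00. The sketch below shows one way to compare full time-of-day values instead. It assumes a stand-in dataframe with made-up values rather than the real data set, and the variable names are only illustrative.

```python
from datetime import time

import pandas as pd

# Stand-in for the tutorial's dataframe: hourly rows for 2017 with made-up values.
index = pd.date_range("2017-01-01 00:00", "2017-12-31 23:00", freq="h", name="date")
df = pd.DataFrame({"consumption": range(len(index))}, index=index)

# One time of day on every date.
noon = df.at_time("12:00:00")

# Several times of day, compared as full time values rather than hours only.
wanted = [time(6, 0), time(12, 0), time(18, 0)]
mask = pd.Index(df.index.time).isin(wanted)
three_times = df[mask]

print(len(noon), len(three_times))  # 365 1095
```

The index.time attribute returns plain datetime.time objects, so wrapping it in pd.Index() is what makes the isin() method available.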

You have learned how to filter data by time in Pandas. In the next section, you will learn how to filter data by date range in Pandas.

6. How to Filter Data by Date Range in Pandas?

Sometimes, you might want to filter data by a range of dates, rather than a single date or a list of dates. For example, you might want to filter data by the first quarter of 2017, or by the month of February, or by the last week of the year. Filtering data by a range of dates means selecting the rows of the dataframe that fall within a specified start date and end date. With a datetime index, you can do this by passing a slice of date strings to the loc attribute. You can also build a boolean array by comparing the index with the start date and the end date, and then use it to filter the dataframe. The between() method does the same comparison for a series or a dataframe column of datetime values, but a DatetimeIndex does not have this method.

Let’s see how to filter data by a range of dates in Pandas with an example. We will use the same dataframe that we created in Section 3, which has a datetime index and a consumption column. Suppose you want to filter the data by the first quarter of 2017, which is from January 1, 2017 to March 31, 2017. You can use the loc attribute and pass the start date and the end date as strings in the format ‘YYYY-MM-DD’, separated by a colon. Unlike a positional slice, a label slice is inclusive of both ends, and because each string names a whole day, all rows on the end date are included.

# Filter the data by the first quarter of 2017 using a label slice
df.loc['2017-01-01':'2017-03-31']

An abridged view of the output, with one row shown per day, is:

date        consumption
2017-01-01  29.68
2017-01-02  30.40
...
2017-03-30  28.80
2017-03-31  28.16

As you can see, the output is a dataframe that contains the consumption values for the first quarter of 2017, which is from January 1, 2017 to March 31, 2017. Note that the output includes both the start date and the end date. A label slice cannot exclude its end points, so if you want to exclude the start date and the end date from the range, you can compare the date part of the index with both dates, like this:

# Filter the data by the first quarter of 2017 excluding the start date and the end date using a boolean array
days = df.index.normalize()
df[(days > '2017-01-01') & (days < '2017-03-31')]

An abridged view of the output, with one row shown per day, is:

date        consumption
2017-01-02  30.40
2017-01-03  31.20
...
2017-03-29  29.12
2017-03-30  28.80

As you can see, the output is a dataframe that contains the consumption values for the first quarter of 2017, excluding January 1, 2017 and March 31, 2017. The normalize() method sets the time of every timestamp to midnight, so the strict comparisons remove every row on the two boundary dates, not only the rows at midnight. For a dataframe column of datetime values, the between() method offers the same choice through its inclusive parameter, which accepts ‘both’, ‘neither’, ‘left’, or ‘right’.
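
To compare the index-based and column-based approaches, here is a small sketch under the assumption of a stand-in dataframe with made-up values. The string form of the inclusive parameter of Series.between() requires a reasonably recent Pandas version (1.3 or later, to the best of my knowledge).

```python
import pandas as pd

# Stand-in for the tutorial's dataframe: hourly rows for 2017 with made-up values.
index = pd.date_range("2017-01-01 00:00", "2017-12-31 23:00", freq="h", name="date")
df = pd.DataFrame({"consumption": range(len(index))}, index=index)

start, end = pd.Timestamp("2017-01-01"), pd.Timestamp("2017-03-31")

# Inclusive range on the index: every row from January 1 through the end of March 31.
q1 = df.loc["2017-01-01":"2017-03-31"]

# Same range without the two boundary dates, using comparisons on the index.
days = df.index.normalize()
inner = df[(days > start) & (days < end)]

# Column-based variant: Series.between() with the inclusive parameter.
col = df.reset_index()
inner_col = col[col["date"].dt.normalize().between(start, end, inclusive="neither")]

print(len(q1), len(inner), len(inner_col))  # 2160 2112 2112
```

The first quarter of 2017 has 90 days, so the inclusive slice returns 90 × 24 = 2160 rows, and dropping the two boundary dates leaves 88 × 24 = 2112 rows.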

You have learned how to filter data by a range of dates in Pandas. In the next section, you will learn how to filter data by a range of times in Pandas.

7. How to Filter Data by Time Range in Pandas?

In Section 5, you learned how to filter data by a single time or a list of times in Pandas. In this section, you will learn how to filter data by a range of times, such as from 9:00:00 to 17:00:00, or from 22:00:00 to 6:00:00. Filtering data by a range of times means selecting the rows of the dataframe whose time of day falls within a specified start time and end time, on every date. You can use the between_time() method to filter data by a range of times in Pandas. The between_time() method returns a dataframe that contains the rows of the original dataframe that have a time between the start time and the end time. You can also use the inclusive parameter to specify whether to include the start time and the end time in the range. It accepts ‘both’, ‘neither’, ‘left’, or ‘right’, and the default is ‘both’, which means that the range is inclusive of both the start time and the end time. Pandas versions before 1.4 used the include_start and include_end parameters for this purpose; they were removed in Pandas 2.0.

Let’s see how to filter data by a range of times in Pandas with an example. We will use the same dataframe that we created in Section 3, which has a datetime index and a consumption column. Suppose you want to filter the data by the time range from 9:00:00 to 17:00:00, which covers typical working hours. You can use the between_time() method and pass the start time and the end time as strings in the format ‘HH:MM:SS’ as the arguments.

# Filter the data by the time range from 9:00:00 to 17:00:00 using the between_time() method
df.between_time('09:00:00', '17:00:00')

The first and last rows of the output are:

date                 consumption
2017-01-01 09:00:00  32.00
2017-01-01 10:00:00  33.60
...
2017-12-31 16:00:00  36.80
2017-12-31 17:00:00  37.60

As you can see, the output is a dataframe that contains the consumption values for the time range from 9:00:00 to 17:00:00. Note that the output includes both the start time and the end time, as we did not change the default value of the inclusive parameter. If you want to exclude the start time and the end time from the range, you need to set the parameter to ‘neither’, like this:

# Filter the data by the time range from 9:00:00 to 17:00:00 excluding the start time and the end time using the between_time() method
df.between_time('09:00:00', '17:00:00', inclusive='neither')

The first and last rows of the output are:

date                 consumption
2017-01-01 10:00:00  33.60
2017-01-01 11:00:00  34.40
...
2017-12-31 15:00:00  36.00
2017-12-31 16:00:00  36.80

As you can see, the output is a dataframe that contains the consumption values for the time range from 9:00:00 to 17:00:00, excluding 9:00:00 and 17:00:00. Note that the output does not include the start time and the end time, as we set the inclusive parameter to ‘neither’.
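
The overnight range mentioned at the start of this section, from 22:00:00 to 6:00:00, works the same way: when the start time is later than the end time, between_time() wraps around midnight. The sketch below illustrates this on a stand-in dataframe with made-up values, and assumes a Pandas version that supports the inclusive parameter (1.4 or later).

```python
import pandas as pd

# Stand-in for the tutorial's dataframe: hourly rows for 2017 with made-up values.
index = pd.date_range("2017-01-01 00:00", "2017-12-31 23:00", freq="h", name="date")
df = pd.DataFrame({"consumption": range(len(index))}, index=index)

# Working hours, both end points included (the default).
work = df.between_time("09:00:00", "17:00:00")

# Working hours without 09:00:00 and 17:00:00.
strict = df.between_time("09:00:00", "17:00:00", inclusive="neither")

# Overnight range: the start time is later than the end time, so it wraps past midnight.
night = df.between_time("22:00:00", "06:00:00")

print(sorted(set(work.index.hour)))    # hours 9 to 17
print(sorted(set(strict.index.hour)))  # hours 10 to 16
print(sorted(set(night.index.hour)))   # hours 0 to 6, then 22 and 23
```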

You have learned how to filter data by a range of times in Pandas using the between_time() method. In the next section, you will learn how to filter data by day of week in Pandas.

8. How to Filter Data by Day of Week in Pandas?

Another way to filter data by date and time in Pandas is to filter data by day of week. Filtering data by day of week means selecting the rows of the dataframe that match a specific day of the week, such as Monday, Tuesday, Wednesday, etc. You can use the dayofweek attribute to filter data by day of week in Pandas. The dayofweek attribute returns the day of the week as an integer, where 0 is Monday and 6 is Sunday. A DatetimeIndex exposes this attribute directly, as df.index.dayofweek, while a series or a dataframe column of datetime values exposes it through the dt accessor. You can then use the == operator to compare the day of week with a given value and get a boolean array. You can then use the boolean array to filter the dataframe.

Let’s see how to filter data by day of week in Pandas with an example. We will use the same dataframe that we created in Section 3, which has a datetime index and a consumption column. Suppose you want to filter the data by the day of week Monday. You can use the dayofweek attribute to access the day of week of the index. You can then compare it with the value 0, which represents Monday. You can then use the resulting boolean array to filter the dataframe.

# Filter the data by the day of week Monday using the dayofweek attribute and the == operator
df[df.index.dayofweek == 0]

An abridged view of the output, with one row shown per day, is:

date        consumption
2017-01-02  30.40
2017-01-09  31.20
...
2017-12-25  29.44

As you can see, the output is a dataframe that contains the consumption values for the day of week Monday. Note that the output does not include the date ‘2017-01-01’, which is a Sunday, even though Sunday is the first day of the week in some calendars. This is because the dayofweek attribute treats Monday as the first day of the week, as the ISO 8601 standard does, although it numbers the days from 0 to 6 rather than from 1 to 7.

You can also filter the data by multiple days of week, such as Monday and Friday. Comparing with == does not work for a list of values, so you need to use the isin() method instead and pass the list [0, 4] as the argument, which represents Monday and Friday. You can then use the resulting boolean array to filter the dataframe.

# Filter the data by the days of week Monday and Friday using the isin() method and a list of values
df[df.index.dayofweek.isin([0, 4])]

An abridged view of the output, with one row shown per day, is:

date        consumption
2017-01-02  30.40
2017-01-06  32.00
...
2017-12-29  30.88

As you can see, the output is a dataframe that contains the consumption values for the days of week Monday and Friday. Note that the output does not include the dates that are not Monday or Friday, such as ‘2017-01-03’, which is a Tuesday, or ‘2017-01-07’, which is a Saturday.
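
Because the dt accessor only exists on a series or a column, it helps to see the index form and the column form next to each other. This is a minimal sketch on a stand-in dataframe with made-up values, not on the real data set.

```python
import pandas as pd

# Stand-in for the tutorial's dataframe: hourly rows for 2017 with made-up values.
index = pd.date_range("2017-01-01 00:00", "2017-12-31 23:00", freq="h", name="date")
df = pd.DataFrame({"consumption": range(len(index))}, index=index)

# Index form: the attribute is read directly from the DatetimeIndex.
mondays = df[df.index.dayofweek == 0]
mon_fri = df[df.index.dayofweek.isin([0, 4])]

# Column form: the same attribute is reached through the dt accessor.
col = df.reset_index()
mondays_col = col[col["date"].dt.dayofweek.eq(0)]

print(len(mondays), len(mon_fri), len(mondays_col))  # 1248 2496 1248
```

The year 2017 contains 52 Mondays and 52 Fridays, which gives 52 × 24 = 1248 hourly rows for each day of the week selected here.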

You have learned how to filter data by day of week in Pandas using the dayofweek attribute. In the next section, you will learn how to filter data by month or year in Pandas.

9. How to Filter Data by Month or Year in Pandas?

The last way to filter data by date and time in Pandas that we will cover in this tutorial is to filter data by month or year. Filtering data by month or year means selecting the rows of the dataframe that match a specific month or year, such as January, February, March, etc. or 2017, 2018, 2019, etc. You can use the month or the year attribute to filter data by month or year in Pandas. The month attribute returns the month as an integer, where 1 is January and 12 is December. The year attribute returns the year as an integer. A DatetimeIndex exposes both attributes directly, as df.index.month and df.index.year, while a series or a dataframe column of datetime values exposes them through the dt accessor. You can then use the == operator to compare the month or the year with a given value and get a boolean array. You can then use the boolean array to filter the dataframe.

Let’s see how to filter data by month or year in Pandas with an example. We will use the same dataframe that we created in Section 3, which has a datetime index and a consumption column. Suppose you want to filter the data by the month January. You can use the month attribute to access the month of the index. You can then compare it with the value 1, which represents January. You can then use the resulting boolean array to filter the dataframe.

# Filter the data by the month January using the month attribute and the == operator
df[df.index.month == 1]

An abridged view of the output, with one row shown per day, is:

date        consumption
2017-01-01  29.68
2017-01-02  30.40
...
2017-01-30  30.72
2017-01-31  30.08

As you can see, the output is a dataframe that contains the consumption values for the month January. Note that the output does not include the dates that are not in January, such as ‘2017-02-01’, which is in February, or ‘2017-12-31’, which is in December.

You can also filter the data by multiple months, such as January, April, and July. Comparing with == does not work for a list of values, so you need to use the isin() method instead and pass the list [1, 4, 7] as the argument, which represents January, April, and July. You can then use the resulting boolean array to filter the dataframe.

# Filter the data by the months January, April, and July using the isin() method and a list of values
df[df.index.month.isin([1, 4, 7])]

An abridged view of the output, with one row shown per day, is:

date        consumption
2017-01-01  29.68
2017-01-02  30.40
...
2017-07-30  32.96
2017-07-31  32.32

As you can see, the output is a dataframe that contains the consumption values for the months January, April, and July. Note that the output does not include the dates that are not in January, April, or July, such as ‘2017-02-01’, which is in February, or ‘2017-12-31’, which is in December.

Similarly, you can filter the data by the year 2017. You can use the year attribute to access the year of the index. You can then compare it with the value 2017. You can then use the resulting boolean array to filter the dataframe.

# Filter the data by the year 2017 using the year attribute and the == operator
df[df.index.year == 2017]

An abridged view of the output, with one row shown per day, is:

date        consumption
2017-01-01  29.68
2017-01-02  30.40
...
2017-12-30  30.56
2017-12-31  30.88

As you can see, the output is a dataframe that contains the consumption values for the year 2017. Because the sample data set covers only 2017, this filter returns every row. With a data set that spans several years, the output would not include dates outside 2017, such as ‘2018-01-01’ or ‘2016-12-31’.
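
For a single month or year, a partial date string passed to loc gives the same rows as the attribute comparison, which can be a handy shortcut. The sketch below checks this on a stand-in dataframe with made-up values; the variable names are only illustrative.

```python
import pandas as pd

# Stand-in for the tutorial's dataframe: hourly rows for 2017 with made-up values.
index = pd.date_range("2017-01-01 00:00", "2017-12-31 23:00", freq="h", name="date")
df = pd.DataFrame({"consumption": range(len(index))}, index=index)

# Attribute comparisons on the index.
january = df[df.index.month == 1]
three_months = df[df.index.month.isin([1, 4, 7])]
year_2017 = df[df.index.year == 2017]

# Partial date strings: 'YYYY-MM' selects a month, 'YYYY' selects a year.
january_loc = df.loc["2017-01"]
year_loc = df.loc["2017"]

print(len(january), len(three_months), len(year_2017))  # 744 2208 8760
print(january.equals(january_loc), year_2017.equals(year_loc))
```

The attribute form is the more general of the two, because it also covers cases such as "every January across several years" that a single partial string cannot express.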

You have learned how to filter data by month or year in Pandas. The next and final section summarizes the main points of the tutorial.

10. Conclusion

In this tutorial, you have learned how to use datetime methods in Pandas to filter data based on dates and times. You have learned how to:

  • Create a datetime index in Pandas using the to_datetime() and the set_index() methods.
  • Filter data by a single date, a list of dates, or a range of dates in Pandas using the loc attribute, the isin() method, and label slices or the between() method.
  • Filter data by a single time, a list of times, or a range of times in Pandas using the at_time(), the isin(), and the between_time() methods.
  • Filter data by day of week, month, or year in Pandas using the dayofweek, the month, or the year attribute of the datetime index (or of the dt accessor for a column), together with the == operator or the isin() method.

By using datetime methods in Pandas, you can easily and efficiently filter data based on dates and times and perform further analysis or visualization. Filtering data can help you focus on the relevant subset of data and discover important insights or patterns. Datetime methods are powerful tools that allow you to work with date and time data in Pandas.

We hope you enjoyed this tutorial and learned something new. If you want to learn more about Pandas and datetime methods, you can check out the time series section of the official Pandas documentation.

Thank you for reading this tutorial and happy coding!
