Step 6: Applying functions and operations to dataframes

This blog teaches you how to use pandas methods to apply functions and operations to dataframes or their elements in Python.

1. Introduction

In this blog, you will learn how to use pandas methods to apply functions and operations to dataframes or their elements in Python. Functions and operations are useful tools for manipulating and transforming data in various ways. You will also learn how to use the apply and map methods, as well as the lambda function, to perform different kinds of functions and operations on dataframes.

Before you start, you need to have some basic knowledge of pandas and dataframes. If you are not familiar with these concepts, you can check out the previous steps of this tutorial series. You also need to have pandas installed on your computer. You can use the following command to install pandas:

pip install pandas

Once you have pandas installed, you can import it as pd and create a sample dataframe to work with. You can use the following code to create a dataframe called df with some random data:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0, 100, size=(10, 4)), columns=list('ABCD'))
df

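This will create a dataframe with 10 rows and 4 columns, labeled A, B, C, and D. The values are random integers from 0 to 99, because the upper bound of np.random.randint is exclusive. Since the data is random, the values you get will differ from the ones shown in this post. You can use this dataframe to practice applying functions and operations to dataframes.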
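If you would rather reproduce the outputs printed later in this post exactly, one option is to build the dataframe from explicit values instead of random ones. The sketch below assumes the ten rows of the sample table shown in section 2; it is a convenience for following along, not part of the original setup.

```python
import pandas as pd

# Explicit values copied from the sample table in section 2, so that the
# outputs in this post can be reproduced (an assumption for following along).
data = {
    "A": [23, 12, 89, 45, 67, 90, 34, 56, 78, 12],
    "B": [76, 45, 90, 67, 34, 12, 56, 78, 90, 34],
    "C": [35, 67, 12, 78, 56, 34, 90, 45, 67, 56],
    "D": [87, 34, 56, 90, 12, 78, 45, 67, 56, 78],
}
df = pd.DataFrame(data)
print(df.shape)  # (10, 4)
```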

Ready to get started? Let’s dive into the apply method in the next section.

2. The apply method

The apply method is one of the most powerful and versatile methods in pandas. It allows you to apply a function along an axis of a dataframe, that is, to each column or to each row, and with a suitable function to every element. You can use the apply method to perform various tasks, such as:

  • Calculating summary statistics, such as mean, median, or standard deviation.
  • Applying a transformation, such as scaling, normalization, or logarithm.
  • Performing a custom operation, such as adding, subtracting, or multiplying.
  • Applying conditional logic, such as labeling each row according to a condition.

The apply method takes a function as its first argument and an optional axis argument. With axis=0 (the default), the function is called once for each column; with axis=1, it is called once for each row. In both cases the function receives a series. The function can be a built-in function, such as sum, min, or max, a NumPy function, or a user-defined function, including a lambda function. The apply method returns a series or a dataframe, depending on what the function returns.
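To make the effect of the axis argument and the return type concrete, here is a minimal sketch. It uses a small hypothetical dataframe rather than the sample one, so the results are easy to check by eye.

```python
import pandas as pd

# Small hypothetical frame, used only for this illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# A reducing function (series in, scalar out) gives one value per column.
col_max = df.apply(lambda col: col.max())          # axis=0 is the default
# With axis=1 the same function receives each row instead.
row_max = df.apply(lambda row: row.max(), axis=1)
# A function that returns a series of the same length gives a dataframe.
doubled = df.apply(lambda col: col * 2)

print(type(col_max).__name__, type(row_max).__name__, type(doubled).__name__)
# Series Series DataFrame
```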

To illustrate how the apply method works, let’s use the sample dataframe that we created in the previous section. Recall that the dataframe has 10 rows and 4 columns, labeled A, B, C, and D, and the values are random integers between 0 and 100. You can see the dataframe below:

df
    A   B   C   D
0  23  76  35  87
1  12  45  67  34
2  89  90  12  56
3  45  67  78  90
4  67  34  56  12
5  90  12  34  78
6  34  56  90  45
7  56  78  45  67
8  78  90  67  56
9  12  34  56  78

In the next two subsections, we will see how to apply a function to each row or column of a dataframe, and how to apply a function to each element of a dataframe.

2.1. Applying a function to each row or column of a dataframe

One of the most common uses of the apply method is to apply a function to each row or column of a dataframe. This can be useful for calculating summary statistics, such as the mean, median, or standard deviation of each row or column. For example, you can use the apply method to calculate the mean of each column of the sample dataframe as follows:

df.apply(np.mean, axis=0)

This will return a series with the mean of each column, labeled A, B, C, and D. With axis=0, the function is applied to each column, so each mean is computed down the rows of that column. You can see the output below:

A    50.6
B    58.2
C    54.0
D    60.3
dtype: float64

Similarly, you can use the apply method to calculate the mean of each row of the dataframe as follows:

df.apply(np.mean, axis=1)

This will return a series with the mean of each row, labeled 0 to 9. With axis=1, the function is applied to each row, so each mean is computed across the columns of that row. You can see the output below:

0    55.25
1    39.50
2    61.75
3    70.00
4    42.25
5    53.50
6    56.25
7    61.50
8    72.75
9    45.00
dtype: float64

You can use any function that takes a series as an input and returns a scalar value as an output with the apply method. For example, you can use the built-in functions min, max, sum, or len, or you can use your own custom functions. You can also use lambda functions, which are anonymous functions that can be defined on the fly. We will learn more about lambda functions in section 4.
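As an illustration of a custom function, the sketch below computes the spread (maximum minus minimum) of each column and each row. The function name value_range is hypothetical, and the dataframe holds only the first three rows of the sample table to keep the example short.

```python
import pandas as pd

# First three rows of the sample table (shortened for this illustration).
df = pd.DataFrame({
    "A": [23, 12, 89],
    "B": [76, 45, 90],
    "C": [35, 67, 12],
    "D": [87, 34, 56],
})

def value_range(values):
    """Return the spread (max minus min) of a series."""
    return values.max() - values.min()

col_range = df.apply(value_range)          # one value per column
row_range = df.apply(value_range, axis=1)  # one value per row

print(col_range.tolist())  # [77, 45, 55, 53]
print(row_range.tolist())  # [64, 55, 78]
```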

Can you think of other examples of applying a function to each row or column of a dataframe? Try it out with the sample dataframe and see what you get.

2.2. Applying a function to each element of a dataframe

Another use of the apply method is to transform every element of a dataframe, for example by scaling, normalizing, or taking the logarithm. This works when the function is vectorized, meaning that it accepts a whole series and returns a series of the same length. NumPy functions such as np.log are vectorized, so you can apply the logarithm to each element of the sample dataframe as follows:

df.apply(np.log)

This will return a new dataframe with the logarithm of each element, with the same column labels A, B, C, and D. The apply method passes each column to np.log, and np.log computes the logarithm of every value in that column, so you do not need to change the axis argument. You can see the output below:

          A         B         C         D
0  3.135494  4.330733  3.555348  4.465908
1  2.484907  3.806662  4.204693  3.526361
2  4.488636  4.499810  2.484907  4.025352
3  3.806662  4.204693  4.356709  4.499810
4  4.204693  3.526361  4.025352  2.484907
5  4.499810  2.484907  3.526361  4.356709
6  3.526361  4.025352  4.499810  3.806662
7  4.025352  4.356709  3.806662  4.204693
8  4.356709  4.499810  4.204693  4.025352
9  2.484907  3.526361  4.025352  4.356709

You can use any vectorized function with the apply method in this way, such as the built-in abs or NumPy functions like np.sqrt and np.round. A function that only works on a single scalar value, such as math.sqrt, raises an error when it receives a whole column. For such functions, use the dataframe's element-wise method instead, which is DataFrame.map in pandas 2.1 and later and DataFrame.applymap in earlier versions. You can also use lambda functions, which are anonymous functions that can be defined on the fly. We will learn more about lambda functions in section 4.
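The difference matters for functions that only accept a single value. The sketch below uses a hypothetical parity function that would fail with df.apply, because its if expression cannot be evaluated on a whole column. It picks the element-wise method by checking which one the installed pandas provides, on the assumption that you may be on either side of the 2.1 release.

```python
import pandas as pd

# First three rows of columns A and B from the sample table.
df = pd.DataFrame({"A": [23, 12, 89], "B": [76, 45, 90]})

def parity(value):
    """Scalar in, scalar out: label a single number as even or odd."""
    return "even" if value % 2 == 0 else "odd"

# DataFrame.map exists in pandas 2.1 and later; older versions call it applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
labels = elementwise(parity)

print(labels["A"].tolist())  # ['odd', 'even', 'odd']
print(labels["B"].tolist())  # ['even', 'odd', 'even']
```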

Can you think of other examples of applying a function to each element of a dataframe? Try it out with the sample dataframe and see what you get.

3. The map method

The map method is another useful method in pandas that allows you to apply a function or a lookup to each element of a series. In this section we use Series.map, which works on one series, such as a single column, at a time (dataframes gained their own element-wise map method in pandas 2.1). The map method can be used to perform various tasks, such as:

  • Mapping a dictionary to a series, to replace the values in the series with the corresponding values in the dictionary.
  • Mapping a function to a series, to apply the function to each element of the series.
  • Mapping a series to another series, to replace the values in the first series with the corresponding values in the second series.

The map method takes a dictionary, a function, or another series as its argument and applies it to the series. The map method returns a new series with the mapped values. You can use the map method to perform different kinds of functions and operations on series.

To illustrate how the map method works, let’s use the sample dataframe that we created in section 1. Recall that the dataframe has 10 rows and 4 columns, labeled A, B, C, and D. You can see the dataframe below:

df
    A   B   C   D
0  23  76  35  87
1  12  45  67  34
2  89  90  12  56
3  45  67  78  90
4  67  34  56  12
5  90  12  34  78
6  34  56  90  45
7  56  78  45  67
8  78  90  67  56
9  12  34  56  78

In the next two subsections, we will see how to map a dictionary or a function to a series, and how to map one series to another.

3.1. Mapping a dictionary or a function to a series

One of the simplest ways to use the map method is to map a dictionary to a series. This can be useful for replacing the values in the series with the corresponding values in the dictionary. For example, you can use the map method to replace the values in the column A of the sample dataframe with the values in the following dictionary:

d = {12: 'a', 23: 'b', 34: 'c', 45: 'd', 56: 'e', 67: 'f', 78: 'g', 89: 'h', 90: 'i'}

This dictionary maps each integer value in the column A to a letter value. You can use the map method to apply this dictionary to the column A as follows:

df['A'].map(d)

This will return a new series with the letter values, labeled 0 to 9. The map method automatically replaces the values in the series with the values in the dictionary, based on the keys. You can see the output below:

0    b
1    a
2    h
3    d
4    f
5    i
6    c
7    e
8    g
9    a
Name: A, dtype: object
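One detail worth knowing is what happens when a value has no key in the dictionary: map returns NaN for it rather than raising an error. The sketch below shows this with a short hypothetical series and one way to supply a default afterwards.

```python
import pandas as pd

# Hypothetical short series; 99 is deliberately missing from the dictionary.
values = pd.Series([12, 23, 99], name="A")
d = {12: "a", 23: "b"}

mapped = values.map(d)             # 99 has no key, so it becomes NaN
filled = values.map(d).fillna("?")  # one way to supply a default value

print(mapped.isna().tolist())  # [False, False, True]
print(filled.tolist())         # ['a', 'b', '?']
```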

You can also use the map method to map a function to a series. This can be useful for applying a function to each element of the series. For example, you can use the map method to apply the square function to each element of the column B of the sample dataframe as follows:

df['B'].map(lambda x: x**2)

This will return a new series with the square of each element, labeled 0 to 9. The map method automatically applies the function to each element of the series. You can see the output below:

0    5776
1    2025
2    8100
3    4489
4    1156
5     144
6    3136
7    6084
8    8100
9    1156
Name: B, dtype: int64

You can use any function that takes a scalar value as an input and returns a scalar value as an output with the map method. For example, you can use the built-in functions abs or round, math.sqrt from the standard library, or your own custom functions. You can also use lambda functions, which are anonymous functions that can be defined on the fly. We will learn more about lambda functions in section 4.

Can you think of other examples of mapping a dictionary or a function to a series? Try it out with the sample dataframe and see what you get.

3.2. Mapping a series to another series

Another way to use the map method is to map a series to another series. This can be useful for replacing the values in the first series with the corresponding values in the second series. For example, you can use the map method to replace the values in the column C of the sample dataframe with the values in the following series:

s = pd.Series(['red', 'green', 'blue', 'yellow', 'orange', 'purple', 'pink', 'brown', 'black', 'white'], index=[12, 34, 56, 78, 90, 35, 67, 45, 89, 23])

This series maps each integer value in the column C to a color value. The index of the series specifies the keys for the mapping. You can use the map method to apply this series to the column C as follows:

df['C'].map(s)

This will return a new series with the color values, labeled 0 to 9. The map method automatically replaces the values in the first series with the values in the second series, based on the index. You can see the output below:

0    purple
1      pink
2       red
3    yellow
4      blue
5     green
6    orange
7     brown
8      pink
9      blue
Name: C, dtype: object

The series you pass to map needs an index that contains the values of the first series; any value without a matching index label becomes NaN. The values in the second series can be of any data type, such as strings, numbers, or booleans. You can also derive the new values from the series itself by passing a function instead of a lookup. For example, you can map each value in the column D of the sample dataframe to ‘high’ if it is greater than 50, and to ‘low’ otherwise. You can use the following code to do this:

df['D'].map(lambda x: 'high' if x > 50 else 'low')

This will return a new series with the values ‘high’ or ‘low’, labeled 0 to 9. The map method automatically applies the lambda function to each element of the series and returns the result. You can see the output below:

0    high
1     low
2    high
3    high
4     low
5    high
6     low
7    high
8    high
9    high
Name: D, dtype: object

As in section 3.1, you can use any function that takes a scalar value as an input and returns a scalar value as an output with the map method, including lambda functions, which we cover in section 4.
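For a simple two-way label like this, a vectorized alternative is np.where, which evaluates the condition on the whole column at once. The sketch below compares both approaches on the first five values of column D; np.where is a swapped-in technique here, not something the map method does for you.

```python
import numpy as np
import pandas as pd

# First five values of column D from the sample table.
d_col = pd.Series([87, 34, 56, 90, 12], name="D")

# Element-by-element version, as in the text above.
mapped = d_col.map(lambda x: "high" if x > 50 else "low")

# Vectorized alternative: compare the whole column in one step.
vectorized = pd.Series(
    np.where(d_col > 50, "high", "low"), index=d_col.index, name="D"
)

print(mapped.tolist())  # ['high', 'low', 'high', 'high', 'low']
print(mapped.tolist() == vectorized.tolist())  # True
```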

Can you think of other examples of mapping a series to another series? Try it out with the sample dataframe and see what you get.

4. The lambda function

4.1. Creating and using a lambda function

A lambda function is a special type of function in Python that can be defined without a name and used on the fly. Lambda functions are also known as anonymous functions or lambda expressions. You can use lambda functions to create simple and concise functions that can be passed as arguments to other functions, such as the apply or map methods. Lambda functions can be useful for applying a custom operation or a conditional logic to a dataframe or a series.

The syntax of a lambda function is as follows:

lambda arguments: expression

The lambda keyword indicates that the function is a lambda function. The arguments are the input parameters that the function takes. The expression is evaluated when the function is called, and its value is what the function returns. It must be a single Python expression; statements such as assignments, loops, or if blocks are not allowed. The lambda function can have any number of arguments, but only one expression.
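To see how this syntax relates to ordinary functions, here is a small sketch that defines the same two-argument function both ways, plus a lambda that uses a conditional expression. The names add, add_named, and label are illustrative only.

```python
# A lambda with two arguments and the equivalent named function.
add = lambda x, y: x + y

def add_named(x, y):
    return x + y

# A conditional expression is allowed in a lambda; an if statement is not.
label = lambda n: "even" if n % 2 == 0 else "odd"

print(add(2, 3), add_named(2, 3), label(7))  # 5 5 odd
```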

For example, you can define a lambda function that takes a number x as an argument and returns the square of x as follows:

lambda x: x**2

You can assign this lambda function to a variable and use it like a normal function. For example, you can assign the lambda function to a variable called square and use it to calculate the square of 5 as follows:

square = lambda x: x**2
square(5)

This will return 25 as the output. However, you don’t need to assign a lambda function to a variable to use it. You can use it directly as an argument to another function. For example, you can use the lambda function to calculate the square of each element of the column A of the sample dataframe as follows:

df['A'].map(lambda x: x**2)

This will return a new series with the square of each element, labeled 0 to 9. The map method automatically applies the lambda function to each element of the series and returns the result. You can see the output below:

0     529
1     144
2    7921
3    2025
4    4489
5    8100
6    1156
7    3136
8    6084
9     144
Name: A, dtype: int64

You can use any valid Python expression in a lambda function. For example, you can use conditional expressions, arithmetic operations, logical operators, or string methods in a lambda function. You can also use multiple arguments in a lambda function, separated by commas. For example, you can use the following lambda function to check if the sum of the values in columns A and B of the sample dataframe is greater than 100 and return ‘Yes’ or ‘No’ accordingly:

df.apply(lambda row, a, b: 'Yes' if row[a] + row[b] > 100 else 'No', axis=1, args=('A', 'B'))

This will return a new series with the values ‘Yes’ or ‘No’, labeled 0 to 9. The apply method calls the lambda function once for each row of the dataframe (axis=1). It passes the row as the first argument and the column labels from args=('A', 'B') as the extra arguments a and b, which the lambda function uses to look up the two values in the row. You can see the output below:

0     No
1     No
2    Yes
3    Yes
4    Yes
5    Yes
6     No
7    Yes
8    Yes
9     No
dtype: object
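Row-wise apply calls the function once per row, which can be slow on large dataframes. When the logic is plain arithmetic and a comparison, a vectorized expression gives the same labels. The sketch below shows both on the first three rows of columns A and B; the np.where version is an alternative technique, shown here under the assumption that a NumPy array of labels is an acceptable result.

```python
import numpy as np
import pandas as pd

# First three rows of columns A and B from the sample table.
df = pd.DataFrame({"A": [23, 12, 89], "B": [76, 45, 90]})

# Row-wise version: the lambda receives one row (a series) at a time.
by_row = df.apply(
    lambda row: "Yes" if row["A"] + row["B"] > 100 else "No", axis=1
)

# Vectorized version: add the two columns, then label all rows at once.
vectorized = np.where(df["A"] + df["B"] > 100, "Yes", "No")

print(by_row.tolist())      # ['No', 'No', 'Yes']
print(vectorized.tolist())  # ['No', 'No', 'Yes']
```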

As you can see, lambda functions are very powerful and flexible tools for creating and using functions on the fly. You can use lambda functions with the apply or map methods to perform various functions and operations on dataframes or series. We will see some examples of combining lambda with apply or map in the next section.

4.2. Combining lambda with apply or map

In the previous section, we learned how to create and use lambda functions to define simple and concise functions on the fly. In this section, we will see how to combine lambda functions with the apply or map methods to perform various functions and operations on dataframes or series.

The apply and map methods are powerful tools for applying a function or an operation to a dataframe or a series. However, sometimes you may not have a predefined function that suits your needs, or you may want to create a custom function that is specific to your problem. In such cases, you can use lambda functions to create and use your own functions on the fly, without having to define them separately.

For example, suppose you want to apply a function that calculates the percentage of each value in a series relative to the sum of the series. You can use the following lambda function to do this:

lambda x: x / x.sum() * 100

This lambda function takes a series x as an argument and returns a series with the percentage of each value. You can use this lambda function with the apply method to apply it to each column of the sample dataframe as follows:

df.apply(lambda x: x / x.sum() * 100, axis=0)

This will return a new dataframe with the percentage of each value in each column, labeled A, B, C, and D. The apply method automatically applies the lambda function to each column of the dataframe (axis=0) and returns the result. You can see the output below:

           A          B          C          D
0   4.545455  13.058419   6.481481  14.427861
1   2.371542   7.731959  12.407407   5.638474
2  17.588933  15.463918   2.222222   9.286899
3   8.893281  11.512027  14.444444  14.925373
4  13.241107   5.841924  10.370370   1.990050
5  17.786561   2.061856   6.296296  12.935323
6   6.719368   9.621993  16.666667   7.462687
7  11.067194  13.402062   8.333333  11.111111
8  15.415020  15.463918  12.407407   9.286899
9   2.371542   5.841924  10.370370  12.935323

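You can get the same result for a single series with the map method. Because map passes one value at a time to the function, the lambda function cannot call x.sum() on its argument. Instead, compute the column total once and divide each value by it. For example, you can do this for the column A of the sample dataframe as follows:

total = df['A'].sum()
df['A'].map(lambda x: x / total * 100)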

This will return a new series with the percentage of each value in the column A, labeled 0 to 9. The map method automatically applies the lambda function to each element of the series and returns the result. You can see the output below:

0     4.545455
1     2.371542
2    17.588933
3     8.893281
4    13.241107
5    17.786561
6     6.719368
7    11.067194
8    15.415020
9     2.371542
Name: A, dtype: float64
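Percentages like these can also be computed without apply or map, using arithmetic on the whole dataframe. The sketch below, on the first three rows of columns A and B, checks that this gives the same numbers as the apply version. Treat it as an alternative to compare against, not a replacement for the techniques in this section.

```python
import pandas as pd

# First three rows of columns A and B from the sample table.
df = pd.DataFrame({"A": [23, 12, 89], "B": [76, 45, 90]})

# Whole-frame arithmetic: df.sum() gives one total per column, and the
# division is aligned on the column labels.
pct = df / df.sum() * 100

# The apply-based version gives the same numbers.
pct_apply = df.apply(lambda col: col / col.sum() * 100)

# Each column of percentages should add up to 100.
print(pct.sum().round(6).tolist())  # [100.0, 100.0]
```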

As you can see, combining lambda with apply or map allows you to create and use your own functions on the fly, without having to define them separately. Keep in mind what each method passes to the function: apply passes a whole row or column as a series, while map passes one value at a time.

Can you think of other examples of combining lambda with apply or map? Try it out with the sample dataframe and see what you get.

5. Conclusion

In this blog, you have learned how to use pandas methods to apply functions and operations to dataframes or their elements in Python. You have also learned how to use the apply and map methods, as well as the lambda function, to perform different kinds of functions and operations on dataframes.

Here are some key points to remember:

  • The apply method allows you to apply a function to each column or each row of a dataframe; with a vectorized function, this transforms every element.
  • The map method allows you to map a dictionary, a function, or another series to a series, replacing each value in the series with the corresponding mapped value.
  • The lambda function allows you to create and use a simple and concise function on the fly, without having to define it separately.
  • You can combine lambda with apply or map to create and use your own custom functions and operations on dataframes or series.

You can use these methods and functions to manipulate and transform data in various ways, such as calculating summary statistics, applying transformations, performing custom operations, or applying conditional logic.
