Linux Hint
Posted January 21, 2023

A weighted average is an average in which each value contributes in proportion to an assigned weight, so values with larger weights count for more than the others in the DataFrame. This article works through three ways to calculate a weighted average in Pandas, each with an example.

## Formula

    (values_column*weights_column).sum()/weights_column.sum()

Here, values_column is the numeric column in the Pandas DataFrame that stores the values, and weights_column is the numeric column that stores the weight of each value.

## Method 1: Return Weighted Average

Let’s use a custom function that computes the weighted average of a Pandas DataFrame. It multiplies each value by its weight, adds up the products with sum(), and divides by the sum of the weights:

    sum(DataFrame_object[weight_data]*DataFrame_object[value_data])/DataFrame_object[weight_data].sum()

Here, weight_data is the column in the DataFrame that holds the weights for the values in the value_data column.

### Example

In this example, we have a DataFrame named ‘calculations’ with 2 columns of integer type. We will create a custom function, ‘weighted_avg_calculation’, and call it with the DataFrame and the names of these two columns as arguments.

    import pandas

    # Create the DataFrame with 2 columns and 5 rows
    calculations = pandas.DataFrame.from_dict({'count': [7, 8, 9, 0, 4],
                                               'quantity': [2, 3, 4, 5, 2]})

    # Display the DataFrame - calculations
    print(calculations)

    # Custom function that calculates the weighted average
    def weighted_avg_calculation(calculations, value_data, weight_data):
        return sum(calculations[weight_data] * calculations[value_data]) / calculations[weight_data].sum()

    print()

    # Call the function by passing the DataFrame, 'quantity' as value_data and 'count' as weight_data
    print(weighted_avg_calculation(calculations, 'quantity', 'count'))

### Output

       count  quantity
    0      7         2
    1      8         3
    2      9         4
    3      0         5
    4      4         2

    2.9285714285714284

### Explanation

The custom function multiplies each ‘quantity’ by its ‘count’, adds up the products, and divides by the total of the ‘count’ column: (7*2 + 8*3 + 9*4 + 0*5 + 4*2) / (7 + 8 + 9 + 0 + 4) = 82 / 28. The weighted average of the above DataFrame is therefore approximately 2.93. The row with a count of 0 contributes nothing to the result.

## Method 2: Return Weighted Average in Groups

Now, we will use the groupby() function to group the rows and return the weighted average of each group. The apply() method is called on the result of groupby(). It takes the custom function followed by the value and weight column names, which it forwards to the function for every group.

    DataFrame_object.groupby('grouping_column').apply(weighted_avg_calculation,'value_data','weight_data')

Here, the rows are grouped based on the values in ‘grouping_column’. The weighted_avg_calculation is the custom function that computes the weighted average. The weight_data is the column in the DataFrame that holds the weights for the values in the value_data column.

### Example

In this example, we have a DataFrame named ‘calculations’ with 3 columns. We will create the custom function, ‘weighted_avg_calculation’, group the rows based on the ‘item’ column, and return the weighted average of ‘quantity’, weighted by ‘count’, in each group.

    import pandas

    # Create the DataFrame with 3 columns and 5 rows
    calculations = pandas.DataFrame.from_dict({'count': [12, 34, 56, 10, 15],
                                               'quantity': [100, 200, 345, 670, 50],
                                               'item': ['plastic', 'iron', 'iron', 'steel', 'plastic']})

    # Display the DataFrame - calculations
    print(calculations)

    # Custom function that calculates the weighted average
    def weighted_avg_calculation(calculations, value_data, weight_data):
        return sum(calculations[weight_data] * calculations[value_data]) / calculations[weight_data].sum()

    print()

    # Group the rows by 'item' and compute the weighted average of each group
    print(calculations.groupby('item').apply(weighted_avg_calculation, 'quantity', 'count'))

### Output

       count  quantity     item
    0     12       100  plastic
    1     34       200     iron
    2     56       345     iron
    3     10       670    steel
    4     15        50  plastic

    item
    iron       290.222222
    plastic     72.222222
    steel      670.000000
    dtype: float64

### Explanation

The custom function is the same as in Method 1, but it is now applied to each group separately. There are three groups in the calculations DataFrame:

- The weighted average for the ‘iron’ group is 290.22.
- The weighted average for the ‘plastic’ group is 72.22.
- The weighted average for the ‘steel’ group is 670.00. This group has a single row, so its weighted average equals that row's quantity.

## Method 3: Return Weighted Average Using NumPy

The NumPy module provides the average() function, which takes the values and the weights and returns the weighted average. The first parameter is the values column. The second, passed through the weights keyword, is the weight column.

    numpy.average(DataFrame_object['value_data'],weights=DataFrame_object['weight_data'])

### Example

In this example, we have a DataFrame named ‘calculations’ with 2 columns. We will directly use numpy.average() to calculate the weighted average.

    import pandas
    import numpy

    # Create the DataFrame with 2 columns and 5 rows
    calculations = pandas.DataFrame.from_dict({'count': [12, 34, 56, 10, 15],
                                               'quantity': [100, 200, 345, 670, 50]})

    # Display the DataFrame - calculations
    print(calculations)

    print()

    # 'quantity' holds the values and 'count' holds the weights
    print(numpy.average(calculations['quantity'], weights=calculations['count']))

### Output

       count  quantity
    0     12       100
    1     34       200
    2     56       345
    3     10       670
    4     15        50

    273.7795275590551

### Explanation

Here, the ‘quantity’ column supplies the values and the ‘count’ column supplies the weights. The weighted average is approximately 273.78.

## Conclusion

A weighted average is useful whenever some values should count for more than others. We calculated it with a custom function, with the same function applied to each group through groupby(), and with NumPy's average() function. Averages come up almost everywhere, even in the budget of a small grocery store, and on data sets with millions of rows these few lines give a quick way to compute a weighted one.
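
As a cross-check on the three methods, the sketch below recomputes the Method 2 data with both the formula and numpy.average(), overall and per group. It is an illustration rather than part of the original article: the helper name weighted_avg_numpy is hypothetical, and selecting the two numeric columns before apply() is one possible way to keep the grouping column out of the helper's input.

```python
import numpy
import pandas

# Same data as the Method 2 example
calculations = pandas.DataFrame({
    'count': [12, 34, 56, 10, 15],
    'quantity': [100, 200, 345, 670, 50],
    'item': ['plastic', 'iron', 'iron', 'steel', 'plastic'],
})

# Hypothetical helper: weighted average of one group via numpy.average()
def weighted_avg_numpy(group, value_data, weight_data):
    return numpy.average(group[value_data], weights=group[weight_data])

# Overall weighted average, computed with the formula and with NumPy
by_formula = (calculations['quantity'] * calculations['count']).sum() / calculations['count'].sum()
by_numpy = numpy.average(calculations['quantity'], weights=calculations['count'])
print(by_formula, by_numpy)

# Per-group weighted average; only the value and weight columns are
# handed to the helper, and the extra arguments are forwarded by apply()
per_group = (
    calculations.groupby('item')[['quantity', 'count']]
    .apply(weighted_avg_numpy, 'quantity', 'count')
)
print(per_group)
```

The two overall results should agree with each other, and the per-group values should match the Method 2 output, since numpy.average() with weights evaluates the same sum of products divided by the sum of weights.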
