Search the Community

Showing results for tags 'python libraries'.

Found 2 results


pandas Pandas Groupby Max

Linux Hint posted a topic in Databases, Data Engineering & Data Science

In Python, the “Pandas” library provides methods for many data operations, such as DataFrame creation, data selection, and data extraction. The “groupby()” method groups the rows of a DataFrame by the values of one or more columns, and the “max()” method returns the maximum value within each group. This article explains how to find the maximum value of selected columns when grouping on a single column or on multiple columns. It covers the following:

- How to Determine the Max Value From the Grouped Data of Pandas DataFrame?
- Find the Maximum Value From the Grouped Data of the Single Column
- Find the Maximum Value From the Grouped Data of the Multiple Column
- Group Data By a Specific Column and Extract Maximum Value From Multiple Columns
- Determining and Sorting the Maximum Value

How to Determine the Max Value From the Grouped Data of Pandas DataFrame?

To determine the max value from the grouped data, the “df.groupby()” method is used along with the “max()” method. Here is the syntax:

    df.groupby([Col1])[Col2].max()

Here, “Col1” is the column to group by and “Col2” is the column whose maximum is wanted. For further understanding of the “df.groupby()” method, you can check this detailed guide. Now, let’s explore this method using the following examples:

Example 1: Find the Maximum Value From the Grouped Data of the Single Column

Let’s overview the following example:

    import pandas

    data = pandas.DataFrame({'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
                             'Players': [10, 20, 30, 5, 22, 33],
                             'Points': [242, 321, 221, 318, 319, 212],
                             'Medals': [4, 5, 2, 5, 2, 1]})
    print(data, '\n')
    print(data.groupby('Team')['Points'].max())

In the above code:

- The “pandas.DataFrame()” constructor creates a DataFrame.
- The “groupby()” method groups the data based on the “Team” column.
- The “max()” method is then applied to the “Points” column of each group to find the maximum number of points scored by each team.
Output

The result is a Series indexed by “Team”, where each entry holds the maximum “Points” value for that team. It is not a two-column DataFrame; “Team” becomes the index rather than a regular column.

Example 2: Find the Maximum Value From the Grouped Data of the Multiple Column

Let’s understand this example by the following code:

    import pandas

    data = pandas.DataFrame({'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
                             'Players': ['A', 'B', 'B', 'A', 'B', 'A'],
                             'Points': [242, 321, 221, 318, 319, 212],
                             'Medals': [4, 5, 2, 5, 2, 1]})
    print(data, '\n')
    print(data.groupby(['Team', 'Players'])['Points'].max())

In the above code:

- The “pandas.DataFrame()” constructor is used to create a DataFrame.
- The “groupby()” method groups on multiple columns, and the “max()” method determines the maximum value of the selected column within each group.

Output

The maximum value of the “Points” column has been determined for each group formed by the combination of the “Team” and “Players” columns.

Example 3: Group Data By a Specific Column and Extract Maximum Value From Multiple Columns

Take the following code to understand this example:

    import pandas

    data = pandas.DataFrame({'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
                             'Players': [10, 20, 30, 5, 22, 33],
                             'Points': [242, 321, 221, 318, 319, 212],
                             'Medals': [4, 5, 2, 5, 2, 1]})
    print(data, '\n')
    # Select several columns by passing a list (note the double brackets)
    print(data.groupby('Team')[['Points', 'Players']].max())

Here in this code:

- The “groupby()” method groups the data of the DataFrame based on the “Team” column.
- The columns to aggregate are passed as a list, which is why double brackets are used. The older form with a bare pair of names in single brackets, ['Points', 'Players'], is rejected by pandas 2.0 and later.
- The “max()” method is then applied to the “Points” and “Players” columns of each group to find the maximum value.

Output

The maximum value of each selected column has been displayed for every group.
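The difference between selecting one column and selecting a list of columns after “groupby()” can be sketched as follows. This is a minimal illustration that reuses the sample data from the examples above; the variable names “single” and “multiple” are chosen here for illustration only and do not appear in the original article.

```python
import pandas

# Sample data from the examples above, trimmed to the columns used here.
data = pandas.DataFrame({
    'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Players': [10, 20, 30, 5, 22, 33],
    'Points': [242, 321, 221, 318, 319, 212],
})

# One column name in single brackets: the result is a Series indexed by Team.
single = data.groupby('Team')['Points'].max()

# A list of column names (double brackets): the result is a DataFrame.
multiple = data.groupby('Team')[['Points', 'Players']].max()

print(type(single).__name__)    # Series
print(type(multiple).__name__)  # DataFrame
print(multiple)
```

In both cases “Team” ends up as the index of the result, so individual values can be looked up by team label, for example single['X'].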
Example 4: Determining and Sorting the Maximum Value

To sort the maximum values of the grouped data, use the below code:

    import pandas

    data = pandas.DataFrame({'Team': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
                             'Players': [10, 20, 30, 5, 22, 33],
                             'Points': [242, 321, 221, 318, 319, 212],
                             'Medals': [4, 5, 2, 5, 2, 1]})
    print(data, '\n')
    print(data.groupby('Team')['Points'].max().reset_index().sort_values(['Points'], ascending=True))

In this example:

- The “groupby()” method groups the data based on the “Team” column, and the “max()” method determines the maximum value of the selected column “Points”.
- The “reset_index()” method turns the grouped result into a DataFrame in which “Team” is a regular column again.
- The “sort_values()” method sorts the rows by the maximum “Points” value in ascending order.

Output

The maximum points of the teams have been sorted in ascending order.

Conclusion

The “DataFrame.groupby()” method is used along with the “max()” method to calculate the maximum value from the grouped data. The “groupby()” method can group the data on one or more columns. The “sort_values()” method can be chained after “groupby()” and “max()” to sort the maximum values. This tutorial has presented a guide on Pandas “groupby” max using several examples.
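As a variation on Example 4, the sorting step can also be applied directly to the Series returned by “max()”, and the order can be reversed with ascending=False. The following is a minimal sketch under that assumption; the names “ranked” and “ranked_frame” are illustrative and not part of the original article.

```python
import pandas

# Sample data from Example 4, trimmed to the columns used here.
data = pandas.DataFrame({
    'Team': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Points': [242, 321, 221, 318, 319, 212],
})

# Per-team maximum as a Series, sorted from highest to lowest.
ranked = data.groupby('Team')['Points'].max().sort_values(ascending=False)
print(ranked)

# reset_index() is only needed when "Team" should become a regular column.
ranked_frame = ranked.reset_index()
print(ranked_frame)
```

Sorting the Series first and calling “reset_index()” afterwards gives the same rows as Example 4, only in the opposite order.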
- August 1, 2023
- - python
  - python libraries
pandas Pandas Sum Column

Linux Hint posted a topic in Databases, Data Engineering & Data Science

This article demonstrates how to sum all or particular columns in a Pandas DataFrame using Python. The DataFrame.sum() function is used along with a few helpful parameters in the examples of this tutorial.

The ‘dataframe.sum()’ function in Pandas returns the sum of the values along the specified axis. With axis=0 (the index), it adds the values down each column and returns a Series that stores one sum per column. With axis=1 (the columns), it adds the values across each row instead. By default, missing values are ignored when the sum is calculated.

Syntax

    pandas.DataFrame_object.sum(axis = None, skipna = None, level = None, numeric_only = None, min_count = 0, **kwargs)

This is the signature from older pandas releases. In pandas 2.0 and later, the level parameter has been removed and the defaults are axis=0, skipna=True, and numeric_only=False.

Parameters

- axis: {index (0), columns (1)}. The axis along which the sum is computed.
- skipna: Ignore NA/null values when calculating the result.
- level: If the specified axis is hierarchical (a multi-index), sum along a particular index level, collapsing the result into a Series. Not available in pandas 2.0 and later.
- numeric_only: Include only float, int, and Boolean columns. In older releases, None means try to use everything and then fall back to numeric data only. Not implemented for Series.
- min_count: The number of valid (non-NA) values required to perform the operation. The result is NA if fewer than min_count non-NA values are present.

Return

Series, or DataFrame if level is specified.

DataFrame

For all the examples, we will use the following ‘analysis’ DataFrame. It holds 12 rows with 5 columns.
    import pandas

    # Create the DataFrame using lists
    analysis = pandas.DataFrame([[23, 'sravan', 1000, 34, 56],
                                 [23, 'sravan', 700, 11, 0],
                                 [23, 'sravan', 20, 4, 2],
                                 [21, 'siva', 400, 32, 45],
                                 [21, 'siva', 100, 456, 78],
                                 [23, 'sravan', 0, 90, 12],
                                 [21, 'siva', 400, 32, 45],
                                 [20, 'sahaja', 120, 1, 67],
                                 [23, 'sravan', 0, 90, 12],
                                 [22, 'suryam', 450, 76, 56],
                                 [22, 'suryam', 40, 0, 1],
                                 [22, 'suryam', 12, 45, 0]],
                                columns=['id', 'name', 'points3', 'points1', 'points2'])

    # Display the DataFrame - analysis
    print(analysis)

Output

        id    name  points3  points1  points2
    0   23  sravan     1000       34       56
    1   23  sravan      700       11        0
    2   23  sravan       20        4        2
    3   21    siva      400       32       45
    4   21    siva      100      456       78
    5   23  sravan        0       90       12
    6   21    siva      400       32       45
    7   20  sahaja      120        1       67
    8   23  sravan        0       90       12
    9   22  suryam      450       76       56
    10  22  suryam       40        0        1
    11  22  suryam       12       45        0

Here, the ‘id’, ‘points3’, ‘points2’, and ‘points1’ columns are numeric. Make sure that this DataFrame is loaded before running any of the examples discussed in this tutorial.

Scenario 1: Sum of All Columns

We can directly apply sum() on the DataFrame to return the sum of values in each column.

    pandas.DataFrame_object.sum()

Example

    # Return the sum of values in all columns
    print(analysis.sum())

Output

    id                                                       264
    name       sravansravansravansivasivasravansivasahajasrav...
    points3                                                 3242
    points1                                                  871
    points2                                                  374

Explanation

You can see that the sum of values in each column is returned. For the ‘name’ column, which holds strings, the “sum” is the concatenation of all the names.

Scenario 2: Sum of Particular Column

If you want to return the sum of values in a particular column, select that column from the DataFrame object by name and call sum() on it.

    pandas.DataFrame_object['column'].sum()

Example

Let’s return the sum of values in the ‘points1’, ‘points2’, and ‘points3’ columns separately.

    # Return the sum of values in points1 column
    print(analysis['points1'].sum())

    # Return the sum of values in points2 column
    print(analysis['points2'].sum())

    # Return the sum of values in points3 column
    print(analysis['points3'].sum())

Output

    871
    374
    3242

Explanation

Sum of values in the points1 column is 871.
Sum of values in the points2 column is 374.
Sum of values in the points3 column is 3242.

Scenario 3: Sum Across Rows

If you want to return the sum of values across each row, then you need to specify the axis parameter in the sum() function and set it to 1.

    pandas.DataFrame_object[[column/s…]].sum(axis=1)

Example

Let’s return the sum of values of ‘points1’, ‘points2’, and ‘points3’ across all rows and store the result in the ‘SUM’ column.

    # Return the sum of values across each row
    analysis['SUM'] = analysis[['points1', 'points2', 'points3']].sum(axis=1)
    print(analysis)

Output

        id    name  points3  points1  points2   SUM
    0   23  sravan     1000       34       56  1090
    1   23  sravan      700       11        0   711
    2   23  sravan       20        4        2    26
    3   21    siva      400       32       45   477
    4   21    siva      100      456       78   634
    5   23  sravan        0       90       12   102
    6   21    siva      400       32       45   477
    7   20  sahaja      120        1       67   188
    8   23  sravan        0       90       12   102
    9   22  suryam      450       76       56   582
    10  22  suryam       40        0        1    41
    11  22  suryam       12       45        0    57

Explanation

Now, the new ‘SUM’ column holds the row-wise sum of the three points columns.

We can also add across rows without using sum(). By using the “+” operator, we can achieve the same result.

Example

Add values in points1 and points2 columns and store the result in the ‘2 Added’ column. Add values in points1, points2, and points3 columns and store the result in the ‘3 Added’ column.
    import pandas

    # Create the DataFrame using lists
    analysis = pandas.DataFrame([[23, 'sravan', 1000, 34, 56],
                                 [23, 'sravan', 700, 11, 0],
                                 [23, 'sravan', 20, 4, 2],
                                 [21, 'siva', 400, 32, 45],
                                 [21, 'siva', 100, 456, 78],
                                 [23, 'sravan', 0, 90, 12],
                                 [21, 'siva', 400, 32, 45],
                                 [20, 'sahaja', 120, 1, 67],
                                 [23, 'sravan', 0, 90, 12],
                                 [22, 'suryam', 450, 76, 56],
                                 [22, 'suryam', 40, 0, 1],
                                 [22, 'suryam', 12, 45, 0]],
                                columns=['id', 'name', 'points3', 'points1', 'points2'])

    # Add values in points1 and points2 columns and store the result in '2 Added' column
    analysis['2 Added'] = analysis['points1'] + analysis['points2']

    # Add values in points1, points2 and points3 columns and store the result in '3 Added' column
    analysis['3 Added'] = analysis['points1'] + analysis['points2'] + analysis['points3']

    print(analysis)

Output

        id    name  points3  points1  points2  2 Added  3 Added
    0   23  sravan     1000       34       56       90     1090
    1   23  sravan      700       11        0       11      711
    2   23  sravan       20        4        2        6       26
    3   21    siva      400       32       45       77      477
    4   21    siva      100      456       78      534      634
    5   23  sravan        0       90       12      102      102
    6   21    siva      400       32       45       77      477
    7   20  sahaja      120        1       67       68      188
    8   23  sravan        0       90       12      102      102
    9   22  suryam      450       76       56      132      582
    10  22  suryam       40        0        1        1       41
    11  22  suryam       12       45        0       45       57

Scenario 4: sum() With groupby()

If you want to return the sum of values for individual groups, then you have to use groupby() with sum(). The groupby() method groups the rows by the values in a particular column, and sum() returns the sum within each group.

    pandas.DataFrame_object.groupby('grouping_column').sum()

Example

Let’s group the rows based on the name column and return the sum of values in each group for all columns.
    import pandas

    # Create the DataFrame using lists
    analysis = pandas.DataFrame([[23, 'sravan', 1000, 34, 56],
                                 [23, 'sravan', 700, 11, 0],
                                 [23, 'sravan', 20, 4, 2],
                                 [21, 'siva', 400, 32, 45],
                                 [21, 'siva', 100, 456, 78],
                                 [23, 'sravan', 0, 90, 12],
                                 [21, 'siva', 400, 32, 45],
                                 [20, 'sahaja', 120, 1, 67],
                                 [23, 'sravan', 0, 90, 12],
                                 [22, 'suryam', 450, 76, 56],
                                 [22, 'suryam', 40, 0, 1],
                                 [22, 'suryam', 12, 45, 0]],
                                columns=['id', 'name', 'points3', 'points1', 'points2'])

    # Group the rows based on name column and return sum of values in each group for all columns
    print(analysis.groupby('name').sum())

Output

             id  points3  points1  points2
    name
    sahaja   20      120        1       67
    siva     63      900      520      168
    sravan  115     1720      229       82
    suryam   66      502      121       57

Explanation

There are 4 groups in the ‘name’ column. For each group, the sum of id, points3, points1, and points2 is returned.

Conclusion

This article showed how to compute sums over a DataFrame using the Pandas sum() method. We discussed the column-wise and row-wise addition of values in the examples of this post. Additionally, you learned how to add columns with the “+” operator and how to sum the values after grouping on a column of the DataFrame. You should now be able to sum the columns of a DataFrame together or sum the values within a single DataFrame column.
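The skipna, numeric_only, and min_count parameters from the parameter list are described in this article but not exercised in the scenarios. The following is a minimal sketch of their effect, assuming a recent pandas release; the small “scores” DataFrame and its missing values are made up here for illustration and are not part of the original ‘analysis’ data.

```python
import pandas

nan = float('nan')

# Hypothetical data with missing values; points2 has no valid values at all.
scores = pandas.DataFrame({
    'name': ['sravan', 'siva', 'sahaja'],
    'points1': [34.0, nan, 1.0],
    'points2': [nan, nan, nan],
})

# Default behaviour: missing values are skipped, so an all-NA column sums to 0.
skipped = scores.sum(numeric_only=True)

# skipna=False: any missing value in a column makes that column's sum NA.
strict = scores.sum(numeric_only=True, skipna=False)

# min_count=1: at least one valid value is required, otherwise the sum is NA.
need_one = scores.sum(numeric_only=True, min_count=1)

print(skipped)
print(strict)
print(need_one)
```

Passing numeric_only=True leaves the string column ‘name’ out of the result, which avoids the concatenated names seen in Scenario 1.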
- January 30, 2023
- - 1
- - python
  - python libraries

Forum Statistics

45k
Total Topics

44.8k
Total Posts
