Classifying your data gives you a more refined and accurate understanding of it, which makes it an important facet of data analysis. Pandas supports this by letting us organize and summarize our data.
Groupby Function
One way to classify our data is with the groupby function:
df.groupby(by=grouping_columns)[columns_to_show].function()
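As a minimal sketch of the template above, the toy DataFrame below borrows the date and net_sales column names from the retail dataset, but its values are made up for illustration. It groups the rows by date and aggregates net_sales, first with sum() and then with several summary functions at once through agg().

```python
import pandas as pd

# Hypothetical toy data: column names mirror the retail dataset,
# but the values are invented for illustration only.
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02", "2021-01-02"],
    "net_sales": [100.0, 50.0, 80.0, 120.0, 40.0],
})

# by="date" makes one group per date, ["net_sales"] picks the column to show,
# and sum() collapses each group into a single total.
totals = df.groupby(by="date")["net_sales"].sum()
print(totals)

# agg() applies several summary functions to each group in one call.
summary = df.groupby(by="date")["net_sales"].agg(["count", "mean", "max"])
print(summary)
```

Each date appears once in the result, so a long table of individual sales shrinks to one summary row per group.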
Pivot_table Function
To summarize data without displaying large chunks of it, we can use pivot tables, which work much like pivot tables in Excel:
df.pivot_table(values=[columns_to_summarize], index=[rows_to_be_displayed], aggfunc=function)
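The sketch below uses a small made-up DataFrame; the channel column is hypothetical and is not part of the retail dataset. It shows how values, index and the optional columns argument map onto the resulting table, and that a pivot table without columns gives the same averages as the equivalent groupby.

```python
import pandas as pd

# Hypothetical toy data; "channel" is an invented column used only to
# show how the columns= argument spreads values across the table.
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"],
    "channel": ["online", "store", "online", "store"],
    "net_sales": [100.0, 50.0, 80.0, 120.0],
    "net_quantity": [2, 1, 4, 6],
})

# values = column to aggregate, index = row labels, columns = column labels.
table = df.pivot_table(values="net_sales", index="date", columns="channel", aggfunc="mean")
print(table)

# Several value columns, one row per date, averaged within each date.
means = df.pivot_table(values=["net_sales", "net_quantity"], index="date", aggfunc="mean")
print(means)

# The same averages computed with groupby for comparison.
same = df.groupby("date")[["net_sales", "net_quantity"]].mean()
print(same)
```

Note that pivot_table sorts its output columns by default, so the column order may differ from the groupby version even though the numbers agree.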
Using these newly learned functions, can you perform the following actions on our retail dataset?
1. Using the groupby and describe functions, can we obtain statistical information on net_sales for each date?
2. Can we create a pivot table showing the average net sales and net quantity for failed and successful orders on each date?
Answers for Activity: Classification
import pandas as pd

df = pd.read_csv("Retail dataset.csv")

# 1. Using the groupby and describe functions, obtain statistical information on net_sales for each date
print(df.groupby(['date'])['net_sales'].describe())

# 2. Create a pivot table showing the average net sales and net quantity for failed and successful orders on each date
print(df.pivot_table(values=['net_sales', 'net_quantity'], index=['date', 'order_fufilled'], aggfunc='mean'))
You are expected to comply with University policies and guidelines, namely the Appropriate Use of Information Resources Policy, IT Usage Policy and Social Media Policy. Users will be personally liable for any infringement of Copyright and Licensing laws. Unless otherwise stated, all guide content is licensed under CC BY-NC 4.0.