Get started on your learning journey towards data science using Python. Equip yourself with practical skills in Python programming for the purpose of basic data manipulation and analysis.

Classifying your data gives you a more refined and accurate understanding of it, and is an important facet of data analysis. We can use Pandas to carry out such operations by organizing and summarizing our data.

**Groupby Function**

One way to classify our data is with the groupby function:

df.groupby(by=grouping_columns)[columns_to_show].function()
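To make the pattern concrete, here is a minimal sketch on a small made-up DataFrame. The values are hypothetical and exist only for illustration; the column names simply mirror those used in the retail dataset activity.

```python
import pandas as pd

# Hypothetical toy data (not the real retail dataset).
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"],
    "net_sales": [100.0, 50.0, 80.0, 20.0],
    "net_quantity": [2, 1, 4, 1],
})

# Group rows by date, pick the columns to show, then apply a function.
totals = df.groupby(by="date")[["net_sales", "net_quantity"]].sum()
print(totals)
```

Swapping `sum()` for another function such as `mean()` or `describe()` changes the summary without changing the grouping.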

**Pivot_table Function**

For summarizing data, we can use pivot_table, which works much like pivot tables in Excel, to avoid displaying large chunks of data.

df.pivot_table(values=[columns_to_summarize], index=[rows_to_group_by], aggfunc=function)
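As a rough illustration, the sketch below builds a pivot table from a small made-up DataFrame. The data and the `status` column are hypothetical and are not taken from the retail dataset. It uses the `values`, `index`, `columns`, and `aggfunc` keyword arguments of `pivot_table` so that each role is explicit.

```python
import pandas as pd

# Hypothetical toy data; the "status" column is an assumption for illustration.
orders = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"],
    "status": ["success", "failed", "success", "success"],
    "net_sales": [100.0, 0.0, 80.0, 40.0],
})

# One row per date, one column per status, each cell the mean of net_sales.
summary = orders.pivot_table(
    values="net_sales",
    index="date",
    columns="status",
    aggfunc="mean",
)
print(summary)
```

Combinations with no matching rows (here, failed orders on 2021-01-02) appear as NaN in the result.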

Given the newly learnt functions, can you apply them to our retail dataset?

1. Using the groupby and describe functions, can we obtain statistical information on net_sales for each date in the retail dataset?

2. Can we create a pivot table showing the average net sales and net quantity for failed and successful orders by specific dates?

**Answers for Activity: Classification**

    import numpy as np
    import pandas as pd

    df = pd.read_csv("Retail dataset.csv")

    # 1. Using the groupby and describe functions, obtain statistical
    #    information on net_sales for each date in the retail dataset.
    print(df.groupby(['date'])['net_sales'].describe())

    # 2. Create a pivot table showing the average net sales and net quantity
    #    for failed and successful orders by specific dates.
    print(df.pivot_table(['net_sales', 'net_quantity'], ['date', 'order_fufilled'], aggfunc='mean'))