LibGuides: Python for Basic Data Analysis: PD.11 Data classification and summary

Data Classification and Summary

Classifying your data gives you a more refined and accurate understanding of it, which makes classification an important facet of data analysis. We can use Pandas to carry out such operations by organizing and summarizing our data.

Groupby Function

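One way to classify our data is with the groupby function, which splits the rows into groups based on one or more columns and then applies a function to each group: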

df.groupby(by=grouping_columns)[columns_to_show].function()
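As a minimal sketch of the pattern above, the example below groups a small made-up table by one column and summarizes another. The DataFrame and its column names (region, sales, quantity) are hypothetical and are not taken from the retail dataset.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "sales": [100.0, 150.0, 80.0, 120.0, 100.0],
    "quantity": [1, 2, 1, 3, 2],
})

# Group the rows by region, keep only the sales column,
# then compute the mean of each group
mean_sales = df.groupby(by="region")["sales"].mean()
print(mean_sales)

# describe() returns count, mean, std, min, quartiles and max for each group
sales_summary = df.groupby(by="region")["sales"].describe()
print(sales_summary)
```

Here grouping_columns is "region", columns_to_show is "sales", and function is mean() or describe(). Passing a list such as ["region", "quantity"] to by would group on more than one column.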

Pivot_table Function

To summarize data, we can use the pivot_table function, which works much like pivot tables in Excel and avoids showing large chunks of data:

df.pivot_table(values=[columns_to_summarize],
               index=[columns_to_group_by], aggfunc=function)
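The sketch below shows one way this call might look on a small made-up table. The DataFrame and its column names (region, channel, sales, quantity) are hypothetical. The first argument of pivot_table is values (the columns to aggregate) and the second is index (the columns whose values become the row labels).

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "channel": ["online", "store", "online", "online", "store"],
    "sales": [100.0, 150.0, 80.0, 120.0, 100.0],
    "quantity": [1, 2, 1, 3, 2],
})

# Average sales and quantity for each (region, channel) combination
summary = df.pivot_table(values=["sales", "quantity"],
                         index=["region", "channel"], aggfunc="mean")
print(summary)

# The optional columns argument spreads one grouping across the columns,
# giving a layout closer to a spreadsheet pivot table
wide = df.pivot_table(values="sales", index="region",
                      columns="channel", aggfunc="mean")
print(wide)
```

Each row of summary is one region and channel pair, so five input rows collapse to four summary rows.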

Activity Data: Classification

Task

Using the functions you have just learned, can you perform these actions on our retail dataset?

1. Using the groupby and describe functions, can we obtain summary statistics of net_sales in the retail dataset for each date?

2. Can we create a pivot table showing the average net sales and net quantity for failed and successful orders by specific dates?

Retail Dataset

Answers for Activity: Classification

import pandas as pd

# Load the retail dataset
df = pd.read_csv("Retail dataset.csv")

# 1. Summary statistics of net_sales for each date, using groupby and describe
print(df.groupby(['date'])['net_sales'].describe())

# 2. Average net sales and net quantity for failed and successful orders on each date
print(df.pivot_table(['net_sales', 'net_quantity'],
                     ['date', 'order_fufilled'], aggfunc='mean'))
