Python for Basic Data Analysis

Start your data science journey with Python. Learn practical Python programming skills for basic data manipulation and analysis.

Data Classification and Summary

Classifying your data to gain a more refined and accurate understanding of it is an important facet of data analysis. We can use Pandas to carry out such operations by organizing and summarizing our data.

Groupby Function

One way to classify our data is with the groupby function, which splits the rows into groups and applies a function to each group:

df.groupby(by=grouping_columns)[columns_to_show].function()
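As a minimal sketch of the pattern above, the example below groups a small made-up DataFrame by one column and summarizes another. The data and the column names region and sales are assumptions for illustration only and are not part of the retail dataset.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative assumptions.
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "sales": [100, 200, 50, 150, 100],
})

# Group rows by region, select the sales column, then apply a function.
mean_sales = df.groupby(by="region")["sales"].mean()
print(mean_sales)

# describe() returns several statistics (count, mean, std, min, ...) per group.
summary = df.groupby(by="region")["sales"].describe()
print(summary)
```

Any aggregation available on a column, such as sum(), count(), or max(), can be used in place of mean().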

Pivot_table Function

To summarize data, we can use a method similar to pivot tables in Excel, which avoids showing large chunks of data:

df.pivot_table(values=[columns_to_aggregate],
               index=[columns_to_group_by], aggfunc=function)
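The following sketch shows one way pivot_table could be called, using keyword arguments so that the role of each argument is explicit. The data and the column names region, product, and sales are made up for illustration. In pandas, values names the columns to aggregate, index names the columns whose values become row labels, and the optional columns argument spreads a further grouping across the table's columns.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative assumptions.
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "A"],
    "sales": [100, 200, 50, 150],
})

# Average sales for each region (rows) and product (columns).
table = df.pivot_table(values="sales", index="region",
                       columns="product", aggfunc="mean")
print(table)
```

Combinations with no matching rows, such as product B in the South region here, appear as NaN in the result.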

 

Activity Data: Classification

Using the newly learnt functions, can you perform these actions on our retail dataset?

1. Using the groupby and describe functions, can we obtain summary statistics of net_sales for each date in the retail dataset?

2. Can we create a pivot table showing the average net sales and net quantity for failed and successful orders by specific dates?

Answers for Activity: Classification

import pandas as pd

# Load the retail dataset (the CSV file must be in the working directory)
df = pd.read_csv("Retail dataset.csv")

# 1. Summary statistics of net_sales for each date
print(df.groupby(['date'])['net_sales'].describe())

# 2. Average net sales and net quantity for failed and successful orders, by date
print(df.pivot_table(values=['net_sales', 'net_quantity'],
                     index=['date', 'order_fufilled'], aggfunc='mean'))