Python for Basic Data Analysis

Start your data science journey with Python. Learn practical Python programming skills for basic data manipulation and analysis.

Data Visualisation

Pandas mainly helps us structure data, so to visualise that data we use it together with other modules. Two common ones are seaborn (imported as sns) and matplotlib (its pyplot module imported as plt). With them we can create visualisations quickly using the following functions, where the upper-case names are placeholders for your own column names and DataFrame:

1. Line plots

sns.lineplot(x=X_FIELD,y=Y_FIELD,data=DATAFRAME)

2. Regression plots

sns.lmplot(x=X_FIELD,y=Y_FIELD,hue=DATA_CLASSIFIER,data=DATAFRAME)

3. Histogram plots

plt.hist(DATAFRAME[X_FIELD], bins=NUMBER_OF_BINS)

4. Pair plots

sns.pairplot(DATAFRAME, hue=DATA_CLASSIFIER,height=HEIGHT)


Activity: Data visualisation

Go ahead and try to plot these graphs using the retail dataset.

1. Plot a line plot of net_sales against date. Is there any correlation?

2. Perform a regression analysis between average_selling_price and avg_margins and see if there is any correlation.

3. Plot a histogram of revenue across 50 bins.

4. Draw a pair plot, classifying the data by order_fufilled status.

Answers for Activity: Data Visualisation

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load the retail dataset; parse_dates assumes the 'date' column holds date strings
df=pd.read_csv("Retail dataset.csv", parse_dates=['date'])

#1. Plot a line plot of net_sales against date. Is there any correlation?
sns.lineplot(x='date',y='net_sales',data=df)
plt.show()

#2. Perform a regression analysis between average_selling_price and avg_margins and see if there is any correlation
sns.regplot(x='average_selling_price',y='avg_margins',data=df)
plt.show()  # show each plot before drawing the next so they do not overlap

#3. Plot a histogram of revenue across 50 bins
plt.hist(df['revenue'], bins=50)
plt.show()

#4. Draw a pair plot, classifying the data by order_fufilled status
sns.pairplot(df, hue='order_fufilled',height=3)
plt.show()
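
The activity asks whether the plotted variables are correlated. A plot gives a visual impression, and a correlation coefficient can back it up with a number. The sketch below is an addition, not part of the original answers. It uses the pandas method Series.corr (Pearson by default) on made-up data that only borrows the column names average_selling_price and avg_margins; the values are invented, so the result says nothing about the real retail dataset.

```python
import numpy as np
import pandas as pd

# Invented values for illustration; only the column names match the activity.
rng = np.random.default_rng(1)
price = rng.uniform(5, 100, size=200)
toy = pd.DataFrame({
    "average_selling_price": price,
    # margins rise with price, plus some random noise
    "avg_margins": 0.2 * price + rng.normal(0, 2, size=200),
})

# Pearson correlation coefficient between the two columns (between -1 and 1)
r = toy["average_selling_price"].corr(toy["avg_margins"])
print(f"Pearson r = {r:.2f}")
```

A value near 1 or -1 suggests a strong linear relationship and a value near 0 suggests little or none. On the real data, the same call would be applied to the corresponding columns of df.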

Further Readings