Python for Basic Data Analysis: PD.6 Handling Missing Data

Get started on your learning journey towards data science using Python. Equip yourself with practical skills in Python programming for the purpose of basic data manipulation and analysis.

Handling Missing Data

If a data set has missing entries, pandas shows them as NaN, which stands for "Not a Number". Because NaN is itself a floating-point value, a numeric column that contains NaN is stored with the float64 dtype by default.
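As a quick illustration of the dtype point, the sketch below builds two small Series by hand (the values are made up for this example) and compares their dtypes. It assumes the default pandas behaviour, without the optional nullable integer types.

```python
import numpy as np
import pandas as pd

# Two hand-made Series; the numbers are illustrative only.
whole_numbers = pd.Series([1, 2, 3])
with_gap = pd.Series([1, np.nan, 3])

# An all-integer Series keeps an integer dtype.
print(whole_numbers.dtype)

# A single NaN is enough to make pandas store the column as float64.
print(with_gap.dtype)

# NaN itself is an ordinary Python float.
print(type(np.nan))
```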

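Pandas lets us locate missing data. pd.isnull() returns True for every NaN entry and pd.notnull() returns True for every entry that is not NaN, so either result can be used to select the matching rows.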
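A minimal sketch of how these two functions can be used to filter rows. The DataFrame here is a small hypothetical stand-in for the retail data: the column names follow the activity further down, but the values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the retail data; the values are made up.
df = pd.DataFrame({
    "net_sales": [120.5, np.nan, 87.0, np.nan],
    "order_fufilled": [True, False, np.nan, True],
})

# Boolean masks: one True/False value per row.
missing_mask = pd.isnull(df["net_sales"])
present_mask = pd.notnull(df["net_sales"])

# Use a mask inside df[...] to keep only the rows where it is True.
rows_missing = df[missing_mask]
rows_present = df[present_mask]

# Summing a boolean mask counts the True values.
n_missing = int(missing_mask.sum())

print(rows_missing)
print(rows_present)
print(n_missing)
```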

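We can also replace these null values with a value of our choice using fillna(). It returns a new object rather than changing the original, so assign the result back if you want to keep it.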
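The following sketch shows fillna() on a small hand-made Series (the values are illustrative). It also checks that the original Series is left untouched unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Illustrative values only.
sales = pd.Series([120.5, np.nan, 87.0], name="net_sales")

# fillna() returns a new Series with every NaN replaced by 0.
filled = sales.fillna(0)

print(filled)

# The original still contains its NaN entry.
print(sales.isnull().sum())
```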

If other values stand in for missing data but are not stored as NaN (for example the text 'MISSING'), we can substitute them using replace().
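One possible use of replace(), sketched on a made-up Series: the placeholder text 'MISSING' is converted to NaN so that pd.isnull() can detect it. Converting to NaN is just one option chosen for this example; any other replacement value can be passed in the same way.

```python
import numpy as np
import pandas as pd

# Illustrative values; "MISSING" is a placeholder that pandas does not treat as null.
status = pd.Series(["yes", "MISSING", "no"], name="status")

# Before replacing, pandas sees no missing values at all.
print(status.isnull().sum())

# replace(old, new) returns a new Series with the placeholder swapped out.
cleaned = status.replace("MISSING", np.nan)

# Now the entry is recognised as missing.
print(cleaned.isnull().sum())
```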

 

Activity: Missing Data

We can also acquire specific statistical information using common pandas syntax, and retrieve information with slicing methods similar to those of a list. Try out these examples and take a look at the output.

1. Output rows which have NaN entries in Net Sales

 

2. Output rows which do not have NaN entries in the order_fufilled column

 

3. Replace the NaN entries in Net Sales with 0 using fillna()

 

4. Replace entries which are 'MISSING' in the order_fufilled column with False using replace()

Answers for Activity: Missing Data

import numpy as np
import pandas as pd

df = pd.read_csv("Retail dataset.csv")

# 1. Output rows which have NaN entries in net_sales
print(df[pd.isnull(df['net_sales'])])

# 2. Output rows which do not have NaN entries in order_fufilled
print(df[pd.notnull(df['order_fufilled'])])

# 3. Replace the NaN entries in net_sales with 0 using fillna()
# fillna() returns a new Series, so assign it back to keep the change
df['net_sales'] = df['net_sales'].fillna(0)

# 4. Replace 'MISSING' entries in order_fufilled with False using replace()
# replace() also returns a new Series, so assign it back
df['order_fufilled'] = df['order_fufilled'].replace("MISSING", False)

Video Guides