If a data set has missing entries, pandas shows those values as
NaN, which stands for "Not a Number". NaN is itself a floating-point
value, so a numeric column that contains NaN entries is stored with the
float64 dtype.
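To illustrate the dtype point, here is a minimal sketch. The values are made up for illustration and are not taken from the retail dataset. It compares a complete integer column with the same column containing one missing entry.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration only
complete = pd.Series([10, 20, 30])
with_gap = pd.Series([10, np.nan, 30])

print(complete.dtype)   # an integer dtype, since nothing is missing
print(with_gap.dtype)   # float64, because NaN is a floating-point value
print(type(np.nan))     # <class 'float'>
```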
Pandas lets us locate missing data. To find
NaN entries we can use pd.isnull(), and pd.notnull() to find the entries that are not missing.
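A minimal sketch of locating missing entries with pd.isnull() and pd.notnull() follows. The small table is hypothetical; only the column name net_sales mirrors the activity further down.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "item": ["A", "B", "C"],
    "net_sales": [120.0, np.nan, 75.5],
})

# Boolean mask: True where net_sales is missing
mask = pd.isnull(df["net_sales"])

missing_rows = df[mask]                         # rows with a NaN in net_sales
present_rows = df[pd.notnull(df["net_sales"])]  # rows without one

print(missing_rows)
print(present_rows)
```

Indexing a DataFrame with a boolean mask keeps only the rows where the mask is True.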
We can also replace these null values with whatever we want using fillna().
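As a short sketch with made-up values, fillna() returns a new Series in which every NaN has been replaced by the value passed in. The original Series is left unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration only
sales = pd.Series([120.0, np.nan, 75.5], name="net_sales")

filled = sales.fillna(0)         # new Series with NaN replaced by 0

print(filled.tolist())           # [120.0, 0.0, 75.5]
print(sales.isnull().sum())      # 1, the original still holds its NaN
```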
Should we need to replace other values that represent missing data but are not stored as
NaN entries, we may use replace().
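The sketch below assumes a column that marks missing data with the placeholder string 'MISSING', as in the activity further down. The values themselves are hypothetical. It shows that the placeholder is not detected as NaN and has to be swapped out with replace().

```python
import pandas as pd

# Hypothetical values for illustration only
status = pd.Series([True, "MISSING", False, "MISSING"], name="order_fufilled")

# A placeholder string is not NaN, so isnull() does not flag it
any_null = pd.isnull(status).any()
print(any_null)

# replace() returns a new Series with every 'MISSING' swapped for False
cleaned = status.replace("MISSING", False)
print(cleaned.tolist())
```

Depending on the pandas version, this call may print a warning about dtype downcasting. The replaced values are the same either way.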
We can also obtain specific statistical information using common pandas syntax, and retrieve rows with slicing similar to slicing a list. Try out these exercises and take a look at the output.
1. Output rows which have NaN entries in the net_sales column
2. Output rows which do not have NaN entries in the order_fufilled column
3. Replace the NaN entries in the net_sales column with 0 using fillna()
4. Replace entries which are 'MISSING' in the order_fufilled column with False using replace()
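The statistical summaries and list-style slicing mentioned before the exercises can be sketched on a small, made-up table. The data is hypothetical, and mean(), count() and describe() are only some of the summaries pandas offers.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "item": ["A", "B", "C", "D"],
    "net_sales": [100.0, np.nan, 50.0, 150.0],
})

mean_sales = df["net_sales"].mean()    # NaN entries are skipped by default
n_present = df["net_sales"].count()    # number of non-missing entries

print(mean_sales)                      # 100.0
print(n_present)                       # 3
print(df["net_sales"].describe())      # count, mean, std, min, quartiles, max

middle_rows = df[1:3]                  # list-style slice: rows at positions 1 and 2
first_two = df.iloc[:2]                # position-based slice of the first two rows

print(middle_rows)
print(first_two)
```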
Answers for Activity: Missing Data
import pandas as pd

df = pd.read_csv("Retail dataset.csv")

# 1. Output rows which have NaN entries in net_sales
print(df[pd.isnull(df['net_sales'])])

# 2. Output rows which do not have NaN entries in order_fufilled
print(df[pd.notnull(df['order_fufilled'])])

# 3. Replace the NaN entries in net_sales with 0 using fillna()
# fillna() returns a new Series, so assign it back to keep the change
df['net_sales'] = df['net_sales'].fillna(0)

# 4. Replace 'MISSING' entries in order_fufilled with False using replace()
# replace() also returns a new Series, so assign it back
df['order_fufilled'] = df['order_fufilled'].replace("MISSING", False)