If a data set has missing entries, pandas represents them as NaN, which stands for "Not a Number". NaN is itself a floating-point value, so a numeric column that contains NaN is stored with the float64 dtype by default.
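The effect on the dtype can be seen with a small Series. This is a minimal sketch with made-up values (the names whole and with_gap are hypothetical and not part of the Retail dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical values, for illustration only.
whole = pd.Series([10, 20, 30])          # no missing entries
with_gap = pd.Series([10, np.nan, 30])   # one missing entry

print(whole.dtype)      # an integer dtype
print(with_gap.dtype)   # float64, because NaN is a float
print(type(np.nan))     # <class 'float'>
```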
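Pandas lets us locate missing data. To find NaN entries, we can use pd.isnull() or pd.notnull(). Each returns a Boolean mask: pd.isnull() marks the entries that are missing, and pd.notnull() marks the entries that are present.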
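As a rough illustration, the masks can be used to filter rows. The DataFrame below is a hypothetical stand-in with invented values, not the actual Retail dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a sales table.
df = pd.DataFrame({
    "item": ["pen", "book", "bag"],
    "net_sales": [12.5, np.nan, 40.0],
})

missing_mask = pd.isnull(df["net_sales"])    # True where the value is missing
present_mask = pd.notnull(df["net_sales"])   # True where the value is present

print(df[missing_mask])   # rows with a missing net_sales value
print(df[present_mask])   # rows with a net_sales value
```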
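We can also replace these null values with a value of our choice using fillna(). By default, fillna() returns a new object rather than changing the original, so assign the result back if the change should be kept.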
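A short sketch of this behaviour, using a hypothetical one-column DataFrame with invented values:

```python
import numpy as np
import pandas as pd

# Hypothetical values, for illustration only.
df = pd.DataFrame({"net_sales": [12.5, np.nan, 40.0]})

filled = df["net_sales"].fillna(0)   # new Series with NaN replaced by 0

# The original column is untouched until the result is assigned back.
still_missing = int(df["net_sales"].isnull().sum())
df["net_sales"] = filled

print(still_missing)
print(df)
```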
If other values stand in for missing data but are not stored as NaN (for example, a placeholder string such as 'MISSING'), we can use replace() to swap them for a value of our choice.
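For instance, a placeholder string can be converted into a real NaN so that pd.isnull() picks it up. This is a sketch under assumptions: the status column and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column in which 'MISSING' marks an unknown value.
df = pd.DataFrame({"status": ["shipped", "MISSING", "pending"]})

# pd.isnull() does not treat the placeholder string as missing.
before = pd.isnull(df["status"]).tolist()

# Swap the placeholder for a real NaN.
df["status"] = df["status"].replace("MISSING", np.nan)
after = pd.isnull(df["status"]).tolist()

print(before)
print(after)
```

Any other replacement value can be passed as the second argument in the same way.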
We can also compute summary statistics using common pandas syntax, and retrieve rows with slicing similar to that of a list. Try out these examples and take a look at the output.
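Before starting the activity, here is a brief sketch of both ideas on a hypothetical table (the values are invented and the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a sales table.
df = pd.DataFrame({
    "item": ["pen", "book", "bag", "cap"],
    "net_sales": [10.0, np.nan, 30.0, 20.0],
})

mean_sales = df["net_sales"].mean()          # NaN entries are skipped by default
non_missing = int(df["net_sales"].count())   # counts only non-missing entries
first_two = df[0:2]                          # list-like slice: the first two rows

print(mean_sales)
print(non_missing)
print(first_two)
```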
1. Output the rows that have NaN entries in the net_sales column
2. Output the rows that do not have NaN entries in the order_fufilled column
3. Replace the NaN entries in the net_sales column with 0 using fillna()
4. Replace the entries that are 'MISSING' in the order_fufilled column with False using replace()
Answers for Activity: Missing Data
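import numpy as np
import pandas as pd

df = pd.read_csv("Retail dataset.csv")

# 1. Output rows which have NaN entries in net_sales
print(df[pd.isnull(df['net_sales'])])

# 2. Output rows which do not have NaN entries in order_fufilled
print(df[pd.notnull(df['order_fufilled'])])

# 3. Replace the NaN entries in net_sales with 0 using fillna()
# fillna() returns a new Series, so assign it back to keep the change
df['net_sales'] = df['net_sales'].fillna(0)
print(df['net_sales'])

# 4. Replace 'MISSING' entries in order_fufilled with False using replace()
# replace() also returns a new Series, so assign it back
df['order_fufilled'] = df['order_fufilled'].replace("MISSING", False)
print(df['order_fufilled'])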
Unless otherwise stated, all guide content is licensed by CC BY-NC 4.0.