
Python for Basic Data Analysis: PD.3 Finding and Describing data

Get started on your learning journey towards data science using Python. Equip yourself with practical skills in Python programming for the purpose of basic data manipulation and analysis.

Finding data

Similar to lists in Python, we can use indexing to find specific rows and columns of a data frame. Pandas selects them by integer position (starting from 0) through the iloc syntax, with the row position first and the column position second:

df.iloc[row,column]
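As a quick illustration, here is a minimal sketch on a small, made-up data frame (the column names and values are hypothetical, not taken from the retail dataset). It shows that iloc accepts a single position, a slice like a Python list, or a colon meaning "everything", and that negative positions count from the end.

```python
import pandas as pd

# Small made-up data frame; column names and values are hypothetical
df = pd.DataFrame({
    "item": ["pen", "book", "mug", "lamp"],
    "price": [1.5, 12.0, 8.25, 30.0],
    "in_stock": [True, False, True, True],
})

# A single cell: row position 1, column position 0
cell = df.iloc[1, 0]
print(cell)

# Rows at positions 0 and 1 (the slice end is excluded, as with lists), all columns
first_two = df.iloc[0:2]
print(first_two)

# Every row of the column at position 1, returned as a Series
prices = df.iloc[:, 1]
print(prices)

# Negative positions count from the end: last row, last column
last_cell = df.iloc[-1, -1]
print(last_cell)
```

Because iloc works purely by position, the same code works on any data frame regardless of its column names.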

Activity: Finding data

See if you can produce each of these outputs.

1. Output the third row of the data frame

2. Output the last row of the data frame

3. Output the last 3 columns of the data frame

Describing specific data

We can build on the basic functions above to take a deeper dive into our dataset. To get summary information about only the data types we are interested in, pass them to describe() as a list:

df.describe(include=['OBJECT_TYPE1','OBJECT_TYPE2'])
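To see what the include parameter changes, here is a sketch on a hypothetical data frame with one text (object), one numeric and one bool column; the data is made up for illustration. By default describe() summarises only numeric columns, a list of types restricts the summary to those types, and include="all" summarises every column.

```python
import pandas as pd

# Hypothetical data frame mixing object (text), numeric and bool columns
df = pd.DataFrame({
    "category": ["toys", "books", "toys", "garden"],
    "quantity": [3, 1, 5, 2],
    "gift_wrapped": [False, True, False, False],
})

# Default: only numeric columns (count, mean, std, min, quartiles, max)
numeric_summary = df.describe()
print(numeric_summary)

# Only object columns: count, unique, top (most frequent value) and freq
object_summary = df.describe(include=["object"])
print(object_summary)

# Every column; statistics that don't apply to a column are shown as NaN
all_summary = df.describe(include="all")
print(all_summary)
```

Note that the statistics differ by type: text and bool columns get counts of unique and most frequent values rather than means and quartiles.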

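Furthermore, we can count how often each unique value appears in a column using the following (setting normalize=True returns proportions instead of raw counts):

df['COLUMN_NAME'].value_counts(normalize=False)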
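Here is a minimal sketch of value_counts() on a single made-up column (the column name and values are hypothetical). With the default normalize=False it returns counts sorted from most to least frequent; with normalize=True each count is divided by the total, so the results sum to 1.

```python
import pandas as pd

# Hypothetical column of payment methods
df = pd.DataFrame({"payment": ["card", "cash", "card", "card", "voucher"]})

# Raw counts of each unique value, most frequent first
counts = df["payment"].value_counts()
print(counts)

# normalize=True returns proportions (count / total) instead of counts
shares = df["payment"].value_counts(normalize=True)
print(shares)
```

Here "card" appears 3 times out of 5 rows, so its normalized value is 0.6.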

Activity: Describing specific data

Try obtaining these outputs!

1. Output information about the object and bool columns within the dataset (Bonus: what happens if we ask for data types that don't exist within the dataset?)

2. The order_fulfilled column has the bool data type. Can we get the number of True and False values?

3. If 2. is possible, can you normalize this data?

Answers for Activity: Finding Data

import numpy as np
import pandas as pd
df=pd.read_csv("Retail dataset.csv")

#1. Output the third row of the data frame
print(df.iloc[2])
#2. Output the last row of the data frame
print(df.iloc[-1])
#3. Output the last 3 columns of the data frame
print(df.iloc[:,-3:])

Answers for Activity: Describing specific data

import pandas as pd
df=pd.read_csv("Retail dataset.csv")

#1. Output information about the object and bool columns within the dataset (Bonus: what happens if we ask for data types that don't exist within the dataset?)
print(df.describe(include=['object','bool']))
#2. The order_fulfilled column is bool; can we get the number of Trues and Falses?
print(df['order_fufilled'].value_counts())
#3. If 2. is possible, can you normalize this data?
print(df['order_fufilled'].value_counts(normalize=True))
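The answer code above does not cover the bonus part of question 1. In recent pandas versions, asking describe() for a data type that no column has (for example include=['datetime'] on a data frame without dates) typically raises a ValueError rather than returning an empty table; the exact message varies between versions. One defensive pattern, sketched below, is to check with select_dtypes() first. The helper name describe_types and the example data are hypothetical.

```python
import pandas as pd

def describe_types(df, dtypes):
    """Hypothetical helper: describe only the columns whose dtype is in `dtypes`.

    Returns None instead of raising when no column matches.
    """
    subset = df.select_dtypes(include=dtypes)
    if subset.shape[1] == 0:  # no columns of the requested types
        return None
    # include="all" so object and bool columns are summarised too
    return subset.describe(include="all")

# Made-up example frame: one object column and one bool column, no datetimes
df = pd.DataFrame({
    "store": ["north", "south", "north"],
    "paid": [True, False, True],
})

present = describe_types(df, ["object", "bool"])
print(present)

missing = describe_types(df, ["datetime"])
print(missing)  # there are no datetime columns, so the helper returns None
```

Returning None keeps the check explicit; an alternative is to wrap df.describe() in a try/except ValueError block.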
