Python for Basic Data Analysis

Start your data science journey with Python. Learn practical Python programming skills for basic data manipulation and analysis.

Finding data

Similar to lists in Python, we can use indexing to find specific rows and columns of the data frame. Pandas accesses rows and columns by their integer position through this syntax:

df.iloc[row,column]
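As a minimal sketch of how this syntax behaves, the snippet below builds a small made-up data frame (the name `toy` and its columns are assumptions for illustration only, not part of the retail dataset) and reads from it by position. As with lists, positions start at 0 and slices work the same way.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate positional indexing
toy = pd.DataFrame({
    "item": ["pen", "book", "lamp", "mug"],
    "price": [1.5, 12.0, 30.0, 8.0],
    "in_stock": [True, False, True, True],
})

first_row = toy.iloc[0]        # a single row, returned as a Series
cell = toy.iloc[1, 0]          # row position 1, column position 0
first_two_rows = toy.iloc[:2]  # slice of rows at positions 0 and 1
price_col = toy.iloc[:, 1]     # every row of the column at position 1

print(first_row)
print(cell)
print(first_two_rows)
print(price_col)
```

Leaving out the column part, as in `toy.iloc[0]`, selects whole rows; a bare `:` in the row part selects every row.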

Activity: Finding data

See if you can get all these.

1. Output the third row of the data frame

2. Output the last row of the data frame

3. Output the last 3 columns of the data frame

Describing specific data

We can build on the basic functions above to dig deeper into our dataset. To get summary information about only the data types we are interested in, we can use:

df.describe(include=['OBJECT_TYPE1','OBJECT_TYPE2'])
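The sketch below shows what replacing the placeholders with real data type names might look like. It uses a small made-up data frame (`toy` and its columns are assumptions for illustration), and the type names `"number"` and `"bool"` are just two of the selectors pandas accepts.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate describe(include=...)
toy = pd.DataFrame({
    "item": ["pen", "book", "lamp", "mug"],
    "price": [1.5, 12.0, 30.0, 8.0],
    "quantity": [10, 3, 1, 6],
    "in_stock": [True, False, True, True],
})

# Summarise only the numeric columns (count, mean, std, min, quartiles, max)
numeric_summary = toy.describe(include=["number"])

# Summarise only the bool columns (count, unique, top, freq)
bool_summary = toy.describe(include=["bool"])

print(numeric_summary)
print(bool_summary)
```

Numeric columns get statistics such as the mean and quartiles, while non-numeric columns get the count, the number of unique values, the most common value and its frequency.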

Furthermore, we can obtain the frequency of certain values using:

df.value_counts(normalize=False)
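In practice this is often applied to a single column, which counts how often each value appears in it. The following is a rough sketch on a made-up data frame (`toy` and the `colour` column are assumptions for illustration), showing both raw counts and proportions.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate value_counts
toy = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", "red", "blue"],
})

# Raw counts of each distinct value in the column
counts = toy["colour"].value_counts()

# Proportions instead of counts: each count divided by the total
shares = toy["colour"].value_counts(normalize=True)

print(counts)
print(shares)
```

With `normalize=True` the results sum to 1, which makes it easier to compare columns of different lengths.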

Activity: Describing specific data

Try obtaining these outputs!

1. Output information about the object and bool data within the dataset (Bonus: What happens if we request data types that don't exist within the dataset?)

2. The order_fulfilled data type is bool, can we get information about the number of Trues and Falses?

3. If 2. is possible, can you normalize this data?

Answers for Activity: Finding Data

import pandas as pd
df = pd.read_csv("Retail dataset.csv")

#1. Output the third row of the data frame
print(df.iloc[2])
#2. Output the last row of the data frame
print(df.iloc[-1])
#3. Output the last 3 columns of the data frame
print(df.iloc[:,-3:])

Answers for Activity: Describing specific data

import pandas as pd
df = pd.read_csv("Retail dataset.csv")

#1. Output information about the object and bool data types data within the dataset (Bonus: What if we try returning data types that don't exist within the dataset)
print(df.describe(include=['object','bool']))
#2. The order_fulfilled data type is bool, can we get information about the number of Trues and Falses?
print(df['order_fufilled'].value_counts())
#3. If 2. is possible, can you normalize this data?
print(df['order_fufilled'].value_counts(normalize=True))

Further Readings