
Python for Basic Data Analysis: PD.2 Basics of Pandas

Get started on your learning journey towards data science using Python. Equip yourself with practical skills in Python programming for the purpose of basic data manipulation and analysis.

Basic Syntax and Functions of pandas

Now that we have imported our data, we can use pandas' basic syntax and functions to get a basic statistical overview of our dataset.

1. df.head() - Shows a short preview of your data (the first five rows by default), so you can see the column headers and the kind of data in the file
2. df.shape - Returns the number of rows and columns in your dataset as a tuple (no. of rows, no. of columns)
3. df.columns - Returns the column headers as an Index object; the dtype it shows is that of the header labels, not of the data in the columns
4. df.info() - Prints a summary of your dataset: the number of entries, each column's name, non-null count and datatype, and the memory usage
5. df.describe() - Returns summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for the numeric columns of your dataset
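
If you do not have a dataset loaded yet, the five calls above can be tried on a small table built by hand. The sketch below is only an illustration: the column names and values are made up and are not taken from the dataset used in this guide.

```python
import pandas as pd

# Hypothetical toy data standing in for the imported dataset
df = pd.DataFrame({
    "product": ["pen", "book", "bag", "lamp", "mug", "desk"],
    "units": [10, 3, 5, 2, 8, 1],
    "price": [1.5, 12.0, 25.0, 18.5, 6.0, 120.0],
})

preview = df.head()        # first five rows by default
dims = df.shape            # tuple: (no. of rows, no. of columns)
headers = df.columns       # Index object holding the column labels
summary = df.describe()    # summary statistics for the numeric columns

print(preview)
print(dims)                # (6, 3)
print(list(headers))       # ['product', 'units', 'price']
df.info()                  # prints its report directly
print(summary)
```

Note that the summary from df.describe() covers only "units" and "price" here, because "product" holds text rather than numbers.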

Go ahead and try these!

Note: We call these functions on df because that is the variable we assigned our dataset to. Any other variable name would work just as well.

Activity: Basic Functions

Using the functions you have just learnt, can you perform these actions in Python using pandas?

1. Output the first few rows of data

2. Output data dimensions (i.e. the number of rows and columns of the dataset)

3. Output the column header names

4. Obtain detailed information about the dataset

5. Output statistical information about the dataset

import pandas as pd

df = pd.read_csv("Retail dataset.csv")

#1. Output the first few rows of data
print(df.head())

#2. Output data dimensions (i.e. the number of rows and columns of the dataset)
print(df.shape)

#3. Output the column header names
print(df.columns)

#4. Obtain detailed information about the dataset
# df.info() prints its report itself and returns None, so it is not wrapped in print()
df.info()

#5. Output statistical information about the dataset
print(df.describe())
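
A side note on step 4: unlike the other four, df.info() writes its report straight to the screen and returns None, so wrapping it in print() adds an extra "None" line after the report. The sketch below shows this on a small made-up table (the data is hypothetical) and, as one possible approach, captures the report as text through the buf parameter of df.info().

```python
import io

import pandas as pd

# Hypothetical toy data, only for illustration
df = pd.DataFrame({"units": [10, 3, 5], "price": [1.5, 12.0, 25.0]})

result = df.info()         # the report is printed by this call
print(result)              # None

# To keep the report as a string, send it to an in-memory buffer instead
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
```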
