Start your data science journey with Python. Learn practical Python programming skills for basic data manipulation and analysis.

Before we can analyse data, it must first be structured in a form that we can manipulate and perform operations on. A common way to do this in Python is with the pandas module.

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Over the following pages, we will equip you with a basic understanding of pandas and its applications for data analysis, along with other modules that are often used in conjunction with pandas.

Note: We will be using Python 3 syntax.

The first and foremost function of pandas is to assist with data structures. We will also be using the following modules to assist with data visualisation, so do import them when trying out the other activities. Note that `%matplotlib inline` is a Jupyter notebook magic command, so it only works inside a notebook.

%matplotlib inline

import numpy as np

import pandas as pd

Selecting Data source

To better understand how pandas works with datasets, we've prepared large data sets for you to manipulate and test.

Select from either Retail/BNF Industries dataset

+ Retail Industry dataset : Retail dataset.csv

You may have noticed that the two files are in different formats (i.e. CSV and XLSX). That is the beauty of pandas: it can read both CSV files and Excel workbooks!

To read the dataset with pandas, use the syntax below and assign the result to a variable.

pd.read_csv("FILE_NAME_HERE.FORMAT")
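As a minimal sketch of this call, the example below reads a small CSV and inspects the result. The inline sample data and its column names (`store`, `units`, `price`) are made up for illustration and are not the contents of the Retail dataset; `io.StringIO` stands in for a file on disk so the example runs without downloading anything.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
sample_csv = io.StringIO(
    "store,units,price\n"
    "A,10,2.50\n"
    "B,4,3.00\n"
    "C,7,1.75\n"
)

# read_csv accepts a file path or a file-like object.
df_sample = pd.read_csv(sample_csv)

print(df_sample.head())   # first rows of the table
print(df_sample.shape)    # (number of rows, number of columns)
print(df_sample.dtypes)   # column types inferred by pandas
```

Checking `head()`, `shape` and `dtypes` right after reading is a quick way to confirm that the file was parsed the way you expected.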

Go ahead and try assigning it to the variable df below!

"""Read the data file of choice"""
df = pd.read_csv("Retail dataset.csv")

In this case we are reading a CSV file. When we need to work with Excel (xlsx) files instead, we can simply replace `pd.read_csv` with `pd.read_excel`.
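One possible way to handle both formats is to choose the reader from the file extension. The sketch below assumes a hypothetical helper named `load_dataset`, which is not part of pandas. Reading xlsx files with `pd.read_excel` also requires an Excel engine such as openpyxl to be installed.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Hypothetical helper: pick a pandas reader from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in (".xlsx", ".xls"):
        # Needs an Excel engine (for example openpyxl) to be installed.
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file format: {suffix}")
```

With this helper, `df = load_dataset("Retail dataset.csv")` would behave the same as calling `pd.read_csv` directly, while a path ending in `.xlsx` would be passed to `pd.read_excel`.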

- Retail Dataset: Dataset to be used

- Last Updated: Jun 24, 2024 9:14 AM
- URL: https://libguides.ntu.edu.sg/python

You are expected to comply with University policies and guidelines namely, Appropriate Use of Information Resources Policy, IT Usage Policy and Social Media Policy.
Users will be personally liable for any infringement of Copyright and Licensing laws.

Unless otherwise stated, all guide content is licensed by CC BY-NC 4.0.