Start your data science journey with Python. Learn practical Python programming skills for basic data manipulation and analysis.

There may be situations where we must combine different datasets to perform further analysis. Pandas allows us to do this in three ways: `concat()`, `join()`, and `merge()` (in ascending order of complexity).

The most common and simplest method is `concat([DATAFRAME1, DATAFRAME2])`. This function puts together a list of DataFrames along an axis (rows by default), which is especially useful when the DataFrames have the same columns.
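As a minimal sketch of `concat()`, the example below stacks two small DataFrames that share the same columns. The frame names and values are made up for illustration and are not taken from the datasets used in this guide.

```python
import pandas as pd

# Two small made-up DataFrames with identical columns (illustrative data only).
df_a = pd.DataFrame({"video_id": ["a1", "a2"], "views": [100, 250]})
df_b = pd.DataFrame({"video_id": ["b1", "b2"], "views": [75, 300]})

# The default axis=0 stacks the rows; ignore_index=True renumbers the index 0..n-1.
stacked = pd.concat([df_a, df_b], ignore_index=True)
print(stacked)

# axis=1 places the frames side by side instead, aligning rows on the index.
side_by_side = pd.concat([df_a, df_b], axis=1)
print(side_by_side.shape)
```

Without `ignore_index=True`, each frame keeps its original index labels, so the combined result would contain repeated labels.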

`join()` lets you combine different DataFrame objects that have an index in common. It is more useful when the two DataFrames share only one column (set as the index) and otherwise have different headers.

**Example**:

    DATAFRAME1.join(DATAFRAME2, lsuffix='STRING1', rsuffix='STRING2')
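To make the placeholders concrete, here is a small sketch of `join()` with suffixes. The two frames, their index labels, and their values are invented for illustration. If two frames share a column name and no suffix is given, `join()` raises a `ValueError`, which is why `lsuffix` and `rsuffix` are supplied here.

```python
import pandas as pd

# Made-up frames that share index labels and one column name ("views").
canada = pd.DataFrame({"views": [100, 250]}, index=["music", "sports"])
uk = pd.DataFrame({"views": [80, 40], "likes": [9, 3]}, index=["music", "news"])

# join() aligns rows on the index (a left join by default); the suffixes
# disambiguate the overlapping "views" column.
joined = canada.join(uk, lsuffix="_CANADA", rsuffix="_UK")
print(joined)
```

Because the default is a left join, every row of `canada` is kept. The `sports` row has no match in `uk`, so its UK columns are filled with `NaN`.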

Most of what `merge()` does can also be done with `join()`, and with much less complexity. You may read the full documentation if you're interested.
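For comparison, the sketch below performs the same inner combination once with `merge()` on a column and once with `join()` on the index. The frames and values are made up, and the column name `category_id` is borrowed from the activity further down purely for illustration.

```python
import pandas as pd

# Made-up frames sharing a key column and one other column name.
canada = pd.DataFrame({"category_id": [10, 24], "views": [100, 250]})
uk = pd.DataFrame({"category_id": [10, 25], "views": [80, 40]})

# merge() matches rows on a column rather than the index; how="inner"
# (the default) keeps only keys present in both frames.
merged = pd.merge(canada, uk, on="category_id", suffixes=("_CANADA", "_UK"))
print(merged)

# The same result with join(): move the key into the index first.
via_join = canada.set_index("category_id").join(
    uk.set_index("category_id"), lsuffix="_CANADA", rsuffix="_UK", how="inner"
)
print(via_join.reset_index())
```

Only `category_id` 10 appears in both frames, so both approaches return a single row.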

Let's move away from our retail dataset for a while and use the Canada and UK datasets to see if we can successfully perform our combination operations on them.

1. Using `concat()`, combine the 2 datasets, assuming all headers are the same.

2. Assuming only the column `category_id` is the same, combine the 2 datasets now. (Hint: Are there extra parameters to take into account if the two DataFrames have columns with the same name that should not be combined into one column?)

**Answers for Activity: Combining Data**

    import numpy as np
    import pandas as pd

    # 1. Using concat(), combine the 2 datasets, assuming all headers are the same
    df1 = pd.read_csv("Canadian Vids Data.csv")
    df2 = pd.read_csv("UK Vids Data.csv")
    print(pd.concat([df1, df2]))

    # 2. Assuming only the column category_id is the same, combine the 2 datasets now.
    df1 = pd.read_csv("Canadian Vids Data.csv")
    df2 = pd.read_csv("UK Vids Data.csv")
    # Use category_id as the index so that join() can align on it
    left = df1.set_index(['category_id'])
    right = df2.set_index(['category_id'])
    # The suffixes distinguish columns that share a name in both datasets
    print(left.join(right, lsuffix='_CANADA', rsuffix='_UK').columns)

- Last Updated: Feb 6, 2024 10:02 AM
- URL: https://libguides.ntu.edu.sg/python

Unless otherwise stated, all guide content is licensed by CC BY-NC 4.0.