There may be situations where we must combine different datasets to perform other analysis, Pandas allows us to do this in three ways: concat()
, join()
, and merge()
(In ascending order of complexity).
The most common and simplest method is concat([DATAFRAME1,DATAFRAME2])
. This function puts together a list of elements along an axis, especially useful when the columns in the dataframe are the same.
join()
lets you combine different DataFrame objects which have an index in common. More useful for cases where two dataframes only have one column in common and different other headers.
Example:
DATAFRAME1.join(DATFRAME2, lsuffix='STRING1', rsuffix='STRING2')
Most of what merge()
does can be done by join()
and at a much lower complexity, you may read the full documentation if you're interested.
Let's move away from our retail dataset for awhile and use the Canada and UK Dataset to see if we can successfully perform our combination operations on these data sets.
1. Using concat()
, combine the 2 dataset, assuming all headers are the same
2. Assuming only the column category_id is the same, combine the 2 datasets now. (Hint: Are there extra parameters to take into account if the two dataframes have the same column name but are not to be classified together under one column?)
Answers for Activity: Combining Data
import numpy as np import pandas as pd #1. Using concat(), combine the 2 dataset, assuming all headers are the same df1=pd.read_csv("Canadian Vids Data.csv") df2=pd.read_csv("UK Vids Data.csv") print(pd.concat([df1,df2])) #2. Assuming only the columns title and trending_date are the same, combine the 2 datasets now. df1=pd.read_csv("Canadian Vids Data.csv") df2=pd.read_csv("UK Vids Data.csv") left=df1.set_index(['category_id']) right=df2.set_index(['category_id']) print(left.join(right,lsuffix='_CANADA',rsuffix='_UK').columns)
You are expected to comply with University policies and guidelines namely, Appropriate Use of Information Resources Policy, IT Usage Policy and Social Media Policy. Users will be personally liable for any infringement of Copyright and Licensing laws. Unless otherwise stated, all guide content is licensed by CC BY-NC 4.0.