There may be situations where we must combine different datasets to perform other analysis, Pandas allows us to do this in three ways:
merge() (In ascending order of complexity).
The most common and simplest method is
concat([DATAFRAME1,DATAFRAME2]). This function puts together a list of elements along an axis, especially useful when the columns in the dataframe are the same.
join() lets you combine different DataFrame objects which have an index in common. More useful for cases where two dataframes only have one column in common and different other headers.
DATAFRAME1.join(DATFRAME2, lsuffix='STRING1', rsuffix='STRING2')
Most of what
merge() does can be done by
join() and at a much lower complexity, you may read the full documentation if you're interested.
Let's move away from our retail dataset for awhile and use the Canada and UK Dataset to see if we can successfully perform our combination operations on these data sets.
concat(), combine the 2 dataset, assuming all headers are the same
2. Assuming only the column category_id is the same, combine the 2 datasets now. (Hint: Are there extra parameters to take into account if the two dataframes have the same column name but are not to be classified together under one column?)
Answers for Activity: Combining Data
import numpy as np import pandas as pd #1. Using concat(), combine the 2 dataset, assuming all headers are the same df1=pd.read_csv("Canadian Vids Data.csv") df2=pd.read_csv("UK Vids Data.csv") print(pd.concat([df1,df2])) #2. Assuming only the columns title and trending_date are the same, combine the 2 datasets now. df1=pd.read_csv("Canadian Vids Data.csv") df2=pd.read_csv("UK Vids Data.csv") left=df1.set_index(['category_id']) right=df2.set_index(['category_id']) print(left.join(right,lsuffix='_CANADA',rsuffix='_UK').columns)