
Python for Basic Data Analysis: PD.5 Manipulating Data

Get started on your learning journey towards data science using Python. Equip yourself with practical skills in Python programming for the purpose of basic data manipulation and analysis.

Manipulating the dataset

With pandas, we can change certain features of our dataset from their original form.
Here are some of the ways we can do that:
1. Changing datatypes
2. Sorting the data

Changing datatypes
If some of our data are not in the desired datatype, we can simply reassign the column as shown below.
We call the basic pandas function df.info() before and after the conversion to see the change.

# We can change data types using astype
df.info()
df['net_sales'] = df['net_sales'].astype('float64')
df.info()
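As a self-contained sketch of the conversion above, the snippet below builds a small made-up table and checks the column's dtype before and after astype. The values are hypothetical; only the column names net_sales and net_quantity come from this guide.

```python
import pandas as pd

# Hypothetical sample data; the real dataset is not reproduced here.
df = pd.DataFrame({
    "net_quantity": [3, 1, 2],
    "net_sales": [30, 12, 18],  # whole numbers, so pandas infers an integer dtype
})

before = df["net_sales"].dtype

# astype returns a new Series; assigning it back replaces the column.
df["net_sales"] = df["net_sales"].astype("float64")
after = df["net_sales"].dtype

print(before, "->", after)
```

Note that astype does not modify the column in place, which is why the result is assigned back to df['net_sales'].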

 

Sorting the data
We can also reorder the rows of the dataset by the values in one or more columns.

# ascending=False sorts in descending order
df.sort_values(by='net_sales', ascending=False).head()

# We can sort by multiple columns, each with its own direction
df.sort_values(by=['order_fufilled', 'net_sales'], ascending=[False, True]).head()
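To make the multi-column sort easier to follow, here is a minimal sketch on a four-row table. The values are invented for illustration; the column names follow the examples in this guide.

```python
import pandas as pd

# Hypothetical sample data; column names follow the examples in this guide.
df = pd.DataFrame({
    "order_fufilled": [1, 0, 1, 0],
    "net_sales": [50.0, 20.0, 10.0, 40.0],
})

# Fulfilled orders first (descending), then smallest net_sales first within each group.
sorted_df = df.sort_values(by=["order_fufilled", "net_sales"], ascending=[False, True])

# sort_values returns a new DataFrame; df itself keeps its original order.
print(sorted_df)
```

The original row labels travel with the rows, so the index of sorted_df shows where each row came from.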

 

Calling specific attributes of the dataset

We can also obtain specific statistics using common pandas syntax, and retrieve subsets of the data with slicing methods similar to those used for a list. Try out these examples and take a look at the output.

1. Obtaining mean of net sales

df['net_sales'].mean()

2. Obtaining the mean of every numeric column for fulfilled orders only (i.e. order_fufilled == 1)

df[df['order_fufilled'] == 1].mean(numeric_only=True)

3. Output the mean cost of fulfilled orders only

df[df['order_fufilled'] == 1]['cost_of_sales'].mean()

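4. Acquiring the maximum net sales of orders that weren't fulfilled and were placed before 1/1/2019

# Assumes df['date'] holds datetimes (e.g. via pd.to_datetime); plain strings would be compared alphabetically
df[(df['order_fufilled'] == 0) & (df['date'] < '2019-01-01')]['net_sales'].max()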

5. Slicing rows labelled 0 to 20 (both ends included) for columns net_sales to net_quantity

df.loc[0:20, 'net_sales':'net_quantity']

6. Slicing rows 0 to 4 and columns 0 to 2 by integer position

df.iloc[0:5, 0:3]

7. Selecting the last row, with all columns, of the dataset

df[-1:]
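The filtering and slicing examples above can be tried end to end with the sketch below. It uses a tiny made-up table (the values are hypothetical, the column names follow this guide) and assumes the date column has been converted with pd.to_datetime.

```python
import pandas as pd

# Hypothetical sample data; column names follow the examples in this guide.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-11-05", "2018-12-20", "2019-02-01", "2019-03-15"]),
    "order_fufilled": [0, 0, 0, 1],
    "net_sales": [25.0, 40.0, 90.0, 60.0],
    "cost_of_sales": [10.0, 15.0, 30.0, 20.0],
})

# Combine conditions with & and wrap each one in parentheses.
mask = (df["order_fufilled"] == 0) & (df["date"] < pd.Timestamp("2019-01-01"))
max_sales = df.loc[mask, "net_sales"].max()

# loc slices by label and includes the end label.
by_label = df.loc[0:2, "order_fufilled":"net_sales"]

# iloc slices by position and excludes the end position.
by_position = df.iloc[0:2, 0:2]

# A negative start counts from the end, so this keeps only the last row.
last_row = df.iloc[-1:]

print(max_sales)
print(by_label.shape, by_position.shape)
```

The difference in shapes shows the main thing to remember: loc[0:2] returns three rows, while iloc[0:2] returns two.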

 
