

With pandas, we can change certain features of our dataset from their original form.

Here are two of the ways we can do that:

1. Changing datatypes
2. Sorting the data

**Changing datatypes**

If a column is not stored in the datatype we want, we can convert it with `astype()` and assign the result back to the column, as shown below.

We call the basic pandas method `df.info()` before and after the conversion to check the change.

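```python
# df is the sales DataFrame loaded in the earlier sections of this guide
df.info()  # info() prints its summary itself and returns None, so print() is not needed

# astype() returns a converted copy, so assign it back to the same column
df['net_sales'] = df['net_sales'].astype('float64')

df.info()  # net_sales should now be listed as float64
```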
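To see `astype()` at work without the guide's dataset, here is a minimal sketch on a small made-up DataFrame. The column names mirror the guide, but the values and starting dtypes are invented for illustration. One column starts as integers and one as text, and both are converted.

```python
import pandas as pd

# Made-up data for illustration only; column names mirror the guide's dataset
df = pd.DataFrame({
    "net_sales": [120, 85, 240],      # integers (int64)
    "net_quantity": ["3", "2", "6"],  # numbers stored as text (object)
})

print(df.dtypes)

# astype() returns a converted copy, so the result is assigned back to each column
df["net_sales"] = df["net_sales"].astype("float64")
df["net_quantity"] = df["net_quantity"].astype("int64")

print(df.dtypes)
```

Once `net_quantity` is numeric, calculations such as `df["net_quantity"].sum()` add the numbers instead of joining text.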

**Sorting the data**

We can also reorder the rows of the dataset by the values in one or more columns using `sort_values()`.

```python
# ascending=False sorts in descending order (largest net_sales first)
df.sort_values(by='net_sales', ascending=False).head()

# We can also sort by several specific columns, each with its own direction
df.sort_values(by=['order_fufilled', 'net_sales'], ascending=[False, True]).head()
```
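The multi-column sort is easier to follow on a tiny example. This sketch uses four made-up orders. The `order_fufilled` spelling follows the guide's code. Fulfilled orders (1) come first, and within each group `net_sales` runs from smallest to largest.

```python
import pandas as pd

# Hypothetical orders; the values are invented for illustration
df = pd.DataFrame({
    "order_fufilled": [1, 0, 1, 0],
    "net_sales": [50.0, 200.0, 10.0, 75.0],
})

# One column, largest net_sales first
by_sales = df.sort_values(by="net_sales", ascending=False)
print(by_sales)

# First key descending (1 before 0), then net_sales ascending within each group
by_status_then_sales = df.sort_values(
    by=["order_fufilled", "net_sales"], ascending=[False, True]
)
print(by_status_then_sales)
```

`sort_values()` returns a new DataFrame and leaves `df` in its original order. To keep the sorted version, assign the result to a variable.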

We can also compute specific statistics with common pandas methods and retrieve subsets of the data with slicing, much as we would with a list. Try out these examples and take a look at the output.

**1. Obtaining the mean of net sales**

```python
df['net_sales'].mean()
```

**2. Obtaining statistics of fulfilled orders only (i.e. rows where `order_fufilled` is 1)**

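```python
# numeric_only=True averages only the numeric columns; pandas 2.x raises an error on text columns otherwise
df[df['order_fufilled'] == 1].mean(numeric_only=True)
```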

**3. Obtaining the mean cost of sales of fulfilled orders only**

```python
df[df['order_fufilled'] == 1]['cost_of_sales'].mean()
```
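Examples 2 and 3 both filter with a boolean mask. The comparison produces a True/False value for every row, and indexing with it keeps only the True rows. This sketch runs on made-up data that includes a text `date` column, to show why `numeric_only=True` helps. The `.loc[mask, column]` form is shown as one common alternative to chained indexing, not as the guide's own method.

```python
import pandas as pd

# Made-up data for illustration; "date" is a non-numeric (text) column
df = pd.DataFrame({
    "order_fufilled": [1, 0, 1, 1],
    "net_sales": [100.0, 40.0, 60.0, 20.0],
    "cost_of_sales": [70.0, 30.0, 45.0, 15.0],
    "date": ["1/5/2019", "2/7/2018", "3/9/2019", "4/2/2018"],
})

# The comparison yields a boolean Series: True where the order was fulfilled
is_fulfilled = df["order_fufilled"] == 1
fulfilled = df[is_fulfilled]  # keeps only the rows where the mask is True

# numeric_only=True skips the text "date" column instead of raising an error
fulfilled_means = fulfilled.mean(numeric_only=True)
print(fulfilled_means)

# .loc with the mask and a column label selects the same values as chained indexing
mean_cost = df.loc[is_fulfilled, "cost_of_sales"].mean()
print(mean_cost)
```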

**4. Finding the maximum net sales of orders that were not fulfilled and were placed before 1/1/2019**

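```python
import pandas as pd

# Each condition needs its own parentheses; pd.to_datetime makes the date test chronological, not textual
df[(df['order_fufilled'] == 0) & (pd.to_datetime(df['date']) < pd.Timestamp('2019-01-01'))]['net_sales'].max()
```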
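Parsing matters when a date column is stored as text. Comparing strings goes character by character, so `"12/31/2018"` is not "before" `"1/1/2019"`. The sketch below assumes dates written as month/day/year text, which may not match the guide's dataset, and uses invented values. It compares both approaches.

```python
import pandas as pd

# Hypothetical dates stored as text in month/day/year form
dates = pd.Series(["12/31/2018", "1/15/2019", "2/1/2018"])

# Text comparison: character by character, not by calendar order
as_text = dates < "1/1/2019"

# After parsing with an explicit format, the comparison follows the calendar
as_datetime = pd.to_datetime(dates, format="%m/%d/%Y") < pd.Timestamp("2019-01-01")

print(as_text.tolist())
print(as_datetime.tolist())
```

The text comparison rejects all three dates, although two of them fall in 2018. Passing `format=` to `pd.to_datetime` also removes any doubt about whether the first number is the month or the day.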

**5. Selecting rows labelled 0 to 20 and the columns from net_sales to net_quantity**

```python
# loc selects by label, and both end points (row 20 and net_quantity) are included
df.loc[0:20, 'net_sales':'net_quantity']
```

**6. Selecting rows 0 to 4 and columns 0 to 2 by position**

```python
# iloc selects by position, and the end points are excluded, just like list slicing
df.iloc[0:5, 0:3]
```
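The end-point rules for `loc` and `iloc` differ, which is easy to miss in examples 5 and 6. This sketch uses a made-up six-row DataFrame with the default 0 to 5 index and applies the same slice bounds with each method.

```python
import pandas as pd

# Made-up 6-row DataFrame; the default index labels are 0..5
df = pd.DataFrame({
    "net_sales": [10, 20, 30, 40, 50, 60],
    "cost_of_sales": [5, 8, 12, 20, 22, 30],
    "net_quantity": [1, 2, 3, 4, 5, 6],
    "order_fufilled": [1, 0, 1, 1, 0, 1],
})

# loc slices by label and includes the end label on both axes
by_label = df.loc[0:2, "net_sales":"net_quantity"]

# iloc slices by position and excludes the end position, like a Python list
by_position = df.iloc[0:2, 0:3]

print(by_label.shape)     # 3 rows (labels 0, 1, 2) and 3 columns
print(by_position.shape)  # 2 rows (positions 0, 1) and 3 columns
```

With the default index, labels and positions coincide. The row counts still differ only because `loc` includes its end point.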

**7. Selecting the last row and all columns of the dataset**

```python
# A slice keeps the result as a one-row DataFrame; df[-1:] returns the same row
df.iloc[-1:]
```
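Example 7 uses a slice, so the result stays a one-row DataFrame. Indexing a single position instead returns a Series labelled by column name. This sketch uses made-up data to show both forms. It also shows `tail(1)`, another standard pandas way to get the last row as a DataFrame.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"net_sales": [10.0, 20.0, 30.0], "net_quantity": [1, 2, 3]})

last_row_frame = df.iloc[-1:]   # slice: a one-row DataFrame
last_row_series = df.iloc[-1]   # single position: a Series indexed by column name
also_last = df.tail(1)          # tail(1) also returns the last row as a DataFrame

print(type(last_row_frame).__name__)
print(type(last_row_series).__name__)
```

The DataFrame form is convenient when the row will be combined with other tables. The Series form lets you read a single value directly, for example `last_row_series["net_sales"]`.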

- Last Updated: Feb 6, 2024 10:02 AM
- URL: https://libguides.ntu.edu.sg/python


Unless otherwise stated, all guide content is licensed by CC BY-NC 4.0.