Now that we have our data imported, we can use some of pandas' basic syntax and functions to get a first statistical overview of our dataset.
1. df.head()
- Shows a short preview of your data (the first five rows by default), so you can get an idea of the column headers and the type of data in the file.
2. df.shape
- Returns the number of rows and columns in your dataset as a tuple: (no. of rows, no. of columns).
3. df.columns
- Returns the column headers, along with the datatype of the header labels.
4. df.info()
- Prints detailed information about your dataset, such as each column's name, non-null count and datatype.
5. df.describe()
- Returns detailed statistical information about your dataset, such as the count, mean, standard deviation, minimum and maximum of each numeric column.
Go ahead and try these!
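If you do not have the dataset to hand, here is a minimal sketch of the same calls on a small hand-built DataFrame. The column names and values are made up purely for illustration and are not taken from the retail dataset.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are invented for illustration
df = pd.DataFrame({
    "product": ["pen", "notebook", "stapler", "eraser"],
    "units_sold": [120, 45, 8, 60],
    "unit_price": [1.5, 3.0, 7.25, 0.5],
})

preview = df.head(2)      # first 2 rows; head() returns 5 rows if no number is given
dimensions = df.shape     # tuple of (rows, columns)
headers = df.columns      # Index object holding the column labels
summary = df.describe()   # summary statistics for the numeric columns only

print(preview)
print(dimensions)
print(headers)
df.info()                 # prints its report directly and returns None
print(summary)
```

Note that df.shape and df.columns are attributes, so they are written without brackets, while head(), info() and describe() are methods and need them.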
Note: Keep in mind that we call these functions on df because that is the variable we assigned our dataset to. Any other variable name would work just as well.
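As a quick illustration of that note, the sketch below stores a DataFrame under a different name. The name retail and the sample values are hypothetical, chosen only to show that nothing depends on the name df.

```python
import pandas as pd

# Hypothetical data, stored under a variable name other than df
retail = pd.DataFrame({"store": ["A", "B", "C"], "sales": [100, 250, 175]})

# The same calls work on whatever name the DataFrame was assigned to
print(retail.head())
print(retail.shape)
```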
Using the functions you have just learnt, can you perform these actions in Python with pandas?
1. Output the first few rows of data
2. Output data dimensions (i.e. the number of rows and columns of the dataset)
3. Output the column header names
4. Obtain detailed information about the dataset
5. Output statistical information about the dataset
import pandas as pd

df = pd.read_csv("Retail dataset.csv")

# 1. Output the first few rows of data
print(df.head())

# 2. Output data dimensions (i.e. the number of rows and columns of the dataset)
print(df.shape)

# 3. Output the column header names
print(df.columns)

# 4. Obtain detailed information about the dataset
# (df.info() prints its own report and returns None, so it is not wrapped in print)
df.info()

# 5. Output statistical information about the dataset
print(df.describe())
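The results of these calls can also be stored and reused rather than only printed. The following is a small sketch under the assumption of a made-up DataFrame. The column names item and qty are hypothetical and do not come from the retail dataset.

```python
import pandas as pd

# Hypothetical data; column names are invented for illustration
df = pd.DataFrame({"item": ["a", "b", "c", "d"], "qty": [2, 4, 6, 8]})

# df.shape is a tuple, so it can be unpacked into two variables
n_rows, n_cols = df.shape

# df.columns can be converted to a plain Python list
column_names = list(df.columns)

# df.describe() returns a DataFrame, so single statistics can be looked up by label
mean_qty = df.describe().loc["mean", "qty"]

print(f"{n_rows} rows and {n_cols} columns")
print(column_names)
print(mean_qty)
```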
Unless otherwise stated, all guide content is licensed under CC BY-NC 4.0.