Now that our data is imported, we can use a few basic pandas attributes and methods to get a quick statistical overview of the dataset.
df.head() - Shows a short preview of your data (the first five rows by default) so you can see the column headers and the kind of data in the file
df.shape - Returns the number of rows and columns in your dataset as a tuple (no. of rows, no. of columns)
df.columns - Returns the column headers as an Index object (the dtype it displays describes the header labels themselves, not the data in each column)
df.info() - Prints a summary of your dataset: column names, non-null counts, data types and memory usage
df.describe() - Returns summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for the numeric columns of your dataset
Go ahead and try these!
Note: We call these on df because that is the variable we assigned our dataset to. Any other variable name would work just as well.
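To make the return values concrete, here is a minimal sketch on a small made-up DataFrame. The variable name sales and the columns product, price and quantity are hypothetical and are not taken from the retail dataset; they only illustrate that the same calls work on any DataFrame, whatever it is named.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
sales = pd.DataFrame({
    "product": ["pen", "book", "bag"],
    "price": [1.5, 12.0, 30.0],
    "quantity": [10, 3, 1],
})

preview = sales.head(2)         # first 2 rows (the default is 5)
n_rows, n_cols = sales.shape    # shape is a tuple: (rows, columns)
headers = list(sales.columns)   # column labels converted to a plain list
stats = sales.describe()        # summary statistics for the numeric columns

print(preview)
print(n_rows, n_cols)
print(headers)
print(stats)
sales.info()                    # prints its summary directly
```

Note that shape and columns are attributes, so they take no parentheses, while head(), info() and describe() are methods and need them.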
Given the newly learnt functions, can you perform these actions in Python using pandas?
1. Output the first few rows of data
2. Output data dimensions (i.e. the number of rows and columns of the dataset)
3. Output the column header names
4. Obtain detailed information about the dataset
5. Output statistical information about the dataset
import pandas as pd

# Load the dataset into a DataFrame
df = pd.read_csv("Retail dataset.csv")

# 1. Output the first few rows of data
print(df.head())

# 2. Output data dimensions (i.e. the number of rows and columns of the dataset)
print(df.shape)

# 3. Output the column header names
print(df.columns)

# 4. Obtain detailed information about the dataset
# info() prints its summary itself and returns None, so no print() is needed
df.info()

# 5. Output statistical information about the dataset
print(df.describe())
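One detail worth knowing about info(): it prints its summary itself and returns None, so wrapping it in print() adds a stray "None" line to the output. If you want the summary as text instead, one option is to pass a buffer through the buf parameter. The sketch below assumes a small hypothetical DataFrame named toy rather than the retail dataset.

```python
import io

import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"price": [1.5, 12.0], "quantity": [10, 3]})

result = toy.info()      # the summary is printed here
print(result)            # info() itself returned None

# Capture the same summary as a string instead of printing it
buffer = io.StringIO()
toy.info(buf=buffer)
summary_text = buffer.getvalue()
print(summary_text)
```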