Exploratory Data Analysis (EDA) in Python!

guest_blog 30 Jul, 2021

Introduction

Exploratory Data Analysis (EDA)

– Handling missing values
– Removing duplicates
– Outlier treatment
– Normalizing and scaling (numerical variables)
– Encoding categorical variables (dummy variables)
– Bivariate analysis

Exploratory Data Analysis - Import Libraries
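
The original screenshot for this step is not available, so here is a minimal sketch of importing pandas and loading a CSV. The column names and values are hypothetical stand-ins, not the tutorial's dataset, and the CSV is read from an in-memory string so the example runs on its own. Plotting libraries such as matplotlib or seaborn would typically be imported here as well.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the tutorial's data file.
csv_text = """age,income,city
25,50000,Pune
32,,Delhi
47,82000,Pune
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```

With a real file, the call would be `pd.read_csv("your_file.csv")`, where the path is whatever your dataset is called.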

  1. Exploratory Data Analysis - Data Shape

  2. Exploratory Data Analysis - Data Information

    Exploratory Data Analysis - Data Type

  3. Exploratory Data Analysis - Describe
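
The first-look steps listed above (shape, information, data types, describe) can be sketched as follows. The small DataFrame is a hypothetical example, not the data used in the article.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [50000.0, 64000.0, 82000.0, 58000.0],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})

# Shape: (number of rows, number of columns).
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# Information: column names, non-null counts, and memory usage.
df.info()

# Data type of each column.
print(df.dtypes)

# Summary statistics; by default only numeric columns are described.
summary = df.describe()
print(summary)
```

Passing `include="all"` to `describe()` also summarizes the non-numeric columns.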

Exploratory Data Analysis - Sum

Exploratory Data Analysis - Impute Missing Values
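
As a rough sketch of the two steps captioned above, the example below counts missing values per column and then imputes them. The choice of median for a numeric column and mode for a categorical column is an assumption made for illustration; the article's exact strategy is not recoverable from the source. The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value per column.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 51],
    "city": ["Pune", "Delhi", None, "Pune"],
})

# Count missing values in each column.
missing_counts = df.isnull().sum()
print(missing_counts)

# Numeric column: fill with the median of the observed values.
df["age"] = df["age"].fillna(df["age"].median())

# Categorical column: fill with the most frequent value (the mode).
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)
```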

Handling Duplicate records
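
A minimal sketch of finding and dropping duplicate rows with pandas, using hypothetical data in place of the screenshots that originally appeared here:

```python
import pandas as pd

# Hypothetical sample data; the second and third rows are identical.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "city": ["Pune", "Delhi", "Delhi", "Mumbai"],
})

# duplicated() marks each row that repeats an earlier row.
n_duplicates = df.duplicated().sum()
print(n_duplicates)

# drop_duplicates() keeps the first occurrence of each row.
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```

To treat rows as duplicates based on only some columns, pass them through the `subset` parameter of `duplicated()` and `drop_duplicates()`.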

Handling Outlier

Box-plot before removing outliers

Box-plot after removing outliers
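
The source does not preserve how the outliers were treated, so the following is a sketch under the assumption that the common 1.5 × IQR box-plot rule is used. The helper `iqr_bounds` and the sample values are hypothetical. Two options are shown: dropping rows outside the fences and capping values at the fences.

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences used by the usual box-plot rule."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical values; 300 is an obvious outlier.
income = pd.Series([48, 50, 52, 55, 57, 60, 300], name="income")
lower, upper = iqr_bounds(income)

# Option 1: drop the rows that fall outside the fences.
filtered = income[income.between(lower, upper)]

# Option 2: keep every row but cap extreme values at the fences.
capped = income.clip(lower=lower, upper=upper)

print(lower, upper)
print(filtered.tolist())
print(capped.tolist())
```

Dropping shrinks the dataset, while capping keeps the row count and only limits the extreme values; which one fits depends on whether the extreme values are errors or genuine observations.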

Bivariate Analysis

  1. Two Categorical Variables

    1. Bar chart
    2. Grouped bar chart
    3. Point plot
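
For two categorical variables, a frequency table is the usual starting point for the charts listed above. A minimal sketch with hypothetical columns `gender` and `purchased`:

```python
import pandas as pd

# Hypothetical sample data with two categorical columns.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "purchased": ["yes", "no", "yes", "yes", "no", "no"],
})

# Cross-tabulate: rows are gender, columns are purchased, cells are counts.
counts = pd.crosstab(df["gender"], df["purchased"])
print(counts)
```

If matplotlib is installed, `counts.plot(kind="bar")` draws this table as a grouped bar chart, with one group per row of the table.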

Correlation between all the variables
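
A correlation matrix across the numeric variables can be sketched as below. The data is hypothetical, and non-numeric columns are excluded explicitly before calling `corr()`, which computes Pearson correlation by default.

```python
import pandas as pd

# Hypothetical sample data; "city" is non-numeric and is left out below.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [2, 4, 6, 8, 10],
    "errors": [5, 4, 3, 2, 1],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr)
```

The result is a square table that can be passed to a heatmap function (for example seaborn's `heatmap`) for a visual summary.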

Normalizing and Scaling
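
The screenshots for this section are missing, so the sketch below shows two common ways to scale numeric columns using plain pandas: standardization (z-score) and min-max normalization. Which of these the article used, and whether it relied on a library such as scikit-learn, is not known; the data is hypothetical.

```python
import pandas as pd

# Hypothetical numeric data on very different scales.
df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "income": [30000, 50000, 70000, 90000],
})

# Standardization: subtract the mean and divide by the standard deviation.
# ddof=0 uses the population standard deviation.
standardized = (df - df.mean()) / df.std(ddof=0)

# Min-max normalization: rescale each column to the range [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())

print(standardized)
print(min_max)
```

After standardization each column has mean 0 and standard deviation 1; after min-max normalization each column runs from 0 to 1.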

Encoding
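
A minimal sketch of encoding a categorical variable as dummy variables with `pd.get_dummies`. The column names are hypothetical, and `drop_first=True` is an optional choice made here to avoid a redundant column; the article's own settings are not recoverable from the source.

```python
import pandas as pd

# Hypothetical sample data with one categorical column.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "age": [25, 32, 47, 51],
})

# One indicator column per category; drop_first=True drops the first
# category (here "Delhi"), which is implied when the others are all 0.
encoded = pd.get_dummies(df, columns=["city"], drop_first=True)
print(encoded)
```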

About the Author

Ritika Singh – Data Scientist

I am a data scientist by profession and a blogger by passion. I have been working on machine learning projects for more than 2 years. Here you will find articles on machine learning, statistics, deep learning, NLP, and artificial intelligence.

Responses From Readers

Abdallah 17 Sep, 2020

Why did you treat postal code as a numerical variable? It is not meaningful to represent it that way, since a numerical value for postal code will be misinterpreted by any machine learning algorithm. For example, the postal code "90049" will be matched with a label based on the correlation and the postal code "300" will be matched to the other label since it has a lower value, which is incorrect. It would be better represented as a categorical variable, even if there are many unique observations.

Bala 10 Oct, 2020

Hi Ritika, can you please help me with the CSV file that you used for this tutorial? I would like to use the file to learn the steps taught here.

rohith gaddam 11 Oct, 2020

Cool and clear. It's easy to understand. Thank you for the explanation; I fell in love with your blog.

Prasanna 31 Dec, 2020

Found the blog on "EDA with Python" very useful. But there is a humongous distraction on the site. The floating ads (of courses offered by you) on the page are a huge distraction. Not sure how no one from the page admin has noticed it. The content of this blog is awesome though.

Shallom Micah 24 Jan, 2021

Thanks a lot for this. I am glad I came across this.

Saurabh Singh 14 Mar, 2021

I appreciate your work. Thanks

Jogin 12 Apr, 2021

Hi Ritika, Really nice blog, I liked it and wanted to learn more from you

Nitin Shelke 12 Apr, 2021

Thanks Ritika for the nice explanation.

Monwa 27 Nov, 2021

I am interested to know the basics on how to analyze data, get rid of duplicates and missing values.

Abhishek Parida 06 Mar, 2022

Where can I find the dataset to follow and practice the code? Thank you in advance.