Naive Bayes Classifier Explained: Applications and Practice Problems of Naive Bayes Classifier

Sunil 23 Aug, 2024

Introduction

Let’s start with a practical example of using the Naive Bayes Algorithm.

Suppose you find yourself in this situation on a data science project:

You are working on a classification problem and have generated your set of hypotheses, created features, and discussed the importance of variables. Within an hour, stakeholders want to see the first cut of the model.

What will you do? You have hundreds of thousands of data points and several variables in your training data set. In such a situation, if I were in your place, I would use the Naive Bayes classifier, which can be extremely fast relative to other classification algorithms. It uses Bayes’ theorem of probability to predict the class of unseen data points.

In this article, we explore Bayes’ theorem and the Naive Bayes classifier built on it. We look at how the algorithm works in machine learning, walk through a worked example, discuss its applications, and implement it in Python and R.

Learning Objectives

  • Understand the definition and working of the Naive Bayes algorithm.
  • Get to know the various applications, pros, and cons of the classifier.
  • Learn how to implement the Naive Bayes classifier in R and Python with a sample project.


What is Naive Bayes Classifier?

Naïve Bayes belongs to a family of generative learning algorithms, which model how the inputs are distributed within each class or category. Discriminative classifiers such as logistic regression learn the boundary between classes directly. Naive Bayes instead estimates the class prior and the per-class feature distributions, then combines them with Bayes’ theorem. It’s widely used in text classification, spam filtering, and recommendation systems.

What is the Naive Bayes Algorithm?

It is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that, once the class is known, the presence of a particular feature tells us nothing about the presence of any other feature.

The Naïve Bayes classifier is a popular supervised machine learning algorithm used for classification tasks such as text classification. It assumes that the features of the input data are conditionally independent given the class. This means each feature’s distribution can be estimated separately, which makes training and prediction fast.

A Brief Review of Bayesian Statistics

In statistics, naive Bayes classifiers are simple probabilistic classifiers that apply Bayes’ theorem. The theorem expresses the probability of a hypothesis given the data in terms of the hypothesis’s prior probability and the likelihood of the data under it. The naive Bayes classifier assumes that all features in the input data are conditionally independent given the class, which is often not true in real-world scenarios. Despite this simplifying assumption, it is widely used because it is efficient and performs well in many real-world applications.

It is also worth noting that naive Bayes classifiers are among the simplest Bayesian network models, yet they can reach high accuracy when coupled with kernel density estimation. This technique uses a kernel function to estimate the probability density of each feature from the data, rather than assuming a fixed parametric form such as the normal distribution. That helps when a feature’s true distribution is irregular or unknown. As a result, the naive Bayes classifier is a useful tool in machine learning, particularly for text classification, spam filtering, and sentiment analysis.
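Here is a minimal sketch of that idea. It assumes one-dimensional Gaussian kernels with a single fixed bandwidth; that is a hypothetical choice, and real implementations usually tune it per feature. Each per-class, per-feature likelihood P(x_i | c) is replaced by a kernel density estimate, and the naive assumption still lets us add the per-feature log-densities. The class name `KDENaiveBayes` and the toy data are illustrative only, not a library API.

```python
import math
from collections import defaultdict


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde_density(x, samples, bandwidth):
    """Kernel density estimate of a single feature at point x."""
    total = sum(gaussian_kernel((x - s) / bandwidth) for s in samples)
    return total / (len(samples) * bandwidth)


class KDENaiveBayes:
    """Naive Bayes whose per-class, per-feature likelihoods are 1-D KDEs."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth  # hypothetical fixed bandwidth

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        # Class priors P(c) from relative frequencies.
        self.priors = {c: len(r) / len(X) for c, r in rows_by_class.items()}
        # For each class, keep the training values of each feature as a column.
        self.columns = {c: list(zip(*r)) for c, r in rows_by_class.items()}
        return self

    def predict(self, row):
        scores = {}
        for c, cols in self.columns.items():
            score = math.log(self.priors[c])
            for value, samples in zip(row, cols):
                # Naive assumption: per-feature log-likelihoods simply add up.
                # The tiny floor avoids log(0) far away from every sample.
                score += math.log(kde_density(value, samples, self.bandwidth) + 1e-300)
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical two-feature toy data with two well-separated classes.
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [4.0, 5.0], [4.2, 5.1], [3.8, 4.9]]
y = ["a", "a", "a", "b", "b", "b"]
model = KDENaiveBayes(bandwidth=0.5).fit(X, y)
print(model.predict([1.1, 2.0]), model.predict([4.1, 5.0]))
```

Working in log space keeps the product of many small densities from underflowing. This is the same reason practical Naive Bayes implementations sum log-probabilities.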

Example of Naive Bayes Algorithm

For example, if a fruit is red, round, and about 3 inches wide, we might call it an apple. Even if these features depend on each other, the classifier treats each one as contributing independently to the probability that the fruit is an apple. That’s why it’s called ‘Naive’.

An NB model is easy to build and particularly useful for very large data sets. Despite its simplicity, Naive Bayes can be competitive with, and on some problems outperform, far more sophisticated classification methods.

Bayes theorem provides a way of computing posterior probability P(c|x) from P(c), P(x) and P(x|c). Look at the equation below:

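$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

Under the naive independence assumption, for predictors $X = (x_1, x_2, \ldots, x_n)$:

$$P(c \mid X) \propto P(x_1 \mid c) \times P(x_2 \mid c) \times \cdots \times P(x_n \mid c) \times P(c)$$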

Above,

  • P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).
  • P(c) is the prior probability of the class.
  • P(x|c) is the likelihood, which is the probability of the predictor given the class.
  • P(x) is the prior probability of the predictor (also called the evidence).
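The formulas above translate into a few lines of Python. This is a sketch: the function name `naive_bayes_posterior` and the fruit probabilities are hypothetical values chosen only to illustrate the arithmetic. They are not estimates from any data set.

```python
import math


def naive_bayes_posterior(priors, likelihoods, observed):
    """Return P(c | x_1, ..., x_n) for every class c.

    priors:      {class: P(c)}
    likelihoods: {class: {feature_value: P(x_i | c)}}
    observed:    the observed feature values x_1, ..., x_n
    """
    # Numerator for each class: P(c) * P(x_1 | c) * ... * P(x_n | c)
    joint = {c: priors[c] * math.prod(likelihoods[c][x] for x in observed)
             for c in priors}
    # P(x) is the same for every class: the sum of all numerators.
    evidence = sum(joint.values())
    return {c: p / evidence for c, p in joint.items()}


# Hypothetical probabilities for the fruit example (not estimated from data).
priors = {"apple": 0.5, "other": 0.5}
likelihoods = {
    "apple": {"red": 0.7, "round": 0.9, "about 3 inches": 0.8},
    "other": {"red": 0.3, "round": 0.5, "about 3 inches": 0.2},
}
posterior = naive_bayes_posterior(priors, likelihoods, ["red", "round", "about 3 inches"])
print({c: round(p, 3) for c, p in posterior.items()})
```

P(x) is the sum of the numerators over all classes. So normalizing the joint scores is the same as dividing by P(x) in Bayes’ theorem.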


Sample Project to Apply Naive Bayes

Problem Statement

HR analytics is revolutionizing the way human resources departments operate, leading to higher efficiency and better results overall. Human resources have been using analytics for years.

However, the collection, processing, and analysis of data have been largely manual, and given the nature of human resources dynamics and HR KPIs, the approach has been constraining HR. Therefore, it is surprising that HR departments woke up to the utility of machine learning so late in the game.

Here is an opportunity to try predictive analytics in identifying the employees most likely to get promoted.


How Do Naive Bayes Algorithms Work?


Let’s understand it using an example. Below is a training data set of weather conditions and the corresponding target variable ‘Play’ (whether the players played). We need to classify whether players will play based on the weather condition. Let’s follow the steps below.

  1. Convert the data set into a frequency table

    In this first step, the data set is converted into a frequency table.

  2. Create Likelihood table by finding the probabilities

    Create a likelihood table by finding the probabilities, such as P(Overcast) = 4/14 ≈ 0.29 and the probability of playing, P(Yes) = 9/14 ≈ 0.64.


  3. Use Naive Bayesian equation to calculate the posterior probability

    Now, use Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.
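The first two steps can be sketched in plain Python. The original table is not reproduced here, so the 14 rows below are an assumption. They are reconstructed to match the counts quoted in this article: 9 "Yes" out of 14 days, 5 "Sunny" days of which 3 are "Yes", and P(Overcast) ≈ 0.29. Only the counts matter, not the row order.

```python
from collections import Counter

# Assumed rows, chosen to reproduce the counts quoted in the article.
weather = ["Sunny", "Overcast", "Rainy", "Sunny", "Sunny", "Overcast", "Rainy",
           "Rainy", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
play = ["No", "Yes", "Yes", "Yes", "Yes", "Yes", "No",
        "No", "Yes", "Yes", "No", "Yes", "Yes", "No"]

# Step 1: frequency table, i.e. counts of each (weather, play) pair.
frequency = Counter(zip(weather, play))

# Step 2: likelihood table.
n = len(play)
class_counts = Counter(play)      # how many days were "Yes" / "No"
weather_counts = Counter(weather)  # how many days had each weather value
p_class = {c: k / n for c, k in class_counts.items()}
p_weather = {w: k / n for w, k in weather_counts.items()}
# P(weather | play) = count(weather, play) / count(play)
likelihood = {(w, c): frequency[(w, c)] / class_counts[c]
              for w in weather_counts for c in class_counts}

for w in sorted(weather_counts):
    print(f"{w:9s} Yes={frequency[(w, 'Yes')]} No={frequency[(w, 'No')]}")
print(f"P(Overcast) = {p_weather['Overcast']:.2f}, P(Yes) = {p_class['Yes']:.2f}")
```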

Problem: Players will play if the weather is sunny. Is this statement correct?

We can solve it using the above-discussed method of posterior probability.

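P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)

Here P(Sunny | Yes) * P(Yes) is the numerator, and P(Sunny) is the denominator.

From the table, P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 = 0.64.

Now, P(Yes | Sunny) = (3/9 * 9/14) / (5/14) = 3/5 = 0.60. By the same method, P(No | Sunny) = (2/5 * 5/14) / (5/14) = 0.40. "Yes" therefore has the higher posterior probability, so the prediction is that players will play when it is sunny.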
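The same calculation can be done with exact fractions so that rounding does not creep in. The counts come straight from the frequency table described in the steps above. This sketch computes both posteriors so you can see which one is higher.

```python
from fractions import Fraction

# Counts from the weather/play frequency table.
yes_total, no_total = 9, 5
sunny_yes, sunny_no = 3, 2
n = yes_total + no_total

p_yes = Fraction(yes_total, n)                # P(Yes) = 9/14
p_no = Fraction(no_total, n)                  # P(No) = 5/14
p_sunny = Fraction(sunny_yes + sunny_no, n)   # P(Sunny) = 5/14

# Bayes' theorem for each class: P(class | Sunny) = P(Sunny | class) * P(class) / P(Sunny)
p_yes_given_sunny = Fraction(sunny_yes, yes_total) * p_yes / p_sunny
p_no_given_sunny = Fraction(sunny_no, no_total) * p_no / p_sunny

print(p_yes_given_sunny, float(p_yes_given_sunny))  # 3/5 0.6
print(p_no_given_sunny, float(p_no_given_sunny))    # 2/5 0.4
```

P(Sunny) is the same for both classes, so it does not affect which class wins. This is why implementations often compare only the numerators.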

Naive Bayes uses a similar method to predict the probability of each class based on various attributes. The algorithm is widely used in text classification (NLP) and in problems with multiple class labels.

What Are the Pros and Cons of Naive Bayes?

Pros:

  • It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
  • When the assumption of independence holds, it can perform better than other models such as logistic regression or decision trees, and it needs less training data.
  • It performs well with categorical input variables compared to numerical ones. For numerical variables, a normal distribution is typically assumed (a bell curve, which is a strong assumption).
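For a numerical feature, Gaussian Naive Bayes replaces the frequency table with a normal density whose mean and variance are estimated per class. Here is a minimal sketch. It assumes a hypothetical temperature feature with made-up values and uses the population variance, which is the maximum-likelihood estimate.

```python
import math
from statistics import mean, pvariance


def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical numeric feature (temperature) grouped by class label.
temps = {
    "Yes": [21.0, 23.0, 22.0, 24.0, 20.0],
    "No": [30.0, 32.0, 29.0, 31.0],
}
total = sum(len(v) for v in temps.values())
priors = {c: len(v) / total for c, v in temps.items()}
# Per-class mean and (population) variance define each class's normal curve.
params = {c: (mean(v), pvariance(v)) for c, v in temps.items()}

x = 22.5
# Numerator of Bayes' theorem: P(c) * N(x; mu_c, var_c)
scores = {c: priors[c] * gaussian_pdf(x, mu, var) for c, (mu, var) in params.items()}
prediction = max(scores, key=scores.get)
print(params, prediction)
```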

Cons:

  • If a categorical variable has a category in the test data set that was not observed in the training data set, the model assigns it a zero probability. The whole product of likelihoods then becomes zero, so the model cannot make a prediction. This is often known as the “Zero Frequency” problem. To solve it, we can use a smoothing technique; one of the simplest is called Laplace estimation.
  • Naive Bayes is also known to be a poor probability estimator, so the probability outputs from predict_proba should not be taken too literally.
  • Another limitation of this algorithm is the assumption of independent predictors. In real life, we almost never get a set of predictors that are completely independent.
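The sketch below applies Laplace (add-one) smoothing to the weather counts from the worked example. It assumes the same reconstructed counts, with Sunny 2, Overcast 0, and Rainy 3 among the 5 "No" days. "Overcast" never appears with Play = "No", so its unsmoothed likelihood is zero and would wipe out the whole product for that class. The function name `smoothed_likelihood` is hypothetical. `alpha` plays the role of the smoothing parameter mentioned in the tips further down.

```python
def smoothed_likelihood(count_value_and_class, count_class, n_values, alpha=1.0):
    """Additive estimate of P(value | class); alpha=1 is Laplace smoothing."""
    return (count_value_and_class + alpha) / (count_class + alpha * n_values)


# Assumed counts of each weather value among the 5 "No" days.
counts_no = {"Sunny": 2, "Overcast": 0, "Rainy": 3}
count_no = sum(counts_no.values())
n_weather_values = len(counts_no)

unsmoothed = counts_no["Overcast"] / count_no
smoothed = smoothed_likelihood(counts_no["Overcast"], count_no, n_weather_values)
print(unsmoothed, smoothed)  # 0.0 0.125

# The smoothed estimates still form a valid distribution over weather values.
smoothed_all = {w: smoothed_likelihood(k, count_no, n_weather_values)
                for w, k in counts_no.items()}
print(smoothed_all)
```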

Applications of Naive Bayes Algorithms

  • Real-time Prediction: The Naive Bayes classifier is an eager learner and it is very fast, so it can be used for making predictions in real time.
  • Multi-class Prediction: The algorithm is also well known for multi-class prediction; it can predict the probability of each of several classes of the target variable.
  • Text Classification / Spam Filtering / Sentiment Analysis: Naive Bayes classifiers are widely used in text classification. Because they handle multi-class problems well and rely on the independence rule, they often achieve a high success rate there compared with other algorithms. As a result, they are widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative customer sentiment).
  • Recommendation System: A Naive Bayes classifier and collaborative filtering can be combined to build a recommendation system. Such a system uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource.
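To make the text-classification use case concrete, here is a tiny multinomial Naive Bayes spam filter written from scratch with Laplace smoothing. The six training messages are a hypothetical toy corpus. `train_multinomial_nb` and `predict` are illustrative helpers, not scikit-learn functions.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes text model with additive smoothing."""
    vocab = {w for d in docs for w in d.split()}
    classes = set(labels)
    priors = {c: labels.count(c) / len(labels) for c in classes}
    word_counts = {c: Counter() for c in classes}
    for d, c in zip(docs, labels):
        word_counts[c].update(d.split())
    log_lik = {}
    for c in classes:
        total = sum(word_counts[c].values())
        # log P(word | class), smoothed so unseen (word, class) pairs are not zero
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
                      for w in vocab}
    return priors, log_lik


def predict(doc, priors, log_lik):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c in priors:
        # Words outside the training vocabulary are ignored.
        scores[c] = math.log(priors[c]) + sum(
            log_lik[c][w] for w in doc.split() if w in log_lik[c])
    return max(scores, key=scores.get)


docs = ["win money now", "cheap money offer", "win a prize now",
        "meeting at noon", "project meeting notes", "lunch at noon"]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
priors, log_lik = train_multinomial_nb(docs, labels)
print(predict("win cheap prize", priors, log_lik))
print(predict("notes from the meeting", priors, log_lik))
```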

How to Build a Basic Model Using Naive Bayes in Python and R?

The scikit-learn library (Python) helps us build a Naive Bayes model in Python. There are five types of NB models in scikit-learn:

  • Gaussian Naive Bayes: GaussianNB is used for classification with continuous features. It assumes that, within each class, the feature values follow a Gaussian (normal) distribution.
  • Multinomial Naive Bayes: It is used for discrete counts. Take a text classification problem as an example. The Bernoulli model records only whether a word occurs in a document; the multinomial model instead counts how often the word occurs. You can think of a feature x_i as “the number of times outcome i is observed over n trials”.
  • Bernoulli Naive Bayes: The Bernoulli model is useful if your feature vectors are boolean (i.e. zeros and ones). One application is text classification with a ‘bag of words’ model, where 1 and 0 mean “word occurs in the document” and “word does not occur in the document”, respectively.
  • Complement Naive Bayes: It is an adaptation of Multinomial NB in which statistics from the complement of each class (all the other classes) are used to compute the model weights. This makes it suitable for imbalanced data sets, and it often outperforms Multinomial NB on text classification tasks.
  • Categorical Naive Bayes: It is useful if the features are categorically distributed. To use it, the categorical variables must first be encoded as integers, for example with scikit-learn’s OrdinalEncoder.
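The multinomial and Bernoulli variants differ mainly in how a document becomes a feature vector, and in whether absent words contribute to the likelihood. This small sketch uses a hypothetical four-word vocabulary and made-up per-class word probabilities.

```python
from collections import Counter

# Hypothetical vocabulary and document.
vocabulary = ["free", "money", "meeting", "offer"]
document = "free money free offer"

counts = Counter(document.split())
multinomial_features = [counts[w] for w in vocabulary]        # how often each word occurs
bernoulli_features = [int(counts[w] > 0) for w in vocabulary]  # whether each word occurs
print(multinomial_features)  # [2, 1, 0, 1]
print(bernoulli_features)    # [1, 1, 0, 1]


def bernoulli_likelihood(features, p_word_given_class):
    """P(x | c): multiply p_i if word i is present and (1 - p_i) if it is absent."""
    result = 1.0
    for present, p in zip(features, p_word_given_class):
        result *= p if present else (1 - p)
    return result


# Made-up P(word present | spam) for the four vocabulary words.
p_spam = [0.6, 0.5, 0.1, 0.4]
print(bernoulli_likelihood(bernoulli_features, p_spam))
```

In the Bernoulli likelihood, an absent word contributes (1 - p), so the absence of "meeting" is also evidence. The multinomial model effectively ignores words with a zero count.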

Python Code:


# importing required libraries
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# read the train and test datasets (assumed to be preprocessed so that every feature column is numeric, which GaussianNB requires)
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# shape of the dataset
print('Shape of training data :',train_data.shape)
print('Shape of testing data :',test_data.shape)

# Now, we need to predict the target variable for the test data
# target variable - Survived

# separate the independent and target variables in the training data
train_x = train_data.drop(columns=['Survived'])
train_y = train_data['Survived']

# separate the independent and target variables in the testing data
test_x = test_data.drop(columns=['Survived'])
test_y = test_data['Survived']

# Create the object of the Naive Bayes model.
# You can also add other parameters and test your code here.
# Some parameters are: var_smoothing
# Documentation of sklearn GaussianNB:
# https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
model = GaussianNB()

# fit the model with the training data
model.fit(train_x,train_y)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('Target on train data',predict_train) 

# Accuracy score on train dataset
accuracy_train = accuracy_score(train_y,predict_train)
print('accuracy_score on train dataset : ', accuracy_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)
print('Target on test data',predict_test) 

# Accuracy Score on test dataset
accuracy_test = accuracy_score(test_y,predict_test)
print('accuracy_score on test dataset : ', accuracy_test)

R Code:

library(e1071) # provides the naiveBayes() classifier
Train <- read.csv(file.choose())
Test <- read.csv(file.choose())

# naiveBayes() expects a categorical (factor) target; convert it and check its classes
Train$Item_Fat_Content <- as.factor(Train$Item_Fat_Content)
levels(Train$Item_Fat_Content)

model <- naiveBayes(Item_Fat_Content ~ ., data = Train)
class(model)
pred <- predict(model, Test)
table(pred)

Above, we looked at the basic NB Model. You can improve the power of this basic model by tuning parameters and handling assumptions intelligently. Let’s look at the methods to improve the performance of this model. I recommend you go through this document for more details on Text classification using Naive Bayes.


Tips to Improve the Power of the NB Model

Here are some tips for improving the power of a Naive Bayes model:

  • If continuous features are not normally distributed, use a transformation (for example, a log or Box-Cox transform) or another method to bring them closer to a normal distribution.
  • If the test data set has the zero-frequency issue, apply a smoothing technique such as “Laplace Correction” to predict the class of the test data set.
  • Remove correlated features. Highly correlated features are effectively counted twice in the model, which can over-inflate their importance.
  • Naive Bayes classifiers have limited options for parameter tuning, such as alpha=1 for smoothing, fit_prior=[True|False] to learn class prior probabilities or not, and a few other options. I would recommend focusing on data pre-processing and feature selection instead.
  • You might think of applying classifier combination techniques such as ensembling, bagging, and boosting, but these usually help little here. Bagging mainly reduces variance, and Naive Bayes is a stable, low-variance (high-bias) model, so there is little variance to reduce.
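Why correlated features distort the model becomes clear if you feed the same feature in several times. This sketch uses hypothetical likelihoods P(x = 1 | a) = 0.8 and P(x = 1 | b) = 0.4 with equal priors. Each extra copy multiplies the likelihood ratio by 2, so the posterior for class a climbs from 2/3 to 4/5 to 8/9 even though no new information was added.

```python
# Hypothetical likelihoods for one binary feature observed as x = 1.
p_x_given_a, p_x_given_b = 0.8, 0.4
prior_a = prior_b = 0.5


def posterior_a(n_copies):
    """P(class a | x) when the same feature is fed to the model n_copies times."""
    score_a = prior_a * p_x_given_a ** n_copies
    score_b = prior_b * p_x_given_b ** n_copies
    return score_a / (score_a + score_b)


for k in (1, 2, 3):
    print(k, round(posterior_a(k), 3))
```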


Conclusion

In this article, we looked at one of the supervised machine learning algorithms, the “Naive Bayes Classifier”, which is mainly used for classification. Congrats! If you have read and understood this article thoroughly, you have already taken your first step toward mastering this algorithm. From here, all you need is practice.

Further, I would suggest you focus more on data pre-processing and feature selection before applying the algorithm. In a future post, I will discuss text and document classification using naive Bayes in more detail.

I hope this overview gives you a good sense of how the Naive Bayes classifier works. It’s a simple yet powerful tool in machine learning, and that’s why the Naive Bayes algorithm is so popular for classification tasks.

Key Takeaways

  • The Naive Bayes algorithm is one of the most popular and simple machine learning classification algorithms.
  • It is based on the Bayes’ Theorem for calculating probabilities and conditional probabilities.
  • You can use it for real-time and multi-class predictions, text classifications, spam filtering, sentiment analysis, and a lot more.


Q1. What is naive in Naive Bayes classifier?

A. The Naive Bayes classifier assumes independence among features, a rarity in real-life data, earning it the label ‘naive’.

Q2. What is a Naive Bayes classifier in machine learning?

A. Naive Bayes is a simple probabilistic classifier that assumes feature independence. It is effective for large data sets and text classification.

Q3. What is the difference between Naive Bayes and Maximum Entropy?

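A. Naive Bayes is a generative model: it estimates P(c) and P(x|c) and assumes that the features are conditionally independent given the class. Maximum Entropy (multinomial logistic regression) is a discriminative model: it estimates P(c|x) directly and makes no independence assumption, so correlated features are not double counted.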

Q4. What is the critical assumption of Naive Bayes?

A. The key assumption of Naive Bayes is conditional independence, implying that features used in the model are considered independent given the class variable.

Q5. What are examples of Naive Bayes use?

A. Examples include spam filtering and sentiment analysis.


Sunil Ray is Chief Content Officer at Analytics Vidhya, India's largest Analytics community. I am deeply passionate about understanding and explaining concepts from first principles. In my current role, I am responsible for creating top-notch content for Analytics Vidhya, including its courses, conferences, blogs, and competitions. I thrive in fast-paced environments and love building and scaling products that unleash huge value for customers using data and technology. Over the last 6 years, I have built the content team and created multiple data products at Analytics Vidhya. Before Analytics Vidhya, I had 7+ years of experience working with several insurance companies like Max Life, Max Bupa, Birla Sun Life & Aviva Life Insurance in different data roles. Industry exposure: Insurance and EdTech. Major capabilities: Content Development, Product Management, Analytics, Growth Strategy.


Responses From Readers


Arun CR
Arun CR 14 Sep, 2015

Hi Sunil , From the weather and play table which is table [1] we know that frequency of sunny is 5 and play when sunny is 3 no play when suny is 2 so probability(play/sunny) is 3/5 = 0.6 Why do we need conditional probabilty to solve this? Is there problems that can be solved only using conditional probability. can you suggest such examples. Thanks, Arun

Arun CR
Arun CR 14 Sep, 2015

Great article and provides nice information.

Nishi Singh
Nishi Singh 12 Dec, 2015

Amazing content and useful information

Leena
Leena 08 Feb, 2016

I'm new to machine learning and Python.Could you please help to read data from CSV and to separate the same data set to training and test data

Sushma honnidige
Sushma honnidige 07 Mar, 2016

very useful article.

RAJKUMAR
RAJKUMAR 16 Mar, 2016

Very nice....but...if u dont mind...can you please give me that code in JAVA ...

jitesh
jitesh 05 Apr, 2016

is it possible to classify new tuple in orange data mining tool??

SPGupta
SPGupta 11 Apr, 2016

good.

TingTing
TingTing 22 Apr, 2016

Thanks for tips to improve the performance of models, that 's really precious experience.

Miguel Batista
Miguel Batista 11 May, 2016

Hi, I have a question regarding this statement: 'If continuous features do not have normal distribution, we should use transformation or different methods to convert it in normal distribution.' Can you provide an example or a link to the techniques? Thank you, MB

nir
nir 13 Jul, 2016

Great article! Thanks. Are there any similar articles for other classification algorithms specially target towards textual features and mix of textual/numeric features?

Nick
Nick 19 Aug, 2016

great article with basic clarity.....nice one

Catherine
Catherine 29 Aug, 2016

This article is extremely clear and well laid-out. Thank you!

pangavhane nitin
pangavhane nitin 31 Aug, 2016

ty

alfiya
alfiya 31 Aug, 2016

Explanation given in simple word. Well explained! Loved this article.

Chris Rucker
Chris Rucker 01 Sep, 2016

The 'y' should be capitalized in your code - great article though.

Akash Swamy
Akash Swamy 06 Sep, 2016

This is the best explanation of NB so far simple and short :)

John
John 09 Sep, 2016

Great article! Really enjoyed it. Just wanted to point out a small error in the Python code. Should be a capital "Y" in the predict like so : model.fit(x, Y) Thanks!

Adnan
Adnan 20 Sep, 2016

Is this dataset related to weather? I am confused as a newbie. Can you please guide?

bh
bh 19 Oct, 2016

best artical that help me to understand this concept

Richard
Richard 25 Oct, 2016

Am new to machine learning and this article was handy to me in understanding naive bayes especially the data on weather and play in the table. Thanks for sharing keep up

Lisa
Lisa 10 Nov, 2016

Thanks to you I can totally understand NB classifier.

Saritha MV
Saritha MV 11 May, 2017

Hi, I am a student and doing a project on Naive Bayes Classifier(NBC). i have been given a trained dataset, and asked to classify the test images using NBC. i want to know how to extract or rather what features should i infer/ extract from the trained dataset (eg. dogs) and store in vector? i need an understanding on how to extract the features of a trained data, so that i can compare the test images with them and classify. your quick help on this query will be very appreciated.

Sreeni Jilla
Sreeni Jilla 06 Jun, 2017

Converting the data into Frequency Table and Likelihood table is great to understand the entire content. Thanks bunch for posting a great article.

T B
T B 03 Jul, 2017

Really nice article, very use-full for concept building.

AKshay
AKshay 04 Jul, 2017

I didn't understand the 3rd step. Highest probability out of which probability values? >> Now, P (Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which has higher probability. Higher than what?

DN
DN 30 Jul, 2017

Concept explained well... nice Article

Mayank
Mayank 02 Aug, 2017

Nice article. I have one question : "If categorical variable has a category (in test data set), which was not observed in training data set, then model will assign a 0 (zero) probability and will be unable to make a prediction. This is often known as “Zero Frequency”. To solve this, we can use the smoothing technique. One of the simplest smoothing techniques is called Laplace estimation." As per your statement above, does it not make NB non-feasible for real life situations? We will have lot of situations where the category is not in training data set and is visible in test data. Will laplace work in those cases?

Rajeshwari
Rajeshwari 31 Aug, 2017

thanks nice artical that help me to understand this concept

amit Kumar yadav
amit Kumar yadav 18 Sep, 2017

Good article and I am waiting for text and documents classification using naive base algorithm.

Stella
Stella 20 Sep, 2017

Superb information in just one blog.Enjoyed the simplicity.Thanks for the effort.

Abdirahman Omar Hashi
Abdirahman Omar Hashi 25 Sep, 2017

i wish i could find articles like this about machine learning, deep learning and data science

Imad Ahmad
Imad Ahmad 26 Sep, 2017

Brief and to the point. Very well explained. Thanks.

Aishwarya
Aishwarya 26 Nov, 2017

Good start point for beginners

Abdul Samad
Abdul Samad 12 Apr, 2018

Weldone sanil I have a question regarding naive bayes,currently i am working on a project that is detect depression through naive bayes algorithm so plz suggest few links regarding my projects.i shall be gratefull to you. Thanku so much

Tongesai Maune
Tongesai Maune 04 May, 2018

I am not understanding the x and the y variables. Can someone help me

Jitesh Mohite
Jitesh Mohite 13 May, 2018

Hi Sunil, I am using sklearn for doing match prediction I am using same data which you have use, but somehow I am not getting the same result. import numpy as np from sklearn.naive_bayes import GaussianNB # To calculate the accuracy score of the model from sklearn.metrics import accuracy_score Weather = np.array([[0], [1], [2], [0], [0], [1], [2], [2], [0], [2], [0], [1], [1], [2]]) # Sunny - 0, b. Overcast - 1, Rainy - 2 Play = np.array([0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0]) # Yes-1, No-0 clf = GaussianNB() clf.fit(Weather, Play) print(clf.predict([[1]])) print(clf.score(Weather, Play)) print(accuracy_score(Weather, Play)) Output: [1] 0.642857142857 0.428571428571 Can you please try my example and let me know where I am doing a mistake?

Jayer
Jayer 23 Jan, 2023

Hello, how do i check which variables are significant using the naive Bayes algorithm in R

Sisay
Sisay 26 Jan, 2023

It is nice, please make it open the code

Aminur
Aminur 12 Sep, 2023

Really Awesome article ❤️❤️