Naive Bayes

Using the Naive Bayes algorithm to label data

Jayesh Kawli

Mar 11, 2017 • 5 min read

Today I am going off-topic to write about using the Naive Bayes algorithm for record classification. It is a supervised learning algorithm, which means we train it on input records with known labels, build a model, and then apply that model to unknown records to assign each one to a category.

Examples are the best way to get familiar with any new algorithm, so let's begin with an example dataset. I have used the dataset from this source since I could not come up with an ideal dataset of my own that would explain the algorithm in a logical and clear way.

Outlook	Temperature	Humidity	Wind	To play
sunny	hot	high	false	no
sunny	hot	high	true	no
overcast	hot	high	false	yes
rainy	mild	high	false	yes
rainy	cool	normal	false	yes
rainy	cool	normal	true	no
overcast	cool	normal	true	yes
sunny	mild	high	false	no
sunny	cool	normal	false	yes
rainy	mild	normal	false	yes
sunny	mild	normal	true	yes
overcast	mild	high	true	yes
overcast	hot	normal	false	yes
rainy	mild	high	true	no

Listed above is the training dataset. Given the parameters

Outlook
Temperature
Humidity
Wind

every record is classified with one of two labels: to play outside or not. So far so good. Suppose you're given the following record, whose label is unknown. Your task is to use the training dataset and the Naive Bayes algorithm to compute its label.

Outlook	Temperature	Humidity	Wind	To play
sunny	cool	high	true	?

To make the actual classification, let's look at Bayes' theorem, which the Naive Bayes algorithm is built on:


P(event1/event2) = (P(event2/event1) * P(event1))/(P(event2))

Let's look at individual parameters first,



P(event1) - Independent probability of event1, also called the prior probability

P(event2) - Independent probability of event2

P(event1/event2) - Conditional probability that event1 will happen given event2. This is the posterior probability, the main probability in question given the training dataset and conditions

P(event2/event1) - Conditional probability that event2 will happen given event1. This is also called the likelihood of event2 given event1
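As a quick sanity check on the formula, here is a minimal Python sketch. The function name `posterior` and its parameter names are my own choices for illustration, not part of any library. The numbers are counted from the training table, taking event1 as To play = "yes" and event2 as Outlook = "sunny".

```python
from fractions import Fraction

def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(event1/event2) = P(event2/event1) * P(event1) / P(event2)."""
    return likelihood * prior / evidence

# Counts read off the training table (14 records in total).
p_yes = Fraction(9, 14)             # P(yes): 9 records are labelled "yes"
p_sunny = Fraction(5, 14)           # P(sunny): 5 records have a sunny outlook
p_sunny_given_yes = Fraction(2, 9)  # P(sunny/yes): 2 of the 9 "yes" records are sunny

p_yes_given_sunny = posterior(p_sunny_given_yes, p_yes, p_sunny)
print(p_yes_given_sunny)  # 2/5
```

The result agrees with a direct count: of the 5 sunny records in the table, 2 are labelled "yes".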

Let's begin. For the given record we have two possible outcomes: To play = "yes" or To play = "no". To choose between them, we will compute the following two probabilities and compare them. Whichever is higher, we will go with that verdict.


P(yes/(sunny/cool/high/true)) and P(no/(sunny/cool/high/true))

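Since we have 4 parameters to consider in the given dataset, we will need to compute 4 intermediate probabilities for each verdict, plus the prior probability of the verdict itself. The "naive" part of the algorithm is the assumption that the parameters are independent of each other once the label is known, which lets us multiply their individual probabilities together. We also leave out the denominator P(sunny/cool/high/true) from the formula above: it is the same for both verdicts, so it does not affect which one is higher. The products we compute are therefore proportional to the two probabilities rather than exactly equal to them.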

Probabilities


P(yes/(sunny/cool/high/true)) = P(yes) * P(sunny/yes) * P(cool/yes) * P(high/yes) * P(true/yes)

// Similarly
P(no/(sunny/cool/high/true)) = P(no) * P(sunny/no) * P(cool/no) * P(high/no) * P(true/no)

Now how do we calculate these intermediate probabilities? Ride along!

First we will calculate P(yes) and P(no). Notice there are a total of 14 records, out of which 5 say "no" and the remaining 9 say "yes". Given this information,


P(no) = 5 / 14 = 0.36
P(yes) = 9 / 14 = 0.64
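The same counting can be sketched in a few lines of Python. The `labels` list is just the "To play" column of the training table typed out in order, and the variable names are mine.

```python
from collections import Counter

# "To play" column of the 14 training records, in table order.
labels = ["no", "no", "yes", "yes", "yes", "no", "yes",
          "no", "yes", "yes", "yes", "yes", "yes", "no"]

counts = Counter(labels)
total = len(labels)
priors = {label: count / total for label, count in counts.items()}

for label in ("no", "yes"):
    print(f"P({label}) = {counts[label]} / {total} = {priors[label]:.2f}")
# P(no) = 5 / 14 = 0.36
# P(yes) = 9 / 14 = 0.64
```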

We now have two of the unknown probabilities above. Let's move on.

To get the value of P(sunny/yes), count the number of records labeled "yes". Out of these records, count the number for which outlook says "sunny".


Number of records with label "yes" - 9
Number of records with label "yes" which say outlook is sunny - 2
Which gives,
P(sunny/yes) = 2 / 9 = 0.22

Similarly, to get the value of P(cool/yes), count the number of records labeled "yes". Out of these records, count the number for which temperature says "cool".


Number of records with label "yes" - 9
Number of records with label "yes" which say temperature is cool - 3
Which gives,
P(cool/yes) = 3 / 9 = 0.33
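Here is a rough Python sketch of that "filter by label, then count the value" step. The `DATA` list is the training table typed out by hand, and `COLUMNS` and `conditional` are hypothetical names I picked for this illustration.

```python
from fractions import Fraction

# Training records as (outlook, temperature, humidity, wind, to_play).
DATA = [tuple(line.split()) for line in """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
""".strip().splitlines()]

COLUMNS = {"outlook": 0, "temperature": 1, "humidity": 2, "wind": 3}

def conditional(column, value, label):
    """P(value/label): share of records with this label that have `value` in `column`."""
    with_label = [row for row in DATA if row[-1] == label]
    matching = [row for row in with_label if row[COLUMNS[column]] == value]
    return Fraction(len(matching), len(with_label))

print(conditional("outlook", "sunny", "yes"))     # 2/9
print(conditional("temperature", "cool", "yes"))  # 1/3
```

`Fraction` reduces 3/9 to 1/3, which is the same 0.33 as in the text.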

Using the same logic as in the previous two steps, we calculate the remaining two intermediate probabilities associated with the verdict P(yes/(sunny/cool/high/true))


// For Humidity
Number of records with label "yes" - 9
Number of records with label "yes" which say humidity is high - 3
Which gives,
P(high/yes) = 3 / 9 = 0.33

// For Wind
Number of records with label "yes" - 9
Number of records with label "yes" which say wind is true - 3
Which gives,
P(true/yes) = 3 / 9 = 0.33

With all the values at hand, let's compute P(yes/(sunny/cool/high/true))


P(yes/(sunny/cool/high/true)) = P(yes) * P(sunny/yes) * P(cool/yes) * P(high/yes) * P(true/yes)

P(yes/(sunny/cool/high/true)) = 0.64 * 0.22 * 0.33 * 0.33 * 0.33

P(yes/(sunny/cool/high/true)) = 0.00505

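If you want to double-check the arithmetic, here is a small Python sketch that multiplies the five factors twice: once with the two-decimal values used above, and once with exact fractions. Rounding each factor first shifts the result slightly (about 0.0051 versus about 0.0053), which is far too small a difference to change the final verdict. The variable names are mine.

```python
from fractions import Fraction
from math import prod

# P(yes), P(sunny/yes), P(cool/yes), P(high/yes), P(true/yes) as exact fractions.
exact_factors = [Fraction(9, 14), Fraction(2, 9), Fraction(3, 9),
                 Fraction(3, 9), Fraction(3, 9)]
# The same factors rounded to two decimals, as in the text.
rounded_factors = [0.64, 0.22, 0.33, 0.33, 0.33]

exact_score = prod(exact_factors)
rounded_score = prod(rounded_factors)

print(exact_score)                  # 1/189
print(f"{float(exact_score):.5f}")  # 0.00529
print(f"{rounded_score:.5f}")       # 0.00506
```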

Now, let's go to the other side and compute the value of P(no/(sunny/cool/high/true)). Let's start from the beginning, slowly and cautiously.

As calculated above, we already have the value of P(no), which is 0.36.
To get the value of P(sunny/no), count the number of records labeled "no". Out of these records, count the number for which outlook says "sunny".


Number of records with label "no" - 5
Number of records with label "no" which say outlook is sunny - 3
Which gives,
P(sunny/no) = 3 / 5 = 0.6

Using the same logic as in the previous steps, we will calculate the remaining three intermediate probabilities associated with the verdict P(no/(sunny/cool/high/true))


// For Temperature
Number of records with label "no" - 5
Number of records with label "no" which say temperature is cool - 1
Which gives,
P(cool/no) = 1 / 5 = 0.2

// For Humidity
Number of records with label "no" - 5
Number of records with label "no" which say humidity is high - 4
Which gives,
P(high/no) = 4 / 5 = 0.8

// For Wind
Number of records with label "no" - 5
Number of records with label "no" which say wind is true - 3
Which gives,
P(true/no) = 3 / 5 = 0.6
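Counting by hand gets tedious, so here is a sketch that tallies every (label, parameter, value) combination in a single pass and then prints the eight conditional probabilities used in this walkthrough. `DATA` is the training table typed out by hand. `FEATURES`, `value_counts` and `query` are names I made up for the example.

```python
from collections import Counter, defaultdict

FEATURES = ("outlook", "temperature", "humidity", "wind")

# Training records as (outlook, temperature, humidity, wind, to_play).
DATA = [tuple(line.split()) for line in """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
""".strip().splitlines()]

label_counts = Counter(row[-1] for row in DATA)

# value_counts[label][feature][value] = number of records with that label and value.
value_counts = defaultdict(lambda: defaultdict(Counter))
for *values, label in DATA:
    for feature, value in zip(FEATURES, values):
        value_counts[label][feature][value] += 1

query = {"outlook": "sunny", "temperature": "cool", "humidity": "high", "wind": "true"}
for label in ("yes", "no"):
    for feature, value in query.items():
        hits = value_counts[label][feature][value]
        total = label_counts[label]
        print(f"P({value}/{label}) = {hits} / {total} = {hits / total:.2f}")
```

The last four lines of output should match the "no" values worked out by hand: 3/5, 1/5, 4/5 and 3/5.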

With all the values at hand, let's compute P(no/(sunny/cool/high/true))


P(no/(sunny/cool/high/true)) = P(no) * P(sunny/no) * P(cool/no) * P(high/no) * P(true/no)

P(no/(sunny/cool/high/true)) = 0.36 * 0.6 * 0.2 * 0.8 * 0.6

P(no/(sunny/cool/high/true)) = 0.02

To summarize the two values computed above,


P(yes/(sunny/cool/high/true)) = 0.00505
P(no/(sunny/cool/high/true)) = 0.02

// Since 0.02 > 0.00505, we can conclude that,

P(no/(sunny/cool/high/true)) > P(yes/(sunny/cool/high/true))
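The two products are scores rather than true probabilities, because the shared denominator from Bayes' theorem was never divided out. As an optional extra step that is not needed for picking the verdict, the sketch below divides each score by their sum so the two values add up to 1. It uses exact fractions, and the variable names are my own.

```python
from fractions import Fraction

# Unnormalized scores: the prior times the four conditional probabilities.
score_yes = (Fraction(9, 14) * Fraction(2, 9) * Fraction(3, 9)
             * Fraction(3, 9) * Fraction(3, 9))
score_no = (Fraction(5, 14) * Fraction(3, 5) * Fraction(1, 5)
            * Fraction(4, 5) * Fraction(3, 5))

verdict = "yes" if score_yes > score_no else "no"

# Dividing by the sum of the scores turns them into probabilities that add up to 1.
total = score_yes + score_no
prob_yes = score_yes / total
prob_no = score_no / total

print(verdict)  # no
print(f"yes: {float(prob_yes):.3f}, no: {float(prob_no):.3f}")  # yes: 0.205, no: 0.795
```

Under the Naive Bayes assumptions, that works out to roughly an 80% chance of "no" for this record.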

Thus given the following test data,

Outlook	Temperature	Humidity	Wind	To play
sunny	cool	high	true	no

the training data, and the Naive Bayes algorithm, we can safely say that we are not going to play today.
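Putting all the steps together, here is a minimal end-to-end sketch of the whole procedure in Python. It is only an illustration of the hand calculation, not a production classifier: `classify` is a hypothetical helper of mine, `DATA` is the training table typed out by hand, and no smoothing is applied, so a value that never appears with a label gives that label a score of zero.

```python
from collections import Counter
from math import prod

# Training records as (outlook, temperature, humidity, wind, to_play).
DATA = [tuple(line.split()) for line in """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
""".strip().splitlines()]

def classify(record, data=DATA):
    """Return (best_label, scores) for a record of four parameter values."""
    label_counts = Counter(row[-1] for row in data)
    scores = {}
    for label, n_label in label_counts.items():
        rows = [row for row in data if row[-1] == label]
        prior = n_label / len(data)
        # One conditional probability per parameter, e.g. P(sunny/label).
        likelihoods = [
            sum(row[i] == value for row in rows) / n_label
            for i, value in enumerate(record)
        ]
        scores[label] = prior * prod(likelihoods)
    return max(scores, key=scores.get), scores

label, scores = classify(("sunny", "cool", "high", "true"))
print(label)  # no
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```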

Let's wait until the weather settles and it's sunny, warm, less windy and less humid before venturing outside.

To make sure you understood the Naive Bayes explanation correctly, here's another test record with an unknown label. Can you use the training dataset to correctly identify its label?

Outlook	Temperature	Humidity	Wind	To play
overcast	mild	normal	false	?

Give it a try and message me on Twitter if you need any help with the exercise!

Reference:

Naive Bayes Classifiers
