Module 4: Supervised Learning — Classification (Predicting Categories)


In the last module, we predicted how much (numbers). In this module, we are learning how to predict which one (categories).

If Regression is about drawing a line through points, Classification is about drawing a line that separates groups.


4.1 What is Classification? (The Sorting Hat)

Classification is the process of taking an input and putting it into a specific “bucket” or category.

Binary vs. Multi-class

  • Binary Classification: Only two buckets. (e.g., Is this email Spam or Not Spam? Is this transaction Fraud or Legit?)
  • Multi-class Classification: More than two buckets. (e.g., Is this a photo of a Dog, Cat, or Bird? Is this iris flower a Setosa, Versicolor, or Virginica?)

4.2 Logistic Regression: The Name is a Trick

Here is a secret that confuses many beginners: despite its name, Logistic Regression is used for Classification, not Regression.

How it works: The Sigmoid Curve

While Linear Regression draws a straight line that goes on forever, Logistic Regression uses a special curve called the Sigmoid Function.

  • It takes any number and “squashes” it to be between 0 and 1.
  • This result is treated as a Probability. If the model outputs 0.85, it’s saying “I’m 85% sure this is Spam.”
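To make the “squashing” concrete, here is a minimal sketch of the Sigmoid Function, 1 / (1 + e^(-z)), written with only the standard library. The function name sigmoid and the sample inputs are illustrative choices, not something the module prescribes.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative inputs land near 0, large positive inputs land near 1,
# and an input of exactly 0 lands on 0.5.
for z in (-5, -1, 0, 1, 5):
    print(f"sigmoid({z:>2}) = {sigmoid(z):.3f}")
```

Notice that the output never actually reaches 0 or 1, which is why it can be read as a probability.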

The Decision Boundary

Once we have that probability, we compare it against a threshold, a “line in the sand” (usually 0.5).

  • Above 0.5? It’s Category A.
  • Below 0.5? It’s Category B.
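The rule above can be sketched as a tiny helper. The function name classify is hypothetical, and sending a probability of exactly 0.5 to Category A is an assumption made here, since the text does not say how ties are handled.

```python
def classify(probability: float, threshold: float = 0.5) -> str:
    """Turn a probability into a category using a cutoff.

    Assumption: a probability exactly equal to the threshold counts
    as Category A. Libraries may break this tie differently.
    """
    return "Category A" if probability >= threshold else "Category B"

print(classify(0.85))                 # above the default 0.5 cutoff
print(classify(0.30))                 # below the default 0.5 cutoff
print(classify(0.85, threshold=0.9))  # a stricter cutoff flips the answer
```

The threshold is a choice, not a law: raising it makes the model more cautious about saying “Category A.”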

4.3 Implementing Classification in Python

Let’s look at a practical example: building a “Spam Filter” for your inbox.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

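# 1. Imagine our data: X (Email Length in words, Number of Typos), y (1 for Spam, 0 for Not Spam)
# Four emails is far too few for a real filter; this is only a toy example.
X = [[100, 5], [10, 0], [500, 20], [15, 1]] # Examples
y = [1, 0, 1, 0] # Labels

# 2. Split into Training and Testing sets
# random_state fixes the shuffle so the split is the same on every run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)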

# 3. Create the "Classifier"
classifier = LogisticRegression()

# 4. Train the model
classifier.fit(X_train, y_train)

# 5. Make a prediction for a new email
# (e.g., 200 words long with 10 typos)
new_email = [[200, 10]]
prediction = classifier.predict(new_email)

print(f"Prediction (1=Spam, 0=Not Spam): {prediction[0]}")
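If you want to see the probability behind the 0-or-1 answer, scikit-learn classifiers such as LogisticRegression also offer predict_proba. The sketch below reuses the same toy emails but, as a simplifying assumption, trains on all four of them without a train/test split; the max_iter value is also just a cautious choice for this example.

```python
from sklearn.linear_model import LogisticRegression

# Same toy data as the spam filter: [email length in words, number of typos]
X = [[100, 5], [10, 0], [500, 20], [15, 1]]
y = [1, 0, 1, 0]  # 1 = Spam, 0 = Not Spam

# Assumption for this sketch: train on everything, no train/test split
classifier = LogisticRegression(max_iter=1000)
classifier.fit(X, y)

new_email = [[200, 10]]

# predict_proba returns one row per email and one column per class,
# in the order given by classifier.classes_ (here: 0, then 1)
proba = classifier.predict_proba(new_email)[0]
spam_probability = proba[1]
label = classifier.predict(new_email)[0]

print(f"Probability of Spam: {spam_probability:.2f}")
print(f"Predicted label (1=Spam, 0=Not Spam): {label}")
```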


4.4 Evaluating the Model: Why Accuracy is Often a Liar

In Regression, we used MSE. In Classification, we use a Confusion Matrix.

The Problem with Accuracy

Imagine you have a test for a very rare disease that only 1 in 1,000 people have. If I build a “model” that just guesses “No Disease” for everyone, I will be 99.9% accurate. But I’m a terrible doctor because I missed the person who actually needed help!
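The arithmetic behind that story can be checked in a few lines of plain Python. The 1,000 patients below are made-up data matching the 1-in-1,000 example, and the variable names are illustrative.

```python
# 1,000 patients: exactly one has the disease (1), the rest do not (0)
y_true = [1] + [0] * 999

# The lazy "model": always guess "No Disease"
y_pred = [0] * 1000

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

sick_total = sum(y_true)
sick_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"Accuracy: {accuracy:.1%}")
print(f"Sick patients caught: {sick_caught} of {sick_total}")
```

The accuracy comes out at 99.9%, yet the model catches 0 of the 1 sick patients.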

The Confusion Matrix: 4 Important Boxes

  1. True Positive (TP): You said “Spam,” and it was Spam. (Good!)
  2. True Negative (TN): You said “Safe,” and it was Safe. (Good!)
  3. False Positive (FP): You said “Spam,” but it was a message from your Grandma! (The “False Alarm”)
  4. False Negative (FN): You said “Safe,” but it was a virus. (The “Missed Danger”)
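The four boxes can be counted with scikit-learn's confusion_matrix. The labels below are invented for illustration (1 = Spam, 0 = Safe); for binary labels ordered [0, 1], the flattened matrix comes back in the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical results for 8 emails (1 = Spam, 0 = Safe)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # what the emails really were
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]  # what the model said

# Rows are the true classes, columns are the predicted classes
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = matrix.ravel()

print(f"True Positives:  {tp}")
print(f"True Negatives:  {tn}")
print(f"False Positives: {fp}")
print(f"False Negatives: {fn}")
```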

Key Metrics to Remember:

  • Precision: “When I guess Spam, how often am I right?”
  • Recall: “Of all the actual Spam emails, how many did I catch?”
  • F1-Score: The harmonic mean of Precision and Recall, a single number that is high only when both are high.
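All three metrics come straight from the confusion matrix counts: Precision is TP / (TP + FP), Recall is TP / (TP + FN), and F1 is the harmonic mean of the two. Here is a small sketch using made-up counts; the numbers are assumptions chosen only to make the arithmetic easy to follow.

```python
# Hypothetical counts from a spam filter's confusion matrix
tp = 8  # said Spam, was Spam
fp = 2  # said Spam, was actually Safe
fn = 4  # said Safe, was actually Spam

precision = tp / (tp + fp)  # when I guess Spam, how often am I right?
recall = tp / (tp + fn)     # of all actual Spam, how much did I catch?
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1-Score:  {f1:.2f}")
```

Because it is a harmonic mean, F1 gets pulled toward the weaker of the two numbers, so a model cannot score well by being good at only one of them.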

4.5 Decision Trees: The “Flowchart” Method

If Logistic Regression is about math and probabilities, a Decision Tree is about logic. It’s like playing a game of “20 Questions.”

  • Step 1: Is the email longer than 100 words? (Yes/No)
  • Step 2: Does it contain the word “Free”? (Yes/No)
  • Step 3: Is the sender in your contacts? (Yes/No)

Why use them? They are incredibly interpretable. You can literally print out the tree and see why the computer made its decision. This makes them a favorite for business meetings where you need to explain the “logic” behind the AI.
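As a rough illustration of “printing out the tree,” the sketch below trains scikit-learn's DecisionTreeClassifier on a tiny invented dataset and prints its rules with export_text. The eight emails, the three feature names, and the random_state value are all assumptions made for this example.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data, one row per email:
# [word_count, contains_free (1/0), sender_in_contacts (1/0)]
X = [
    [250, 1, 0],
    [300, 1, 0],
    [120, 1, 0],
    [400, 0, 0],
    [50, 0, 1],
    [80, 0, 1],
    [200, 0, 1],
    [30, 1, 1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = Spam, 0 = Not Spam

feature_names = ["word_count", "contains_free", "sender_in_contacts"]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

# export_text renders the learned questions as an indented flowchart
rules = export_text(tree, feature_names=feature_names)
print(rules)

# Classify a new email: 150 words, contains "Free", sender not in contacts
new_email = [[150, 1, 0]]
print(f"Prediction (1=Spam, 0=Not Spam): {tree.predict(new_email)[0]}")
```

In this toy data, whether the sender is in your contacts separates the two groups perfectly, so that is the question the tree ends up asking.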


Module 4 Summary

You’ve now mastered the two pillars of Supervised Learning: Regression (Numbers) and Classification (Categories). You know that Logistic Regression is actually a classifier, you know how to read a Confusion Matrix, and you’ve seen how Decision Trees use logic to reach a conclusion.

In our next and final module, we’ll see what happens when the computer has to learn without any labels!