

Support Vector Machine (SVM) is a machine learning classifier. Its goal is to find the optimal separating hyperplane, which maximizes the margin of the training data. There are three important parts of this algorithm, namely the optimal separating hyperplane, the margin, and the training data. SVM is trained on labeled training data, so it is a supervised learning algorithm, and it assigns data to a certain class, which makes it a classification algorithm. To predict the class of new data, SVM uses a hyperplane as the model that separates the classes, and we can classify a new data point just by looking at its position relative to the hyperplane.
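As a minimal sketch of that prediction step, assume the hyperplane is already known and written as w.x + b = 0 (the form used later in this tutorial). A new point is then classified by the sign of w.x + b. The values of w and b below are hypothetical and only for illustration.

```python
def predict(w, b, x):
    """Return 1 if x lies on the positive side of the hyperplane w.x + b = 0, else -1.
    Points exactly on the hyperplane are assigned 1 here by convention."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical hyperplane: the line y = x, written as (-1, 1).(x, y) + 0 = 0
w, b = (-1.0, 1.0), 0.0
print(predict(w, b, (1.0, 3.0)))   # 1: the point is above the line
print(predict(w, b, (3.0, 1.0)))   # -1: the point is below the line
```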

What Is a Hyperplane?

We know that the training data can live in a space with any number of dimensions. In a two-dimensional space, the hyperplane becomes a line. In a three-dimensional space, it becomes a plane. In more dimensions, it becomes a hyperplane. So a hyperplane is just a generalization of a plane, and its main task is to separate the training data into two classes. When we separate the data using a linear boundary (line, plane, etc.), we can only divide the training data into two classes. We therefore say there are two categories of data, namely positive (1) and negative (-1). The positive class can be assumed to lie above the hyperplane, and the negative class below it. Furthermore, since this algorithm uses a linear boundary to divide the classes, separating the training data perfectly requires enough space between the two classes for a linear boundary to fit between them. Training data with this characteristic is called linearly separable.

A simple illustration of these characteristics.
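To make "linearly separable" concrete, here is a small sketch that checks whether one given candidate hyperplane w.x + b = 0 separates the data. It requires every positive point (label 1) to lie strictly on one side and every negative point (label -1) strictly on the other. A data set is linearly separable if at least one such hyperplane exists; this function only tests a single candidate. The toy data and the w.x + b form are assumptions for illustration.

```python
def separates(w, b, points, labels):
    """True if every point lies strictly on the side of w.x + b = 0 that matches its label (1 or -1)."""
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score <= 0:  # wrong side, or exactly on the hyperplane
            return False
    return True

# Hypothetical toy data: positives above the line y = x, negatives below it
points = [(1.0, 3.0), (2.0, 4.0), (3.0, 1.0), (4.0, 2.0)]
labels = [1, 1, -1, -1]
print(separates((-1.0, 1.0), 0.0, points, labels))  # True: the line y = x separates the classes
print(separates((1.0, 0.0), -2.5, points, labels))  # False: the vertical line x = 2.5 does not
```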

The Problems

From the above illustration, we can see an arbitrarily placed hyperplane that separates the data into two classes: one above the line and the other below it. However, this model makes errors when we add more data to the training data.

Based on the illustration, we can conclude that a hyperplane is not a good model when it lies too close to the data points of one class, because a new data point may then fall on the wrong side of it. So we need to enlarge the distance from the data points to the hyperplane, yet the question is: how do we find the best distance?
To answer that question, we use the margin. To find the margin of a hyperplane, we calculate the distance between the hyperplane and the closest data point and double that value. Here is an illustration of the margin.

From the illustration, it is clear that moving the hyperplane closer to the data points makes the margin smaller. We've seen that this is not a good approach, because it does not anticipate the characteristics of new data. So we can conclude that the optimal hyperplane is the one that maximizes the margin of the training data, because it handles new data with unpredictable characteristics more consistently.
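Here is a minimal sketch of the rule above (margin = twice the distance to the closest data point). It assumes the hyperplane is written as w.x + b = 0 and uses the standard point-to-hyperplane distance |w.x + b| / ||w||. The toy points are hypothetical.

```python
import math

def distance(w, b, x):
    """Distance from point x to the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

def margin(w, b, points):
    """Margin as described above: twice the distance to the closest data point."""
    return 2 * min(distance(w, b, x) for x in points)

points = [(1.0, 3.0), (2.0, 4.0), (3.0, 1.0), (4.0, 2.0)]
print(margin((-1.0, 1.0), 0.0, points))   # about 2.83, i.e. 2 * sqrt(2)
```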

Here are several examples of hyperplanes with different equations.

Based on the illustration and the characteristics of an optimal hyperplane, our task is to find the equation of the hyperplane that has the biggest margin.
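This combines separation and margin into a sketch of "pick the candidate with the biggest margin". It only compares a hypothetical, hand-picked list of candidate hyperplanes, each of which must separate the data to count. Finding the truly optimal hyperplane is what the rest of this tutorial works toward.

```python
import math

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_if_separating(w, b, points, labels):
    """Twice the distance to the closest point, or None if (w, b) misclassifies any point."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if any(y * score(w, b, x) <= 0 for x, y in zip(points, labels)):
        return None
    return 2 * min(abs(score(w, b, x)) / norm for x in points)

points = [(1.0, 3.0), (2.0, 4.0), (3.0, 1.0), (4.0, 2.0)]
labels = [1, 1, -1, -1]
# Hypothetical candidate hyperplanes (w, b)
candidates = [((-1.0, 1.0), 0.0), ((-1.0, 1.0), 1.0), ((-2.0, 1.0), 2.0), ((1.0, 0.0), -2.5)]

results = [(margin_if_separating(w, b, points, labels), (w, b)) for w, b in candidates]
valid = [r for r in results if r[0] is not None]   # drop candidates that misclassify
best_margin, best = max(valid)                     # tuples compare by margin first
print(best, best_margin)
```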

How to compute the margin?

Before we compute the margin, we need to know the equation of the hyperplane, as it will be used to determine the position of a point relative to the hyperplane.

If we implement the SVM in a two-dimensional space, we get a line as the representation of the hyperplane, with the equation y = ax + b. Now, suppose we have two vectors, namely w = (-b, -a, 1) and x = (1, x, y). We will show that the dot product of these two vectors is tied to the line equation. Once we show that, we will have two representations of the same equation.

Afterwards, we’ll choose the best equation representing the hyperplane based on
several considerations.

Here is the computation: w.x = (-b)(1) + (-a)(x) + (1)(y) = y - ax - b, so w.x = 0 is exactly the line equation y = ax + b.

Based on the above computation, we can see that both equations represent the same thing. In other words, we have found another way to express the line equation. The new equation uses two vectors as its variables and takes their dot product. For the equation of the hyperplane, we will use this new equation for two reasons. First, the dot-product form works the same way in any number of dimensions. Second, the vector w (more precisely, its last two components (-a, 1)) is always perpendicular to the hyperplane, i.e. it is a normal vector. This is very helpful when we want to compute the distance between a data point and the hyperplane.
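Here is a quick numeric check of that equivalence. With w = (-b, -a, 1) and x = (1, x, y), the dot product w.x equals y - ax - b, so it is zero exactly when the point lies on y = ax + b. The check also confirms that (-a, 1) is perpendicular to the line's direction (1, a). The slope and intercept are arbitrary example values.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a, b = 2.0, 1.0                      # example line y = 2x + 1
w = (-b, -a, 1.0)                    # augmented weight vector (-b, -a, 1)
for px, py in [(0.0, 1.0), (3.0, 7.0), (1.0, 5.0)]:
    x = (1.0, px, py)                # augmented point (1, x, y)
    on_line = py == a * px + b
    print(px, py, dot(w, x), on_line)   # w.x is 0.0 exactly for the points on the line

direction = (1.0, a)                 # a vector pointing along the line
print(dot(w[1:], direction))         # 0.0: (-a, 1) is perpendicular to the line
```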

Compute the Margin

We can see that the vector w is perpendicular to the hyperplane. Based on the previous explanation, we can read its components directly from the coefficients of the line equation in standard form: w = (-b, -a, 1) if the line equation is y - ax - b = 0. From the above illustration, we know that the vector w can be represented as (0, -3, 1), i.e. the line y = 3x, whereas the vector x is (1, x, y). In this case we can neglect the first component (0), as it only determines the position of the hyperplane relative to the origin (0, 0). That leaves the normal vector w = (-3, 1).

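Our task is to compute the distance between the data point A and the hyperplane, or in other words, to find the norm (magnitude) of the vector d. The vector d is the projection of the vector a (from the origin to A) onto the vector w. So we can apply this formula to find the projection vector: d = (u.a)u, where u = w/||w|| is the unit vector in the direction of w. Because this hyperplane passes through the origin, the distance from A to the hyperplane is ||d|| = |u.a|.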

After we have the distance, simply double the value to get the margin.
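Here is a minimal sketch of these two steps for the example above, with w reduced to (-3, 1) for the line y = 3x. The text does not give the coordinates of A, so the point (3, 4) below is a hypothetical stand-in, and the numbers are only meaningful for that made-up point.

```python
import math

w = (-3.0, 1.0)                          # normal vector of the line y = 3x
a = (3.0, 4.0)                           # hypothetical data point A, as a vector from the origin

norm_w = math.sqrt(w[0] ** 2 + w[1] ** 2)
u = (w[0] / norm_w, w[1] / norm_w)       # unit vector in the direction of w
u_dot_a = u[0] * a[0] + u[1] * a[1]
d = (u_dot_a * u[0], u_dot_a * u[1])     # projection of a onto w: d = (u.a)u

distance = math.sqrt(d[0] ** 2 + d[1] ** 2)   # ||d||, equal to |u.a|
margin = 2 * distance                          # double the distance to get the margin
print(distance, margin)   # roughly 1.581 and 3.162, i.e. 5/sqrt(10) and sqrt(10)
```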

That’s all for the second part of this SVM tutorial. Next, we’ll see how to find the optimal hyperplane now that we know how to compute the margin.

Finding the Optimal Hyperplane

Let’s take a look at the margin of the training data from the previous part.

As we can see, this is not the optimal hyperplane: intuitively, we can get a bigger margin by moving the hyperplane to the right. However, we can only move it to the right up to a certain position. Beyond that limit, a different data point becomes the closest one, and the margin shrinks again.

Therefore, we will use another approach: we create two new hyperplanes that separate the data such that no data point lies between them. Afterwards, we create a new hyperplane exactly in the middle of them, crossing the margin at its midpoint. Here is the illustration.

From the above illustration, we can see that the data points A and B lie on the hyperplanes X and Y respectively. We also see that the hyperplane Z crosses the margin P in the middle. With this approach, there are no data points between the limiting hyperplanes (X and Y). This means the margin of the training data is determined by the distance from the middle hyperplane Z to the data points lying on the limiting hyperplanes. When X and Y are placed as far apart as possible under this condition, Z is the optimal separating hyperplane.
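To illustrate this construction, here is a sketch that assumes the limiting hyperplanes X and Y share the same normal vector w and are written as w.x + b_x = 0 and w.x + b_y = 0. It checks that no data point lies strictly between them and then places Z halfway, at b = (b_x + b_y) / 2. All numbers and the function name are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def middle_hyperplane(w, b_x, b_y, points):
    """Offset b of the hyperplane halfway between w.x + b_x = 0 and w.x + b_y = 0,
    or None if some data point lies strictly between the two limiting hyperplanes."""
    for p in points:
        if (dot(w, p) + b_x) * (dot(w, p) + b_y) < 0:   # opposite signs: p is between X and Y
            return None
    return (b_x + b_y) / 2

points = [(1.0, 3.0), (2.0, 4.0), (3.0, 1.0), (4.0, 2.0)]
w = (-1.0, 1.0)
# X is y = x + 2 (through (1, 3) and (2, 4)); Y is y = x - 2 (through (3, 1) and (4, 2))
print(middle_hyperplane(w, -2.0, 2.0, points))   # 0.0, i.e. Z is the line y = x
# With Y moved to y = x - 3, the point (3, 1) falls between X and Y
print(middle_hyperplane(w, -2.0, 3.0, points))   # None
```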

Two- and Three-Dimensional Vectors in the Equation of a Hyperplane

We’ve seen that the equation of a hyperplane can be written as w.x = 0, where w = (-b, -a, 1) and x = (1, x, y). This representation uses three-dimensional vectors, yet there is another way to write the equation of a hyperplane, namely w.x + b = 0. What is the difference between the two equations?

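The latter equation needs a separate value b because it is written with two-dimensional vectors. We can show this by splitting off the first component of the dot product: w.x = (-b)(1) + (-a)(x) + (1)(y) = (-a, 1).(x, y) - b. So with w = (-a, 1) and x = (x, y), the same line becomes w.x + b = 0, where the new b is the negative of the intercept b in y = ax + b.

In this tutorial, we’ll use the form of the hyperplane's equation with two-component vectors, w.x + b = 0.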
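Here is a small sketch of that conversion. The line y = ax + b becomes w.x + offset = 0 with w = (-a, 1) and offset = -b, and the three-component and two-component forms give the same value at every point. The name offset is used here only to avoid reusing b; the numbers are arbitrary examples.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a, b = 2.0, 1.0                       # example line y = 2x + 1

w3 = (-b, -a, 1.0)                    # three-component form: w.x = 0 with x = (1, x, y)
w2, offset = (-a, 1.0), -b            # two-component form: w.x + offset = 0 with x = (x, y)

for px, py in [(0.0, 1.0), (3.0, 2.0), (-1.0, 4.0)]:
    three = dot(w3, (1.0, px, py))
    two = dot(w2, (px, py)) + offset
    print(three, two)                 # the same value for every point
```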

The Constraints

Suppose we have a hyperplane with this equation: w.x + b = 0. We also have the limiting hyperplanes, represented by the equations w.x + b = d and w.x + b = -d respectively. These equations state that the two limiting hyperplanes are equally far from the optimal hyperplane. We can simplify the equations by replacing d with one. Multiplying w and b by the same positive factor does not move any of the hyperplanes, so we can always rescale them so that d = 1 (any positive value would work; one is just the simplest).
The next step is to ensure that no data point lies between the limiting hyperplanes. We can use their equations to create the following constraints: w.x_i + b >= 1 for every data point x_i in the positive class, and w.x_i + b <= -1 for every data point x_i in the negative class.

From the constraints, we can check whether a data point satisfies the rule. Let’s take the data point A as an example. This data point lies exactly on one of the limiting hyperplanes, which means it satisfies the equation w.x + b = 1. With w = (-a, 1), this is just the equation of a line, namely y = ax - b + 1, where -b + 1 is a constant. The same procedure applies to data points lying outside the limiting hyperplanes. Suppose w.x + b returns a value that is less than 1 and greater than -1, or one whose sign does not match the point's class. Then the data point does not satisfy the constraints, and we will not choose this pair of limiting hyperplanes to create the optimal hyperplane.

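Furthermore, we can get a single constraint for the limiting hyperplanes by combining the two constraints above. We multiply each constraint by the label y_i of its data point (1 for the positive class, -1 for the negative class), which flips the inequality for the negative class. This gives y_i(w.x_i + b) >= 1 for every data point. This single constraint will be used in the optimization later.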
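Here is a minimal sketch of the single constraint in code, assuming labels are 1 and -1: a pair (w, b) is acceptable only if y_i(w.x_i + b) >= 1 holds for every training point. The data and candidate pairs are hypothetical. The second call uses the same line scaled down, which shows why the scale of w matters once d is fixed to 1.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def satisfies_constraints(w, b, points, labels):
    """True if y_i * (w.x_i + b) >= 1 holds for every data point (labels are 1 or -1)."""
    return all(y * (dot(w, x) + b) >= 1 for x, y in zip(points, labels))

points = [(1.0, 3.0), (2.0, 4.0), (3.0, 1.0), (4.0, 2.0)]
labels = [1, 1, -1, -1]
# True: every point lies on one of the limiting hyperplanes w.x + b = 1 or w.x + b = -1
print(satisfies_constraints((-0.5, 0.5), 0.0, points, labels))
# False: the same line y = x, but scaled down so points fall between the limiting hyperplanes
print(satisfies_constraints((-0.25, 0.25), 0.0, points, labels))
```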

The Margin

Let’s take a look at this illustration.

As a reminder, our goal is to find the optimal hyperplane, which is the same as finding the biggest margin of the training data. If you recall, we got the optimal hyperplane by creating the limiting hyperplanes so that data points lie on each of them.

One approach to finding the value of the margin M is to convert it to a vector and then compute the norm of that vector. To do the conversion, we use the vector w as the base vector: the margin vector k is the unit vector w/||w|| multiplied by the scalar M, that is, k = M w/||w||. If x0 is a point on the limiting hyperplane w.x + b = -1, then z0 = x0 + k is the corresponding point on the other limiting hyperplane, w.x + b = 1.

We’ve got the vector representation of the margin, and now we can compute its norm by substituting z0 into the equation of that limiting hyperplane: w.(x0 + M w/||w||) + b = 1. Since w.w = ||w||^2, this gives w.x0 + b + M||w|| = 1. Because w.x0 + b = -1, it simplifies to M||w|| = 2, so the margin is M = 2/||w||.
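Here is a quick numeric check of M = 2/||w||, using a hypothetical w and b. We start from a point x0 on w.x + b = -1, step a distance M along the unit vector w/||w||, and confirm that the new point z0 lands on w.x + b = 1.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

w, b = (-0.5, 0.5), 0.0                # hypothetical hyperplane parameters
norm_w = math.sqrt(dot(w, w))
M = 2 / norm_w                         # margin predicted by the formula

x0 = (3.0, 1.0)                        # satisfies w.x0 + b = -1
k = (M * w[0] / norm_w, M * w[1] / norm_w)   # margin vector: length M, direction of w
z0 = (x0[0] + k[0], x0[1] + k[1])

print(dot(w, x0) + b, dot(w, z0) + b, M)     # -1.0, about 1.0, and about 2.83
```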

The Optimization Problem

Finally, we have a way to compute the margin. According to the formula M = 2/||w||, the only thing we can change to get the maximum margin is the norm of w.

As we can see, the larger the norm of w, the smaller the margin. So our task is to find the limiting hyperplanes that satisfy the constraint and give us the minimum value for the norm of w.

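To get the smallest norm, we combine this goal with the single constraint. This gives us the optimization problem: find the pair (w, b) that minimizes ||w|| while every data point still satisfies the constraint.

We have a pair (w, b). The line itself can be written with w = (-a, 1), but the constraint fixes the scale of w, so in general w is a scaled version of (-a, 1). What we adjust is its direction (the slope a), its length, and b, so that the norm of w is minimal.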

The single constraint for this problem is:

y_i(w.x_i + b) >= 1, for any i = 1, …, n

where y_i is the label (1 or -1) of the data point x_i and n is the number of data points.
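Real SVM libraries solve this problem with quadratic programming or specialized algorithms such as Sequential Minimal Optimization (SMO), usually minimizing (1/2)||w||^2, which has the same solution. As a self-contained illustration only, here is a brute-force sketch for two-dimensional, linearly separable data. It tries many directions for w, measures the widest empty band each direction allows, and keeps the best one. It then rescales (w, b) so the closest points satisfy y_i(w.x_i + b) = 1. The data set, the number of angles, and the function name are hypothetical choices.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def max_margin_2d(points, labels, steps=3600):
    """Brute-force search for the maximum-margin hyperplane w.x + b = 0 in 2D.
    Assumes both classes are present and the data is linearly separable.
    Returns (w, b, margin) with y_i * (w.x_i + b) >= 1 for all i, equality for the closest points."""
    best = None
    for i in range(steps):
        theta = 2 * math.pi * i / steps
        u = (math.cos(theta), math.sin(theta))          # candidate unit normal
        pos = [dot(u, x) for x, y in zip(points, labels) if y == 1]
        neg = [dot(u, x) for x, y in zip(points, labels) if y == -1]
        gap = min(pos) - max(neg)                        # width of the empty band along u
        if gap > 0 and (best is None or gap > best[0]):
            best = (gap, u, (min(pos) + max(neg)) / 2)  # remember the middle of the band
    if best is None:
        raise ValueError("no separating direction found; data may not be linearly separable")
    gap, u, mid = best
    scale = 2 / gap                                      # puts the limiting hyperplanes at +1 and -1
    w = (scale * u[0], scale * u[1])                     # so ||w|| = 2 / gap, i.e. margin = 2/||w||
    b = -scale * mid
    return w, b, gap

points = [(1.0, 3.0), (2.0, 4.0), (3.0, 1.0), (4.0, 2.0)]
labels = [1, 1, -1, -1]
w, b, margin = max_margin_2d(points, labels)
print(w, b, margin)                                                # w about (-0.5, 0.5), b about 0
print(min(y * (dot(w, x) + b) for x, y in zip(points, labels)))    # about 1.0
```

The grid search is only practical in two dimensions. It is meant to make the objective and the constraint tangible, not to replace a real solver.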


In my view, the Support Vector Machine becomes an efficient model when its parameters are tuned well, so I would suggest you use SVM and analyse the power of this model by tuning its parameters. SVMs are really good for text classification and at finding the best linear separator. The kernel trick turns SVMs into non-linear algorithms, so choosing an appropriate kernel is key, although choosing the right kernel function is not easy. We also need to be patient when building SVMs on large datasets. I hope this article is useful for you.
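As a small illustration of the kernel idea mentioned above, here are two common kernel functions. The linear kernel is just the dot product used throughout this tutorial. The Gaussian (RBF) kernel is K(x, z) = exp(-gamma ||x - z||^2). The default gamma value is a hypothetical choice; in practice it is one of the parameters you would tune.

```python
import math

def linear_kernel(x, z):
    """The plain dot product x.z used throughout this tutorial."""
    return sum(xi * zi for xi, zi in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2); gamma is a tunable parameter."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

print(linear_kernel((1.0, 2.0), (3.0, 4.0)))   # 11.0
print(rbf_kernel((1.0, 2.0), (1.0, 2.0)))      # 1.0 for identical points
print(rbf_kernel((0.0, 0.0), (2.0, 0.0)))      # exp(-2), about 0.135
```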

