Support Vector Machines

Support Vector Machine is a machine learning algorithm used in automatic classification of previously unseen data. In this post I would like to explain how SVM works and where it’s usually used.

In general, machine learning based classification is about learning how to separate two sets of data examples. Basing on this knowledge system can correctly put unseen examples into one of the other set. Spam filter is a very good example of automatic classification system. Let’s imagine a two dimensional space with points, SVM algorithm is about finding a line (hyperplane) that separate points into two classes.


The main idea behind this algorithm is that the gap dividing points should be as wide as it’s possible.

Objects, features and documents

SVMs are working with vectors i n-dimensional space. Number of dimension is equal to the number of “features” one object can have. Therefore whole space describe all possible configuration of objects (all possible solutions). For example, if we want to classify human gender basing only on data containing height, weight and feet size, vector space will have three dimensions.

So what is “feature” then? Feature is a property of an object (eg. height, word, etc.) and it’s value describing importance of a particular property to an object. In case of previous example (gender classification) feature can be a height or weight, in case of text document it can be a number of occurrence of particular word in whole document.

Talking about document classification, there are several ways to describe importance of a property to an object.

  • TF (term frequency) : how many times a word appear in a document.
  • boolean : 1 if term exists in document, 0 if not exists
  • logarithmial : tf(t,d) = 1 + log(tf(t,d)) where t – term, d– document,tf– term frequency
  • normalized : tf(t,d) = f(t,d) / max(f(w,d)) where w is any word of a document, f is frequency of a word in document
  • IDF (inverse document frequency) : is a measure of whether the term (t) is common or rare across all documents (D). idf(t,D) = log(|D| / |d containing t|)
  • TFIDF :tfidf(t,d,D) = tf(t,d) x idf(t,D) – this method is very popular and gives quite good results

Finding hyperplane

As I mentioned above, in case of two dimensions, we are looking for a line dividing two sets of points. In two dimensions, any line can be defined as y=ax+b. In case of higher dimensions ax+b becomes w・x + b, where w is normal vector to hyperplane and w・x is a dot product between vector w and x. In order to find such hyperplane we need to find values of parameters w and b which maximize the margin between hyperplane and points from both classes.

Above task need some sophisticated mathematical tools like quadratic programming optimization, Kernel Trick or Lagrange Multipliers. Therefore I can advice you to visit amazing blog of Ian Barber, developer at Google, where you’ll find more details about this algorithm and it’s implementation in PHP. Also on Wikipedia you can find very good explanation about mathematical background of solving this task.

The strongest point of SVM comparing to other classification algorithms is that the classification does not need to be linear. It means data space can be divided by a hyperplane laying in much more higher dimension and the original data space.
Good visualization can be seen below:

How about multiclass classification?

Right, in most of cases we will need an algorithm which can separate data to multiple classes.

As SVM is a binary classification algorithm, in case we want to separate objects into more than two classes we need to train this algorithm separately for each class in one of following way: OneToRest, OneToOne or OneToAll.

After training such algorithm can help us classify many different kind of data relatively quickly.