As with Naive Bayes, Support Vector Machines (SVMs for short) can be used to solve the task of assigning objects to classes. However, the way they solve this task differs completely from the approach taken by Naive Bayes.
Each object is considered a point in an n-dimensional feature space, n being the number of features used to describe the objects numerically. In addition, each object is assigned a binary label; let us assume the labels are “positive” and “negative”. During learning, the algorithm tries to find a hyperplane in that space that perfectly separates positive from negative objects. It is easy to think of settings where this is impossible. To remedy this situation, objects can be assigned so-called slack terms that penalize mistakes made during learning appropriately. That way, the algorithm is forced to find the hyperplane that causes the fewest mistakes.
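To make this soft-margin idea concrete, the sketch below fits a linear SVM on a small, made-up two-class data set. It relies on scikit-learn's SVC, which is our own choice of tooling rather than anything mandated by the text, and its regularization parameter C plays the role of the slack penalty: a small C tolerates more margin violations, a large C tolerates fewer.

```python
# A minimal sketch of a soft-margin linear SVM using scikit-learn.
# Data and parameter values are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

# Two slightly overlapping point clouds labelled "positive" (+1) and "negative" (-1).
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([+1] * 50 + [-1] * 50)

# A linear kernel searches for a separating hyperplane directly in the
# original feature space; C controls how strongly slack is penalized.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
print("hyperplane normal w:", clf.coef_[0], "offset b:", clf.intercept_[0])
```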
Another way to overcome the lack of a linear hyperplane separating positive from negative objects is to project each feature vector into a higher-dimensional feature space and search for a linear separating hyperplane in that new space. Usually, the main problem with learning in high-dimensional feature spaces is the so-called curse of dimensionality: there are fewer training examples available than free parameters to tune. In the case of SVMs this problem is less detrimental, as SVMs impose an additional structural constraint on their solutions: the separating hyperplane must have a maximal margin to all training examples. In addition, this means the solution may be based on the information encoded in only very few examples, the so-called support vectors.
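The following sketch illustrates this projection idea via the kernel trick, again using scikit-learn as an assumed tool rather than anything prescribed by the text. Points on two concentric circles cannot be separated by a line in the plane, but an RBF kernel, which corresponds to an implicit high-dimensional feature map, separates them, and the resulting maximum-margin solution is expressed through only a subset of the training examples, the support vectors.

```python
# A sketch of the "project into a higher-dimensional space" idea.
# The data set and parameters are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM struggles on this data ...
linear_clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("linear kernel accuracy:", linear_clf.score(X, y))

# ... while the RBF kernel corresponds to an implicit high-dimensional
# feature map and finds a maximum-margin separator there.
rbf_clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print("RBF kernel accuracy:", rbf_clf.score(X, y))

# The learned solution depends only on the support vectors, typically
# a small subset of all training examples.
print("number of support vectors:", rbf_clf.support_vectors_.shape[0])
```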