An accuracy comparison between scikit-learn's SVM and the LightTwinSVM program

Recently, I introduced the LightTwinSVM program on my blog (if you haven't read it, check out this post). It is a fast and simple implementation of the TwinSVM classifier. Some people might ask why they should use this program over other popular SVM implementations such as LIBSVM and scikit-learn. The short answer is that TwinSVM achieves better accuracy than standard SVM in most cases.

In order to show the effectiveness of the LightTwinSVM program in terms of accuracy, experiments were conducted on 10 UCI benchmark datasets.

Before presenting the results, the parameter selection procedure and the evaluation method should be explained.

Parameter Selection

The performance of SVM and TwinSVM classifiers is highly dependent on the choice of hyper-parameters. Therefore, grid search is often used to find the optimal hyper-parameter values. The penalty parameter \(C\) is selected from the set \(\{2^{i} | i=-8,-7,\dots,4,5\}\). Also, the RBF kernel was employed, as it is widely used and usually yields good generalization. Its parameter \( \gamma \) is chosen from the set \( \{ 2^{i} | i=-10,-9,\dots,1,2 \} \).
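To make the procedure concrete, here is a minimal sketch of this grid search on the scikit-learn side only, using GridSearchCV and the WDBC dataset that ships with scikit-learn (the exact pre-processing used in the experiments may differ; LightTwinSVM's own parameter search is handled by the program itself):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# WDBC is bundled with scikit-learn as the breast cancer dataset.
X, y = load_breast_cancer(return_X_y=True)

# Hyper-parameter grid described above:
# C in {2^i | i = -8, ..., 5} and gamma in {2^i | i = -10, ..., 2}
param_grid = {
    "C": [2.0 ** i for i in range(-8, 6)],
    "gamma": [2.0 ** i for i in range(-10, 3)],
}

# Exhaustive grid search with 5-fold cross-validation on accuracy.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy: %.2f%%" % (100 * search.best_score_))
```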

Evaluation Metrics

Similar to most research papers on classification, K-fold cross-validation is used to evaluate and compare the classifiers, with K set to 5. In this evaluation method, the training samples are divided into 5 sets (folds). The classifier is tested on one fold and trained on the remaining four. This process is repeated five times until the classifier has been trained and tested on every fold. The average accuracy over these five runs is reported as the final prediction accuracy, and the classification methods are compared based on it.
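For reference, the scikit-learn side of this protocol looks roughly like the sketch below; the C and gamma values here are placeholders, not the tuned values from the experiments:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Placeholder hyper-parameters; in practice they come from the grid search above.
clf = SVC(kernel="rbf", C=2.0, gamma=2.0 ** -5)

# 5-fold cross-validation: train on four folds, test on the remaining one,
# repeat five times, and report the mean accuracy.
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print("Per-fold accuracy:", scores)
print("Mean accuracy: %.2f%%" % (100 * scores.mean()))
```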

Test Environment

Experiments were performed on a system with the following specs:

CPU: AMD Ryzen 7 1800X (Default clock speed) | RAM: 16GB @ 2.4 GHz | OS: Ubuntu 18.04.1 | Python version: 3.6.7

Results

Now, it's time to present the classification results of the LightTwinSVM program. The table below compares LightTwinSVM with scikit-learn's SVM.

| Dataset | LightTwinSVM (%) | scikit-learn's SVM (%) | Difference in Accuracy (%) |
|---|---|---|---|
| Pima-Indian | 78.91 | 78.26 | 0.65 |
| Australian | 87.25 | 86.81 | 0.44 |
| Haberman | 76.12 | 76.80 | -0.68 |
| Cleveland | 85.14 | 84.82 | 0.32 |
| Sonar | 75.16 | 64.42 | 10.74 |
| Heart-Statlog | 85.19 | 85.19 | 0.00 |
| Hepatitis | 85.81 | 83.23 | 2.58 |
| WDBC | 98.24 | 98.07 | 0.17 |
| Spectf | 80.55 | 79.78 | 0.81 |
| Titanic | 82.04 | 81.71 | 0.33 |
| Mean | 83.44 | 81.90 | 1.53 |

As can be seen from the table above, LightTwinSVM outperforms scikit-learn's SVM on most datasets. Although the difference in accuracy between the two classifiers is generally small, on some datasets such as Sonar and Hepatitis the gap is substantial.

Overall, these results indicate that LightTwinSVM can be used for classification tasks that are typically solved with a standard SVM. By using TwinSVM, you may get better classification accuracy for your problem.

If you are interested in using LightTwinSVM for your task/project, its installation guide and example usage can be found on its GitHub repository.

Let me know your thoughts, problems, or questions by leaving a comment below.
