Support Vector Machine with Limited Memory

Support vector machines (SVMs) are promising methods for classification and regression analysis because of their solid mathematical foundations, which provide two desirable properties: margin maximization and nonlinear classification using kernels. Despite these properties, however, SVMs are usually not chosen for large-scale data mining problems, because their training complexity depends heavily on the size of the data set. Unlike in traditional pattern recognition and machine learning, real-world data mining applications often involve so many data records that they do not fit in main memory, and making multiple scans of the data set is often too expensive.

Our Clustering-Based SVM (CB-SVM) maximizes SVM performance on very large data sets under a limited memory budget. CB-SVM applies a hierarchical micro-clustering algorithm that scans the entire data set only once to supply the SVM with high-quality samples. These samples carry statistical summaries of the data and thereby maximize the benefit the SVM gains from learning on them.
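The single-scan summarization idea can be illustrated with a small sketch. It assumes BIRCH-style clustering features (a count N, a linear sum LS, and a sum of squared norms SS) as the "statistical summaries"; this page does not specify which statistics CB-SVM keeps, so that choice is an assumption. For brevity the sketch keeps a flat list of clusters instead of a hierarchy, and the names `ClusterFeature`, `summarize`, `to_weighted_samples`, and the `threshold` parameter are hypothetical illustrations, not CB-SVM's actual code.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ClusterFeature:
    """Hypothetical BIRCH-style summary: count, linear sum, sum of squared norms."""
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "ClusterFeature":
        return cls(1, list(x), sum(v * v for v in x))

    def merged(self, other: "ClusterFeature") -> "ClusterFeature":
        # Summaries are additive, so merging never needs the raw points again.
        return ClusterFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # RMS distance of members to the centroid: sqrt(SS/N - ||LS/N||^2).
        c = self.centroid()
        r2 = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding errors


def summarize(points: Sequence[Sequence[float]], threshold: float) -> List[ClusterFeature]:
    """One pass over the data: each point is absorbed into its nearest cluster
    if the merged radius stays within `threshold` (an assumed rule), otherwise
    it starts a new cluster. Only the summaries are kept in memory."""
    clusters: List[ClusterFeature] = []
    for x in points:
        p = ClusterFeature.from_point(x)
        best, best_d = None, math.inf
        for i, cf in enumerate(clusters):
            d = math.dist(cf.centroid(), x)
            if d < best_d:
                best, best_d = i, d
        if best is not None:
            candidate = clusters[best].merged(p)
            if candidate.radius() <= threshold:
                clusters[best] = candidate
                continue
        clusters.append(p)
    return clusters


def to_weighted_samples(clusters: List[ClusterFeature]) -> List[Tuple[List[float], int]]:
    """Turn each summary into one training sample (its centroid) weighted by its size."""
    return [(cf.centroid(), cf.n) for cf in clusters]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    for centroid, weight in to_weighted_samples(summarize(pts, threshold=0.5)):
        print(centroid, weight)
```

With `threshold=0.5`, the four points above collapse into two weighted samples, each with weight 2. Since a classifier needs labeled samples, one plausible arrangement (again an assumption) is to summarize each class separately and pass the centroids and weights to an SVM trainer that accepts per-sample weights, such as scikit-learn's `SVC.fit(X, y, sample_weight=...)`.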

DM - Data Mining Lab, Department of Computer Science and Engineering, Pohang University of Science and Technology

Copyright (c) 2008-2009 POSTECH Data Mining Lab, All Rights Reserved.