PP-SVM: Privacy Preserving Support Vector Machines

Data mining has many real-world applications, and classification is an important class of problems that arises in a wide variety of them. Fraud detection is one of the largest and most important classification problems. Consider the identification of fraudulent credit card transactions: banks collect transactional information for their credit card customers and, given the growing threat of identity theft, credit card loss, and similar risks, identifying fraudulent transactions can lead to annual savings of billions of dollars. Deciding whether a particular transaction is legitimate or fraudulent is a classification problem. A completely different, though equally important, example comes from healthcare (e.g., diagnosis of disease). Many such problems abound.

Currently, classifiers are run locally or over data collected at one central location (i.e., in a data warehouse). The accuracy of a classifier usually improves with more training data. Data collected from different sites is especially useful, since it provides a better estimate of the population than data collected at a single site.

However, privacy and security concerns restrict the free sharing of data. There are both legal and commercial reasons not to share data. For example, HIPAA laws require that medical data not be released without appropriate anonymization. Similar constraints arise in many applications; European Community legal restrictions apply to the disclosure of any individual data. In commercial terms, data is often a valuable business asset. For example, complete manufacturing processes are trade secrets (although individual techniques may be commonly known). Thus, it is increasingly important to enable privacy-preserving distributed mining of information.

Support Vector Machine (SVM) classification is one of the most widely used classification methods in data mining and machine learning. SVMs have proven to be effective in many real-world applications. This website is dedicated to developing secure SVM classification solutions for distributed data.

These solutions are based on our SVM Java implementation:

Related Publications


DM - Data Mining Lab, Department of Computer Science and Engineering, Pohang University of Science and Technology

Copyright (c) 2008-2009 POSTECH Data Mining Lab, All Rights Reserved.