With the widespread adoption of the Internet, myriad databases have been brought online, exposing massive amounts of hidden structured data that are not indexed by Web search engines such as Google. The data retrieval problem, that of finding relevant data in large databases, has thus become a significant challenge.
While users apply "soft" criteria (e.g., preference or relevance) when
retrieving data and expect ranked results, relational database management
systems (RDBMSs) are built on Boolean query models such as SQL, which support "hard"
criteria (e.g., price < $100,000) and return only exact matches.
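To make the contrast concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `houses` table, its columns, and the preference weights in the soft query are hypothetical illustrations, not part of the system described here. The Boolean query returns an unordered set of exact matches (SQL guarantees no order without `ORDER BY`, so the sketch sorts the ids), while the soft query scores every tuple and returns all of them in ranked order.

```python
import sqlite3

# Hypothetical toy table of house listings (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (id INTEGER, price INTEGER, bedrooms INTEGER, sqft INTEGER)")
conn.executemany(
    "INSERT INTO houses VALUES (?, ?, ?, ?)",
    [(1, 95000, 2, 900), (2, 120000, 3, 1400), (3, 99000, 3, 1100), (4, 180000, 4, 2000)],
)

# "Hard" criterion: a Boolean predicate; a tuple either matches or it does not.
hard = conn.execute("SELECT id FROM houses WHERE price < 100000").fetchall()

# "Soft" criterion: every tuple receives a preference score and results are ranked.
# These weights are hypothetical; a learned ranking function would supply them.
soft = conn.execute(
    """
    SELECT id, 1.0 * bedrooms + 0.001 * sqft - 0.00002 * price AS score
    FROM houses
    ORDER BY score DESC
    """
).fetchall()

print(sorted(r[0] for r in hard))  # → [1, 3]
print([r[0] for r in soft])        # → [4, 3, 2, 1]
```

Note that house 2 is excluded by the hard predicate even though, under these weights, it scores higher than house 1, which illustrates why exact-match semantics can miss relevant data.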
This research aims to develop an
"Intelligent Data Retrieval System (IDRS)" by integrating state-of-the-art machine
learning methods, such as ranking SVM, into RDBMSs. Ranking SVMs have shown
high accuracy in learning ranking functions but have yet to be integrated within
RDBMSs for data retrieval. Our initial experiments on House datasets have shown
promising results for this integration. However, our current implementation is based
on a "loose" integration, in which ranking SVM is coupled to the RDBMS at the
application level, so it does not yet scale to large datasets. To make the system
scalable, it is essential to support a "tight" integration, in which ranking SVM is
integrated at the SQL level.
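As a rough illustration of what pushing a learned ranking function to the SQL level could look like, the sketch below trains a linear ranking SVM in plain Python and then evaluates the learned weight vector inside SQLite as an `ORDER BY` expression, instead of scoring tuples in application code. It is a sketch under stated assumptions, not the IDRS implementation: it assumes a linear kernel, uses simple full-batch subgradient descent on the pairwise hinge-loss objective (a stand-in for the dedicated solvers usually used for ranking SVMs), and the `train_ranking_svm` helper, hyperparameters, house data, and user preferences are all hypothetical.

```python
import sqlite3


def train_ranking_svm(X, prefs, lam=0.01, epochs=500, lr=0.1):
    """Hypothetical helper: linear ranking SVM fitted by subgradient descent.

    X: list of feature vectors; prefs: list of (i, j) pairs meaning X[i] should rank above X[j].
    Minimizes lam/2 * ||w||^2 + mean over pairs of max(0, 1 - w . (X[i] - X[j])).
    """
    d = len(X[0])
    w = [0.0] * d
    # Pairwise transform: each preference (i, j) becomes a difference vector X[i] - X[j].
    diffs = [[a - b for a, b in zip(X[i], X[j])] for i, j in prefs]
    for t in range(1, epochs + 1):
        grad = [lam * wk for wk in w]  # gradient of the L2 regularizer
        for z in diffs:
            if sum(wk * zk for wk, zk in zip(w, z)) < 1:  # hinge term is active
                for k in range(d):
                    grad[k] -= z[k] / len(diffs)
        step = lr / t ** 0.5  # decaying step size
        w = [wk - step * gk for wk, gk in zip(w, grad)]
    return w


# Hypothetical house listings: (id, bedrooms, price in dollars).
rows = [(1, 2, 150000), (2, 3, 120000), (3, 4, 100000), (4, 2, 200000)]
# Features: bedrooms and price in units of $100,000, so both have similar magnitude.
PRICE_SCALE = 100000.0
X = [[b, p / PRICE_SCALE] for _, b, p in rows]
# Hypothetical user feedback as list indices: house 3 > house 2 > house 1 > house 4.
prefs = [(2, 1), (1, 0), (0, 3)]
w = train_ranking_svm(X, prefs)

# Evaluate the learned linear function inside SQL; the price scaling is folded
# into the coefficient bound for the raw price column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (id INTEGER, bedrooms INTEGER, price INTEGER)")
conn.executemany("INSERT INTO houses VALUES (?, ?, ?)", rows)
ranked = conn.execute(
    "SELECT id, ? * bedrooms + ? * price AS score FROM houses ORDER BY score DESC",
    (w[0], w[1] / PRICE_SCALE),
).fetchall()
print([r[0] for r in ranked])  # → [3, 2, 1, 4]
```

The linear kernel is what makes this pushdown simple: the whole model collapses into one arithmetic expression over columns. A kernelized ranking SVM would instead require evaluating a sum over support vectors per tuple, and even the linear version still scores every row, so efficient top-k processing inside the RDBMS remains a separate concern.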
H. Yu, S. Hwang & K. C.-C. Chang, "Enabling Soft Queries for Data Retrieval," Information Systems, Elsevier, 2007.
H. Yu, "SVM Selective Sampling for Ranking with Application to Data Retrieval," Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2005. (KDD'05 full paper, 12% acceptance rate)
H. Yu, S. Hwang & K. C.-C. Chang, "RankFP: A Framework for Supporting Rank Formulation and Processing," Proc. of IEEE Int. Conf. on Data Engineering, 2005. (ICDE'05 short paper, 19% acceptance rate)
Data Mining (DM) Lab, Department of Computer Science and Engineering, Pohang University of Science and Technology
Copyright (c) 2008-2009 POSTECH Data Mining Lab, All Rights Reserved.