RankRelief: a feature selection algorithm for global order.

Implemented by
  Jinoh Oh (kurin@postech.ac.kr)
  Hwanjo Yu (hwanjoyu@postech.ac.kr)

1. Files

base.pl: library that contains the algorithm and other subroutines.
summary.plx: Perl script that summarizes the results of each experiment.
tau.plx: Perl script that measures the tau value for one execution of svm-light.
tau_all.plx: Perl script that measures the tau value for multiple executions of svm-light, based on tau.plx.
selection.plx: Perl script that runs the experiment. It performs the 3-fold cross validation using the subroutines in base.pl and measures the result with tau_all.plx for each iteration of the algorithm. After the experiment, it calls summary.plx to summarize the results.

2. Description

This source code implements the RankRelief algorithm (called rankR below), which we proposed in our paper. rankR is implemented in Perl, and we use Ranking SVM as the ranking machine. More specifically, we use svm-light, which is implemented by Joachims' group. To download svm-light, visit http://svmlight.joachims.org.

The main script to execute is selection.plx. It uses the subroutines in base.pl and the scripts summary.plx and tau_all.plx to build the experiment framework. So if you want to add more experiments, all you need to do is edit selection.plx. Apart from the framework, all algorithm details and subroutines are in base.pl. So if you want to check our algorithm, see base.pl.

3. Prerequisites

As mentioned above, our algorithm uses svm-light as the ranking machine. To use our Perl scripts, you must install svm-light and set it up properly. Otherwise our code does not work.

3.1 Setting up SVM-Light

1. If svm-light is not installed, visit http://svmlight.joachims.org and install svm-light in your local directory.
2. Our code invokes the svm-light commands as global commands.
   So if you install svm-light in your own directory, you have to add its location to your PATH. You can do this in .bashrc or a similar file, depending on your system.
3. Make sure you can execute the svm-light commands from any directory on your machine.

4. How to run

After setting up SVM-Light, you can use our Perl scripts as follows.

  $ perl selection.plx [datafile] [fold_size]

[datafile] is the name of the data file. Problems can occur when the data file is in another directory, so I recommend putting your test data file in the same directory as the Perl scripts. The format constraints are described in section 4.1 below.

[fold_size] is the number of folds used for cross validation. If you use our datasets (i.e. ohsumed1, ohsumed2, ohsumed3), I recommend 3-fold cross validation.

4.1 Data format

There is a constraint on the format of the input data. Because our code treats svm-light as the ranking machine, the input data must match the input format of svm-light. For details of the svm-light input file format, visit http://svmlight.joachims.org.

4.2 Output

This script stores its results in ./result. This directory contains several files and directories. The most important file is summary. Each row of this file lists the sequence of results for one algorithm. For example, if the first row of the summary file is "rankR_poly 0.5656 0.3453 ... ", then the result of the first iteration using the rankR algorithm with the polynomial kernel is 0.5656, and the result of the second iteration is 0.3453. Because it summarizes the results of all algorithms, in the usual case the summary file is all you need to look at.

The result directory also contains one directory for each combination of algorithm and kernel. For example, if you run the rankR algorithm with the polynomial kernel and the rbf kernel, there are rankR_poly and rankR_rbf directories.
In each directory, there are Fold directories that hold the intermediate reduced-feature data sets. You should not need to look inside the Fold directories. The files to focus on are the result_number files, where number is the number of features that produced that result file. For example, result_2625 contains the result of cross validation using 2625 features of the data.
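
The summary layout described in section 4.2 (one row per algorithm, one value per iteration) can be scanned with a small shell function. This is only a minimal sketch and is not part of the RankRelief scripts: the function name best_per_row is hypothetical, rows are assumed to be whitespace-separated, and a larger value is assumed to be the better result.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with RankRelief.
# Assumption: each row of the summary file looks like
#   <algorithm_kernel> <value_iter1> <value_iter2> ...
# Assumption: a larger value is the better result.
best_per_row() {
    awk 'NF >= 2 {
        best = $2 + 0      # value of the first iteration
        idx = 1            # 1-based iteration index of the best value
        for (i = 3; i <= NF; i++) {
            if ($i + 0 > best) {
                best = $i + 0
                idx = i - 1
            }
        }
        # Print the original field text to avoid reformatting the number.
        printf "%s best=%s iteration=%d\n", $1, $(idx + 1), idx
    }' "$1"
}
```

After a run of selection.plx, it could be called as: best_per_row ./result/summary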