
The $1,000,000 Netflix Contest is to develop a "ratings prediction program" that can beat Netflix's own program (called Cinematch) by 10% in predicting what rating users gave to movies (accuracy is measured as root-mean-square error, RMSE), i.e., predict rating(M, U) where (M, U) ∈ QUALIFYING(MovieID, UserID). Netflix uses Cinematch to decide which movies a user will probably like next (based on all past rating history). All ratings are "5-star" ratings (5 is highest, 1 is lowest; caution: 0 means "did not rate"). Unfortunately, rating = 0 does not mean that the user "disliked" that movie, but that it wasn't rated at all. Most "ratings" are 0. Therefore, the ratings data sets are NOT vector spaces! One can approach the Netflix contest problem as a data mining classification or prediction problem. A "history of ratings by users to movies", TRAINING(MovieID, UserID, Rating, Date), is given with which to train your predictor, which will predict the ratings given to QUALIFYING movie-user pairs. (Netflix knows the ratings given to the Qualifying pairs, but you don't.) Since TRAINING is very large, Netflix also provides a "smaller, but representative subset" of TRAINING, PROBE(MovieID, UserID), about 2 orders of magnitude smaller than TRAINING. Netflix gives 5 years to submit QUALIFYING predictions; that contest window is about half gone now. A team can submit as many solutions as they wish, at any time. Each October, Netflix gives $50,000 to the team at the top of the so-called Netflix Leaderboard. BellKor has won that twice.
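Since the 10% target is stated in terms of RMSE, a small helper makes the arithmetic concrete. This is a minimal sketch, not code from the contest kit or from mpp: the names rmse and improvement_percent are hypothetical, and the caller is assumed to pass only rated pairs (a 0, meaning "did not rate", must never be scored as a 0-star rating).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-square error between predicted ratings and the true 1..5 ratings.
// Assumes both vectors list the same (movie, user) pairs in the same order.
double rmse(const std::vector<double>& predicted, const std::vector<int>& actual) {
    assert(predicted.size() == actual.size() && !predicted.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double err = predicted[i] - actual[i];
        sum += err * err;
    }
    return std::sqrt(sum / predicted.size());
}

// Percentage by which a candidate RMSE improves on a baseline RMSE
// (the contest asks for at least 10% relative to Cinematch).
double improvement_percent(double baseline_rmse, double candidate_rmse) {
    assert(baseline_rmse > 0.0);
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse;
}
```

For example, a predictor scoring RMSE 0.9 against a baseline of 1.0 gives improvement_percent(1.0, 0.9) of about 10.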


The Netflix Contest (USER versus MOVIE voting)
One can address the prediction or classification problem using several different "approaches".

USER VOTERS (approach 1): To predict the rating of a pair, (M, U), we take TRAINING as a vector space of user rating vectors. The users are the points in the vector space and the movies are the dimensions of that vector space. Since there are 17,770 movies, each user is a tuple of 17,770 ratings if all movies are used as dimensions. That's too many dimensions! The first dimension pruning: restrict to only those movies that U has rated (= supportU). We also allow another round of dimension pruning based on correlation with M. Once the dimension (movie) set is pruned, we pick a "Set of Near Neighbor users to U" (NNS) from the users, V, who have rated M (= supportM). "Near" is defined based on correlation with U. One can think of this step as the voter pruning step. Note: most correlation calculations involve the other variable also, i.e., the result of a user pruning depends on the pruned movie set and vice versa. Thus, theoretically, the movie/user pruning steps could be alternated ad infinitum! Our current approach is to allow an initial global dimension prune, then the voter prune, then a final dimension prune. You will see these 3 prune steps in the .config files. We then let voters vote, but they don't necessarily cast the straightforward rating(M, V) vote.

The best way to think about the 3 pruning steps (and there could be more!) is that we prune down the dimensions so that vector space methods are tractable, ameliorating the curse of dimensionality. The first (which may be turned off) is a global dimension prune, not based on individual voters. The second is the voter prune, based on the currently pruned dimensions. The third is a final dimension prune (different for each voter), which gives the final vector space over which the vote by that voter is calculated. Then we let those VOTERS vote as to the best rating prediction to be made. There are many ways to prune, vote, tally, and decide on the final prediction; these choices make up the .config file.

MOVIE VOTERS (approach 2) is identical, with the roles reversed: movies are the voters and users are the dimensions.
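The user-voting flow can be sketched in C++ as follows. This is a hypothetical, simplified illustration, not the PTree-based mpp code: ratings are held in an ordinary std::map (a missing entry means "not rated"), the similarity is a toy relative exact-rating-match count, the optional global prune is omitted, and the per-voter final dimension prune is approximated by restricting each similarity to the movies both users rated. Each surviving voter casts the plain rating(M, V), weighted by its similarity.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical in-memory layout: userID -> (movieID -> rating 1..5).
// A missing entry means "not rated" (Netflix's 0), never a low rating.
using Ratings = std::map<int, std::map<int, int>>;

// Toy similarity: fraction of movies in `dims` rated by both u and v on
// which they gave exactly the same rating (0.0 if they share none).
double exact_match_similarity(const Ratings& r, int u, int v,
                              const std::vector<int>& dims) {
    const auto& ru = r.at(u);
    const auto& rv = r.at(v);
    int common = 0, same = 0;
    for (int m : dims) {
        auto iu = ru.find(m);
        auto iv = rv.find(m);
        if (iu == ru.end() || iv == rv.end()) continue;  // not co-rated
        ++common;
        if (iu->second == iv->second) ++same;
    }
    return common == 0 ? 0.0 : static_cast<double>(same) / common;
}

// Predict rating(M, U) by user voting:
//   1. dimension prune: keep only movies U rated (supportU), minus M;
//   2. voter prune: users V in supportM whose similarity >= threshold;
//   3. each surviving voter votes rating(M, V), weighted by similarity.
// Returns `fallback` when no voter survives.
double predict_user_voting(const Ratings& r, int M, int U,
                           double threshold, double fallback) {
    std::vector<int> dims;
    for (const auto& entry : r.at(U))
        if (entry.first != M) dims.push_back(entry.first);

    double weighted_sum = 0.0, weight_total = 0.0;
    for (const auto& [V, movies] : r) {
        if (V == U) continue;
        auto it = movies.find(M);
        if (it == movies.end()) continue;         // V is not in supportM
        const double w = exact_match_similarity(r, U, V, dims);
        if (w <= 0.0 || w < threshold) continue;  // voter prune
        weighted_sum += w * it->second;
        weight_total += w;
    }
    return weight_total > 0.0 ? weighted_sum / weight_total : fallback;
}
```

Swapping exact_match_similarity for any other correlation with f(u, u) at its maximum leaves the three-step structure unchanged.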


The Netflix Contest (Using SLURM to generate a clustering)
SLURM has been set up to run on the Penryn Cluster 2 (32 8-processor machines, 1 terabyte of main memory) so that one can create a .config file (the name must end in .config) which specifies all the parameters for the program, and then issue:

./mpp-submit -S -i Data/probe-full.txt -c pf.0001/u.00.00.config -t .0001 -d ./pf.0001

The program pulls its parameters from the .config file; -t .0001 means SqErr threshold = .0001, and -d ./pf.0001 means the results go to the ./pf.0001 directory. Here the program takes as input the file Data/probe-full.txt (which is not quite the full probe, but close).

In general, mpp-submit -S -i InputFile.txt -c ConfigFile.config -t SqErrThrhld -d Dir takes as input:
  InputFile.txt: MovieID with interleaved UserIDs, in .txt format (see next slide)
  ConfigFile.config: shows which program to run, in .config format (see next slide)
  SqErrThrhld: if PredictionSqErr ≤ SqErrThrhld, the pair is put in Dir/lo-InputFile.txt, else in Dir/hi-InputFile.txt
  Dir: an existing directory for the output
and puts as output (in Dir): lo-InputFileName.txt, hi-InputFileName.txt, InputFileName.config, InputFileName.rmse
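The -t threshold rule can be sketched as follows. This is a hypothetical illustration of the lo/hi split only: ScoredPair, Split and split_by_sq_err are invented names, it works on in-memory pairs rather than writing the lo- and hi- files that mpp-submit produces, and it assumes the true rating of each input pair is known, as it is for Probe.

```cpp
#include <cassert>
#include <vector>

// One predicted (movie, user) pair whose true rating is known (e.g. Probe).
struct ScoredPair {
    int movie;
    int user;
    double predicted;
    int actual;
};

struct Split {
    std::vector<ScoredPair> lo;  // SqErr <= threshold (goes to lo-InputFile.txt)
    std::vector<ScoredPair> hi;  // SqErr >  threshold (goes to hi-InputFile.txt)
};

// Partition pairs by squared prediction error against the -t threshold.
Split split_by_sq_err(const std::vector<ScoredPair>& pairs, double threshold) {
    Split out;
    for (const ScoredPair& p : pairs) {
        const double err = p.predicted - p.actual;
        (err * err <= threshold ? out.lo : out.hi).push_back(p);
    }
    return out;
}
```

Running such a split once per .config gives one lo- set per program, which is what the following slides treat as a clustering.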


The Netflix Contest (Using SLURM to generate a clustering)
./mpp-submit -S -i Data/probe-full.txt -c pf.0001/u.00.00.config -t .0001 -d ./pf.0001

InputFile: Data/probe-full.txt
  1: 30878 2647871 1283744 2488120 317050 1904905 1989766 14756 1027056 1149588 1394012 1406595 2529547 1682104 2625019 2603381 1774623 470861 712610 1772839 1059319 2380848 548064
  2: 1959936 748922 1131325 1312846 2314531 1636093 584750 2418486 715897 1172326 etc.
where 1: and 2: are movieIDs and the others are userIDs. Note, this is an interleaved format of a 2-column DB file, probe-full(movieID, userID).

ConfigFile: pf.0001/u.00.00.config
The program sets its parameters as specified in the .config:
  user_voting = enabled
  movie_voting = disabled
  user_vote_weight = 1
  # processed only if user voting enabled.
  [user_voting]
  Prune_Movie_in_SupU = disabled
  Prune_Users_in_SupM = enabled
  Prune_Movies_in_CoSupUV = enabled
  [Prune_Movies_in_SupU]
  method=MoviePrune
  leftside = 0
  width = 30000
  mstrt = 0
  mstrt_mult=0
  ustrt = 0
  ustrt_mult=0
  TSa = -100
  TSb = -100
  Tdvp = -1
  Tdvs = -1
  Tvdp = -1
  Tvds = -1
  TD = -1
  TP = -1
  PPm = .1
  TV = -1
  TSD = -1
  Ch = 1
  Ct = 1
  [Prune_Users_in_SupM]
  method=UserCommonCoSupportPrune
  leftside = 0
  width = 30000
  mstrt = 0
  mstrt_mult=0
  ustrt = 0
  ustrt_mult=0
  TSa = -100
  TSb = -100
  Tdvp = -1
  Tdvs = -1
  Tvdp = -1
  Tvds = -1
  TD = -1
  TP = -1
  PPm = .1
  TV = -1
  TSD = -1
  Ch = 1
  Ct = 1
  [Prune_Movies_in_CoSupUV]
  method=MovieCommonCoSupportPrune
  leftside = 0
  width = 2000
  mstrt = 0
  mstrt_mult=0
  ustrt = 0
  ustrt_mult=0
  TSa = -100
  TSb = -100
  Tdvp = -1
  Tdvs = -1
  Tvdp = -1
  Tvds = -1
  TD = -1
  TP = -1
  PPm = .1
  TV = -1
  TSD = -1
  Ch = 1
  Ct = 1
  (a part identical to the above, for the movie voting params, follows)

Only the method, leftside, width, Ch (Choice), and Ct (Count) parameters are used at this time.

Using this program, the many "lo-u.xx" files and, if movie voting is also enabled, "lo-m.yy" files constitute what we have called a clustering (though they're not mutually exclusive). Once we have {lo-z.xx.yy | z = u or m}, we can make a submission by: for each qualifying pair (m, u), use correlations to pick the program to make that prediction.
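The interleaved format can be read back into (movieID, userID) pairs with a few lines of C++. This is a minimal sketch, not the parser in mpp-mpred.C (which reads the same tokens with operator>> and strips the trailing ':' from movie headers); the function name parse_interleaved is hypothetical, and it assumes tokens are separated by whitespace.

```cpp
#include <cassert>
#include <istream>
#include <sstream>  // std::istringstream lets callers parse in-memory text
#include <string>
#include <utility>
#include <vector>

// Parse "movieID:" headers, each followed by the userIDs to predict for
// that movie, into flat (movieID, userID) pairs.
std::vector<std::pair<int, int>> parse_interleaved(std::istream& in) {
    std::vector<std::pair<int, int>> pairs;
    int movie = 0;  // 0 = no header seen yet (Netflix movieIDs start at 1)
    std::string tok;
    while (in >> tok) {
        if (tok.back() == ':') {
            movie = std::stoi(tok.substr(0, tok.size() - 1));
        } else {
            assert(movie != 0 && "userID before any movieID header");
            pairs.emplace_back(movie, std::stoi(tok));
        }
    }
    return pairs;
}
```

The flat pairs are exactly the rows of the 2-column probe-full(movieID, userID) table the format encodes.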


The Netflix Contest (Using this scheme to predict Qualifying pair ratings)
The above prediction scheme requires the existence of square errors (SqErr); e.g., the cluster files lo-u.vv.nn.txt and lo-m.nn.vv.txt are composed of all input pairs such that SqErr ≤ .0001. To predict rating(M, U) for pairs from Qualifying, we won't have answers, so we won't have SqErrs of our predictions relative to those answers. So how can we form good clusters then? Once that's decided, what matchup algorithm should we use to match a cluster (program) to a Qualifying pair to be predicted? After the clusters are created, we can try the matchup algorithms that worked best for Probe predictions, but we may want to develop new ones, because the performance of those matchup algorithms may depend on the way the clusters were created. Could we use the same 288 configs to generate a new config-subset-collection of Qualifying pairs using, e.g., some kind of prediction variation instead of thresholded prediction SqErr?

lo-u.vv.nn.txt could be constructed to consist of Qualifying pairs as follows (a variation-based method): Set all answers in Qualifying to 1 and use ./mpp-submit to create clusters as above (threshold = .0001) in a directory, q1. Set all answers in Qualifying to 2 and use ./mpp-submit to create clusters as above (threshold = .0001) in a directory, q2, etc. This will create a clustering of 288*5 = 1440 cluster sets (but, of course, only 288 different program configs). One could match up a Qualifying pair using count-based correlations, Pearson correlations, 1-perpendicular correlations, or something else? One could match up (M, U) with the cluster in which the sum of the M and U counts (or counts relative to cluster size) is maximal? Other ideas?
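The max-count matchup idea can be sketched as follows. This is a hypothetical illustration (Cluster and matchup are invented names): each cluster is represented only by the (movieID, userID) pairs in its lo- file, and a Qualifying pair (M, U) is matched to the cluster that maximizes the number of pairs sharing M plus the number sharing U, optionally divided by the cluster size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Cluster {
    std::string config;                      // program (.config) that produced it
    std::vector<std::pair<int, int>> pairs;  // (movieID, userID) pairs in its lo- file
};

// Score = (#pairs with movie M) + (#pairs with user U), optionally divided by
// the cluster size. Returns the index of the best-scoring cluster, or -1 if
// no cluster contains M or U at all.
int matchup(const std::vector<Cluster>& clusters, int M, int U, bool relative) {
    int best = -1;
    double best_score = 0.0;
    for (std::size_t i = 0; i < clusters.size(); ++i) {
        const Cluster& c = clusters[i];
        if (c.pairs.empty()) continue;
        std::size_t hits = 0;
        for (const auto& [m, u] : c.pairs) hits += (m == M) + (u == U);
        const double score = relative
            ? static_cast<double>(hits) / c.pairs.size()
            : static_cast<double>(hits);
        if (score > best_score) {
            best_score = score;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

The relative variant keeps very large clusters from winning on size alone; which variant predicts better is the open question the slide raises.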


The Netflix Files
{Mi}, i = 1..17770, are given by Netflix as Mi(UserID, Rating, Date): for each MovieID, Mi, this is a file of all users who rated it, with the rating and the rating date (avg: 5655 users/movie).

TRAINING(MID, UID, R, D), 100,480,507 ratings, is kept in MySQL in two orders:
  ordered by MID, with key (mID, uID): movies m1..m17770, each followed by its users u with rating r(m,u) and day_number d(m,u);
  ordered by UID, with key (uID, mID): users u1..u480189 (userIDs run up to 2649429), each followed by its movies m with rating r(u,m) and day_number d(u,m) (avg: 209 movies/user).
Day numbers are 12-bit values starting at 1 = 1/1/99 and ending at 2922 = 12/31/06.

(Figure: TRAINING as an M-U interaction cube (Rolodex Model), movies m1..m17770 by users u1..u480189, and bit-sliced TRAINING, with bit slices such as P(mh, 2) for movie mh.)
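The day-number convention and the 3-bit rating slices (the mpp source comments say each rating is encoded using three PTree's) can be illustrated with plain bit vectors. This is a sketch under stated assumptions: it uses ordinary std::vector<bool> slices rather than compressed PTrees, the function names are hypothetical, and the date arithmetic uses Howard Hinnant's days-from-civil formula for the proleptic Gregorian calendar. A rating of 0 ("did not rate") simply has all three bits clear. Note that day number 2922 is larger than 2^11 - 1 = 2047, so a 12-bit field is needed to hold the full range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
long days_from_civil(long y, unsigned m, unsigned d) {
    y -= m <= 2;
    const long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<long>(doe) - 719468;
}

// Day number as used in TRAINING: 1 = 1999-01-01, 2922 = 2006-12-31.
long day_number(long y, unsigned m, unsigned d) {
    return days_from_civil(y, m, d) - days_from_civil(1999, 1, 1) + 1;
}

// Bit-slice one movie's rating column (0 = not rated, 1..5 = stars) into
// three bit vectors: slices[b][i] is bit b of the rating given by user i.
std::vector<std::vector<bool>> bit_slice(const std::vector<int>& ratings) {
    std::vector<std::vector<bool>> slices(3, std::vector<bool>(ratings.size()));
    for (std::size_t i = 0; i < ratings.size(); ++i) {
        assert(ratings[i] >= 0 && ratings[i] <= 5);
        for (int b = 0; b < 3; ++b)
            slices[b][i] = ((ratings[i] >> b) & 1) != 0;
    }
    return slices;
}

// Reassemble the rating of user i from its three bit slices.
int rating_at(const std::vector<std::vector<bool>>& slices, std::size_t i) {
    int r = 0;
    for (int b = 0; b < 3; ++b)
        if (slices[b][i]) r |= 1 << b;
    return r;
}
```

OR-ing the three slices for a user position recovers the "rated at all" mask, since only an unrated entry has every bit clear.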


The Program: Code Structure - the main modules
Call structure: mpp-mpred.C → mpp-user.C → {user-vote.C, movie-vote.C} → prune.C.
mpp-mpred.C reads a Netflix PROBE file Mi(Uid) and passes Mi and ProbeSupport(Mi) to mpp-user.C to make predictions for each pair (Mi, U), for each U ∈ ProbeSupport(Mi). It can also call separate instances of mpp-user.C for many movies, to be processed in parallel (governed by the number of "slots" specified on the command line). mpp-user.C loops through ProbeSupport(M), the ULOOP, reading in the designated (matched-up) config file, then writing out a (Mi, U) prediction for each U. If the user-vote approach is used, mpp-user.C calls user-vote.C, passing it (M, Support(M), U, Support(U)). If the movie-vote approach is used, mpp-user.C calls movie-vote.C, passing it (M, Support(M), U, Support(U)). user-vote.C does the specified pruning by calling prune.C, loops through the pruned set of user voters, V, calculating a vote for each, combines those votes, and returns a prediction_vote(M, U). movie-vote.C does similarly.


What kind of pruning can be specified?
Again, all parameters are specified in a configuration file, and the values specified there are consumed at runtime using, e.g., the call:

mpp -i Input_.txt_file -c config -n 16

where Input_.txt_file is the input Probe subset file and 16 is the number of parallel processes that mpp-mpred.C will generate (here, 16 movies are processed in parallel, each sent to a separate instantiation of mpp-user.C). A sample config file is given later. There are up to 3 types of pruning used (for pruning down support(M), the set of all users that rate M, or support(U), the set of all movies that U rates):
1. correlation or similarity threshold based pruning
2. count based pruning
3. ID window based pruning

Under correlation or similarity threshold based pruning, using support(M) = supM as an example (pruning support(U) is similar), we allow any function f: supM × supM → [0, HighValue] to be called a user correlation, provided only that f(u, u) = HighValue for every u in supM. Examples include Pearson_Correlation, Gaussian_of_Distance, 1_perp_Correlation (see the appendix of these notes), relative_exact_rating_match_count (which Tingda is using), dimension_of_common_cosupport, and functions based on standard deviations. Under count based pruning, we usually first order by one of the correlations above (into a multimap), then prune down to a specified count of the most highly correlated. Under ID window based pruning, we prune down to a window of userIDs within supM (or movieIDs within supU) by specifying a leftside (a number added to U, so leftside is relative to U as a userID) and a width.
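The three prune types can be sketched as three small C++ functions. This is a hypothetical illustration, not prune.C: the correlation is passed in as a callback giving each candidate's correlation with U (or M), count-based pruning orders candidates in a std::multimap as the text describes, and the ID window is assumed to be the half-open interval [U + leftside, U + leftside + width), which is an assumption, since the slides do not say whether the endpoints are inclusive.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Correlation of a candidate ID with the fixed U (or M); hypothetical callback.
using Corr = std::function<double(int)>;

// 1. Threshold prune: keep candidates whose correlation is >= t.
std::vector<int> threshold_prune(const std::vector<int>& ids, const Corr& f, double t) {
    std::vector<int> kept;
    for (int id : ids)
        if (f(id) >= t) kept.push_back(id);
    return kept;
}

// 2. Count prune: order by correlation (highest first) in a multimap,
//    then keep the `count` most highly correlated candidates.
std::vector<int> count_prune(const std::vector<int>& ids, const Corr& f, std::size_t count) {
    std::multimap<double, int, std::greater<double>> ordered;
    for (int id : ids) ordered.emplace(f(id), id);
    std::vector<int> kept;
    for (auto it = ordered.begin(); it != ordered.end() && kept.size() < count; ++it)
        kept.push_back(it->second);
    return kept;
}

// 3. ID window prune: keep IDs in [center + leftside, center + leftside + width).
std::vector<int> window_prune(const std::vector<int>& ids, int center, int leftside, int width) {
    const long lo = static_cast<long>(center) + leftside;
    const long hi = lo + width;
    std::vector<int> kept;
    for (int id : ids)
        if (id >= lo && id < hi) kept.push_back(id);
    return kept;
}
```

Because each function takes and returns a list of IDs, the prunes compose in any order, matching the alternating movie/user pruning described earlier.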


How does one specify prunings?
Again, in a file (this one is named config) there is a section for specifying the parameters for user voting and a separate section for specifying the parameters for movie voting. E.g., for movie voting, at the bottom, there are 3 external prunings possible (0 or more can be chosen):
1. an initial pruning of the dimensions to be used (since the dimensions are users, it prunes supM);
2. a pruning of movie voters, N (in supU);
3. a final pruning of dimensions (CoSupport(M, N)) for the specific movie voter, N.
Each prune specifies its type with the method parameter. There are 3 types: UserPrune, with a full range of possibilities; UserFastPrune, with just PearsonCorrelation pruning; and CommonCoSupportPrune, which orders users, V, only according to the size of their CommonCoSupport with U (note that this is a correlation of sorts, too). E.g., the parameters for this final prune are specified as follows:

  [movie_voting Prune_Users_in_CoSupMN]
  method = UserCommonCoSupportPrune
  leftside = 0       specify the leftside (from Uid) of an ID interval prune of supM
  width = 8000       specify the width of an ID interval prune of supM
  mstrt = 0          specify the starting movie (intercept and slope) for the N loop
  mstrt_mult = 0.0
  ustrt = 0          specify the starting user (intercept and slope) for the V loop
  ustrt_mult = 0.0
  TSa = -100         specify PearsonCorr threshold (a = Amal, meaning: use Amal's table lookup)
  TSb = -100         specify PearsonCorr threshold (b = bill, meaning: use bill's formula; note that if there has been prior pruning, this will have a different value than Amal's)
  Tdvp = -1          threshold "diff of vectors" population-based std_dev prune
  Tdvs = -1          threshold "diff of vectors" sample-based std_dev prune
  Tvdp = -1          threshold "vector of diffs" population-based std_dev prune
  Tvds = -1          threshold "vector of diffs" sample-based std_dev prune
  TD = -1            threshold (Gaussian of) Euclidean distance based prune
  TP = -1            threshold for (Gaussian of) 1-perpendicular distance prune
  PPm = .1           exponent for (Gaussian of) 1-perpendicular distance prune
  TV = -1            threshold (Gaussian of) a variation based prune
  TSD = -1           threshold std_dev based prune
  Ch = 1             picks the ordering for the count-based prune below: 1=Amal_Pearson, 2=Bill_Pearson, etc.
  Ct = 2             threshold for count based prune

Note: all thresholds are for similarities, not distances; i.e., when we start with a distance, we follow it with a Gaussian to make it a similarity or correlation.
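Two of the similarity functions named above can be sketched over the common co-support (the movies both users rated), which keeps the 0 = "did not rate" entries out of the arithmetic. This is a hypothetical illustration, neither Amal's table lookup (TSa) nor Bill's formula (TSb): Row and the function names are invented, and the Gaussian uses exp(-d^2 / (2 sigma^2)) with a caller-chosen sigma, which is an assumed parameterization. Both reach their maximum when a user is compared with itself; for Pearson this requires the user's ratings to vary, since a constant rating vector has no defined correlation (the sketch returns 0 there).

```cpp
#include <cassert>
#include <cmath>
#include <map>

// Hypothetical sparse rating row: movieID -> rating (1..5); absent = not rated.
using Row = std::map<int, int>;

// Pearson correlation of two users over their common co-support.
// Returns 0.0 when fewer than two movies are shared or a side has no variance.
double pearson_cosupport(const Row& a, const Row& b) {
    double sa = 0, sb = 0, saa = 0, sbb = 0, sab = 0;
    int n = 0;
    for (const auto& [movie, ra] : a) {
        auto it = b.find(movie);
        if (it == b.end()) continue;  // not co-rated
        const double x = ra, y = it->second;
        sa += x; sb += y; saa += x * x; sbb += y * y; sab += x * y;
        ++n;
    }
    if (n < 2) return 0.0;
    const double cov = sab - sa * sb / n;
    const double va = saa - sa * sa / n, vb = sbb - sb * sb / n;
    if (va <= 0.0 || vb <= 0.0) return 0.0;
    return cov / std::sqrt(va * vb);
}

// Gaussian of the Euclidean distance over the common co-support:
// exp(-d^2 / (2 sigma^2)), which is 1.0 at distance 0.
double gaussian_of_distance(const Row& a, const Row& b, double sigma) {
    assert(sigma > 0.0);
    double d2 = 0.0;
    for (const auto& [movie, ra] : a) {
        auto it = b.find(movie);
        if (it == b.end()) continue;  // not co-rated
        const double diff = ra - it->second;
        d2 += diff * diff;
    }
    return std::exp(-d2 / (2.0 * sigma * sigma));
}
```

The Gaussian step is the distance-to-similarity conversion the note above describes, so its output can be compared against a threshold such as TD.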


mpp-mpred.C 1
/** \file
 *
 * This contains the main entry point and contains the code for driving
 * the multi-process shared memory implementation of the vertical PTree
 * based predictor system.
 */

/* Standard includes. */
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <time.h>

/* Standard C++ includes. */
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

/* Local C++ includes. */
#include "mppConfig.H"
#include "PredictionConfig.H"
#include "UserSet.H"
#include "MovieSet.H"
#include "mpp.h"

using namespace std;

/* Definition of structures static to this module. */
struct task_table {
    int pid;
    int movie;
    int predictions;
    time_t start;
};

/*
 * The following two global variables define the two sets of PTree's
 * which will be used to carry out the predictions.
 *
 * The UserSet of PTree's have user rating PTree's across the vertical
 * axis of the table. Each rating is encoded using three PTree's.
 * The MovieSet has movie rating PTree's across the vertical axis of
 * the table. Each movie is encoded using three PTree's.
 */
UserSet Users;
MovieSet Movies;

int topMovK = 5, verK = 50;
bool use_pearson_movies = false;

/*
 * The minimum user correlation required to be eligible to participate
 * in voting.
 */
float Minimum_User_Correlation = 0.5;
float corData[17771];
unsigned short int supData[17771];
string probe;

/* External functions. */
extern int Mpred_User_Predict(mppConfig &, unsigned long int, vector <int> &, PTree &);

/**
 * Internal private function.
 *
 * This function prints the current status of the task table. It is
 * an encapsulation function for reducing the complexity of the
 * job_table function.
 *
 * \param max_slots  The maximum number of subordinate processes
 *                   which will be managed.
 *
 * \param table      A pointer to the task table which is to
 *                   be changed.
 *
 * \param changed    The slot number in the task table which is
 *                   being updated.


mpp-mpred.C 2
 *
 * \param reason     A character pointer to a description string
 *                   indicating why the table is being updated.
 */
extern void print_job_table(int max_slots, struct task_table const * const table,
                            int const changed, char const * const reason)
{
    int entry;
    time_t now = time(NULL);

    fprintf(stdout, "Task status change: %s", ctime(&now));
    fputs("\tSlot\tPID\tMovie\tUsers\n", stdout);
    fputs("\t-----\t-----\n", stdout);
    for (entry = 0; entry < max_slots; ++entry) {
        fprintf(stdout, "\t%-5d\t%5d\t%5d", entry, table[entry].pid,
                table[entry].movie, table[entry].predictions);
        if ( entry == changed )
            fprintf(stdout, "\t<- %s\n", reason);
        else
            fputs("\n", stdout);
    }
    fputs("\n", stdout);
    return;
}

/**
 * Internal private function.
 *
 * This function maintains a table which correlates process ID's with
 * the movies they are processing, the total number of predictions
 * required per movie and the time required to process a movie.
 *
 * Depending on the value of the movie number argument this function
 * either stores the relationship or removes the entry associated
 * with the PID.
 *
 * In the case of either transaction a status table is printed out
 * which reflects the current progress of the simulation.
 *
 * \param max_slots     The maximum number of subordinate processes
 *                      which are under management.
 * \param pid           The process ID number.
 * \param movie_number  A movie value of zero causes this function to locate
 *                      and clear the entry of the subordinate slave process
 *                      with the given PID. A non-zero value causes the PID
 *                      to be stored in the relationship array.
 * \param predictions   This argument is only referenced when an update
 *                      is made to the task table. This argument is
 *                      the number of customer predictions to be made
 *                      for the movie being scheduled.
 * \return              No return values are defined.
 */
extern void job_table(int max_slots, int const pid, int const movie_number,
                      int const predictions)
{
    char msg[50];
    int lp, changed = 0;
    time_t now = time(NULL);
    static int movie_count = 0, prediction_count = 0;
    static bool first = true;
    static struct task_table *table;

    /* Initialize the process table on the first call. */
    if ( first ) {
        size_t amt = max_slots * sizeof(struct task_table);
        table = (struct task_table *) malloc(amt);
        if ( table == NULL ) {
            fputs("Cannot allocate job table.\n", stderr);
            exit(1);
        }
        for (lp = 0; lp < max_slots; ++lp) {
            table[lp].pid = 0;
            table[lp].movie = 0;
            table[lp].predictions = 0;
            table[lp].start = 0;
        }



mpp-mpred.C 4
        first = false;
    }

    /* Add a task to the table. */
    if ( movie_number != 0 ) {
        for (lp = 0; lp < max_slots; ++lp) {
            if ( table[lp].pid == 0 ) {
                changed = lp;
                table[lp].pid = pid;
                table[lp].movie = movie_number;
                table[lp].predictions = predictions;
                table[lp].start = now;
                print_job_table(max_slots, table, changed, "Started");
                fflush(stdout);
                return;
            }
        }
    }

    /* Remove a task from the table. */
    for (lp = 0; lp < max_slots; ++lp) {
        if ( table[lp].pid == pid ) {
            time_t run_time = time(NULL) - table[lp].start;
            float per_user = run_time;

            prediction_count += table[lp].predictions;
            snprintf(msg, sizeof(msg), "Completed: %lu "
                     "[%.2f/user] secs.", (unsigned long) run_time,
                     per_user/table[lp].predictions);
            print_job_table(max_slots, table, lp, msg);
            table[lp].pid = 0;
            table[lp].movie = 0;
            table[lp].predictions = 0;
            table[lp].start = 0;
            fprintf(stdout, "\tMovies: %5d\tPredictions: %d\n\n",
                    ++movie_count, prediction_count);
            fflush(stdout);
            return;
        }
    }
}

/**
 * Main program starts here.
 */
int main(int argc, char **argv)
{
    /* The following variable controls whether or not movie predictions
     * are to be run in parallel, ie. each in its own process.
     */
    bool have_input = false, single_threaded = true;
    char snbufr[10];
    int movie_count = 0;
    int max_process_slots = 1, process_count = 0;
    pid_t pid;
    time_t run_start, t1, t2;
    string data_root = PTREEDATA "/";
    string corr_root = data_root + "mv_corr/co_mv_";
    string supp_root = data_root + "mv_supp/sp_mv_";
    string ptree_set_id = data_root + "nf_us_mv_pt";
    string ptree_set_idT = data_root + "nf_mv_us_pt";
    ifstream inFile1;
    ifstream inFile2;
    mppConfig config;

    /* Option parsing. */
    int gopt;
    while ( (gopt = getopt(argc, argv, "C:c:i:n:")) != -1 ) {
        switch ( gopt ) {
        case 'c':
            if ( !config.read_config(optarg) ) {
                fprintf(stderr, "%s: Cannot read "
                        "standard configuration - "
                        "%s\n", argv[0], optarg);
                exit(1);
            }


mpp-mpred.C 5
            break;
        case 'C':
            if ( !config.read_cluster_config(optarg) ) {
                fprintf(stderr, "%s: Cannot read "
                        "cluster configuration - "
                        "%s\n", argv[0], optarg);
                exit(1);
            }
            break;
        case 'i':
            have_input = true;
            probe.assign(optarg);
            break;
        case 'n':
            single_threaded = false;
            max_process_slots = atoi(optarg);
            break;
        }
    }

    if ( !have_input ) {
        fprintf(stderr, "%s: No input file specified.\n", argv[0]);
        return 1;
    }
    if ( !config.is_standard_config() && !config.is_cluster_config() ) {
        fprintf(stderr, "%s: No configuration specified.\n", argv[0]);
        return 1;
    }

    fprintf(stderr, "%s: Vertical Rating Predictor - %s\n\n", argv[0], VERSION);
    fputs("Data files:\n", stderr);
    fprintf(stderr, "\tid:\t%s\n", ptree_set_id.c_str());
    fprintf(stderr, "\tidT:\t%s\n", ptree_set_idT.c_str());
    fprintf(stderr, "\tsupp:\t%s*\n", supp_root.c_str());
    fprintf(stderr, "\tcorr:\t%s*\n\n", corr_root.c_str());
    fprintf(stderr, "\tInput:\t%s\n\n", probe.c_str());
    if ( single_threaded )
        fputs("Mode: single-threaded\n", stderr);
    else
        fprintf(stderr, "Mode: %d way multi-processor\n", max_process_slots);

    if ( config.is_standard_config() ) {
        PredictionConfig *pcfg = config.get_standard_config();
        fputs("\nPrediction configuration:\n", stderr);
        pcfg->print(stderr);
    }

    /** Load the rating data as two separate sets of PTree's. */
    t1 = time(NULL);
    fputs("Data load started.\n", stderr);

    fputs("\tUser ptrees - ", stderr);
    if ( !Users.load_binary() ) {
        fputs("\n\nFailed load.\n", stderr);
        return 1;
    }
    fputs("identities - ", stderr);
    if ( !Users.load_identities() ) {
        fputs("\n\nFailed load.\n", stderr);
        return 1;
    }
    fputs("completed.\n", stderr);

    fputs("\tMovie ptrees - ", stderr);
    if ( !Movies.load_binary() ) {
        fputs("\n\nFailed load.\n", stderr);
        return 1;
    }
    fputs("completed.\n", stderr);
    t2 = time(NULL);
    fprintf(stderr, "Data load completed, time = %ld\n\n", (long) (t2 - t1));

    ifstream inFile;
    inFile.open(probe.c_str());
    if ( !inFile ) {
        fprintf(stderr, "%s: Cannot open input file %s.\n", argv[0], probe.c_str());
        return 1;
    }
    char str[100];
    int last_movie_id = 0, new_movie_id = 0;
    bool last_movie = true;

    /* The first token is the first "movieID:" header; strip the ':'. */
    inFile >> str;
    string str1(str);
    str1.erase(str1.size() - 1);
    new_movie_id = atoi(str1.c_str());

    /* Start of loop over movies begins here. */
    run_start = time(NULL);
    for (int movie_cnt = 0; !inFile.eof(); movie_cnt++) {
        vector <int> probeUs;


mpp-mpred.C 6
        ++movie_count;
        last_movie_id = new_movie_id;
        last_movie = true;
        while ( last_movie && (inFile >> str) ) {
            string str1(str);
            if ( str1.at(str1.size() - 1) == ':' ) {
                str1.erase(str1.size() - 1);
                new_movie_id = atoi(str1.c_str());
                last_movie = false;
            }
            else
                probeUs.push_back(atoi(str1.c_str()));
        }

        /* M is the movie to be predicted. */
        t1 = time(NULL);
        unsigned long int M = last_movie_id - 1;

        /* read the pearson correlations for movies
         * NOTE using pearson not Perp
         * Try to find best co-related movie set for
         * pmv
         */
        snprintf(snbufr, sizeof(snbufr), "%d", last_movie_id);
        string sn(snbufr);
        string outCorr1 = corr_root + sn + ".bin";
        inFile1.open(outCorr1.c_str(), ios::binary);
        string outSupp1 = supp_root + sn + ".bin";
        inFile2.open(outSupp1.c_str(), ios::binary);
        inFile1.read(reinterpret_cast<char*>(corData), 17771*sizeof(float));
        inFile2.read(reinterpret_cast<char*>(supData), 17771*sizeof(unsigned short int));
        inFile1.close();
        inFile2.close();

        /* Get the list of users who have rated this movie. */
        PTree user_list = Movies.get_users(M);

        /* Check to see if predictions of movies are
         * to be single-threaded. If so run the
         * movie prediction synchronously and then
         * skip to the next movie.
         */
        if ( single_threaded ) {
            time_t start = time(NULL);

            fprintf(stderr, "Starting movie: %lu, "
                    "Users: %zu, ", M, probeUs.size());
            Mpred_User_Predict(config, M, probeUs, user_list);
            double elapsed = difftime(time(NULL), start);
            fprintf(stderr, "Completed: %2.0f "
                    "[%.2f/user] secs.\n\n", elapsed,
                    elapsed/probeUs.size());
            continue;
        }

        /* Start prediction for movie pmv for given
         * users in probeUser set. Fork a new process and
         * generate customer predictions in this new fork.
         */
        if ( process_count < max_process_slots ) {
            pid = fork();
            if ( pid == -1 ) {
                perror("FPP fork failed.");
                exit(1);
            }

            /* Child - process movie and exit. */
            if ( pid == 0 ) {
                Mpred_User_Predict(config, M, probeUs, user_list);
                _exit(0);
            }

            /* Parent - update task table (movie IDs are 1-based, so a
             * non-zero ID always registers the task). */
            ++process_count;
            job_table(max_process_slots, pid, last_movie_id, probeUs.size());
        }

        /* Wait for any child processes to complete. */
        if ( process_count == max_process_slots ) {
            int status;
            pid = wait(&status);
            if ( pid == -1 ) {
                perror("FPP wait failed.");
                exit(1);
            }
            --process_count;
            job_table(max_process_slots, pid, 0, 0);
            if ( !WIFEXITED(status) || WEXITSTATUS(status) != 0 )
                fprintf(stderr, "\tError in movie, status = %d\n", status);
        }
    }

    /* Capture all remaining slave processes. */
    while ( process_count > 0 ) {
        int status;
        pid = wait(&status);
        if ( pid == -1 ) {
            fputs("No processes left.\n", stderr);
            process_count = 0;
            continue;
        }
        --process_count;
        job_table(max_process_slots, pid, 0, 0);
        if ( !WIFEXITED(status) || WEXITSTATUS(status) != 0 )
            fprintf(stderr, "\tError in movie, status = %d\n", status);
    }

    inFile.close();
    fputs("\nPredictions completed.\n", stderr);
    fprintf(stderr, "\tMovies: %d\n", movie_count);
    fprintf(stderr, "\tTime: %ld\n", (long) (time(NULL) - run_start));
    return 0;
}


mpp-user.C 1
/** \file
 * This file contains the driver code which
 * implements predictions of recommendations.
 */

/* Program compilation defines follow.
 *
 * These defines enable and control generation of movie specific logfiles.
 * The MOVIE_LOGGING define needs to be enabled to turn on generation of
 * logfiles. Other defines increase the amount of output generated.
 */
#if 0
#define MOVIE_LOGGING
#endif
#if 0
#define MEMORY_LOGGING
#endif
#if 0
#define VOTE_LOGGING
#endif

// Include files.
#include <stdio.h>
#include <time.h>
#include <unistd.h>

// Standard C++ includes.
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <map>
#include <utility>

// Local C++ include files.
#include <PTreeSet.H>
#include "mppConfig.H"
#include "UserSet.H"
#include "MovieSet.H"
#include "mpp.h"

using namespace std;

// External variables.
extern UserSet Users;
extern MovieSet Movies;
extern int topMovK, verK;
extern bool use_pearson_movies;
extern float Minimum_User_Correlation;
extern float corData[17771];
extern unsigned short int supData[17771];
extern string probe;

// CREATES, OPENS logfile if logging enabled, else NULL returned.
#if defined(MOVIE_LOGGING)
static inline FILE * open_logfile(string movie_number)
{
    string logname("./Output/" + probe.substr(probe.find_last_of('/') + 1) +
                   "_" + movie_number + ".log");
    return(fopen(logname.c_str(), "w+"));
}
#else
static inline FILE * open_logfile(string movie_number __attribute__ ((unused)))
{
    return NULL;
}
#endif

// ENABLING causes nearest nbr user voting to print for each prediction.
#if defined(VOTE_LOGGING)
static inline void print_votes(FILE *logfile, int user, double vote, double weight,
                               double vRt, double VBar, double Ub, double voter_corr)
{
    if ( logfile == NULL )
        return;
    fprintf(logfile, "\t\tVote: %.2f\tWeight: %.2f\tUser: %d\n", vote, weight, user);
    fprintf(logfile, "\t\t\tvRt: %.2f\tVbar: %.2f\tUb: %.2f\n", vRt, VBar, Ub);
    fprintf(logfile, "\t\t\tCor: %.2f\n\n", voter_corr);
    return;
}
#else
static inline void print_votes(FILE *logfile, int user, double vote, double weight,
                               double vRt, double VBar, double Ub, double voter_corr)
{
    return;
}
#endif

// Enabling prints amount of memory consumed against given starting pt.
#if defined(MEMORY_LOGGING)
static inline void log_memory(FILE *logfile, const char *fmt, void *start)
{
    fprintf(logfile, fmt, (char *) sbrk(0) - (char *) start);
    return;
}
#else
static inline void log_memory(FILE *logfile __attribute__ ((unused)),
                              const char *fmt __attribute__ ((unused)),
                              void *start __attribute__ ((unused)))
{
    return;
}
#endif

extern int Mpred_User_Predict(mppConfig &config, unsigned long int M,
                              vector <int> &user_list, PTree &M_support)
{
    void *movie_memory_start = NULL;
    char snbufr[10];
    time_t start_time = time(NULL);
    unsigned long int U;
    FILE *predictions;
    FILE *logfile;
    PredictionConfig *pcfg = NULL;


    // OPEN log and prediction files.
    snprintf(snbufr, sizeof(snbufr), "%lu", (unsigned long) Movies.get_identity(M));
    string sn(snbufr);
    string outPredName("./Output/" + probe.substr(probe.find_last_of('/') + 1) +
                       "_" + sn + ".predict");
    logfile = open_logfile(sn);
    if ( (predictions = fopen(outPredName.c_str(), "w+")) == NULL ) {
        fputs("Cannot open prediction file.\n", stderr);
        return 0;
    }

    /*
     * Write the number of the movie to the prediction file and a
     * descriptor to the output logfile.
     */
    fprintf(predictions, "%lu:\n", (unsigned long) Movies.get_identity(M));
    if ( logfile != NULL ) {
        fprintf(logfile, "\nBeginning movie: %5lu\tUsers: %lu\tPID: %d\n",
                (unsigned long) Movies.get_identity(M),
                (unsigned long) user_list.size(), (int) getpid());
        fflush(logfile);
        movie_memory_start = sbrk(0);
    }

    /* Loop over users starts here. */
    for (unsigned int user = 0; user < user_list.size(); ++user) {
        double vote = DEFAULT_VOTE, VOTE = DEFAULT_VOTE,
               vote_wt = 0.0, VOTE_wt = 0.0;

        U = Users.get_index(user_list[user]);
        PTree supportM(M_support), supportU = Users.get_movies(U);
        supportM.clearbit(U);
        supportU.clearbit(M);
        if ( supportM.get_count() < 1 ) {
            fprintf(predictions, "%.2f\n", vote);
            fflush(predictions);
            continue;
        }

        /* Get configuration information. */
        if ( config.is_standard_config() )
            pcfg = config.get_standard_config();
        if ( config.is_cluster_config() ) {
            pcfg = config.select_configuration(Users, U);
            config.show_selection(logfile);
        }

        /* Config file needs: (mpp-user part)
         * External Pruning:
         *   1. Reset support in movie-vote call: yes, no.
         * Voting selection:
         *   2. Set vote_wt: 0 <= vote_wt <= 1 (VOTE_wt = 1 - vote_wt)
         * Forcing in Range:
         *   5. Select 0, 1 or 2 force_vote_in_ranges: user-vote, movie-VOTE
         */

        /* Select eligible clusters for this movie. */
        if ( config.is_cluster_config() )
            config.select_clusters(Movies, M);

        /* User voting. */
        if ( pcfg->do_user_voting() )
            vote = user_vote(pcfg, M, supportM, U, supportU);
        //if ( vote < 1 ) vote = 1; else if ( vote > 5 ) vote = 5;

        /* Movie voting. */
        if ( pcfg->do_movie_voting() )
            VOTE = movie_vote(pcfg, M, supportM, U, supportU);
        //if ( VOTE < 1 ) VOTE = 1; else if ( VOTE > 5 ) VOTE = 5;

        /* Set user_vote_weight here. */
        vote_wt = pcfg->get_user_vote_weight();
        VOTE_wt = 1.0 - vote_wt;
        vote = (vote * vote_wt + VOTE * VOTE_wt) / (vote_wt + VOTE_wt);


        //sumSCor=sumSCor/countdimMN; sumPCor=sumPCor/countdimMN; sumDCor=sumDCor/countdimMN; sumdimMN=sumdimMN/countdimMN;
        //sumsCor=sumsCor/countdimUV; sumpCor=sumpCor/countdimUV; sumdCor=sumdCor/countdimUV; sumdimUV=sumdimUV/countdimUV;
        // vote=(vote*sumdimUV + VOTE*sumdimMN)/(sumdimUV+sumdimMN);
        //double red=.4; vote=(vote*exp(-pow(Vsdp,2))+VOTE*exp(-red*pow(Nsdp,2)))/(exp(-pow(Vsdp,2))+exp(-red*pow(Nsdp,2)));
        //double red=1.0; vote=(vote*exp(-pow(Vsdp,2))+VOTE*red*exp(-pow(Nsdp,2)))/(exp(-pow(Vsdp,2))+red*exp(-pow(Nsdp,2)));
        // if ( sumsCor > sumSCor + 0.1 ) { vote=( vote*sumdimUV*(2+sumsCor)+VOTE*sumdimMN*(2+sumSCor) )/( sumdimUV*(2+sumsCor)+sumdimMN*(2+sumSCor)); }
        // vote=VOTE;
        // if ( Nsdp < 2.0 && Vsdp > 2.0 && sumSCor + .5 > sumsCor ) vote=VOTE;
        // if ( Nsdp < 0.5 && Vsdp > 2 ) vote=VOTE;
        // vote=(vote*exp(-pow(Vsdp,2)) + VOTE*exp(-pow(Nsdp,2)))/(exp(-pow(Vsdp,2)) + exp(-pow(Nsdp,2))); //.937465(95)

        // Final output occurs here: force the vote into range, then write it.
        if ( (vote < 1) && (vote != DEFAULT_VOTE) ) vote = 1;
        if ( (vote > 5) && (vote != DEFAULT_VOTE) ) vote = 5;
        fprintf(predictions, "%.2f\n", vote);
        fflush(predictions);
        if ( logfile != NULL )
            fprintf(logfile, "\tPrediction #%u: %0.1f\tuser: %lu\tconfig: %s\n\n",
                    user, vote, (unsigned long) Users.get_identity(U), pcfg->get_name());
    } // ULOOP end

    if ( logfile != NULL ) {
        float total_time = time(NULL) - start_time;
        fprintf(logfile, "Ending movie: %lu\tTime: %.2f [%.2f/user] secs.\t",
                (unsigned long) Movies.get_identity(M), total_time,
                (float) (total_time / user_list.size()));
        log_memory(logfile, "Memory: %td\n", movie_memory_start);
        fputs("\n", logfile);
        fclose(logfile);
    }
    fclose(predictions);
    return 0;
} // MLOOP end
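The last step of Mpred_User_Predict blends the user-based vote with the movie-based VOTE using the configured user vote weight. It then forces the result into the 1 to 5 star range without touching the DEFAULT_VOTE sentinel. The sketch below pulls that arithmetic out into two free functions. The names blend_votes and force_in_range are hypothetical. The real value of DEFAULT_VOTE lives in mpp.h, which is not shown, so the sketch takes the sentinel as a parameter instead of assuming a value.

```cpp
#include <cassert>

// Blend the user-based vote and the movie-based VOTE the way the driver does:
// vote_wt comes from the configuration and VOTE_wt = 1 - vote_wt.
double blend_votes(double vote, double VOTE, double vote_wt)
{
    assert(vote_wt >= 0.0 && vote_wt <= 1.0);
    double VOTE_wt = 1.0 - vote_wt;
    return (vote * vote_wt + VOTE * VOTE_wt) / (vote_wt + VOTE_wt);
}

// Force a prediction into the 1..5 star range. The value standing in for
// DEFAULT_VOTE (hypothetical parameter; the real constant is in mpp.h)
// passes through unchanged.
double force_in_range(double vote, double default_vote)
{
    if (vote == default_vote) return vote;
    if (vote < 1.0) return 1.0;
    if (vote > 5.0) return 5.0;
    return vote;
}
```

Passing the sentinel explicitly stops the clamp from turning a "no prediction" marker into a 1-star rating. This is why the source tests vote != DEFAULT_VOTE before clamping.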


user-vote.C

/** \file
 * This file contains the implementation of the user voting function.
 */

/* Include files. */
#include <stdio.h>
#include <math.h>
#include <PTree.H>
#include "MovieSet.H"
#include "UserSet.H"
#include "mppConfig.H"
#include "PredictionConfig.H"
#include "mpp.h"

/* Config file needs: (user-vote part)
 * uCor Internal Pruning:
 *   1.   Select 0 or 1 of: dvCorp, dvCors, vdCorp, vdCors, pCor, dCor, sCor
 *   1.1  For the one selected in 1, set Threshold:
 *        dvThrp, dvThrs, vdThrp, vdThrs, pThr, dThr, sThr
 *        Threshold defaults are: 0 0 0 0 0 0 0
 * uCor vote weighting: (Default is uCor=1. By selecting 1 of these, we reset uCor's value to it.)
 *   2.   Select 0 or 1 of: dvCorp, dvCors, vdCorp, vdCors, pCor, dCor, sCor
 * Standard Deviation Internal Pruning:
 *   (population/sample; difference_of_vectors/vector_of_differences)
 *   3.   Select 0 or more of: dUVsdp, dUVsds, Vsdp_Usdp, Vsds_Usds
 *   3.1  For each selected in 3, set Threshold:
 *        dUVsdpThr, dUVsdsThr, Vsdp_UsdpThr, Vsds_UsdsThr
 *        Threshold defaults are: 0 0 0 0
 *   3.2  For each selected in 3, set pow exp:
 *        dUVsdpExp, dUVsdsExp, Vsdp_UsdpExp, Vsds_UsdsExp
 *        Power Exponent defaults are: -1 -1 -1 -1
 * External Pruning:
 *   4.   Select 0 or more of: Prune_Movies_In_SupU, Prune_Users_In_SupM,
 *        Prune_Movies_In_CoSupUV
 *   4.1  For each selected in 4, select 1 of: Prune, FastPrune, CommonCoSupportPrune
 *   4.2  Reset non-pruned support in 2nd: yes, no.
 *   4.3  For each selected in 4, set parameter:
 *        mstrt, ustrt, TSa, TSb, Tdvp, Tdvs, Tvdp, Tvds, TD, TP, PPm, TV, TSD, Ch, Ct
 *        Prune Parameter defaults are: 0 0 -100 -1 -1 -1 1 no def
 * Forcing in Range:
 *   5.   Select 0 or more force_vote_in_range: in_Voter_LOOP, after_Voter_LOOP, before_return
 */
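user_vote evaluates the standard-deviation pruning options listed above from running sums (smU, smV, UU, VV, UV and dm) accumulated over the co-support of U and V, so it never makes a second pass over the ratings. The sketch below shows how the population and sample standard deviations of the rating differences, and the exponential vdCor and dvCor weights, follow from those sums. It takes plain std::vector inputs instead of PTrees, which is an assumption for illustration, and the struct and function names are hypothetical. For the sample standard deviation it uses the textbook formula (dsSq - dm*(Vb - Ub)^2)/(dm - 1), in which the mean difference is squared.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Running sums over the co-support of U and V (hypothetical struct; the
// field names follow user_vote).
struct CoSupportSums {
    double dm = 0, smU = 0, smV = 0, UU = 0, VV = 0, UV = 0;
};

// One pass over the common movies, as in user_vote's NLOOP.
CoSupportSums accumulate_sums(const std::vector<double>& u, const std::vector<double>& v)
{
    assert(u.size() == v.size());
    CoSupportSums s;
    for (std::size_t n = 0; n < u.size(); ++n) {
        s.dm += 1;
        s.smU += u[n];
        s.smV += v[n];
        s.UU += u[n] * u[n];
        s.VV += v[n] * v[n];
        s.UV += u[n] * v[n];
    }
    return s;
}

// Population s.d. of one rating vector from its sum and sum of squares
// (Usdp or Vsdp in user_vote).
double sdp(double sum, double sumsq, double dm)
{
    return std::sqrt(std::max(0.0, dm * sumsq - sum * sum)) / dm;
}

// Population s.d. of the per-movie differences V - U (dUVsdp).
double dUVsdp(const CoSupportSums& s)
{
    double dsSq = s.VV - 2 * s.UV + s.UU;   // sum of squared differences
    double d = s.smV - s.smU;               // sum of differences
    return std::sqrt(std::max(0.0, s.dm * dsSq - d * d)) / s.dm;
}

// Sample s.d. of the same differences (dUVsds); needs dm > 1.
double dUVsds(const CoSupportSums& s)
{
    assert(s.dm > 1);
    double dsSq = s.VV - 2 * s.UV + s.UU;
    double meanDiff = (s.smV - s.smU) / s.dm;   // equals Vb - Ub
    return std::sqrt(std::max(0.0, dsSq - s.dm * meanDiff * meanDiff) / (s.dm - 1));
}

// Exponential weights: vdCor = exp(expnt * s.d. of differences) and
// dvCor = exp(expnt * (Vsd - Usd)^2). With expnt < 0, a zero spread maps to 1.
double vdCor(double sdDiff, double expnt) { return std::exp(expnt * sdDiff); }

double dvCor(double Vsd, double Usd, double expnt)
{
    return std::exp(expnt * (Vsd - Usd) * (Vsd - Usd));
}
```

Clamping the variance at zero protects against tiny negative values from floating-point cancellation, which can occur when V's ratings are an exact shift of U's.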


/**
 * Public function.
 * This function implements user voting.
 *
 * \param pcfg     A pointer to the class containing the parameters
 *                 which configure the voting.
 * \param M        The movie number for which a prediction is to be made.
 * \param supportM The PTree identifying the support for the movie
 *                 to be predicted.
 * \param U        The identity number of the user for which a
 *                 prediction is to be made.
 * \param supportU The PTree identifying the support for the user
 *                 for whom a prediction is being made.
 * \return The recommended prediction.
 */
extern double user_vote(PredictionConfig *pcfg, unsigned long int M, PTree & supportM,
                        unsigned long int U, PTree & supportU)
{
    /* Enabled for boundary based prediction revisions. */
#if 0
    double z0IP55=0, z0IP44=0, z0IP33=0, z0IP22=0, z0IP11=0,
           z0IP15=0, z0IP14=0, z0IP13=0, z0IP12=0,
           z0IP51=0, z0IP41=0, z0IP31=0, z0IP21=0,
           z0IP25=0, z0IP24=0, z0IP23=0, z0IP52=0, z0IP42=0, z0IP32=0,
           z0IP35=0, z0IP34=0, z0IP53=0, z0IP43=0,
           z0IP45=0, z0IP54=0;
#endif
    double vote = DEFAULT_VOTE, vote_sum = 0, vote_cnt = 0;
    double Vb, Ub, dsSq, uCor = 1;
    struct pruning *internal_prune;
    struct external_prune *external_prune;
    PTree supM = supportM, supU = supportU;

    supM.clearbit(U);
    supU.clearbit(M);

    /* External pruning: PRUNE MOVIES supU */
    external_prune = pcfg->get_user_Prune_Movies_in_SupU();
    if ( external_prune->enabled ) {
        if ( supU.get_count() > external_prune->params.Ct )
            do_pruning(external_prune, M, U, supM, supU);
        supM.clearbit(U);
        supU.clearbit(M);
        if ( (supM.get_count() < 1) || (supU.get_count() < 1) ) return vote;
    }

    /* Reset user support if requested. */
    if ( pcfg->reset_user_support() ) {
        supM = supportM;
        supM.clearbit(U);
    }

    /* External pruning: PRUNE USERS supM */
    external_prune = pcfg->get_user_Prune_Users_in_SupM();
    if ( external_prune->enabled ) {
        if ( supM.get_count() > external_prune->params.Ct )
            do_pruning(external_prune, M, U, supM, supU);
        supM.clearbit(U);
        supU.clearbit(M);
        if ( (supM.get_count() < 1) || (supU.get_count() < 1) ) return vote;
    }

    /* VN: VLOOP strt (Vs are user voters) */
    unsigned long int *supMlist = supM.get_indexes();
    for (unsigned long int v = 0; v < supM.get_count(); ++v) {
        unsigned long int V = supMlist[v];
        double MV = Users.get_rating(V, M) - 2, max = 0,
               smV = 0, smU = 0, UU = 0, UV = 0, VV = 0, dm;


        PTree csUV = supU & Users.get_movies(V);
        csUV.clearbit(M);
        dm = csUV.get_count();
        if ( dm < 1 ) continue;

        /* Turn on only if doing Inner-Product Boundary-Based prediction revisions. */
#if 0
        double S1=0, S2=0, S3=0, S4=0, S5=0,
               C1=0, C2=0, C3=0, C4=0, C5=0,
               A1=0, A2=0, A3=0, A4=0, A5=0,
               S11=0, S22=0, S33=0, S44=0, S55=0,
               C11=0, C22=0, C33=0, C44=0, C55=0,
               A11=0, A22=0, A33=0, A44=0, A55=0,
               smN=0, smM=0, NN=0, MM=0;
#endif

        /* External pruning: PRUNE MOVIES CoSupUV */
        external_prune = pcfg->get_user_Prune_Movies_in_CoSupUV();
        if ( external_prune->enabled ) {
            if ( csUV.get_count() > external_prune->params.Ct )
                do_pruning(external_prune, M, U, supM, csUV);
            csUV.clearbit(M);
            supM.clearbit(U);
            dm = csUV.get_count();
            if ( dm < 1 ) continue;
        }

        /* VN: NLOOP strt (Ns are movie vector_space_dimensions) */
        unsigned long int *csUVlist = csUV.get_indexes();
        for (unsigned long int n = 0; n < csUV.get_count(); ++n) {
            unsigned long int N = csUVlist[n];
            double NU = Users.get_rating(U, N) - 2,
                   NV = Users.get_rating(V, N) - 2;
            if ( pow(NU - NV, 2) > max ) max = pow(NU - NV, 2);
            smV += NV;
            smU += NU;
            UU += NU * NU;
            UV += NU * NV;
            VV += NV * NV;
            // Turn on only if doing Inner-Product Boundary-Based prediction revisions.
#if 0
            if (NU==1 && NV>0) {S1+=NV; ++C1;}
            else if (NU==2 && NV>0) {S2+=NV; ++C2;}
            else if (NU==3 && NV>0) {S3+=NV; ++C3;}
            else if (NU==4 && NV>0) {S4+=NV; ++C4;}
            else if (NU==5 && NV>0) {S5+=NV; ++C5;}
#endif
        }
        Vb = smV / dm;
        Ub = smU / dm;
        dsSq = VV - 2*UV + UU;
        vote = MV - Vb + Ub;

        /* SAMPLE-statistic-based pruning through early exit. */
        if ( dm > 1 ) {
            /* method dUVsds */
            internal_prune = pcfg->get_internal_prune(user_dUVsds);
            if ( internal_prune->enabled ) {
                double dUVsds, thr = internal_prune->threshold,
                       expnt = internal_prune->exponent;
                // Sample s.d. of the differences; the mean difference (Vb - Ub)
                // is squared, as in OnePDS below.
                dUVsds = pow((dsSq - dm*pow(Vb - Ub, 2)) / (dm - 1), .5);
                if ( dUVsds > (thr * pow(dm, expnt)) ) continue;
            }

            /* method Vsds_Usds: NO exponent. */
            internal_prune = pcfg->get_internal_prune(user_Vsds_Usds);
            if ( internal_prune->enabled ) {
                double Usds, Vsds, thr = internal_prune->threshold;
                Usds = pow((UU - dm*Ub*Ub) / (dm - 1), 0.5);
                Vsds = pow((VV - dm*Vb*Vb) / (dm - 1), 0.5);
                if ( Vsds > (thr * Usds) ) continue;
            }

            /* method dvCors: e.g., -10 is the exponent and 0 (in the if statement) is the threshold. */
            internal_prune = pcfg->get_internal_prune(user_dvCors);
            if ( internal_prune->enabled ) {
                double dvCors, Usds, Vsds, thr = internal_prune->threshold,
                       expnt = internal_prune->exponent;
                Usds = pow((UU - dm*Ub*Ub) / (dm - 1), 0.5);
                Vsds = pow((VV - dm*Vb*Vb) / (dm - 1), 0.5);
                dvCors = exp(expnt * (Vsds - Usds)*(Vsds - Usds));
                if ( dvCors < thr ) continue;
                if ( internal_prune->weight ) uCor = dvCors;
            }

            /* method vdCors */
            internal_prune = pcfg->get_internal_prune(user_vdCors);
            if ( internal_prune->enabled ) {
                double vdCors, dUVsds, thr = internal_prune->threshold,
                       expnt = internal_prune->exponent;
                dUVsds = pow((dsSq - dm*pow(Vb - Ub, 2)) / (dm - 1), .5);
                vdCors = exp(expnt * dUVsds);
                if ( vdCors < thr ) continue;
                if ( internal_prune->weight ) uCor = vdCors;
            }
        }


        /* POPULATION-statistics-based pruning through early exit. */
        if ( dm > 0 ) {
            /* method dUVsdp */
            internal_prune = pcfg->get_internal_prune(user_dUVsdp);
            if ( internal_prune->enabled ) {
                double dUVsdp, thr = internal_prune->threshold,
                       expnt = internal_prune->exponent;
                dUVsdp = pow(dm*dsSq - (smV - smU)*(smV - smU), .5) / dm;
                if ( dUVsdp > thr * pow(dm, expnt) ) continue;
            }

            /* method Vsdp_Usdp (no exponent), e.g. with threshold 0.5:
             *   Usdp = pow(dm*UU - smU*smU, .5)/dm;
             *   Vsdp = pow(dm*VV - smV*smV, .5)/dm;
             *   if ( Vsdp > 0.5 * Usdp ) continue;
             */
            internal_prune = pcfg->get_internal_prune(user_Vsdp_Usdp);
            if ( internal_prune->enabled ) {
                double Usdp, Vsdp, thr = internal_prune->threshold;
                Usdp = pow(dm*UU - smU*smU, 0.5) / dm;
                Vsdp = pow(dm*VV - smV*smV, 0.5) / dm;
                if ( Vsdp > thr * Usdp ) continue;
            }

            /* method dvCorp, e.g. with threshold 0.9 and exponent -10:
             *   dvCorp = exp(-10*(Vsdp-Usdp)*(Vsdp-Usdp));
             *   if ( dvCorp < .9 ) continue;
             *   uCor = dvCorp;
             */
            internal_prune = pcfg->get_internal_prune(user_dvCorp);
            if ( internal_prune->enabled ) {
                double dvCorp, Usdp, Vsdp, thr = internal_prune->threshold,
                       expnt = internal_prune->exponent;
                Usdp = pow(dm*UU - smU*smU, 0.5) / dm;
                Vsdp = pow(dm*VV - smV*smV, 0.5) / dm;
                dvCorp = exp(expnt * (Vsdp - Usdp)*(Vsdp - Usdp));
                if ( dvCorp < thr ) continue;
                if ( internal_prune->weight ) uCor = dvCorp;
            }

            /* method vdCorp */
            internal_prune = pcfg->get_internal_prune(user_vdCorp);
            if ( internal_prune->enabled ) {
                double vdCorp, dUVsdp, thr = internal_prune->threshold,
                       expnt = internal_prune->exponent;
                dUVsdp = pow(dm*dsSq - (smV - smU)*(smV - smU), .5) / dm;
                vdCorp = exp(expnt * dUVsdp);
                if ( vdCorp < thr ) continue;
                if ( internal_prune->weight ) uCor = vdCorp;
            }
        }

        /* OTHER correlation pruning (pearson=s, pureshift=p, distance=d) */
        internal_prune = pcfg->get_internal_prune(user_sCor);
        if ( internal_prune->enabled ) {
            double sCor, thr = internal_prune->threshold;
            sCor = (UV - dm*Ub*Vb) / (.0001 + (pow((UU - dm*pow(Ub, 2)), 0.5)) *
                                      (.0001 + pow((VV - dm*pow(Vb, 2)), .5)));
            if ( sCor < thr ) continue;
            if ( internal_prune->weight ) uCor = sCor;
        }

        internal_prune = pcfg->get_internal_prune(user_pCor);
        if ( internal_prune->enabled ) {
            double OnePDS, pCor = -1, thr = internal_prune->threshold,
                   expnt = internal_prune->exponent;
            OnePDS = dsSq - dm*pow(Vb - Ub, 2);
            if ( max > 0 ) pCor = exp(expnt*OnePDS / (pow(max, .75)*pow(dm, .5)));
            if ( pCor < thr ) continue;
            if ( internal_prune->weight ) uCor = pCor;
        }


        internal_prune = pcfg->get_internal_prune(user_dCor);
        if ( internal_prune->enabled ) {
            double dCor, thr = internal_prune->threshold;
            dCor = exp(-dsSq / 100);
            if ( dCor < thr ) continue;
            if ( internal_prune->weight ) uCor = dCor;
        }

        /* Turn on for boundary based prediction revisions. */
#if 0
        if (C1>0 && C2+C3+C4+C5>0) {A1=S1/C1; A11=(S2+S3+S4+S5)/(C2+C3+C4+C5); z0IP11+=(A1-((A1+A11)/2))*(MV-((A1+A11)/2));}
        if (C1>0 && C2>0) {A1=S1/C1; A2=S2/C2; z0IP12+=(A1-((A1+A2)/2))*(MV-((A1+A2)/2));}
        if (C1>0 && C3>0) {A1=S1/C1; A3=S3/C3; z0IP13+=(A1-((A1+A3)/2))*(MV-((A1+A3)/2));}
        if (C1>0 && C4>0) {A1=S1/C1; A4=S4/C4; z0IP14+=(A1-((A1+A4)/2))*(MV-((A1+A4)/2));}
        if (C1>0 && C5>0) {A1=S1/C1; A5=S5/C5; z0IP15+=(A1-((A1+A5)/2))*(MV-((A1+A5)/2));}
        z0IP51=-z0IP15; z0IP41=-z0IP14; z0IP31=-z0IP13; z0IP21=-z0IP12;
        if (C2>0 && C1+C3+C4+C5>0) {A2=S2/C2; A22=(S1+S3+S4+S5)/(C1+C3+C4+C5); z0IP22+=(A2-((A2+A22)/2))*(MV-((A2+A22)/2));}
        if (C2>0 && C3>0) {A2=S2/C2; A3=S3/C3; z0IP23+=(A2-((A2+A3)/2))*(MV-((A2+A3)/2));}
        if (C2>0 && C4>0) {A2=S2/C2; A4=S4/C4; z0IP24+=(A2-((A2+A4)/2))*(MV-((A2+A4)/2));}
        if (C2>0 && C5>0) {A2=S2/C2; A5=S5/C5; z0IP25+=(A2-((A2+A5)/2))*(MV-((A2+A5)/2));}
        z0IP32=-z0IP23; z0IP42=-z0IP24; z0IP52=-z0IP25;
        if (C3>0 && C1+C2+C4+C5>0) {A3=S3/C3; A33=(S1+S2+S4+S5)/(C1+C2+C4+C5); z0IP33+=(A3-((A3+A33)/2))*(MV-((A3+A33)/2));}
        if (C3>0 && C4>0) {A3=S3/C3; A4=S4/C4; z0IP34+=(A3-((A3+A4)/2))*(MV-((A3+A4)/2));}
        if (C3>0 && C5>0) {A3=S3/C3; A5=S5/C5; z0IP35+=(A3-((A3+A5)/2))*(MV-((A3+A5)/2));}
        z0IP43=-z0IP34; z0IP53=-z0IP35;
        if (C4>0 && C1+C2+C3+C5>0) {A4=S4/C4; A44=(S1+S2+S3+S5)/(C1+C2+C3+C5); z0IP44+=(A4-((A4+A44)/2))*(MV-((A4+A44)/2));}
        if (C4>0 && C5>0) {A4=S4/C4; A5=S5/C5; z0IP45+=(A4-((A4+A5)/2))*(MV-((A4+A5)/2));}
        z0IP54=-z0IP45;
        if (C5>0 && C1+C2+C3+C4>0) {A5=S5/C5; A55=(S1+S2+S3+S4)/(C1+C2+C3+C4); z0IP55+=(A5-((A5+A55)/2))*(MV-((A5+A55)/2));}
        //double MU = Users.get_rating(U, M)-2; fprintf(stderr, "MU=%1.0f %8.1f\n", MU, z0IP55, z0IP11, z0IP51);
        //double MU = Users.get_rating(U, M)-2; fprintf(stderr, "MU=%1.0f %5.1f %5.1f %5.1f\n", MU, z0IP11, z0IP22, z0IP33, z0IP44, z0IP55, z0IP12, z0IP13, z0IP14, z0IP15, z0IP23, z0IP24, z0IP25, z0IP34, z0IP35, z0IP45);
#endif


        /* Check and implement forcing of the vote in the voter loop.
         * The vote is forced before it is accumulated, so the forcing
         * affects the weighted sum. */
        if ( pcfg->user_vote_force_in_loop() ) {
            if ( (vote < 1) && (vote != DEFAULT_VOTE) ) vote = 1;
            if ( (vote > 5) && (vote != DEFAULT_VOTE) ) vote = 5;
        }
        if ( uCor > 0 ) {
            vote_sum += vote * uCor;
            vote_cnt += uCor;
        }
    } // VLOOP end

    if ( vote_cnt > 0 ) vote = vote_sum / vote_cnt;
    else vote = DEFAULT_VOTE;

    /* force_vote_after_Voter_Loop goes here. */
    if ( pcfg->user_vote_force_after_loop() ) {
        if ( (vote < 1) && (vote != DEFAULT_VOTE) ) vote = 1;
        if ( (vote > 5) && (vote != DEFAULT_VOTE) ) vote = 5;
    }

    /* Turn on only if doing Inner-Product Boundary-Based prediction revisions. */
#if 0
    // Boundary-Based-Inner-Product vote CHANGE start
    if ( z0IP55 > -.01 //&& z0IP55 > z0IP33 && z0IP55 > z0IP44
         && z0IP51 > -.01 //&& z0IP52 > .1
         && z0IP53 > THRZ0 && z0IP54 > THRZ0 ) vote = 5;
#endif
#if 0
    // Boundary-Based-Inner-Product vote CHANGE start
    double FACZ0 = -0.1, THRZ0 = -0.1;
    //double FACZ0 = 0.40, THRZ0 = 0.7, z0IP51=-z0IP15, z0IP52=-z0IP25, z0IP53=-z0IP35, z0IP54=-z0IP54;
#if 1
    // Change vote to 5?
    if ( true && z0IP55 > FACZ0 + z0IP11 && z0IP55 > FACZ0 + z0IP22
         && z0IP55 > FACZ0 + z0IP33 && z0IP55 > FACZ0 + z0IP44
         && z0IP51 > THRZ0 && z0IP52 > THRZ0 && z0IP53 > THRZ0 && z0IP54 > THRZ0 ) vote = 5;
#endif
#if 1
    // Change vote to 1?
    if ( true && z0IP11 > (FACZ0)*z0IP22 && z0IP11 > (FACZ0)*z0IP33
         && z0IP11 > (FACZ0)*z0IP44 && z0IP11 > (FACZ0)*z0IP55
         && z0IP12 > THRZ0 && z0IP13 > THRZ0 && z0IP14 > THRZ0 && z0IP15 > THRZ0 ) vote = 1;
#endif
    // Boundary-Based-Inner-Product vote CHANGE end
#endif
    return vote;
}
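Each voter V that survives pruning in user_vote proposes MV - Vb + Ub. That is V's rating of the target movie, shifted from V's own mean onto U's mean over the movies they have in common. The proposals are then averaged with weight uCor. The sketch below covers only that aggregation step and assumes the per-voter means and weights have already been computed. The Voter struct and weighted_vote function are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Per-voter quantities user_vote has after pruning (hypothetical struct).
struct Voter {
    double MV;    // V's rating of the target movie M
    double Vb;    // V's mean rating over the movies V and U have in common
    double Ub;    // U's mean rating over the same movies
    double uCor;  // weight left by the selected correlation method (1 if none)
};

// uCor-weighted average of the mean-offset votes MV - Vb + Ub.
double weighted_vote(const std::vector<Voter>& voters, double default_vote)
{
    double vote_sum = 0.0, vote_cnt = 0.0;
    for (const Voter& v : voters) {
        if (v.uCor <= 0.0) continue;        // non-positive weight: no say
        double vote = v.MV - v.Vb + v.Ub;   // V's rating re-centred on U's mean
        vote_sum += vote * v.uCor;
        vote_cnt += v.uCor;
    }
    return vote_cnt > 0.0 ? vote_sum / vote_cnt : default_vote;
}
```

Voters with a non-positive weight contribute nothing. An empty or fully pruned voter set falls back to the caller's default vote, which mirrors the vote_cnt > 0 test in the source.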


movie-vote.C

/** \file
 * This file contains the implementation of the movie voting algorithm.
 */

/* Include files. */
#include <stdio.h>
#include <math.h>       // pow(), exp()
#include <PTree.H>
#include "MovieSet.H"
#include "UserSet.H"
#include "mppConfig.H"
#include "PredictionConfig.H"
#include "mpp.h"

/* Config file needs: (movie-vote part)
 * UCor Internal Pruning:
 *   1.   Select 0 or 1 of: DVCorp, DVCors, VDCorp, VDCors, PCor, DCor, SCor
 *   1.1  For the one selected in 1, set Threshold:
 *        DVThrp, DVThrs, VDThrp, VDThrs, PThr, DThr, SThr
 *        Threshold defaults are: 0 0 0 0
 * UCor VOTE weighting: (Default is UCor=1. By selecting 1 of these, we reset UCor's value to it.)
 *   2.   Select 0 or 1 of: DVCorp, DVCors, VDCorp, VDCors, PCor, DCor, SCor
 * Standard Deviation Internal Pruning:
 *   (population/sample; difference_of_vectors/vector_of_differences)
 *   3.   Select 0 or more of: dMNsdp, dMNsds, Nsdp_Msdp, Nsds_Msds
 *   3.1  For each selected in 3, set Threshold:
 *        dMNsdpThr, dMNsdsThr, Nsdp_MsdpThr, Nsds_MsdsThr
 *        Threshold defaults are: 0 0
 *   3.2  For each selected in 3, set pow exp:
 *        dMNsdpExp, dMNsdsExp, Nsdp_MsdpExp, Nsds_MsdsExp
 *        Power Exponent defaults are: -1 -1
 * External Pruning:
 *   4.   Select 0 or more of: Prune_Users_In_SupM, Prune_Movies_In_SupU, Prune_Users_In_CoSupMN
 *   4.1  For each selected in 4, select 1 of: Prune, FastPrune, CommonCoSupportPrune
 *   4.2  Reset non-pruned support in 2nd: yes, no.
 *   4.3  For each selected in 4, set parameter:
 *        mstrt, ustrt, TSa, TSb, Tdvp, Tdvs, Tvdp, Tvds, TD, TP, PPm, TV, TSD, Ch, Ct
 *        Prune Parameter defaults are: 0 0 -100 -1 -1 -1 1 no def
 * Forcing in Range:
 *   5.   Select 0, 1 or 2 force_vote_in_ranges:
 *        in_Voter_LOOP (for each voter), outside_Voter_LOOP (for composite VOTE)
 */

/**
 * Public function.
 * This function implements movie voting.
 *
 * \param pcfg     A pointer to the class containing the parameters
 *                 which configure the voting.
 * \param M        The movie number for which a prediction is to be made.
 * \param supportM The PTree identifying the support for the movie to be predicted.
 * \param U        The identity number of the user for which a prediction is to be made.
 * \param supportU The PTree identifying the support for the user for whom
 *                 a prediction is being made.
 * \return The recommended prediction.
 */


extern double movie_vote(PredictionConfig *pcfg, unsigned long int M, PTree & supportM,
                         unsigned long int U, PTree & supportU)
{
    double VOTE = DEFAULT_VOTE, VOTE_sum = 0, VOTE_cnt = 0;
    double Nb, Mb, dsSq, UCor = 1;
    struct pruning *internal_prune;
    struct external_prune *external_prune;
    PTree supM = supportM, supU = supportU;

    supM.clearbit(U);
    supU.clearbit(M);

    /* External pruning: PRUNE USERS supM */
    external_prune = pcfg->get_movie_Prune_Users_in_SupM();
    if ( external_prune->enabled ) {
        if ( supM.get_count() > external_prune->params.Ct )
            do_pruning(external_prune, M, U, supM, supU);
        supM.clearbit(U);
        supU.clearbit(M);
        if ( (supM.get_count() < 1) || (supU.get_count() < 1) ) return VOTE;
    }

    /* Reset support if requested. */
    if ( pcfg->reset_movie_support() ) {
        supU = supportU;
        supU.clearbit(M);
    }

    /* External pruning: PRUNE MOVIES supU */
    external_prune = pcfg->get_movie_Prune_Movies_in_SupU();
    if ( external_prune->enabled ) {
        if ( supU.get_count() > external_prune->params.Ct )
            do_pruning(external_prune, M, U, supM, supU);
        supM.clearbit(U);
        supU.clearbit(M);
        if ( (supM.get_count() < 1) || (supU.get_count() < 1) ) return VOTE;
    }

    /* NV: NLOOP strt (Ns are movie voters) */
    unsigned long int *supUlist = supU.get_indexes();
    for (unsigned long int nn = 0; nn < supU.get_count(); ++nn) {
        unsigned long int N = supUlist[nn];
        double NU = Users.get_rating(U, N) - 2, MAX = 0,
               smN = 0, smM = 0, MM = 0, MN = 0, NN = 0, dm;
        PTree csMN = supM & Movies.get_users(N);
        csMN.clearbit(U);
        dm = csMN.get_count();
        if ( dm < 1 ) continue;

        /* External pruning: PRUNE USERS CoSupMN */
        external_prune = pcfg->get_movie_Prune_Users_in_CoSupMN();
        if ( external_prune->enabled ) {
            if ( csMN.get_count() > external_prune->params.Ct )
                do_pruning(external_prune, M, U, csMN, supU);
            csMN.clearbit(U);
            supU.clearbit(M);
            dm = csMN.get_count();
            if ( dm < 1 ) continue;
        }

        /* NV: VLOOP strt (Vs are user vector_space_dimensions) */
        unsigned long int *csMNlist = csMN.get_indexes();
        for (unsigned long int v = 0; v < csMN.get_count(); ++v) {
            unsigned long int V = csMNlist[v];
            double MV = Users.get_rating(V, M) - 2,
                   NV = Users.get_rating(V, N) - 2;
            if ( pow(MV - NV, 2) > MAX ) MAX = pow(MV - NV, 2);
            smN += NV;
            smM += MV;
            MM += MV * MV;
            MN += NV * MV;
            NN += NV * NV;
        }
        Nb = smN / dm;
        Mb = smM / dm;
        dsSq = NN - 2*MN + MM;
        VOTE = NU - Nb + Mb;


        /* force_vote_in_Voter_Loop goes here. */
        if ( pcfg->movie_vote_force_in_loop() ) {
            if ( (VOTE < 1) && (VOTE != DEFAULT_VOTE) ) VOTE = 1;
            if ( (VOTE > 5) && (VOTE != DEFAULT_VOTE) ) VOTE = 5;
        }

        /* SAMPLE-statistic-based pruning through early exit. */
        if ( dm > 1 ) {
            /* method dMNsds */
            internal_prune = pcfg->get_internal_prune(movie_dMNsds);
            if ( internal_prune->enabled ) {
                double dMNsds, thr = internal_prune->threshold,
                       expnt = internal_prune->exponent;
                // Sample s.d. of the differences; the mean difference is squared.
                dMNsds = pow((dsSq - dm*pow(Nb - Mb, 2)) / (dm - 1), 0.5);
                if ( dMNsds > (thr * pow(dm, expnt)) ) continue;
            }

            /* method Nsds_Msds: NO exponent. */
            internal_prune = pcfg->get_internal_prune(movie_Nsds_Msds);
            if ( internal_prune->enabled ) {
                double Msds, Nsds, thr = internal_prune->threshold;
                Msds = pow((MM - dm*Mb*Mb) / (dm - 1), 0.5);
                Nsds = pow((NN - dm*Nb*Nb) / (dm - 1), 0.5);
                if ( Nsds > (thr * Msds) ) continue;
            }

            /* method DVCors (sample standard deviations, as in user_vote) */
            internal_prune = pcfg->get_internal_prune(movie_DVCors);
            if ( internal_prune->enabled ) {
                double Msds, Nsds, DVCors, thr = internal_prune->threshold,
                       expnt = internal_prune->exponent;
                Msds = pow((MM - dm*Mb*Mb) / (dm - 1), 0.5);
                Nsds = pow((NN - dm*Nb*Nb) / (dm - 1), 0.5);
                DVCors = exp(expnt * (Nsds - Msds)*(Nsds - Msds));
                if ( DVCors < thr ) continue;
                if ( internal_prune->weight ) UCor = DVCors;
            }

            /* method VDCors */
            internal_prune = pcfg->get_internal_prune(movie_VDCors);
            if ( internal_prune->enabled ) {
                double VDCors, dMNsds, thr = internal_prune->threshold,
                       expnt = internal_prune->exponent;
                dMNsds = pow((dsSq - dm*pow(Nb - Mb, 2)) / (dm - 1), .5);
                VDCors = exp(expnt * dMNsds);
                if ( VDCors < thr ) continue;
                if ( internal_prune->weight ) UCor = VDCors;
            }
        }

        /* POPULATION-statistics-based pruning through early exit. */
        if ( dm > 0 ) {
            /* method dMNsdp (the exponent is fixed at 0.9 here) */
            internal_prune = pcfg->get_internal_prune(movie_dMNsdp);
            if ( internal_prune->enabled ) {
                double dMNsdp, thr = internal_prune->threshold;
                dMNsdp = pow(dm*dsSq - (smN - smM)*(smN - smM), .5) / dm;
                if ( dMNsdp > (thr * pow(dm, 0.9)) ) continue;
            }

            /* method Nsdp_Msdp */
            internal_prune = pcfg->get_internal_prune(movie_Nsdp_Msdp);
            if ( internal_prune->enabled ) {
                double Nsdp, Msdp, thr = internal_prune->threshold;
                Msdp = pow(dm*MM - smM*smM, 0.5) / dm;
                Nsdp = pow(dm*NN - smN*smN, 0.5) / dm;
                if ( Nsdp > (thr * Msdp) ) continue;
            }

            /* method DVCorp */
            internal_prune = pcfg->get_internal_prune(movie_DVCorp);
            if ( internal_prune->enabled ) {
                double DVCorp, Msdp, Nsdp, thr = internal_prune->threshold,
                       expnt = internal_prune->exponent;
                Msdp = pow(dm*MM - smM*smM, 0.5) / dm;
                Nsdp = pow(dm*NN - smN*smN, 0.5) / dm;
                DVCorp = exp(expnt * (Nsdp - Msdp)*(Nsdp - Msdp));
                if ( DVCorp < thr ) continue;
                if ( internal_prune->weight ) UCor = DVCorp;
            }


            /* method VDCorp */
            internal_prune = pcfg->get_internal_prune(movie_VDCorp);
            if ( internal_prune->enabled ) {
                double VDCorp, dMNsdp, thr = internal_prune->threshold,
                       expnt = internal_prune->exponent;
                dMNsdp = pow(dm*dsSq - (smN - smM)*(smN - smM), .5) / dm;
                VDCorp = exp(expnt * dMNsdp);
                if ( VDCorp < thr ) continue;
                if ( internal_prune->weight ) UCor = VDCorp;
            }
        }

        /* OTHER correlation pruning (pearson=s, pureshift=p, distance=d) */
        internal_prune = pcfg->get_internal_prune(movie_SCor);
        if ( internal_prune->enabled ) {
            double SCor, thr = internal_prune->threshold;
            SCor = (MN - dm*Mb*Nb) / (.0001 + (pow((MM - dm*pow(Mb, 2)), .5)) *
                                      (.0001 + pow((NN - dm*pow(Nb, 2)), .5)));
            if ( SCor < thr ) continue;
            if ( internal_prune->weight ) UCor = SCor;
        }

        /* PCor: CHECK for exponent (fixed at -.1 here). */
        internal_prune = pcfg->get_internal_prune(movie_PCor);
        if ( internal_prune->enabled ) {
            double ONEPDS, PCor = 1, thr = internal_prune->threshold;
            ONEPDS = dsSq - dm * pow(Nb - Mb, 2);
            if ( MAX > 0 ) PCor = exp(-.1*ONEPDS / (pow(MAX, .75)*pow(dm, .5)));
            if ( PCor < thr ) continue;
            if ( internal_prune->weight ) UCor = PCor;
        }

        internal_prune = pcfg->get_internal_prune(movie_DCor);
        if ( internal_prune->enabled ) {
            double DCor, thr = internal_prune->threshold;
            DCor = exp(-dsSq / 100);
            if ( DCor < thr ) continue;
            if ( internal_prune->weight ) UCor = DCor;
        }

        if ( UCor > 0 ) {
            VOTE_sum += VOTE * UCor;
            VOTE_cnt += UCor;
        }
    } // NLOOP end

    if ( VOTE_cnt > 0 ) VOTE = VOTE_sum / VOTE_cnt;
    else VOTE = DEFAULT_VOTE;

    /* force_vote_after_Voter_Loop goes here. */
    if ( pcfg->movie_vote_force_after_loop() ) {
        if ( (VOTE < 1) && (VOTE != DEFAULT_VOTE) ) VOTE = 1;
        if ( (VOTE > 5) && (VOTE != DEFAULT_VOTE) ) VOTE = 5;
    }
    return VOTE;
}
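movie_vote mirrors user_vote with the roles of users and movies swapped. The voters are movies N that U has rated, and the dimensions are the users who rated both M and N. Its "other" correlation methods (SCor, PCor, DCor) are again computed from running sums. The sketch below reproduces those three formulas as the source writes them, including the .0001 denominator guards. It makes PCor's exponent, which movie_vote fixes at -.1, a parameter. The struct and function names are hypothetical, and the inputs are plain vectors rather than PTrees.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sums over the users who rated both M and N (hypothetical struct; names
// follow movie_vote, MAX being the largest squared rating difference).
struct PairSums {
    double dm = 0, smM = 0, smN = 0, MM = 0, NN = 0, MN = 0, MAX = 0;
};

PairSums accumulate_pair(const std::vector<double>& m, const std::vector<double>& n)
{
    assert(m.size() == n.size());
    PairSums s;
    for (std::size_t v = 0; v < m.size(); ++v) {
        double diffSq = (m[v] - n[v]) * (m[v] - n[v]);
        s.MAX = std::max(s.MAX, diffSq);
        s.dm += 1;
        s.smM += m[v];
        s.smN += n[v];
        s.MM += m[v] * m[v];
        s.NN += n[v] * n[v];
        s.MN += m[v] * n[v];
    }
    return s;
}

// Pearson correlation written as in movie_vote (SCor), .0001 guards included.
double SCor(const PairSums& s)
{
    double Mb = s.smM / s.dm, Nb = s.smN / s.dm;
    return (s.MN - s.dm * Mb * Nb) /
           (.0001 + std::pow(s.MM - s.dm * Mb * Mb, .5) *
                        (.0001 + std::pow(s.NN - s.dm * Nb * Nb, .5)));
}

// "Pure shift" correlation (PCor): ONEPDS is the squared distance left after
// removing the mean shift Nb - Mb. Identical vectors (MAX == 0) give 1.
double PCor(const PairSums& s, double expnt)
{
    if (s.MAX <= 0) return 1.0;
    double Mb = s.smM / s.dm, Nb = s.smN / s.dm;
    double dsSq = s.NN - 2 * s.MN + s.MM;
    double ONEPDS = dsSq - s.dm * (Nb - Mb) * (Nb - Mb);
    return std::exp(expnt * ONEPDS / (std::pow(s.MAX, .75) * std::pow(s.dm, .5)));
}

// Distance correlation (DCor) from the raw sum of squared differences.
double DCor(const PairSums& s)
{
    return std::exp(-(s.NN - 2 * s.MN + s.MM) / 100);
}
```

When two movies' common ratings differ only by a constant shift, ONEPDS is zero and PCor is exactly 1. DCor, which looks only at the raw squared distance, stays below 1 in that case. This is the sense in which PCor is a "pure shift" correlation.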


Prune.C

/** \file contains implementations of routines
 * for pruning user and movie voting lists.
 */

/* Standard C++ include files. */
#include <functional>   // greater<>
#include <map>
#include <vector>
#include <unistd.h>
#include <stdlib.h>

/* Local C++ include files. */
#include <PTree.H>
#include "UserSet.H"
#include "MovieSet.H"
#include "mppConfig.H"
#include "mpp.h"

/* Global accessible variables. */
extern float corData[17771];

using namespace std;

/* Shorthand type definition for the correlation map. */
typedef multimap<double, unsigned long int, greater<double> > map_t;

/* Private function.
 * This function verifies whether or not a voting entity is within a
 * selection window. A selection window is defined by a minimum (leftside)
 * voter and a window size (width).
 *
 * \param voter The voter being considered.
 * \param pp    A pointer to the structure containing the
 *              leftside and width parameters for a pruning method.
 * \return true if the voter lies outside the selection window. false is
 *         automatically returned if the width value is set to zero;
 *         setting the width value to zero thus disables window based
 *         selection.
 */
static bool outside_window(unsigned long int voter, struct pruning_parameters *pp)
{
    if ( pp->width == 0 ) return false;
    if ( voter < pp->leftside ) return true;
    if ( voter > pp->leftside + pp->width ) return true;
    return false;
}

/* Private function.
 * This function loads a vector with a list of support indexes from
 * the given PTree. The list contains N elements where N is the support
 * count. The actual order of the list is determined by the start and
 * multiplier values passed in from the caller.
 *
 * \param suptree A reference to the PTree whose support list is to be generated.
 * \param list    A reference to the vector loaded with support indexes.
 * \param start   The starting element in the support list, which
 *                will be the 0th element in the completed support list.
 * \param mult    The multiplier value to be used in determining
 *                the support starting point.
 */
static void load_support_vector(PTree & suptree, vector<unsigned long int> & list,
                                unsigned long int start, double mult)
{
    unsigned long int *indexes = suptree.get_indexes(),
                      supcnt = suptree.get_count();

    /* Set the starting point based on the specified start point
     * and a multiplier if it is specified. If the starting point
     * exceeds the support count, start at the beginning of the
     * support list.
     */
    start = start + (unsigned long int) (mult * supcnt);
    if ( start > supcnt ) start = 0;

    /* The simple case is a start of zero. */
    if ( start == 0 ) {
        for (unsigned long int lp = 0; lp < supcnt; ++lp)
            list.push_back(indexes[lp]);
        return;
    }

    /* Two loop passes are needed for a non-zero start value. */
    for (unsigned long int lp = start; lp < supcnt; ++lp)
        list.push_back(indexes[lp]);
    for (unsigned long int lp = 0; lp < start; ++lp)
        list.push_back(indexes[lp]);
    return;
}
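load_support_vector keeps every supported index and only changes the order in which voters are visited. It rotates the support list so that element start + mult * supcnt comes first, and wraps back to element 0 when that offset runs past the end. outside_window then skips voters whose index falls outside [leftside, leftside + width]. The sketch below restates both functions on a plain std::vector. The signatures are hypothetical simplifications: the window bounds are passed directly instead of through struct pruning_parameters.

```cpp
#include <cassert>
#include <vector>

// Rotate a support list so that index start + mult * supcnt comes first,
// wrapping to 0 when that offset runs past the end.
std::vector<unsigned long> rotated_support(const std::vector<unsigned long>& indexes,
                                           unsigned long start, double mult)
{
    unsigned long supcnt = indexes.size();
    start = start + static_cast<unsigned long>(mult * supcnt);
    if (start > supcnt) start = 0;

    std::vector<unsigned long> list;
    list.reserve(supcnt);
    for (unsigned long lp = start; lp < supcnt; ++lp) list.push_back(indexes[lp]);
    for (unsigned long lp = 0; lp < start; ++lp) list.push_back(indexes[lp]);
    return list;
}

// Window test: a width of 0 disables the window; otherwise voters outside
// [leftside, leftside + width] are reported as outside.
bool outside_window(unsigned long voter, unsigned long leftside, unsigned long width)
{
    if (width == 0) return false;
    return voter < leftside || voter > leftside + width;
}
```

With start == 0 the first loop copies the whole list and the second copies nothing, so the two-pass path alone already handles that case correctly.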


Prune.C 2

/* Private function.
 * This function implements the final step in 'pruning' of a PTree. It
 * clears the destination PTree and then sets only those bits in the PTree
 * which have been selected by a previous correlation strategy.
 *
 * param tree       A reference to the PTree which is to reflect the
 *                  contents of the multimap.
 * param index_map  The map specifying the index bits to be set.
 * param max_count  Maximum number of indexes to be selected from PTree.
 */
static void load_ptree(PTree & tree, map_t index_map, double max_count)
{
    map_t::iterator index_ptr = index_map.begin();

    if ( index_map.size() < max_count ) max_count = index_map.size();
    tree.clearall();
    for (unsigned int lp= 0; lp < max_count; ++lp) {
        tree.setbit(index_ptr->second);
        ++index_ptr;
    }
    return;
}

/* Movie prune standard. */
/* movie_vote: Prune */
static void mPrune(unsigned long int M, PTree & supU, struct pruning_parameters *pp)
{
    if ( supU.get_count() < (pp->Ct + 1) ) return;
    map_t corRm;
    auto vector<unsigned long int> support;

    /* movie.PRUNE (NV loops) NLOOP start */
    load_support_vector(supU, support, pp->mstrt_mult);
    for (unsigned long int lp= 0; lp < support.size(); lp++) {
        auto unsigned long int N = support[lp];
        if ( outside_window(N, pp) ) continue;
        auto double smM = 0, smN = 0, MM = 0, NN = 0, MN = 0, MV, NV, max = 0,
            dm, Mb, Nb, dsSq, OnePDS, Nsdp = 0, Msdp = 0, Nsds = 0, Msds = 0,
            dMNsdp = 0, dMNsds = 0, mCor = 1, sCor = 1, dCor = 1, pCor = 1,
            vCor = 1, stdCor = 1, dvCorp = 1, dvCors = 1, vdCorp = 1, vdCors = 1;
        auto PTree csMN = supM & Movies.get_users(N);
        if ( csMN.get_count() < 1 ) continue;

        /* movie.PRUNE (NV loops) VLOOP start */
        auto vector<unsigned long int> ilp;
        load_support_vector(csMN, ilp, pp->ustrt_mult);
        for (unsigned long int lp1= 0; lp1 < ilp.size(); ++lp1) {
            auto unsigned long int V = ilp[lp1];
#if 0
            if ( outside_window(V, pp) ) continue;
#endif
            MV = Movies.get_rating(V, M) - 2;
            NV = Movies.get_rating(V, N) - 2;
            if ( pow(MV-NV, 2) > max ) max = pow(MV-NV, 2);
            smM += MV; smN += NV;
            MM += MV*MV; NN += NV*NV; MN += MV*NV;
        }

        dm = csMN.get_count(); Mb = smM/dm; Nb = smN/dm;
        dsSq = NN - 2*MN + MM;
        OnePDS = dsSq - dm*pow(Nb-Mb, 2);
        sCor = (MN - dm*Mb*Nb) / (.0001 + (pow((MM - dm*pow(Mb, 2)), .5)) *
                                          (pow((NN - dm*pow(Nb, 2)), .5)));
        dCor = exp(-dsSq/100);
        pCor = 1;
        if ( max > 0 ) pCor = exp(-pp->PPm*OnePDS / (.0001 + pow(max, .75)*pow(dm, .5)));
        if ( dm > 0 ) {
            Nsdp = pow(dm*NN - smN*smN, .5)/dm;
            Msdp = pow(dm*MM - smM*smM, .5)/dm;
            dMNsdp = pow(dm*dsSq - (smN-smM)*(smN-smM), .5)/dm;
        }
        if ( dm > 1 ) {
            Nsds = pow((NN - dm*Nb*Nb)/(dm-1), .5);
            Msds = pow((MM - dm*Mb*Mb)/(dm-1), .5);
            /* Sample std. dev. of the rating differences. */
            dMNsds = pow((dsSq - dm*pow(Nb-Mb, 2))/(dm-1), .5);
        }
        dvCorp = exp(-10 * (Nsdp-Msdp));
        dvCors = exp(-10 * (Nsds-Msds));
        vdCorp = exp(-10 * dMNsdp);
        vdCors = exp(-10 * dMNsds);

        /* Select the correlation used to rank this movie. */
        if ( pp->Ch == 1 ) mCor = corData[N+1];
        if ( pp->Ch == 2 ) mCor = sCor;
        if ( pp->Ch == 3 ) mCor = dCor;
        if ( pp->Ch == 4 ) mCor = pCor;
        if ( pp->Ch == 5 ) mCor = vCor;
        if ( pp->Ch == 6 ) mCor = stdCor;
        if ( pp->Ch == 7 ) mCor = dvCorp;
        if ( pp->Ch == 8 ) mCor = dvCors;
        if ( pp->Ch == 9 ) mCor = vdCorp;
        if ( pp->Ch == 0 ) mCor = vdCors;

        // THRESHOLD PRUNING
        if ( corData[N+1] < pp->TSa || sCor < pp->TSb || pCor < pp->TP ||
             dCor < pp->TD || vCor < pp->TV || stdCor < pp->TSD ||
             dvCorp < pp->Tdvp || dvCors < pp->Tdvs || vdCorp < pp->Tvdp ||
             vdCors < pp->Tvds )
            continue;


Prune.C 3

        else {
            auto pair<double, unsigned long int> entry(mCor, N);
            corRm.insert(entry);
        }
    }
    if ( corRm.size() == 0 ) return;
    load_ptree(supU, corRm, pp->Ct);
    return;
}

/* movie_vote: FastPrune */
static void fmPruneS(PTree & supU, struct pruning_parameters *pp)
{
    if ( supU.get_count() < pp->Ct + 1 ) return;
    map_t corRm;
    auto vector<unsigned long int> support;

    /* movie.PRUNE (NV loops) NLOOP start */
    load_support_vector(supU, support, pp->mstrt_mult);
    for (unsigned long int lp= 0; lp < support.size(); lp++) {
        auto unsigned long int N = support[lp];
#if 0
        if ( outside_window(N, pp) ) continue;
#endif
        if ( corData[N+1] < pp->TSa ) continue;
        auto pair<double, unsigned long int> entry(corData[N+1], N);
        corRm.insert(entry);
    }
    if ( corRm.size() == 0 ) return;
    load_ptree(supU, corRm, pp->Ct);
    return;
}

//user.PRUNE (VN loops) start
/* user_vote: Prune */
static void uPrune(unsigned long int U, PTree & supM, PTree & supU,
                   struct pruning_parameters *pp)
{
    if ( supM.get_count() < pp->Ct + 1 ) return;
    map_t corR;
    auto vector<unsigned long int> support;

    /* user.Prune (VN loops) VLOOP start */
    load_support_vector(supM, support, pp->ustrt_mult);
    for (unsigned long int lp= 0; lp < support.size(); lp++) {
        auto unsigned long int V = support[lp];
        if ( outside_window(V, pp) ) continue;
        auto double smU = 0, smV = 0, UU = 0, VV = 0, UV = 0, max = 0,
            Vsdp = 0, Usdp = 0, Vsds = 0, Usds = 0, dUVsdp = 0, dUVsds = 0,
            mCor = 1, sCor = 1, dCor = 1, pCor = 1, vCor = 1, stdCor = 1,
            dvCorp = 1, dvCors = 1, vdCorp = 1, vdCors = 1,
            NU, NV, dm, Ub, Vb, dsSq, OnePDS;
        auto PTree csUV = supU & Users.get_movies(V);
        if ( csUV.get_count() < 1 ) continue;

        /* user PRUNE (VN loops) NLOOP start */
        auto vector<unsigned long int> ilp;
        load_support_vector(csUV, ilp, pp->mstrt_mult);
        for (unsigned long int lp1= 0; lp1 < ilp.size(); ++lp1) {
            auto unsigned long int N = ilp[lp1];
#if 0
            if ( outside_window(N, pp) ) continue;
#endif
            NU = Movies.get_rating(U, N) - 2;
            NV = Movies.get_rating(V, N) - 2;
            if ( pow(NU-NV, 2) > max ) max = pow(NU-NV, 2);
            smU += NU; smV += NV;
            UU += NU*NU; VV += NV*NV; UV += NU*NV;
        } //user PRUNE (VN loops) NLOOP end

        dm = csUV.get_count(); Ub = smU/dm; Vb = smV/dm;
        dsSq = VV - 2*UV + UU;
        OnePDS = dsSq - dm*pow(Vb-Ub, 2);
        sCor = (UV - dm*Ub*Vb) / ((pow((UU - dm*pow(Ub, 2)), .5)) *
                                  (pow((VV - dm*pow(Vb, 2)), .5)));
        dCor = exp(-dsSq/100);
        if ( max > 0 ) pCor = exp(-pp->PPm*OnePDS / (pow(max, .75)*pow(dm, .5)));
        if ( dm > 0 ) {
            Vsdp = pow(dm*VV - smV*smV, .5)/dm;
            Usdp = pow(dm*UU - smU*smU, .5)/dm;
            dUVsdp = pow(dm*dsSq - (smV-smU)*(smV-smU), .5)/dm;
        }
        if ( dm > 1 ) {
            Vsds = pow((VV - dm*Vb*Vb)/(dm-1), .5);
            Usds = pow((UU - dm*Ub*Ub)/(dm-1), .5);
            /* Sample std. dev. of the rating differences. */
            dUVsds = pow((dsSq - dm*pow(Vb-Ub, 2))/(dm-1), .5);
        }
        dvCorp = exp(-10 * (Vsdp-Usdp));
        dvCors = exp(-10 * (Vsds-Usds));
        vdCorp = exp(-10 * dUVsdp);
        vdCors = exp(-10 * dUVsds);

        /* Select the correlation used to rank this voter. */
        if ( pp->Ch == 1 ) mCor = sCor;
        if ( pp->Ch == 2 ) mCor = sCor;
        if ( pp->Ch == 3 ) mCor = dCor;
        if ( pp->Ch == 4 ) mCor = pCor;
        if ( pp->Ch == 5 ) mCor = vCor;
        if ( pp->Ch == 6 ) mCor = stdCor;
        if ( pp->Ch == 7 ) mCor = dvCorp;
        if ( pp->Ch == 8 ) mCor = dvCors;
        if ( pp->Ch == 9 ) mCor = vdCorp;
        if ( pp->Ch == 0 ) mCor = vdCors;


Prune.C 4

        // THRESHOLD PRUNE
        if ( sCor < pp->TSb || pCor < pp->TP || dCor < pp->TD || vCor < pp->TV ||
             stdCor < pp->TSD || dvCorp < pp->Tdvp || dvCors < pp->Tdvs ||
             vdCorp < pp->Tvdp || vdCors < pp->Tvds )
            continue;
        else {
            auto pair<double, unsigned long int> entry(mCor, V);
            corR.insert(entry);
        }
    }
    if ( corR.size() == 0 ) return;
    load_ptree(supM, corR, pp->Ct);
    return;
}

/* user_vote: FastPrune */
static void fuPruneS(unsigned long int U, PTree & supM, PTree & supU,
                     struct pruning_parameters *pp)
{
    if ( supM.get_count() < (pp->Ct + 1) ) return;
    map_t corR;
    auto vector<unsigned long int> support;

    load_support_vector(supM, support, pp->ustrt_mult);
    for (unsigned long int lp= 0; lp < support.size(); lp++) {
        auto unsigned long int V = support[lp];
        if ( outside_window(V, pp) ) continue;
        auto PTree csUV = supU & Users.get_movies(V);
        if ( csUV.get_count() < 1 ) continue;
        auto double smU = 0, smV = 0, UU = 0, VV = 0, UV = 0, NU, NV;

        /* fast user Prune (VN loops) NLOOP start */
        auto vector<unsigned long int> ilp;
        load_support_vector(csUV, ilp, pp->mstrt_mult);
        for (unsigned long int lp1= 0; lp1 < ilp.size(); ++lp1) {
            auto unsigned long int N = ilp[lp1];
#if 0
            if ( outside_window(N, pp) ) continue;
#endif
            NU = Movies.get_rating(U, N) - 2;
            NV = Movies.get_rating(V, N) - 2;
            smU += NU; smV += NV;
            UU += NU * NU; VV += NV * NV; UV += NU * NV;
        }

        auto double dm = csUV.get_count(), Ub = smU / dm, Vb = smV / dm,
            SCor = (UV - dm*Ub*Vb) / (.00001 + (pow((UU - dm*pow(Ub, 2)), .5)) *
                                               (pow((VV - dm*pow(Vb, 2)), .5)));
        if ( SCor < pp->TSb ) continue;
        auto pair<double, unsigned long int> entry(SCor, V);
        corR.insert(entry);
    }
    if ( corR.size() == 0 ) return;
    load_ptree(supM, corR, pp->Ct);
    return;
}

/* user_vote: CommonCoSupportPrune */
static void uPrune2(PTree & supM, PTree & supU, struct pruning_parameters *pp)
{
    if ( supM.get_count() < pp->Ct + 1 ) return;
    map_t corR;
    auto PTree csUV;
    auto vector<unsigned long int> support;

    /* CommonCoSup user.PRUNE VN loops VLOOP start */
    load_support_vector(supM, support, pp->ustrt, pp->ustrt_mult);
    for (unsigned long int lp= 0; lp < support.size(); lp++) {
        auto unsigned long int V = support[lp];
        if ( outside_window(V, pp) ) continue;
        csUV = supU & Users.get_movies(V);
        auto double dm = csUV.get_count();
        auto pair<double, unsigned long int> entry(dm, V);
        corR.insert(entry);
    }

    /* Keep the Ct voters with the largest co-support and shrink the
     * movie support to the movies all of them have rated. */
    auto unsigned int select_count = (unsigned int) pp->Ct;
    auto PTree ccsU = supU;
    map_t::iterator begin = corR.begin();
    supM.clearall();
    if ( corR.size() < pp->Ct ) select_count = corR.size();
    for (unsigned int lp= 0; lp < select_count; ++lp) {
        supM.setbit(begin->second);
        ccsU = ccsU & Users.get_movies(begin->second);
        ++begin;
    }
    supU = ccsU;
    return;
}
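The standard prunes rank each candidate by a choice of co-support statistics: a Pearson-style sCor, a distance-based dCor = exp(-dsSq/100), and a spread-penalised pCor driven by pp->PPm. The sketch below restates those three formulas for two rating vectors over their common co-support, taken from the mPrune() code above (ratings shifted by -2, .0001 guard in the denominators). The struct CoSupportStats and the function cosupport_stats are hypothetical names used only for illustration; it assumes both vectors are non-empty and of equal length.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical container for three of the per-candidate statistics.
struct CoSupportStats { double sCor, dCor, pCor; };

// a[i], b[i]: the two entities' ratings on the i-th co-supporting item,
// already shifted by -2 as in Prune.C. PPm mirrors pp->PPm.
CoSupportStats cosupport_stats(const std::vector<double>& a,
                               const std::vector<double>& b, double PPm)
{
    double sa = 0, sb = 0, aa = 0, bb = 0, ab = 0, max = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d2 = std::pow(a[i] - b[i], 2);
        if (d2 > max) max = d2;                 // largest squared disagreement
        sa += a[i]; sb += b[i];
        aa += a[i] * a[i]; bb += b[i] * b[i]; ab += a[i] * b[i];
    }
    const double dm = a.size();
    const double abar = sa / dm, bbar = sb / dm;
    const double dsSq = bb - 2 * ab + aa;                        // sum of squared differences
    const double onePDS = dsSq - dm * std::pow(bbar - abar, 2);  // differences about their mean

    CoSupportStats s;
    s.sCor = (ab - dm * abar * bbar) /
             (.0001 + std::sqrt(aa - dm * abar * abar) * std::sqrt(bb - dm * bbar * bbar));
    s.dCor = std::exp(-dsSq / 100);
    s.pCor = 1;
    if (max > 0) s.pCor = std::exp(-PPm * onePDS / (.0001 + std::pow(max, .75) * std::sqrt(dm)));
    return s;
}
```

A voter whose ratings are a constant offset of the target's (for example {1, 2, 3} against {2, 3, 4}) gets sCor close to 1 and pCor of exactly 1, because onePDS removes the mean offset; dCor still penalises the offset itself.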


Prune.C 5

/* movie_voting: CommonCoSupportPrune */
static void mPrune2(PTree & supM, PTree & supU, struct pruning_parameters *pp)
{
    if ( supU.get_count() < (pp->Ct + 1) ) return;
    map_t corRm;
    auto PTree csMN;
    auto vector<unsigned long int> support;

    /* movie.PRUNE NV loops NLOOP start */
    load_support_vector(supU, support, pp->mstrt_mult);
    for (unsigned long int lp= 0; lp < support.size(); lp++) {
        auto unsigned long int N = support[lp];
        if ( outside_window(N, pp) ) continue;
        csMN = supM & Movies.get_users(N);
        auto double dm = csMN.get_count();
        auto pair<double, unsigned long int> entry(dm, N);
        corRm.insert(entry);
    }

    auto unsigned int select_count = (unsigned int) pp->Ct;
    auto PTree ccsM = supM;
    map_t::iterator begin = corRm.begin();
    supU.clearall();
    if ( corRm.size() < select_count ) select_count = corRm.size();
    for (unsigned int lp= 0; lp < select_count; ++lp) {
        supU.setbit(begin->second);
        ccsM = ccsM & Movies.get_users(begin->second);
        ++begin;
    }
    supM = ccsM;
    return;
}

/* Internal function.
 * This function dispatches execution to the pruning method which has
 * been selected for an external pruning routine.
 *
 * param prune  A pointer to the structure defining the
 *              external pruning to be conducted.
 * param M      The movie whose rating is to be predicted.
 * param U      The user for whom the prediction is to be made.
 * param supM   A PTree describing the movie support.
 * param supU   A PTree describing user support.
 */
void do_pruning(struct external_prune * const prune, unsigned long int M,
                unsigned long int U, PTree & supM, PTree & supU)
{
    auto struct pruning_parameters *params = &prune->params;

    switch ( prune->method ) {
    case UserPrune:
        uPrune(U, supM, supU, params);
        break;
    case UserFastPrune:
        fuPruneS(U, supM, supU, params);
        break;
    case UserCommonCoSupportPrune:
        uPrune2(supM, supU, params);
        break;
    case MoviePrune:
        mPrune(M, supU, params);
        break;
    case MovieFastPrune:
        fmPruneS(supU, params);
        break;
    case MovieCommonCoSupportPrune:
        mPrune2(supM, supU, params);
        break;
    }
    return;
}
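The CommonCoSupport prunes (uPrune2/mPrune2) do not use correlations at all: they rank candidates by the size of their co-support with the current support, keep the top Ct, and then shrink the support to the items every kept candidate shares. The sketch below restates the user-side version with std::set in place of PTree (set intersection standing in for the PTree & operator). Support, intersect, common_cosupport_prune and the movies_of lookup table are hypothetical names; the descending multimap mirrors map_t.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <map>
#include <set>
#include <vector>

using Support = std::set<unsigned long>;  // hypothetical stand-in for a PTree

// Plays the role of the PTree '&' operator.
Support intersect(const Support& x, const Support& y)
{
    Support out;
    std::set_intersection(x.begin(), x.end(), y.begin(), y.end(),
                          std::inserter(out, out.end()));
    return out;
}

// Greedy common co-support selection, modelled on uPrune2().
std::vector<unsigned long> common_cosupport_prune(
    const std::vector<unsigned long>& candidates,
    const std::map<unsigned long, Support>& movies_of,
    Support& supU, unsigned ct)
{
    // Rank candidates by co-support size, largest first (like map_t).
    std::multimap<double, unsigned long, std::greater<double>> ranked;
    for (unsigned long v : candidates)
        ranked.emplace(static_cast<double>(intersect(supU, movies_of.at(v)).size()), v);

    // Keep the top `ct` voters and narrow supU to the movies they all share.
    std::vector<unsigned long> kept;
    Support common = supU;
    auto it = ranked.begin();
    for (unsigned i = 0; i < ct && it != ranked.end(); ++i, ++it) {
        kept.push_back(it->second);
        common = intersect(common, movies_of.at(it->second));
    }
    supU = common;
    return kept;
}
```

One design consequence worth noting: because the shared support is the intersection over all kept voters, a large Ct can shrink the dimension set sharply, which is the trade-off between voter count and common dimensions that these prunes control.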


run script for processing movie_predict files into one movie_prediction file (and also one .rmse file and one .out log file):

cd Output
../mpp-glue1 ../$1
cd ..
mpp-rmse1 ./$1

mpp-glue script

#!/bin/bash
# This utility 'glues' a set of .predict files for a given run
# of mpp-mpred into a single file. This program is driven
# by the input file used for the prediction run. When it finds
# a movie (delimited by a trailing :) ALL entries in files,
# InputFileName_movieID.predict, in the current directory
# are printed to a file, InputFileName.txt.predictions.
# The utility takes as the single argument, InputFileName
# used for the prediction run.

# Verify input file is found.
if [ -z "$1" ]; then echo "Error: Input file not specified."; exit 1; fi;
if [ ! -e "$1" ]; then echo "Error: Input file not found - >$1<"; exit 1; fi;

# Variables global to this module.
declare -r Name=`basename $1`;
declare -r Output="$Name.predictions" Logfile="$Name.logfile";
declare -r Backup="$Name.backup";
declare Inputfile=$1;
declare Movie;
declare Predictions Log;
declare Current_Dir;

# Main body of the program occurs here.

# If a directory named Output is present assume
# we should use that directory.
if [ -d "./Output" ]; then
    Current_Dir=`pwd`;
    Inputfile="../$Inputfile";
    cd Output;
fi;

# Remove any old output files and make sure we have a fresh backup directory.
rm -f $Output $Logfile;
if [ -d "$Backup" ]; then echo "Error: Backup directory present."; exit 1; fi;
mkdir $Backup;

# Loop over prediction input file and generate outputs. The file is
# redirected into the loop (not piped) so that 'exit 1' ends the script
# rather than only a pipeline subshell.
while read input; do
    if [ "$input" != "${input%%:}" ]; then
        Movie=${input%%:};
        Predictions="$Name"_$Movie.predict;
        Log="$Name"_$Movie.log;
        if [ ! -e "$Predictions" ]; then
            echo "Error: Prediction file not found - " ">$Predictions<";
            exit 1;
        fi;
        echo "Processing: $Movie";
        cat $Predictions >>$Output;
        # [ -e "$Log" ] && cat $Log >>$Logfile;
        rm $Predictions;
        # The rm above replaces the backup step below, which is
        # commented out, so no backup copies are kept.
        # mv $Predictions $Backup;
        # if [ $? -ne 0 ]; then echo "Error: Unable to create predictions backup.";
        # exit 1; fi;
        # if [ -e "$Log" ]; then mv $Log $Backup;
        # if [ $? -ne 0 ]; then echo "Error: Unable to create logs backup.";
        # exit 1; fi;
    fi;
done < "$Inputfile";

# All done.
echo -e "\nInputfile: $Inputfile";
echo -e "\tPredictions:\t$Output";
echo -e "\tLogfile:\t$Logfile";
echo -e "\tBackups:\t$Backup";
echo -e "\nLine count verifications:";
echo -e "\t$(wc -l $Inputfile)";
echo -e "\t$(wc -l $Output)";
[ -n "$Current_Dir" ] && cd ..;
exit 0

mpp-glue takes all InputFileName_movieID1.predict … InputFileName_movieIDn.predict in the current directory as input (they are deleted after processing) and puts as output (in the current directory) InputFileName.txt.predictions.
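mpp-glue recognises a movie header by its trailing colon (the test "$input" != "${input%%:}") and then looks for the per-movie file "$Name"_$Movie.predict. The sketch below restates just that header-to-filename mapping in C++ so the naming convention is explicit; predict_files is a hypothetical helper, and it assumes the input file has already been read into lines.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: for each movie header line "<movieID>:" return the
// per-movie file name "<name>_<movieID>.predict" that mpp-glue concatenates.
// User lines (no trailing ':') are skipped, as in the shell loop.
std::vector<std::string> predict_files(const std::string& name,
                                       const std::vector<std::string>& input_lines)
{
    std::vector<std::string> files;
    for (const std::string& line : input_lines)
        if (!line.empty() && line.back() == ':')
            files.push_back(name + "_" + line.substr(0, line.size() - 1) + ".predict");
    return files;
}
```

For an input named probe19.txt containing the headers 1: and 8:, the glue step would read probe19.txt_1.predict and probe19.txt_8.predict, in that order, into probe19.txt.predictions.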


mpp-rmse1 script

#!/bin/bash
# This utility generates an RMSE report based on predictions carried
# out on the 'probe' dataset. It compares a prediction list against
# the set of known answers.
# This program is driven by the input file used for the prediction
# run. The majority of the comparative work and generation of the
# RMSE values is done by the Perl script called from this script.
# The Perl script reads both the prediction file
# (Output/InputFileName.txt.predictions) and the list of known answers
# (InputFileName.txt.answers in the current directory).
# When a movie is found it verifies the movie is
# also present in the companion file. This is to ensure there are
# no discrepancies between the two files.
# The utility takes as a single argument the name of the input file
# used for the prediction run.

# Verify input file is found.
if [ -z "$1" ]; then echo "Error: Input file not specified."; exit 1; fi;
if [ ! -e "$1" ]; then echo "Error: Input file not found - >$1<"; exit 1; fi;

# Variables global to this module.
declare -r Startdir=`dirname $0`;
declare -r Basename=`basename $1`;
declare -r Answers="$1.answers";
declare -r Predictions="Output/$Basename.predictions";

if [ ! -e "$Answers" ]; then echo "Answers file not found - >$Answers<."; exit 1; fi;
if [ ! -e "$Predictions" ]; then echo "Predictions file not found - >$Predictions<."; exit 1; fi;

# Main body of the program occurs here.
perl $Startdir/mpp-rmse.pl $Answers $Predictions | tee "$Basename.rmse";
exit 0

mpp-rmse1.pl

$answers = $ARGV[0];
$predictions = $ARGV[1];
$lp = 0;
$cnt = 0;
$error_sum = 0;
$total_error = 0;
$total_cnt = 0;
$last_movie = "";

chomp(@answers = `cat $answers`);
chomp(@predictions = `cat $predictions`);

foreach (@answers) {
    if ( /:$/ ) {
        if ( $last_movie ne "" ) {
            printf "\n\tSum: %.5f\tTotal: %-5d\tRMSE: %f\n\n", $error_sum, $cnt, sqrt($error_sum/$cnt);
            printf "\tRunning RMSE: %f / %d predictions\n\n", sqrt($total_error/$total_cnt), $total_cnt;
            $error_sum = 0;
            $cnt = 0;
        }
        $last_movie = $_;
        print "Movie: $_\n";
        if ( $_ ne $predictions[$lp] ) {
            print "Movies don't match\n";
            print "\t$_ vs. $predictions[$lp]\n";
            exit 1;
        }
        ++$lp;
        next;
    }

    # Correct for an NAN
    if ( $predictions[$lp] eq "nan" ) { print "NAN"; $predictions[$lp] = "3.70"; }
    if ( $predictions[$lp] eq "corm-nan" ) { print "CORM-NAN"; $predictions[$lp] = "3.70"; }

    $error = ($_ - $predictions[$lp])**2;
    $error_sum += $error;
    $total_error += $error;
    ++$total_cnt;
    ++$cnt;
    printf "\t%4d:\tAnswer: %2d\tPrediction: $predictions[$lp]\tError: %.5f\n", $cnt - 1, $_, $error;
    ++$lp;
}

# Print the RMSE from the last movie.
printf "\n\tSum: %.5f\tTotal: %-5d\tRMSE: %f\n\n", $error_sum, $cnt, sqrt($error_sum/$cnt);

# Then the total RMSE for the run.
print "Prediction summary:\n";
printf "\tSum: %.5f\tTotal: %-5d\tRMSE: %f\n\n", $total_error, $total_cnt, sqrt($total_error/$total_cnt);
exit 0;

mpp-rmse takes Output/InputFileName.txt.predictions and InputFileName.txt.answers from the current directory as input and puts as output (in the current directory) InputFileName.txt.rmse.
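The score reported by the Perl script is the root mean squared error over all answer/prediction pairs, with "nan" and "corm-nan" predictions replaced by 3.70 before scoring. The sketch below restates only the overall RMSE (not the per-movie and running subtotals) in C++; the function name rmse and its argument layout are assumptions for illustration, with answers and predictions already aligned pair by pair.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical re-statement of the per-pair loop in mpp-rmse.pl:
// "nan" and "corm-nan" predictions are scored as 3.70, as the script does.
double rmse(const std::vector<double>& answers, const std::vector<std::string>& predictions)
{
    if (answers.empty() || answers.size() != predictions.size())
        throw std::invalid_argument("answers and predictions must be non-empty and aligned");

    double error_sum = 0;
    for (std::size_t i = 0; i < answers.size(); ++i) {
        const std::string& p = predictions[i];
        const double pred = (p == "nan" || p == "corm-nan") ? 3.70 : std::stod(p);
        error_sum += (answers[i] - pred) * (answers[i] - pred);  // squared error
    }
    return std::sqrt(error_sum / answers.size());
}
```

For answers {5, 3} and predictions {"3", "nan"}, the squared errors are 4 and 0.49, so the RMSE is sqrt(4.49 / 2), about 1.498.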


mpp-user-reduce script

#!/bin/bash

# Variables global to this module.
declare -r Pgm=`basename $0`;
declare Mode="both";

# This utility reduces a set of movies to be predicted by splitting the
# movie/user pairs of a prediction run according to whether each pair's
# squared prediction error is above or at/below a specified threshold.
# This program is driven by the input file used for the prediction
# run. The majority of the comparative work is done by the Perl
# script called from this script.

# If the first argument to the utility is a -m the next argument
# is interpreted as a mode value. The following arguments are accepted:
#   low:  Output only low-error pairings.
#   high: Output only high-error pairings.
#   both: Output both files.
# The default is for both files to be output.
if [ "$1" = "-m" ]; then
    case $2 in
        low)  Mode="low";;
        high) Mode="high";;
        both) Mode="both";;
        *)    echo -e "$Pgm: Unknown argument to mode switch, \c";
              echo "specify low, high or both.";
              exit 1;;
    esac;
    shift 2;
fi;

# The utility takes four general arguments as follows:
#
#   $1: Inputfile
#   $2: Squared-error threshold value.
#   $3: Root name of output file for pairs at or below the threshold.
#   $4: Root name of output file for pairs above the threshold.

# Verify input file is found.
if [ -z "$1" -o ! -e "$1" ]; then
    echo "$Pgm: Error - Input file not specified.";
    echo "Command format:"
    echo -e "\t$Pgm [-m low|high|both] Inputfile Threshold \c";
    echo -e "LowOutFile HighOutFile";
    exit 1;
fi;
if [ -z "$2" ]; then echo "$Pgm: Error - RMSE threshold not specified."; exit 1; fi;
if [ -z "$3" ]; then echo "$Pgm: Error - Low output filename not specified."; exit 1; fi;
if [ -z "$4" ]; then echo "$Pgm: Error - High output filename not specified."; exit 1; fi;

# Variables global to this module which are dependent on command-line options.
declare -r Input=$1;
declare -r Startdir=`dirname $0`;
declare -r Basename=`basename $1`;
declare -r Answers="$1.answers";
declare -r Predictions="Output/$Basename.predictions";
declare -r Threshold=$2;
declare -r LowOut=$3;
declare -r HighOut=$4;

if [ ! -e "$Answers" ]; then echo "$Pgm: Error - Answers file not found: >$Answers<."; exit 1; fi;
if [ ! -e "$Predictions" ]; then echo "$Pgm - Predictions file not found: >$Predictions<."; exit 1; fi;

# Main body of the program occurs here.
perl -w $Startdir/mpp-user-reduce.pl $Input $Answers $Predictions $Threshold $LowOut $HighOut $Mode;
exit 0

Usage: mpp-user-reduce -m both|low|high InputFile.txt SqErrThrhld LowFileName HighFileName

Example: mpp-user-reduce -m both Data/probe19.txt .0001 lo19 hi19
  Takes as input Data/probe19.txt (movieID lines with interleaved userIDs, i.e. the .txt format).
  SqErrThrhld = .0001: if SqErr ≤ .0001 the pair is put in lo19.txt, else it is put in hi19.txt.
  -m both means both the lo and hi files will be produced (other options: low or high).
  Puts as output the lo and hi files (each with a matching .txt.answers file).


mpp-user-reduce.pl

$Input = $ARGV[0];
$Answers = $ARGV[1];
$Predictions = $ARGV[2];
$Threshold = $ARGV[3];
$LowOut = $ARGV[4];
$HighOut = $ARGV[5];
$Mode = $ARGV[6];
$Low_Count = 0;
$High_Count = 0;

# Subroutine outputs pairing results for a given collection of user/movie ratings.
# The second argument is a reference to a hash of movie -> ["user answer", ...].
sub Output_Pairing {
    my($file, $rmse_ptr) = @_;
    my($inputfile, $answerfile, $user, $answer);

    # Open input and answer files.
    $inputfile = $file . ".txt";
    print "\t\tInput: $inputfile\n";
    open(NEWINPUT, ">$inputfile") || die "Cannot open new inputfile: $inputfile";
    $answerfile = $file . ".txt.answers";
    print "\t\tAnswers: $answerfile\n";
    open(ANSWERS, ">$answerfile") || die "Cannot open new answer file: $answerfile.";

    # The outer loop runs over the movies in a grouping. The inner
    # loop then runs over the set of inputs and answers for that movie.
    foreach ( keys(%{$rmse_ptr}) ) {
        print NEWINPUT "$_\n";
        print ANSWERS "$_\n";
        foreach ( @{$$rmse_ptr{$_}} ) {
            ($user, $answer) = split;
            print NEWINPUT "$user\n";
            print ANSWERS "$answer\n";
        }
    }
    close(NEWINPUT);
    close(ANSWERS);
    return;
}

# Main program starts here.

# Load input, answers and predictions into arrays which are stored in
# hashes keyed by movie number.
open(INPUT, $Input) || die "Cannot open input: $Input";
while ( <INPUT> ) {
    chomp;
    if ( /:$/ ) { $key = $_; $Input{$key} = []; }
    else { push(@{$Input{$key}}, $_); }
}
close(INPUT);

open(INPUT, $Answers) || die "Cannot open answer file: $Answers";
while ( <INPUT> ) {
    chomp;
    if ( /:$/ ) { $key = $_; $Answers{$key} = []; }
    else { push(@{$Answers{$key}}, $_); }
}
close(INPUT);

open(INPUT, $Predictions) || die "Cannot open predictions file: $Predictions";
while ( <INPUT> ) {
    chomp;
    if ( /:$/ ) { $key = $_; $Predictions{$key} = []; }
    else { push(@{$Predictions{$key}}, $_); }
}
close(INPUT);

foreach ( keys(%Answers) ) {
    my $lp;
    my $error;
    $movie = $_;
    @users = @{$Input{$movie}};
    @ans = @{$Answers{$movie}};
    @pred = @{$Predictions{$movie}};
    for ($lp= 0; $lp <= $#ans; ++$lp) {
        $user = $users[$lp];
        $predict = $pred[$lp];
        # Correct for NAN's and CORM-NAN
        if ($pred[$lp] eq "nan") { print "NAN"; $predict = "3.70"; }
        if ($pred[$lp] eq "corm-nan") { print "CORM-NAN"; $predict = "3.70"; }
        $error = ($ans[$lp] - $predict)**2;
        if ( $error > $Threshold ) {
            $HighRMSE{$movie} = [] if !defined($HighRMSE{$movie});
            push(@{$HighRMSE{$movie}}, "$user $ans[$lp]");
            ++$High_Count;
        } else {
            $LowRMSE{$movie} = [] if !defined($LowRMSE{$movie});
            push(@{$LowRMSE{$movie}}, "$user $ans[$lp]");
            ++$Low_Count;
        }
    }
}

# Output new input and answers files based on the reduced set.
print "Selected movie/user pairings based on RMSE = $Threshold:\n";
if ( ($Mode eq "low") or ($Mode eq "both") ) {
    print "\tLow rmse pairs: ", $Low_Count, "\n";
    Output_Pairing($LowOut, \%LowRMSE);
    print "\n";
}
if ( ($Mode eq "high") or ($Mode eq "both") ) {
    print "\tHigh rmse pairs: ", $High_Count, "\n";
    Output_Pairing($HighOut, \%HighRMSE);
}

# All done.
exit 0;
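The split performed by mpp-user-reduce.pl is per movie/user pair, not per movie: each pair's squared error (answer - prediction)^2 is compared with the threshold and the pair is filed under its movie in either the high or the low group. The sketch below restates that rule in C++. Rated, Groups and split_by_error are hypothetical names, and it assumes NaN predictions have already been replaced (the script uses 3.70).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Rated {                 // one movie/user pair (hypothetical record)
    std::string user;
    int answer;                // known rating
    double prediction;         // predicted rating, NaNs already replaced
};
using Groups = std::map<std::string, std::vector<Rated>>;  // key: "<movieID>:" line

// Pairs whose squared error exceeds `threshold` go to `high`; the rest go to `low`.
void split_by_error(const Groups& in, double threshold, Groups& low, Groups& high)
{
    for (const auto& [movie, rows] : in)
        for (const Rated& r : rows) {
            const double err = (r.answer - r.prediction) * (r.answer - r.prediction);
            (err > threshold ? high : low)[movie].push_back(r);
        }
}
```

With the .0001 threshold from the example above, an exact prediction lands in the low group while a pair predicted 3.5 for a true rating of 2 (squared error 2.25) lands in the high group. The sketch needs C++17 for the structured binding.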


mpp-filter script (for unioning (-M or) and intersecting (-M and) clusters, e.g. to check coverage)

#!/bin/bash
# This is a driver program for implementing a utility for ANDing or
# ORing two input files.

# Variables global to this module.
declare -r Pgm=`basename $0`;
declare Mode="";

# Parse arguments.
while getopts "M:" Arg; do
    case $Arg in
        M) Mode=$OPTARG;;
    esac;
done;

# Sanity checks.
if [ -z "$Mode" ]; then echo "$Pgm: No mode specified."; exit 1; fi;
if [ "$Mode" != "and" -a "$Mode" != "or" ]; then echo "$Pgm: Invalid mode specified - $Mode"; exit 1; fi;

# Verify two filenames are present.
shift `expr $OPTIND - 1`;
if [ $# -ne 2 ]; then echo "$Pgm: Insufficient filenames specified."; exit 1; fi;

# Call Perl to carry out the boolean filtering operation.
exec perl $Pgm.pl $Mode $*;

mpp-filter.pl

# This script implements boolean filtering operations between two input
# files. The results of the filtering operation are output on stdout.
# Two merge modes are supported:
#   AND: A user index is output if it exists for a given movie in both input files.
#   OR:  A movie/user pair is output if it exists in either input file.

$Mode = $ARGV[0];
$Input1 = $ARGV[1];
$Input2 = $ARGV[2];

# The following subroutine loads a file into an associative array. The
# filename to be read is passed to the subroutine as the first argument.
# A reference to the associative array is passed as the second argument.
# If the filename cannot be opened an error exit is taken from the application.
sub Load_File {
    my $key;
    my ($file, $hptr) = @_;

    open(IN, $file) || die "Cannot open file: $file";
    while ( $_ = <IN> ) {
        chomp;
        if ( /:$/ ) { $key = $_; $$hptr{$key} = []; }
        else { push(@{$$hptr{$key}}, $_); }
    }
    close(IN);
    return;
}

# Subroutine outputs a file which has been stored in hashed/array format.
sub Output_File {
    foreach ( keys(%{$_[0]}) ) {
        print "$_\n";
        my @hlist = @{$_[0]{$_}};
        foreach ( @hlist ) {
            print "$_\n";
        }
    }
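The main body of mpp-filter.pl is not shown on the slide, so the sketch below only restates the two merge rules from its header comment: AND keeps a user under a movie when both files list that user for that movie, OR keeps every movie/user pair found in either file. MovieUsers and boolean_filter are hypothetical names; using std::set means duplicates collapse and output order is sorted, which may differ from the Perl version, and movies with no common users are simply omitted under AND (how the Perl script handles that case is not shown).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// "<movieID>:" header line -> set of user lines listed under it.
using MovieUsers = std::map<std::string, std::set<std::string>>;

MovieUsers boolean_filter(const MovieUsers& a, const MovieUsers& b, const std::string& mode)
{
    MovieUsers out;
    if (mode == "and") {
        // Keep a user only if it appears under the same movie in both inputs.
        for (const auto& [movie, users] : a) {
            auto it = b.find(movie);
            if (it == b.end()) continue;
            for (const std::string& u : users)
                if (it->second.count(u)) out[movie].insert(u);
        }
    } else if (mode == "or") {
        // Keep every movie/user pair found in either input.
        out = a;
        for (const auto& [movie, users] : b)
            out[movie].insert(users.begin(), users.end());
    } else {
        throw std::invalid_argument("mode must be \"and\" or \"or\"");
    }
    return out;
}
```

Intersecting two clusters this way shows their overlap, while the union shows how much of a probe set the clusters cover together. The sketch needs C++17 for the structured bindings.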


Appendix 1: additional codes

$ ls -l            (all entries owned by perrizo, group faculty; directories marked with /)
 259   Nov 29 11:22  cluster-corr
1.2K   Nov 29 11:22  cluster-corr.pl
7.7K   Feb  1 12:26  config
 14K   Nov 29 11:22  config.c
1.4K   Nov 29 11:22  config.h
5.6K   Nov 29 11:25  config.o
 38K   Nov 29 11:25  config-parser.c
 806   Nov 29 11:22  config-parser.l
 15K   Nov 29 11:25  config-parser.o
2.4K   Nov 29 11:22  cosupport.C
 12K   Feb  2 12:40  Data/
4.0K   Nov 29 11:25  libPTree/
4.1K   Nov 29 11:22  Makefile
 16K   Nov 29 11:25  movie-corr
1.3K   Nov 29 11:22  movie-corr.C
2.3K   Nov 29 11:22  MovieCorrelation.C
1.4K   Nov 29 11:22  MovieCorrelation.H
9.6K   Nov 29 11:25  MovieCorrelation.o
3.3K   Nov 29 11:25  movie-corr.o
1.5K   Nov 29 11:22  movie-rating.C
1.5K   Nov 29 11:22  movie-set.C
2.0K   Nov 29 11:22  MovieSet.C
1.1K   Nov 29 11:22  MovieSet.H
4.2K   Nov 29 11:25  MovieSet.o
 14K   Jan 19 07:07  movie-vote.C
9.7K   Jan 19 07:07  movie-vote.o
 303   Nov 29 11:22  mpp
1.3K   Nov 29 11:22  mpp-cluster-list
2.5K   Nov 29 11:22  mpp-cluster-list.pl
1.7K   Nov 29 11:22  mppConfig.C
1.1K   Nov 29 11:22  mppConfig.H
2.9K   Nov 29 11:25  mppConfig.o
 745   Dec  5 11:32  mpp-filter
3.0K   Dec  5 11:32  mpp-filter.pl
2.3K   Nov 29 11:22  mpp-glue
 591   Nov 29 11:22  mpp.h
101K   Feb  2 12:38  mpp-mpred
 13K   Nov 29 11:22  mpp-mpred.C
 29K   Nov 29 11:25  mpp-mpred.o
1.4K   Nov 29 11:22  mpp-rmse
1.5K   Nov 29 11:22  mpp-rmse.pl
6.9K   Jan 19 06:36  mpp-user.C
1.3K   Nov 29 11:22  mpp-user-cluster
3.8K   Nov 29 11:22  mpp-user-cluster.pl
 11K   Jan 19 07:07  mpp-user.o
2.5K   Jan 21 17:38  mpp-user-reduce
3.2K   Jan 21 17:38  mpp-user-reduce.pl
2.3M   Feb  2 13:13  Output/
4.0K   Jan  8 13:28  p19
4.0K   Jan  8 13:34  p95
4.0K   Jan 31 10:51  pf
 22K   Nov 29 11:22  PredictionConfig.C
5.3K   Nov 29 11:22  PredictionConfig.H
 22K   Nov 29 11:25  PredictionConfig.o
 19K   Feb  2 12:38  prune.C
 29K   Feb  2 12:38  prune.o
1.2K   Nov 29 11:22  read-user-ptrees.C
 146   Nov 29 13:59  run
 74K   Dec 16 20:47  show-config
 454   Nov 29 11:22  show-config.C
2.7K   Nov 29 11:25  show-config.o
6.4K   Nov 29 11:22  UserSet.C
1.5K   Nov 29 11:22  UserSet.H
7.2K   Nov 29 11:25  UserSet.o
 17K   Jan 19 06:57  user-vote.C
9.3K   Jan 19 07:07  user-vote.o

$ ls -l Data       (12 files, -rw-r--r-- perrizo faculty, Dec 18 01:32/01:33)
sizes, in listing order: 67, 23, 533K, 146K, 1.9K, 611, 23K, 6.4K, 594K, 162K, 51K, 14K
includes: p1.txt.answers, probe-1000.txt.answers, probe19.txt.answers, probe95.txt.answers,
          test-probe-1000.txt.answers, test-probe-100.txt.answers

$ ls -l libPTree   (15 files, -rw-r--r-- perrizo faculty, Nov 29 11:22/11:25)
sizes, in listing order: 18672, 3192, 15813, 2973, 11096, 18135, 3796, 488, 779, 7485, 1179, 6464, 2265, 420, 16127
files: libPTree.a, Makefile, PTree.C, PTree.H, PTree.o, PTree-omp.C, ptree-op-test.C, ptree-read.C,
       ptree-save.C, PTreeSet.H, PTreeSet.o, ptreeset-read.C, ptree-test.C, PTree-x86_64.C, PTree-x86.C

$ ls -l Output
...
-rw-r--r--  1 perrizo faculty    32157 Feb  2 13:25 probe-full.txt_9939.predict
...
drwxr-xr-x  2 perrizo faculty   901120 Jan 20 06:23 probe-full.txt.backup
-rw-r--r--  1 perrizo faculty  7059980 Jan 20 06:23 probe-full.txt.predictions


Makefile

VERSION = 2.6.0

# Default directory where PTree data is stored.
# Overridden below depending on architecture.
PTREEDATA = /tmp

# Set compiler behavior based on architecture.
ARCH := $(shell uname -m | sed -e s/i686/x86/)

ifeq (${ARCH},x86_64)
COMPILER = gcc
# COMPILER = gcc4
PTREEDATA = /scratch/perrizo
endif

ifeq (${ARCH},ia64)
# COMPILER = intel
COMPILER = gcc4
endif

ifeq (${ARCH},x86)
COMPILER = gcc
endif

# Only configure a toolchain when COMPILER is non-empty.
ifneq (${COMPILER},)

ifeq (${COMPILER},gcc4)
CC = /opt/gcc4/bin/gcc
C++ = /opt/gcc4/bin/g++
# WARNINGS = -W -Wall -Wchar-subscripts -Wshadow -Wpointer-arith -Wwrite-strings -Wmissing-prototypes
# VECTOR = -ftree-vectorizer-verbose=5
OPT = -O2 ${VECTOR}
ifeq (${ARCH},x86_64)
OPT += -msse2
endif
C_DEBUG = -g -pg
LD_DEBUG = -g -pg
endif

ifeq (${COMPILER},gcc)
CC = gcc
C++ = g++
# WARNINGS = -W -Wall -Wchar-subscripts -Wshadow -Wpointer-arith -Wwrite-strings -Wmissing-prototypes
# VECTOR = -ftree-vectorizer-verbose=5
OPT = -O2 ${VECTOR}
ifeq (${ARCH},x86_64)
OPT += -msse2
endif
C_DEBUG = -g -pg
LD_DEBUG = -g -pg
endif

ifeq (${COMPILER},pgroup)
CC = pgcc
C++ = pgCC
OPT = -fast -Minline=levels:10
C_DEBUG = -g -Minfo #-pg
LD_DEBUG = -g -tp core2-64 #-pg
endif

ifeq (${COMPILER},intel)
CC = icpc
C++ = icpc
OPT = -O2
C_DEBUG = -g -p
LD_DEBUG = -g -p
endif

INCLUDES = -I./libPTree
CFLAGS = ${OPT} ${WARNINGS} ${INCLUDES}
ifdef DEBUG
CFLAGS += ${C_DEBUG}
endif
ifdef DEBUG
LDFLAGS += ${LD_DEBUG}
endif

OBJS = mpp-mpred.o mpp-user.o mppConfig.o PredictionConfig.o MovieCorrelation.o \
       UserSet.o MovieSet.o movie-vote.o user-vote.o prune.o config-parser.o
LIB = ./libPTree.a
LIBS = -lfl -L./libPTree -lPTree

# Executable target definitions.
all: mpp-mpred show-config movie-corr

mpp-mpred: ${OBJS} ${LIB}
	${C++} ${LDFLAGS} -o $@ $^ ${LIBS};

cosupport: cosupport.o UserSet.o MovieSet.o ${LIB}
	${C++} ${LDFLAGS} -o $@ $^ ${LIBS};

tools: movie-rating movie-set

movie-rating: movie-rating.o UserSet.o MovieSet.o ${LIB}
	${C++} ${LDFLAGS} -o $@ $^ ${LIBS}

movie-set: movie-set.o UserSet.o ${LIB}
	${C++} ${LDFLAGS} -o $@ $^ ${LIBS};

movie-corr: movie-corr.o MovieCorrelation.o ${LIB}

cosupport.C

/**
 * This file contains a driver program to
 * determine the rating given to
 * a movie by a user.
 */

/* Standard include files. */
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

/* Local include files. */
#include "UserSet.H"
#include "MovieSet.H"

extern int main(int argc, char *argv[])
{
    auto MovieSet Movies;
    auto UserSet Users;

    fputs("Loading user PTree's.\n", stdout);
    if ( !Movies.load_binary() ) {
        fputs("Cannot load binary PTree's.\n", stderr);
        return 1;
    }
    fputs("Loading movie PTree's.\n", stdout);
    if ( !Users.load_binary() ) {
        fputs("Cannot load binary PTree's.\n", stderr);
        return 1;
    }
    fputs("Loading user identities.\n\n", stdout);
    if ( !Users.load_identities() ) {
        fputs("Cannot load user identities.\n", stderr);
        return 1;
    }

    if ( argv[1] == NULL ) {
        fputs("Need V specified.\n", stderr);
        return 1;
    }

    /* Target user U and movie M are fixed; voter V is read from the command line. */
    auto unsigned long int U = 421582, M = 0, V = strtoul(argv[1], NULL, 10);
    auto PTree M_support = Movies.get_users(M);
    auto PTree Voters(M_support);
    Voters.clearbit(U);
    unsigned long int *voters = Voters.get_indexes();

    fputs("Voter list:\n", stdout);
    for (size_t voter= 0; voter < Voters.get_count(); ++voter)
        fprintf(stdout, "%zu: %lu\n", voter, voters[voter]);
    fputc('\n', stdout);

    auto PTree cosupport;
    fputs("Voter Map:\n", stdout);
    Voters.dump(stdout);

    fputs("U Map:\n", stdout);
    (Users.get_movies(U)).dump(stdout);
    fputs("V Map:\n", stdout);
    (Users.get_movies(V)).dump(stdout);

    /* Cosupport: movies rated by both U and V, excluding M itself. */
    cosupport = Users.get_movies(U) & Users.get_movies(V);
    fputs("Cosupport Map:\n", stdout);
    cosupport.dump(stdout);
    cosupport.clearbit(M);

    fprintf(stdout, "Cosupport, M= %lu, U = %lu, V = %lu\n", M, U, V);
    auto double Ubar = Users.get_mean(U, cosupport),
                Vbar = Users.get_mean(V, cosupport),
                Vrt  = Users.get_rating(V, M);
    auto double vote = Vrt - Vbar + Ubar;
    auto unsigned long int *movies = cosupport.get_indexes();
    for (unsigned long int movie= 0; movie < cosupport.get_count(); ++movie)
        fprintf(stdout, "\t\t%lu [%lu]:\tU = %0.2f, V = %0.2f\n",
                Movies.get_identity(movies[movie]), movies[movie],
                Movies.get_rating(U, movies[movie]),
                Movies.get_rating(V, movies[movie]));
    fprintf(stdout, "\t\t\t%.2f\t[Vrt: %.2f Vbar: %.2f Ubar: %.2f]\n",
            vote, Vrt, Vbar, Ubar);

    return 0;
}
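The vote printed at the end of cosupport.C adjusts V's rating of M by the gap between the two users' mean ratings over their cosupport. The sketch below reproduces that arithmetic on a small dense matrix, assuming `get_mean` averages a user's ratings over the movies set in the cosupport PTree. `RatingMatrix`, `mean_over` and `cosupport_vote` are hypothetical stand-ins for the PTree-backed `UserSet`/`MovieSet` calls, and 0 is treated as "not rated", as in the Netflix data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense stand-in for the PTree data: ratings[user][movie],
// where 0 means "did not rate", not a low rating.
using RatingMatrix = std::vector<std::vector<int>>;

// Mean of user u's ratings over the movies flagged in `cosupport`
// (assumed to match what UserSet::get_mean(U, cosupport) computes).
static double mean_over(const RatingMatrix &r, std::size_t u,
                        const std::vector<bool> &cosupport)
{
    double sum = 0.0;
    std::size_t count = 0;
    for (std::size_t m = 0; m < cosupport.size(); ++m) {
        if (cosupport[m]) {
            sum += r[u][m];
            ++count;
        }
    }
    return count ? sum / count : 0.0;
}

// Mean-adjusted vote of voter v for the pair (m, u):
//   vote = rating(V, M) - Vbar + Ubar
// with both means taken over the movies rated by both users, minus M.
double cosupport_vote(const RatingMatrix &r, std::size_t m,
                      std::size_t u, std::size_t v)
{
    std::vector<bool> cosupport(r[u].size());
    for (std::size_t k = 0; k < cosupport.size(); ++k)
        cosupport[k] = r[u][k] != 0 && r[v][k] != 0;  // AND of the two supports
    cosupport[m] = false;                               // like cosupport.clearbit(M)

    const double Ubar = mean_over(r, u, cosupport);
    const double Vbar = mean_over(r, v, cosupport);
    const double Vrt = r[v][m];
    return Vrt - Vbar + Ubar;
}
```

For U = {0, 4, 2, 5} and V = {3, 5, 3, 1} with M = 0, the cosupport is movies 1 to 3, so Ubar = 11/3, Vbar = 3 and the vote is 3 - 3 + 11/3, about 3.67.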

movie-corr.C

/** \file
 * This file implements a program for
 * printing movie-movie correlations.
 */

/* Standard include files. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Local include files. */
#include "MovieCorrelation.H"

/* Program entry point. */
extern int main(int argc, char *argv[])
{
    auto bool dump = false;
    auto int gopt;
    auto unsigned int target = 0, movie = 0;
    auto MovieCorrelation mvcorr;

    while ( (gopt = getopt(argc, argv, "dm:t:")) != EOF ) {
        switch ( gopt ) {
        case 'd':
            dump = true;
            break;
        case 'm':
            movie = atoi(optarg);
            break;
        case 't':
            target = atoi(optarg);
            break;
        }
    }

    if ( movie == 0 ) {
        fputs("movie-corr: No movie specified.\n", stderr);
        return 1;
    }
    if ( !mvcorr.load(movie) ) {
        printf("Error loading movies.\n");
        return 1;
    }

    /* Dump movies and correlations: correlation first, then support. */
    if ( dump ) {
        fprintf(stdout, "Correlations for movie: %u\n", movie);
        for (unsigned int lp= 0; lp < MOVIE_COUNT; ++lp)
            fprintf(stdout, "\t%5u: %7.4f / %d\n", lp + 1,
                    mvcorr.corr(lp), mvcorr.supp(lp));
        return 0;
    }

    /* Print correlation of target movie. */
    if ( target > 0 ) {
        fprintf(stdout, "%-7.4f\n", mvcorr.corr(target - 1));
        return 0;
    }

    return 0;
}

mpp

#!/bin/bash
# Run mpp-mpred in the background and follow its output.
if [ "$1" != "-i" ]; then
    echo "No input file specified.";
    exit 1;
fi;
shift;
inputfile="$1";
shift;
run_name=`basename $inputfile`;
rm -f $run_name.out;
./mpp-mpred -i $inputfile $* >"$run_name.out" 2>&1 &
while [ ! -e "$run_name.out" ]; do sleep 1s; done;
tail -f "$run_name.out";
exit;

mpp.h

/** \file
 * This file contains general definitions and
 * defines for the PTree
 * based Netflix prediction system.
 */

/* External variable declarations. */
extern UserSet Users;
extern MovieSet Movies;

/* Function declarations. */
extern void do_pruning(struct external_prune * const prune, unsigned long int M,
                       unsigned long int U, PTree &supM, PTree &supU);
double user_vote(PredictionConfig *, unsigned long int, PTree &);
double movie_vote(PredictionConfig *, unsigned long int, PTree &);
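movie-corr only prints precomputed values read from the mv_corr and mv_supp files; how those files were generated is not shown. As an assumption, the sketch below computes a Pearson correlation between two movies over the users who rated both, together with the co-rater count that plays the role of the support value. `movie_correlation` and `CorrResult` are hypothetical names, and the real files may use a different correlation measure.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Correlation of two movies over their common raters.
struct CorrResult {
    double corr;          // Pearson correlation, 0 when undefined
    std::size_t support;  // number of users who rated both movies
};

// a[u] and b[u] are the ratings user u gave to movies A and B (0 = not rated).
CorrResult movie_correlation(const std::vector<int> &a, const std::vector<int> &b)
{
    double sa = 0, sb = 0, saa = 0, sbb = 0, sab = 0;
    std::size_t n = 0;
    for (std::size_t u = 0; u < a.size() && u < b.size(); ++u) {
        if (a[u] == 0 || b[u] == 0)
            continue;  // only users who rated both movies contribute
        sa += a[u];
        sb += b[u];
        saa += a[u] * a[u];
        sbb += b[u] * b[u];
        sab += a[u] * b[u];
        ++n;
    }
    CorrResult res{0.0, n};
    if (n < 2)
        return res;
    const double cov = sab - sa * sb / n;   // n times the covariance
    const double va = saa - sa * sa / n;    // n times the variance of A
    const double vb = sbb - sb * sb / n;    // n times the variance of B
    if (va > 0 && vb > 0)
        res.corr = cov / std::sqrt(va * vb);
    return res;
}
```

The support counter here is a `std::size_t`; the on-disk support vector in MovieCorrelation uses `unsigned short`, which caps at 65535 co-raters.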

MovieCorrelation.C

/** \file
 * This file contains the implementation of a class
 * which encapsulates management of correlation info
 * for a particular movie to all other movies.
 */

/* System include files. */
#include <stdlib.h>

/* Standard C++ includes. */
#include <string>
#include <iostream>
#include <fstream>

/* Local include files. */
#include "MovieCorrelation.H"

using namespace std;

MovieCorrelation::MovieCorrelation(void)
{
    movie_index = 0;

    /* Initialize correlation and support count (arrays hold MOVIE_COUNT + 1 entries). */
    for (unsigned int lp= 0; lp < MOVIE_COUNT + 1; ++lp) {
        support[lp] = 0;
        correlations[lp] = 0.0;
    }

    return;
}

/**
 * Destructor.
 */
MovieCorrelation::~MovieCorrelation(void)
{
    return;
}

/**
 * Public method.
 * Implements loading of correlation and support vector for given movie.
 * \param index The index number of the movie to be loaded.
 * \return A boolean value is used to indicate the success
 *         or failure of the load. A true value indicates success.
 */
bool MovieCorrelation::load(unsigned long int index)
{
    auto char snbufr[10];
    auto string root = PTREEDATA"/mpred-data/",
                corr_path = root + "mv_corr/co_mv_",
                supp_path = root + "mv_supp/sp_mv_";
    auto ifstream corr_file, supp_file;

    /* Sanity check for movie index size. */
    if ( index > (MOVIE_COUNT + 1) )
        return false;
    movie_index = index;

    /* Synthesize the filename of the correlations file and read it. */
    snprintf(snbufr, sizeof(snbufr), "%lu", movie_index);
    string sn(snbufr);
    string corr_fname = corr_path + sn + ".bin";
    corr_file.open(corr_fname.c_str(), ios::in | ios::binary);
    if ( corr_file.fail() ) {
        corr_file.close();
        return false;
    }
    corr_file.read(reinterpret_cast<char*>(&correlations),
                   (MOVIE_COUNT + 1)*sizeof(float));
    if ( corr_file.fail() ) {
        corr_file.close();
        return false;
    }
    corr_file.close();

    /* Synthesize the filename of the support file and read it. */
    string supp_fname = supp_path + sn + ".bin";
    supp_file.open(supp_fname.c_str(), ios::in | ios::binary);
    if ( supp_file.fail() ) {
        supp_file.close();
        return false;
    }
    supp_file.read(reinterpret_cast<char*>(&support),
                   (MOVIE_COUNT + 1)*sizeof(short int));
    if ( supp_file.fail() ) {
        supp_file.close();
        return false;
    }
    supp_file.close();

    return true;
}
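MovieCorrelation::load reads each vector as raw, native-endian bytes in a single `read` call and treats a short read as a failure. The generic helper below isolates that pattern so it can be exercised against an in-memory stream. `read_raw_vector` is a hypothetical name, and it assumes the files were written on a machine with the same type sizes and byte order as the reader.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads exactly `count` values of type T stored as raw bytes, in the same way
// as the corr_file.read(...) and supp_file.read(...) calls above.
// Returns false, leaving `out` untouched, on a short or failed read.
template <typename T>
bool read_raw_vector(std::istream &in, std::size_t count, std::vector<T> &out)
{
    std::vector<T> buf(count);
    const std::streamsize want = static_cast<std::streamsize>(count * sizeof(T));
    in.read(reinterpret_cast<char *>(buf.data()), want);
    if (!in || in.gcount() != want)
        return false;
    out.swap(buf);
    return true;
}
```

Reading into a temporary buffer and swapping only on success means a truncated file never leaves a half-filled vector behind, which the in-place `read` into `correlations` cannot guarantee.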

MovieCorrelation.H

#if !defined(MOVIECORRELATION_H)
#define MOVIECORRELATION_H

/* Total number of movies. */
#define MOVIE_COUNT 17770

/* Standard include files. */
#include <stdio.h>

/* Local include files. */

class MovieCorrelation
{
    private:
        /* The index number of the movie whose correlations are loaded. */
        unsigned long int movie_index;

        /*
         * The following array contains the list of correlations for
         * a movie to all the other movies. The array is one based
         * so a value of one needs to be added to the movie index
         * number to retrieve the correlation.
         */
        float correlations[MOVIE_COUNT + 1];

        /*
         * The following array contains the support list for the
         * correlations vector. The vector is one based as is the
         * correlations vector.
         */
        unsigned short int support[MOVIE_COUNT + 1];

    public:
        /* Void constructor. */
        MovieCorrelation(void);

        /* Destructor. */
        ~MovieCorrelation(void);

        /*
         * Inline accessor methods for returning movie supports and
         * correlations. Valid indexes are 0 to MOVIE_COUNT - 1.
         */
        float inline corr(unsigned int index) {
            if ( index >= MOVIE_COUNT )
                return 0;
            return correlations[index + 1];
        }

        unsigned short inline supp(unsigned int index) {
            if ( index >= MOVIE_COUNT )
                return 0;
            return support[index + 1];
        }

        /* Public method for loading the correlation vector for a movie. */
        bool load(unsigned long int);
};
#endif
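Because the correlations and support arrays hold MOVIE_COUNT + 1 entries with slot 0 unused, a 0-based movie index i maps to slot i + 1, and only indexes 0 to MOVIE_COUNT - 1 are valid. A minimal stand-alone illustration of that mapping and its bounds check, using a hypothetical `OneBasedCorrelations` struct rather than the class itself:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kMovieCount = 17770;  // same value as MOVIE_COUNT

// Hypothetical stand-in for the one-based correlations[] member: slot 0 is
// unused, and 0-based movie index i (as used by movie-corr) lives in slot i + 1.
struct OneBasedCorrelations {
    std::array<float, kMovieCount + 1> slots{};  // value-initialized to 0.0f

    // Out-of-range indexes return 0 instead of reading past the array.
    float corr(std::size_t index) const
    {
        if (index >= kMovieCount)
            return 0.0f;
        return slots[index + 1];
    }
};
```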

MovieSet.C

/* System include files. */
#include <limits.h>

/* Local include files. */
#include "MovieSet.H"

/* Variables static to this module. */

/* No argument constructor. */
MovieSet::MovieSet(void) : ptree_set() {return;}

/* Destructor. */
MovieSet::~MovieSet(void) {return;}

/* Public method calculates the rating a user provided for a movie.
 * \param user_index The index number of the user.
 * \param movie_index The index number of the movie.
 * \return The rating number is returned to the caller (0 = not rated).
 */
double MovieSet::get_rating(unsigned long int user_index, unsigned long int movie_index)
{
    auto double rating = 0;
    auto size_t slot = movie_index * 3;

    /* Each movie has three bit-slice PTrees; slot + 2 holds the low bit. */
    for (int tree= 2, bit= 0; tree >= 0; --tree, ++bit) {
        if ( ptree_set[slot + tree].is_set(user_index) )
            rating += pow(2.0, bit);
    }
    return rating;
}

/* Public method returns PTree describing
 * the set of users who rated the movie. */
PTree MovieSet::get_users(unsigned long int index)
{
    auto size_t slot = index * 3;
    return ptree_set[slot] | ptree_set[slot+1] | ptree_set[slot+2];
}

/* Public method
 * \param output Descriptor to which the PTree's are to be directed. */
void MovieSet::dump(FILE *output)
{
    for (int lp= 0; lp < ptree_set.size(); ++lp)
        ptree_set[lp].dump(output);
    return;
}

/* Public method loads binary PTree set which has as its
 * X-axis user indexes with movie rating PTree's on Y-axis. */
bool MovieSet::load_binary(void)
{
    auto char bufr[PATH_MAX];
    auto FILE *input;

    for (int pt= 22; pt <= 53331; ++pt) {
        snprintf(bufr, sizeof(bufr), "%s/mpred-data/nf_us_mv_pt/p%d.pct", PTREEDATA, pt);
        if ( (input = fopen(bufr, "r")) == NULL )
            return false;
        if ( !ptree_set.load_binary_file(input) ) {
            fclose(input);
            return false;
        }
        fclose(input);
    }
    return true;
}

MovieSet.H

#if !defined(MOVIESET_H)
#define MOVIESET_H

/* Standard include files. */
#include <stdio.h>
#include <math.h>

/* Local include files. */
#include "PTreeSet.H"

class MovieSet
{
    private:
        PTreeSet ptree_set;

    public:
        /* Void constructor. */
        MovieSet(void);

        /* Constructor to initialize an in-memory tree. */

        /* Destructor. */
        ~MovieSet(void);

        /* Public inline method to return identity of movie index. */
        unsigned long int get_identity(unsigned long int offset) {
            return offset + 1;
        }

        /* Public inline method to return index of movie identity. */
        unsigned long int get_index(unsigned long int identity) {
            return identity - 1;
        }

        /* Public method to return rating of movie by user. */
        double get_rating(unsigned long int, unsigned long int);

        /* Public method to return set of users rating movie. */
        PTree get_users(unsigned long int);

        /* Public method to print sparseness of set. */
        void dump(FILE *);

        /* Public method to load a binary PTree set. */
        bool load_binary(void);
};
#endif
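MovieSet stores each movie's ratings as three PTree bit-slices in consecutive slots, so a rating from 1 to 5 is the sum of the set bits and a user who rated the movie has at least one bit set (which is why get_users ORs the three slices). The sketch below models the slices with `std::vector<bool>` instead of PTrees; `BitPlane`, `decode_rating`, `raters` and `encode_ratings` are hypothetical helpers that mirror get_rating and get_users.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one movie's three PTree bit-slices. As in
// MovieSet::get_rating, plane[0] carries the value 4, plane[1] the value 2
// and plane[2] the value 1; each plane is indexed by user.
using BitPlane = std::vector<bool>;

// Mirrors MovieSet::get_rating: sums the bits set for this user (0 = not rated).
int decode_rating(const BitPlane plane[3], std::size_t user)
{
    int rating = 0;
    for (int tree = 2, bit = 0; tree >= 0; --tree, ++bit)
        if (plane[tree][user])
            rating += 1 << bit;
    return rating;
}

// Mirrors MovieSet::get_users: a user rated the movie if any bit is set.
BitPlane raters(const BitPlane plane[3])
{
    BitPlane out(plane[0].size());
    for (std::size_t u = 0; u < out.size(); ++u)
        out[u] = plane[0][u] || plane[1][u] || plane[2][u];
    return out;
}

// Builds the three slices from a vector of ratings in the range 0..5.
void encode_ratings(const std::vector<int> &ratings, BitPlane plane[3])
{
    for (int t = 0; t < 3; ++t)
        plane[t].assign(ratings.size(), false);
    for (std::size_t u = 0; u < ratings.size(); ++u)
        for (int tree = 2, bit = 0; tree >= 0; --tree, ++bit)
            plane[tree][u] = ((ratings[u] >> bit) & 1) != 0;
}
```

Three slices per movie also explains the file range in load_binary: 17,770 movies times 3 gives the 53,310 files numbered p22 to p53331.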

mppConfig.C

/** \file
 * File contains the implementation of a class which encapsulates
 * the info needed to configure a prediction run. The purpose of the
 * class is to abstract out the difference between a single config
 * run and a run based on a cluster of configurations.
 */

/* System include files. */

/* Local include files. */
#include "mppConfig.H"

/* No argument constructor. */
mppConfig::mppConfig(void)
{
    standard_config = false;
    standard = NULL;
    cluster_config = false;
    return;
}

/* Destructor. */
mppConfig::~mppConfig(void)
{
    if ( standard != NULL )
        delete standard;
    return;
}

/*
 * Public method causes the object to be initialized as a standard
 * single file configuration.
 * \param cfgfile Pointer to buffer containing the name of the standard
 *                configuration file.
 * \return If initialization of the configuration is successful a boolean
 *         true value is returned. Otherwise a false value is returned.
 */
bool mppConfig::read_config(const char * const cfgfile)
{
    standard = new PredictionConfig;
    if ( standard == NULL )
        return false;
    if ( !standard->read_config(cfgfile) )
        return false;
    standard_config = true;
    return true;
}

/* Public method causes the object to be initialized from a cluster
 * configuration file.
 * \param cfgfile Pointer to buffer containing the name of the
 *                cluster configuration file.
 * \return Cluster configurations are not yet implemented, so a
 *         false value is always returned.
 */
bool mppConfig::read_cluster_config(const char * const cfgfile)
{
    return false;
}

mppConfig.H

#if !defined(MPPCONFIG_H)
#define MPPCONFIG_H

/* Standard include files. */
#include <stdio.h>

/* Local include files. */
#include "PredictionConfig.H"

class mppConfig
{
    private:
        bool standard_config,
             cluster_config;
        PredictionConfig *standard;

    public:
        /* Void constructor. */
        mppConfig(void);

        /* Destructor. */
        ~mppConfig(void);

        /* Public inline accessor methods to determine if a standard
         * or cluster configuration is being used. */
        inline bool is_standard_config(void) {return standard_config;}
        inline bool is_cluster_config(void) {return cluster_config;}

        /* Public inline accessor method for the standard configuration. */
        inline PredictionConfig *get_standard_config(void) {return standard;}

        /* Public method to read a configuration file. */
        bool read_config(const char * const);

        /* Public method to read a cluster configuration file. */
        bool read_cluster_config(const char * const);

        /* Public method to print out a configuration. */
        void print(FILE *);
};
#endif

PredictionConfig.C

/** \file
 * File contains the implementation of a class which encapsulates
 * the info which regulates how Movie/User pair predictions are made.
 */

/* System include files. */
#include <stdlib.h>
#include <string.h>

/* Local include files. */
#include "PredictionConfig.H"

extern "C" {
#include "config.h"
}

/* Internal private function.
 * This function initializes an internal pruning structure.
 * \param p A pointer to the structure to be initialized.
 */
static void _init_internal_prune(struct pruning *p)
{
    p->enabled = false;
    p->weight = false;
    p->threshold = 0.0;
    p->exponent = 1.0;
    return;
}

/* Internal private function.
 * This function initializes a structure defining external pruning.
 * \param p A pointer to the structure to be initialized.
 */
static void _init_external_prune(struct external_prune *p)
{
    p->enabled = false;
    p->method = UserPrune;
    p->params.mstrt = 0;
    p->params.mstrt_mult = 0.0;
    p->params.ustrt = 0;
    p->params.ustrt_mult = 0.0;
    p->params.TSa = -100;
    p->params.TSb = -100;
    p->params.Tdvp = -1;
    p->params.Tdvs = -1;
    p->params.Tvdp = -1;
    p->params.Tvds = -1;
    p->params.TD = -1;
    p->params.TP = -1;
    p->params.PPm = 0.1;
    p->params.TV = -1;
    p->params.TSD = -1;
    p->params.Ch = 1;
    p->params.Ct = 2;
    return;
}

/* No argument constructor. */
PredictionConfig::PredictionConfig(void)
{
    /* Initialize general prediction parameters. */
    name = NULL;
    user_voting = false;
    movie_voting = false;
    user_vote_weight = 1;

    /* Initialize user voting parameters. */
    user_force_vote_in_Voter_Loop = false;
    user_force_vote_after_Voter_Loop = false;
    user_reset_support = false;
    user_boundary_override = false;
    user_facz = 0.0;
    user_thrz = 1.0;
    _init_internal_prune(&dvCorp);
    _init_internal_prune(&dvCors);
    _init_internal_prune(&vdCorp);
    _init_internal_prune(&vdCors);
    _init_internal_prune(&pCor);
    _init_internal_prune(&dCor);
    _init_internal_prune(&sCor);
    _init_internal_prune(&dUVsdp);
    _init_internal_prune(&dUVsds);
    _init_internal_prune(&Vsdp_Usdp);
    _init_internal_prune(&Vsds_Usds);
    _init_external_prune(&Prune_Users_in_SupM);
    _init_external_prune(&Prune_Movies_in_SupU);
    _init_external_prune(&Prune_Movies_in_CoSupUV);

    /* Initialize movie voting parameters. */
    movie_force_vote_in_Voter_Loop = false;
    movie_force_vote_outside_Voter_Loop = false;
    movie_reset_support = false;
    movie_boundary_override = false;
    movie_facz = 0.0;
    movie_thrz = 1.0;
    _init_internal_prune(&DVCorp);
    _init_internal_prune(&DVCors);
    _init_internal_prune(&VDCorp);
    _init_internal_prune(&VDCors);
    _init_internal_prune(&PCor);
    _init_internal_prune(&DCor);
    _init_internal_prune(&SCor);
    _init_internal_prune(&dMNsdp);
    _init_internal_prune(&dMNsds);
    _init_internal_prune(&Nsdp_Msdp);
    _init_internal_prune(&Nsds_Msds);
    _init_external_prune(&Movie_Prune_Users_in_SupM);
    _init_external_prune(&Movie_Prune_Movies_in_SupU);
    _init_external_prune(&Movie_Prune_Users_in_CoSupMN);
    return;
}

PredictionConfig.C page 2

/* Destructor. */
PredictionConfig::~PredictionConfig(void)
{
    if ( name != NULL )
        free(name);
    return;
}

/* Internal private function determines if a configuration option is enabled.
 * \param cf Pointer to the configuration to be tested for the option.
 * \param var Pointer to the name of the variable to be tested.
 * \return Boolean value returned to indicate whether the
 *         configuration option has been enabled. A true value
 *         indicates the variable is enabled, else false is returned.
 */
static bool _is_enabled(Config cf, const char * const var)
{
    auto char *p;

    p = Config_Get(cf, var);
    if ( p == NULL )
        return false;
    if ( strcmp(p, "enabled") == 0 )
        return true;
    return false;
}

/* Internal private function.
 * Initializes the config structure for an internal pruning method.
 * \param cf The configuration which is being used.
 * \param sp Pointer to the structure to be initialized.
 * \param name Name of the internal pruning method.
 * \param threshold Name of the variable containing the threshold.
 * \param weight Name of the variable specifying whether the
 *               method should be used to set the value of uCor.
 */
void _set_internal_prune(Config cf, struct pruning *sp,
                         const char *name, const char *threshold, const char *weight)
{
    auto char *val;

    sp->enabled = _is_enabled(cf, name);
    if ( !sp->enabled )
        return;
    val = Config_Get(cf, threshold);
    if ( val != NULL )
        sp->threshold = atof(val);
    sp->weight = _is_enabled(cf, weight);
    return;
}

/* Internal private function.
 * Function initializes the config structure for a standard
 * deviation based pruning method.
 * \param cf Configuration which is being used.
 * \param sp Pointer to the structure to be initialized.
 * \param name Name of the internal pruning method.
 * \param threshold Name of the variable containing the threshold.
 * \param exponent Name of the variable specifying the exponent
 *                 which the GAUSSIAN method should use to set
 *                 the value of uCor.
 */
void _set_stddev_prune(Config cf, struct pruning *sp,
                       const char *name, const char *threshold, const char *exponent)
{
    auto char *val;

    sp->enabled = _is_enabled(cf, name);
    if ( !sp->enabled )
        return;
    val = Config_Get(cf, threshold);
    if ( val != NULL )
        sp->threshold = atof(val);
    val = Config_Get(cf, exponent);
    if ( val != NULL )
        sp->exponent = atof(val);
    return;
}

/*
 * Internal private function.
 * Initializes the configuration structure for an external pruning method.
 * \param cf The configuration which is being used.
 * \param sp A pointer to the external pruning definition structure
 *           which is to be initialized.
 * \param name The name of the external pruning method.
 */
void _set_external_prune(Config cf, struct external_prune *sp, const char *name)
{
    auto char *val;
    auto struct pruning_parameters *pp = &sp->params;

    if ( !Config_Set_Section(cf, name) )
        return;

    val = Config_Get(cf, "method");
    if ( val != NULL ) {
        if ( strcmp(val, "UserPrune") == 0 )
            sp->method = UserPrune;
        if ( strcmp(val, "UserFastPrune") == 0 )
            sp->method = UserFastPrune;
        if ( strcmp(val, "UserCommonCoSupportPrune") == 0 )
            sp->method = UserCommonCoSupportPrune;
        if ( strcmp(val, "MoviePrune") == 0 )
            sp->method = MoviePrune;
        if ( strcmp(val, "MovieFastPrune") == 0 )
            sp->method = MovieFastPrune;
        if ( strcmp(val, "MovieCommonCoSupportPrune") == 0 )
            sp->method = MovieCommonCoSupportPrune;
    }

    /* Set the external pruning parameters. */
    val = Config_Get(cf, "mstrt");
    if ( val != NULL ) pp->mstrt = atoll(val);
    val = Config_Get(cf, "mstrt_mult");
    if ( val != NULL ) pp->mstrt_mult = atof(val);
    val = Config_Get(cf, "ustrt");
    if ( val != NULL ) pp->ustrt = atoll(val);
    val = Config_Get(cf, "ustrt_mult");
    if ( val != NULL ) pp->ustrt_mult = atof(val);
    val = Config_Get(cf, "TSa");
    if ( val != NULL ) pp->TSa = atof(val);
    val = Config_Get(cf, "TSb");
    if ( val != NULL ) pp->TSb = atof(val);
    val = Config_Get(cf, "Tdvp");
    if ( val != NULL ) pp->Tdvp = atof(val);
    val = Config_Get(cf, "Tdvs");
    if ( val != NULL ) pp->Tdvs = atof(val);
    val = Config_Get(cf, "Tvdp");
    if ( val != NULL ) pp->Tvdp = atof(val);
    val = Config_Get(cf, "Tvds");
    if ( val != NULL ) pp->Tvds = atof(val);
    val = Config_Get(cf, "TD");
    if ( val != NULL ) pp->TD = atof(val);
    val = Config_Get(cf, "TP");
    if ( val != NULL ) pp->TP = atof(val);
    val = Config_Get(cf, "PPm");
    if ( val != NULL ) pp->PPm = atof(val);
    val = Config_Get(cf, "TV");
    if ( val != NULL ) pp->TV = atof(val);
    val = Config_Get(cf, "TSD");
    if ( val != NULL ) pp->TSD = atof(val);
    val = Config_Get(cf, "Ch");
    if ( val != NULL ) pp->Ch = atof(val);
    val = Config_Get(cf, "Ct");
    if ( val != NULL ) pp->Ct = atof(val);
    return;
}
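_is_enabled and _set_internal_prune read the threshold and weight keys only once the method itself is switched on, and only the literal value `enabled` counts as on. The sketch below reproduces that logic against a `std::map` standing in for one section of the config.h handle; the `Section` alias, `Pruning` struct and function names are hypothetical, and the key names follow the dvCorp example used in read_config.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-in for one section of the config.h handle.
using Section = std::map<std::string, std::string>;

// Same fields and defaults as struct pruning / _init_internal_prune.
struct Pruning {
    bool enabled = false;
    bool weight = false;
    double threshold = 0.0;
    double exponent = 1.0;
};

// Mirrors _is_enabled: only the literal value "enabled" switches an option on.
bool is_enabled(const Section &s, const std::string &var)
{
    const auto it = s.find(var);
    return it != s.end() && it->second == "enabled";
}

// Mirrors _set_internal_prune: threshold and weight are read only when the
// pruning method itself is enabled.
void set_internal_prune(const Section &s, Pruning &p, const std::string &name,
                        const std::string &threshold, const std::string &weight)
{
    p.enabled = is_enabled(s, name);
    if (!p.enabled)
        return;
    const auto it = s.find(threshold);
    if (it != s.end())
        p.threshold = std::atof(it->second.c_str());
    p.weight = is_enabled(s, weight);
}
```

One consequence of this design: a threshold set for a disabled method is silently ignored and the default of 0.0 is kept.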

PredictionConfig.C page 3

/*
 * Public method used to obtain the parameters associated with an
 * internal pruning type.
 * \param pr Enumerated type describing the internal pruning method
 *           for which parameter information is to be obtained.
 * \return Pointer to the structure describing how the internal pruning
 *         method is to be implemented, or NULL for an unknown type.
 */
struct pruning *PredictionConfig::get_internal_prune(enum internal_pruning pr)
{
    switch ( pr ) {
    case user_dvCorp:     return &dvCorp;
    case user_dvCors:     return &dvCors;
    case user_vdCorp:     return &vdCorp;
    case user_vdCors:     return &vdCors;
    case user_pCor:       return &pCor;
    case user_dCor:       return &dCor;
    case user_sCor:       return &sCor;
    case movie_DVCorp:    return &DVCorp;
    case movie_DVCors:    return &DVCors;
    case movie_VDCorp:    return &VDCorp;
    case movie_VDCors:    return &VDCors;
    case movie_PCor:      return &PCor;
    case movie_DCor:      return &DCor;
    case movie_SCor:      return &SCor;

    /* Standard deviation types. */
    case user_dUVsdp:     return &dUVsdp;
    case user_dUVsds:     return &dUVsds;
    case user_Vsdp_Usdp:  return &Vsdp_Usdp;
    case user_Vsds_Usds:  return &Vsds_Usds;
    case movie_dMNsdp:    return &dMNsdp;
    case movie_dMNsds:    return &dMNsds;
    case movie_Nsdp_Msdp: return &Nsdp_Msdp;
    case movie_Nsds_Msds: return &Nsds_Msds;
    }

    return NULL;
}

/*
 * Public method.
 * Parses the configuration file and translates the ASCII key/value
 * pairs into the appropriate configuration variables.
 * \param file A character pointer to the name of the file containing
 *             the configuration to be read.
 * \return A boolean value is returned to indicate whether or not the
 *         read of the configuration file was successful. A true value
 *         indicates success while failure is indicated by a false value.
 */
bool PredictionConfig::read_config(const char * const file)
{
    auto char *val;
    auto Config cf;

    /* Open and parse the configuration file. */
    cf = Config_Init();
    if ( cf == NULL )
        return false;
    if ( Config_Parse(cf, file) < 0 ) {
        Config_Destroy(cf);
        return false;
    }

    /* Set general prediction parameters. */
    if ( !Config_Set_Section(cf, "Default") ) {
        Config_Destroy(cf);
        return false;
    }
    val = Config_Get(cf, "name");
    if ( val != NULL )
        name = strdup(val);
    user_voting = _is_enabled(cf, "user_voting");
    movie_voting = _is_enabled(cf, "movie_voting");
    val = Config_Get(cf, "user_vote_weight");
    if ( val != NULL )
        user_vote_weight = atof(val);

    /* Process user voting parameters. */
    if ( user_voting && Config_Set_Section(cf, "user_voting") ) {
        user_force_vote_in_Voter_Loop = _is_enabled(cf, "force_vote_in_Voter_Loop");
        user_force_vote_after_Voter_Loop = _is_enabled(cf, "force_vote_after_Voter_Loop");
        user_reset_support = _is_enabled(cf, "reset_support");
        user_boundary_override = _is_enabled(cf, "boundary_override");
        if ( user_boundary_override ) {
            val = Config_Get(cf, "facz");
            if ( val != NULL )
                user_facz = atof(val);
            val = Config_Get(cf, "thrz");
            if ( val != NULL )
                user_thrz = atof(val);
        }

PredictionConfig.C page 4

        _set_internal_prune(cf, &dvCorp, "dvCorp", "dvThrp", "dvCorpWeight");
        _set_internal_prune(cf, &dvCors, "dvCors", "dvThrs", "dvCorsWeight");
        _set_internal_prune(cf, &vdCorp, "vdCorp", "vdThrp", "vdCorpWeight");
        _set_internal_prune(cf, &vdCors, "vdCors", "vdThrs", "vdCorsWeight");
        _set_internal_prune(cf, &pCor, "pCor", "pThr", "pCorWeight");
        _set_internal_prune(cf, &dCor, "dCor", "dThr", "dCorWeight");
        _set_internal_prune(cf, &sCor, "sCor", "sThr", "sCorWeight");
        _set_stddev_prune(cf, &dUVsdp, "dUVsdp", "dUVsdpThr", "dUVsdpExp");
        _set_stddev_prune(cf, &dUVsds, "dUVsds", "dUVsdsThr", "dUVsdsExp");
        _set_stddev_prune(cf, &Vsdp_Usdp, "Vsdp_Usdp", "Vsdp_UsdpThr", "Vsdp_UsdpExp");
        _set_stddev_prune(cf, &Vsds_Usds, "Vsds_Usds", "Vsds_UsdsThr", "Vsds_UsdsExp");

        Prune_Movies_in_SupU.enabled = _is_enabled(cf, "Prune_Movies_in_SupU");
        Prune_Users_in_SupM.enabled = _is_enabled(cf, "Prune_Users_in_SupM");
        Prune_Movies_in_CoSupUV.enabled = _is_enabled(cf, "Prune_Movies_in_CoSupUV");
        if ( Prune_Movies_in_SupU.enabled )
            _set_external_prune(cf, &Prune_Movies_in_SupU, "user_voting Prune_Movies_in_SupU");
        if ( Prune_Users_in_SupM.enabled )
            _set_external_prune(cf, &Prune_Users_in_SupM, "user_voting Prune_Users_in_SupM");
        if ( Prune_Movies_in_CoSupUV.enabled )
            _set_external_prune(cf, &Prune_Movies_in_CoSupUV, "user_voting Prune_Movies_in_CoSupUV");
    }

    /* Process movie voting configuration. */
    if ( movie_voting && Config_Set_Section(cf, "movie_voting") ) {
        movie_force_vote_in_Voter_Loop = _is_enabled(cf, "force_vote_in_Voter_Loop");
        movie_force_vote_outside_Voter_Loop = _is_enabled(cf, "force_vote_outside_Voter_Loop");
        movie_reset_support = _is_enabled(cf, "reset_support");
        movie_boundary_override = _is_enabled(cf, "boundary_override");
        if ( movie_boundary_override ) {
            val = Config_Get(cf, "facz");
            if ( val != NULL )
                movie_facz = atof(val);
            val = Config_Get(cf, "thrz");
            if ( val != NULL )
                movie_thrz = atof(val);
        }
        _set_internal_prune(cf, &DVCorp, "DVCorp", "DVThrp", "DVCorpWeight");
        _set_internal_prune(cf, &DVCors, "DVCors", "DVThrs", "DVCorsWeight");
        _set_internal_prune(cf, &VDCorp, "VDCorp", "VDThrp", "VDCorpWeight");
        _set_internal_prune(cf, &VDCors, "VDCors", "VDThrs", "VDCorsWeight");
        _set_internal_prune(cf, &PCor, "PCor", "PThr", "PCorWeight");
        _set_internal_prune(cf, &DCor, "DCor", "DThr", "DCorWeight");
        _set_internal_prune(cf, &SCor, "SCor", "SThr", "SCorWeight");
        _set_stddev_prune(cf, &dMNsdp, "dMNsdp", "dMNsdpThr", "dMNsdpExp");
        _set_stddev_prune(cf, &dMNsds, "dMNsds", "dMNsdsThr", "dMNsdsExp");
        _set_stddev_prune(cf, &Nsdp_Msdp, "Nsdp_Msdp", "Nsdp_MsdpThr", "Nsdp_MsdpExp");
        _set_stddev_prune(cf, &Nsds_Msds, "Nsds_Msds", "Nsds_MsdsThr", "Nsds_MsdsExp");

        Movie_Prune_Users_in_SupM.enabled = _is_enabled(cf, "Prune_Users_in_SupM");
        Movie_Prune_Movies_in_SupU.enabled = _is_enabled(cf, "Prune_Movies_in_SupU");
        Movie_Prune_Users_in_CoSupMN.enabled = _is_enabled(cf, "Prune_Users_in_CoSupMN");
        if ( Movie_Prune_Users_in_SupM.enabled )
            _set_external_prune(cf, &Movie_Prune_Users_in_SupM, "movie_voting Prune_Users_in_SupM");
        if ( Movie_Prune_Movies_in_SupU.enabled )
            _set_external_prune(cf, &Movie_Prune_Movies_in_SupU, "movie_voting Prune_Movies_in_SupU");
        if ( Movie_Prune_Users_in_CoSupMN.enabled )
            _set_external_prune(cf, &Movie_Prune_Users_in_CoSupMN, "movie_voting Prune_Users_in_CoSupMN");
    }

    Config_Destroy(cf);
    return true;
}

/* ... */
    fputs("\t\t\tPruning method: ", output);
    switch ( sp->method ) {
    case UserPrune: fputs("UserPrune\n", output); break;
    case UserFastPrune: fputs("UserFastPrune\n", output); break;
    case UserCommonCoSupportPrune: fputs("UserCommonCoSupportPrune\n", output); break;
    case MoviePrune: fputs("MoviePrune\n", output); break;
    case MovieFastPrune: fputs("MovieFastPrune\n", output); break;
    case MovieCommonCoSupportPrune: fputs("MovieCommonCoSupportPrune\n", output); break;
    }

    fprintf(output, "\t\tmstrt: %-lu\tmultiplier: %-7.2f\n", pp->mstrt, pp->mstrt_mult);
    fprintf(output, "\t\tustrt: %-lu\tmultiplier: %-7.2f\n", pp->ustrt, pp->ustrt_mult);
    fprintf(output, "\t\tTSa: %-7.2f\tTSb: %-7.2f\n", pp->TSa, pp->TSb);
    fprintf(output, "\t\tTdvp: %-7.2f\tTdvs: %-7.2f\n", pp->Tdvp, pp->Tdvs);
    fprintf(output, "\t\tTvdp: %-7.2f\tTvds: %-7.2f\n", pp->Tvdp, pp->Tvds);
    fprintf(output, "\t\tTD: %-7.2f\tTP: %-7.2f\n", pp->TD, pp->TP);
    fprintf(output, "\t\tPPm: %-7.2f\n", pp->PPm);
    fprintf(output, "\t\tTV: %-7.2f\tTSD: %-7.2f\n", pp->TV, pp->TSD);
    fprintf(output, "\t\tCh: %-7.2f\tCt: %-7.2f\n", pp->Ch, pp->Ct);
    return;
}

PredictionConfig.C page 5

/* Public method prints an interpretation of the configuration.
 * \param output File descriptor to be used for output.
 */
void PredictionConfig::print(FILE *output)
{
    if ( name == NULL )
        fputc('\n', output);
    else
        fprintf(output, "\tName: %s\n\n", name);

    if ( user_voting ) {
        fputs("\tUser voting enabled.\n", output);
        fprintf(output, "\t\tUser vote weight: %-7.2f\n", user_vote_weight);
        fputs("\t\tForce vote in voter loop will be ", output);
        if ( user_force_vote_in_Voter_Loop )
            fputs("enabled.\n", output);
        else
            fputs("disabled.\n", output);
        fputs("\t\tForce vote after voter loop will be ", output);
        if ( user_force_vote_after_Voter_Loop )
            fputs("enabled.\n", output);
        else
            fputs("disabled.\n", output);
        fputs("\t\tUser support ", output);
        if ( user_reset_support )
            fputs("will be reset.\n", output);
        else
            fputs("will not be reset.\n", output);
        fputs("\t\tBoundary override ", output);
        if ( user_boundary_override ) {
            fputs("is enabled:\n", output);
            fprintf(output, "\t\t\tfacz: %-7.2f\tthrz: %-7.2f\n", user_facz, user_thrz);
        }
        else
            fputs("not enabled.\n", output);
        fputc('\n', output);

        _print_internal_prune(&dvCorp, "dvCorp", output);
        _print_internal_prune(&dvCors, "dvCors", output);
        _print_internal_prune(&vdCorp, "vdCorp", output);
        _print_internal_prune(&vdCors, "vdCors", output);
        _print_internal_prune(&pCor, "pCor", output);
        _print_internal_prune(&dCor, "dCor", output);
        _print_internal_prune(&sCor, "sCor", output);
        _print_stddev_prune(&dUVsdp, "dUVsdp", output);
        _print_stddev_prune(&dUVsds, "dUVsds", output);
        _print_stddev_prune(&Vsdp_Usdp, "Vsdp_Usdp", output);
        _print_stddev_prune(&Vsds_Usds, "Vsds_Usds", output);
        _print_external_prune(&Prune_Movies_in_SupU, "Prune_Movies_in_SupU", output);
        _print_external_prune(&Prune_Users_in_SupM, "Prune_Users_in_SupM", output);
        _print_external_prune(&Prune_Movies_in_CoSupUV, "Prune_Movies_in_CoSupUV", output);
        fputc('\n', output);
    }

    if ( movie_voting ) {
        fputs("\tMovie voting enabled.\n", output);
        fprintf(output, "\t\tMovie vote weight: %-7.2f\n", 1.0 - user_vote_weight);
        fputs("\t\tForce vote in voter loop will be ", output);
        if ( movie_force_vote_in_Voter_Loop )
            fputs("enabled.\n", output);
        else
            fputs("disabled.\n", output);
        fputs("\t\tForce vote outside voter loop will be ", output);
        if ( movie_force_vote_outside_Voter_Loop )
            fputs("enabled.\n", output);
        else
            fputs("disabled.\n", output);
        fputs("\t\tMovie support ", output);
        if ( movie_reset_support )
            fputs("will be reset.\n", output);
        else
            fputs("will not be reset.\n", output);
        fputs("\t\tBoundary override ", output);
        if ( movie_boundary_override ) {
            fputs("is enabled:\n", output);
            fprintf(output, "\t\t\tfacz: %-7.2f\tthrz: %-7.2f\n", movie_facz, movie_thrz);
        }
        else
            fputs("not enabled.\n", output);

        _print_internal_prune(&DVCorp, "DVCorp", output);
        _print_internal_prune(&DVCors, "DVCors", output);
        _print_internal_prune(&VDCorp, "VDCorp", output);
        _print_internal_prune(&VDCors, "VDCors", output);
        _print_internal_prune(&PCor, "PCor", output);
        _print_internal_prune(&DCor, "DCor", output);
        _print_internal_prune(&SCor, "SCor", output);
        _print_stddev_prune(&dMNsdp, "dMNsdp", output);
        _print_stddev_prune(&dMNsds, "dMNsds", output);
        _print_stddev_prune(&Nsdp_Msdp, "Nsdp_Msdp", output);
        _print_stddev_prune(&Nsds_Msds, "Nsds_Msds", output);
        _print_external_prune(&Movie_Prune_Users_in_SupM, "Prune_Users_in_SupM", output);
        _print_external_prune(&Movie_Prune_Movies_in_SupU, "Prune_Movies_in_SupU", output);
        _print_external_prune(&Movie_Prune_Users_in_CoSupMN, "Prune_Users_in_CoSupMN", output);
        fputc('\n', output);
    }

    return;
}
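print() reports the movie vote weight as 1.0 - user_vote_weight, which suggests the user and movie votes are combined as a convex blend. The code that actually combines them (mpp-user.C) is not shown, so the function below is an assumption rather than the system's method; `blend_votes` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Assumed convex blend of the two votes: user_vote_weight for the user
// vote and (1.0 - user_vote_weight) for the movie vote, as print() reports.
double blend_votes(double user_vote, double movie_vote, double user_vote_weight)
{
    return user_vote_weight * user_vote + (1.0 - user_vote_weight) * movie_vote;
}
```

With the default user_vote_weight of 1 set in the constructor, this blend reduces to the user vote alone, which is consistent with movie voting being disabled by default.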

PredictionConfig.H

#if !defined(PREDICTIONCONFIG_H)
#define PREDICTIONCONFIG_H

/* Standard include files. */
#include <stdio.h>

/* Local include files. */

/* Enumeration types describing various
 * methods of internal pruning.
 */
enum internal_pruning {
    user_dvCorp, user_dvCors, user_vdCorp, user_vdCors,
    user_pCor, user_dCor, user_sCor,
    movie_DVCorp, movie_DVCors, movie_VDCorp, movie_VDCors,
    movie_PCor, movie_DCor, movie_SCor,
    /* Standard deviation types. */
    user_dUVsdp, user_dUVsds, user_Vsdp_Usdp, user_Vsds_Usds,
    movie_dMNsdp, movie_dMNsds, movie_Nsdp_Msdp, movie_Nsds_Msds
};

/* Following structure definition is used to generically describe
 * internal or standard deviation pruning methods.
 */
struct pruning {
    bool enabled;
    bool weight;
    double threshold;
    double exponent;
};

/* Following structure definition is used to encapsulate parameters
 * which configure the external pruning routines.
 */
struct pruning_parameters {
    unsigned long int mstrt, ustrt;
    double mstrt_mult, ustrt_mult, TSa, TSb, Tdvp, Tdvs, Tvdp, Tvds,
           TD, TP, PPm, TV, TSD, Ch, Ct;
};

/* Following structure definition is used to encapsulate
 * information for the external pruning routines.
 */
enum prune_type {
    UserPrune, UserFastPrune, UserCommonCoSupportPrune,
    MovieFastPrune, MovieCommonCoSupportPrune
};

struct external_prune {
    bool enabled;
    enum prune_type method;
    struct pruning_parameters params;
};

class PredictionConfig
{
  private:
    /* Name of the prediction configuration. */
    char *name;

    /* First section of variables affects global prediction
     * parameters at the level of the mpp-user.C file. Subsequent
     * variable sections provide parameters specific to either
     * user or movie voting.
     */
    bool user_voting, movie_voting;
    double user_vote_weight;

    /* User voting parameters. */
    bool user_force_vote_in_Voter_Loop, user_force_vote_after_Voter_Loop;
    bool user_reset_support;
    bool user_boundary_override;
    double user_facz, user_thrz;

    /* Internal vote pruning. */
    struct pruning dvCorp, dvCors, vdCorp, vdCors, pCor, dCor, sCor;

    /* Standard deviation pruning. */
    struct pruning dUVsdp, dUVsds, Vsdp_Usdp, Vsds_Usds;

    struct external_prune Prune_Movies_in_SupU, Prune_Users_in_SupM,
                          Prune_Movies_in_CoSupUV;

    /* Movie voting parameters. */
    bool movie_force_vote_in_Voter_Loop, movie_force_vote_outside_Voter_Loop;
    bool movie_reset_support;
    bool movie_boundary_override;
    double movie_facz, movie_thrz;

    /* Internal vote pruning. */
    struct pruning DVCorp, DVCors, VDCorp, VDCors, PCor, DCor, SCor;

    /* Standard deviation pruning. */
    struct pruning dMNsdp, dMNsds, Nsdp_Msdp, Nsds_Msds;

    struct external_prune Movie_Prune_Users_in_SupM, Movie_Prune_Movies_in_SupU,
                          Movie_Prune_Users_in_CoSupMN;

  public:
    /* Void constructor. */
    PredictionConfig(void);

    /* Destructor. */
    ~PredictionConfig(void);

    /* Inline public methods to return type of voting. */
    inline bool do_user_voting(void) { return user_voting; }
    inline bool do_movie_voting(void) { return movie_voting; }

    /* Inline public method to access user vote weight. */
    inline double get_user_vote_weight(void) { return user_vote_weight; }

    /* Inline public methods to determine whether user and
     * movie support are reset after initial external pruning.
     */
    inline bool reset_user_support(void) { return user_reset_support; }
    inline bool reset_movie_support(void) { return movie_reset_support; }

    /* Inline public methods to return status of vote forcing. */
    inline bool user_vote_force_in_loop(void) { return user_force_vote_in_Voter_Loop; }
    inline bool user_vote_force_after_loop(void) { return user_force_vote_after_Voter_Loop; }
    inline bool movie_vote_force_in_loop(void) { return movie_force_vote_in_Voter_Loop; }
    inline bool movie_vote_force_after_loop(void) { return movie_force_vote_outside_Voter_Loop; }

    /* Inline accessor functions for returning external pruning configuration. */
    inline struct external_prune *get_user_Prune_Movies_in_SupU(void) { return &Prune_Movies_in_SupU; }
    inline struct external_prune *get_user_Prune_Users_in_SupM(void) { return &Prune_Users_in_SupM; }
    inline struct external_prune *get_user_Prune_Movies_in_CoSupUV(void) { return &Prune_Movies_in_CoSupUV; }
    inline struct external_prune *get_movie_Prune_Users_in_SupM(void) { return &Movie_Prune_Users_in_SupM; }
    inline struct external_prune *get_movie_Prune_Movies_in_SupU(void) { return &Movie_Prune_Movies_in_SupU; }
    inline struct external_prune *get_movie_Prune_Users_in_CoSupMN(void) { return &Movie_Prune_Users_in_CoSupMN; }

    /* Public method to read a configuration file. */
    bool read_config(const char *);

    /* Public accessor for returning pointer to structure of internal prune. */
    struct pruning *get_internal_prune(enum internal_pruning);

    /* Public method to print out a configuration. */
    void print(FILE *);
};

#endif
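PredictionConfig.H declares get_internal_prune(enum internal_pruning), but its body is not shown. One plausible shape is sketched here on a cut-down, hypothetical MiniConfig class that maps only three of the enumerators. The sketch uses a switch that returns the address of the matching struct pruning member. A caller such as read_config() could then set enabled, weight, threshold and exponent generically. Returning a null pointer for unmapped values is a choice made for this sketch only.

```cpp
#include <cassert>

// Same field layout as struct pruning in PredictionConfig.H.
struct pruning {
    bool enabled;
    bool weight;
    double threshold;
    double exponent;
};

// Hypothetical subset of enum internal_pruning.
enum internal_pruning { user_dvCorp, user_dUVsdp, movie_DVCorp, movie_SCor };

// Hypothetical, cut-down stand-in for PredictionConfig.
class MiniConfig {
  public:
    // Map an enumerator to the member that stores its settings.
    pruning *get_internal_prune(internal_pruning which) {
        switch (which) {
        case user_dvCorp:  return &dvCorp;
        case user_dUVsdp:  return &dUVsdp;
        case movie_DVCorp: return &DVCorp;
        default:           return nullptr;  // not mapped in this sketch
        }
    }

  private:
    pruning dvCorp{false, false, 0.0, 0.0};
    pruning dUVsdp{false, false, 0.0, -1.0};  // Exp default of -1 from the sample file
    pruning DVCorp{false, false, 0.0, 0.0};
};
```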

UserSet.C

/* System include files. */
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Local include files. */
#include "UserSet.H"

/* Variables static to this module. */

/* No argument constructor. */
UserSet::UserSet(void) : ptree_set()
{
    id_numbers = NULL;
    return;
}

/* Destructor. */
UserSet::~UserSet(void)
{
    if ( id_numbers != NULL )
        free(id_numbers);
    return;
}

/**
 * Public method.
 * Calculates the rating a user provided for a movie.
 * param user_index  The index number of the user.
 * param movie_index The index number of the movie.
 * return Rating number is returned to the caller (0 if not rated).
 */
double UserSet::get_rating(unsigned long int user_index,
                           unsigned long int movie_index)
{
    int rating = 0, val = 1;
    size_t slot = user_index * 3;

    /* slot + 2 holds the low-order bit, slot + 0 the high-order bit. */
    for (int tree = 2; tree >= 0; --tree) {
        if ( ptree_set[slot + tree].is_set(movie_index) )
            rating += val;
        val <<= 1;
    }
    return rating;
}

/* Public method.
 * Calculates the mean rating of a group of movies by a user.
 * The group of movies is specified by a bit mask in the PTree
 * supplied as an argument to this method.
 * param user_index Index number of the user for whom
 *                  the mean is being predicted.
 * param cosupport  Mask specifying the group of movies for which
 *                  the mean is to be calculated.
 * return Rating number returned to caller.
 */
double UserSet::get_mean(unsigned long int user_index, PTree &cosupport)
{
    size_t slot = user_index * 3;
    double mean = 0;
    PTree bitcolumn;

    /* Iterate over the three bit positions which represent
     * movie ratings. Multiply the number of 1 bits in each
     * bit column by the bit value of the tree.
     */
    for (int tree = 2, bit = 0; tree >= 0; --tree, ++bit) {
        bitcolumn = cosupport & ptree_set[slot + tree];
        mean += bitcolumn.get_count() * pow(2.0, bit);
    }

    /* Divide by the number of movies in the cosupport list to
     * complete the mean.
     */
    return mean / cosupport.get_count();
}

/* Public method returns a PTree describing the set
 * of movies which a user has rated.
 */
PTree UserSet::get_movies(unsigned long int index)
{
    size_t slot = index * 3;

    return ptree_set[slot] | ptree_set[slot + 1] | ptree_set[slot + 2];
}

/* Public method.
 * This method converts a user identity number into the index value
 * used to reference the PTree's corresponding to this user.
 * Note: 0 is returned both for index 0 and for an unknown identity.
 */
unsigned long int UserSet::get_index(unsigned long int id_number)
{
    unsigned long int identities = ptree_set.size() / 3;

    for (unsigned long int lp = 0; lp < identities; ++lp)
        if ( id_numbers[lp] == id_number )
            return lp;
    return 0;
}

/* Public method used to obtain the rating PTree's of a given user.
 * Each user has 3 associated PTree's, one for each of the three
 * bits used to represent movie ratings. The zeroth PTree
 * represents the high order bit of the rating value.
 * param user_index Index number of user whose rating is to be returned.
 * param bit        Bit position (0-2) of the PTree to be returned.
 * return NULL is returned if an invalid PTree is requested.
 *        Else the requested PTree is returned to the caller.
 */
PTree UserSet::get_ptree(unsigned long int user_index, int bit)
{
    size_t slot = user_index * 3;

    if ( bit < 0 || bit > 2 || slot >= ptree_set.size() )
        return NULL;
    return ptree_set[slot + bit];
}
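UserSet keeps each user's ratings as three bit-slice PTrees. Slot 3u holds the high-order bit of every movie's rating and slot 3u + 2 holds the low-order bit, so an unrated movie (rating 0) has all three bits clear. The sketch below replays the arithmetic of get_rating(), get_movies() and get_mean() on plain std::vector<bool> masks. The BitSlicedUser type is hypothetical and stands in for the PTree class, whose API is not shown in these slides. The empty-mask guard in get_mean exists only in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one user's three PTrees.
// slices[0] = high-order bit, slices[2] = low-order bit (as in UserSet).
struct BitSlicedUser {
    std::vector<bool> slices[3];

    explicit BitSlicedUser(std::size_t movies) {
        for (auto &s : slices) s.assign(movies, false);
    }

    // Spread a rating 0..7 (0 = unrated) over the three slices.
    void set_rating(std::size_t movie, int rating) {
        assert(rating >= 0 && rating <= 7);
        for (int bit = 0; bit < 3; ++bit)
            slices[2 - bit][movie] = ((rating >> bit) & 1) != 0;
    }

    // Same loop shape as UserSet::get_rating: low slice first, value doubles.
    int get_rating(std::size_t movie) const {
        int rating = 0, val = 1;
        for (int tree = 2; tree >= 0; --tree) {
            if (slices[tree][movie]) rating += val;
            val <<= 1;
        }
        return rating;
    }

    // Like get_movies(): OR of the three slices marks every rated movie.
    std::vector<bool> rated() const {
        std::vector<bool> out(slices[0].size());
        for (std::size_t m = 0; m < out.size(); ++m)
            out[m] = slices[0][m] || slices[1][m] || slices[2][m];
        return out;
    }

    // Like get_mean(): count 1 bits under the mask in each slice, weight
    // each count by 2^bit, then divide by the number of masked movies.
    double get_mean(const std::vector<bool> &cosupport) const {
        std::size_t n = 0;
        for (std::size_t m = 0; m < cosupport.size(); ++m)
            if (cosupport[m]) ++n;
        if (n == 0) return 0.0;  // guard added in this sketch only
        double sum = 0.0;
        for (int tree = 2, bit = 0; tree >= 0; --tree, ++bit) {
            std::size_t count = 0;
            for (std::size_t m = 0; m < cosupport.size(); ++m)
                if (cosupport[m] && slices[tree][m]) ++count;
            sum += static_cast<double>(count) * (1 << bit);
        }
        return sum / static_cast<double>(n);
    }
};
```

For example, a user who rated movie 0 with 5 (binary 101) and movie 2 with 3 (binary 011) has a mean of (1·4 + 1·2 + 2·1) / 2 = 4 over the movies they rated.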

UserSet.C page 2

/* Public method.
 * Sets up an array containing the numerical value of
 * user identity numbers corresponding to the PTree slots.
 * return Boolean return is used to indicate success
 *        or failure of the load.
 */
bool UserSet::load_identities(void)
{
    size_t cnt = 0, number_of_identities;
    char *p, bufr[PATH_MAX];
    FILE *input;

    /* PTreeSet must be loaded. */
    if ( ptree_set.size() == 0 )
        return false;

    /* Allocate an array of integers to hold identities. */
    number_of_identities = ptree_set.size() / 3;
    id_numbers = (unsigned long int *) malloc(number_of_identities *
                                              sizeof(unsigned long int));
    if ( id_numbers == NULL )
        return false;

    /* Read file and convert identities to integers. */
    snprintf(bufr, sizeof(bufr),
             "%s/mpred-data/nf_mv_us_pt/user-attributes.txt", PTREEDATA);
    input = fopen(bufr, "r");
    if ( input == NULL )
        return false;

    while ( cnt < number_of_identities &&
            fgets(bufr, sizeof(bufr), input) != NULL ) {
        if ( (p = strrchr(bufr, '\n')) != NULL )
            *p = '\0';
        id_numbers[cnt++] = strtoul(bufr, NULL, 10);
    }
    fclose(input);

    return cnt == number_of_identities;
}

/* Public method.
 * This method dumps all component PTree's of the set.
 * param output Output descriptor where the PTree's are
 *              to be directed.
 */
void UserSet::dump(FILE *output)
{
    for (size_t lp = 0; lp < ptree_set.size(); ++lp)
        ptree_set[lp].dump(output);
    return;
}

/* Public method to print index/attribute pairings. */
void UserSet::print(FILE *output)
{
    unsigned long int identities = ptree_set.size() / 3;

    for (unsigned long int lp = 0; lp < identities; ++lp)
        fprintf(output, "%lu -> %lu\n", lp, id_numbers[lp]);
    return;
}

UserSet.H

#if !defined(USERSET_H)
#define USERSET_H

/* Standard include files. */
#include <stdio.h>
#include <math.h>

/* Local include files. */
#include "PTreeSet.H"

class UserSet
{
  private:
    unsigned long int *id_numbers;
    PTreeSet ptree_set;

  public:
    /* Void constructor. */
    UserSet(void);

    /* Constructor to initialize an in-memory tree. */

    /* Destructor. */
    ~UserSet(void);

    /* Public inline method to return the identity of a user index. */
    unsigned long int get_identity(unsigned long int index) { return id_numbers[index]; }

    /* Public method to return the index of a user identity. */
    unsigned long int get_index(unsigned long int);

    /* Public method to return the rating of a movie by a user. */
    double get_rating(unsigned long int, unsigned long int);

    /* Public method to return the mean rating of a set of movies. */
    double get_mean(unsigned long int, PTree &);

    /* Public method to return the set of movies rated by a user. */
    PTree get_movies(unsigned long int);

    /* Public method to return rating PTree's. */
    PTree get_ptree(unsigned long int, int);

    /* Public method to load a list of user identities. */
    bool load_identities(void);

    /* Public method to print sparseness of set. */
    void dump(FILE *);

    /* Public method to load a set of PTree's saved in ASCII format. */
    bool load(FILE *);

    /* Public method to load a binary PTree set. */
    bool load_binary(void);

    /* Public method to print index/attribute pairings. */
    void print(FILE *);
};

#endif
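get_index() scans id_numbers linearly and returns 0 when an identity is absent, which cannot be told apart from a match at index 0. A possible alternative is sketched below as a hypothetical IdentityTable class, not the project's code. It builds a reverse hash map while the identity file is read and reports a miss through its return value. Like load_identities(), it expects one decimal identity per line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical identity table: index -> identity (like id_numbers) plus
// identity -> index for constant-time reverse lookup.
class IdentityTable {
  public:
    // Read one decimal identity per line, stopping after `expected` entries.
    bool load(std::istream &in, std::size_t expected) {
        std::string line;
        while (ids_.size() < expected && std::getline(in, line)) {
            unsigned long id = std::strtoul(line.c_str(), nullptr, 10);
            index_.emplace(id, ids_.size());
            ids_.push_back(id);
        }
        return ids_.size() == expected;
    }

    unsigned long get_identity(std::size_t index) const { return ids_.at(index); }

    // Unlike UserSet::get_index, a missing identity returns false.
    bool find_index(unsigned long id, std::size_t &index) const {
        auto it = index_.find(id);
        if (it == index_.end()) return false;
        index = it->second;
        return true;
    }

  private:
    std::vector<unsigned long> ids_;
    std::unordered_map<unsigned long, std::size_t> index_;
};
```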

Sample config file

# Sample prediction configuration file.

# Name of the configuration.
name = default

# Allow user voting: enabled or disabled
user_voting = disabled

# Do movie based voting: enabled or disabled
movie_voting = enabled

# User vote weighting. Movie vote weighting will be derived from
# the value of this variable.
user_vote_weight = 0

# User voting configuration.
# This section is only processed if user voting is enabled.
[user_voting]

# The following options specify where and if votes are forced into
# their standard range of 1-5.
force_vote_in_Voter_Loop = disabled
force_vote_after_Voter_Loop = disabled

# The following variable controls whether or not user support is reset
# after user pruning is completed.
reset_support = disabled

# The following variables control Boundary Based prediction overrides.
# The parameters are only evaluated if the boundary based method is
# enabled.
# boundary_override = disabled
# facz = 0
# thrz = 0

# Internal pruning configuration.
# One or more of the pruning functions can be enabled.
# For each pruning type a default threshold can be set. If not set, the
# default value indicated below is used.
# The third variable selects a vote weighting option. If the weight variant
# of the pruning method is enabled, the value of uCor is set to that value.
# Note that the last enabled weight will set uCor.
# dvCorp = disabled
# dvThrp = 0
# dvCorp.Weight = disabled
# dvCors = disabled
# dvThrs = 0
# dvCors.Weight = disabled
# vdCorp = disabled
# vdThrp = 0
# vdCorp.Weight = disabled
# vdCors = disabled
# vdThrs = 0
# vdCors.Weight = disabled
# pCor = disabled
# pThr = 0
# pCor.Weight = disabled
# dCor = disabled
# dThr = 0
# dCor.Weight = disabled
# sCor = disabled
# sThr = 0
# sCor.Weight = disabled

# Standard deviation pruning.
# One or more of the following methods can be selected. The default is
# for all these methods to be disabled.
# Each pruning method has a threshold and exponent value associated with
# it. The default values are noted in the definitions below.
# dUVsdp = disabled
# dUVsdp.Thr = 0
# dUVsdp.Exp = -1
# dUVsds = disabled
# dUVsds.Thr = 0
# dUVsds.Exp = -1
# Vsdp_Usdp = disabled
# Vsdp_Usdp.Thr = 0
# Vsdp_Usdp.Exp = -1
# Vsds_Usds = disabled
# Vsds_Usds.Thr = 0
# Vsds_Usds.Exp = -1
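The sample file uses a simple line-oriented layout:

- `#` starts a comment; commented-out lines document defaults.
- A bracketed line such as `[user_voting]`, optionally followed by a pruning-function name, opens a section.
- Every other line is `key = value`.

PredictionConfig::read_config() is declared in the header but not shown. The following is therefore only a hypothetical reader for that layout. Flattening keys to "section/key" is an assumption of this sketch, not the project's scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical reader for the sample configuration layout.
// Keys inside a section are stored as "section/key"; top-level keys
// (before the first [section] line) are stored unchanged.
std::map<std::string, std::string> parse_config(std::istream &in) {
    // Strip leading and trailing blanks, tabs and carriage returns.
    auto trim = [](const std::string &s) -> std::string {
        const char *ws = " \t\r";
        std::size_t b = s.find_first_not_of(ws);
        if (b == std::string::npos) return std::string();
        std::size_t e = s.find_last_not_of(ws);
        return s.substr(b, e - b + 1);
    };

    std::map<std::string, std::string> out;
    std::string line, section;
    while (std::getline(in, line)) {
        line = trim(line.substr(0, line.find('#')));  // drop comments
        if (line.empty()) continue;
        if (line.front() == '[' && line.back() == ']') {
            section = trim(line.substr(1, line.size() - 2));
            continue;
        }
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;  // skip malformed lines
        std::string key = trim(line.substr(0, eq));
        std::string value = trim(line.substr(eq + 1));
        out[section.empty() ? key : section + "/" + key] = value;
    }
    return out;
}
```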

Sample config file - pg 2

# External pruning configuration
# The following section selects the use of any combination of three
# pruning functions. By default pruning is disabled.
# Each pruning function is encapsulated in its own section. This allows
# a pruning function to be turned on and off without disturbing its
# configuration section.
# Within each pruning section there are six different methods for
# implementing the pruning. These methods are:
# UserPrune, UserFastPrune, UserCommonCoSupportPrune
# MoviePrune, MovieFastPrune, MovieCommonCoSupportPrune
# There are a total of 15 parameters which select the configuration
# of the pruning. Default values are noted.
Prune_Movies_in_SupU = disabled
Prune_Users_in_SupM = disabled
Prune_Movies_in_CoSupUV = disabled

[user_voting Prune_Movies_in_SupU]
method = UserPrune
leftside = 0
width = 0
mstrt_mult = 0.0
ustrt = 0
ustrt_mult = 0.0
TSa = -100
TSb = -100
Tdvp = -1
Tdvs = -1
Tvdp = -1
Tvds = -1
TD = -1
TP = -1
PPm = .1
TV = -1
TSD = -1
Ch = 1
Ct = 2

[user_voting Prune_Users_in_SupM]
method = UserPrune
leftside = 0
width = 0
mstrt_mult = 0.0
ustrt = 0
ustrt_mult = 0.0
TSa = -100
TSb = -100
Tdvp = -1
Tdvs = -1
Tvdp = -1
Tvds = -1
TD = -1
TP = -1
PPm = .1
TV = -1
TSD = -1
Ch = 1
Ct = 2

[user_voting Prune_Movies_in_CoSupUV]
method = UserPrune
leftside = 0
width = 0
mstrt_mult = 0.0
ustrt = 0
ustrt_mult = 0.0
TSa = -100
TSb = -100
Tdvp = -1
Tdvs = -1
Tvdp = -1
Tvds = -1
TD = -1
TP = -1
PPm = .1
TV = -1
TSD = -1
Ch = 1
Ct = 2

# Movie voting configuration.
# This section is only processed if the movie_voting variable is set to
# enabled in the Default section.
[movie_voting]

# The following options specify where and if votes are forced into
# their standard range of 1-5.
force_vote_in_Voter_Loop = disabled
force_vote_outside_Voter_Loop = disabled

# The following variable controls whether or not user support is reset
# after user pruning is completed.
reset_support = disabled

# The following variables control Boundary Based prediction overrides.
# The parameters are only evaluated if the boundary based method is enabled.
# boundary_override = disabled
# facz = 0
# thrz = 0

# Internal pruning configuration.
# One or more of the pruning functions can be enabled.
# For each pruning type a default threshold can be set. If not set, the
# default value indicated below is used.
# The third variable selects a vote weighting option. If the weight variant
# of the pruning method is enabled, the value of uCor is set to that value.
# Note that the last enabled weight will set uCor.
# DVCorp = disabled
# DVThrp = 0
# DVCorp.Weight = disabled
# DVCors = enabled
# DVThrs = 0
# DVCors.Weight = disabled
# VDCorp = enabled
# VDThrp = 0
# VDCorp.Weight = disabled
# VDCors = enabled
# VDThrs = 0
# VDCors.Weight = disabled
# PCor = enabled
# PThr = 0
# PCor.Weight = disabled
# DCor = enabled
# DThr = 0
# DCor.Weight = disabled
# SCor = enabled
# SThr = 0
# SCor.Weight = disabled

Sample config file - pg 3

# Standard deviation pruning.
# One or more of the following methods can be selected. The default is
# for all these methods to be disabled.
# Each pruning method has a threshold and exponent value associated with
# it. The default values are noted in the definitions below.
# dMNsdp = disabled
# dMNsdp.Thr = 0
# dMNsdp.Exp = -1
# dMNsds = disabled
# dMNsds.Thr = 0
# dMNsds.Exp = -1
# Nsdp_Msdp = enabled
# Nsdp_Msdp.Thr = 0
# Nsdp_Msdp.Exp = -1
# Nsds_Msds = enabled
# Nsds_Msds.Thr = 0
# Nsds_Msds.Exp = -1

# External pruning configuration
# The following section selects the use of any combination of three
# pruning functions. By default pruning is disabled.
# Each pruning function is encapsulated in its own section. This allows
# a pruning function to be turned on and off without disturbing its
# configuration section.
# Within each pruning section there are six different methods for
# implementing the pruning. These methods are:
# UserPrune, UserFastPrune, UserCommonCoSupportPrune
# MoviePrune, MovieFastPrune, MovieCommonCoSupportPrune
# There are a total of 15 parameters which select the configuration
# of the pruning. Default values are noted.
Prune_Users_in_SupM = disabled
Prune_Movies_in_SupU = enabled
Prune_Users_in_CoSupMN = enabled

[movie_voting Prune_Users_in_SupM]
method = UserPrune
leftside = 0
width = 0
mstrt_mult = 0.0
ustrt = 0
ustrt_mult = 0.0
TSa = -100
TSb = -100
Tdvp = -1
Tdvs = -1
Tvdp = -1
Tvds = -1
TD = -1
TP = -1
PPm = .1
TV = -1
TSD = -1
Ch = 1
Ct = 2

[movie_voting Prune_Movies_in_SupU]
method = MovieCommonCoSupportPrune
leftside = +40
width = 10
mstrt = 0
mstrt_mult = 0.0
ustrt = 0
ustrt_mult = 0.0
TSa = -100
TSb = -100
Tdvp = -1
Tdvs = -1
Tvdp = -1
Tvds = -1
TD = -1
TP = -1
PPm = .1
TV = -1
TSD = -1
Ch = 1
Ct = 2

[movie_voting Prune_Users_in_CoSupMN]
method = UserCommonCoSupportPrune
leftside = 0
width = 8000
mstrt = 0
mstrt_mult = 0.0
ustrt = 0
ustrt_mult = 0.0
TSa = -100
TSb = -100
Tdvp = -1
Tdvs = -1
Tvdp = -1
Tvds = -1
TD = -1
TP = -1
PPm = .1
TV = -1
TSD = -1
Ch = 1