- Slides: 39
MUSIC RECOMMENDATION SYSTEM FOR THE LAST.FM DATASET
Why is a music recommendation system required?
What is data mining?
- Data mining, also called data or knowledge discovery, is the process of analyzing data from different perspectives and summarizing it into useful information.
http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
http://www.headsafrica.com/headsafrica/application/views/services/client/zf_files/images/data_mining/data_mining.jpg
Data mining modelling
- Clustering: Items are grouped by their similar characteristics. The method considers the similarities among the data items themselves.
- Classification: A very common technique for prediction. It categorizes data items: unclassified cases are assigned a class label based on cases that are already labelled.
- Association: A technique that examines the relationships among existing records in the database to determine which events occur together.
What is a recommendation engine?
- A recommendation system interprets the data that users enter into the system and makes recommendations to those users.
Recommendation Techniques
- Content-based filtering: The salient features of content that a user previously liked or watched are stored, usually in a database, and a profile is created for the user. When making a recommendation, the system consults this profile and recommends the content whose features are nearest to the stored feature set.
https://www.ntt-review.jp/archive_html/200804/images/le1_fig02.gif
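A minimal sketch of the content-based idea above, assuming each track is described by a hypothetical tag-weight dictionary and the user profile is the average of the tracks the user liked. The function names, the cosine measure, and the sample data are illustrative assumptions, not part of the project.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(liked_items, features):
    """Average the feature vectors of the items a user liked."""
    profile = {}
    for item in liked_items:
        for name, weight in features[item].items():
            profile[name] = profile.get(name, 0.0) + weight / len(liked_items)
    return profile

def recommend_by_content(liked_items, features, top_n=1):
    """Rank items the user has not seen by similarity to the user's profile."""
    profile = build_profile(liked_items, features)
    candidates = [i for i in features if i not in liked_items]
    candidates.sort(key=lambda i: cosine(profile, features[i]), reverse=True)
    return candidates[:top_n]

# Hypothetical item features (tag weights), not taken from the dataset.
features = {
    "track_a": {"rock": 1.0, "indie": 1.0},
    "track_b": {"rock": 1.0, "britpop": 1.0},
    "track_c": {"jazz": 1.0},
}
result = recommend_by_content(["track_a"], features)
```

Here the user liked only track_a, so track_b (which shares the rock tag) ranks above track_c.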
Recommendation Techniques
- Collaborative filtering: This is the foundation of "those who like one thing also like similar things" approaches. It does not depend on a single user's content-feature profile. Instead, recommendations draw on users who like similar content or who have similar characteristics.
http://www.bridgewell.com/images_en/ec_03.jpg
Recommendation Techniques
- Collaborative filtering types
  - User-based recommendation: finds similar users and recommends items that those users liked.
  - Item-based recommendation: calculates the similarity between items and recommends similar items.
http://oytunyuksel.com/wp-content/uploads/post-02-01.jpg
How is a recommendation engine created?
How is a recommendation engine created?
- When a recommendation engine is created, the following steps should be carried out:
  - Define the data representation.
  - Create the database or file model structure.
  - Pre-process the data to get the best results.
http://www.w3.org/WAI/TIDE/phases.gif
What is Apache Mahout?
- It is a Java library of scalable machine-learning algorithms, implemented on top of Apache Hadoop and using the MapReduce paradigm.
- To use Mahout in a project:
  - Download the latest Mahout release, 0.8, from the link below: http://apache.fastbull.org/mahout/0.8/mahout-distribution-0.8.zip
  - Extract all the libraries and include them in a new Eclipse (or NetBeans) project as external JAR files.
  - Java 1.6.x or greater is required for installation.
  - Hadoop is not mandatory to create a recommendation engine.
http://hortonworks.com/hadoop/mahout/
http://hortonworks.com/wp-content/uploads/2013/09/mantle-mahout.png
How to use Mahout for recommendation?
- Recommendation in Mahout follows these steps:
  - The dataset is converted to a Mahout-compliant format.
  - A compatible recommender component is chosen.
  - Similarities are computed from ratings or preferences.
  - The recommendation is evaluated.
Recommender job flow
- The main step doing the heavy lifting in the workflow is the "calculate cooccurrences" step. This step is responsible for doing pairwise comparisons across the entire matrix, looking for commonalities.
http://www.ibm.com/developerworks/library/j-mahout-
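To make the co-occurrence step concrete, here is a small single-machine sketch. It is not Mahout's distributed job: it only counts, for every pair of items, how many users have both. The function name and the listening data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(user_items):
    """Count, for each pair of items, how many users have both items."""
    counts = defaultdict(int)
    for items in user_items.values():
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return dict(counts)

# Hypothetical listening data: user -> items the user listened to.
user_items = {
    "u1": ["t1", "t2", "t3"],
    "u2": ["t1", "t2"],
    "u3": ["t2", "t3"],
}
counts = cooccurrence(user_items)
```

The pairwise loop is what makes this step expensive: the number of pairs grows quadratically with the number of items per user.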
The background recommendation process in the architecture
Graduation Project with Last.fm
- Scheduling
Graduation Project with Last.fm
- Gantt chart
Graduation Project with Last.fm
- What are the important risks?
  - Big data
  - Time
  - Computer performance
  - Sparsity
http://www.pm-primer.com/wp-content/uploads/2012/04/risk1.jpg
Music recommendation project for Last.fm
- The «Last.fm Dataset-1K users» dataset is used in the project. It contains user properties and records of which songs each user listened to.
- The dataset has 2 files: one holds the users' profiles and the other holds the users' listening history.
- There are 1000 users and 19,150,868 lines of listening history belonging to those 1000 users.
Music recommendation project for Last.fm
- The Last.fm API is used to create a new CSV format. Although there are 1000 users, files with the desired properties were prepared for only 700 users during the project period because of time constraints.
- After preparation, all files were loaded into database tables for easier data processing. The tables are:
  - Artists
  - Users
  - Tracks
  - UserTagTrack
  - TrackTags
Music recommendation project for Last.fm
- The collaborative filtering method is used.
- 2 types of segmentation are considered:
  - One recommendation is made within clusters of users grouped by gender, age, and country.
  - The other recommendation is made across all users.
- A user-based recommendation engine is created.
- JDBC and file data models are used for data representation.
Music recommendation project for Last.fm
- Weka is used for clustering because of its simplicity. All user characteristics were represented as values (see thesis, pages 33-34) ...
Music recommendation project for Last.fm
- Many methods can be used for collaborative filtering:
  - Mean Squared Differences Algorithm
  - Vector Similarity
  - Pearson Correlation Coefficient
  - Strengths and Weaknesses of Collaborative Filtering Method
- The Pearson Correlation Similarity algorithm is used for the thesis data model because it is convenient and gives correct results for large amounts of data.
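As an illustration of the Pearson correlation coefficient named above, here is a plain sketch computed over the items two users have in common. It is not Mahout's implementation. Returning 0.0 when the correlation is undefined is a choice made for this sketch, and the users and preference values are invented.

```python
import math

def pearson(prefs_a, prefs_b):
    """Pearson correlation over the items that both users have preferences for."""
    common = [i for i in prefs_a if i in prefs_b]
    n = len(common)
    if n < 2:
        return 0.0  # sketch-only convention: too little overlap to correlate
    mean_a = sum(prefs_a[i] for i in common) / n
    mean_b = sum(prefs_b[i] for i in common) / n
    cov = sum((prefs_a[i] - mean_a) * (prefs_b[i] - mean_b) for i in common)
    var_a = sum((prefs_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((prefs_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # sketch-only convention: constant preferences
    return cov / math.sqrt(var_a * var_b)

# Hypothetical tag preferences for three users.
alice = {"rock": 20, "indie": 40, "pop": 60}
bob = {"rock": 10, "indie": 20, "pop": 30, "jazz": 5}
carol = {"rock": 60, "indie": 40, "pop": 20}
```

In this data alice and bob rank their shared tags in the same proportions (correlation 1), while alice and carol rank them in opposite order (correlation -1).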
The functionality of the project system
JDBC Model: Database Tables
- Artists: artist id, artist name
- Tracks: track id, track name, artist id, published year
- TrackTags: tag id, tag name
- Users: user id, user name, gender, age, country
- UserTagTrack: usertagtrack id, user id, track id, tag id, preferences
- This is the general (default) database. All files and other databases are created from it.
Recommendation Model
- In JDBCDataModel, primary keys must be defined for time efficiency. The database format should be:
  - PrefUserTag: user id, tag id, sum(preferences)
  - PrefUserTrack: user id, track id, sum(preferences)
  - PrefTagTrack: track id, tag id, sum(preferences)
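A sketch of how such summed-preference tables could be derived, assuming they are built by grouping UserTagTrack rows on two columns and summing the preferences column. The function name, the numeric ids, and the rows are hypothetical.

```python
from collections import defaultdict

def sum_preferences(rows, key_fields):
    """Group rows by key_fields and sum their 'preferences' values."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        totals[key] += row["preferences"]
    return dict(totals)

# Hypothetical UserTagTrack rows with numeric ids.
rows = [
    {"user_id": 1, "track_id": 10, "tag_id": 100, "preferences": 20},
    {"user_id": 1, "track_id": 11, "tag_id": 100, "preferences": 20},
    {"user_id": 1, "track_id": 10, "tag_id": 101, "preferences": 20},
]
pref_user_tag = sum_preferences(rows, ("user_id", "tag_id"))
pref_user_track = sum_preferences(rows, ("user_id", "track_id"))
```

Each grouped pair becomes one row, which is why a grouped table can be much smaller than the table it is derived from.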
Number of elements in tables
- Tables whose names begin with «Pref» are formatted for Mahout's recommendation functions.
- They contain far less data than the UserTagTrack table.
Number of elements in tables
- Before the assignment of the primary key
- With the primary key, the format is shown below:
  - user id, tag id, sum(preferences)
The introduction of the system
- After the text file is created via the API, a standard line of text looks as follows:
  - user name, artist name, track name, published year, tags
  - user_000103, Super Furry Animals, The Undefeated, 2003, indie, britpop, rock, trumpet, pop
- This line is represented in the UserTagTrack table as:
  usertagtrackid | user id | track id | tag id | preferences
  1 | user_000103 | The Undefeated | indie | 20
  2 | user_000103 | The Undefeated | britpop | 20
  3 | user_000103 | The Undefeated | rock | 20
  4 | user_000103 | The Undefeated | trumpet | 20
  5 | user_000103 | The Undefeated | pop | 20
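A rough sketch of turning one exported line into one row per tag. It assumes a plain comma split (which would break on names that contain commas) and a fixed preference of 20, taken from the example rows; how the project actually derives the preference value is not stated here. The function name is hypothetical.

```python
def line_to_rows(line, preference=20):
    """Split one exported line into one (user, track, tag, preference) row per tag."""
    fields = [f.strip() for f in line.split(",")]
    # The first four fields are fixed; everything after them is a tag.
    user, artist, track, year = fields[:4]
    tags = fields[4:]
    return [
        {"user": user, "track": track, "tag": tag, "preferences": preference}
        for tag in tags
    ]

line = ("user_000103, Super Furry Animals, The Undefeated, 2003, "
        "indie, britpop, rock, trumpet, pop")
rows = line_to_rows(line)
```

For the sample line this yields five rows, one for each of the five tags.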
The functions used in the recommendation engine
- The working principle of the user-based recommendation engine:
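The general shape of a user-based engine can be sketched as follows: pick the most similar users, then estimate preferences for unseen items as a similarity-weighted average. This follows the common nearest-neighbour scheme and is not Mahout's code. The `overlap` similarity is a stand-in chosen to keep the example short, and all names and data are hypothetical.

```python
def recommend(user, prefs, similarity, neighbourhood_size=2, how_many=5):
    """Score items the user has not seen using the nearest neighbours' preferences."""
    others = [u for u in prefs if u != user]
    # 1. Keep the N users most similar to the target user.
    neighbours = sorted(
        others, key=lambda u: similarity(prefs[user], prefs[u]), reverse=True
    )[:neighbourhood_size]
    # 2. Estimate each unseen item as a similarity-weighted average.
    scores, weights = {}, {}
    for n in neighbours:
        sim = similarity(prefs[user], prefs[n])
        if sim <= 0:
            continue  # ignore neighbours with no positive similarity
        for item, value in prefs[n].items():
            if item not in prefs[user]:
                scores[item] = scores.get(item, 0.0) + sim * value
                weights[item] = weights.get(item, 0.0) + sim
    estimates = {i: scores[i] / weights[i] for i in scores}
    # 3. Return the highest estimates first.
    return sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)[:how_many]

def overlap(a, b):
    """Stand-in similarity: share of items two users have in common."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0

# Hypothetical preferences: user -> {item: preference}.
prefs = {
    "u1": {"t1": 5.0, "t2": 3.0},
    "u2": {"t1": 4.0, "t2": 3.0, "t3": 4.0},
    "u3": {"t4": 2.0},
}
result = recommend("u1", prefs, overlap, neighbourhood_size=2)
```

Only u2 shares items with u1, so the single recommendation is t3 with u2's preference as the estimate. A larger neighbourhood admits more candidate items, at the cost of including less similar users.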
Recommendation Results
- An unlimited number of results can be obtained with the evaluator program. Pages 41-51 of the thesis show many results under different conditions.
- Table name: PrefUserTag
- Neighbourhood size: 2
- For user id: 5
- # Recommendations: 5
- Results (with tag name):
  - RecommendedItem[item:112040, value:213.03076]: missjudy 76
  - RecommendedItem[item:3387, value:211.02057]: my 750 essential songs
  - RecommendedItem[item:8124, value:194.43637]: lionel richie
  - RecommendedItem[item:8147, value:175.26286]: leona lewis
Recommendation Results
- Table name: PrefUserTrack
- For user id: 5
- # Recommendations: 5
- Neighbourhood size 2, results (with track name):
  - RecommendedItem[item:7064, value:73.0]: Out Of Control
- Neighbourhood size 7, results (with track name):
  - RecommendedItem[item:16570, value:304.5]: When You'Re Gone
  - RecommendedItem[item:7064, value:73.0]: Out Of Control
  - RecommendedItem[item:1466, value:9.0]: Aerodynamic
  - RecommendedItem[item:7170, value:5.0]: Bring Me To Life
  - RecommendedItem[item:2969, value:5.0]: Number Five With A Bullet
How to evaluate results?
- The results of this recommendation engine are evaluated with the most common metrics, precision and recall.
- Precision is the ratio of relevant items recommended correctly to the total number of items recommended.
- Recall is the ratio of relevant items recommended correctly to the total number of items that are relevant to the user.
  | Predicted as positive | Predicted as negative
  Actual positive | TP | FN
  Actual negative | FP | TN
How to evaluate results?
- Precision and recall are provided by RecommenderIRStatsEvaluator in Mahout. Its evaluate function returns the F-measure, precision, and recall of the recommendation engine.
- Parameters are passed to this function. The important one is «at», the number of recommendations to consider when evaluating precision:
  - precision at something (integer value)
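To show what "precision at N" means in practice, here is a standalone sketch that computes precision, recall, and F-measure over the top `at` recommendations. It does not call Mahout. The F-measure is assumed to be the balanced F1 score, and the ranked list and relevant set are invented.

```python
def evaluate_at(ranked, relevant, at):
    """Precision, recall and F-measure over the top `at` recommendations."""
    top = set(ranked[:at])
    relevant = set(relevant)
    tp = len(top & relevant)   # relevant items that were recommended
    fp = len(top - relevant)   # recommended items that are not relevant
    fn = len(relevant - top)   # relevant items that were not recommended
    precision = tp / (tp + fp) if top else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical ranked recommendations and the items the user actually likes.
ranked = ["t1", "t2", "t3", "t4", "t5"]
relevant = ["t1", "t3", "t9"]
p, r, f = evaluate_at(ranked, relevant, at=2)
```

With `at=2` only t1 and t2 are considered: one of the two is relevant (precision 1/2) and one of the three relevant items is found (recall 1/3).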
Evaluation Results
- Table name: PrefUserTag
  - Data model structure: User-Tag-Preference
  - Row-column variable number: # users: 700, # items: 14044
  - Neighbourhood size: 2
  - 5 recommendations
  - Precision: 0.9784243295019155
  - Recall: 0.9741058655221752
- Table name: PrefUserTrack
  - Data model structure: User-Track-Preference
  - Row-column variable number: # users: 700, # items: 316018
  - Neighbourhood size: 2
  - 5 recommendations
  - Precision: 0.033268482490272366
  - Recall: 0.005531505532
Evaluation Results
- Table name: PrefUserTrack
  - Data model structure: User-Track-Preference
  - Row-column variable number: # users: 700, # items: 316018
  - Neighbourhood size: 3
  - 5 recommendations
  - Precision: 0.036322463768115994
  - Recall: 0.012746512747
Comments on the evaluation results
- If the neighbourhood size increases, the recommendation results improve because of the way the similarity function works.
- The user-tag recommendation engine performs better than the user-track recommendation engine because of data size and sparsity.
- People with similar characteristics also have similar musical tastes.
- When the neighbourhood size increases, the number of recommended items increases.
Self-criticism I
- Creating the dataset and the data representation took a long time. Using a ready-made dataset would have saved the project owner extra time.
- The data model holds a huge amount of data. Scanning all of it and making recommendations took a long time because of limited computer capacity. A more powerful computer would have helped.
- The out-of-memory error was the most frequently encountered problem while calculating evaluation results, because of the low Java heap space allowed by the operating system or Java version.
Self-criticism II
- Slowness and memory errors can be addressed with parallel programming. Using a server is another alternative.
- The User-Track profile results are not good. The performance of the recommendation engine for this model could be improved. With more computer capacity, more data could be used for the recommendation engine.
http://d1jb6zrebfcfrk.cloudfront.net/assets/content/cache/made/65b7808e1a1599d2/Think_Bigger,_Make_Better_3_860_484.png
http://thisiscolossal.com/wp-content/uploads/2011/01/better-3-600x337.jpg
Thank you for listening