
Application-oriented tutor recommendation system
Wang Weizhe: 517030910381
Kuang Yi: 517030910343
Cui Shoukang: 517030910371

TABLE OF CONTENTS
01 Assignments
02 Recommendation System
03 Recommendation Algorithm
04 Frontend Function and Frontend Design
05 Django Framework
06 Future Work

01 Assignments

Assignments
• Wang Weizhe: Design and implement the recommendation algorithm. Build up the databases.
• Cui Shoukang: Design the frontend. Implement the functions of registration, login, search and collection.
• Kuang Yi: Design the recommendation algorithm. Build the web server using Django.

02 Recommendation System

User Profile Modeling
• How to transform the items a user collects into a proper user profile is vital to recommendation.
• Given the information we have, we decided to vectorize each paper using its field information. This method can be better understood by analogy with word2vec in NLP.
• By doing so, we can represent a paper p with
  • a map/dictionary m(p) = {field 1: weight 1, field 2: weight 2, …}
  • or a vector v(p) = [weight 1, weight 2, …]
• For the vector representation, we additionally need to store the mapping between dimensions and field ids.
• Once we have a representation of each paper, we get the field representation of an author by summing up the fields of all of his or her papers.
• Finally, we get a user profile by summing up the fields of the papers and authors the user has collected.
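The summation described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the field names, weights, and the helper names `sum_profiles` and `to_vector` are hypothetical.

```python
from collections import Counter

def sum_profiles(profiles):
    """Sum several {field: weight} maps into one profile map."""
    total = Counter()
    for profile in profiles:
        total.update(profile)  # Counter.update adds weights field by field
    return total

def to_vector(profile, field_index):
    """Turn a map profile into a dense vector using a field -> dimension map."""
    vec = [0.0] * len(field_index)
    for field, weight in profile.items():
        vec[field_index[field]] = weight
    return vec

# Hypothetical papers, each represented as m(p) = {field: weight}.
papers = {
    "p1": {"ml": 1.0, "nlp": 0.5},
    "p2": {"ml": 0.5, "cv": 1.0},
}

# An author profile is the sum over all of the author's papers.
author_profile = sum_profiles(papers.values())

# A user profile is the sum over collected papers and collected authors.
user_profile = sum_profiles([papers["p1"], author_profile])

# The vector form needs the extra dimension <-> field mapping.
field_index = {"cv": 0, "ml": 1, "nlp": 2}
author_vector = to_vector(author_profile, field_index)
```

The map form stays compact when a paper touches only a few fields, while the vector form is what a dimension-reduction step would consume.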

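Offline
• We don't have information about tutors; the only information we have is about authors (who may be students).
• We collect the author information of 21 universities at home and abroad, and use the number of papers as a simple filter to filter some authors out.

University                                      Abbreviation  ID
Cornell University                              CU            2103818986
University of Science and Technology of China   CUTe          2100219712
East China Normal University                    ECNU          2100414475
Fudan University                                FUDAN         2103229542
Huazhong University of Science and Technology   HUST          2100453429
University of London                            LONDON        2101579272
Massachusetts Institute of Technology           MIT           2104904426
Nanjing University                              NJU           2101930586
Nanyang Technological University                NTU           2102285164
Peking University                               PKU           2103048394
South China University of Technology            SCUT          2103486486
Southeast University                            SEU           2104085410
Shanghai Jiao Tong University                   SJTU          2103723062
Stanford University                             SU            2102042269
Sun Yat-sen University                          SYSU          2102381148
Shenzhen University                             SZU           2103397249
Tsinghua University                             THU           2100751172
Tongji University                               TONGJI        2100874370
University of Tokyo                             UT            2100269561
Wuhan University                                WHU           2102914960
Zhejiang University                             ZJU           2101552222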

Online Recommendation
1 User behavior: the user collects some papers, authors and affiliations.
2 Get user profile: the distribution of fields the user is interested in.
3 Multi-engine recommendation: use different recommendation engines to get recommendation results.
4 Merge the results of the different engines and get the final recommendation.
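The slides do not say how the engine results are merged. One simple possibility, shown here purely as an assumption, is to sum reciprocal ranks across engines; the function name `merge_results` and the author ids are hypothetical.

```python
def merge_results(engine_results, top_n=10):
    """Merge ranked lists from several engines by summing reciprocal ranks.

    engine_results: one ranked list of author ids per engine, best first.
    This scoring rule is an illustrative assumption, not the project's rule.
    """
    scores = {}
    for ranking in engine_results:
        for rank, author in enumerate(ranking):
            scores[author] = scores.get(author, 0.0) + 1.0 / (rank + 1)
    # Highest score first; ties are broken by author id for determinism.
    ordered = sorted(scores, key=lambda author: (-scores[author], author))
    return ordered[:top_n]

# Hypothetical output of a map-based and a vector-based engine.
map_engine = ["a", "b", "c"]
vector_engine = ["b", "a", "d"]
merged = merge_results([map_engine, vector_engine], top_n=3)
```

Authors that several engines rank highly rise to the top, while an author proposed by a single engine can still appear further down the list.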

03 Recommendation Algorithm

Recommendation algorithm
• Based on map: tree-like structure for fields
• Based on vector: dimension reduction on a sparse matrix

Map-based algorithm

Approximate acceleration algorithm (trade-off)
• Not a tree: The graph of fields is not actually a tree. Therefore, it would be hard for us to find the nearest common ancestor.
• Costly query: We use the data from Acemap, and we cannot load the graph directly because of its scale. We can only get the relations by query, which is costly.
• Practicability: In fact, the influence of a node decays quickly as the distance increases. Therefore, we can consider only the neighbor nodes.
We need to use an approximate algorithm to accelerate!
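The neighbor-only approximation can be illustrated with a small sketch. It assumes each field's direct neighbors have already been fetched into a dictionary (in the project they would come from Acemap queries) and that a neighbor receives a fixed fraction `decay` of a field's weight. The names `expand_profile` and `overlap_score` and the decay value are hypothetical.

```python
def expand_profile(profile, neighbors, decay=0.5):
    """Spread each field's weight to its direct neighbors only (one hop)."""
    expanded = dict(profile)
    for field, weight in profile.items():
        for neighbor in neighbors.get(field, ()):
            expanded[neighbor] = expanded.get(neighbor, 0.0) + decay * weight
    return expanded

def overlap_score(user, author, neighbors, decay=0.5):
    """Dot product of the two one-hop-expanded map profiles."""
    u = expand_profile(user, neighbors, decay)
    a = expand_profile(author, neighbors, decay)
    return sum(weight * a[field] for field, weight in u.items() if field in a)

# Hypothetical field graph: "ml" and "nlp" are both adjacent to "ai".
neighbors = {"ml": ["ai"], "nlp": ["ai"]}
user = {"ml": 1.0}
author = {"nlp": 1.0}

score = overlap_score(user, author, neighbors)            # related via "ai"
exact_only = overlap_score(user, author, neighbors, 0.0)  # no spreading
```

Without spreading, the two profiles share no field and score zero; with one hop they meet at the shared neighbor, and no nearest-common-ancestor search is needed.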

Vector-based algorithm
• Besides a map, we can also use a vector to represent a user profile. All the authors' vectors form an (n, k) sparse matrix M, where n is the number of authors and k is the number of fields.
• Storing M and calculating similarities on it directly would cost us a great amount of time and space. Hence, we use an AutoEncoder, an unsupervised learning algorithm, to reduce the dimensionality of the data and get a new feature matrix M' with shape (n, d), where d is the target dimension.
• With this method, there is no need for us to design the weights of different fields. Instead, the machine learns them automatically.

Visualizing the features with t-SNE
[Figure: t-SNE plot of the features; labels: SJTU, FUDAN, ECNU, TONGJI]

04 Frontend Function and Frontend Design

Basic functions and pages
• Search: Realize the basic search of three kinds of items: papers, scholars and institutions.
• Recommender: Given the 21 landmark colleges and universities, recommend tutors to users after they select a college or university.
• Register, Login: Realize the registration and login of users, and record the basic information of users.
• Collection: You can collect your favorite papers, scholars and institutions, and the collected information is displayed on your home page.

Database and Implementation
• Register, Login: The users_information database is established locally and stores the basic information entered by the user during registration. When logging in, the back end compares the input name and password with the database for verification. In addition, relevant js functions are written for format verification.
• Searching: We connect to the acemap database provided by the teaching assistant, which includes 188942513 papers, 91016667 authors and 25669 affiliations. Information queries are implemented with mysql query statements and fuzzy search.
• Collection: The collection button packs the information of the item into json format and passes it to a js function, which then passes it to the back end. We built three user_item databases in advance; the information obtained by the back end is verified and then stored in the relevant database.
• Recommender: The front end offers 21 schools. After the user selects a school, the back end obtains the school information and executes the recommendation algorithm. Finally, the recommendation results are displayed on the results page. The relevant databases have already been mentioned earlier.
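The fuzzy search can be sketched with a parameterized `LIKE` query. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for the MySQL connection, and the `papers` table, its columns and the sample titles are hypothetical. With a MySQL driver the placeholder syntax would typically be `%s` instead of `?`.

```python
import sqlite3

def fuzzy_search(conn, keyword, limit=10):
    """Return titles containing the keyword, using a parameterized LIKE."""
    pattern = f"%{keyword}%"  # match the keyword anywhere in the title
    rows = conn.execute(
        "SELECT title FROM papers WHERE title LIKE ? ORDER BY title LIMIT ?",
        (pattern, limit),
    )
    return [title for (title,) in rows]

# In-memory stand-in for the real paper table (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (paper_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO papers (title) VALUES (?)",
    [
        ("Deep Learning for Recommendation",),
        ("A Survey of Graph Embedding",),
        ("Tutor Recommender Systems",),
    ],
)

hits = fuzzy_search(conn, "recommend")
```

Passing the keyword as a query parameter, rather than formatting it into the SQL string, keeps user input from being interpreted as SQL.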

Front end design
Overall framework
• For the overall style, we used bootstrap: bootstrap.css and carousel.css. We coordinate the use of various classes for typesetting, such as the card above, as well as the title, search box, etc.
Charts
• Our distribution chart follows the echart pattern and references echart.js. We set roseType to 'radius' to build a rose view. VisualMap, a visual mapping component, is introduced for visual coding.

Front end design
• Pure.css is a set of lightweight and responsive pure CSS modules produced by Yahoo in the United States, which are applicable to any Web project. We adopted some of these modules to implement the UI framework for front-end information display.

05 Django Framework

Django Framework
• Models (data access layer): Handle everything related to data: how to access it, how to verify its validity, what actions it includes, and the relationships between data.
• Views (business logic layer): Contain the logic for accessing models and retrieving the appropriate templates. They are the bridge between the models and the templates.
• Templates (presentation layer): Handle presentation-related decisions: how to display data on pages or in other types of documents.

Project Directory Structure
• FinalProject: Project; contains configurations and routes.
• Recommend Sys: App; contains models, forms and views.
• statics: Contains some static resources.
• templates: Contains some templates used to show information.

06 Future Work

Future work
• Improve search speed: The number of papers in the acemap database is close to 200 million, and the search is slow. We will consider adding some restrictions before searching as an optimization.
• Make use of clustering: We can use a clustering algorithm such as K-means or the EM algorithm to cluster the data. Then we classify the user first and use this information to filter out many authors.
• Distance function: In this project, we tried Manhattan distance, Euclidean distance and cosine distance. Due to the lack of feedback, we don't know which distance metric is better. In the future, we may use a metric learning method such as RCA to learn a practical metric.
• More recommendation algorithms: So far we have only two recommendation engines. More engines may give more reasonable recommendations.
• Make use of multithreading: On the Django dev server, the different recommendation engines currently run serially, which takes a lot of time. After we deploy our system on an Apache server, we can use multithreading to accelerate it.
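For reference, the three distance functions named above can be written in a few lines. This is a plain-Python sketch on dense feature vectors of equal length, not the project's implementation; treating a zero vector as maximally distant in `cosine_distance` is an assumption made here.

```python
import math

def manhattan(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Straight-line distance (L2 distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 - cosine similarity; depends on direction only, not on length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 1.0  # assumption: a zero vector counts as maximally distant
    return 1.0 - dot / norm

# Two hypothetical profiles with the same field mix but different totals.
light_user = [1.0, 2.0]
heavy_author = [2.0, 4.0]
```

Cosine distance treats `light_user` and `heavy_author` as nearly identical because only the mix of fields matters, while Manhattan and Euclidean distance also penalize the difference in total weight. This is one reason the choice of metric can change the ranking.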

Thanks!