UNIVERSITY OF JYVÄSKYLÄ
Yevgeniy Ivanchenko
University of Jyväskylä
yeivanch@cc.jyu.fi
2004

OBJECTIVES (I)
• Nothing is known about the decision mechanism of NeuroSearch, so we need to look inside the algorithm to understand its behavior.
• Nothing is known about the behavior of NeuroSearch in a dynamic environment, so we need to study it under conditions that approximate real-life situations.

OBJECTIVES (II)
• To understand the behavior of NeuroSearch, data analysis techniques were used. The Self-Organizing Map (SOM) is a well-known tool for data mining tasks.
• A set of rules was obtained from the analysis of NeuroSearch and tested in a static environment. The question that arises is: can an algorithm that exploits properties of a static environment be used in a dynamic scenario?

OBJECTIVES (III)
• If we know the inner structure of the decision mechanism of NeuroSearch, we can determine the contribution of every input to a particular decision of the algorithm. This can be used, for example, to remove unnecessary input information.
• It can also help to evaluate the complexity and robustness of the algorithm.

SOM (I)
• The SOM is a neural network model that maps a high-dimensional space onto a low-dimensional (usually two-dimensional) space.
• After training, similar vectors from the input space are located near each other in the output space. This helps to investigate the properties of the resulting clusters and, as a consequence, the causes that produced these clusters on the output map.

SOM (II)
• A SOM is usually a hexagonal or rectangular grid of neurons. In the figure, R1 and R2 denote different neighborhood sizes.
• During training, the neighborhood size is gradually decreased to allow more accurate adjustment of the neuron weights.
[Figure: grid of neurons with neighborhoods R1 and R2]

SOM (III)
• In the figure, the neurons ‘covered’ by the neighborhood kernel function move closer to the input vector.
• The Best Matching Unit (BMU) is the neuron closest to the current input vector.
• The neuron weights are updated according to the kernel function and the distance to the BMU.
[Figure: neurons around the BMU moving toward the input vector]
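The update described on this slide can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study: it assumes a rectangular grid, a Gaussian neighborhood kernel, and squared Euclidean distance for BMU selection, and the names `best_matching_unit`, `som_step`, `learning_rate` and `radius` are chosen here for illustration.

```python
import math

def best_matching_unit(weights, x):
    """Return the grid position (row, col) of the neuron closest to x."""
    best, best_dist = None, float("inf")
    for pos, w in weights.items():
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_dist:
            best, best_dist = pos, d
    return best

def som_step(weights, x, learning_rate, radius):
    """Move every neuron toward x, scaled by a Gaussian kernel centred on the BMU.

    The Gaussian kernel is an assumption; the slides do not name the kernel.
    """
    bmu = best_matching_unit(weights, x)
    for pos, w in weights.items():
        grid_dist_sq = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        h = math.exp(-grid_dist_sq / (2.0 * radius ** 2))
        weights[pos] = [wi + learning_rate * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu

# 3x3 rectangular grid with 2-D weights; the centre neuron starts near the input.
weights = {(r, c): [0.0, 0.0] for r in range(3) for c in range(3)}
weights[(1, 1)] = [0.9, 0.9]
bmu = som_step(weights, [1.0, 1.0], learning_rate=0.5, radius=1.0)
```

In a full training loop, `radius` (and usually `learning_rate`) would be reduced over time, which corresponds to the shrinking neighborhood mentioned on the previous slide.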

DATA ANALYSIS (I)
• NeuroSearch can be considered the main part of the information model of the system. The system was built using the black box method: we model its external behavior without knowing the causes of any particular behavior.
• To investigate the decision mechanism of NeuroSearch, its input-output pairs were analyzed using a SOM.

DATA ANALYSIS (II)
• The analysis used component planes and a U-matrix with the ‘hit’ distribution on it. The component planes visualize the values of each component of the vectors across the output map. The U-matrix is one of the possible ways to visualize the output map. The ‘hits’ on the U-matrix correspond to the decisions of NeuroSearch.
• This approach lets us investigate not only the contribution of each component to a particular decision but also the correlations between components.
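The three views named on this slide can be computed from a trained map as follows. This is a sketch under stated assumptions: a rectangular grid stored as a nested list, 4-connected neighbors, and Euclidean distance; the function names `u_matrix`, `component_plane` and `hit_map` are illustrative and do not come from the study's tooling.

```python
import math

def u_matrix(weights):
    """Average Euclidean distance from each neuron to its 4-connected grid neighbours.

    High values mark borders between clusters, low values mark cluster interiors.
    """
    rows, cols = len(weights), len(weights[0])
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            u[r][c] = sum(dists) / len(dists) if dists else 0.0
    return u

def component_plane(weights, k):
    """Value of the k-th weight component at every grid position."""
    return [[w[k] for w in row] for row in weights]

def hit_map(weights, samples):
    """Count how many samples have each neuron as their best matching unit."""
    rows, cols = len(weights), len(weights[0])
    hits = [[0] * cols for _ in range(rows)]
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    for x in samples:
        r, c = min(positions, key=lambda rc: math.dist(weights[rc[0]][rc[1]], x))
        hits[r][c] += 1
    return hits

# Tiny 2x2 map with 2-D weights; the bottom-right neuron is far from the rest.
weights = [[[0.0, 0.0], [0.0, 1.0]],
           [[1.0, 0.0], [5.0, 5.0]]]
u = u_matrix(weights)
plane0 = component_plane(weights, 0)
hits = hit_map(weights, [[0.1, 0.1], [4.0, 5.0]])
```

To relate hits to decisions, one would compute a separate hit map for the samples of each decision class and overlay it on the U-matrix.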

DATA ANALYSIS (III)
• The figure shows the U-matrix (left) and a fragment of the component planes (right).
• It is easy to see that the variable From is responsible for stopping further forwarding of queries where its value is 1.
• Other variables take different values in the area where From is 1; for example, toUnsearchedNeighbors varies across this area.
[Figure: U-matrix; component planes for toUnsearchedNeighbors and From]

DATA ANALYSIS (IV)
• The analysis showed that four variables (From, toVisited, Sent and currentVisited) are responsible for stopping further forwarding of queries.
• The variables toUnsearchedNeighbors and Neighbors are correlated.
• The variables packetsNow and Hops are highly correlated.
• The variables fromNeighborAmount, packetsNow and Hops are correlated to some extent.
• NeuroSearch mostly does not forward queries if Neighbors or toUnsearchedNeighbors is small.

DATA ANALYSIS (V)
• Further investigation of the algorithm is based on Hops, because only this variable shows the state of the algorithm at a particular time. In other words, by analyzing the values of this variable we can follow queries along their paths.
• The maximum length of a query path is 7, so there are 7 cases to analyze.
• The data for each case contain only samples with the value of Hops under investigation. All samples where at least one of From, Sent, currentVisited or toVisited equals 1 were also removed, because the behavior of the algorithm in these areas is already known.
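The per-case data preparation described on this slide might look like the sketch below. It assumes that each sample is a dictionary keyed by variable name and that Hops takes the values 1 to 7; both are assumptions made for illustration, and `samples_for_hop` and `split_by_hops` are hypothetical helper names.

```python
# Variables whose value 1 already explains a "stop" decision (named on the slide).
STOP_VARIABLES = ("From", "Sent", "currentVisited", "toVisited")

def samples_for_hop(samples, hops):
    """Keep samples with the given Hops value and no stop variable equal to 1."""
    return [
        s for s in samples
        if s["Hops"] == hops and not any(s[name] == 1 for name in STOP_VARIABLES)
    ]

def split_by_hops(samples, max_hops=7):
    """Build one data set per Hops value (assumed here to run from 1 to max_hops)."""
    return {h: samples_for_hop(samples, h) for h in range(1, max_hops + 1)}

# Made-up samples, only to show the filtering.
samples = [
    {"Hops": 1, "From": 0, "Sent": 0, "currentVisited": 0, "toVisited": 0},
    {"Hops": 1, "From": 1, "Sent": 0, "currentVisited": 0, "toVisited": 0},
    {"Hops": 3, "From": 0, "Sent": 0, "currentVisited": 0, "toVisited": 0},
]
cases = split_by_hops(samples)
```

Each entry of `cases` would then be used to train and inspect a separate map.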

DATA ANALYSIS (VI)
• After investigating the algorithm for the different values of Hops, we produced the Rule Based Algorithm (RBA). RBA is based on rules extracted by analyzing the U-matrix and the corresponding component planes.
• The general strategy of the algorithm is quite simple: a decision is mostly based on the interplay between the Hops, Neighbors/toUnsearchedNeighbors and NeighborsOrder values. At the beginning the algorithm sends queries to the most connected nodes. As the number of hops of a query increases, NeuroSearch gradually starts to forward queries to less connected nodes.
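The slides do not list the extracted rules, so the following is only a toy rule set that mimics the general strategy described above. The threshold `min_unsearched`, the encoding of NeighborsOrder (0 for the best-connected neighbor), the assumption that Hops starts at 1, and the comparison against Hops are all invented for illustration and should not be read as the actual RBA.

```python
# Variables that stop forwarding when equal to 1 (named in the analysis).
STOP_VARIABLES = ("From", "toVisited", "Sent", "currentVisited")

def should_forward(sample, min_unsearched=2):
    """Toy rule-based decision.

    The threshold and the rank rule are invented for illustration;
    they are NOT the rules extracted in the study.
    """
    if any(sample[name] == 1 for name in STOP_VARIABLES):
        return False
    if sample["toUnsearchedNeighbors"] < min_unsearched:
        return False
    # Assumed encoding: NeighborsOrder 0 is the best-connected neighbor and
    # larger values are less connected neighbors. Early hops reach only the
    # top-ranked neighbors; later hops reach further down the ranking.
    return sample["NeighborsOrder"] < sample["Hops"]

def make_sample(hops, order, unsearched=5, **flags):
    """Build a sample dictionary with all stop variables at 0 unless overridden."""
    sample = {name: 0 for name in STOP_VARIABLES}
    sample.update(Hops=hops, NeighborsOrder=order, toUnsearchedNeighbors=unsearched)
    sample.update(flags)
    return sample

early_top = should_forward(make_sample(hops=1, order=0))
early_low = should_forward(make_sample(hops=1, order=3))
late_low = should_forward(make_sample(hops=5, order=3))
```

The point of the sketch is the shape of the decision: hard stop conditions first, then a connectivity check that relaxes as the hop count grows.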

DATA ANALYSIS (VII)
The table shows the efficiency of four algorithms. NeuroSearch and RBA have almost the same level of performance. This means that RBA has adopted the behavior of NeuroSearch, and that the SOM is well suited for analyzing NeuroSearch. Both algorithms perform better than BFS-2 and BFS-3.
Comparison between algorithms
Algorithm     Packets   Replies
BFS-2            3000       619
BFS-3           12464      1325
NeuroSearch      4703       979
RBA              4904       963
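The figures in the table can be compared with a short calculation. The slides do not define how efficiency is measured, so the replies-per-packet ratio below is an assumption made for illustration, as are the helper names; only the packet and reply counts come from the table.

```python
# Packets and replies per algorithm, copied from the comparison table.
results = {
    "BFS-2": (3000, 619),
    "BFS-3": (12464, 1325),
    "NeuroSearch": (4703, 979),
    "RBA": (4904, 963),
}

def replies_per_packet(packets, replies):
    """Illustrative efficiency measure (an assumption, not defined in the slides)."""
    return replies / packets

def relative_difference(value, reference):
    """Signed difference of value from reference, as a fraction of reference."""
    return (value - reference) / reference

efficiency = {name: replies_per_packet(p, r) for name, (p, r) in results.items()}

ns_packets, ns_replies = results["NeuroSearch"]
rba_packets, rba_replies = results["RBA"]
packet_gap = relative_difference(rba_packets, ns_packets)
reply_gap = relative_difference(rba_replies, ns_replies)

for name, value in efficiency.items():
    print(f"{name:<12} {value:.3f} replies per packet")
print(f"RBA vs NeuroSearch: packets {packet_gap:+.1%}, replies {reply_gap:+.1%}")
```

On these numbers RBA uses roughly 4% more packets than NeuroSearch and returns under 2% fewer replies, which is consistent with the statement that the two perform at almost the same level. The ratio is only one possible reading of "efficiency"; a comparison may also weigh the absolute number of replies.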

DYNAMIC ENVIRONMENT (I)
• Since RBA is based on the decision mechanism of NeuroSearch, the behavior of NeuroSearch in a dynamic environment can be evaluated using RBA.
• A P2P extension for NS-2 was built as the simulation environment. The environment provides a high rate of dynamic change.
• Two classes of probabilities define the dynamic changes in the network. The first class is defined randomly before the simulation starts. The second is defined by the formulas:

DYNAMIC ENVIRONMENT (II)
To make a qualitative evaluation of performance, RBA was compared with BFS-2 and BFS-3 in static and dynamic environments. The number of replies and the number of packets used in the static environment are shown in the figures:

DYNAMIC ENVIRONMENT (III)
• In the static environment, RBA mostly locates more resources than BFS-2 and significantly fewer than BFS-3.
• In general, RBA uses more packets than BFS-2 and significantly fewer than BFS-3.
• This result is satisfactory because RBA is based on the decision mechanism of NeuroSearch, which is trained to locate only half of the available resources.
• At some points RBA locates more resources than BFS-3 while using fewer packets. This means that if a resource is not common in the network, RBA, and consequently NeuroSearch, can still find enough instances of it.

DYNAMIC ENVIRONMENT (IV)
The number of replies and the number of packets used in the dynamic environment are shown in the figures:
The figures show that the performance of the algorithms did not suffer much in the dynamic environment.

DYNAMIC ENVIRONMENT (V)
The total numbers of located resources (replies) and used packets in the static and dynamic environments are shown in the table:
Algorithm   Packets (static)   Packets (dynamic)   Replies (static)   Replies (dynamic)
BFS-2                   3000                2515                619                 528
BFS-3                  12464               10040               1325                1245
RBA                     4904                4865                963                 900
The algorithms can still find enough resources in the dynamic environment. Two possible causes explain why all investigated algorithms found slightly fewer resources:
1) Some offline nodes could have contained the queried resources.
2) Some offline nodes could have been on a possible path of the query.

DYNAMIC ENVIRONMENT (VI)
• The algorithms used fewer packets in the dynamic environment than in the static environment.
• The BFS strategy is very sensitive to the size of the network: the BFS-based algorithms used significantly fewer packets in the dynamic environment, where the network was smaller throughout the simulation.
• RBA used approximately the same number of packets in both environments, so RBA is not strongly sensitive to the size of the network.
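The sensitivity claims on this slide can be checked against the static and dynamic totals reported on the previous slide. The sketch below computes the relative change for each algorithm; the data are copied from that table, while the helper name `relative_change` and the dictionary layout are chosen here for illustration.

```python
# (static, dynamic) values copied from the table of packets and replies.
table = {
    "BFS-2": {"packets": (3000, 2515), "replies": (619, 528)},
    "BFS-3": {"packets": (12464, 10040), "replies": (1325, 1245)},
    "RBA": {"packets": (4904, 4865), "replies": (963, 900)},
}

def relative_change(static, dynamic):
    """Change from the static to the dynamic value, as a fraction of the static value."""
    return (dynamic - static) / static

changes = {
    name: {metric: relative_change(*pair) for metric, pair in row.items()}
    for name, row in table.items()
}

for name, row in changes.items():
    print(f"{name:<6} packets {row['packets']:+.1%}  replies {row['replies']:+.1%}")
```

The packet counts drop by roughly 16% for BFS-2 and 19% for BFS-3, but by less than 1% for RBA, which matches the observation that RBA is much less sensitive to the network size than the BFS-based algorithms.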

FUTURE WORK
• Developing a supervised approach to train NeuroSearch.
• Developing a modification of the algorithm for ad hoc wireless P2P networks.
• Examining the inner structure of the algorithm in more detail using knowledge discovery methods.
• Investigating the properties of other P2P algorithms to determine whether they can be added to NeuroSearch.

Thank you!