Machine Learning Approach to Report Prioritization with an Application to Travel Time Dissemination
Piotr Szczurek, Bo Xu, Jie Lin, Ouri Wolfson

Agenda
• Background
• Model and Problem Definition
• Machine Learning Approach
• Application – Travel Time Dissemination
• Results
• Conclusion

Background
• Technology in vehicles
  – Computers
  – GPS
  – Communication devices (802.11p, C2C)
• Sensing of environment
  – Video cameras
  – GPS
  – Temperature
  – Automobile status: brake sensors, accelerometers

Background
• Dissemination of information
  – Limited by connectivity and bandwidth
• Store-and-forward communication
  – Information is stored in a local database of limited size
  – Addresses connectivity issues
• Prioritization
  – Not all information may be communicated
  – Not all information may be stored
  – Need to select the most useful information to be kept and communicated

Model and Problem Definition
• System:
  – Set of mobile nodes: physical entities capable of data computation, storage, and short-range wireless communication
  – Nodes observe the environment through a sensing device (e.g. GPS)
• Reports:
  – Data derived from the sensing device
  – Fixed set of attributes and their values
    • (all reports have fixed size)
  – Created over time by nodes
  – Examples:
    • Speed report (average speed, timestamp, vehicle id)
    • Parking space report (parking meter id, availability)
  – Once created, stored in the report database

Model and Problem Definition
• Report database
  – Local database maintained by each node
  – Limited in size
• Communication
  – Reports stored in the database are communicated over time to a subset of other nodes in the network
  – Broadcast communication: reports are sent to all nodes within transmission range
  – Communication protocol: decides when and how many reports to broadcast
  – Remaining question: which reports should be broadcast?

Model and Problem Definition
• Relevance value
  – Utility a report would hold if it were sent to other nodes, given the sending node’s current characteristics and the attribute values of the report
  – Highly application specific, difficult to specify
  – Value of a report can change over time
  – Can be a range of values (0..1) or Boolean (0 or 1)
  – Example: parking space availability (0 for occupied, 1 for not occupied)
• What to broadcast and keep in the report database?
  – Find the relevance value for each report and keep (or broadcast) the highest valued reports
• Problem: finding the relevance value of a report

Machine Learning Approach
• Idea: use received reports as input to a machine learning process
• Assumptions:
  – Nodes can judge the relevance of a report after it is received
  – Relevance value is based on a goal common to all nodes
• Method description:
  – Define a goal on which the relevance value is based
  – Relevance value of a report is then defined by how much the report contributes to achieving the goal
  – For every incoming report, use the report’s attributes and the sender’s characteristics as input values. Use the relevance value as output. This creates a training example.
  – Use a supervised machine learning algorithm to find a model for mapping inputs to outputs
  – Use the learned model to find the relevance value of a report

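The method above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `make_example`, `train`, and `predict_relevance` are hypothetical, the learner is a plain stochastic-gradient logistic regression chosen for brevity (the slides do not prescribe one at this point), and the toy feature values are invented.

```python
import math
from typing import List, Sequence, Tuple

# One training example: (input values, relevance label in {0, 1}).
Example = Tuple[List[float], int]


def make_example(report_attrs: Sequence[float],
                 sender_state: Sequence[float],
                 relevance: int) -> Example:
    """Concatenate report attributes and sender characteristics into one input vector."""
    return (list(report_attrs) + list(sender_state), relevance)


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(examples: List[Example], epochs: int = 500, lr: float = 0.1):
    """Fit weights and bias by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = _sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = y - p
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias


def predict_relevance(model, x: Sequence[float]) -> float:
    """Estimated relevance value in (0, 1) for an input vector."""
    weights, bias = model
    return _sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy data (invented): normalized report age as the report attribute,
# normalized distance as the sender characteristic.
examples = [
    make_example([0.0], [0.1], 1),
    make_example([0.1], [0.2], 1),
    make_example([0.9], [1.0], 0),
    make_example([1.0], [0.8], 0),
]
model = train(examples)
```

Once trained, `predict_relevance(model, x)` plays the role of the learned relevance value and can be used to rank reports.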
Machine Learning Approach
• Two ways of learning:
  – Online: models are updated while training
  – Offline:
    • First, collect training examples
    • Second, use the learned model
• Offline learning
  – Advantage: nodes do not incur the overhead of learning
  – Disadvantage: model is not adaptable
  – Can also be used to bootstrap online learning
  – Used for finding useful attributes
• Research questions:
  – Can the relevance value be learned?
  – What advantage does the learned model offer?

Application – Travel Time Dissemination
• Assume every vehicle in the system carries GPS and an on-board computer with communication capabilities (e.g. 802.11b)
• Each vehicle has a known destination to which it travels along the shortest path
• Vehicles measure travel times on road segments as they traverse them
• Travel times are encapsulated in reports. Each report contains:
  – Report ID
  – Road segment ID
  – Travel time
  – Time of measurement
• Reports are stored in a reports database of limited size (200 reports)
• The reports database is a list of all received or generated reports. The list is ranked by a ranking function. If the database size is exceeded, the lowest ranked report is discarded.
• Example of ranking function: r = 1/ageOfReport

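A bounded, ranked report database of this kind might look like the sketch below. The class and field names are hypothetical, times are assumed to be in minutes, and two details are assumptions not stated on the slide: a report of age zero is given infinite rank (since 1/0 is undefined), and a report whose ID is already stored is ignored.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Report:
    report_id: int
    segment_id: int
    travel_time: float   # minutes
    measured_at: float   # minutes since an arbitrary reference time


def rank_by_age(report: Report, now: float) -> float:
    """Example ranking function r = 1/ageOfReport (age 0 ranks highest; assumption)."""
    age = now - report.measured_at
    return float("inf") if age <= 0 else 1.0 / age


class ReportDatabase:
    def __init__(self, capacity: int = 200,
                 rank: Callable[[Report, float], float] = rank_by_age):
        self.capacity = capacity
        self.rank = rank
        self.reports: List[Report] = []

    def insert(self, report: Report, now: float) -> None:
        """Store a report, re-rank the list, and discard the lowest ranked overflow."""
        if any(r.report_id == report.report_id for r in self.reports):
            return  # assumption: already-stored reports are ignored
        self.reports.append(report)
        self.reports.sort(key=lambda r: self.rank(r, now), reverse=True)
        del self.reports[self.capacity:]

    def top_k(self, k: int, now: float) -> List[Report]:
        """Return the k highest ranked reports, e.g. as candidates for broadcast."""
        return sorted(self.reports, key=lambda r: self.rank(r, now), reverse=True)[:k]
```

Passing a different `rank` callable is the hook where a learned ranking function could replace the age heuristic.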
Application – Travel Time Dissemination
• Reports are disseminated over VANET
• Incoming and newly generated reports are used to update a digital map
• Digital map contains:
  – Road segment identifier
  – Coordinates of the segment endpoints
  – Road type
  – Travel time estimate (average of all reports for the latest time interval; initially free-flow)
  – List of reports used for the estimate
  – Time period number (indicates 5-minute interval; initially -1)

Application – Travel Time Dissemination
• Travel time updates
  – Executed at the end of each 5-minute interval
  – All reports generated or received within that interval are used
  – For each road segment in the digital map:
    • Reports for the most current period are identified. All others are discarded.
    • Report period number is then compared with that in the digital map:
      >: Time period is updated and all reports are inserted in the list. Travel time estimate is the average of all inserted reports.
      <: All reports are discarded.
      =: All reports are inserted in the list; duplicates are discarded. Travel time estimate is the average of all inserted reports.
  – After each update, vehicles recalculate the shortest path to their destination

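The per-segment update rule can be sketched as follows. The names `SegmentEntry` and `update_segment` are hypothetical, and one detail is an assumption: when a newer period arrives, the reports held for the older period are dropped before the new ones are inserted, so that the estimate reflects only the latest interval.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SegmentEntry:
    travel_time: float                 # current estimate; initially free-flow
    period: int = -1                   # 5-minute period number; initially -1
    reports: Dict[int, float] = field(default_factory=dict)  # report_id -> travel time


def update_segment(entry: SegmentEntry,
                   incoming: List[Tuple[int, int, float]]) -> None:
    """Apply one interval's reports, given as (report_id, period, travel_time)."""
    if not incoming:
        return
    latest = max(period for _, period, _ in incoming)
    # Keep only reports for the most current period; all others are discarded.
    current = [(rid, tt) for rid, period, tt in incoming if period == latest]
    if latest < entry.period:
        return                         # "<": all reports are discarded
    if latest > entry.period:
        entry.period = latest          # ">": time period is updated
        entry.reports = {}             # assumption: older-period reports are dropped
    for rid, tt in current:
        entry.reports.setdefault(rid, tt)   # duplicates are discarded
    entry.travel_time = sum(entry.reports.values()) / len(entry.reports)
```

For instance, an entry holding one 11-minute report for period 2 that then receives a 10-minute report for the same period ends up with an estimate of 10.5 minutes.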
Application – Travel Time Dissemination
• Communication
  – Based on the TrafficInfo algorithm
  – Combination of flooding and periodic broadcasting
  – Flooding for freshly created reports
  – Periodic broadcasting of a subset of reports from the report database. The subset is chosen based on the ranking function: the K highest ranked reports are chosen.
  – Size of subset (K) is determined by the Good Citizen Formula
    • Based on transmission range, node density, and last broadcast time
  – Broadcast period is determined based on transmission range and vehicle velocity

Application – Travel Time Dissemination
• Example:
  1. Vehicle A just traversed road segment 123 at 1:04 pm (time period 2). The recorded travel time was 10 minutes.
  2. Vehicle A creates a report with ID 1, using the measured travel time. Report contains:
    • Report ID (1)
    • Road segment ID (123)
    • Travel time (10 minutes)
    • Time of measurement (1:04 pm)
  3. Vehicle A updates its digital map at 1:05 pm. Currently, it holds no reports for segment 123. The following changes are applied for road segment 123:
    • Travel time estimate = 10 minutes
    • List of reports: [report 1]
    • Time period = 2

Application – Travel Time Dissemination
• Example (continued):
  4. Vehicle A broadcasts report 1 at 1:06 pm (in time period 3).
  5. Vehicle B receives the report.
  6. Vehicle B updates its digital map at 1:10 pm. It currently has one report (report 2) for segment 123, with a travel time of 11 minutes, for time period 2. The following changes are applied for road segment 123:
    • Travel time estimate = 10.5 minutes
    • List of reports: [report 1], [report 2]

Application – Travel Time Dissemination
• Learning the ranking function (offline)
  – Goal of application: vehicles choose the best (shortest) paths to their destinations
  – Relevance of a report: a report is good when it changes the shortest path
    • 0 if the report does not change the shortest path
    • 1 if the report changes the shortest path
  – Attributes:
    • Age of report in time periods
    • Distance to road segment, in terms of free-flow travel time from the vehicle’s current position to the road segment contained in the report
    • Road type: either highway or city street

Application – Travel Time Dissemination
• Learning (continued)
  – Learning examples created artificially by emulating different scenarios
  – 25 learning epochs:
    • Each epoch had vehicles placed randomly on a road network (region of Chicago)
    • Random destination for each vehicle
    • All vehicles have a digital map with one report containing the free-flow travel time and a random period number between 0 and 100
    • A random segment is chosen from the road network. Its travel time is chosen from a uniform distribution between 0 and the free-flow travel time
    • 101 reports are created for each vehicle, with ages 0..100
    • Each report is about the chosen road segment and contains the assigned travel time
    • Every vehicle applies the 101 reports independently. After each is applied, it is checked whether the shortest path would change.
    • If the report would change the path, a positive training example is created; otherwise a negative training example is created
  – Two road networks were used (from different regions of Chicago). On the smaller region, 100 vehicles were used; 250 were used for the larger region.

Application – Travel Time Dissemination
• Learning (continued)
  – Weka learning toolkit was used for learning
  – Negative examples were downsampled to match positives
  – 7677 positive and 7677 negative examples
  – 5 classifiers were tested:
    • Naïve Bayesian (using NaiveBayes Weka implementation)
    • Logistic Regression (using Logistic Weka implementation)
    • Support Vector Machines (using SMO Weka implementation, w/ buildLogisticModels enabled)
    • Artificial Neural Network (using MultilayerPerceptron Weka implementation)
    • Decision Tree (using J48 Weka implementation)

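Downsampling the negatives to match the positives can be sketched as below. This is not the Weka procedure used in the talk; `downsample_negatives` is a hypothetical helper that draws a uniform random subset of negatives, with a fixed seed assumed for reproducibility.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (attribute values, label in {0, 1})


def downsample_negatives(examples: List[Example], seed: int = 0) -> List[Example]:
    """Return a class-balanced set: all positives plus an equal-sized random sample of negatives."""
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    rng = random.Random(seed)
    kept = rng.sample(negatives, min(len(positives), len(negatives)))
    balanced = positives + kept
    rng.shuffle(balanced)  # avoid a positives-first ordering in the training set
    return balanced
```

Balancing matters here because most artificially generated reports do not change the shortest path, so an unbalanced set would reward a classifier that always predicts 0.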
Results
• 10-fold cross validation
• All algorithms, with the exception of decision trees, performed similarly, with an accuracy of approximately 83-84%
• The decision trees had the best accuracy, 96.22%
  – But unusable model: complex tree with most leaf nodes being homogeneous
• Logistic regression model most understandable:
  U = -0.0322*age - 0.02*distance + 0.3885*[road=highway] - 0.3885*[road=city street] + 4.9053

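The logistic regression model can be evaluated directly from its coefficients. The sketch below assumes that U is the log-odds of a report being relevant, so that the logistic function maps it to a probability; the slide does not state this explicitly, nor the unit of `distance` (a free-flow travel time). The function names are hypothetical.

```python
import math


def utility(age: float, distance: float, road_type: str) -> float:
    """U from the slide: age in time periods, distance as free-flow travel time."""
    if road_type == "highway":
        road_term = 0.3885
    elif road_type == "city street":
        road_term = -0.3885
    else:
        raise ValueError(f"unknown road type: {road_type!r}")
    return -0.0322 * age - 0.02 * distance + road_term + 4.9053


def relevance_probability(age: float, distance: float, road_type: str) -> float:
    """Logistic transform of U (assumption: U is the log-odds of relevance)."""
    return 1.0 / (1.0 + math.exp(-utility(age, distance, road_type)))
```

Because the age and distance coefficients are negative, older and more distant reports score lower, and a highway report scores 0.777 higher than an otherwise identical city street report.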
Results
• 3 logistic regression models were derived:
  1. Using region 1 examples
  2. Using region 2 examples
  3. Using examples from both regions
• No major difference in coefficients across the models, except for road type
  – This shows that the relevance values depend on the makeup of the road network
• Usefulness of derived models in report prioritization
  – Tested using the SWANS/STRAW simulator
  – 100 vehicles were randomly placed in region 1. Each travelled to random destinations for 1 hour.
  – Majority of highway segments had reduced speed limits (simulated accident scenario)
  – Number of broadcast reports limited to 10
  – Two evaluation metrics were used:
    • Average Trip Time: average time to reach destination
    • Total Path Travel Time Difference: calculated by taking the absolute value of the difference between the travel time along the shortest path given the vehicle’s current knowledge and the travel time given full knowledge
  – Compared to common heuristics (1/(age+distance), used by TrafficInfo)

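To make the comparison concrete, the sketch below ranks the same reports with the 1/(age+distance) heuristic and with the logistic regression utility U reported on the earlier results slide, then selects the K highest ranked for broadcast. The tuple layout, function names, and sample values are hypothetical; which of the three derived models was used in the simulation is not stated, so the coefficients here are only illustrative.

```python
from typing import Callable, List, Tuple

# (report_id, age in time periods, distance as free-flow travel time, road type)
ReportInfo = Tuple[int, float, float, str]


def heuristic_rank(report: ReportInfo) -> float:
    """Common heuristic 1/(age+distance); a zero denominator ranks highest (assumption)."""
    _, age, distance, _ = report
    total = age + distance
    return float("inf") if total <= 0 else 1.0 / total


def learned_rank(report: ReportInfo) -> float:
    """Logistic regression utility U, using the coefficients quoted in the results."""
    _, age, distance, road_type = report
    road_term = 0.3885 if road_type == "highway" else -0.3885
    return -0.0322 * age - 0.02 * distance + road_term + 4.9053


def select_for_broadcast(reports: List[ReportInfo],
                         rank: Callable[[ReportInfo], float],
                         k: int = 10) -> List[ReportInfo]:
    """Pick the k highest ranked reports (the simulation limited broadcasts to 10)."""
    return sorted(reports, key=rank, reverse=True)[:k]


reports = [
    (1, 2.0, 2.0, "city street"),
    (2, 5.0, 10.0, "highway"),
]
```

With these two invented reports and `k=1`, the heuristic selects the nearby city street report, while the learned utility selects the older, more distant highway report, because of the road type term.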
Conclusion
• Proposed a machine learning approach to report prioritization for use in peer-to-peer environments
• Uses incoming reports to provide input to supervised machine learning algorithms
• Learned model can then be used by all nodes to rank the reports to be disseminated
• Accurate prediction is feasible
• Learned model outperformed heuristics in terms of disseminating the information most likely to affect the vehicle’s path