Finding the Hidden Scenes Behind Android Applications

Joey Allen, Xiangyu Niu
University of Tennessee

Introduction

Mobile devices have become an integral part of everyday life, and third-party mobile applications satisfy a variety of user needs. A single device that can meet so many needs is convenient, but it also leaves the user susceptible to having sensitive data and information leaked or intercepted. A malicious developer can use a misleading application description to deceive a naïve user into downloading a deceitful application. The app’s functionality may be totally different from what the user expected, and it may steal sensitive information or damage the device. This threat has created an urgent need for a mechanism to autonomously categorize third-party applications.

Figure 1: APPIC Overview [1].

[Chart: Accuracy vs. Category (LDA Model). y-axis: Accuracy (0 to 1); x-axis: Categories (Arcade & Action through Travel & Local, plus overall); series: 2 Tags and 3 Tags.]

[Chart: Accuracy vs. Categories (AT Model). y-axis: Accuracy (0 to 1); x-axis: Categories.]

The accuracy for APPIC was calculated according to the flowchart, where CI = Correct Inference and II = Incorrect Inference. The flowchart steps are:

    User reads application description.
    Compare APPIC tags with author’s tags.
    APPIC and author correctly categorize the app: CI + 1.
    APPIC incorrectly categorizes the app: II + 1.
    APPIC finds the app in the wrong category: CI + 1.
    APPIC and author incorrectly categorize the app: II + 1.
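The CI/II bookkeeping in the accuracy flowchart can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `AppResult` record, its field names, and the `tally` and `accuracy` helpers are hypothetical, the "author is correct" judgment is assumed to come from a manual review, and the final ratio CI / (CI + II) is an assumed definition of accuracy that the poster does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppResult:
    """One evaluated app (hypothetical record, not from the poster)."""
    appic_tags: frozenset      # categories inferred by APPIC
    author_category: str       # category chosen by the app's author
    author_is_correct: bool    # assumed manual judgment of the author's choice


def tally(results):
    """Count correct (CI) and incorrect (II) inferences per the flowchart."""
    ci = ii = 0
    for r in results:
        # APPIC "agrees" with the author when the author's category
        # appears among the tags APPIC inferred.
        agrees = r.author_category in r.appic_tags
        if agrees == r.author_is_correct:
            # Agrees with a correct author, or flags a wrong category.
            ci += 1
        else:
            # Disagrees with a correct author, or repeats the author's mistake.
            ii += 1
    return ci, ii


def accuracy(ci, ii):
    """Assumed definition: share of correct inferences among all inferences."""
    total = ci + ii
    return ci / total if total else 0.0
```

Each of the four flowchart outcomes maps to one combination of `agrees` and `author_is_correct`, so a single comparison is enough to decide which counter to increment.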
Approach

The approach used in this research compares two methods of autonomously categorizing applications. The first method is Latent Dirichlet Allocation (LDA) and the second is the Author-Topic (AT) Model. LDA is a generative probabilistic model of a corpus [2]; in this research, the corpus consists of each app’s description and permission file. The Author-Topic Model extends LDA to include authorship information [3].

Conclusion

LDA performed better in general. As the number of tags increased, the accuracy increased, as expected. However, more tags lead to a less secure system. The AT Model did not perform as well as LDA in categorizing applications, but it can still be useful for finding authors that create similar applications.

References

[1] Y. Yang, J. S. Sun, and M. W. Berry, “APPIC: Finding The Hidden Scene Behind Description Files for Android Apps.”
[2] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
[3] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth, “The author-topic model for authors and documents,” in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2004, pp. 487–494.

Acknowledgements

Yinyuan Yang for providing the permissions database. This work was supported primarily by the Engineering Research Center Program of the National Science Foundation and the Department of Energy under NSF Award Number EEC-1041877 and the CURENT Industry Partnership Program.