Smartphone apps usually have access to sensitive user data such as contacts, geo-location, and account credentials and they might share such data to external entities through the Internet or with other apps. Confidentiality of user data could be breached if there are anomalies in the way sensitive data is handled by an app which is vulnerable or malicious. Existing approaches that detect anomalous sensitive data flows have limitations in terms of accuracy because the definition of anomalous flows may differ for different apps with different functionalities; it is normal for "Health" apps to share heart rate information through the Internet but is anomalous for "Travel" apps.
In this paper, we propose a novel approach to detect anomalous sensitive data flows in Android apps, with improved accuracy. To achieve this objective, we first group trusted apps according to the topics inferred from their functional descriptions. We then learn sensitive information flows with respect to each group of trusted apps. For a given app under analysis, anomalies are identified by comparing sensitive information flows in the app against those flows learned from trusted apps grouped under the same topic. In the evaluation, information flow is learned from 11,796 trusted apps. We then checked for anomalies in 596 new (benign) apps and identified 2 previously-unknown vulnerable apps related to anomalous flows. We also analyzed 18 malware apps and found anomalies in 6 of them. PDF version of the paper.