Article

Understanding the Purpose of Permission Use in Mobile Apps


Abstract

Mobile apps frequently request access to sensitive data, such as location and contacts. Understanding why sensitive data is accessed could help improve privacy as well as enable new kinds of access control. In this article, we propose a text-mining-based method to infer the purpose of sensitive data access by Android apps. The key idea is to extract multiple features from app code and then use those features to train a machine learning classifier for purpose inference. We present the design, implementation, and evaluation of two complementary approaches to infer the purpose of permission use, first using purely static analysis, and then using primarily dynamic analysis. We also discuss the pros and cons of both approaches and the trade-offs involved.
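The text-based feature idea can be pictured in a few lines of Python: split identifiers harvested from decompiled code into words, then score them against purpose keyword lists. Everything here is illustrative; the category names and keyword sets are hypothetical, and the paper trains a real machine learning classifier rather than this keyword-count stand-in.

```python
import re

def split_identifier(name):
    """Split a camelCase/snake_case identifier into lowercase words."""
    parts = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    return [w.lower() for w in parts.split() if w]

# Hypothetical keyword lists per purpose category (illustrative only).
PURPOSE_KEYWORDS = {
    "advertising": {"ad", "ads", "banner", "interstitial"},
    "navigation": {"map", "route", "navigate", "gps"},
    "social": {"friend", "contact", "share", "invite"},
}

def infer_purpose(identifiers):
    """Pick the purpose with the largest keyword overlap with code-derived tokens."""
    tokens = [t for ident in identifiers for t in split_identifier(ident)]
    scores = {p: sum(t in kw for t in tokens) for p, kw in PURPOSE_KEYWORDS.items()}
    return max(scores, key=scores.get)

print(infer_purpose(["loadBannerAd", "AdRequestBuilder"]))  # advertising
```

In the actual method, tokens like these would become features for a trained classifier rather than being matched against fixed lists.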


... Based on the observation that users cannot understand the purpose of permissions based only on descriptions, a recent study focuses on inferring this information from an app's code and behaviour [Wang et al., 2017a]. In the static analysis, two types of features are extracted from the code: app-specific features, which include permission-related APIs, Intents, and Content Providers; and text-based features. ...
... Besides metadata, app code and behaviours could be analyzed for this problem [Wang et al., 2017a]. A recent study shows that text-based features extracted from the code or the call stack help find the reasons for permission usage in an app. ...
... Static/dynamic analysis of both custom code and third-party libraries should be considered. While the purpose of a permission usage could be too coarse-grained or too fine-grained depending on the technique (static or dynamic analysis, respectively) used in [Wang et al., 2017a], this could be further investigated in future studies. ...
Preprint
Full-text available
Android is among the platforms most targeted by attackers. While attackers are improving their techniques, traditional solutions based on static and dynamic analysis have also been evolving. In addition to the application code, Android applications have some metadata that could be useful for security analysis of applications. Unlike traditional application distribution mechanisms, Android applications are distributed centrally in mobile markets. Therefore, besides application packages, such markets contain app information provided by app developers and app users. The availability of such useful textual data, together with the advancement in Natural Language Processing (NLP) that is used to process and understand textual data, has encouraged researchers to investigate the use of NLP techniques in Android security. In particular, security solutions based on NLP have accelerated in the last five years and proven to be useful. This study reviews these proposals and aims to explore possible research directions for future studies by presenting the state of the art in this domain. We mainly focus on NLP-based solutions under four categories: description-to-behaviour fidelity, description generation, privacy, and malware detection.
... y context (Yin et al., 2015). The restricted access and limited control theory (RALC) indicates that users manage privacy by restricting access to and limiting control over information (Tavani, 2008; Tavani and Moor, 2001). Therefore, the mechanism that enforces restrictions around smartphone users' information privacy is critical (Armando et al., 2015; H. Wang et al., 2017). ...
... The phrasing of app permissions influences privacy decisions. Accepting a wrong permission results in unauthorised access and secondary use of data (Boateng et al., 2019; Degirmenci, 2020; H. Wang et al., 2017). Additionally, permission requests are a gateway to users' information and thus reflect a core characteristic of the smartphone. ...
Thesis
Full-text available
The level of sensitivity with which smartphone users perceive information influences their privacy decisions. Information sensitivity is complex to understand due to the multiple factors influencing it. Adding to this complexity is the intimate nature of smartphone usage, which produces personal information about various aspects of users’ lives. Users perceive information differently, and this plays an important role in determining responses to privacy risks. The different levels of perceived sensitivity in turn point out how users could be uniquely supported through information cues that will enhance their privacy. However, several studies have tried to explain information sensitivity and privacy decisions by focusing on single-factor analysis. The current research adopts a different approach by exploring the influences of the disclosure context (smartphone ecosystem), three critical factors (economic status, location tracking, app permission requests), and privacy attributes (privacy guardian, pragmatist, and privacy unconcerned) for a more encompassing understanding of how smartphone user categories in the UK perceive information. The analysis of multiple factors unearths deep complexities and provides a nuanced understanding of how information sensitivity varies across categories of smartphone users. Understanding how user categories perceive information enables tailored privacy. Tailored privacy moves from “one-size-fits-all” to tailoring support to users and their context. The present research applied Straussian grounded theory to analyse the qualitative interview data collected from 47 UK university graduates who are smartphone users. The empirical research findings show that smartphone users can be characterised into eight categories. However, the category a user belongs to depends on the influencing factor or the information (identity or financial) involved and the privacy concern category of the user.
This study proposes a middle-range theory for understanding smartphone users’ perception of information sensitivity. Middle-range theories are testable propositions resulting from an in-depth focus on a specific subject matter by looking at the attributes of individuals. The propositions show that an effective privacy support model for smartphone users should consider the varying levels of information sensitivity. Therefore, the study argues that users who perceive information as highly sensitive require privacy assurance to strengthen privacy, whereas users who perceive information as less sensitive require appropriate risk awareness to mitigate privacy risks. The proposition provides insight that could support tailored privacy for smartphone users.
... Wang et al. [12] propose a data-mining-based model for understanding why applications ask for their permissions. They make use of static analysis to extract permission-related features. There are also a few studies that focused on automatic security-centric description generation [12], [13]. NLP has started to be used in malware detection as well. ...
Conference Paper
Full-text available
Android gives us the opportunity to extract meaningful information from metadata. From the security point of view, missing important information in the metadata of an application could be a sign of a suspicious application, which could be directed for extensive analysis. In particular, the usage of dangerous permissions is expected to be explained in app descriptions. The permission-to-description fidelity problem in the literature aims to discover such inconsistencies between the usage of permissions and descriptions. This study proposes a new method based on natural language processing and recurrent neural networks. The effect of user reviews on finding such inconsistencies is also investigated in addition to application descriptions. The experimental results show that high precision is obtained by the proposed solution, and the proposed method could be used for triage of Android applications.
... Kynoid [35] extends TaintDroid with user-defined security policies, such as restrictions on the destination IP addresses to which data are released. Wang et al. [3,36] extended TaintDroid to enforce fine-grained access control based on the purpose of permission use in Android apps. ...
... Third-party libraries. Developers include third-party libraries in their apps to enrich the functionality or meet other purposes [36]. Third-party libraries have occupied a large portion of code in Android apps [50]. ...
Article
Full-text available
Permission-related issues in Android apps have been widely studied in our research community, but most previous studies considered these issues from the perspective of app users. In this paper, we take a different angle and revisit permission-related issues from the perspective of app developers. First, we perform an empirical study investigating how we can help developers make better decisions on permission use during app development. With detailed experimental results, we show that many permission-related issues can be identified and fixed during the application development phase. In order to help developers identify and fix these issues, we develop PerHelper, an IDE plugin that automatically infers candidate permission sets, which helps guide developers to set permissions more effectively and accurately. We integrate permission-related bug detection into PerHelper and demonstrate its applicability and flexibility through case studies on a set of open-source Android apps.
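A candidate permission set of this kind can be approximated by mapping framework APIs found in the code to the permissions they require. The sketch below uses a hypothetical three-entry mapping and plain substring search; the real PScout-style mappings cover thousands of APIs, and PerHelper's actual inference is more sophisticated.

```python
# Hypothetical, heavily abridged API-to-permission mapping (illustrative only).
API_PERMISSIONS = {
    "getLastKnownLocation": "ACCESS_FINE_LOCATION",
    "Camera.open": "CAMERA",
    "ContactsContract.Contacts": "READ_CONTACTS",
}

def candidate_permissions(source_code):
    """Infer a candidate permission set from APIs referenced in app code."""
    return sorted({perm for api, perm in API_PERMISSIONS.items()
                   if api in source_code})

code = "Location loc = locationManager.getLastKnownLocation(provider);"
print(candidate_permissions(code))  # ['ACCESS_FINE_LOCATION']
```

A tool built on this idea would flag permissions declared in the manifest but absent from the inferred set (over-privilege) and vice versa.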
... Indeed, more than 70% of popular apps in Google Play use advertising libraries [13]. The purposes of sensitive behaviors used in custom code and TPLs are usually different [14], [15]. As shown in Figure 1 (2), the sensitive behaviors used in TPLs would dilute the behaviors of custom code. ...
... Ma et al. [29] extended CHABADA by proposing an active and semi-supervised approach to detect malware using both known benign and malicious apps. Wang et al. [15], [36], [37] proposed to infer the purpose of permission use. However, none of the previous studies consider the impact of TPLs when checking whether an app behaves as advertised. ...
... Another code analysis technique, PrivacyGrade [80], decompiles the code of mobile apps to identify anomalies arising from sensitive API calls made by third-party libraries. Furthermore, researchers in [81] apply text mining to app code to determine the purpose of accessing the user's location and contact list. These code analysis techniques neither modify app functionality at runtime nor perform PII filtering. ...
Article
Full-text available
The current age is witnessing an unprecedented dependence on data originating from humans through the devices that comprise the Internet of Things. The data collected by these devices are used for many purposes, including predictive maintenance, smart analytics, preventive healthcare, disaster protection, and increased operational efficiency and performance. However, most applications and systems that rely on user data to achieve their business objectives fail to comply with privacy regulations and expose users to numerous privacy threats. Such privacy breaches raise concerns about the legitimacy of the data being processed. Hence, this paper reviews some notable techniques for transparently, securely, and privately separating and sharing personally identifiable and non-personally identifiable information in various domains. One of the key findings of this study is that, despite various advantages, none of the existing techniques or data sharing applications preserve data/user privacy throughout the data life cycle. Another significant issue is the lack of transparency for data subjects during the collection, storage, and processing of private data. In addition, as privacy is unique to every user, there cannot be a single autonomous solution to identify and secure personally identifiable information for users of a particular application, system, or people living in different states/countries. Therefore, this research suggests a way forward to prevent the leakage of personally identifiable information at various stages of the data life cycle in compliance with some of the common privacy regulations around the world. The proposed approach aims to empower data owners to select, share, monitor, and control access to their data. In addition, the data owner is a stakeholder and a party to all data sharing contracts related to his personal data. The proposed solution has broad security and privacy controls that can be tailored to the privacy needs of specific applications.
... In mobile task automation, the key is to understand the functionalities of the apps. Prior work has explored extracting such functionality information from code [60], GUI [25], network traffic [17], and metadata [13], but the results are mostly too coarse-grained to facilitate task completion. Our method uses LLMs to process the raw traces collected by a dynamic app analyzer [30] to obtain a fine-grained description of each GUI element. ...
Preprint
Mobile task automation is an attractive technique that aims to enable voice-based hands-free user interaction with smartphones. However, existing approaches suffer from poor scalability due to the limited language understanding ability and the non-trivial manual efforts required from developers or end-users. The recent advance of large language models (LLMs) in language understanding and reasoning inspires us to rethink the problem from a model-centric perspective, where task preparation, comprehension, and execution are handled by a unified language model. In this work, we introduce AutoDroid, a mobile task automation system that can handle arbitrary tasks on any Android application without manual efforts. The key insight is to combine the commonsense knowledge of LLMs and domain-specific knowledge of apps through automated dynamic analysis. The main components include a functionality-aware UI representation method that bridges the UI with the LLM, exploration-based memory injection techniques that augment the app-specific domain knowledge of LLM, and a multi-granularity query optimization module that reduces the cost of model inference. We integrate AutoDroid with off-the-shelf LLMs including online GPT-4/GPT-3.5 and on-device Vicuna, and evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks. The results demonstrated that AutoDroid is able to precisely generate actions with an accuracy of 90.9%, and complete tasks with a success rate of 71.3%, outperforming the GPT-4-powered baselines by 36.4% and 39.7%. The demo, benchmark suites, and source code of AutoDroid will be released at https://autodroid-sys.github.io/.
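The functionality-aware UI representation can be pictured as serializing GUI elements into compact numbered text lines that fit in an LLM prompt. The element fields below are assumptions for illustration; AutoDroid's real representation is richer and also injects app-specific knowledge gathered during exploration.

```python
def ui_to_prompt(elements):
    """Render a simplified UI element list as numbered text for an LLM prompt.

    `elements` is a hypothetical list of dicts with `class`, `text`, and
    `content_desc` fields; real accessibility trees carry far more detail.
    """
    lines = []
    for i, el in enumerate(elements):
        desc = el.get("text") or el.get("content_desc") or el["class"]
        lines.append(f"[{i}] {el['class']}: {desc}")
    return "\n".join(lines)

screen = [
    {"class": "Button", "text": "Send"},
    {"class": "EditText", "content_desc": "Message input"},
]
print(ui_to_prompt(screen))
```

The LLM can then answer "which element index should be tapped for task X?", which the automation system maps back to a concrete UI action.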
... Wang et al. [32] downloaded 830 apps from Google Play and the Baidu App Market, a third-party market in China, and analyzed the intended use of the READ_CONTACTS and ACCESS_FINE_LOCATION permissions. READ_CONTACTS allows access to stored contacts, and ACCESS_FINE_LOCATION to location information. ...
Article
As the use of mobile devices for financial payments continues to increase, prevention of attacks by Android malware becomes critical. In this study, we collected apps shared on Twitter in 2018 to create a Twitter shared apps dataset. We also clarified the proportion of apps that contained malware and those utilizing accessibility services (ASs). Furthermore, we faced issues in determining whether an app is suspicious or benign using VirusTotal results and extracting permissions from apps. Using VirusTotal, we studied the distribution of the number of apps for each anti-virus engine that “detected” malware and analyzed the changes in results over time to determine thresholds. Additionally, we examined methods for extracting permissions using Apktool and the aapt command, installed apps that were judged to request ASs and examined whether they were requested to obtain more accurate results. We analyzed the usage rate of ASs and the requested permissions. Furthermore, we analyzed the target Android application program interface levels of suspicious apps that use ASs.
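The threshold analysis described above can be sketched as labeling an app suspicious once the number of anti-virus engines flagging it reaches a cutoff. The cutoff of 4 below is illustrative, not the threshold the study settled on.

```python
def label_apps(detections, threshold=4):
    """Label an app suspicious when at least `threshold` engines flag it.

    `detections` maps app id -> number of anti-virus engines reporting
    malware (e.g., from a VirusTotal report); the threshold is hypothetical.
    """
    return {app: ("suspicious" if n >= threshold else "benign")
            for app, n in detections.items()}

print(label_apps({"app_a": 0, "app_b": 7, "app_c": 3}))
```

Sweeping the threshold over the observed detection-count distribution, as the study does, shows how sensitive the suspicious/benign split is to the choice of cutoff.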
... Mobile apps frequently request access to sensitive information, such as the unique device ID, location data, and contact lists. Android currently requires developers to declare what permissions an app uses [26]. AndroidManifest.xml is the manifest file of an Android application; it describes each component of the application and defines each component's properties, such as its name, theme, launch mode, behaviour, and launch intent. ...
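Declared permissions can be read directly from the manifest. The sketch below parses a minimal, hypothetical AndroidManifest.xml with Python's standard library; note that permission names live in the `android:` XML namespace.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"

def declared_permissions(manifest_xml):
    """List permissions declared via <uses-permission> in an Android manifest."""
    root = ET.fromstring(manifest_xml)
    # Attribute keys are namespace-qualified: {namespace}name
    return [el.attrib[f"{{{ANDROID_NS}}}name"]
            for el in root.iter("uses-permission")]

manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.READ_CONTACTS"/>
  <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION"/>
</manifest>"""

print(declared_permissions(manifest))
```

In a real APK the manifest is stored in a binary XML format, so a tool such as Apktool or aapt is needed to recover the plain-text form first.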
Article
Full-text available
Most current malware detection methods running on Android are based on signature and cloud technologies, leading to poor protection against new types of malware. Deep learning techniques take Android malware detection to a new level. Still, most deep learning-based Android malware detection methods are too inefficient or even unworkable on Android devices due to their high resource consumption. Therefore, this paper proposes MSFDroid, a lightweight multi-source fast Android malware detection model, which uses information from the internal files of the Android application package in several dimensions to build base models for ensemble learning. Meanwhile, this paper proposes an adaptive soft voting method that dynamically adjusts the weights of each base model to overcome the noise generated by traditional soft voting and thus improves performance. It also proposes an adaptive shrinkage convolutional unit that can dynamically adjust the convolutional kernel’s weight and the activation function’s threshold to improve the expressiveness of the CNN. The proposed method is tested on public datasets and on several real devices. The experimental results show that it achieves a better trade-off between performance and efficiency, significantly improving detection speed while achieving performance comparable to other deep learning methods.
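Soft voting with per-model weights can be sketched as a weighted average of the base models' class probabilities. MSFDroid's adaptive variant adjusts the weights dynamically; the weights below are fixed purely for illustration.

```python
def weighted_soft_vote(probabilities, weights):
    """Weighted soft voting: average class probabilities across base models.

    `probabilities` holds one probability vector per base model; `weights`
    holds one weight per model (here fixed, not adaptively learned).
    """
    total = sum(weights)
    n_classes = len(probabilities[0])
    avg = [sum(w * p[c] for w, p in zip(weights, probabilities)) / total
           for c in range(n_classes)]
    return avg.index(max(avg))  # predicted class id

# Three base models, two classes (0 = benign, 1 = malware).
probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
print(weighted_soft_vote(probs, weights=[0.2, 0.4, 0.4]))  # 1
```

Down-weighting an unreliable model (here the first one) can flip the decision relative to unweighted voting, which is exactly the noise problem adaptive weighting targets.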
... Some studies have used third-party library detection techniques to find security vulnerabilities [65,77,104,115] in Android apps, while others have focused on privacy leaks [21,60,61,80,169]. Third-party libraries have also been detected and removed as noise in clone, app repackaging, and malicious app detection studies [27,94,150,178,181]. ...
Thesis
In the current era of ubiquitous technology usage, gadgets like smartphones, tablets, and laptops are widely used. Since all of these devices are battery operated, the question of energy efficiency has become one of the crucial parameters when users select a device. Energy efficiency aims at reducing the amount of energy required when providing products and services. The energy efficiency of a digital device has become part of its overall perceived quality. Empirical studies have shown that mobile apps that do not drain battery usually get good ratings from users. Many studies have been published that present refactoring guidelines and tools to optimize the code to make mobile apps energy efficient. However, these guidelines cannot be generalized w.r.t. energy efficiency, as there is not enough energy-related data for every context. Existing energy enhancement tools and profilers are mostly prototypes applicable to only a small subset of energy-related problems. In addition, existing guidelines and tools focus on addressing energy issues a posteriori, i.e., once they have already been introduced into the code. Android app code can be roughly divided into two parts: the custom code and the reusable code. Custom code is unique to each app. Reusable code includes third party libraries that are included in apps to speed up the development process. As compared to desktop or web applications, Android apps contain multiple components that have user-driven workflows. A typical Android app consists of activities, fragments, services, content providers, and broadcast receivers. Due to the difference in architecture, the support tools used to develop traditional Java-based applications are not so useful in Android app development and maintenance. We start by evaluating the energy consumption of various code smell refactorings in native Android apps. Then we conduct an empirical study on the impact of third-party network libraries used in Android apps. 
By analysing commonly used third-party network libraries in various usage scenarios, we show that the energy consumption of these libraries differs significantly. We discuss results and provide generalized contextual guidelines that could be used during app development. Further, we conduct a systematic literature review to identify and study the current state of the art support tools available to aid the development of green Android apps. We summarize the scope and limitations of these tools and highlight research gaps. Based on this study and the experiments we conducted before, we highlight the problems in capturing and reproducing hardware-based energy measurements. We develop support tool ARENA (Analysing eneRgy Efficiency in aNdroid Apps) for the Android Studio IDE that could help in gathering energy data and analyzing the energy consumption of Android apps. Last, we develop the support tool REHAB (Recommending Energy-efficient tHird-pArty liBraries) for the Android Studio IDE to recommend energy efficient third-party network libraries to developers during development.
... Some studies have used third-party library detection techniques to find security vulnerabilities [38]-[41] in Android apps, while others have focused on privacy leaks [42]-[46]. Third-party libraries have also been detected and removed as noise in clone, app repackaging, and malicious app detection studies [47]-[51]. ...
Chapter
Mobile applications are developed with limited battery resources in mind. To build energy-efficient mobile apps, many support tools have been developed which aid developers during the development and maintenance phases. To understand what is already available and what is still needed to support green Android development, we conducted a systematic mapping study to overview the state of the art and to identify further research opportunities. After applying inclusion/exclusion and quality criteria, we identified tools for detecting/refactoring code smells/energy bugs, and for detecting/migrating third-party libraries in Android applications. The main contributions of this study are: (1) classification of identified tools based on the support they offer to aid green Android development, (2) classification of the identified tools based on techniques used to offer support to developers, and (3) characterization of the identified tools based on the user interface, IDE integration, and availability. The most important finding is that the tools for detecting/migrating third-party libraries in Android development do not provide support to developers to optimize code w.r.t. energy consumption, which merits further research.
... A large number of studies have analyzed mobile apps from security and privacy aspects, including malware detection (Zhang et al. 2014; Feng et al. 2014; Arp et al. 2014), permission and privacy analysis (Wang et al. 2015b, c, 2017a), repackaging and fake app detection (Wang et al. 2015a; Hu et al. 2020), and identifying and analyzing third-party libraries (Li et al. 2017b; Wang et al. 2017a), etc. Besides, some researchers in our community have analyzed specific types of mobile apps. For example, Hu et al. (2019) analyzed the ecosystem of fraudulent dating apps, i.e., apps whose sole purpose is to lure users into purchasing premium/VIP services to start conversations with other (likely fake female) accounts in the app. ...
Article
Full-text available
As the COVID-19 pandemic emerged in early 2020, a number of malicious actors started capitalizing on the topic. Although a few media reports mentioned the existence of coronavirus-themed mobile malware, the research community lacks an understanding of the landscape of coronavirus-themed mobile malware. In this paper, we present the first systematic study of coronavirus-themed Android malware. We first create a daily-growing COVID-19-themed mobile app dataset, which contains 4,322 COVID-19-themed apk samples (2,500 unique apps) and 611 potential malware samples (370 unique malicious apps) as of mid-November 2020. We then present an analysis of them from multiple perspectives, including trends and statistics, installation methods, malicious behaviors, and the malicious actors behind them. We observe that the COVID-19-themed apps, as well as malicious ones, began to flourish almost as soon as the pandemic broke out worldwide. Most malicious apps are camouflaged as benign apps using the same app identifiers (e.g., app name, package name, and app icon). Their main purposes are either stealing users’ private information or making profit by using tricks like phishing and extortion. Furthermore, only a quarter of the COVID-19 malware creators are habitual developers who have been active for a long time, while 75% of them are newcomers in this pandemic. The malicious developers are mainly located in the US, mostly targeting English-speaking countries, China, Arabic countries, and Europe. To facilitate future research, we have publicly released all the well-labelled COVID-19-themed apps (and malware) to the research community. To date, over 30 research institutes around the world have requested our dataset for COVID-19-themed research.
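Camouflage via reused app identifiers can be sketched as checking whether a sample reuses a known app's package name under a different signing certificate. The registry entries and certificate labels below are hypothetical; real checks compare actual signing certificates.

```python
# Hypothetical registry of legitimate package names and their signing certs.
OFFICIAL = {
    "com.who.covid": "cert_who",
    "gov.uk.nhs.covid19": "cert_nhs",
}

def is_camouflaged(package, cert):
    """Flag a sample that reuses an official package name but is signed
    with a different certificate, a common camouflage pattern."""
    return package in OFFICIAL and OFFICIAL[package] != cert

print(is_camouflaged("com.who.covid", "cert_attacker"))  # True
```

The same comparison can be extended to app names and icons (e.g., via perceptual hashing) to catch look-alikes that do not reuse the package name exactly.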
... A large number of papers have focused on using program analysis to detect the security [62], [63], [64], privacy [65], [66], [38], [67], [68], ads/third-party library [57], [9], [69], [11], and functionality issues [70], [71], [72], [73], [74] of mobile apps. In contrast, this paper focuses on a different perspective, i.e., how users feel about their experiences. ...
Preprint
Millions of mobile apps are available through various app markets. Although most app markets have enforced a number of automated or even manual mechanisms to vet each app before it is released to the market, thousands of low-quality apps still exist in different markets, some of which violate explicitly specified market policies. In order to identify these violations accurately and in a timely manner, we resort to user comments, which form immediate feedback for app market maintainers, to identify undesired behaviors that violate market policies, including security-related user concerns. Specifically, we present the first large-scale study to detect and characterize the correlations between user comments and market policies. First, we propose CHAMP, an approach that adopts text mining and natural language processing (NLP) techniques to extract semantic rules through a semi-automated process, and classifies comments into 26 pre-defined types of undesired behaviors that violate market policies. Our evaluation on real-world user comments shows that it achieves both high precision and recall (>0.9) in classifying comments for undesired behaviors. Then, we curate a large-scale comment dataset (over 3 million user comments) from apps in Google Play and 8 popular alternative Android app markets, and apply CHAMP to understand the characteristics of undesired-behavior comments in the wild. The results confirm our speculation that user comments can be used to pinpoint suspicious apps that violate policies declared by app markets. The study also reveals that policy violations are widespread in many app markets despite their extensive vetting efforts. CHAMP can be a whistle blower that assigns policy-violation scores and identifies the most informative comments for apps.
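Rule-based comment classification along these lines can be sketched with a few regex rules mapping comment text to behavior labels. The three rules below are made up for illustration; CHAMP derives its semantic rules semi-automatically and covers 26 undesired-behavior types.

```python
import re

# Hypothetical semantic rules (pattern, label); illustrative only.
RULES = [
    (re.compile(r"\b(steal|leak)\w*\b.*\b(data|info)\w*\b", re.I), "privacy violation"),
    (re.compile(r"\bads?\b.*\b(everywhere|pop.?up|full.?screen)\b", re.I), "intrusive ads"),
    (re.compile(r"\bcharg\w+\b.*\bwithout\b", re.I), "unexpected charges"),
]

def classify_comment(comment):
    """Return the first matching undesired-behavior label, else 'other'."""
    for pattern, label in RULES:
        if pattern.search(comment):
            return label
    return "other"

print(classify_comment("This app leaks my personal data to advertisers!"))
```

Aggregating such labels per app, and weighting by how many distinct users report the same behavior, yields the kind of policy-violation score the paper describes.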
... PrivacyGrade.org [16] proposes a method to grade smartphone apps from a privacy perspective by evaluating permission requirements. Privacy Bird (privacybird.org) is a tool that allows end-users to find out what web sites will do with their data by reading privacy policies written in the Platform for Privacy Preferences (P3P) standard. ...
Article
Full-text available
The design and development process for Internet of Things (IoT) applications is more complicated than for desktop, mobile, or web applications. IoT applications require both software and hardware to work together across multiple different types of nodes (e.g., microcontrollers, system-on-chips, mobile phones, miniaturized single-board computers, and cloud platforms) with different capabilities under different conditions. IoT applications typically collect and analyze personal data that can be used to derive sensitive information about individuals. Without proper privacy protections in place, IoT applications could lead to serious privacy violations. Thus far, privacy concerns have not been explicitly considered in software engineering processes when designing and developing IoT applications, partly due to a lack of tools, technologies, and guidance. This paper presents a research vision that argues for the importance of developing a privacy-aware IoT application design tool to address the challenges mentioned above. This tool should not only transform IoT application designs into privacy-aware application designs, but also validate and verify them. First, we outline how this proposed tool should work in practice and describe its core functionalities. Then, we identify research challenges and potential directions for developing the proposed tool. We anticipate that this proposed tool will save many engineering hours which engineers would otherwise need to spend on developing privacy expertise and applying it. We also highlight the usefulness of this tool toward privacy education and privacy compliance.
... The permissions of third-party libraries are also included in this study as one of the improvements over the same authors' previous study [57]. Based on the observation that users cannot understand the purpose of permissions based only on descriptions, a recent study focuses on inferring this information from an app's code and behaviours [49]. In the static analysis, two types of features are extracted from the code: app-specific features, which include permission-related APIs, Intents, and Content Providers; and text-based features. ...
Article
Full-text available
Since mobile applications make our lives easier, there are a large number of mobile applications customized for our needs in the application markets. While the application markets provide us a platform for downloading applications, they are also used by malware developers to distribute their malicious applications. In Android, permissions raise users’ awareness in order to prevent them from installing applications that might violate their privacy. From the privacy and security point of view, if the functionality of an application is described in sufficient detail, then the requirement for the requested permissions can be well understood. This is defined as description-to-permission fidelity in the literature. In this study, we propose two novel models that address inconsistencies between application descriptions and the requested permissions. The proposed models are based on the current state-of-the-art neural architectures called attention mechanisms. Here, we aim to find the permission statement words or sentences in app descriptions by using the attention mechanism along with recurrent neural networks. The lack of such permission statements in an application’s description creates suspicion. Hence, the proposed approach could assist static analysis techniques in finding suspicious apps and in prioritizing apps for more resource-intensive analysis techniques. The experimental results show that the proposed approach achieves high accuracy.
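A crude keyword-matching baseline for the same task would look for permission-indicating phrases in the description. The phrase lists below are hypothetical; the paper instead learns where such statements occur using attention over recurrent networks, which handles paraphrasing that fixed lists miss.

```python
# Hypothetical permission-indicator phrases (illustrative only).
PERMISSION_PHRASES = {
    "READ_CONTACTS": ["your contacts", "address book"],
    "ACCESS_FINE_LOCATION": ["your location", "nearby", "gps"],
}

def stated_permissions(description):
    """Return permissions whose usage seems to be explained in the description."""
    text = description.lower()
    return sorted(p for p, phrases in PERMISSION_PHRASES.items()
                  if any(ph in text for ph in phrases))

desc = "Find restaurants nearby and share them with your contacts."
print(stated_permissions(desc))  # ['ACCESS_FINE_LOCATION', 'READ_CONTACTS']
```

Comparing this set against the permissions declared in the manifest flags apps whose descriptions leave requested permissions unexplained.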
... Dynamic analysis approaches that require a modified or rooted OS include ProtectMyPrivacy [13], TaintDroid [16], and others [7,10,17,41,43,46]. Such tools are powerful but not suitable for mass adoption, since rooting a phone or installing a custom OS is not only a daunting task for the average user, but is also strongly discouraged by wireless providers and phone manufacturers. ...
Preprint
Full-text available
Mobile devices have access to personal, potentially sensitive data, and a large number of mobile applications and third-party libraries transmit this information over the network to remote servers (including app developer servers and third-party servers). In this paper, we are interested in a better understanding of not just the extent of personally identifiable information (PII) exposure, but also its context (i.e., the functionality of the app, the destination server, the encryption used, etc.) and the risk perceived by mobile users today. To that end, we take two steps. First, we perform a measurement study: we collect a new dataset via manual and automatic testing and capture the exposure of 16 PII types from the 400 most popular Android apps. We analyze these exposures and provide insights into the extent and patterns of mobile apps sharing PII, which can later be used for prediction and prevention. Second, we perform a user study with 220 participants on Amazon Mechanical Turk: we summarize the results of the measurement study in categories, present them in a realistic context, and assess users' understanding, concern, and willingness to take action. To the best of our knowledge, our user study is the first to collect and analyze user input at such fine granularity and on actual (not just potential or permitted) privacy exposures on mobile devices. Although many users did not initially understand the full implications of their PII being exposed, after being better informed through the study, they became appreciative and interested in better privacy practices.
... Fake Apps and Repackaged Apps. Although many research efforts have focused on security and privacy issues in the mobile app ecosystem [87,91,92,110,113,114,119], prior work on squatting attacks in app stores is rather limited. There have been a number of studies on fake apps and repackaged apps (app clones). ...
Conference Paper
Full-text available
... Besides mobile security, information from Android app GUIs has also been studied for software engineering. Recent researchers extract screen semantics from Android app GUIs and task flows from Android app GUI layouts to mine human-generated app traces [68], [69], add natural language interfaces [70]-[72], identify inconsistencies between intentions and app behaviors [73], [74], detect aggressive mobile advertisements [75]-[77], and build conversational bots [78], [79]. Liu et al. [68] introduced an automatic approach for generating semantic annotations for mobile app UIs, which could be used to develop new data-driven design applications and enable efficient flow search over large datasets of interaction-mining data. ...
Article
Full-text available
App cloning is a serious threat to the mobile app ecosystem: it not only damages the interests of original developers, but also contributes to spreading malware. App clone detection has received extensive attention from the research community, and a number of approaches have been proposed, mainly relying on the code or visual similarity of apps. However, plagiarists in the wild may deliberately modify the code or the content of the User Interface (UI), rendering current methods ineffective. In this paper, we propose a robust app clone detection method based on the similarity of UI structure. The key idea behind our approach is the finding that content features (e.g., background color) are likely to be modified by plagiarists, while structure features (e.g., the overall hierarchy structure and the widget hierarchy structure) are relatively stable and can be used to detect different levels of clone attacks. Experimental results on a labeled benchmark of 4,720 similar app pairs show that our approach achieves an accuracy of 99.6%. Compared with existing approaches, our approach works in practice with high effectiveness. We have implemented a prototype system and applied it to more than 404,650 app pairs, finding 1,037 app clone pairs, most of which are piggybacked apps that introduce malicious payloads.
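The core idea — compare structure while ignoring content — can be sketched as follows. This is a simplified stand-in for the paper's method: layout trees are reduced to widget-class paths and compared with Jaccard similarity, and the trees and widget names below are made up:

```python
def structure_paths(node, prefix=()):
    """Flatten a layout tree into root-to-widget paths, keeping only
    widget class names (structure), not content attributes like colors."""
    name, children = node
    path = prefix + (name,)
    paths = ["/".join(path)]
    for child in children:
        paths.extend(structure_paths(child, path))
    return paths

def structural_similarity(tree_a, tree_b):
    """Jaccard similarity of the two apps' structural path sets."""
    a, b = set(structure_paths(tree_a)), set(structure_paths(tree_b))
    return len(a & b) / len(a | b)

# Toy layouts as (widget_class, children) tuples. A clone keeps the
# hierarchy but changes content (text, colors), which this
# representation deliberately ignores.
original = ("LinearLayout", [("TextView", []), ("Button", []), ("ListView", [])])
clone    = ("LinearLayout", [("TextView", []), ("Button", []), ("ListView", [])])
other    = ("RelativeLayout", [("ImageView", [])])
```

Because content attributes never enter the representation, a plagiarist who only reskins the UI still produces a similarity of 1.0 against the original.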
... To infer how specific permissions are used in code, Wang et al. [28] apply text analysis and different supervised classifiers on a manually labeled set of 622 apps. They perform taint tracking using a customized version of TaintDroid [7] and recently extended [29] their approach to also handle obfuscated code as it is often [32] found in Android apps. McLaughlin et al. [17] interpret source code analysis as a form of textual processing and design a convolutional neural network (CNN) that captures semantic information from opcodes in Dalvik bytecode to detect malware. ...
Conference Paper
Full-text available
Permissions are a key factor in Android to protect users' privacy. As it is often not obvious why applications require certain permissions, developer-provided descriptions in Google Play and third-party markets should explain to users how sensitive data is processed. Reliably recognizing whether app descriptions cover permission usage is challenging due to the lack of enforced quality standards and a variety of ways developers can express privacy-related facts. We introduce a machine learning-based approach to identify critical discrepancies between developer-described app behavior and permission usage. By combining state-of-the-art techniques in natural language processing (NLP) and deep learning, we design a convolutional neural network (CNN) for text classification that captures the relevance of words and phrases in app descriptions in relation to the usage of dangerous permissions. Our system predicts the likelihood that an app requires certain permissions and can warn about descriptions in which the requested access to sensitive user data and system features is textually not represented. We evaluate our solution on 77,000 real-world app descriptions and find that we can identify individual groups of dangerous permissions with a precision between 71% and 93%. To highlight the impact of individual words and phrases, we employ a model explanation algorithm and demonstrate that our technique can successfully bridge the semantic gap between described app functionality and its access to security- and privacy-sensitive resources.
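A toy sketch of the CNN text-classification pipeline this abstract outlines — convolution over word windows, max-over-time pooling, and a logistic output. The weights are random and untrained, and the shapes are illustrative, not the paper's architecture or hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def cnn_text_score(embeddings: np.ndarray, filters: np.ndarray, w, b):
    """One convolutional layer over word windows, ReLU, max-over-time
    pooling, then a logistic unit giving a pseudo-probability that the
    description implies a dangerous permission."""
    n_words, _ = embeddings.shape
    n_filters, width, _ = filters.shape
    # Slide each filter across all word windows of size `width`.
    conv = np.array([
        [np.maximum(0.0, np.sum(filters[f] * embeddings[i:i + width]))
         for i in range(n_words - width + 1)]
        for f in range(n_filters)
    ])
    pooled = conv.max(axis=1)                 # max-over-time, shape (n_filters,)
    logit = pooled @ w + b
    return 1.0 / (1.0 + np.exp(-logit))      # sigmoid output in (0, 1)

words = rng.normal(size=(10, 16))            # 10-word description, 16-dim embeddings
filters = rng.normal(size=(4, 3, 16))        # 4 trigram filters
score = cnn_text_score(words, filters, rng.normal(size=4), 0.0)
```

In the actual system such a score per permission group is compared with the manifest to flag descriptions that fail to mention requested access.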
... The work in [61] proposes a static taint analysis method to identify, in Android sources, the sensitive APIs through which sensitive data flows. Although not strictly related to our approach, the idea of identifying data flows that originate from sensitive sources, together with the extraction of data related to app features [62], [63], [64], would permit our approach to (semi-)automatically map features to the Android components that implement them. ...
... Previous work [97] has suggested that more than 85% of Android apps published in vendor-customized phones suffer from this issue. Since permissions constitute an explicit declaration of what sensitive resources an app will use [91,93], over-privileging an app is undesirable because: (i) it is a violation of the principle of least privilege [2]; (ii) it exposes users to unnecessary permission warnings; and (iii) it increases the attack surface [44] and the impact of the presence of a bug or vulnerability [54]. Intuitively, this gap can be identified first by building a permission map that identifies what sensitive permissions are needed for each API call/Intent/Content Provider, and using static analysis to determine what permission-related invocations an app makes. ...
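The gap-identification step described in the snippet above — compare declared permissions against those actually needed by the invoked APIs — can be sketched directly. The permission map entries below are illustrative fragments (real maps such as PScout are extracted from the Android framework itself):

```python
# Hypothetical fragment of an API-to-permission map.
PERMISSION_MAP = {
    "LocationManager.getLastKnownLocation": {"ACCESS_FINE_LOCATION"},
    "SmsManager.sendTextMessage": {"SEND_SMS"},
    "Camera.open": {"CAMERA"},
}

def over_privileged(declared_permissions, invoked_apis):
    """Return permissions declared in the manifest but never justified
    by any permission-protected invocation found by static analysis."""
    needed = set()
    for api in invoked_apis:
        needed |= PERMISSION_MAP.get(api, set())
    return declared_permissions - needed

# An app declaring three dangerous permissions but only calling one
# protected API is over-privileged for the other two.
declared = {"ACCESS_FINE_LOCATION", "SEND_SMS", "READ_CONTACTS"}
invoked = ["LocationManager.getLastKnownLocation"]
extra = over_privileged(declared, invoked)
```

Each permission in `extra` unnecessarily widens the attack surface in the sense of points (i)-(iii) above.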
Conference Paper
Full-text available
China is one of the largest Android markets in the world. As Chinese users cannot access Google Play to buy and install Android apps, a number of independent app stores have emerged and compete in the Chinese app market. Some of the Chinese app stores are pre-installed vendor-specific app markets (e.g., Huawei, Xiaomi and OPPO), whereas others are maintained by large tech companies (e.g., Baidu, Qihoo 360 and Tencent). The nature of these app stores and the content available through them vary greatly, including their trustworthiness and security guarantees. As of today, the research community has not studied the Chinese Android ecosystem in depth. To fill this gap, we present the first large-scale comparative study that covers more than 6 million Android apps downloaded from 16 Chinese app markets and Google Play. We focus our study on catalog similarity across app stores, their features, publishing dynamics, and the prevalence of various forms of misbehavior (including the presence of fake, cloned and malicious apps). Our findings also suggest heterogeneous developer behavior across app stores, in terms of code maintenance, use of third-party services, and so forth. Overall, Chinese app markets perform substantially worse at taking active measures to protect mobile users and legitimate developers from deceptive and abusive actors, showing a significantly higher prevalence of malware, fake apps, and cloned apps than Google Play.
... A large number of studies focus on mobile app analysis, including security and privacy analysis [10,25,30,39,41,42,65,66,68], app repackaging detection [15,63], app quality analysis [37,50,54,58], third-party library detection [12,44,53,62,64], and mobile ad network analysis [17,20]. Most of these studies focus on one specific issue and lack a measurement-level study of the app ecosystem as a whole, although similar approaches could be used to understand the evolution of specific issues in the mobile app ecosystem. ...
Conference Paper
The continuing expansion of mobile app ecosystems has attracted considerable effort from the research community. However, although a large number of research studies have focused on analyzing the corpus of mobile apps and app markets, little is known at a comprehensive level about the evolution of mobile app ecosystems. Because the mobile app ecosystem is continuously evolving over time, understanding the dynamics of app ecosystems could provide unique insights that cannot be achieved by studying a single static snapshot. In this paper, we seek to shed light on the dynamics of mobile app ecosystems. Based on 5.3 million app records (with both app metadata and APKs) collected from three snapshots of Google Play over more than three years, we conduct the first study of the evolution of app ecosystems from different aspects. Our results suggest that although the overall ecosystem shows promising progress in regard to app popularity, user ratings, permission usage, and privacy policy declaration, there still exist a considerable number of unsolved issues, including malicious apps, update issues, third-party tracking threats, improper app promotion behaviors, and spamming/malicious developers. Our study shows that understanding the evolution of mobile app ecosystems can help developers make better decisions on developing and releasing apps, provide insights for app markets to identify misbehaviors, and help mobile users choose desired apps.
... They relied on the PScout mappings and reported accuracies of 85% and 94% for the two permissions. To overcome the obstacle of obfuscated code, they recently incorporated a dynamic analysis aspect and conducted a study on 830 apps [77]. While their approach has similarities with Reaper, it presents significant limitations. ...
Conference Paper
Android's app ecosystem relies heavily on third-party libraries as they facilitate code development and provide a steady stream of revenue for developers. However, while Android has moved towards a more fine-grained run time permission system, users currently lack the required resources for deciding whether a specific permission request is actually intended for the app itself or is requested by possibly dangerous third-party libraries. In this paper we present Reaper, a novel dynamic analysis system that traces the permissions requested by apps in real time and distinguishes those requested by the app's core functionality from those requested by third-party libraries linked with the app. We implement a sophisticated UI automator and conduct an extensive evaluation of our system's performance and find that Reaper introduces negligible overhead, rendering it suitable both for end users (by integrating it in the OS) and for deployment as part of an official app vetting process. Our study on over 5K popular apps demonstrates the large extent to which personally identifiable information is being accessed by libraries and highlights the privacy risks that users face. We find that an impressive 65% of the permissions requested do not originate from the core app but are issued by linked third-party libraries, 37.3% of which are used for functionality related to ads, tracking, and analytics. Overall, Reaper enhances the functionality of Android's run time permission model without requiring OS or app modifications, and provides the necessary contextual information that can enable users to selectively deny permissions that are not part of an app's core functionality.
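The attribution step Reaper performs — deciding whether a sensitive request originates from the app's core code or a linked third-party library — can be approximated offline as a stack-trace walk. The library prefixes and trace below are hypothetical, and Reaper's actual mechanism traces permission requests inside the OS at run time:

```python
# Hypothetical package prefixes of known third-party SDKs; real systems
# maintain curated lists of ad/analytics/tracking namespaces.
LIBRARY_PREFIXES = {
    "com.google.ads": "advertising",
    "com.flurry": "analytics",
    "com.facebook": "social SDK",
}

def attribute_permission_request(stack_trace, app_package):
    """Walk the call stack of a sensitive API call from the innermost
    frame outward: a frame in a known library namespace attributes the
    request to that library; a frame in the app's own package attributes
    it to core functionality."""
    for frame in stack_trace:
        for prefix, kind in LIBRARY_PREFIXES.items():
            if frame.startswith(prefix):
                return ("library", kind)
        if frame.startswith(app_package):
            return ("app", "core functionality")
    return ("unknown", None)

# An ad SDK requesting location from inside the app's activity.
trace = [
    "com.google.ads.AdLoader.fetchLocation",
    "com.example.game.MainActivity.onResume",
]
origin = attribute_permission_request(trace, "com.example.game")
```

This is the contextual signal that would let a user deny the 65% of requests the study attributes to libraries rather than core app code.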
... Security is also the topic of He et al.'s [47] study on limiting attack surfaces by providing guidelines to stakeholders, including both end-users and developers: the former are asked to carefully evaluate the permissions requested by apps upon installation, while the latter should have a clear understanding of the impact and attack surfaces introduced by certain app permissions. From the literature, we find that apps asking the end-user for sensitive permissions may not describe the purpose of or reason for requesting access to the permission-enabled data [48], making it difficult to know whether this data benefits the app owner or a third party more than it benefits the user [49]. The usage of in-app advertisement libraries is part of a large-scale review study conducted by Martin et al. [50], and these libraries are found to request additional app permissions, thus having the potential to act intrusively and maliciously. ...
Article
Full-text available
The purpose of this study is to report on the industry's perspectives and opinions on cross-platform mobile development, with an emphasis on the popularity, adoption, and emerging issues related to the use of technical development frameworks and tools. We designed and conducted an online survey questionnaire, for which 101 participants were recruited from various developer-oriented online forums and websites. A total of five questions are reported in this study, of which two employed a Likert scale instrument, while three were based on multiple choice. In terms of technical frameworks, we find that PhoneGap, the Ionic Framework, and React Native were the most popular in use, both in hobby projects and in professional settings. The participants report an awareness of trade-offs when embracing cross-platform technologies and consider penalties in performance and user experience to be expected. This is also in line with what is reported in academic research. We find patterns in the reported perceived issues which match both older and newer research, thus rendering the findings a point of departure for further endeavours.
... A large number of studies focused on the topic of mobile ad libraries in various directions including identifying ad libraries [8,39,41,46,72,73], detecting privacy and security issues introduced by ad libraries [16,30,57,62,77], analysing the impact of ad libraries [71,74,75,82], etc. We believe all the aforementioned approaches can be leveraged to supplement our work towards providing a better characterization of mobile ad frauds. ...
Conference Paper
Although mobile ad frauds are widespread, state-of-the-art approaches in the literature have mainly focused on detecting so-called static placement frauds, where only a single UI state is involved and frauds can be identified based on static information such as the size or location of ad views. Other types of fraud exist that involve multiple UI states and are performed dynamically while users interact with the app. Such dynamic interaction frauds, although now widespread in apps, have not yet been explored or addressed in the literature. In this work, we investigate a wide range of mobile ad frauds to provide a comprehensive taxonomy for the research community. We then propose FraudDroid, a novel hybrid approach to detect ad frauds in mobile Android apps. FraudDroid analyses apps dynamically to build UI state transition graphs and collects the associated runtime network traffic, which is then leveraged to check against a set of heuristic-based rules for identifying ad fraudulent behaviours. We show empirically that FraudDroid detects ad frauds with high precision (∼93%) and recall (∼92%). Experimental results further show that FraudDroid is capable of detecting ad frauds across the spectrum of fraud types. By analysing 12,000 ad-supported Android apps, FraudDroid identified 335 cases of fraud associated with 20 ad networks that were further confirmed to be true positives and shared with our fellow researchers to promote advanced ad fraud detection.
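A heuristic rule of the kind applied to static placement frauds might look like the sketch below. The geometry checks and the 0.5 area threshold are illustrative assumptions, not FraudDroid's actual rule set:

```python
def check_static_placement(ad_view, screen_w, screen_h):
    """Flag two classic static placement frauds: ads rendered entirely
    off-screen (unviewable but still billed) and ads covering most of
    the screen. Thresholds here are illustrative, not calibrated values."""
    x, y, w, h = ad_view
    violations = []
    if x + w <= 0 or y + h <= 0 or x >= screen_w or y >= screen_h:
        violations.append("hidden ad: rendered outside the visible screen")
    elif (w * h) / (screen_w * screen_h) > 0.5:
        violations.append("oversized ad: covers more than half the screen")
    return violations

# A 320x50 banner pushed just above the top edge of a 1080x1920 screen.
flags = check_static_placement((0, -50, 320, 50), 1080, 1920)
```

Dynamic interaction frauds, by contrast, need the UI state transition graph described above, since no single-state geometry check can catch them.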
... Also, in the past decade, much research has been done to design more effective mechanisms for privacy self-management. Such mechanisms include privacy nudges [3,27,32], which offer subtle yet persuasive cues to help users make the "right" decisions with minimal cognitive effort; tools to facilitate the understanding of privacy policies [8,24]; and better permission request schemes, such as constructing permissions based on the purpose of use [28]. Recognizing that users' privacy decisions often vary by demographics and context, there has also been work [12,13] that provides personalized support for privacy decisions based on predicted user preferences. ...
... PrivacyGrade is based on previous research [59,60] that used crowdsourcing and machine-learning techniques to analyze the privacy-related behaviors of mobile apps. The rationale behind PrivacyGrade is that whether sensitive permissions should be granted depends on the purpose of permission use in the app and the expectations of mobile users [81,82]. Based on a large amount of crowd-sourcing data, PrivacyGrade ascertains users' level of concern for data usage (e.g. ...
Conference Paper
To ensure the quality and trustworthiness of the apps within its app market (i.e., Google Play), Google has released a series of policies to regulate app developers. As a result, policy-violating apps (e.g., malware, low-quality apps, etc.) have been removed from Google Play periodically. In reality, we have found that the number of removed apps is actually much larger than expected, as almost half of all apps were removed or replaced from Google Play during the two-year period from 2015 to 2017. However, despite the significant number of removed apps, there is almost no study characterizing these removed apps. To this end, this paper takes the first step towards understanding why Android apps are removed from Google Play, aiming to derive promising insights for both market maintainers and app developers towards building a better app ecosystem. By leveraging two app sets crawled from Google Play in 2015 (over 1.5 million apps) and 2017 (over 2.1 million apps), we have identified a set of over 790K removed apps, which are then thoroughly investigated in various aspects. The experimental results reveal various interesting findings, as well as insights for future research directions.
... Many users use location-based service applications without reading the privacy policy agreement (provided by the application provider). Hence, the user alone is responsible for losing his information while accessing location-based services over the road network [23,24]. ...
... Ad libraries. A significant number of studies focused on the topic of ad libraries in various directions such as on discovering ad libraries [14,17,22], on detecting privacy leaks within ad libraries [11,16], on separating the privilege of ad libraries from host apps [20,24], and on pinpointing click frauds [8,10]. We believe all the aforementioned approaches can be leveraged to supplement our work towards providing a better characterization of violated ad policies. ...
Conference Paper
Advertisement libraries are used in almost two-thirds of the apps in Google Play. To increase economic revenue, some app developers tend to entice mobile users into unexpectedly clicking ad views during their interaction with the app, resulting in various kinds of ad fraud. Although some popular ad providers have published behavioral policies to prevent inappropriate behaviors/practices, no previous work has studied whether mobile apps comply with those policies. In this paper, we take Google Admob as the starting point to study policy-violating apps. We first analyze the behavioral policies of Admob and create a taxonomy of policy violations. Then we propose an automated approach to detect policy-violating apps, which takes advantage of two key artifacts: an automated model-based Android GUI testing technique and a set of heuristic rules summarized from the behavioral policies of Google Admob. We have applied our approach to 3,631 popular apps that use the Admob library, achieving a precision of 86% in detecting policy-violating apps. The results further show that roughly 2.5% of apps violate the policies, suggesting that behavioral policy violation is indeed a real issue in the Android advertising ecosystem.
... Its accuracy and usability can be continuously improved as more apps become available in the app stores and more user decisions are incorporated. • Obfuscation resilience: Previous research utilized code-level namespaces to build context-aware permission models [40,53,54]. However, commercial apps and malware often modify their class, method, and variable names to prevent reverse engineering, as shown in Figure 3. Malicious apps may further mimic the namespace of official Android packages to evade detection. ...
Article
Full-text available
Mobile operating systems adopt permission systems to protect system integrity and user privacy. In this work, we propose INSPIRED, an intention-aware dynamic mediation system for mobile operating systems with privacy-preserving capability. When a security- or privacy-sensitive behavior is triggered, INSPIRED automatically infers the underlying program intention by examining its runtime environment and decides whether to grant the relevant permission by matching it with user intention. We stress runtime contextual integrity by answering the following three questions: who initiated the behavior, when was the sensitive action triggered, and under what kind of environment was it triggered? Specifically, observing that mobile applications intensively leverage the user interface (UI) to reflect the underlying application functionality, we propose a machine learning based permission model using foreground information obtained from multiple sources. To precisely capture user intention, our permission model evolves over time and can be user-customized by continuously learning from user decisions. Moreover, by keeping and processing all of a user's behavioral data inside her own device (i.e., without sharing it with a third-party cloud for learning), INSPIRED is also privacy-preserving. Our evaluation shows that our model achieves both high precision and recall (95%) on 6,560 permission requests from both benign apps and malware. Further, it is capable of capturing users' specific privacy preferences with an acceptable median f-measure (84.7%) for 1,272 decisions from users. Finally, we show INSPIRED can be deployed on real Android devices to provide real-time protection with low overhead.
Article
The generation of large amounts of personal data provides data centers with sufficient resources to mine idiosyncrasy from private records. User modeling has long been a fundamental task with the goal of capturing the latent characteristics of users from their behaviors. However, centralized user modeling on collected data has raised concerns about the risk of data misuse and privacy leakage. As a result, federated user modeling has come into favor, since it promises secure multi-client collaboration for user modeling through federated learning. Unfortunately, to the best of our knowledge, existing federated learning methods that ignore the inconsistency among clients cannot be applied directly to practical user modeling scenarios; moreover, they meet the following critical challenges: 1) Statistical heterogeneity. The distributions of user data in different clients are not always independently and identically distributed (IID), which leads to unique clients with needful personalized information; 2) Privacy heterogeneity. User data contains both public and private information, which have different levels of privacy, indicating that we should balance the information shared and protected; 3) Model heterogeneity. The local user models trained with client records are heterogeneous, and thus require flexible aggregation in the server; 4) Quality heterogeneity. Low-quality information from inconsistent clients poisons the reliability of user models and offsets the benefit from high-quality ones, meaning that we should augment the high-quality information during the process. To address these challenges, in this paper we first propose a novel client-server architecture framework, namely Hierarchical Personalized Federated Learning (HPFL), with the primary goal of serving federated learning for user modeling in inconsistent clients.
More specifically, the client trains and delivers the local user model via hierarchical components containing hierarchical information from privacy heterogeneity to join the collaboration in federated learning. Moreover, the client updates the personalized user model with a fine-grained personalized update strategy to handle statistical heterogeneity. Correspondingly, the server flexibly aggregates hierarchical components from heterogeneous user models, in the case of privacy and model heterogeneity, with a differentiated component aggregation strategy. In order to augment high-quality information and generate high-quality user models, we extend HPFL to the Augmented-HPFL (AHPFL) framework by incorporating augmented mechanisms, which filter out low-quality information such as noise, sparse information, and redundant information. Specifically, we construct two implementations of AHPFL, i.e., AHPFL-SVD and AHPFL-AE, where the augmented mechanisms follow SVD (singular value decomposition) and AE (autoencoder), respectively. Finally, we conduct extensive experiments on real-world datasets, which demonstrate the effectiveness of both the HPFL and AHPFL frameworks.
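The selective, hierarchy-aware aggregation described here can be reduced to a minimal sketch: the server averages only the components clients mark as shareable, while private components never leave the device. This illustrates only that idea of selective aggregation, not the full HPFL strategy, and all names are hypothetical:

```python
import numpy as np

def aggregate_shared(client_models, shared_keys):
    """Server-side step: average only the components listed in
    `shared_keys`; components outside it (the private hierarchy levels)
    never reach the server."""
    return {key: np.mean([m[key] for m in client_models], axis=0)
            for key in shared_keys}

# Two clients, each holding a shareable component and a private one.
clients = [
    {"shared": np.array([1.0, 3.0]), "private": np.array([9.0])},
    {"shared": np.array([3.0, 5.0]), "private": np.array([0.0])},
]
agg = aggregate_shared(clients, ["shared"])   # only "shared" is averaged
```

A differentiated strategy as in the paper would additionally weight or transform each component per client rather than take a plain mean.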
Conference Paper
Data mining is capable of providing hidden, unknown, and interesting information in the form of knowledge in the healthcare industry. It is useful for building decision support systems for disease prediction and the valid diagnosis of health issues. The concepts of data mining can be used to recommend solutions and suggestions in the medical sector for precautionary measures to control disease origin at an early stage. Today, diabetes is one of the most common and life-threatening syndromes found all over the world. The presence of diabetes itself is a cause of many other health issues, in the form of side effects, in the human body. In such cases, there is a need to find the hidden data patterns in diabetic data to discover knowledge, so as to reduce the invisible health problems that arise in diabetic patients. Many studies have shown that the Associative Classification concept of data mining works well and can derive good outcomes in terms of prediction accuracy. This research work presents the experimental results of work carried out to predict and detect by-diseases in diabetic patients through the application of Associative Classification, and it discusses an improved algorithmic method of Associative Classification named Associative Classification using Maximum Threshold and Super Subsets (ACMTSS) to achieve better accuracy. Keywords: Knowledge · By-disease · Maximum threshold · Super subsets · ACMTSS · Associative Classification
Article
Full-text available
With the deployment of the 5G cellular system, the upsurge of diverse mobile applications and devices has increased the potential challenges and threats posed to users. Industry and academia have attempted to address cyber security challenges by implementing automated malware detection and machine learning algorithms. This study expands on previous research on machine learning-based mobile malware detection. We critically evaluate 154 selected articles and highlight their strengths and weaknesses as well as potential improvements. We explore the mobile malware detection techniques used in recent studies based on attack intentions, such as server, network, client software, client hardware, and user. In contrast to other SLR studies, our study classified the detection approaches as supervised and unsupervised learning. Therefore, this article aims to provide researchers with in-depth knowledge of the field and to identify potential future research directions and a framework for thorough evaluation. Furthermore, we review and summarize security challenges related to cybersecurity that can lead to more effective and practical research.
Article
Full-text available
Mobile platforms are rapidly and continuously changing, with support for new sensors, APIs, and programming abstractions. Static analysis is gaining a growing interest, allowing developers to predict properties about the run-time behavior of mobile apps without executing them. Over the years, literally hundreds of static analysis techniques have been proposed, ranging from structural and control-flow analysis to state-based analysis. In this paper, we present a systematic mapping study aimed at identifying, evaluating and classifying characteristics, trends and potential for industrial adoption of existing research in static analysis of mobile apps. Starting from over 12,000 potentially relevant studies, we applied a rigorous selection procedure resulting in 261 primary studies along a time span of 9 years. We analyzed each primary study according to a rigorously-defined classification framework. The results of this study give a solid foundation for assessing existing and future approaches for static analysis of mobile apps, especially in terms of their industrial adoptability. Researchers and practitioners can use the results of this study to (i) identify existing research/technical gaps to target, (ii) understand how approaches developed in academia can be successfully transferred to industry, and (iii) better position their (past and future) approaches for static analysis of mobile apps.
Preprint
Full-text available
We present the design and design rationale for the user interfaces for Privacy Enhancements for Android (PE for Android). These UIs are built around two core ideas, namely that developers should explicitly declare the purpose of why sensitive data is being used, and these permission-purpose pairs should be split by first party and third party uses. We also present a taxonomy of purposes and ways of how these ideas can be deployed in the existing Android ecosystem.
Chapter
Mobile apps are becoming increasingly complex, as a growing number of apps no longer focus on being a "specialized utility" but act as "all-around" apps that offer assorted features (e.g., news feed, messaging, weather, maps, and navigation). In this paper, we argue that being able to automatically and precisely identify the features offered by an app would allow researchers to investigate new technical solutions that in turn would benefit end-users, developers, and researchers. As a stepping stone in this direction, we describe an automated technique to identify features within Android apps. Our approach identifies features by extracting information from the app user interface and grouping semantically similar concepts together, thanks to knowledge-base-aided natural language processing and machine learning.
Chapter
The presence of smartphones and their daily usage has changed several aspects of modern life. Android and iOS devices are widely used these days by the public. Besides, an enormous number of mobile applications have been developed for users. Google launched an online market, known as Google Play, for offering applications to end users as well as managing them in an integrated environment. Applications have many features that developers should clarify while uploading apps. These features have potential correlations, and studying them could be useful for several tasks such as detecting malicious or miscategorized apps. Motivated by this, the purpose of this paper is to study these correlations through Machine Learning (ML) techniques. We apply various ML classification algorithms to distinguish the relations among key features of applications. Additionally, we perform many experiments to observe the relation between the size of the feature vector and the accuracy of the mentioned algorithms. Furthermore, we compare the algorithms to find the best choices for each part of our experiments. The results of our evaluation are promising. Also, in the majority of cases there are strong correlations between features.
Article
Most Android applications include third-party libraries (3PLs) to make revenue, to facilitate their development, and to track user behaviors. 3PLs generally require specific permissions to realize their functionalities. Current Android systems manage permissions at app (process) granularity. As a result, the permission sets of apps with 3PLs (3PL-apps) may be augmented, introducing overprivilege risks. In this paper, we first study how severe the problem is by analyzing the permission sets of 27,718 real-world Android apps with and without 3PLs downloaded in both 2016 and 2017. We find that the usage of 3PLs and the permissions required by 3PL-apps have increased over time. As a result, the possibility of overprivilege risks increases. We then propose Perman, a fine-grained permission management mechanism for Android. Perman isolates the permissions of the host app and those of the 3PLs through dynamic code instrumentation. It allows users to manage permission requests of different modules of 3PL-apps during app runtime. Unlike existing tools, Perman does not need to redesign Android apps and systems. Therefore, it can be applied to millions of existing apps and various Android devices. We conduct experiments to evaluate the effectiveness and efficiency of Perman. The experimental results verify that Perman is capable of managing permission requests of the host app and those of the 3PLs. We also confirm that the overhead introduced by Perman is comparable to that of existing commercial permission management tools.
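The overprivilege risk described above can be illustrated as simple set arithmetic over an app's declared permissions and the permissions actually exercised by the host code versus its bundled libraries. The permission attribution below is hypothetical; real tools like Perman recover it through code instrumentation.

```python
# Hypothetical sketch: flag permissions an app declares only because of
# its bundled third-party libraries (the overprivilege risk Perman targets).
def overprivileged(app_permissions, host_used, library_used):
    declared = set(app_permissions)
    host_only = declared & set(host_used)
    # Declared and used by a library, but never by the host app itself.
    lib_only = (declared & set(library_used)) - host_only
    # Declared but exercised by neither host nor libraries.
    unused = declared - host_only - set(library_used)
    return {"library_only": lib_only, "unused": unused}

report = overprivileged(
    app_permissions={"INTERNET", "ACCESS_FINE_LOCATION", "READ_CONTACTS", "CAMERA"},
    host_used={"INTERNET", "CAMERA"},
    library_used={"INTERNET", "ACCESS_FINE_LOCATION"},
)
```

Under this (made-up) attribution, `ACCESS_FINE_LOCATION` exists purely to serve a library, which is exactly the kind of request a module-granular permission manager would let users deny.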
Article
Full-text available
The booming popularity of smartphones is partly a result of application markets where users can easily download wide range of third-party applications. However, due to the open nature of markets, especially on Android, there have been several privacy and security concerns with these applications. On Google Play, as with most other markets, users have direct access to natural-language descriptions of those applications, which give an intuitive idea of the functionality including the security-related information of those applications. Google Play also provides the permissions requested by applications to access security and privacy-sensitive APIs on the devices. Users may use such a list to evaluate the risks of using these applications. To best assist the end users, the descriptions should reflect the need for permissions, which we term description-to-permission fidelity. In this paper, we present a system AutoCog to automatically assess description-to-permission fidelity of applications. AutoCog employs state-of-the-art techniques in natural language processing and our own learning-based algorithm to relate description with permissions. In our evaluation, AutoCog outperforms other related work on both performance of detection and ability of generalization over various permissions by a large extent. On an evaluation of eleven permissions, we achieve an average precision of 92.6% and an average recall of 92.0%. Our large-scale measurements over 45,811 applications demonstrate the severity of the problem of low description-to-permission fidelity. AutoCog helps bridge the long-lasting usability gap between security techniques and average users.
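AutoCog relates descriptions to permissions with NLP and a learned model; a heavily simplified keyword-table sketch still conveys the fidelity check. The keyword associations below are invented, not AutoCog's learned ones.

```python
# Simplified description-to-permission fidelity check: a permission is
# "justified" if the store description mentions related functionality.
# The keyword table is illustrative only.
PERMISSION_KEYWORDS = {
    "ACCESS_FINE_LOCATION": {"location", "map", "nearby", "gps"},
    "READ_CONTACTS": {"contact", "friend", "address book"},
    "CAMERA": {"photo", "camera", "scan"},
}

def undeclared_permissions(description, requested):
    """Return requested permissions with no supporting text in the description."""
    words = description.lower()
    justified = {perm for perm, kws in PERMISSION_KEYWORDS.items()
                 if any(kw in words for kw in kws)}
    return set(requested) - justified
```

A non-empty result is a low-fidelity signal: the app asks for something its own description never hints at.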
Conference Paper
Full-text available
Mobile malware attempts to evade detection during app analysis by mimicking security-sensitive behaviors of benign apps that provide similar functionality (e.g., sending SMS messages), and suppressing their payload to reduce the chance of being observed (e.g., executing only its payload at night). Since current approaches focus their analyses on the types of security-sensitive resources being accessed (e.g., network), these evasive techniques in malware make differentiating between malicious and benign app behaviors a difficult task during app analysis. We propose that the malicious and benign behaviors within apps can be differentiated based on the contexts that trigger security-sensitive behaviors, i.e., the events and conditions that cause the security-sensitive behaviors to occur. In this work, we introduce AppContext, an approach of static program analysis that extracts the contexts of security-sensitive behaviors to assist app analysis in differentiating between malicious and benign behaviors. We implement a prototype of AppContext and evaluate AppContext on 202 malicious apps from various malware datasets, and 633 benign apps from the Google Play Store. AppContext correctly identifies 192 malicious apps with 87.7% precision and 95% recall. Our evaluation results suggest that the maliciousness of a security-sensitive behavior is more closely related to the intention of the behavior (reflected via contexts) than the type of the security-sensitive resources that the behavior accesses.
Conference Paper
Full-text available
Smartphone apps today request permission to access a multitude of sensitive resources, which users must accept completely during installation (e.g., on Android) or selectively configure after installation (e.g., on iOS, but also planned for Android). Everyday users, however, do not have the ability to make informed decisions about which permissions are essential for their usage. For enhanced privacy, we seek to leverage crowdsourcing to find minimal sets of permissions that will preserve the usability of the app for diverse users. We advocate an efficient 'lattice-based' crowd-management strategy to explore the space of permissions sets. We conducted a user study (N = 26) in which participants explored different permission sets for the popular Instagram app. This study validates our efficient crowd management strategy and shows that usability scores for diverse users can be predicted accurately, enabling suitable recommendations.
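The lattice exploration above can be sketched by enumerating permission subsets from smallest to largest and keeping the minimal sets that still pass a usability check. The paper crowdsources that check; here it is simulated by a trivial oracle.

```python
# Sketch of permission-set lattice search: walk subsets bottom-up and
# keep minimal sets for which the (simulated) usability oracle passes.
from itertools import combinations

def minimal_usable_sets(permissions, usable):
    perms = sorted(permissions)
    found = []
    for r in range(len(perms) + 1):
        for combo in combinations(perms, r):
            s = set(combo)
            # A superset of an already-usable set cannot be minimal.
            if any(m <= s for m in found):
                continue
            if usable(s):
                found.append(s)
    return found

# Simulated oracle: this imaginary app works as long as it has the camera.
usable = lambda s: "CAMERA" in s
mins = minimal_usable_sets({"CAMERA", "READ_CONTACTS", "ACCESS_FINE_LOCATION"}, usable)
```

The crowd-management strategy in the paper is about spending as few oracle queries as possible on this lattice; the exhaustive walk here is just the naive baseline.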
Conference Paper
Full-text available
Smartphone users are often unaware of the data collected by apps running on their devices. We report on a study that evaluates the benefits of giving users an app permission manager and sending them nudges intended to raise their awareness of the data collected by their apps. Our study provides both qualitative and quantitative evidence that these approaches are complementary and can each play a significant role in empowering users to more effectively control their privacy. For instance, even after a week with access to the permission manager, participants benefited from nudges showing them how often some of their sensitive data was being accessed by apps, with 95% of participants reassessing their permissions, and 58% of them further restricting some of their permissions. We discuss how participants interacted both with the permission manager and the privacy nudges, analyze the effectiveness of both solutions, and derive some recommendations.
Article
Full-text available
The increasing popularity of Google's mobile platform Android makes it the prime target of the latest surge in mobile malware. Most research on enhancing the platform's security and privacy controls requires extensive modification to the operating system, which has significant usability issues and hinders efforts for widespread adoption. We develop a novel solution called Aurasium that bypasses the need to modify the Android OS while providing much of the security and privacy that users desire. We automatically repackage arbitrary applications to attach user-level sandboxing and policy enforcement code, which closely watches the application's behavior for security and privacy violations such as attempts to retrieve a user's sensitive information, send SMS covertly to premium numbers, or access malicious IP addresses. Aurasium can also detect and prevent cases of privilege escalation attacks. Experiments show that we can apply this solution to a large sample of benign and malicious applications with a near 100 percent success rate, without significant performance and space overhead. Aurasium has been tested on three versions of the Android OS, and is freely available.
Article
Full-text available
Modern smartphone operating systems (OSs) have been developed with a greater emphasis on security and protecting privacy. One of the mechanisms these systems use to protect users is a permission system, which requires developers to declare what sensitive resources their applications will use, has users agree with this request when they install the application and constrains the application to the requested resources during runtime. As these permission systems become more common, questions have risen about their design and implementation. In this paper, we perform an analysis of the permission system of the Android smartphone OS in an attempt to begin answering some of these questions. Because the documentation of Android's permission system is incomplete and because we wanted to be able to analyze several versions of Android, we developed PScout, a tool that extracts the permission specification from the Android OS source code using static analysis. PScout overcomes several challenges, such as scalability due to Android's 3.4 million line code base, accounting for permission enforcement across processes due to Android's use of IPC, and abstracting Android's diverse permission checking mechanisms into a single primitive for analysis. We use PScout to analyze 4 versions of Android spanning version 2.2 up to the recently released Android 4.0. Our main findings are that while Android has over 75 permissions, there is little redundancy in the permission specification. However, if applications could be constrained to only use documented APIs, then about 22% of the non-system permissions are actually unnecessary. Finally, we find that a trade-off exists between enabling least-privilege security with fine-grained permissions and maintaining stability of the permission specification as the Android OS evolves.
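A permission specification like the one PScout extracts is, at its simplest, a map from framework APIs to required permissions; the "unnecessary permission" finding then reduces to a lookup over the APIs an app actually calls. The tiny API-to-permission table below is hypothetical, not PScout's real output.

```python
# Toy PScout-style permission map: which declared permissions are never
# needed by any documented API the app calls? (Illustrative mapping only.)
API_PERMISSION_MAP = {
    "LocationManager.getLastKnownLocation": {"ACCESS_FINE_LOCATION"},
    "SmsManager.sendTextMessage": {"SEND_SMS"},
    "Camera.open": {"CAMERA"},
}

def unnecessary_permissions(declared, called_apis):
    needed = set()
    for api in called_apis:
        needed |= API_PERMISSION_MAP.get(api, set())
    return set(declared) - needed
```

This is the same least-privilege check the abstract describes: constrained to documented APIs, any permission outside `needed` could be dropped.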
Article
Full-text available
Under certain circumstances, consumers are willing to pay a premium for privacy. We explore how choice architecture affects smartphone users' stated willingness to install applications that request varying permissions. We performed two experiments to gauge smartphone users' stated willingness to pay premiums to limit their personal information exposure when installing new applications. We found that when participants were comparison shopping between multiple applications that performed similar functionality, a quarter of our sample responded that they were willing to pay a $1.50 premium for the application that requested the fewest permissions, though only when viewing the requested permissions of each application side-by-side. In a second experiment, we more closely simulated the user experience by asking them to valuate a single application that featured multiple sets of permissions based on five between-subjects conditions. In this scenario, the requested permissions had a much smaller impact on participants' responses. Our results suggest that many smartphone users are concerned with their privacy and are willing to pay premiums for applications that are less likely to request access to personal information. We propose improvements in choice architecture for smartphone application markets that could result in decreased satisficing and increased rational behavior.
Conference Paper
Full-text available
Mobile devices are playing an increasingly intimate role in everyday life. However, users can be surprised when informed of the data collection and distribution activities of apps they install. We report on two studies of smartphone users in western European countries, in which users were confronted with app behaviors and their reactions assessed. Users felt their personal space had been violated in "creepy" ways. Using Altman's notions of personal space and territoriality, and Nissenbaum's theory of contextual integrity, we account for these emotional reactions and suggest that they point to important underlying issues, even when users continue using apps they find creepy.
Conference Paper
Full-text available
How do we know a program does what it claims to do? After clustering Android apps by their description topics, we identify outliers in each cluster with respect to their API usage. A "weather" app that sends messages thus becomes an anomaly; likewise, a "messaging" app would typically not be expected to access the current location. Applied on a set of 22,500+ Android applications, our CHABADA prototype identified several anomalies; additionally, it flagged 56% of novel malware as such, without requiring any known malware patterns.
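CHABADA's cluster-then-flag idea can be sketched as follows: within one topic cluster, flag any app whose sensitive API usage is rare for that cluster. The app names, APIs, and rarity threshold are illustrative; the real system uses topic models and a proper outlier detector.

```python
# Within a cluster of same-topic apps, flag apps that use sensitive APIs
# rarely seen in that cluster (a crude stand-in for CHABADA's OC-SVM).
from collections import Counter

def cluster_outliers(cluster_api_usage, rarity=0.25):
    """cluster_api_usage: {app_name: set_of_apis}. Flags apps that use an
    API called by fewer than `rarity` of the cluster's apps."""
    n = len(cluster_api_usage)
    counts = Counter(api for apis in cluster_api_usage.values() for api in apis)
    flagged = {}
    for app, apis in cluster_api_usage.items():
        rare = {api for api in apis if counts[api] / n < rarity}
        if rare:
            flagged[app] = rare
    return flagged
```

In a "weather" cluster, an app that additionally sends SMS stands out precisely because its peers never touch that API.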
Conference Paper
Full-text available
IT security systems often attempt to support users in taking a decision by communicating associated risks. However, a lack of efficacy as well as problems with habituation in such systems are well known issues. In this paper, we propose to leverage the rich set of personal data available on smartphones to communicate risks using personalized examples. Examples of private information that may be at risk can draw the users' attention to relevant information for a decision and also improve their response. We present two experiments that validate this approach in the context of Android app permissions. Private information that becomes accessible given certain permissions is displayed when a user wants to install an app, demonstrating the consequences this installation might have. We find that participants made more privacy-conscious choices when deciding which apps to install. Additionally, our results show that our approach causes a negative affect in participants, which makes them pay more attention.
Conference Paper
Full-text available
Smartphones have unprecedented access to sensitive personal information. While users report having privacy concerns, they may not actively consider privacy while downloading apps from smartphone application marketplaces. Currently, Android users have only the Android permissions display, which appears after they have selected an app to download, to help them understand how applications access their information. We investigate how permissions and privacy could play a more active role in app-selection decisions. We designed a short "Privacy Facts" display, which we tested in a 20-participant lab study and a 366-participant online experiment. We found that by bringing privacy information to the user when they were making the decision and by presenting it in a clearer fashion, we could assist users in choosing applications that request fewer permissions.
Conference Paper
Full-text available
Mobile operating systems, such as Apple's iOS and Google's Android, have supported a ballooning market of feature-rich mobile applications. However, helping users understand security risks of mobile applications is still an ongoing challenge. While recent work has developed various techniques to reveal suspicious behaviors of mobile applications, there exists little work to answer the following question: are those behaviors necessarily inappropriate? In this paper, we seek an approach to cope with such a challenge and present a continuous and automated risk assessment framework called RiskMon that uses machine-learned ranking to assess risks incurred by users' mobile applications, especially Android applications. RiskMon combines users' coarse expectations and runtime behaviors of trusted applications to generate a risk assessment baseline that captures appropriate behaviors of applications. With the baseline, RiskMon assigns a risk score on every access attempt on sensitive information and ranks applications by their cumulative risk scores. We also discuss a proof-of-concept implementation of RiskMon as an extension of the Android mobile platform and provide both system evaluation and usability study of our methodology.
Conference Paper
Full-text available
Smartphone security research has produced many useful tools to analyze the privacy-related behaviors of mobile apps. However, these automated tools cannot assess people's perceptions of whether a given action is legitimate, or how that action makes them feel with respect to privacy. For example, automated tools might detect that a blackjack game and a map app both use one's location information, but people would likely view the map's use of that data as more legitimate than the game. Our work introduces a new model for privacy, namely privacy as expectations. We report on the results of using crowdsourcing to capture users' expectations of what sensitive resources mobile apps use. We also report on a new privacy summary interface that prioritizes and highlights places where mobile apps break people's expectations. We conclude with a discussion of implications for employing crowdsourcing as a privacy evaluation technique.
Article
Full-text available
In order to direct and build an effective, secure mobile ecosystem, we must first understand user attitudes toward security and privacy for smartphones and how they may differ from attitudes toward more traditional computing systems. What are users' comfort levels in performing different tasks? How do users select applications? What are their overall perceptions of the platform? This understanding will help inform the design of more secure smartphones that will enable users to safely and confidently benefit from the potential and convenience offered by mobile platforms. To gain insight into user perceptions of smartphone security and installation habits, we conduct a user study involving 60 smartphone users. First, we interview users about their willingness to perform certain tasks on their smartphones to test the hypothesis that people currently avoid using their phones due to privacy and security concerns. Second, we analyze why and how they select applications, which provides information about how users decide to trust applications. Based on our findings, we present recommendations and opportunities for services that will help users safely and confidently use mobile applications and platforms.
Article
Full-text available
MockDroid is a modified version of the Android operating system which allows a user to 'mock' an application's access to a resource. This resource is subsequently reported as empty or unavailable whenever the application requests access. This approach allows users to revoke access to particular resources at run-time, encouraging users to consider the trade-off between functionality and the disclosure of personal information whilst they use an application. Existing applications continue to work on MockDroid, possibly with reduced functionality, since existing applications are already written to tolerate resource failure, such as network unavailability or lack of a GPS signal. We demonstrate the practicality of our approach by successfully running a random sample of twenty-three popular applications from the Android Market.
Article
Full-text available
Android's permission system is intended to inform users about the risks of installing applications. When a user installs an application, he or she has the opportunity to review the application's permission requests and cancel the installation if the permissions are excessive or objectionable. We examine whether the Android permission system is effective at warning users. In particular, we evaluate whether Android users pay attention to, understand, and act on permission information during installation. We performed two usability studies: an Internet survey of 308 Android users, and a laboratory study wherein we interviewed and observed 25 Android users. Study participants displayed low attention and comprehension rates: both the Internet survey and laboratory study found that 17% of participants paid attention to permissions during installation, and only 3% of Internet survey respondents could correctly answer all three permission comprehension questions. This indicates that current Android permission warnings do not help most users make correct security decisions. However, a notable minority of users demonstrated both awareness of permission warnings and reasonable rates of comprehension. We present recommendations for improving user attention and comprehension, as well as identify open challenges.
Article
Full-text available
Smartphones and "app" markets are raising concerns about how third-party applications may misuse or improperly handle users' privacy-sensitive data. Fortunately, unlike in the PC world, we have a unique opportunity to improve the security of mobile applications thanks to the centralized nature of app distribution through popu-lar app markets. Thorough validation of apps applied as part of the app market admission process has the potential to significantly enhance mobile device security. In this paper, we propose AppIn-spector, an automated security validation system that analyzes apps and generates reports of potential security and privacy violations. We describe our vision for making smartphone apps more secure through automated validation and outline key challenges such as detecting and analyzing security and privacy violations, ensuring thorough test coverage, and scaling to large numbers of apps.
Conference Paper
With the prevalence of smartphones, app markets such as the Apple App Store and Google Play have become the center stage in the mobile app ecosystem, with millions of apps developed by tens of thousands of app developers in each major market. This paper presents a study of the mobile app ecosystem from the perspective of app developers. Based on over one million Android apps and 320,000 developers from Google Play, we analyzed the Android app ecosystem from different aspects. Our analysis shows that while over half of the developers have released only one app in the market, many of them have released hundreds of apps. We classified developers into different groups based on the number of apps they have released, and compared their characteristics. Specifically, we analyzed the group of aggressive developers who have released more than 50 apps, trying to understand how and why they create so many apps. We also investigated the privacy behaviors of app developers, showing that some developers have a habit of producing apps with low privacy ratings. Our study shows that understanding the behavior of mobile developers can be helpful not only to other app developers, but also to app markets and mobile users.
Conference Paper
This work presents a new approach for deobfuscating Android APKs based on probabilistic learning of large code bases (termed "Big Code"). The key idea is to learn a probabilistic model over thousands of non-obfuscated Android applications and to use this probabilistic model to deobfuscate new, unseen Android APKs. The concrete focus of the paper is on reversing layout obfuscation, a popular transformation which renames key program elements such as classes, packages, and methods, thus making it difficult to understand what the program does. Concretely, the paper: (i) phrases the layout deobfuscation problem of Android APKs as structured prediction in a probabilistic graphical model, (ii) instantiates this model with a rich set of features and constraints that capture the Android setting, ensuring both semantic equivalence and high prediction accuracy, and (iii) shows how to leverage powerful inference and learning algorithms to achieve overall precision and scalability of the probabilistic predictions. We implemented our approach in a tool called DeGuard and used it to: (i) reverse the layout obfuscation performed by the popular ProGuard system on benign, open-source applications, (ii) predict third-party libraries imported by benign APKs (also obfuscated by ProGuard), and (iii) rename obfuscated program elements of Android malware. The experimental results indicate that DeGuard is practically effective: it recovers 79.1% of the program element names obfuscated with ProGuard, it predicts third-party libraries with accuracy of 91.3%, and it reveals string decoders and classes that handle sensitive data in Android malware.
Conference Paper
Understanding the purpose of why sensitive data is used could help improve privacy as well as enable new kinds of access control. In this paper, we introduce a new technique for inferring the purpose of sensitive data usage in the context of Android smartphone apps. We extract multiple kinds of features from decompiled code, focusing on app-specific features and text-based features. These features are then used to train a machine learning classifier. We have evaluated our approach in the context of two sensitive permissions, namely ACCESS_FINE_LOCATION and READ_CONTACT_LIST, and achieved an accuracy of about 85% and 94% respectively in inferring purposes. We have also found that text-based features alone are highly effective in inferring purposes.
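The feature-extraction-plus-classifier pipeline this abstract describes can be sketched with a tiny bag-of-words Naive Bayes over identifier tokens harvested from code around a permission-using call site. The training tokens and purpose labels below are invented for illustration; they are not the paper's actual features or categories.

```python
# Minimal sketch of text-based purpose inference: a multinomial Naive
# Bayes (Laplace-smoothed) over code-derived tokens. Training data is
# made up to illustrate the pipeline, not taken from the paper.
import math
from collections import Counter, defaultdict

class PurposeClassifier:
    def fit(self, samples):
        """samples: list of (token_list, purpose_label)."""
        self.vocab = set()
        self.token_counts = defaultdict(Counter)
        self.label_counts = Counter()
        for tokens, label in samples:
            self.label_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = sum(self.label_counts.values())

    def predict(self, tokens):
        best, best_lp = None, float("-inf")
        v = len(self.vocab)
        for label, n in self.label_counts.items():
            lp = math.log(n / self.total)  # class prior
            denom = sum(self.token_counts[label].values()) + v
            for t in tokens:
                lp += math.log((self.token_counts[label][t] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

clf = PurposeClassifier()
clf.fit([
    (["ad", "banner", "geo", "target"], "advertising"),
    (["ad", "click", "impression"], "advertising"),
    (["map", "route", "navigate"], "navigation"),
    (["map", "nearby", "poi"], "navigation"),
])
```

The paper's point that text-based features alone are highly effective corresponds to how far even this crude token model can get when identifier names are descriptive.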
Conference Paper
We present LibRadar, a tool that is able to detect third-party libraries used in an Android app accurately and instantly. As third-party libraries are widely used in Android apps, program analysis on Android apps typically needs to detect or remove third-party libraries first in order to function correctly or provide accurate results. However, most previous studies employ a whitelist of package names of known libraries, which is incomplete and unable to deal with obfuscation. In contrast, LibRadar detects libraries based on stable API features that are obfuscation resilient in most cases. After analyzing one million free Android apps from Google Play, we have identified possible libraries and collected their unique features. Based on these features, LibRadar can detect third-party libraries in a given Android app within seconds, as it only requires simple static analysis and fast comparison. LibRadar is available for public use at http://radar.pkuos.org. The demo video is available at: https://youtu.be/GoMYjYxsZnI
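The obfuscation-resilient API-feature idea can be illustrated with a toy fingerprinting scheme: hash the sorted set of framework APIs a package calls (which survives identifier renaming) and look the hash up in a feature database. The library name and API list are hypothetical, and LibRadar's real matching is considerably richer.

```python
# LibRadar-style sketch: fingerprint a package by the framework APIs it
# calls (stable under renaming obfuscation) and match against a small,
# made-up database of known-library fingerprints.
import hashlib

def fingerprint(api_calls):
    blob = "\n".join(sorted(set(api_calls))).encode()
    return hashlib.sha256(blob).hexdigest()

KNOWN_LIBS = {}  # fingerprint -> library name

def register(name, api_calls):
    KNOWN_LIBS[fingerprint(api_calls)] = name

def detect(package_api_calls):
    return KNOWN_LIBS.get(fingerprint(package_api_calls))

# Hypothetical library profile.
register("com.example.adlib",
         ["HttpURLConnection.connect", "LocationManager.getLastKnownLocation"])
```

Because the fingerprint ignores package and class names, an obfuscated copy of the same library still matches, which is the property a whitelist of package names lacks.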
Conference Paper
The aim of this research was to understand what affects people's privacy preferences in smartphone apps. We ran a four-week study in the wild with 34 participants. Participants were asked to answer questions, which were used to gather their personal context and to measure their privacy preferences by varying the app name and the purpose of data collection. Our results show that participants shared the most when no information about data access or purpose was given, and shared the least when both of these details were specified. When just one of either purpose or the requesting app was shown, participants shared less when just the purpose was specified than when just the app name was given. We found that the purpose for data access was the predominant factor affecting users' choices. In our study, the purpose condition varied from unspecified, to vague, to very specific. Participants were more willing to disclose data when no purpose was specified. When a vague purpose was shown, participants became more privacy-aware and were less willing to disclose their information. When specific purposes were shown, participants were more willing to disclose when the purpose for requesting the information appeared to be beneficial to them, and shared the least when the purpose for data access was solely beneficial to developers.
Conference Paper
The proliferation of mobile apps is due in part to the advertising ecosystem which enables developers to earn revenue while providing free apps. Ad-supported apps can be developed rapidly with the availability of ad libraries. However, today's ad libraries essentially have access to the same resources as the parent app, and this has caused significant privacy concerns. In this paper, we explore efficient methods to de-escalate privileges for ad libraries, where the resource access privileges for ad libraries can be different from those of the app logic. Our system, PEDAL, contains a novel machine classifier for detecting ad libraries even in the presence of obfuscated code, and techniques for automatically instrumenting bytecode to effect privilege de-escalation even in the presence of privilege inheritance. We evaluate PEDAL on a large set of apps from the Google Play store and demonstrate that it has a 98% accuracy in detecting ad libraries and imposes less than 1% runtime overhead on apps.
Conference Paper
We propose a type-based taint analysis for Android. Concretely, we present DFlow, a context-sensitive information flow type system, and DroidInfer, the corresponding type inference analysis for detecting privacy leaks in Android apps. We present novel techniques for error reporting based on CFL-reachability, as well as novel techniques for handling of Android-specific features, including libraries, multiple entry points and callbacks, and inter-component communication. Empirical results show that our approach is scalable and precise. DroidInfer scales well in terms of time and memory and has false-positive rate of 15.7%. It detects privacy leaks in apps from the Google Play Store and in known malware.
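The essence of such a taint type system can be pictured as constraint solving over flow edges: each assignment induces a flow, sources start tainted, and a leak is flagged when taint reaches a sink. This is a minimal illustrative sketch with invented variable names, not DFlow's actual context-sensitive inference.

```python
# Minimal taint-propagation sketch in the spirit of a flow type system:
# each assignment "x = y" contributes a flow edge, sources start tainted,
# and a leak is any sink variable that taint reaches at the fixed point.

def solve_taint(flows, sources, sinks):
    """flows: list of (src_var, dst_var) edges; returns tainted sink vars."""
    tainted = set(sources)
    changed = True
    while changed:                      # fixed-point iteration
        changed = False
        for src, dst in flows:
            if src in tainted and dst not in tainted:
                tainted.add(dst)
                changed = True
    return tainted & set(sinks)

# deviceId -> msg -> httpParam models getDeviceId() leaking to the network.
leaks = solve_taint([("deviceId", "msg"), ("msg", "httpParam")],
                    sources={"deviceId"}, sinks={"httpParam"})
print(leaks)  # {'httpParam'}
```

The real analysis additionally tracks calling context, fields, and Android-specific flows such as inter-component communication.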
Conference Paper
Repackaged Android applications (app clones) have been found in many third-party markets, which not only compromise the copyright of original authors, but also pose threats to security and privacy of mobile users. Both fine-grained and coarse-grained approaches have been proposed to detect app clones. However, fine-grained techniques employing complicated clone detection algorithms are difficult to scale to hundreds of thousands of apps, while coarse-grained techniques based on simple features are scalable but less accurate. This paper proposes WuKong, a two-phase detection approach that includes a coarse-grained detection phase to identify suspicious apps by comparing light-weight static semantic features, and a fine-grained phase to compare more detailed features for only those apps found in the first phase. To further improve the detection speed and accuracy, we also introduce an automated clustering-based preprocessing step to filter third-party libraries before conducting app clone detection. Experiments on more than 100,000 Android apps collected from five Android markets demonstrate the effectiveness and scalability of our approach.
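The two-phase structure can be sketched as follows: a cheap vector comparison filters candidate pairs, and only surviving pairs pay for a costlier similarity check. The feature choices below (API-call counts, instruction-token Jaccard similarity) are illustrative stand-ins, not WuKong's exact features.

```python
# Two-phase clone-detection sketch (illustrative): phase 1 compares cheap
# feature vectors to find suspicious pairs; phase 2 runs a costlier
# similarity check on those pairs only.

def coarse_match(v1, v2, max_dist=2):
    """Phase 1: Manhattan distance over API-call count vectors."""
    return sum(abs(a - b) for a, b in zip(v1, v2)) <= max_dist

def fine_match(code1, code2, min_sim=0.8):
    """Phase 2: Jaccard similarity over instruction token sets."""
    s1, s2 = set(code1), set(code2)
    return len(s1 & s2) / len(s1 | s2) >= min_sim

def detect_clone(app1, app2):
    # Only pairs passing the cheap phase reach the expensive one.
    return coarse_match(app1["vec"], app2["vec"]) and \
           fine_match(app1["code"], app2["code"])

a = {"vec": [3, 1, 0], "code": ["invoke getLoc", "move r1", "return r1"]}
b = {"vec": [3, 2, 0], "code": ["invoke getLoc", "move r1", "return r1"]}
print(detect_clone(a, b))  # True
```

Filtering third-party library code before this comparison, as the paper does, prevents shared libraries from masquerading as clones.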
Conference Paper
We introduce the Android Security Framework (ASF), a generic, extensible security framework for Android that enables the development and integration of a wide spectrum of security models in the form of code-based security modules. The design of ASF reflects lessons learned from the literature on established security frameworks (such as Linux Security Modules or the BSD MAC Framework) and intertwines them with the particular requirements and challenges of Android's software stack. ASF provides a novel security API that supports authors of Android security extensions in developing their modules. This overcomes the current unsatisfactory situation in which security solutions must be provided as separate patches to the Android software stack or embedded into Android's mainline codebase. This system security extensibility is of particular benefit for enterprise or government solutions that require deployment of advanced security models not supported by vanilla Android. We present a prototypical implementation of ASF and demonstrate its effectiveness and efficiency by modularizing different security models from related work, such as dynamic permissions, inlined reference monitoring, and type enforcement.
Article
Today's smartphones are a ubiquitous source of private and confidential data. At the same time, smartphone users are plagued by carelessly programmed apps that leak important data by accident, and by malicious apps that exploit their given privileges to copy such data intentionally. While existing static taint-analysis approaches have the potential of detecting such data leaks ahead of time, all approaches for Android use a number of coarse-grain approximations that can yield high numbers of missed leaks and false alarms. In this work we thus present FlowDroid, a novel and highly precise static taint analysis for Android applications. A precise model of Android's lifecycle allows the analysis to properly handle callbacks invoked by the Android framework, while context, flow, field and object-sensitivity allows the analysis to reduce the number of false alarms. Novel on-demand algorithms help FlowDroid maintain high efficiency and precision at the same time. We also propose DroidBench, an open test suite for evaluating the effectiveness and accuracy of taint-analysis tools specifically for Android apps. As we show through a set of experiments using SecuriBench Micro, DroidBench, and a set of well-known Android test applications, FlowDroid finds a very high fraction of data leaks while keeping the rate of false positives low. On DroidBench, FlowDroid achieves 93% recall and 86% precision, greatly outperforming the commercial tools IBM AppScan Source and Fortify SCA. FlowDroid successfully finds leaks in a subset of 500 apps from Google Play and about 1,000 malware apps from the VirusShare project.
Article
Recent years have witnessed incredible growth in the popularity and prevalence of smart phones. A flourishing mobile application market has evolved to provide users with additional functionality such as interacting with social networks, games, and more. Mobile applications may have a direct purchasing cost or be free but ad-supported. Unlike in-browser ads, the privacy implications of ads in Android applications have not been thoroughly explored. We start by comparing the similarities and differences of in-browser ads and in-app ads. We examine the effect on user privacy of thirteen popular Android ad providers by reviewing their use of permissions. Worryingly, several ad libraries checked for permissions beyond the required and optional ones listed in their documentation, including dangerous permissions like CAMERA, WRITE_CALENDAR and WRITE_CONTACTS. Further, we discover the insecure use of Android's JavaScript extension mechanism in several ad libraries. We identify fields in ad requests for private user information and confirm their presence in network data obtained from a tier-1 network provider. We also show that users can be tracked by a network sniffer across ad providers and by an ad provider across applications. Finally, we discuss several possible solutions to the privacy issues identified above.
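The documentation check at the heart of this analysis reduces to a set difference: permissions the library actually probes for, minus those its documentation declares as required or optional. The permission lists below are hypothetical example data, not the paper's measurements.

```python
# Sketch of the documentation check (hypothetical data): compare the
# permissions an ad library actually probes for against the required and
# optional sets listed in its documentation.

def undocumented_permissions(probed, required, optional):
    """Permissions checked by the library but absent from its docs."""
    return set(probed) - set(required) - set(optional)

extra = undocumented_permissions(
    probed=["INTERNET", "ACCESS_FINE_LOCATION", "CAMERA", "WRITE_CONTACTS"],
    required=["INTERNET"],
    optional=["ACCESS_FINE_LOCATION"])
print(sorted(extra))  # ['CAMERA', 'WRITE_CONTACTS']
```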
Article
Mobile applications are a major force behind the explosive growth of mobile devices. While they greatly extend the functionality of mobile devices, they also raise security and privacy concerns, especially when they have not gone through a rigorous review process. To protect users from untrusted and potentially malicious applications, we design and implement a rewriting framework for embedding In-App Reference Monitors (I-ARM) into Android applications. A user of the framework identifies a set of security-sensitive API methods and specifies their security policies, which may be tailored to each application. Then, our framework automatically rewrites the Dalvik bytecode of the application, where it interposes on all the invocations of these API methods to implement the desired security policies. We have implemented a prototype of the rewriting framework and evaluated it on compatibility, functionality, and performance in time and size overhead. We showcase example security policies that this rewriting framework supports.
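The interposition idea can be illustrated with a wrapper that consults a policy before the real method runs; bytecode rewriting achieves the same effect by redirecting call sites to such a wrapper. This is a Python analogy with invented policy and method names, not the I-ARM implementation.

```python
# Interposition sketch (Python analogy for bytecode rewriting): wrap a
# security-sensitive call so a per-app policy is consulted before the
# real method executes. Policy and method names are invented.

def with_policy(policy, name):
    """Return a decorator that interposes `policy` on the wrapped call."""
    def wrap(fn):
        def guarded(*args, **kwargs):
            if not policy(name, args):
                raise PermissionError(f"{name} denied by policy")
            return fn(*args, **kwargs)
        return guarded
    return wrap

# A policy that denies location access for this app.
deny_location = lambda name, args: name != "getLastKnownLocation"

@with_policy(deny_location, "getLastKnownLocation")
def getLastKnownLocation(provider):
    return (37.42, -122.08)   # stand-in for the real API

try:
    getLastKnownLocation("gps")
except PermissionError as e:
    print(e)  # getLastKnownLocation denied by policy
```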
Conference Paper
In recent years, studies of design and programming practices in mobile development have been gaining more attention from researchers. Several such empirical studies used Android applications (paid, free, and open source) to analyze factors such as size, quality, dependencies, reuse, and cloning. Most of the studies use executable files of the apps (APK files) instead of source code because of availability issues (most free apps available on the official Android market are not open source, but can still be downloaded and analyzed in APK format). However, using only APK files in empirical studies comes with some threats to the validity of the results. In this paper, we analyze some of these pertinent threats. In particular, we analyzed the impact of third-party libraries and code obfuscation practices on estimating the amount of reuse by class cloning in Android apps. When including and excluding third-party libraries from the analysis, we found statistically significant differences in the amount of class cloning across 24,379 free Android apps. Also, we found some evidence that obfuscation is responsible for increasing the number of false positives when detecting class clones. Finally, based on our findings, we provide a list of actionable guidelines for mining and analyzing large repositories of Android applications and minimizing these threats to validity.
Conference Paper
Today's smartphone applications expect users to make decisions about what information they are willing to share, but fail to provide sufficient feedback about which privacy-sensitive information is leaving the phone, as well as how frequently and with which entities it is being shared. Such feedback can improve users' understanding of potential privacy leakages through apps that collect information about them in an unexpected way. Through a qualitative lab study with 19 participants, we first discuss misconceptions that smartphone users currently have with respect to two popular game applications that frequently collect the phone's current location and share it with multiple third parties. To measure the gap between users' understanding and actual privacy leakages, we use two types of interfaces that we developed: just-in-time notifications that appear the moment data is shared and a visualization that summarizes the shared data. We then report on participants' perceived benefits and concerns regarding data sharing with smartphone applications after experiencing notifications and having viewed the visualization. We conclude with a discussion on how heightened awareness of users and usable controls can mitigate some of these concerns.
Conference Paper
Mobile-device theft and loss have reached gigantic proportions. Despite these threats, today's mobile devices are saturated with sensitive information due to operating systems that never securely erase data and applications that hoard it on the vulnerable device for performance or convenience. This paper presents CleanOS, a new Android-based operating system that manages sensitive data rigorously and maintains a clean environment at all times. To do so, CleanOS leverages a key property of today's mobile applications - the use of trusted, cloud-based services. Specifically, CleanOS identifies and tracks sensitive data in RAM and on stable storage, encrypts it with a key, and evicts that key to the cloud when the data is not in active use on the device. We call this process idle eviction of sensitive data. To implement CleanOS, we used the TaintDroid mobile taint-tracking system to identify sensitive data locations and instrumented Android's Dalvik interpreter to securely evict that data after a specified period of non-use. Our experimental results show that CleanOS limits sensitive-data exposure drastically while incurring acceptable overheads on mobile networks.
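The idle-eviction mechanism can be sketched as follows: sensitive data is kept encrypted on the device, the key is escrowed with a cloud service, and the device's cached key copy is dropped after a period of non-use, so a stolen device holds only ciphertext. This is a toy model, not CleanOS; XOR stands in for real encryption and a dict stands in for the cloud service.

```python
# Idle-eviction sketch (toy, not CleanOS). XOR is a stand-in for real
# encryption; the `cloud` dict is a stand-in for a trusted key service.

class SensitiveDataObject:
    def __init__(self, data: bytes, key: bytes, cloud: dict, oid: str):
        self.cipher = bytes(d ^ key[i % len(key)] for i, d in enumerate(data))
        self.oid, self.cloud = oid, cloud
        cloud[oid] = key           # escrow the key with the cloud service
        self.local_key = key       # cached only while data is in active use

    def evict(self):
        """Called after the idle timeout: forget the local key copy."""
        self.local_key = None

    def read(self) -> bytes:
        key = self.local_key or self.cloud[self.oid]  # re-fetch on demand
        self.local_key = key
        return bytes(c ^ key[i % len(key)] for i, c in enumerate(self.cipher))

cloud = {}
obj = SensitiveDataObject(b"ssn=123", b"k3y", cloud, "obj1")
obj.evict()                 # idle: only ciphertext remains on the device
print(obj.read())           # b'ssn=123' (key re-fetched from the cloud)
```

After `evict()`, an attacker with the device alone sees only `obj.cipher`; decryption requires reaching the cloud service, which can refuse a stolen device.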
Conference Paper
The success of Android phones makes them a prominent target for malicious software, in particular since the Android permission system turned out to be inadequate to protect the user against security and privacy threats. This work presents AppGuard, a powerful and flexible system for the enforcement of user-customizable security policies on untrusted Android applications. AppGuard does not require any changes to a smartphone's firmware or root access. Our system offers complete mediation of security-relevant methods based on callee-site inline reference monitoring. We demonstrate the general applicability of AppGuard by several case studies, e.g., removing permissions from overly curious apps as well as defending against several recent real-world attacks on Android phones. Our technique exhibits very little space and runtime overhead. AppGuard is publicly available, has been invited to the Samsung Apps market, and has had more than 500,000 downloads so far.