Yulei Sui is a Senior Lecturer at School of Computer Science, Faculty of Engineering and Information Technology, University of Technology Sydney (UTS). He received his PhD from University of New South Wales in 2014. He had worked as a Research Fellow from January 2014 to September 2017 at UNSW, Australia. He Joined UTS to take his current role in October 2017. His general interests are program analysis, compilers, software engineering and machine learning. Particularly, his research focuses on building fundamental static and dynamic program analysis techniques to improve the reliability and security of modern software systems. His recent interest lies in bug prediction, program repair, and software security analysis through data mining and machine learning.
His papers have been published in the top-tier conferences and journals in the field of software engineering and a programming language such as TSE, ICSE, FSE, ISSTA, ECOOP, SAS, CGO, CC, ICPP, LCTES. He was a research intern in the Program Analysis Group in Oracle Lab Australia in 2013. He was a keynote speaker at EuroLLVM 2016, and has been awarded a 2019 SAS Best Paper, a 2018 ICSE Distinguished Paper Award, a 2013 CGO Best Paper, an ACM CAPS award, and an ARC Discovery Early Career Researcher Award (2017-2019).
Can supervise: YES
- Program Analysis
- Software Engineering
- Software Security Analysis
- Machine Learning
41093 Software Engineering Studio 1A
41094 Software Engineering Studio 1B
HDR Coordinator of School of Computer Science, FEIT
Li, J, Pan, Y, Sui, Y & Tsang, IW 2020, 'Secure Metric Learning via Differential Pairwise Privacy', IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, vol. 15, pp. 3640-3652.View/Download from: Publisher's site
Nobakht, M, Sui, Y, Seneviratne, A & Hu, W 2020, 'PGFIT: Static permission analysis of health and fitness apps in IoT programming frameworks', JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, vol. 152.View/Download from: Publisher's site
Zhang, Y, Sui, Y, Pan, S, Zheng, Z, Ning, B, Tsang, I & Zhou, W 2020, 'Familial Clustering For Weakly-labeled Android Malware Using Hybrid Representation Learning', IEEE Transactions on Information Forensics and Security.View/Download from: Publisher's site
IEEE Labeling malware or malware clustering is important for identifying new security threats, triaging and building reference datasets. The state-of-the-art Android malware clustering approaches rely heavily on the raw labels from commercial AntiVirus (AV) vendors, which causes misclustering for a substantial number of weakly-labeled malware due to the inconsistent, incomplete and overly generic labels reported by these closed-source AV engines, whose capabilities vary greatly and whose internal mechanisms are opaque (i.e., intermediate detection results are unavailable for clustering). The raw labels are thus often used as the only important source of information for clustering. To address the limitations of the existing approaches, this paper presents ANDRE, a new ANDroid Hybrid REpresentation Learning approach to clustering weakly-labeled Android malware by preserving heterogeneous information from multiple sources (including the results of static code analysis, the metainformation of an app, and the raw-labels of the AV vendors) to jointly learn a hybrid representation for accurate clustering. The learned representation is then fed into our outlieraware clustering to partition the weakly-labeled malware into known and unknown families. The malware whose malicious behaviours are close to those of the existing families on the network, are further classified using a three-layer Deep Neural Network (DNN). The unknown malware are clustered using a standard density-based clustering algorithm. We have evaluated our approach using 5,416 ground-truth malware from Drebin and 9,000 malware from VIRUSSHARE (uploaded between Mar. 2017 and Feb. 2018), consisting of 3324 weakly-labeled malware. The evaluation shows that ANDRE effectively clusters weaklylabeled malware which cannot be clustered by the state-of-theart approaches, while achieving comparable accuracy with those approaches for clustering ground-truth samples.
Hou, Z, Wang, Y, Sui, Y, Gu, J, Zhao, T & Zhou, X 2018, 'Managing high-performance computing applications as an on-demand service on federated clouds', Computers and Electrical Engineering, vol. 67, pp. 579-595.View/Download from: Publisher's site
© 2018 Elsevier Ltd There are several challenges (e.g., imbalance between supply and demand of hardware resources and software licenses, and usability) under modern High-Performance Computing (HPC) environment. As a means of providing an on-demand service for end users, we propose a Software-as-a-Service (SaaS) approach for managing commercial HPC applications as a Web-based service deployed on top of federated clouds. Some inter-trusted private or public clouds are federated to create a unified service platform with a large amount of hardware resources. In addition, an on-demand, pay-per-use model for Web-service-enabled HPC applications is proposed. Further, we provide an economic analysis of the proposed approach from the perspective of end users, cloud service providers, and Independent Software Vendors (ISVs). We conduct a simulation using two HPC application services on three federated clouds. A combined Quality of Service (QoS) and economic evaluation demonstrates a better effect of the proposed approach comparing with existing HPC platforms.
IEEE We present SUPA, a value-flow-based demand-driven flow- and context-sensitive pointer analysis with strong updates for C and C++ programs. SUPA enables computing points-to information via value-flow refinement, in environments with small time and memory budgets. We formulate SUPA by solving a graph-reachability problem on an inter-procedural value-flow graph representing a program's def-use chains, which are pre-computed efficiently but over-approximately. To answer a client query (a request for a variable's points-to set), SUPA reasons about the flow of values along the pre-computed def-use chains sparsely (rather than across all program points), by performing only the work necessary for the query (rather than analyzing the whole program). In particular, strong updates are performed to filter out spurious def-use chains through value-flow refinement as long as the total budget is not exhausted.
Sui, Y, Fan, X, Zhou, H & Xue, J 2018, 'Loop-Oriented Pointer Analysis for Automatic SIMD Vectorization', ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, vol. 17, no. 2.View/Download from: Publisher's site
Sui, Y, Yan, H, Zheng, Z, Zhang, Y & Xue, J 2018, 'Parallel construction of interprocedural memory SSA form', Journal of Systems and Software, vol. 146, pp. 186-195.View/Download from: Publisher's site
© 2018 Elsevier Inc. Interprocedural memory SSA form, which provides a sparse data-flow representation for indirect memory operations, paves the way for many advanced program analyses. Any performance improvement for memory SSA construction benefits for a wide range of clients (e.g., bug detection and compiler optimisations). However, its construction is much more expensive than that for scalar-based SSA form. The memory objects distinguished at a pointer dereference significantly increases the number of variables that need to be put on SSA form, resulting in considerable analysis overhead when analyzing large programs (e.g., millions of lines of code). This paper presents PARSSA, a fully parameterised approach for parallel construction of interprocedural memory SSA form by utilising multi-core computing resources. PARSSA partitions whole-program memory objects into uniquely identified memory regions. The indirect memory accesses in a function are fully parameterised using partitioned memory regions, so that the memory SSA construction of a parameterised function is readily parallelised. We implemented PARSSA in LLVM using Intel Threading Building Block (TBB) for creating parallel tasks. We evaluated PARSSA using 15 large applications. PARSSA achieves up to 6.9 × speedup against the sequential version on an 8-core machine.
Sui, Y, Ye, D, Su, Y & Xue, J 2016, 'Eliminating Redundant Bounds Checks in Dynamic Buffer Overflow Detection Using Weakest Preconditions', IEEE TRANSACTIONS ON RELIABILITY, vol. 65, no. 4, pp. 1682-1699.View/Download from: Publisher's site
Sui, Y, Ye, D & Xue, J 2014, 'Detecting Memory Leaks Statically with Full-Sparse Value-Flow Analysis', IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, vol. 40, no. 2, pp. 107-122.View/Download from: Publisher's site
Sui, Y, Ye, S, Xue, J & Zhang, J 2014, 'Making context-sensitive inclusion-based pointer analysis practical for compilers using parameterised summarisation', SOFTWARE-PRACTICE & EXPERIENCE, vol. 44, no. 12, pp. 1485-1510.View/Download from: Publisher's site
Ye, S, Sui, Y & Xue, J 2014, 'Region-Based Selective Flow-Sensitive Pointer Analysis', STATIC ANALYSIS (SAS 2014), vol. 8723, pp. 319-336.
Sui, Y & Xue, J, 'Demand-Driven Pointer Analysis with Strong Updates via Value-Flow Refinement'.
We present a new demand-driven flow- and context-sensitive pointer analysis
with strong updates for C programs, called SUPA, that enables computing
points-to information via value-flow refinement, in environments with small
time and memory budgets such as IDEs. We formulate SUPA by solving a graph
reachability problem on an inter-procedural value-flow graph representing a
program's def-use chains, which are pre-computed efficiently but
over-approximately. To answer a client query (a request for a variable's
points-to set), SUPA reasons about the flow of values along the pre-computed
def-use chains sparsely (rather than across all program points), by performing
only the work necessary for the query (rather than analyzing the whole
program). In particular, strong updates are performed to filter out spurious
def-use chains through value-flow refinement as long as the total budget is not
exhausted. SUPA facilitates efficiency and precision tradeoffs by applying
different pointer analyses in a hybrid multi-stage analysis framework.
We have implemented SUPA in LLVM (3.5.0) and evaluate it by choosing
uninitialized pointer detection as a major client on 18 open-source C programs.
As the analysis budget increases, SUPA achieves improved precision, with its
single-stage flow-sensitive analysis reaching 97.4% of that achieved by
whole-program flow-sensitive analysis by consuming about 0.18 seconds and 65KB
of memory per query, on average (with a budget of at most 10000 value-flow
edges per query). With context-sensitivity also considered, SUPA's two- stage
analysis becomes more precise for some programs but also incurs more analysis
times. SUPA is also amenable to parallelization. A parallel implementation of
its single-stage flow-sensitive analysis achieves a speedup of up to 6.9x with
an average of 3.05x a 8-core machine with respect its sequential version.
Wan, Y, Shu, J, Sui, Y, Xu, G, Zhao, Z, Wu, J & Yu, P 2019, 'Multi-modal attention network learning for semantic source code retrieval', Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, IEEE/ACM International Conference on Automated Software Engineering, IEEE, San Diego, CA, USA, pp. 13-25.View/Download from: Publisher's site
© 2019 IEEE. Code retrieval techniques and tools have been playing a key role in facilitating software developers to retrieve existing code fragments from available open-source repositories given a user query (e.g., a short natural language text describing the functionality for retrieving a particular code snippet). Despite the existing efforts in improving the effectiveness of code retrieval, there are still two main issues hindering them from being used to accurately retrieve satisfiable code fragments from large-scale repositories when answering complicated queries. First, the existing approaches only consider shallow features of source code such as method names and code tokens, but ignoring structured features such as abstract syntax trees (ASTs) and control-flow graphs (CFGs) of source code, which contains rich and well-defined semantics of source code. Second, although the deep learning-based approach performs well on the representation of source code, it lacks the explainability, making it hard to interpret the retrieval results and almost impossible to understand which features of source code contribute more to the final results. To tackle the two aforementioned issues, this paper proposes MMAN, a novel Multi-Modal Attention Network for semantic source code retrieval. A comprehensive multi-modal representation is developed for representing unstructured and structured features of source code, with one LSTM for the sequential tokens of code, a Tree-LSTM for the AST of code and a GGNN (Gated Graph Neural Network) for the CFG of code. Furthermore, a multi-modal attention fusion layer is applied to assign weights to different parts of each modality of source code and then integrate them into a single hybrid representation. Comprehensive experiments and analysis on a large-scale real-world dataset show that our proposed model can accurately retrieve code snippets and outperforms the state-of-the-art methods.
Zou, C, Sui, Y, Yan, H & Xue, J 2019, 'TCD: Statically Detecting Type Confusion Errors in C++ Programs', Proceedings - International Symposium on Software Reliability Engineering, ISSRE, International Symposium on Software Reliability Engineering, IEEE, Berlin, Germany, pp. 292-302.View/Download from: Publisher's site
© 2019 IEEE. For performance reasons, C++, albeit unsafe, is often the programming language of choice for developing software infrastructures. A serious type of security vulnerability in C++ programs is type confusion, which may lead to program crashes and control flow hijack attacks. While existing mitigation solutions almost exclusively rely on dynamic analysis techniques, which suffer from low code coverage and high overhead, static analysis has rarely been investigated. This paper presents TCD, a static type confusion detector built on top of a precise demand-driven field-, context-and flow-sensitive pointer analysis. Unlike existing pointer analyses, TCD is type-aware as it not only preserves the type information in the pointed-to objects but also handles complex language features of C++ such as multiple inheritance and placement new, making it therefore possible to reason about type casting in C++ programs. We have implemented TCD in LLVM and evaluated it using seven C++ applications (totaling 526,385 lines of C++ code) from Qt, a widely-adopted C++ toolkit for creating GUIs and cross-platform software. TCD has found five type confusion bugs, including one reported previously in prior work and four new ones, in under 7.3 hours, with a low false positive rate of 28.2%.
Aung, TWW, Huo, H & Sui, Y 2019, 'Interactive Traceability Links Visualization using Hierarchical Trace Map', 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), IEEE International Conference on Software Maintenance and Evolution, IEEE, Cleveland, Ohio.View/Download from: Publisher's site
Traceability links between various software artifacts of a system aid software engineers in system comprehension, verification and change impact analysis. Establishing trace links between software artifacts manually is an error-prone and costly task. Recently, studies in automated traceability link recovery area have received broad attention in the software maintenance community aiming to overcome the challenges of manual trace links maintenance process. In these studies, the trace links results generated by an automated trace recovery approach are presented either in a bland textual matrix format (e.g., tabular format) or two-dimensional graphical formats (e.g. tree view, hierarchical leaf node). Therefore, it is challenging for software engineers to explore the inter-relationships between various artifacts at once (e.g., which test cases and source code files/methods are related to a particular requirement). In this position paper, we propose a hierarchical trace map visualization technique to explore inter-relationships between various artifacts at once naturally and intuitively
Cheng, X, Wang, H, Hua, J, Zhang, M, Xu, G, Yi, L & Sui, Y 2019, 'Static detection of control-flow-related vulnerabilities using graph embedding', Proceedings of the IEEE International Conference on Engineering of Complex Computer Systems, ICECCS, International Conference on Engineering of Complex Computer Systems, IEEE, Guangzhou, China, pp. 41-50.View/Download from: Publisher's site
© 2019 IEEE. Static vulnerability detection has shown its effectiveness in detecting well-defined low-level memory errors. However, high-level control-flow related (CFR) vulnerabilities, such as insufficient control flow management (CWE-691), business logic errors (CWE-840), and program behavioral problems (CWE-438), which are often caused by a wide variety of bad programming practices, posing a great challenge for existing general static analysis solutions. This paper presents a new deep-learning-based graph embedding approach to accurate detection of CFR vulnerabilities. Our approach makes a new attempt by applying a recent graph convolutional network to embed code fragments in a compact and low-dimensional representation that preserves high-level control-flow information of a vulnerable program. We have conducted our experiments using 8,368 real-world vulnerable programs by comparing our approach with several traditional static vulnerability detectors and state-of-the-art machine-learning-based approaches. The experimental results show the effectiveness of our approach in terms of both accuracy and recall. Our research has shed light on the promising direction of combining program analysis with deep learning techniques to address the general static analysis challenges.
Lei, Y & Sui, Y 2019, 'Fast and Precise Handling of Positive Weight Cycles for Field-Sensitive Pointer Analysis', Static Analysis, International Static Analysis Symposium, Springer, Porto, Portugal, pp. 27-47.View/Download from: Publisher's site
© Springer Nature Switzerland AG 2019. By distinguishing the fields of an object, Andersen's field-sensitive pointer analysis yields better precision than its field-insensitive counterpart. A typical field-sensitive solution to inclusion-based pointer analysis for C/C++ is to add positive weights to the edges in Andersen's constraint graph to model field access. However, the precise modeling is at the cost of introducing a new type of constraint cycles, called positive weight cycles (PWCs). A PWC, which contains at least one positive weight constraint, can cause infinite and redundant field derivations of an object unless the number of its fields is bounded by a pre-defined value. PWCs significantly affect analysis performance when analyzing large C/C++ programs with heavy use of structs and classes.para This paper presents Dea, a fast and precise approach to handling of PWCs that significantly accelerates existing field-sensitive pointer analyses by using a new field collapsing technique that captures the derivation equivalence of fields derived from the same object when resolving a PWC.para Two fields are derivation equivalent in a PWC if they are always pointed to by the same variables (nodes) in this PWC. A stride-based field representation is proposed to identify and collapse derivation equivalent fields into one, avoiding redundant field derivations with significantly fewer field objects during points-to propagation. We have conducted experiments using 11 open-source C/C++ programs. The evaluation shows that Dea is on average 7.1X faster than Pearce et al.'s field-sensitive analysis (Pkh), obtaining the best speedup of 11.0X while maintaining the same precision.
Sui, Y, Zhang, Y, Zheng, W, Zhang, M & Xue, J 2019, 'Event trace reduction for effective bug replay of Android apps via differential GUI state analysis', ESEC/FSE 2019 - Proceedings of the 2019 27th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ACM, Tallinn, Estonia, pp. 1095-1099.View/Download from: Publisher's site
© 2019 ACM. Existing Android testing tools, such as Monkey, generate a large quantity and a wide variety of user events to expose latent GUI bugs in Android apps. However, even if a bug is found, a majority of the events thus generated are often redundant and bug-irrelevant. In addition, it is also time-consuming for developers to localize and replay the bug given a long and tedious event sequence (trace). This paper presents ECHO, an event trace reduction tool for effective bug replay by using a new differential GUI state analysis. Given a sequence of events (trace), ECHO aims at removing bug-irrelevant events by exploiting the differential behavior between the GUI states collected when their corresponding events are triggered. During dynamic testing, ECHO injects at most one lightweight inspection event after every event to collect its corresponding GUI state. A new adaptive model is proposed to selectively inject inspection events based on sliding windows to differentiate the GUI states on-the-fly in a single testing process. The experimental results show that ECHO improves the effectiveness of bug replay by removing 85.11% redundant events on average while also revealing the same bugs as those detected when full event sequences are used.
Wu, D, Liu, J, Sui, Y, Chen, S & Xue, J 2019, 'Precise Static Happens-Before Analysis for Detecting UAF Order Violations in Android', Proceedings of 12th International Conference on Software Testing, Verification and Validation, IEEE Conference on Software Testing, Validation and Verification, IEEE, Xi'an, China.View/Download from: Publisher's site
Unlike Java, Android provides a rich set of APIs to support a hybrid concurrency system, which consists of both Java threads and an event queue mechanism for dispatching asynchronous events. In this model, concurrency errors often manifest themselves in the form of order violations. An order violation occurs when two events access the same shared object in an incorrect order, causing unexpected program behaviors (e.g., null pointer dereferences). This paper presents SARD, a static analysis tool for detecting both intra-and inter-thread use-after-free (UAF) order violations, when a pointer is dereferenced (used) after it no longer points to any valid object, through systematic modeling of Android's concurrency mechanism. We propose a new flow-and context-sensitive static happens-before (HB) analysis to reason about the interleavings between two events to effectively identify precise HB relations and eliminate spurious event interleavings. We have evaluated SARD by comparing with NADROID, a state-of-the-art static order violation detection tool for Android. SARD outperforms NADROID in terms of both precision (by reporting three times fewer false alarms than NADROID given the same set of apps used by NADROID) and efficiency (by running two orders of magnitude faster than NADROID).
Xu, X, Sui, Y, Yan, H & Xue, J 2019, 'VFix: Value-Flow-Guided Precise Program Repair for Null Pointer Dereferences', Proceedings of 41th International Conference on Software Engineering, International Conference on Software Engineering, IEEE, Montreal, QC, Canada,.View/Download from: Publisher's site
Automated Program Repair (APR) faces a key challenge in efficiently generating correct patches from a potentially infinite solution space. Existing approaches, which attempt to reason about the entire solution space, can be ineffective (by often producing no plausible patches at all) and imprecise (by often producing plausible but incorrect patches). We present VFIX, a new value-flow-guided APR approach, to fix null pointer exception (NPE) bugs by considering a substantially reduced solution space in order to greatly increase the number of correct patches generated. By reasoning about the data and control dependences in the program, VFIX can identify bug-relevant repair statements more accurately and generate more correct repairs than before. VFIX outperforms a set of 8 state-of-the-art APR tools in fixing the NPE bugs in Defects4j in terms of both precision (by correctly fixing 3 times as many bugs as the most precise one and 50% more than all the bugs correctly fixed by these 8 tools altogether) and efficiency (by producing a correct patch in minutes instead of hours).
Yan, H, Chen, S, Sui, Y, Zhang, Y, Zou, C & Xue, J 2019, 'Per-Dereference Verification of Temporal Heap Safety via Adaptive Context-Sensitive Analysis', SAS 2019: Static Analysis, International Static Analysis Symposium, Spronger, Porto, Portugal, pp. 48-72.View/Download from: Publisher's site
© Springer Nature Switzerland AG 2019. We address the problem of verifying the temporal safety of heap memory at each pointer dereference. Our whole-program analysis approach is undertaken from the perspective of pointer analysis, allowing us to leverage the advantages of and advances in pointer analysis to improve precision and scalability. A dereference (Forumala Presented)., say, via pointer q is unsafe iff there exists a deallocation (Forumala Presented)., say, via pointer p such that on a control-flow path (Forumala Presented).,p aliases with q (with both pointing to an object o representing an allocation), denoted, and (Forumala Presented). reaches (Forumala Presented). via control flow, denoted. Applying directly any existing pointer analysis, which is typically solved separately with an associated control-flow reachability analysis, will render such verification highly imprecise, since (i.e., (Forumala Presented). does not distribute over (Forumala Presented). ). For precision, we solve, with a control-flow path (Forumala Presented). containing an allocation o, a deallocation (Forumala Presented). and a dereference (Forumala Presented). abstracted by a tuple of three contexts. For scalability, a demand-driven full context-sensitive (modulo recursion) pointer analysis, which operates on pre-computed def-use chains with adaptive context-sensitivity, is used to infer, without losing soundness or precision. Our evaluation shows that our approach can successfully verify the safety of 81.3% (or (Forumala Presented).) of all the dereferences in a set of ten C programs totalling 1,166 KLOC.
Barbar, M, Sui, Y, Zhang, H, Chen, S & Xue, J 2018, 'Live path CFI against control flow hijacking attacks', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 23rd Australasian Conference on Information Security and Privacy, Wollongong, Australia, pp. 768-779.View/Download from: Publisher's site
© Springer International Publishing AG, part of Springer Nature 2018. Through memory vulnerabilities, control flow hijacking allows an attacker to force a running program to execute other than what the programmer has intended. Control Flow Integrity (CFI) aims to prevent the adversarial effects of these attacks. CFI attempts to enforce the programmer's intent by ensuring that a program only runs according to a control flow graph (CFG) of the program. The enforced CFG can be built statically or dynamically, and Per-Input Control Flow Integrity (PICFI) represents a recent advance in dynamic CFI techniques. PICFI begins execution with the empty CFG of a program and lazily adds edges to the CFG during execution according to concrete inputs. However, this CFG grows monotonically, i.e., edges are never removed when corresponding control flow transfers become illegal. This paper presents LPCFI, Live Path Control Flow Integrity, to more precisely enforce forward edge CFI using a dynamically computed CFG by both adding and removing edges for all indirect control flow transfers from indirect callsites, thereby raising the bar against control flow hijacking attacks.
Barbar, M, Sui, Y, Zhang, H, Chen, S & Xue, J 2018, 'Live path control flow integrity', Proceedings - International Conference on Software Engineering, International Conference on Software Engineering, Gothenburg, Sweden, pp. 195-196.View/Download from: Publisher's site
© 2018 Authors. Per-Input Control Flow Integrity (PICFI) represents a recent advance in dynamic CFI techniques. PICFI starts with the empty CFG of a program and lazily adds edges to the CFG during execution according to concrete inputs. However, this CFG grows monotonically, i.e., invalid edges are never removed when corresponding control flow transfers (via indirect calls) become illegal (i.e., will never be executed again). This paper presents LPCFI, Live Path Control Flow Integrity, to more precisely enforce forward edge CFI using a dynamically computed CFG by both adding and removing edges for all indirect control flow transfers from function pointer calls, thereby raising the bar against control flow hijacking attacks.
Gao, Q, Ma, S, Shao, S, Sui, Y, Zhao, G, Ma, L, Ma, X, Duan, F, Deng, X, Zhang, S & Chen, X 2018, 'CoBOT: Static C/C++ bug detection in the presence of incomplete code', Proceedings of the 26th Conference on Program Comprehension, International Conference on Program Comprehension, ACM, Gothenburg, Sweden, pp. 385-388.View/Download from: Publisher's site
© 2018 Authors. To obtain precise and sound results, most of existing static analyzers require whole program analysis with complete source code. However, in reality, the source code of an application always interacts with many third-party libraries, which are often not easily accessible to static analyzers. Worse still, more than 30% of legacy projects  cannot be compiled easily due to complicated configuration environments (e.g., third-party libraries, compiler options and macros), making ideal "whole-program analysis" unavailable in practice. This paper presents CoBOT , a static analysis tool that can detect bugs in the presence of incomplete code. It analyzes function APIs unavailable in application code by either using function summarization or automatically downloading and analyzing the corresponding library code as inferred from the application code and its configuration files. The experiments show that CoBOT is not only easy to use, but also effective in detecting bugs in real-world programs with incomplete code. Our demonstration video is at: https://youtu.be/bhjJp3e7LPM.
Nobakht, M, Sui, Y, Seneviratne, A & Hu, W 2018, 'Permission Analysis of Health and Fitness Apps in IoT Programming Frameworks', Proceedings - 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications and 12th IEEE International Conference on Big Data Science and Engineering, Trustcom/BigDataSE 2018, 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, New York, USA, pp. 533-538.View/Download from: Publisher's site
© 2018 IEEE. Popular IoT programming frameworks, such as Google Fit, enable third-party developers to build apps to store and retrieve user data from a variety of data sources (e.g., wearables). The problem of overprivilege stands out due to the diversity and complexity of IoT apps, and developers rushing to release apps to meet the high demand in the IoT market. Any incorrect API usage of the IoT frameworks by third-party developers can lead to security risks, especially in health and fitness apps. Protecting sensitive user information is critically important to prevent financial and psychological harms. This paper presents PGFIT, a static permission analysis tool that precisely and efficiently identifies overprivilege issues in third-party apps built on top of a popular IoT programming framework, Google Fit. PGFIT extracts the set of requested permission scopes and the set of used data types in Google Fitenabled apps to determine whether the requested permission scopes are actually necessary. In this way, PGFIT serves as a quality assurance tool for developers and a privacy checker for app users. We used PGFIT to perform overprivilege analysis on a set of 20 Google Fit-enabled apps and with manual inspection, we found that 6 (30%) of them are overprivileged.
Yan, H, Sui, Y, Chen, S & Xue, J 2018, 'Spatio-temporal context reduction: a pointer-analysis-based static approach for detecting use-after-free vulnerabilities', 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), International Conference on Software Engineering, IEEE, Gothenburg, Sweden, pp. 327-337.
Zhang, Y, Sui, Y & Xue, J 2018, 'Launch-mode-aware Context-sensitive Activity Transition Analysis', Proceedings of the 40th International Conference on Software Engineering, International Conference on Software Engineering, ACM, Gothenburg, Sweden, pp. 598-608.View/Download from: Publisher's site
Existing static analyses model activity transitions in Android apps context-insensitively, making it impossible to distinguish different activity launch modes, reducing the pointer analysis precision for an activity's callbacks, and potentially resulting in infeasible activity transition paths. In this paper, we introduce Chime, a launch-mode-aware context-sensitive activity transition analysis that models different instances of an activity class according to its launch mode and the transitions between activities context-sensitively, by working together with an object-sensitive pointer analysis.
Our evaluation shows that our context-sensitive activity transition analysis is more precise than its context-insensitive counterpart in capturing activity transitions, facilitating GUI testing, and improving the pointer analysis precision.
Fan, X, Sui, Y, Liao, X & Xue, J 2017, 'Boosting the precision of virtual call integrity protection with partial pointer analysis for C++', ISSTA 2017 - Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ACM SIGSOFT International Symposium on Software Testing and Analysis, ACM, Santa Barbara, CA, USA, pp. 329-340.View/Download from: Publisher's site
© 2017 Association for Computing Machinery. We present, Vip, an approach to boosting the precision of Virtual call Integrity Protection for large-scale real-world C++ programs (e.g., Chrome) by using pointer analysis for the first time. Vip introduces two new techniques: (1) a sound and scalable partial pointer analysis for discovering statically the sets of legitimate targets at virtual callsites from separately compiled C++ modules and (2) a lightweight instrumentation technique for performing (virtual call) integrity checks at runtime. Vip raises the bar against vtable hijacking attacks by providing stronger security guarantees than the CHA-based approach with comparable performance overhead. Vip is implemented in LLVM-3.8.0 and evaluated using SPEC programs and Chrome. Statically, Vip protects virtual calls more effectively than CHA by significantly reducing the sets of legitimate targets permitted at 20.3% of the virtual callsites per program, on average. Dynamically, Vip incurs an average (maximum) instrumentation overhead of 0.7% (3.3%), making it practically deployable as part of a compiler tool chain.
Yan, H, Sui, Y, Chen, S & Xue, J 2017, 'Machine-learning-guided typestate analysis for static use-After-free detection', Proceedings of the 33rd Annual Computer Security Applications Conference, Annual Computer Security Applications Conference, ACM, Orlando, FL, USA, pp. 42-54.View/Download from: Publisher's site
Typestate analysis relies on pointer analysis for detecting temporal memory safety errors, such as use-After-free (UAF). For large programs, scalable pointer analysis is usually imprecise in analyzing their hard "corner cases", such as infeasible paths, recursion cycles, loops, arrays, and linked lists. Due to a sound over-Approximation of the points-To information, a large number of spurious aliases will be reported conservatively, causing the corresponding typestate analysis to report a large number of false alarms. Thus, the usefulness of typestate analysis for heap-intensive clients, like UAF detection, becomes rather limited, in practice. We introduce Tac, a static UAF detector that bridges the gap between typestate and pointer analyses by machine learning. Tac learns the correlations between program features and UAF-related aliases by using a Support Vector Machine (SVM) and applies this knowledge to further disambiguate the UAF-related aliases reported imprecisely by the pointer analysis so that only the ones validated by its SVM classifier are further investigated by the typestate analysis. Despite its unsoundness, Tac represents a practical typestate analysis approach for UAF detection. We have implemented Tac in LLVM-3.8.0 and evaluated it using a set of eight open-source C/C++ programs. The results show that Tac is effective (in terms of finding 5 known CVE vulnerabilities, 1 known bug, and 8 new bugs with a low false alarm rate) and scalable (in terms of analyzing a large codebase with 2,098 KLOC in just over 4 hours).
Zhang, J, Sui, Y & Xue, J 2017, 'Incremental analysis for probabilistic programs', Static Analysis (LNCS), International Symposium on Static Analysis, Springer, New York, NY, USA, pp. 450-472.View/Download from: Publisher's site
© 2017, Springer International Publishing AG. This paper presents Icpp, a new data-flow-based InCremental analysis for Probabilistic Programs, to infer their posterior probability distributions in response to small yet frequent changes to probabilistic knowledge, i.e., prior probability distributions and observations. Unlike incremental analyses for usual programs, which emphasize code changes, such as statement additions and deletions, Icpp focuses on changes made to probabilistic knowledge, the key feature in probabilistic programming. The novelty of Icpp lies in capturing the correlation between prior and posterior probability distributions by reasoning about the probabilistic dependence of each data-flow fact, so that any posterior probability affected by newly changed probabilistic knowledge can be incrementally updated in a sparse manner without recomputing it from scratch, thereby allowing the previously computed results to be reused. We have evaluated Icpp with a set of probabilistic programs. Our results show that Icpp is an order of magnitude faster than the state-of-the-art data-flow-based inference in analyzing probabilistic programs under small yet frequent changes to probabilistic knowledge, with an average analysis overhead of around 0.1Â s in response to a single change.
Di, P & Sui, Y 2016, 'Accelerating dynamic data race detection using static thread interference analysis', Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2016, pp. 30-39.View/Download from: Publisher's site
Copyright © 2016 ACM. Precise dynamic race detectors report an error if and only if more than one thread concurrently exhibits conict on a memory access. They insert instrumentations at compiletime to perform runtime checks on all memory accesses to ensure that all races are captured and no spurious warnings are generated. However, a dynamic race check for a particular memory access statement is guaranteed to be redundant if the statement can be statically identified as thread interference-free. Despite significant recent advances in dynamic detection techniques, the redundant check remains a critical factor that leads to prohibitive overhead of dynamic race detection for multithreaded programs. In this paper, we present a new framework that eliminates redundant race check and boosts the dynamic race detection by performing static optimizations on top of a series of thread interference analysis phases. Our framework is implemented on top of LLVM 3.5.0 and evaluated with an industry dynamic race detector TSAN which is available as a part of LLVM tool chain. 11 benchmarks from SPLASH2 are used to evaluate the effectiveness of our approach in accelerating TSAN by eliminating redundant interference-free checks. The experimental result demonstrates our new approach achieves from 1.4x to 4.0x (2.4x on average) speedup over original TSAN under 4 threads setting, and achieves from 1.3x to 4.6x (2.6x on average) speedup under 16 threads setting.
Sui, Y & Xue, J 2016, 'On-Demand Strong Update Analysis via Value-Flow Refinement', FSE'16: PROCEEDINGS OF THE 2016 24TH ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON FOUNDATIONS OF SOFTWARE ENGINEERING, 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), ASSOC COMPUTING MACHINERY, Seattle, WA, pp. 460-473.View/Download from: Publisher's site
Sui, Y & Xue, J 2016, 'SVF: Interprocedural Static Value-Flow Analysis in LLVM', PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION (CC 2016), 25th International Conference on Compiler Construction (CC), ASSOC COMPUTING MACHINERY, Barcelona, SPAIN, pp. 265-266.View/Download from: Publisher's site
Sui, Y, Di, P & Xue, J 2016, 'Sparse Flow-Sensitive Pointer Analysis for Multithreaded Programs', PROCEEDINGS OF CGO 2016: THE 14TH INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, 14th International Symposium on Code Generation and Optimization (CGO), ASSOC COMPUTING MACHINERY, Barcelona, SPAIN, pp. 160-170.View/Download from: Publisher's site
Sui, Y, Fan, X, Zhou, H & Xue, J 2016, 'Loop-oriented array- and field-sensitive pointer analysis for automatic SIMD vectorization', Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems - LCTES 2016, the 17th ACM SIGPLAN/SIGBED Conference, ACM Press.View/Download from: Publisher's site
Sui, Y, Fan, X, Zhou, H & Xue, J 2016, 'Loop-Oriented Array- and Field-Sensitive Pointer Analysis for Automatic SIMD Vectorization', ACM SIGPLAN NOTICES, ASSOC COMPUTING MACHINERY, pp. 41-51.View/Download from: Publisher's site
Yan, H, Sui, Y, Chen, S & Xue, J 2016, 'Automated memory leak fixing on value-flow slices for C programs', Proceedings of the ACM Symposium on Applied Computing, pp. 1386-1393.View/Download from: Publisher's site
© 2016 ACM. C is the dominant programming language for developing embedded software, operating systems, and device drivers. Unlike programs written in managed languages like Java, C programs rely on explicit memory management and are prone to memory leaks. Existing (static or dynamic) detectors only report leaks, but fixing them often requires considerable manual effort by inspecting a list of reported true and false alarms. How to develop on-demand lightweight techniques for automated leak fixing without introducing new memory errors remains challenging. In this paper, we introduce AUTOFIX, a fully automated leak-fixing approach for C programs by combining static and dynamic program analysis. Given a leaky allocation site reported by a static memory leak detector, AUTOFIX performs a graph reachability analysis to identify leaky paths on the value-flow slices of the program, and then conducts a liveness analysis to locate the program points for inserting fixes (i.e., the missing free calls) on the identified leaky paths. We have implemented AUTOFIX in LLVM-3.5.0 and evaluated it using five SPEC2000 benchmarks and three open-source applications. Experimental results show that AUTOFIX can safely fix all the memory leaks reported by a state-of-theart static memory leak detector with small instrumentation overhead.
Di, P, Sui, Y, Ye, D & Xue, J 2015, 'Region-Based May-Happen-in-Parallel Analysis for C Programs', 2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 44th Annual International Conference on Parallel Processing Workshops (ICPPW), IEEE, Beijing, PEOPLES R CHINA, pp. 889-898.View/Download from: Publisher's site
Fan, X, Sui, Y & Xue, J 2015, 'Contention-Aware Scheduling for Asymmetric Multicore Processors', 2015 IEEE 21ST INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 21st IEEE International Conference on Parallel and Distributed Systems ICPADS, IEEE, Melbourne, AUSTRALIA, pp. 742-751.View/Download from: Publisher's site
Li, Y, Tan, T, Sui, Y & Xue, J 2014, 'Self-inferencing Reflection Resolution for Java', ECOOP 2014 - OBJECT-ORIENTED PROGRAMMING, 28th European Conference on Object-Oriented Programming (ECOOP), SPRINGER-VERLAG BERLIN, Uppsala, SWEDEN, pp. 27-53.
Ye, D, Su, Y, Sui, Y & Xue, J 2014, 'WPBOUND: Enforcing Spatial Memory Safety Efficiently at Runtime with Weakest Preconditions', 2014 IEEE 25TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE), 25th IEEE International Symposium on Software Reliability Engineering (ISSRE), IEEE, Naples, ITALY, pp. 88-99.View/Download from: Publisher's site
Ye, D, Sui, Y & Xue, J 2014, 'Accelerating dynamic detection of uses of undefined values with static value-flow analysis', Proceedings of the 12th ACM/IEEE International Symposium on Code Generation and Optimization, CGO 2014, pp. 154-164.View/Download from: Publisher's site
Uninitialized variables can cause system crashes when used and security vulnerabilities when exploited. With source rather than binary instrumentation, dynamic analysis tools such as MSan can detect uninitialized memory uses at significantly reduced overhead but are still costly. In this paper, we introduce a static value-flow analysis, called Usher, to guide and accelerate the dynamic analysis performed by such tools. Usher reasons about the definedness of values using a value-flow graph (VFG) that captures def-use chains for both top-level and address-taken variables interprocedurally and removes unnecessary instrumentation by solving a graph reachability problem. Usher works well with any pointer analysis (done a priori) and facilitates advanced instrumentation-reducing optimizations (with two demonstrated here). Implemented in LLVM and evaluated using all the 15 SPEC2000 C programs, Usher can reduce the slowdown of MSan from 212% - 302% to 123% - 140% for a number of configurations tested. Copyright © 2014 by the Association for Computing Machinery, Inc. (ACM).
Sui, Y, Li, Y & Xue, J 2013, 'Query-Directed Adaptive Heap Cloning for Optimizing Compilers', PROCEEDINGS OF THE 2013 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO), 11th IEEE/ACM International Symposium on Code Generation and Optimization (CGO), IEEE, Shenzhen, PEOPLES R CHINA, pp. 1-11.
Di, P, Ye, D, Su, Y, Sui, Y & Xue, J 2012, 'Automatic parallelization of tiled loop nests with enhanced fine-grained parallelism on GPUs', Proceedings of the International Conference on Parallel Processing, pp. 350-359.View/Download from: Publisher's site
Automatically parallelizing loop nests into CUDA kernels must exploit the full potential of GPUs to obtain high performance. One state-of-the-art approach makes use of the polyhedral model to extract parallelism from a loop nest by applying a sequence of affine transformations to the loop nest. However, how to automate this process to exploit both intra and inter-SM parallelism for GPUs remains a challenging problem. Presently, compilers may generate code significantly slower than hand-optimized code for certain applications. This paper describes a compiler framework for tiling and parallelizing loop nests with uniform dependences into CUDA code. We aim to improve two levels of wave front parallelism. We find tiling hyper planes by embedding parallelism enhancing constraints in the polyhedral model to maximize intra-tile, i.e., intra-SM parallelism. This improves the load balance among the SPs in an SM executing a wave front of loop iterations within a tile. We eliminate parallelism-hindering false dependences to maximize inter-tile, i.e., inter-SM parallelism. This improves the load balance among the SMs executing a wave front of tiles. Our approach has been implemented in PLUTO and validated using eight benchmarks on two different NVIDIA GPUs (C1060 and C2050). Compared to PLUTO, our approach achieves 2 - 5.5X speedups across the benchmarks. Compared to highly hand-optimized 1-D Jacobi (3 points), 2-D Jacobi (5 points), 3-D Jacobi (7 points) and 3-D Jacobi (27 points), our speedups, 1.17X, 1.41X, 0.97X and 0.87X with an average of 1.10X on C1060 and 1.24X, 1.20X, 0.86X and 0.95X with an average of 1.06X on C2050, are competitive. © 2012 IEEE.
Sui, Y, Ye, D & Xue, J 2012, 'Static memory leak detection using full-sparse value-flow analysis', 2012 International Symposium on Software Testing and Analysis, ISSTA 2012 - Proceedings, pp. 254-264.View/Download from: Publisher's site
We introduce a static detector, Saber, for detecting memory leaks in C programs. Leveraging recent advances on sparse pointer analysis, Saber is the first to use a full-sparse value-flow analysis for leak detection. Saber tracks the flow of values from allocation to free sites using a sparse value-flow graph (SVFG) that captures def-use chains and value flows via assignments for all memory locations represented by both top-level and address-taken pointers. By exploiting field-, flow- and context-sensitivity during different phases of the analysis, Saber detects leaks in a program by solving a graph reachability problem on its SVFG. Saber, which is fully implemented in Open64, is effective at detecting 211 leaks in the 15 SPEC2000 C programs and five applications, while keeping the false positive rate at 18.5%. We have also compared Saber with Fastcheck (which analyzes allocated objects flowing only into top-level pointers) and Sparrow (which handles all allocated objects using abstract interpretation) using the 15 SPEC2000 C programs. Saber is as accurate as Sparrow but is 14.2X faster and reports 40.7% more bugs than Fastcheck at a slightly higher false positive rate but is only 3.7X slower. © 2012 ACM.
Sui, Y, Ye, D & Xue, J 2012, 'Static memory leak detection using full-sparse value-flow analysis', Proceedings of the 2012 International Symposium on Software Testing and Analysis - ISSTA 2012, the 2012 International Symposium, ACM Press.View/Download from: Publisher's site
Sui, Y, Ye, S, Xue, J & Yew, PC 2011, 'SPAS: Scalable path-sensitive pointer analysis on full-sparse SSA', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 155-171.View/Download from: Publisher's site
We present a new SPAS (Scalable PAth-Sensitive) framework for resolving points-to sets in C programs that exploits recent advances in pointer analysis. SPAS enables intraprocedural path-sensitivity to be obtained in flow-sensitive and context-sensitive (FSCS) techniques scalably, by using BDDs to manipulate program paths and by performing pointer analysis level-by-level on a full-sparse SSA representation similarly as the state-of-the-art LevPA (the FSCS version of SPAS). Compared with LevPA using all 27 C benchmarks in SPEC CPU2000 and CPU2006, SPAS incurs 18.42% increase in analysis time and 10.97% increase in memory usage on average, while guaranteeing that all points-to sets are obtained with non-decreasing precision. © 2011 Springer-Verlag.
Yang, Y, Zhou, X, Yang, G, Wang, B & Sui, Y 2010, 'Trustworthy Service Scheduling Framework for QoS garantee of composite service', 2nd International Conference on Information Science and Engineering, ICISE2010 - Proceedings.View/Download from: Publisher's site
Interoperability-centric and loosely-coupled distributed computing environment brings forward a new challenge on trustworthiness of service computing, at the same time, user puts more and more pressure on system's non-functional requirements such as real-time, availability, security, cost and so on. Therefore service's QoS should be guaranteed trustworthily. In this paper a Trustworthy Service Scheduling Framework (TSSF) is presented to guarantee composite service's QoS requirement. The overall QoS requirement of the entire composite service is satisfied with the method of planning and guaranteeing the QoS of each atomic service. Our approach is to decompose the QoS requirement for the composite service into separate ones for the atoms; and then TSSF delivers expected QoS for each atomic service with adaptive trustworthy service scheduling mechanisms based on Dynamic Service Entity Group (DSEG); finally experimental results indicate good performance of the proposed framework. © 2010 IEEE.
Yulei, S, Xingshe, Z & Gang, Y 2009, 'QoS decomposition for dependable service-oriented middleware', 2009 Second ISECS International Colloquium on Computing, Communication, Control, and Management, CCCM 2009, pp. 103-108.View/Download from: Publisher's site
In this paper, a dependable service-oriented middleware (DSM) is proposed by introducing the concept of Quality of Service (QoS) into service-oriented architecture. DSM provides a dependable framework with the functionalities of QoS monitoring, configuration and runtime management to fulfill user's QoS request. The overall QoS requirement of the entire composite service is satisfied with the method of planning and guaranteeing the QoS of each atomic service. Our approach is to decompose the QoS requirement for the composite service into separate ones for the atoms, and to deliver expected QoS for each atomic service with adaptive service scheduling mechanisms. This paper first presents a group-based service redundancy management and QoS decomposition model to support dependable service-oriented architecture; then demonstrates the structure and components of DSM; finally experimental results indicate good performance of the proposed framework and algorithms. ©2009 IEEE.