Yulei Sui's general interests are program analysis, compilers, software engineering and machine learning. Particularly, his research focuses on building fundamental static and dynamic program analysis techniques to improve the reliability and security of modern software systems. His recent interest lies in bug prediction, program repair, and software security analysis through data mining and machine learning.
His papers have been published in the top-tier conferences and journals in the field of software engineering and programming language such as TSE, ICSE, FSE, ISSTA, ECOOP, SAS, CGO, CC, ICPP, LCTES. He was a research intern in the Program Analysis Group in Oracle Lab Australia in 2013. He was a keynote speaker at EuroLLVM 2016, and has been awarded an IPRS scholarship, an ICSE Distinguished Paper Award, a CGO Best Paper, an ACM CAPS award, and an ARC Discovery Early Career Researcher Award (2017-2019).
More up-to-date information on my personal web page.
Can supervise: YES
- Program Analysis
- Software Engineering
- Software Security Analysis
- Machine Learning
Xiao, G, Zheng, Z, Jiang, B & Sui, Y 2019, 'An Empirical Study of Regression Bug Chains in Linux', IEEE Transactions on Reliability.
Sui, Y, Fan, X, Zhou, H & Xue, J 2018, 'Loop-Oriented Pointer Analysis for Automatic SIMD Vectorization', ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, vol. 17, no. 2.View/Download from: UTS OPUS or Publisher's site
Hou, Z, Wang, Y, Sui, Y, Gu, J, Zhao, T & Zhou, X 2018, 'Managing high-performance computing applications as an on-demand service on federated clouds', Computers and Electrical Engineering, vol. 67, pp. 579-595.View/Download from: UTS OPUS or Publisher's site
© 2018 Elsevier Ltd There are several challenges (e.g., imbalance between supply and demand of hardware resources and software licenses, and usability) under modern High-Performance Computing (HPC) environment. As a means of providing an on-demand service for end users, we propose a Software-as-a-Service (SaaS) approach for managing commercial HPC applications as a Web-based service deployed on top of federated clouds. Some inter-trusted private or public clouds are federated to create a unified service platform with a large amount of hardware resources. In addition, an on-demand, pay-per-use model for Web-service-enabled HPC applications is proposed. Further, we provide an economic analysis of the proposed approach from the perspective of end users, cloud service providers, and Independent Software Vendors (ISVs). We conduct a simulation using two HPC application services on three federated clouds. A combined Quality of Service (QoS) and economic evaluation demonstrates a better effect of the proposed approach comparing with existing HPC platforms.
IEEE We present SUPA, a value-flow-based demand-driven flow- and context-sensitive pointer analysis with strong updates for C and C++ programs. SUPA enables computing points-to information via value-flow refinement, in environments with small time and memory budgets. We formulate SUPA by solving a graph-reachability problem on an inter-procedural value-flow graph representing a program's def-use chains, which are pre-computed efficiently but over-approximately. To answer a client query (a request for a variable's points-to set), SUPA reasons about the flow of values along the pre-computed def-use chains sparsely (rather than across all program points), by performing only the work necessary for the query (rather than analyzing the whole program). In particular, strong updates are performed to filter out spurious def-use chains through value-flow refinement as long as the total budget is not exhausted.
Sui, Y, Yan, H, Zheng, Z, Zhang, Y & Xue, J 2018, 'Parallel construction of interprocedural memory SSA form', Journal of Systems and Software, vol. 146, pp. 186-195.View/Download from: UTS OPUS or Publisher's site
© 2018 Elsevier Inc. Interprocedural memory SSA form, which provides a sparse data-flow representation for indirect memory operations, paves the way for many advanced program analyses. Any performance improvement for memory SSA construction benefits for a wide range of clients (e.g., bug detection and compiler optimisations). However, its construction is much more expensive than that for scalar-based SSA form. The memory objects distinguished at a pointer dereference significantly increases the number of variables that need to be put on SSA form, resulting in considerable analysis overhead when analyzing large programs (e.g., millions of lines of code). This paper presents PARSSA, a fully parameterised approach for parallel construction of interprocedural memory SSA form by utilising multi-core computing resources. PARSSA partitions whole-program memory objects into uniquely identified memory regions. The indirect memory accesses in a function are fully parameterised using partitioned memory regions, so that the memory SSA construction of a parameterised function is readily parallelised. We implemented PARSSA in LLVM using Intel Threading Building Block (TBB) for creating parallel tasks. We evaluated PARSSA using 15 large applications. PARSSA achieves up to 6.9 × speedup against the sequential version on an 8-core machine.
Sui, Y, Ye, D, Su, Y & Xue, J 2016, 'Eliminating Redundant Bounds Checks in Dynamic Buffer Overflow Detection Using Weakest Preconditions', IEEE TRANSACTIONS ON RELIABILITY, vol. 65, no. 4, pp. 1682-1699.View/Download from: Publisher's site
Ye, S, Sui, Y & Xue, J 2014, 'Region-Based Selective Flow-Sensitive Pointer Analysis', STATIC ANALYSIS (SAS 2014), vol. 8723, pp. 319-336.
Sui, Y, Ye, S, Xue, J & Zhang, J 2014, 'Making context-sensitive inclusion-based pointer analysis practical for compilers using parameterised summarisation', SOFTWARE-PRACTICE & EXPERIENCE, vol. 44, no. 12, pp. 1485-1510.View/Download from: Publisher's site
Wu, D, Liu, J, Sui, Y, Chen, S & Xue, J 2019, 'Precise Static Happens-Before Analysis for Detecting UAF Order Violations in Android', Proceedings of 12th International Conference on Software Testing, Verification and Validation.
Xu, X, Sui, Y, Yan, H & Xue, J 2019, 'VFix: Value-Flow-Guided Precise Program Repair for Null Pointer Dereferences', Proceedings of 41th International Conference on Software Engineering.
Barbar, M, Sui, Y, Zhang, H, Chen, S & Xue, J 2018, 'Live path CFI against control flow hijacking attacks', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 768-779.View/Download from: UTS OPUS or Publisher's site
© Springer International Publishing AG, part of Springer Nature 2018. Through memory vulnerabilities, control flow hijacking allows an attacker to force a running program to execute other than what the programmer has intended. Control Flow Integrity (CFI) aims to prevent the adversarial effects of these attacks. CFI attempts to enforce the programmer's intent by ensuring that a program only runs according to a control flow graph (CFG) of the program. The enforced CFG can be built statically or dynamically, and Per-Input Control Flow Integrity (PICFI) represents a recent advance in dynamic CFI techniques. PICFI begins execution with the empty CFG of a program and lazily adds edges to the CFG during execution according to concrete inputs. However, this CFG grows monotonically, i.e., edges are never removed when corresponding control flow transfers become illegal. This paper presents LPCFI, Live Path Control Flow Integrity, to more precisely enforce forward edge CFI using a dynamically computed CFG by both adding and removing edges for all indirect control flow transfers from indirect callsites, thereby raising the bar against control flow hijacking attacks.
Barbar, M, Sui, Y, Zhang, H, Chen, S & Xue, J 2018, 'Live path control flow integrity', Proceedings - International Conference on Software Engineering, International Conference on Software Engineering, Gothenburg, Sweden, pp. 195-196.View/Download from: UTS OPUS or Publisher's site
© 2018 Authors. Per-Input Control Flow Integrity (PICFI) represents a recent advance in dynamic CFI techniques. PICFI starts with the empty CFG of a program and lazily adds edges to the CFG during execution according to concrete inputs. However, this CFG grows monotonically, i.e., invalid edges are never removed when corresponding control flow transfers (via indirect calls) become illegal (i.e., will never be executed again). This paper presents LPCFI, Live Path Control Flow Integrity, to more precisely enforce forward edge CFI using a dynamically computed CFG by both adding and removing edges for all indirect control flow transfers from function pointer calls, thereby raising the bar against control flow hijacking attacks.
Gao, Q, Ma, S, Shao, S, Sui, Y, Zhao, G, Ma, L, Ma, X, Duan, F, Deng, X, Zhang, S & Chen, X 2018, 'CoBOT: Static C/C++ bug detection in the presence of incomplete code', Proceedings - International Conference on Software Engineering, pp. 385-388.View/Download from: UTS OPUS or Publisher's site
© 2018 Authors. To obtain precise and sound results, most of existing static analyzers require whole program analysis with complete source code. However, in reality, the source code of an application always interacts with many third-party libraries, which are often not easily accessible to static analyzers. Worse still, more than 30% of legacy projects  cannot be compiled easily due to complicated configuration environments (e.g., third-party libraries, compiler options and macros), making ideal "whole-program analysis" unavailable in practice. This paper presents CoBOT , a static analysis tool that can detect bugs in the presence of incomplete code. It analyzes function APIs unavailable in application code by either using function summarization or automatically downloading and analyzing the corresponding library code as inferred from the application code and its configuration files. The experiments show that CoBOT is not only easy to use, but also effective in detecting bugs in real-world programs with incomplete code. Our demonstration video is at: https://youtu.be/bhjJp3e7LPM.
Nobakht, M, Sui, Y, Seneviratne, A & Hu, W 2018, 'Permission Analysis of Health and Fitness Apps in IoT Programming Frameworks', Proceedings - 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications and 12th IEEE International Conference on Big Data Science and Engineering, Trustcom/BigDataSE 2018, pp. 533-538.View/Download from: UTS OPUS or Publisher's site
© 2018 IEEE. Popular IoT programming frameworks, such as Google Fit, enable third-party developers to build apps to store and retrieve user data from a variety of data sources (e.g., wearables). The problem of overprivilege stands out due to the diversity and complexity of IoT apps, and developers rushing to release apps to meet the high demand in the IoT market. Any incorrect API usage of the IoT frameworks by third-party developers can lead to security risks, especially in health and fitness apps. Protecting sensitive user information is critically important to prevent financial and psychological harms. This paper presents PGFIT, a static permission analysis tool that precisely and efficiently identifies overprivilege issues in third-party apps built on top of a popular IoT programming framework, Google Fit. PGFIT extracts the set of requested permission scopes and the set of used data types in Google Fitenabled apps to determine whether the requested permission scopes are actually necessary. In this way, PGFIT serves as a quality assurance tool for developers and a privacy checker for app users. We used PGFIT to perform overprivilege analysis on a set of 20 Google Fit-enabled apps and with manual inspection, we found that 6 (30%) of them are overprivileged.
Yan, H, Sui, Y, Chen, S & Xue, J 2018, 'Spatio-temporal context reduction: a pointer-analysis-based static approach for detecting use-after-free vulnerabilities', 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), International Conference on Software Engineering, IEEE, Gothenburg, Sweden, pp. 327-337.View/Download from: UTS OPUS
Zhang, Y, Sui, Y & Xue, J 2018, 'Launch-mode-aware Context-sensitive Activity Transition Analysis', Proceedings of the 40th International Conference on Software Engineering, ACM, pp. 598-608.View/Download from: UTS OPUS or Publisher's site
Zhang, J, Sui, Y & Xue, J 2017, 'Incremental analysis for probabilistic programs', Static Analysis (LNCS), International Symposium on Static Analysis, Springer, New York, NY, USA, pp. 450-472.View/Download from: Publisher's site
© 2017, Springer International Publishing AG. This paper presents Icpp, a new data-flow-based InCremental analysis for Probabilistic Programs, to infer their posterior probability distributions in response to small yet frequent changes to probabilistic knowledge, i.e., prior probability distributions and observations. Unlike incremental analyses for usual programs, which emphasize code changes, such as statement additions and deletions, Icpp focuses on changes made to probabilistic knowledge, the key feature in probabilistic programming. The novelty of Icpp lies in capturing the correlation between prior and posterior probability distributions by reasoning about the probabilistic dependence of each data-flow fact, so that any posterior probability affected by newly changed probabilistic knowledge can be incrementally updated in a sparse manner without recomputing it from scratch, thereby allowing the previously computed results to be reused. We have evaluated Icpp with a set of probabilistic programs. Our results show that Icpp is an order of magnitude faster than the state-of-the-art data-flow-based inference in analyzing probabilistic programs under small yet frequent changes to probabilistic knowledge, with an average analysis overhead of around 0.1Â s in response to a single change.
Fan, X, Sui, Y, Liao, X & Xue, J 2017, 'Boosting the precision of virtual call integrity protection with partial pointer analysis for C++', ISSTA 2017 - Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ACM SIGSOFT International Symposium on Software Testing and Analysis, ACM, Santa Barbara, CA, USA, pp. 329-340.View/Download from: UTS OPUS or Publisher's site
© 2017 Association for Computing Machinery. We present, Vip, an approach to boosting the precision of Virtual call Integrity Protection for large-scale real-world C++ programs (e.g., Chrome) by using pointer analysis for the first time. Vip introduces two new techniques: (1) a sound and scalable partial pointer analysis for discovering statically the sets of legitimate targets at virtual callsites from separately compiled C++ modules and (2) a lightweight instrumentation technique for performing (virtual call) integrity checks at runtime. Vip raises the bar against vtable hijacking attacks by providing stronger security guarantees than the CHA-based approach with comparable performance overhead. Vip is implemented in LLVM-3.8.0 and evaluated using SPEC programs and Chrome. Statically, Vip protects virtual calls more effectively than CHA by significantly reducing the sets of legitimate targets permitted at 20.3% of the virtual callsites per program, on average. Dynamically, Vip incurs an average (maximum) instrumentation overhead of 0.7% (3.3%), making it practically deployable as part of a compiler tool chain.
Yan, H, Sui, Y, Chen, S & Xue, J 2017, 'Machine-learning-guided typestate analysis for static use-After-free detection', Proceedings of the 33rd Annual Computer Security Applications Conference, Annual Computer Security Applications Conference, ACM, Orlando, FL, USA, pp. 42-54.View/Download from: UTS OPUS or Publisher's site
Typestate analysis relies on pointer analysis for detecting temporal memory safety errors, such as use-After-free (UAF). For large programs, scalable pointer analysis is usually imprecise in analyzing their hard "corner cases", such as infeasible paths, recursion cycles, loops, arrays, and linked lists. Due to a sound over-Approximation of the points-To information, a large number of spurious aliases will be reported conservatively, causing the corresponding typestate analysis to report a large number of false alarms. Thus, the usefulness of typestate analysis for heap-intensive clients, like UAF detection, becomes rather limited, in practice. We introduce Tac, a static UAF detector that bridges the gap between typestate and pointer analyses by machine learning. Tac learns the correlations between program features and UAF-related aliases by using a Support Vector Machine (SVM) and applies this knowledge to further disambiguate the UAF-related aliases reported imprecisely by the pointer analysis so that only the ones validated by its SVM classifier are further investigated by the typestate analysis. Despite its unsoundness, Tac represents a practical typestate analysis approach for UAF detection. We have implemented Tac in LLVM-3.8.0 and evaluated it using a set of eight open-source C/C++ programs. The results show that Tac is effective (in terms of finding 5 known CVE vulnerabilities, 1 known bug, and 8 new bugs with a low false alarm rate) and scalable (in terms of analyzing a large codebase with 2,098 KLOC in just over 4 hours).
Sui, Y & Xue, J 2016, 'On-Demand Strong Update Analysis via Value-Flow Refinement', FSE'16: PROCEEDINGS OF THE 2016 24TH ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON FOUNDATIONS OF SOFTWARE ENGINEERING, 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), ASSOC COMPUTING MACHINERY, Seattle, WA, pp. 460-473.View/Download from: Publisher's site
Sui, Y, Fan, X, Zhou, H & Xue, J 2016, 'Loop-Oriented Array- and Field-Sensitive Pointer Analysis for Automatic SIMD Vectorization', ACM SIGPLAN NOTICES, ASSOC COMPUTING MACHINERY, pp. 41-51.View/Download from: Publisher's site
Yan, H, Sui, Y, Chen, S & Xue, J 2016, 'Automated memory leak fixing on value-flow slices for C programs', Proceedings of the ACM Symposium on Applied Computing, pp. 1386-1393.View/Download from: Publisher's site
© 2016 ACM. C is the dominant programming language for developing embedded software, operating systems, and device drivers. Unlike programs written in managed languages like Java, C programs rely on explicit memory management and are prone to memory leaks. Existing (static or dynamic) detectors only report leaks, but fixing them often requires considerable manual effort by inspecting a list of reported true and false alarms. How to develop on-demand lightweight techniques for automated leak fixing without introducing new memory errors remains challenging. In this paper, we introduce AUTOFIX, a fully automated leak-fixing approach for C programs by combining static and dynamic program analysis. Given a leaky allocation site reported by a static memory leak detector, AUTOFIX performs a graph reachability analysis to identify leaky paths on the value-flow slices of the program, and then conducts a liveness analysis to locate the program points for inserting fixes (i.e., the missing free calls) on the identified leaky paths. We have implemented AUTOFIX in LLVM-3.5.0 and evaluated it using five SPEC2000 benchmarks and three open-source applications. Experimental results show that AUTOFIX can safely fix all the memory leaks reported by a state-of-theart static memory leak detector with small instrumentation overhead.
Sui, Y, Di, P & Xue, J 2016, 'Sparse Flow-Sensitive Pointer Analysis for Multithreaded Programs', PROCEEDINGS OF CGO 2016: THE 14TH INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, 14th International Symposium on Code Generation and Optimization (CGO), ASSOC COMPUTING MACHINERY, Barcelona, SPAIN, pp. 160-170.View/Download from: Publisher's site
Sui, Y & Xue, J 2016, 'SVF: Interprocedural Static Value-Flow Analysis in LLVM', PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION (CC 2016), 25th International Conference on Compiler Construction (CC), ASSOC COMPUTING MACHINERY, Barcelona, SPAIN, pp. 265-266.View/Download from: Publisher's site
Fan, X, Sui, Y & Xue, J 2016, 'Contention-aware scheduling for asymmetric multicore processors', Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, pp. 742-751.View/Download from: Publisher's site
© 2015 IEEE. Asymmetric multicore processors (AMPs) have been proposed as an energy-efficient alternative to symmetric mul-ticore processors (SMPs). However, AMPs derive their performance from core specialization, which requires co-running applications to be scheduled to run on their most appropriate core types. Despite extensive research on AMP scheduling, developing an effective scheduling algorithm remains challenging. Contention for shared resources is a key performance-limiting factor, which often renders existing contention-free scheduling algorithms ineffective. We introduce a contention-aware scheduling algorithm for ARM's big.LITTLE, a commercial AMP platform. Our algorithm comprises an offline stage and an online stage. The offline stage builds a performance interference model for an application by training it with a set of co-running applications. Guided by this model, the online stage schedules a workload by assigning its applications to their most appropriate core types in order to minimize the performance degradation caused by contention for shared resources. Our model can accurately predict the performance degradation of an application when co-running with other applications with an average prediction error of 9.60%. Compared with the default scheduler provided for ARM's big.LITTLE and the speedup-factor-driven scheduler, our contention-aware scheduler can improve overall system performance by up to 28.32% and 28.51%, respectively.
Di, P & Sui, Y 2016, 'Accelerating dynamic data race detection using static thread interference analysis', Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2016, pp. 30-39.View/Download from: Publisher's site
Copyright © 2016 ACM. Precise dynamic race detectors report an error if and only if more than one thread concurrently exhibits conict on a memory access. They insert instrumentations at compiletime to perform runtime checks on all memory accesses to ensure that all races are captured and no spurious warnings are generated. However, a dynamic race check for a particular memory access statement is guaranteed to be redundant if the statement can be statically identified as thread interference-free. Despite significant recent advances in dynamic detection techniques, the redundant check remains a critical factor that leads to prohibitive overhead of dynamic race detection for multithreaded programs. In this paper, we present a new framework that eliminates redundant race check and boosts the dynamic race detection by performing static optimizations on top of a series of thread interference analysis phases. Our framework is implemented on top of LLVM 3.5.0 and evaluated with an industry dynamic race detector TSAN which is available as a part of LLVM tool chain. 11 benchmarks from SPLASH2 are used to evaluate the effectiveness of our approach in accelerating TSAN by eliminating redundant interference-free checks. The experimental result demonstrates our new approach achieves from 1.4x to 4.0x (2.4x on average) speedup over original TSAN under 4 threads setting, and achieves from 1.3x to 4.6x (2.6x on average) speedup under 16 threads setting.
Di, P, Sui, Y, Ye, D & Xue, J 2015, 'Region-Based May-Happen-in-Parallel Analysis for C Programs', 2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 44th Annual International Conference on Parallel Processing Workshops (ICPPW), IEEE, Beijing, PEOPLES R CHINA, pp. 889-898.View/Download from: Publisher's site
Ye, D, Su, Y, Sui, Y & Xue, J 2014, 'WPBOUND: Enforcing Spatial Memory Safety Efficiently at Runtime with Weakest Preconditions', 2014 IEEE 25TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE), 25th IEEE International Symposium on Software Reliability Engineering (ISSRE), IEEE, Naples, ITALY, pp. 88-99.View/Download from: Publisher's site
Li, Y, Tan, T, Sui, Y & Xue, J 2014, 'Self-inferencing Reflection Resolution for Java', ECOOP 2014 - OBJECT-ORIENTED PROGRAMMING, 28th European Conference on Object-Oriented Programming (ECOOP), SPRINGER-VERLAG BERLIN, Uppsala, SWEDEN, pp. 27-53.
Ye, D, Sui, Y & Xue, J 2014, 'Accelerating dynamic detection of uses of undefined values with static value-flow analysis', Proceedings of the 12th ACM/IEEE International Symposium on Code Generation and Optimization, CGO 2014, pp. 154-164.View/Download from: Publisher's site
Uninitialized variables can cause system crashes when used and security vulnerabilities when exploited. With source rather than binary instrumentation, dynamic analysis tools such as MSan can detect uninitialized memory uses at significantly reduced overhead but are still costly. In this paper, we introduce a static value-flow analysis, called Usher, to guide and accelerate the dynamic analysis performed by such tools. Usher reasons about the definedness of values using a value-flow graph (VFG) that captures def-use chains for both top-level and address-taken variables interprocedurally and removes unnecessary instrumentation by solving a graph reachability problem. Usher works well with any pointer analysis (done a priori) and facilitates advanced instrumentation-reducing optimizations (with two demonstrated here). Implemented in LLVM and evaluated using all the 15 SPEC2000 C programs, Usher can reduce the slowdown of MSan from 212% - 302% to 123% - 140% for a number of configurations tested. Copyright © 2014 by the Association for Computing Machinery, Inc. (ACM).
Sui, Y, Li, Y & Xue, J 2013, 'Query-directed adaptive heap cloning for optimizing compilers', Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2013.View/Download from: Publisher's site
Andersen's pointer analysis becomes more precise when applied with full heap cloning but unscalable for large, heap-intensive programs. In contrast, k-callsite-sensitive heap cloning can be faster but less precise for some programs. In this paper, we make one step forward by enhancing Andersen's analysis with QUery-Directed Adaptive (QUDA) heap cloning for optimizing compilers. The novelty of our analysis, called QUDA, lies in performing k-callsite-sensitive heap cloning iteratively, starting with k = 0 (without heap cloning), so that an abstract heap object is cloned at iteration k = i + 1 only if some mayalias queries that are not answered positively at iteration k = i may now be answered more precisely. QUDA, which is implemented in Open64, has the same precision as the state-of-the-art, FULCRA, a version of QUDA with exhaustive heap cloning, but is significantly more scalable. For 10 SPEC2000 C benchmarks and 5 C applications (totalling 840 KLOC) evaluated, QUDA takes only 4+ minutes but exhaustive heap cloning takes 42+ minutes to complete. QUDA takes only 75.1 % of the time that Open64 takes on average to compile these 15 programs under '-O2'. © 2013 IEEE.
Sui, Y, Ye, D & Xue, J 2012, 'Static memory leak detection using full-sparse value-flow analysis', 2012 International Symposium on Software Testing and Analysis, ISSTA 2012 - Proceedings, pp. 254-264.View/Download from: Publisher's site
We introduce a static detector, Saber, for detecting memory leaks in C programs. Leveraging recent advances on sparse pointer analysis, Saber is the first to use a full-sparse value-flow analysis for leak detection. Saber tracks the flow of values from allocation to free sites using a sparse value-flow graph (SVFG) that captures def-use chains and value flows via assignments for all memory locations represented by both top-level and address-taken pointers. By exploiting field-, flow- and context-sensitivity during different phases of the analysis, Saber detects leaks in a program by solving a graph reachability problem on its SVFG. Saber, which is fully implemented in Open64, is effective at detecting 211 leaks in the 15 SPEC2000 C programs and five applications, while keeping the false positive rate at 18.5%. We have also compared Saber with Fastcheck (which analyzes allocated objects flowing only into top-level pointers) and Sparrow (which handles all allocated objects using abstract interpretation) using the 15 SPEC2000 C programs. Saber is as accurate as Sparrow but is 14.2X faster and reports 40.7% more bugs than Fastcheck at a slightly higher false positive rate but is only 3.7X slower. © 2012 ACM.
Di, P, Ye, D, Su, Y, Sui, Y & Xue, J 2012, 'Automatic parallelization of tiled loop nests with enhanced fine-grained parallelism on GPUs', Proceedings of the International Conference on Parallel Processing, pp. 350-359.View/Download from: Publisher's site
Automatically parallelizing loop nests into CUDA kernels must exploit the full potential of GPUs to obtain high performance. One state-of-the-art approach makes use of the polyhedral model to extract parallelism from a loop nest by applying a sequence of affine transformations to the loop nest. However, how to automate this process to exploit both intra and inter-SM parallelism for GPUs remains a challenging problem. Presently, compilers may generate code significantly slower than hand-optimized code for certain applications. This paper describes a compiler framework for tiling and parallelizing loop nests with uniform dependences into CUDA code. We aim to improve two levels of wave front parallelism. We find tiling hyper planes by embedding parallelism enhancing constraints in the polyhedral model to maximize intra-tile, i.e., intra-SM parallelism. This improves the load balance among the SPs in an SM executing a wave front of loop iterations within a tile. We eliminate parallelism-hindering false dependences to maximize inter-tile, i.e., inter-SM parallelism. This improves the load balance among the SMs executing a wave front of tiles. Our approach has been implemented in PLUTO and validated using eight benchmarks on two different NVIDIA GPUs (C1060 and C2050). Compared to PLUTO, our approach achieves 2 - 5.5X speedups across the benchmarks. Compared to highly hand-optimized 1-D Jacobi (3 points), 2-D Jacobi (5 points), 3-D Jacobi (7 points) and 3-D Jacobi (27 points), our speedups, 1.17X, 1.41X, 0.97X and 0.87X with an average of 1.10X on C1060 and 1.24X, 1.20X, 0.86X and 0.95X with an average of 1.06X on C2050, are competitive. © 2012 IEEE.
Sui, Y, Ye, S, Xue, J & Yew, PC 2011, 'SPAS: Scalable path-sensitive pointer analysis on full-sparse SSA', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 155-171.View/Download from: Publisher's site
We present a new SPAS (Scalable PAth-Sensitive) framework for resolving points-to sets in C programs that exploits recent advances in pointer analysis. SPAS enables intraprocedural path-sensitivity to be obtained in flow-sensitive and context-sensitive (FSCS) techniques scalably, by using BDDs to manipulate program paths and by performing pointer analysis level-by-level on a full-sparse SSA representation similarly as the state-of-the-art LevPA (the FSCS version of SPAS). Compared with LevPA using all 27 C benchmarks in SPEC CPU2000 and CPU2006, SPAS incurs 18.42% increase in analysis time and 10.97% increase in memory usage on average, while guaranteeing that all points-to sets are obtained with non-decreasing precision. © 2011 Springer-Verlag.
Yang, Y, Zhou, X, Yang, G, Wang, B & Sui, Y 2010, 'Trustworthy Service Scheduling Framework for QoS garantee of composite service', 2nd International Conference on Information Science and Engineering, ICISE2010 - Proceedings.View/Download from: Publisher's site
Interoperability-centric and loosely-coupled distributed computing environment brings forward a new challenge on trustworthiness of service computing, at the same time, user puts more and more pressure on system's non-functional requirements such as real-time, availability, security, cost and so on. Therefore service's QoS should be guaranteed trustworthily. In this paper a Trustworthy Service Scheduling Framework (TSSF) is presented to guarantee composite service's QoS requirement. The overall QoS requirement of the entire composite service is satisfied with the method of planning and guaranteeing the QoS of each atomic service. Our approach is to decompose the QoS requirement for the composite service into separate ones for the atoms; and then TSSF delivers expected QoS for each atomic service with adaptive trustworthy service scheduling mechanisms based on Dynamic Service Entity Group (DSEG); finally experimental results indicate good performance of the proposed framework. © 2010 IEEE.