Yulei Sui's general interests are program analysis, compilers, software engineering and machine learning. Particularly, his research focuses on building fundamental static and dynamic program analysis techniques to improve the reliability and security of modern software systems. His recent interest lies in bug prediction, program repair, and software security analysis through data mining and machine learning.
His papers have been published in the top-tier conferences and journals in the field of software engineering and programming language such as TSE, ICSE, FSE, ISSTA, ECOOP, SAS, CGO, CC, ICPP, LCTES. He was a research intern in the Program Analysis Group in Oracle Lab Australia in 2013. He was a keynote speaker at EuroLLVM 2016, and has been awarded an IPRS scholarship, a CGO Best Paper, an ACM CAPS award, and an ARC Discovery Early Career Researcher Award (2017-2019).
More up-to-date information on my personal web page: https://yuleisui.github.io/
Software Security Analysis
41093 Software Engineering Studio
Sui, Y., Fan, X., Zhou, H. & Xue, J. 2018, 'Loop-Oriented pointer analysis for automatic SIMD vectorization', ACM Transactions on Embedded Computing Systems, vol. 17, no. 2.View/Download from: Publisher's site
© 2018 ACM. Compiler-based vectorization represents a promising solution to automatically generate code that makes efficient use of modern CPUs with SIMD extensions. Two main auto-vectorization techniques, superword-level parallelism vectorization (SLP) and loop-level vectorization (LLV), require precise dependence analysis on arrays and structs to vectorize isomorphic scalar instructions (in the case of SLP) and reduce dynamic dependence checks at runtime (in the case of LLV). The alias analyses used in modern vectorizing compilers are either intra-procedural (without tracking inter-procedural data-flows) or inter-procedural (by using field-sensitive models, which are too imprecise in handling arrays and structs). This article proposes an inter-procedural Loop-oriented Pointer Analysis for C, called Lpa, for analyzing arrays and structs to support aggressive SLP and LLV optimizations effectively. Unlike field-insensitive solutions that pre-allocate objects for each memory allocation site, our approach uses a lazy memory model to generate access-based location sets based on how structs and arrays are accessed. Lpa can precisely analyze arrays and nested aggregate structures to enable SIMD optimizations for large programs. By separating the location set generation as an independent concern from the rest of the pointer analysis, Lpa is designed so that existing points-to resolution algorithms (e.g., flow-insensitive and flow-sensitive pointer analysis) can be reused easily. We have implemented Lpa fully in the LLVM compiler infrastructure (version 3.8.0). We evaluate Lpa by considering SLP and LLV, the two classic vectorization techniques, on a set of 20 C and Fortran CPU2000/2006 benchmarks. For SLP, Lpa outperforms LLVM's BasicAA and ScevAA by discovering 139 and 273 more vectorizable basic blocks, respectively, resulting in the best speedup of 2.95% for 173.applu. For LLV, LLVM introduces totally 551 and 652 static bound checks under BasicAA and ScevAA, respecti...
Sui, Y., Ye, D., Su, Y. & Xue, J. 2016, 'Eliminating redundant bounds checks in dynamic buffer overflow detection using weakest preconditions', IEEE Transactions on Reliability, vol. 65, no. 4, pp. 1682-1699.View/Download from: Publisher's site
© 1963-2012 IEEE. Spatial errors (e.g., buffer overflows) continue to be one of the dominant threats to software reliability and security in C/C++ programs. Presently, the software industry typically enforces spatial memory safety by instrumentation. Due to high overheads incurred in bounds checking at runtime, many program inputs cannot be exercised, causing some input-specific spatial errors to go undetected in today's commercial software. This paper introduces a new compile-time approach for reducing bounds checking overheads based on the notion of weakest precondition (WP). The basic idea is to guard a bounds check at a pointer dereference inside a loop, where the WP-based guard is hoisted outside the loop, so that its falsehood implies the absence of out-of-bounds errors at the dereference, thereby avoiding the corresponding bounds check inside the loop. This WP-based approach is applicable to any spatial-error detection approach (in software or hardware or both). To evaluate the effectiveness of our approach, we take SoftBound, a compile-time tool with an open-source implementation in low-level virtual machine (LLVM), as our baseline. SoftBound adopts a pointer-based checking scheme with disjoint metadata, making it a state-of-the-art tool in providing compatible and complete spatial safety for C. Our new tool, called WPBound, is a refined version of SoftBound, also implemented in LLVM, by incorporating our WP-based compiler approach comprising both intra and interprocedural optimizations. For a set of 20 C benchmarks selected from SPEC and MiBench,WPBound reduces the average runtime overhead of SoftB ound from 77% to 47% (by a reduction of 39%), with small code size increases.
Sui, Y., Ye, D. & Xue, J. 2014, 'Detecting memory leaks statically with full-sparse value-flow analysis', IEEE Transactions on Software Engineering, vol. 40, no. 2, pp. 107-122.View/Download from: Publisher's site
We introduce a static detector, Saber, for detecting memory leaks in C programs. Leveraging recent advances on sparse pointer analysis, Saber is the first to use a full-sparse value-flow analysis for detecting memory leaks statically. Saber tracks the flow of values from allocation to free sites using a sparse value-flow graph (SVFG) that captures def-use chains and value flows via assignments for all memory locations represented by both top-level and address-taken pointers. By exploiting field-, flow-and context-sensitivity during different phases of the analysis, Saber detects memory leaks in a program by solving a graph reachability problem on its SVFG. Saber, which is fully implemented in Open64, is effective at detecting 254 leaks in the 15 SPEC2000 C programs and seven applications, while keeping the false positive rate at 18.3 percent. Saber compares favorably with several static leak detectors in terms of accuracy (leaks and false alarms reported) and scalability (LOC analyzed per second). In particular, compared with Fastcheck (which analyzes allocated objects flowing only into top-level pointers) using the 15 SPEC2000 C programs, Saber detects 44.1 percent more leaks at a slightly higher false positive rate but is only a few times slower. © 2014 IEEE.
Sui, Y., Ye, S., Xue, J. & Zhang, J. 2014, 'Making context-sensitive inclusion-based pointer analysis practical for compilers using parameterised summarisation', Software - Practice and Experience, vol. 44, no. 12, pp. 1485-1510.View/Download from: Publisher's site
Copyright © 2013 John Wiley & Sons, Ltd. Because of its high precision as a flow-insensitive pointer analysis, Andersen's analysis has been deployed in some modern optimising compilers. To obtain improved precision, we describe how to add context sensitivity on top of Andersen's analysis. The resulting analysis, called ICON, is efficient to analyse large programs while being sufficiently precise to drive compiler optimisations. Its novelty lies in summarising the side effects of a procedure by using one transfer function on virtual variables that represent fully parameterised locations accessed via its formal parameters. As a result, a good balance between efficiency and precision is made, resulting in ICON that is more powerful than a 1-callsite-sensitive analysis and less so than a call-path-sensitive analysis (when the recursion cycles in a program are collapsed in all cases). We have compared ICON with FULCRA, a state of the art Andersen's analysis that is context sensitive by acyclic call paths, in Open64 (with recursion cycles collapsed in both cases) using the 16 C/C++ benchmarks in SPEC2000 (totalling 600 KLOC) and 5 C applications (totalling 2.1 MLOC). Our results demonstrate scalability of ICON and lack of scalability of FULCRA. FULCRA spends over 2 h in analysing SPEC2000 and fails to run to completion within 5 h for two of the five applications tested. In contrast, ICON spends just under 7 min on the 16 benchmarks in SPEC2000 and just under 26 min on the same two applications. For the 19 benchmarks analysable by FULCRA, ICON is nearly as accurate as FULCRA in terms of the quality of the built Static Single Assignment (SSA) form and the precision of the discovered alias information.
Ye, S., Sui, Y. & Xue, J. 2014, 'Region-Based selective flow-sensitive pointer analysis', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8723, pp. 319-336.
© Springer International Publishing Switzerland 2014. We introduce a new region-based SELective Flow-Sensitive (Selfs) approach to inter-procedural pointer analysis for C that operates on the regions partitioned from a program. Flow-sensitivity is maintained between the regions but not inside, making traditional flow-insensitive and flow-sensitive as well as recent sparse flow-sensitive analyses all special instances of our Selfs framework. By separating region partitioning as an independent concern from the rest of the pointer analysis, Selfs facilitates the development of flow-sensitive variations with desired efficiency and precision tradeoffs by reusing existing pointer resolution algorithms. We also introduce a new unification-based approach for region partitioning to demonstrate the generality and flexibility of our Selfs framework, as evaluated using SPEC2000/2006 benchmarks in LLVM.
Fan, X., Sui, Y., Liao, X. & Xue, J. 2017, 'Boosting the precision of virtual call integrity protection with partial pointer analysis for C++', ISSTA 2017 - Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 329-340.View/Download from: Publisher's site
© 2017 Association for Computing Machinery. We present, Vip, an approach to boosting the precision of Virtual call Integrity Protection for large-scale real-world C++ programs (e.g., Chrome) by using pointer analysis for the first time. Vip introduces two new techniques: (1) a sound and scalable partial pointer analysis for discovering statically the sets of legitimate targets at virtual callsites from separately compiled C++ modules and (2) a lightweight instrumentation technique for performing (virtual call) integrity checks at runtime. Vip raises the bar against vtable hijacking attacks by providing stronger security guarantees than the CHA-based approach with comparable performance overhead. Vip is implemented in LLVM-3.8.0 and evaluated using SPEC programs and Chrome. Statically, Vip protects virtual calls more effectively than CHA by significantly reducing the sets of legitimate targets permitted at 20.3% of the virtual callsites per program, on average. Dynamically, Vip incurs an average (maximum) instrumentation overhead of 0.7% (3.3%), making it practically deployable as part of a compiler tool chain.
Yan, H., Sui, Y., Chen, S. & Xue, J. 2017, 'Machine-learning-guided typestate analysis for static use-After-free detection', ACM International Conference Proceeding Series, pp. 42-54.View/Download from: UTS OPUS or Publisher's site
Typestate analysis relies on pointer analysis for detecting temporal memory safety errors, such as use-After-free (UAF). For large programs, scalable pointer analysis is usually imprecise in analyzing their hard "corner cases", such as infeasible paths, recursion cycles, loops, arrays, and linked lists. Due to a sound over-Approximation of the points-To information, a large number of spurious aliases will be reported conservatively, causing the corresponding typestate analysis to report a large number of false alarms. Thus, the usefulness of typestate analysis for heap-intensive clients, like UAF detection, becomes rather limited, in practice. We introduce Tac, a static UAF detector that bridges the gap between typestate and pointer analyses by machine learning. Tac learns the correlations between program features and UAF-related aliases by using a Support Vector Machine (SVM) and applies this knowledge to further disambiguate the UAF-related aliases reported imprecisely by the pointer analysis so that only the ones validated by its SVM classifier are further investigated by the typestate analysis. Despite its unsoundness, Tac represents a practical typestate analysis approach for UAF detection. We have implemented Tac in LLVM-3.8.0 and evaluated it using a set of eight open-source C/C++ programs. The results show that Tac is effective (in terms of finding 5 known CVE vulnerabilities, 1 known bug, and 8 new bugs with a low f alse alarm rate) and scalable (in terms of analyzing a large codebase with 2,098 KLOC in just over 4 hours).
Zhang, J., Sui, Y. & Xue, J. 2017, 'Incremental analysis for probabilistic programs', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 450-472.View/Download from: Publisher's site
© 2017, Springer International Publishing AG. This paper presents Icpp, a new data-flow-based InCremental analysis for Probabilistic Programs, to infer their posterior probability distributions in response to small yet frequent changes to probabilistic knowledge, i.e., prior probability distributions and observations. Unlike incremental analyses for usual programs, which emphasize code changes, such as statement additions and deletions, Icpp focuses on changes made to probabilistic knowledge, the key feature in probabilistic programming. The novelty of Icpp lies in capturing the correlation between prior and posterior probability distributions by reasoning about the probabilistic dependence of each data-flow fact, so that any posterior probability affected by newly changed probabilistic knowledge can be incrementally updated in a sparse manner without recomputing it from scratch, thereby allowing the previously computed results to be reused. We have evaluated Icpp with a set of probabilistic programs. Our results show that Icpp is an order of magnitude faster than the state-of-the-art data-flow-based inference in analyzing probabilistic programs under small yet frequent changes to probabilistic knowledge, with an average analysis overhead of around 0.1Â s in response to a single change.
Di, P. & Sui, Y. 2016, 'Accelerating dynamic data race detection using static thread interference analysis', Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2016, pp. 30-39.View/Download from: Publisher's site
Copyright © 2016 ACM. Precise dynamic race detectors report an error if and only if more than one thread concurrently exhibits conict on a memory access. They insert instrumentations at compiletime to perform runtime checks on all memory accesses to ensure that all races are captured and no spurious warnings are generated. However, a dynamic race check for a particular memory access statement is guaranteed to be redundant if the statement can be statically identified as thread interference-free. Despite significant recent advances in dynamic detection techniques, the redundant check remains a critical factor that leads to prohibitive overhead of dynamic race detection for multithreaded programs. In this paper, we present a new framework that eliminates redundant race check and boosts the dynamic race detection by performing static optimizations on top of a series of thread interference analysis phases. Our framework is implemented on top of LLVM 3.5.0 and evaluated with an industry dynamic race detector TSAN which is available as a part of LLVM tool chain. 11 benchmarks from SPLASH2 are used to evaluate the effectiveness of our approach in accelerating TSAN by eliminating redundant interference-free checks. The experimental result demonstrates our new approach achieves from 1.4x to 4.0x (2.4x on average) speedup over original TSAN under 4 threads setting, and achieves from 1.3x to 4.6x (2.6x on average) speedup under 16 threads setting.
Fan, X., Sui, Y. & Xue, J. 2016, 'Contention-aware scheduling for asymmetric multicore processors', Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, pp. 742-751.View/Download from: Publisher's site
© 2015 IEEE. Asymmetric multicore processors (AMPs) have been proposed as an energy-efficient alternative to symmetric mul-ticore processors (SMPs). However, AMPs derive their performance from core specialization, which requires co-running applications to be scheduled to run on their most appropriate core types. Despite extensive research on AMP scheduling, developing an effective scheduling algorithm remains challenging. Contention for shared resources is a key performance-limiting factor, which often renders existing contention-free scheduling algorithms ineffective. We introduce a contention-aware scheduling algorithm for ARM's big.LITTLE, a commercial AMP platform. Our algorithm comprises an offline stage and an online stage. The offline stage builds a performance interference model for an application by training it with a set of co-running applications. Guided by this model, the online stage schedules a workload by assigning its applications to their most appropriate core types in order to minimize the performance degradation caused by contention for shared resources. Our model can accurately predict the performance degradation of an application when co-running with other applications with an average prediction error of 9.60%. Compared with the default scheduler provided for ARM's big.LITTLE and the speedup-factor-driven scheduler, our contention-aware scheduler can improve overall system performance by up to 28.32% and 28.51%, respectively.
Sui, Y. & Xue, J. 2016, 'On-demand strong update analysis via value-flow refinement', Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 460-473.View/Download from: Publisher's site
© 2016 ACM. We present a new Strong UPdate Analysis for C programs, called Supa, that enables computing points-To information on-demand via value-ow refinement, in environments with small time and memory budgets such as IDEs. We formulate Supa by solving a graph-reachability problem on a value-ow graph representation of the program, so that strong updates are performed where needed, as long as the total analysis budget is not exhausted. Supa facilitates efficiency and precision tradeofis by allowing different pointer analyses to be applied in a hybrid multi-stage analysis framework. We have implemented Supa in LLVM and evaluated Supa by choosing uninitialized pointer detection as a major client on 12 open-source C programs. As the analysis budget increases, Supa achieves improved precision, with its single-stage ow-sensitive analysis reaching 97% of that achieved by whole-program ow-sensitive analysis by consuming about 0.19 seconds and 36KB of memory per query, on average (with a budget of at most 10000 value-ow edges per query).
Sui, Y. & Xue, J. 2016, 'SVF: Interprocedural static value-flow analysis in LLVM', Proceedings of CC 2016: The 25th International Conference on Compiler Construction, pp. 265-266.View/Download from: Publisher's site
© 2016 ACM. This paper presents SVF, a tool that enables scalable and precise interprocedural Static Value-Flow analysis for C programs by leveraging recent advances in sparse analysis. SVF, which is fully implemented in LLVM, allows value-flow construction and pointer analysis to be performed in an iterative manner, thereby providing increasingly improved precision for both. SVF accepts pointsto information generated by any pointer analysis (e.g., Andersen's analysis) and constructs an interprocedural memory SSA form, in which the def-use chains of both top-level and address-taken variables are captured. Such value-flows can be subsequently exploited to support various forms of program analysis or enable more precise pointer analysis (e.g., flow-sensitive analysis) to be performed sparsely. By dividing a pointer analysis into three loosely coupled components: Graph, Rules and Solver, SVF provides an extensible interface for users to write their own solutions easily. SVF is publicly available at http://unsw-corg.github.io/SVF.
Sui, Y., Di, P. & Xue, J. 2016, 'Sparse flow-sensitive pointer analysis for multithreaded programs', Proceedings of the 14th International Symposium on Code Generation and Optimization, CGO 2016, pp. 160-170.View/Download from: Publisher's site
© 2016 ACM. For C programs, flow-sensitivity is important to enable pointer analysis to achieve highly usable precision. Despite significant recent advances in scaling flow-sensitive pointer analysis sparsely for sequential C programs, relatively little progress has been made for multithreaded C programs. In this paper, we present FSAM, a new Flow-Sensitive pointer Analysis that achieves its scalability for large Multithreaded C programs by performing sparse analysis on top of a series of thread interference analysis phases. We evaluate FSAM with 10 multithreaded C programs (with more than 100K lines of code for the largest) from Phoenix-2.0, Parsec-3.0 and open-source applications. For two programs, raytrace and x264, the traditional data-flow-based flow-sensitive pointer analysis is unscalable (under two hours) but our analysis spends just under 5 minutes on raytrace and 9 minutes on x264. For the rest, our analysis is 12x faster and uses 28x less memory.
Sui, Y., Fan, X., Zhou, H. & Xue, J. 2016, 'Loop-Oriented array- and field-sensitive pointer analysis for automatic SIMD vectorization', Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pp. 41-51.View/Download from: Publisher's site
© 2016 ACM. Compiler-based auto-vectorization is a promising solution to automatically generate code that makes efficient use of SIMD processors in high performance platforms and embedded systems. Two main auto-vectorization techniques, superword-level parallelism vectorization (SLP) and loop-level vectorization (LLV), require precise dependence analysis on arrays and structs in order to vectorize isomorphic scalar instructions and/or reduce dynamic dependence checks incurred at runtime. The alias analyses used in modern vectorizing compilers are either intra-procedural (without tracking inter-procedural data-flows) or inter-procedural (by using field-insensitive models, which are too imprecise in handling arrays and structs). This paper proposes an inter-procedural Loop-oriented Pointer Analysis, called LPA, for analyzing arrays and structs to support aggressive SLP and LLV optimizations. Unlike field-insensitive solutions that preallocate objects for each memory allocation site, our approach uses a fine-grained memory model to generate location sets based on how structs and arrays are accessed. LPA can precisely analyze arrays and nested aggregate structures to enable SIMD optimizations for large programs. By separating the location set generation as an independent concern from the rest of the pointer analysis, LPA is designed to reuse easily existing points-to resolution algorithms. We evaluate LPA using SLP and LLV, the two classic vectorization techniques on a set of 20 CPU2000/2006 benchmarks. For SLP, LPA enables it to vectorize a total of 133 more basic blocks, with an average of 12.09 per benchmark, resulting in the best speedup of 2.95% for 173.applu. For LLV, LPA has reduced a total of 319 static bound checks, with an average of 22.79 per benchmark, resulting in the best speedup of 7.18% for 177.mesa.
Yan, H., Sui, Y., Chen, S. & Xue, J. 2016, 'Automated memory leak fixing on value-flow slices for C programs', Proceedings of the ACM Symposium on Applied Computing, pp. 1386-1393.View/Download from: Publisher's site
© 2016 ACM. C is the dominant programming language for developing embedded software, operating systems, and device drivers. Unlike programs written in managed languages like Java, C programs rely on explicit memory management and are prone to memory leaks. Existing (static or dynamic) detectors only report leaks, but fixing them often requires considerable manual effort by inspecting a list of reported true and false alarms. How to develop on-demand lightweight techniques for automated leak fixing without introducing new memory errors remains challenging. In this paper, we introduce AUTOFIX, a fully automated leak-fixing approach for C programs by combining static and dynamic program analysis. Given a leaky allocation site reported by a static memory leak detector, AUTOFIX performs a graph reachability analysis to identify leaky paths on the value-flow slices of the program, and then conducts a liveness analysis to locate the program points for inserting fixes (i.e., the missing free calls) on the identified leaky paths. We have implemented AUTOFIX in LLVM-3.5.0 and evaluated it using five SPEC2000 benchmarks and three open-source applications. Experimental results show that AUTOFIX can safely fix all the memory leaks reported by a state-of-theart static memory leak detector with small instrumentation overhead.
Di, P., Sui, Y., Ye, D. & Xue, J. 2015, 'Region-based may-happen-in-parallel analysis for C programs', Proceedings of the International Conference on Parallel Processing, pp. 889-898.View/Download from: Publisher's site
© 2015 IEEE. The C programming language continues to play an essential role in the development of system software. May-Happen-in-Parallel (MHP) analysis is the basis of many other analyses and optimisations for concurrent programs. Existing MHP analyses that work well for programming languages such as X10 are often not effective for C (with Pthreads). This paper presents a new MHP algorithm for C that operates at the granularity of code regions rather than individual statements in a program. A flow-sensitive Happens-Before (HB) analysis is performed to account for fork-join semantics of pthreads on an interprocedural thread-sensitive control flow graph representation of a program, enabling the HB relations among its statements to be discovered. All the statements that share the same HB properties are then grouped into one region. As a result, computing the MHP information for all pairs of statements in a program is reduced to one of inferring the HB relations from among its regions. We have implemented our algorithm in LLVM-3.5.0 and evaluated it using 14 programs from the SPLASH2 and PARSEC benchmark suites. Our preliminary results show that our approach is more precise than two existing MHP analyses yet computationally comparable with the fastest MHP analysis.
Li, Y., Tan, T., Sui, Y. & Xue, J. 2014, 'Self-inferencing reflection resolution for Java', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 27-53.View/Download from: Publisher's site
Reflection has always been an obstacle both for sound and for effective under-approximate pointer analysis for Java applications. In pointer analysis tools, reflection is either ignored or handled partially, resulting in missed, important behaviors. In this paper, we present our findings on reflection usage in Java benchmarks and applications. Guided by these findings, we introduce a static reflection analysis, called Elf, by exploiting a self-inferencing property inherent in many reflective calls. Given a reflective call, the basic idea behind Elf is to automatically infer its targets (methods or fields) based on the dynamic types of the arguments of its target calls and the downcasts (if any) on their returned values, if its targets cannot be already obtained from the Class, Method or Field objects on which the reflective call is made. We evaluate Elf against Doop's state-of-the-art reflection analysis performed in the same context-sensitive Andersen's pointer analysis using all 11 DaCapo benchmarks and two applications. Elf can make a disciplined tradeoff among soundness, precision and scalability while also discovering usually more reflective targets. Elf is useful for any pointer analysis, particularly under-approximate techniques deployed for such clients as bug detection, program understanding and speculative compiler optimization. © 2014 Springer-Verlag.
Ye, D., Su, Y., Sui, Y. & Xue, J. 2014, 'WPBOUND: Enforcing spatial memory safety efficiently at runtime with weakest preconditions', Proceedings - International Symposium on Software Reliability Engineering, ISSRE, pp. 88-99.View/Download from: Publisher's site
© 2014 IEEE. Spatial errors (e.g., Buffer overflows) continue to be one of the dominant threats to software reliability and security in C/C++ programs. Presently, the software industry typically enforces spatial memory safety by instrumentation. Due to high overheads incurred in bounds checking at runtime, many program inputs cannot be exercised, causing some input-specific spatial errors to go undetected in today's commercial software. This paper introduces a new compile-time optimisation for reducing bounds checking overheads based on the notion of Weakest Precondition (WP). The basic idea is to guard a bounds check at a pointer dereference inside a loop, where the WP-based guard is hoisted outside the loop, so that its falsehood implies the absence of out-of-bounds errors at the dereference, thereby avoiding the corresponding bounds check inside the loop. This WP-based optimisation is applicable to any spatial-error detection approach (in software or hardware or both). To evaluate the effectiveness of our optimisation, we take SOFTBOUND, a compile-time tool with an open-source implementation in LLVM, as our baseline. SOFTBOUND adopts a pointer-based checking approach with disjoint metadata, making it a state-of-the-art tool in providing compatible and complete spatial safety for C. Our new tool, called WPBOUND, is a refined version of SOFTBOUND, also implemented in LLVM, by incorporating our WP-based optimisation. For a set of 12 SPEC C benchmarks evaluated, WPBOUND reduces the average (geometric mean) slowdown of SOFTBOUND from 71% to 45% (by a reduction of 37%), with small code size increases.
Ye, D., Sui, Y. & Xue, J. 2014, 'Accelerating dynamic detection of uses of undefined values with static value-flow analysis', Proceedings of the 12th ACM/IEEE International Symposium on Code Generation and Optimization, CGO 2014, pp. 154-164.View/Download from: Publisher's site
Uninitialized variables can cause system crashes when used and security vulnerabilities when exploited. With source rather than binary instrumentation, dynamic analysis tools such as MSan can detect uninitialized memory uses at significantly reduced overhead but are still costly. In this paper, we introduce a static value-flow analysis, called Usher, to guide and accelerate the dynamic analysis performed by such tools. Usher reasons about the definedness of values using a value-flow graph (VFG) that captures def-use chains for both top-level and address-taken variables interprocedurally and removes unnecessary instrumentation by solving a graph reachability problem. Usher works well with any pointer analysis (done a priori) and facilitates advanced instrumentation-reducing optimizations (with two demonstrated here). Implemented in LLVM and evaluated using all the 15 SPEC2000 C programs, Usher can reduce the slowdown of MSan from 212% - 302% to 123% - 140% for a number of configurations tested. Copyright © 2014 by the Association for Computing Machinery, Inc. (ACM).
Sui, Y., Li, Y. & Xue, J. 2013, 'Query-directed adaptive heap cloning for optimizing compilers', Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2013.View/Download from: Publisher's site
Andersen's pointer analysis becomes more precise when applied with full heap cloning but unscalable for large, heap-intensive programs. In contrast, k-callsite-sensitive heap cloning can be faster but less precise for some programs. In this paper, we make one step forward by enhancing Andersen's analysis with QUery-Directed Adaptive (QUDA) heap cloning for optimizing compilers. The novelty of our analysis, called QUDA, lies in performing k-callsite-sensitive heap cloning iteratively, starting with k = 0 (without heap cloning), so that an abstract heap object is cloned at iteration k = i + 1 only if some mayalias queries that are not answered positively at iteration k = i may now be answered more precisely. QUDA, which is implemented in Open64, has the same precision as the state-of-the-art, FULCRA, a version of QUDA with exhaustive heap cloning, but is significantly more scalable. For 10 SPEC2000 C benchmarks and 5 C applications (totalling 840 KLOC) evaluated, QUDA takes only 4+ minutes but exhaustive heap cloning takes 42+ minutes to complete. QUDA takes only 75.1 % of the time that Open64 takes on average to compile these 15 programs under '-O2'. © 2013 IEEE.
Di, P., Ye, D., Su, Y., Sui, Y. & Xue, J. 2012, 'Automatic parallelization of tiled loop nests with enhanced fine-grained parallelism on GPUs', Proceedings of the International Conference on Parallel Processing, pp. 350-359.View/Download from: Publisher's site
Automatically parallelizing loop nests into CUDA kernels must exploit the full potential of GPUs to obtain high performance. One state-of-the-art approach makes use of the polyhedral model to extract parallelism from a loop nest by applying a sequence of affine transformations to the loop nest. However, how to automate this process to exploit both intra and inter-SM parallelism for GPUs remains a challenging problem. Presently, compilers may generate code significantly slower than hand-optimized code for certain applications. This paper describes a compiler framework for tiling and parallelizing loop nests with uniform dependences into CUDA code. We aim to improve two levels of wave front parallelism. We find tiling hyper planes by embedding parallelism enhancing constraints in the polyhedral model to maximize intra-tile, i.e., intra-SM parallelism. This improves the load balance among the SPs in an SM executing a wave front of loop iterations within a tile. We eliminate parallelism-hindering false dependences to maximize inter-tile, i.e., inter-SM parallelism. This improves the load balance among the SMs executing a wave front of tiles. Our approach has been implemented in PLUTO and validated using eight benchmarks on two different NVIDIA GPUs (C1060 and C2050). Compared to PLUTO, our approach achieves 2 - 5.5X speedups across the benchmarks. Compared to highly hand-optimized 1-D Jacobi (3 points), 2-D Jacobi (5 points), 3-D Jacobi (7 points) and 3-D Jacobi (27 points), our speedups, 1.17X, 1.41X, 0.97X and 0.87X with an average of 1.10X on C1060 and 1.24X, 1.20X, 0.86X and 0.95X with an average of 1.06X on C2050, are competitive. © 2012 IEEE.
Sui, Y., Ye, D. & Xue, J. 2012, 'Static memory leak detection using full-sparse value-flow analysis', 2012 International Symposium on Software Testing and Analysis, ISSTA 2012 - Proceedings, pp. 254-264.View/Download from: Publisher's site
We introduce a static detector, Saber, for detecting memory leaks in C programs. Leveraging recent advances on sparse pointer analysis, Saber is the first to use a full-sparse value-flow analysis for leak detection. Saber tracks the flow of values from allocation to free sites using a sparse value-flow graph (SVFG) that captures def-use chains and value flows via assignments for all memory locations represented by both top-level and address-taken pointers. By exploiting field-, flow- and context-sensitivity during different phases of the analysis, Saber detects leaks in a program by solving a graph reachability problem on its SVFG. Saber, which is fully implemented in Open64, is effective at detecting 211 leaks in the 15 SPEC2000 C programs and five applications, while keeping the false positive rate at 18.5%. We have also compared Saber with Fastcheck (which analyzes allocated objects flowing only into top-level pointers) and Sparrow (which handles all allocated objects using abstract interpretation) using the 15 SPEC2000 C programs. Saber is as accurate as Sparrow but is 14.2X faster and reports 40.7% more bugs than Fastcheck at a slightly higher false positive rate but is only 3.7X slower. © 2012 ACM.
Sui, Y., Ye, S., Xue, J. & Yew, P.C. 2011, 'SPAS: Scalable path-sensitive pointer analysis on full-sparse SSA', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 155-171.View/Download from: Publisher's site
We present a new SPAS (Scalable PAth-Sensitive) framework for resolving points-to sets in C programs that exploits recent advances in pointer analysis. SPAS enables intraprocedural path-sensitivity to be obtained in flow-sensitive and context-sensitive (FSCS) techniques scalably, by using BDDs to manipulate program paths and by performing pointer analysis level-by-level on a full-sparse SSA representation similarly as the state-of-the-art LevPA (the FSCS version of SPAS). Compared with LevPA using all 27 C benchmarks in SPEC CPU2000 and CPU2006, SPAS incurs 18.42% increase in analysis time and 10.97% increase in memory usage on average, while guaranteeing that all points-to sets are obtained with non-decreasing precision. © 2011 Springer-Verlag.