407 Sixian Building
Email: yxxue AT ustc.edu.cn
188 Renai Road, Suzhou Industrial Park, China
I am a research professor in the Department of Computer Science & Technology,
University of Science and Technology of China (USTC).
Prior to that, I received my B.E. and M.S. from Wuhan University in 2005 and 2007, and my Ph.D. from the National University of Singapore (NUS) in February 2013.
I was a research scientist at Temasek Lab@NUS from November 2013 to November 2014, and from 2015 to July 2017 a research scientist at Temasek Lab@NTU, Nanyang Technological University, working on Singapore's national defense projects.
After that, I worked as a Level 2 data scientist on the anti-spam core project of the Exchange mail system at the Internet Engineering Institute of Microsoft Asia-Pacific R&D Group.
I am recruiting PhD, Master and Undergraduate students. If interested, please feel free to drop me an email.
My research interests are software engineering and cyber security.
Recently I have been focusing on building systems that combine program analysis, data analysis and verification to accomplish cyber security tasks, e.g., vulnerability detection and malware detection.
[2019.7] Invited to serve on the Program Committee of NASAC'19 (18th National Conference on Software and Applications).
[2019.6] Our recent work, Cerebro: Context-Aware Adaptive Fuzzing for Effective Vulnerability Detection, was accepted as a full paper by ACM FSE (CCF A).
[2019.2] Invited to serve on the Tool and Artifact Evaluation (AE) Committee of ISSTA'19 (CCF A).
[2019.1] Our recent work, Securing Android App Markets via Modelling and Predicting Malware Spread between Markets, was accepted by IEEE Trans. Information Forensics and Security (CCF A).
Securing Android App Markets via Modelling and Predicting Malware Spread between Markets. Guozhu Meng, Matthew Patrick, Yinxing Xue, Yang Liu, Jie Zhang.
In IEEE Trans. Information Forensics and Security 14(7): 1944-1959 (2019) (CCF Class A, Information Security).
Accurate and Scalable Cross-Architecture Cross-OS Binary Code Search with Emulation. Yinxing Xue, Zhengzi Xu, Mahinthan Chandramohan and Yang Liu.
IEEE Trans. Software Engineering, accepted, to appear, 2018. DOI: 10.1109/TSE.2018.2827379 (CCF Class A, Software Engineering).
Dresden, Germany, March 25-28, 2019.
Hawkeye: Towards a Desired Directed Grey-box Fuzzer. Hongxu Chen, Yinxing Xue, Yuekang Li, Bihuan Chen, Xiaofei Xie, Xiuheng Wu and Yang Liu.
In Proceedings of the 25th ACM Conference on Computer and Communications Security (ACM CCS'18), pp. 2095-2108, 2018 (CCF Class A, Information Security).
Auditing Anti-Malware Tools by Evolving Android Malware and Dynamic Loading Technique. Yinxing Xue, Guozhu Meng, Yang Liu, Tian Huat Tan, Hongxu Chen, Jun Sun, Jie Zhang.
In IEEE Trans. Information Forensics and Security 12(7): 1529-1544 (2017) (CCF Class A, Information Security).
IBED: Combining IBEA and DE for Optimal Feature Selection in Software Product Line Engineering. Yinxing Xue, Jinghui Zhong, Tian Huat Tan, Yang Liu, Wentong Cai, Manman Chen, Jun Sun.
Applied Soft Computing (Chinese Academy of Sciences journal ranking, Tier 2), vol. 49: 1215-1231 (2016).
Multi-Objective Integer Programming Approaches for Solving Optimal Feature Selection Problem. Yinxing Xue, Yan-Fu Li.
In Proceedings of ACM/IEEE International Conference on Software Engineering 2018, pp. 1231-1242 (CCF Class A, Software Engineering).
Mining Implicit Design Templates for Actionable Code Reuse. Yun Lin, Guozhu Meng, Yinxing Xue, Zhenchang Xing, Jun Sun, Xin Peng, Yang Liu, Wenyun Zhao and Jinsong Dong.
In Proceedings of ACM/IEEE International Conference on Automated Software Engineering 2017, pp. 394-404 (CCF Class A, Software Engineering).
Feedback-Based Debugging. Yun Lin, Jun Sun, Yinxing Xue, Yang Liu and Jinsong Dong.
In Proceedings of IEEE International Conference on Software Engineering 2017, pp. 393-403 (CCF Class A, Software Engineering).
BinGo: Cross-Architecture Cross-OS Binary Search. Mahinthan Chandramohan, Yinxing Xue, Zhengzi Xu, Yang Liu, Chia Yuan Choo and Hee Beng Kuan Tan.
In Proceedings of the 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, pp. 678-689 (CCF Class A, Software Engineering).
Optimizing Selection of Competing Services with Probabilistic Hierarchical Refinement. Tian Huat Tan, Manman Chen, Jun Sun, Yang Liu, Étienne André, Yinxing Xue, and Jin Song Dong.
In Proceedings of the 38th International Conference on Software Engineering (ICSE), Austin, TX, May 14th - 22nd, pp. 85-95, 2016 (CCF Class A, Software Engineering).
Semantic Modelling of Android Malware for Effective Malware Comprehension, Detection and Classification. Guozhu Meng, Yinxing Xue. In Proceedings of ACM International Symposium on Software Testing and Analysis (ISSTA), pp. 306-317, 2016 (CCF Class A, Software Engineering).
Evolving Android Malware for Auditing Anti-Malware Tools. Guozhu Meng, Yinxing Xue, Chandramohan Mahinthan, Annamalai Narayanan, Yang Liu, Jie Zhang. In Proceedings of ACM Symposium on Information, Computer and Communications Security (ASIACCS) 2016, pp. 365-376 (CCF Class C, Information Security; acceptance rate: 19%).
In Proceedings of ACM International Symposium on Software Testing and Analysis (ISSTA) 2015: 246-256 (CCF Class A, Software Engineering).
Optimizing Selection of Competing Features via Feedback-Directed Evolutionary Algorithms. Tian Huat Tan, Yinxing Xue, Manman Chen, Jun Sun, Yang Liu, and Jin Song Dong.
In Proceedings of ACM International Symposium on Software Testing and Analysis (ISSTA) 2015: 48-59 (CCF Class A, Software Engineering).
In Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security (ASIACCS) 2015: 109-120 (CCF Class C, Information Security; acceptance rate: 17.8%).
Detecting differences across multiple instances of code clones. Yun Lin, Zhenchang Xing, Yinxing Xue, Yang Liu, Xin Peng, Jun Sun, Wenyun Zhao.
In Proceedings of the 36th IEEE/ACM International Conference on Software Engineering (ICSE) 2014: 164-174 (CCF Class A, Software Engineering).
A large scale Linux-kernel based benchmark for feature location research. Zhenchang Xing, Yinxing Xue, Stan Jarzabek.
In Proceedings of the 35th IEEE/ACM International Conference on Software Engineering (ICSE) 2013: 1311-1314 (CCF Class A, Software Engineering).
Distilling useful clones by contextual differencing. Zhenchang Xing, Yinxing Xue, Stan Jarzabek.
In Proceedings of the 20th IEEE Working Conference on Reverse Engineering (WCRE) 2013: 102-111 (CCF Class B, Software Engineering; WCRE has since been renamed SANER).
Feature Location in a Collection of Product Variants. Yinxing Xue, Zhenchang Xing, Stan Jarzabek.
In Proceedings of the 20th IEEE Working Conference on Reverse Engineering (WCRE) 2012: 145-154 (CCF Class B, Software Engineering; WCRE has since been renamed SANER).
PI, Research Start-up Funds for Special Fellows of USTC (RMB 1 million)
PI, Special Funds for Class C Talents in the 100-person Program of the Chinese Academy of Sciences (RMB 800,000)
SMART: Semantic Modelling of Android Attacks (Nov 2014 – Jun 2015, NTU)
Malware has posed a major threat to the Android ecosystem. Existing anti-virus
and malware detection tools mainly rely on signature- or feature-based approaches,
failing to provide detailed information beyond mere detection. In this work,
we propose a precise semantic model of Android malware based on Deterministic Symbolic Automaton (DSA)
for the purpose of malware comprehension, detection and classification. We show that DSA can capture
the common malicious behaviors of a malware family, as well as the malware variants. Based on DSA,
we develop an automatic analysis framework, named SMAT, which learns DSA by detecting and summarizing semantic
clones from malware families, then uses the learned DSA as semantic features to detect malware using
machine learning, and further classifies malware by performing static analysis on the attack patterns. We conduct
experiments on both a malware benchmark and 22,327 real-world apps. The results show that SMAT builds meaningful
semantic models and outperforms both state-of-the-art approaches and anti-virus tools in malware detection. SMAT
identifies 389 new malware samples in real-world apps that are missed by most anti-virus tools. The classification step
further identifies new malware variants and unknown families.
JS* (Oct 2013 – Jun 2014, NTU-NUS)
mostly use feature-based or signature-based approaches to detect JS malware. These tools
can detect malware included in their database, but they are weak in resistance to obfuscation
and JS malware variants. Besides, these tools seldom agree on the names of detected JS malware,
not to mention the types of attack or attack details. Such limitations are rooted in the
inability of these approaches to capture attack behavior. In this project, we propose
to use Deterministic Finite Automaton (DFA) to abstract and summarize common behaviors of malicious
JS of the same attack type. We propose an automatic behavior learning framework, named JS∗,
to learn a DFA from dynamic execution traces of JS malware, where we implement an effective online
teacher by combining data dependency analysis, defense rules and a trace replay mechanism. We evaluated JS∗
with 1000 benign and 276 malicious real-world JS samples, covering the 8 most infectious attack types.
The results demonstrate the scalability and effectiveness of our approach in malware detection and
classification compared with commercial JS malware detection tools. We also show how to use our DFAs
to detect variants and new attacks.
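To make the role of a learned DFA concrete, here is a minimal sketch of how an automaton over abstracted actions could be used to check whether an execution trace matches an attack type. It is not the JS∗ implementation: the learning step is omitted, and the class name `DFA`, the state names and the action labels (`create_iframe`, `set_remote_src`, `append_to_dom`) are all hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


class DFA:
    """A deterministic finite automaton over abstracted trace actions."""

    def __init__(
        self,
        start: str,
        accepting: Set[str],
        transitions: Dict[Tuple[str, str], str],
    ) -> None:
        self.start = start
        self.accepting = accepting
        self.transitions = transitions

    def accepts(self, trace: Iterable[str]) -> bool:
        """Return True if the whole trace drives the DFA into an accepting state."""
        state = self.start
        for action in trace:
            key = (state, action)
            if key not in self.transitions:
                # No transition defined: the trace does not follow this behavior.
                return False
            state = self.transitions[key]
        return state in self.accepting


# Hypothetical DFA for one attack type (states and actions are assumptions).
iframe_injection = DFA(
    start="q0",
    accepting={"q3"},
    transitions={
        ("q0", "create_iframe"): "q1",
        ("q1", "set_remote_src"): "q2",
        ("q2", "append_to_dom"): "q3",
    },
)

malicious_trace = ["create_iframe", "set_remote_src", "append_to_dom"]
benign_trace = ["create_iframe", "append_to_dom"]

print(iframe_injection.accepts(malicious_trace))
print(iframe_injection.accepts(benign_trace))
```

With one such automaton per attack type, a trace can be labeled by the automata that accept it, which is the sense in which a DFA serves as both a detector and a classifier in this sketch.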
Usually, the detection will simply report the malware family name without elaborating details
about attacks conducted by the malware. Worse yet, the reported family name may differ from
one tool to another due to the different naming conventions. In this paper, we propose an
efficient approach that can not only explain the attack model but also potentially discover new
malware variants and new vulnerabilities. Our approach starts with machine learning techniques
and risky function calls. For the detected malware, we classify them into eight known attack
types according to their attack feature vector or dynamic execution traces by using machine learning
and dynamic program analysis respectively. We implement our approach in a tool named JSDC, and conduct
large-scale evaluations to show its effectiveness. The controlled experiments (with 942 malware)
show that JSDC gives low false positive rate (0.2123%) and low false negative rate (0.8492%),
1,500 malware reported, for which many anti-virus tools failed. Lastly, JSDC can effectively and accurately
classify the detected malware into eight attack types.
MCiDiff (Nov 2012 – Jun 2013, Fudan-NTU-NUS)
Clone detectors find similar code fragments (i.e., instances of code clones) and report
large numbers of them for industrial systems. To maintain or manage code clones, developers often
have to investigate differences of multiple cloned code fragments. However, existing program
differencing techniques compare only two code fragments at a time. Developers then have to manually
combine and summarize several pairwise differencing results. In this paper, we present an approach
to automatically detecting differences across multiple clone instances. We have implemented our approach as
an Eclipse plugin and evaluated its accuracy with three Java software systems. Our evaluation shows
that our algorithm has precision over 97.66% and recall over 95.63% in three open source Java projects.
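The idea of differencing several clone instances at once, rather than pair by pair, can be illustrated with a small sketch. This is not the MCiDiff algorithm: it is a simplified stand-in that folds Python's `difflib.SequenceMatcher` over token lists to find tokens shared by all instances, then reports what each instance has beyond that shared part. The function names and the sample token lists are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from functools import reduce
from typing import List, Tuple


def common_subsequence(a: List[str], b: List[str]) -> List[str]:
    """Tokens matched between a and b by SequenceMatcher, in order."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    common: List[str] = []
    for block in matcher.get_matching_blocks():
        common.extend(a[block.a:block.a + block.size])
    return common


def multi_diff(instances: List[List[str]]) -> Tuple[List[str], List[List[str]]]:
    """Return tokens shared by all instances and each instance's extra tokens."""
    shared = reduce(common_subsequence, instances)
    report: List[List[str]] = []
    for tokens in instances:
        matcher = SequenceMatcher(None, shared, tokens, autojunk=False)
        unique: List[str] = []
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                # Tokens present in this instance but not in the shared part.
                unique.extend(tokens[j1:j2])
        report.append(unique)
    return shared, report


# Three hypothetical clone instances, already tokenized.
clones = [
    ["int", "x", "=", "a", "+", "b", ";"],
    ["int", "x", "=", "a", "-", "b", ";"],
    ["long", "x", "=", "a", "+", "c", ";"],
]

shared, report = multi_diff(clones)
print(shared)
for index, unique in enumerate(report):
    print(index, unique)
```

A single summary of this kind replaces the several pairwise diffs a developer would otherwise have to merge by hand. Note that `SequenceMatcher` is a greedy matcher, so the shared part is a common subsequence but not necessarily the longest one.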
PDG-CS: Program Dependency Graph based Code Search (Feb 2012 – Oct 2012, NUS-NTU joint)
We present the Eclipse plug-in PDG-CS (Program Dependency Graph based Code Search)
to recommend code by virtue of PDG's rich set of program information and its independence
from the source code. With predefined PDG query patterns, PDG-CS can generate a representative PDG
according to the working context of the developer, as the query. Instead of using the computationally
costly graph matching for PDG, we develop an efficient filtering algorithm to find related methods.
Subsequently, clustering techniques are applied to rank PDGs by similarity, according to their distance to
the query PDG. Finally, the results are visualized with their differences from the query PDG clearly shown.
Our evaluation shows that PDG-CS can be effectively applied in both code solution searching and Malicious
The Linux Kernel Dataset: A Benchmark for Feature Location (Mar. 2010 – Jun. 2011, NUS)
Many software maintenance tasks need to locate code components that implement a certain feature
(termed feature location). Feature location has been an active research area for about two decades.
However, there is a lack of publicly available, large-scale benchmarks for evaluating and comparing feature
location approaches. In this paper, we present a Linux-kernel based benchmark for feature location
research (video link). This benchmark is large scale and extensible. It provides rich feature and program
information and accurate ground-truth links between features and code units. This benchmark supports
the evaluation of a wide range of feature location approaches. It allows researchers to gain deeper
insights into existing approaches and how they can be improved. It also enables communication and
collaboration among different researchers.
CloneDifferentiator - Semantic Differencing of Clones (Oct. 2009 – Mar. 2010, NUS)
Clone detection provides a scalable and efficient way to detect similar code fragments.
But it offers limited explanation of differences of functions performed by clones and variations of
control and data flows of clones. We refer to such differences as semantic differences of clones.
Understanding these semantic differences is essential to correctly interpret cloning information
and perform maintenance tasks on clones. Manual analysis of semantic differences of clones is
complicated and error-prone. In the paper, we present our clone analysis tool, called CloneDifferentiator.
Our tool automatically characterizes clones returned by a clone detector by differentiating Program Dependence
Graphs (PDGs) of clones. CloneDifferentiator is able to provide a precise characterization of semantic
differences of clones. It can provide an effective means of analyzing clones in a task oriented manner.
XVCL Solution for TuitionPayment (Oct. 2008 – Aug. 2009, NUS)
Having set up reusable core assets for a Software Product Line (SPL), it is a common practice
to apply Variation Techniques (VTs) to manage variant features. As each VT can handle only certain
types of variability, multiple VTs are often employed, such as conditional compilation, configuration
parameters or build tools. Our earlier study of an SPL at Fudan Wingsoft Ltd revealed potential scalability
Wingsoft product line, with a single, uniform VT of XML-based Variant Configuration Language (XVCL).
This paper provides a proof-of-concept that commonly used variation techniques can indeed be superseded
by a subset of XVCL, in a simple and natural way. We describe the essence of the XVCL solution, and evaluate the
benefits and trade-offs involved in multiple VTs solution and single VT - XVCL