I am a research fellow at the Cyber Security Lab (CSL), Nanyang Technological University (NTU), Singapore. My Ph.D. study at NTU (2019-2022) was supervised by Full Professor Yang Liu. Before that, I was a research assistant at CSL. My research focuses on applying deep learning techniques to assist software engineering development in the following areas:
Program Semantics Comprehension. We aim to learn program semantics with deep learning techniques, and we conduct broad research exploring how different program structures can be used for this purpose. Our empirical study investigates different program structures for learning program semantics (SANER 2022). We also employ graph neural networks (GNNs) to learn program semantics for various software engineering tasks, such as software vulnerability detection (NeurIPS 2019, FSE 2023), source code summarization (ICLR 2021), and deep code search (TSE 2022). In addition, we designed a transformer-based model for repairing program compilation errors (ASE 2022).
Code Pre-trained Models. We conduct broad research on designing and analyzing code pre-trained models. We design several probing tasks to analyze how well code pre-trained models learn syntax and semantics. We propose enhancing the robustness of code pre-trained models via contrastive learning (ICSE 2023). We also propose several approaches for attacking code pre-trained models (ACL 2023, EMNLP 2023). In addition, we propose a retrieval-based approach to enhance LLMs (ASE 2023, Distinguished Paper Award). Furthermore, we conduct studies on integrating ChatGPT into software engineering, for example an empirical study comparing ChatGPT with code pre-trained models for code refinement (ICSE 2024).
Software Repository Mining. We design several commit-related tools to facilitate software engineering development, covering tasks such as commit message generation (TSE 2020) and security patch identification (TOSEM 2021, TDSC 2022). We also trained CommitBART, a large-scale pre-trained model that supports commit-related classification and generation tasks.
- (10/2023) Our paper “A Black-Box Attack on Code Models via Representation Nearest Neighbor Search” is accepted by Findings of EMNLP23.
- (09/2023) Our paper “Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases” won an ACM SIGSOFT Distinguished Paper Award (ASE23).
- (08/2023) Our paper “Exploring the Potential of ChatGPT in Automated Code Refinement: An Empirical Study” is accepted by ICSE24.
- (07/2023) Our paper “Learning Program Semantics for Vulnerability Detection via Vulnerability-specific Inter-procedural Slicing” is accepted by ESEC/FSE23.
- (07/2023) Our paper “Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases” is accepted by ASE23.
- (07/2023) Our paper “Learning to Locate and Describe Vulnerabilities” is accepted by ASE23.
- (05/2023) Our paper “Multi-target Backdoor Attacks for Code Pre-trained Models” is accepted by ACL23.
- (12/2022) Our paper “GraphSearchNet: Enhancing GNNs via Capturing Global Dependencies for Semantic Code Search” is accepted by TSE.
- (12/2022) Our paper “Learning Program Representations with a Tree-Structured Transformer” is accepted by SANER23.
- (12/2022) Our paper “ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning” is accepted by ICSE23.
- (08/2022) Our paper “TransRepair: Context-aware Program Repair for Compilation Errors” is accepted by ASE22.
- (08/2022) Joined ICLR 2023 and AAAI 2023 as a reviewer.
- (07/2022) Our paper “Enhancing Security Patch Identification by Capturing Structures in Commits” is accepted by TDSC.
- (04/2022) I have submitted my Ph.D. thesis “Learning program semantics via exploring program structures with deep learning”.
- (03/2022) Joined NeurIPS 2022 as a reviewer.
- (01/2022) Joined ICML 2022 as a reviewer.
- (12/2021) Our paper “Learning Program Semantics with Code Representations: An Empirical Study” is accepted by SANER 2022.
- (06/2021) Joined ICLR 2022 as a reviewer.
- (05/2021) Our paper “SPI: Automated Identification of Security Patches via Commits” is accepted by TOSEM.
- (04/2021) Joined NeurIPS 2021 as a reviewer.
- (01/2021) Our paper “Retrieval-Augmented Generation for Code Summarization via Hybrid GNN” is accepted by ICLR 2021 (spotlight).