About me

I am a research fellow at the Cyber Security Lab (CSL), Nanyang Technological University (NTU), Singapore. My Ph.D. study (2019-2022) was supervised by Prof. Yang Liu at NTU. Before that, I was a research assistant at CSL. My research focuses on employing deep learning techniques to assist software engineering in the following areas:

Program Semantics Comprehension. We aim to learn program semantics with deep learning techniques, exploring how different program structures contribute to this goal. We conduct an empirical study of different program structures for learning program semantics (SANER 2022). We employ different GNN variants to learn program semantics for various software engineering tasks, such as software vulnerability detection (NeurIPS 2019, FSE 2023), source code summarization (ICLR 2021), and deep code search (TSE 2022). We also design a transformer-based model for repairing program compilation errors (ASE 2022).

Code Pre-trained Models. We conduct broad research on designing and analyzing code pre-trained models. We design several probing tasks to analyze how well code pre-trained models learn syntax and semantics (TOSEM 2024). We propose to enhance the robustness of code pre-trained models through contrastive learning (ICSE 2023). We also propose attack approaches against code pre-trained models (ACL 2023, EMNLP 2023). In addition, we propose retrieval-based approaches to enhance code pre-trained models (ASE 2023 Distinguished Paper Award, ISSTA 2024). We also explore integrating ChatGPT into SE, for example through an empirical study comparing ChatGPT with code pre-trained models for code refinement (ICSE 2024).

Software Repository Mining. We design several tools for GitHub commits to facilitate SE development, such as commit message generation (TSE 2020) and security patch identification (TOSEM 2021, TDSC 2022). We propose a large-scale pre-trained model, CommitBART (TOSEM 2024), to support commit-related classification and generation tasks.

News

  • (06/2024) Our paper “Automated Commit Intelligence by Pre-training” is accepted by TOSEM.
  • (04/2024) Our paper “Unveiling Code Pre-Trained Models: Investigating Syntax and Semantics Capacities” is accepted by TOSEM.
  • (04/2024) Our paper “Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation” is accepted by LCTES24.
  • (03/2024) Our paper “FT2Ra: A Fine-Tuning-Inspired Approach to Retrieval-Augmented Code Completion” is accepted by ISSTA24.
  • (01/2024) Our paper “BadEdit: Backdooring Large Language Models by Model Editing” is accepted by ICLR24.
  • (10/2023) Our paper “A Black-Box Attack on Code Models via Representation Nearest Neighbor Search” is accepted by Findings of EMNLP23.
  • (09/2023) Our paper “Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases” won the ACM SIGSOFT Distinguished Paper Award (ASE23).
  • (08/2023) Our paper “Exploring the Potential of ChatGPT in Automated Code Refinement: An Empirical Study” is accepted by ICSE24.
  • (07/2023) Our paper “Learning Program Semantics for Vulnerability Detection via Vulnerability-specific Inter-procedural Slicing” is accepted by ESEC/FSE23.
  • (07/2023) Our paper “Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases” is accepted by ASE23.
  • (07/2023) Our paper “Learning to Locate and Describe Vulnerabilities” is accepted by ASE23.
  • (05/2023) Our paper “Multi-target Backdoor Attacks for Code Pre-trained Models” is accepted by ACL23.
  • (12/2022) Our paper “GraphSearchNet: Enhancing GNNs via Capturing Global Dependencies for Semantic Code Search” is accepted by TSE.
  • (12/2022) Our paper “Learning Program Representations with a Tree-Structured Transformer” is accepted by SANER23.
  • (12/2022) Our paper “ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning” is accepted by ICSE23.
  • (08/2022) Our paper “TransRepair: Context-aware Program Repair for Compilation Errors” is accepted by ASE22.
  • (08/2022) Joined ICLR 2023 and AAAI 2023 as a reviewer.
  • (07/2022) Our paper “Enhancing Security Patch Identification by Capturing Structures in Commits” is accepted by TDSC.
  • (04/2022) I have submitted my Ph.D. thesis “Learning program semantics via exploring program structures with deep learning”.
  • (03/2022) Joined NeurIPS 2022 as a reviewer.
  • (01/2022) Joined ICML 2022 as a reviewer.
  • (12/2021) Our paper “Learning Program Semantics with Code Representations: An Empirical Study” is accepted by SANER 2022.
  • (06/2021) Joined ICLR 2022 as a reviewer.
  • (05/2021) Our paper “SPI: Automated Identification of Security Patches via Commits” is accepted by TOSEM.
  • (04/2021) Joined NeurIPS 2021 as a reviewer.
  • (01/2021) Our paper “Retrieval-Augmented Generation for Code Summarization via Hybrid GNN” is accepted by ICLR 2021 (spotlight).