About me

I am a research fellow at the Cyber Security Lab (CSL), Nanyang Technological University (NTU), Singapore. I received my Ph.D. from NTU (2019-2022), supervised by Prof. Yang Liu. Before that, I was a research assistant at CSL. My research focuses on employing deep learning techniques to assist software engineering, in the following directions:

Program Semantics Comprehension. We aim to learn program semantics with deep learning techniques, conducting broad research on how different program structures can be exploited for this goal. An empirical study explores different program structures for learning program semantics (SANER 2022). We employ GNN variants to learn program semantics for software engineering tasks such as software vulnerability detection (NeurIPS 2019, FSE 2023, LCTES 2024), source code summarization (ICLR 2021), and deep code search (TSE 2022). We also propose Transformer-based models for repairing program errors (ASE 2022, ISSRE 2024).

Code Pre-trained Models. We conduct broad research on designing and analyzing code pre-trained models. We design several probing tasks to analyze how well code pre-trained models learn syntax and semantics (TOSEM 2024). We propose contrastive learning to enhance the robustness of code pre-trained models (ICSE 2023), and devise approaches to attack them (ACL 2023, EMNLP 2023). In addition, we propose retrieval-based approaches to enhance code pre-trained models (ASE 2023 Distinguished Paper Award, ISSTA 2024). We also explore integrating LLMs into SE, for example an empirical study comparing ChatGPT with code pre-trained models for code refinement (ICSE 2024).

Software Repository Mining. We design several tools over GitHub commits to facilitate software development, such as commit message generation (TSE 2020) and security patch identification (TOSEM 2021, TDSC 2022). We also propose CommitBART (TOSEM 2024), a large-scale pre-trained model that supports commit-related classification and generation tasks.

News

  • (10/2024) Our paper “SpecGen: Automated Generation of Formal Program Specifications via Large Language Models” is accepted by ICSE25.
  • (08/2024) Congrats to my wife, Prof. Han at Southeast University, on receiving the National Natural Science Foundation of China Fund for Excellent Young Scholars (国家优青). Collaborations are welcome!
  • (08/2024) Two papers are accepted by ASE24.
  • (07/2024) Our paper “RATCHET: Retrieval Augmented Transformer for Program Repair” is accepted by ISSRE24.
  • (07/2024) Our paper “Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications” is accepted by ICSE25.
  • (06/2024) Our paper “Automated Commit Intelligence by Pre-training” is accepted by TOSEM.
  • (04/2024) Our paper “Unveiling Code Pre-Trained Models: Investigating Syntax and Semantics Capacities” is accepted by TOSEM.
  • (04/2024) Our paper “Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation” is accepted by LCTES24.
  • (03/2024) Our paper “FT2Ra: A Fine-Tuning-Inspired Approach to Retrieval-Augmented Code Completion” is accepted by ISSTA24.
  • (01/2024) Our paper “BadEdit: Backdooring Large Language Models by Model Editing” is accepted by ICLR24.
  • (10/2023) Our paper “A Black-Box Attack on Code Models via Representation Nearest Neighbor Search” is accepted by Findings of EMNLP23.
  • (09/2023) Our paper “Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases” won ACM SIGSOFT Distinguished Paper Award (ASE23).
  • (08/2023) Our paper “Exploring the Potential of ChatGPT in Automated Code Refinement: An Empirical Study” is accepted by ICSE24.
  • (07/2023) Our paper “Learning Program Semantics for Vulnerability Detection via Vulnerability-specific Inter-procedural Slicing” is accepted by ESEC/FSE23.
  • (07/2023) Two papers are accepted by ASE23.
  • (05/2023) Our paper “Multi-target Backdoor Attacks for Code Pre-trained Models” is accepted by ACL23.
  • (12/2022) Our paper “GraphSearchNet: Enhancing GNNs via Capturing Global Dependencies for Semantic Code Search” is accepted by TSE.
  • (12/2022) Our paper “Learning Program Representations with a Tree-Structured Transformer” is accepted by SANER23.
  • (12/2022) Our paper “ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning” is accepted by ICSE23.
  • (08/2022) Our paper “TransRepair: Context-aware Program Repair for Compilation Errors” is accepted by ASE22.
  • (08/2022) Joined ICLR 2023 and AAAI 2023 as a reviewer.
  • (07/2022) Our paper “Enhancing Security Patch Identification by Capturing Structures in Commits” is accepted by TDSC.
  • (04/2022) I have submitted my Ph.D. thesis “Learning Program Semantics via Exploring Program Structures with Deep Learning”.
  • (03/2022) Joined NeurIPS 2022 as a reviewer.
  • (01/2022) Joined ICML 2022 as a reviewer.
  • (12/2021) Our paper “Learning Program Semantics with Code Representations: An Empirical Study” is accepted by SANER 2022.
  • (06/2021) Joined ICLR 2022 as a reviewer.
  • (05/2021) Our paper “SPI: Automated Identification of Security Patches via Commits” is accepted by TOSEM.
  • (04/2021) Joined NeurIPS 2021 as a reviewer.
  • (01/2021) Our paper “Retrieval-Augmented Generation for Code Summarization via Hybrid GNN” is accepted by ICLR 2021 (spotlight).