Software Defect Prediction (SDP) is a critical task in software engineering, aimed at identifying potential defects in software systems at an early stage of development. Early detection of defects not only significantly improves the overall quality and reliability of software but also reduces the cost and time associated with post-release maintenance and bug fixing. Traditionally, SDP approaches have relied on machine learning and deep learning models that use manually engineered software metrics, such as code complexity, churn, and historical defect data. While these features provide useful information, they often fail to capture the rich semantic, syntactic, and contextual characteristics present in source code. As a result, these models can struggle to generalize across diverse codebases or to detect subtle defect patterns embedded within complex software structures.
In recent years, significant progress in Natural Language Processing (NLP) has opened new avenues for software analysis. Pre-trained transformer models, such as BERT and its variants, have demonstrated a remarkable ability to learn contextual and semantic representations from sequential data. These models can capture long-range dependencies, subtle patterns, and the structural relationships between elements in a sequence, making them particularly suitable for tasks that involve understanding source code. Source code, although distinct from natural language, shares several of its properties, including sequential structure, hierarchical organization, and the presence of meaningful tokens and identifiers. Leveraging these similarities, transformer-based models can be adapted to software-related tasks such as code summarization, code completion, and defect prediction.
In this work, we propose a novel approach for software defect prediction that leverages the supervised fine-tuning of DistilBERT, a lightweight and efficient variant of BERT. DistilBERT is designed to retain much of the language understanding capability of BERT while significantly reducing computational overhead, making it a practical choice for large-scale or resource-constrained applications. Our method represents source code as sequences of tokens that are input to the DistilBERT model, which is then fine-tuned on labeled defect datasets to predict whether a given code segment is defective. By using DistilBERT, our approach combines the benefits of deep contextualized embeddings with the efficiency required for real-world software engineering tasks.
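The fine-tuning setup described above can be sketched as follows. This is a minimal illustration assuming the Hugging Face `transformers` library; the checkpoint name, hyperparameters, and the truncation strategy shown here are illustrative assumptions, not the exact configuration used in our experiments.

```python
# Minimal sketch: fine-tuning DistilBERT as a binary defect classifier.
# Assumes torch and Hugging Face transformers are installed; checkpoint
# name and hyperparameters are illustrative.

def truncate_tokens(tokens, max_length=512):
    """DistilBERT accepts at most 512 tokens, so longer code segments are
    truncated here (sliding-window chunking is a common alternative)."""
    return tokens[:max_length]

def fine_tune(train_codes, train_labels, epochs=3, lr=2e-5):
    """Fine-tune DistilBERT for sequence classification
    (label 1 = defective, label 0 = clean)."""
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    # Source code is treated as a token sequence, padded/truncated to a
    # fixed length so it can be batched.
    batch = tokenizer(train_codes, truncation=True, padding=True,
                      max_length=512, return_tensors="pt")
    labels = torch.tensor(train_labels)

    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        # The model computes cross-entropy loss internally when labels
        # are supplied.
        loss = model(**batch, labels=labels).loss
        loss.backward()
        optimizer.step()
    return model, tokenizer
```

In practice the training loop would iterate over mini-batches from a DataLoader rather than a single batch, but the core steps (tokenize, forward, backpropagate the classification loss) are as shown.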
We evaluate our proposed model on the Devign dataset, a widely used benchmark for SDP provided by the CodeXGLUE framework. Performance metrics including accuracy, precision, recall, and F1-score are used to assess the effectiveness of our approach. The experimental results indicate that DistilBERT-based defect prediction achieves competitive performance relative to existing baseline models and state-of-the-art methods while requiring significantly fewer computational resources. Furthermore, the results suggest that transformer-based models can effectively capture the semantic and contextual cues that are often missed by traditional code metrics, leading to improved detection of subtle and complex defects.
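The four reported metrics are the standard ones for binary classification, computed from the confusion-matrix counts. A pure-Python sketch (in practice, `sklearn.metrics` yields the same values):

```python
# Compute accuracy, precision, recall, and F1 for binary defect
# prediction (1 = defective, 0 = clean).

def defect_metrics(y_true, y_pred):
    """Return the four evaluation metrics from true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted defects, how many are real
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of real defects, how many are found
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)           # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

F1 is especially informative here because defect datasets are often class-imbalanced, where accuracy alone can be misleading.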
Overall, our study highlights the potential of compact transformer models such as DistilBERT as practical, efficient, and effective tools for software defect prediction. The findings provide strong evidence that pre-trained language models, when properly fine-tuned on code data, can serve as powerful alternatives to conventional machine learning methods, bridging the gap between NLP advancements and software engineering applications. This work lays the foundation for further research into applying lightweight transformer architectures to other software analytics tasks, offering a promising direction for enhancing software quality and reliability in modern software development pipelines.