Analysis of Code Vulnerabilities in Repositories of GitHub and Rosettacode: A comparative Study


Article Information

Title: Analysis of Code Vulnerabilities in Repositories of GitHub and Rosettacode: A comparative Study

Authors: Abdul Malik, Muhammad Shumail Naveed

Journal: International Journal of Innovations in Science & Technology

HEC Recognition History
Category  From        To
Y         2024-10-01  2025-12-31
Y         2023-07-01  2024-09-30
Y         2021-07-01  2022-06-30

Publisher: 50SEA JOURNALS (SMC-PRIVATE) LIMITED

Country: Pakistan

Year: 2022

Volume: 4

Issue: 2

Language: English

Keywords: Software Vulnerability, Software Security, Programming Portal, Vulnerability Severity

Abstract

Open-source code hosted on online programming portals is present in 99% of commercial software, and reusing it is common practice among developers for rapid prototyping and cost-effective development. However, research reports that such code contains vulnerabilities that can result in catastrophic security compromises affecting individuals, organizations, and even national secrecy. One frustrating aspect of vulnerabilities is that they manifest in hidden ways that software developers are unaware of. Vulnerability detection is therefore one of the most critical tasks in ensuring software security, because undetected vulnerabilities jeopardize core security concepts such as integrity, authenticity, and availability. This study explores security-related vulnerabilities in programs written in C, C++, and Java hosted at popular code repositories and presents the disparities between them. To this end, 708 programs were examined using severity-based guidelines. A total of 1371 vulnerable codes were identified, of which 327 were in C, 51 in C++, and 993 in Java. Statistical analysis indicated a substantial difference between them: the Kruskal-Wallis H-test p-value (reported as .000) is below the 0.05 significance level. The Mann-Whitney Test mean ranks for GitHub (mean rank = 676.05) and RosettaCode (mean rank = 608.64) also differ. The novelty of this article lies in identifying security vulnerabilities and characterizing the nature and severity of vulnerabilities in popular code repositories. The study ultimately offers a guideline for choosing a secure programming language and a testing technique that targets the vulnerabilities most liable to breach security.
Full Text


Research Objective

To explore and compare security-related vulnerabilities in C, C++, and Java programming languages hosted on GitHub and RosettaCode, and to identify disparities between them.


Methodology

The study involved examining 708 programs from GitHub and RosettaCode, focusing on C, C++, and Java. These programs were selected based on common algorithms and tasks across various disciplines. The code was preprocessed to standardize it for analysis. The Yasca tool was used to scan for vulnerabilities, categorizing them by severity (Critical, High, Warning, Low, Informational). Statistical tests, including the Mann-Whitney Test, Kolmogorov-Smirnov Test, Wald-Wolfowitz Test, and Kruskal-Wallis H-test, were employed to analyze the significance of the findings.
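
To make the Kruskal-Wallis step concrete, here is a minimal sketch in Python. It is not the authors' analysis script (the summary does not say which statistics software was used), and the function names and sample numbers are hypothetical. The p-value shortcut `exp(-H/2)` is valid only for exactly three groups (two degrees of freedom), which happens to match the three languages compared here.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_3(groups):
    """Tie-corrected H statistic and p-value for exactly three groups."""
    if len(groups) != 3:
        raise ValueError("this sketch only handles three groups (df = 2)")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the groups.
    acc, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        acc += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * acc - 3 * (n + 1)

    # Standard correction for ties; it is zero when every value is identical.
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction

    p = math.exp(-h / 2)  # chi-square survival function for df = 2
    return h, p


if __name__ == "__main__":
    # Hypothetical per-program finding counts, NOT data from the study.
    c_programs = [1, 2, 3]
    cpp_programs = [4, 5, 6]
    java_programs = [7, 8, 9]
    h, p = kruskal_wallis_3([c_programs, cpp_programs, java_programs])
    print(f"H = {h:.3f}, p = {p:.4f}")  # prints "H = 7.200, p = 0.0273"
```

With real data, each list would hold one value per program, for example the number of findings Yasca reported for it. That choice of unit is an assumption, since the summary does not state what was ranked.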

Methodology Flowchart
graph TD;
    A[Select Programs from GitHub & RosettaCode: C, C++, Java] --> B[Preprocess Code];
    B --> C[Scan for Vulnerabilities using Yasca];
    C --> D[Categorize Vulnerabilities by Severity];
    D --> E[Perform Statistical Analysis: Mann-Whitney, Kruskal-Wallis, etc.];
    E --> F[Analyze Results & Draw Conclusions];

Discussion

The study highlights that security vulnerabilities are a significant concern in open-source code, affecting software confidentiality, integrity, and availability. The comparative analysis across portals (GitHub and RosettaCode) and languages (C, C++, and Java) shows that vulnerability counts differ by both: Java programs and GitHub repositories accounted for the most reported vulnerabilities. The prevalence of informational-severity vulnerabilities suggests that, while critical issues are less common, a large number of less severe vulnerabilities can still pose risks. The findings suggest that developers should consider both the programming language and the hosting portal when aiming for secure software development.


Key Findings

- A total of 1371 vulnerable codes were identified: 327 in C, 51 in C++, and 993 in Java.
- GitHub repositories contained more vulnerabilities than RosettaCode repositories.
- Java exhibited the highest number of vulnerabilities, followed by C, and then C++.
- Informational severity vulnerabilities were the most prevalent across all categories, while critical severity vulnerabilities were the least reported.
- Statistical analysis confirmed significant differences in code vulnerabilities between programming languages and between programming portals.
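
As a quick arithmetic check on the reported counts, the short Python sketch below confirms that the three per-language figures add up to 1371 and expresses each as a share of the total. Only the three counts come from the source; the variable names are illustrative. The shares are not normalized by the number of programs examined per language, because this summary does not give that breakdown.

```python
# Per-language counts of vulnerable codes, as reported in the summary.
counts = {"C": 327, "C++": 51, "Java": 993}

total = sum(counts.values())

# Share of all reported vulnerable codes, in percent, rounded to one decimal.
shares = {lang: round(100 * n / total, 1) for lang, n in counts.items()}

print(total)   # prints 1371
print(shares)  # prints {'C': 23.9, 'C++': 3.7, 'Java': 72.4}
```

A higher share for Java therefore does not by itself show that Java programs are more vulnerable per program; that comparison would need the per-language program counts.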


Conclusion

The research concludes that GitHub (Java) is the most vulnerable, followed by RosettaCode (Java), GitHub (C), RosettaCode (C), GitHub (C++), and RosettaCode (C++). Informational severity vulnerabilities are most frequent. RosettaCode (C++) appears to be the most secure in terms of vulnerabilities. The study provides a guideline for selecting secure programming languages and effective testing approaches to identify vulnerabilities.


Fact Check

- Total vulnerable codes identified: 1371 (327 in C, 51 in C++, 993 in Java).
- Kruskal-Wallis H-test p-value: reported as .000 (that is, p < .001), which is below the 0.05 significance level and indicates a statistically significant difference in code vulnerabilities between the programming languages.
- Mann-Whitney Test mean ranks: GitHub (676.05) and RosettaCode (608.64). The higher mean rank for GitHub is consistent with the reported significant difference in code vulnerabilities between the portals; the test's p-value is not given in this summary.
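
For readers unfamiliar with the "mean rank" figures, the following Python sketch shows how they arise in a Mann-Whitney comparison: both samples are pooled, ranked together (ties share an average rank), and the ranks are averaged per group. The function names and the sample values are hypothetical and are not the study's data. The sketch stops at the U statistics, because a difference in mean ranks alone does not establish significance; that needs the p-value derived from U.

```python
def rank_in(pooled, x):
    """1-based rank of x within pooled; ties share the average of their positions."""
    less = sum(v < x for v in pooled)
    equal = sum(v == x for v in pooled)
    return less + (equal + 1) / 2


def mann_whitney_summary(a, b):
    """Mean ranks and U statistics for two independent samples."""
    pooled = list(a) + list(b)
    rank_sum_a = sum(rank_in(pooled, x) for x in a)
    rank_sum_b = sum(rank_in(pooled, x) for x in b)
    return {
        "mean_rank_a": rank_sum_a / len(a),
        "mean_rank_b": rank_sum_b / len(b),
        # U = rank sum minus the smallest rank sum the group could have.
        "u_a": rank_sum_a - len(a) * (len(a) + 1) / 2,
        "u_b": rank_sum_b - len(b) * (len(b) + 1) / 2,
    }


if __name__ == "__main__":
    # Hypothetical values for two portals, NOT data from the study.
    portal_a = [3, 5, 8, 9]
    portal_b = [1, 2, 5, 6]
    print(mann_whitney_summary(portal_a, portal_b))
    # prints {'mean_rank_a': 5.625, 'mean_rank_b': 3.375, 'u_a': 12.5, 'u_b': 3.5}
```

The two U values always sum to the product of the sample sizes (here 4 × 4 = 16), which is a handy sanity check on any hand computation.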

