The package manager (PM) is crucial to most technology stacks, acting as a broker to ensure that a verified dependency package is correctly installed, configured, or removed from an application. Diversity in technology stacks has led to dozens of PMs with various features. While our recent study indicates that the package management features of a PM are related to end-user experiences, it is unclear what those issues are and what information is required to resolve them. In this paper, we have investigated PM issues faced by end-users through an empirical study of content on Stack Overflow (SO). We carried out a qualitative analysis of 1,131 questions and their accepted answer posts for three popular PMs (i.e., Maven, npm, and NuGet) to identify issue types, underlying causes, and their resolutions. Our results confirm that end-users struggle with PM tool usage (approximately 64-72%). We observe that most issues are raised by end-users due to a lack of instructions and to error messages from PM tools. In terms of issue resolution, we find that external link sharing is the most common practice to resolve PM issues. Additionally, we observe that links pointing to useful resources (i.e., official documentation websites, tutorials, etc.) are most frequently shared, indicating the potential for tool support and the ability to provide relevant information for PM end-users.
2022
V-Achilles: An Interactive Visualization of Transitive Security Vulnerabilities
Vipawan Jarukitpipat, Klinton Chhun, Wachirayana Wanprasert, and 7 more authors
In IEEE/ACM International Conference on Automated Software Engineering (ASE) Oct 2022
A key threat to the usage of third-party dependencies has been the threat of security vulnerabilities, which risk unwanted access to a user application. As part of an ecosystem of dependencies, users of a library are prone to both the direct and transitive dependencies adopted into their applications. Recent work involves tool support for vulnerable dependency updates, but rarely shows the complexity of the transitive updates. In this paper, we introduce our solution to support vulnerability updating in npm. V-Achilles is a prototype that shows a visualization (i.e., using dependency graphs) of the dependencies affected by vulnerability attacks. In addition to the tool overview, we highlight three use cases to demonstrate the usefulness and application of our prototype with real-world npm packages. The prototype is available at https://github.com/MUICTSERU/V-Achilles, with an accompanying video demonstration at https://www.youtube.com/watch?v=tspiZfhMNcs.
An Empirical Evaluation of Competitive Programming AI: A Case Study of AlphaCode
Sila Lertbanjongngam, Bodin Chinthanet, Takashi Ishio, and 5 more authors
In International Workshop on Software Clones (IWSC) Oct 2022
AlphaCode is a code generation system for assisting software developers in solving competitive programming problems using natural language problem descriptions. Despite the advantages of the code generating system, the open source community expressed concerns about practicality and data licensing. However, there is no research investigating generated codes in terms of code clones and performance. In this paper, we conduct an empirical study to find code similarities and performance differences between AlphaCode-generated codes and human codes. The results show that (i) the generated codes from AlphaCode are similar to human codes (i.e., the average maximum similarity score is 0.56) and (ii) the generated code performs on par with or worse than the human code in terms of execution time and memory usage. Moreover, AlphaCode tends to generate codes that are more similar to human codes for low-difficulty problems (i.e., four cases have the exact same codes). It also employs excessive nested loops and unnecessary variable declarations for high-difficulty problems, which cause low performance according to our manual investigation. The replication package is available at https://doi.org/10.5281/zenodo.6820681
On the Use of Refactoring in Security Vulnerability Fixes: An Exploratory Study on Maven Libraries
Ayano Ikegami, Raula Gaikovina Kula, Bodin Chinthanet, and 4 more authors
In International Conference on Evaluation and Assessment in Software Engineering (EASE) Jun 2022
Third-party library dependencies are commonplace in today’s software development. With the growing threat of security vulnerabilities, applying security fixes in a timely manner is important to protect software systems. As such, the community developed a list of software and hardware weaknesses known as Common Weakness Enumeration (CWE) to assess vulnerabilities. Prior work has revealed that maintenance activities such as refactoring code potentially correlate with security-related aspects in the source code. In this work, we explore the relationship between refactoring and security by analyzing refactoring actions performed jointly with vulnerability fixes in practice. We conducted a case study to analyze 143 Maven libraries in which 351 known vulnerabilities had been detected and fixed. Surprisingly, our exploratory results show that developers incorporate refactoring operations in their fixes, with 31.9% (112 out of 351) of the vulnerabilities paired with refactoring actions. We envision this short paper to open up potential new directions to motivate automated tool support, allowing developers to deliver fixes faster, while maintaining their code.
Does Coding in Pythonic Zen Peak Performance? Preliminary Experiments of Nine Pythonic Idioms at Scale
Pattara Leelaprute, Bodin Chinthanet, Supatsara Wattanakriengkrai, and 3 more authors
In International Conference on Program Comprehension (ICPC) May 2022
In the field of data science, and for academics in general, the Python programming language is a popular choice, mainly because of its libraries for storing, manipulating, and gaining insight from data. Evidence includes the versatile set of machine learning, data visualization, and manipulation packages used for the ever-growing size of available data. The Zen of Python is a set of guiding design principles that developers use to write acceptable and elegant Python code. Most principles revolve around simplicity. However, with the growing need to compute large amounts of data, performance has become a necessity for the Python programmer. The new idea in this paper is to confirm whether writing the Pythonic way peaks performance at scale. As a starting point, we conduct a set of preliminary experiments to evaluate nine Pythonic code examples by comparing the performance of both Pythonic and Non-Pythonic code snippets. Our results reveal that writing in Pythonic idioms may save memory and time. We show that incorporating list comprehension, generator expression, zip, and itertools.zip_longest idioms can save up to 7,000 MB and up to 32.25 seconds. The results open more questions on how they could be utilized in a real-world setting. The replication package includes all scripts, and the results are available at https://doi.org/10.5281/zenodo.5712349
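To make the comparison concrete, the sketch below pairs a Non-Pythonic snippet with its Pythonic counterpart for the four idioms named in the abstract. The function names and inputs are illustrative assumptions and are not taken from the paper's replication package; the sketch only shows that each pair computes the same result, not how large the savings are.

```python
from itertools import zip_longest


def squares_loop(n):
    """Non-Pythonic: build the list with an explicit loop and append."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Pythonic: list comprehension producing the same list."""
    return [i * i for i in range(n)]


def sum_squares_generator(n):
    """Pythonic: generator expression, so no intermediate list is built."""
    return sum(i * i for i in range(n))


def pair_by_index(xs, ys):
    """Non-Pythonic: index-based pairing, truncated to the shorter input."""
    pairs = []
    for i in range(min(len(xs), len(ys))):
        pairs.append((xs[i], ys[i]))
    return pairs


def pair_with_zip(xs, ys):
    """Pythonic: zip stops at the shorter input."""
    return list(zip(xs, ys))


def pair_padded(xs, ys, fill=None):
    """Pythonic: zip_longest pads the shorter input with a fill value."""
    return list(zip_longest(xs, ys, fillvalue=fill))
```

Timing or memory measurements (for example with `timeit` or `tracemalloc` from the standard library) would be layered on top of pairs like these; the thresholds at which the idioms pay off depend on input size and are not reproduced here.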
GitHub repositories with links to academic papers: Public access, traceability, and evolution
Supatsara Wattanakriengkrai, Bodin Chinthanet, Hideaki Hata, and 4 more authors
Traceability between published scientific breakthroughs and their implementation is essential, especially in the case of open-source scientific software which implements bleeding-edge science in its code. However, aligning the link between GitHub repositories and academic papers can prove difficult, and the current practice of establishing and maintaining such links remains unknown. This paper investigates the role of academic paper references contained in these repositories. We conduct a large-scale study of 20 thousand GitHub repositories that make references to academic papers. We use a mixed-methods approach to identify public access, traceability and evolutionary aspects of the links. Although referencing a paper is not typical, we find that a vast majority of referenced academic papers are public access. These repositories tend to be affiliated with academic communities. More than half of the papers do not link back to any repository. We find that academic papers from top-tier SE venues are not likely to reference a repository, but when they do, they usually link to a GitHub software repository. In a network of arXiv papers and referenced repositories, we find that the most referenced papers are (i) highly-cited in academia and (ii) are referenced by repositories written in different programming languages.
SōjiTantei: Function-Call Reachability Detection of Vulnerable Code for npm Packages
Bodin Chinthanet, Raula Gaikovina Kula, Rodrigo Elizalde Zapata, and 3 more authors
IEICE Transactions on Information and Systems Jan 2022
It has become common practice for software projects to adopt third-party dependencies. Developers are encouraged to update any outdated dependency to remain safe from potential threats of vulnerabilities. In this study, we present an approach to help developers determine whether or not vulnerable code is reachable in JavaScript projects. Our prototype, SōjiTantei, is evaluated in two ways: (i) its accuracy when compared to a manual approach and (ii) a larger-scale analysis of 780 clients from 78 security vulnerability cases. The first evaluation shows that SōjiTantei has a high accuracy of 83.3%, with an analysis time of less than a second per client. The second evaluation reveals that 68 out of the studied 78 vulnerabilities reported having at least one clean client. The study shows that automation is promising, with the potential for further improvement.
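Function-call reachability can be illustrated as a search over a call graph. The following is a minimal sketch, not SōjiTantei's implementation: it assumes the call graph has already been extracted into a dictionary mapping each function to the functions it calls, and the function name `is_reachable` is hypothetical.

```python
from collections import deque


def is_reachable(call_graph, entry_points, vulnerable_function):
    """Return True if any entry point can reach the vulnerable function.

    call_graph maps a function name to an iterable of the functions it calls
    (an assumed, pre-extracted representation).
    """
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        current = queue.popleft()
        if current == vulnerable_function:
            return True
        # Visit every callee once, breadth-first.
        for callee in call_graph.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False
```

Under this view, a client would count as clean when no path leads from its own functions to the vulnerable function of the dependency, even though the vulnerable version is installed.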
2021
Contrasting Third-Party Package Management User Experience
Syful Islam, Raula Gaikovina Kula, Christoph Treude, and 3 more authors
In IEEE International Conference on Software Maintenance and Evolution (ICSME) Sep 2021
The management of third-party package dependencies is crucial to most technology stacks, with package managers acting as brokers to ensure that a verified package is correctly installed, configured, or removed from an application. Diversity in technology stacks has led to dozens of package ecosystems with their own management features. While recent studies have shown that developers struggle to migrate their dependencies, the common assumption is that package ecosystems are used without any issue. In this study, we explore 13 package ecosystems to understand whether their features correlate with the experience of their users. By studying experience through the questions that developers ask on the question-and-answer site Stack Overflow, we find that developer questions are grouped into three themes (i.e., Package management, Input-Output, and Package Usage). Our preliminary analysis indicates that specific features are correlated with the user experience. Our work lays out future directions to investigate the trade-offs involved in designing the ideal package ecosystem.
Lags in the Release, Adoption, and Propagation of Npm Vulnerability Fixes
Bodin Chinthanet, Raula Gaikovina Kula, Shane McIntosh, and 3 more authors
Security vulnerability in third-party dependencies is a growing concern not only for developers of the affected software, but for the risks it poses to an entire software ecosystem, e.g., the Heartbleed vulnerability. Recent studies show that developers are slow to respond to the threat of vulnerability, sometimes taking four to eleven months to act. To ensure quick adoption and propagation of a release that contains the fix (fixing release), we conduct an empirical investigation to identify lags that may occur between the vulnerable release and its fixing release (package-side fixing release). Through a preliminary study of 231 package-side fixing releases of npm projects on GitHub, we observe that a fixing release is rarely released on its own, with up to 85.72% of the bundled commits being unrelated to a fix. We then compare the package-side fixing release with changes on the client side (client-side fixing release). Through an empirical study of the adoption and propagation tendencies of 1,290 package-side fixing releases that impact throughout a network of 1,553,325 releases of npm packages, we find that stale clients require additional migration effort, even if the package-side fixing release was quick (i.e., the package-side fixing release type is a patch). Furthermore, we show the influence of factors such as the branch that the package-side fixing release lands on and the severity of vulnerability on its propagation. In addition to these lags we identify and characterize, this paper lays the groundwork for future research on how to mitigate propagation lags in an ecosystem.
2020
Code-Based Vulnerability Detection in Node.Js Applications: How Far Are We?
Bodin Chinthanet, Serena Elisa Ponta, Henrik Plate, and 4 more authors
In IEEE/ACM International Conference on Automated Software Engineering (ASE) Dec 2020
With one of the largest available collections of reusable packages, the JavaScript runtime environment Node.js is one of the most popular programming platforms. With recent work showing evidence that known vulnerabilities are prevalent in both open source and industrial software, we propose and implement a viable code-based vulnerability detection tool for Node.js applications. Our case study lists the challenges encountered while implementing our Node.js vulnerable code detector.
2019
2018
Towards Smoother Library Migrations: A Look at Vulnerable Dependency Migrations at Function Level for npm JavaScript Packages
Rodrigo Elizalde Zapata, Raula Gaikovina Kula, Bodin Chinthanet, and 3 more authors
In IEEE International Conference on Software Maintenance and Evolution (ICSME) Sep 2018
It has become common practice for software projects to adopt third-party libraries, allowing developers full access to functions that would otherwise take time and effort to create themselves. Regardless of the migration effort involved, developers are encouraged to maintain their library dependencies by updating any outdated dependency, so as to remain safe from potential threats such as vulnerabilities. Through a manual inspection of a total of 60 client projects from three cases of high severity vulnerabilities, we investigate whether or not clients are really safe from these threats. Surprisingly, our early results show evidence that up to 73.3% of outdated clients were actually safe from the threat. This is the first work to confirm that analysis at the library level is indeed an overestimation. This result paves the path for future studies to empirically investigate and validate this phenomenon, and is a step towards aiding a smoother library migration for client developers.
2017
2016
Graph Clustering-Based Emerging Event Detection from Twitter Data Stream
Bundit Manaskasemsak, Bodin Chinthanet, and Arnon Rungsawang
In International Conference on Network, Communication and Computing (ICNCC) Dec 2016
Event detection from online social media is nowadays important to many fields, such as crisis notification, health epidemic identification, and trending topic extraction. To deal with the problem, in this paper we propose a new methodology to capture emerging events from the Twitter data stream. We define a tweet graph representing tweet term vectors as vertices associated by their content similarities. Based on the assumption that an event denotes a set of similar tweets, we employ the Markov clustering algorithm on the tweet graph to group related tweets. Then, connected similar events between consecutive time intervals are classified as an event trend line. Finally, the first of those connected events is considered the emerging event. Performance evaluation of the proposed approach has been done on thirty days of extracted Twitter data stream. The detected emerging events have been studied and evaluated by fifteen volunteers, with 70-80% precision.
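The clustering step can be sketched with a small, pure-Python version of Markov clustering (alternating expansion and inflation on a column-stochastic matrix). This is an illustrative sketch under stated assumptions, not the paper's implementation: the input is assumed to be a precomputed symmetric tweet-similarity matrix, and the inflation value, iteration cap, and thresholds are hypothetical defaults.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [
        [matrix[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def markov_cluster(similarity, inflation=2.0, max_iterations=50, tol=1e-9):
    """Group vertices of a similarity graph with Markov clustering.

    similarity is a square, symmetric matrix of non-negative weights
    (assumed to hold pairwise tweet similarities).
    """
    n = len(similarity)
    # Add self-loops so every column has mass before normalization.
    m = [[1.0 if i == j else similarity[i][j] for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iterations):
        expanded = matmul(m, m)  # expansion: flow spreads along longer walks
        inflated = normalize_columns(  # inflation: strong flows are boosted
            [[value ** inflation for value in row] for row in expanded]
        )
        delta = max(
            abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n)
        )
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, value in enumerate(row) if value > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)
```

Each returned cluster would correspond to one candidate event within a time interval; linking clusters across consecutive intervals into trend lines is a separate step that this sketch does not cover.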
A Review and Comparison of Methods for Determining the Best Analogies in Analogy-Based Software Effort Estimation
Bodin Chinthanet, Passakorn Phannachitta, Yasutaka Kamei, and 4 more authors
In ACM Symposium on Applied Computing (SAC) Apr 2016
Analogy-based effort estimation (ABE) is a commonly used software development effort estimation method. The processes of ABE are based on a reuse of effort values from similar past projects, where the appropriate number of past projects (k value) to be reused is one of the long-standing debates in ABE research studies. To date, many approaches to find this k value have been continually proposed. One important reason for this inconclusive debate is that different studies appear to produce different conclusions about which k value is appropriate. Therefore, in this study, we revisit 8 common approaches to selecting the most appropriate k value in general situations. With a more robust and comprehensive evaluation methodology using 5 robust error measures subject to the Wilcoxon rank-sum statistical test, we found that conflicting results in the previous studies were not mainly due to the use of different methodologies or different datasets, but because the performance of the different approaches actually varies widely.
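The role of k can be illustrated with a minimal sketch of analogy-based estimation. The abstract does not specify the similarity measure or the adaptation rule, so Euclidean distance over numeric features and an unweighted mean of the k nearest efforts are assumptions here, and `estimate_effort` is a hypothetical name.

```python
import math


def estimate_effort(target_features, past_projects, k=3):
    """Estimate effort as the mean effort of the k most similar past projects.

    past_projects is a list of (features, effort) pairs, where features is a
    sequence of numbers with the same length as target_features.
    """
    if not 1 <= k <= len(past_projects):
        raise ValueError("k must be between 1 and the number of past projects")
    # Rank past projects by distance to the target (assumed Euclidean).
    ranked = sorted(
        past_projects,
        key=lambda project: math.dist(target_features, project[0]),
    )
    nearest = ranked[:k]
    return sum(effort for _, effort in nearest) / k
```

With the same data, different values of k can produce noticeably different estimates, which is why the choice of k is debated; in practice the features would also be normalized before distances are computed.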