Publications | Bodin Chinthanet

2023

An Empirical Study of Package Management Issues via Stack Overflow

Syful Islam, Raula Gaikovina Kula, Christoph Treude, and 3 more authors

IEICE Transactions on Information and Systems Feb 2023

The package manager (PM) is crucial to most technology stacks, acting as a broker to ensure that a verified dependency package is correctly installed, configured, or removed from an application. Diversity in technology stacks has led to dozens of PMs with various features. While our recent study indicates that package management features of PMs are related to end-user experiences, it is unclear what those issues are and what information is required to resolve them. In this paper, we have investigated PM issues faced by end-users through an empirical study of content on Stack Overflow (SO). We carried out a qualitative analysis of 1,131 questions and their accepted answer posts for three popular PMs (i.e., Maven, npm, and NuGet) to identify issue types, underlying causes, and their resolutions. Our results confirm that end-users struggle with PM tool usage (approximately 64-72%). We observe that most issues are raised by end-users due to a lack of instructions and error messages from PM tools. In terms of issue resolution, we find that external link sharing is the most common practice to resolve PM issues. Additionally, we observe that links pointing to useful resources (i.e., official documentation websites, tutorials, etc.) are most frequently shared, indicating the potential for tool support and the ability to provide relevant information for PM end-users.
@article{Islam:IEICE2023, title = {An Empirical Study of Package Management Issues via Stack Overflow}, author = {Islam, Syful and Kula, Raula Gaikovina and Treude, Christoph and Chinthanet, Bodin and Ishio, Takashi and Matsumoto, Kenichi}, journal = {IEICE Transactions on Information and Systems}, volume = {E106-D}, number = {2}, pages = {}, month = feb, year = {2023}, }

2022

V-Achilles: An Interactive Visualization of Transitive Security Vulnerabilities

Vipawan Jarukitpipat, Klinton Chhun, Wachirayana Wanprasert, and 7 more authors

In IEEE/ACM International Conference on Automated Software Engineering (ASE) Oct 2022

A key threat to the usage of third-party dependencies has been the threat of security vulnerabilities, which risk unwanted access to a user application. As part of an ecosystem of dependencies, users of a library are exposed to both the direct and transitive dependencies adopted into their applications. Recent work involves tool support for vulnerable dependency updates, rarely showing the complexity of the transitive updates. In this paper, we introduce our solution to support vulnerability updating in npm. V-Achilles is a prototype that shows a visualization (i.e., using dependency graphs) of the dependencies affected by vulnerability attacks. In addition to the tool overview, we highlight three use cases to demonstrate the usefulness and application of our prototype with real-world npm packages. The prototype is available at https://github.com/MUICTSERU/V-Achilles, with an accompanying video demonstration at https://www.youtube.com/watch?v=tspiZfhMNcs.
@inproceedings{Jarukitpipat:ASE2022, author = {Jarukitpipat, Vipawan and Chhun, Klinton and Wanprasert, Wachirayana and Ragkhitwetsagul, Chaiyong and Choetkiertikul, Morakot and Sunetnanta, Thanwadee and Kula, Raula Gaikovina and Chinthanet, Bodin and Ishio, Takashi and Matsumoto, Kenichi}, title = {V-Achilles: An Interactive Visualization of Transitive Security Vulnerabilities}, month = oct, year = {2022}, booktitle = {IEEE/ACM International Conference on Automated Software Engineering (ASE)}, pages = {}, numpages = {}, }
An Empirical Evaluation of Competitive Programming AI: A Case Study of AlphaCode

Sila Lertbanjongngam, Bodin Chinthanet, Takashi Ishio, and 5 more authors

In International Workshop on Software Clones (IWSC) Oct 2022

AlphaCode is a code generation system for assisting software developers in solving competitive programming problems using natural language problem descriptions. Despite the advantages of the code generating system, the open source community expressed concerns about practicality and data licensing. However, there is no research investigating generated codes in terms of code clone and performance. In this paper, we conduct an empirical study to find code similarities and performance differences between AlphaCode-generated codes and human codes. The results show that (i) the generated codes from AlphaCode are similar to human codes (i.e., the average maximum similarity score is 0.56) and (ii) the generated code performs on par with or worse than the human code in terms of execution time and memory usage. Moreover, AlphaCode tends to generate more similar codes to humans for low-difficulty problems (i.e., four cases have the exact same codes). It also employs excessive nested loops and unnecessary variable declarations for high-difficulty problems, which cause low performance according to our manual investigation. The replication package is available at https://doi.org/10.5281/zenodo.6820681.
@inproceedings{Lertbanjongngam:IWSC2022, title = {An Empirical Evaluation of Competitive Programming AI: A Case Study of AlphaCode}, author = {Lertbanjongngam, Sila and Chinthanet, Bodin and Ishio, Takashi and Kula, Raula Gaikovina and Leelaprute, Pattara and Manaskasemsak, Bundit and Rungsawang, Arnon and Matsumoto, Kenichi}, month = oct, year = {2022}, booktitle = {International Workshop on Software Clones (IWSC)}, pages = {}, numpages = {}, }
On the Use of Refactoring in Security Vulnerability Fixes: An Exploratory Study on Maven Libraries

Ayano Ikegami, Raula Gaikovina Kula, Bodin Chinthanet, and 4 more authors

In International Conference on Evaluation and Assessment in Software Engineering (EASE) Jun 2022

Third-party library dependencies are commonplace in today’s software development. With the growing threat of security vulnerabilities, applying security fixes in a timely manner is important to protect software systems. As such, the community developed a list of software and hardware weaknesses known as Common Weakness Enumeration (CWE) to assess vulnerabilities. Prior work has revealed that maintenance activities such as refactoring code potentially correlate with security-related aspects in the source code. In this work, we explore the relationship between refactoring and security by analyzing refactoring actions performed jointly with vulnerability fixes in practice. We conducted a case study to analyze 143 Maven libraries in which 351 known vulnerabilities had been detected and fixed. Surprisingly, our exploratory results show that developers incorporate refactoring operations in their fixes, with 31.9% (112 out of 351) of the vulnerabilities paired with refactoring actions. We envision this short paper to open up potential new directions to motivate automated tool support, allowing developers to deliver fixes faster, while maintaining their code.
@inproceedings{Ikegami:EASE2022, author = {Ikegami, Ayano and Kula, Raula Gaikovina and Chinthanet, Bodin and Maeprasart, Vittunyuta and Ouni, Ali and Ishio, Takashi and Matsumoto, Kenichi}, title = {On the Use of Refactoring in Security Vulnerability Fixes: An Exploratory Study on Maven Libraries}, month = jun, year = {2022}, booktitle = {International Conference on Evaluation and Assessment in Software Engineering (EASE)}, pages = {288–293}, numpages = {6} }
Does Coding in Pythonic Zen Peak Performance? Preliminary Experiments of Nine Pythonic Idioms at Scale

Pattara Leelaprute, Bodin Chinthanet, Supatsara Wattanakriengkrai, and 3 more authors

In International Conference on Program Comprehension (ICPC) May 2022

In the field of data science, and for academics in general, the Python programming language is a popular choice, mainly because of its libraries for storing, manipulating, and gaining insight from data. Evidence includes the versatile set of machine learning, data visualization, and manipulation packages used for the ever-growing size of available data. The Zen of Python is a set of guiding design principles that developers use to write acceptable and elegant Python code. Most principles revolve around simplicity. However, with the growing need to process large amounts of data, performance has become a necessity for the Python programmer. The new idea in this paper is to confirm whether writing the Pythonic way peaks performance at scale. As a starting point, we conduct a set of preliminary experiments to evaluate nine Pythonic code examples by comparing the performance of both Pythonic and Non-Pythonic code snippets. Our results reveal that writing in Pythonic idioms may save memory and time. We show that incorporating list comprehension, generator expression, zip, and itertools.zip_longest idioms can save up to 7,000 MB and up to 32.25 seconds. The results open more questions on how they could be utilized in a real-world setting. The replication package, including all scripts and results, is available at https://doi.org/10.5281/zenodo.5712349
@inproceedings{Leelaprute:ICPC2022, author = {Leelaprute, Pattara and Chinthanet, Bodin and Wattanakriengkrai, Supatsara and Kula, Raula Gaikovina and Jaisri, Pongchai and Ishio, Takashi}, title = {Does Coding in Pythonic Zen Peak Performance? Preliminary Experiments of Nine Pythonic Idioms at Scale}, month = may, year = {2022}, booktitle = {International Conference on Program Comprehension (ICPC)}, pages = {575–579}, numpages = {5} }
GitHub repositories with links to academic papers: Public access, traceability, and evolution

Supatsara Wattanakriengkrai, Bodin Chinthanet, Hideaki Hata, and 4 more authors

Journal of Systems and Software (JSS) Jan 2022

Traceability between published scientific breakthroughs and their implementation is essential, especially in the case of open-source scientific software which implements bleeding-edge science in its code. However, aligning the link between GitHub repositories and academic papers can prove difficult, and the current practice of establishing and maintaining such links remains unknown. This paper investigates the role of academic paper references contained in these repositories. We conduct a large-scale study of 20 thousand GitHub repositories that make references to academic papers. We use a mixed-methods approach to identify public access, traceability and evolutionary aspects of the links. Although referencing a paper is not typical, we find that a vast majority of referenced academic papers are public access. These repositories tend to be affiliated with academic communities. More than half of the papers do not link back to any repository. We find that academic papers from top-tier SE venues are not likely to reference a repository, but when they do, they usually link to a GitHub software repository. In a network of arXiv papers and referenced repositories, we find that the most referenced papers are (i) highly-cited in academia and (ii) are referenced by repositories written in different programming languages.
@article{Wattanakriengkrai:JSS2022, title = {GitHub repositories with links to academic papers: Public access, traceability, and evolution}, journal = {Journal of Systems and Software (JSS)}, volume = {183}, pages = {111117}, month = jan, year = {2022}, author = {Wattanakriengkrai, Supatsara and Chinthanet, Bodin and Hata, Hideaki and Kula, Raula Gaikovina and Treude, Christoph and Guo, Jin and Matsumoto, Kenichi}, }
SōjiTantei: Function-Call Reachability Detection of Vulnerable Code for npm Packages

Bodin Chinthanet, Raula Gaikovina Kula, Rodrigo Elizalde Zapata, and 3 more authors

IEICE Transactions on Information and Systems Jan 2022

It has become common practice for software projects to adopt third-party dependencies. Developers are encouraged to update any outdated dependency to remain safe from potential threats of vulnerabilities. In this study, we present an approach to help developers determine whether or not vulnerable code is reachable in JavaScript projects. Our prototype, SōjiTantei, is evaluated in two ways: (i) the accuracy when compared to a manual approach and (ii) a larger-scale analysis of 780 clients from 78 security vulnerability cases. The first evaluation shows that SōjiTantei has a high accuracy of 83.3%, with the analysis taking less than a second per client. The second evaluation reveals that 68 out of the 78 studied vulnerabilities had at least one clean client. The study shows that automation is promising, with the potential for further improvement.
@article{Chinthanet:IEICE2022, title = {SōjiTantei: Function-Call Reachability Detection of Vulnerable Code for npm Packages}, author = {Chinthanet, Bodin and Kula, Raula Gaikovina and Zapata, Rodrigo Elizalde and Ishio, Takashi and Matsumoto, Kenichi and Ihara, Akinori}, journal = {IEICE Transactions on Information and Systems}, volume = {E105.D}, number = {1}, pages = {19-20}, month = jan, year = {2022}, }

2021

Contrasting Third-Party Package Management User Experience

Syful Islam, Raula Gaikovina Kula, Christoph Treude, and 3 more authors

In IEEE International Conference on Software Maintenance and Evolution (ICSME) Sep 2021

The management of third-party package dependencies is crucial to most technology stacks, with package managers acting as brokers to ensure that a verified package is correctly installed, configured, or removed from an application. Diversity in technology stacks has led to dozens of package ecosystems with their own management features. While recent studies have shown that developers struggle to migrate their dependencies, the common assumption is that package ecosystems are used without any issue. In this study, we explore 13 package ecosystems to understand whether their features correlate with the experience of their users. By studying experience through the questions that developers ask on the question-and-answer site Stack Overflow, we find that developer questions are grouped into three themes (i.e., Package management, Input-Output, and Package Usage). Our preliminary analysis indicates that specific features are correlated with the user experience. Our work lays out future directions to investigate the trade-offs involved in designing the ideal package ecosystem.
@inproceedings{Islam:ICSME2021, author = {Islam, Syful and Kula, Raula Gaikovina and Treude, Christoph and Chinthanet, Bodin and Ishio, Takashi and Matsumoto, Kenichi}, booktitle = {IEEE International Conference on Software Maintenance and Evolution (ICSME)}, title = {Contrasting Third-Party Package Management User Experience}, month = sep, year = {2021}, volume = {}, number = {}, pages = {664-668}, }
Lags in the Release, Adoption, and Propagation of Npm Vulnerability Fixes

Bodin Chinthanet, Raula Gaikovina Kula, Shane McIntosh, and 3 more authors

Empirical Software Engineering (EMSE) May 2021

Security vulnerability in third-party dependencies is a growing concern not only for developers of the affected software, but for the risks it poses to an entire software ecosystem, e.g., the Heartbleed vulnerability. Recent studies show that developers are slow to respond to the threat of vulnerability, sometimes taking four to eleven months to act. To ensure quick adoption and propagation of a release that contains the fix (fixing release), we conduct an empirical investigation to identify lags that may occur between the vulnerable release and its fixing release (package-side fixing release). Through a preliminary study of 231 package-side fixing releases of npm projects on GitHub, we observe that a fixing release is rarely released on its own, with up to 85.72% of the bundled commits being unrelated to a fix. We then compare the package-side fixing release with changes on the client side (client-side fixing release). Through an empirical study of the adoption and propagation tendencies of 1,290 package-side fixing releases that impact a network of 1,553,325 releases of npm packages, we find that stale clients require additional migration effort, even if the package-side fixing release was quick (i.e., package-side fixing release type = patch). Furthermore, we show the influence of factors such as the branch that the package-side fixing release lands on and the severity of vulnerability on its propagation. In addition to the lags we identify and characterize, this paper lays the groundwork for future research on how to mitigate propagation lags in an ecosystem.
@article{Chinthanet:EMSE2021, author = {Chinthanet, Bodin and Kula, Raula Gaikovina and McIntosh, Shane and Ishio, Takashi and Ihara, Akinori and Matsumoto, Kenichi}, title = {Lags in the Release, Adoption, and Propagation of Npm Vulnerability Fixes}, year = {2021}, issue_date = {May 2021}, volume = {26}, number = {3}, journal = {Empirical Software Engineering (EMSE)}, month = may, numpages = {28} }

2020

Code-Based Vulnerability Detection in Node.Js Applications: How Far Are We?

Bodin Chinthanet, Serena Elisa Ponta, Henrik Plate, and 4 more authors

In IEEE/ACM International Conference on Automated Software Engineering (ASE) Dec 2020

With one of the largest available collections of reusable packages, the JavaScript runtime environment Node.js is one of the most popular programming applications. With recent work showing evidence that known vulnerabilities are prevalent in both open source and industrial software, we propose and implement a viable code-based vulnerability detection tool for Node.js applications. Our case study lists the challenges encountered while implementing our Node.js vulnerable code detector.
@inproceedings{Chinthanet:ASE2020, author = {Chinthanet, Bodin and Ponta, Serena Elisa and Plate, Henrik and Sabetta, Antonino and Kula, Raula Gaikovina and Ishio, Takashi and Matsumoto, Kenichi}, title = {Code-Based Vulnerability Detection in Node.Js Applications: How Far Are We?}, month = dec, year = {2020}, booktitle = {IEEE/ACM International Conference on Automated Software Engineering (ASE)}, pages = {1199–1203}, numpages = {5} }

2019

2018

Towards Smoother Library Migrations: A Look at Vulnerable Dependency Migrations at Function Level for npm JavaScript Packages

Rodrigo Elizalde Zapata, Raula Gaikovina Kula, Bodin Chinthanet, and 3 more authors

In IEEE International Conference on Software Maintenance and Evolution (ICSME) Sep 2018

It has become common practice for software projects to adopt third-party libraries, allowing developers full access to functions that would otherwise take time and effort to create themselves. Regardless of the migration effort involved, developers are encouraged to maintain their library dependencies by updating any outdated dependency, so as to remain safe from potential threats such as vulnerabilities. Through a manual inspection of a total of 60 client projects from three cases of high severity vulnerabilities, we investigate whether or not clients are really safe from these threats. Surprisingly, our early results show evidence that up to 73.3% of outdated clients were actually safe from the threat. This is the first work to confirm that analysis at the library level is indeed an overestimation. This result paves the path for future studies to empirically investigate and validate this phenomenon, and is a step towards aiding a smoother library migration for client developers.
@inproceedings{Zapata:ICSME2018, author = {Zapata, Rodrigo Elizalde and Kula, Raula Gaikovina and Chinthanet, Bodin and Ishio, Takashi and Matsumoto, Kenichi and Ihara, Akinori}, booktitle = {IEEE International Conference on Software Maintenance and Evolution (ICSME)}, title = {Towards Smoother Library Migrations: A Look at Vulnerable Dependency Migrations at Function Level for npm JavaScript Packages}, month = sep, year = {2018}, pages = {559-563} }

2017

2016

Graph Clustering-Based Emerging Event Detection from Twitter Data Stream

Bundit Manaskasemsak, Bodin Chinthanet, and Arnon Rungsawang

In International Conference on Network, Communication and Computing (ICNCC) Dec 2016

Event detection from online social media is nowadays important to many fields, such as crisis notification, health epidemic identification, and trending topic extraction. To deal with the problem, in this paper we propose a new methodology to capture emerging events from the Twitter data stream. We define a tweet graph representing tweet term vectors as vertices associated by their content similarities. Based on the assumption that an event denotes a set of similar tweets, we employ the Markov clustering algorithm on the tweet graph to group related tweets. Then, connected similar events between consecutive time intervals are classified as an event trend line. Finally, the first of those connected events is considered the emerging event. Performance evaluation of the proposed approach has been done on thirty days of extracted Twitter data stream. The detected emerging events have been studied and evaluated by fifteen volunteers, with 70-80% precision.
@inproceedings{Manaskasemsak:ICNCC2016, author = {Manaskasemsak, Bundit and Chinthanet, Bodin and Rungsawang, Arnon}, title = {Graph Clustering-Based Emerging Event Detection from Twitter Data Stream}, month = dec, year = {2016}, booktitle = {International Conference on Network, Communication and Computing (ICNCC)}, pages = {37–41}, numpages = {5} }
A Review and Comparison of Methods for Determining the Best Analogies in Analogy-Based Software Effort Estimation

Bodin Chinthanet, Passakorn Phannachitta, Yasutaka Kamei, and 4 more authors

In ACM Symposium on Applied Computing (SAC) Apr 2016

Analogy-based effort estimation (ABE) is a commonly used software development effort estimation method. The processes of ABE are based on the reuse of effort values from similar past projects, where the appropriate number of past projects (k value) to be reused is one of the long-standing debates in ABE research studies. To date, many approaches to find this k value have been continually proposed. One important reason for this inconclusive debate is that different studies appear to produce different conclusions about which k value is appropriate. Therefore, in this study, we revisit 8 common approaches to determining the k value that is most appropriate in general situations. With a more robust and comprehensive evaluation methodology using 5 robust error measures subject to the Wilcoxon rank-sum statistical test, we found that conflicting results in the previous studies were not mainly due to the use of different methodologies or different datasets, but because the performance of the different approaches actually varies widely.
@inproceedings{Chinthanet:SAC2016, author = {Chinthanet, Bodin and Phannachitta, Passakorn and Kamei, Yasutaka and Leelaprute, Pattara and Rungsawang, Arnon and Ubayashi, Naoyasu and Matsumoto, Kenichi}, title = {A Review and Comparison of Methods for Determining the Best Analogies in Analogy-Based Software Effort Estimation}, month = apr, year = {2016}, booktitle = {ACM Symposium on Applied Computing (SAC)}, pages = {1554–1557}, numpages = {4} }