Articles

Below are articles that use LLMs in their research workflows. You can use the Search option to find examples from your discipline, or for specific workflow applications you may be considering.

Title	Type of Resource	Link to Resource	Date Recorded	Open Science	Use of LLM	Research Discipline(s)	Description of Resource
Hypothesis Generation with Large Language Models	Research Article	Hypotheses	December 20, 2024	Preprint	Research Design	Any Discipline	Effective generation of novel hypotheses is instrumental to scientific progress. So far, researchers have been the main powerhouse behind hypothesis generation by painstaking data analysis and thinking (also known as the Eureka moment). In this paper, we examine the potential of large language models (LLMs) to generate hypotheses. We focus on hypothesis generation based on data (i.e., labeled examples). To enable LLMs to handle arbitrarily long contexts, we generate initial hypotheses from a small number of examples and then update them iteratively to improve the quality of hypotheses. Inspired by multi-armed bandits, we design a reward function to inform the exploitation-exploration tradeoff in the update process. Our algorithm is able to generate hypotheses that enable much better predictive performance than few-shot prompting in classification tasks, improving accuracy by 31.7% on a synthetic dataset and by 13.9%, 3.3%, and 24.9% on three real-world datasets. We also outperform supervised learning by 12.8% and 11.2% on two challenging real-world datasets. Furthermore, we find that the generated hypotheses not only corroborate human-verified theories but also uncover new insights for the tasks.
Scaling up the Evaluation of Collaborative Problem Solving: Promises and Challenges of Coding Chat Data with ChatGPT	Research Article	Coding	November 19, 2024	Preprint	Data Analysis	Computer Science	Collaborative problem solving (CPS) is widely recognized as a critical 21st century skill. Efficiently coding communication data is a big challenge in scaling up research on assessing CPS. This paper reports the findings on using ChatGPT to directly code CPS chat data by benchmarking performance across multiple datasets and coding frameworks. We found that ChatGPT-based coding outperformed human coding in tasks where the discussions were characterized by colloquial languages but fell short in tasks where the discussions dealt with specialized scientific terminology and contexts. The findings offer practical guidelines for researchers to develop strategies for efficient and scalable analysis of communication data from CPS tasks.
AI-Augmented Cultural Sociology: Guidelines for LLM-assisted text analysis and an illustrative example	Research Article, Use Case Example	Sociology	December 3, 2024	Preprint	Data Analysis	Sociology	The advent of large language models (LLMs) presents a promising opportunity for how we analyze text and, by extension, can study the role of culture and symbolic meanings in social life. Using an illustrative example focused on the concept of “personalized service” within Michelin-starred restaurants, this research note demonstrates how LLMs can reliably identify complex, multifaceted concepts similarly to a qualitative data analyst, but in a more scalable manner. We extend existing validation approaches, offering guidelines on the amount of manually coded data needed to evaluate LLM-generated outputs, drawing on sampling theory and a data simulation. We also discuss broader applications of LLMs in cultural sociology, such as investigations on established concepts (e.g., cultural consecration) and emerging concepts (e.g., future-oriented deliberation). This discussion underscores that AI-tools can significantly augment the empirical scope of research projects, building on rather than replacing traditional qualitative approaches. Our study ultimately advocates for an optimistic yet cautious engagement with AI-tools in social scientific inquiry, highlighting both their analytic potential and the need for ongoing reflection on their ethical implications.
Automated Literature Review Using NLP Techniques and LLM-Based Retrieval-Augmented Generation	Research Article	Auto Lit	November 28, 2024	Preprint	Research Design	Computer Science	This research presents and compares multiple approaches to automate the generation of literature reviews using several Natural Language Processing (NLP) techniques and retrieval-augmented generation (RAG) with a Large Language Model (LLM). The ever-increasing number of research articles provides a huge challenge for manual literature review. It has resulted in an increased demand for automation. Developing a system capable of automatically generating the literature reviews from only the PDF files as input is the primary objective of this research work. The effectiveness of several Natural Language Processing (NLP) strategies, such as the frequency-based method (spaCy), the transformer model (Simple T5), and retrieval-augmented generation (RAG) with Large Language Model (GPT-3.5-turbo), is evaluated to meet the primary objective. The SciTLDR dataset is chosen for this research experiment and three distinct techniques are utilized to implement three different systems for auto-generating the literature reviews. The ROUGE scores are used for the evaluation of all three systems. Based on the evaluation, the Large Language Model GPT-3.5-turbo achieved the highest ROUGE-1 score, 0.364. The transformer model comes in second place and spaCy is at the last position. Finally, a graphical user interface is created for the best system based on the large language model.
A Computational Method for Measuring "Open Codes" in Qualitative Analysis	Research Article	Open Code	November 27, 2024	Preprint	Data Analysis	Computer Science, Any Discipline	Qualitative analysis is critical to understanding human datasets in many social science disciplines. Open coding is an inductive qualitative process that identifies and interprets "open codes" from datasets. Yet, meeting methodological expectations (such as "as exhaustive as possible") can be challenging. While many machine learning (ML)/generative AI (GAI) studies have attempted to support open coding, few have systematically measured or evaluated GAI outcomes, increasing potential bias risks. Building on Grounded Theory and Thematic Analysis theories, we present a computational method to measure and identify potential biases from "open codes" systematically. Instead of operationalizing human expert results as the "ground truth," our method is built upon a team-based approach between human and machine coders. We experiment with two HCI datasets to establish this method's reliability by 1) comparing it with human analysis, and 2) analyzing its output stability. We present evidence-based suggestions and example workflows for ML/GAI to support open coding.
Why and how to embrace AI such as ChatGPT in your academic life	Research Article, Documentation, Discussion Article, Use Case Example, Tutorial w/o Code, Application/Tool	Why and how to use AI in science	November 10, 2024	Preprint	Research Design, Data Collection, Data Cleaning/Preparation, Data Generation, Dataset Joining, Data Analysis, Describing Results, Web Scraping, Science Communication, Other	Any Discipline	Generative artificial intelligence (AI), including large language models (LLMs), is poised to transform scientific research, enabling researchers to elevate their research productivity. This article presents a how-to guide for employing LLMs in academic settings, focusing on their unique strengths, constraints and implications through the lens of philosophy of science and epistemology. Using ChatGPT as a case study, I identify and elaborate on three attributes contributing to its effectiveness—intelligence, versatility and collaboration—accompanied by tips on crafting effective prompts, practical use cases and a living resource online (https://osf.io/8vpwu/). Next, I evaluate the limitations of generative AI and its implications for ethical use, equality and education. Regarding ethical and responsible use, I argue from technical and epistemic standpoints that there is no need to restrict the scope or nature of AI assistance, provided that its use is transparently disclosed. A pressing challenge, however, lies in detecting fake research, which can be mitigated by embracing open science practices, such as transparent peer review and sharing data, code and materials. Addressing equality, I contend that while generative AI may promote equality for some, it may simultaneously exacerbate disparities for others—an issue with potentially significant yet unclear ramifications as it unfolds. Lastly, I consider the implications for education, advocating for active engagement with LLMs and cultivating students' critical thinking and analytical skills. The how-to guide seeks to empower researchers with the knowledge and resources necessary to effectively harness generative AI while navigating the complex ethical dilemmas intrinsic to its application.
AUTOGEN: A Personalized Large Language Model for Academic Enhancement—Ethics and Proof of Principle	Research Article	Open Access manuscript	October 8, 2023	Open Source	Data Generation	Other	Fine-tuning on authors' previously published papers. In this article, we explore the potential of enhancing academic prose and idea generation by fine-tuning a large language model (here, GPT-3) on one’s own previously published writings: AUTOGEN (“AI Unique Tailored Output GENerator”). We develop, test, and describe three distinct AUTOGEN models trained on the prior scholarly output of three of the current authors (SBM, BDE, JS), with a fourth model trained on the combined works of all three. Our AUTOGEN models demonstrate greater variance in quality than the base GPT-3 model, with many outputs outperforming the base model in format, style, overall quality, and novel idea generation. As proof of principle, we present and discuss examples of AUTOGEN-written sections of existing and hypothetical research papers. We further discuss ethical opportunities, concerns, and open questions associated with personalized academic prose and idea generators. Ethical opportunities for personalized LLMs such as AUTOGEN include increased productivity, preservation of writing styles and cultural traditions, and aiding consensus building. However, ethical concerns arise due to the potential for personalized LLMs to reduce output diversity, violate privacy and intellectual property rights, and facilitate plagiarism or fraud. The use of coauthored or multiple-source trained models further complicates issues surrounding ownership and attribution. Open questions concern a potential credit-blame asymmetry for LLM outputs, the legitimacy of licensing agreements in authorship ascription, and the ethical implications of coauthorship attribution for data contributors. Ensuring the output is sufficiently distinct from the source material is crucial to maintaining ethical standards in academic writing. These opportunities, risks, and open issues highlight the intricate ethical landscape surrounding the use of personalized LLMs in academia. We also discuss open technical questions concerning the integration of AUTOGEN-style personalized LLMs with other LLMs, such as GPT-4, for iterative refinement and improvement of generated text. In conclusion, we argue that AUTOGEN-style personalized LLMs offer significant potential benefits in terms of both prose generation and, to a lesser extent, idea generation. If associated ethical issues are appropriately addressed, AUTOGEN alone or in combination with other LLMs can be seen as a potent form of academic enhancement.
Named Entity Recognition for peer-review disambiguation in academic publishing	Research Article	Named Entity Recognition	February 13, 2024			Computer Science, Education	In recent years, there has been a constant increase in the number of scientific peer-reviewed articles published. Each of these articles has to go through a laborious process, from peer review, through author revision rounds, to the final decision made by the editor-in-chief. Lacking time and being under pressure with diverse research tasks, senior scientists need new tools to automate parts of their activities. In this paper, we propose a new approach based on named entity recognition that is able to annotate review comments in order to extract meaningful information about changes requested by reviewers. This research focuses on deep learning models that are achieving state-of-the-art results in many natural language processing tasks. Exploring the performance of BERT-based and XLNet models on the review comments annotation task, a “review-annotation” model based on SciBERT was trained, able to achieve an F1 score of 0.87. Its usage allows different players in the academic publishing process to better understand the review request. In addition, the correlation of the requested and the actual changes is made possible, allowing the final decision-maker to strengthen the article evaluation.
Quality of Large Language Model Responses to Radiation Oncology Patient Care Questions	Research Article, Use Case Example, Application/Tool	Quality	November 13, 2024		Research Design, Science Communication, Other	Medicine, Public Health	This publication outlines a comprehensive evaluation approach to determine an LLM’s quality of responses to radiation oncology patient care questions using both domain-specific expertise and domain-agnostic metrics. Domain-specific expertise involves evaluating LLM-generated responses through human expert assessments, augmented with Likert scales, while domain-agnostic evaluation utilizes computational quantitative techniques to assess the LLM-generated responses.
Conducting Qualitative Interviews with AI	Research Article	Interviews	November 24, 2024	Preprint	Data Collection	Economics	We introduce a new approach to conducting qualitative interviews by delegating the task of interviewing human subjects to an AI interviewer. Our AI interviewer conducts 381 interviews with human subjects about their reasons for not participating in the stock market. The AI-conducted interviews uncover rich evidence on the underlying factors influencing nonparticipation in the stock market. Among our main qualitative findings is a prominent role for an active investing mental model. A separate large-scale survey shows that this mental model differs systematically between stock owners and non-owners. We also document systematic differences between factors identified in initial top-of-mind responses and those uncovered in subsequent responses, with mental models consistently emerging later in the interviews. Finally, a follow-up study shows that the interview data predicts economic behavior eight months after being collected, mitigating concerns about “cheap talk” in interviews. Our results demonstrate that AI-conducted interviews can generate rich, high-quality data at a fraction of the cost of human-led interviews.
