Below are articles that use LLMs in their research workflows. You can use the Search option to find examples from your discipline, or for specific workflow applications you may be considering.
Title | Type of Resource | Link to Resource | Date Recorded | Open Science | Use of LLM | Research Discipline(s) | Description of Resource |
---|---|---|---|---|---|---|---|
Named Entity Recognition for peer-review disambiguation in academic publishing | Research Article | Named Entity Recognition | February 13, 2024 | | | Computer Science, Education | In recent years, there has been a constant increase in the number of scientific peer-reviewed articles published. Each of these articles has to go through a laborious process, from peer review, through author revision rounds, to the final decision made by the editor-in-chief. Lacking time and being under pressure with diverse research tasks, senior scientists need new tools to automate parts of their activities. In this paper, we propose a new approach based on named entity recognition that is able to annotate review comments in order to extract meaningful information about changes requested by reviewers. This research focuses on deep learning models that are achieving state-of-the-art results in many natural language processing tasks. Exploring the performance of BERT-based and XLNet models on the review comments annotation task, a “review-annotation” model based on SciBERT was trained, able to achieve an F1 score of 0.87. Its usage allows different players in the academic publishing process to better understand the review request. In addition, the correlation of the requested and the actual changes is made possible, allowing the final decision-maker to strengthen the article evaluation. |
Quality of Large Language Model Responses to Radiation Oncology Patient Care Questions | Research Article, Application/Tool, Use Case Example | Quality | November 13, 2024 | | Research Design, Science Communication, Other | Medicine, Public Health | This publication outlines a comprehensive evaluation approach to determine an LLM’s quality of responses to radiation oncology patient care questions using both domain-specific expertise and domain-agnostic metrics. Domain-specific expertise involves evaluating LLM-generated responses through human expert assessments, augmented with Likert scales, while domain-agnostic evaluation utilizes computational quantitative techniques to assess the LLM-generated responses. |
Conducting Qualitative Interviews with AI | Research Article | Interviews | November 24, 2024 | Preprint | Data Collection | Economics | We introduce a new approach to conducting qualitative interviews by delegating the task of interviewing human subjects to an AI interviewer. Our AI interviewer conducts 381 interviews with human subjects about their reasons for not participating in the stock market. The AI-conducted interviews uncover rich evidence on the underlying factors influencing non-participation in the stock market. Among our main qualitative findings is a prominent role for an active investing mental model. A separate large-scale survey shows that this mental model differs systematically between stock owners and non-owners. We also document systematic differences between factors identified in initial top-of-mind responses and those uncovered in subsequent responses, with mental models consistently emerging later in the interviews. Finally, a follow-up study shows that the interview data predicts economic behavior eight months after being collected, mitigating concerns about “cheap talk” in interviews. Our results demonstrate that AI-conducted interviews can generate rich, high-quality data at a fraction of the cost of human-led interviews. |
Large Language Models in Qualitative Research: Can We Do the Data Justice? | Research Article | Qualitative | November 24, 2024 | Preprint | Data Collection, Data Analysis, Other | Computer Science | Qualitative researchers use tools to collect, sort, and analyze their data. Should qualitative researchers use large language models (LLMs) as part of their practice? LLMs could augment qualitative research, but it is unclear if their use is appropriate, ethical, or aligned with qualitative researchers' goals and values. We interviewed twenty qualitative researchers to investigate these tensions. Many participants see LLMs as promising interlocutors with attractive use cases across the stages of research, but wrestle with their performance and appropriateness. Participants surface concerns regarding the use of LLMs while protecting participant interests, and call attention to an urgent lack of norms and tooling to guide the ethical use of LLMs in research. Given the importance of qualitative methods to human-computer interaction, we use the tensions surfaced by our participants to outline guidelines for researchers considering using LLMs in qualitative research and design principles for LLM-assisted qualitative data analysis tools. |
Generative Agent Simulations of 1,000 People | Research Article | GenAI Agents | November 22, 2024 | Preprint | | Computer Science, Other | The promise of human behavioral simulation--general-purpose computational agents that replicate human behavior across domains--could enable broad applications in policymaking and social science. We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals--applying large language models to qualitative interviews about their lives, then measuring how well these agents replicate the attitudes and behaviors of the individuals that they represent. The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers two weeks later, and perform comparably in predicting personality traits and outcomes in experimental replications. Our architecture reduces accuracy biases across racial and ideological groups compared to agents given demographic descriptions. This work provides a foundation for new tools that can help investigate individual and collective behavior. |
LLM-Measure: Generating Valid, Consistent, and Reproducible Text-Based Measures for Social Science Research | Research Article | LLM-Measure | November 9, 2024 | Preprint | Data Analysis | Computer Science, Other | The increasing use of text as data in social science research necessitates the development of valid, consistent, reproducible, and efficient methods for generating text-based concept measures. This paper presents a novel method that leverages the internal hidden states of large language models (LLMs) to generate these concept measures. Specifically, the proposed method learns a concept vector that captures how the LLM internally represents the target concept, then estimates the concept value for text data by projecting the text’s LLM hidden states onto the concept vector. Three replication studies demonstrate the method’s effectiveness in producing highly valid, consistent, and reproducible text-based measures across various social science research contexts, highlighting its potential as a valuable tool for the research community. |
Artificial Intelligence, Scientific Discovery, and Product Innovation | Research Article | Innovation | November 7, 2024 | Open Source | Other | Economics | This paper studies the impact of artificial intelligence on innovation, exploiting the randomized introduction of a new materials discovery technology to 1,018 scientists in the R&D lab of a large U.S. firm. AI-assisted researchers discover 44% more materials, resulting in a 39% increase in patent filings and a 17% rise in downstream product innovation. These compounds possess more novel chemical structures and lead to more radical inventions. However, the technology has strikingly disparate effects across the productivity distribution: while the bottom third of scientists see little benefit, the output of top researchers nearly doubles. Investigating the mechanisms behind these results, I show that AI automates 57% of “idea-generation” tasks, reallocating researchers to the new task of evaluating model-produced candidate materials. Top scientists leverage their domain knowledge to prioritize promising AI suggestions, while others waste significant resources testing false positives. Together, these findings demonstrate the potential of AI-augmented research and highlight the complementarity between algorithms and expertise in the innovative process. Survey evidence reveals that these gains come at a cost, however, as 82% of scientists report reduced satisfaction with their work due to decreased creativity and skill underutilization. |
Using machine learning to automate the collection, transcription, and analysis of verbal-report data | Research Article | Verbal Reports | November 3, 2024 | Preprint | Data Collection, Data Cleaning/Preparation, Data Analysis | Psychology | What people think and say during experiments is important for our understanding of the human mind. However, the collection and analysis of verbal-report data in experiments is relatively costly, and so is grossly underutilized. Here, we aim to reduce such costs by providing software that will collect, transcribe, and analyse verbal-report data. Verbal data is collected using jsPsych (De Leeuw, 2015), making it suitable for online and lab-based experiments. The transcription and analyses rely on machine-learning methods (e.g., large-language models), making them substantially more efficient than current methods using human coders. We demonstrate how to use the software we provide in a case study via a simple memory experiment. This collection of software was made to be modular, so that the various components can be updated and replaced with superior models and new methods easily added. It is our sincere hope that this approach popularizes the collection of verbal-report data in psychology experiments. |
Centaur: a foundation model of human cognition | Research Article | Centaur | October 29, 2024 | Preprint | Data Generation | Psychology | Establishing a unified theory of cognition has been a major goal of psychology. While there have been previous attempts to instantiate such theories by building computational models, we currently do not have one model that captures the human mind in its entirety. Here we introduce Centaur, a computational model that can predict and simulate human behavior in any experiment expressible in natural language. We derived Centaur by finetuning a state-of-the-art language model on a novel, large-scale data set called Psych-101. Psych-101 reaches an unprecedented scale, covering trial-by-trial data from over 60,000 participants performing over 10,000,000 choices in 160 experiments. Centaur not only captures the behavior of held-out participants better than existing cognitive models, but also generalizes to new cover stories, structural task modifications, and entirely new domains. Furthermore, we find that the model’s internal representations become more aligned with human neural activity after finetuning. Taken together, Centaur is the first real candidate for a unified model of human cognition. We anticipate that it will have a disruptive impact on the cognitive sciences, challenging the existing paradigm for developing computational models. |
Generative Artificial Intelligence and Evaluating Strategic Decisions | Research Article | Strategic Decisions | October 28, 2024 | Preprint | Data Generation | Business | Strategic decisions are uncertain and often irreversible. Hence, predicting the value of alternatives is important for strategic decision making. We investigate the use of generative artificial intelligence (AI) in evaluating strategic alternatives using business models generated by AI (study 1) or submitted to a competition (study 2). Each study uses a sample of 60 business models and examines agreement in business model rankings made by large language models (LLMs) and those by human experts. We consider multiple LLMs, assumed LLM roles, and prompts. We find that generative AI often produces evaluations that are inconsistent and biased. However, when aggregating evaluations, AI rankings tend to resemble those of human experts. This study highlights the value of generative AI in strategic decision making by providing predictions. |
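To illustrate one of the workflows listed above: the LLM-Measure entry describes learning a concept vector from an LLM's hidden states and then scoring each text by projecting its hidden states onto that vector. Below is a minimal, hypothetical sketch of that projection step only (the function and variable names are ours, not the paper's, and real hidden states would come from an actual LLM rather than the toy numbers shown here):

```python
import math

def concept_score(hidden_states, concept_vector):
    # Hypothetical sketch of the LLM-Measure projection step:
    # pool the token-level hidden states into one text representation,
    # then project it onto the unit-normalized concept vector.
    norm = math.sqrt(sum(x * x for x in concept_vector))
    unit = [x / norm for x in concept_vector]
    dims = len(concept_vector)
    pooled = [
        sum(state[d] for state in hidden_states) / len(hidden_states)
        for d in range(dims)
    ]
    return sum(p * u for p, u in zip(pooled, unit))

# Toy 2-D illustration: two "token" hidden states, with the learned
# concept direction lying along the first axis.
score = concept_score([[2.0, 1.0], [4.0, -1.0]], [1.0, 0.0])
print(score)  # 3.0 (mean state [3.0, 0.0] projected onto [1.0, 0.0])
```

How the concept vector itself is learned from the LLM's internal representations is the substantive contribution of the paper; this sketch only shows why the resulting measure is cheap and reproducible to compute once that vector is in hand.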