Below are articles that use LLMs in their research workflows. You can use the Search option to find examples from your discipline, or to look for specific workflow applications you may be considering. Brief code sketches illustrating several of these workflows follow the table.
Title | Type of Resource | Link to Resource | Date Recorded | Open Science | Use of LLM | Research Discipline(s) | Description of Resource |
---|---|---|---|---|---|---|---|
Conducting Qualitative Interviews with AI | Research Article | Interviews | November 24, 2024 | Preprint | Data Collection | Economics | We introduce a new approach to conducting qualitative interviews by delegating the task of interviewing human subjects to an AI interviewer. Our AI interviewer conducts 381 interviews with human subjects about their reasons for not participating in the stock market. The AI-conducted interviews uncover rich evidence on the underlying factors influencing non-participation in the stock market. Among our main qualitative findings is a prominent role for an active investing mental model. A separate large-scale survey shows that this mental model differs systematically between stock owners and non-owners. We also document systematic differences between factors identified in initial top-of-mind responses and those uncovered in subsequent responses, with mental models consistently emerging later in the interviews. Finally, a follow-up study shows that the interview data predicts economic behavior eight months after being collected, mitigating concerns about “cheap talk” in interviews. Our results demonstrate that AI-conducted interviews can generate rich, high-quality data at a fraction of the cost of human-led interviews. |
Large Language Models in Qualitative Research: Can We Do the Data Justice? | Research Article | Qualitative | November 24, 2024 | Preprint | Data Collection, Data Analysis, Other | Computer Science | Qualitative researchers use tools to collect, sort, and analyze their data. Should qualitative researchers use large language models (LLMs) as part of their practice? LLMs could augment qualitative research, but it is unclear if their use is appropriate, ethical, or aligned with qualitative researchers' goals and values. We interviewed twenty qualitative researchers to investigate these tensions. Many participants see LLMs as promising interlocutors with attractive use cases across the stages of research, but wrestle with their performance and appropriateness. Participants surface concerns regarding the use of LLMs while protecting participant interests, and call attention to an urgent lack of norms and tooling to guide the ethical use of LLMs in research. Given the importance of qualitative methods to human-computer interaction, we use the tensions surfaced by our participants to outline guidelines for researchers considering using LLMs in qualitative research and design principles for LLM-assisted qualitative data analysis tools. |
Generative Agent Simulations of 1,000 People | Research Article | GenAI Agents | November 22, 2024 | Preprint | Data Generation | Computer Science, Other | The promise of human behavioral simulation--general-purpose computational agents that replicate human behavior across domains--could enable broad applications in policymaking and social science. We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals--applying large language models to qualitative interviews about their lives, then measuring how well these agents replicate the attitudes and behaviors of the individuals that they represent. The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers two weeks later, and perform comparably in predicting personality traits and outcomes in experimental replications. Our architecture reduces accuracy biases across racial and ideological groups compared to agents given demographic descriptions. This work provides a foundation for new tools that can help investigate individual and collective behavior. |
LLM-Measure: Generating Valid, Consistent, and Reproducible Text-Based Measures for Social Science Research | Research Article | LLM-Measure | November 9, 2024 | Preprint | Data Analysis | Computer Science, Other | The increasing use of text as data in social science research necessitates the development of valid, consistent, reproducible, and efficient methods for generating text-based concept measures. This paper presents a novel method that leverages the internal hidden states of large language models (LLMs) to generate these concept measures. Specifically, the proposed method learns a concept vector that captures how the LLM internally represents the target concept, then estimates the concept value for text data by projecting the text’s LLM hidden states onto the concept vector. Three replication studies demonstrate the method’s effectiveness in producing highly valid, consistent, and reproducible text-based measures across various social science research contexts, highlighting its potential as a valuable tool for the research community. |
Artificial Intelligence, Scientific Discovery, and Product Innovation | Research Article | Innovation | November 7, 2024 | Open Source | Other | Economics | This paper studies the impact of artificial intelligence on innovation, exploiting the randomized introduction of a new materials discovery technology to 1,018 scientists in the R&D lab of a large U.S. firm. AI-assisted researchers discover 44% more materials, resulting in a 39% increase in patent filings and a 17% rise in downstream product innovation. These compounds possess more novel chemical structures and lead to more radical inventions. However, the technology has strikingly disparate effects across the productivity distribution: while the bottom third of scientists see little benefit, the output of top researchers nearly doubles. Investigating the mechanisms behind these results, I show that AI automates 57% of “idea-generation” tasks, reallocating researchers to the new task of evaluating model-produced candidate materials. Top scientists leverage their domain knowledge to prioritize promising AI suggestions, while others waste significant resources testing false positives. Together, these findings demonstrate the potential of AI-augmented research and highlight the complementarity between algorithms and expertise in the innovative process. Survey evidence reveals that these gains come at a cost, however, as 82% of scientists report reduced satisfaction with their work due to decreased creativity and skill underutilization. |
Using machine learning to automate the collection, transcription, and analysis of verbal-report data | Research Article | Verbal Reports | November 3, 2024 | Preprint | Data Collection, Data Cleaning/Preparation, Data Analysis | Psychology | What people think and say during experiments is important for our understanding of the human mind. However, the collection and analysis of verbal-report data in experiments are relatively costly, and such data are therefore grossly underutilized. Here, we aim to reduce such costs by providing software that will collect, transcribe, and analyze verbal-report data. Verbal data are collected using jsPsych (De Leeuw, 2015), making it suitable for online and lab-based experiments. The transcription and analyses rely on machine-learning methods (e.g., large language models), making them substantially more efficient than current methods using human coders. We demonstrate how to use the software we provide in a case study: a simple memory experiment. This collection of software was made to be modular, so that the various components can be updated or replaced with superior models and new methods can easily be added. It is our sincere hope that this approach popularizes the collection of verbal-report data in psychology experiments. |
Centaur: a foundation model of human cognition | Research Article | Centaur | October 29, 2024 | Preprint | Data Generation | Psychology | Establishing a unified theory of cognition has been a major goal of psychology. While there have been previous attempts to instantiate such theories by building computational models, we currently do not have one model that captures the human mind in its entirety. Here we introduce Centaur, a computational model that can predict and simulate human behavior in any experiment expressible in natural language. We derived Centaur by finetuning a state-of-the-art language model on a novel, large-scale data set called Psych-101. Psych-101 reaches an unprecedented scale, covering trial-by-trial data from over 60,000 participants performing over 10,000,000 choices in 160 experiments. Centaur not only captures the behavior of held-out participants better than existing cognitive models, but also generalizes to new cover stories, structural task modifications, and entirely new domains. Furthermore, we find that the model’s internal representations become more aligned with human neural activity after finetuning. Taken together, Centaur is the first real candidate for a unified model of human cognition. We anticipate that it will have a disruptive impact on the cognitive sciences, challenging the existing paradigm for developing computational models. |
Generative Artificial Intelligence and Evaluating Strategic Decisions | Research Article | Strategic Decisions | October 28, 2024 | Preprint | Data Generation | Business | Strategic decisions are uncertain and often irreversible. Hence, predicting the value of alternatives is important for strategic decision making. We investigate the use of generative artificial intelligence (AI) in evaluating strategic alternatives using business models generated by AI (study 1) or submitted to a competition (study 2). Each study uses a sample of 60 business models and examines agreement in business model rankings made by large language models (LLMs) and those by human experts. We consider multiple LLMs, assumed LLM roles, and prompts. We find that generative AI often produces evaluations that are inconsistent and biased. However, when aggregating evaluations, AI rankings tend to resemble those of human experts. This study highlights the value of generative AI in strategic decision making by providing predictions. |
Prompting Diverse Ideas: Increasing AI Idea Variance | Research Article | Idea Variance | October 28, 2024 | Preprint | Data Generation | Business | Unlike routine tasks where consistency is prized, in creativity and innovation the goal is to create a diverse set of ideas. This paper delves into the burgeoning interest in employing Artificial Intelligence (AI) to enhance the productivity and quality of the idea generation process. While previous studies have found that the average quality of AI ideas is quite high, prior research also has pointed to the inability of AI-based brainstorming to create sufficient dispersion of ideas, which limits novelty and the quality of the overall best idea. Our research investigates methods to increase the dispersion in AI-generated ideas. Using GPT-4, we explore the effect of different prompting methods on Cosine Similarity, the number of unique ideas, and the speed with which the idea space gets exhausted. We do this in the domain of developing new product ideas for college students, priced under $50. In this context, we find that (1) pools of ideas generated by GPT-4 with various plausible prompts are less diverse than ideas generated by groups of human subjects; (2) the diversity of AI-generated ideas can be substantially improved using prompt engineering; and (3) Chain-of-Thought (CoT) prompting leads to the highest diversity of ideas of all prompts we evaluated, coming close to what is achieved by groups of human subjects. It also generated the highest number of unique ideas of any prompt we studied. |
Financial Statement Analysis with Large Language Models | Research Article | Finance analysis | October 28, 2024 | Preprint | Data Analysis | Economics | We investigate whether an LLM can successfully perform financial statement analysis in a way similar to a professional human analyst. We provide standardized and anonymous financial statements to GPT-4 and instruct the model to analyze them to determine the direction of future earnings. Even without any narrative or industry-specific information, the LLM outperforms financial analysts in its ability to predict earnings changes. The LLM exhibits a relative advantage over human analysts in situations when the analysts tend to struggle. Furthermore, we find that the prediction accuracy of the LLM is on par with the performance of a narrowly trained state-of-the-art ML model. LLM prediction does not stem from its training memory. Instead, we find that the LLM generates useful narrative insights about a company's future performance. Lastly, our trading strategies based on GPT-4's predictions yield a higher Sharpe ratio and alphas than strategies based on other models. Taken together, our results suggest that LLMs may take a central role in decision-making. |
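
To make the "Conducting Qualitative Interviews with AI" workflow concrete, here is a minimal sketch of an AI-interviewer loop. It assumes the OpenAI Python SDK, a stand-in model name ("gpt-4o"), and an invented system prompt; the authors' actual interviewer is more elaborate than this.

```python
# Minimal AI-interviewer loop: the LLM asks one question at a time and
# generates follow-ups from the participant's previous answers.
# Model name and interview topic are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a qualitative interviewer studying why people do not invest "
    "in the stock market. Ask one open-ended question at a time, and probe "
    "the participant's previous answer with follow-ups."
)

def run_interview(num_turns: int = 5) -> list[dict]:
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for _ in range(num_turns):
        reply = client.chat.completions.create(
            model="gpt-4o",  # assumed model; any chat model works
            messages=messages,
        )
        question = reply.choices[0].message.content
        print(f"\nInterviewer: {question}")
        answer = input("Participant: ")  # human subject types a response
        messages += [
            {"role": "assistant", "content": question},
            {"role": "user", "content": answer},
        ]
    return messages

if __name__ == "__main__":
    transcript = run_interview()
```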
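The LLM-Measure entry describes learning a concept vector from an LLM's hidden states and scoring texts by projecting onto it. Below is a simplified proxy for that idea, not the paper's exact estimation procedure: the concept direction is taken as the mean difference of hidden states over hand-written contrastive anchor texts (an assumption for illustration), and new texts are scored by projection.

```python
# Simplified proxy for the LLM-Measure idea: estimate a concept direction
# from contrastive anchor texts, then score texts by projecting their
# hidden states onto it. Model choice and anchors are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "gpt2"  # any decoder model with accessible hidden states
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True).eval()

@torch.no_grad()
def hidden_state(text: str) -> torch.Tensor:
    """Mean-pooled last-layer hidden state for one text."""
    ids = tok(text, return_tensors="pt")
    out = model(**ids)
    return out.hidden_states[-1].mean(dim=1).squeeze(0)

# Contrastive anchor texts for the target concept (here: optimism).
high = ["The outlook is excellent and improving.", "We expect strong growth."]
low = ["The outlook is bleak and worsening.", "We expect steep losses."]

concept = (torch.stack([hidden_state(t) for t in high]).mean(0)
           - torch.stack([hidden_state(t) for t in low]).mean(0))
concept = concept / concept.norm()

def concept_score(text: str) -> float:
    """Projection of a text's hidden state onto the concept direction."""
    return float(hidden_state(text) @ concept)

print(concept_score("Management is confident about next year."))
```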
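In the spirit of the verbal-report pipeline (collect, transcribe, analyze), this sketch chains Whisper transcription to an LLM coding step. The file name, model choices, and coding scheme are assumptions; the published toolkit pairs with jsPsych for collection, which is omitted here.

```python
# Transcribe a verbal report with Whisper, then have an LLM assign a
# qualitative code. A minimal sketch, not the authors' full toolkit.
import whisper  # pip install openai-whisper
from openai import OpenAI

def transcribe(path: str) -> str:
    model = whisper.load_model("base")
    return model.transcribe(path)["text"]

def code_response(transcript: str) -> str:
    client = OpenAI()
    prompt = (
        "Code the following verbal report from a memory experiment as one "
        "of: REHEARSAL, IMAGERY, ASSOCIATION, OTHER.\n\n" + transcript
    )
    reply = client.chat.completions.create(
        model="gpt-4o",  # assumed model
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

if __name__ == "__main__":
    text = transcribe("participant_01.wav")  # hypothetical recording
    print(code_response(text))
```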
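The strategic-decisions entry reports that individual LLM evaluations are noisy and biased, yet aggregated rankings resemble expert rankings. The toy simulation below illustrates why averaging ranks helps, using synthetic data and Spearman correlation as the agreement measure (both assumptions, not the paper's setup).

```python
# Why aggregation helps: each simulated LLM run ranks 60 business models
# with independent noise around an "expert" signal; averaging the ranks
# across runs agrees with the expert ranking more than any single run.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_models, n_runs = 60, 8  # 60 business models, 8 LLM evaluation runs

expert_rank = rng.permutation(n_models)  # stand-in for expert consensus
# Each run = expert signal plus independent noise, converted to ranks.
llm_scores = expert_rank + rng.normal(0, 20, size=(n_runs, n_models))
llm_ranks = llm_scores.argsort(axis=1).argsort(axis=1)

single = np.mean([spearmanr(expert_rank, r)[0] for r in llm_ranks])
aggregated = spearmanr(expert_rank, llm_ranks.mean(axis=0))[0]
print(f"mean single-run agreement:    {single:.2f}")
print(f"aggregated-ranking agreement: {aggregated:.2f}")
```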
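For the idea-variance entry, the dispersion of an idea pool can be quantified as the mean pairwise cosine similarity between idea embeddings (lower mean similarity = more diverse), echoing the paper's Cosine Similarity metric. The embedding model below is an assumption; any sentence embedder would do.

```python
# Mean pairwise cosine similarity of an idea pool as a diversity metric.
# Embedding model is an illustrative choice, not the paper's.
import numpy as np
from sentence_transformers import SentenceTransformer

def mean_pairwise_cosine(ideas: list[str]) -> float:
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode(ideas, normalize_embeddings=True)  # unit vectors
    sims = emb @ emb.T                                    # cosine matrix
    upper = sims[np.triu_indices(len(ideas), k=1)]        # unique pairs
    return float(upper.mean())

pool_a = ["A collapsible laundry hamper", "A folding laundry basket"]
pool_b = ["A collapsible laundry hamper", "A dorm-door whiteboard planner"]
print(mean_pairwise_cosine(pool_a))  # near-duplicates -> high similarity
print(mean_pairwise_cosine(pool_b))  # more diverse -> lower similarity
```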
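Finally, the financial-statement entry's setup, standardized anonymized statements in and an earnings direction out, reduces to a single structured prompt. The template, numbers, and model name below are illustrative assumptions, not the paper's exact prompt.

```python
# One-shot sketch: give the model standardized, anonymized statement lines
# and ask for the direction of next-year earnings. All inputs are invented.
from openai import OpenAI

client = OpenAI()

STATEMENT = """
Revenue:              100.0 -> 112.0
Cost of goods sold:    60.0 -> 68.0
Operating income:      15.0 -> 18.5
Net income:             9.0 -> 11.2
"""  # anonymized two-year comparatives, in the spirit of the paper's inputs

prompt = (
    "You are a financial analyst. Based only on the standardized statements "
    "below, will earnings increase or decrease next year? Answer INCREASE "
    "or DECREASE, then give a one-paragraph rationale.\n" + STATEMENT
)
reply = client.chat.completions.create(
    model="gpt-4o",  # assumed model
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)
```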