Below are articles that use LLMs in their research workflows. You can use the Search option to find examples from your discipline, or for specific workflow applications you may be considering.

Each entry lists: Title; Type of Resource; Description of Resource; Link to Resource; Open Science; Use of LLM; Research Discipline(s).
Title: Experimental Evidence on Large Language Models
Type of Resource: Research Article
Description: This paper investigates the formation of inflation expectations using Large Language Models (LLMs) based on different text data. Employing a new experimental design, I integrate generative AI with economic analysis to explore the impact of different information treatments on LLMs' responses. Results from six distinct knowledge sources reveal that the type of information accessible to an LLM significantly affects the variance of its generated expectations. LLMs with access to relevant economic documents exhibit lower variance than those given irrelevant information. Furthermore, information treatments, particularly the one related to mortgage rates, influence the updating of LLMs' prior inflation expectations, echoing findings from human surveys. The findings underscore the importance of providing domain-specific knowledge to LLMs and showcase the potential of AI agents in studying expectation formation and decision-making processes in economics.
Open Science: Preprint
Use of LLM: Data Analysis
Research Discipline(s): Economics

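The paper's core design, prompting an LLM under different information treatments and comparing the variance of the inflation expectations it produces, can be sketched roughly as follows. This is a minimal illustration, not the paper's code: `query_llm` is a deterministic stub standing in for a real model call, and the treatment texts are invented.

```python
import statistics

# Hypothetical information treatments an LLM agent might receive
# before being asked for a 12-month inflation forecast.
TREATMENTS = {
    "relevant": "The Fed's latest statement notes inflation easing toward 2%.",
    "irrelevant": "A local team won its third game of the season.",
}

def query_llm(treatment_text: str, seed: int) -> float:
    """Stand-in for an LLM call: returns an inflation expectation (%).
    Stub behaviour mimics the paper's finding: domain-relevant context
    yields tightly clustered answers, irrelevant context dispersed ones."""
    base = 2.5
    scale = 0.1 if "inflation" in treatment_text else 1.0
    noise = (seed % 5 - 2) * scale
    return base + noise

def expectation_variance(treatment: str, n: int = 30) -> float:
    """Sample the (stubbed) model n times and report population variance."""
    samples = [query_llm(TREATMENTS[treatment], s) for s in range(n)]
    return statistics.pvariance(samples)

relevant_var = expectation_variance("relevant")
irrelevant_var = expectation_variance("irrelevant")
assert relevant_var < irrelevant_var  # relevant context lowers dispersion
```

In the actual study the variance comparison is made across six knowledge sources rather than two, and the responses come from a generative model rather than a stub.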
Title: Is that a Guideline? Addressing Learning in Ethics Guidelines Through a PRISMA-ETHICS informed Scoping Review of Guidelines
Type of Resource: Research Article
Description: There have been recent calls for new ethics guidelines regarding the use of artificial intelligence in research. How should we go about developing such ethics guidance documents with respect to emerging contexts such as new technologies, and established domains such as research in education? This paper provides a PRISMA-ETHICS informed scoping review of approaches to ethics guideline development, the structures of ethics guidelines, and their audiences and purposes, particularly in the context of education and AI.
Open Science: Preprint
Use of LLM: Other
Research Discipline(s): Other

Title: Named Entity Recognition for peer-review disambiguation in academic publishing
Type of Resource: Research Article
Description: In recent years, there has been a constant increase in the number of scientific peer-reviewed articles published. Each of these articles has to go through a laborious process, from peer review, through author revision rounds, to the final decision made by the editor-in-chief. Lacking time and being under pressure with diverse research tasks, senior scientists need new tools to automate parts of their activities. In this paper, we propose a new approach based on named entity recognition that is able to annotate review comments in order to extract meaningful information about changes requested by reviewers. This research focuses on deep learning models that are achieving state-of-the-art results in many natural language processing tasks. Exploring the performance of BERT-based and XLNet models on the review comments annotation task, a “review-annotation” model based on SciBERT was trained, able to achieve an F1 score of 0.87. Its usage allows different players in the academic publishing process to better understand the review request. In addition, the correlation of the requested and the actual changes is made possible, allowing the final decision-maker to strengthen the article evaluation.
Research Discipline(s): Computer Science, Education

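The annotation idea, labelling spans of a review comment with entity types such as the change a reviewer requests, can be illustrated with a toy BIO-tagged example. The label scheme and rule-based tagger below are hypothetical stand-ins for illustration only; the paper trains a SciBERT-based model rather than matching keywords.

```python
# Toy tagger illustrating the output format a trained NER model
# (e.g. the paper's SciBERT "review-annotation" model) would produce
# over a review comment: one BIO label per token.
ACTION_WORDS = {"add", "clarify", "remove", "rewrite"}

def tag_review_comment(comment: str):
    """Return (token, BIO-label) pairs; a request span starts at an
    action word and, in this crude sketch, runs to the end of the text."""
    tags = []
    inside = False
    for token in comment.split():
        word = token.strip(".,").lower()
        if word in ACTION_WORDS:
            tags.append((token, "B-REQUEST"))
            inside = True
        elif inside:
            tags.append((token, "I-REQUEST"))
        else:
            tags.append((token, "O"))
    return tags

tags = tag_review_comment("Please clarify the sampling procedure")
```

A trained model learns such span boundaries from annotated data instead of a keyword list, which is what allows it to generalise to unseen phrasings.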
Title: Emergent autonomous scientific research capabilities of large language models
Type of Resource: Research Article
Description: Transformer-based large language models are rapidly advancing in the field of machine learning research, with applications spanning natural language, biology, chemistry, and computer programming. Extreme scaling and reinforcement learning from human feedback have significantly improved the quality of generated text, enabling these models to perform various tasks and reason about their choices. In this paper, we present an Intelligent Agent system that combines multiple large language models for autonomous design, planning, and execution of scientific experiments. We showcase the Agent's scientific research capabilities with three distinct examples, with the most complex being the successful performance of catalyzed cross-coupling reactions. Finally, we discuss the safety implications of such systems and propose measures to prevent their misuse.
Open Science: Preprint
Use of LLM: Other
Research Discipline(s): Computer Science

Title: Machine Learning as a Tool for Hypothesis Generation
Type of Resource: Research Article
Description: While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a procedure that uses machine learning algorithms—and their capacity to notice patterns people might not—to generate novel hypotheses about human behavior. We illustrate the procedure with a concrete application: judge decisions. We begin with a striking fact: up to half of the predictable variation in whom judges jail is explained solely by the pixels in the defendant's mugshot—that is, the predictions from an algorithm built using just facial images. We develop a procedure that allows human subjects to interact with this black-box algorithm to produce hypotheses about what in the face influences judge decisions. The procedure generates hypotheses that are both interpretable and novel: They are not explained by factors implied by existing research (demographics, facial features emphasized by previous psychology studies), nor are they already known (even if just tacitly) to people or even experts. Though these results are specific, our procedure is general. It provides a way to produce novel, interpretable hypotheses from any high-dimensional dataset (e.g. cell phones, satellites, online behavior, news headlines, corporate filings, and high-frequency time series). A central tenet of our paper is that hypothesis generation is in and of itself a valuable activity, and we hope this encourages future work in this largely “pre-scientific” stage of science.
Open Science: Preprint
Use of LLM: Other
Research Discipline(s): Economics

Title: Mathematical discoveries from program search with large language models
Type of Resource: Research Article
Description: Large Language Models (LLMs) have demonstrated tremendous capabilities in solving complex tasks, from quantitative reasoning to understanding natural language. However, LLMs sometimes suffer from confabulations (or hallucinations) which can result in them making plausible but incorrect statements [1,2]. This hinders the use of current large models in scientific discovery. Here we introduce FunSearch (short for searching in the function space), an evolutionary procedure based on pairing a pre-trained LLM with a systematic evaluator. We demonstrate the effectiveness of this approach to surpass the best known results in important problems, pushing the boundary of existing LLM-based approaches [3]. Applying FunSearch to a central problem in extremal combinatorics—the cap set problem—we discover new constructions of large cap sets going beyond the best known ones, both in finite dimensional and asymptotic cases. These represent the first discoveries made for established open problems using LLMs. We showcase the generality of FunSearch by applying it to an algorithmic problem, online bin packing, finding new heuristics that improve upon widely used baselines. In contrast to most computer search approaches, FunSearch searches for programs that describe how to solve a problem, rather than what the solution is. Beyond being an effective and scalable strategy, discovered programs tend to be more interpretable than raw solutions, enabling feedback loops between domain experts and FunSearch, and the deployment of such programs in real-world applications.
Open Science: Open Source
Use of LLM: Data Generation, Data Analysis, Other
Research Discipline(s): Math

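The propose-evaluate loop at the heart of FunSearch, an LLM suggesting candidate programs and a systematic evaluator keeping only those that score better, can be sketched on a toy problem. This is not DeepMind's implementation: the "program" here is a single parameter, and `llm_propose` replaces the LLM's rewriting step with a random perturbation.

```python
import random

def evaluate(program_params: float) -> float:
    """Systematic evaluator: scores a candidate 'program'.
    The toy task is maximising a known function; FunSearch instead
    scores real programs, e.g. cap-set constructions they produce."""
    return -(program_params - 3.0) ** 2  # peak at x = 3

def llm_propose(parent: float, rng: random.Random) -> float:
    """Stand-in for the LLM step: in FunSearch a pre-trained LLM
    rewrites the best programs found so far; here we just perturb."""
    return parent + rng.uniform(-0.5, 0.5)

def funsearch_like(iterations: int = 200, seed: int = 0):
    """Evolutionary loop: propose, evaluate, keep improvements."""
    rng = random.Random(seed)
    best = 0.0
    best_score = evaluate(best)
    for _ in range(iterations):
        candidate = llm_propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:  # evaluator filters out confabulations
            best, best_score = candidate, score
    return best, best_score

best, score = funsearch_like()
```

The evaluator is what makes the approach robust to hallucination: an incorrect or non-improving candidate is simply discarded, so only verified progress accumulates.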
Title: Tweetorial Hooks: Generative AI Tools to Motivate Science on Social Media (science communication)
Type of Resource: Research Article
Description: Communicating science and technology is essential for the public to understand and engage in a rapidly changing world. Tweetorials are an emerging phenomenon where experts explain STEM topics on social media in creative and engaging ways. However, STEM experts struggle to write an engaging "hook" in the first tweet that captures the reader's attention. We propose methods to use large language models (LLMs) to help users scaffold their process of writing a relatable hook for complex scientific topics. We demonstrate that LLMs can help writers find everyday experiences that are relatable and interesting to the public, avoid jargon, and spark curiosity. Our evaluation shows that the system reduces cognitive load and helps people write better hooks. Lastly, we discuss the importance of interactivity with LLMs to preserve the correctness, effectiveness, and authenticity of the writing.
Open Science: Preprint
Use of LLM: Science Communication
Research Discipline(s): Computer Science

Title: LLMs for Science: Usage for Code Generation and Data Analysis
Type of Resource: Research Article
Description: Large language models (LLMs) have been touted to enable increased productivity in many areas of today's work life. Scientific research as an area of work is no exception: the potential of LLM-based tools to assist in the daily work of scientists has become a highly discussed topic across disciplines. However, we are only at the very onset of this subject of study. It is still unclear how the potential of LLMs will materialise in research practice. With this study, we give first empirical evidence on the use of LLMs in the research process. We have investigated a set of use cases for LLM-based tools in scientific research, and conducted a first study to assess to which degree current tools are helpful. In this paper we report specifically on use cases related to software engineering, such as generating application code and developing scripts for data analytics. While we studied seemingly simple use cases, results across tools differ significantly. Our results highlight the promise of LLM-based tools in general, yet we also observe various issues, particularly regarding the integrity of the output these tools provide.
Open Science: Preprint, Open Source, Open Data, Open Code
Use of LLM: Data Generation, Data Analysis
Research Discipline(s): Computer Science

Title: AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction
Type of Resource: Research Article
Description: Large language models (LLMs) that produce human-like responses have begun to revolutionize research practices in the social sciences. This paper shows how we can integrate LLMs and social surveys to accurately predict individual responses to survey questions that were not asked before. We develop a novel methodological framework to personalize LLMs by considering the meaning of survey questions derived from their text, the latent beliefs of individuals inferred from their response patterns, and the temporal contexts across different survey periods through fine-tuning LLMs with survey data. Using the General Social Survey from 1972 to 2021, we show that the fine-tuned model based on Alpaca-7b can predict individual responses to survey questions that are partially missing as well as entirely missing. The remarkable prediction capabilities allow us to fill in missing trends with high confidence and pinpoint when public attitudes changed, such as the rising support for same-sex marriage. We discuss practical constraints, socio-demographic representation, and ethical concerns regarding individual autonomy and privacy when using LLMs for opinion prediction. This study demonstrates that LLMs and surveys can mutually enhance each other's capabilities: LLMs broaden survey potential, while surveys improve the alignment of LLMs.
Open Science: Preprint
Use of LLM: Data Collection, Data Cleaning/Preparation
Research Discipline(s): Computer Science

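The personalization idea, conditioning a model on a respondent's earlier answers plus the survey year to predict an unasked question, can be illustrated with a simple prompt-construction helper. This is a hypothetical sketch of the conditioning step only; the paper fine-tunes Alpaca-7b on GSS data rather than building one-off prompts, and the questions shown are invented examples.

```python
def build_prompt(respondent_answers: dict, target_question: str, year: int) -> str:
    """Assemble a prompt encoding a respondent's latent beliefs
    (their prior answers) plus temporal context, then ask the model
    to predict the missing response."""
    history = "\n".join(
        f"Q: {q}\nA: {a}" for q, a in respondent_answers.items()
    )
    return (
        f"Survey year: {year}\n"
        f"Previous responses from this respondent:\n{history}\n"
        f"Q: {target_question}\nA:"
    )

prompt = build_prompt(
    {"Do you favor gun permits?": "Yes",
     "Should marijuana be legal?": "No"},
    target_question="Do you support same-sex marriage?",
    year=2018,
)
```

Fine-tuning bakes this mapping from (response pattern, period, question text) to answers into the model's weights, which is what lets the paper impute responses that are entirely missing for a given year.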
Title: Human-AI Collaboration in Thematic Analysis using ChatGPT: A User Study and Design Recommendations
Type of Resource: Research Article
Description: Generative artificial intelligence (GenAI) offers promising potential for advancing human-AI collaboration in qualitative research. However, existing work has focused on conventional machine-learning and pattern-based AI systems, and little is known about how researchers interact with GenAI in qualitative research. This work delves into researchers' perceptions of their collaboration with GenAI, specifically ChatGPT. Through a user study involving ten qualitative researchers, we found ChatGPT to be a valuable collaborator for thematic analysis, enhancing coding efficiency, aiding initial data exploration, offering granular quantitative insights, and assisting comprehension for non-native speakers and non-experts. Yet, concerns about its trustworthiness and accuracy, reliability and consistency, limited contextual understanding, and broader acceptance within the research community persist. We contribute five actionable design recommendations to foster effective human-AI collaboration. These include incorporating transparent explanatory mechanisms, enhancing interface and integration capabilities, prioritising contextual understanding and customisation, embedding human-AI feedback loops and iterative functionality, and strengthening trust through validation mechanisms.
Open Science: Preprint
Use of LLM: Data Analysis
Research Discipline(s): Computer Science