Articles that discuss the use of LLMs in science.
Title | Type of Resource | Link to Resource | Date Recorded | Open Science | Use of LLM | Research Discipline(s) | Description of Resource |
---|---|---|---|---|---|---|---|
A Primer for Evaluating Large Language Models in Social Science Research | Discussion Article | Primer | February 12, 2025 | Preprint | Other | Psychology | Autoregressive Large Language Models (LLMs) exhibit remarkable conversational and reasoning abilities, and exceptional flexibility across a wide range of tasks. Consequently, LLMs are increasingly used in scientific research to analyze data, generate synthetic data, or even write scientific papers. This trend necessitates that authors follow best practices for conducting and reporting LLM research and that journal reviewers are able to evaluate the quality of works that use LLMs. We provide authors of social scientific research with essential recommendations to ensure replicable and robust results using LLMs. Our recommendations also highlight considerations for reviewers, focusing on methodological rigor, replicability, and validity of results when evaluating studies that use LLMs to automate data processing or simulate human data. We offer practical advice on assessing the appropriateness of LLM applications in submitted studies, emphasizing the need for transparency in methodological reporting and the challenges posed by the non-deterministic and continuously evolving nature of these models. By providing a framework for best practices and critical review, this primer aims to ensure high-quality, innovative research within the evolving landscape of social science studies using LLMs. |
Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation | Research Article, Discussion Article | Transforming | February 10, 2025 | Preprint | Research Design, Science Communication, Other | Computer Science, Any Discipline | With the advent of large multimodal language models, science is now at a threshold of an AI-based technological transformation. Recently, a plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently. This includes all aspects of the research cycle, especially (1) searching for relevant literature; (2) generating research ideas and conducting experimentation; generating (3) text-based and (4) multimodal content (e.g., scientific figures and diagrams); and (5) AI-based automatic peer review. In this survey, we provide an in-depth overview of these exciting recent developments, which promise to fundamentally alter the scientific research process for good. Our survey covers the five aspects outlined above, indicating relevant datasets, methods and results (including evaluation) as well as limitations and scope for future research. Ethical concerns regarding shortcomings of these tools and potential for misuse (fake science, plagiarism, harms to research integrity) take a particularly prominent place in our discussion. We hope that our survey will not only become a reference guide for newcomers to the field but also a catalyst for new AI-based initiatives in the area of "AI4Science". |
What Limits LLM-based Human Simulation: LLMs or Our Design? | Discussion Article | Simulations | February 9, 2025 | Preprint | Data Generation | Computer Science | We argue that advancing LLM-based human simulation requires addressing both LLMs' inherent limitations and simulation framework design challenges. Recent studies have revealed significant gaps between LLM-based human simulations and real-world observations, highlighting these dual challenges. To address these gaps, we present a comprehensive analysis of LLM limitations and simulation design issues, proposing targeted solutions for both aspects. Furthermore, we explore future directions that address both challenges simultaneously, particularly in data collection, LLM generation, and evaluation. To support further research in this field, we provide a curated collection of LLM-based human simulation resources. |
Artificial intelligence and illusions of understanding in scientific research | Discussion Article | Illusions | January 16, 2025 | Open Source | Research Design, Other | Any Discipline | Scientists are enthusiastically imagining ways in which artificial intelligence (AI) tools might improve research. Why are AI tools so attractive and what are the risks of implementing them across the research pipeline? Here we develop a taxonomy of scientists’ visions for AI, observing that their appeal comes from promises to improve productivity and objectivity by overcoming human shortcomings. But proposed AI solutions can also exploit our cognitive limitations, making us vulnerable to illusions of understanding in which we believe we understand more about the world than we actually do. Such illusions obscure the scientific community’s ability to see the formation of scientific monocultures, in which some types of methods, questions and viewpoints come to dominate alternative approaches, making science less innovative and more vulnerable to errors. The proliferation of AI tools in science risks introducing a phase of scientific enquiry in which we produce more but understand less. By analysing the appeal of these tools, we provide a framework for advancing discussions of responsible knowledge production in the age of AI. |
Engineering of Inquiry: The “Transformation” of Social Science through Generative AI | Discussion Article | Social | January 10, 2025 | Preprint | Research Design, Other | Any Discipline | We increasingly read that generative AI will “transform” the social sciences, but little to no work has conceptualized the conditions necessary to fulfill such a promise. We review recent research on generative AI and evaluate its potential to reshape research practices. As the technology advances, generative AI could support various research tasks, including idea generation, data collection, and analysis. However, we discuss three challenges to an optimistic outlook that focuses solely on accelerating research through practical tools and reducing costs through inexpensive “synthetic” data. First, generative AI raises severe concerns about the validity of conclusions drawn from synthetic data about human populations. Second, possible efficiency gains in the research process may be partially offset by new problems introduced by the technology. Third, applications of generative AI have so far focused on enhancing existing methods, with limited efforts to harness the technology’s unique potential to simulate human behavior in social environments. Sociologists could use sociological theories and methods to develop “generative agents.” A new “trading zone” could emerge where social scientists, statisticians, and computer scientists develop new methodologies to facilitate innovative lines of inquiry and produce scientifically valid conclusions. |
Contribution and Challenges of ChatGPT and Similar Generative Artificial Intelligence in Biochemistry, Genetics and Molecular Biology | Discussion Article | Contributions | January 9, 2025 | Preprint | Other | Biology | The incorporation of ChatGPT, an advanced natural language processing model, into the realms of biochemistry, genetics, and molecular biology has revolutionized research and communication within these fields. This study explores the impacts and obstacles associated with ChatGPT in these domains. ChatGPT has made substantial contributions to the accessibility and dissemination of knowledge in biochemistry, genetics, and molecular biology. It simplifies complex scientific literature, offers concise explanations, answers queries, and generates summaries, benefiting researchers, students, and practitioners. Furthermore, it fosters global collaboration by enabling discussions and knowledge sharing among scientists. One of the primary advantages of ChatGPT is its assistance in decoding intricate genomic and proteomic data. It aids in genetic sequence analysis, identifies potential disease markers, and provides suggestions for experimental designs. Additionally, ChatGPT can assist in composing and reviewing research papers, elevating the quality of scientific publications in these fields. However, despite its merits, ChatGPT encounters challenges in the context of biochemistry, genetics, and molecular biology. It may struggle to grasp highly specialized or novel research topics, potentially leading to the dissemination of inaccurate information if not used judiciously. Privacy concerns arise when discussing sensitive genetic or medical data. In sum, ChatGPT brings valuable advantages to the domains of biochemistry, genetics, and molecular biology by simplifying information access, promoting collaborative research, and aiding in data interpretation. Nevertheless, users must remain vigilant about potential inaccuracies and privacy issues. Addressing these challenges as technology evolves will be crucial to fully unlock ChatGPT's potential in advancing research and education within these critical scientific disciplines. |
LLMs and the Risk of Sloppy Science: Navigating the Future of Scientific Inquiry in the Age of Artificial Intelligence | Discussion Article | Sloppy | December 18, 2024 | Preprint | Other | Any Discipline | The emergence of Large Language Models (LLMs) such as GPT-3 represents a paradigm shift in scientific research, offering unparalleled capabilities in data analysis, hypothesis generation, and literature synthesis. However, their integration into research processes raises fundamental philosophical, methodological and ethical concerns. This article critically examines the benefits and risks associated with LLMs through the lenses of philosophy of science, cognitive science, and second-order cybernetics in order to augment a public debate that has until now seemed excessively focused on practical implementation risks rather than categorical errors. We explore the tension between augmenting human research capabilities and the threat of "sloppy science," where the ease of generating scientific content may compromise the quality and reliability of research outputs. |
Challenges in Guardrailing Large Language Models for Science | Discussion Article | Guardrails | December 18, 2024 | Preprint | Research Design, Other | Any Discipline | The rapid development of large language models (LLMs) has transformed the landscape of natural language processing and understanding (NLP/NLU), offering significant benefits across various domains. However, when applied to scientific research, these powerful models exhibit critical failure modes related to scientific integrity and trustworthiness. Existing general-purpose LLM guardrails are insufficient to address these unique challenges in the scientific domain. We propose a comprehensive taxonomic framework for LLM guardrails encompassing four key dimensions: trustworthiness, ethics & bias, safety, and legal compliance. Our framework includes structured implementation guidelines for scientific research applications, incorporating white-box, black-box, and gray-box methodologies. This approach specifically addresses critical challenges in scientific LLM deployment, including temporal sensitivity, knowledge contextualization, conflict resolution, and intellectual property protection. |
Using AI in Grounded Theory research – a proposed framework for a ChatGPT-based research assistant | Discussion Article | Grounded Theory | December 18, 2024 | Preprint | Data Analysis | Sociology | The purpose of this paper is to explore the potential application of ChatGPT in relation to grounded theory. Our focus is on building a case for its usefulness in supporting the research process as an assistant to the researcher, rather than replacing the intellectual rigour needed to conduct credible grounded theory research. To aid this, we present a framework for using ChatGPT to assist researchers’ decision making and analysis. By structuring the analytical process into clear research phases - from initial coding through to visualisation and expansion - and providing specific prompts and instructions for each phase, the framework enables researchers to systematically harness AI capabilities whilst maintaining the methodological rigour and accountability of the researcher in leading this process. We argue that the framework's strength lies in its careful alignment with established grounded theory processes, particularly in its emphasis on constant comparison throughout all analytical phases. As many grounded theory methods are employed in other qualitative research designs, we argue that the proposed framework may have potential for use in a broad range of designs; however, we also suggest that this is the start of new conversations about how researchers can harness AI to assist their decision making and intellectual work, processes which can never be fully replaced. |
Beyond principlism: Practical strategies for ethical AI use in research practices | Discussion Article, Reporting Guidelines | Practical strategies | November 10, 2024 | Preprint | Describing Results, Science Communication | Any Discipline | The rapid adoption of generative artificial intelligence (AI) in scientific research, particularly large language models (LLMs), has outpaced the development of ethical guidelines, leading to a “Triple-Too” problem: too many high-level ethical initiatives, too abstract principles lacking contextual and practical relevance, and too much focus on restrictions and risks over benefits and utilities. Existing approaches—principlism (reliance on abstract ethical principles), formalism (rigid application of rules), and technological solutionism (overemphasis on technological fixes)—offer little practical guidance for addressing ethical challenges of AI in scientific research practices. To bridge the gap between abstract principles and day-to-day research practices, a user-centered, realism-inspired approach is proposed here. It outlines five specific goals for ethical AI use: (1) understanding model training and output, including bias mitigation strategies; (2) respecting privacy, confidentiality, and copyright; (3) avoiding plagiarism and policy violations; (4) applying AI beneficially compared to alternatives; and (5) using AI transparently and reproducibly. Each goal is accompanied by actionable strategies and realistic cases of misuse and corrective measures. I argue that ethical AI application requires evaluating its utility against existing alternatives rather than isolated performance metrics. Additionally, I propose documentation guidelines to enhance transparency and reproducibility in AI-assisted research. Moving forward, we need targeted professional development, training programs, and balanced enforcement mechanisms to promote responsible AI use while fostering innovation. By refining these ethical guidelines and adapting them to emerging AI capabilities, we can accelerate scientific progress without compromising research integrity. |
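A recurring recommendation across these articles, made explicit in the evaluation primer and in the documentation guidelines proposed in "Beyond principlism," is that LLM-assisted studies should report the exact model, sampling settings, and prompts used, since outputs are non-deterministic and hosted models evolve over time. The Python sketch below is a minimal illustration of that practice, not a method taken from any of the articles: it assumes an OpenAI-style client exposing `chat.completions.create`, and the `log_llm_call` helper and `llm_calls.jsonl` path are hypothetical names.

```python
import hashlib
import json
import time

def log_llm_call(client, *, model, prompt, temperature=0.0, seed=1234,
                 log_path="llm_calls.jsonl"):
    """Query an LLM and append a reproducibility record to a JSONL log.

    Assumes `client` exposes an OpenAI-style `chat.completions.create`
    method; adapt the call for whichever SDK a study actually uses.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        seed=seed,  # best-effort determinism; not guaranteed by all providers
    )
    text = response.choices[0].message.content
    record = {
        # UTC timestamp so runs can be matched against provider model updates
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "model": model,
        "temperature": temperature,
        "seed": seed,
        # hash lets readers verify a prompt without re-reading its full text
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt": prompt,
        "output": text,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return text
```

Publishing such a log alongside a paper provides the concrete reporting artifact these guidelines call for: every output can be traced back to a specific model identifier, prompt, and sampling configuration.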