Articles that discuss the use of LLMs in science.
Title | Type of Resource | Link to Resource | Date Recorded | Open Science | Use of LLM | Research Discipline(s) | Description of Resource |
---|---|---|---|---|---|---|---|
Artificial intelligence, machine learning, and big data: Improvements to the science of people at work and applications to practice | Discussion Article | Personnel | November 20, 2024 | | Research Design, Other | Business, Psychology | Currently, in the organizational research community, artificial intelligence (AI), machine learning (ML), and big data techniques are being vigorously explored as a set of modern-day approaches contributing to a multidisciplinary science of people at work. This paper discusses more specifically how these sophisticated technologies, methods, and data might together advance the science of people at work through various routes, including improving theory and knowledge, construct measurements, and predicting real-world outcomes. Inspired by the four articles in the current special issue highlighting several of these aspects in essential ways, we also share other possibilities for future organizational research. In addition, we indicate many key practical, ethical, and institutional challenges with research involving AI/ML and big data (i.e., data accessibility, methodological skill gaps, data transparency, privacy, reproducibility, generalizability, and interpretability). Taken together, the opportunities and challenges that lie ahead in the areas of AI and ML promise to reshape organizational research and practice in many exciting and impactful ways. |
A Primer on Word Embeddings: AI Techniques for Text Analysis in Social Work | Discussion Article | Embeddings in Social Work | November 12, 2024 | Preprint | Data Analysis | Psychology, Sociology, Other | Word embeddings represent a transformative technology for analyzing text data in social work research, offering sophisticated tools for understanding case notes, policy documents, research literature, and other text-based materials. This methodological paper introduces word embeddings to social work researchers, explaining how these mathematical representations capture meaning and relationships in text data more effectively than traditional keyword-based approaches. We discuss fundamental concepts, technical foundations, and practical applications, including semantic search, clustering, and retrieval augmented generation. The paper demonstrates how embeddings can enhance research workflows through concrete examples from social work practice, such as analyzing case notes for housing instability patterns and comparing social work licensing examinations across languages. While highlighting the potential of embeddings for advancing social work research, we acknowledge limitations including information loss, training data constraints, and potential biases. We conclude that successfully implementing embedding technologies in social work requires developing domain-specific models, creating accessible tools, and establishing best practices aligned with social work's ethical principles. This integration can enhance our ability to analyze complex patterns in text data while supporting more effective services and interventions. |
The Problems of LLM-generated Data in Social Science Research | Discussion Article | Problems with LLM Data | November 10, 2024 | Open Source | Data Generation | Sociology, Other | Beyond being used as fast and cheap annotators for otherwise complex classification tasks, LLMs have seen growing adoption for generating synthetic data for social science and design research. Researchers have used LLM-generated data for data augmentation and prototyping, as well as for direct analysis where LLMs acted as proxies for real human subjects. LLM-based synthetic data build on fundamentally different epistemological assumptions than previous synthetically generated data and are justified by a different set of considerations. In this essay, we explore the various ways in which LLMs have been used to generate research data and consider the underlying epistemological (and accompanying methodological) assumptions. We challenge some of the assumptions made about LLM-generated data, and we highlight the main challenges that social sciences and humanities need to address if they want to adopt LLMs as synthetic data generators. |
12 Best Practices for Leveraging Generative AI in Experimental Research | Discussion Article | Best Practices | October 21, 2024 | | Other | Economics | We provide twelve best practices and discuss how each practice can help researchers accurately, credibly, and ethically use Generative AI (GenAI) to enhance experimental research. We split the twelve practices into four areas. First, in the pre-treatment stage, we discuss how GenAI can aid in pre-registration procedures, data privacy concerns, and ethical considerations specific to GenAI usage. Second, in the design and implementation stage, we focus on GenAI’s role in identifying new channels of variation, piloting and documentation, and upholding the four exclusion restrictions. Third, in the analysis stage, we explore how prompting and training set bias can impact results as well as necessary steps to ensure replicability. Finally, we discuss forward-looking best practices that are likely to gain importance as GenAI evolves. |
The why, what, and how of AI-based coding in scientific research | Discussion Article | Coding | October 4, 2024 | Preprint | Other | Computer Science | Computer programming (coding) is indispensable for researchers across disciplines, yet it remains challenging to learn and time-consuming to carry out. Generative AI, particularly large language models (LLMs), has the potential to transform coding into intuitive conversations, but best practices and effective workflows are only emerging. We dissect AI-based coding through three key lenses: the nature and role of LLMs in coding (why), six types of coding assistance they provide (what), and a five-step workflow in action with practical implementation strategies (how). Additionally, we address the limitations and future outlook of AI in coding. By offering actionable insights, this framework helps to guide researchers in effectively leveraging AI to enhance coding practices and education, accelerating scientific progress. |
Generative AI in Academic Research: Perspectives and Cultural Norms | Discussion Article, Other | Cornell | September 23, 2024 | Open Source | Other | Other | This report offers perspectives and practical guidelines to the Cornell community, specifically on the use of Generative Artificial Intelligence (GenAI) in the practice and dissemination of academic research. As emphasized in the charge to a Cornell task force representing input across all campuses, the report aims to establish the initial set of perspectives and cultural norms for Cornell researchers, research team leaders, and research administration staff. It is meant as internal advice rather than a set of binding rules. As GenAI policies and guardrails are rapidly evolving, we stress the importance of staying current with the latest developments, and updating procedures and rules governing the use of GenAI tools in research thoughtfully over time. This report was developed within the same 12-month period that GenAI became available to a much wider number of researchers (and citizens) than AI specialists who help create such tools. While the Cornell community is the intended audience, this report is publicly available as a resource for other research communities to use or adapt. No endorsement of specific tools is implied, but specific examples are referenced to illustrate concepts. |
ChatGPT is a Remarkable Tool—For Experts | Discussion Article | Experts | September 23, 2024 | Open Source | Other | Other | This paper investigates the capabilities of ChatGPT as an automated assistant in diverse domains, including scientific writing, mathematics, education, programming, and healthcare. We explore the potential of ChatGPT to enhance productivity, streamline problem-solving processes, and improve writing style. Furthermore, we highlight the potential risks associated with excessive reliance on ChatGPT in these fields. These limitations encompass factors like incorrect and fictitious responses, inaccuracies in code, limited logical reasoning abilities, overconfidence, and critical ethical concerns of copyright and privacy violation. We outline areas and objectives where ChatGPT proves beneficial, applications where it should be used judiciously, and scenarios where its reliability may be limited. In light of observed limitations, and given that the tool's fundamental errors may pose a special challenge for non-experts, ChatGPT should be used with a strategic methodology. By drawing from comprehensive experimental studies, we offer methods and flowcharts for effectively using ChatGPT. Our recommendations emphasize iterative interaction with ChatGPT and independent verification of its outputs. Considering the importance of utilizing ChatGPT judiciously and with expertise, we recommend its usage for experts who are well-versed in the respective [research] domains. |
The advantages and limitations of using ChatGPT to enhance technological research | Discussion Article | Tech Research | September 23, 2024 | Open Source | Research Design, Data Analysis, Describing Results, Science Communication | Other | In 2022, OpenAI made a groundbreaking entrance with the release of ChatGPT, a new online chatbot that allows users to interact with the GPT-3.5 language model. Users can ask questions and converse with ChatGPT by typing into a text field similar to direct messaging software. Then, ChatGPT will generate a response. Users can then either respond to ChatGPT, regenerate the previous response, or “like” the response and give feedback. OpenAI improved the program on March 14th, 2023, with the release of GPT-4, which promised better reasoning ability. Both these iterations of ChatGPT have attracted significant attention from researchers due to the software's remarkably enhanced capabilities compared to earlier versions. Due to its inherent value as a research tool, ChatGPT will likely become a permanent fixture, so a thorough evaluation of ethical and professional boundaries is crucial. In this opinion paper, we explore ChatGPT 4.0 by addressing: a) its capabilities, b) its limitations and weaknesses, and c) strategies for fact-checking its output to ensure high-quality responses. Subsequently, the authors delve into the diverse implications of this software and discuss how it can be optimally employed to advance research in technology and various other domains. |
How will generative AI disrupt data science in drug discovery? | Discussion Article | Drug discovery | September 23, 2024 | Open Source | Other | Biology, Other | In the short few months since the release of ChatGPT [1,2], the potential for large language models (LLMs) and generative artificial intelligence (AI) to disrupt fields as diverse as art, marketing, journalism, copywriting, law and software engineering is already being realized. These technologies use deep learning models trained on enormous amounts of data to generate new texts or images. While trained only to capture statistical regularities in the training data, their ability, once trained, to imitate human language in a convincing way; to generate realistic images, sounds or software; or to solve tasks apparently involving higher cognitive functions such as reasoning has caught the world by surprise. As such, they are also poised to disrupt in many ways how scientists and engineers understand biology and discover and develop new treatments. |
‘ChatGPT et al.’: The ethics of using (generative) artificial intelligence in research and science | Discussion Article | Ethics | September 23, 2024 | Open Source | Other | Computer Science, Data Science, Other | As journal editors, the emergence of ChatGPT prompted us – and others (e.g. Hill-Yardin et al., 2023; Liebrenz et al., 2023; Lund and Wang, 2023; Teubner et al., 2023; Van Dis et al., 2023) – to ask foundational questions about using generative AI in research and science. Specifically: Is it ‘ethical’ to use generative or other AIs in conducting research or for writing academic research papers? In this editorial, we go back to first principles to reflect on the fundamental ethics to apply to using ChatGPT and AI in research and science. Next, we caution that (generative) AI is also at the ‘peak of inflated (hype) expectations’ and discuss eight in-principle issues that AI struggles with, both ethically and practically. We conclude with what this all means for the ethics of using generative AI in research and science. |
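Several of the resources above (notably the word-embeddings primer for social work) describe semantic search: embedding texts as vectors and ranking them by similarity to a query vector rather than by keyword overlap. A minimal sketch of that idea follows. The documents, query, and three-dimensional vectors are all hypothetical, illustrative values; real embedding models produce vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for case-note snippets (made-up values for
# illustration; a real system would obtain these from an embedding model).
documents = {
    "client facing eviction next month": [0.90, 0.10, 0.20],
    "client reports stable employment": [0.10, 0.80, 0.30],
    "family staying in temporary shelter": [0.85, 0.15, 0.25],
}

def semantic_search(query_vector, docs):
    """Rank documents by cosine similarity to the query embedding."""
    ranked = sorted(docs.items(),
                    key=lambda kv: cosine_similarity(query_vector, kv[1]),
                    reverse=True)
    return [text for text, _ in ranked]

# A query embedding about housing instability (again, illustrative values)
# surfaces the housing-related notes first, even with no shared keywords.
query = [0.88, 0.12, 0.22]
print(semantic_search(query, documents)[0])
```

Because ranking depends on vector geometry rather than exact word matches, a query about "housing instability" can retrieve notes mentioning eviction or shelter, which is the pattern-analysis use case the primer describes.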