Articles that discuss the use of LLMs in science.
Title | Type of Resource | Link to Resource | Date Recorded | Open Science | Use of LLM | Research Discipline(s) | Description of Resource |
---|---|---|---|---|---|---|---|
A Primer on Word Embeddings: AI Techniques for Text Analysis in Social Work | Discussion Article | Embeddings in Social Work | November 12, 2024 | Preprint | Data Analysis | Psychology, Sociology, Other | Word embeddings represent a transformative technology for analyzing text data in social work research, offering sophisticated tools for understanding case notes, policy documents, research literature, and other text-based materials. This methodological paper introduces word embeddings to social work researchers, explaining how these mathematical representations capture meaning and relationships in text data more effectively than traditional keyword-based approaches. We discuss fundamental concepts, technical foundations, and practical applications, including semantic search, clustering, and retrieval-augmented generation. The paper demonstrates how embeddings can enhance research workflows through concrete examples from social work practice, such as analyzing case notes for housing instability patterns and comparing social work licensing examinations across languages. While highlighting the potential of embeddings for advancing social work research, we acknowledge limitations including information loss, training data constraints, and potential biases. We conclude that successfully implementing embedding technologies in social work requires developing domain-specific models, creating accessible tools, and establishing best practices aligned with social work's ethical principles. This integration can enhance our ability to analyze complex patterns in text data while supporting more effective services and interventions. (A minimal semantic-search sketch based on this primer follows the table.) |
The Problems of LLM-generated Data in Social Science Research | Discussion Article | Problems with LLM Data | November 10, 2024 | Open Source | Data Generation | Sociology, Other | Beyond being used as fast and cheap annotators for otherwise complex classification tasks, LLMs have seen a growing adoption for generating synthetic data for social science and design research. Researchers have used LLM-generated data for data augmentation and prototyping, as well as for direct analysis where LLMs acted as proxies for real human subjects. LLM-based synthetic data build on fundamentally different epistemological assumptions than previous synthetically generated data and are justified by a different set of considerations. In this essay, we explore the various ways in which LLMs have been used to generate research data and consider the underlying epistemological (and accompanying methodological) assumptions. We challenge some of the assumptions made about LLM-generated data, and we highlight the main challenges that social sciences and humanities need to address if they want to adopt LLMs as synthetic data generators. |
Why and how to embrace AI such as ChatGPT in your academic life | Research Article, Documentation, Tutorial w/o Code, Application/Tool, Discussion Article, Use Case Example | Why and how to use AI in science | November 10, 2024 | Preprint | Research Design, Data Collection, Data Cleaning/Preparation, Data Generation, Dataset Joining, Data Analysis, Describing Results, Web Scraping, Science Communication, Other | Anthropology, Biology, Business, Chemistry, Computer Science, Data Science, Education, Economics, Engineering, Geography, History, International Affairs, Math, Medicine, Languages, Law, Philosophy, Political Science, Psychology, Public Health, Sociology, Statistics, Urban Planning, Other | Generative artificial intelligence (AI), including large language models (LLMs), is poised to transform scientific research, enabling researchers to elevate their research productivity. This article presents a how-to guide for employing LLMs in academic settings, focusing on their unique strengths, constraints and implications through the lens of philosophy of science and epistemology. Using ChatGPT as a case study, I identify and elaborate on three attributes contributing to its effectiveness—intelligence, versatility and collaboration—accompanied by tips on crafting effective prompts, practical use cases and a living resource online (https://osf.io/8vpwu/). Next, I evaluate the limitations of generative AI and its implications for ethical use, equality and education. Regarding ethical and responsible use, I argue from technical and epistemic standpoints that there is no need to restrict the scope or nature of AI assistance, provided that its use is transparently disclosed. A pressing challenge, however, lies in detecting fake research, which can be mitigated by embracing open science practices, such as transparent peer review and sharing data, code and materials. Addressing equality, I contend that while generative AI may promote equality for some, it may simultaneously exacerbate disparities for others—an issue with potentially significant yet unclear ramifications as it unfolds. Lastly, I consider the implications for education, advocating for active engagement with LLMs and cultivating students' critical thinking and analytical skills. The how-to guide seeks to empower researchers with the knowledge and resources necessary to effectively harness generative AI while navigating the complex ethical dilemmas intrinsic to its application. |
Towards an AI policy framework in scholarly publishing | Documentation, Application/Tool, Discussion Article | AI publishing policy | November 10, 2024 | Preprint | Science Communication, Other | Anthropology, Biology, Business, Chemistry, Computer Science, Data Science, Education, Economics, Engineering, Geography, History, International Affairs, Math, Medicine, Languages, Law, Philosophy, Political Science, Psychology, Public Health, Sociology, Statistics, Urban Planning, Other | The rapid adoption of artificial intelligence (AI) tools in academic research raises pressing ethical concerns. I examine major publishing policies in science and medicine, uncovering inconsistencies and limitations in guiding AI usage. To encourage responsible AI integration while upholding transparency, I propose an enabling framework with author and reviewer policy templates. |
How to write effective prompts for large language models | Documentation, Tutorial w/o Code, Application/Tool, Discussion Article | Prompt engineering | November 10, 2024 | Preprint | Research Design, Data Collection, Data Cleaning/Preparation, Data Generation, Describing Results, Science Communication, Other | Anthropology, Biology, Business, Chemistry, Computer Science, Data Science, Education, Economics, Engineering, Geography, History, International Affairs, Math, Medicine, Languages, Law, Philosophy, Political Science, Psychology, Public Health, Sociology, Statistics, Urban Planning, Other | Effectively engaging with large language models is becoming increasingly vital as they proliferate across research landscapes. This Comment presents a practical guide for understanding their capabilities and limitations, along with strategies for crafting well-structured queries, to extract maximum utility from these artificial intelligence tools. |
Techniques for supercharging academic writing with generative AI | Documentation, Tutorial w/ Code, Tutorial w/o Code, Application/Tool, Discussion Article, Use Case Example, Reporting Guidelines | AI-based writing | November 10, 2024 | Preprint | Describing Results, Science Communication | Anthropology, Biology, Business, Chemistry, Computer Science, Data Science, Education, Economics, Engineering, Geography, History, International Affairs, Math, Medicine, Languages, Law, Philosophy, Political Science, Psychology, Public Health, Sociology, Statistics, Urban Planning, Other | Generalist large language models can elevate the quality and efficiency of academic writing. |
Beyond principlism: Practical strategies for ethical AI use in research practices | Discussion Article, Reporting Guidelines | Practical strategies for ethical AI use in research practices | November 10, 2024 | Preprint | Describing Results, Science Communication | Anthropology, Biology, Business, Chemistry, Computer Science, Data Science, Education, Economics, Engineering, Geography, History, International Affairs, Math, Medicine, Languages, Law, Philosophy, Political Science, Psychology, Public Health, Sociology, Statistics, Urban Planning, Other | The rapid adoption of generative artificial intelligence (AI) in scientific research, particularly large language models (LLMs), has outpaced the development of ethical guidelines, leading to a “Triple-Too” problem: too many high-level ethical initiatives, too abstract principles lacking contextual and practical relevance, and too much focus on restrictions and risks over benefits and utilities. Existing approaches—principlism (reliance on abstract ethical principles), formalism (rigid application of rules), and technological solutionism (overemphasis on technological fixes)—offer little practical guidance for addressing ethical challenges of AI in scientific research practices. To bridge the gap between abstract principles and day-to-day research practices, a user-centered, realism-inspired approach is proposed here. It outlines five specific goals for ethical AI use: (1) understanding model training and output, including bias mitigation strategies; (2) respecting privacy, confidentiality, and copyright; (3) avoiding plagiarism and policy violations; (4) applying AI beneficially compared to alternatives; and (5) using AI transparently and reproducibly. Each goal is accompanied by actionable strategies and realistic cases of misuse and corrective measures. I argue that ethical AI application requires evaluating its utility against existing alternatives rather than isolated performance metrics. Additionally, I propose documentation guidelines to enhance transparency and reproducibility in AI-assisted research. Moving forward, we need targeted professional development, training programs, and balanced enforcement mechanisms to promote responsible AI use while fostering innovation. By refining these ethical guidelines and adapting them to emerging AI capabilities, we can accelerate scientific progress without compromising research integrity. |
12 Best Practices for Leveraging Generative AI in Experimental Research | Discussion Article | Best Practices | October 21, 2024 | | Other | Economics | We provide twelve best practices and discuss how each practice can help researchers accurately, credibly, and ethically use Generative AI (GenAI) to enhance experimental research. We split the twelve practices into four areas. First, in the pre-treatment stage, we discuss how GenAI can aid in pre-registration procedures, data privacy concerns, and ethical considerations specific to GenAI usage. Second, in the design and implementation stage, we focus on GenAI’s role in identifying new channels of variation, piloting and documentation, and upholding the four exclusion restrictions. Third, in the analysis stage, we explore how prompting and training set bias can impact results as well as necessary steps to ensure replicability. Finally, we discuss forward-looking best practices that are likely to gain importance as GenAI evolves. |
The why, what, and how of AI-based coding in scientific research | Discussion Article | Coding | October 4, 2024 | Preprint | Other | Computer Science | Computer programming (coding) is indispensable for researchers across disciplines, yet it remains challenging to learn and time-consuming to carry out. Generative AI, particularly large language models (LLMs), has the potential to transform coding into intuitive conversations, but best practices and effective workflows are only emerging. We dissect AI-based coding through three key lenses: the nature and role of LLMs in coding (why), six types of coding assistance they provide (what), and a five-step workflow in action with practical implementation strategies (how). Additionally, we address the limitations and future outlook of AI in coding. By offering actionable insights, this framework helps to guide researchers in effectively leveraging AI to enhance coding practices and education, accelerating scientific progress. |
Generative AI in Academic Research: Perspectives and Cultural Norms | Discussion Article, Other | Cornell | September 23, 2024 | Open Source | Other | Other | This report offers perspectives and practical guidelines to the Cornell community, specifically on the use of Generative Artificial Intelligence (GenAI) in the practice and dissemination of academic research. As emphasized in the charge to a Cornell task force representing input across all campuses, the report aims to establish an initial set of perspectives and cultural norms for Cornell researchers, research team leaders, and research administration staff. It is meant as internal advice rather than a set of binding rules. As GenAI policies and guardrails are rapidly evolving, we stress the importance of staying current with the latest developments, and of thoughtfully updating procedures and rules governing the use of GenAI tools in research over time. This report was developed during the 12-month period in which GenAI became available to a far broader population of researchers (and citizens) than the AI specialists who help create such tools. While the Cornell community is the intended audience, this report is publicly available as a resource for other research communities to use or adapt. No endorsement of specific tools is implied, but specific examples are referenced to illustrate concepts. |
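
Several entries above describe concrete techniques; the embeddings primer in the first row, for example, discusses semantic search over text such as case notes. Below is a minimal sketch of that idea, not taken from the article itself: it assumes the `sentence-transformers` package and its `all-MiniLM-L6-v2` model (both illustrative choices), embeds documents and a query in the same vector space, and ranks the documents by cosine similarity.

```python
# Minimal semantic-search sketch: embed documents and a query,
# then rank documents by cosine similarity to the query.
# Assumes: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Small general-purpose embedding model (an assumed, illustrative choice).
model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy stand-ins for case notes; real use would load de-identified text.
documents = [
    "Client reports eviction notice and is staying with relatives.",
    "Session focused on medication adherence and follow-up scheduling.",
    "Client moved twice this month and is couch surfing.",
]
query = "housing instability"

doc_embeddings = model.encode(documents, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every document.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]

# Print documents from most to least similar to the query.
for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(f"{score:.3f}  {doc}")
```

The other applications the primer names build on the same vector representations: clustering groups the document embeddings, and retrieval-augmented generation passes the top-ranked documents to an LLM as context for a grounded answer.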