A curated list of articles discussing the use of large language models (LLMs) in scientific research.
Title | Type of Resource | Link to Resource | Date Recorded | Open Science | Use of LLM | Research Discipline(s) | Description of Resource |
---|---|---|---|---|---|---|---|
Why and how to embrace AI such as ChatGPT in your academic life | Research Article, Documentation, Tutorial w/o Code, Application/Tool, Discussion Article, Use Case Example | Why and how to use AI in science | November 10, 2024 | Preprint | Research Design, Data Collection, Data Cleaning/Preparation, Data Generation, Dataset Joining, Data Analysis, Describing Results, Web Scraping, Science Communication, Other | Any Discipline | Generative artificial intelligence (AI), including large language models (LLMs), is poised to transform scientific research, enabling researchers to elevate their research productivity. This article presents a how-to guide for employing LLMs in academic settings, focusing on their unique strengths, constraints and implications through the lens of philosophy of science and epistemology. Using ChatGPT as a case study, I identify and elaborate on three attributes contributing to its effectiveness—intelligence, versatility and collaboration—accompanied by tips on crafting effective prompts, practical use cases and a living resource online (https://osf.io/8vpwu/). Next, I evaluate the limitations of generative AI and its implications for ethical use, equality and education. Regarding ethical and responsible use, I argue from technical and epistemic standpoints that there is no need to restrict the scope or nature of AI assistance, provided that its use is transparently disclosed. A pressing challenge, however, lies in detecting fake research, which can be mitigated by embracing open science practices, such as transparent peer review and sharing data, code and materials. Addressing equality, I contend that while generative AI may promote equality for some, it may simultaneously exacerbate disparities for others—an issue with potentially significant yet unclear ramifications as it unfolds. Lastly, I consider the implications for education, advocating for active engagement with LLMs and cultivating students' critical thinking and analytical skills. The how-to guide seeks to empower researchers with the knowledge and resources necessary to effectively harness generative AI while navigating the complex ethical dilemmas intrinsic to its application. |
Generative AI for Economic Research: LLMs Learn to Collaborate and Reason | Discussion Article, Use Case Example | Econ Research | November 26, 2024 | Open Source | Other | Economics | Large language models (LLMs) have seen remarkable progress in speed, cost efficiency, accuracy, and the capacity to process larger amounts of text over the past year. This article is a practical guide to update economists on how to use these advancements in their research. The main innovations covered are (i) new reasoning capabilities, (ii) novel workspaces for interactive LLM collaboration such as Claude's Artifacts, ChatGPT's Canvas or Microsoft's Copilot, and (iii) recent improvements in LLM-powered internet search. Incorporating these capabilities in their work allows economists to achieve significant productivity gains. Additionally, I highlight new use cases in promoting research, such as automatically generated blog posts, presentation slides and interviews as well as podcasts via Google's NotebookLM. |
AI-Empowered Human Research Integrating Brain Science and Social Sciences Insights | Discussion Article | Three Collaboration Models | November 21, 2024 | Preprint | Research Design, Other | Psychology | This paper explores the transformative role of artificial intelligence (AI) in enhancing scientific research, particularly in the fields of brain science and social sciences. We analyze the fundamental aspects of human research and argue that it is high time for researchers to transition to human-AI joint research. Building upon this foundation, we propose two innovative research paradigms of human-AI joint research: "AI-Brain Science Research Paradigm" and "AI-Social Sciences Research Paradigm". In these paradigms, we introduce three human-AI collaboration models: AI as a research tool (ART), AI as a research assistant (ARA), and AI as a research participant (ARP). Furthermore, we outline the methods for conducting human-AI joint research. This paper seeks to redefine the collaborative interactions between human researchers and AI systems, setting the stage for future research directions and sparking innovation in this interdisciplinary field. |
Artificial intelligence, machine learning, and big data: Improvements to the science of people at work and applications to practice | Discussion Article | Personnel | November 20, 2024 | | Research Design, Other | Business, Psychology | Currently, in the organizational research community, artificial intelligence (AI), machine learning (ML), and big data techniques are being vigorously explored as a set of modern-day approaches contributing to a multidisciplinary science of people at work. This paper discusses more specifically how these sophisticated technologies, methods, and data might together advance the science of people at work through various routes, including improving theory and knowledge, construct measurements, and predicting real-world outcomes. Inspired by the four articles in the current special issue highlighting several of these aspects in essential ways, we also share other possibilities for future organizational research. In addition, we indicate many key practical, ethical, and institutional challenges with research involving AI/ML and big data (i.e., data accessibility, methodological skill gaps, data transparency, privacy, reproducibility, generalizability, and interpretability). Taken together, the opportunities and challenges that lie ahead in the areas of AI and ML promise to reshape organizational research and practice in many exciting and impactful ways. |
A Primer on Word Embeddings: AI Techniques for Text Analysis in Social Work | Discussion Article | Embeddings in Social Work | November 12, 2024 | Preprint | Data Analysis | Psychology, Sociology, Other | Word embeddings represent a transformative technology for analyzing text data in social work research, offering sophisticated tools for understanding case notes, policy documents, research literature, and other text-based materials. This methodological paper introduces word embeddings to social work researchers, explaining how these mathematical representations capture meaning and relationships in text data more effectively than traditional keyword-based approaches. We discuss fundamental concepts, technical foundations, and practical applications, including semantic search, clustering, and retrieval augmented generation. The paper demonstrates how embeddings can enhance research workflows through concrete examples from social work practice, such as analyzing case notes for housing instability patterns and comparing social work licensing examinations across languages. While highlighting the potential of embeddings for advancing social work research, we acknowledge limitations including information loss, training data constraints, and potential biases. We conclude that successfully implementing embedding technologies in social work requires developing domain-specific models, creating accessible tools, and establishing best practices aligned with social work's ethical principles. This integration can enhance our ability to analyze complex patterns in text data while supporting more effective services and interventions. |
The Problems of LLM-generated Data in Social Science Research | Discussion Article | Problems with LLM Data | November 10, 2024 | Open Source | Data Generation | Sociology, Other | Beyond being used as fast and cheap annotators for otherwise complex classification tasks, LLMs have seen a growing adoption for generating synthetic data for social science and design research. Researchers have used LLM-generated data for data augmentation and prototyping, as well as for direct analysis where LLMs acted as proxies for real human subjects. LLM-based synthetic data build on fundamentally different epistemological assumptions than previous synthetically generated data and are justified by a different set of considerations. In this essay, we explore the various ways in which LLMs have been used to generate research data and consider the underlying epistemological (and accompanying methodological) assumptions. We challenge some of the assumptions made about LLM-generated data, and we highlight the main challenges that social sciences and humanities need to address if they want to adopt LLMs as synthetic data generators. |
12 Best Practices for Leveraging Generative AI in Experimental Research | Discussion Article | Best Practices | October 21, 2024 | | Other | Economics | We provide twelve best practices and discuss how each practice can help researchers accurately, credibly, and ethically use Generative AI (GenAI) to enhance experimental research. We split the twelve practices into four areas. First, in the pre-treatment stage, we discuss how GenAI can aid in pre-registration procedures, data privacy concerns, and ethical considerations specific to GenAI usage. Second, in the design and implementation stage, we focus on GenAI's role in identifying new channels of variation, piloting and documentation, and upholding the four exclusion restrictions. Third, in the analysis stage, we explore how prompting and training set bias can impact results as well as necessary steps to ensure replicability. Finally, we discuss forward-looking best practices that are likely to gain importance as GenAI evolves. |
The why, what, and how of AI-based coding in scientific research | Discussion Article | Coding | October 4, 2024 | Preprint | Other | Computer Science | Computer programming (coding) is indispensable for researchers across disciplines, yet it remains challenging to learn and time-consuming to carry out. Generative AI, particularly large language models (LLMs), has the potential to transform coding into intuitive conversations, but best practices and effective workflows are only emerging. We dissect AI-based coding through three key lenses: the nature and role of LLMs in coding (why), six types of coding assistance they provide (what), and a five-step workflow in action with practical implementation strategies (how). Additionally, we address the limitations and future outlook of AI in coding. By offering actionable insights, this framework helps to guide researchers in effectively leveraging AI to enhance coding practices and education, accelerating scientific progress. |
Generative AI in Academic Research: Perspectives and Cultural Norms | Discussion Article, Other | Cornell | September 23, 2024 | Open Source | Other | Other | This report offers perspectives and practical guidelines to the Cornell community, specifically on the use of Generative Artificial Intelligence (GenAI) in the practice and dissemination of academic research. As emphasized in the charge to a Cornell task force representing input across all campuses, the report aims to establish the initial set of perspectives and cultural norms for Cornell researchers, research team leaders, and research administration staff. It is meant as internal advice rather than a set of binding rules. As GenAI policies and guardrails are rapidly evolving, we stress the importance of staying current with the latest developments, and of thoughtfully updating procedures and rules governing the use of GenAI tools in research over time. This report was developed within the same 12-month period in which GenAI became available to a much wider population of researchers (and citizens) than the AI specialists who help create such tools. While the Cornell community is the intended audience, this report is publicly available as a resource for other research communities to use or adapt. No endorsement of specific tools is implied, but specific examples are referenced to illustrate concepts. |
ChatGPT is a Remarkable Tool—For Experts | Discussion Article | Experts | September 23, 2024 | Open Source | Other | Other | This paper investigates the capabilities of ChatGPT as an automated assistant in diverse domains, including scientific writing, mathematics, education, programming, and healthcare. We explore the potential of ChatGPT to enhance productivity, streamline problem-solving processes, and improve writing style. Furthermore, we highlight the potential risks associated with excessive reliance on ChatGPT in these fields. These limitations encompass factors like incorrect and fictitious responses, inaccuracies in code, limited logical reasoning abilities, overconfidence, and critical ethical concerns of copyright and privacy violation. We outline areas and objectives where ChatGPT proves beneficial, applications where it should be used judiciously, and scenarios where its reliability may be limited. In light of observed limitations, and given that the tool's fundamental errors may pose a special challenge for non-experts, ChatGPT should be used with a strategic methodology. By drawing from comprehensive experimental studies, we offer methods and flowcharts for effectively using ChatGPT. Our recommendations emphasize iterative interaction with ChatGPT and independent verification of its outputs. Considering the importance of utilizing ChatGPT judiciously and with expertise, we recommend its usage for experts who are well-versed in the respective [research] domains. [https://direct.mit.edu/view-large/figure/4705485/dint_a_00235.figure.12.jpg] |
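The word-embeddings primer above (A Primer on Word Embeddings) describes how embeddings capture meaning as vectors and enable semantic search via similarity between those vectors. The idea can be sketched in a few lines of plain Python. The vocabulary and 3-dimensional vectors below are hypothetical values chosen only for illustration; real embedding models (e.g., word2vec or sentence-transformer models) learn vectors with hundreds of dimensions from large corpora.

```python
import math

# Toy "embeddings": hypothetical 3-d vectors for a tiny vocabulary.
# In practice these would come from a trained embedding model.
EMBEDDINGS = {
    "eviction": [0.9, 0.1, 0.0],
    "homeless": [0.8, 0.2, 0.1],
    "housing":  [0.7, 0.3, 0.0],
    "diploma":  [0.1, 0.9, 0.2],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_search(query, k=2):
    """Rank the other vocabulary terms by similarity to the query term."""
    q = EMBEDDINGS[query]
    ranked = sorted(
        ((term, cosine_similarity(q, vec))
         for term, vec in EMBEDDINGS.items() if term != query),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]

# Terms related to housing instability rank above unrelated terms,
# even though none of them share characters with the query.
print(semantic_search("eviction"))
```

This is the property the primer contrasts with keyword matching: "eviction" retrieves "homeless" and "housing" by proximity in the vector space, with no string overlap required.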