Articles that discuss the use of LLMs in science.
Title | Type of Resource | Link to Resource | Date Recorded | Open Science | Use of LLM | Research Discipline(s) | Description of Resource |
---|---|---|---|---|---|---|---|
Artificial intelligence, machine learning, and big data: Improvements to the science of people at work and applications to practice | Discussion Article | Personnel | November 20, 2024 | | Research Design, Other | Business, Psychology | Currently, in the organizational research community, artificial intelligence (AI), machine learning (ML), and big data techniques are being vigorously explored as a set of modern-day approaches contributing to a multidisciplinary science of people at work. This paper discusses more specifically how these sophisticated technologies, methods, and data might together advance the science of people at work through various routes, including improving theory and knowledge, construct measurements, and predicting real-world outcomes. Inspired by the four articles in the current special issue highlighting several of these aspects in essential ways, we also share other possibilities for future organizational research. In addition, we indicate many key practical, ethical, and institutional challenges with research involving AI/ML and big data (i.e., data accessibility, methodological skill gaps, data transparency, privacy, reproducibility, generalizability, and interpretability). Taken together, the opportunities and challenges that lie ahead in the areas of AI and ML promise to reshape organizational research and practice in many exciting and impactful ways. |
A Primer on Word Embeddings: AI Techniques for Text Analysis in Social Work | Discussion Article | Embeddings in Social Work | November 12, 2024 | Preprint | Data Analysis | Psychology, Sociology, Other | Word embeddings represent a transformative technology for analyzing text data in social work research, offering sophisticated tools for understanding case notes, policy documents, research literature, and other text-based materials. This methodological paper introduces word embeddings to social work researchers, explaining how these mathematical representations capture meaning and relationships in text data more effectively than traditional keyword-based approaches. We discuss fundamental concepts, technical foundations, and practical applications, including semantic search, clustering, and retrieval augmented generation. The paper demonstrates how embeddings can enhance research workflows through concrete examples from social work practice, such as analyzing case notes for housing instability patterns and comparing social work licensing examinations across languages. While highlighting the potential of embeddings for advancing social work research, we acknowledge limitations including information loss, training data constraints, and potential biases. We conclude that successfully implementing embedding technologies in social work requires developing domain-specific models, creating accessible tools, and establishing best practices aligned with social work's ethical principles. This integration can enhance our ability to analyze complex patterns in text data while supporting more effective services and interventions. |
The Problems of LLM-generated Data in Social Science Research | Discussion Article | Problems with LLM Data | November 10, 2024 | Open Source | Data Generation | Sociology, Other | Beyond being used as fast and cheap annotators for otherwise complex classification tasks, LLMs have seen growing adoption for generating synthetic data for social science and design research. Researchers have used LLM-generated data for data augmentation and prototyping, as well as for direct analysis where LLMs acted as proxies for real human subjects. LLM-based synthetic data build on fundamentally different epistemological assumptions than previous synthetically generated data and are justified by a different set of considerations. In this essay, we explore the various ways in which LLMs have been used to generate research data and consider the underlying epistemological (and accompanying methodological) assumptions. We challenge some of the assumptions made about LLM-generated data, and we highlight the main challenges that social sciences and humanities need to address if they want to adopt LLMs as synthetic data generators. |
12 Best Practices for Leveraging Generative AI in Experimental Research | Discussion Article | Best Practices | October 21, 2024 | | Other | Economics | We provide twelve best practices and discuss how each practice can help researchers accurately, credibly, and ethically use Generative AI (GenAI) to enhance experimental research. We split the twelve practices into four areas. First, in the pre-treatment stage, we discuss how GenAI can aid in pre-registration procedures, data privacy concerns, and ethical considerations specific to GenAI usage. Second, in the design and implementation stage, we focus on GenAI’s role in identifying new channels of variation, piloting and documentation, and upholding the four exclusion restrictions. Third, in the analysis stage, we explore how prompting and training set bias can impact results as well as necessary steps to ensure replicability. Finally, we discuss forward-looking best practices that are likely to gain importance as GenAI evolves. |
The why, what, and how of AI-based coding in scientific research | Discussion Article | Coding | October 4, 2024 | Preprint | Other | Computer Science | Computer programming (coding) is indispensable for researchers across disciplines, yet it remains challenging to learn and time-consuming to carry out. Generative AI, particularly large language models (LLMs), has the potential to transform coding into intuitive conversations, but best practices and effective workflows are only emerging. We dissect AI-based coding through three key lenses: the nature and role of LLMs in coding (why), six types of coding assistance they provide (what), and a five-step workflow in action with practical implementation strategies (how). Additionally, we address the limitations and future outlook of AI in coding. By offering actionable insights, this framework helps to guide researchers in effectively leveraging AI to enhance coding practices and education, accelerating scientific progress. |
Generative AI in Academic Research: Perspectives and Cultural Norms | Discussion Article, Other | Cornell | September 23, 2024 | Open Source | Other | Other | This report offers perspectives and practical guidelines to the Cornell community, specifically on the use of Generative Artificial Intelligence (GenAI) in the practice and dissemination of academic research. As emphasized in the charge to a Cornell task force representing input across all campuses, the report aims to establish the initial set of perspectives and cultural norms for Cornell researchers, research team leaders, and research administration staff. It is meant as internal advice rather than a set of binding rules. As GenAI policies and guardrails are rapidly evolving, we stress the importance of staying current with the latest developments, and updating procedures and rules governing the use of GenAI tools in research thoughtfully over time. This report was developed within the same 12-month period that GenAI became available to a much wider number of researchers (and citizens) than AI specialists who help create such tools. While the Cornell community is the intended audience, this report is publicly available as a resource for other research communities to use or adapt. No endorsement of specific tools is implied, but specific examples are referenced to illustrate concepts. |
ChatGPT is a Remarkable Tool—For Experts | Discussion Article | Experts | September 23, 2024 | Open Source | Other | Other | This paper investigates the capabilities of ChatGPT as an automated assistant in diverse domains, including scientific writing, mathematics, education, programming, and healthcare. We explore the potential of ChatGPT to enhance productivity, streamline problem-solving processes, and improve writing style. Furthermore, we highlight the potential risks associated with excessive reliance on ChatGPT in these fields. These limitations encompass factors like incorrect and fictitious responses, inaccuracies in code, limited logical reasoning abilities, overconfidence, and critical ethical concerns of copyright and privacy violation. We outline areas and objectives where ChatGPT proves beneficial, applications where it should be used judiciously, and scenarios where its reliability may be limited. In light of observed limitations, and given that the tool's fundamental errors may pose a special challenge for non-experts, ChatGPT should be used with a strategic methodology. By drawing from comprehensive experimental studies, we offer methods and flowcharts for effectively using ChatGPT. Our recommendations emphasize iterative interaction with ChatGPT and independent verification of its outputs. Considering the importance of utilizing ChatGPT judiciously and with expertise, we recommend its usage for experts who are well-versed in the respective [research] domains. |
The advantages and limitations of using ChatGPT to enhance technological research | Discussion Article | Tech Research | September 23, 2024 | Open Source | Research Design, Data Analysis, Describing Results, Science Communication | Other | In 2022, OpenAI made a groundbreaking entrance with the release of ChatGPT, a new online chatbot that allows users to interact with the GPT-3.5 language model. Users can ask questions and converse with ChatGPT by typing into a text field similar to direct messaging software. Then, ChatGPT will generate a response. Users can then either respond to ChatGPT, regenerate the previous response, or “like” the response and give feedback. OpenAI improved the program on March 14th, 2023, with the release of GPT-4, which promised better reasoning ability. Both these iterations of ChatGPT have attracted significant attention from researchers due to the software's remarkably enhanced capabilities compared to earlier versions. Due to its inherent value as a research tool, ChatGPT will likely become a permanent fixture, so a thorough evaluation of ethical and professional boundaries is crucial. In this opinion paper, we explore ChatGPT 4.0 by addressing: a) its capabilities, b) its limitations and weaknesses, and c) strategies for fact-checking its output to ensure high-quality responses. Subsequently, the authors delve into the diverse implications of this software and discuss how it can be optimally employed to advance research in technology and various other domains. |
How will generative AI disrupt data science in drug discovery? | Discussion Article | Drug discovery | September 23, 2024 | Open Source | Other | Biology, Other | In the short few months since the release of ChatGPT [1,2], the potential for large language models (LLMs) and generative artificial intelligence (AI) to disrupt fields as diverse as art, marketing, journalism, copywriting, law and software engineering is already being realized. These technologies use deep learning models trained on enormous amounts of data to generate new texts or images. While trained only to capture statistical regularities in the training data, their ability, once trained, to imitate human language in a convincing way; to generate realistic images, sounds or software; or to solve tasks apparently involving higher cognitive functions such as reasoning has caught the world by surprise. As such, they are also poised to disrupt in many ways how scientists and engineers understand biology and discover and develop new treatments. |
‘ChatGPT et al.’: The ethics of using (generative) artificial intelligence in research and science | Discussion Article | Ethics | September 23, 2024 | Open Source | Other | Computer Science, Data Science, Other | As journal editors, the emergence of ChatGPT prompted us – and others (e.g. Hill-Yardin et al., 2023; Liebrenz et al., 2023; Lund and Wang, 2023; Teubner et al., 2023; Van Dis et al., 2023) – to ask foundational questions about using generative AI in research and science. Specifically: Is it ‘ethical’ to use generative or other AIs in conducting research or for writing academic research papers? In this editorial, we go back to first principles to reflect on the fundamental ethics to apply to using ChatGPT and AI in research and science. Next, we caution that (generative) AI is also at the ‘peak of inflated (hype) expectations’ and discuss eight in-principle issues that AI struggles with, both ethically and practically. We conclude with what this all means for the ethics of using generative AI in research and science. |
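Several of the resources above (notably the word-embeddings primer for social work) describe semantic search: embedding texts as vectors and ranking them by similarity to a query vector rather than by keyword overlap. A minimal sketch of that idea follows. The documents, query, and three-dimensional vectors are all hypothetical, illustrative values; real embedding models produce vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for case-note snippets (made-up values for
# illustration; a real system would obtain these from an embedding model).
documents = {
    "client facing eviction next month": [0.90, 0.10, 0.20],
    "client reports stable employment": [0.10, 0.80, 0.30],
    "family staying in temporary shelter": [0.85, 0.15, 0.25],
}

def semantic_search(query_vector, docs):
    """Rank documents by cosine similarity to the query embedding."""
    ranked = sorted(docs.items(),
                    key=lambda kv: cosine_similarity(query_vector, kv[1]),
                    reverse=True)
    return [text for text, _ in ranked]

# A query embedding about housing instability (again, illustrative values)
# surfaces the housing-related notes first, even with no shared keywords.
query = [0.88, 0.12, 0.22]
print(semantic_search(query, documents)[0])
```

Because ranking depends on vector geometry rather than exact word matches, a query about "housing instability" can retrieve notes mentioning eviction or shelter, which is the pattern-analysis use case the primer describes.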