Below are articles that use LLMs in their research workflows. Use the Search option to find examples from your discipline or for the specific workflow applications you are considering.
Title | Type of Resource | Link to Resource | Date Recorded | Open Science | Use of LLM | Research Discipline(s) | Description of Resource |
---|---|---|---|---|---|---|---|
Rise of Generative Artificial Intelligence in Science | Research Article | Rise | January 9, 2025 | Preprint | Other | Any Discipline | Generative Artificial Intelligence (GenAI, generative AI) has rapidly become available as a tool in scientific research. To explore the use of generative AI in science, we conduct an empirical analysis using OpenAlex. Analyzing GenAI publications and other AI publications from 2017 to 2023, we profile growth patterns, the diffusion of GenAI publications across fields of study, and the geographical spread of scientific research on generative AI. We also investigate team size and international collaborations to explore whether GenAI, as an emerging scientific research area, shows different collaboration patterns compared to other AI technologies. The results indicate that generative AI has experienced rapid growth and increasing presence in scientific publications. The use of GenAI now extends beyond computer science to other scientific research domains. Over the study period, U.S. researchers contributed nearly two-fifths of global GenAI publications. The U.S. is followed by China, with several small and medium-sized advanced economies demonstrating relatively high levels of GenAI deployment in their research publications. Although scientific research overall is becoming increasingly specialized and collaborative, our results suggest that GenAI research groups tend to have slightly smaller team sizes than those found in other AI fields. Furthermore, notwithstanding recent geopolitical tensions, GenAI research continues to exhibit levels of international collaboration comparable to other AI technologies. |
Hypothesis Generation with Large Language Models | Research Article | Hypotheses | December 20, 2024 | Preprint | Research Design | Any Discipline | Effective generation of novel hypotheses is instrumental to scientific progress. So far, researchers have been the main powerhouse behind hypothesis generation by painstaking data analysis and thinking (also known as the Eureka moment). In this paper, we examine the potential of large language models (LLMs) to generate hypotheses. We focus on hypothesis generation based on data (i.e., labeled examples). To enable LLMs to handle arbitrarily long contexts, we generate initial hypotheses from a small number of examples and then update them iteratively to improve the quality of hypotheses. Inspired by multi-armed bandits, we design a reward function to inform the exploitation-exploration tradeoff in the update process. Our algorithm is able to generate hypotheses that enable much better predictive performance than few-shot prompting in classification tasks, improving accuracy by 31.7% on a synthetic dataset and by 13.9%, 3.3%, and 24.9% on three real-world datasets. We also outperform supervised learning by 12.8% and 11.2% on two challenging real-world datasets. Furthermore, we find that the generated hypotheses not only corroborate human-verified theories but also uncover new insights for the tasks. (A minimal sketch of the bandit-style hypothesis-selection step appears after this table.) |
Scaling up the Evaluation of Collaborative Problem Solving: Promises and Challenges of Coding Chat Data with ChatGPT | Research Article | Coding | November 19, 2024 | Preprint | Data Analysis | Computer Science | Collaborative problem solving (CPS) is widely recognized as a critical 21st century skill. Efficiently coding communication data is a major challenge in scaling up research on assessing CPS. This paper reports findings on using ChatGPT to directly code CPS chat data, benchmarking performance across multiple datasets and coding frameworks. We found that ChatGPT-based coding outperformed human coding in tasks where the discussions were characterized by colloquial language but fell short in tasks where the discussions dealt with specialized scientific terminology and contexts. The findings offer practical guidelines for researchers to develop strategies for efficient and scalable analysis of communication data from CPS tasks. (A minimal sketch of LLM-based coding of chat utterances appears after this table.) |
AI-Augmented Cultural Sociology: Guidelines for LLM-assisted text analysis and an illustrative example | Research Article, Use Case Example | Sociology | December 3, 2024 | Preprint | Data Analysis | Sociology | The advent of large language models (LLMs) presents a promising opportunity for how we analyze text and, by extension, study the role of culture and symbolic meanings in social life. Using an illustrative example focused on the concept of “personalized service” within Michelin-starred restaurants, this research note demonstrates how LLMs can reliably identify complex, multifaceted concepts similarly to a qualitative data analyst, but in a more scalable manner. We extend existing validation approaches, offering guidelines on the amount of manually coded data needed to evaluate LLM-generated outputs, drawing on sampling theory and a data simulation. We also discuss broader applications of LLMs in cultural sociology, such as investigations of established concepts (e.g., cultural consecration) and emerging concepts (e.g., future-oriented deliberation). This discussion underscores that AI tools can significantly augment the empirical scope of research projects, building on rather than replacing traditional qualitative approaches. Our study ultimately advocates for an optimistic yet cautious engagement with AI tools in social scientific inquiry, highlighting both their analytic potential and the need for ongoing reflection on their ethical implications. (A minimal sketch of the validation sample-size and agreement checks appears after this table.) |
Automated Literature Review Using NLP Techniques and LLM-Based Retrieval-Augmented Generation | Research Article | Auto Lit | November 28, 2024 | Preprint | Research Design | Computer Science | This research presents and compares multiple approaches to automating the generation of literature reviews using several Natural Language Processing (NLP) techniques and retrieval-augmented generation (RAG) with a Large Language Model (LLM). The ever-increasing number of research articles poses a major challenge for manual literature review and has driven demand for automation. The primary objective of this work is a system capable of automatically generating literature reviews from PDF files alone. To that end, the effectiveness of several NLP strategies is evaluated: a frequency-based method (spaCy), a transformer model (Simple T5), and RAG with an LLM (GPT-3.5-turbo). Using the SciTLDR dataset, the three techniques are implemented as three separate review-generation systems, all evaluated with ROUGE scores. GPT-3.5-turbo achieved the highest ROUGE-1 score (0.364), the transformer model ranked second, and spaCy ranked last. Finally, a graphical user interface was created for the best, LLM-based system. (A minimal sketch of the ROUGE scoring step appears after this table.) |
A Computational Method for Measuring "Open Codes" in Qualitative Analysis | Research Article | Open Code | November 27, 2024 | Preprint | Data Analysis | Computer Science, Any Discipline | Qualitative analysis is critical to understanding human datasets in many social science disciplines. Open coding is an inductive qualitative process that identifies and interprets "open codes" from datasets. Yet, meeting methodological expectations (such as "as exhaustive as possible") can be challenging. While many machine learning (ML)/generative AI (GAI) studies have attempted to support open coding, few have systematically measured or evaluated GAI outcomes, increasing the risk of bias. Building on Grounded Theory and Thematic Analysis, we present a computational method to systematically measure and identify potential biases in "open codes." Instead of operationalizing human expert results as the "ground truth," our method is built upon a team-based approach between human and machine coders. We experiment with two HCI datasets to establish this method's reliability by 1) comparing it with human analysis, and 2) analyzing its output stability. We present evidence-based suggestions and example workflows for ML/GAI to support open coding. |
Why and how to embrace AI such as ChatGPT in your academic life | Research Article, Documentation, Discussion Article, Use Case Example, Tutorial w/o Code, Application/Tool | Why and how to use AI in science | November 10, 2024 | Preprint | Research Design, Data Collection, Data Cleaning/Preparation, Data Generation, Dataset Joining, Data Analysis, Describing Results, Web Scraping, Science Communication, Other | Any Discipline | Generative artificial intelligence (AI), including large language models (LLMs), is poised to transform scientific research, enabling researchers to elevate their research productivity. This article presents a how-to guide for employing LLMs in academic settings, focusing on their unique strengths, constraints and implications through the lens of philosophy of science and epistemology. Using ChatGPT as a case study, I identify and elaborate on three attributes contributing to its effectiveness—intelligence, versatility and collaboration—accompanied by tips on crafting effective prompts, practical use cases and a living resource online (https://osf.io/8vpwu/). Next, I evaluate the limitations of generative AI and its implications for ethical use, equality and education. Regarding ethical and responsible use, I argue from technical and epistemic standpoints that there is no need to restrict the scope or nature of AI assistance, provided that its use is transparently disclosed. A pressing challenge, however, lies in detecting fake research, which can be mitigated by embracing open science practices, such as transparent peer review and sharing data, code and materials. Addressing equality, I contend that while generative AI may promote equality for some, it may simultaneously exacerbate disparities for others—an issue with potentially significant yet unclear ramifications as it unfolds. Lastly, I consider the implications for education, advocating for active engagement with LLMs and cultivating students' critical thinking and analytical skills. The how-to guide seeks to empower researchers with the knowledge and resources necessary to effectively harness generative AI while navigating the complex ethical dilemmas intrinsic to its application. |
AUTOGEN: A Personalized Large Language Model for Academic Enhancement—Ethics and Proof of Principle | Research Article | Open Access manuscript | October 8, 2023 | Open Source | Data Generation | Other | A GPT-3 model fine-tuned on the authors' previously published papers. In this article, we explore the potential of enhancing academic prose and idea generation by fine-tuning a large language model (here, GPT-3) on one’s own previously published writings: AUTOGEN (“AI Unique Tailored Output GENerator”). We develop, test, and describe three distinct AUTOGEN models trained on the prior scholarly output of three of the current authors (SBM, BDE, JS), with a fourth model trained on the combined works of all three. Our AUTOGEN models demonstrate greater variance in quality than the base GPT-3 model, with many outputs outperforming the base model in format, style, overall quality, and novel idea generation. As proof of principle, we present and discuss examples of AUTOGEN-written sections of existing and hypothetical research papers. We further discuss ethical opportunities, concerns, and open questions associated with personalized academic prose and idea generators. Ethical opportunities for personalized LLMs such as AUTOGEN include increased productivity, preservation of writing styles and cultural traditions, and aiding consensus building. However, ethical concerns arise due to the potential for personalized LLMs to reduce output diversity, violate privacy and intellectual property rights, and facilitate plagiarism or fraud. The use of coauthored or multiple-source trained models further complicates issues surrounding ownership and attribution. Open questions concern a potential credit-blame asymmetry for LLM outputs, the legitimacy of licensing agreements in authorship ascription, and the ethical implications of coauthorship attribution for data contributors. Ensuring the output is sufficiently distinct from the source material is crucial to maintaining ethical standards in academic writing. These opportunities, risks, and open issues highlight the intricate ethical landscape surrounding the use of personalized LLMs in academia. We also discuss open technical questions concerning the integration of AUTOGEN-style personalized LLMs with other LLMs, such as GPT-4, for iterative refinement and improvement of generated text. In conclusion, we argue that AUTOGEN-style personalized LLMs offer significant potential benefits in terms of both prose generation and, to a lesser extent, idea generation. If associated ethical issues are appropriately addressed, AUTOGEN alone or in combination with other LLMs can be seen as a potent form of academic enhancement. |
Named Entity Recognition for peer-review disambiguation in academic publishing | Research Article | Named Entity Recognition | February 13, 2024 | | | Computer Science, Education | In recent years, there has been a steady increase in the number of peer-reviewed scientific articles published. Each of these articles has to go through a laborious process, from peer review, through author revision rounds, to the final decision made by the editor-in-chief. Lacking time and under pressure from diverse research tasks, senior scientists need new tools to automate parts of their activities. In this paper, we propose a new approach based on named entity recognition that can annotate review comments in order to extract meaningful information about the changes requested by reviewers. This research focuses on deep learning models that achieve state-of-the-art results in many natural language processing tasks. Exploring the performance of BERT-based and XLNet models on the review-comment annotation task, a "review-annotation" model based on SciBERT was trained, achieving an F1 score of 0.87. Its use allows the different players in the academic publishing process to better understand review requests. In addition, it makes it possible to correlate requested and actual changes, allowing the final decision-maker to strengthen the article evaluation. (A minimal sketch of SciBERT-based token classification appears after this table.) |
Quality of Large Language Model Responses to Radiation Oncology Patient Care Questions | Research Article, Use Case Example, Application/Tool | Quality | November 13, 2024 | | Research Design, Science Communication, Other | Medicine, Public Health | This publication outlines a comprehensive evaluation approach for determining an LLM's quality of responses to radiation oncology patient care questions using both domain-specific expertise and domain-agnostic metrics. Domain-specific evaluation involves human expert assessment of LLM-generated responses, augmented with Likert scales, while domain-agnostic evaluation utilizes computational quantitative techniques to assess the LLM-generated responses. |
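
The "Hypothesis Generation with Large Language Models" entry describes a multi-armed-bandit-inspired reward that balances exploiting well-performing hypotheses against exploring untested ones during iterative updates. Below is a minimal sketch of that selection idea using a standard UCB score; the hypothesis texts, tallies, and the exploration constant are illustrative assumptions, not the paper's actual reward function.

```python
import math

def ucb_score(correct: int, trials: int, total_trials: int, c: float = 1.4) -> float:
    """Upper-confidence-bound score: observed accuracy plus an
    exploration bonus that shrinks as a hypothesis is tested more."""
    if trials == 0:
        return float("inf")  # untested hypotheses are tried first
    return correct / trials + c * math.sqrt(math.log(total_trials) / trials)

def select_hypothesis(stats: dict[str, tuple[int, int]]) -> str:
    """Pick the hypothesis to refine next.
    stats maps hypothesis text -> (num correct predictions, num trials)."""
    total = sum(t for _, t in stats.values()) or 1
    return max(stats, key=lambda h: ucb_score(*stats[h], total))

# Toy usage: three candidate hypotheses with running tallies.
stats = {
    "longer reviews are negative": (12, 20),
    "exclamation marks signal positive": (5, 6),
    "questions signal uncertainty": (0, 0),
}
print(select_hypothesis(stats))  # the untested hypothesis is selected first
```

Raising `c` pushes the loop toward trying weakly tested hypotheses; lowering it concentrates updates on the current best performer.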
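The "Scaling up the Evaluation of Collaborative Problem Solving" entry benchmarks ChatGPT as a direct coder of CPS chat data. A minimal sketch of that workflow with the OpenAI Python client follows; the model name, code list, and prompt are illustrative assumptions, not the paper's materials.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative coding frame; the paper benchmarks several published frameworks.
CODES = ["sharing_information", "negotiating", "regulating", "off_task"]

def code_utterance(utterance: str) -> str:
    """Ask the model to assign exactly one CPS code to a chat message."""
    prompt = (
        "You are coding chat messages from a collaborative problem-solving task.\n"
        f"Assign exactly one code from this list: {', '.join(CODES)}.\n"
        f"Message: {utterance!r}\n"
        "Answer with the code only."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; the paper used ChatGPT
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduce sampling variation for more reproducible coding
    )
    return response.choices[0].message.content.strip()

print(code_utterance("ok you move the lever while I hold the weight"))
```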
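The "AI-Augmented Cultural Sociology" entry offers guidelines on how much manually coded data is needed to validate LLM-generated codes. A minimal sketch of two standard ingredients is shown below, a simple binomial sample-size estimate and human-LLM agreement via Cohen's kappa; the agreement rate, margin, and labels are illustrative assumptions, not the paper's guidelines.

```python
from math import ceil

from sklearn.metrics import cohen_kappa_score

def validation_sample_size(p: float = 0.85, margin: float = 0.05,
                           z: float = 1.96) -> int:
    """Documents to hand-code so that an observed agreement rate p is
    estimated within +/- margin at ~95% confidence (simple binomial)."""
    return ceil(z**2 * p * (1 - p) / margin**2)

# Human vs. LLM labels on the same validation subsample.
human = ["service", "other", "service", "service", "other", "service"]
llm   = ["service", "other", "service", "other",   "other", "service"]

print("hand-code at least", validation_sample_size(), "documents")  # 196
print("Cohen's kappa:", round(cohen_kappa_score(human, llm), 2))
```

The binomial formula is the textbook proportion estimate; the paper's own guidance additionally draws on a data simulation, which this sketch does not reproduce.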
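The "Automated Literature Review" entry compares its three systems by ROUGE, reporting a top ROUGE-1 of 0.364 for GPT-3.5-turbo. A minimal sketch of that scoring step with the `rouge-score` package follows; the reference and candidate summaries are toy stand-ins for SciTLDR items.

```python
from rouge_score import rouge_scorer

# ROUGE-1 F1 is the headline metric reported in the paper.
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)

reference = "We propose a retrieval-augmented pipeline for literature review generation."
candidate = "A retrieval-augmented generation pipeline is proposed for literature reviews."

score = scorer.score(reference, candidate)["rouge1"]
print(f"ROUGE-1  P={score.precision:.3f}  R={score.recall:.3f}  F1={score.fmeasure:.3f}")
```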
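The "Named Entity Recognition for peer-review disambiguation" entry trains a "review-annotation" model based on SciBERT. The sketch below only shows how a token-classification head is attached to SciBERT with Hugging Face transformers; the label set is hypothetical, and the head is randomly initialized until fine-tuned on annotated review comments, so the printed predictions are meaningless placeholders.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Hypothetical label scheme; the paper's annotation scheme is not reproduced here.
LABELS = ["O", "B-REQUEST", "I-REQUEST", "B-SECTION", "I-SECTION"]

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased",
    num_labels=len(LABELS),  # classification head is untrained until fine-tuned
)

comment = "Please shorten the introduction and add a power analysis."
inputs = tokenizer(comment, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

predictions = [LABELS[i] for i in logits.argmax(dim=-1)[0].tolist()]
print(list(zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), predictions)))
```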