Below are articles that use LLMs in their research workflows. You can use the Search option to find examples from your discipline, or examples of specific workflow applications you may be considering.
Title | Type of Resource | Link to Resource | Date Recorded | Open Science | Use of LLM | Research Discipline(s) | Description of Resource |
---|---|---|---|---|---|---|---|
The Value of Generative AI for Qualitative Research: A Pilot Study | Research Article, Use Case Example | Pilot | September 23, 2024 | Open Source | Data Analysis | Data Science | This mixed-methods study investigates the potential of introducing generative AI (ChatGPT 4 and Bard) as coders in a deductive qualitative research design, focusing on possible gains in cost-effectiveness, coding throughput time, and inter-coder reliability (Cohen’s Kappa). The study involved semi-structured interviews with five domain experts and a dataset of 122 respondents whose responses required categorization into six predefined categories. Results from the generative AI coders were compared with those from a previous study in which human coders carried out the same task; in this comparison, we evaluated the performance of the AI-based coders against two groups of human coders, three experts and three non-experts. Our findings support replacing human coders with generative AI coders, specifically ChatGPT, for deductive qualitative research of limited scope. The experimental group, consisting of three independent generative AI coders, outperformed both control groups in coding effort, with a fourfold (4x) advantage in efficiency and a fifteenfold (15x) advantage in throughput time, the latter explained by parallel processing. Comparing expert and non-expert coders, only minimal evidence suggests a preference for experts: although experts code slightly faster (17%), their inter-coder reliability showed no substantial advantage. A hybrid approach combining ChatGPT and domain experts shows the most promise, reducing costs, shortening project timelines, and enhancing inter-coder reliability, as indicated by higher Cohen’s Kappa values. In conclusion, generative AI, exemplified by ChatGPT, offers a viable alternative to human coders when combined with human research involvement, delivering cost savings and faster research completion without a notable sacrifice in reliability. These insights, while limited in scope, show potential for further studies with larger datasets, more inductive qualitative research designs, and other research domains. |
Beyond the Average: Exploring the Potential and Challenges of Large Language Models in Social Science Research | Research Article | Beyond Average | September 22, 2024 | Open Source | Other | Computer Science | This paper delves into the integration of Large Language Models (LLMs) in social science research through a case study at the Centre for Consumer Society Research (CCSR). It examines the use of LLMs across the research cycle—including model development, data collection, analysis, editing, and reporting—highlighting how they can augment research efficiency and creativity. It also critically addresses the propensity of LLMs to contribute to average-quality research, underscoring the urgency for ethical guidelines and educational initiatives. The paper contributes significantly by mapping out the human, technological, and procedural barriers to, and enablers of, AI integration, providing a multifaceted view of LLM adoption and its implications for academia and policymaking. Through empirical investigation and analysis, this study offers practical insights, establishes a baseline of current LLM use, pinpoints perceived limitations, and articulates calls for responsible governance within the social sciences. |
Codebook LLMs: Adapting Political Science Codebooks for LLM Use and Adapting LLMs to Follow Codebooks | Research Article | Codebooks | September 22, 2024 | Preprint | Data Analysis | Political Science | Codebooks -- documents that operationalize constructs and outline annotation procedures -- are used almost universally by social scientists when coding unstructured political texts. Recently, to reduce manual annotation costs, political scientists have looked to generative large language models (LLMs) to label and analyze text data. However, previous work using LLMs for classification has implicitly relied on the universal label assumption -- correct classification of documents is possible using only a class label or minimal definition and the information that the LLM inductively learns during its pre-training. In contrast, we argue that political scientists who care about valid measurement should instead make a codebook-construct label assumption -- an LLM should follow the definition and exclusion criteria of a construct/label provided in a codebook. In this work, we collect and curate three political science datasets and their original codebooks and conduct a set of experiments to understand whether LLMs comply with codebook instructions, whether rewriting codebooks improves performance, and whether instruction-tuning LLMs on codebook-document-label tuples improves performance over zero-shot classification. Using Mistral 7B Instruct as our LLM, we find that re-structuring the original codebooks gives modest gains in zero-shot performance, but the model still struggles to comply with the constraints of the codebooks. Optimistically, instruction-tuning Mistral on one of our datasets gives significant gains over zero-shot inference (0.76 versus 0.53 micro F1). We hope our conceptualization of the codebook-specific task, assumptions, and instruction-tuning pipeline, as well as our semi-structured LLM codebook format, will help political scientists readily adapt to the LLM era. |
Using artificial intelligence in academic writing and research: An essential productivity tool | Research Article | Productivity Tool | September 22, 2024 | Open Source | Research Design, Data Collection, Data Generation, Data Analysis, Science Communication | Biology, Computer Science | Background: Academic writing is an essential component of research, characterized by structured expression of ideas, data-driven arguments, and logical reasoning. However, it poses challenges such as handling vast amounts of information and complex ideas. The integration of Artificial Intelligence (AI) into academic writing has become increasingly important, offering solutions to these challenges. This review aims to explore specific domains where AI significantly supports academic writing. Methods: A systematic review of literature from databases including PubMed, Embase, and Google Scholar, published since 2019, was conducted. Studies were included based on relevance to AI's application in academic writing and research, focusing on writing assistance, grammar improvement, structure optimization, and other related aspects. Results: The search identified 24 studies, from which six core domains emerged where AI helps academic writing and research: 1) facilitating idea generation and research design, 2) improving content and structuring, 3) supporting literature review and synthesis, 4) enhancing data management and analysis, 5) supporting editing, review, and publishing, and 6) assisting in communication, outreach, and ethical compliance. ChatGPT has shown substantial potential in these areas, though challenges such as maintaining academic integrity and balancing AI use with human insight remain. Conclusion and recommendations: AI significantly revolutionises academic writing and research across various domains. Recommendations include broader integration of AI tools in research workflows, emphasizing ethical and transparent use, providing adequate training for researchers, and maintaining a balance between AI utility and human insight. Ongoing research and development are essential to address emerging challenges and ethical considerations in AI's application in academia. |
The Ethics of Generative AI in Social Science Research: A Qualitative Approach for Community-Based AI Research Ethics | Research Article | Ethics | September 22, 2024 | Open Source | Other | Other | Despite growing attention to the ethics of Generative AI, there has been little discussion about how research ethics should be updated for the social sciences. This paper fills this gap at the intersection of AI ethics and social science research ethics. Based on 17 semi-structured interviews, we present three narratives about generative AI and research ethics: 1) the equalizer narrative, 2) the meritocracy narrative, and 3) the community narrative. We argue that the ethics of AI-assisted social-scientific research cannot be reduced to universal checklists; instead, a community-based approach is necessary to organize “ethics-in-practice.” In all narratives, the technical functions of Generative AI were merely necessary conditions for unethical practices, while ethical dilemmas arose only when such functions were situated in the institutional arrangements of academia. Our findings suggest that the ethics of AI-assisted research should encompass not only specific ethical rules concerning AI functionalities but also community engagement, educational imperatives, institutional governance, and the societal impact of such technologies. This calls for democratic deliberation to address the complex, emergent interactions between AI systems and societal structures. |
Advancing Instrument Validation in Social Sciences: An AI-Powered Chatbot and Interactive Website based on Research Instrument Validation Framework (RIVF) | Research Article | Validity | September 22, 2024 | Open Source | Research Design, Data Collection | Other | Background: In social sciences, ensuring a high level of instrument validation is crucial for upholding the principles of scientific rigor and maintaining the overall quality of research. Objectives: To develop and evaluate an AI chatbot and website for instrument validation, assess their impact on instrument validity, and analyze user perceptions. Methods: Adopting a quantitative design, the study was anchored on the Research Instrument Validation Framework (RIVF) developed by Villarino (2024). The tools were evaluated through users' perceptions (n=100) via an online survey; paired t-tests were used to contrast pre- vs. post-RIVF instrument validity scores, and one-way ANOVA was used to determine whether users' perceptions were related to the overall improvement in instrument validity. A G*Power analysis indicated sufficient statistical power for the analyses: 99.73% for the paired t-tests (n=100, dz=0.5, α=0.05) and 80.95% for the one-way ANOVA (n=100, f=0.25, α=0.05, four groups). All data were analyzed using IBM SPSS version 26. Results: After RIVF use, all validity domains showed significant improvements (p<0.001), with the largest gain in construct validity [mean difference=1.20±0.60, t(49)=14.14]. Participants perceived the AI chatbot as more useful than the RIVF website [4.30±0.70 vs. 3.80±0.80, p<0.001]. Conclusion: The AI-powered tools show potential for increasing the validity of research instruments within the RIVF, with the AI chatbot particularly effective at improving construct validity. These findings suggest that AI technologies, used alongside traditional validation methods, can enhance the quality of research instruments in the social sciences. |
Let's Get to the Point: LLM-Supported Planning, Drafting, and Revising of Research-Paper Blog Posts | Research Article | paper blogs | September 22, 2024 | Preprint | Science Communication | Computer Science | Research-paper blog posts help scientists disseminate their work to a larger audience, but translating papers into this format requires substantial additional effort. Blog post creation is not simply transforming a long-form article into a short output, as studied in most prior work on human-AI summarization. In contrast, blog posts are typically full-length articles that require a combination of strategic planning grounded in the source document, well-organized drafting, and thoughtful revisions. Can tools powered by large language models (LLMs) assist scientists in writing research-paper blog posts? To investigate this question, we conducted a formative study (N=6) to understand the main challenges of writing such blog posts with an LLM: high interaction costs for 1) reviewing and utilizing the paper content and 2) recurrent sub-tasks of generating and modifying the long-form output. To address these challenges, we developed Papers-to-Posts, an LLM-powered tool that implements a new Plan-Draft-Revise workflow, which 1) leverages an LLM to generate bullet points from the full paper to help users find and select content to include (Plan) and 2) provides default yet customizable LLM instructions for generating and modifying text (Draft, Revise). Through a within-subjects lab study (N=20) and between-subjects deployment study (N=37 blog posts, 26 participants) in which participants wrote blog posts about their papers, we compared Papers-to-Posts to a strong baseline tool that provides an LLM-generated draft and access to free-form LLM prompting. Results show that Papers-to-Posts helped researchers to 1) write significantly more satisfying blog posts and make significantly more changes to their blog posts in a fixed amount of time without a significant change in cognitive load (lab) and 2) make more changes to their blog posts for a fixed number of writing actions (deployment). |
An Examination of the Use of Large Language Models to Aid Analysis of Textual Data | Research Article | Textual Data | September 22, 2024 | Open Source | Data Analysis | Statistics, Other | The increasing use of machine learning and Large Language Models (LLMs) opens up opportunities to use these artificially intelligent algorithms in novel ways. This article proposes a methodology using LLMs to support traditional deductive coding in qualitative research. We began our analysis with three different sample texts taken from existing interviews. Next, we created a codebook and inputted the sample text and codebook into an LLM. We asked the LLM to determine if the codes were present in a sample text provided and requested evidence to support the coding. The sample texts were inputted 160 times to record changes between iterations of the LLM response. Each iteration was analogous to a new coder deductively analyzing the text with the codebook information. In our results, we present the outputs for these recursive analyses, along with a comparison of the LLM coding to evaluations made by human coders using traditional coding methods. We argue that LLM analysis can aid qualitative researchers by deductively coding transcripts, providing a systematic and reliable platform for code identification, and offering a means of avoiding analysis misalignment. Implications of using LLM in research praxis are discussed, along with current limitations. |
ChatGPT for Education Research: Exploring the Potential of Large Language Models for Qualitative Codebook Development | Research Article | ChatGPT for Ed | September 18, 2024 | | Data Analysis | Education | In qualitative data analysis, codebooks offer a systematic framework for establishing shared interpretations of themes and patterns. While the utility of codebooks is well-established in educational research, the manual process of developing and refining codes that emerge bottom-up from data presents a challenge in terms of time, effort, and potential for human error. This paper explores the potentially transformative role that could be played by Large Language Models (LLMs), specifically ChatGPT (GPT-4), in addressing these challenges by automating aspects of the codebook development process. We compare four approaches to codebook development – a fully manual approach, a fully automated approach, and two approaches that leverage ChatGPT within specific steps of the codebook development process. We do so in the context of studying transcripts from math tutoring lessons. The resultant four codebooks were evaluated in terms of whether the codes could be reliably applied to data by human coders, the human-rated quality of the codes and codebooks, and whether different approaches yielded similar or overlapping codes. The results show that approaches that automate early stages of codebook development take less time to complete overall. Hybrid approaches (whether GPT participates early or late in the process) produce codebooks that can be applied more reliably and were rated as better quality by humans. Hybrid approaches and the fully manual approach produce similar codebooks; the fully automated approach was an outlier. Findings indicate that ChatGPT can be valuable for improving qualitative codebooks for use in AIED research, but human participation is still essential. |
From nCoder to ChatGPT: From Automated Coding to Refining Human Coding | Research Article, Conference Paper | nCoder to ChatGPT | September 18, 2024 | | Data Analysis | Other | This paper investigates the potential of utilizing ChatGPT (GPT-4) as a tool for supporting coding processes in Quantitative Ethnography research. We compare the use of ChatGPT and nCoder, the most widely used automated coding tool in the QE community, on a dataset of press releases and public addresses delivered by governmental leaders from seven countries from late February to late March 2020. The study assesses the accuracy of the automated coding procedures of the two tools, and the role that ChatGPT’s explanations of its coding decisions can play in improving the consistency and construct validity of human-generated codes. Results suggest that both ChatGPT and nCoder have advantages and disadvantages depending on the context, the nature of the data, and researchers’ goals. While nCoder is useful for straightforward coding schemes represented through regular expressions, ChatGPT can better capture a variety of language structures. ChatGPT's ability to provide explanations for its decisions can also help enhance construct validity, identify ambiguity in code definitions, and assist human coders in achieving high interrater reliability. Although we identify limitations of ChatGPT in coding constructs that are open to human interpretation or encompass multiple concepts, we highlight opportunities and potential benefits provided by ChatGPT as a tool to support human researchers in their (qualitative) coding process. |
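
Several of the entries above (Pilot, Textual Data, ChatGPT for Ed, nCoder to ChatGPT) describe the same basic workflow: give an LLM a codebook and a text segment, ask whether each code is present, request supporting evidence, and repeat the query so that each run acts as an independent coder. The sketch below is a minimal, hypothetical illustration of that loop, not any one paper's implementation; it assumes the OpenAI Python SDK (v1+), and the codebook, segment, and model choice are invented placeholders.

```python
# Minimal, illustrative sketch of an LLM-as-deductive-coder loop.
# Assumes the OpenAI Python SDK (>= 1.0); the codebook, segment, and
# model choice are placeholders, not any study's actual setup.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

CODEBOOK = {
    "economic_impact": "Mentions financial or economic consequences.",
    "public_health": "Mentions disease prevention, treatment, or health policy.",
}

def build_prompt(segment: str) -> str:
    """Ask the model to apply each code deductively and cite evidence."""
    codes = "\n".join(f"- {name}: {rule}" for name, rule in CODEBOOK.items())
    return (
        "You are a qualitative coder. Apply the codebook to the text segment.\n"
        f"Codebook:\n{codes}\n\n"
        f"Segment:\n{segment}\n\n"
        "For each code, answer 'present' or 'absent' and quote the supporting evidence."
    )

def code_segment(segment: str, runs: int = 3) -> list[str]:
    """Repeat the query; each run stands in for an independent AI coder."""
    outputs = []
    for _ in range(runs):
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": build_prompt(segment)}],
        )
        outputs.append(response.choices[0].message.content)
    return outputs

if __name__ == "__main__":
    for answer in code_segment("The new policy cut testing costs but delayed vaccinations."):
        print(answer, "\n---")
```

Any other chat-completion API could be swapped in; the essential design choice is that the prompt carries the full code definitions rather than bare label names, in line with the codebook-construct label assumption discussed in the Codebooks entry.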
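Several entries also report agreement between AI and human coders using Cohen's Kappa. As a reference point, the snippet below shows one way that statistic can be computed with scikit-learn; the labels are invented placeholders, not data from any of the studies.

```python
# Illustrative only: inter-coder reliability between a human coder and an
# LLM coder via Cohen's Kappa (scikit-learn). Labels are invented placeholders.
from sklearn.metrics import cohen_kappa_score

# One label per coded segment, e.g. one of several predefined categories.
human_labels = ["economic_impact", "public_health", "public_health", "other", "economic_impact"]
llm_labels = ["economic_impact", "public_health", "other", "other", "economic_impact"]

kappa = cohen_kappa_score(human_labels, llm_labels)
print(f"Cohen's Kappa (human vs. LLM coder): {kappa:.2f}")
```

Values near 1 indicate near-perfect agreement; comparing the human-LLM Kappa against a human-human baseline is the kind of check the hybrid workflows described above rely on.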