LLM-Measure: Generating Valid, Consistent, and Reproducible Text-Based Measures for Social Science Research
Description of Resource
The increasing use of text as data in social science research necessitates the
development of valid, consistent, reproducible, and efficient methods for generating text-based
concept measures. This paper presents a novel method that leverages the internal hidden states of
large language models (LLMs) to generate these concept measures. Specifically, the proposed
method learns a concept vector that captures how the LLM internally represents the target
concept, then estimates the concept value for text data by projecting the text’s LLM hidden states
onto the concept vector. Three replication studies demonstrate the method’s effectiveness in
producing highly valid, consistent, and reproducible text-based measures across various social
science research contexts, highlighting its potential as a valuable tool for the research
community.
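
As an illustration of the projection step, the following minimal sketch scores a text by taking the dot product of its hidden state with a unit-length concept vector. It is not the paper's implementation: the abstract does not say how the concept vector is learned, so a simple difference-of-means direction is swapped in as an assumption. The hidden states are synthetic NumPy arrays standing in for real LLM activations, and the names `learn_concept_vector` and `concept_score` are hypothetical.

```python
import numpy as np


def learn_concept_vector(high_states, low_states):
    """Return a unit-length direction separating two groups of hidden states.

    Assumption: difference of group means. The source does not specify
    the actual learning procedure.
    """
    direction = high_states.mean(axis=0) - low_states.mean(axis=0)
    return direction / np.linalg.norm(direction)


def concept_score(hidden_state, concept_vector):
    """Project one hidden state onto the concept vector (scalar measure)."""
    return float(np.dot(hidden_state, concept_vector))


# Synthetic stand-ins for LLM hidden states (not real model activations).
rng = np.random.default_rng(0)
dim = 16
true_direction = np.zeros(dim)
true_direction[0] = 1.0

# Texts assumed to be high vs. low on the target concept.
high_states = 0.1 * rng.normal(size=(50, dim)) + 2.0 * true_direction
low_states = 0.1 * rng.normal(size=(50, dim)) - 2.0 * true_direction

concept_vector = learn_concept_vector(high_states, low_states)

# Score a new text's hidden state by projection.
new_text_state = 1.5 * true_direction
score = concept_score(new_text_state, concept_vector)
print(f"concept score: {score:.3f}")
```

In practice the hidden states would come from an LLM forward pass over each text rather than from a random generator. The projection step itself stays the same, so any reproducible way of extracting hidden states yields a deterministic score.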