Development Considerations

As a starting point, here are some questions to ask when considering the use of an LLM as part of your scientific research workflow.

What protocols (guidelines) have been developed to ensure systematic use of LLMs?

What completion parameters (e.g., temperature, presence penalty, frequency penalty, max tokens, logit bias) will be used?

Are pre-established decisions related to completion parameters readily available to those prompting the LLM?
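One way to make completion-parameter decisions readily available is to record them once in a small, version-controlled file that every prompting script loads. The sketch below is only an illustration under stated assumptions: the class name `CompletionParams`, the helper functions, the placeholder model name, and the default values are all hypothetical and are not tied to any particular LLM provider's API.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class CompletionParams:
    """Hypothetical record of completion parameters agreed on before a study."""

    model: str = "example-model"  # placeholder name, not a real model
    temperature: float = 0.0
    presence_penalty: float = 0.0
    frequency_penalty: float = 0.0
    max_tokens: int = 512
    # Maps token IDs (stored as strings so they survive JSON) to bias values.
    logit_bias: dict = field(default_factory=dict)


def save_params(params: CompletionParams, path: str) -> None:
    """Write the parameters to a JSON file that can be shared and versioned."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(asdict(params), fh, indent=2, sort_keys=True)


def load_params(path: str) -> CompletionParams:
    """Read the parameters back so every script uses the same settings."""
    with open(path, encoding="utf-8") as fh:
        return CompletionParams(**json.load(fh))
```

Keeping the record immutable (`frozen=True`) and storing it next to the analysis code makes it harder for individual runs to drift from the agreed settings without leaving a trace.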

Will embeddings (e.g., for retrieval-augmented generation, RAG) be used in the research?

What size chunks will be used in creating embeddings?

What size of overlap will be permitted between chunks?

What tool(s) will be used for similarity matching (e.g., a vector database or a library such as FAISS), and will they be described?

What retrieval tools/techniques will be used (e.g., compression, context, reranking)?

Will embedding files be publicly available after the research is complete?
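To make the chunk-size and overlap questions concrete, the following minimal sketch splits text into fixed-size chunks with a configurable overlap and computes cosine similarity, a measure commonly used for similarity matching between embedding vectors. It assumes character-based chunking for simplicity (token-based chunking is also common), and the function names `chunk_text` and `cosine_similarity` are hypothetical. A real workflow would obtain vectors from an embedding model and would likely delegate the search to a dedicated tool.

```python
import math


def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Reporting the exact `chunk_size` and `overlap` values alongside the embedding files lets others regenerate the same chunks from the same source text.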

Will LLM agent(s) be used in the research?

Will LLMs be fine-tuned (e.g., with LoRA)?

Will fine-tuned models be quantized (e.g., 8-bit or 4-bit)?

Will any code associated with the use of LLMs be documented?

Will the code be publicly available?

Will LLM responses be systematically checked for accuracy, bias, and other limitations?

What data management systems will be put in place to secure the data (inputs and outputs) of the LLMs?
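As one small piece of such a system, inputs and outputs can be written to an append-only log with a content hash, so that later checks can detect altered records. This is a sketch under stated assumptions: the function names `make_record` and `append_record` and the JSON Lines layout are hypothetical. It addresses traceability only, not access control or encryption, which would need to be handled separately.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_record(prompt: str, response: str, params: dict) -> dict:
    """Bundle one LLM exchange with a timestamp and a SHA-256 content hash."""
    # Hash a canonical JSON form so the same content always gives the same hash.
    payload = json.dumps(
        {"prompt": prompt, "response": response, "params": params},
        sort_keys=True,
    )
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "response": response,
        "params": params,
        "sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }


def append_record(path: str, record: dict) -> None:
    """Append the record as one line of JSON (JSON Lines format)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```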
