As a starting point, here are initial recommendations for questions to ask when reviewing research that used an LLM as part of a scientific workflow.
- Were embeddings used in the research (e.g., for retrieval-augmented generation, RAG)?
- Is the embedding model (or the tool used to create it) identified and described?
- Were multiple embeddings created, tested, or used (e.g., chained together)?
- Is the chunk size used when preparing the data reported?
- Were different chunk sizes tested for their influence on LLM performance?
- Is the overlap size permitted between chunks reported?
- Is the tool used for similarity matching (e.g., a vector database such as FAISS) provided and described?
- What retrieval tools and/or techniques were used (e.g., compression, context, reranking)?
- Is the code available?
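To make the chunking and retrieval questions above concrete, here is a minimal sketch of the parameters a paper should report: chunk size, chunk overlap, and the similarity-matching step. The function names, the bag-of-words "embedding," and the brute-force search are all illustrative stand-ins; a real pipeline would use a trained embedding model and a vector database such as FAISS.

```python
import math
from collections import Counter

def chunk_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with a declared overlap.
    Both chunk_size and chunk_overlap are the values a paper should report."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' used only for illustration;
    a real workflow would use a documented embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Brute-force nearest-neighbor search standing in for a
    vector database (e.g., FAISS) in a real RAG pipeline."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

For example, `chunk_text("x" * 500, chunk_size=200, chunk_overlap=50)` yields three chunks starting at offsets 0, 150, and 300, with 50 characters shared between neighbors; varying those two parameters (and reporting them) is exactly what the questions above ask a reviewer to check.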