I’m trying to learn more about LLMs, but I haven’t found an explanation of what determines which prompt template format a model requires.
For example, meta-llama’s llama-2 requires this format:
…INST and <<SYS>> tags, BOS and EOS tokens…
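(If I’m reading the model card right, that works out to roughly:

<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{prompt} [/INST]

with <s> being the BOS token and {system_prompt}/{prompt} being placeholders.)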
But if I instead download TheBloke’s version of llama-2, the prompt template should be:
SYSTEM: …
USER: {prompt}
ASSISTANT:
I thought this would have been determined by how the original training data was formatted, but afaik TheBloke only converted the llama-2 models from one format to another. Looking at the documentation for the GGML format, I don’t see anything related to a prompt template being embedded in the model file.
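A rough sketch of what I mean (assuming llama-cpp-python; the model path is a placeholder):

```python
from llama_cpp import Llama

# Placeholder path to one of TheBloke's GGML quantizations.
llm = Llama(model_path="./llama-2-13b-chat.ggmlv3.q4_K_M.bin")

# The template lives here, in my own code, not in the model file.
def build_prompt(system: str, user: str) -> str:
    return f"SYSTEM: {system}\nUSER: {user}\nASSISTANT:"

out = llm(build_prompt("You are a helpful assistant.", "What is GGML?"),
          max_tokens=200, stop=["USER:"])
print(out["choices"][0]["text"])
```

Nothing in that file seems to force the string shape — I could swap in the other template and the call itself wouldn’t change, which is why I’m confused about where the “required” format comes from.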
Could anyone who understands this stuff point me in the right direction?
Thanks! I’m going to do some experiments and see if I get different results. I’ve been using TheBloke’s format and it has mostly worked well, but perhaps switching to meta-llama’s format will eliminate the occasional bugs I’ve had.
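Roughly what I have in mind for the experiment (a sketch assuming llama-cpp-python; the model path, sampling settings, and test question are just placeholders):

```python
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-13b-chat.ggmlv3.q4_K_M.bin")  # placeholder path

system = "You are a helpful assistant."
question = "Explain the difference between GGML and GPTQ in two sentences."

# meta-llama style: [INST] / <<SYS>> wrapping
meta_prompt = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{question} [/INST]"

# TheBloke style: SYSTEM / USER / ASSISTANT wrapping
bloke_prompt = f"SYSTEM: {system}\nUSER: {question}\nASSISTANT:"

for name, prompt in [("meta", meta_prompt), ("TheBloke", bloke_prompt)]:
    out = llm(prompt, max_tokens=150, temperature=0.0)  # temperature 0 keeps runs comparable
    print(f"--- {name} ---\n{out['choices'][0]['text']}\n")
```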
That’s probably the most reasonable thing you can do.
I’m not sure how much of a difference to expect between the 100% correct prompt and something that’s roughly in the right direction. I’ve been tinkering with instruction-tuned models (from the previous/first llama) and sometimes it doesn’t seem to matter. I also sometimes used a ‘wrong’ prompt for days and couldn’t tell. Maybe the models are ‘intelligent’ enough to compensate for that; I’m not sure. I usually try to get it right to squeeze all the performance out of it.
https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/discussions/7