This is a cool article.
But if they want LLMs to use fewer em dashes, why not run a regex find-and-replace over the training data, swapping em dashes that match known patterns for a comma or semicolon, to reduce their frequency?
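For what it's worth, a rough sketch of that kind of find-and-replace in Python. The single "known pattern" here (an em dash sitting between two word characters, always swapped for a comma) and the name `replace_em_dashes` are my own assumptions for illustration, not anything from the article:

```python
import re

EM_DASH = "\u2014"  # the em dash character, written as an escape

# Assumed rule: an em dash between two word characters, with optional
# surrounding whitespace, is treated as a clause break.
_BETWEEN_WORDS = re.compile(r"(?<=\w)\s*" + EM_DASH + r"\s*(?=\w)")


def replace_em_dashes(text: str) -> str:
    """Replace em dashes that sit between two words with a comma and a space."""
    return _BETWEEN_WORDS.sub(", ", text)


if __name__ == "__main__":
    print(replace_em_dashes("It works\u2014mostly."))
    print(replace_em_dashes("It works \u2014 mostly."))
```

Em dashes that don't fit the pattern (for example one right before a closing quote) are left alone. A blind swap like this also can't tell whether a comma, a semicolon, or a rewrite is the right fix, so it would probably need more rules than this.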
They could just put it in the system prompt or something.
It apparently doesn’t work, from the article:
It’s also surprisingly hard to prompt models to avoid em-dashes: take this thread from the OpenAI forums where users share their unsuccessful attempts.
If they could use regex they wouldn’t be using an LLM.




