Text Generation

What is text generation?

Do you have a set of texts that you produce regularly? Do you use templates for certain processes that prompt you to write in a certain way? If you often produce text content like this and want to automate the process, you might consider developing a text generator.

ChatGPT is the best-known recent example of a text generator that lets you query and “chat” with it. However, there are several drawbacks to how it was trained that may lead you to develop your own:

  1. Models like ChatGPT can only reference their training data.
    • The responses you get from a text generation model will reflect only the genres and sources it was trained on.
    • In the case of ChatGPT, we have very little insight into what data was used in training, though it is clearly a very large corpus.
    • This means that such models lack localization: the generated text may not accurately match the context it is intended for.

  2. Models like ChatGPT are susceptible to security risks.
    • There are countless examples of users extracting sensitive data through careful “prompt engineering”.
    • If you want to automate text generation in cases where you may be dealing with sensitive information, you need to be extremely careful with publicly accessible models such as ChatGPT.

  3. Models like ChatGPT “hallucinate” and give incorrect responses.
    • Since language models in this format are simply trained to predict the next token, they will often produce plausible-sounding but incorrect information (the sketch after this list illustrates why).
    • Catching these errors requires a substantial amount of evaluation by a human with domain knowledge.
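
To see why hallucination is baked into this setup, it helps to look at what such a model actually computes: a probability distribution over possible next tokens, with no notion of whether a continuation is true. The sketch below is a minimal illustration, assuming the Hugging Face transformers library and the small gpt2 checkpoint as a stand-in for a much larger chat model.

```python
# Minimal sketch: what a causal language model "knows" is just a
# probability distribution over the next token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital city of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# Distribution over the *next* token only; continuations are ranked by
# how likely they looked in the training data, not by factual accuracy.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_token_probs.topk(5)

for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob:.3f}")
```

Nothing in this objective rewards truth, so a fluent but wrong continuation can easily outrank the correct one.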

Text generation is essentially training the computer to automatically create texts that align linguistically with a particular genre or style. This can be extremely useful, particularly in cases where you need to generate a large amount of text for very similar contexts. Given the drawbacks of models like ChatGPT, however, it may be worthwhile to develop your own in-house solution.
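
As a rough picture of what an in-house solution can look like, the sketch below fine-tunes a small, locally run language model on your own texts so that its output aligns with your genre. It assumes the Hugging Face transformers and datasets libraries, the gpt2 checkpoint, and a hypothetical plain-text file my_texts.txt with one example document per line; any of these could be swapped for your preferred model or data format.

```python
# A fine-tuning sketch under the assumptions above: gpt2 as the base model
# and a hypothetical file "my_texts.txt" containing your own example texts.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Load your own texts, one example per line (hypothetical file name).
dataset = load_dataset("text", data_files={"train": "my_texts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Causal language modelling: the model learns to predict each next token
# in your texts, so its output drifts toward your genre and style.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="my-text-generator",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        report_to="none",
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()

# Generate text in the style of the training data (hypothetical prompt).
prompt = tokenizer("Dear colleague,", return_tensors="pt").to(model.device)
output = model.generate(**prompt, max_new_tokens=60, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because both the training data and the resulting model stay on your own machine, this setup sidesteps the data-exposure concerns raised above, and the generated text can be evaluated directly against texts from your own context.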