What is text generation?
Do you have a set of texts that you produce regularly? Do you use templates for certain processes that prompt you to write in a certain way? If you often produce text content and want to automate this, you might consider developing a text generator.
ChatGPT is the most recent well-known example of a text generator that allows you to query and “chat” with it. However, there are several drawbacks to how such models are trained that may lead you to develop your own:
- Models like ChatGPT can only reference their training data.
  - The responses you get from a text generation model will come only from the genres and sources it was trained on.
  - In the case of ChatGPT, we have very little insight into what data was used in training, though it is clearly a large dataset.
  - As a result, such models lack localization: generated text may not accurately match the context it is intended to be used in.
- Models like ChatGPT are susceptible to security risks.
  - There are countless examples of sensitive data being exposed by users through careful “prompt engineering”.
  - If you want to automate text generation in cases where you may be dealing with sensitive information, you need to be extremely careful with publicly accessible models such as ChatGPT.
- Models like ChatGPT “hallucinate” and give incorrect responses.
  - Since language models in this format are simply trained to predict the next token, they will often produce incorrect information with confidence.
  - Catching these errors requires substantial evaluation by a human with domain knowledge.
Text generation is essentially training the computer to automatically create texts that align linguistically to a particular genre or style. This can be extremely useful, particularly in cases where you need to generate a large amount of text for very similar contexts. Given the drawbacks of models like ChatGPT, however, it may be worthwhile to develop your own in-house solution.
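The core mechanism mentioned above, predicting a plausible next token from what came before, can be sketched with a toy bigram (Markov chain) generator. This is not how ChatGPT or a production in-house system works (those use neural language models), but the generation loop is analogous, and the tiny corpus string here is purely illustrative, standing in for your own collection of texts:

```python
import random
from collections import defaultdict

def train_bigram_model(corpus):
    """Record which words follow each word in the training corpus."""
    model = defaultdict(list)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return model

def generate(model, start, length=10, seed=None):
    """Generate text by repeatedly sampling an observed next word."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        candidates = model.get(output[-1])
        if not candidates:  # dead end: this word was never followed by anything
            break
        output.append(rng.choice(candidates))
    return " ".join(output)

# A stand-in corpus; in practice you would train on your own documents.
corpus = "the model predicts the next word and the next word follows the model"
model = train_bigram_model(corpus)
print(generate(model, "the", length=8, seed=0))
```

Notice that the model can only ever emit word sequences observed in its training data, which is exactly the first drawback listed above, writ small.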