Large language model
Large language models (LLMs) are neural network language models trained on very large text corpora to predict and generate natural language. Contemporary LLMs are typically built on the Transformer architecture, which replaces recurrent or convolutional sequence models with attention mechanisms for sequence transduction. Prominent examples include GPT-3 and GPT-4, which helped popularize large-scale pretrained models and their downstream use, and ChatGPT, which reached a broad public audience through conversational access. [1][2][3][4]
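A minimal sketch of the scaled dot-product attention at the core of the Transformer is shown below; the single-head formulation, NumPy implementation, and toy shapes are simplifying assumptions for illustration and omit the multi-head projections and masking described in [1].

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are (sequence_length, d_k) arrays. A full Transformer layer adds
    learned projections, multiple heads, masking, and feed-forward sublayers,
    all omitted here.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the key dimension
    return weights @ V                                 # attention-weighted sum of values

# Toy usage: self-attention over 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```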
Scaling the size of pretrained language models and the amount of training data has been shown to improve task-agnostic performance, including few-shot behavior, in which a model performs a new task from only a handful of examples supplied in its prompt, without any gradient updates. [2]
Research on compute-optimal training suggests that model size and training tokens should scale together to maximize performance under a fixed compute budget, informing how large models are trained and deployed. [5]
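As an illustration of this trade-off, the sketch below uses the common approximation that training compute is about 6 FLOPs per parameter per token and the roughly 20-tokens-per-parameter heuristic often derived from [5]; both constants are approximations assumed here for illustration, not exact prescriptions from the paper.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6   # common approximation: C ≈ 6 * N * D for N parameters, D tokens
TOKENS_PER_PARAM = 20       # rough rule of thumb associated with compute-optimal training

def compute_optimal_split(compute_budget_flops):
    """Given a training compute budget C, return an illustrative (parameters N, tokens D)
    split under the assumptions C = 6 * N * D and D = 20 * N."""
    n_params = math.sqrt(compute_budget_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# Example: a budget of ~5.8e23 FLOPs yields roughly 70B parameters and 1.4T tokens.
n, d = compute_optimal_split(5.8e23)
print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.1f}T tokens")
```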
Capabilities
LLMs can perform a range of language tasks, such as translation, question answering, summarization, and code generation, by conditioning on prompts rather than undergoing task-specific fine-tuning. Empirical results on large benchmarks show that scaling improves performance on many tasks and that few-shot and instruction-following behavior can be elicited with carefully constructed prompts, as in the sketch below. [2][3]
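The following is a minimal sketch of how a few-shot prompt might be assembled for a translation task, in the spirit of the in-context demonstrations described in [2]; the example pairs, delimiters, and prompt wording are illustrative assumptions rather than a prescribed format.

```python
# Hypothetical few-shot prompt construction: the model is shown a short task
# description and a few input/output pairs, then asked to continue the pattern.
examples = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]

def build_few_shot_prompt(query: str) -> str:
    lines = ["Translate English to French."]
    for en, fr in examples:
        lines.append(f"English: {en}\nFrench: {fr}")
    lines.append(f"English: {query}\nFrench:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt("otter")
print(prompt)
# The assembled prompt is passed to the model, which is expected to continue
# the pattern with the French translation; no model weights are updated.
```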
Limitations and risks
Despite strong performance, LLMs can produce confident but incorrect outputs, inherit biases from their training data, and be misused to generate harmful or deceptive content. They are also expensive to train and run, raising environmental and economic concerns, and their emergent capabilities are not yet fully understood. [6]
Critical analyses have emphasized the environmental costs of training large models, the risks of documentation debt in massive web-scale datasets, and the amplification of social biases and harms when uncurated data are used at scale. [7]
- ^ Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; et al. (2017-06-12). Attention Is All You Need. arXiv. doi:10.48550/arXiv.1706.03762. https://arxiv.org/abs/1706.03762.
- ^a ^b ^c Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; et al. (2020-05-28). Language Models are Few-Shot Learners. arXiv. doi:10.48550/arXiv.2005.14165. https://arxiv.org/abs/2005.14165.
- ^a ^b OpenAI (2023-03-15). GPT-4 Technical Report. arXiv. doi:10.48550/arXiv.2303.08774. https://arxiv.org/abs/2303.08774.
- ^ OpenAI (2022-11-30). Introducing ChatGPT. https://openai.com/index/chatgpt/.
- ^ Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; et al. (2022-03-29). Training Compute-Optimal Large Language Models. arXiv. doi:10.48550/arXiv.2203.15556. https://arxiv.org/abs/2203.15556.
- ^ Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; et al. (2021-08-16). On the Opportunities and Risks of Foundation Models. arXiv. doi:10.48550/arXiv.2108.07258. https://arxiv.org/abs/2108.07258.
- ^ Bender, Emily M.; Gebru, Timnit; McMillan-Major, Angelina; Shmitchell, Shmargaret (2021-03-03). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. doi:10.1145/3442188.3445922. https://doi.org/10.1145/3442188.3445922.