Large language model
Large language models (LLMs) are AI systems that generate and interpret text by learning patterns from very large datasets. Prominent examples include GPT-3 and GPT-4, which helped popularize large-scale pretrained models and their downstream use, and ChatGPT, which reached a broad public audience through conversational access. [1][2][3]
Scaling the size of pretrained language models and the amount of training data has been shown to improve task-agnostic performance, including few-shot behavior, in which a model performs a task after being shown only a small number of examples in its prompt. [1]
Research on compute-optimal training suggests that model size and training tokens should scale together to maximize performance under a fixed compute budget, informing how large models are trained and deployed. [4]
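As a rough illustration of this relationship, the sketch below applies two widely used approximations, training compute C ≈ 6·N·D for N parameters and D training tokens, and the roughly 20-tokens-per-parameter ratio reported by Hoffmann et al. [4], to split a fixed FLOPs budget between model size and data. The exact coefficients are simplifying assumptions rather than values given in this article.

```python
# Illustrative sketch only: uses the common approximation C ~ 6*N*D for
# training FLOPs and an assumed ~20 tokens-per-parameter ratio in the
# spirit of compute-optimal scaling results [4].

def compute_optimal_split(flops_budget: float, tokens_per_param: float = 20.0):
    """Return an approximate (parameters, tokens) pair for a FLOPs budget."""
    # With D = r*N and C = 6*N*D, it follows that C = 6*r*N**2, so:
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    params, tokens = compute_optimal_split(1e23)  # hypothetical compute budget
    print(f"~{params:.2e} parameters, ~{tokens:.2e} training tokens")
```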
Architecture
Most contemporary LLMs are built on the Transformer architecture, which uses attention mechanisms to model relationships within sequences and supports large-scale sequence transduction. [5]
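The sketch below illustrates the scaled dot-product attention at the core of the Transformer, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))·V, as defined by Vaswani et al. [5]; the tensor shapes and random inputs are illustrative assumptions.

```python
# Minimal NumPy sketch of scaled dot-product attention [5].
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> output of shape (n_q, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ V                             # attention-weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 queries, key dimension 8 (example shapes)
K = rng.normal(size=(6, 8))    # 6 keys
V = rng.normal(size=(6, 16))   # 6 values, value dimension 16
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 16)
```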
Capabilities
LLMs can perform a range of language tasks, including translation, question answering, summarization, and simple code generation, by conditioning on prompts rather than task-specific training data. Empirical results on broad benchmarks show that scaling improves performance on many of these tasks and that new tasks can often be elicited with carefully constructed prompts. [1][2]
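The sketch below illustrates prompt-based, few-shot task specification in this sense: the task (here a hypothetical English-to-French translation, echoing examples discussed in [1]) is defined entirely by a short instruction and a handful of worked examples placed in the prompt, with no task-specific training. The template and example pairs are assumptions for illustration.

```python
# Minimal sketch of constructing a few-shot prompt; the resulting string
# would be sent to an LLM as its input, and the model is expected to
# continue the pattern. Template and examples are hypothetical.

def build_few_shot_prompt(examples, query, instruction="Translate English to French."):
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"English: {source}")
        lines.append(f"French: {target}")
        lines.append("")
    lines.append(f"English: {query}")
    lines.append("French:")  # the model completes the translation from here
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    examples=[("cheese", "fromage"), ("good morning", "bonjour")],
    query="thank you very much",
)
print(prompt)
```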
Limitations and risks
Despite strong performance, LLMs can produce confident but incorrect outputs, inherit biases from training data, and be misused for harmful or deceptive content. Training and deployment are resource-intensive, with environmental and economic costs, and large-scale web datasets can introduce documentation debt and amplify social biases when used without careful curation. The mechanisms behind emergent capabilities remain only partially understood. [6][7]
1. Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; et al. (2020-05-28). "Language Models are Few-Shot Learners". arXiv. https://arxiv.org/abs/2005.14165.
2. OpenAI (2023-03-15). "GPT-4 Technical Report". arXiv. https://arxiv.org/abs/2303.08774.
3. OpenAI (2022-11-30). "Introducing ChatGPT". https://openai.com/index/chatgpt/.
4. Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; et al. (2022-03-29). "Training Compute-Optimal Large Language Models". arXiv. https://arxiv.org/abs/2203.15556.
5. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; et al. (2017-06-12). "Attention Is All You Need". arXiv. https://arxiv.org/abs/1706.03762.
6. Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; et al. (2021-08-16). "On the Opportunities and Risks of Foundation Models". arXiv. https://arxiv.org/abs/2108.07258.
7. Bender, Emily M.; Gebru, Timnit; McMillan-Major, Angelina; Shmitchell, Shmargaret (2021-03-03). "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?". FAccT ’21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. https://doi.org/10.1145/3442188.3445922.