Large language model

Large language models (LLMs) are neural network language models trained on very large text corpora to predict and generate natural language. Contemporary LLMs are typically built on the Transformer architecture, which replaces recurrent or convolutional sequence models with attention mechanisms for sequence transduction. [1]
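
A minimal sketch of the scaled dot-product attention at the core of the Transformer, written in NumPy. The shapes and variable names are illustrative assumptions, not any particular library's implementation; it computes softmax(Q Kᵀ / √d_k) V for a single attention head.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

        Q, K: arrays of shape (seq_len, d_k); V: array of shape (seq_len, d_v).
        """
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity between positions
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        return weights @ V  # each output is a weighted sum of value vectors

    # Toy example: 4 positions with 8-dimensional queries, keys, and values.
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 8))
    K = rng.normal(size=(4, 8))
    V = rng.normal(size=(4, 8))
    out = scaled_dot_product_attention(Q, K, V)  # shape (4, 8)

In a full Transformer this operation is applied in parallel across multiple heads and stacked layers, but the attention step itself reduces to this weighted averaging of value vectors.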

Scaling up pretrained language models and the amount of training data has been shown to improve task-agnostic performance, including few-shot behavior, in which a model performs a new task from only a small number of in-context examples, without any gradient updates. [2]
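
A schematic illustration of few-shot prompting in the sense of Brown et al.: the task is specified entirely through a few demonstrations placed in the model's context. The translation task and prompt formatting below are illustrative assumptions, and the generate call is a placeholder rather than a real API.

    # Illustrative few-shot prompt: the model sees a few demonstrations of a
    # task (here, English-to-French translation) followed by a new query, and
    # is expected to continue the pattern without any parameter updates.
    examples = [
        ("sea otter", "loutre de mer"),
        ("cheese", "fromage"),
        ("peppermint", "menthe poivrée"),
    ]

    prompt = "Translate English to French:\n"
    for english, french in examples:
        prompt += f"{english} => {french}\n"
    prompt += "plush giraffe => "  # the model completes this line

    # `generate` is a hypothetical stand-in for whatever model interface is used.
    # completion = generate(prompt)
    print(prompt)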

Research on compute-optimal training suggests that model size and the number of training tokens should be scaled in roughly equal proportion to maximize performance under a fixed compute budget, a result that informs how large models are trained and deployed. [3]
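
A back-of-the-envelope sketch of this trade-off, assuming the commonly used approximation C ≈ 6ND for training FLOPs (N parameters, D tokens) and a rough rule of thumb of about 20 training tokens per parameter consistent with the compute-optimal results; the constants are approximations for illustration, not the fitted values from the paper.

    import math

    def compute_optimal_allocation(compute_flops, tokens_per_param=20.0):
        """Split a training compute budget between parameters N and tokens D.

        Assumes C ~= 6 * N * D and D ~= tokens_per_param * N, which gives
        N ~= sqrt(C / (6 * tokens_per_param)) and D ~= tokens_per_param * N.
        """
        n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
        n_tokens = tokens_per_param * n_params
        return n_params, n_tokens

    # Example: a budget of roughly 5.8e23 FLOPs yields on the order of
    # 70 billion parameters trained on about 1.4 trillion tokens.
    n, d = compute_optimal_allocation(5.8e23)
    print(f"~{n / 1e9:.0f}B parameters trained on ~{d / 1e12:.1f}T tokens")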

  1. ^ Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; et al. (2017-06-12). "Attention Is All You Need". arXiv:1706.03762. https://doi.org/10.48550/arXiv.1706.03762.
  2. ^ Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; et al. (2020-05-28). "Language Models are Few-Shot Learners". arXiv:2005.14165. https://doi.org/10.48550/arXiv.2005.14165.
  3. ^ Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; et al. (2022-03-29). "Training Compute-Optimal Large Language Models". arXiv:2203.15556. https://doi.org/10.48550/arXiv.2203.15556.