Transformer (deep learning architecture) - Wikipedia
A transformer is a deep learning architecture developed by researchers at Google and based on the multi-head attention mechanism, proposed in the 2017 paper "Attention Is All You Need". Text is converted to numerical representations called tokens, and each token is converted into a vector via looking up from …
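The text-to-token-to-vector step described above can be sketched roughly as follows. This is only an illustrative toy, not the method of any particular model: the vocabulary `VOCAB`, the whitespace tokenizer, the dimension `EMBED_DIM`, and the randomly filled table are all assumptions made for the example. Real systems typically use subword tokenizers and a table whose values are learned during training.

```python
import random

# Hypothetical toy vocabulary; real tokenizers use subword units, not whole words.
VOCAB = {"<unk>": 0, "attention": 1, "is": 2, "all": 3, "you": 4, "need": 5}
EMBED_DIM = 4  # illustrative only; real models use far larger dimensions


def tokenize(text):
    """Map lowercased, whitespace-separated words to integer token ids."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]


def make_embedding_table(vocab_size, dim, seed=0):
    """Build a vocab_size x dim table of random floats (learned in a real model)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed(token_ids, table):
    """Look up one vector per token id by row index."""
    return [table[i] for i in token_ids]


table = make_embedding_table(len(VOCAB), EMBED_DIM)
ids = tokenize("Attention is all you need")
vectors = embed(ids, table)
print(ids)                            # [1, 2, 3, 4, 5]
print(len(vectors), len(vectors[0]))  # 5 4
```

Words missing from the toy vocabulary fall back to the `<unk>` id, so `tokenize("transformers")` returns `[0]`.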
Predecessors
For many years, sequence modelling and generation was done by using plain recurrent neural networks (RNNs). A well-cited early example was the …
Methods for stabilizing training
The plain transformer architecture had difficulty converging. In the original paper the authors recommended using learning rate warmup. That is, the learning rate should linearly scale up from 0 to maximal value for the first part of …
Alternative activation functions
The original transformer uses the ReLU activation function. Other activation functions were developed. …
The transformer has had great success in natural language processing (NLP). Many large language models such as GPT-2, GPT-3, …
All transformers have the same primary components:
• Tokenizers, which convert text into tokens.
Sublayers
Each encoder layer contains 2 sublayers: the self-attention and the feedforward network. Each decoder layer contains 3 sublayers: the causally masked self-attention, the cross-attention, and the feedforward network.
• seq2seq – Family of machine learning approaches
• Perceiver – Variant of Transformer designed for multimodal data
• Vision transformer – Variant of Transformer designed for vision processing
Wikipedia text under the CC-BY-SA license
[2009.06732] Efficient Transformers: A Survey - arXiv.org
[2001.04451] Reformer: The Efficient Transformer - arXiv.org
Dual-former: Hybrid Self-attention Transformer for Efficient Image ...
The Transformer Model - MachineLearningMastery.com
How Transformers Work: A Detailed Exploration of …
January 9, 2024 · A transformer is a type of artificial intelligence model that learns to understand and generate human-like text by analyzing patterns in large amounts of text data. Transformers are a current state-of-the-art …
A Historical Survey of Advances in Transformer Architectures
The Ultimate Guide to Transformer Deep Learning
A Transformer is a deep learning model that adopts the self-attention mechanism. This model also analyzes the input data by weighting each component differently. It is used primarily in artificial intelligence (AI) and …
Zero shot health trajectory prediction using transformer
A Deep Dive Into the Transformer Architecture — The …
July 21, 2020 · The introduction of the vanilla Transformer in 2017 disrupted sequence-based deep learning significantly. By doing away with recurrent connections entirely, transformer architectures are better …
Transformer (deep learning architecture)#Efficient implementa…