  1. Transformer (deep learning architecture) - Wikipedia

    A transformer is a deep learning architecture developed by researchers at Google and based on the multi-head attention mechanism, proposed in the 2017 paper "Attention Is All You Need". Text is converted to numerical representations called tokens, and each token is converted into a vector by looking it up from …
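
The token-to-vector lookup described above can be sketched roughly as follows. This is a minimal illustration, not the method of any particular model: the vocabulary, the whitespace tokenizer, the table values, and the names `vocab`, `embedding_table`, `tokenize`, and `embed` are all hypothetical. The snippet is truncated before it names what the vector is looked up from, so the table here is an assumption.

```python
# Hypothetical toy vocabulary: maps each known word to an integer token id.
vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}

# Hypothetical lookup table: row i holds the vector for token id i.
# The values are made up for illustration; real tables are learned.
embedding_table = [
    [0.1, 0.3],
    [0.7, -0.2],
    [-0.5, 0.4],
    [0.0, 0.0],
]

def tokenize(text):
    """Split on whitespace and map words to ids (an assumption:
    production tokenizers typically work on subword units)."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def embed(token_ids):
    """Convert each token id into its vector by a plain table lookup."""
    return [embedding_table[token_id] for token_id in token_ids]

vectors = embed(tokenize("The cat sat"))
```

Unknown words fall back to the `<unk>` row in this sketch, which is one common convention rather than something the snippet states.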

    Predecessors
    For many years, sequence modelling and generation were done using plain recurrent neural networks (RNNs). A well-cited early example was the …

    Methods for stabilizing training
    The plain transformer architecture had difficulty converging. In the original paper the authors recommended using learning rate warmup. That is, the learning rate should linearly scale up from 0 to its maximal value for the first part of …
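
As a rough sketch of the linear warmup just described: the learning rate rises linearly from 0 to its maximum over an initial number of steps. The snippet is truncated before it says what happens after warmup, so holding the rate constant afterwards is an assumption here, as are the function name `warmup_lr` and its parameters.

```python
def warmup_lr(step, warmup_steps, max_lr):
    """Linear learning-rate warmup.

    Scales the rate linearly from 0 at step 0 up to max_lr at
    step == warmup_steps. Assumption: after warmup the rate is simply
    held at max_lr; the source does not say what follows.
    """
    if warmup_steps <= 0:
        # No warmup requested: use the maximal rate from the start.
        return max_lr
    return max_lr * min(1.0, step / warmup_steps)

# Learning rates for the first few steps of a hypothetical 4-step warmup.
schedule = [warmup_lr(s, 4, 1.0) for s in range(6)]
```

With a 4-step warmup and a maximal rate of 1.0, the rate climbs in equal increments and then stays flat.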

    Alternative activation functions
    The original transformer uses the ReLU activation function. Other activation functions have since been developed. …
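
For reference, ReLU replaces every negative input with zero and passes non-negative inputs through unchanged. A minimal sketch in plain Python (the helper name `relu` is just for illustration):

```python
def relu(values):
    """Apply ReLU element-wise: max(0, x) for each x."""
    return [max(0.0, x) for x in values]

activated = relu([-1.5, 0.0, 2.0])
```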

    The transformer has had great success in natural language processing (NLP). Many large language models such as GPT-2, GPT-3, …

    [Images: Overview; Architecture; Full transformer architecture]

    All transformers have the same primary components:
    • Tokenizers, which convert text into tokens. …

    Sublayers
    Each encoder layer contains two sublayers: the self-attention and the feedforward network. Each decoder layer contains three sublayers: the causally masked self-attention, the cross-attention, and the feedforward network.
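
To illustrate what "causally masked" means for the decoder's self-attention sublayer, here is a simplified single-head sketch in plain Python. It assumes the queries, keys, and values are the input vectors themselves, with no learned projections and no multiple heads, which real transformer layers do have. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x):
    """Single-head scaled dot-product self-attention with a causal mask.

    x is a list of n vectors of dimension d. Assumption: queries, keys
    and values are all x itself (no learned weight matrices).
    """
    d = len(x[0])
    outputs = []
    for i in range(len(x)):
        # Causal mask: position i attends only to positions 0..i,
        # never to later positions.
        scores = [
            sum(q * k for q, k in zip(x[i], x[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible vectors.
        outputs.append(
            [sum(w * x[j][c] for j, w in enumerate(weights)) for c in range(d)]
        )
    return outputs

out = causal_self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Because of the mask, changing a later token leaves the outputs at earlier positions unchanged, and the first position can only attend to itself.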

    seq2seq – Family of machine learning approaches
    Perceiver – Variant of Transformer designed for multimodal data
    Vision transformer – Variant of Transformer designed for vision processing

    Wikipedia text under the CC-BY-SA license
  2. [2009.06732] Efficient Transformers: A Survey - arXiv.org

  3. [2001.04451] Reformer: The Efficient Transformer - arXiv.org

  4. Dual-former: Hybrid Self-attention Transformer for Efficient Image ...

  5. The Transformer Model - MachineLearningMastery.com

  6. How Transformers Work: A Detailed Exploration of …

    January 9, 2024 · A transformer is a type of artificial intelligence model that learns to understand and generate human-like text by analyzing patterns in large amounts of text data. Transformers are a current state-of-the-art …

  7. A Historical Survey of Advances in Transformer Architectures

  8. The Ultimate Guide to Transformer Deep Learning

    A Transformer is a deep learning model that adopts the self-attention mechanism. This model also analyzes the input data by weighting each component differently. It is used primarily in artificial intelligence (AI) and …

  9. Zero shot health trajectory prediction using transformer

  10. A Deep Dive Into the Transformer Architecture — The …

    July 21, 2020 · The introduction of the vanilla Transformer in 2017 disrupted sequence-based deep learning significantly. By doing away with recurrent connections entirely, transformer architectures are better …