Introduction

The input and output sequences do not need to be the same length; the output length is decided by the model itself. A model that works this way is called a Seq2Seq (sequence-to-sequence) model.

(figure: Seq2Seq model structure)

A Seq2Seq model is made of two parts: an encoder and a decoder.

The Transformer is an enhanced Seq2Seq model.
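To make the encoder-decoder split concrete, here is a minimal sketch using PyTorch's `nn.Transformer`; the dimensions, sequence lengths, and random tensors are illustrative assumptions, not values from the notes.

```python
import torch
import torch.nn as nn

# Minimal Seq2Seq sketch: source and target lengths differ.
model = nn.Transformer(d_model=64, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2)

src = torch.rand(10, 1, 64)  # input sequence: 10 steps (e.g. speech frames)
tgt = torch.rand(4, 1, 64)   # output sequence so far: 4 steps (e.g. characters)

out = model(src, tgt)        # output length follows the target, not the source
print(out.shape)             # torch.Size([4, 1, 64])
```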

Encoder

(figure: the encoder in the Transformer)


The encoder is not a single layer; it is a stack of multiple identical blocks.

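A minimal sketch of this stacking, using PyTorch's built-in encoder layer; the layer count and dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

# The encoder is not one layer but a stack of identical blocks.
block = nn.TransformerEncoderLayer(d_model=64, nhead=4)
encoder = nn.TransformerEncoder(block, num_layers=6)  # 6 stacked blocks

x = torch.rand(10, 1, 64)  # (seq_len, batch, d_model)
h = encoder(x)             # same shape; each block refines the representation
print(h.shape)             # torch.Size([10, 1, 64])
```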

Decoder

There are two kinds of decoders; this section introduces the first one, the autoregressive (AT) decoder.

The decoder generates the output sequence.

There is a way to use the sequence generated so far as the decoder's input: starting from a special BEGIN token, the decoder's own output is fed back in at each step.

Each output step is a distribution over a dictionary (the vocabulary); after a softmax, the token with the highest probability is chosen, just like a classification problem.


The next character is generated based on the characters produced so far.


What about error propagation? Since each step conditions on earlier outputs, one wrong token can mislead everything that follows.

How does the model decide the output length?

We need a special 'stop' token in the dictionary; when the decoder outputs it, generation ends.
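A minimal sketch of this greedy autoregressive loop; the `decoder` model and the token ids `BEGIN_ID` and `END_ID` are hypothetical names for illustration, not from the notes.

```python
import torch

BEGIN_ID, END_ID = 0, 1  # hypothetical ids of the special BEGIN and stop tokens

def greedy_decode(decoder, memory, max_len=50):
    """Generate one token at a time; each step conditions on the tokens so far."""
    tokens = [BEGIN_ID]
    for _ in range(max_len):
        inp = torch.tensor(tokens).unsqueeze(0)       # (1, t): output so far as input
        logits = decoder(inp, memory)                 # (1, t, vocab_size)
        probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the dictionary
        next_id = int(probs.argmax())                 # take the most probable token
        if next_id == END_ID:                         # the special 'stop' token
            break                                     # model decides the length itself
        tokens.append(next_id)                        # feed the output back as input
    return tokens[1:]
```

Because each step feeds its own (possibly wrong) output back in, a mistake early in the loop can mislead every later step, which is exactly the error-propagation concern above.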

NAT (Non-Autoregressive)

The output is not generated one token after another; all positions are produced in parallel.

The output has the same length as the number of BEGIN tokens fed in.


For example, in speech synthesis the speech can be slowed down by feeding in more BEGIN tokens, which lengthens the output.

While NAT decodes in parallel and is therefore faster, it usually performs worse than AT.
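A minimal sketch of the non-autoregressive idea; the `decoder` and `out_len` are hypothetical, and in practice the output length would come from a separate length predictor.

```python
import torch

BEGIN_ID = 0  # hypothetical id of the special BEGIN token

def nat_decode(decoder, memory, out_len):
    """Generate every position in one parallel pass."""
    inp = torch.full((1, out_len), BEGIN_ID)   # one BEGIN token per output position
    logits = decoder(inp, memory)              # (1, out_len, vocab_size), single pass
    return logits.argmax(dim=-1)[0].tolist()   # all tokens decoded at once
```

Increasing `out_len` (feeding more BEGIN tokens) directly lengthens the output, which is what makes the length controllable.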

Putting it together

(figure: the full model with encoder and decoder put together)

CROSS-ATTENTION

(figure: cross-attention between encoder and decoder)

In cross-attention, the queries come from the decoder while the keys and values come from the encoder output; this is how the decoder reads information from the encoder.
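A minimal single-head sketch of cross-attention with illustrative dimensions; `W_q`, `W_k`, `W_v` stand for the usual learned projections.

```python
import torch
import torch.nn as nn

d = 64
W_q, W_k, W_v = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)

enc_out = torch.rand(1, 10, d)  # encoder output: keys and values come from here
dec_h   = torch.rand(1, 4, d)   # decoder hidden states: queries come from here

Q, K, V = W_q(dec_h), W_k(enc_out), W_v(enc_out)
scores = Q @ K.transpose(-2, -1) / d ** 0.5  # (1, 4, 10) decoder-to-encoder scores
attn = torch.softmax(scores, dim=-1)         # each decoder step attends over encoder steps
out = attn @ V                               # (1, 4, 64) encoder info pulled into decoder
print(out.shape)
```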

