Theoretical Analysis of Positional Encodings in Transformer Models
(arxiv.org)
36 points | by
PaulHoule
1 day ago
1 comment
semiinfinitely
1 day ago
Kinda disappointing that RoPE, the most common PE, is given about one sentence in this work and omitted from the analysis.
gsf_emergency_2
22 hours ago
Maybe it's because RoPE by itself does nothing for model capacity?
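(RoPE adds no learned parameters: it's a fixed, position-dependent rotation of query/key pairs, so relative offsets show up in the dot products. A rough NumPy sketch, assuming the half-pairing layout; details vary across implementations:

  import numpy as np

  def rope(x, base=10000.0):
      # x: (seq_len, dim), dim even. Rotate each (x1, x2) pair by a
      # position-dependent angle -- no trainable weights anywhere.
      seq_len, dim = x.shape
      half = dim // 2
      freqs = base ** (-np.arange(half) / half)     # per-pair frequencies
      angles = np.outer(np.arange(seq_len), freqs)  # (seq_len, half)
      cos, sin = np.cos(angles), np.sin(angles)
      x1, x2 = x[:, :half], x[:, half:]
      return np.concatenate([x1 * cos - x2 * sin,
                             x1 * sin + x2 * cos], axis=-1)

Applied to queries and keys before attention, it changes what positions the scores can express, not how many parameters the model has.)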
semiinfinitely
6 hours ago
Neither does ALiBi.
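(ALiBi likewise adds zero parameters: it just biases the attention logits with a fixed linear penalty on distance. A minimal sketch, assuming the usual geometric per-head slope schedule, which is exact for power-of-two head counts:

  import numpy as np

  def alibi_bias(seq_len, num_heads):
      # Per-head slopes: 2^(-8i/n) for heads i = 1..n, as in the paper.
      slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
      pos = np.arange(seq_len)
      dist = np.minimum(pos[None, :] - pos[:, None], 0)  # j - i, clipped
      return slopes[:, None, None] * dist                # (heads, L, L)

The bias is added to the logits before softmax; causal masking is handled separately.)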