Hi,
I'm looking through the Differential Transformer paper and code,
and I found that the GitHub version is based on flash attention and rotary embedding.
I wonder whether there is any plan to upload a simple examp...