DefTruth/CUDA-Learn-Notes Events in 2024 - Ecosyste.ms: Timeline

DefTruth created a branch on DefTruth/CUDA-Learn-Notes

October 20, 2024 5:11am

opt-hgemm-mma - 🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

DefTruth deleted a branch DefTruth/CUDA-Learn-Notes

October 20, 2024 5:11am

opt-hgemm-mma

DefTruth pushed 2 commits to opt-hgemm-mma DefTruth/CUDA-Learn-Notes

October 20, 2024 5:10am

[HGEMM] update HGEMM benchmark option (#95) * update hgemm benchmark option * update hgemm benchmark option * ... 0c29631
Merge branch 'main' of github.com:DefTruth/CUDA-Learn-Notes into opt-hgemm-mma 465a960


DefTruth pushed 1 commit to main DefTruth/CUDA-Learn-Notes

October 20, 2024 5:10am

[HGEMM] update HGEMM benchmark option (#95) * update hgemm benchmark option * update hgemm benchmark option * ... 0c29631


DefTruth closed a pull request on DefTruth/CUDA-Learn-Notes

October 20, 2024 5:10am

[HGEMM] update HGEMM benchmark option

DefTruth opened a pull request on DefTruth/CUDA-Learn-Notes

October 20, 2024 5:09am

[HGEMM] update HGEMM benchmark option

DefTruth pushed 1 commit to opt-hgemm-mma DefTruth/CUDA-Learn-Notes

October 20, 2024 5:08am

update hgemm benchmark option f899cd7


DefTruth pushed 1 commit to opt-hgemm-mma DefTruth/CUDA-Learn-Notes

October 20, 2024 3:13am

update hgemm benchmark option 2984f19


kevin-hxq starred DefTruth/CUDA-Learn-Notes

October 20, 2024 3:02am

DefTruth pushed 1 commit to opt-hgemm-mma DefTruth/CUDA-Learn-Notes

October 20, 2024 2:39am

update hgemm benchmark option d57932a


DefTruth closed an issue on DefTruth/CUDA-Learn-Notes

October 20, 2024 2:19am

Hello, a question about the reduce-related code

1. `sum = warp_reduce_sum<NUM_WARPS>(sum);` 2. `if(warp==0) sum = warp_reduce_sum<NUM_WARPS>(sum);` 0x03 warp/block reduce sum/max, 0x09 softmax, and softmax + vec4 use the first form when computing the final sum. 0x04 bl...

DefTruth closed an issue on DefTruth/CUDA-Learn-Notes

October 20, 2024 2:19am

What does __threadfence() do?

Have you tested the `__threadfence()` in 0x09 softmax? It does not seem able to synchronize threads at the grid level.

DefTruth closed an issue on DefTruth/CUDA-Learn-Notes

October 20, 2024 2:18am

Layer norm implementation

Is the layer norm implementation in the README actually batch norm?

DefTruth created a branch on DefTruth/CUDA-Learn-Notes

October 20, 2024 1:36am

opt-hgemm-mma - 🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

DefTruth deleted a branch DefTruth/CUDA-Learn-Notes

October 20, 2024 1:36am

opt-hgemm-mma

github-actions[bot] created a comment on an issue on DefTruth/CUDA-Learn-Notes

October 20, 2024 1:35am

This issue is stale because it has been open for 30 days with no activity.


wxing2008666 starred DefTruth/CUDA-Learn-Notes

October 19, 2024 2:26pm

tiankonglongx starred DefTruth/CUDA-Learn-Notes

October 19, 2024 1:06pm

yahooo-m starred DefTruth/CUDA-Learn-Notes

October 19, 2024 11:41am

DefTruth pushed 2 commits to opt-hgemm-mma DefTruth/CUDA-Learn-Notes

October 19, 2024 10:02am

[HGEMM] Add GeForce RTX 3080 Laptop benchmark (#94) * update hgemm benchmark * update hgemm benchmark ce095b5
Merge branch 'main' of github.com:DefTruth/CUDA-Learn-Notes into opt-hgemm-mma 6ea828e


DefTruth pushed 1 commit to main DefTruth/CUDA-Learn-Notes

October 19, 2024 9:57am

[HGEMM] Add GeForce RTX 3080 Laptop benchmark (#94) * update hgemm benchmark * update hgemm benchmark ce095b5


DefTruth closed a pull request on DefTruth/CUDA-Learn-Notes

October 19, 2024 9:57am

[HGEMM] Add GeForce RTX 3080 Laptop benchmark

DefTruth opened a pull request on DefTruth/CUDA-Learn-Notes

October 19, 2024 9:57am

[HGEMM] Add GeForce RTX 3080 Laptop benchmark

DefTruth pushed 1 commit to opt-hgemm-mma DefTruth/CUDA-Learn-Notes

October 19, 2024 9:50am

update hgemm benchmark be9742d


DefTruth pushed 1 commit to opt-hgemm-mma DefTruth/CUDA-Learn-Notes

October 19, 2024 9:39am

update hgemm benchmark b3eeb55


CodeSlogan starred DefTruth/CUDA-Learn-Notes

October 19, 2024 8:13am

DefTruth created a branch on DefTruth/CUDA-Learn-Notes

October 19, 2024 2:59am

opt-hgemm-mma - 🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

DefTruth deleted a branch DefTruth/CUDA-Learn-Notes

October 19, 2024 2:59am

opt-hgemm-mma-2

DefTruth pushed 1 commit to main DefTruth/CUDA-Learn-Notes

October 19, 2024 2:57am

[Docs] rename mat_transpose -> mat-transpose (#93) * Update sgemm_wmma_tf32_stage.cu * Update sgemm.py * Updat... 523a610
