Ecosyste.ms: Timeline

Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.

DefTruth/CUDA-Learn-Notes

DefTruth created a branch on DefTruth/CUDA-Learn-Notes

opt-hgemm-mma - 🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

DefTruth deleted a branch DefTruth/CUDA-Learn-Notes

opt-hgemm-mma

DefTruth pushed 2 commits to opt-hgemm-mma DefTruth/CUDA-Learn-Notes
  • [HGEMM] update HGEMM benchmark option (#95) * update hgemm benchmark option * update hgemm benchmark option * ... 0c29631
  • Merge branch 'main' of github.com:DefTruth/CUDA-Learn-Notes into opt-hgemm-mma 465a960

DefTruth pushed 1 commit to main DefTruth/CUDA-Learn-Notes
  • [HGEMM] update HGEMM benchmark option (#95) * update hgemm benchmark option * update hgemm benchmark option * ... 0c29631

DefTruth closed a pull request on DefTruth/CUDA-Learn-Notes
[HGEMM] update HGEMM benchmark option
DefTruth opened a pull request on DefTruth/CUDA-Learn-Notes
[HGEMM] update HGEMM benchmark option
DefTruth pushed 1 commit to opt-hgemm-mma DefTruth/CUDA-Learn-Notes
  • update hgemm benchmark option f899cd7

DefTruth pushed 1 commit to opt-hgemm-mma DefTruth/CUDA-Learn-Notes
  • update hgemm benchmark option 2984f19

kevin-hxq starred DefTruth/CUDA-Learn-Notes
DefTruth pushed 1 commit to opt-hgemm-mma DefTruth/CUDA-Learn-Notes
  • update hgemm benchmark option d57932a

DefTruth closed an issue on DefTruth/CUDA-Learn-Notes
Hello, a question about the reduce-related code
1. `sum = warp_reduce_sum<NUM_WARPS>(sum);` 2. `if(warp==0) sum = warp_reduce_sum<NUM_WARPS>(sum);` 0x03 warp/block reduce sum/max, 0x09 softmax, and softmax + vec4 use the first form when computing the final sum; 0x04 bl...
DefTruth closed an issue on DefTruth/CUDA-Learn-Notes
What `__threadfence()` does
Have you tested the `__threadfence()` in 0x09 softmax? It does not seem able to synchronize threads at the grid level.
DefTruth closed an issue on DefTruth/CUDA-Learn-Notes
Layer norm implementation
Is the layer norm implementation in the README actually batch norm?
DefTruth created a branch on DefTruth/CUDA-Learn-Notes

opt-hgemm-mma - 🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

DefTruth deleted a branch DefTruth/CUDA-Learn-Notes

opt-hgemm-mma

github-actions[bot] created a comment on an issue on DefTruth/CUDA-Learn-Notes
This issue is stale because it has been open for 30 days with no activity.

wxing2008666 starred DefTruth/CUDA-Learn-Notes
tiankonglongx starred DefTruth/CUDA-Learn-Notes
yahooo-m starred DefTruth/CUDA-Learn-Notes
DefTruth pushed 2 commits to opt-hgemm-mma DefTruth/CUDA-Learn-Notes
  • [HGEMM] Add GeForce RTX 3080 Laptop benchmark (#94) * update hgemm benchmark * update hgemm benchmark ce095b5
  • Merge branch 'main' of github.com:DefTruth/CUDA-Learn-Notes into opt-hgemm-mma 6ea828e

DefTruth pushed 1 commit to main DefTruth/CUDA-Learn-Notes
  • [HGEMM] Add GeForce RTX 3080 Laptop benchmark (#94) * update hgemm benchmark * update hgemm benchmark ce095b5

DefTruth pushed 1 commit to opt-hgemm-mma DefTruth/CUDA-Learn-Notes

DefTruth pushed 1 commit to opt-hgemm-mma DefTruth/CUDA-Learn-Notes

CodeSlogan starred DefTruth/CUDA-Learn-Notes
DefTruth created a branch on DefTruth/CUDA-Learn-Notes

opt-hgemm-mma - 🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

DefTruth deleted a branch DefTruth/CUDA-Learn-Notes

opt-hgemm-mma-2

DefTruth pushed 1 commit to main DefTruth/CUDA-Learn-Notes
  • [Docs] rename mat_transpose -> mat-transpose (#93) * Update sgemm_wmma_tf32_stage.cu * Update sgemm.py * Updat... 523a610
