Ecosyste.ms: Timeline
Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.
DefTruth opened a pull request on DefTruth/CUDA-Learn-Notes
[Docs] rename mat_transpose -> mat-transpose
DefTruth pushed 1 commit to opt-hgemm-mma-2 DefTruth/CUDA-Learn-Notes
- update hgemm benchmark e98439c
DefTruth pushed 1 commit to opt-hgemm-mma-2 DefTruth/CUDA-Learn-Notes
- update hgemm benchmark 0092b7b
DefTruth pushed 3 commits to opt-hgemm-mma-2 DefTruth/CUDA-Learn-Notes
DefTruth pushed 1 commit to opt-hgemm-mma-2 DefTruth/CUDA-Learn-Notes
- Update hgemm_wmma_stage.cu 5f7935f
DefTruth pushed 1 commit to main DefTruth/CUDA-Learn-Notes
- [Mat][Trans] Add f32x4_shared/bcf row/col first kernel. (#91) * [Mat][Trans] Add f32x4_shared/bcf row/col first kern... 2f854e8
DefTruth closed a pull request on DefTruth/CUDA-Learn-Notes
[Mat][Trans] Add f32x4_shared/bcf row/col first kernel.
DefTruth pushed 1 commit to opt-hgemm-mma-2 DefTruth/CUDA-Learn-Notes
- Update sgemm_wmma_tf32_stage.cu d75a69c
bear-zd created a review comment on a pull request on DefTruth/CUDA-Learn-Notes
Or I could just switch them to M and N directly.
DefTruth created a review comment on a pull request on DefTruth/CUDA-Learn-Notes
Here, S and K: in matrix semantics, M and K would be more appropriate.
DefTruth created a branch on DefTruth/CUDA-Learn-Notes
opt-hgemm-mma-2 - 🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.