Ecosyste.ms: Timeline
Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.
DefTruth created a branch on DefTruth/CUDA-Learn-Notes
hgemm-col-major-2 - 🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
DefTruth pushed 1 commit to main DefTruth/CUDA-Learn-Notes
- [HGEMM] collective store via warp shfl & reg reuse (#101) * Update hgemm_mma_stage.cu * Update hgemm.py * Update... 6c89595
DefTruth closed a pull request on DefTruth/CUDA-Learn-Notes
[HGEMM] collective store via warp shfl & reg reuse
DefTruth opened a pull request on DefTruth/CUDA-Learn-Notes
[HGEMM] 128 bits collective store via warp shfl & reg reuse
DefTruth pushed 1 commit to hgemm-col-major DefTruth/CUDA-Learn-Notes
- Update hgemm_mma_stage_col_major.cu 07c2f7b
DefTruth pushed 1 commit to hgemm-col-major DefTruth/CUDA-Learn-Notes
- Create hgemm_mma_stage_col_major.cu bb1f626
DefTruth pushed 1 commit to hgemm-col-major DefTruth/CUDA-Learn-Notes
- Update hgemm_mma_stage.cu 4dfad47
DefTruth pushed 1 commit to hgemm-col-major DefTruth/CUDA-Learn-Notes
- Update hgemm_mma_stage.cu 45f1c4d
DefTruth created a branch on DefTruth/CUDA-Learn-Notes
hgemm-col-major
DefTruth pushed 1 commit to main DefTruth/CUDA-Learn-Notes
- [HGEMM] ldmatrix.x4.trans with reg double buffers (#100) * Update hgemm_mma_stage.cu * Update hgemm_mma_stage.cu ... bcd12bd
DefTruth closed a pull request on DefTruth/CUDA-Learn-Notes
[HGEMM] ldmatrix.x4.trans with reg double buffers
DefTruth opened a pull request on DefTruth/CUDA-Learn-Notes
[HGEMM] ldmatrix.x4.trans with reg double buffers
DefTruth pushed 1 commit to opt-hgemm-mma-col-major DefTruth/CUDA-Learn-Notes
- Update hgemm.py 76c0d02
DefTruth pushed 1 commit to opt-hgemm-mma-col-major DefTruth/CUDA-Learn-Notes
- Update hgemm_mma_stage.cu f23b810
DefTruth created a branch on DefTruth/CUDA-Learn-Notes
opt-hgemm-mma-col-major
DefTruth pushed 1 commit to main DefTruth/CUDA-Learn-Notes
- [HGEMM] HGEMM MMA with Reg Double Buffers (#99) * Update hgemm_mma_stage.cu * Update hgemm_mma.cu * Update hge... 8e869ef
DefTruth closed a pull request on DefTruth/CUDA-Learn-Notes
[HGEMM] HGEMM MMA with Reg Double Buffers
DefTruth pushed 1 commit to opt-hgemm-mma DefTruth/CUDA-Learn-Notes
- Update hgemm_mma_stage.cu 1236e78