Ecosyste.ms: Timeline
Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.
DefTruth opened a pull request on DefTruth/CUDA-Learn-Notes
[HGEMM] mma4x4_warp4x4_stages with swizzle
DefTruth pushed 1 commit to opt-hgemm-mma DefTruth/CUDA-Learn-Notes
- Update hgemm_wmma_stage.cu 01d4710
DefTruth created a review comment on a pull request on DefTruth/CUDA-Learn-Notes
typo: relu -> swish?
DefTruth created a review comment on a pull request on DefTruth/CUDA-Learn-Notes
Align pragma unroll with the for loop
DefTruth created a review comment on a pull request on DefTruth/CUDA-Learn-Notes
Code style: this repository uses 2 spaces for indentation
wangzijian1010 opened a pull request on DefTruth/CUDA-Learn-Notes
[SWISH][Half] support Swish kernel
DefTruth created a branch on DefTruth/CUDA-Learn-Notes
opt-hgemm-mma - 🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
DefTruth pushed 1 commit to main DefTruth/CUDA-Learn-Notes
- [SGEMM] SGEMM TF32 Thread Block Swizzle (#84) * Update sgemm.py * Update sgemm_wmma_tf32_stage.cu * Update sge... a10bcb4
DefTruth pushed 1 commit to opt-sgemm-swizzle DefTruth/CUDA-Learn-Notes
- Update hgemm_wmma_stage.cu 9c03d0f
DefTruth pushed 1 commit to opt-sgemm-swizzle DefTruth/CUDA-Learn-Notes
- Update sgemm_wmma_tf32_stage.cu 9a49ac7
DefTruth pushed 1 commit to opt-sgemm-swizzle DefTruth/CUDA-Learn-Notes
- Update sgemm_wmma_tf32_stage.cu cfed000