Ecosyste.ms: Timeline

Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.

DefTruth/CUDA-Learn-Notes

Wenzha0Wu starred DefTruth/CUDA-Learn-Notes
DefTruth published a release on DefTruth/CUDA-Learn-Notes
v2.4.12 SGEMM TF32 Block Swizzle
What's Changed
  • [SGEMM] SGEMM TF32 Thread Block Swizzle by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/84
  • [HGEMM] mma4x4_warp4x4_stages with swizzle by @DefTruth in https...
DefTruth created a tag on DefTruth/CUDA-Learn-Notes

v2.4.12 - 🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

DefTruth created a branch on DefTruth/CUDA-Learn-Notes

opt-hgemm-mma - 🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

DefTruth deleted a branch on DefTruth/CUDA-Learn-Notes

opt-hgemm-mma

DefTruth pushed 1 commit to main on DefTruth/CUDA-Learn-Notes
  • [SGEMM] Update SGEMM TF32 Benchmark (#87) * Update README.md * Update hgemm_wmma_stage.cu * Update README.md ... 8c6922b

DefTruth closed a pull request on DefTruth/CUDA-Learn-Notes
[SGEMM] Update SGEMM TF32 Benchmark
DefTruth opened a pull request on DefTruth/CUDA-Learn-Notes
[SGEMM] Update SGEMM TF32 Benchmark
DefTruth pushed 1 commit to opt-hgemm-mma on DefTruth/CUDA-Learn-Notes

DefTruth pushed 1 commit to opt-hgemm-mma on DefTruth/CUDA-Learn-Notes

Dylan-cx starred DefTruth/CUDA-Learn-Notes
charlotteLive starred DefTruth/CUDA-Learn-Notes
DefTruth pushed 1 commit to opt-hgemm-mma on DefTruth/CUDA-Learn-Notes

DefTruth pushed 1 commit to opt-hgemm-mma on DefTruth/CUDA-Learn-Notes
  • Update hgemm_wmma_stage.cu 90d13c5

DefTruth pushed 1 commit to opt-hgemm-mma on DefTruth/CUDA-Learn-Notes

DefTruth created a branch on DefTruth/CUDA-Learn-Notes

opt-hgemm-mma - 🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

DefTruth deleted a branch on DefTruth/CUDA-Learn-Notes

opt-hgemm-mma

DefTruth pushed 1 commit to main on DefTruth/CUDA-Learn-Notes
  • [SWISH] support Swish F32/F16 kernel (#85) * [SWISH][FP16] first commit,add FP16 FP32 and fp16x8_pack kernel. * [... c4db4f8

DefTruth closed a pull request on DefTruth/CUDA-Learn-Notes
[SWISH] support Swish F32/F16 kernel
DefTruth created a review on a pull request on DefTruth/CUDA-Learn-Notes

ywq880611 starred DefTruth/CUDA-Learn-Notes
DefTruth deleted a branch on DefTruth/CUDA-Learn-Notes

opt-hgemm-mma

DefTruth pushed 1 commit to main on DefTruth/CUDA-Learn-Notes
  • [HGEMM] mma4x4_warp4x4_stages with swizzle (#86) * Update hgemm_cublas.cu * Update hgemm_wmma_stage.cu * Updat... a83ff8d

DefTruth pushed 1 commit to opt-hgemm-mma on DefTruth/CUDA-Learn-Notes

DefTruth pushed 1 commit to opt-hgemm-mma on DefTruth/CUDA-Learn-Notes

DefTruth pushed 1 commit to opt-hgemm-mma on DefTruth/CUDA-Learn-Notes

DefTruth pushed 1 commit to opt-hgemm-mma on DefTruth/CUDA-Learn-Notes

DefTruth pushed 1 commit to opt-hgemm-mma on DefTruth/CUDA-Learn-Notes

DefTruth pushed 1 commit to opt-hgemm-mma on DefTruth/CUDA-Learn-Notes
