Ecosyste.ms: Timeline
Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.
mht-sharma created a review on a pull request on huggingface/text-generation-inference
LGTM! Thanks for the PR @danieldk. This will help me enable FP8 KV cache on ROCm next.
Narsil pushed 1 commit to auto_length huggingface/text-generation-inference
- Fix integration mt0 (transformers update). e3db525
danieldk pushed 1 commit to feature/cc89-cutlass-w8a8 huggingface/text-generation-inference
- Remove fbgemm fb24d7a
danieldk pushed 9 commits to feature/fp8-kv-cache-scale huggingface/text-generation-inference
- Test Marlin MoE with `desc_act=true` (#2622) Update the Mixtral GPTQ test to use a model with `desc_act=true` and `... 7f54b73
- break when there's nothing to read (#2582) Signed-off-by: Wang, Yi A <[email protected]> 058d306
- Add `impureWithCuda` dev shell (#2677) * Add `impureWithCuda` dev shell This shell is handy when developing some ... 9c9ef37
- Make moe-kernels and marlin-kernels mandatory in CUDA installs (#2632) f58eb70
- feat: natively support Granite models (#2682) * feat: natively support Granite models * Update doc 03c9388
- hotfix: fix flashllama 27ff187
- feat: allow any supported payload on /invocations (#2683) * feat: allow any supported payload on /invocations * u... 41c2623
- Add support for FP8 KV cache scales Since FP8 only has limited dynamic range, we can scale keys/values before storin... ba4ac96
- Update FP8 KV cache test to use checkpoint with scales 1f18cb6
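The commit above describes scaling keys/values before storing them, since FP8 only has limited dynamic range. A minimal sketch of per-tensor scaling, assuming the FP8 E4M3 format (maximum finite value 448); the helper names are hypothetical and this is not TGI's actual implementation:

```python
# Hypothetical per-tensor FP8 KV-cache scaling sketch (not TGI's real code).
# FP8 E4M3 tops out at 448, so values are scaled into that range before
# storage and rescaled with the same factor on load.

FP8_E4M3_MAX = 448.0

def compute_kv_scale(values):
    """Scale factor so the largest |value| maps onto the FP8 max."""
    amax = max(abs(v) for v in values)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0

def quantize(values, scale):
    # Clamp into the representable range; real FP8 rounding is omitted here.
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]

def dequantize(stored, scale):
    return [v * scale for v in stored]

keys = [0.5, -2.0, 900.0, -448.0]
scale = compute_kv_scale(keys)      # 900 / 448
stored = quantize(keys, scale)      # all entries now within [-448, 448]
restored = dequantize(stored, scale)
```

Without the scale, the 900.0 entry would saturate at the FP8 maximum; with it, values round-trip up to quantization error.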
danieldk pushed 8 commits to feature/cc89-cutlass-w8a8 huggingface/text-generation-inference
- Make moe-kernels and marlin-kernels mandatory in CUDA installs (#2632) f58eb70
- feat: natively support Granite models (#2682) * feat: natively support Granite models * Update doc 03c9388
- hotfix: fix flashllama 27ff187
- feat: allow any supported payload on /invocations (#2683) * feat: allow any supported payload on /invocations * u... 41c2623
- Add support for FP8 KV cache scales Since FP8 only has limited dynamic range, we can scale keys/values before storin... 14a5053
- WIP 56135ba
- scale upper bound as tensor for cutlass gemm 6f87b7f
- Remove fbgemm 14bffd8
danieldk pushed 2 commits to feature/cc89-cutlass-w8a8 huggingface/text-generation-inference
Narsil pushed 1 commit to auto_length huggingface/text-generation-inference
- Revert doc text. cacaba6
mfuntowicz pushed 1 commit to feat-backend-llamacpp huggingface/text-generation-inference
- feat(backend): wip Rust binding c4862be
Narsil pushed 1 commit to auto_length huggingface/text-generation-inference
- Updating logic + non flash. 6994fa1
Narsil pushed 1 commit to auto_length huggingface/text-generation-inference
- Much simpler logic after the overhead. 1053451
mfuntowicz pushed 1 commit to feat-backend-llamacpp huggingface/text-generation-inference
- chore(backend): minor formatting e23566e
lp-noel created a comment on an issue on huggingface/text-generation-inference
Same error here with Qwen2.5 on 4 GPUs, can this be re-opened?
mfuntowicz pushed 3 commits to trtllm-stop-words huggingface/text-generation-inference
sidharthrajaram closed a pull request on huggingface/text-generation-inference
Support OpenAI Structured Output by adding json_schema as an alias for JSON Grammar
# What does this PR do? ### tl;dr Supports `"json_schema"` as a type for `response_format`, in addition to the existing aliases `"json_object"` and `"json"`. This aligns TGI with the OpenAI...
HuggingFaceDocBuilderDev created a comment on a pull request on huggingface/text-generation-inference
The docs for this PR live [here](https://moon-ci-docs.huggingface.co/docs/text-generation-inference/pr_2683). All of your documentation changes will be reflected on that endpoint. The docs are avai...
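The `json_schema` alias described in sidharthrajaram's PR above could be exercised with a request payload along these lines; the exact field names (`value` for the schema, the endpoint shape) are assumptions for illustration, not confirmed by the feed:

```python
# Hypothetical chat-completion payload using the `json_schema` alias for
# response_format. Field names other than "type" are illustrative only.
import json

payload = {
    "model": "tgi",
    "messages": [{"role": "user", "content": "Give me a user record."}],
    "response_format": {
        "type": "json_schema",  # new alias alongside "json_object" / "json"
        "value": {              # assumed key carrying the JSON Schema grammar
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
}
body = json.dumps(payload)
```

The point of the alias is that clients already built against OpenAI's structured-output API can send `"json_schema"` without renaming the type for TGI.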
HuggingFaceDocBuilderDev created a comment on a pull request on huggingface/text-generation-inference
The docs for this PR live [here](https://moon-ci-docs.huggingface.co/docs/text-generation-inference/pr_2682). All of your documentation changes will be reflected on that endpoint. The docs are avai...
Narsil pushed 1 commit to auto_length huggingface/text-generation-inference
- QuantLinear is rocm compatible. 849d882
danieldk pushed 2 commits to feature/cc89-cutlass-w8a8 huggingface/text-generation-inference
danieldk pushed 1 commit to main huggingface/text-generation-inference
- Make moe-kernels and marlin-kernels mandatory in CUDA installs (#2632) f58eb70