Ecosyste.ms: Timeline

Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.

huggingface/text-generation-inference

mht-sharma created a review on a pull request on huggingface/text-generation-inference
LGTM! Thanks for the PR @danieldk. This will help me enable FP8 KV cache on ROCm next.

Narsil pushed 1 commit to auto_length huggingface/text-generation-inference
  • Fix integration mt0 (transformers update). e3db525

Narsil pushed 1 commit to auto_length huggingface/text-generation-inference

wakaka6 starred huggingface/text-generation-inference

danieldk pushed 1 commit to feature/cc89-cutlass-w8a8 huggingface/text-generation-inference

danieldk pushed 9 commits to feature/fp8-kv-cache-scale huggingface/text-generation-inference
  • Test Marlin MoE with `desc_act=true` (#2622) Update the Mixtral GPTQ test to use a model with `desc_act=true` and `... 7f54b73
  • break when there's nothing to read (#2582) Signed-off-by: Wang, Yi A <[email protected]> 058d306
  • Add `impureWithCuda` dev shell (#2677) * Add `impureWithCuda` dev shell This shell is handy when developing some ... 9c9ef37
  • Make moe-kernels and marlin-kernels mandatory in CUDA installs (#2632) f58eb70
  • feat: natively support Granite models (#2682) * feat: natively support Granite models * Update doc 03c9388
  • hotfix: fix flashllama 27ff187
  • feat: allow any supported payload on /invocations (#2683) * feat: allow any supported payload on /invocations * u... 41c2623
  • Add support for FP8 KV cache scales Since FP8 only has limited dynamic range, we can scale keys/values before storin... ba4ac96
  • Update FP8 KV cache test to use checkpoint with scales 1f18cb6

danieldk pushed 8 commits to feature/cc89-cutlass-w8a8 huggingface/text-generation-inference
  • Make moe-kernels and marlin-kernels mandatory in CUDA installs (#2632) f58eb70
  • feat: natively support Granite models (#2682) * feat: natively support Granite models * Update doc 03c9388
  • hotfix: fix flashllama 27ff187
  • feat: allow any supported payload on /invocations (#2683) * feat: allow any supported payload on /invocations * u... 41c2623
  • Add support for FP8 KV cache scales Since FP8 only has limited dynamic range, we can scale keys/values before storin... 14a5053
  • WIP 56135ba
  • scale upper bound as tensor for cutlass gemm 6f87b7f
  • Remove fbgemm 14bffd8

danieldk pushed 2 commits to feature/cc89-cutlass-w8a8 huggingface/text-generation-inference
  • scale upper bound as tensor for cutlass gemm de0ba05
  • Remove fbgemm ce7f395

zh190920 starred huggingface/text-generation-inference

Narsil pushed 1 commit to auto_length huggingface/text-generation-inference

mfuntowicz pushed 1 commit to feat-backend-llamacpp huggingface/text-generation-inference
  • feat(backend): wip Rust binding c4862be

Narsil pushed 1 commit to auto_length huggingface/text-generation-inference
  • Updating logic + non flash. 6994fa1

Narsil pushed 1 commit to auto_length huggingface/text-generation-inference
  • Much simpler logic after the overhead. 1053451

DandinPower starred huggingface/text-generation-inference

mfuntowicz pushed 1 commit to feat-backend-llamacpp huggingface/text-generation-inference
  • chore(backend): minor formatting e23566e

lp-noel created a comment on an issue on huggingface/text-generation-inference
Same error here with Qwen2.5 on 4 GPUs, can this be re-opened?

mfuntowicz pushed 3 commits to trtllm-stop-words huggingface/text-generation-inference
  • chore(docker): add mpi to ld_library_path ef00311
  • chore(docker): install transformers 6376fec
  • feat(trtllm): detect stop_words from generation_config.json 9cee00e

geekdeedy starred huggingface/text-generation-inference

sidharthrajaram closed a pull request on huggingface/text-generation-inference
Support OpenAI Structured Output by adding json_schema as an alias for JSON Grammar
# What does this PR do? ### tl;dr Supports `"json_schema"` as a type for `response_format`, in addition to the existing aliases `"json_object"` and `"json"`. This aligns TGI with the OpenAI...

HuggingFaceDocBuilderDev created a comment on a pull request on huggingface/text-generation-inference
The docs for this PR live [here](https://moon-ci-docs.huggingface.co/docs/text-generation-inference/pr_2683). All of your documentation changes will be reflected on that endpoint. The docs are avai...

HuggingFaceDocBuilderDev created a comment on a pull request on huggingface/text-generation-inference
The docs for this PR live [here](https://moon-ci-docs.huggingface.co/docs/text-generation-inference/pr_2682). All of your documentation changes will be reflected on that endpoint. The docs are avai...

Narsil pushed 1 commit to auto_length huggingface/text-generation-inference
  • QuantLinear is rocm compatible. 849d882

danieldk pushed 2 commits to feature/cc89-cutlass-w8a8 huggingface/text-generation-inference
  • Add support for FP8 KV cache scales Since FP8 only has limited dynamic range, we can scale keys/values before storin... 59a6ba4
  • WIP 5595569

Narsil pushed 1 commit to auto_length huggingface/text-generation-inference

danieldk deleted a branch huggingface/text-generation-inference
  • maintenance/mandatory-moe-kernels

danieldk pushed 1 commit to main huggingface/text-generation-inference
  • Make moe-kernels and marlin-kernels mandatory in CUDA installs (#2632) f58eb70

danieldk closed a pull request on huggingface/text-generation-inference
Make moe-kernels and marlin-kernels mandatory in CUDA installs
# What does this PR do? ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Did you read the [contributor guidel...

danieldk created a comment on a pull request on huggingface/text-generation-inference
Merging (was already discussed on Slack).

Kyosuketam starred huggingface/text-generation-inference