Ecosyste.ms: Timeline

Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.

huggingface/text-generation-inference

Bihan created a comment on an issue on huggingface/text-generation-inference
> Any chance you could try `docker pull ghcr.io/huggingface/text-generation-inference:latest-rocm`? ROCm FP8 support was improved yesterday: #2588

@danieldk Yes sure.

danieldk created a comment on an issue on huggingface/text-generation-inference
Any chance you could try `docker pull ghcr.io/huggingface/text-generation-inference:latest-rocm`? ROCm FP8 support was improved yesterday: https://github.com/huggingface/text-generation-inferenc...

danieldk pushed 25 commits to maintenance/reshape-and-cache huggingface/text-generation-inference
  • enable mllama in intel platform (#2610) Signed-off-by: Wang, Yi A <[email protected]> 57f9685
  • Upgrade minor rust version (Fixes rust build compilation cache) (#2617) * Upgrade minor rust version (Fixes rust bui... 8b295aa
  • Add support for fused MoE Marlin for AWQ (#2616) * Add support for fused MoE Marlin for AWQ This uses the updated... 6414248
  • nix: move back to the tgi-nix main branch (#2620) 6db3bcb
  • CI (2599): Update ToolType input schema (#2601) * Update ToolType input schema * lint * fix: run formatter ... 8ad20da
  • nix: add black and isort to the closure (#2619) To make sure that everything is formatted with the same black versio... 9ed0c85
  • AMD CI (#2589) * Only run 1 valid test. * TRying the tailscale action quickly. * ? * bash spaces. * Remo... 43f39f6
  • feat: allow tool calling to respond without a tool (#2614) * feat: process token stream before returning to client ... e36dfaa
  • Update documentation to most recent stable version of TGI. (#2625) Update to most recent stable version of TGI. d912f0b
  • Intel ci (#2630) * Intel CI ? * Let's try non sharded gemma. * Snapshot rename * Apparently container can b... 3dbdf63
  • Fixing intel Supports windowing. (#2637) 0c47884
  • Small fixes for supported models (#2471) * Small improvements for docs * Update _toctree.yml * Updating the do... ce28ee8
  • Cpu perf (#2596) * break when there's nothing to read Signed-off-by: Wang, Yi A <[email protected]> * Differ... 3ea82d0
  • Clarify gated description and quicktour (#2631) Update quicktour.md 51f5401
  • update ipex to fix incorrect output of mllama in cpu (#2640) Signed-off-by: Wang, Yi A <[email protected]> 7a82ddc
  • feat: enable pytorch xpu support for non-attention models (#2561) XPU backend is available natively (without IPEX) i... 58848cb
  • Fixing linters. (#2650) cf04a43
  • Use flashinfer for Gemma 2. ce7e356
  • Rollback to `ChatRequest` for Vertex AI Chat instead of `VertexChat` (#2651) As spotted by @philschmid, the payload ... ffe05cc
  • Fp8 e4m3_fnuz support for rocm (#2588) * (feat) fp8 fnuz support for rocm * (review comments) Fix compression_con... 704a58c
  • and 5 more ...

danieldk created a comment on an issue on huggingface/text-generation-inference
Thanks for reporting! I updated the title to reflect that this issue only occurs on ROCm. It looks like we have to expand the shapes when dispatching to Torch scaled mm (for CUDA we don't use the T...
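
As a rough sketch of the shape expansion alluded to above (assuming a recent PyTorch where `torch._scaled_mm` returns a single tensor; the function and argument names below are hypothetical, not TGI's code): `torch._scaled_mm` only accepts 2D operands, so leading batch dimensions have to be flattened before dispatch and restored afterwards.

```python
import torch

# Hypothetical sketch: flatten leading dims to 2D for torch._scaled_mm,
# then reshape the result back. Not TGI's implementation.
def scaled_mm_nd(x, w_t, scale_x, scale_w):
    *batch, in_features = x.shape
    x2d = x.reshape(-1, in_features)  # expand/flatten to the 2D shape the op requires
    out = torch._scaled_mm(
        x2d,                          # fp8 activations, row-major
        w_t,                          # fp8 weights (in_features, out_features), column-major
        scale_a=scale_x,
        scale_b=scale_w,
        out_dtype=torch.bfloat16,
    )
    return out.reshape(*batch, out.shape[-1])
```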

danieldk created a review on a pull request on huggingface/text-generation-inference

Grey4sh created a comment on an issue on huggingface/text-generation-inference
Get it. Thank you for your nice suggestions.

Grey4sh closed an issue on huggingface/text-generation-inference
TGI included marlin kernel is missing padding code (REOPEN)
### System Info ### TGI version tgi-2.3.1 docker image ### OS version ```shell torch install path ............... ['/home/chatgpt/.local/lib/python3.10/site-packages/torch'] torch version ....
Narsil created a review comment on a pull request on huggingface/text-generation-inference
```suggestion
# Get prefill logprobs with inplace softmax (avoid copying the `out` tensor (max_batch_prefill_tokens * vocab_size))
```
There is no batch size anymore, per se.
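
For illustration, an in-place log-softmax along these lines avoids materializing a second `(max_batch_prefill_tokens, vocab_size)` tensor; only `(rows, 1)` temporaries are allocated. A hand-rolled sketch, not the PR's code:

```python
import torch

def log_softmax_(x: torch.Tensor) -> torch.Tensor:
    # Mutates `x` in place instead of allocating a full-size copy;
    # the only temporaries are (rows, 1) reductions.
    x.sub_(x.amax(dim=-1, keepdim=True))              # for numerical stability
    x.sub_(torch.logsumexp(x, dim=-1, keepdim=True))  # x - logsumexp(x)
    return x
```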

Narsil created a review on a pull request on huggingface/text-generation-inference

Narsil created a review on a pull request on huggingface/text-generation-inference

Narsil created a comment on a pull request on huggingface/text-generation-inference
> For instance, when using meta-llama/Meta-Llama-3.1-8B-Instruct on an L4, this change allows running the model with --max-batch-prefill-tokens increased from 7192 to 9874 without exceeding memory ...
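
For a sense of scale (illustrative numbers only, since the comment is truncated: assuming a Llama 3.1 vocabulary of 128,256 tokens and 16-bit values), a full prefill-logprobs copy at `--max-batch-prefill-tokens 7192` would be about 7192 × 128256 × 2 bytes ≈ 1.8 GB, which is roughly the kind of headroom that avoiding the copy frees up.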

Narsil created a comment on an issue on huggingface/text-generation-inference
TGI will always use all the allowed memory for the KV cache, to allow MANY users on the same machine. MAX_BATCH_SIZE is not used on Nvidia targets, as mentioned in the docs: https://huggi...
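
A back-of-the-envelope sketch of why the KV cache simply absorbs whatever memory is allowed (all numbers assumed, Llama-3-8B-like geometry, not read from any config):

```python
# Assumed model geometry: 32 layers, 8 KV heads, head size 128, fp16.
num_layers, num_kv_heads, head_dim, dtype_bytes = 32, 8, 128, 2

# Key + value, per token, across all layers.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
free_vram = 20 * 1024**3  # e.g. 20 GiB left after weights (made up)

print(kv_bytes_per_token)               # 131072 bytes = 128 KiB per token
print(free_vram // kv_bytes_per_token)  # ~163k tokens of KV cache
```

The more leftover VRAM goes to the cache, the more concurrent sequences fit, which is why TGI claims it all rather than capping the batch size.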

Narsil created a comment on an issue on huggingface/text-generation-inference
Also, using 2x A100 should be more efficient in general if it works (less communication overhead between shards). If you have trouble with your current settings on 4 shards, there are some new f...

Narsil created a comment on an issue on huggingface/text-generation-inference
Okay. This is a won't-fix for us. Having odd-sized dimensions is an issue in many kernels, and padding is costly, wasting precious GPU resources (you would essentially be computing 25% too much...
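
To make the 25% figure concrete (made-up dimensions, purely illustrative):

```python
import math

def padding_overhead(dim: int, align: int) -> float:
    """Extra compute fraction from zero-padding `dim` up to a multiple of `align`."""
    padded = math.ceil(dim / align) * align
    return padded / dim - 1.0

# An awkward 3277-wide dimension padded up to 4096 wastes ~25% of the matmul.
print(f"{padding_overhead(3277, 4096):.0%}")
```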

Narsil closed a pull request on huggingface/text-generation-inference
fix tgi-entrypoint wrapper in docker file: exec instead of spawning a child process
reason: we added a docker wrapper script a while ago to fix missing .so issues encountered when spawning tgi in some cloud providers that add shared libs, related to cuda for example, but do not re...
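
Presumably the fix is the usual shell idiom, `exec text-generation-launcher "$@"`. For illustration, the same idea in Python: replacing the wrapper process instead of forking a child means the launcher keeps PID 1 and receives container signals directly.

```python
import os
import sys

# Illustrative analogue of the entrypoint fix: os.execvp *replaces* this
# process with the launcher (no child is spawned), so the launcher inherits
# PID 1 in the container and gets SIGTERM/SIGINT without any forwarding.
os.execvp("text-generation-launcher", ["text-generation-launcher", *sys.argv[1:]])
# Never returns; raises OSError if the binary cannot be found.
```
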
Narsil created a review on a pull request on huggingface/text-generation-inference
LGTM

Narsil opened a pull request on huggingface/text-generation-inference
Fixing "deadlock" when python prompts for trust_remote_code by always
specifiying a value. # What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes ...
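
The "deadlock" in the title: with `trust_remote_code` left unset, `transformers` can fall back to an interactive stdin prompt for models that ship custom code, and a non-interactive server then hangs waiting for input. A minimal sketch of the idea (the model id is a placeholder):

```python
from transformers import AutoTokenizer

# Passing an explicit boolean means transformers never needs to prompt on
# stdin, so a headless server process cannot hang on the question.
tokenizer = AutoTokenizer.from_pretrained(
    "org/model-with-custom-code",  # placeholder id
    trust_remote_code=False,       # always specify a value, per the PR title
)
```
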
Narsil created a branch on huggingface/text-generation-inference

fixup_tokenizer_trust

oOraph opened a pull request on huggingface/text-generation-inference
fix tgi-entrypoint wrapper in docker file: exec instead of spawning a child process
reason: we added a docker wrapper script a while ago to fix missing .so issues encountered when spawning tgi in some cloud providers that add shared libs, related to cuda for example, but do not re...
Narsil created a comment on an issue on huggingface/text-generation-inference
Thanks a lot for reopening with a lot more information; it helps us narrow down the issue much faster.

Narsil deleted a branch huggingface/text-generation-inference

maintenance/simplify-attention

Narsil pushed 1 commit to main huggingface/text-generation-inference
  • Simplify the `attention` function (#2609) * Simplify the `attention` function - Use one definition rather than mu... 59ea38c

Narsil closed a pull request on huggingface/text-generation-inference
Simplify the `attention` function
# What does this PR do? - Use one definition rather than multiple (will make it easier to do shared things once, such as calculating the FP8 KV cache reciprocal). - Add `key`/`value` arguments,...
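
Schematically (argument names follow the PR description; the body is a naive stand-in for the real paged/flash kernels, shown only to illustrate the single shared signature):

```python
import torch

def attention(query, key, value, *, softmax_scale):
    # One shared definition across backends; shared work such as FP8
    # KV-cache scaling now has a single place to live. Shapes assumed
    # to be (seq_len, num_heads, head_dim).
    scores = torch.einsum("qhd,khd->hqk", query, key) * softmax_scale
    return torch.einsum("hqk,khd->qhd", scores.softmax(dim=-1), value)
```
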
Narsil deleted a branch huggingface/text-generation-inference

feature/kv-cache-e4m3

Narsil pushed 1 commit to main huggingface/text-generation-inference
  • Support `e4m3fn` KV cache (#2655) * Support `e4m3fn` KV cache * Make check more obvious 5bbe1ce

Narsil closed a pull request on huggingface/text-generation-inference
Support `e4m3fn` KV cache
# What does this PR do? Add support for `e4m3fn` KV caches as well. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case)...
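
A rough illustration of what an `e4m3fn` KV cache means (simplified per-tensor scaling, not TGI's implementation):

```python
import torch

k = torch.randn(16, 8, 128, dtype=torch.bfloat16)           # made-up KV block
scale = k.abs().amax() / torch.finfo(torch.float8_e4m3fn).max

k_fp8 = (k / scale).to(torch.float8_e4m3fn)  # write path: 1 byte per element
k_read = k_fp8.to(torch.bfloat16) * scale    # read path: dequantize for attention
```

Halving the bytes per cached token relative to fp16 roughly doubles how many tokens fit in the same VRAM.
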
Narsil created a review on a pull request on huggingface/text-generation-inference

Narsil created a review on a pull request on huggingface/text-generation-inference
LGTM

josephrocca created a comment on an issue on huggingface/text-generation-inference
> With that in mind, it'll be much easier to assess a correct caching solution.

Gotcha, makes sense. For reference, I use sticky sessions, and it's not much of a can of worms in my case, sinc...

danieldk created a review comment on a pull request on huggingface/text-generation-inference
Should be fixed now, tested Llama & Mistral with `paged`, `flashattention` and `flashinfer`.
