Ecosyste.ms: Timeline
Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.
danieldk pushed 1 commit to maintenance/simplify-attention huggingface/text-generation-inference
- Fixup flashinfer support 7822bfd
Grey4sh closed an issue on huggingface/text-generation-inference
TGI included marlin kernel is missing padding code
### Feature request TGI included marlin kernel is missing padding code ### Motivation https://github.com/ModelCloud/GPTQModel/issues/328#issuecomment-2408339273 ```shell Now TGI has support...
Grey4sh opened an issue on huggingface/text-generation-inference
TGI included marlin kernel is missing padding code (REOPEN)
### System Info ### TGI version tgi-2.3.1 docker image ### OS version ```shell torch install path ............... ['/home/chatgpt/.local/lib/python3.10/site-packages/torch'] torch version ...
drbh opened a pull request on huggingface/text-generation-inference
fix: prefer inplace softmax to avoid copy
This PR modifies `log_softmax` to operate in place, eliminating the need to copy large tensors. This optimization reduces memory consumption during warmup. For instance, when using `meta-llama/M...
drbh created a branch on huggingface/text-generation-inference
prefer-inplace-softmax-for-prefill-logprobs - Large Language Model Text Generation Inference
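The PR above replaces an allocating `log_softmax` with an in-place computation so prefill logprobs don't require a second large buffer. As a language-agnostic sketch (not the PR's actual TGI/PyTorch code), the numerically stable in-place version subtracts the log of the partition function from each entry directly:

```python
import math

def log_softmax_inplace(xs: list[float]) -> list[float]:
    """Compute log-softmax in place, reusing the input buffer.

    Uses the max-subtraction trick for numerical stability:
    log_softmax(x_i) = x_i - max(x) - log(sum_j exp(x_j - max(x)))
    """
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    for i in range(len(xs)):
        xs[i] -= log_z  # overwrite in place: no second buffer allocated
    return xs

scores = [1.0, 2.0, 3.0]
log_softmax_inplace(scores)
# Exponentiating the log-probs should recover a distribution summing to 1.
print(sum(math.exp(x) for x in scores))
```

For a logits tensor of shape `(batch, vocab_size)` with a vocabulary in the hundreds of thousands, avoiding that second buffer is exactly the warmup memory saving the PR describes.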
tjtanaa created a comment on an issue on huggingface/text-generation-inference
I found that running the following benchmark endpoint /generate_stream, all the requests are processed. ``` python benchmark_serving.py --backend tgi --model "/app/model/models--meta-llama--Lla...
tjtanaa closed an issue on huggingface/text-generation-inference
[Inference] Z ERROR text_generation_router::server: router/src/server.rs:863: Failed to send event. Receiver dropped.
### System Info I am using the docker image: `ghcr.io/huggingface/text-generation-inference:sha-ffe05cc-rocm` Hardware: MI300X AMD System Management Interface | Version: 24.6.2+2b02a07 | ROCm ...
tjtanaa opened an issue on huggingface/text-generation-inference
[Inference] Z ERROR text_generation_router::server: router/src/server.rs:863: Failed to send event. Receiver dropped.
### System Info I am using the docker image: `ghcr.io/huggingface/text-generation-inference:sha-ffe05cc-rocm` Hardware: MI300X AMD System Management Interface | Version: 24.6.2+2b02a07 | ROCm ...
tthakkal closed a pull request on huggingface/text-generation-inference
Remove References to torch compile mode in readme
# What does this PR do? Remove References to torch compile mode in ReadMe as there is known bug in 1.18 with TGI <!-- Congratulations! You've made it this far! You're not quite done yet though. ...
tthakkal opened a pull request on huggingface/text-generation-inference
Remove References to torch compile mode in readme
# What does this PR do? Remove References to torch compile mode in ReadMe as there is known bug in 1.18 with TGI <!-- Congratulations! You've made it this far! You're not quite done yet though. ...
aW3st opened a pull request on huggingface/text-generation-inference
Upgrade outlines to 0.1.1
# What does this PR do? Upgrades Outlines package in the server to 0.1.1. Outlines has released a number of fixes and improvements since the current version. Some highlights: [Compatibility wi...
nbroad1881 created a comment on an issue on huggingface/text-generation-inference
TGI doesn't run gguf files. Use llama.cpp for that
Narsil created a comment on an issue on huggingface/text-generation-inference
Something is bugged in your cache I think. You are using a cache directory it seems `no API specified`, meaning you're pointing to a directory not to the raw model id (if the directory has the s...
Narsil created a review comment on a pull request on huggingface/text-generation-inference
And we cannot use the block_tables implementation for paged + v2, because that requires BLOCK_SIZE=256, where paged attention uses block_size = 16.
Narsil created a review comment on a pull request on huggingface/text-generation-inference
paged attention is not V1 vs V2, those are separate concerns.
Narsil created a review comment on a pull request on huggingface/text-generation-inference
You're breaking paged here. ATTENTION="paged" text-generation-launcher ... shows the issue. PAGED still uses v2, not v1 (unless sm is too low)
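Narsil's review comments turn on the distinction between paged attention's block_size = 16 and an implementation that hard-codes BLOCK_SIZE = 256. A toy sketch (illustrative only, not TGI's actual block-table code) shows why the two block sizes produce incompatible block tables for the same sequence:

```python
def num_kv_blocks(seq_len: int, block_size: int) -> int:
    """Blocks needed to hold seq_len KV-cache entries (ceiling division)."""
    return -(-seq_len // block_size)

def block_table(seq_len: int, block_size: int, first_block: int = 0) -> list[int]:
    """Toy block table: consecutive physical block ids for one sequence."""
    return list(range(first_block, first_block + num_kv_blocks(seq_len, block_size)))

# A 100-token sequence needs 7 blocks at block_size=16 but only 1 at 256,
# so block tables built for paged attention's 16-entry blocks cannot be
# reused by a kernel that assumes 256-entry blocks.
print(num_kv_blocks(100, 16), num_kv_blocks(100, 256))  # 7 1
```

This also matches the point that paged attention vs. the v1/v2 kernel choice are separate concerns: the block table depends only on block size, while v1 vs. v2 is a kernel-dispatch decision.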
Narsil pushed 1 commit to omni_tokenizer huggingface/text-generation-inference
- Deprecation message. 8350797
danieldk pushed 1 commit to feature/kv-cache-e4m3 huggingface/text-generation-inference
- Make check more obvious 751f1bb
danieldk pushed 1 commit to maintenance/simplify-attention huggingface/text-generation-inference
- Simplify the `attention` function - Use one definition rather than multiple. - Add `key`/`value` arguments, so that ... 07128cc