Ecosyste.ms: Timeline

Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.

huggingface/text-generation-inference

danieldk created a review on a pull request on huggingface/text-generation-inference

danieldk pushed 1 commit to maintenance/simplify-attention huggingface/text-generation-inference

long568 starred huggingface/text-generation-inference
thompson0012 starred huggingface/text-generation-inference
Grey4sh closed an issue on huggingface/text-generation-inference
TGI's included marlin kernel is missing padding code
### Feature request TGI's included marlin kernel is missing padding code ### Motivation https://github.com/ModelCloud/GPTQModel/issues/328#issuecomment-2408339273 ```shell Now TGI has support...
Grey4sh opened an issue on huggingface/text-generation-inference
TGI's included marlin kernel is missing padding code (REOPEN)
### System Info ### TGI version tgi-2.3.1 docker image ### OS version ```shell torch install path ............... ['/home/chatgpt/.local/lib/python3.10/site-packages/torch'] torch version ....
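
Both reports ask the marlin path to pad weights whose shapes don't meet the kernel's alignment requirements. As a minimal sketch of the kind of padding being requested, the helper below zero-pads a weight tensor up to the next multiple of a tile size; the 128/64 alignments and tensor shapes are illustrative assumptions, not TGI's actual marlin constants.

```python
import torch


def pad_to_multiple(weight: torch.Tensor, multiple: int, dim: int) -> torch.Tensor:
    """Zero-pad `weight` along `dim` up to the next multiple of `multiple`."""
    remainder = weight.shape[dim] % multiple
    if remainder == 0:
        return weight
    pad_shape = list(weight.shape)
    pad_shape[dim] = multiple - remainder
    padding = torch.zeros(pad_shape, dtype=weight.dtype, device=weight.device)
    return torch.cat([weight, padding], dim=dim)


# Illustrative alignment values, not the kernel's real tile sizes.
w = torch.randn(4000, 11000)
w = pad_to_multiple(w, multiple=128, dim=0)  # out_features -> 4096
w = pad_to_multiple(w, multiple=64, dim=1)   # in_features  -> 11008
print(w.shape)  # torch.Size([4096, 11008])
```
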
drbh opened a pull request on huggingface/text-generation-inference
fix: prefer inplace softmax to avoid copy
This PR modifies `log_softmax` to operate in place, eliminating the need to copy large tensors. This optimization reduces memory consumption during warmup. For instance, when using `meta-llama/M...
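
The PR preview describes replacing an out-of-place `log_softmax` with an in-place one. A minimal sketch of the same memory-saving identity (not the PR's exact diff): since `log_softmax(x) = x - logsumexp(x)`, subtracting the reduction in place only allocates a `[..., 1]` tensor instead of a second full-size copy of the logits.

```python
import torch


def log_softmax_(logits: torch.Tensor) -> torch.Tensor:
    """In-place log-softmax over the last dim via x - logsumexp(x)."""
    # Only the [..., 1] logsumexp reduction is allocated; `logits` itself
    # is overwritten, avoiding a second [num_tokens, vocab_size] tensor.
    return logits.sub_(torch.logsumexp(logits, dim=-1, keepdim=True))


x = torch.randn(4, 8)
assert torch.allclose(log_softmax_(x.clone()), torch.log_softmax(x, dim=-1), atol=1e-6)
```
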
drbh created a branch on huggingface/text-generation-inference

prefer-inplace-softmax-for-prefill-logprobs - Large Language Model Text Generation Inference

cvdong starred huggingface/text-generation-inference
meng-wenlong starred huggingface/text-generation-inference
tjtanaa created a comment on an issue on huggingface/text-generation-inference
I found that when running the following benchmark against the /generate_stream endpoint, all the requests are processed. ``` python benchmark_serving.py --backend tgi --model "/app/model/models--meta-llama--Lla...
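
For context, `/generate_stream` returns server-sent events, and the `Failed to send event. Receiver dropped.` error in the issue below typically indicates the client hung up mid-stream. A minimal streaming-client sketch (server URL, prompt, and parameters are placeholder assumptions, not the reporter's benchmark setup):

```python
import json

import requests

resp = requests.post(
    "http://localhost:8080/generate_stream",
    json={"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 64}},
    stream=True,
    timeout=60,
)
for line in resp.iter_lines():
    # TGI streams SSE lines of the form `data: {...}`.
    if line.startswith(b"data:"):
        event = json.loads(line[len(b"data:"):])
        print(event["token"]["text"], end="", flush=True)
```
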

tjtanaa closed an issue on huggingface/text-generation-inference
[Inference] Z ERROR text_generation_router::server: router/src/server.rs:863: Failed to send event. Receiver dropped.
### System Info I am using the docker image: `ghcr.io/huggingface/text-generation-inference:sha-ffe05cc-rocm` Hardware: MI300X AMD System Management Interface | Version: 24.6.2+2b02a07 | ROCm ...
tjtanaa opened an issue on huggingface/text-generation-inference
[Inference] Z ERROR text_generation_router::server: router/src/server.rs:863: Failed to send event. Receiver dropped.
### System Info I am using the docker image: `ghcr.io/huggingface/text-generation-inference:sha-ffe05cc-rocm` Hardware: MI300X AMD System Management Interface | Version: 24.6.2+2b02a07 | ROCm ...
wojtekqbiak starred huggingface/text-generation-inference
tthakkal closed a pull request on huggingface/text-generation-inference
Remove References to torch compile mode in readme
# What does this PR do? Removes references to torch compile mode in the README, as there is a known bug in 1.18 with TGI...
tthakkal opened a pull request on huggingface/text-generation-inference
Remove References to torch compile mode in readme
# What does this PR do? Removes references to torch compile mode in the README, as there is a known bug in 1.18 with TGI...
aW3st opened a pull request on huggingface/text-generation-inference
Upgrade outlines to 0.1.1
# What does this PR do? Upgrades the Outlines package in the server to 0.1.1. Outlines has released a number of fixes and improvements since the current version. Some highlights: [Compatibility wi...
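
Outlines backs TGI's guided decoding, so the upgrade mostly affects grammar-constrained requests. A hedged sketch of exercising that path through the `grammar` parameter (server URL and schema are illustrative assumptions):

```python
import requests

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Extract the person: David is 28 years old.",
        "parameters": {
            # Constrains decoding to the JSON schema via Outlines.
            "grammar": {"type": "json", "value": schema},
            "max_new_tokens": 64,
        },
    },
    timeout=60,
)
print(resp.json()["generated_text"])
```
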
nbroad1881 created a comment on an issue on huggingface/text-generation-inference
TGI doesn't run GGUF files. Use llama.cpp for that.

Narsil created a comment on an issue on huggingface/text-generation-inference
I think something is bugged in your cache. It seems you are using a cache directory (`no API specified`), meaning you're pointing to a directory, not to the raw model id (if the directory has the s...

Narsil created a review comment on a pull request on huggingface/text-generation-inference
And we cannot use the block_tables implementation for paged + v2, because that requires BLOCK_SIZE=256, whereas paged attention uses block_size = 16.
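
The incompatibility comes from how a block table resolves token positions into physical KV-cache slots; a kernel compiled for one block size cannot index a cache laid out with another. A small sketch of that indexing (names are hypothetical, not TGI's internals):

```python
BLOCK_SIZE = 16  # paged attention's layout here; the other kernel wants 256


def slot_for_position(block_table: list[int], pos: int, block_size: int = BLOCK_SIZE) -> int:
    """Map a logical token position to a physical KV-cache slot."""
    physical_block = block_table[pos // block_size]
    return physical_block * block_size + pos % block_size


# A sequence owning physical blocks [7, 3]: token 20 is the 5th token
# of its second logical block, so it lands in slot 3 * 16 + 4 = 52.
print(slot_for_position([7, 3], 20))  # 52
```
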

Narsil created a review on a pull request on huggingface/text-generation-inference

Narsil created a review comment on a pull request on huggingface/text-generation-inference
Paged attention is not V1 vs. V2; those are separate concerns.

Narsil created a review on a pull request on huggingface/text-generation-inference

Narsil created a review comment on a pull request on huggingface/text-generation-inference
You're breaking paged here. `ATTENTION="paged" text-generation-launcher ...` shows the issue. Paged still uses v2, not v1 (unless the SM version is too low).

Narsil created a review on a pull request on huggingface/text-generation-inference

Narsil pushed 1 commit to omni_tokenizer huggingface/text-generation-inference

mlxyz starred huggingface/text-generation-inference
danieldk pushed 1 commit to feature/kv-cache-e4m3 huggingface/text-generation-inference

danieldk pushed 1 commit to maintenance/simplify-attention huggingface/text-generation-inference
  • Simplify the `attention` function: use one definition rather than multiple; add `key`/`value` arguments, so that ... 07128cc
