Ecosyste.ms: Timeline
Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.
danieldk pushed 1 commit to maintenance/simplify-attention huggingface/text-generation-inference
- Fixup flashinfer support 7822bfd
Grey4sh closed an issue on huggingface/text-generation-inference
TGI included marlin kernel is missing padding code
### Feature request TGI included marlin kernel is missing padding code ### Motivation https://github.com/ModelCloud/GPTQModel/issues/328#issuecomment-2408339273 ```shell Now TGI has support...
Grey4sh opened an issue on huggingface/text-generation-inference
TGI included marlin kernel is missing padding code (REOPEN)
### System Info ### TGI version tgi-2.3.1 docker image ### OS version ```shell torch install path ............... ['/home/chatgpt/.local/lib/python3.10/site-packages/torch'] torch version ...
drbh opened a pull request on huggingface/text-generation-inference
fix: prefer inplace softmax to avoid copy
This PR modifies `log_softmax` to operate in place, eliminating the need to copy large tensors. This optimization reduces memory consumption during warmup. For instance, when using `meta-llama/M...
drbh created a branch on huggingface/text-generation-inference
prefer-inplace-softmax-for-prefill-logprobs - Large Language Model Text Generation Inference
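The PR above replaces an allocating `log_softmax` with an in-place computation so prefill logprobs don't require a second large buffer. As a language-agnostic sketch (not the PR's actual TGI/PyTorch code), the numerically stable in-place version subtracts the log of the partition function from each entry directly:

```python
import math

def log_softmax_inplace(xs: list[float]) -> list[float]:
    """Compute log-softmax in place, reusing the input buffer.

    Uses the max-subtraction trick for numerical stability:
    log_softmax(x_i) = x_i - max(x) - log(sum_j exp(x_j - max(x)))
    """
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    for i in range(len(xs)):
        xs[i] -= log_z  # overwrite in place: no second buffer allocated
    return xs

scores = [1.0, 2.0, 3.0]
log_softmax_inplace(scores)
# Exponentiating the log-probs should recover a distribution summing to 1.
print(sum(math.exp(x) for x in scores))
```

For a logits tensor of shape `(batch, vocab_size)` with a vocabulary in the hundreds of thousands, avoiding that second buffer is exactly the warmup memory saving the PR describes.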
tjtanaa created a comment on an issue on huggingface/text-generation-inference
I found that running the following benchmark endpoint /generate_stream, all the requests are processed. ``` python benchmark_serving.py --backend tgi --model "/app/model/models--meta-llama--Lla...
tjtanaa closed an issue on huggingface/text-generation-inference
[Inference] Z ERROR text_generation_router::server: router/src/server.rs:863: Failed to send event. Receiver dropped.
### System Info I am using the docker image: `ghcr.io/huggingface/text-generation-inference:sha-ffe05cc-rocm` Hardware: MI300X AMD System Management Interface | Version: 24.6.2+2b02a07 | ROCm ...
tjtanaa opened an issue on huggingface/text-generation-inference
[Inference] Z ERROR text_generation_router::server: router/src/server.rs:863: Failed to send event. Receiver dropped.
### System Info I am using the docker image: `ghcr.io/huggingface/text-generation-inference:sha-ffe05cc-rocm` Hardware: MI300X AMD System Management Interface | Version: 24.6.2+2b02a07 | ROCm ...
tthakkal closed a pull request on huggingface/text-generation-inference
Remove References to torch compile mode in readme
# What does this PR do? Remove References to torch compile mode in ReadMe as there is known bug in 1.18 with TGI <!-- Congratulations! You've made it this far! You're not quite done yet though. ...
tthakkal opened a pull request on huggingface/text-generation-inference
Remove References to torch compile mode in readme
# What does this PR do? Remove References to torch compile mode in ReadMe as there is known bug in 1.18 with TGI <!-- Congratulations! You've made it this far! You're not quite done yet though. ...
aW3st opened a pull request on huggingface/text-generation-inference
Upgrade outlines to 0.1.1
# What does this PR do? Upgrades Outlines package in the server to 0.1.1. Outlines has released a number of fixes and improvements since the current version. Some highlights: [Compatibility wi...
nbroad1881 created a comment on an issue on huggingface/text-generation-inference
TGI doesn't run gguf files. Use llama.cpp for that
Narsil created a comment on an issue on huggingface/text-generation-inference
Something is bugged in your cache I think. You are using a cache directory it seems `no API specified`, meaning you're pointing to a directory not to the raw model id (if the directory has the s...
Narsil created a review comment on a pull request on huggingface/text-generation-inference
And we cannot use the block_tables implementation for paged + v2, because that requires BLOCK_SIZE=256, where paged attention uses block_size = 16.
Narsil created a review comment on a pull request on huggingface/text-generation-inference
paged attention is not V1 vs V2, those are separate concerns.
Narsil created a review comment on a pull request on huggingface/text-generation-inference
You're breaking paged here. ATTENTION="paged" text-generation-launcher ... shows the issue. PAGED still uses v2, not v1 (unless sm is too low)
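Narsil's review comments turn on the distinction between paged attention's block_size = 16 and an implementation that hard-codes BLOCK_SIZE = 256. A toy sketch (illustrative only, not TGI's actual block-table code) shows why the two block sizes produce incompatible block tables for the same sequence:

```python
def num_kv_blocks(seq_len: int, block_size: int) -> int:
    """Blocks needed to hold seq_len KV-cache entries (ceiling division)."""
    return -(-seq_len // block_size)

def block_table(seq_len: int, block_size: int, first_block: int = 0) -> list[int]:
    """Toy block table: consecutive physical block ids for one sequence."""
    return list(range(first_block, first_block + num_kv_blocks(seq_len, block_size)))

# A 100-token sequence needs 7 blocks at block_size=16 but only 1 at 256,
# so block tables built for paged attention's 16-entry blocks cannot be
# reused by a kernel that assumes 256-entry blocks.
print(num_kv_blocks(100, 16), num_kv_blocks(100, 256))  # 7 1
```

This also matches the point that paged attention vs. the v1/v2 kernel choice are separate concerns: the block table depends only on block size, while v1 vs. v2 is a kernel-dispatch decision.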
Narsil pushed 1 commit to omni_tokenizer huggingface/text-generation-inference
- Deprecation message. 8350797
danieldk pushed 1 commit to feature/kv-cache-e4m3 huggingface/text-generation-inference
- Make check more obvious 751f1bb
danieldk pushed 1 commit to maintenance/simplify-attention huggingface/text-generation-inference
- Simplify the `attention` function - Use one definition rather than multiple. - Add `key`/`value` arguments, so that ... 07128cc