Ecosyste.ms: Timeline
Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.
Johnno1011 closed an issue on huggingface/text-generation-inference
llama3.1 /v1/chat/completions template not found
### System Info text generation inference v2.3.1 meta-llama/Meta-Llama-3.1-70B-Instruct ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [...
Narsil pushed 1 commit to gpt_awq_4 huggingface/text-generation-inference
- Upgrading the tests (TP>1 fix changes to use different kernels.) 8673bb0
sywangyi created a review comment on a pull request on huggingface/text-generation-inference
https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/layers/gptq/__init__.py#L134, this line sets `use_exllama` to false, since on the Intel platform, exllam...
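The review comment above concerns kernel-selection logic in TGI's GPTQ layer: the exllama kernels are CUDA-only, so on Intel hardware the code must fall back to a different implementation. A minimal sketch of that pattern, with hypothetical names that are illustrative rather than TGI's actual API:

```python
# Hypothetical sketch of platform-gated kernel selection, loosely modeled
# on the pattern discussed in the review comment; function and kernel
# names are assumptions, not TGI's actual code.

def select_gptq_kernel(platform: str, bits: int) -> str:
    """Pick a quantized-matmul kernel based on the hardware platform.

    exllama kernels only exist for CUDA, so any other platform
    (e.g. Intel/IPEX) gets a generic fallback regardless of bit width.
    """
    use_exllama = platform == "cuda" and bits == 4
    return "exllama" if use_exllama else "generic"

print(select_gptq_kernel("cuda", 4))  # exllama
print(select_gptq_kernel("xpu", 4))   # generic
```

The point of contention in the thread is exactly where this boolean gets forced to false, since an over-broad condition disables the fast path on platforms that could otherwise be handled differently.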
trainerbox created a comment on an issue on huggingface/text-generation-inference
TGI does not support Apple silicon hardware https://github.com/huggingface/text-generation-inference Hardware support: [Nvidia] [AMD] [Inferentia] [Gaudi] [Google TPU] Additional ref...
mfuntowicz pushed 5 commits to trtllm-executor-thread huggingface/text-generation-inference
- misc(cuda): require 12.6 1c3e71e
- chore(cmake): use correct policy for download_timestamp 09e8803
- feat(looper): check engine and executorWorker paths exist before creating the backend 55adb74
- chore(cmake): download timestamp should be before URL 0745f2b
- feat(looper): minor optimizations to avoid growing too much the containers f45e180
Narsil pushed 1 commit to gpt_awq_4 huggingface/text-generation-inference
- Revert change after rebase. 5ca6da1
Narsil pushed 1 commit to gpt_awq_4 huggingface/text-generation-inference
- Fix redundant import. 7b29135
Narsil pushed 16 commits to gpt_awq_4 huggingface/text-generation-inference
- Fixing linters. (#2650) cf04a43
- Use flashinfer for Gemma 2. ce7e356
- Rollback to `ChatRequest` for Vertex AI Chat instead of `VertexChat` (#2651) As spotted by @philschmid, the payload ... ffe05cc
- Fp8 e4m3_fnuz support for rocm (#2588) * (feat) fp8 fnuz support for rocm * (review comments) Fix compression_con... 704a58c
- feat: prefill chunking (#2600) * wip * rollback * refactor to use prefix/postfix namming + fix all_input_ids_t... a6a0c97
- Support `e4m3fn` KV cache (#2655) * Support `e4m3fn` KV cache * Make check more obvious 5bbe1ce
- Simplify the `attention` function (#2609) * Simplify the `attention` function - Use one definition rather than mu... 59ea38c
- fix tgi-entrypoint wrapper in docker file: exec instead of spawning a child process (#2663) tgi-entrypoint: exec ins... 1b97e08
- fix: prefer inplace softmax to avoid copy (#2661) * fix: prefer inplace softmax to avoid copy * Update server/tex... 5f32dea
- Break cycle between the attention implementations and KV cache (#2627) 8ec5755
- add gptq and awq int4 support in intel platform Signed-off-by: Wang, Yi A <[email protected]> 61fe28e
- fix ci failure Signed-off-by: Wang, Yi A <[email protected]> dd3fb81
- set kv cache dtype Signed-off-by: Wang, Yi A <[email protected]> 645369b
- refine the code according to the review command Signed-off-by: Wang, Yi A <[email protected]> f36c9a6
- Simplifying conditionals + reverting integration tests values. 3e12402
- Unused import ba7197c
Narsil pushed 1 commit to gpt_awq_4 huggingface/text-generation-inference
- Simplifying conditionals + reverting integration tests values. cf7a957
Narsil created a review comment on a pull request on huggingface/text-generation-inference
Wouldn't keeping `use_exllama` and simply fixing the TP (with `- g_idx[0]`) in the conditional fix the issues on IPEX?
Narsil opened a pull request on huggingface/text-generation-inference
CI job. Gpt awq 4
# What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes with the title you set, ...
Narsil created a branch on huggingface/text-generation-inference
gpt_awq_4 - Large Language Model Text Generation Inference
muscionig created a comment on an issue on huggingface/text-generation-inference
Hi @Johnno1011, I think this might help. I noticed that your `model-id` is set to `meta-llama/Meta-Llama-3.1-70B-Instruct`. While working with this model on the HF Hub I faced a similar issu...
Bihan created a comment on an issue on huggingface/text-generation-inference
@danieldk Deployed TGI with neuralmagic/Meta-Llama-3-70B-Instruct-FP8 and it worked.
Johnno1011 created a comment on an issue on huggingface/text-generation-inference
Yeah interesting point, I have tried fiddling with this... I got myself a new copy of the model and removed the caching (so that it downloads directly into the container) but this still happens. No...
Narsil created a review comment on a pull request on huggingface/text-generation-inference
Isn't this `no_tool` with `snake_case`? This should mean a rename of this property or `None`, no? I don't think `schema(rename)` implies a serde `rename`.
Narsil created a review comment on a pull request on huggingface/text-generation-inference
`vec![]` ?
danieldk pushed 1 commit to main huggingface/text-generation-inference
- Break cycle between the attention implementations and KV cache (#2627) 8ec5755
drbh deleted a branch huggingface/text-generation-inference
prefer-inplace-softmax-for-prefill-logprobs
drbh pushed 1 commit to main huggingface/text-generation-inference
- fix: prefer inplace softmax to avoid copy (#2661) * fix: prefer inplace softmax to avoid copy * Update server/tex... 5f32dea
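The "prefer inplace softmax to avoid copy" commit above is about normalizing logits without allocating a second tensor the size of the input. A rough NumPy illustration of the idea (TGI itself operates on torch tensors; this is a sketch of the technique, not the project's code):

```python
import numpy as np

def softmax_inplace(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax computed in place along the last axis.

    Every elementwise step writes back into `x` via `out=`, so no
    temporary array the size of `x` is allocated; only the small
    max/sum reduction buffers are created.
    """
    m = x.max(axis=-1, keepdims=True)   # per-row max, small buffer
    np.subtract(x, m, out=x)            # x -= max  (numerical stability)
    np.exp(x, out=x)                    # x = exp(x), in place
    s = x.sum(axis=-1, keepdims=True)   # per-row sum, small buffer
    np.divide(x, s, out=x)              # x /= sum, in place
    return x

logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax_inplace(logits)  # `logits` itself now holds probabilities
print(probs.sum())               # ≈ 1.0
```

For prefill logprobs over long sequences the saved copy is a full `[seq_len, vocab_size]` buffer, which is why the out-of-place version showed up as measurable overhead.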