Ecosyste.ms: Timeline
Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.
Johnno1011 closed an issue on huggingface/text-generation-inference
llama3.1 /v1/chat/completions template not found
### System Info text generation inference v2.3.1 meta-llama/Meta-Llama-3.1-70B-Instruct ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [...
Narsil pushed 1 commit to gpt_awq_4 huggingface/text-generation-inference
- Upgrading the tests (TP>1 fix changes to use different kernels.) 8673bb0
sywangyi created a review comment on a pull request on huggingface/text-generation-inference
https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/layers/gptq/__init__.py#L134, this line sets `use_exllama` to false, since on the Intel platform, exllam...
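The review comment above concerns kernel-selection logic in TGI's GPTQ layer: the exllama kernels are CUDA-only, so on Intel hardware the code must fall back to a different implementation. A minimal sketch of that pattern, with hypothetical names that are illustrative rather than TGI's actual API:

```python
# Hypothetical sketch of platform-gated kernel selection, loosely modeled
# on the pattern discussed in the review comment; function and kernel
# names are assumptions, not TGI's actual code.

def select_gptq_kernel(platform: str, bits: int) -> str:
    """Pick a quantized-matmul kernel based on the hardware platform.

    exllama kernels only exist for CUDA, so any other platform
    (e.g. Intel/IPEX) gets a generic fallback regardless of bit width.
    """
    use_exllama = platform == "cuda" and bits == 4
    return "exllama" if use_exllama else "generic"

print(select_gptq_kernel("cuda", 4))  # exllama
print(select_gptq_kernel("xpu", 4))   # generic
```

The point of contention in the thread is exactly where this boolean gets forced to false, since an over-broad condition disables the fast path on platforms that could otherwise be handled differently.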
trainerbox created a comment on an issue on huggingface/text-generation-inference
TGI does not support Apple silicon hardware https://github.com/huggingface/text-generation-inference Hardware support: [Nvidia] [AMD] [Inferentia] [Gaudi] [Google TPU] Additional ref...
mfuntowicz pushed 5 commits to trtllm-executor-thread huggingface/text-generation-inference
- misc(cuda): require 12.6 1c3e71e
- chore(cmake): use correct policy for download_timestamp 09e8803
- feat(looper): check engine and executorWorker paths exist before creating the backend 55adb74
- chore(cmake): download timestamp should be before URL 0745f2b
- feat(looper): minor optimizations to avoid growing too much the containers f45e180
Narsil pushed 1 commit to gpt_awq_4 huggingface/text-generation-inference
- Revert change after rebase. 5ca6da1
Narsil pushed 1 commit to gpt_awq_4 huggingface/text-generation-inference
- Fix redundant import. 7b29135
Narsil pushed 16 commits to gpt_awq_4 huggingface/text-generation-inference
- Fixing linters. (#2650) cf04a43
- Use flashinfer for Gemma 2. ce7e356
- Rollback to `ChatRequest` for Vertex AI Chat instead of `VertexChat` (#2651) As spotted by @philschmid, the payload ... ffe05cc
- Fp8 e4m3_fnuz support for rocm (#2588) * (feat) fp8 fnuz support for rocm * (review comments) Fix compression_con... 704a58c
- feat: prefill chunking (#2600) * wip * rollback * refactor to use prefix/postfix namming + fix all_input_ids_t... a6a0c97
- Support `e4m3fn` KV cache (#2655) * Support `e4m3fn` KV cache * Make check more obvious 5bbe1ce
- Simplify the `attention` function (#2609) * Simplify the `attention` function - Use one definition rather than mu... 59ea38c
- fix tgi-entrypoint wrapper in docker file: exec instead of spawning a child process (#2663) tgi-entrypoint: exec ins... 1b97e08
- fix: prefer inplace softmax to avoid copy (#2661) * fix: prefer inplace softmax to avoid copy * Update server/tex... 5f32dea
- Break cycle between the attention implementations and KV cache (#2627) 8ec5755
- add gptq and awq int4 support in intel platform Signed-off-by: Wang, Yi A <[email protected]> 61fe28e
- fix ci failure Signed-off-by: Wang, Yi A <[email protected]> dd3fb81
- set kv cache dtype Signed-off-by: Wang, Yi A <[email protected]> 645369b
- refine the code according to the review command Signed-off-by: Wang, Yi A <[email protected]> f36c9a6
- Simplifying conditionals + reverting integration tests values. 3e12402
- Unused import ba7197c
Narsil pushed 1 commit to gpt_awq_4 huggingface/text-generation-inference
- Simplifying conditionals + reverting integration tests values. cf7a957
Narsil created a review comment on a pull request on huggingface/text-generation-inference
Wouldn't keeping `use_exllama` and simply fixing the TP (with `- g_idx[0]`) in the conditional fix the issues on IPEX?
Narsil opened a pull request on huggingface/text-generation-inference
CI job. Gpt awq 4
# What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes with the title you set, ...
Narsil created a branch on huggingface/text-generation-inference
gpt_awq_4 - Large Language Model Text Generation Inference
muscionig created a comment on an issue on huggingface/text-generation-inference
Hi @Johnno1011, I think this might help. I noticed that your `model-id` is set to `meta-llama/Meta-Llama-3.1-70B-Instruct`. While working with this model on the HF Hub I faced a similar issu...
Bihan created a comment on an issue on huggingface/text-generation-inference
@danieldk Deployed TGI with neuralmagic/Meta-Llama-3-70B-Instruct-FP8 and it worked.
Johnno1011 created a comment on an issue on huggingface/text-generation-inference
Yeah interesting point, I have tried fiddling with this... I got myself a new copy of the model and removed the caching (so that it downloads directly into the container) but this still happens. No...
Narsil created a review comment on a pull request on huggingface/text-generation-inference
Isn't this `no_tool` with `snake_case`? This should mean a rename of this property or `None`, no? I don't think `schema(rename)` implies a serde `rename`.
Narsil created a review comment on a pull request on huggingface/text-generation-inference
`vec![]` ?
danieldk pushed 1 commit to main huggingface/text-generation-inference
- Break cycle between the attention implementations and KV cache (#2627) 8ec5755
drbh deleted a branch huggingface/text-generation-inference
prefer-inplace-softmax-for-prefill-logprobs
drbh pushed 1 commit to main huggingface/text-generation-inference
- fix: prefer inplace softmax to avoid copy (#2661) * fix: prefer inplace softmax to avoid copy * Update server/tex... 5f32dea
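The "prefer inplace softmax to avoid copy" commit above is about normalizing logits without allocating a second tensor the size of the input. A rough NumPy illustration of the idea (TGI itself operates on torch tensors; this is a sketch of the technique, not the project's code):

```python
import numpy as np

def softmax_inplace(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax computed in place along the last axis.

    Every elementwise step writes back into `x` via `out=`, so no
    temporary array the size of `x` is allocated; only the small
    max/sum reduction buffers are created.
    """
    m = x.max(axis=-1, keepdims=True)   # per-row max, small buffer
    np.subtract(x, m, out=x)            # x -= max  (numerical stability)
    np.exp(x, out=x)                    # x = exp(x), in place
    s = x.sum(axis=-1, keepdims=True)   # per-row sum, small buffer
    np.divide(x, s, out=x)              # x /= sum, in place
    return x

logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax_inplace(logits)  # `logits` itself now holds probabilities
print(probs.sum())               # ≈ 1.0
```

For prefill logprobs over long sequences the saved copy is a full `[seq_len, vocab_size]` buffer, which is why the out-of-place version showed up as measurable overhead.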