Ecosyste.ms: Timeline

Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.

huggingface/text-generation-inference

danieldk created a branch on huggingface/text-generation-inference

feature/kv-cache-e4m3 - Large Language Model Text Generation Inference

Bihan opened an issue on huggingface/text-generation-inference
TGI does not support FP8 quantized models
### System Info TGI Docker Image: ghcr.io/huggingface/text-generation-inference:sha-11d7af7-rocm MODEL: meta-llama/Llama-3.1-405B-Instruct-FP8 Hardware used: Intel® Xeon® Platinu...
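
For context on what "FP8 quantized" means here: such checkpoints advertise their scheme in `config.json`, which a server has to recognize before it can load the weights. Below is a minimal sketch (not from the issue) that inspects that metadata with `huggingface_hub`; the `quantization_config` field name is the common Hub convention, and the gated 405B repo is only an example.

```python
# Minimal sketch: inspect how a Hub checkpoint is quantized before
# pointing TGI at it. Assumes the repo ships a `quantization_config`
# block in config.json, as FP8 checkpoints typically do.
import json

from huggingface_hub import hf_hub_download

# The 405B repo is gated; substitute any FP8 checkpoint you can access.
config_path = hf_hub_download(
    "meta-llama/Llama-3.1-405B-Instruct-FP8", "config.json"
)
with open(config_path) as f:
    config = json.load(f)

quant = config.get("quantization_config", {})
print(quant.get("quant_method", "no quantization_config found"))
```
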
SMAntony created a comment on an issue on huggingface/text-generation-inference
I have tried disabling Marlin by setting sym=False in quantize_config. When I am loading just the base model through TGI, it's working. When I tried to load with Adapter (QLORA), it throws the follo...
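
For readers following along, a hedged sketch of the kind of `quantize_config.json` edit the comment describes; the field names follow the common AutoGPTQ convention, and the exact contents of the reporter's checkpoint are an assumption.

```python
# Sketch of the tweak described above: with sym=False the GPTQ checkpoint
# is asymmetric, which rules out TGI's Marlin kernel fast path.
# Field names follow the usual AutoGPTQ quantize_config.json layout.
import json

quantize_config = {
    "bits": 4,
    "group_size": 128,
    "desc_act": False,
    "sym": False,  # asymmetric quantization: Marlin is not selected
    "quant_method": "gptq",
}

with open("quantize_config.json", "w") as f:
    json.dump(quantize_config, f, indent=2)
```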

SMAntony opened an issue on huggingface/text-generation-inference
Unable to load GPTQ LoRA Adapter
### System Info text-generation-inference version: 2.3.1 OS version: Ubuntu 22.04 Model: TheBloke/WizardLM-13B-V1.2-GPTQ GPU used: +------------------------------------------------------------...
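
A rough sketch of the request path this issue exercises, assuming a TGI server launched with the adapter preloaded (for example via the `--lora-adapters` launcher flag) and a per-request `adapter_id` parameter; the adapter repo name below is a placeholder, not the reporter's actual QLoRA adapter.

```python
# Sketch: select a preloaded LoRA adapter per request against a local
# TGI server. Endpoint and adapter id are placeholders.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is deep learning?",
        "parameters": {
            "max_new_tokens": 64,
            "adapter_id": "example-user/wizardlm-13b-qlora",  # hypothetical
        },
    },
    timeout=60,
)
print(resp.json())
```
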
HuggingFaceDocBuilderDev created a comment on a pull request on huggingface/text-generation-inference
The docs for this PR live [here](https://moon-ci-docs.huggingface.co/docs/text-generation-inference/pr_2645). All of your documentation changes will be reflected on that endpoint. The docs are avai...

drbh pushed 1 commit to pr-2634-ci-branch huggingface/text-generation-inference
  • feat: update docs and add tool choice configuration section d85133e

Narsil pushed 1 commit to feat/prefix_chunking huggingface/text-generation-inference

drbh closed a draft pull request on huggingface/text-generation-inference
fix: enforce default max request tokens in generate_internal
This PR moves the defaulting of `max_new_tokens` if not provided by the requester from the chat endpoint into `generate_internal`. This applies the default to all endpoints that use `generate_inte...
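
The actual change is in the Rust router, but the control flow is simple enough to sketch in Python (all names here are hypothetical): apply the fallback once, inside the shared internal path, so every endpoint that routes through it inherits the same default.

```python
# Hypothetical sketch of the PR's idea: default max_new_tokens in the
# shared internal generate path rather than in the chat endpoint only.
DEFAULT_MAX_NEW_TOKENS = 100  # illustrative value, not TGI's actual default

def generate_internal(request: dict) -> dict:
    params = request.setdefault("parameters", {})
    if params.get("max_new_tokens") is None:
        params["max_new_tokens"] = DEFAULT_MAX_NEW_TOKENS
    return request  # hand off to scheduling, unchanged

print(generate_internal({"inputs": "Hello"}))
# -> {'inputs': 'Hello', 'parameters': {'max_new_tokens': 100}}
```
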
drbh pushed 10 commits to pr-2634-ci-branch huggingface/text-generation-inference
  • Fixing linters. (#2650) cf04a43
  • Use flashinfer for Gemma 2. ce7e356
  • Rollback to `ChatRequest` for Vertex AI Chat instead of `VertexChat` (#2651) As spotted by @philschmid, the payload ... ffe05cc
  • add OpenAI like tool_choice for named choice a92db2e
  • add tests adf570b
  • fix: run linter and bump api docs 7e64efd
  • fix: consolidate changes and remove old tool type eab412e
  • feat: improve, simplify and rename tool choice struct add required support and refactor 6571693
  • fix: simplify tool choice logic, improve tests, openapi and rust docs e486589
  • fix: refactor away prepare_chat_input and improve tool grammar apply control flow 79a67aa

Narsil created a review on a pull request on huggingface/text-generation-inference

Narsil pushed 12 commits to feat/prefix_chunking huggingface/text-generation-inference
  • Update documentation to most recent stable version of TGI. (#2625) Update to most recent stable version of TGI. d912f0b
  • Intel ci (#2630) * Intel CI ? * Let's try non sharded gemma. * Snapshot rename * Apparently container can b... 3dbdf63
  • Fixing intel Supports windowing. (#2637) 0c47884
  • Small fixes for supported models (#2471) * Small improvements for docs * Update _toctree.yml * Updating the do... ce28ee8
  • Cpu perf (#2596) * break when there's nothing to read Signed-off-by: Wang, Yi A <[email protected]> * Differ... 3ea82d0
  • Clarify gated description and quicktour (#2631) Update quicktour.md 51f5401
  • update ipex to fix incorrect output of mllama in cpu (#2640) Signed-off-by: Wang, Yi A <[email protected]> 7a82ddc
  • feat: enable pytorch xpu support for non-attention models (#2561) XPU backend is available natively (without IPEX) i... 58848cb
  • Fixing linters. (#2650) cf04a43
  • Use flashinfer for Gemma 2. ce7e356
  • Rollback to `ChatRequest` for Vertex AI Chat instead of `VertexChat` (#2651) As spotted by @philschmid, the payload ... ffe05cc
  • Merge branch 'main' into feat/prefix_chunking 5c8c5ac

Narsil created a comment on an issue on huggingface/text-generation-inference
Can you please include the necessary information asked for when you create an issue? Ideally, given that the issue involves a model, include a link to the model on the Hub that we can use to reproduce the issue. Thanks.

Narsil created a review on a pull request on huggingface/text-generation-inference

Narsil closed a pull request on huggingface/text-generation-inference
Rollback to `ChatRequest` for Vertex AI Chat instead of `VertexChat`
# What does this PR do? As spotted by @philschmid, the payload was compliant with Vertex AI, but just partially, since ideally the most compliant version would be with the generation kwargs flat...
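
Purely as illustration of the shapes at stake (the exact field sets are an assumption, not taken from the PR): Vertex AI wraps each request in an `instances` list, and the question is whether the generation kwargs sit nested under their own object or flat beside the messages, the flat form being what `ChatRequest` already produces.

```python
# Hypothetical payload shapes for the Vertex AI route; field names are
# illustrative, not the PR's actual schema.
nested_payload = {
    "instances": [{
        "messages": [{"role": "user", "content": "Hello"}],
        "generation_config": {"max_tokens": 128},  # kwargs nested
    }]
}

flattened_payload = {
    "instances": [{
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 128,  # kwargs flat, ChatRequest-style
    }]
}
```
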
Narsil created a review on a pull request on huggingface/text-generation-inference

Narsil pushed 1 commit to feat/prefix_chunking huggingface/text-generation-inference
  • Fixing dtype + AMD, Ipex targets. fa491e7

ptanov created a comment on an issue on huggingface/text-generation-inference
Thanks @kozistr, I'll try to check it tomorrow

oOraph closed a pull request on huggingface/text-generation-inference
remove docker entrypoint
This entrypoint was meant to fix TGI on some cloud providers, like GCP, that do not ensure the ld cache is refreshed after adding shared libraries in custom container hooks. This should not be...
drbh opened a draft pull request on huggingface/text-generation-inference
fix: enforce default max request tokens in generate_internal
This PR moves the defaulting of `max_new_tokens` if not provided by the requester from the chat endpoint into `generate_internal`. This applies the default to all endpoints that use `generate_inte...
drbh pushed 1 commit to adjust-where-request-max-tokens-is-defaulted huggingface/text-generation-inference
  • fix: add limit to internal stream function too b3917ff

drbh created a branch on huggingface/text-generation-inference

adjust-where-request-max-tokens-is-defaulted - Large Language Model Text Generation Inference

drbh created a review comment on a pull request on huggingface/text-generation-inference
this was originally added to allow passing tools to the chat template without enforcing a grammar, which was introduced in this PR https://github.com/huggingface/text-generation-inference/pull/2454...
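
A sketch of the behaviour the comment refers to, via TGI's OpenAI-compatible Messages API; it assumes `tool_choice="none"` is the mode that exposes the tools to the chat template without enforcing a grammar, and the endpoint and tool definition are placeholders.

```python
# Sketch (placeholders throughout): send tools to a TGI server but opt out
# of grammar-enforced tool calling, assuming tool_choice="none" is the mode
# that still passes the tools through to the chat template.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="tgi",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="none",  # tools visible to the template, no grammar forced
)
print(response.choices[0].message)
```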

drbh created a review on a pull request on huggingface/text-generation-inference

drbh created a review comment on a pull request on huggingface/text-generation-inference
good point, updated in the latest commit to discard the tools and continue executing the function normally

drbh created a review on a pull request on huggingface/text-generation-inference

drbh pushed 1 commit to pr-2634-ci-branch huggingface/text-generation-inference
  • fix: refactor away prepare_chat_input and improve tool grammar apply control flow e1d1706
