Ecosyste.ms: Timeline

Browse the timeline of events for every public repo on GitHub. Data is updated hourly from GH Archive.

huggingface/text-generation-inference

Narsil created a comment on a pull request on huggingface/text-generation-inference
Doesn't fix it.

Narsil closed a pull request on huggingface/text-generation-inference
Fixing performance degradation on Intel.
# What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes with the title you set, ...
nathan-az opened an issue on huggingface/text-generation-inference
(Prefill) KV Cache Indexing error if started multiple TGI servers concurrently
### System Info ``` Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.80.0 Commit sha: a094729386b5689aabfba40b7fdb207142dec8d5 Docker label: sha-a094729 nvidia-smi: Mo...
mfuntowicz pushed 1 commit to trtllm-executor-thread on huggingface/text-generation-inference
  • feat(trtllm): do not tokenize twice 8d1c3c8

HuggingFaceDocBuilderDev created a comment on a pull request on huggingface/text-generation-inference
The docs for this PR live [here](https://moon-ci-docs.huggingface.co/docs/text-generation-inference/pr_2673). All of your documentation changes will be reflected on that endpoint. The docs are avai...

Narsil pushed 1 commit to auto_length on huggingface/text-generation-inference

mfuntowicz pushed 1 commit to trtllm-executor-thread on huggingface/text-generation-inference
  • misc(router): remove SchedulingError 1a3da05

Narsil created a review comment on a pull request on huggingface/text-generation-inference
`!support_chunking`. :) We're keeping it for AMD/Intel.

Narsil created a review on a pull request on huggingface/text-generation-inference

sywangyi created a comment on a pull request on huggingface/text-generation-inference
It hangs in `t.join()`, since there's no break in the loop in `log_lines`.

mfuntowicz pushed 2 commits to trtllm-executor-thread on huggingface/text-generation-inference
  • chore(trtllm): remove unused method 3174716
  • feat(trtllm): cache maxNumTokens to avoid calling JSON everytime e6da212

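The `e6da212` commit message above captures a common pattern: parse a configuration value out of JSON once and cache it, rather than re-reading the JSON on every request. A hypothetical sketch of the idea in Python (the names `max_num_tokens` and the config shape are illustrative, not TRT-LLM's actual API):

```python
import json
from functools import lru_cache

# Illustrative stand-in for an engine config file's contents.
CONFIG_JSON = '{"build_config": {"max_num_tokens": 8192}}'

@lru_cache(maxsize=1)
def max_num_tokens() -> int:
    # The expensive JSON parse happens only on the first call;
    # every later call is served from the cache.
    return json.loads(CONFIG_JSON)["build_config"]["max_num_tokens"]

print(max_num_tokens())  # first call parses the JSON
print(max_num_tokens())  # second call hits the cache
```

The same trick applies in any language: hoist the parse out of the hot path and keep the scalar around.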
Narsil pushed 1 commit to close_dl_thread on huggingface/text-generation-inference

Narsil created a comment on a pull request on huggingface/text-generation-inference
Does this also fix it: https://github.com/huggingface/text-generation-inference/pull/2674?

drbh created a review on a pull request on huggingface/text-generation-inference

Narsil opened a pull request on huggingface/text-generation-inference
Fixing performance degradation on Intel.
# What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes with the title you set, ...
Narsil created a branch on huggingface/text-generation-inference

close_dl_thread - Large Language Model Text Generation Inference

Narsil created a comment on a pull request on huggingface/text-generation-inference
> Hi @Narsil, I debugged in my environment and found it's blocked in this thread: https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs#L1266-L1268. I print "n" in log_lin...

sywangyi created a comment on a pull request on huggingface/text-generation-inference
Hi @Narsil, I debugged in my environment and found it's blocked in this thread: https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs#L1266-L1268. I print "n" in log_lines...

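The hang sywangyi describes comes down to a log-forwarding thread whose read loop never terminates, so the parent's `join()` blocks forever; once the loop ends at EOF, `join()` returns. A minimal sketch of that idea in Python (the launcher itself is Rust, and `log_lines` here is an illustrative stand-in, not the actual launcher code):

```python
import io
import threading

def log_lines(stream, sink):
    # readline() returns "" at EOF, which ends the loop and lets
    # the thread exit -- the equivalent of breaking out of the Rust loop.
    for line in iter(stream.readline, ""):
        sink.append(line.rstrip("\n"))

lines = []
t = threading.Thread(
    target=log_lines,
    args=(io.StringIO("line one\nline two\n"), lines),
)
t.start()
t.join()  # returns promptly because the loop terminated at EOF
print(len(lines))  # 2
```

Without the EOF termination condition, the thread would spin (or block) forever and `t.join()` would never return, which matches the reported symptom.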
allanbunch starred huggingface/text-generation-inference

ErikKaum created a comment on an issue on huggingface/text-generation-inference
Hi, I think [this was a start](https://github.com/huggingface/text-generation-inference/pull/2652) but there seems to be some direction change @drbh?

Narsil opened a pull request on huggingface/text-generation-inference
Choosing input/total tokens automatically based on available VRAM?
# What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes with the title you set, ...
Narsil created a branch on huggingface/text-generation-inference

auto_length - Large Language Model Text Generation Inference
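The idea behind the `auto_length` PR can be approximated with back-of-the-envelope math: the number of KV-cache tokens that fit is roughly the VRAM left after loading weights, divided by the per-token KV-cache footprint. A hypothetical sketch (not TGI's actual heuristic; the model shapes and safety margin below are illustrative):

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, dtype_bytes: int = 2) -> int:
    # Each token stores one key and one value vector per layer,
    # hence the factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

def max_total_tokens(free_vram_bytes: int, num_layers: int, num_kv_heads: int,
                     head_dim: int, dtype_bytes: int = 2,
                     safety_margin: float = 0.9) -> int:
    # Reserve a margin for activations and fragmentation, then see
    # how many tokens' worth of KV cache fits in what remains.
    per_token = kv_cache_bytes_per_token(num_layers, num_kv_heads,
                                         head_dim, dtype_bytes)
    return int(free_vram_bytes * safety_margin) // per_token

# Example: Llama-3-8B-like shapes (32 layers, 8 KV heads, head_dim 128, fp16)
# with 10 GiB of VRAM free after the weights are loaded.
tokens = max_total_tokens(10 * 1024**3, num_layers=32,
                          num_kv_heads=8, head_dim=128)
print(tokens)  # 73728
```

A real implementation would query free memory from the device at startup rather than take it as a parameter, but the core arithmetic is the same.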

danieldk deleted a branch on huggingface/text-generation-inference

tests/marlin-moe-desc-act

danieldk pushed 1 commit to main on huggingface/text-generation-inference
  • Test Marlin MoE with `desc_act=true` (#2622) Update the Mixtral GPTQ test to use a model with `desc_act=true` and `... 7f54b73

danieldk closed a pull request on huggingface/text-generation-inference
Test Marlin MoE with `desc_act=true`
# What does this PR do? Update the Mixtral GPTQ test to use a model with `desc_act=true` and `group_size!=-1` to ensure that we are checking activation sorting/non-full K (with tensor parallelis...
Narsil created a review on a pull request on huggingface/text-generation-inference

danieldk pushed 29 commits to feature/fp8-kv-cache-scale on huggingface/text-generation-inference
  • enable mllama in intel platform (#2610) Signed-off-by: Wang, Yi A <[email protected]> 57f9685
  • Upgrade minor rust version (Fixes rust build compilation cache) (#2617) * Upgrade minor rust version (Fixes rust bui... 8b295aa
  • Add support for fused MoE Marlin for AWQ (#2616) * Add support for fused MoE Marlin for AWQ This uses the updated... 6414248
  • nix: move back to the tgi-nix main branch (#2620) 6db3bcb
  • CI (2599): Update ToolType input schema (#2601) * Update ToolType input schema * lint * fix: run formatter ... 8ad20da
  • nix: add black and isort to the closure (#2619) To make sure that everything is formatted with the same black versio... 9ed0c85
  • AMD CI (#2589) * Only run 1 valid test. * TRying the tailscale action quickly. * ? * bash spaces. * Remo... 43f39f6
  • feat: allow tool calling to respond without a tool (#2614) * feat: process token stream before returning to client ... e36dfaa
  • Update documentation to most recent stable version of TGI. (#2625) Update to most recent stable version of TGI. d912f0b
  • Intel ci (#2630) * Intel CI ? * Let's try non sharded gemma. * Snapshot rename * Apparently container can b... 3dbdf63
  • Fixing intel Supports windowing. (#2637) 0c47884
  • Small fixes for supported models (#2471) * Small improvements for docs * Update _toctree.yml * Updating the do... ce28ee8
  • Cpu perf (#2596) * break when there's nothing to read Signed-off-by: Wang, Yi A <[email protected]> * Differ... 3ea82d0
  • Clarify gated description and quicktour (#2631) Update quicktour.md 51f5401
  • update ipex to fix incorrect output of mllama in cpu (#2640) Signed-off-by: Wang, Yi A <[email protected]> 7a82ddc
  • feat: enable pytorch xpu support for non-attention models (#2561) XPU backend is available natively (without IPEX) i... 58848cb
  • Fixing linters. (#2650) cf04a43
  • Use flashinfer for Gemma 2. ce7e356
  • Rollback to `ChatRequest` for Vertex AI Chat instead of `VertexChat` (#2651) As spotted by @philschmid, the payload ... ffe05cc
  • Fp8 e4m3_fnuz support for rocm (#2588) * (feat) fp8 fnuz support for rocm * (review comments) Fix compression_con... 704a58c
  • and 9 more ...

mfuntowicz pushed 227 commits to trtllm-executor-thread on huggingface/text-generation-inference
  • Pr 2290 ci run (#2329) * MODEL_ID propagation fix * fix: remove global model id --------- Co-authored-by: r... f7f6187
  • refactor usage stats (#2339) * refactor usage stats * Update docs/source/usage_statistics.md Co-authored-by: N... 7451041
  • enable HuggingFaceM4/idefics-9b in intel gpu (#2338) Signed-off-by: Wang, Yi A <[email protected]> 9ab9937
  • Fix cache block size for flash decoding (#2351) * Fix cache block size for flash decoding This seems to have been... 22fb1be
  • Unify attention output handling (#2343) - Always return the hidden states. - Create the output tensor inside the `a... 47447ef
  • fix: attempt forward on flash attn2 to check hardware support (#2335) * fix: attempt forward on flash attn2 to check... 215ed3a
  • feat: include local lora adapter loading docs (#2359) dd47a3d
  • fix: return the out tensor rather then the functions return value (#2361) 29b8d19
  • feat: implement a templated endpoint for visibility into chat requests (#2333) * feat: implement a templated endpoin... e11f5f1
  • feat: prefer stop over eos_token to align with openai finish_reason (#2344) f8a5b38
  • feat: return the generated text when parsing fails (#2353) 1768c00
  • fix: default num_ln_in_parallel_attn to one if not supplied (#2364) a64d407
  • fix: prefer original layernorm names for 180B (#2365) 133015f
  • fix: fix num_ln_in_parallel_attn attribute name typo in RWConfig (#2350) Co-authored-by: Islam Almersawi <islam.alme... 8094ecf
  • add gptj modeling in TGI #2366 (CI RUN) (#2372) * add gptj modeling Signed-off-by: Wang, Yi A <[email protected]... 21267f3
  • Fix the prefix for OPT model in opt_modelling.py #2370 (CI RUN) (#2371) * Fix the bug * fix: run lints * fix: ... a379d55
  • Pr 2374 ci branch (#2378) * Update __init__.py Fix issue with NoneType comparison for max_input_tokens and slidin... 82d19d7
  • fix EleutherAI/gpt-neox-20b does not work in tgi (#2346) Signed-off-by: Wang, Yi A <[email protected]> 689b1ab
  • Pr 2337 ci branch (#2379) * hotfix: fix xpu crash brought by code refine. torch.xpu rely on import ipex Signed-of... 2ca5980
  • fix: prefer hidden_activation over hidden_act in gemma2 (#2381) f852190
  • and 207 more ...

paulcx created a comment on an issue on huggingface/text-generation-inference
> Okay gotcha! Thanks for elaborating on this 👍 The difference between `v1/chat/completions` and `/generate` is indeed a bit off. > > I'll ping @drbh, I think he might know better! Hi @Eri...

lurker18 starred huggingface/text-generation-inference