Ecosyste.ms: Timeline
Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.
Narsil created a comment on a pull request on huggingface/text-generation-inference
Doesn't fix it.
Narsil closed a pull request on huggingface/text-generation-inference
Fixing performance degradation on Intel.
# What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes with the title you set, ...
nathan-az opened an issue on huggingface/text-generation-inference
(Prefill) KV Cache Indexing error if started multiple TGI servers concurrently
### System Info ``` Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.80.0 Commit sha: a094729386b5689aabfba40b7fdb207142dec8d5 Docker label: sha-a094729 nvidia-smi: Mo...
mfuntowicz pushed 1 commit to trtllm-executor-thread huggingface/text-generation-inference
- feat(trtllm): do not tokenize twice 8d1c3c8
HuggingFaceDocBuilderDev created a comment on a pull request on huggingface/text-generation-inference
The docs for this PR live [here](https://moon-ci-docs.huggingface.co/docs/text-generation-inference/pr_2673). All of your documentation changes will be reflected on that endpoint. The docs are avai...
mfuntowicz pushed 1 commit to trtllm-executor-thread huggingface/text-generation-inference
- misc(router): remove SchedulingError 1a3da05
Narsil created a review comment on a pull request on huggingface/text-generation-inference
`!support_chunking`. :) We're keeping it for AMD/Intel.
sywangyi created a comment on a pull request on huggingface/text-generation-inference
It hangs in the t.join(), since there's no break in the loop in log_lines.
mfuntowicz pushed 2 commits to trtllm-executor-thread huggingface/text-generation-inference
Narsil pushed 1 commit to close_dl_thread huggingface/text-generation-inference
- Clean both threads. fe8d55d
Narsil created a comment on a pull request on huggingface/text-generation-inference
Does this fix it too: https://github.com/huggingface/text-generation-inference/pull/2674 ?
Narsil opened a pull request on huggingface/text-generation-inference
Fixing performance degradation on Intel.
# What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes with the title you set, ...
Narsil created a branch on huggingface/text-generation-inference
close_dl_thread - Large Language Model Text Generation Inference
Narsil created a comment on a pull request on huggingface/text-generation-inference
> Hi, @Narsil. I debugged in my environment and found it's blocked in this thread https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs#L1266-L1268. I print "n" in log_lin...
sywangyi created a comment on a pull request on huggingface/text-generation-inference
Hi, @Narsil. I debugged in my environment and found it's blocked in this thread https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs#L1266-L1268. I print "n" in log_lines...
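The hang described above can be sketched in isolation: a reader thread that loops over a log stream will block `join()` forever unless the loop breaks when the read returns zero bytes (EOF). The sketch below is a minimal, hypothetical stand-in, not the launcher's actual code; `drain_lines` and the in-memory cursor substitute for the real `log_lines` loop and the shard's log pipe.

```rust
use std::io::BufRead;
use std::thread;

/// Read lines until EOF, counting them, then return so the thread can exit.
/// The crucial detail from the discussion: `read_line` returning Ok(0) means
/// EOF, and without breaking on it the loop spins (or blocks) forever and
/// the `join()` in the parent never returns.
fn drain_lines<R: BufRead + Send + 'static>(mut reader: R) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        let mut buf = String::new();
        let mut n = 0;
        loop {
            buf.clear();
            match reader.read_line(&mut buf) {
                Ok(0) | Err(_) => break, // EOF or read error: let the thread finish
                Ok(_) => n += 1,
            }
        }
        n
    })
}

fn main() {
    // Hypothetical stand-in for the launcher's log pipe: an in-memory cursor.
    let pipe = std::io::BufReader::new(std::io::Cursor::new(b"log line 1\nlog line 2\n".to_vec()));
    let t = drain_lines(pipe);
    // join() returns promptly because the loop breaks at EOF.
    let n = t.join().unwrap();
    assert_eq!(n, 2);
}
```

With a real child-process pipe, EOF only arrives once the write end is closed, which is why cleaning up both threads (as in the `close_dl_thread` branch) matters.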
ErikKaum created a comment on an issue on huggingface/text-generation-inference
Hi, I think [this was a start](https://github.com/huggingface/text-generation-inference/pull/2652) but there seems to be some direction change @drbh?
Narsil opened a pull request on huggingface/text-generation-inference
Choosing input/total tokens automatically based on available VRAM?
# What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes with the title you set, ...
Narsil created a branch on huggingface/text-generation-inference
auto_length - Large Language Model Text Generation Inference
danieldk pushed 1 commit to main huggingface/text-generation-inference
- Test Marlin MoE with `desc_act=true` (#2622) Update the Mixtral GPTQ test to use a model with `desc_act=true` and `... 7f54b73
danieldk closed a pull request on huggingface/text-generation-inference
Test Marlin MoE with `desc_act=true`
# What does this PR do? Update the Mixtral GPTQ test to use a model with `desc_act=true` and `group_size!=-1` to ensure that we are checking activation sorting/non-full K (with tensor parallelis...
danieldk pushed 29 commits to feature/fp8-kv-cache-scale huggingface/text-generation-inference
- enable mllama in intel platform (#2610) Signed-off-by: Wang, Yi A <[email protected]> 57f9685
- Upgrade minor rust version (Fixes rust build compilation cache) (#2617) * Upgrade minor rust version (Fixes rust bui... 8b295aa
- Add support for fused MoE Marlin for AWQ (#2616) * Add support for fused MoE Marlin for AWQ This uses the updated... 6414248
- nix: move back to the tgi-nix main branch (#2620) 6db3bcb
- CI (2599): Update ToolType input schema (#2601) * Update ToolType input schema * lint * fix: run formatter ... 8ad20da
- nix: add black and isort to the closure (#2619) To make sure that everything is formatted with the same black versio... 9ed0c85
- AMD CI (#2589) * Only run 1 valid test. * TRying the tailscale action quickly. * ? * bash spaces. * Remo... 43f39f6
- feat: allow tool calling to respond without a tool (#2614) * feat: process token stream before returning to client ... e36dfaa
- Update documentation to most recent stable version of TGI. (#2625) Update to most recent stable version of TGI. d912f0b
- Intel ci (#2630) * Intel CI ? * Let's try non sharded gemma. * Snapshot rename * Apparently container can b... 3dbdf63
- Fixing intel Supports windowing. (#2637) 0c47884
- Small fixes for supported models (#2471) * Small improvements for docs * Update _toctree.yml * Updating the do... ce28ee8
- Cpu perf (#2596) * break when there's nothing to read Signed-off-by: Wang, Yi A <[email protected]> * Differ... 3ea82d0
- Clarify gated description and quicktour (#2631) Update quicktour.md 51f5401
- update ipex to fix incorrect output of mllama in cpu (#2640) Signed-off-by: Wang, Yi A <[email protected]> 7a82ddc
- feat: enable pytorch xpu support for non-attention models (#2561) XPU backend is available natively (without IPEX) i... 58848cb
- Fixing linters. (#2650) cf04a43
- Use flashinfer for Gemma 2. ce7e356
- Rollback to `ChatRequest` for Vertex AI Chat instead of `VertexChat` (#2651) As spotted by @philschmid, the payload ... ffe05cc
- Fp8 e4m3_fnuz support for rocm (#2588) * (feat) fp8 fnuz support for rocm * (review comments) Fix compression_con... 704a58c
- and 9 more ...
mfuntowicz pushed 227 commits to trtllm-executor-thread huggingface/text-generation-inference
- Pr 2290 ci run (#2329) * MODEL_ID propagation fix * fix: remove global model id --------- Co-authored-by: r... f7f6187
- refactor usage stats (#2339) * refactor usage stats * Update docs/source/usage_statistics.md Co-authored-by: N... 7451041
- enable HuggingFaceM4/idefics-9b in intel gpu (#2338) Signed-off-by: Wang, Yi A <[email protected]> 9ab9937
- Fix cache block size for flash decoding (#2351) * Fix cache block size for flash decoding This seems to have been... 22fb1be
- Unify attention output handling (#2343) - Always return the hidden states. - Create the output tensor inside the `a... 47447ef
- fix: attempt forward on flash attn2 to check hardware support (#2335) * fix: attempt forward on flash attn2 to check... 215ed3a
- feat: include local lora adapter loading docs (#2359) dd47a3d
- fix: return the out tensor rather then the functions return value (#2361) 29b8d19
- feat: implement a templated endpoint for visibility into chat requests (#2333) * feat: implement a templated endpoin... e11f5f1
- feat: prefer stop over eos_token to align with openai finish_reason (#2344) f8a5b38
- feat: return the generated text when parsing fails (#2353) 1768c00
- fix: default num_ln_in_parallel_attn to one if not supplied (#2364) a64d407
- fix: prefer original layernorm names for 180B (#2365) 133015f
- fix: fix num_ln_in_parallel_attn attribute name typo in RWConfig (#2350) Co-authored-by: Islam Almersawi <islam.alme... 8094ecf
- add gptj modeling in TGI #2366 (CI RUN) (#2372) * add gptj modeling Signed-off-by: Wang, Yi A <[email protected]... 21267f3
- Fix the prefix for OPT model in opt_modelling.py #2370 (CI RUN) (#2371) * Fix the bug * fix: run lints * fix: ... a379d55
- Pr 2374 ci branch (#2378) * Update __init__.py Fix issue with NoneType comparison for max_input_tokens and slidin... 82d19d7
- fix EleutherAI/gpt-neox-20b does not work in tgi (#2346) Signed-off-by: Wang, Yi A <[email protected]> 689b1ab
- Pr 2337 ci branch (#2379) * hotfix: fix xpu crash brought by code refine. torch.xpu rely on import ipex Signed-of... 2ca5980
- fix: prefer hidden_activation over hidden_act in gemma2 (#2381) f852190
- and 207 more ...