Ecosyste.ms: Timeline
Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.
drbh created a review comment on a pull request on huggingface/text-generation-inference
updated in latest commit
drbh pushed 1 commit to pr-2634-ci-branch huggingface/text-generation-inference
- fix: adjust tool choice none logic, add test and small refactors c1fac74
mfuntowicz pushed 47 commits to feat-backend-llamacpp huggingface/text-generation-inference
- Remove compute capability lazy cell (#2580) Remove compute capability lock We are only calling the `get_cuda_capa... afc7ded
- Update architecture.md (#2577) e790cfc
- Update ROCM libs and improvements (#2579) * style * update torch * fix issues * fix clone * revert mkl ... f9e561e
- Add support for GPTQ-quantized MoE models using MoE Marlin (#2557) This change add support for MoE models that use G... 90a1d04
- feat: support phi3.5 moe (#2479) * feat: support phi3.5 moe model loading * fix: prefer llama base model and impr... 93a7042
- Move flake back to tgi-nix `main` (#2586) d1f257a
- MoE Marlin: support `desc_act` for `groupsize != -1` (#2590) This change uses the updated Marlin MoE kernel from vLL... 1c84a30
- nix: experimental support for building a Docker container (#2470) * nix: experimental support for building a Docker... 584b4d7
- Mllama flash version (#2585) * Working loading state. * Preprocessing. * Working state ? (Broke idefics1 tempo... d18ed5c
- Max token capacity metric (#2595) * adding max_token_capacity_metric * added tgi to name of metric * Adding ma... 0204946
- CI (2592): Allow LoRA adapter revision in server launcher (#2602) allow revision for lora adapters from launcher ... 2335459
- Unroll notify error into generate response (#2597) * feat: unroll notify_error if no tool is chosen * fix: expec... d22b0c1
- New release 2.3.1 (#2604) * New release 2.3.1 * Update doc number f6e2f05
- Revert "Unroll notify error into generate response" (#2605) Revert "Unroll notify error into generate response (#259... 3011639
- nix: example of local package overrides during development (#2607) 6810307
- Add basic FP8 KV cache support (#2603) * Add basic FP8 KV cache support This change adds rudimentary FP8 KV cache... 2358c2b
- Fix FP8 KV-cache condition (#2611) Update kv_cache.py 0da4df4
- enable mllama in intel platform (#2610) Signed-off-by: Wang, Yi A <[email protected]> 57f9685
- Upgrade minor rust version (Fixes rust build compilation cache) (#2617) * Upgrade minor rust version (Fixes rust bui... 8b295aa
- Add support for fused MoE Marlin for AWQ (#2616) * Add support for fused MoE Marlin for AWQ This uses the updated... 6414248
- and 27 more ...
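Several of the commits above (#2603 and the follow-up fix #2611) revolve around storing attention key/value tensors in 8-bit floating point together with a dequantization scale. The sketch below is a minimal, hypothetical illustration of per-tensor FP8 quantization in PyTorch, assuming a build that ships `torch.float8_e4m3fn`; it is not TGI's actual `kv_cache.py` code.

```python
# Minimal, hypothetical sketch of per-tensor FP8 (e4m3) KV-cache storage.
# Assumes a PyTorch build with torch.float8_e4m3fn; this is not TGI code.
import torch

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in float8_e4m3fn


def quantize_fp8(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Quantize a tensor to float8_e4m3fn and return it with its dequant scale."""
    # One scale per tensor: map the largest magnitude onto the FP8 range.
    scale = x.abs().amax().float().clamp(min=1e-12) / FP8_E4M3_MAX
    q = (x.float() / scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX).to(torch.float8_e4m3fn)
    return q, scale


def dequantize_fp8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate fp16 tensor from its FP8 representation."""
    return q.to(torch.float16) * scale


key = torch.randn(2, 8, 64, dtype=torch.float16)  # [heads, seq, head_dim]
q_key, key_scale = quantize_fp8(key)
error = (dequantize_fp8(q_key, key_scale) - key).abs().max()
print(f"max abs round-trip error: {error:.4f}")
```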
mht-sharma created a review on a pull request on huggingface/text-generation-inference
Thanks @danieldk, LGTM This makes things clearer
mht-sharma created a review comment on a pull request on huggingface/text-generation-inference
same for scalar
mht-sharma created a review comment on a pull request on huggingface/text-generation-inference
Type hint?
mht-sharma created a review on a pull request on huggingface/text-generation-inference
Thanks @danieldk, LGTM This makes things clearer
sywangyi created a comment on a pull request on huggingface/text-generation-inference
are you pinning CPU cores, e.g. with "--cpuset-cpus=0-55"?
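For context, `--cpuset-cpus=0-55` is the Docker flag that pins a container to a fixed set of host cores. The snippet below is only an illustrative way to check, from inside the running process, which cores it actually ended up with; it is Linux-only and not part of the discussion itself.

```python
# Illustrative only: list the host cores the current process may run on,
# e.g. after `docker run --cpuset-cpus=0-55 ...`. Linux-only, since
# os.sched_getaffinity is not available on macOS or Windows.
import os

allowed = sorted(os.sched_getaffinity(0))
print(f"{len(allowed)} cores available: {allowed}")
```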
Narsil created a comment on a pull request on huggingface/text-generation-inference
I'm really struggling to reproduce anything. I reproduced your command line with every argument (even though I don't understand why --privileged --net host --ipc host are actually required) and I...
danieldk opened a pull request on huggingface/text-generation-inference
Make handling of FP8 scales more consistent
# What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes with the title you set, ...
danieldk created a branch on huggingface/text-generation-inference
maintenance/reciprocal-handling - Large Language Model Text Generation Inference
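Judging by the PR title and the `maintenance/reciprocal-handling` branch name, the change is about settling on a single convention for FP8 scales, since a dequantization scale can equally be stored as its reciprocal (a multiplier versus a divisor). The snippet below is a hypothetical illustration of how mixing the two conventions goes wrong; it is not code from the PR.

```python
# Hypothetical illustration of the scale-vs-reciprocal pitfall; not TGI code.
import torch

FP8_E4M3_MAX = 448.0

x = torch.randn(4, 4)
dequant_scale = x.abs().amax() / FP8_E4M3_MAX  # multiply by this to dequantize
quant_scale = dequant_scale.reciprocal()       # multiply by this to quantize

q = x * quant_scale        # values mapped into the FP8 range
ok = q * dequant_scale     # correct: round-trips back to x
wrong = q * quant_scale    # bug: reciprocal used where the scale was expected

print((ok - x).abs().max())     # ~0
print((wrong - x).abs().max())  # large
```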