Ecosyste.ms: Timeline

Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.

huggingface/text-generation-inference

drbh created a review on a pull request on huggingface/text-generation-inference

View on GitHub

drbh created a review comment on a pull request on huggingface/text-generation-inference
updated in latest commit

View on GitHub

drbh created a review on a pull request on huggingface/text-generation-inference

View on GitHub

drbh pushed 1 commit to pr-2634-ci-branch huggingface/text-generation-inference
  • fix: adjust tool choice none logic, add test and small refactors c1fac74

View on GitHub

mfuntowicz pushed 47 commits to feat-backend-llamacpp huggingface/text-generation-inference
  • Remove compute capability lazy cell (#2580) Remove compute capability lock We are only calling the `get_cuda_capa... afc7ded
  • Update architecture.md (#2577) e790cfc
  • Update ROCM libs and improvements (#2579) * style * update torch * ix issues * fix clone * revert mkl ... f9e561e
  • Add support for GPTQ-quantized MoE models using MoE Marlin (#2557) This change add support for MoE models that use G... 90a1d04
  • feat: support phi3.5 moe (#2479) * feat: support phi3.5 moe model loading * fix: prefer llama base model and impr... 93a7042
  • Move flake back to tgi-nix `main` (#2586) d1f257a
  • MoE Marlin: support `desc_act` for `groupsize != -1` (#2590) This change uses the updated Marlin MoE kernel from vLL... 1c84a30
  • nix: experimental support for building a Docker container (#2470) * nix: experimental support for building a Docker... 584b4d7
  • Mllama flash version (#2585) * Working loading state. * Preprocessing. * Working state ? (Broke idefics1 tempo... d18ed5c
  • Max token capacity metric (#2595) * adding max_token_capacity_metric * added tgi to name of metric * Adding ma... 0204946
  • CI (2592): Allow LoRA adapter revision in server launcher (#2602) allow revision for lora adapters from launcher ... 2335459
  • Unroll notify error into generate response (#2597) * feat: unroll notify_error if no tool is choosen * fix: expec... d22b0c1
  • New release 2.3.1 (#2604) * New release 2.3.1 * Update doc number f6e2f05
  • Revert "Unroll notify error into generate response" (#2605) Revert "Unroll notify error into generate response (#259... 3011639
  • nix: example of local package overrides during development (#2607) 6810307
  • Add basic FP8 KV cache support (#2603) * Add basic FP8 KV cache support This change adds rudimentary FP8 KV cache... 2358c2b
  • Fix FP8 KV-cache condition (#2611) Update kv_cache.py 0da4df4
  • enable mllama in intel platform (#2610) Signed-off-by: Wang, Yi A <[email protected]> 57f9685
  • Upgrade minor rust version (Fixes rust build compilation cache) (#2617) * Upgrade minor rust version (Fixes rust bui... 8b295aa
  • Add support for fused MoE Marlin for AWQ (#2616) * Add support for fused MoE Marlin for AWQ This uses the updated... 6414248
  • and 27 more ...

View on GitHub

mht-sharma created a review on a pull request on huggingface/text-generation-inference
Thanks @danieldk, LGTM. This makes things clearer

View on GitHub

mht-sharma created a review comment on a pull request on huggingface/text-generation-inference
same for scalar

View on GitHub

mht-sharma created a review comment on a pull request on huggingface/text-generation-inference
Type hint?

View on GitHub

mht-sharma created a review on a pull request on huggingface/text-generation-inference
Thanks @danieldk, LGTM. This makes things clearer

View on GitHub

sywangyi created a comment on a pull request on huggingface/text-generation-inference
are you using pinned CPU cores, e.g. `--cpuset-cpus=0-55`?

View on GitHub
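
For reference, core pinning is passed to Docker as shown below (image tag and core range are placeholders; this is a config fragment illustrating the flag, not a verified launch command from the thread):

```shell
# Pin the container to host cores 0-55 so its threads are not migrated
docker run --cpuset-cpus=0-55 \
    ghcr.io/huggingface/text-generation-inference:latest
```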

Narsil created a comment on a pull request on huggingface/text-generation-inference
I'm really struggling to reproduce anything. I reproduced your command line with every argument (even though I don't understand why `--privileged --net host --ipc host` are actually required) and I...

View on GitHub

danieldk opened a pull request on huggingface/text-generation-inference
Make handling of FP8 scales more consistent
# What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes with the title you set, ...
danieldk created a branch on huggingface/text-generation-inference

maintenance/reciprocal-handling - Large Language Model Text Generation Inference

Narsil created a review comment on a pull request on huggingface/text-generation-inference
This is all copied from the original GPTQ code.

View on GitHub

Narsil created a review on a pull request on huggingface/text-generation-inference

View on GitHub

Narsil created a review comment on a pull request on huggingface/text-generation-inference
It's all copied from the cuda code.

View on GitHub

Narsil created a review on a pull request on huggingface/text-generation-inference

View on GitHub

Narsil created a review comment on a pull request on huggingface/text-generation-inference
It's all copied from the cuda code.

View on GitHub

Narsil created a review on a pull request on huggingface/text-generation-inference

View on GitHub

Narsil created a review comment on a pull request on huggingface/text-generation-inference
Yes, this is the core idea of the fix. The logic only worked with TP=1; it now works with TP>1 (detecting that `g_idx` is redundant and can be safely ignored)

View on GitHub
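
A minimal sketch of the redundancy check described here (a hypothetical helper, not TGI's actual implementation): `g_idx` carries no information when it simply numbers rows by group.

```python
def g_idx_is_trivial(g_idx, groupsize):
    # g_idx is redundant when row i always maps to group i // groupsize,
    # i.e. no activation reordering (desc_act) is in effect, so the
    # tensor can be safely ignored when sharding for TP > 1.
    return all(g == i // groupsize for i, g in enumerate(g_idx))

print(g_idx_is_trivial([0, 0, 1, 1, 2, 2, 3, 3], 2))  # True
print(g_idx_is_trivial([3, 0, 1, 2, 0, 1, 2, 3], 2))  # False
```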

Narsil created a review on a pull request on huggingface/text-generation-inference

View on GitHub

danieldk created a review comment on a pull request on huggingface/text-generation-inference
Not used.

View on GitHub

danieldk created a review comment on a pull request on huggingface/text-generation-inference
I think it's subtracting the first index, because say we have `g_idx` with `groupsize = 2` ``` [0 0 1 1 2 2 3 3] ``` If we have two shards, then it gets broken up into ``` [0 0 1 1] [2 ...

View on GitHub
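
The subtraction described in the comment above can be sketched as follows (a hypothetical standalone helper using plain lists rather than tensors):

```python
def normalize_g_idx_shard(shard):
    # A tensor-parallel shard of g_idx keeps the global group numbers of
    # the full tensor; subtracting the first index rebases them so the
    # shard's groups start at 0 again.
    first = shard[0]
    return [g - first for g in shard]

full = [0, 0, 1, 1, 2, 2, 3, 3]  # g_idx with groupsize = 2
shard0, shard1 = full[:4], full[4:]
print(normalize_g_idx_shard(shard0))  # [0, 0, 1, 1]
print(normalize_g_idx_shard(shard1))  # [0, 0, 1, 1]
```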

danieldk created a review comment on a pull request on huggingface/text-generation-inference
Doesn't seem used?

View on GitHub

danieldk created a review comment on a pull request on huggingface/text-generation-inference
This block needs a comment explaining what is computed here. The old code was already difficult to read, but the way I read it, it is checking whether `g_idx` is incrementing indices by group (so there ...

View on GitHub

danieldk created a review comment on a pull request on huggingface/text-generation-inference
I don't know why this ignores the value of `desc_act` given by the configuration; isn't that the source of truth? Do we expect some models to use activation sorting but lie about it in the config...

View on GitHub

danieldk created a review comment on a pull request on huggingface/text-generation-inference
Isn't this guaranteed by how `self.out_features` is defined above?

View on GitHub

danieldk created a review on a pull request on huggingface/text-generation-inference

View on GitHub

Johnno1011 created a comment on an issue on huggingface/text-generation-inference
You could still use the openai.chat.completions.create but reset the chat history each time? For example: ``` def generate(prompt: str) -> ChatCompletion: messages = [ { ...

View on GitHub
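
A hedged sketch of the pattern the (truncated) snippet describes: rebuild the message list on every call so no chat history carries over between prompts. The helper name and client wiring are assumptions, not taken from the thread.

```python
def build_messages(prompt: str) -> list:
    # Fresh message list per call: no earlier turns are carried over,
    # so each completion sees only the current prompt.
    return [{"role": "user", "content": prompt}]

# With the openai client (client construction assumed, not shown here):
# response = client.chat.completions.create(model="tgi",
#                                           messages=build_messages(prompt))
print(build_messages("Hello"))  # [{'role': 'user', 'content': 'Hello'}]
```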
