Ecosyste.ms: Timeline

Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.

huggingface/text-generation-inference

drbh created a review on a pull request on huggingface/text-generation-inference

View on GitHub

drbh created a review comment on a pull request on huggingface/text-generation-inference
updated in latest commit

View on GitHub

drbh created a review on a pull request on huggingface/text-generation-inference

View on GitHub

drbh pushed 1 commit to pr-2634-ci-branch huggingface/text-generation-inference
  • fix: adjust tool choice none logic, add test and small refactors c1fac74

View on GitHub

mfuntowicz pushed 47 commits to feat-backend-llamacpp huggingface/text-generation-inference
  • Remove compute capability lazy cell (#2580) Remove compute capability lock We are only calling the `get_cuda_capa... afc7ded
  • Update architecture.md (#2577) e790cfc
  • Update ROCM libs and improvements (#2579) * style * update torch * ix issues * fix clone * revert mkl ... f9e561e
  • Add support for GPTQ-quantized MoE models using MoE Marlin (#2557) This change add support for MoE models that use G... 90a1d04
  • feat: support phi3.5 moe (#2479) * feat: support phi3.5 moe model loading * fix: prefer llama base model and impr... 93a7042
  • Move flake back to tgi-nix `main` (#2586) d1f257a
  • MoE Marlin: support `desc_act` for `groupsize != -1` (#2590) This change uses the updated Marlin MoE kernel from vLL... 1c84a30
  • nix: experimental support for building a Docker container (#2470) * nix: experimental support for building a Docker... 584b4d7
  • Mllama flash version (#2585) * Working loading state. * Preprocessing. * Working state ? (Broke idefics1 tempo... d18ed5c
  • Max token capacity metric (#2595) * adding max_token_capacity_metric * added tgi to name of metric * Adding ma... 0204946
  • CI (2592): Allow LoRA adapter revision in server launcher (#2602) allow revision for lora adapters from launcher ... 2335459
  • Unroll notify error into generate response (#2597) * feat: unroll notify_error if no tool is choosen * fix: expec... d22b0c1
  • New release 2.3.1 (#2604) * New release 2.3.1 * Update doc number f6e2f05
  • Revert "Unroll notify error into generate response" (#2605) Revert "Unroll notify error into generate response (#259... 3011639
  • nix: example of local package overrides during development (#2607) 6810307
  • Add basic FP8 KV cache support (#2603) * Add basic FP8 KV cache support This change adds rudimentary FP8 KV cache... 2358c2b
  • Fix FP8 KV-cache condition (#2611) Update kv_cache.py 0da4df4
  • enable mllama in intel platform (#2610) Signed-off-by: Wang, Yi A <[email protected]> 57f9685
  • Upgrade minor rust version (Fixes rust build compilation cache) (#2617) * Upgrade minor rust version (Fixes rust bui... 8b295aa
  • Add support for fused MoE Marlin for AWQ (#2616) * Add support for fused MoE Marlin for AWQ This uses the updated... 6414248
  • and 27 more ...

View on GitHub

mht-sharma created a review on a pull request on huggingface/text-generation-inference
Thanks @danieldk, LGTM. This makes things clearer

View on GitHub

mht-sharma created a review comment on a pull request on huggingface/text-generation-inference
same for scalar

View on GitHub

mht-sharma created a review comment on a pull request on huggingface/text-generation-inference
Type hint?

View on GitHub

mht-sharma created a review on a pull request on huggingface/text-generation-inference
Thanks @danieldk, LGTM. This makes things clearer

View on GitHub

sywangyi created a comment on a pull request on huggingface/text-generation-inference
are you using pinned CPU cores, e.g. `--cpuset-cpus=0-55`?

View on GitHub
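
For reference, core pinning is passed to Docker as shown below (image tag and core range are placeholders; this is a config fragment illustrating the flag, not a verified launch command from the thread):

```shell
# Pin the container to host cores 0-55 so its threads are not migrated
docker run --cpuset-cpus=0-55 \
    ghcr.io/huggingface/text-generation-inference:latest
```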

Narsil created a comment on a pull request on huggingface/text-generation-inference
I'm really struggling to reproduce anything. I reproduced your command line with every argument (even though I don't understand why `--privileged --net host --ipc host` are actually required) and I...

View on GitHub

danieldk opened a pull request on huggingface/text-generation-inference
Make handling of FP8 scales more consistent
# What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes with the title you set, ...
danieldk created a branch on huggingface/text-generation-inference

maintenance/reciprocal-handling - Large Language Model Text Generation Inference

Narsil created a review comment on a pull request on huggingface/text-generation-inference
This is all copied from the original GPTQ code.

View on GitHub

Narsil created a review on a pull request on huggingface/text-generation-inference

View on GitHub

Narsil created a review comment on a pull request on huggingface/text-generation-inference
It's all copied from the cuda code.

View on GitHub

Narsil created a review on a pull request on huggingface/text-generation-inference

View on GitHub

Narsil created a review comment on a pull request on huggingface/text-generation-inference
It's all copied from the cuda code.

View on GitHub

Narsil created a review on a pull request on huggingface/text-generation-inference

View on GitHub

Narsil created a review comment on a pull request on huggingface/text-generation-inference
Yes, this is the core idea of the fix. The logic only worked with TP=1; it now works with TP>1 (detecting that `g_idx` is redundant and can be safely ignored)

View on GitHub
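
A minimal sketch of the redundancy check described here (a hypothetical helper, not TGI's actual implementation): `g_idx` carries no information when it simply numbers rows by group.

```python
def g_idx_is_trivial(g_idx, groupsize):
    # g_idx is redundant when row i always maps to group i // groupsize,
    # i.e. no activation reordering (desc_act) is in effect, so the
    # tensor can be safely ignored when sharding for TP > 1.
    return all(g == i // groupsize for i, g in enumerate(g_idx))

print(g_idx_is_trivial([0, 0, 1, 1, 2, 2, 3, 3], 2))  # True
print(g_idx_is_trivial([3, 0, 1, 2, 0, 1, 2, 3], 2))  # False
```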

Narsil created a review on a pull request on huggingface/text-generation-inference

View on GitHub

danieldk created a review comment on a pull request on huggingface/text-generation-inference
Not used.

View on GitHub

danieldk created a review comment on a pull request on huggingface/text-generation-inference
I think it's subtracting the first index, because say we have `g_idx` with `groupsize = 2` ``` [0 0 1 1 2 2 3 3] ``` If we have two shards, then it gets broken up into ``` [0 0 1 1] [2 ...

View on GitHub
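
The subtraction described in the comment above can be sketched as follows (a hypothetical standalone helper using plain lists rather than tensors):

```python
def normalize_g_idx_shard(shard):
    # A tensor-parallel shard of g_idx keeps the global group numbers of
    # the full tensor; subtracting the first index rebases them so the
    # shard's groups start at 0 again.
    first = shard[0]
    return [g - first for g in shard]

full = [0, 0, 1, 1, 2, 2, 3, 3]  # g_idx with groupsize = 2
shard0, shard1 = full[:4], full[4:]
print(normalize_g_idx_shard(shard0))  # [0, 0, 1, 1]
print(normalize_g_idx_shard(shard1))  # [0, 0, 1, 1]
```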

danieldk created a review comment on a pull request on huggingface/text-generation-inference
Doesn't seem used?

View on GitHub

danieldk created a review comment on a pull request on huggingface/text-generation-inference
This block needs a comment explaining what is computed here. The old code was already difficult to read, but the way I read it, it is checking whether `g_idx` is incrementing indices by group (so there ...

View on GitHub

danieldk created a review comment on a pull request on huggingface/text-generation-inference
I don't know why this ignores the value of `desc_act` given by the configuration; isn't that the source of truth? Do we expect some models to use activation sorting but lie about it in the config...

View on GitHub

danieldk created a review comment on a pull request on huggingface/text-generation-inference
Isn't this guaranteed by how `self.out_features` is defined above?

View on GitHub

danieldk created a review on a pull request on huggingface/text-generation-inference

View on GitHub

Johnno1011 created a comment on an issue on huggingface/text-generation-inference
You could still use the openai.chat.completions.create but reset the chat history each time? For example: ``` def generate(prompt: str) -> ChatCompletion: messages = [ { ...

View on GitHub
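
A hedged sketch of the pattern the (truncated) snippet describes: rebuild the message list on every call so no chat history carries over between prompts. The helper name and client wiring are assumptions, not taken from the thread.

```python
def build_messages(prompt: str) -> list:
    # Fresh message list per call: no earlier turns are carried over,
    # so each completion sees only the current prompt.
    return [{"role": "user", "content": prompt}]

# With the openai client (client construction assumed, not shown here):
# response = client.chat.completions.create(model="tgi",
#                                           messages=build_messages(prompt))
print(build_messages("Hello"))  # [{'role': 'user', 'content': 'Hello'}]
```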
