Ecosyste.ms: Timeline

Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.

huggingface/text-generation-inference

bad-beets starred huggingface/text-generation-inference
sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
After disabling prefix caching I seem to be getting the same response across different machines

View on GitHub

sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
To disable prefix caching you have to set both `USE_PREFIX_CACHING=0` AND `PREFIX_CACHING=0` in v2.3.1

View on GitHub
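
For context, a minimal launch sketch based on the comment above; the image tag and model id are assumptions, not taken from the thread:

```
# Hypothetical v2.3.1 launch with both prefix-caching variables disabled,
# as described in the comment above.
docker run --gpus all -p 8080:80 \
  -e USE_PREFIX_CACHING=0 \
  -e PREFIX_CACHING=0 \
  ghcr.io/huggingface/text-generation-inference:2.3.1 \
  --model-id meta-llama/Llama-3.1-8B-Instruct
```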

mfuntowicz pushed 6 commits to trtllm-stop-words huggingface/text-generation-inference
  • chore(trtllm): create specific parallelconfig factory and logging init methods 75e4466
  • chore(trtllm): define a macro for SizeType cast ea82247
  • chore(trtllm): use GetParallelConfig b999c04
  • chore(trtllm): minor refactoring 98dcde0
  • chore(trtllm): validate there are enough GPUs on the system for the desired model 1b56a33
  • chore(trtllm): ensure max throughput scheduling policy is selected 4a0f05e

View on GitHub

sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
This also prevents you from using `ATTENTION=paged`, since prefix caching is always true, which crashes the model shards on launch

View on GitHub

sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
Waiting on #2676 to validate whether this is a prefix caching issue, but I have confirmed with LOG_LEVEL=debug that the exact same params and input render different results with the seed set

View on GitHub

sam-ulrich1 opened an issue on huggingface/text-generation-inference
PREFIX_CACHING=0 does not disable prefix caching in v2.3.1
### System Info Ubuntu 20.04 Host, Docker image v2.3.1 ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ] My own modifications ### Repro...
sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
This unfortunately did not work for me on the docker image

View on GitHub

sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
Awesome, thank you

View on GitHub

sam-ulrich1 closed an issue on huggingface/text-generation-inference
Optionally log input tokens/prompt
### Feature request Optionally log the input prompt/tokens for improved debugging. ### Motivation I am currently attempting to debug why in a prod env I am getting garbage but when replicating t...
sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
Sweet I'll give that a try

View on GitHub

mfuntowicz pushed 6 commits to trtllm-stop-words huggingface/text-generation-inference
  • chore(rebase): fix invalid references d73401a
  • feat(trtllm): rewrite health to not account for current state 9afcb48
  • chore(looper): cleanup a bit more b3d27e6
  • feat(post_processing): max_new_tokens is const evaluated now 582551d
  • chore(ffi):formatting 56cad9f
  • feat(trtllm): add stop words handling # Conflicts: # backends/trtllm/lib/backend.cpp 8b8daac

View on GitHub

mfuntowicz pushed 1 commit to trtllm-executor-thread huggingface/text-generation-inference
  • chore(rebase): fix invalid references d73401a

View on GitHub

mfuntowicz pushed 1 commit to trtllm-stop-words huggingface/text-generation-inference
  • chore(rebase): fix invalid references 2c8ecdb

View on GitHub

lgf5090 starred huggingface/text-generation-inference
danieldk pushed 2 commits to feature/fp8-kv-cache-scale huggingface/text-generation-inference
  • Add support for FP8 KV cache scales Since FP8 only has limited dynamic range, we can scale keys/values before storin... 4097a20
  • Update FP8 KV cache test to use checkpoint with scales 98efcb4

View on GitHub

danieldk pushed 2 commits to feature/fp8-kv-cache-scale huggingface/text-generation-inference
  • Add support for FP8 KV cache scales Since FP8 only has limited dynamic range, we can scale keys/values before storin... a4cb3d3
  • Update FP8 KV cache test to use checkpoint with scales 08c0b3f

View on GitHub

claudioMontanari created a comment on an issue on huggingface/text-generation-inference
You should be able to disable prefix caching by starting the server with `PREFIX_CACHING=0`. That's how I got the `llama 3.2 vision` models to work.

View on GitHub

claudioMontanari created a comment on an issue on huggingface/text-generation-inference
Prompts should be logged (as well as other info) if you start the server with `LOG_LEVEL=debug text-generation-launcher ...`. Hope this helps!

View on GitHub
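
A complete command along the lines of the comment above might look like the following sketch; the model id is only a placeholder:

```
# Hypothetical invocation with debug logging enabled so prompts are logged.
LOG_LEVEL=debug text-generation-launcher --model-id meta-llama/Llama-3.1-8B-Instruct
```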

james-deee created a comment on an issue on huggingface/text-generation-inference
This is so bizarre that this is closed. You can absolutely, positively (pun intended) send a temperature of `0.0` to these models. Why in the world is this restricted here?

View on GitHub
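
To illustrate the restriction being discussed, a hedged sketch of the kind of request involved; the endpoint and payload shape follow TGI's `/generate` API, and the prompt is made up:

```
# Hypothetical request: the server's validation rejects non-positive
# temperatures, so this returns a validation error rather than greedy decoding.
curl http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is deep learning?", "parameters": {"temperature": 0.0}}'
```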

mfuntowicz pushed 6 commits to trtllm-stop-words huggingface/text-generation-inference
  • Revert "chore(trtllm): remove unused method" This reverts commit 31747163 f5b9ee3
  • feat(trtllm): rewrite health to not account for current state 7a14185
  • chore(looper): cleanup a bit more 2ab1a8b
  • feat(post_processing): max_new_tokens is const evaluated now e4beada
  • chore(ffi):formatting 983ecf1
  • feat(trtllm): add stop words handling # Conflicts: # backends/trtllm/lib/backend.cpp f631742

View on GitHub

mfuntowicz pushed 1 commit to trtllm-executor-thread huggingface/text-generation-inference
  • Revert "chore(trtllm): remove unused method" This reverts commit 31747163 f5b9ee3

View on GitHub

sywangyi created a comment on a pull request on huggingface/text-generation-inference
I am also curious about why the thread keeps looping while the process is closed.

View on GitHub

sywangyi created a comment on a pull request on huggingface/text-generation-inference
docker --version Docker version 26.1.3, build b72abbb lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 224 On-lin...

View on GitHub

Narsil pushed 1 commit to auto_length huggingface/text-generation-inference

View on GitHub

Narsil pushed 1 commit to main huggingface/text-generation-inference

View on GitHub

Narsil closed a pull request on huggingface/text-generation-inference
break when there's nothing to read
# What does this PR do? <!-- Congratulations! You've made it this far! You're not quite done yet though. Once merged, your PR is going to appear in the release notes with the title you set, ...
Narsil created a comment on a pull request on huggingface/text-generation-inference
Oh right, let's go with your fix then. I'm still not sure this is technically correct given the docs: https://doc.rust-lang.org/std/io/trait.Read.html#tymethod.read But doing a "cleaner" fix in...

View on GitHub

ita9naiwa starred huggingface/text-generation-inference