Ecosyste.ms: Timeline
Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.
sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
After disabling prefix caching I seem to be getting the same response across different machines
sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
To disable prefix caching you have to set both `USE_PREFIX_CACHING=0` AND `PREFIX_CACHING=0` in v2.3.1
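For reference, a minimal sketch of launching the v2.3.1 Docker image with both variables set, as the comment describes; the model id, port, and volume below are placeholders, not taken from the thread:

```sh
# Illustrative launch with prefix caching disabled via both env vars
# (USE_PREFIX_CACHING and PREFIX_CACHING, per the comment above).
# Model id, port, and volume are placeholder values.
docker run --gpus all -p 8080:80 \
  -v "$PWD/data:/data" \
  -e USE_PREFIX_CACHING=0 \
  -e PREFIX_CACHING=0 \
  ghcr.io/huggingface/text-generation-inference:2.3.1 \
  --model-id <model-id>
```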
mfuntowicz pushed 6 commits to trtllm-stop-words huggingface/text-generation-inference
- chore(trtllm): create specific parallelconfig factory and logging init methods 75e4466
- chore(trtllm): define a macro for SizeType cast ea82247
- chore(trtllm): use GetParallelConfig b999c04
- chore(trtllm): minor refactoring 98dcde0
- chore(trtllm): validate there are enough GPUs on the system for the desired model 1b56a33
- chore(trtllm): ensure max throughput scheduling policy is selected 4a0f05e
sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
This also prevents you from using `ATTENTION=paged`, since prefix caching is always true, which crashes the model shards on launch
sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
Waiting on #2676 to validate whether this is a prefix caching issue, but I have confirmed with `LOG_LEVEL=debug` that the exact same params and input render different results even with the seed set
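One way to check that claim is to send the same fixed-seed request twice to TGI's `/generate` endpoint and diff the responses; the host, prompt, and parameter values below are illustrative:

```sh
# Send an identical fixed-seed request twice; with deterministic
# generation the two response bodies should match.
REQ='{"inputs":"What is deep learning?","parameters":{"max_new_tokens":64,"do_sample":true,"temperature":0.7,"seed":42}}'
curl -s http://localhost:8080/generate -H 'Content-Type: application/json' -d "$REQ" > a.json
curl -s http://localhost:8080/generate -H 'Content-Type: application/json' -d "$REQ" > b.json
diff a.json b.json
```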
sam-ulrich1 opened an issue on huggingface/text-generation-inference
PREFIX_CACHING=0 does not disable prefix caching in v2.3.1
### System Info Ubuntu 20.04 Host, Docker image v2.3.1 ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ] My own modifications ### Repro...
sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
This unfortunately did not work for me on the docker image
sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
Awesome, thank you
sam-ulrich1 closed an issue on huggingface/text-generation-inference
Optionally log input tokens/prompt
### Feature request Optionally log the input prompt/tokens for improved debugging. ### Motivation I am currently attempting to debug why in a prod env I am getting garbage but when replicating t...
sam-ulrich1 created a comment on an issue on huggingface/text-generation-inference
Sweet I'll give that a try
mfuntowicz pushed 6 commits to trtllm-stop-words huggingface/text-generation-inference
- chore(rebase): fix invalid references d73401a
- feat(trtllm): rewrite health to not account for current state 9afcb48
- chore(looper): cleanup a bit more b3d27e6
- feat(post_processing): max_new_tokens is const evaluated now 582551d
- chore(ffi):formatting 56cad9f
- feat(trtllm): add stop words handling # Conflicts: # backends/trtllm/lib/backend.cpp 8b8daac
mfuntowicz pushed 1 commit to trtllm-executor-thread huggingface/text-generation-inference
- chore(rebase): fix invalid references d73401a
mfuntowicz pushed 1 commit to trtllm-stop-words huggingface/text-generation-inference
- chore(rebase): fix invalid references 2c8ecdb
danieldk pushed 2 commits to feature/fp8-kv-cache-scale huggingface/text-generation-inference
claudioMontanari created a comment on an issue on huggingface/text-generation-inference
You should be able to disable prefix caching by starting the server with `PREFIX_CACHING=0`. That's how I got the `llama 3.2 vision` models to work.
claudioMontanari created a comment on an issue on huggingface/text-generation-inference
Prompts should be logged (as well as other info) if you start the server with `LOG_LEVEL=debug text-generation-launcher ...`. Hope this helps!
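Spelled out for the Docker image (tag and model id are placeholders), the same debug logging can be enabled through the environment:

```sh
# LOG_LEVEL=debug makes the launcher log request details,
# including the input prompt, per the comment above.
docker run --gpus all -p 8080:80 \
  -e LOG_LEVEL=debug \
  ghcr.io/huggingface/text-generation-inference:2.3.1 \
  --model-id <model-id>
```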
james-deee created a comment on an issue on huggingface/text-generation-inference
This is so bizarre that this is closed. You can absolutely positively (pun intended) send a temperature of `0.0` to these models. Why in the world is this restricted here?
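For context: TGI validates that `temperature` is strictly positive, so the usual way to get the deterministic behavior expected from `temperature=0.0` is greedy decoding, i.e. leaving sampling off. A sketch against the `/generate` endpoint (host and prompt are illustrative):

```sh
# Greedy decoding: omit temperature and set do_sample to false,
# which matches the temperature-0 limit in effect.
curl -s http://localhost:8080/generate \
  -H 'Content-Type: application/json' \
  -d '{"inputs":"2+2=","parameters":{"max_new_tokens":8,"do_sample":false}}'
```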
mfuntowicz pushed 6 commits to trtllm-stop-words huggingface/text-generation-inference
- Revert "chore(trtllm): remove unused method" This reverts commit 31747163 f5b9ee3
- feat(trtllm): rewrite health to not account for current state 7a14185
- chore(looper): cleanup a bit more 2ab1a8b
- feat(post_processing): max_new_tokens is const evaluated now e4beada
- chore(ffi):formatting 983ecf1
- feat(trtllm): add stop words handling # Conflicts: # backends/trtllm/lib/backend.cpp f631742
mfuntowicz pushed 1 commit to trtllm-executor-thread huggingface/text-generation-inference
- Revert "chore(trtllm): remove unused method" This reverts commit 31747163 f5b9ee3
sywangyi created a comment on a pull request on huggingface/text-generation-inference
I am also curious about why the thread keeps looping while the process is closed.
sywangyi created a comment on a pull request on huggingface/text-generation-inference
docker --version
Docker version 26.1.3, build b72abbb

lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 224
On-lin...
Narsil pushed 1 commit to auto_length huggingface/text-generation-inference
- Remove generated files. a31db04
Narsil pushed 1 commit to main huggingface/text-generation-inference
- break when there's nothing to read (#2582) Signed-off-by: Wang, Yi A <[email protected]> 058d306