Ecosyste.ms: Timeline

Browse the timeline of events for every public repo on GitHub. Data updated hourly from GH Archive.

huggingface/text-generation-inference

drbh pushed 13 commits to support-qwen2-vl huggingface/text-generation-inference
  • CI job. Gpt awq 4 (#2665) * add gptq and awq int4 support in intel platform Signed-off-by: Wang, Yi A <yi.a.wang@... 153ff37
  • Make handling of FP8 scales more consistent (#2666) Change `fp8_quantize` so that we can pass around reciprocals ever... 5e0fb46 (see the sketch after this commit list)
  • Test Marlin MoE with `desc_act=true` (#2622) Update the Mixtral GPTQ test to use a model with `desc_act=true` and `... 7f54b73
  • break when there's nothing to read (#2582) Signed-off-by: Wang, Yi A <[email protected]> 058d306
  • Add `impureWithCuda` dev shell (#2677) * Add `impureWithCuda` dev shell This shell is handy when developing some ... 9c9ef37
  • Make moe-kernels and marlin-kernels mandatory in CUDA installs (#2632) f58eb70
  • feat: natively support Granite models (#2682) * feat: natively support Granite models * Update doc 03c9388
  • hotfix: fix flashllama 27ff187
  • feat: allow any supported payload on /invocations (#2683) * feat: allow any supported payload on /invocations * u... 41c2623
  • flashinfer: reminder to remove contiguous call in the future (#2685) 1b914f3
  • Fix Phi 3.5 MoE tests (#2684) PR #2682 also fixed an issue in Phi MoE, but it changes the test outputs a bit. Fix t... 14a0df3
  • Add support for FP8 KV cache scales (#2628) * Add support for FP8 KV cache scales Since FP8 only has limited dyna... eab07f7
  • feat: add support for qwen2 vl model b735e79
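
One of the commits above changes `fp8_quantize` so that reciprocal scales are passed around. As a hedged illustration of why that helps (a minimal numpy sketch; the function name matches TGI's but the body, the constant, and the float32 stand-in for float8 are assumptions, since numpy has no float8 dtype):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite float8 e4m3 value

def fp8_quantize(weight: np.ndarray):
    """Per-tensor symmetric quantization returning the *reciprocal* scale.

    Returning 1/scale lets every downstream consumer dequantize with a
    multiply instead of a slower divide. Illustrative only.
    """
    scale = FP8_E4M3_MAX / np.abs(weight).max()
    q = np.clip(weight * scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, np.float32(1.0 / scale)

w = np.random.randn(4, 8).astype(np.float32)
q, scale_recip = fp8_quantize(w)
w_approx = q * scale_recip  # dequantize: multiply by the reciprocal
```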

drbh created a branch on huggingface/text-generation-inference

support-qwen2-vl - Large Language Model Text Generation Inference

danieldk pushed 1 commit to feature/cc89-cutlass-w8a8 huggingface/text-generation-inference
  • Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels Performance and accuracy of these kernels are on pa... c6281a4

danieldk opened a pull request on huggingface/text-generation-inference
Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels
# What does this PR do? Performance and accuracy of these kernels are on par (tested with Llama 70B and 405B). Removes a dependency and resolves some stability issues we have been seeing. **D...
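
For context on what a w8a8 scaled matmul computes (not the fbgemm-gpu or marlin kernels themselves; a hedged numpy sketch with made-up helper names), the core idea is an int8 matmul with int32 accumulation followed by per-row and per-column scales:

```python
import numpy as np

def quantize_per_row(x: np.ndarray):
    # Symmetric int8 quantization with one scale per row.
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def w8a8_scaled_matmul(a_q, a_scale, b_q, b_scale):
    # Accumulate in int32, then apply the outer product of scales.
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
    return acc.astype(np.float32) * a_scale * b_scale

a = np.random.randn(16, 64).astype(np.float32)
b = np.random.randn(64, 32).astype(np.float32)
a_q, a_s = quantize_per_row(a)    # activations: per-row scales
b_q, b_s = quantize_per_row(b.T)  # weights: per-output-column scales
out = w8a8_scaled_matmul(a_q, a_s, b_q.T, b_s.T)
# `out` approximates `a @ b`; the real kernels fuse this into one pass.
```
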
danieldk pushed 1 commit to feature/cc89-cutlass-w8a8 huggingface/text-generation-inference
  • Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels Performance and accuracy of these kernels are on pa... 197d45e

danieldk pushed 4 commits to feature/cc89-cutlass-w8a8 huggingface/text-generation-inference
  • flashinfer: reminder to remove contiguous call in the future (#2685) 1b914f3
  • Fix Phi 3.5 MoE tests (#2684) PR #2682 also fixed an issue in Phi MoE, but it changes the test outputs a bit. Fix t... 14a0df3
  • Add support for FP8 KV cache scales (#2628) * Add support for FP8 KV cache scales Since FP8 only has limited dyna... eab07f7
  • Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels bee95a3

mfuntowicz pushed 1 commit to feat-backend-llamacpp huggingface/text-generation-inference
  • feat(backend): build and link through build.rs 84854a7

danieldk deleted a branch huggingface/text-generation-inference

feature/fp8-kv-cache-scale

danieldk pushed 1 commit to main huggingface/text-generation-inference
  • Add support for FP8 KV cache scales (#2628) * Add support for FP8 KV cache scales Since FP8 only has limited dyna... eab07f7

danieldk closed a pull request on huggingface/text-generation-inference
Add support for FP8 KV cache scales
# What does this PR do? Since FP8 only has limited dynamic range, we can scale keys/values before storing them into the cache (and unscale them in attention). To avoid rescaling the cache as the...
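
The idea in that PR description can be sketched as follows (a minimal illustration under assumed names; in real TGI `k_scale`/`v_scale` come from checkpoints or calibration and the cache holds actual float8 tensors, while numpy's lack of a float8 dtype means the cast is only hinted at by the clip):

```python
import numpy as np

FP8_E4M3_MAX = 448.0

def store_kv(cache_k, cache_v, slot, k, v, k_scale, v_scale):
    # Scale down before storing so values fit FP8's narrow dynamic range.
    cache_k[slot] = np.clip(k / k_scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    cache_v[slot] = np.clip(v / v_scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)

def load_kv(cache_k, cache_v, slot, k_scale, v_scale):
    # Unscale in attention, so the cache itself never needs rescaling.
    return cache_k[slot] * k_scale, cache_v[slot] * v_scale
```
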
mht-sharma created a review on a pull request on huggingface/text-generation-inference
LGTM thanks!

drbh opened a pull request on huggingface/text-generation-inference
fix: improve find_segments via numpy diff
This PR updates `find_segments` to use a numpy array rather than a list, which allows us to find the segments more efficiently via masking. When testing the method in isolation, this results in ...
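
A plausible shape for that change (hedged: the actual signature and return values live in TGI's source; this sketch only demonstrates the vectorized-diff idea):

```python
import numpy as np

def find_segments(adapter_indices: np.ndarray):
    # Positions where consecutive values differ mark segment starts.
    change = np.where(np.diff(adapter_indices) != 0)[0] + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(adapter_indices)]))
    segment_ids = adapter_indices[starts]
    return starts, ends, segment_ids

# e.g. [0, 0, 1, 1, 1, 2] -> starts [0, 2, 5], ends [2, 5, 6], ids [0, 1, 2]
starts, ends, ids = find_segments(np.array([0, 0, 1, 1, 1, 2]))
```
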
drbh created a branch on huggingface/text-generation-inference

improve-find-segments-function - Large Language Model Text Generation Inference

nhplwww starred huggingface/text-generation-inference
danieldk deleted a branch huggingface/text-generation-inference

maintenance/phi35-test-fix

danieldk pushed 1 commit to main huggingface/text-generation-inference
  • Fix Phi 3.5 MoE tests (#2684) PR #2682 also fixed an issue in Phi MoE, but it changes the test outputs a bit. Fix t... 14a0df3

danieldk closed a pull request on huggingface/text-generation-inference
Fix Phi 3.5 MoE tests
# What does this PR do? PR #2682 also fixed an issue in Phi MoE, but it changes the test outputs a bit. Fix this.
danieldk deleted a branch huggingface/text-generation-inference

maintenance/flashinfer-noncontiguous-reminder

danieldk pushed 1 commit to main huggingface/text-generation-inference
  • flashinfer: reminder to remove contiguous call in the future (#2685) 1b914f3

danieldk closed a pull request on huggingface/text-generation-inference
flashinfer: reminder to remove contiguous call in the future
danieldk opened a pull request on huggingface/text-generation-inference
flashinfer: reminder to remove contiguous call in the future
danieldk created a review comment on a pull request on huggingface/text-generation-inference
Ah yes, good catch! I added an additional condition to `can_scale` to check that `ATTENTION=="flashinfer"`.
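
A sketch of the kind of guard being described, with assumed names (`ATTENTION`, `can_scale`'s signature, and the dtype string are illustrative, not TGI's exact code):

```python
import os

ATTENTION = os.environ.get("ATTENTION", "flashinfer")  # backend selector

def can_scale(kv_dtype: str) -> bool:
    # Only the flashinfer backend consumes KV cache scales; flash decoding
    # and paged attention do not accept them yet (see the review question
    # further down this feed).
    return kv_dtype == "fp8_e4m3" and ATTENTION == "flashinfer"
```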

danieldk created a review on a pull request on huggingface/text-generation-inference

danieldk pushed 1 commit to feature/fp8-kv-cache-scale huggingface/text-generation-inference
  • `can_scale`: check that the attention is flashinfer a68fae0

danieldk created a comment on an issue on huggingface/text-generation-inference
Made mandatory and installed through a `make install` in #2632, so this should be fixed in the next release. Feel free to reopen if the issue still occurs after the next release.

danieldk closed an issue on huggingface/text-generation-inference
No module named moe_kernel in Flash Attention Installation while compiling TGI2.3.1
**Task: Flash Attention installation from source. [Completed] Run TGI 2.3.1 with models that support Flash Attention.** [Issue does not occur in TGI 2.2.0] **Error:** 202...
danieldk opened a pull request on huggingface/text-generation-inference
Fix Phi 3.5 MoE tests
# What does this PR do? PR #2682 also fixed an issue in Phi MoE, but it changes the test outputs a bit. Fix this.
danieldk created a branch on huggingface/text-generation-inference

maintenance/phi35-test-fix - Large Language Model Text Generation Inference

mht-sharma created a review comment on a pull request on huggingface/text-generation-inference
Should this be only done when ATTENTION is `flashinfer`, since scales are not passed to flash decoding and paged attention yet?
