I just installed this.
Opened Gradio, inserted a 20-second audio clip of a video game character talking (without music).
Gave it a text,
and the result had an imaginary word at the beginning (hallucinati...
Possible solutions:
1. Use a smaller nfe_step for a speed-quality trade-off.
2. Try distillation techniques.
3. Train a smaller model from scratch if the application scenario only needs a single language.
0....
Hi @v3ucn.
Yes, what I mean is:
will we also need to change these parts of the code
```
with open(file_metadata, "r", encoding="utf-8") as f:
    data = f.read()
```
to `utf-8-sig`, as y...
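
For reference, here is a minimal standard-library sketch of what switching the codec changes. The file name and its contents are made up for illustration, and this is not the project's own code. It only shows that `utf-8` keeps a leading byte order mark (BOM) as a `\ufeff` character, while `utf-8-sig` strips it.

```python
import os
import tempfile

# Hypothetical metadata file saved with a UTF-8 BOM, as some Windows editors do.
path = os.path.join(tempfile.mkdtemp(), "metadata.csv")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfaudio_0001.wav|hello\n")

with open(path, "r", encoding="utf-8") as f:
    plain = f.read()      # the BOM survives as a leading "\ufeff"

with open(path, "r", encoding="utf-8-sig") as f:
    stripped = f.read()   # the BOM is removed while decoding

print(repr(plain[:1]), repr(stripped[:1]))
```

A file that has no BOM reads identically under both codecs, so `utf-8-sig` is generally a safe choice for reading.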
> > Could you provide me with your Vietnamese vocab.txt? I haven't set it up yet, and I really need it for testing.
> > > [#57 (comment)](https://github.com/SWivid/F5-TTS/discussions/57#discussion...
> Could you provide me with your Vietnamese vocab.txt? I haven't set it up yet, and I really need it for testing.
>
> > [#57 (comment)](https://github.com/SWivid/F5-TTS/discussions/57#discussion...
# Can F5-TTS inference be faster?
## Environment:
Machine type: VPS
Python: 12.3
OS: Ubuntu 22.04
MEM: 64G
vCPU: 16
GPU: one NVIDIA 4090
Install: followed the installation steps in README.md
...
Because attention has a quadratic memory cost,
the current rough batch sampler needs to be adjusted to take longer samples into consideration, e.g. by dynamically adjusting the threshold so that it is smaller for longer samples.
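
A rough sketch of that idea follows. It is not the repository's actual sampler: `length_aware_batches`, `base_threshold`, and `ref_len` are hypothetical names, and the inverse scaling rule is only one possible choice. Samples are sorted by length and packed greedily, and the per-batch frame budget shrinks once samples exceed `ref_len`.

```python
def length_aware_batches(lengths, base_threshold, ref_len):
    """Group sample indices into batches under a length-dependent frame budget.

    lengths: positive frame counts, one per sample (hypothetical input).
    base_threshold: frame budget per batch for short samples.
    ref_len: length above which the budget starts to shrink.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, current, current_frames = [], [], 0
    for i in order:
        n = lengths[i]
        # Sorted ascending, so n is the longest sample this batch would hold.
        budget = base_threshold * min(1.0, ref_len / n)
        if current and current_frames + n > budget:
            batches.append(current)
            current, current_frames = [], 0
        current.append(i)  # a sample longer than the budget still gets its own batch
        current_frames += n
    if current:
        batches.append(current)
    return batches


lengths = [100, 100, 100, 100, 400, 400, 400, 400]
print(length_aware_batches(lengths, base_threshold=400, ref_len=100))
```

With these toy numbers the four short samples share one batch and each long sample ends up alone, whereas a fixed threshold would treat both groups the same way.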
Could you provide me with your Vietnamese vocab.txt? I haven't set it up yet, and I really need it for testing.
> [#57 (comment)](https://github.com/SWivid/F5-TTS/discussions/57#discussioncomment-...