Hi, thanks for the great work. I trained the model on my own dataset (Vietnamese, 1 speaker, 1.5 hours, just for testing) for about 70k steps. In [this comment,](https://github.com/SWivid/F5-TTS/is...
Could the following happen? Suppose we have these examples with these frame lengths and durations.
ex1: 100, 2.0
ex2: 50, 1.0
ex3: 10, 0.2
ex4: 100, 2.0
Suppose frames_threshold is 101. The...
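To make the scenario concrete, here is a minimal sketch of frame-budget batching on the four examples above. It assumes a greedy sampler that sorts examples by frame length and closes a batch when adding the next example would push the total past `frames_threshold`. The function name `greedy_frame_batches` is hypothetical and this is not necessarily how the repo's sampler behaves.

```python
def greedy_frame_batches(frame_lens, frames_threshold):
    """Group example indices into batches by a total-frame budget.

    Assumption: examples are sorted by frame length, and a batch is closed
    as soon as the next example would exceed frames_threshold.
    An example longer than the threshold still gets a batch of its own.
    """
    # Indices sorted by frame length (stable, so ties keep their input order).
    order = sorted(range(len(frame_lens)), key=lambda i: frame_lens[i])
    batches, current, current_frames = [], [], 0
    for i in order:
        n = frame_lens[i]
        if current and current_frames + n > frames_threshold:
            batches.append(current)
            current, current_frames = [], 0
        current.append(i)
        current_frames += n
    if current:
        batches.append(current)
    return batches


# ex1..ex4 from above, as 0-based indices
frame_lens = [100, 50, 10, 100]
batches = greedy_frame_batches(frame_lens, frames_threshold=101)
frames_per_batch = [sum(frame_lens[i] for i in b) for b in batches]
print(batches)           # [[2, 1], [0], [3]]
print(frames_per_batch)  # [60, 100, 100]
```

Under these assumptions the batches carry very different amounts of audio (60 frames vs. 100 frames), even though each one respects the threshold.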
> Use lower case as we suggested in the readme, or you are telling the model to read letter by letter. Also check if the reference audio uploaded correctly; it will show a waveform if so.
done both unfortunately n...
@lpscr the `dev` branch is from this repo.
The updates are mainly for fair comparison with other models trained on different training sets.
as the trainset is of different size, so academic comparison general...
@jpgallegoar Hi, very cool chat!
I just tested it; it may need a few tweaks with your help:
1. Will the ref_text be saved, so that the ASR pipeline is not called to transcribe again if ref_audio has not chang...
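A rough sketch of the caching idea in the first point, assuming the reference audio is available as raw bytes and that transcription is done by some callable. Both `get_ref_text` and the `transcribe` argument are hypothetical names for illustration, not the app's actual API.

```python
import hashlib

# Cache of transcriptions, keyed by a hash of the reference audio bytes.
_ref_text_cache = {}


def get_ref_text(audio_bytes, transcribe):
    """Return the transcription for audio_bytes, calling transcribe at most once per distinct audio.

    transcribe is a placeholder for whatever ASR call is used (an assumption).
    """
    key = hashlib.sha256(audio_bytes).hexdigest()
    if key not in _ref_text_cache:
        _ref_text_cache[key] = transcribe(audio_bytes)
    return _ref_text_cache[key]


# Usage with a stand-in ASR function that counts how often it is called.
calls = []

def fake_asr(audio_bytes):
    calls.append(1)
    return "hello there"

first = get_ref_text(b"same-audio", fake_asr)
second = get_ref_text(b"same-audio", fake_asr)   # served from the cache
third = get_ref_text(b"other-audio", fake_asr)   # new audio, ASR runs again
```

Keying on the audio content rather than the file name means a re-uploaded file with the same name but different audio is still transcribed again.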
@SWivid @jpgallegoar @JarodMica @lpscr @cocktailpeanut, is there any difference between the multi-style and podcast tabs other than the markup format?
### Podcast
Speaker1: Hello
Speaker2: Hi
...
### ...