@SWivid, also about building a new vocab from the dataset:
I think it would be good to extend the vocab for Emilia_ZH_EN_pinyin;
for example, if there are new symbols not in it, let's add them.
This is for training from the scrat...
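A minimal sketch of what "extend the vocab with new symbols" could look like. It assumes the vocab is a plain list of tokens (for example one character per line in a `vocab.txt`) where the line index is the token id. `extend_vocab`, `load_vocab`, and `save_vocab` are hypothetical helpers, not functions from the repo.

```python
from typing import Iterable, List


def load_vocab(path: str) -> List[str]:
    """Hypothetical: read one token per line, keeping whitespace-only tokens."""
    with open(path, encoding="utf-8") as f:
        # rstrip only the newline: a line holding a single space is a valid token
        return [line.rstrip("\n") for line in f]


def save_vocab(tokens: List[str], path: str) -> None:
    """Hypothetical: write one token per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(tokens) + "\n")


def extend_vocab(existing: List[str], transcripts: Iterable[str]) -> List[str]:
    """Return the existing vocab plus any unseen characters, in first-seen order."""
    known = set(existing)
    extended = list(existing)  # keep original order so existing token ids do not shift
    for text in transcripts:
        for ch in text:
            if ch in ("\n", "\r"):
                continue  # cannot be stored in a one-token-per-line file
            if ch not in known:
                known.add(ch)
                extended.append(ch)
    return extended
```

Appending instead of re-sorting keeps every existing token at its old index, so (under the line-index-equals-id assumption) a pretrained text embedding would only need new rows for the added symbols.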
Just another great update, fixed some stuff.
First, added creating a vocab from the dataset, as you can see:
![image](https://github.com/user-attachments/assets/9784504d-b772-4369-b275-5e8dc9dd7d19)
s...
Hi, thanks for the great work. I trained the model on my own dataset (Vietnamese, 1 speaker, 1.5 hours, just for testing) for about 70k steps. In [this comment,](https://github.com/SWivid/F5-TTS/is...
Could the following happen? Suppose we have these examples with the following frame lengths and durations:
ex1: 100, 2.0
ex2: 50, 1.0
ex3: 10, 0.2
ex4: 100, 2.0
Suppose frames_threshold is 101. The...
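For reference, a greedy frame-budget batcher of the kind this question is about can be sketched as below. This is a minimal sketch assuming examples are sorted by frame length and added to the current batch until the next one would push the total past `frames_threshold`; it is not necessarily the repo's exact sampler, `greedy_frame_batches` is a hypothetical name, and dropping examples longer than the threshold is an assumption of this sketch.

```python
from typing import List


def greedy_frame_batches(frame_lengths: List[int], frames_threshold: int) -> List[List[int]]:
    """Hypothetical sketch: group example indices so each batch's total frames <= threshold."""
    # Sort (index, length) pairs by length; sorted() is stable, so ties keep input order.
    ordered = sorted(enumerate(frame_lengths), key=lambda pair: pair[1])
    batches: List[List[int]] = []
    batch: List[int] = []
    batch_frames = 0
    for idx, frame_len in ordered:
        if batch_frames + frame_len <= frames_threshold:
            batch.append(idx)
            batch_frames += frame_len
            continue
        if batch:
            batches.append(batch)  # current batch is full, close it
        if frame_len <= frames_threshold:
            batch, batch_frames = [idx], frame_len  # start a new batch with this example
        else:
            batch, batch_frames = [], 0  # example alone exceeds the budget: dropped here
    if batch:
        batches.append(batch)
    return batches
```

With the four examples above (frame lengths 100, 50, 10, 100) and `frames_threshold = 101`, this sketch returns `[[2, 1], [0], [3]]` (zero-based), i.e. ex3 and ex2 share a batch while ex1 and ex4 each get their own.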
> Use lower case as we suggested in the README, or you are telling the model to read letter by letter. Also check if the reference audio uploaded correctly; it will show a waveform if so.

Done both, unfortunately n...
@lpscr the `dev` branch is from this repo.
The updates are mainly for a fair comparison with other models trained on different train sets.
As the train sets differ in size, academic comparison general...
@jpgallegoar Hi, very cool chat!
I just tested it; it may need a few tweaks with your help:
1. Will the ref_text be saved, so that the ASR pipeline is not called to transcribe again if ref_audio is not chang...
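One way point 1 could be handled is to cache the transcript keyed by a hash of the reference audio, so ASR only runs when the audio actually changes. This is a minimal sketch under assumptions: `RefTextCache` is a hypothetical helper, and `transcribe` stands in for whatever wrapper calls the ASR pipeline; it is not the app's actual code.

```python
import hashlib
from typing import Callable, Dict


class RefTextCache:
    """Hypothetical helper: remember the transcript for each reference audio clip."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe  # assumed wrapper around the ASR pipeline
        self._cache: Dict[str, str] = {}

    def get_ref_text(self, audio_bytes: bytes, ref_text: str = "") -> str:
        # A user-supplied ref_text always wins; nothing to transcribe.
        if ref_text.strip():
            return ref_text
        # Same audio content -> same hash -> reuse the earlier transcript.
        key = hashlib.sha256(audio_bytes).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._transcribe(audio_bytes)
        return self._cache[key]
```

Hashing the audio content rather than the file path means re-uploading the same clip under a different name still hits the cache.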