Here are the complete logs:
```
Epoch 1/10: 67%|█████████████████████████████████████████████████████████████████████████▎ | 5143/7653 [05:15<1:32:12, 2.20s/st...
```
@SWivid This one I will not merge myself because I need your confirmation. It adds an AI Voice Chat feature using the Qwen2.5-3B LLM (very fast, only ~3 GB), but it needs pip install transformers_stream_g...
> > > > > Will you consider making a drag-and-drop (via Gradio) UI that makes it easy to TRAIN other languages locally? I would like to train the Hebrew language, but I'm not a programmer.
> > > > > If you'll make a...
> So it's possible for DynamicBatchSampler to sometimes exceed the frames_threshold.
`self.get_frame_len()` is intended to return the exact duration for a queried index, while `__getitem__` do ca...
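The behavior discussed above can be illustrated with a minimal sketch of a duration-based dynamic batch sampler. The names (`frames_threshold`, `get_frame_len`) follow the thread; this is an assumption-laden illustration, not the repository's actual `DynamicBatchSampler`:

```python
# Hypothetical sketch of a duration-aware batcher (illustrative only).
def dynamic_batches(indices, get_frame_len, frames_threshold):
    """Group indices so each batch's total frames stays near frames_threshold.

    A batch is emitted once adding the next sample would cross the
    threshold, but a single long sample still forms its own batch,
    so one batch can exceed frames_threshold on its own.
    """
    batches, batch, frames = [], [], 0
    for idx in indices:
        n = get_frame_len(idx)  # exact frame count for this sample
        if batch and frames + n > frames_threshold:
            batches.append(batch)  # flush the current batch first
            batch, frames = [], 0
        batch.append(idx)
        frames += n
    if batch:
        batches.append(batch)  # flush the trailing batch
    return batches
```

With a threshold of 100 frames, a sample of length 200 still gets batched alone, which is exactly the case where the threshold is exceeded.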
Our reproduced E2 model doesn't train with that scheme. Check the E2 paper:
![image](https://github.com/user-attachments/assets/cad5427e-9794-4b85-8116-8d7411d07ccd)
We just use characters, no random...
Use lower case as we suggested in the README; otherwise you are telling the model to read letter by letter.
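A minimal sketch of that advice, lower-casing input before synthesis; `normalize_for_tts` is a hypothetical helper, not part of the repository:

```python
# Hypothetical normalization step (not the project's own code).
def normalize_for_tts(text: str) -> str:
    # All-caps words like "NASA" would otherwise be read by a
    # character-based model as a sequence of letters.
    return text.lower()

print(normalize_for_tts("READ THIS Sentence"))  # → read this sentence
```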
Also check that the reference audio uploaded correctly; the UI will show a waveform if it did.
When I try to generate text with ARPAbet phones in parentheses, as shown in the "Specifying the pronunciation without model re-training" section of https://www.microsoft.com/en-us/research/project/e2-tt...
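For context, the parenthesized-phone convention from the E2-TTS page can be split out of mixed input with a small parser. This is a hedged sketch: the regex, function name, and segment format are illustrative assumptions, not the model's actual preprocessing:

```python
import re

# Hypothetical parser for "(AH0 R P AH0 B EH2 T)"-style pronunciation
# spans (illustrative; not the project's own code). ARPAbet phones are
# 1-2 uppercase letters with an optional stress digit 0-2.
PHONE_SPAN = re.compile(r"\(([A-Z]{1,2}[0-2]?(?: [A-Z]{1,2}[0-2]?)*)\)")

def split_phone_spans(text):
    """Return ('text', str) and ('phones', [phone, ...]) segments in order."""
    segments, pos = [], 0
    for m in PHONE_SPAN.finditer(text):
        if m.start() > pos:
            segments.append(("text", text[pos:m.start()]))
        segments.append(("phones", m.group(1).split()))
        pos = m.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments
```

Ordinary parenthesized words (lowercase) are left untouched, so only explicit ARPAbet spans are treated as phones.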