> According to my tests, ort_session_A and ort_session_C together account for less than 1% of the total time, while ort_session_B takes up the majority of it.
Yes, and that is why inference speed is ...
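To reproduce a per-session breakdown like the one quoted above, one option is to time each stage with a wall clock and report its share of the total. This is a minimal sketch, not the exporter's own profiling code. The `profile_stages` helper and the stand-in stages are hypothetical. With real models, each stage would wrap a call such as `ort_session_B.run(None, feeds_B)`, where `feeds_B` is a hypothetical input dict. `InferenceSession.run(output_names, input_feed)` is the real onnxruntime signature.

```python
import time
from typing import Callable, Dict


def profile_stages(stages: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Run each stage `repeats` times and return its share of total wall time (0..1)."""
    totals = {name: 0.0 for name in stages}
    for _ in range(repeats):
        for name, run in stages.items():
            start = time.perf_counter()
            run()  # e.g. lambda: ort_session_B.run(None, feeds_B) with real sessions (hypothetical feeds)
            totals[name] += time.perf_counter() - start
    grand_total = sum(totals.values())
    if grand_total == 0.0:
        # Avoid division by zero if every stage was too fast to measure.
        return {name: 0.0 for name in stages}
    return {name: t / grand_total for name, t in totals.items()}


if __name__ == "__main__":
    # Hypothetical stand-ins: A and C are cheap, B simulates the expensive transformer pass.
    shares = profile_stages(
        {
            "A": lambda: sum(range(1_000)),
            "B": lambda: time.sleep(0.02),
            "C": lambda: sum(range(1_000)),
        },
        repeats=3,
    )
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
```

Running a few warm-up iterations before timing helps. Otherwise, first-run costs such as session initialization and memory allocation can inflate the numbers for whichever stage runs first.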
Mel spectrograms are not sufficient for high-quality generation, and none of the codecs will work well with them. You'd be better off considering something token-based.
If you stick with mel, firefly-gan from fishaudio works better than BigVGAN.
Thank you for testing. However, questions about the setup for the English version may need to be answered by the original author of the F5-TTS project. The code for ONNX export and execution is based on the o...