The PaliGemma paper states that the model supports tasks such as Image Captioning, Visual Question Answering, Detection, and Referring Expression Segmentation.
Can Swift support fine...
The PaliGemma paper states that the model supports tasks such as Image Captioning, Visual Question Answering, Detection, and Referring Expression Segmentation.
Can Llama-Factory sup...
Hi,
I'm excited about your work, and I'd like to share with you the results of other VLMs on GMAI-MMBench **(VAL)**.
All our results were obtained with the default VLMEvalKit settings. I ...