sam-hey Events in 2024 - Ecosyste.ms: Timeline

When using the GermanDPR dataset with a CrossEncoder, the dataset is returning a dict instead of a str. This results in an error because the CrossEncoder expects text data as a string. The follo...
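
A minimal sketch of one way to work around this kind of type mismatch, assuming each corpus entry is either a plain string or a dict with optional `title` and `text` keys. The helper name `to_text` and the key names are hypothetical; they are not taken from mteb or from the GermanDPR loader.

```python
from typing import Mapping, Union

Entry = Union[str, Mapping[str, str]]


def to_text(entry: Entry) -> str:
    """Flatten a corpus entry into the single string a cross-encoder expects."""
    if isinstance(entry, str):
        return entry
    # Assumed layout: optional "title" and "text" fields; empty ones are skipped.
    parts = [entry.get("title", ""), entry.get("text", "")]
    return " ".join(p.strip() for p in parts if p and p.strip())


pairs = [
    ("Wer schrieb Faust?", {"title": "Faust", "text": "Faust ist ein Drama von Goethe."}),
    ("Wer schrieb Faust?", "Goethe schrieb Faust."),
]
# Every pair now holds two strings, regardless of the original entry type.
clean_pairs = [(query, to_text(doc)) for query, doc in pairs]
```

Normalizing entries before they reach the model keeps the scoring call itself unchanged.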

sam-hey pushed 8 commits to main sam-hey/mteb

December 17, 2024 7:26am

doc: colbert add score_function & doc section (#1592) * doc: colbert add score_function & doc section * doc: Update... 992b20b
Feat: add support for scoring function (#1594) * add support for scoring function * lint * move similarity to wrap... 8e6ee46
Add new models nvidia, gte, linq (#1436) * Add new models nvidia, gte, linq * add warning for gte-Qwen and nvidia m... 95d5ae5
Leaderboard: Refined plots (#1601) * Added embedding size guide to performance-size plot, removed shading on radar c... 0c9e046
fix: Leaderboard refinements (#1603) * Added explanation of aggregate measures * Added download button to result ... 6ecc86f
1.25.1 Automatically generated by python-semantic-release 5e9c468
Feat: Use similarity scores if available (#1602) * Use similarity scores if available * lint b81b584
Merge branch 'embeddings-benchmark:main' into main 4a75f71

sam-hey pushed 1 commit to main sam-hey/ColBERT-training

December 16, 2024 4:36pm

dont wait index del dd83d32

sam-hey pushed 1 commit to main sam-hey/mteb

December 14, 2024 7:25pm

doc: Update README.md Co-authored-by: Isaac Chung 41fab68

sam-hey pushed 1 commit to main sam-hey/mteb

December 14, 2024 6:38pm

doc: Update README.md Co-authored-by: Kenneth Enevoldsen 3b1e15d

sam-hey created a review comment on a pull request on embeddings-benchmark/mteb

December 14, 2024 6:35pm

I completely agree with you. In my opinion, it's a bit surprising that `ModelMeta.similarity_fn_name` isn't being utilized. Priorities: 1. Pass `score_function` directly to `run()` 2. Utilize ...

sam-hey created a review on a pull request on embeddings-benchmark/mteb

December 14, 2024 6:35pm

sam-hey created a review comment on a pull request on embeddings-benchmark/mteb

December 14, 2024 5:39pm

https://github.com/embeddings-benchmark/mteb/pull/1592 Resolved this issue. Apologies for the inconvenience, and thank you very much for your support!

sam-hey created a review on a pull request on embeddings-benchmark/mteb

December 14, 2024 5:39pm

sam-hey opened a pull request on embeddings-benchmark/mteb

December 14, 2024 5:37pm

doc: colbert add score_function & doc section

Closes: https://github.com/embeddings-benchmark/mteb/issues/1589 Improve the documentation and add information about the PLAID Index in PyLate.

sam-hey pushed 1 commit to main sam-hey/mteb

December 14, 2024 5:33pm

doc: colbert add score_function & doc section fb628ee

sam-hey pushed 0 commits to main sam-hey/mteb

December 14, 2024 5:17pm

sam-hey pushed 5 commits to main sam-hey/mteb

December 14, 2024 5:11pm

fix: Eval langs not correctly passed to monolingual tasks (#1587) * fix SouthAfricanLangClassification.py * add che... 373db74
1.24.2 Automatically generated by python-semantic-release eecc9f1
feat: Add ColBert (#1563) * feat: add max_sim operator for IR tasks to support multi-vector models * docs: add doc ... fdfdaef
1.25.0 Automatically generated by python-semantic-release b466051
Merge branch 'embeddings-benchmark:main' into main 1fbbd4e

sam-hey created a comment on an issue on embeddings-benchmark/mteb

December 14, 2024 5:08pm

Hello @CZH-THU, passing a ColBERT model directly is not supported and will default to cosine similarity, which results in an error. You can refer to this example for guidance: [Using Late Inter...
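
The difference between the two scoring schemes can be sketched in plain Python. A multi-vector (late-interaction) model produces one embedding per token, so a single cosine between two vectors does not apply; MaxSim instead matches every query token to its best document token and sums those maxima. This is an illustrative sketch only, not the mteb or PyLate implementation, and the function names `cosine` and `max_sim` are chosen here for the example.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two single vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def max_sim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score: for each query token take its best-matching
    document token, then sum those maxima over the query tokens."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]             # two query token embeddings
doc = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # three document token embeddings
score = max_sim(query, doc)                  # each query token finds an exact match
```

Note that the score is a sum over query tokens, so it is not bounded by 1 the way a single cosine similarity is.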

sam-hey pushed 1 commit to main sam-hey/RAGatouille-training

December 13, 2024 12:57pm

fix: python number to big for srsly fa3a7fb

sam-hey created a comment on a pull request on embeddings-benchmark/mteb

December 13, 2024 10:06am

@Samoed ran additional tasks, and the results were as expected. Added a note to the documentation indicating that MaxSim becomes resource-intensive with large datasets. A solution is already und...
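
A rough back-of-the-envelope sketch of why brute-force MaxSim grows expensive: without an index, every query token is compared against every document token. The corpus size and token counts below are made-up illustrative values, not measurements from mteb, and `max_sim_dot_products` is a hypothetical helper.

```python
def max_sim_dot_products(n_queries: int, n_docs: int, query_len: int, doc_len: int) -> int:
    """Number of token-level dot products for exhaustive MaxSim scoring."""
    return n_queries * n_docs * query_len * doc_len


# Illustrative sizes: 1,000 queries against 100,000 documents.
single_vector = 1_000 * 100_000                               # one dot product per pair
multi_vector = max_sim_dot_products(1_000, 100_000, 32, 180)  # 32 query / 180 doc tokens
ratio = multi_vector // single_vector                         # query_len * doc_len
```

Under these assumed sizes the multi-vector comparison needs `query_len * doc_len` times as many dot products as single-vector scoring.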

sam-hey created a branch on sam-hey/mteb

December 13, 2024 9:00am

colbert-with-index - MTEB: Massive Text Embedding Benchmark

sam-hey pushed 17 commits to main sam-hey/mteb

December 12, 2024 3:32pm

fix(bm25s): search implementation (#1566) fix: bm25s implementation ac44e58
1.22.1 Automatically generated by python-semantic-release b8ff89c
docs: Fix dependency library name for bm25s (#1568) * fix: bm25s implementation * correct library name --------- ... 03347eb
fix: Add training dataset to model meta (#1561) * fix: Add training dataset to model meta Adresses #1556 * Add... 6489fca
feat: (cohere_models) cohere_task_type issue, batch requests and tqdm for visualization (#1564) * feat: batch reques... 1d21818
fix(publichealth-qa): ignore rows with `None` values in `question` or `answer` (#1565) 68bd8ac
1.23.0 Automatically generated by python-semantic-release 2550a27
fix: Added metadata for miscellaneous models (#1557) * Added script for generating metadata, and metadata for the li... ce8c175
1.23.1 Automatically generated by python-semantic-release f9ede12
fix: Added radar chart displaying capabilities on task types (#1570) * Added radar chart displaying capabilities on ... c49f838
1.23.2 Automatically generated by python-semantic-release e605c7b
feat: add new arctic v2.0 models (#1574) * feat: add new arctic v2.0 models * chore: make lint 53756ad
1.24.0 Automatically generated by python-semantic-release 27f7d8c
fix: Add namaa MrTydi reranking dataset (#1573) * Add dataset class and file requirements * pass tests * make ... 7b9b3c9
Update tasks table 1101db7
1.24.1 Automatically generated by python-semantic-release 9c0b208
Merge branch 'embeddings-benchmark:main' into main a3a126f

sam-hey pushed 1 commit to main sam-hey/mteb

December 12, 2024 10:13am

doc: warning: higher resource usage for MaxSim 67e5200

sam-hey created a comment on a pull request on embeddings-benchmark/mteb

December 12, 2024 8:18am

@isaac-chung and @Samoed, Thanks for your support! 😊 I’ve added the handling of `prompt_name` and integrated jinja-colbertv2 (128). Let me know your thoughts!

sam-hey pushed 1 commit to main sam-hey/mteb

December 12, 2024 8:03am

feat: integrate Jinja templates for ColBERTv2 and add model prompt handling 8b64f4c

sam-hey created a comment on a pull request on embeddings-benchmark/mteb

December 10, 2024 6:54pm

> @sam-hey Just pushed a few changes, and now the example script runs and gives `"ndcg_at_10": 0.27872` on NFCorpus, which is close to the 0.338 reported in the [Colbert v2 paper](https://arxiv.org/...

sam-hey pushed 1 commit to main sam-hey/mteb

December 10, 2024 4:23pm

fix: max_sim add pad_sequence 216d3f8

sam-hey pushed 1 commit to main sam-hey/mteb

December 8, 2024 5:11pm

fix: pass is_query to pylate 1167517

sam-hey created a comment on a pull request on embeddings-benchmark/mteb

December 8, 2024 2:20pm

@isaac-chung Yes, I put it to draft as I found a major bug. The Encode function needs is_query to work. I am really sorry - I am working on a fix...
