Hey Deepchem’ers,
You may be interested in a data challenge I'm helping to organize at NeurIPS 2021 on large-scale similarity search. Many databases now store neural embeddings as representations of objects, including images, text documents, and even molecules. These databases are already getting quite large (well into the billions of records). Approximate nearest neighbor (ANN) algorithms are a class of algorithms that can search those databases efficiently and with low latency.
The challenge will evaluate participants' algorithms across a range of datasets and attempt to standardize important benchmarks such as recall vs. throughput, power consumption, and hardware cost.
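For anyone who hasn't worked with the recall metric before, here is a rough sketch of the idea in plain Python. It is only an illustration, not the challenge's evaluation code: the function names `brute_force_knn` and `recall_at_k`, the random 8-dimensional vectors, and the "search a random half of the database" stand-in for an ANN index are all assumptions made up for this example.

```python
import random

def brute_force_knn(query, vectors, k):
    """Exact k nearest neighbors by squared Euclidean distance (hypothetical helper)."""
    dists = [(sum((q - v) ** 2 for q, v in zip(query, vec)), i)
             for i, vec in enumerate(vectors)]
    dists.sort()
    return [i for _, i in dists[:k]]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(200)]
query = [rng.random() for _ in range(8)]

# Ground truth: exhaustive search over the whole database.
exact = brute_force_knn(query, vectors, k=10)

# Toy stand-in for an ANN index (an assumption for illustration only):
# search a random half of the database, trading recall for less work.
subset = rng.sample(range(len(vectors)), 100)
local = brute_force_knn(query, [vectors[i] for i in subset], k=10)
approx = [subset[i] for i in local]

print("recall@10:", recall_at_k(approx, exact))
```

A real ANN index makes the same trade in a much smarter way, and plotting recall against queries per second gives the kind of recall-vs-throughput curve mentioned above.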
More information is available here: https://medium.com/big-ann-benchmarks/neurips-2021-announcement-the-billion-scale-approximate-nearest-neighbor-search-challenge-72858f768f69
Thanks!