Show HN: I've Published 28B Molecule Embeddings on AWS Open Data https://ift.tt/tmlxj27
I’ve finally finished a project that involved gathering 7 billion small molecules, each represented in SMILES notation and having fewer than 50 “heavy” (non-hydrogen) atoms. Each molecule was “fingerprinted” with four techniques (MACCS, PubChem, ECFP4, and FCFP4), yielding 4 × 7 billion = 28 billion structural embeddings. These embeddings were indexed with USearch, Unum’s open-source search tool, to accelerate molecule search. The dataset is now freely available worldwide through AWS Open Data. The complete data sheet and scripts for data visualization are on GitHub: https://ift.tt/bVX2e53
November 21, 2023 at 11:30PM
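Structural fingerprints such as MACCS and ECFP4 are bit vectors, and a common way to compare two of them is Tanimoto similarity: the number of bits set in both, divided by the number of bits set in either. The following is a minimal sketch under stated assumptions, not the project's code. It assumes each fingerprint is stored as a Python int used as a bitset, and the names `tanimoto`, `top_k`, and the toy fingerprints are hypothetical. It performs an exact linear scan. An approximate nearest-neighbor index like the one USearch builds exists precisely to avoid scanning billions of vectors this way.

```python
from heapq import nlargest
from typing import Dict, List, Tuple


def popcount(x: int) -> int:
    """Count set bits (portable alternative to int.bit_count on older Pythons)."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit-vector fingerprints stored as ints."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # two empty fingerprints: treat as identical
    return popcount(a & b) / union


def top_k(query: int, library: Dict[str, int], k: int = 3) -> List[Tuple[str, float]]:
    """Exact brute-force search: score every fingerprint, keep the k most similar."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    return nlargest(k, scored, key=lambda item: item[1])


# Hypothetical toy 8-bit fingerprints; real MACCS/ECFP4 vectors are much longer.
library = {
    "mol_a": 0b10110010,
    "mol_b": 0b10110011,
    "mol_c": 0b01001100,
    "mol_d": 0b10100010,
}
query = 0b10110010

print(top_k(query, library, k=2))  # [('mol_a', 1.0), ('mol_b', 0.8)]
```

Keeping one fingerprint family per search matters here: a MACCS vector and an ECFP4 vector encode different features, so comparing across types would be meaningless, which is one reason to store the four embedding sets separately.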