Show HN: I've Published 28B Molecule Embeddings on AWS Open Data https://ift.tt/tmlxj27

I’ve finally finished a project that involved gathering 7 billion small molecules, each represented in SMILES notation and containing fewer than 50 “heavy” (non-hydrogen) atoms. Each molecule was “fingerprinted” with four techniques (MACCS, PubChem, ECFP4, and FCFP4), producing 28 billion structural embeddings in total. These embeddings were indexed with USearch, Unum’s open-source tool, to accelerate molecule search. The full dataset is now freely available worldwide, thanks to AWS Open Data. You can find the complete data sheet and scripts for data visualization on GitHub: https://ift.tt/bVX2e53

November 21, 2023 at 11:30PM
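For readers unfamiliar with fingerprint search, here is a minimal sketch of the underlying idea, not the project's actual pipeline. It assumes each fingerprint is a binary vector stored as a Python integer and compares fingerprints with Tanimoto (Jaccard) similarity, a common choice for structural fingerprints. The 8-bit fingerprints and the names `mol_a`, `mol_b`, `mol_c`, `tanimoto`, and `search` are hypothetical, made up for illustration. The brute-force scan stands in for an index like USearch, whose purpose is to avoid scanning every entry.

```python
from typing import Dict, List, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity of two bit-vector fingerprints stored as ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return bin(a & b).count("1") / union


def search(query: int, library: Dict[str, int], k: int = 2) -> List[Tuple[str, float]]:
    """Brute-force top-k search: score every fingerprint, keep the k most similar."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 8-bit fingerprints; real MACCS or ECFP4 fingerprints are far longer.
library = {
    "mol_a": 0b10110010,
    "mol_b": 0b10110011,
    "mol_c": 0b01001100,
}

query = 0b10110010
for name, score in search(query, library):
    print(name, score)
```

A linear scan like this costs one comparison per stored fingerprint, which is impractical at the scale of billions of entries. That is the motivation for building an approximate nearest-neighbor index over the embeddings.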
