Show HN: What's HN Working On – A structured dataset https://ift.tt/D8Eewyc

The latest "Ask HN: What are you working on?" thread[1] just dropped. And to give my own answer: building structured datasets!

I wrote a quick scraper for the HN comments, pulling every top-level comment along with its replies as a nested object. This ended up pulling 642 top-level comments with about 458 replies.

I created a Postgres DB with this original data set. The replies I just concatenated together in the order they came in (with an indent field to mark what level each comment was), then stringified the JSON array and added it to the DB.

I generated the structured data using my own tool, of course (OmniAI[2]), and pulled out the following values:

- project_category - Enum - PERSONAL_PROJECT, STARTUP, SELF_IMPROVEMENT, OTHER
- is_open_source - Boolean
- github_link - String
- project_industry - Enum - SOFTWARE_DEVELOPMENT, HEALTHCARE, EDUCATION, TRANSPORTATION, etc.
- one_liner - String - A one-line pitch for the project
- tech_stack - String[]
- reply_sentiment - Num - Sentiment between 0 and 2 for the comment replies
- demo_link - String
- ai_project - Boolean

[1] https://ift.tt/8GZ6Bho
[2] https://getomni.ai/

https://ift.tt/zKVqXtP
August 26, 2024 at 04:58AM
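For anyone curious what the flattening step and the extracted record might look like, here is a rough Python sketch. It is not the actual scraper or OmniAI's schema definition: the `text` and `replies` keys, the `flatten_replies` and `ExtractedProject` names, and the choice to start `indent` at 1 are all assumptions made for illustration. Only the field names and enum values come from the list above.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


def flatten_replies(replies, indent=1):
    """Walk nested replies depth-first, keeping thread order.

    Each reply becomes a flat dict with an `indent` level (assumed to
    start at 1 for direct replies to the top-level comment).
    """
    flat = []
    for reply in replies:
        flat.append({"indent": indent, "text": reply["text"]})
        flat.extend(flatten_replies(reply.get("replies", []), indent + 1))
    return flat


class ProjectCategory(Enum):
    PERSONAL_PROJECT = "PERSONAL_PROJECT"
    STARTUP = "STARTUP"
    SELF_IMPROVEMENT = "SELF_IMPROVEMENT"
    OTHER = "OTHER"


@dataclass
class ExtractedProject:
    """Hypothetical container mirroring the extracted values listed in the post."""
    project_category: ProjectCategory
    is_open_source: bool
    project_industry: str   # an enum in the post; its full value list is not given
    one_liner: str
    reply_sentiment: float  # between 0 and 2
    ai_project: bool
    tech_stack: List[str] = field(default_factory=list)
    github_link: Optional[str] = None
    demo_link: Optional[str] = None


# A made-up top-level comment with nested replies (illustrative data only).
top_level = {
    "text": "Building structured datasets",
    "replies": [
        {"text": "Neat, is it open source?", "replies": [
            {"text": "Not yet.", "replies": []},
        ]},
        {"text": "What stack?", "replies": []},
    ],
}

# Stringified JSON array, ready to store in a text column next to the comment.
replies_json = json.dumps(flatten_replies(top_level["replies"]))
```

Keeping the replies as one stringified array means each top-level comment stays a single row, and the indent field preserves enough of the thread shape to read the replies in context.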
