Show HN: What's HN Working On – A structured dataset https://ift.tt/D8Eewyc
The latest "Ask HN: What are you working on" thread [1] just dropped. And to give my own answer: building structured datasets!

I wrote a quick scraper for the HN comments. It just pulls every top-level comment along with its replies as a nested object. This ended up pulling 642 top-level comments with about 458 replies.

I created a Postgres DB with this original data set. The replies I just concatenated together in the order they came in (with an indent field to mark what level each comment was at), then stringified the JSON array and added it to the DB.

I generated the structured data using my own tool, of course (OmniAI [2]), and pulled out the following values:

- project_category - Enum - PERSONAL_PROJECT, STARTUP, SELF_IMPROVEMENT, OTHER
- is_open_source - Boolean
- github_link - String
- project_industry - Enum - SOFTWARE_DEVELOPMENT, HEALTHCARE, EDUCATION, TRANSPORTATION, etc.
- one_liner - String - A one-line pitch for the project
- tech_stack - String[]
- reply_sentiment - Num - Sentiment between 0 and 2 for the comment replies
- demo_link - String
- ai_project - Boolean

[1] https://ift.tt/8GZ6Bho
[2] https://getomni.ai/

https://ift.tt/zKVqXtP
August 26, 2024 at 04:58AM
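The reply-flattening step described in the post (concatenate replies in thread order, tag each with an indent level, then stringify the JSON array) could be sketched roughly as below. This is only an illustration under assumptions: the post does not show the actual scraper code, so the field names `text`, `replies`, and `indent` and the helper names `flatten_replies` and `replies_to_json` are hypothetical.

```python
import json
from typing import Any

# Assumed (hypothetical) shape of a scraped comment:
#   {"text": "...", "replies": [ {"text": "...", "replies": [...]}, ... ]}


def flatten_replies(replies: list[dict[str, Any]], indent: int = 1) -> list[dict[str, Any]]:
    """Walk nested replies depth-first, preserving thread order.

    Each reply becomes a flat record carrying an "indent" field that
    marks how deep in the thread it was (1 = direct reply).
    """
    flat: list[dict[str, Any]] = []
    for reply in replies:
        flat.append({"indent": indent, "text": reply["text"]})
        # Children follow their parent immediately, one level deeper.
        flat.extend(flatten_replies(reply.get("replies", []), indent + 1))
    return flat


def replies_to_json(comment: dict[str, Any]) -> str:
    """Stringify the flattened replies so they fit in a single text column."""
    return json.dumps(flatten_replies(comment.get("replies", [])))


# Made-up example comment, not taken from the real thread.
example = {
    "text": "Building a structured dataset tool",
    "replies": [
        {
            "text": "Is it open source?",
            "replies": [{"text": "Yes, link in profile", "replies": []}],
        },
        {"text": "Nice work", "replies": []},
    ],
}

serialized = replies_to_json(example)
```

Storing the replies as one JSON string keeps each top-level comment as a single database row, at the cost of not being able to query individual replies without parsing the string again.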