Show HN: Skyvern – Browser automation using LLMs and computer vision https://ift.tt/fFJrGKh

Hey HN, we're building Skyvern (https://www.skyvern.com), an open-source tool that uses LLMs and computer vision to help companies automate browser-based workflows. We provide a natural-language API to automate repetitive manual workflows that happen within companies' back offices.

You can check out our code and play with Skyvern here: https://ift.tt/GEJIaOe

We talked to hundreds of companies about things they do in the background and found that most of them depend on repetitive manual workflows. The breadth of these workflows surprised us: most companies started off doing things manually, and eventually either hired people to scale the manual work or wrote scripts using Selenium-like browser automation libraries.

In these conversations, one common point stood out: scaling is a pain either way. Companies relying on hiring struggled to adjust team sizes with fluctuating demand. Companies using Selenium and similar tools had a different problem: it took weeks to get a new workflow automated, and the automation then required ongoing maintenance any time the underlying websites changed, because their XPath-based interaction logic suddenly became invalid.

We felt like there was a way to get the best of both worlds with LLMs. We could use LLMs to reason through a website's layout while preserving the advantage of traditional browser automation: the ability to scale alongside demand.

This led us to build Skyvern with a few core functionalities:

1. Skyvern can operate on websites it's never seen before by connecting visible elements with the natural language instructions provided to us. We use a blend of computer vision and DOM parsing to identify a set of possible actions on a website, and multi-modal LLMs to map the natural language instructions to the available actions on the page.

2. Skyvern is resistant to website layout changes, as it doesn't depend on any predetermined XPaths or other selectors. If a layout ever changes, we can leverage the methodology in #1 to complete the user-specified goal.

3. Skyvern accepts a blob of information when navigating workflows. We rely on LLMs to reason through both the blob of information and the information on the screen to create real-time associations between user-supplied data and the information on the screen.

   a. For example: while generating a quote, Geico commonly asks "Were you eligible to drive at 21?". The answer could be inferred from the driver receiving their license in 2012 and having a birth date of 1996.

We've seen the above strategy adapt well to a number of use-cases that Skyvern is helping companies with today [1]:

1. Automating materials procurement by searching for products, adding them to cart, and purchasing them through vendor websites that don't have APIs

2. Registering accounts, filing forms, and searching for information on government websites (e.g. registering franchise tax information for Delaware C-corps)

3. Generating insurance quotes by completing multi-step dynamic forms on insurance websites

4. Automating the job application process by mapping user-specified information (such as a resume) to a job posting

And there are some use-cases we're actively looking to expand into:

1. Automating post-checkup data entry with patient data inside medical EHR systems (i.e. submitting billing codes, adding notes, etc.)

2. Doing customer research ahead of discovery calls by analyzing landing pages and other metadata about a specific business

We have a quick demo of Skyvern in action here, along with some instructions on running it locally ( https://ift.tt/Ktj64Df... )

We're still very early and would love to get your feedback!

[1] https://ift.tt/GdvrBxi...

https://ift.tt/zNmnSPu

March 14, 2024 at 11:31PM
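
To make the Geico example from the post concrete, the inference reduces to simple arithmetic on the user-supplied data. The following is only a minimal sketch of that reasoning, assuming year-granular data. `DriverInfo` and `eligible_to_drive_at` are hypothetical names invented for illustration; they are not part of Skyvern, which per the post relies on an LLM rather than hand-written rules to make this kind of association.

```python
from dataclasses import dataclass


@dataclass
class DriverInfo:
    """Hypothetical shape for the user-supplied data; not Skyvern's schema."""
    birth_year: int
    license_year: int


def eligible_to_drive_at(age: int, driver: DriverInfo) -> bool:
    """Return True if the driver was already licensed by the given age.

    Assumption: only years are known, so the age at licensing is
    approximated as license_year - birth_year.
    """
    age_when_licensed = driver.license_year - driver.birth_year
    return age_when_licensed <= age


# Values taken from the example in the post: born 1996, licensed 2012.
driver = DriverInfo(birth_year=1996, license_year=2012)
answer = eligible_to_drive_at(21, driver)
print("Yes" if answer else "No")  # prints "Yes" (licensed at about 16)
```

A hard-coded rule like this only covers one question on one form. The point of the approach described in the post is that the LLM derives such associations at run time from whatever the page asks.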
