Show HN: fenic – LLMs as dataframe operators, query meaning and structure https://ift.tt/yAlak3m

Hey friends. I'd like to share a project that's dear to me. fenic is a dataframe API with LLMs added as first-class citizens: a classic lazy dataframe API extended with new operators that are backed by LLMs.

What this gets you is the ability to work with structured and unstructured data in the same context. Most importantly, the LLMs aren't integrated as opaque UDF black boxes. They're exposed as "semantic" operators that the planner can reason about alongside the classic ones. (There are examples and code snippets in the repo that show how everything works together.)

Why build this? I'm a data infra / systems person. When LLMs showed up, what I saw was a new type of compute that changes the characteristics of the workloads we deal with. I wanted to experiment with how our current systems can absorb these new workloads and compute types, and what it would take to make the DX as seamless as possible. That's where the UDF-plus-arbitrary-prompt approach felt too problematic.

To support this properly, we had to introduce a few really cool things:

New plan operators. You don't just send prompts to an LLM. You use operators like semantic join, semantic map and reduce, and semantic filter, among others. They mix with the classic operators, and because the planner sees them as real operators rather than black boxes, it can reorder work around them.

Typed outputs. There are ergonomics for turning the output of a semantic operator straight into a typed dataframe column. A Pydantic schema for the LLM output becomes a typed struct column you can unnest, explode, and so on.

New data types, like a markdown data type. Markdown became an important way to share information with LLMs, even though it started life as a way to format text for presentation. It carries structure, and being able to access that structure the way you would a struct or JSON type adds to the developer experience I mentioned.

Async UDFs. One of the more interesting shifts in workloads from the LLM explosion is the need to put heavily I/O-bound steps in your pipeline: fetching a response from an API, crawling a website, and so on. Async UDFs fill that gap, and the implementation handles the nuances for you: concurrency, retries, and the rest.

An LLM-inference-aware planner and runtime. This is one of the parts I'm most excited about, and there's a lot still to do. Today: identical prompts within a batch collapse to a single model call, so duplicates cost zero tokens; requests are dispatched concurrently under per-provider rpm/tpm limits with retries and backoff; null and empty cells skip the model entirely; and you get token and cost metrics per operator. There's also an optional persistent response cache so re-runs skip the model.

MCP as a new catalog primitive. Much like a registered view, you can register a dataframe pipeline as an MCP tool in the catalog. fenic then serves an MCP server with that pipeline as the tool's logic, executed over your data.

These are just some of the things that have gone into fenic while experimenting with how LLMs can become part of our compute infrastructure. There's more, and plenty more to polish in what's already there.

I've been using fenic for all sorts of things. On the small/personal end, I use it to take my podcast audio recordings and turn them into nicely structured tables of metadata I can research. On the heavier end, I use it as tooling for agents to analyze agent traces exported from Pydantic Logfire, to discover evals and turn them into reproducible artifacts in the form of dataframe pipelines.

pip install fenic

Repo: https://ift.tt/aVsZDmC
Docs: https://docs.fenic.ai

There's also a skill you can use with Claude Code, Codex, etc. to quickly get started with fenic in your favourite agentic coding environment.

I'd love to hear your thoughts, criticism, and anything else that comes to mind. I'm here to answer questions.

https://ift.tt/aVsZDmC June 30, 2026 at 11:39PM
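For readers who want a concrete picture of the batching behavior described above (identical prompts in a batch collapsing to a single model call, and null or empty cells skipping the model), here is a minimal Python sketch of the general technique. It is not fenic's implementation or API: `run_batch`, `call_model`, and `fake_model` are hypothetical names invented for illustration, and the sketch leaves out concurrency, rate limits, retries, and the persistent cache.

```python
from typing import Callable, Dict, List, Optional


def run_batch(
    prompts: List[Optional[str]],
    call_model: Callable[[str], str],
) -> List[Optional[str]]:
    """Return one result per input row, calling the model once per distinct prompt.

    Hypothetical helper for illustration only; not part of fenic.
    """
    results: List[Optional[str]] = [None] * len(prompts)

    # Group row indices by prompt text so duplicates share one call.
    rows_by_prompt: Dict[str, List[int]] = {}
    for i, prompt in enumerate(prompts):
        if prompt is None or not prompt.strip():
            continue  # null or empty cell: skip the model, result stays None
        rows_by_prompt.setdefault(prompt, []).append(i)

    # One model call per distinct prompt, fanned back out to every matching row.
    for prompt, rows in rows_by_prompt.items():
        response = call_model(prompt)
        for i in rows:
            results[i] = response
    return results


# Stand-in for a real model client (an assumption); it records each call it receives.
calls: List[str] = []


def fake_model(prompt: str) -> str:
    calls.append(prompt)
    return prompt.upper()


batch = ["summarize a", "summarize b", "summarize a", None, ""]
out = run_batch(batch, fake_model)
print(out)         # five results, one per input row
print(len(calls))  # number of model calls actually made
```

With the five-row batch above, only the two distinct non-empty prompts reach the stand-in model; the duplicate row reuses the first response and the null and empty rows come back as `None`.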
