Chief of Staff to the Agents

Business of Primary Care Business of Primary Care

Business of Primary Care

Full-time

United States

Apply for this job Contact employer

Job Description:

The company

Rosemary is building AI that delivers advance care planning (ACP) better than humans can. End-to-end, autonomously, at scale.

ACP is the conversation that determines how seriously ill people experience the rest of their lives — what they want, what they don't, who decides when they can't. Done well, it's the single highest-leverage moment in serious-illness care. Done badly, or not at all, it's the reason families end up making the worst decisions of their lives at 2am in an ICU.

Today it's mostly not done. The system pays lip service to the conversation and ignores the substance. The vendors who do it use humans on the phone, ten minutes a call, and the quality is what you'd expect.

We think a well-designed AI agent — talking with a patient over weeks, in their own home, on their own time — can do this better than a stranger with a clipboard. We have early evidence it can. We're now building the operating layer that turns "the demo works" into "ten thousand patients a month, every one of them getting the conversation they deserve."

The role

We need a Chief of Staff for our agents.

Think of it this way: if Rosemary's AI agents were a team of ACP Facilitators, this person would be their Program Director. Sets the standard. Measures the work. Runs the QA program. Owns escalation. Drives the feedback loop into the next version of the team.

The agents are the workforce. You manage the workforce.

You are not building the agents. We have engineers for that. You are not having the conversations yourself. The agents are for that. You are the person who ensures the agents are good, getting better every week, and you can prove it with data.

What you'll actually do

Define and own conversation quality. What does a great ACP conversation look like? You write the rubric. You build the eval set. You grade the work — at first by hand, then by training other models to grade for you. You publish the score every week. When it drops, you find out why before anyone else notices.

Run the feedback loop. Real patient conversations come in. Expert reviewers grade a sample. You turn those grades into training signal, prompt updates, escalation rules, and conversation design changes that ship to production. You close the loop in days, not quarters.

Own escalation and safety. When an agent encounters something it shouldn't handle alone, the handoff to a human has to be flawless. You design that protocol, you measure how often it fires, you tune it so it fires when it should and not when it shouldn't.

Be the source of truth on agent performance. Whenever anyone — me, the board, a payer customer, the clinical team — asks "how are the agents doing this week?" you have the answer in 30 seconds and the data to back it up.

Manage the agent roster. Different agents for different parts of the conversation. Different versions in different states of A/B test. You're the one who knows what's running where, what's winning, and what's about to be retired.

The bar

This role only works if you have both halves. Most candidates have one.

Half one: You've shipped production AI agents.

Not demos. Not internal tools. Not "I used Claude to write a script." Agents doing real work for real users where mistakes matter. You've built eval sets and watched them tell you uncomfortable truths. You've run prompt versioning, regression tests, and conversation grading at scale. You know the difference between an agent that benchmarks well and one that actually works in the wild.

If your answer to "how do you know your agent is good" starts with "vibes" or "the demo went well," this isn't the role.

Half two: You have real healthcare operations background.

RN, MD, NP, or operator inside a payer, provider, ACO, or risk-bearing primary care group. You've had thousands of these conversations yourself, or you've supervised people who have. You know what a good ACP conversation sounds like because you've sat in on the good ones and the bad ones. You can read a transcript and tell me, in 30 seconds, whether the agent handled it well.

Bonus, not required: palliative care, hospice, geriatrics, or serious-illness program experience. Familiarity with how value-based care groups actually run ACP today.

What we offer

A founding-team seat on the operating side of a category-defining AI company.
Equity heavy. Cash competitive for a Series Seed-stage company.
Remote, U.S.-based. We get together in person every 6–8 weeks.
Direct line to the founder. Small team. No layers.
The mission justifies the hours, and the hours are real.