
Why Meta Engineers Want This PM's AI Development Workflow

Zevi Arnovitz studied music. He couldn't write a single line of code. Yet he's shipped two live products solo, and Meta's engineering team asked him to teach them his workflow. That's not a typo.

In a recent Lenny's Podcast episode, Zevi broke down exactly how he went from zero coding knowledge to independently building products like StudyMate and Dibur2text using Cursor and Claude Code -- all within about a year. When you dig into his process, it's clear this isn't casual vibe coding. It's a structured, repeatable system that any PM can learn.

Here's what makes his approach different and what you can steal from it today.

Vibe Coding Sounds Great Until It Breaks

Vibe coding, a term coined by Andrej Karpathy in early 2025, means describing what you want in plain language and letting AI generate the code. Tools like Bolt, Lovable, and Replit have made "build an app without coding" a mainstream pitch in 2026.

Zevi started there too. He'd type "add a payment feature" into a chat UI, and code would appear. It felt magical -- until it wasn't. Complex features like payment integration broke badly. The root cause? These tools' system prompts are designed to write code immediately, skipping the planning stage entirely.

This isn't just Zevi's experience. Research shows that roughly 45% of AI-generated code contains security flaws -- missing authentication, unvalidated inputs, exposed endpoints. When your only instruction is "build it," there's no mechanism to catch these gaps.

That realization pushed Zevi to switch to Cursor and Claude Code. "Code is just words. It's files on a computer," he explains. Once he internalized that, he stopped treating AI output as a black box and started owning every decision.

The 6-Step Slash Command Pipeline

The backbone of Zevi's workflow is Claude Code's slash commands -- markdown-based prompt templates stored as files and executed instantly with /command-name. He chains six of these into a development pipeline.
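Zevi's exact command files aren't public, but Claude Code's convention is one markdown file per command under .claude/commands/. A hypothetical layout for a pipeline like his (the filenames follow the commands he names; the comments are assumptions):

```text
.claude/commands/
├── create-issue.md    # capture a bug or idea as a Linear issue
├── exploration.md     # analyze the problem, challenge assumptions
├── create-plan.md     # write a shareable markdown plan
├── execute-plan.md    # implement tasks from the plan
├── review.md          # have the model review its own code
└── peer-review.md     # reconcile external model reviews
```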

Steps 1-2: Capture and Explore

Everything starts with /create-issue. When a bug or idea surfaces mid-work, this command captures it as a Linear issue without breaking flow. Context stays intact.
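Zevi's template isn't public, but here's a minimal sketch of what such a command might contain, assuming a Linear integration (for example via MCP) is connected:

```markdown
---
description: Capture a bug or idea as a Linear issue without breaking flow
---

Create a Linear issue from this note: $ARGUMENTS

- Write a one-line title and a short description; include repro steps if known.
- Link the files we're currently working in for context.
- Do not start fixing anything. File the issue and return its URL.
```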

Next comes /exploration. Claude analyzes the problem and asks clarifying questions. Here's the clever part: Zevi's prompt explicitly instructs Claude to challenge his assumptions. It pushes back with "Are you sure that's right?" -- a deliberate safeguard against the bad assumptions non-technical builders are prone to making.
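That assumption-challenging instruction is the part worth copying. A sketch of how it might be encoded (illustrative, not Zevi's actual prompt):

```markdown
---
description: Explore a problem before any code is written
---

Analyze this problem before proposing any solution: $ARGUMENTS

1. Restate the problem in your own words and list open questions.
2. Ask me clarifying questions, one at a time.
3. Challenge my assumptions explicitly. If anything I've said seems
   wrong or underspecified, push back: "Are you sure that's right?"
4. Do not write any code at this stage.
```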

Steps 3-4: Plan and Execute

/create-plan generates a structured markdown plan with a TLDR, key decisions, and task-by-task status tracking. This plan file is the linchpin of the entire system because it's shareable across AI models. The same plan can go to Claude for backend work and Gemini for frontend tasks.
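Based on the elements he describes (TLDR, key decisions, per-task status), a plan file might look like this sketch; the payment feature is just an illustrative example:

```markdown
# Plan: Add payment flow

## TLDR
Add a checkout page that creates a payment session server-side and
unlocks the feature only after the provider confirms payment.

## Key decisions
- Use the provider's hosted checkout rather than custom card fields.
- A webhook, not the client, confirms payment.

## Tasks
- [x] Backend: create checkout-session endpoint
- [ ] Backend: handle payment-confirmed webhook
- [ ] Frontend: checkout page plus success/cancel routes
```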

/execute-plan kicks off implementation. Zevi matches models to their strengths: Cursor Composer for speed on straightforward tasks, Gemini 2.5 for UI work. Each model gets the same plan but handles its own domain.
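A sketch of what an execute command consuming that plan might look like (again hypothetical, since the real file isn't public):

```markdown
---
description: Implement the next unchecked task from a plan file
---

Read the plan file at $ARGUMENTS.

- Take the first unchecked task and implement it.
- Follow the Key decisions section; if a decision blocks you, stop and ask.
- When done, check the task off in the plan file and summarize the change.
```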

Steps 5-6: Review and Document

/review has Claude review its own code first. Then Codex (GPT) and Cursor Composer each run independent reviews. Finally, /peer-review feeds those external reviews back to Claude.

The result is a structured debate. Claude might respond: "This concern has been raised three times now -- it's an intentional design choice, not a bug." Or it accepts valid criticism and revises. You're watching AI models argue with each other, and the code gets better for it.
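A sketch of how the /peer-review step might be phrased, assuming the external reviews have been saved to files that get passed in:

```markdown
---
description: Reconcile independent code reviews with the implementation
---

Here are independent reviews of your latest changes: $ARGUMENTS

For each point raised:
- If it's a real defect, fix it and note the fix.
- If it's an intentional design choice, say so and explain the reasoning.
- If the reviewers contradict each other, pick a position and justify it.
```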

Why Multi-Model Peer Review Actually Works

This isn't just a fun experiment. A study evaluating LLMs for code review found that GPT-4o identified code correctness issues at a 68.50% rate, while Gemini 2.0 Flash caught 63.89%. Neither model alone is sufficient. They spot different problems from different angles.

Cross-validation across models catches what any single model misses. According to Addy Osmani's research on AI code review, teams using multi-model review workflows have reported up to 58% reductions in code review time while maintaining quality.

For PMs without engineering backgrounds, this is critical. You can't manually spot a security flaw in authentication logic. But you can set up a system where three AI models check each other's work.

Three AI Models, Three Team Members

Zevi describes each model as a distinct team personality, and the framing is surprisingly useful for deciding who does what.

| Model | Role | Personality | Best For |
| --- | --- | --- | --- |
| Claude | CTO | Opinionated, communicative, collaborative. Pushes back but stays constructive. Needs prompt tuning via CLAUDE.md to curb "people-pleaser" tendencies. | Architecture, planning, complex logic |
| Codex (GPT) | Senior Engineer | Quiet, heads-down, doesn't explain much -- but fixes the gnarliest bugs. | Deep debugging, tricky edge cases |
| Gemini | Creative Scientist | Produces beautiful UI but works in terrifying ways (deletes entire dashboards, then rebuilds them). Chaotic process, polished output. | Frontend, UI/UX, visual components |

This division works because the shared plan file acts as the coordination layer. Each model reads the same spec and executes its slice. It's essentially running a small engineering team where you're the PM -- which is, of course, exactly what PMs are trained to do.

What You Can Actually Use Tomorrow

Let's be honest: Zevi's full workflow took him a year to build. But several pieces are immediately actionable.

Reusable slash commands for your team. Store prompt templates as markdown files in .claude/commands/. Instead of retyping instructions every session, run /command-name and get consistent results. These files are version-controlled and shareable across your entire team. According to Claude Code's documentation, slash commands support variable interpolation and can be chained for complex workflows.
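For example, the documented $ARGUMENTS placeholder is replaced with whatever follows the command name, so one template covers many inputs (the fix-issue command itself is an illustrative example, not from the article):

```markdown
<!-- .claude/commands/fix-issue.md -->
Find and fix issue #$ARGUMENTS. Write a failing test that reproduces it
first, then implement the fix, then confirm the test passes.
```

Typing /fix-issue 123 runs the template with "123" substituted in.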

Multi-model review beyond code. The peer review pattern works for reports, strategy docs, and content. Have Claude draft something, ask GPT to critique it, then show Claude the critique. The back-and-forth consistently produces stronger output than any single model alone.

Failure-driven documentation. Zevi's /update-docs command is underrated. When AI makes the same mistake twice, he updates CLAUDE.md or project docs so it doesn't happen a third time. It's system-level learning -- you're training your workflow, not just prompting a model.
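In practice that can be as simple as appending rules to CLAUDE.md as failures accumulate. A hypothetical excerpt:

```markdown
<!-- CLAUDE.md, excerpt -->
## Lessons learned
- Read API keys from environment variables; never hardcode them.
- Every new endpoint must validate inputs and require authentication.
- Never run a schema migration without an approved plan file.
```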

So Should PMs Learn to Code Now?

Zevi's answer is clear: a PM's job is to deliver the right solution to users as fast as possible. If AI is the fastest path, use it. But the moment you say "the AI built it" as an excuse for poor quality, you've failed. Ownership and accountability still belong to humans.

In enterprise environments, there are real limits. Complex database migrations aren't PM territory. But isolated UI projects where a PM opens a pull request and asks an engineer for final review? That's already happening. The fact that Meta engineers sought out Zevi's workflow validates this shift.

This isn't about learning to code. It's about managing AI like a team, designing repeatable workflows, and taking responsibility for what ships. Those are core PM skills. The tools just changed.

Frequently Asked Questions

What is vibe coding and why is it risky for PMs?

Vibe coding means describing what you want in natural language and letting AI generate code without manual review. It's risky because AI-generated code frequently contains security flaws -- research indicates roughly 45% has vulnerabilities. Without a structured review process, non-technical builders have no way to catch these issues before they reach users.

Can a non-technical PM really ship production-quality products with AI?

Yes, but not by prompting alone. Zevi Arnovitz's success comes from a disciplined 6-step workflow with planning, multi-model peer review, and systematic documentation. The key insight is treating AI models as team members who need coordination, not magic boxes that produce finished products.

What tools does Zevi Arnovitz use in his AI development workflow?

Zevi primarily uses Claude Code for planning, architecture, and code review; Cursor Composer for fast execution of straightforward tasks; and Gemini 2.5 for UI and frontend work. He also uses Codex (GPT) as an independent code reviewer in his multi-model peer review pipeline.

How do slash commands work in Claude Code?

Slash commands are reusable prompt templates saved as markdown files in your project's .claude/commands/ directory. You execute them by typing /command-name in Claude Code. They support variables, can be chained together, and are version-controlled with your codebase -- making them shareable across teams.

Is this workflow practical for PMs at large companies?

Parts of it are immediately applicable. Reusable slash commands, multi-model content review, and failure-driven documentation work at any scale. For actual code contributions, Zevi recommends starting with isolated UI projects where you can open a PR and have an engineer do final review -- a pattern already adopted at companies like Meta.

The Bottom Line

Zevi Arnovitz's story isn't about a PM who learned to code. It's about a PM who learned to manage AI -- assigning roles, enforcing review processes, building reusable systems, and owning the output. The 6-step slash command pipeline and multi-model peer review aren't hacks. They're project management applied to AI tools.

The barrier to building products isn't technical skill anymore. It's the discipline to treat AI as a team that needs structure, not a genie that grants wishes. And that's a skill PMs already have.


For more AI trends and analysis, visit aboutcorelab.blogspot.com.