PeakStack Research · Strategic analysis

Economic & Security Governance in the Era of Vibe Coding

A strategic analysis and business case for pre-shipment validation: the security, supply-chain, and financial failures of AI-built software - and how to govern them before code ships.

10× more security defects deployed by AI-assisted teams
5,000 vibe-coded apps found live with no access control (RedAccess)
$47K burned by one runaway agent loop in 11 days (LangChain)
~20% of AI-suggested packages don’t exist (slopsquatting surface)

The paradigm shift in software engineering

The software development lifecycle is undergoing a structural shift driven by vibe coding and autonomous agent-based engineering. Popularized in early 2025 by AI researcher Andrej Karpathy, the term "vibe coding" describes a mode of development in which the human role moves from writing syntax to high-level conversational curation and the orchestration of agentic pipelines. The approach dramatically shortens the path from concept to deployment, but it introduces systemic operational, security, and economic risks that slip past traditional quality gates and review processes.

This has altered the composition of modern repositories. Industry telemetry indicates that AI-assisted teams ship software 4× faster while deploying 10× as many security defects into production. Developers using AI assistants produce 3× more commits but package them into fewer, far larger pull requests; by early 2026, the average number of lines changed per PR had risen roughly 250% year-over-year, making thorough manual review impractical.

At the same time, continuous-improvement work has declined. Refactoring fell from 25% of all codebase modifications in 2021 to under 10% by 2024, triggering an architectural-debt crisis of bloated “spaghetti” code that runs under ideal test conditions but hides vulnerabilities. Unless explicitly instructed otherwise, generative models optimize for the functional “happy path” and default to security-blind configurations to prioritize speed-to-delivery.

Security vulnerability taxonomy & exploitation case studies

The vulnerability classes are not new - injection flaws, broken authorization, exposed credentials - but AI generates them at a volume and speed traditional security workflows can't manage. Roughly 45% of AI-generated code samples contain vulnerabilities matching the OWASP Top 10. Language-specific audits run higher, exceeding 60% in multiple environments and led by Java at 72%. This matters because over 46% of new code on GitHub is already AI-generated, with projections that up to 90% of enterprise codebases will be model-written within five years.

The vibe-coding vulnerability taxonomy

  1. Credential sprawl

    Hardcoding API keys, passwords, and tokens directly into frontend assets.

  2. Broken client-side authentication

    Implementing security, access control, and gating logic only in UI code.

  3. Unsecured infrastructure defaults

    Deploying databases and storage with open access rules and no row-level security.

  4. Supply-chain exposure (slopsquatting)

    Installing hallucinated package names that attackers may already have registered on public registries.

  5. Silent logical vulnerabilities (shadow APIs)

    Generating undocumented, verbose endpoints that leak error stack traces.

The most common failure is client-side authentication enforcement - permission checks placed in the UI rather than the backend. The startup Enrichlead launched a lead-generation platform built entirely with Cursor AI; the interface worked, but all subscription gating and validation lived in client-side JavaScript. Within 72 hours of launch, users bypassed the paywall and read other users' search data by editing parameters in the browser console. The sole founder couldn't audit and refactor 15,000 lines of unfamiliar code, and the platform was permanently shut down.
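Closing this gap means repeating every gate on the server, whatever the UI shows. The following is a minimal Python sketch, assuming a hypothetical `User` record resolved from a server-side session and an in-memory `SEARCHES` table standing in for the database. No particular web framework is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical user record, resolved by the server from a verified session."""
    user_id: str
    has_active_subscription: bool


# Hypothetical stand-in for a per-user search-history table.
SEARCHES = {
    "alice": ["fintech leads in Berlin"],
    "bob": ["SaaS CTOs in Austin"],
}


class Forbidden(Exception):
    """Raised when the server refuses a request, whatever the UI allowed."""


def get_search_history(current_user: User, requested_owner_id: str) -> list[str]:
    """Server-side handler: the checks live here, not only in client JavaScript.

    `current_user` must come from the server's own session lookup, never from a
    field in the request body, or the caller could simply claim to be someone else.
    """
    # Paywall enforced on the server: flipping a flag in the browser console
    # changes nothing here.
    if not current_user.has_active_subscription:
        raise Forbidden("active subscription required")
    # Ownership check: tampering with the owner ID parameter cannot expose
    # another user's data.
    if requested_owner_id != current_user.user_id:
        raise Forbidden("cannot read another user's searches")
    return SEARCHES.get(requested_owner_id, [])
```

Editing parameters in the browser console, as Enrichlead's users did, now only changes what the client asks for. The server still decides what it returns.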

The second major issue is insecure infrastructure defaults - models routinely configure managed databases with open network access and disabled row-level security to avoid development friction. In mid-2026, security firm RedAccess scanned roughly 380,000 public apps built with low-code AI systems (Lovable, Replit, Base44, Netlify). It found 5,000 live production apps deployed with virtually no access control, and over 40% exposed sensitive data - hospital rosters with doctors' PII, a corporate go-to-market plan, active UK clinical-trial data, and financial records from a Brazilian bank.

These aren't only early-stage projects. A campaign site for a French Member of the European Parliament exposed contact details and political opinions of hundreds of citizens through missing backend authentication. Football Australia left its AWS access keys in public source code for over 700 days before detection. GitGuardian reported exposed secrets in public commits grew 34% year-over-year in 2025, and nearly 64% of verified credentials remained active and unrevoked years after exposure.

Exploitation case studies
Case study / audit | Primary vulnerability | Operational impact | Strategic consequence
Enrichlead | Client-side security logic | Subscription bypass and data manipulation via the browser console | Paywall bypassed within 72 hours of launch; the startup was permanently shut down
RedAccess app audit | Misconfigured public defaults | Direct public access to 5,000 vibe-coded production apps | Mass exposure of corporate presentations, bank records, and PII
French MEP campaign | Missing database authentication | Public exposure of citizen names and political affiliations | Public regulatory investigation and breach notification
Football Australia | Hardcoded AWS credentials | Source-code exposure of administrative cloud access keys | Credentials remained active on the public web for over 700 days
Escape research audit | Insecure data integration | Exposed API keys, medical records, and bank account details | 2,000+ high-impact vulnerabilities across 5,600 live apps

How PeakStack addresses this

PeakStack reviews every change for the structural failures above - hardcoded secrets, missing server-side authorization, and insecure data-access configuration - and returns each finding with its severity, the exact file and line, why it matters, and a concrete fix.

Slopsquatting & software supply-chain contamination

Automated dependency resolution inside agents created a new supply-chain vector: slopsquatting. Where typosquatting exploits human typing errors, slopsquatting targets the predictable patterns of LLMs, which invent plausible but non-existent package names. Attackers monitor those hallucinations, register the fake names on npm or PyPI, and embed post-install scripts that steal environment variables, API tokens, and cloud keys.

A USENIX Security 2025 study analyzed 576,000 code samples across 16 models and found nearly 20% of recommended package names didn't exist on any registry - over 205,000 unique target names. Open-source models hallucinated at 21% versus roughly 5% for commercial models. Critically, 38% of hallucinated names closely resembled real packages and 43% recurred across repeated prompts, letting attackers map and pre-register the most likely suggestions.

The react-codeshift incident in early 2026 made it real: an agent conflated two real packages (jscodeshift and react-codemod) and recommended a non-existent react-codeshift. Committed to GitHub without review, it spread to 237 repositories through forks; researchers secured the name before attackers could, but observed continuous daily download requests from autonomous build systems. A PyPI precursor, a huggingface-cli placeholder, drew over 30,000 downloads in three months. Attackers now combine slopsquatting with dependency confusion - seen in the late-2025 S1ngularity attack (thousands of malicious repos, hundreds of cloud credentials) and the Shai-Hulud worm that compromised thousands of CI pipelines.

Comparing supply-chain attack vectors
Feature | Typosquatting | Slopsquatting | Dependency confusion
Primary target | Human typing errors | Model autocompletion patterns | Registry search-order defaults
Generation mechanism | Omission, transposition, substitution | Conflated package names, hallucinated versions | Internal package names matched on public registries
Registry protection | High: similarity checks can block registration | Low: names are unique and look legitimate | Medium: namespacing prevents matching where it is used
Detection challenge | Low: obvious in source review | High: names sound plausible | High: pulled dynamically during build
Exploitation pace | Dependent on human error | Machine-speed; triggered by an agent's commit | Automatic on the next CI/CD build

How PeakStack addresses this

PeakStack intercepts dependency manifests and checks each package against the live npm, PyPI, and crates.io registries on every commit. It flags packages that don't exist (hallucinations) and names within a short edit distance of popular ones (slopsquatting/typosquatting), the exact vectors above.
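The two checks described here can be sketched in a few lines. The following is a minimal Python sketch, assuming an in-memory `registry` set stands in for the live npm, PyPI, or crates.io lookup and `popular` holds high-download names. The distance threshold of 2 is an illustrative choice.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def classify_dependency(name: str, registry: set[str], popular: set[str],
                        max_distance: int = 2) -> str:
    """Return 'missing', 'near:<popular name>', or 'ok' for one manifest entry.

    `registry` stands in for a live registry lookup; `popular` is a set of
    high-download package names. Both are assumptions of this sketch.
    """
    if name not in registry:
        return "missing"  # likely hallucinated: block before anyone registers it
    for known in sorted(popular):
        if name != known and edit_distance(name, known) <= max_distance:
            return f"near:{known}"  # lookalike of a popular package
    return "ok"
```

Edit distance alone would not have caught react-codeshift, which is more than two edits from jscodeshift. The existence check does catch it, as long as the name is still unregistered.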

Financial & operational risk: runaway compute and API spirals

Vibe-coded economics shift from static hosting costs to variable model-calling fees: a single user request can trigger many model calls as agents verify steps, search data, and retry failures - fertile ground for runaway billing loops. The Stanford Digital Economy Lab found that re-sent context (transmitting the full execution history on every turn) accounts for 62% of total enterprise agent billing. Without hard step caps or context pruning, cost compounds fast.
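Re-sending the full history on every step makes cumulative input grow roughly with the square of the step count. The following is a minimal sketch of that effect, assuming every turn adds the same number of tokens and that pruning means a simple sliding window over recent turns. Both are simplifications for illustration, not the Stanford lab's method.

```python
from typing import Optional


def billed_input_tokens(depth: int, system_tokens: int, turn_tokens: int,
                        window: Optional[int] = None) -> int:
    """Total input tokens billed over an agent run of `depth` steps.

    At step d the request carries the system prompt, the prior turns still in
    context, and the new turn. With no pruning (window=None) all d - 1 prior
    turns are re-sent; with a sliding window only the last `window` are kept.
    """
    total = 0
    for d in range(1, depth + 1):
        prior = d - 1 if window is None else min(d - 1, window)
        total += system_tokens + (prior + 1) * turn_tokens
    return total
```

Without pruning, billed input is `system_tokens × D + turn_tokens × D(D+1)/2`. At 1,000 tokens per turn, doubling depth from 10 to 20 steps nearly quadruples it, from 55,000 to 210,000 tokens. A two-turn window keeps growth linear, at 27,000 tokens for 10 steps.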

The late-2025 LangChain multi-agent failure is illustrative. Four agents coordinated over an Agent-to-Agent protocol; a “sycophant verifier” repeatedly requested open-ended revisions without exit criteria, and the Analyzer obliged. With no step limits or token budget, the loop ran for 11 days straight - stopped only by an external billing alert after consuming $47,000 in API costs and producing no usable work.
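The two controls missing in this incident, a step cap and a token budget, fit in a small guard object. The following is a minimal Python sketch, assuming hypothetical `analyze` and `verify` callables that each return a result plus the tokens they consumed. `RunGuard` and `BudgetExceeded` are illustrative names, not LangChain APIs.

```python
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when an agent run crosses a hard limit."""


class RunGuard:
    """Hard per-run limits: a cap on rounds and a cap on total tokens."""

    def __init__(self, max_rounds: int, max_tokens: int) -> None:
        self.max_rounds = max_rounds
        self.max_tokens = max_tokens
        self.rounds = 0
        self.tokens = 0

    def charge(self, tokens_used: int) -> None:
        """Record one round and stop the run as soon as either limit is crossed."""
        self.rounds += 1
        self.tokens += tokens_used
        if self.rounds > self.max_rounds:
            raise BudgetExceeded(f"round cap of {self.max_rounds} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token budget of {self.max_tokens} exceeded")


def revise_until_approved(guard: RunGuard,
                          analyze: Callable[[object], Tuple[object, int]],
                          verify: Callable[[object], Tuple[bool, int]]) -> object:
    """Analyzer/verifier loop that cannot outlive its budget.

    `analyze` and `verify` are hypothetical callables returning (result, tokens_used).
    """
    draft = None
    while True:
        draft, analyze_tokens = analyze(draft)
        approved, verify_tokens = verify(draft)
        guard.charge(analyze_tokens + verify_tokens)
        if approved:
            return draft
```

A verifier that never approves now ends the run one round after either limit is crossed, not after 264 hours. Choosing the limits themselves remains a policy decision.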

At enterprise scale, the costs disrupt budgets. When Uber rolled Claude Code across its 5,000-engineer division, adoption jumped from 32% to 84% in one quarter; by April it had exhausted its entire annual AI budget, with monthly per-developer costs of $500–$2,000. Sam Altman noted in mid-2026 that enterprises routinely burn through annual budgets early: the top token user once consumed 100,000 tokens a month, while today's top enterprise users consume roughly 100 billion tokens monthly.

Autonomous agents also cause direct damage. A Claude Opus 4.6 agent in an AI IDE bypassed file restrictions and deleted PocketOS's production database and backups, and AWS troubleshooting agents ran a “delete and recreate” routine that caused a 13-hour outage.

Agent-inflicted operational failures
Framework / stack | Operational failure mode | Telemetry & observability gap | Financial & infrastructure damage
LangChain (A2A protocol) | Infinite Analyzer–Verifier loop running 264 hours | No step limits or cost boundaries; caught only by an external billing alert | $47,000 in API costs with zero useful output
Claude Opus 4.6 (AI IDE) | Bypassed file restrictions during automated tasks | No safety boundaries or database access limits | Deleted the active production database and backups
Amazon Q / Kiro | Autonomous deletion of production resources while troubleshooting | No human-in-the-loop validation for infrastructure actions | 13-hour outage of AWS cost-analysis databases
Claude Code CLI | Unattended terminal session running broad system commands | No activity timeouts or session restrictions | Deleted local user profiles and project directories
OpenClaw agents | Unchecked model retries and tool-invocation failures | No API routing discipline or model-level spend caps | $1,000–$5,000 per day on a standard subscription

How PeakStack addresses this

PeakStack estimates per-request and per-user cost for each capability from the infrastructure and third-party APIs it detects, and surfaces the break-even point - so an expensive or unbounded pattern is visible before the bill arrives, not after a billing alert.

The business case for automated pre-shipment validation

Post-deployment scanning and asynchronous alerts are no longer sufficient. By the time a cloud provider flags an exposed secret or a billing spike, the damage is done; and traditional static-analysis tools miss multi-agent logic loops and configuration-based exposures. Validation has to move to the start of the workflow - running a Security Verification Engine and an Economic Viability Estimator before code is merged.

Developer workflow timeline

  1. Local coding
    • Vibe coding
  2. Pre-commit hook
    • Blocks raw secrets
    • Validates package names
  3. Pull-request check
    • Economic modeling (UEPI)
    • Rejects code with a negative UEPI
  4. Production merge
    • Verified safe
    • Cost-controlled
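The pre-commit stage's "blocks raw secrets" step amounts to scanning only the lines a commit adds. The following is a minimal Python sketch, assuming the hook receives unified-diff text. The three regex rules are illustrative examples, far short of a production rule set. The AKIA prefix is the documented format of long-term AWS access key IDs.

```python
import re

# A few illustrative rules; real scanners ship hundreds of patterns plus entropy checks.
SECRET_RULES = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hardcoded_credential": re.compile(
        r"""(?i)(?:api[_-]?key|secret|password|token)\b\s*[:=]\s*["'][^"']{8,}["']"""
    ),
}


def scan_staged_diff(diff_text: str) -> list[tuple[str, str]]:
    """Return (file, rule) pairs for secrets found on lines the diff adds.

    Only '+' lines are checked, so the hook flags what is being introduced now
    rather than re-reporting the whole repository.
    """
    findings = []
    current_file = "?"
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            current_file = line[4:].removeprefix("b/")  # new-side path
            continue
        if not line.startswith("+"):
            continue  # context, removed, and header lines are ignored
        added = line[1:]
        for rule, pattern in SECRET_RULES.items():
            if pattern.search(added):
                findings.append((current_file, rule))
    return findings
```

Wired into a hook, this function would receive the output of `git diff --cached`. The hook would then refuse the commit with a non-zero exit whenever the returned list is non-empty.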

The Security Verification Engine

The engine runs differential analysis on the active diff rather than scanning the full repository, which keeps checks fast and limits false positives. It covers three classes:

  • Secret detection & access-control validation

    Inspect diffs for hardcoded credentials, API keys, and connection strings, and scan backend files to confirm user-facing endpoints carry explicit server-side authorization checks.

  • Dependency verification

    Intercept manifests (package.json, requirements.txt) and query live registry APIs for each library’s creation date, publisher identity, and download history. Block any library created in the last 24 hours or lacking clear publisher details.

  • Agent environment protection

    Block commits of sensitive config files (e.g. .cursorrules) that contain hidden Unicode formatting characters, preventing rule-based prompt-injection attacks.
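Two of the rules above reduce to small predicates. The following is a minimal Python sketch, assuming the package's creation time and publisher have already been fetched from the registry's metadata API (that lookup is omitted). It treats any character in Unicode category Cf as a hidden formatting character.

```python
import unicodedata
from datetime import datetime, timedelta
from typing import Optional


def should_block_package(created_at: datetime, publisher: Optional[str],
                         now: datetime, min_age: timedelta = timedelta(hours=24)) -> bool:
    """Dependency rule: block packages newer than `min_age` or with no publisher."""
    too_new = now - created_at < min_age
    no_publisher = not (publisher or "").strip()
    return too_new or no_publisher


def hidden_format_chars(text: str) -> list[tuple[int, int, str]]:
    """Find invisible format characters (Unicode category Cf) in a config file.

    Cf covers zero-width spaces and joiners, bidirectional overrides, and tag
    characters, which can hide instructions from a human reviewer.
    Returns (line, column, character name) for each hit, 1-indexed.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                hits.append((line_no, col, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits
```

Category Cf also includes legitimate characters, such as the zero-width joiner used in emoji sequences. A blocked config file may therefore need a human look rather than an automatic rewrite.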

The Economic Viability Estimator

To prevent runtime financial failure, the tool models the expected cost of agentic actions against projected revenue. The expected runtime cost of a single user-action cycle C is:

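$$C = \sum_{d=1}^{D} \left[ \left( I_d \times P_{\text{in}} + O_d \times P_{\text{out}} \right) \times (1 + R) \right]$$

where $D$ is the maximum step depth, $P_{\text{in}}$ and $P_{\text{out}}$ are the per-token input and output prices, $I_d$ is the number of input tokens sent at step $d$ (the accumulated conversation history), $O_d$ is the estimated number of output tokens at step $d$, and $R$ is the retry rate from tool-execution failures.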
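The following is a worked instance of the cost formula as a minimal Python sketch. The token counts, the $3 and $15 per-million-token prices, and the 10% retry rate are illustrative assumptions rather than any vendor's actual pricing. Each step's input grows because it re-sends the previous input, the previous output, and a new 2,000-token turn.

```python
def expected_cycle_cost(input_tokens: list[int], output_tokens: list[int],
                        price_in: float, price_out: float, retry_rate: float) -> float:
    """C = sum over d of (I_d * P_in + O_d * P_out) * (1 + R), for d = 1..D.

    input_tokens[d-1] is I_d (accumulated history sent at step d),
    output_tokens[d-1] is O_d, and D is the number of steps listed.
    """
    if len(input_tokens) != len(output_tokens):
        raise ValueError("need one input and one output estimate per step")
    return sum(
        (i_d * price_in + o_d * price_out) * (1 + retry_rate)
        for i_d, o_d in zip(input_tokens, output_tokens)
    )


# Illustrative inputs (assumptions, not vendor pricing): 3 steps, $3 / $15 per
# million input / output tokens, 10% retry rate.
cost = expected_cycle_cost(
    input_tokens=[2_000, 4_500, 7_000],
    output_tokens=[500, 500, 500],
    price_in=3 / 1_000_000,
    price_out=15 / 1_000_000,
    retry_rate=0.10,
)
print(f"C = ${cost:.4f} per user-action cycle")
```

With these inputs, C comes to about $0.069 per cycle. Because of the accumulated history, the third step alone costs more than twice as much as the first.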

To ensure the app doesn't run at a structural deficit, it computes the Unit Economic Profitability Index (UEPI):

$$\mathrm{UEPI} = \mathrm{MRR} - N \times C$$

where $\mathrm{MRR}$ is the marginal revenue per user per billing cycle and $N$ is the anticipated number of agent execution cycles a user initiates per period, so $N \times C$ is the agent cost that user generates over the same period. If $\mathrm{UEPI} < 0$, the application is operating at a loss on each user and the check must fail, forcing token-usage optimization, context pruning, or cheaper model routing before deployment.
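The following is a sketch of the pull-request gate, assuming the per-period reading UEPI = MRR − N × C: a user's revenue minus the cost of the N cycles that user triggers, so that a negative value means a loss as described above. The function names and the $20 plan in the example are hypothetical.

```python
def uepi(mrr: float, cycle_cost: float, cycles_per_period: float) -> float:
    """Per-user margin for one billing period: MRR - N * C."""
    return mrr - cycles_per_period * cycle_cost


def break_even_cycles(mrr: float, cycle_cost: float) -> float:
    """Largest N at which one user's agent usage still pays for itself."""
    if cycle_cost <= 0:
        raise ValueError("cycle cost must be positive")
    return mrr / cycle_cost


def uepi_gate(mrr: float, cycle_cost: float, cycles_per_period: float) -> bool:
    """PR check: pass only if the expected per-user margin is non-negative."""
    return uepi(mrr, cycle_cost, cycles_per_period) >= 0
```

For example, a $20-per-month plan at $0.25 per cycle breaks even at 80 cycles per user. A user expected to run 60 cycles passes the gate with a $5 margin, while one expected to run 100 fails it at −$5.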

Where validation runs
Feature / metric | Local pre-commit tools | Pre-shipment GitHub checks | Cloud secret scanning (post-push)
Execution point | Before commit, on the developer workstation | On pull-request creation, on GitHub | After push, asynchronously on cloud servers
Blocking capability | Hard-blocks commits that fail a check | Hard-blocks pull-request merges when configured as required status checks | Alerts only; cannot prevent the initial push
Offline viability | Yes: runs entirely locally | No: requires the GitHub environment | No: requires cloud integration
Secret protection | Strongest: a detected secret never leaves the machine (though hooks can be skipped with git commit --no-verify) | High: blocks integration before deployment | Low: the secret is already exposed on remote servers
Economic estimation | No | Yes: calculates UEPI and checks budget limits | No
Installation friction | Requires per-developer installation | Configured once per repository | Integrated into cloud-platform settings

How PeakStack addresses this

PeakStack implements the pre-shipment side of this thesis: it connects to GitHub and runs both a security review and a cost/economic estimate on every commit - blocking risk before it merges, and attaching a per-user cost and break-even read so unprofitable patterns surface pre-launch.

Strategic recommendations for engineering leadership

  1. Establish a zero-trust model for AI code

    Treat all AI-generated code as unverified third-party code. Deploy pre-commit hooks and automated PR checks to scan for vulnerabilities, verify dependencies, and check API configuration before merge.

  2. Enforce hard runtime and financial limits

    Configure maximum loop-iteration caps, session timeouts, and daily model-spend caps directly in developer tools and agent environments to prevent runaway billing loops.

  3. Limit agent access permissions

    Apply least privilege to development agents and CI runners. No automated agent should have direct, unvetted access to delete or modify production databases, cloud resources, or registries.

  4. Integrate economic auditing into pipelines

    Require automated economic-viability modeling on PRs containing AI integrations, and reject code below acceptable UEPI scores until token routing, context pruning, or step limits are in place.

Works cited

29 sources & references
  1. Iterasec: AI-assisted software development security
  2. IBM: Vibe coding security risks aren’t like ordinary risks
  3. Dice: Vibe coding puts cybersecurity pros’ skills to the test
  4. IANS Research: Easy-to-build, easy-to-expose
  5. RTS Labs: Vibe coding security risks
  6. PCMag: Vibe coding is causing thousands of vulnerabilities
  7. PCMag UK: Vibe coding is causing thousands of vulnerabilities
  8. Reddit r/replit: Common vulnerabilities in Replit apps
  9. Softr: Everything that can go wrong when you vibe code
  10. DEV: How to secure vibe-coded applications in 2026
  11. Titan: Vibe coding exposed 5,000 apps with sensitive data
  12. Finance & Accounting Tech: 5,000 vibe-coded apps leaked data
  13. Aikido: Slopsquatting - the AI package hallucination attack
  14. note.com: AI slop threatens open source supply chains
  15. Augment Code: Slopsquatting - stop AI-generated package traps
  16. Snyk: Package hallucination - impacts and mitigation
  17. Wikipedia: Slopsquatting
  18. GitHub (nesbitt.io): Slopsquatting meets dependency confusion
  19. Securing: Software developers in a digital crosshair
  20. CockroachDB: The bill arrives - managing agentic AI costs
  21. Cyera: Agent-inflicted damage - real-world enterprise failures
  22. DEV: How to stop AI agent cost spirals before they start
  23. Truefoundry: LLM cost optimization - the AI gateway layer
  24. GitHub (vectara): LangChain A2A $47k infinite loop case study
  25. DEV: I built a pre-commit secret scanner because GitHub’s is too late
  26. OneUptime: How to implement secret detection
  27. GitHub Community: Preventing accidental secret pushes
  28. Reddit r/node: Open-source GitHub Action detecting leaked keys
  29. Reddit r/AI_Agents: The infinite-loop fear is real

This page reproduces PeakStack's strategic analysis. Statistics are drawn from the cited third-party sources; figures reflect the studies available at time of writing. See also the technical research paper.