The Expert Test. How to Tell a Chat Window From an Agent, and How to Keep AI From Sinking Your Bid.
Companion to “The Expert You Cannot Buy Off the Shelf.” The public issue makes the case. This is the operator’s layer: the three-rung diagnostic, the six-question SME-inside test, the pricing pressure-test against CALC and cost realism, the human-gate standard for AI-generated content, the nine agent-security questions, and the build-buy call that decides the small-business trap.
Capture Corner is the premium BD intelligence companion to Mission Meets Tech. Public-record sourced. Independent analysis. Not a recommendation, not vendor advocacy, not capture material. Built for federal health BD, capture, and proposal leaders who need analytical depth, not headlines.
This issue is the operator's companion to the public piece, "The Expert You Cannot Buy Off the Shelf." It turns the argument into the tests a capture lead runs before buying an AI agent, before letting AI output ship, and before signing a vendor's "agent." Public-record sourced. Not a recommendation, not vendor advocacy.
The public issue made the case: the expert who makes a federal AI agent safe is the product, not the cost the agent removes. Capture Corner turns that into the checks you run this week, on the tools you are buying, the output you are shipping, and the vendors selling you the word "agent."
Two failures from the public issue anchor everything here. A pricing sheet built on AI-invented rates, one line priced 47 percent under the GSA schedule rate for a master's-degree professional with ten years of experience. A hundred AI-generated questions forwarded to a subcontractor for whom not one of them fit. Neither was a model failure. Both were failures of the check that was supposed to stand between the tool and the bid. These are the checks.
The scale of the gap is now measured. Reading Deltek's 2026 Clarity numbers, GovCon analyst Jean Watterson puts it at 94 percent of firms using AI and 5 percent with governance behind it. Her conclusion is the one to keep: in a protest, the firm that can show a human reviewed and validated the AI's work is the one that survives.
1. The three-rung diagnostic: model, project, or agent?
Before any other decision, name what you actually have. Most "AI adoption" is a chat seat, and a chat seat carries none of the judgment a bid needs.
| What you have | What it is | The tell | Trust it with |
|---|---|---|---|
| A chat seat (Claude, ChatGPT) | A bare model | Answers anything, grounded in nothing, no memory of your rules | First drafts a human rewrites in full |
| A project | A model plus your data, prompts, and guardrails | Knows your context, holds to a defined task | Structured work inside set lanes, human reviews every output |
| An agent | A project that plans and acts across steps and tools | Takes actions, calls other systems, picks its own path | Only what an expert has bounded, gated, and can stop |
If your "agent" cannot show you the rules it enforces, the data it is grounded in, and the actions it will not take without a human, you have a chat window with a markup. Andrew Ng's benchmark is the reason this matters: on the HumanEval coding test, a weaker model wired into an agent workflow beat a stronger model used raw, 95 percent against 67. The wiring is the product, and the wiring is what you are auditing in every section below.
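The three rungs and the audit rule above can be sketched as a small classifier. This is an illustration under stated assumptions: the `Tool` fields and the `classify` function are hypothetical names, and the sketch reads the audit rule strictly, so an acting tool that fails any one of the three showings drops back to a chat seat.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """Hypothetical record of what a tool can demonstrate when asked."""
    name: str
    shows_rules: bool         # can show the rules it enforces
    shows_grounding: bool     # can show the data it is grounded in
    shows_human_gates: bool   # can show the actions it will not take without a human
    acts_across_tools: bool   # plans and acts across steps and other systems


def classify(tool: Tool) -> str:
    """Return the rung the tool has earned: 'chat seat', 'project', or 'agent'."""
    if tool.acts_across_tools:
        # Assumption: an acting tool must pass all three showings to count as an agent.
        if tool.shows_rules and tool.shows_grounding and tool.shows_human_gates:
            return "agent"
        return "chat seat"  # a chat window with a markup
    if tool.shows_rules and tool.shows_grounding:
        return "project"
    return "chat seat"


# Illustrative inventory entry, not a real product.
vendor_pitch = Tool("vendor 'agent'", True, True, False, True)
print(classify(vendor_pitch))
```

The point of writing it down is the inventory, not the code: every tool on the bid gets one label, and the label is set by what the tool can show, not by what it is called.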
2. The SME-inside test: is real expertise actually wired in?
Run this on any vendor selling you a proposal or capture agent, and on any internal build. The point is to find out whose judgment is inside it.
Ask, and require evidence:
- Who authored the judgment? Name the capture and proposal SMEs, their years, their win and loss record. If the only names are engineers, that is the tell.
- Show me the rules it enforces. Produce the labor-rate bands, the scope boundaries, the compliance logic, the discriminators it weights. A real build can show them. A wrapper cannot.
- What does it refuse to do without a human? Name the high-stakes actions that halt and hand back control.
- How is it grounded? Which authoritative sources does it reason over (the current schedule, the actual solicitation, real labor data), and where does it fall back on the base model's general knowledge?
- How does it get corrected, and re-trained when the model changes underneath? Ask who owns that loop and how often it runs.
- Who can tell when it has drifted? If the answer is no one, the agent is unsupervised.
Scoring: a vendor who can name the SMEs and show the encoded rules is selling you a project or an agent. A vendor who answers in model names and benchmarks is selling you a chat window. Buy accordingly.
3. The pricing pressure-test: cost realism before it ships
This is the 47-percent failure, turned into a procedure. Run it on any AI-built pricing input before it enters a bid.
- Treat the AI rates as a hypothesis, not an answer. The tool produced numbers. The numbers are unverified until a human walks them.
- Walk each labor category against the current GSA schedule and real market data. CALC rates are not-to-exceed ceilings, not market truth, and GSA's own inspector general found the CALC data incomplete, inaccurate, and duplicative. Use it as a sanity bound, not a source.
- Run the staffing math. At this rate, can you hire and retain a person who meets the labor-category qualifications? If the rate cannot fill the seat, the rate is wrong.
- Flag anything outside the defensible band for SME review before it goes in.
Red flags that should stop a sheet cold:
| Red flag | Why it is dangerous |
|---|---|
| A rate far below GSA for the stated qualifications | The 47-percent line. You cannot staff it |
| A labor category that does not map to the SOW | Fabricated category feeding distorted pricing |
| A category your SME does not recognize | The model invented it |
| Round numbers with no build-up | No basis of estimate behind them |
The standard behind this: FAR 15.404-1 lets the government test whether a price is realistic and downgrade a bid that is not. An under-market rate is not a competitive edge. It is an evaluation risk and a staffing failure waiting for award.
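The pressure-test and the four red flags can be sketched as a screening pass over each rate line. This is a minimal sketch, not a pricing tool: the `RateLine` fields, the function names, and the 15 percent threshold are all assumptions chosen for illustration, and the benchmark rate is whatever a human pulled from the schedule and market data. The sample numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class RateLine:
    """One AI-produced pricing line plus what a human reviewer found (hypothetical fields)."""
    labor_category: str
    proposed_rate: float    # hourly rate the AI tool produced
    benchmark_rate: float   # hourly rate a human pulled from schedule and market data
    maps_to_sow: bool       # reviewer confirmed the category maps to the SOW
    sme_recognizes: bool    # the SME recognizes the category as real
    has_build_up: bool      # a basis of estimate sits behind the number


def percent_below(proposed: float, benchmark: float) -> float:
    """Percent by which the proposed rate sits under the benchmark."""
    return (benchmark - proposed) * 100.0 / benchmark


def red_flags(line: RateLine, max_percent_below: float = 15.0) -> list[str]:
    """Return the stop-the-line flags for one rate; an empty list means none fired."""
    flags = []
    # The threshold is an assumed placeholder; set your own defensible band.
    if percent_below(line.proposed_rate, line.benchmark_rate) > max_percent_below:
        flags.append("rate far below benchmark for the stated qualifications")
    if not line.maps_to_sow:
        flags.append("labor category does not map to the SOW")
    if not line.sme_recognizes:
        flags.append("SME does not recognize the category")
    if not line.has_build_up:
        flags.append("number has no build-up")
    return flags


# Illustrative numbers only, not taken from any real schedule.
line = RateLine("Senior Analyst", 53.0, 100.0, True, True, False)
print(round(percent_below(line.proposed_rate, line.benchmark_rate)))  # 47
print(red_flags(line))
```

An empty flag list does not approve a rate. It only means the sheet cleared the automatic screen; the staffing math and the SME walk still belong to a human.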
4. The human-gate standard for AI-generated content
This is the hundred-questions failure, turned into a rule. Nothing AI-generated leaves the building unread by the right human. The proposal field's own floor says the same: treat every AI output like a first draft from a talented junior analyst, with a human on it at every stage.
Who reads what:
- Pricing inputs go to the capture or pricing lead.
- Questions or content bound for the government or a teammate go to the person who owns that relationship and knows the scope.
- Technical narrative goes to the SME who owns the solution.
The relevance check, before anything sends. Confirm audience and scope. Is this for us, for the prime, or for the government? Does it fit our agreement? The hundred questions failed this one test, and it cost the relationship, not just the hour.
Measure that the gate is real. Track the override rate, how often the human actually changes the AI output. A near-zero override rate means a rubber stamp, not a review. Audit a sample of shipped AI content monthly. Microsoft's AI red team, after a year of testing agent systems, found human-in-the-loop bypass the most consistently exploited failure mode of all. The gate fails hardest where people trust it most.
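The override-rate metric is simple arithmetic: reviews where the human changed the AI output, divided by all reviews logged. A minimal sketch follows, assuming a hypothetical review log with one record per shipped item; the `Review` fields and the 5 percent floor are illustrative assumptions, not a published standard.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """One logged human review of an AI-generated item (hypothetical fields)."""
    item_id: str
    reviewer: str
    changed_output: bool  # True if the human altered the AI draft before it shipped


def override_rate(reviews: list[Review]) -> float:
    """Share of reviews in which the human changed the AI output, from 0.0 to 1.0."""
    if not reviews:
        raise ValueError("no reviews logged; the gate cannot be measured")
    changed = sum(1 for r in reviews if r.changed_output)
    return changed / len(reviews)


def looks_like_rubber_stamp(reviews: list[Review], floor: float = 0.05) -> bool:
    """Flag a near-zero override rate. The floor is an assumed placeholder."""
    return override_rate(reviews) < floor


# Illustrative log, not real data.
log = [
    Review("q-001", "capture lead", True),
    Review("q-002", "capture lead", False),
    Review("q-003", "pricing lead", False),
    Review("q-004", "solution SME", False),
]
print(override_rate(log))            # 0.25
print(looks_like_rubber_stamp(log))  # False
```

An empty log raises an error on purpose. A gate with no record behind it is not a low override rate; it is an unmeasured one.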
Why the stakes are now legal. Hallucinated citations are drawing consequences. A construction firm's brief was struck at the ASBCA with more than seventy percent of citations inaccurate. GAO dismissed a protest as a sanction in what appears to be a first. A second GAO protest drew a warning over the hallmarks of AI invention. One tracker counts thirty-one AI-misuse filings in federal procurement disputes in 2025, twenty-three of them at GAO. A hallucination that reaches a filing is a sanction exposure with your firm's name on it.
It is coming into the solicitations. OMB M-24-18 advises agencies to consider requirements language asking vendors to report any proposed use of AI in their submissions. Build a gate you can describe and document, because you may soon have to.
The speed trap. The busy afternoon before a deadline is exactly when the gate gets skipped. Design it to survive the crunch, or it is not a control.
5. The agent-security questions for any vendor selling you an "agent"
If a vendor is selling autonomy, these are the questions that separate a built agent from a liability. Put them in writing.
- The trifecta. Does this agent touch private, PHI, or proprietary data, read untrusted input, and communicate externally? All three together are the exploitable combination, which Simon Willison named the lethal trifecta. Microsoft published the same boundary in June 2026 as its Agents Rule of Two: never let an agent hold all three at once. Two sources, one conclusion. Ask how they keep the agent to two or fewer.
- Scoped identity. Can the agent reach only what its task needs? A pricing agent should never be able to touch a personnel or pharmacy record.
- Runtime gating. Can a wrong action be caught and stopped in the moment before it executes?
- Human approval. Are the high-stakes actions that require a human defined by the builder, and is that approval impossible for the agent to override?
- Observability. Is every agent action logged, and are PHI and proprietary data kept out of those logs?
- Hijacking posture. In NIST's red-team evaluation, new attacks raised agent-hijacking success from 11 percent for the strongest baseline attack to 81 percent for the strongest new one. Ask how they test against prompt injection.
- Shadow control. A Cloud Security Alliance survey found 82 percent of organizations had discovered agents running that they did not know about. A separate alliance survey found 47 percent had already had a security incident involving an agent, and most of them needed five hours or more to detect and respond. Ask whether they can see and stop every agent in the environment, and how long detection takes.
- The FAR gap. No FAR clause governs AI-agent security yet. NIST is developing SP 800-53 control overlays for agent systems through its COSAiS project, with initial public drafts expected in 2026 and the full series not finalized until 2027. No date is locked. Ask whether they build to CISA and OWASP guidance now, and whether they commit to the NIST overlays when they land.
- Model plurality. If the agent depends on a single frontier lab, ask what happens to your deployment if that lab's federal standing changes. A program that cannot survive one provider going dark is a continuity risk.
A vendor who answers all nine with specifics is selling an agent. A vendor who answers with a demo is selling a risk.
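The trifecta question in the list above reduces to a count of three capabilities. The check can be sketched as follows; the `AgentProfile` fields and function names are hypothetical, and filling them in honestly for a real agent is the hard part that the vendor's written answers should support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical capability profile for one agent."""
    name: str
    touches_sensitive_data: bool   # private, PHI, or proprietary data
    reads_untrusted_input: bool    # inbound email, web pages, uploaded files
    communicates_externally: bool  # can send data or act outside the boundary


def capability_count(agent: AgentProfile) -> int:
    """How many of the three trifecta capabilities the agent holds."""
    return sum([
        agent.touches_sensitive_data,
        agent.reads_untrusted_input,
        agent.communicates_externally,
    ])


def holds_full_trifecta(agent: AgentProfile) -> bool:
    """True when the agent holds all three at once, the exploitable combination."""
    return capability_count(agent) == 3


# Illustrative profiles, not real products.
pricing_agent = AgentProfile("pricing agent", True, False, True)
inbox_agent = AgentProfile("inbox agent", True, True, True)
print(holds_full_trifecta(pricing_agent))  # False
print(holds_full_trifecta(inbox_agent))    # True
```

Passing this check answers one question out of nine. An agent held to two capabilities still needs scoped identity, runtime gating, and the rest of the list.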
One obligation you carry that a general AI buyer does not. There may be no FAR clause for agent security yet, but if the agent touches PHI or CUI, the rules already attach. HIPAA's minimum-necessary standard and audit-control requirements govern PHI, and CMMC with NIST SP 800-171 and the new 800-172 Revision 3 govern CUI. Treat any agent that can reach email, documents, or a system of record as a privileged integration: role-scoped, logged, and denied unrestricted access by default.
6. Build, buy, and the small-business trap
Reliability decides this more than capability does. Sol Rashidi, a former chief AI officer, wrote in June that she shut down two of her own agents after weeks of work because they ran a failure rate near 50 percent. Her takeaway: automation is the easy part, reliability is the hard one. If a CAIO cannot hold an agent reliable inside her own shop, a three-person bid team will not either.
When to build with your SME. When the judgment is your differentiator, your win themes, your pricing discipline, your past-performance framing, you do not hand that to a vendor's general model. You encode it with your own expert. The seasoned shops say the same: keep your experts on the win themes, the solutioning, and the price, and let AI take the bios and the matrices.
When to buy. When the task is commodity workflow, buy, but only from a vendor who clears Section 2. The agent inherits whatever judgment was wired in. Buy the one with judgment in it.
The trap that catches small shops. The smaller the firm, the louder the pitch. The promise sold to a three-person business is that AI lets it bid like a thirty-person business, eighteen to twenty-four bids a quarter instead of six. The capability is real. What the three-person shop does not have is the second capture veteran who catches the agent when it invents a rate or misreads a scope. A large program survives a bad draft because ten people touch it. A small business bets the company on a draft that went out without a senior set of eyes. The tool does not supply that reviewer. You still have to. Do not let the velocity talk you out of the one review you can least afford to skip.
The goal here is better AI use, not less of it. Mark Freeman, a capture professional, drew the boundary cleanly: AI inside a disciplined pursuit process is a force multiplier, and the same AI in front of a broken go or no-go process is an accelerant. Every check in this issue keeps you on the multiplier side of that line.
7. What to do this week
- Inventory your stack against the three rungs. For every "AI" your team uses on bids, label it a chat seat, a project, or an agent, and write down what each is allowed to touch unsupervised.
- Run the SME-inside test on your top vendor. Send the six questions in Section 2. The answers tell you whether you bought judgment or a wrapper.
- Adopt the pricing pressure-test as a gate. No AI-built rate enters a bid without a walk against GSA and a staffing-math check. Make the four red flags a stop-the-line list.
- Write the human-gate standard down. Name who reads what, set a relevance check before anything sends, and start tracking override rates so you can prove the gate is real.
- Put the nine security questions in your next vendor evaluation. Score every "agent" pitch against them before you sign.
- If you are a small shop, name the one human who reads every bid before it ships, and protect that step from the deadline crunch. It is the cheapest insurance you have.
Editorial discipline note
Capture Corner is built to be useful, not provocative. It does not name preferred vendors. It does not recommend a build-or-buy choice for your firm. It does not characterize any vendor's product beyond what public records and the vendor's own claims support. It does not reveal nonpublic information. What it does is take the public issue's argument and turn it into the practitioner-level checks BD and capture leaders actually have to run. Use it accordingly. Vendor performance figures cited here are the vendors' own claims, labeled as such, and are competitive context, not endorsements. Practitioner statements are attributed to the named individuals who made them, and the 94 and 5 percent figures are Jean Watterson's reading of Deltek's 2026 Clarity report.
Mary
Mission Meets Tech Premium
Sources
[CC1] Jean Watterson, LinkedIn, May 19, 2026, citing Deltek's 2026 Clarity Report (94% of GovCon firms using AI, 5% with governance; the firm able to show a human reviewed and validated AI output is the one that survives a protest). https://www.linkedin.com/posts/jean-watterson_govcon-bidandproposal-aigovernance-activity-7462493307206639616-M1Pw
[CC2] Andrew Ng, "How Agents Can Improve LLM Performance," DeepLearning.AI, The Batch (HumanEval 48.1% / 67.0% / 95.1%). https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/
[CC3] U.S. General Services Administration, OASIS+ Buyers Guide / Complete Market Research (CALC+ rates as not-to-exceed ceiling rates). https://www.gsa.gov/buy-through-us/products-and-services/professional-services/buy-services/oasis-plus/buyers-guide/complete-market-research
[CC4] GSA Office of Inspector General, Report A180068, December 23, 2019 (CALC data incomplete, inaccurate, and duplicative). https://www.gsaig.gov/sites/default/files/audit-reports/A180068_1.pdf
[CC5] Federal Acquisition Regulation 15.404-1, "Proposal analysis techniques" (cost realism). https://www.acquisition.gov/far/15.404-1
[CC6] Coley GSA, "AI in Government Contracting: Risks and Benefits" (fabricated labor categories). https://www.coleygsa.com/ai-in-government-contracting-risks-benefits/
[CC7] GovBidLab, "AI Government Proposal Writing Compliance Guide," March 26, 2026 (treat output like a junior analyst's first draft; human review at every stage). https://govbidlab.com/blog/ai-government-proposal-writing-compliance-guide
[CC8] Armed Services Board of Contract Appeals, Huffman Construction, ASBCA Nos. 62591, 62783, October 23, 2025 (over 70% of citations inaccurate; brief struck). https://www.asbca.mil/LinkClick.aspx?fileticket=eAIy2KSL_Zg%3D&portalid=143
[CC9] Government Accountability Office, Oready, LLC, B-423649, September 25, 2025 (sanctions dismissal, reported as an apparent first), via GovCon Judicata. https://www.govconjudicata.com/single-post/alert-using-sanctions-gao-dismisses-protest-where-protester-used-non-existent-citations-and-cases
[CC10] Government Accountability Office, Bramstedt Surgical, B-424064, January 28, 2026 ("hallmarks" of AI-generated authority), via PilieroMazza. https://www.pilieromazza.com/beware-the-hallmarks-of-ai-recent-gao-decision-provides-cautionary-tale-for-protesters/
[CC11] Burr & Forman, "Gen-AI Misuse in Procurement Litigation" (31 filings in 2025; 23 at GAO). https://www.burr.com/government-contracting/gen-ai-misuse-in-procurement-litigation
[CC12] Office of Management and Budget, M-24-18, "Advancing the Responsible Acquisition of Artificial Intelligence in Government," September 24, 2024 (agencies may require vendors to report proposed AI use in submissions; verified at section 4(a)(i)). https://www.whitehouse.gov/wp-content/uploads/2024/10/M-24-18-AI-Acquisition-Memorandum.pdf
[CC13] Microsoft AI Red Team, "Updating the Taxonomy of Failure Modes in Agentic AI Systems," June 4, 2026 (human-in-the-loop bypass the most consistently exploited failure mode). https://www.microsoft.com/en-us/security/blog/2026/06/04/updating-taxonomy-failure-modes-agentic-ai-systems-year-red-teaming-taught-us/
[CC14] Simon Willison, "The Lethal Trifecta for AI Agents," June 16, 2025. https://simonw.substack.com/p/the-lethal-trifecta-for-ai-agents
[CC15] Microsoft Security, "Securing CI/CD in an Agentic World," June 5, 2026 (the "Agents Rule of Two": an agent should not simultaneously hold untrusted input, sensitive access, and the ability to change state or communicate externally). https://www.microsoft.com/en-us/security/blog/2026/06/05/securing-ci-cd-in-agentic-world-claude-code-github-action-case/
[CC16] NIST, "Strengthening AI Agent Hijacking Evaluations," January 2025 (attack success 11% to 81%). https://www.nist.gov/news-events/news/2025/01/technical-blog-strengthening-ai-agent-hijacking-evaluations
[CC17] Cloud Security Alliance / Token Security, "82% of Enterprises Have Unknown AI Agents in Their Environments," April 21, 2026. https://cloudsecurityalliance.org/press-releases/2026/04/21/new-cloud-security-alliance-survey-reveals-82-of-enterprises-have-unknown-ai-agents-in-their-environments
[CC18] Cloud Security Alliance, "AI Agent Security Starts with Scope Control," May 12, 2026 (47% had a security incident involving an AI agent in the past 12 months; 58% take five hours or more to detect and respond). https://cloudsecurityalliance.org/blog/2026/05/12/ai-agent-security-starts-with-scope-control
[CC19] NIST, "Announcing the AI Agent Standards Initiative," February 17, 2026 (promises research, guidelines, and further deliverables; does not commit to an "overlay" timeline). https://www.nist.gov/news-events/news/2026/02/announcing-ai-agent-standards-initiative-interoperable-and-secure
[CC20] Cloud Security Alliance research note, "Federal Agentic AI Security: NIST's Emerging Standards Initiative," March 30, 2026 (NIST COSAiS SP 800-53 control overlays still in development, no timeline announced, full series targeted for 2027; no FAR clause governs AI-agent security; FAR Part 39 is general IT policy). https://labs.cloudsecurityalliance.org/research/csa-research-note-nist-ai-agent-standards-federal-framework/
[CC21] OWASP, "AI Agent Security Cheat Sheet" (least-privilege tooling, explicit approval for high-impact or irreversible actions, treat external data as untrusted, log and redact). https://cheatsheetseries.owasp.org/cheatsheets/AI_Agent_Security_Cheat_Sheet.html
[CC22] HHS Office for Civil Rights, "Summary of the HIPAA Security Rule" (minimum-necessary standard; audit controls to record and examine activity in systems that contain or use ePHI). https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html
[CC23] NIST, Special Publication 800-172 Revision 3 (finalized May 13, 2026); CMMC and NIST SP 800-171 govern CUI handling for defense contractors. https://csrc.nist.gov/pubs/sp/800/172/r3/final
[CC24] GovDash, "GovCon AI: How AI Can Scale Your Proposal Team," January 13, 2026 (vendor claim: a three-person team handling 18 to 24 bids per quarter instead of six). https://www.govdash.com/blog/govcon-ai-how-ai-can-scale-your-proposal-team
[CC25] Markon, "Why AI Alone Won't Win Government Contracts," August 27, 2025 (keep SMEs on win themes, solutioning, and price; let AI take bios and matrices). https://www.markonsolutions.com/blog/why-ai-alone-wont-win-government-contracts-and-what-to-do-instead
[CC26] Sol Rashidi, LinkedIn, June 3, 2026 (shut down two of her own agents over a roughly 50% failure rate; automation is the easy part, reliability is the hard one). https://www.linkedin.com/posts/sol-rashidi-mba-a672291_on-monday-i-fired-2-agents-so-3-weeks-ago-activity-7468009797192224768-cY4B
[CC27] Mark Freeman, LinkedIn, May 14, 2026 (AI inside a disciplined pursuit process is a force multiplier; in front of a broken go/no-go process it is an accelerant). https://www.linkedin.com/posts/markafreeman1_ai-is-making-your-proposal-team-more-productive-activity-7460645133282349057-Leud