The Business & Technology Network
Helping Business Interpret and Use Technology

The era of agentic AI demands a data constitution, not better prompts

DATE POSTED: January 25, 2026

The industry consensus is that 2026 will be the year of "agentic AI." We are rapidly moving past chatbots that simply summarize text. We are entering the era of autonomous agents that execute tasks. We expect them to book flights, diagnose system outages, manage cloud infrastructure and personalize media streams in real time.

As a technology executive overseeing platforms that serve 30 million concurrent users during massive global events like the Olympics and the Super Bowl, I have seen the unsexy reality behind the hype: Agents are incredibly fragile.

Executives and VCs obsess over model benchmarks. They debate Llama 3 versus GPT-4. They focus on maximizing context window sizes. Yet they are ignoring the actual failure point. The primary reason autonomous agents fail in production is often data hygiene.

In the previous era of "human-in-the-loop" analytics, data quality was a manageable nuisance. If an ETL pipeline had an issue, a dashboard might display an incorrect revenue number. A human analyst would spot the anomaly, flag it and fix it. The blast radius was contained.

In the new world of autonomous agents, that safety net is gone.

If a data pipeline drifts today, an agent doesn't just report the wrong number. It takes the wrong action. It provisions the wrong server type. It recommends a horror movie to a user watching cartoons. It hallucinates a customer service answer based on corrupted vector embeddings.

Running AI at the scale of the NFL or the Olympics, I realized that standard data cleaning is insufficient. We cannot just "monitor" data. We must legislate it.

One solution to this problem is a "data quality creed" framework, or Creed, which functions as a "data constitution." It enforces thousands of automated rules before a single byte of data is allowed to touch an AI model. While I applied this specifically to the streaming architecture at NBCUniversal, the methodology is universal for any enterprise looking to operationalize AI agents.

Here is why "defensive data engineering" and the Creed philosophy are the only ways to survive the agentic era.

The vector database trap

The core problem with AI agents is that they implicitly trust the context you give them. If you are using retrieval-augmented generation (RAG), your vector database is the agent’s long-term memory.

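Standard data quality issues are catastrophic for vector databases. In a traditional SQL database, a null value is just a null value. In a vector database, a null value or a schema mismatch can warp the semantic meaning of the entire embedding: the embedding is computed from whatever text the pipeline assembles, so a missing or misplaced field silently changes what gets embedded.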
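A minimal Python sketch of that mechanism, assuming the text sent to the embedding model is assembled from metadata fields (the field names `title`, `genre` and `description` are hypothetical). The naive version turns a null into the literal string "None", which would then be embedded as if it were content; the guarded version refuses to build the input at all.

```python
from typing import Any, Mapping

# Hypothetical fields concatenated into the text sent to the embedding model.
EMBED_FIELDS = ("title", "genre", "description")


def naive_embedding_text(record: Mapping[str, Any]) -> str:
    # A null or missing field is silently rendered as the string "None"
    # and would be embedded as though it were real content.
    return " | ".join(f"{name}: {record.get(name)}" for name in EMBED_FIELDS)


def guarded_embedding_text(record: Mapping[str, Any]) -> str:
    # Refuse to build embedding input unless every field is a non-empty string.
    invalid = [
        name
        for name in EMBED_FIELDS
        if not isinstance(record.get(name), str) or not record[name].strip()
    ]
    if invalid:
        raise ValueError(f"refusing to embed record; invalid fields: {invalid}")
    return " | ".join(f"{name}: {record[name]}" for name in EMBED_FIELDS)
```

For a record such as `{"title": "Week 12 recap", "genre": None, "description": "Late touchdown drive"}`, the naive function returns `title: Week 12 recap | genre: None | description: Late touchdown drive`, while the guarded one raises `ValueError`.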

Consider a scenario where metadata drifts. Suppose your pipeline ingests video metadata, but a race condition attaches the "genre" tag to the wrong record. Your metadata might tag a video as "live sports," but the embedding was generated from a "news clip." When an agent queries the database for "touchdown highlights," it retrieves the news clip because the vector similarity search is operating on a corrupted signal. The agent then serves that clip to millions of users.
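One way to catch this kind of drift at write time is to stamp every embedding with the ID of the asset it came from and a hash of the exact text the model received, then refuse to pair it with metadata that disagrees. The sketch below assumes that design; the record types and field names are illustrative, not part of Creed.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MetadataRecord:
    # Illustrative metadata row: the asset, its tag, and the text meant to be embedded.
    asset_id: str
    genre: str
    text: str


@dataclass(frozen=True)
class EmbeddingRecord:
    # Illustrative vector row stamped with provenance at embedding time.
    asset_id: str
    source_sha256: str  # hash of the exact text the embedding model received
    vector: Tuple[float, ...]


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def provenance_matches(meta: MetadataRecord, emb: EmbeddingRecord) -> bool:
    # Both the asset ID and the content hash must agree. If a race paired a
    # "news clip" embedding with "live sports" metadata, the hash check fails.
    return meta.asset_id == emb.asset_id and sha256_text(meta.text) == emb.source_sha256
```

In this design, only pairs for which `provenance_matches` returns `True` would be written to the vector database; anything else is treated as a contract violation.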

At scale, you cannot rely on downstream monitoring to catch this. By the time an anomaly alarm goes off, the agent has already made thousands of bad decisions. Quality controls must shift to the absolute "left" of the pipeline.

The "Creed" framework: 3 principles for survival

The Creed framework acts as a gatekeeper. It is a multi-tenant quality architecture that sits between ingestion sources and AI models.
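As an illustration of the gatekeeper idea, here is a hypothetical multi-tenant rule registry in Python. Each tenant (a team or pipeline) registers its own rules, and a tenant with no registered contract is rejected by default. The class and method names are assumptions for this sketch, not the framework's actual API.

```python
from typing import Any, Callable, Dict, List, Mapping, Optional

# A rule returns None when the record passes, or a human-readable violation.
Rule = Callable[[Mapping[str, Any]], Optional[str]]


class QualityGate:
    """Hypothetical multi-tenant gate between ingestion sources and AI models."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Rule]] = {}

    def register(self, tenant: str, rule: Rule) -> None:
        # Each tenant (team, product or pipeline) owns its own rule set.
        self._rules.setdefault(tenant, []).append(rule)

    def check(self, tenant: str, record: Mapping[str, Any]) -> List[str]:
        # Deny by default: data from a tenant without a contract never passes.
        if tenant not in self._rules:
            return [f"no data contract registered for tenant {tenant!r}"]
        # Run every rule and collect all violations, not just the first one.
        violations = []
        for rule in self._rules[tenant]:
            message = rule(record)
            if message is not None:
                violations.append(message)
        return violations
```

An empty list means the record may pass the gate; any non-empty list is a reason to quarantine it.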

For technology leaders looking to build their own "constitution," here are the three non-negotiable principles I recommend.

1. The "quarantine" pattern is mandatory: In many modern data organizations, engineers favor the "ELT" (extract, load, transform) approach. They dump raw data into a lake and clean it up later. For AI agents, this is unacceptable. You cannot let an agent drink from a polluted lake.

The Creed methodology enforces a strict "dead letter queue." If a data packet violates a contract, it is immediately quarantined. It never reaches the vector database. It is far better for an agent to say "I don't know" due to missing data than to confidently lie due to bad data. This quarantine acts as a circuit breaker and is essential for preventing high-profile hallucinations.
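A minimal sketch of the quarantine pattern, assuming a validator that returns a list of violations. Plain Python lists stand in for the vector-database writer and the dead letter queue; a production system would use a real queue or topic, and the names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Mapping

Validator = Callable[[Mapping[str, Any]], List[str]]


@dataclass
class QuarantineRouter:
    validate: Validator  # returns contract violations; an empty list means valid
    # In-memory stand-ins: a real system would write to the vector DB and a DLQ topic.
    accepted: List[Mapping[str, Any]] = field(default_factory=list)
    dead_letters: List[Dict[str, Any]] = field(default_factory=list)

    def ingest(self, record: Mapping[str, Any]) -> bool:
        violations = self.validate(record)
        if violations:
            # Quarantine: keep the record and the reasons for later inspection,
            # but never let it reach the vector database.
            self.dead_letters.append({"record": dict(record), "violations": violations})
            return False
        self.accepted.append(record)
        return True
```

Storing the violations next to the quarantined record keeps the dead letter queue useful for debugging instead of being a silent discard.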

2. Schema is law: For years, the industry moved toward "schemaless" flexibility to move fast. We must reverse that trend for core AI pipelines. We must enforce strict typing and referential integrity.
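A sketch of strict typing plus referential integrity for a single event type, assuming a hypothetical `ViewEvent` with three fields and an in-memory set of known asset IDs. Values are never coerced: a string `"30"` where an integer is expected is an error, not a conversion.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Set


@dataclass(frozen=True)
class ViewEvent:
    # Hypothetical event type used only for this sketch.
    user_id: str
    asset_id: str
    watch_seconds: int


SCHEMA = {"user_id": str, "asset_id": str, "watch_seconds": int}


def parse_view_event(raw: Mapping[str, Any], known_assets: Set[str]) -> ViewEvent:
    # Exact key set: no missing fields and no unexpected extras.
    if set(raw) != set(SCHEMA):
        raise ValueError(f"schema mismatch: got keys {sorted(raw)}")
    # Strict typing with no coercion; bool is rejected even though it subclasses int.
    for key, expected_type in SCHEMA.items():
        value = raw[key]
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise TypeError(f"{key} must be {expected_type.__name__}, got {type(value).__name__}")
    # Referential integrity: the event must reference an asset that exists.
    if raw["asset_id"] not in known_assets:
        raise ValueError(f"unknown asset_id {raw['asset_id']!r}")
    return ViewEvent(**raw)
```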

In my experience, a robust system requires rules at scale. The implementation I oversee currently enforces more than 1,000 active rules running across real-time streams. These rules don't just check for nulls; they check for business logic consistency.

  • Example: Does the "user_segment" in the event stream match the active taxonomy in the feature store? If not, block it.

  • Example: Is the timestamp within the acceptable latency window for real-time inference? If not, drop it.
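The two example rules above might look like this as plain functions. The taxonomy contents and the five-second latency budget (`MAX_EVENT_AGE_SECONDS`) are assumptions chosen for illustration; each rule returns `None` on success or a message saying whether to block or drop the event.

```python
import time
from typing import Any, Mapping, Optional, Set

# Assumed latency budget for real-time inference; tune per use case.
MAX_EVENT_AGE_SECONDS = 5.0


def check_user_segment(event: Mapping[str, Any], active_taxonomy: Set[str]) -> Optional[str]:
    # Block events whose segment is not in the feature store's active taxonomy.
    segment = event.get("user_segment")
    if segment not in active_taxonomy:
        return f"block: user_segment {segment!r} not in active taxonomy"
    return None


def check_freshness(
    event: Mapping[str, Any],
    now: Optional[float] = None,
    max_age: float = MAX_EVENT_AGE_SECONDS,
) -> Optional[str]:
    # Drop events outside the latency window; timestamps are Unix epoch seconds.
    now = time.time() if now is None else now
    ts = event.get("timestamp")
    if isinstance(ts, bool) or not isinstance(ts, (int, float)):
        return "drop: missing or non-numeric timestamp"
    age = now - ts
    # Future timestamps (negative age) are treated as out of window as well.
    if age < 0 or age > max_age:
        return f"drop: event age {age:.1f}s outside the {max_age:.1f}s window"
    return None
```

Passing `now` explicitly keeps the freshness rule deterministic in tests; in production it defaults to the current clock.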

3. Vector consistency checks: This is the new frontier for SREs. We must implement automated checks to ensure that the text chunks stored in a vector database actually match the embedding vectors associated with them. "Silent" failures in an embedding model API often leave you with vectors that no longer represent their source text, or represent nothing at all. This causes agents to retrieve pure noise.
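A minimal consistency check, assuming you can call the same embedding model again (`embed` below is any callable you supply; it is not a specific vendor API). It flags stored vectors that are empty, all zeros or contain NaN, then re-embeds each chunk and compares the result with the stored vector by cosine similarity. The 0.99 threshold is an assumption: embedding APIs are not always bit-for-bit deterministic, so exact equality would be too strict.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (chunk_id, chunk_text, stored_vector) as read back from the vector database.
StoredChunk = Tuple[str, str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_inconsistent_chunks(
    chunks: Sequence[StoredChunk],
    embed: Callable[[str], Sequence[float]],
    threshold: float = 0.99,  # assumed tolerance for non-deterministic embedding APIs
) -> List[str]:
    flagged = []
    for chunk_id, text, stored in chunks:
        # Typical signatures of a silent embedding failure: empty, NaN or all-zero vectors.
        if not stored or any(math.isnan(x) for x in stored) or all(x == 0.0 for x in stored):
            flagged.append(chunk_id)
            continue
        # Re-embed the stored text and confirm it still lands near the stored vector.
        fresh = embed(text)
        if len(fresh) != len(stored) or cosine(fresh, stored) < threshold:
            flagged.append(chunk_id)
    return flagged
```

Re-embedding every chunk is expensive, so a check like this would more plausibly run on a sample or on recently written chunks.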

The culture war: Engineers vs. governance

Implementing a framework like Creed is not just a technical challenge. It is a cultural one.

Engineers generally hate guardrails. They view strict schemas and data contracts as bureaucratic hurdles that slow down deployment velocity. When introducing a data constitution, leaders often face pushback. Teams feel they are returning to the "waterfall" era of rigid database administration.

To succeed, you must flip the incentive structure. We demonstrated that Creed was actually an accelerator. By guaranteeing the purity of the input data, we eliminated the weeks data scientists used to spend debugging model hallucinations. We turned data governance from a compliance task into a "quality of service" guarantee.

The lesson for data decision makers

If you are building an AI strategy for 2026, stop buying more GPUs. Stop worrying about which foundation model is slightly higher on the leaderboard this week.

Start auditing your data contracts.

An AI agent is only as autonomous as its data is reliable. Without a strict, automated data constitution like the Creed framework, your agents will eventually go rogue. In an SRE’s world, a rogue agent is far worse than a broken dashboard. It is a silent killer of trust, revenue and customer experience.

Manoj Yerrasani is a senior technology executive.