# AI Roundup: March 24, 2026
## Quick Hits
- OpenAI retiring legacy deep research mode on March 26: The legacy deep research interface in ChatGPT will be retired this Thursday; existing conversations and results will remain accessible, but the entry point is going away in favor of the current deep research experience. The deprecation follows the March 11 retirement of all GPT-5.1 model variants. Source
- EU releases second draft of AI content labeling code of practice: The European Commission published its second draft of the Code of Practice on Marking and Labelling of AI-generated content on March 5, incorporating feedback from hundreds of industry, academic, and civil society stakeholders. The updated draft streamlines compliance obligations, encourages open technical standards for AI content identification, and proposes a standardized EU icon for cross-platform labeling. Feedback on the second draft closes March 30; the code is expected to be finalized by June and becomes mandatory under the AI Act on August 2. Source
- DeepSeek V4 and Tencent Hunyuan both slip to April: Chinese tech outlet Whale Lab reports that DeepSeek V4 and Tencent’s Hunyuan model are both now targeting an April launch, missing the March window the Financial Times had reported earlier. A “V4 Lite” interface update spotted on the DeepSeek web app on March 9 showed improved coding performance and an updated May 2025 knowledge cutoff, but DeepSeek has not confirmed it as part of the V4 rollout. The public API still lists no V4 model ID. Source
- METR: AI agents now complete tasks over 4 hours autonomously at 50% reliability: METR published updated data showing the length of real-world tasks that frontier AI agents can complete autonomously at 50% success has been doubling approximately every 7 months since 2019, with models in early 2026 handling tasks that take over 4 hours. At the current doubling rate, 8-hour workday-length autonomous tasks are expected within 2026. Source
- Claude Code leads developer mindshare at 46%; Cursor crosses $2B ARR: A Pragmatic Engineer survey of AI coding tool adoption finds Claude Code is the “most loved” tool among developers at 46%, ahead of Cursor at 19% and GitHub Copilot at 9%. Separately, Cursor crossed a $2 billion annualized revenue run rate in early March, a figure that underscores how quickly the AI coding tool category has monetized. Source
- 55% of engineers now use AI agents regularly: A DEV Community survey tracking AI adoption in software development shows 55% of engineers now use AI agents regularly, up from a handful of early adopters in March 2024. The survey also documents a shift from short prompt-response interactions to agents running autonomously for minutes or hours, with extended agent runtimes now a baseline expectation rather than a premium feature. Source
## Analysis
The METR task-length data is the most technically significant item in today’s roundup, and it deserves more attention than it typically gets. The 7-month doubling curve has been consistent for long enough that it is starting to function as a planning baseline rather than a prediction. If the trend holds, AI agents capable of completing an 8-hour autonomous workday are a 2026 development, not a 2028 one. That has downstream effects on enterprise deployment decisions today: teams integrating agents into production workflows are not just choosing between tools, they are choosing how much autonomous surface area they are comfortable exposing before the next capability jump arrives.

The METR numbers also reframe the EU content labeling discussion. Mandatory disclosure requirements designed around “AI-generated content” (images, text, video) will need to account for agentic workflows where an AI is not generating a discrete artifact but executing a multi-step sequence of decisions, edits, and API calls on behalf of a human. The second draft’s current framing doesn’t address this cleanly, and the March 30 feedback window is an opportunity for the developer community to say so.
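The extrapolation behind that timeline is simple enough to sketch. A minimal model, taking the 4-hour horizon and 7-month doubling period from the METR item; the continuous-exponential form and the "month zero in early 2026" baseline are assumptions for illustration:

```python
import math

# Assumptions (for illustration, not from METR's methodology):
# month 0 = early 2026, when the 50%-reliability horizon is ~4 hours,
# and the horizon doubles cleanly every 7 months.
DOUBLING_MONTHS = 7
BASELINE_HOURS = 4.0

def task_horizon(months_from_baseline: float) -> float:
    """50%-reliability autonomous task length after the given months."""
    return BASELINE_HOURS * 2 ** (months_from_baseline / DOUBLING_MONTHS)

def months_to_reach(target_hours: float) -> float:
    """Months until the extrapolated horizon hits target_hours."""
    return DOUBLING_MONTHS * math.log2(target_hours / BASELINE_HOURS)
```

Under these assumptions, `months_to_reach(8.0)` is exactly one doubling period, i.e. 7 months from early 2026, which is what puts an 8-hour autonomous workday inside 2026 rather than 2028.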
The EU AI Act content labeling code deserves scrutiny beyond the compliance calendar. The proposed standardized EU icon for AI-generated content is operationally appealing (a single recognizable marker beats 27 member-state variations), but the open question is how it interacts with platform-level content moderation at scale. A social media post generated by an AI agent, modified by a human, and then reformatted by a second AI tool is not cleanly “AI-generated” or “human-generated.” The second draft’s requirement to disclose AI involvement at the point of publication pushes the technical problem of detection and attribution upstream to creators and platforms in a way that may be unworkable without standardized provenance infrastructure that does not yet exist.
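The mixed-provenance problem is concrete enough to show in miniature. A toy sketch, with every name hypothetical (this is not the draft code's schema, C2PA, or any real provenance standard), illustrating why a binary label collapses on multi-step workflows:

```python
from dataclasses import dataclass

# Hypothetical edit-history record; field names are illustrative only.
@dataclass
class EditEvent:
    actor: str   # "human" or "ai"
    action: str  # e.g. "generate", "modify", "reformat"

def label(history: list[EditEvent]) -> str:
    """Naive binary-style labeler: breaks down on mixed histories."""
    actors = {event.actor for event in history}
    if actors == {"ai"}:
        return "ai-generated"
    if actors == {"human"}:
        return "human-generated"
    return "mixed"  # the case the second draft's framing doesn't address

# The workflow from the paragraph above: AI generates, human modifies,
# a second AI tool reformats.
post = [EditEvent("ai", "generate"),
        EditEvent("human", "modify"),
        EditEvent("ai", "reformat")]
```

Here `label(post)` is neither of the two categories a point-of-publication disclosure rule presumes, which is exactly why attribution pushed upstream to creators and platforms needs a provenance chain, not a single flag.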
DeepSeek V4 slipping to April is a signal, not just a schedule update. The original March window was set when V3 benchmarks were dominant; GPT-5.4, Claude Opus 4.6, and Gemini 3.1 have all shipped since then. DeepSeek releasing a model that merely ties the current frontier is a different story than releasing one that undercuts it at a fraction of the cost, which is what made V3 consequential. The April target also puts it in direct competition with Tencent’s Hunyuan launch, which means the Chinese open-source model space is about to get considerably more crowded in a short window.