AI Roundup: March 28, 2026

1. ARC-AGI-3 Benchmark Exposes AI Limitations on Novel Tasks

The ARC Prize Foundation released ARC-AGI-3, a new benchmark testing AI models on unfamiliar interactive tasks that require genuine reasoning rather than pattern matching. All frontier models — GPT-5.4, Claude Opus 4.6, and Grok 4.2 — scored below 1%, while humans scored 100%. The result underscores the gap between current models’ ability to excel on trained task distributions and their capacity for open-ended problem-solving. Source

2. Luma AI Launches Uni-1, Outperforming Google and OpenAI on Reasoning Benchmarks

Luma AI released Uni-1, a decoder-only autoregressive transformer for interleaved text and image generation that outperforms Google and OpenAI offerings on reasoning benchmarks at 10-30% lower inference cost. The model uses a unified architecture rather than separate text and vision pipelines, enabling more coherent multimodal outputs. Source

3. Harvey Raises $200 Million at $11 Billion Valuation

Harvey, the legal AI startup, closed a $200 million funding round led by GIC and Sequoia at an $11 billion valuation. The raise signals institutional confidence in AI for professional services and positions Harvey as one of the highest-valued vertical AI companies, competing directly with Legora in the legal tech market. Source

4. Bank of America Deploys AI Agents to 1,000 Financial Advisors

Bank of America deployed a platform built on Salesforce Agentforce to approximately 1,000 financial advisors. Its virtual assistant Erica now performs work equivalent to that of 11,000 employees. The deployment represents one of the largest enterprise AI agent rollouts in financial services. Source

5. Cloud Security Alliance Launches CSAI for AI Agent Security

The Cloud Security Alliance launched CSAI, a new nonprofit focused specifically on security frameworks for autonomous AI agents. As AI systems become increasingly independent — executing code, making API calls, and managing infrastructure — CSAI aims to establish baseline security standards for agent identity, permissions, and audit trails. Source

6. Cisco Launches Duo Agentic Identity with DefenseClaw Open Source Tool

Cisco released Duo Agentic Identity, a security framework for monitoring and enforcing least-privilege access across autonomous AI agent workloads. The release includes DefenseClaw, an open-source tool for auditing agent permissions, addressing the growing challenge of managing machine-to-machine access in environments where AI agents operate with significant autonomy. Source

7. Figma Launches AI Agent Canvas in Beta

Figma introduced Agent Canvas in beta, enabling AI agents to generate and edit design assets directly on the Figma canvas. The feature integrates with Claude, VS Code, and Cursor, allowing developers to describe UI changes in natural language and have agents manipulate design files without switching between code and design tools. Source

8. MIT Develops AI System That Increases Warehouse Robot Throughput by 25%

MIT researchers, in collaboration with Symbotic, developed a hybrid AI system using deep reinforcement learning that increases warehouse robot throughput by 25% compared to traditional traffic management algorithms. The system learns to coordinate hundreds of autonomous robots navigating shared corridors and intersections, reducing deadlocks and idle time. Source

9. Flashpoint Report: Agentic AI Cybercrime Discussions Surge 1,500%

Flashpoint reported that underground discussions about AI-powered cybercrime surged 1,500% between November and December 2025. The report also found 11.1 million machines infected with info-stealing malware and a 53% increase in ransomware attacks. Together, the findings suggest that agentic AI tools are lowering the barrier to entry for less sophisticated threat actors. Source

10. Nebius Launches Token Factory for Open-Source LLM Deployment

Nebius released Token Factory, a production-ready platform for deploying, fine-tuning, and scaling open-source LLMs without integrating disparate tools. The platform handles model serving, inference optimization, and scaling in a single managed service, targeting teams that want to run models like Llama or Mistral without building custom infrastructure. Source