Catch Me If You Can: Unpacking the Stealth Strategy Behind Horizon Beta
Stress testing the Horizon Beta model with Kilo Code
The landscape of artificial intelligence model deployment sometimes involves "stealth tests," where new models are quietly released under generic or "blanket" names before their official unveiling. This strategy lets developers test performance in real-world scenarios, gather community feedback, and refine the models ahead of a major launch. The recent release of Horizon Alpha on July 30th, followed by Horizon Beta on August 1st, raises the question of whether these could be stealth releases of a GPT-5 model, which is expected in August 2025.
The Precedent: GPT-4.1 and "Blanket Names"
OpenAI has a history of making new models available on platforms like OpenRouter under a different name before any public announcement. For instance, GPT-4.1 appeared early on OpenRouter under a blanket name, widely believed to be the stealth model "Quasar Alpha." This prior move sets a precedent for how new, unannounced models might be tested in the wild: a soft launch or beta test that gathers practical insight into a model's performance and behavior without the immediate pressure and scrutiny of a full-scale public release.
Horizon Alpha & Beta: A Potential GPT-5 Stealth Test?
A new "stealth model" called Horizon Alpha and Horizon Beta has recently appeared on Open Router, sparking significant community speculation about its origins and purpose. While its true identity is not officially confirmed, many community guesses suggest that Horizon Alpha and Beta are an OpenAI model, potentially their open-source model that is supposed to come soon along with GPT-5. The reasoning behind this speculation is rooted in OpenAI's past practice of releasing models like GPT 4.1 early on Open Router under a nondescript name.
Evidence for OpenAI Origin and Open-Source Potential:
Precedent of GPT-4.1: As mentioned, OpenAI previously made GPT-4.1 available on OpenRouter under a blanket name, which makes it likely that Horizon is a new OpenAI model.
Model Size and Speed: Many believe it could be an open-source GPT-5 Nano, i.e. a small model. Its token throughput is notably high even though it does not run on specialized hardware like Groq, which suggests a relatively small model or a mixture-of-experts architecture. The model reportedly generates code at around 150 tokens per second, and every example shown in the video was built in under a minute (at that rate, a full minute of generation is roughly 9,000 tokens). That speed makes it highly appealing for everyday use, especially if it can eventually be run locally.
Capabilities and Performance of Horizon Beta: The model performs impressively across several domains:
Writing Prowess: It scores exceptionally well on writing benchmarks like EQBench and outperforms the recently released Kimi K2. It can easily imitate writing styles and produces content that does not read as AI-generated, notably avoiding the overuse of em-dashes.
Context Window: Horizon Beta handles a large context of about 256,000 tokens, with a maximum output of about 128,000 tokens (see the API sketch after this list).
Coding Performance:
It excels at front-end tasks, such as generating impressive SVGs and creating appealing landing pages, including a portfolio page for a React developer.
However, it struggles with game logic: even simple onClick handlers proved challenging while I was building a test application.
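To make the context and output numbers above concrete, here is a minimal sketch of calling Horizon Beta directly through OpenRouter's OpenAI-compatible chat completions endpoint. I accessed the model through Kilo Code rather than the raw API, so treat the model slug "openrouter/horizon-beta", the environment variable name, and the example prompt as assumptions for illustration.

// Minimal sketch (TypeScript, Node 18+ with built-in fetch): querying Horizon Beta
// via OpenRouter's OpenAI-compatible chat completions endpoint.
// Assumptions: the model slug "openrouter/horizon-beta" and the OPENROUTER_API_KEY
// environment variable; the ~128k output ceiling is the figure reported above.
const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

async function askHorizonBeta(prompt: string): Promise<string> {
  const res = await fetch(OPENROUTER_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "openrouter/horizon-beta", // assumed slug for the stealth model
      messages: [{ role: "user", content: prompt }],
      max_tokens: 8000, // well under the ~128k output ceiling reported above
    }),
  });
  if (!res.ok) throw new Error(`OpenRouter error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content; // assistant reply text
}

askHorizonBeta("Build a single-file landing page for a React developer portfolio.")
  .then(console.log)
  .catch(console.error);

At the reported 150 tokens per second, a response of a few thousand tokens like this comes back in well under a minute.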
Stress Test
As a stress test of Horizon Beta, I built a simple tile-matching game called “Match Me If You Can” using the Horizon Beta model with Kilo Code. Check it out here!
Strengths of Horizon Beta
Image-to-design accuracy: When I uploaded an airline ticket as a visual reference, Horizon Beta created a remarkably close match in terms of layout, color scheme, and visual hierarchy (the equivalent image-plus-prompt request is sketched after this list)
Design consistency: The model maintained visual coherence throughout the interface elements
Rapid prototyping: Quick generation of UI components based on visual inputs
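For the image-to-design test I attached the airline ticket screenshot through Kilo Code, but the equivalent request body is worth sketching: OpenRouter accepts OpenAI-style multimodal messages, so an image reference plus a text instruction looks roughly like the payload below. The hosted image URL is hypothetical.

// Sketch of the image-plus-prompt message behind the image-to-design test.
// The image URL is hypothetical; in practice the screenshot was attached in Kilo Code.
const imageToDesignMessages = [
  {
    role: "user",
    content: [
      {
        type: "text",
        text: "Recreate this airline ticket as a responsive HTML/CSS component, matching the layout, color scheme, and visual hierarchy.",
      },
      {
        type: "image_url",
        image_url: { url: "https://example.com/airline-ticket.png" }, // hypothetical reference image
      },
    ],
  },
];

console.log(JSON.stringify(imageToDesignMessages, null, 2));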
Weaknesses of Horizon Beta
Code organization: The codebase was not well organized, with lots of spaghetti code; Horizon Beta started the project with plain HTML, JavaScript, and CSS dumped in the working directory without creating any folders or subfolders
Game logic failures: The model struggled significantly with implementing the core tile-matching logic, requiring multiple iterations and ultimately failing to produce working code, so I switched to Claude Sonnet 4 for this task (a minimal version of the logic it needed is sketched after this list)
State management issues: Difficulty handling complex application state and game progression
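For reference, the logic Horizon Beta kept tripping over is small. Below is a minimal sketch, in TypeScript rather than the plain JavaScript of the generated project, of the click handling and pair-matching state that eventually had to come from Claude Sonnet 4; the Tile shape and function names are mine, not taken from the actual codebase.

// Minimal sketch of tile-matching state logic (names and structure are illustrative,
// not from the generated project).
interface Tile {
  id: number;
  value: string;    // symbol shown on the tile face
  matched: boolean; // stays revealed after a successful match
  revealed: boolean;
}

let firstPick: Tile | null = null; // first tile of the current pair attempt
let lockBoard = false;             // block clicks while a mismatch is shown

function onTileClick(tile: Tile, render: () => void): void {
  if (lockBoard || tile.matched || tile.revealed) return;

  tile.revealed = true;

  if (firstPick === null) {
    firstPick = tile;                        // remember the first pick
  } else if (firstPick.value === tile.value) {
    firstPick.matched = tile.matched = true; // pair found, keep both face up
    firstPick = null;
  } else {
    lockBoard = true;                        // mismatch: flip both back after a delay
    const previous = firstPick;
    firstPick = null;
    setTimeout(() => {
      previous.revealed = tile.revealed = false;
      lockBoard = false;
      render();
    }, 800);
  }
  render();
}

Most of the failures came down to exactly this kind of state handling: tracking the first pick, locking the board during the flip-back delay, and keeping matched tiles revealed.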
Conclusion
The emergence of Horizon Alpha and Horizon Beta as "stealth models" on OpenRouter, mirroring the earlier unbranded release of GPT-4.1, highlights a strategic approach to AI model deployment. While their definitive link to GPT-5 or an associated open-source model remains speculative, the community's excitement is palpable. If Horizon does turn out to be a smaller, open-weights model, its current performance, especially in writing and front-end coding, combined with its remarkable speed and generous context window, makes it a highly promising development for the AI community. The ongoing observation of such "stealth tests" provides valuable insight into the iterative development and anticipated capabilities of future large language models.
Nevertheless, we don't use GPT in Kilo 🤣