ByteDance's Doubao AI Challenges OpenAI's o1: A New Challenger Emerges?
The artificial intelligence arena is heating up, and a significant contender has entered the ring. ByteDance, the parent company of TikTok, has recently launched the Doubao Large Model 1.5 Pro, an upgraded AI model that's making waves by outperforming OpenAI's o1 on the challenging AIME benchmark. This development signals a potential shift in the AI landscape, as Chinese tech firms increasingly challenge Western dominance. We'll explore how Doubao 1.5 Pro stacks up against OpenAI's o1, delving into the benchmarks, technical aspects, and what it all means for the future of AI. We will also examine how these models compare to DeepSeek's R1 and V3, along with a look at the "chain of thought" reasoning approach.
The AIME Benchmark: A High Bar for AI Reasoning
The AIME (American Invitational Mathematics Examination) is a notoriously difficult math competition, requiring advanced multi-step reasoning. It's a stern test for AI models, pushing their logical and problem-solving abilities to their limits. For AI, achieving a high score on AIME is a strong indicator of advanced reasoning capabilities. This benchmark has become a key battleground for comparing the performance of different AI models, especially large language models (LLMs).
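Because AIME problems have integer answers between 0 and 999, scoring a model on the benchmark essentially reduces to exact-match grading of its final answers. The sketch below illustrates that idea; the function name and sample data are hypothetical and not an official evaluation harness.

```python
# Minimal sketch: grading AIME-style answers by exact match.
# AIME answers are integers from 0 to 999, so a model's final answer
# can be scored by simple equality. The data below is illustrative.

def grade_aime(predictions: list[int], answers: list[int]) -> float:
    """Return the fraction of problems answered exactly right."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Example: 12 of 15 problems correct -> 80% accuracy,
# roughly the range reported for the strongest reasoning models.
preds = [113, 25, 890, 7, 0, 41, 250, 3, 96, 512, 777, 16, 204, 55, 810]
truth = [113, 25, 890, 7, 0, 41, 250, 3, 96, 512, 777, 16, 999, 100, 1]
print(f"AIME accuracy: {grade_aime(preds, truth):.1%}")  # -> 80.0%
```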
Doubao 1.5 Pro: ByteDance's Ambitious AI Push
ByteDance's Doubao 1.5 Pro isn't just another AI model; it represents a concerted effort by the company to assert its presence in the AI market. The model was trained with a resource-efficient approach that uses a flexible server cluster and supports lower-end chips, potentially reducing infrastructure costs. This is noteworthy because access to advanced chips has become an increasingly prominent constraint, particularly for Chinese AI firms. According to ByteDance, this efficient design doesn't compromise on performance. Doubao 1.5 Pro offers a context window of up to 256k tokens in its most advanced version, enabling it to process large amounts of text in a single request.
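To put the 256k figure in perspective, here is a rough sketch of checking whether a long document fits inside that window, using the commonly cited ~4 characters-per-token heuristic for English text. The constant and helper below are illustrative assumptions, not part of Doubao's API, and the heuristic is only an approximation of a real tokenizer.

```python
# Rough sketch: does a document fit in a 256k-token context window?
# CHARS_PER_TOKEN is a common approximation for English prose, not the
# model's actual tokenizer; the 256k limit is ByteDance's stated figure
# for the top Doubao 1.5 Pro tier.

CONTEXT_WINDOW_TOKENS = 256_000
CHARS_PER_TOKEN = 4  # rough average for English text

def fits_in_context(text: str, reserve_for_output: int = 4_000) -> bool:
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens + reserve_for_output <= CONTEXT_WINDOW_TOKENS

document = "..." * 100_000  # stand-in for a long report (~300k characters)
print(fits_in_context(document))  # ~75k tokens + 4k reserved -> True
```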
OpenAIβs o1: A Benchmark for AI Reasoning
OpenAI's o1 models, including o1-preview and o1-mini, have consistently been regarded as top-tier in the field, setting a high standard for other models. They have demonstrated excellent results in complex reasoning tasks, particularly in STEM fields, achieving high scores on benchmarks such as AIME and Codeforces, and their performance on AIME in particular has been considered a hallmark of advanced multi-step mathematical reasoning. The o1 model has a context window of 128k tokens, though some recent versions have expanded this to 200k tokens. OpenAI also employs a "chain of thought" (CoT) approach, where the model reasons through a problem step by step internally, leading to more accurate results.
The Benchmark Battle: Doubao 1.5 Pro vs. OpenAI o1 and DeepSeek

The central claim surrounding Doubao 1.5 Pro is its superior performance on the AIME benchmark, reportedly surpassing OpenAI's o1. Here's a comprehensive comparison, also including DeepSeek's R1 and V3:
| Benchmark | Doubao 1.5 Pro | OpenAI o1 | DeepSeek R1 | DeepSeek V3 | Notes |
|---|---|---|---|---|---|
| AIME (Mathematics) | Higher | Lower | 79.8% | 39.2% | ByteDance claims outperformance of o1; DeepSeek R1 slightly outperforms both, V3 lower |
| MATH-500 | N/A | 96.4% | 97.3% | N/A | DeepSeek R1 outperforms o1 on this more diverse math test |
| Codeforces (Coding) | Comparable | Comparable | 96.3% | N/A | o1 generally performs slightly better than DeepSeek, with Doubao comparable |
| MMLU (General Knowledge) | Comparable | Slightly higher | 90.8% | 88.5% | o1 edges out on general knowledge; DeepSeek V3 also performs well |
| SWE-bench Verified (Coding) | N/A | 48.9% | 49.2% | N/A | DeepSeek R1 has a slight lead |
| DROP (Reasoning) | N/A | N/A | N/A | 91.6% | DeepSeek V3 shows strong performance |
| LOT 3.1 | N/A | N/A | N/A | N/A | Used for long-text reasoning |
Key Takeaway: The Doubao 1.5 Pro appears to have gained an edge in mathematical reasoning as measured by the AIME, whereas o1 maintains a slight edge on general knowledge. DeepSeek R1 and V3 showcase very competitive results, often matching or slightly exceeding o1 in specific areas like math and coding.
DeepSeek V3: A Strong Open-Source Contender
DeepSeek V3 emerges as a noteworthy open-source AI model, demonstrating strong performance across various benchmarks. It achieves 88.5% accuracy on the MMLU benchmark and a notable 91.6% on the DROP benchmark, highlighting its strong reasoning capabilities. DeepSeek V3 is also known for its competitive performance in coding challenges, surpassing Claude-3.5 Sonnet on the Codeforces benchmark, and can handle context window lengths up to 128k tokens.
The Economics of AI Reasoning
Beyond performance, cost is a crucial factor. ByteDance's Doubao 1.5 Pro is priced very aggressively, with the Doubao-1.5-pro-32k version costing as little as 2 yuan (~$0.28 USD) per million tokens. This significantly undercuts both OpenAI's and DeepSeek's pricing, making AI reasoning more accessible. For example, DeepSeek's R1 is priced at 16 yuan (approximately $2.20 USD) per million tokens, while OpenAI's o1 costs considerably more at around 438 yuan (approximately $60 USD) per million tokens. This cost difference could be a considerable advantage for ByteDance, enabling wider adoption of its AI models.
| Model | Cost per Million Tokens (USD) | Context Window | Notes |
|---|---|---|---|
| Doubao 1.5 Pro (32k) | ~$0.28 | 32k | Aggressively priced; entry-level model |
| Doubao 1.5 Pro (256k) | ~$1.26 | 256k | More advanced model, still competitively priced |
| DeepSeek R1 | ~$2.20 | 128k | Competitive model with high performance |
| DeepSeek V3 | N/A | 128k | Open-source, strong coding and reasoning |
| OpenAI o1 | ~$60 | 128k-200k | Higher priced, premium AI model |
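The gap becomes vivid once you multiply these rates by a realistic token volume. The back-of-the-envelope sketch below uses the approximate per-million-token figures from the table; real pricing usually distinguishes input and output tokens and changes over time, so treat these numbers as illustrative only.

```python
# Back-of-the-envelope sketch of the pricing gap described above.
# Rates are the approximate per-million-token figures from the table;
# actual billing typically splits input/output tokens and may differ.

PRICE_PER_MILLION_USD = {
    "Doubao 1.5 Pro (32k)":  0.28,
    "Doubao 1.5 Pro (256k)": 1.26,
    "DeepSeek R1":           2.20,
    "OpenAI o1":             60.00,
}

def monthly_cost(tokens_per_month: int) -> dict[str, float]:
    """Estimated monthly spend for a given token volume."""
    return {model: tokens_per_month / 1_000_000 * rate
            for model, rate in PRICE_PER_MILLION_USD.items()}

# Example: a workload of 500 million tokens per month.
for model, cost in monthly_cost(500_000_000).items():
    print(f"{model:<24} ${cost:>10,.2f}")
# Doubao 1.5 Pro (32k)     ->    $140.00
# OpenAI o1                -> $30,000.00
```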
What This Means for the AI Landscape
The emergence of Doubao 1.5 Pro, DeepSeek's R1 and V3, and the advancements in reasoning methods like "chain of thought" highlight a growing trend: AI innovation is not limited to a select few companies. These advancements demonstrate the potential for a more diverse and competitive AI ecosystem, and in particular they highlight the growth and capabilities of the Chinese AI industry, as well as the power of open-source models.
Increased Competition: The AI market is becoming more competitive, pushing companies to innovate faster and offer better performance at lower prices.
Accessibility: The aggressive pricing of models like Doubao 1.5 Pro makes advanced AI capabilities more accessible to a wider audience, including smaller companies and individuals.
Shifting Power Dynamics: The rise of models from companies like ByteDance and open-source models like DeepSeek challenges the established dominance of Western tech giants in the AI sector.
"o1 Thinking": The Power of Chain of Thought Reasoning
OpenAI's o1 models, like many advanced AI systems, utilize a "chain of thought" (CoT) reasoning approach internally. This technique allows the model to break down complex problems into smaller, more manageable steps, enhancing its reasoning capabilities. Instead of providing a direct answer, the model first generates a series of logical steps, mirroring human-like thought processes. This method improves accuracy, especially for complex tasks, and while other models can utilize it, the o1 series was specifically trained with this capability. Some have referred to this as "oven thinking," as the model internally "cooks" or processes the problem step by step before producing the final answer.
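To make the distinction concrete, the sketch below elicits step-by-step reasoning from a generic chat model by asking for it explicitly in the prompt. The endpoint, model name, and API key are placeholders I've assumed for illustration; reasoning-tuned models such as o1 perform this kind of deliberation internally without being prompted for it.

```python
# Illustrative sketch of chain-of-thought prompting against a generic,
# OpenAI-compatible chat completions API. The URL, key, and model name
# below are placeholders, not a real service.

import json
import urllib.request

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def ask_with_cot(question: str) -> str:
    payload = {
        "model": "generic-chat-model",  # placeholder model name
        "messages": [
            {"role": "system",
             "content": "Reason through the problem step by step, "
                        "then state the final answer on its own line."},
            {"role": "user", "content": question},
        ],
    }
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

# Example AIME-style prompt; the intermediate steps the model writes out
# are what "chain of thought" refers to.
# print(ask_with_cot("How many positive integers less than 1000 are divisible by 7 but not by 11?"))
```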
The Road Ahead for Reasoning Models
The race to build more intelligent AI models is far from over. As companies continue to push the boundaries, we can expect further advancements in reasoning capabilities. Hereβs where this might lead:
Better Reasoning: The development of more advanced models, combined with techniques like "chain of thought", will likely result in AI systems with improved logical, problem-solving, and decision-making abilities. This will lead to more sophisticated applications.
New Applications: We can anticipate AI being applied to a wider range of complex tasks, including scientific research, advanced software development, and more.
Further Cost Optimization: Competition will continue to drive down the cost of AI, making it more accessible and commonplace.
The Rising Tide of AI Innovation
The advancements made by ByteDance with the Doubao 1.5 Pro, the competitive pressure from DeepSeek with R1 and V3, and the ongoing evolution of reasoning methods highlight the dynamic and rapidly evolving nature of artificial intelligence. These models are not just about benchmarks; they represent progress towards making AI more accessible and powerful. As the global AI ecosystem continues to mature, we'll likely see more powerful models from more diverse sources. This ultimately benefits everyone, pushing the boundaries of what's possible with AI.
For further reading on the Doubao large language model, you can explore ByteDanceβs cloud platform, Volcano Engine.
[Chart: data availability status for verified data points across the LLM comparison metrics above]