29. Skeleton of Thought (SoT)
Mini-Project: Skeleton of Thought for Fast Report Generation
Generates a structured outline for a question, expands each section in parallel, and assembles the sections into a final answer, which can cut end-to-end latency compared with generating the whole answer sequentially.
Description
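Skeleton of Thought (SoT) is a latency-reduction pattern in which the LLM first generates a high-level skeleton (outline) of its answer, then expands each skeleton point in parallel. Each expansion is still decoded token by token, but the expansions run concurrently (as parallel API calls or as one batched decode) instead of as a single long sequential generation, which reduces end-to-end latency for long-form responses. Because points are expanded independently, the pattern suits answers whose sections do not depend on one another.
The SoT paper (Ning et al., 2023) reported speedups of up to 2.39x with comparable, and in some question categories better, answer quality. The speedup comes from expanding points concurrently; each expansion also works from a short, focused prompt rather than a long sequential context.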
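The skeleton stage described above can be sketched as a prompt plus a small parser. This is a minimal illustration, not the paper's implementation: the prompt wording in `SKELETON_PROMPT` and the helper name `parse_skeleton` are assumptions made for this example, and the sample text stands in for a real model response.

```python
import re

# Hypothetical prompt wording (an assumption for this sketch); the prompts
# used in the SoT paper differ in detail.
SKELETON_PROMPT = (
    "Give only the skeleton of the answer as a short numbered list. "
    "Do not write the full content.\n"
    "Question: {question}\nSkeleton:"
)

def parse_skeleton(text: str) -> list[str]:
    """Extract numbered points such as '1. Intro' or '2) Methods'."""
    points = []
    for line in text.splitlines():
        match = re.match(r"^\s*\d+[.)]\s+(.*\S)\s*$", line)
        if match:
            points.append(match.group(1))
    return points

# Stand-in for a model response; lines that are not numbered are ignored.
sample = "1. Market overview\n2. Key risks\nnot a point\n3) Recommendations"
print(parse_skeleton(sample))
# → ['Market overview', 'Key risks', 'Recommendations']
```

Parsing the skeleton into a plain list keeps the next stage simple: each list entry becomes one independent expansion request.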
When to Use
- Long-form content generation (articles, reports, documentation)
- When latency is critical and the response has natural sections
- Tasks where an outline naturally precedes detailed writing
- When parallel generation infrastructure (concurrent API calls or batched decoding) is available
Benefits
| Benefit | Description |
|---|---|
| Speed | Parallel expansion of sections reduces latency |
| Structure | Outline-first approach ensures coherent organization |
| Quality | Each section gets focused context for generation |
| Scalability | More independent sections allow more parallelism, up to the concurrency the backend supports |
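The speed and scalability rows can be made concrete with a toy latency model. It assumes, for illustration only, that sections are expanded in fixed-size waves and that concurrent requests do not slow each other down; the timings are invented numbers, not measurements, and the function names are hypothetical.

```python
def sequential_latency(section_seconds: list[float]) -> float:
    """All sections generated one after another in a single response."""
    return sum(section_seconds)

def sot_latency(skeleton_seconds: float,
                section_seconds: list[float],
                max_parallel: int) -> float:
    """Skeleton first, then sections in waves of at most `max_parallel`.

    Each wave takes as long as its slowest section (simplifying assumption).
    """
    total = skeleton_seconds
    for start in range(0, len(section_seconds), max_parallel):
        wave = section_seconds[start:start + max_parallel]
        total += max(wave)
    return total

sections = [4.0, 6.0, 5.0, 5.0]   # illustrative per-section times in seconds
baseline = sequential_latency(sections)            # 20.0
wide = sot_latency(2.0, sections, max_parallel=4)  # 2.0 + 6.0 = 8.0
narrow = sot_latency(2.0, sections, max_parallel=2)  # 2.0 + 6.0 + 5.0 = 13.0

print(baseline / wide)  # → 2.5
```

Under this model the speedup is bounded by the skeleton call plus the slowest section in each wave, so adding sections helps only while the backend can actually run them concurrently.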
Architecture Diagram
flowchart TD
A[Question] --> B[Generate Skeleton]
B --> C[Point 1]
B --> D[Point 2]
B --> E[Point 3]
B --> F[Point N]
C --> G[Expand Point 1]
D --> H[Expand Point 2]
E --> I[Expand Point 3]
F --> J[Expand Point N]
G --> K[Assemble Final Answer]
H --> K
I --> K
J --> K
style A fill:#4CAF50,color:#fff
style B fill:#FF5722,color:#fff
style G fill:#2196F3,color:#fff
style H fill:#2196F3,color:#fff
style I fill:#2196F3,color:#fff
style J fill:#2196F3,color:#fff
style K fill:#4CAF50,color:#fff
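The fan-out and fan-in in the diagram can be sketched with a thread pool. This is a minimal sketch under stated assumptions: `skeleton_of_thought` and `fake_llm` are hypothetical names, the `OUTLINE` / `EXPAND` prompt prefixes are invented for the demo, and `fake_llm` is a deterministic stand-in for a real model client.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def skeleton_of_thought(question: str,
                        llm: Callable[[str], str],
                        max_workers: int = 4) -> str:
    # Stage 1: one call produces the skeleton as a numbered list.
    outline = llm(f"OUTLINE\nQuestion: {question}")
    points = []
    for line in outline.splitlines():
        number, sep, title = line.strip().partition(". ")
        if sep and number.isdigit():
            points.append(title.strip())

    # Stage 2: expand every point independently and concurrently.
    def expand(point: str) -> str:
        return llm(f"EXPAND\nQuestion: {question}\nPoint: {point}")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        bodies = list(pool.map(expand, points))  # map() keeps input order

    # Stage 3: assemble the sections in skeleton order.
    return "\n\n".join(
        f"{i}. {point}\n{body}"
        for i, (point, body) in enumerate(zip(points, bodies), start=1)
    )

def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model call (assumption for the demo)."""
    if prompt.startswith("OUTLINE"):
        return "1. Background\n2. Findings\n3. Next steps"
    point = prompt.rsplit("Point: ", 1)[1]
    return f"Details about {point.lower()}."

report = skeleton_of_thought("How did Q3 go?", fake_llm)
print(report)
```

Threads are a reasonable fit when each expansion is a network-bound API call; a locally hosted model would more likely expand the points as one batched decode instead.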