29. Skeleton of Thought (SoT)

Mini-Project: Skeleton of Thought for Fast Report Generation

Generates a structured outline for a question, expands each section in parallel, and assembles the results into a comprehensive final answer, with lower end-to-end latency than sequential generation.

Description

Skeleton of Thought (SoT) is a reasoning acceleration pattern in which the LLM first generates a high-level skeleton (outline) of its answer and then fills in each point in parallel. Tokens within a point are still decoded sequentially, but the points themselves are expanded concurrently, through parallel API calls or batched decoding, which reduces end-to-end latency for long-form responses.
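The first stage, asking for a skeleton and turning the reply into a list of points, can be sketched as follows. This is a minimal illustration, not the project's actual code: the prompt wording in `SKELETON_PROMPT` and the helper name `parse_skeleton` are assumptions, and the parser assumes the model answers with one numbered point per line.

```python
import re

# Hypothetical prompt wording (an assumption); tune it for your model.
SKELETON_PROMPT = (
    "Give only the skeleton (not the full content) for answering the question.\n"
    "Write 3 to 8 numbered points, each a short phrase.\n"
    "Question: {question}\n"
    "Skeleton:\n"
)

# Matches lines such as "1. Background" or "2) Key risks".
_POINT_RE = re.compile(r"^\s*(\d+)[.)]\s+(.+?)\s*$")


def parse_skeleton(text: str) -> list[str]:
    """Return the skeleton points in order, ignoring lines that are not numbered."""
    points = []
    for line in text.splitlines():
        match = _POINT_RE.match(line)
        if match:
            points.append(match.group(2))
    return points


if __name__ == "__main__":
    # A made-up model reply, used only to exercise the parser.
    reply = "Here is the skeleton:\n1. Market overview\n2) Key risks\n3. Recommendations\n"
    print(parse_skeleton(reply))
```

Keeping the parser tolerant of extra preamble lines matters in practice, because a malformed skeleton would otherwise stall every downstream expansion.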

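The SoT paper (Ning et al., 2023) reported speedups of up to about 2x with comparable quality, and better quality on some question categories. The speedup comes from expanding the points concurrently; each point is also generated from a short, focused prompt rather than a long chain of sequential dependencies.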

When to Use

  • Long-form content generation (articles, reports, documentation)
  • When latency is critical and the response has natural sections
  • Tasks where an outline naturally precedes detailed writing
  • When parallel generation infrastructure (concurrent API calls or batched decoding) is available

Benefits

| Benefit | Description |
|---|---|
| Speed | Parallel expansion of sections reduces latency |
| Structure | Outline-first approach ensures coherent organization |
| Quality | Each section gets focused context for generation |
| Scalability | More sections allow more parallelism and more speedup, up to the backend's concurrency limit |
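To make the speed and scalability rows concrete, here is a back-of-the-envelope latency model. It is a simplification under stated assumptions: times are in seconds, each section's generation time is known in advance, sections are handed to whichever worker frees up first, and prompt overhead is ignored. The function names `sequential_latency` and `sot_latency` are illustrative only.

```python
import heapq


def sequential_latency(section_times: list[float]) -> float:
    """Time to generate all sections one after another."""
    return sum(section_times)


def sot_latency(skeleton_time: float, section_times: list[float], max_parallel: int) -> float:
    """Skeleton time plus the makespan of expanding sections on max_parallel workers."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    # Each heap entry is the time at which a worker becomes free.
    # All workers start once the skeleton has been generated.
    n_workers = min(max_parallel, max(len(section_times), 1))
    workers = [skeleton_time] * n_workers
    heapq.heapify(workers)
    for duration in section_times:
        free_at = heapq.heappop(workers)  # earliest available worker
        heapq.heappush(workers, free_at + duration)
    return max(workers)


if __name__ == "__main__":
    sections = [4.0, 4.0, 4.0, 4.0]
    print(sequential_latency(sections))                              # 16.0
    print(sot_latency(1.0, sections, max_parallel=4))                # 5.0
    print(sot_latency(1.0, sections, max_parallel=2))                # 9.0
```

In this toy model the speedup is bounded by the skeleton step plus the slowest worker's load, so adding sections helps only while free workers remain.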

Architecture Diagram

flowchart TD
    A[Question] --> B[Generate Skeleton]
    B --> C[Point 1]
    B --> D[Point 2]
    B --> E[Point 3]
    B --> F[Point N]
    C --> G[Expand Point 1]
    D --> H[Expand Point 2]
    E --> I[Expand Point 3]
    F --> J[Expand Point N]
    G --> K[Assemble Final Answer]
    H --> K
    I --> K
    J --> K

    style A fill:#4CAF50,color:#fff
    style B fill:#FF5722,color:#fff
    style G fill:#2196F3,color:#fff
    style H fill:#2196F3,color:#fff
    style I fill:#2196F3,color:#fff
    style J fill:#2196F3,color:#fff
    style K fill:#4CAF50,color:#fff
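
The fan-out and fan-in shown in the diagram can be sketched with a thread pool. This is a minimal sketch, not the project's implementation: `run_sot`, `generate_skeleton`, and `expand_point` are hypothetical names, and the two callables stand in for whatever LLM client you use. Threads suit this case because each expansion is an I/O-bound API call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_sot(
    question: str,
    generate_skeleton: Callable[[str], list[str]],
    expand_point: Callable[[str, str], str],
    max_workers: int = 8,
) -> str:
    """Generate a skeleton, expand every point concurrently, and assemble the answer."""
    points = generate_skeleton(question)
    if not points:
        raise ValueError("skeleton is empty; nothing to expand")

    with ThreadPoolExecutor(max_workers=min(max_workers, len(points))) as pool:
        # Executor.map returns results in input order, so sections stay aligned
        # with their skeleton points even if they finish out of order.
        sections = list(pool.map(lambda point: expand_point(question, point), points))

    return "\n\n".join(
        f"{number}. {point}\n{body}"
        for number, (point, body) in enumerate(zip(points, sections), start=1)
    )


# Stand-ins for real LLM calls (assumptions, for demonstration only).
def fake_skeleton(question: str) -> list[str]:
    return ["Overview", "Risks"]


def fake_expand(question: str, point: str) -> str:
    return f"Details about {point.lower()}."


if __name__ == "__main__":
    print(run_sot("How should we enter the market?", fake_skeleton, fake_expand))
```

Passing the LLM calls in as functions keeps the orchestration testable without network access; an `asyncio` version with a semaphore would be an equally reasonable choice for async clients.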