Streaming Responses in Generative AI - Nigel Lee Digest

🎯 Quick Takeaway

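stream: true is an LLM API parameter that determines whether the AI's reply is returned all at once or delivered incrementally, chunk by chunk, as it is generated.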

🤔 Why Stream?

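Traditional approach (stream: false)

Client → API: "How are you?" → Server
Server → processing... (waits 3 seconds)
Server → returns the complete reply: "I'm doing well, thanks for asking!"
Client → displays the text

The user has to wait until the model has finished generating the entire reply before seeing anything. It feels like this:

"I hit send, waited 3 seconds, and then a whole block of text suddenly popped up."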

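Streaming approach (stream: true)

Client → API: "How are you?" → Server
Server → "I'm" → Client (0.1 s)
Server → " doing well," → Client (0.2 s)
Server → " thanks for" → Client (0.3 s)
Server → " asking!" → Client (0.4 s)
Client → displays the text as it arrives; to the user it feels like "the AI is typing to me"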
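The difference between the two flows can be sketched without calling any real API. The snippet below is a minimal simulation, assuming a made-up reply and an arbitrary per-word delay; `fake_complete`, `fake_stream`, and `seconds_until` are hypothetical helper names used only for illustration. It compares how long a client waits before it has anything to display.

```python
import time
from typing import Callable, Iterator

FULL_REPLY = "I'm doing well, thanks for asking!"


def fake_complete(delay: float = 0.05) -> str:
    """Non-streaming: sleep for the whole generation time, then return everything."""
    time.sleep(delay * len(FULL_REPLY.split(" ")))
    return FULL_REPLY


def fake_stream(delay: float = 0.05) -> Iterator[str]:
    """Streaming: yield each word as soon as it has been 'generated'."""
    for i, word in enumerate(FULL_REPLY.split(" ")):
        time.sleep(delay)
        # Re-attach the space that split() removed, except before the first word.
        yield word if i == 0 else " " + word


def seconds_until(action: Callable[[], object]) -> float:
    """Measure how long a single call takes."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


if __name__ == "__main__":
    blocking = seconds_until(fake_complete)
    first_chunk = seconds_until(lambda: next(fake_stream()))
    print(f"non-streaming: first visible text after {blocking:.2f}s")
    print(f"streaming:     first visible text after {first_chunk:.2f}s")
```

The total generation time is the same in both cases; what changes is how soon the first piece of text can be shown.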

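📊 Comparing the Two Modes

Feature	stream: false	stream: true
Time to first output	After the full reply is ready	Almost immediately
User experience	Feels laggy	Smooth and natural
Implementation complexity	Simple	Requires handling SSE
Best suited for	Short replies, background jobs	Conversations, long-form generation
API behavior	A single complete HTTP response	A continuous SSE stream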

💻 Code Examples

Python + OpenAI SDK

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# ❌ Non-streaming: wait for the complete reply
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=False  # the default
)
print(response.choices[0].message.content)

# ✅ Streaming: print the reply piece by piece
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True  # enable streaming
)

for chunk in stream:
    # Some chunks carry no choices or no text (e.g. the final one), so guard both
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # finish with a newline
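Printing chunks as they arrive is often not enough: most applications also need the full reply afterwards, for example to store it in the conversation history. Here is a minimal sketch of collecting the pieces into one string. To keep it runnable without an API key, `fake_chunk` is a hypothetical stand-in that only mimics the `choices[0].delta.content` attribute shape used in the loop above; it is not part of the OpenAI SDK.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_text(chunks: Iterable) -> str:
    """Join the delta.content pieces of a chat-completion stream into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            # Defensive: skip chunks that carry no choices at all.
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            # content can be None or empty on chunks that carry no text.
            parts.append(piece)
    return "".join(parts)


def fake_chunk(content):
    """Hypothetical stand-in with the same attribute shape as a streamed chunk."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [
    fake_chunk(None),
    fake_chunk("Why did "),
    fake_chunk("the chicken..."),
    SimpleNamespace(choices=[]),  # a chunk with no choices
]

if __name__ == "__main__":
    print(collect_text(demo_stream))
```

With the real SDK, the same function could presumably be handed the stream object directly, since it only relies on the attributes already used in the loop above.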

Node.js

import OpenAI from 'openai';

const client = new OpenAI({ apiKey: 'your-api-key' });

const stream = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Write a poem' }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}

🔧 The Technology Behind Streaming: Server-Sent Events (SSE)

Stream mode uses Server-Sent Events, a one-way communication mechanism built on top of HTTP. A simplified response looks like this:

HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"content": "I'm"}

data: {"content": " fine"}

data: {"content": "!"}

data: [DONE]

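Characteristics:

✅ One-way channel (the server pushes to the client)
✅ Automatic reconnection (built into the browser's EventSource API)
✅ Simple format (each payload line starts with data:)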
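To make the wire format concrete, here is a minimal sketch of parsing such a stream by hand. It assumes the simplified `{"content": ...}` payload shape used in the example above and treats `data: [DONE]` as the end-of-stream sentinel; `iter_sse_data` and `read_contents` are hypothetical helper names, and real provider payloads are more deeply nested.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # lines starting with a colon are comments
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    if buffer:
        # Lenient: flush a trailing event that lacks its final blank line.
        yield "\n".join(buffer)


def read_contents(lines: Iterable[str]) -> List[str]:
    """Decode each JSON payload until the [DONE] sentinel is seen."""
    contents = []
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":
            break
        contents.append(json.loads(payload)["content"])
    return contents


sample_lines = [
    'data: {"content": "I\'m"}', "",
    'data: {"content": " fine"}', "",
    'data: {"content": "!"}', "",
    "data: [DONE]", "",
]

if __name__ == "__main__":
    print("".join(read_contents(sample_lines)))
```

In practice the SDKs shown earlier do this parsing for you; a hand-written parser is mainly useful when calling an endpoint through a plain HTTP client.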

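🎨 Real-World Use Cases

1. Chatbots

[AI is typing...]

Like a LINE or Discord bot, the interface can show a "typing" effect while the reply streams in.

2. Code generation assistants

Watching the AI write code line by line makes the logic easier to follow than waiting for the finished result.

3. Long-article summarization

The summary appears as it is produced, so there is no need to sit and wait.

4. AI read-aloud scripts

Text appears chunk by chunk and can be paired with speech synthesis for a fluid experience.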

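⚠️ Things to Watch Out For

Error handling: a stream can be cut off by network problems, so a retry mechanism is needed
Token counting: in stream mode, the total token count can only be tallied after the stream has finished
Compatibility: not every LLM provider supports streaming (some local models, for example)
Buffering: the client should buffer a small amount of text so that words do not jump around as lines re-wrap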
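The buffering point can be sketched as a small generator that holds back a partial word until the whitespace after it has arrived. This is a minimal illustration, assuming space-delimited text; `buffer_words` is a hypothetical helper name, and text without spaces (Chinese, for example) would need a different boundary rule, such as a size or time limit.

```python
from typing import Iterable, Iterator


def buffer_words(chunks: Iterable[str]) -> Iterator[str]:
    """Re-chunk a text stream so that every emitted piece ends at whitespace."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        # Find the last space or newline received so far.
        cut = max(pending.rfind(" "), pending.rfind("\n"))
        if cut >= 0:
            yield pending[: cut + 1]      # everything up to and including it
            pending = pending[cut + 1:]   # keep the unfinished word for later
    if pending:
        yield pending  # flush whatever is left when the stream ends


if __name__ == "__main__":
    raw_chunks = ["Stre", "aming is ", "smoo", "th and fast"]
    for piece in buffer_words(raw_chunks):
        print(repr(piece))
```

The trade-off is a slight delay on each word in exchange for text that does not visibly re-wrap mid-word.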

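🚀 Summary

stream: true turns an AI reply from "mailing a letter" into "a live conversation":

Analogy	stream: false	stream: true
Like...	Sending an email	Sending a LINE message
Feel	Waiting for it to load	A smooth conversation
Perceived latency	Noticeable	Barely perceptible

In the era of conversational AI, streaming is an essential feature. Try adding stream: true to your project and experience the fluid feeling of "the AI is talking with me."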