feat: add private chat (p2p) message collection via user_access_token

- Add user_access_token support to api_get/api_post for user-identity API calls - Add fetch_p2p_messages() to collect both sides of a private conversation - Extend collect_messages() to combine p2p + group chat messages - Add --exchange-code to convert OAuth code to user_access_token - Add --user-token, --p2p-chat-id, --open-id CLI flags - Update SKILL.md with p2p collection flow and permission requirements Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 22:59:06 +08:00 · 2026-04-02 20:18:58 +08:00
parent ce30311a72
commit a2b6ef3903
2 changed files with 331 additions and 44 deletions
--- a/SKILL.md
+++ b/SKILL.md
@@ -104,7 +104,7 @@ allowed-tools: Read, Write, Edit, Bash
 python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --setup
 ```

-配置完成后，只需输入姓名，自动完成所有采集：
+**群聊采集**（使用 tenant_access_token，需 bot 在群内）：
 ```bash
 python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
  --name "{name}" \
@@ -113,19 +113,51 @@ python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
  --doc-limit 20
 ```

+**私聊采集**（需要 user_access_token + 私聊 chat_id）：
+
+私聊消息需要用户本人授权，步骤：
+1. 飞书应用需开通**用户权限**：`im:message`、`im:chat`
+2. 生成 OAuth 授权链接（替换 APP_ID）：
+   ```
+   https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat
+   ```
+3. 用户在浏览器打开授权，从回调 URL 复制 code
+4. 换取 token：
+   ```bash
+   python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --exchange-code {CODE}
+   ```
+5. 采集私聊（需提供私聊 chat_id，可在发送消息的 API 返回值中获取）：
+   ```bash
+   python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
+     --name "{name}" \
+     --p2p-chat-id oc_xxx \
+     --output-dir ./knowledge/{slug} \
+     --msg-limit 1000
+   ```
+6. 也可以直接指定 open_id 跳过用户搜索：
+   ```bash
+   python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
+     --open-id ou_xxx \
+     --p2p-chat-id oc_xxx \
+     --name "{name}"
+   ```
+
 自动采集内容：
- 所有与他共同群聊中他发出的消息（过滤系统消息、表情包）
+- 群聊：所有与他共同群聊中他发出的消息（过滤系统消息、表情包）
+- 私聊：与他的私聊完整对话（含双方消息，用于理解对话语境）
 - 他创建/编辑的飞书文档和 Wiki
 - 相关多维表格（如有权限）

 采集完成后用 `Read` 读取输出目录下的文件：
- `knowledge/{slug}/messages.txt` → 消息记录
+- `knowledge/{slug}/messages.txt` → 消息记录（群聊 + 私聊）
 - `knowledge/{slug}/docs.txt` → 文档内容
 - `knowledge/{slug}/collection_summary.json` → 采集摘要

-如果采集失败（权限不足 / bot 未加群），告知用户需要：
-1. 将飞书 App bot 添加到相关群聊
-2. 或改用方式 B/C
+如果采集失败，告知用户检查：
+- 群聊采集：bot 是否已添加到相关群聊
+- 私聊采集：是否已配置 user_access_token 和 p2p_chat_id
+- 权限：应用是否开通了 im:message 和 im:chat 用户权限
+- 或改用方式 B/C

 ---

@@ -521,7 +553,7 @@ First-time setup:
 python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --setup
 ```

-After setup, just enter the name:
+**Group chat collection** (uses tenant_access_token, bot must be in the group):
 ```bash
 python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
  --name "{name}" \
@@ -530,19 +562,50 @@ python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
  --doc-limit 20
 ```

+**Private chat (P2P) collection** (requires user_access_token + p2p chat_id):
+
+Private messages require user authorization:
+1. Enable **user scopes** in Feishu app: `im:message`, `im:chat`
+2. Generate OAuth URL (replace APP_ID):
+   ```
+   https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat
+   ```
+3. User opens URL in browser, authorizes, copies code from callback URL
+4. Exchange code for token:
+   ```bash
+   python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --exchange-code {CODE}
+   ```
+5. Collect with p2p chat_id:
+   ```bash
+   python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
+     --name "{name}" \
+     --p2p-chat-id oc_xxx \
+     --output-dir ./knowledge/{slug}
+   ```
+6. Or skip user search by providing open_id directly:
+   ```bash
+   python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
+     --open-id ou_xxx \
+     --p2p-chat-id oc_xxx \
+     --name "{name}"
+   ```
+
 Auto-collected content:
- All messages sent by them in shared group chats (system messages and stickers filtered)
+- Group chats: messages sent by them (system messages and stickers filtered)
+- Private chats: full conversation with both parties (for context understanding)
 - Feishu docs and Wikis they created/edited
 - Related spreadsheets (if accessible)

 After collection, `Read` the output files:
- `knowledge/{slug}/messages.txt` → messages
+- `knowledge/{slug}/messages.txt` → messages (group + private)
 - `knowledge/{slug}/docs.txt` → document content
 - `knowledge/{slug}/collection_summary.json` → collection summary

-If collection fails (insufficient permissions / bot not in chat), inform user to:
-1. Add the Feishu App bot to relevant group chats
-2. Or switch to Option B/C
+If collection fails, check:
+- Group chat: bot must be added to relevant group chats
+- Private chat: user_access_token and p2p_chat_id must be configured
+- Permissions: app must have im:message and im:chat user scopes enabled
+- Or switch to Option B/C

 ---

--- a/tools/feishu_auto_collector.py
+++ b/tools/feishu_auto_collector.py
@@ -5,17 +5,38 @@
 输入同事姓名，自动：
  1. 搜索飞书用户，获取 user_id
  2. 找到与他共同的群聊，拉取他的消息记录
-  3. 搜索他创建/编辑的文档和 Wiki
-  4. 拉取文档内容
-  5. 拉取多维表格（如有）
-  6. 输出统一格式，直接进 create-colleague 分析流程
+  3. 拉取私聊消息（需要 user_access_token）
+  4. 搜索他创建/编辑的文档和 Wiki
+  5. 拉取文档内容
+  6. 拉取多维表格（如有）
+  7. 输出统一格式，直接进 create-colleague 分析流程

 前置：
  python3 feishu_auto_collector.py --setup   # 配置 App ID / Secret（一次性）

+私聊采集（需额外步骤）：
+  1. 飞书应用开通用户权限：im:message, im:chat
+  2. 获取 OAuth 授权码：
+     浏览器打开: https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat
+     授权后从地址栏复制 code
+  3. 换取 token：
+     python3 feishu_auto_collector.py --exchange-code {CODE}
+  4. 采集时指定私聊 chat_id：
+     python3 feishu_auto_collector.py --name "张三" --p2p-chat-id oc_xxx
+
 用法：
+  # 群聊采集（原有方式）
  python3 feishu_auto_collector.py --name "张三" --output-dir ./knowledge/zhangsan
  python3 feishu_auto_collector.py --name "张三" --msg-limit 1000 --doc-limit 20
+
+  # 私聊采集
+  python3 feishu_auto_collector.py --name "张三" --p2p-chat-id oc_xxx
+
+  # 直接指定 open_id + 私聊（跳过用户搜索）
+  python3 feishu_auto_collector.py --open-id ou_xxx --p2p-chat-id oc_xxx --name "张三"
+
+  # 换取 user_access_token
+  python3 feishu_auto_collector.py --exchange-code {CODE}
 """

 from __future__ import annotations
@@ -57,11 +78,15 @@ def setup_config() -> None:
    print("=== 飞书自动采集配置 ===\n")
    print("请前往 https://open.feishu.cn 创建企业自建应用，开通以下权限：")
    print()
-    print("  消息类：")
+    print("  消息类（应用权限，用于群聊采集）：")
    print("    im:message:readonly          读取消息")
    print("    im:chat:readonly             读取群聊信息")
    print("    im:chat.members:readonly     读取群成员")
    print()
+    print("  消息类（用户权限，用于私聊采集）：")
+    print("    im:message                   以用户身份读取/发送消息")
+    print("    im:chat                      以用户身份读取会话列表")
+    print()
    print("  用户类：")
    print("    contact:user.base:readonly       读取用户基本信息")
    print("    contact:department.base:readonly  遍历部门查找用户（按姓名搜索必需）")
@@ -74,11 +99,26 @@ def setup_config() -> None:
    print("  多维表格：")
    print("    bitable:app:readonly         读取多维表格")
    print()
+    print("  ─── 私聊采集说明 ───")
+    print("  私聊消息必须通过 user_access_token 获取（应用身份无权访问私聊）。")
+    print("  获取方式：OAuth 授权，授权链接格式：")
+    print("    https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri={REDIRECT}&scope=im:message%20im:chat")
+    print("  授权后从回调 URL 中取 code，用 --exchange-code 换取 token。")
+    print()

    app_id = input("App ID (cli_xxx): ").strip()
    app_secret = input("App Secret: ").strip()

    config = {"app_id": app_id, "app_secret": app_secret}
+
+    print("\n是否配置 user_access_token？（用于私聊消息采集，可跳过）")
+    user_token = input("user_access_token (留空跳过): ").strip()
+    if user_token:
+        config["user_access_token"] = user_token
+    p2p_chat_id = input("私聊 chat_id (留空跳过): ").strip()
+    if p2p_chat_id:
+        config["p2p_chat_id"] = p2p_chat_id
+
    save_config(config)
    print(f"\n✅ 配置已保存到 {CONFIG_PATH}")

@@ -110,8 +150,11 @@ def get_tenant_token(config: dict) -> str:
    return token


-def api_get(path: str, params: dict, config: dict) -> dict:
-    token = get_tenant_token(config)
+def api_get(path: str, params: dict, config: dict, use_user_token: bool = False) -> dict:
+    if use_user_token and config.get("user_access_token"):
+        token = config["user_access_token"]
+    else:
+        token = get_tenant_token(config)
    resp = requests.get(
        f"{BASE_URL}{path}",
        params=params,
@@ -121,8 +164,11 @@ def api_get(path: str, params: dict, config: dict) -> dict:
    return resp.json()


-def api_post(path: str, body: dict, config: dict) -> dict:
-    token = get_tenant_token(config)
+def api_post(path: str, body: dict, config: dict, use_user_token: bool = False) -> dict:
+    if use_user_token and config.get("user_access_token"):
+        token = config["user_access_token"]
+    else:
+        token = get_tenant_token(config)
    resp = requests.post(
        f"{BASE_URL}{path}",
        json=body,
@@ -132,6 +178,22 @@ def api_post(path: str, body: dict, config: dict) -> dict:
    return resp.json()


+def exchange_code_for_token(code: str, config: dict) -> dict:
+    """用 OAuth 授权码换取 user_access_token"""
+    app_token = get_tenant_token(config)
+    resp = requests.post(
+        f"{BASE_URL}/authen/v1/oidc/access_token",
+        headers={"Authorization": f"Bearer {app_token}"},
+        json={"grant_type": "authorization_code", "code": code},
+        timeout=10,
+    )
+    data = resp.json()
+    if data.get("code") != 0:
+        print(f"换取 token 失败：{data}", file=sys.stderr)
+        return {}
+    return data.get("data", {})
+
+
 # ─── 用户搜索 ─────────────────────────────────────────────────────────────────

 def _find_user_by_contact(name: str, config: dict) -> Optional[dict]:
@@ -421,42 +483,156 @@ def fetch_messages_from_chat(
    return messages[:limit]


+def fetch_p2p_messages(
+    chat_id: str,
+    user_open_id: str,
+    limit: int,
+    config: dict,
+) -> list:
+    """使用 user_access_token 从私聊会话拉取消息（包含双方所有消息）"""
+    messages = []
+    page_token = None
+
+    while len(messages) < limit:
+        params = {
+            "container_id_type": "chat",
+            "container_id": chat_id,
+            "page_size": 50,
+            "sort_type": "ByCreateTimeDesc",
+        }
+        if page_token:
+            params["page_token"] = page_token
+
+        data = api_get("/im/v1/messages", params, config, use_user_token=True)
+        if data.get("code") != 0:
+            print(f"  拉取私聊消息失败（code={data.get('code')}）：{data.get('msg')}", file=sys.stderr)
+            break
+
+        items = data.get("data", {}).get("items", [])
+        if not items:
+            break
+
+        for item in items:
+            sender = item.get("sender", {})
+            sender_id = sender.get("id") or sender.get("open_id", "")
+
+            # 解析消息内容
+            content_raw = item.get("body", {}).get("content", "")
+            try:
+                content_obj = json.loads(content_raw)
+                if isinstance(content_obj, dict):
+                    # 纯文本消息
+                    if "text" in content_obj:
+                        content = content_obj["text"]
+                    else:
+                        # 富文本消息
+                        text_parts = []
+                        for line in content_obj.get("content", []):
+                            for seg in line:
+                                if seg.get("tag") in ("text", "a"):
+                                    text_parts.append(seg.get("text", ""))
+                        content = " ".join(text_parts)
+                else:
+                    content = str(content_obj)
+            except Exception:
+                content = content_raw
+
+            content = content.strip()
+            if not content or content in ("[图片]", "[文件]", "[表情]", "[语音]"):
+                continue
+
+            ts = item.get("create_time", "")
+            if ts:
+                try:
+                    ts = datetime.fromtimestamp(int(ts) / 1000).strftime("%Y-%m-%d %H:%M")
+                except Exception:
+                    pass
+
+            is_target = (sender_id == user_open_id)
+            messages.append({
+                "content": content,
+                "time": ts,
+                "sender_id": sender_id,
+                "is_target": is_target,
+            })
+
+        if not data.get("data", {}).get("has_more"):
+            break
+        page_token = data.get("data", {}).get("page_token")
+
+    return messages[:limit]
+
+
 def collect_messages(
    user: dict,
    msg_limit: int,
    config: dict,
 ) -> str:
-    """采集目标用户的所有消息记录"""
+    """采集目标用户的所有消息记录（群聊 + 私聊）"""
    user_open_id = user.get("open_id") or user.get("user_id", "")
    name = user.get("name", "")

-    chats = get_chats_with_user(user_open_id, config)
-    if not chats:
-        return f"# 消息记录\n\n未找到与 {name} 共同的群聊（请确认 bot 已被添加到相关群）\n"
-
    all_messages = []
-    per_chat_limit = max(100, msg_limit // len(chats))
+    chat_sources = []

-    for chat in chats:
-        chat_id = chat.get("chat_id")
-        chat_name = chat.get("name", chat_id)
-        print(f"  拉取「{chat_name}」消息 ...", file=sys.stderr)
+    # ── 私聊采集（需要 user_access_token + p2p_chat_id）──
+    p2p_chat_id = config.get("p2p_chat_id", "")
+    user_token = config.get("user_access_token", "")

-        msgs = fetch_messages_from_chat(chat_id, user_open_id, per_chat_limit, config)
-        for m in msgs:
-            m["chat"] = chat_name
-        all_messages.extend(msgs)
-        print(f"    获取 {len(msgs)} 条", file=sys.stderr)
+    if user_token and p2p_chat_id:
+        print(f"  📱 采集私聊消息（chat_id: {p2p_chat_id}）...", file=sys.stderr)
+        p2p_msgs = fetch_p2p_messages(p2p_chat_id, user_open_id, msg_limit, config)
+        for m in p2p_msgs:
+            m["chat"] = "私聊"
+        all_messages.extend(p2p_msgs)
+        chat_sources.append(f"私聊（{len(p2p_msgs)} 条）")
+        print(f"    获取 {len(p2p_msgs)} 条私聊消息", file=sys.stderr)
+    elif user_token and not p2p_chat_id:
+        print(f"  ⚠️  有 user_access_token 但未配置 p2p_chat_id，跳过私聊采集", file=sys.stderr)
+        print(f"     请在配置中添加 p2p_chat_id（通过发送消息 API 返回值获取）", file=sys.stderr)
+
+    # ── 群聊采集（使用 tenant_access_token）──
+    remaining = msg_limit - len(all_messages)
+    if remaining > 0:
+        chats = get_chats_with_user(user_open_id, config)
+        if chats:
+            per_chat_limit = max(100, remaining // len(chats))
+            for chat in chats:
+                chat_id = chat.get("chat_id")
+                chat_name = chat.get("name", chat_id)
+                print(f"  拉取「{chat_name}」消息 ...", file=sys.stderr)
+
+                msgs = fetch_messages_from_chat(chat_id, user_open_id, per_chat_limit, config)
+                for m in msgs:
+                    m["chat"] = chat_name
+                all_messages.extend(msgs)
+                chat_sources.append(f"{chat_name}（{len(msgs)} 条）")
+                print(f"    获取 {len(msgs)} 条", file=sys.stderr)
+
+    if not all_messages:
+        tips = f"# 消息记录\n\n未找到 {name} 的消息记录。\n\n"
+        tips += "可能原因：\n"
+        tips += "  - 群聊采集：bot 未被添加到相关群聊\n"
+        tips += "  - 私聊采集：未配置 user_access_token 或 p2p_chat_id\n"
+        tips += "\n私聊采集配置方法：\n"
+        tips += "  1. 在飞书开放平台开通 im:message 和 im:chat 用户权限\n"
+        tips += "  2. 通过 OAuth 授权获取 user_access_token（--exchange-code）\n"
+        tips += "  3. 配置 p2p_chat_id（私聊会话 ID）\n"
+        return tips

    # 分类输出
-    long_msgs = [m for m in all_messages if len(m.get("content", "")) > 50]
-    short_msgs = [m for m in all_messages if len(m.get("content", "")) <= 50]
+    # 私聊消息包含双方对话，标注发言人
+    target_msgs = [m for m in all_messages if m.get("is_target", True)]
+    other_msgs = [m for m in all_messages if not m.get("is_target", True)]
+
+    long_msgs = [m for m in target_msgs if len(m.get("content", "")) > 50]
+    short_msgs = [m for m in target_msgs if len(m.get("content", "")) <= 50]

    lines = [
        f"# 飞书消息记录（自动采集）",
        f"目标：{name}",
-        f"来源群聊：{', '.join(c.get('name', '') for c in chats)}",
-        f"共 {len(all_messages)} 条消息",
+        f"来源：{', '.join(chat_sources)}",
+        f"共 {len(all_messages)} 条消息（目标用户 {len(target_msgs)} 条，对话方 {len(other_msgs)} 条）",
        "",
        "---",
        "",
@@ -471,6 +647,16 @@ def collect_messages(
    for m in short_msgs[:300]:
        lines.append(f"[{m.get('time', '')}] {m['content']}")

+    # 私聊对话上下文（保留双方对话，便于理解语境）
+    p2p_msgs = [m for m in all_messages if m.get("chat") == "私聊"]
+    if p2p_msgs:
+        lines += ["", "---", "", "## 私聊对话上下文（含双方消息）", ""]
+        # 按时间正序
+        p2p_sorted = sorted(p2p_msgs, key=lambda x: x.get("time", ""))
+        for m in p2p_sorted[:500]:
+            who = f"[{name}]" if m.get("is_target") else "[对方]"
+            lines.append(f"[{m.get('time', '')}] {who} {m['content']}")
+
    return "\n".join(lines)


@@ -707,6 +893,10 @@ def main() -> None:
    parser.add_argument("--output-dir", default=None, help="输出目录（默认 ./knowledge/{name}）")
    parser.add_argument("--msg-limit", type=int, default=1000, help="最多采集消息条数（默认 1000）")
    parser.add_argument("--doc-limit", type=int, default=20, help="最多采集文档篇数（默认 20）")
+    parser.add_argument("--exchange-code", metavar="CODE", help="用 OAuth 授权码换取 user_access_token 并保存到配置")
+    parser.add_argument("--user-token", metavar="TOKEN", help="直接指定 user_access_token（覆盖配置文件）")
+    parser.add_argument("--p2p-chat-id", metavar="CHAT_ID", help="私聊会话 ID（覆盖配置文件）")
+    parser.add_argument("--open-id", metavar="OPEN_ID", help="直接指定目标用户的 open_id（跳过用户搜索）")

    args = parser.parse_args()

@@ -714,11 +904,45 @@ def main() -> None:
        setup_config()
        return

-    if not args.name:
-        parser.error("请提供 --name")
-
    config = load_config()
-    output_dir = Path(args.output_dir) if args.output_dir else Path(f"./knowledge/{args.name}")
+
+    # 换取 user_access_token
+    if args.exchange_code:
+        token_data = exchange_code_for_token(args.exchange_code, config)
+        if token_data:
+            config["user_access_token"] = token_data["access_token"]
+            config["refresh_token"] = token_data.get("refresh_token", "")
+            save_config(config)
+            print(f"✅ user_access_token 已保存（scope: {token_data.get('scope', '')}）")
+            print(f"   token: {token_data['access_token'][:20]}...")
+        else:
+            print("❌ 换取失败，请检查 code 是否有效")
+        return
+
+    if not args.name and not args.open_id:
+        parser.error("请提供 --name 或 --open-id")
+
+    # 命令行参数覆盖配置
+    if args.user_token:
+        config["user_access_token"] = args.user_token
+    if args.p2p_chat_id:
+        config["p2p_chat_id"] = args.p2p_chat_id
+
+    output_dir = Path(args.output_dir) if args.output_dir else Path(f"./knowledge/{args.name or 'target'}")
+
+    # 如果提供了 open_id，跳过用户搜索
+    if args.open_id:
+        user = {"open_id": args.open_id, "name": args.name or "target"}
+        output_dir.mkdir(parents=True, exist_ok=True)
+        print(f"\n🔍 使用指定 open_id: {args.open_id}\n", file=sys.stderr)
+
+        # 只采集消息
+        print(f"📨 采集消息记录（上限 {args.msg_limit} 条）...", file=sys.stderr)
+        msg_content = collect_messages(user, args.msg_limit, config)
+        msg_path = output_dir / "messages.txt"
+        msg_path.write_text(msg_content, encoding="utf-8")
+        print(f"  ✅ 消息记录 → {msg_path}", file=sys.stderr)
+        return

    collect_all(
        name=args.name,