diff --git a/SKILL.md b/SKILL.md index 489e196..2f31671 100644 --- a/SKILL.md +++ b/SKILL.md @@ -104,7 +104,7 @@ allowed-tools: Read, Write, Edit, Bash python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --setup ``` -配置完成后,只需输入姓名,自动完成所有采集: +**群聊采集**(使用 tenant_access_token,需 bot 在群内): ```bash python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \ --name "{name}" \ @@ -113,19 +113,51 @@ python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \ --doc-limit 20 ``` +**私聊采集**(需要 user_access_token + 私聊 chat_id): + +私聊消息需要用户本人授权,步骤: +1. 飞书应用需开通**用户权限**:`im:message`、`im:chat` +2. 生成 OAuth 授权链接(替换 APP_ID): + ``` + https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat + ``` +3. 用户在浏览器打开授权,从回调 URL 复制 code +4. 换取 token: + ```bash + python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --exchange-code {CODE} + ``` +5. 采集私聊(需提供私聊 chat_id,可在发送消息的 API 返回值中获取): + ```bash + python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \ + --name "{name}" \ + --p2p-chat-id oc_xxx \ + --output-dir ./knowledge/{slug} \ + --msg-limit 1000 + ``` +6. 也可以直接指定 open_id 跳过用户搜索: + ```bash + python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \ + --open-id ou_xxx \ + --p2p-chat-id oc_xxx \ + --name "{name}" + ``` + 自动采集内容: -- 所有与他共同群聊中他发出的消息(过滤系统消息、表情包) +- 群聊:所有与他共同群聊中他发出的消息(过滤系统消息、表情包) +- 私聊:与他的私聊完整对话(含双方消息,用于理解对话语境) - 他创建/编辑的飞书文档和 Wiki - 相关多维表格(如有权限) 采集完成后用 `Read` 读取输出目录下的文件: -- `knowledge/{slug}/messages.txt` → 消息记录 +- `knowledge/{slug}/messages.txt` → 消息记录(群聊 + 私聊) - `knowledge/{slug}/docs.txt` → 文档内容 - `knowledge/{slug}/collection_summary.json` → 采集摘要 -如果采集失败(权限不足 / bot 未加群),告知用户需要: -1. 将飞书 App bot 添加到相关群聊 -2. 或改用方式 B/C +如果采集失败,告知用户检查: +- 群聊采集:bot 是否已添加到相关群聊 +- 私聊采集:是否已配置 user_access_token 和 p2p_chat_id +- 权限:应用是否开通了 im:message 和 im:chat 用户权限 +- 或改用方式 B/C --- @@ -521,7 +553,7 @@ First-time setup: python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --setup ``` -After setup, just enter the name: +**Group chat collection** (uses tenant_access_token, bot must be in the group): ```bash python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \ --name "{name}" \ @@ -530,19 +562,50 @@ python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \ --doc-limit 20 ``` +**Private chat (P2P) collection** (requires user_access_token + p2p chat_id): + +Private messages require user authorization: +1. Enable **user scopes** in Feishu app: `im:message`, `im:chat` +2. Generate OAuth URL (replace APP_ID): + ``` + https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat + ``` +3. User opens URL in browser, authorizes, copies code from callback URL +4. Exchange code for token: + ```bash + python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --exchange-code {CODE} + ``` +5. Collect with p2p chat_id: + ```bash + python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \ + --name "{name}" \ + --p2p-chat-id oc_xxx \ + --output-dir ./knowledge/{slug} + ``` +6. Or skip user search by providing open_id directly: + ```bash + python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \ + --open-id ou_xxx \ + --p2p-chat-id oc_xxx \ + --name "{name}" + ``` + Auto-collected content: -- All messages sent by them in shared group chats (system messages and stickers filtered) +- Group chats: messages sent by them (system messages and stickers filtered) +- Private chats: full conversation with both parties (for context understanding) - Feishu docs and Wikis they created/edited - Related spreadsheets (if accessible) After collection, `Read` the output files: -- `knowledge/{slug}/messages.txt` → messages +- `knowledge/{slug}/messages.txt` → messages (group + private) - `knowledge/{slug}/docs.txt` → document content - `knowledge/{slug}/collection_summary.json` → collection summary -If collection fails (insufficient permissions / bot not in chat), inform user to: -1. Add the Feishu App bot to relevant group chats -2. Or switch to Option B/C +If collection fails, check: +- Group chat: bot must be added to relevant group chats +- Private chat: user_access_token and p2p_chat_id must be configured +- Permissions: app must have im:message and im:chat user scopes enabled +- Or switch to Option B/C --- diff --git a/tools/feishu_auto_collector.py b/tools/feishu_auto_collector.py index 89cc250..dc35e8c 100644 --- a/tools/feishu_auto_collector.py +++ b/tools/feishu_auto_collector.py @@ -5,17 +5,38 @@ 输入同事姓名,自动: 1. 搜索飞书用户,获取 user_id 2. 找到与他共同的群聊,拉取他的消息记录 - 3. 搜索他创建/编辑的文档和 Wiki - 4. 拉取文档内容 - 5. 拉取多维表格(如有) - 6. 输出统一格式,直接进 create-colleague 分析流程 + 3. 拉取私聊消息(需要 user_access_token) + 4. 搜索他创建/编辑的文档和 Wiki + 5. 拉取文档内容 + 6. 拉取多维表格(如有) + 7. 输出统一格式,直接进 create-colleague 分析流程 前置: python3 feishu_auto_collector.py --setup # 配置 App ID / Secret(一次性) +私聊采集(需额外步骤): + 1. 飞书应用开通用户权限:im:message, im:chat + 2. 获取 OAuth 授权码: + 浏览器打开: https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat + 授权后从地址栏复制 code + 3. 换取 token: + python3 feishu_auto_collector.py --exchange-code {CODE} + 4. 采集时指定私聊 chat_id: + python3 feishu_auto_collector.py --name "张三" --p2p-chat-id oc_xxx + 用法: + # 群聊采集(原有方式) python3 feishu_auto_collector.py --name "张三" --output-dir ./knowledge/zhangsan python3 feishu_auto_collector.py --name "张三" --msg-limit 1000 --doc-limit 20 + + # 私聊采集 + python3 feishu_auto_collector.py --name "张三" --p2p-chat-id oc_xxx + + # 直接指定 open_id + 私聊(跳过用户搜索) + python3 feishu_auto_collector.py --open-id ou_xxx --p2p-chat-id oc_xxx --name "张三" + + # 换取 user_access_token + python3 feishu_auto_collector.py --exchange-code {CODE} """ from __future__ import annotations @@ -57,11 +78,15 @@ def setup_config() -> None: print("=== 飞书自动采集配置 ===\n") print("请前往 https://open.feishu.cn 创建企业自建应用,开通以下权限:") print() - print(" 消息类:") + print(" 消息类(应用权限,用于群聊采集):") print(" im:message:readonly 读取消息") print(" im:chat:readonly 读取群聊信息") print(" im:chat.members:readonly 读取群成员") print() + print(" 消息类(用户权限,用于私聊采集):") + print(" im:message 以用户身份读取/发送消息") + print(" im:chat 以用户身份读取会话列表") + print() print(" 用户类:") print(" contact:user.base:readonly 读取用户基本信息") print(" contact:department.base:readonly 遍历部门查找用户(按姓名搜索必需)") @@ -74,11 +99,26 @@ def setup_config() -> None: print(" 多维表格:") print(" bitable:app:readonly 读取多维表格") print() + print(" ─── 私聊采集说明 ───") + print(" 私聊消息必须通过 user_access_token 获取(应用身份无权访问私聊)。") + print(" 获取方式:OAuth 授权,授权链接格式:") + print(" https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri={REDIRECT}&scope=im:message%20im:chat") + print(" 授权后从回调 URL 中取 code,用 --exchange-code 换取 token。") + print() app_id = input("App ID (cli_xxx): ").strip() app_secret = input("App Secret: ").strip() config = {"app_id": app_id, "app_secret": app_secret} + + print("\n是否配置 user_access_token?(用于私聊消息采集,可跳过)") + user_token = input("user_access_token (留空跳过): ").strip() + if user_token: + config["user_access_token"] = user_token + p2p_chat_id = input("私聊 chat_id (留空跳过): ").strip() + if p2p_chat_id: + config["p2p_chat_id"] = p2p_chat_id + save_config(config) print(f"\n✅ 配置已保存到 {CONFIG_PATH}") @@ -110,8 +150,11 @@ def get_tenant_token(config: dict) -> str: return token -def api_get(path: str, params: dict, config: dict) -> dict: - token = get_tenant_token(config) +def api_get(path: str, params: dict, config: dict, use_user_token: bool = False) -> dict: + if use_user_token and config.get("user_access_token"): + token = config["user_access_token"] + else: + token = get_tenant_token(config) resp = requests.get( f"{BASE_URL}{path}", params=params, @@ -121,8 +164,11 @@ def api_get(path: str, params: dict, config: dict) -> dict: return resp.json() -def api_post(path: str, body: dict, config: dict) -> dict: - token = get_tenant_token(config) +def api_post(path: str, body: dict, config: dict, use_user_token: bool = False) -> dict: + if use_user_token and config.get("user_access_token"): + token = config["user_access_token"] + else: + token = get_tenant_token(config) resp = requests.post( f"{BASE_URL}{path}", json=body, @@ -132,6 +178,22 @@ def api_post(path: str, body: dict, config: dict) -> dict: return resp.json() +def exchange_code_for_token(code: str, config: dict) -> dict: + """用 OAuth 授权码换取 user_access_token""" + app_token = get_tenant_token(config) + resp = requests.post( + f"{BASE_URL}/authen/v1/oidc/access_token", + headers={"Authorization": f"Bearer {app_token}"}, + json={"grant_type": "authorization_code", "code": code}, + timeout=10, + ) + data = resp.json() + if data.get("code") != 0: + print(f"换取 token 失败:{data}", file=sys.stderr) + return {} + return data.get("data", {}) + + # ─── 用户搜索 ───────────────────────────────────────────────────────────────── def _find_user_by_contact(name: str, config: dict) -> Optional[dict]: @@ -421,42 +483,156 @@ def fetch_messages_from_chat( return messages[:limit] +def fetch_p2p_messages( + chat_id: str, + user_open_id: str, + limit: int, + config: dict, +) -> list: + """使用 user_access_token 从私聊会话拉取消息(包含双方所有消息)""" + messages = [] + page_token = None + + while len(messages) < limit: + params = { + "container_id_type": "chat", + "container_id": chat_id, + "page_size": 50, + "sort_type": "ByCreateTimeDesc", + } + if page_token: + params["page_token"] = page_token + + data = api_get("/im/v1/messages", params, config, use_user_token=True) + if data.get("code") != 0: + print(f" 拉取私聊消息失败(code={data.get('code')}):{data.get('msg')}", file=sys.stderr) + break + + items = data.get("data", {}).get("items", []) + if not items: + break + + for item in items: + sender = item.get("sender", {}) + sender_id = sender.get("id") or sender.get("open_id", "") + + # 解析消息内容 + content_raw = item.get("body", {}).get("content", "") + try: + content_obj = json.loads(content_raw) + if isinstance(content_obj, dict): + # 纯文本消息 + if "text" in content_obj: + content = content_obj["text"] + else: + # 富文本消息 + text_parts = [] + for line in content_obj.get("content", []): + for seg in line: + if seg.get("tag") in ("text", "a"): + text_parts.append(seg.get("text", "")) + content = " ".join(text_parts) + else: + content = str(content_obj) + except Exception: + content = content_raw + + content = content.strip() + if not content or content in ("[图片]", "[文件]", "[表情]", "[语音]"): + continue + + ts = item.get("create_time", "") + if ts: + try: + ts = datetime.fromtimestamp(int(ts) / 1000).strftime("%Y-%m-%d %H:%M") + except Exception: + pass + + is_target = (sender_id == user_open_id) + messages.append({ + "content": content, + "time": ts, + "sender_id": sender_id, + "is_target": is_target, + }) + + if not data.get("data", {}).get("has_more"): + break + page_token = data.get("data", {}).get("page_token") + + return messages[:limit] + + def collect_messages( user: dict, msg_limit: int, config: dict, ) -> str: - """采集目标用户的所有消息记录""" + """采集目标用户的所有消息记录(群聊 + 私聊)""" user_open_id = user.get("open_id") or user.get("user_id", "") name = user.get("name", "") - chats = get_chats_with_user(user_open_id, config) - if not chats: - return f"# 消息记录\n\n未找到与 {name} 共同的群聊(请确认 bot 已被添加到相关群)\n" - all_messages = [] - per_chat_limit = max(100, msg_limit // len(chats)) + chat_sources = [] - for chat in chats: - chat_id = chat.get("chat_id") - chat_name = chat.get("name", chat_id) - print(f" 拉取「{chat_name}」消息 ...", file=sys.stderr) + # ── 私聊采集(需要 user_access_token + p2p_chat_id)── + p2p_chat_id = config.get("p2p_chat_id", "") + user_token = config.get("user_access_token", "") - msgs = fetch_messages_from_chat(chat_id, user_open_id, per_chat_limit, config) - for m in msgs: - m["chat"] = chat_name - all_messages.extend(msgs) - print(f" 获取 {len(msgs)} 条", file=sys.stderr) + if user_token and p2p_chat_id: + print(f" 📱 采集私聊消息(chat_id: {p2p_chat_id})...", file=sys.stderr) + p2p_msgs = fetch_p2p_messages(p2p_chat_id, user_open_id, msg_limit, config) + for m in p2p_msgs: + m["chat"] = "私聊" + all_messages.extend(p2p_msgs) + chat_sources.append(f"私聊({len(p2p_msgs)} 条)") + print(f" 获取 {len(p2p_msgs)} 条私聊消息", file=sys.stderr) + elif user_token and not p2p_chat_id: + print(f" ⚠️ 有 user_access_token 但未配置 p2p_chat_id,跳过私聊采集", file=sys.stderr) + print(f" 请在配置中添加 p2p_chat_id(通过发送消息 API 返回值获取)", file=sys.stderr) + + # ── 群聊采集(使用 tenant_access_token)── + remaining = msg_limit - len(all_messages) + if remaining > 0: + chats = get_chats_with_user(user_open_id, config) + if chats: + per_chat_limit = max(100, remaining // len(chats)) + for chat in chats: + chat_id = chat.get("chat_id") + chat_name = chat.get("name", chat_id) + print(f" 拉取「{chat_name}」消息 ...", file=sys.stderr) + + msgs = fetch_messages_from_chat(chat_id, user_open_id, per_chat_limit, config) + for m in msgs: + m["chat"] = chat_name + all_messages.extend(msgs) + chat_sources.append(f"{chat_name}({len(msgs)} 条)") + print(f" 获取 {len(msgs)} 条", file=sys.stderr) + + if not all_messages: + tips = f"# 消息记录\n\n未找到 {name} 的消息记录。\n\n" + tips += "可能原因:\n" + tips += " - 群聊采集:bot 未被添加到相关群聊\n" + tips += " - 私聊采集:未配置 user_access_token 或 p2p_chat_id\n" + tips += "\n私聊采集配置方法:\n" + tips += " 1. 在飞书开放平台开通 im:message 和 im:chat 用户权限\n" + tips += " 2. 通过 OAuth 授权获取 user_access_token(--exchange-code)\n" + tips += " 3. 配置 p2p_chat_id(私聊会话 ID)\n" + return tips # 分类输出 - long_msgs = [m for m in all_messages if len(m.get("content", "")) > 50] - short_msgs = [m for m in all_messages if len(m.get("content", "")) <= 50] + # 私聊消息包含双方对话,标注发言人 + target_msgs = [m for m in all_messages if m.get("is_target", True)] + other_msgs = [m for m in all_messages if not m.get("is_target", True)] + + long_msgs = [m for m in target_msgs if len(m.get("content", "")) > 50] + short_msgs = [m for m in target_msgs if len(m.get("content", "")) <= 50] lines = [ f"# 飞书消息记录(自动采集)", f"目标:{name}", - f"来源群聊:{', '.join(c.get('name', '') for c in chats)}", - f"共 {len(all_messages)} 条消息", + f"来源:{', '.join(chat_sources)}", + f"共 {len(all_messages)} 条消息(目标用户 {len(target_msgs)} 条,对话方 {len(other_msgs)} 条)", "", "---", "", @@ -471,6 +647,16 @@ def collect_messages( for m in short_msgs[:300]: lines.append(f"[{m.get('time', '')}] {m['content']}") + # 私聊对话上下文(保留双方对话,便于理解语境) + p2p_msgs = [m for m in all_messages if m.get("chat") == "私聊"] + if p2p_msgs: + lines += ["", "---", "", "## 私聊对话上下文(含双方消息)", ""] + # 按时间正序 + p2p_sorted = sorted(p2p_msgs, key=lambda x: x.get("time", "")) + for m in p2p_sorted[:500]: + who = f"[{name}]" if m.get("is_target") else "[对方]" + lines.append(f"[{m.get('time', '')}] {who} {m['content']}") + return "\n".join(lines) @@ -707,6 +893,10 @@ def main() -> None: parser.add_argument("--output-dir", default=None, help="输出目录(默认 ./knowledge/{name})") parser.add_argument("--msg-limit", type=int, default=1000, help="最多采集消息条数(默认 1000)") parser.add_argument("--doc-limit", type=int, default=20, help="最多采集文档篇数(默认 20)") + parser.add_argument("--exchange-code", metavar="CODE", help="用 OAuth 授权码换取 user_access_token 并保存到配置") + parser.add_argument("--user-token", metavar="TOKEN", help="直接指定 user_access_token(覆盖配置文件)") + parser.add_argument("--p2p-chat-id", metavar="CHAT_ID", help="私聊会话 ID(覆盖配置文件)") + parser.add_argument("--open-id", metavar="OPEN_ID", help="直接指定目标用户的 open_id(跳过用户搜索)") args = parser.parse_args() @@ -714,11 +904,45 @@ def main() -> None: setup_config() return - if not args.name: - parser.error("请提供 --name") - config = load_config() - output_dir = Path(args.output_dir) if args.output_dir else Path(f"./knowledge/{args.name}") + + # 换取 user_access_token + if args.exchange_code: + token_data = exchange_code_for_token(args.exchange_code, config) + if token_data: + config["user_access_token"] = token_data["access_token"] + config["refresh_token"] = token_data.get("refresh_token", "") + save_config(config) + print(f"✅ user_access_token 已保存(scope: {token_data.get('scope', '')})") + print(f" token: {token_data['access_token'][:20]}...") + else: + print("❌ 换取失败,请检查 code 是否有效") + return + + if not args.name and not args.open_id: + parser.error("请提供 --name 或 --open-id") + + # 命令行参数覆盖配置 + if args.user_token: + config["user_access_token"] = args.user_token + if args.p2p_chat_id: + config["p2p_chat_id"] = args.p2p_chat_id + + output_dir = Path(args.output_dir) if args.output_dir else Path(f"./knowledge/{args.name or 'target'}") + + # 如果提供了 open_id,跳过用户搜索 + if args.open_id: + user = {"open_id": args.open_id, "name": args.name or "target"} + output_dir.mkdir(parents=True, exist_ok=True) + print(f"\n🔍 使用指定 open_id: {args.open_id}\n", file=sys.stderr) + + # 只采集消息 + print(f"📨 采集消息记录(上限 {args.msg_limit} 条)...", file=sys.stderr) + msg_content = collect_messages(user, args.msg_limit, config) + msg_path = output_dir / "messages.txt" + msg_path.write_text(msg_content, encoding="utf-8") + print(f" ✅ 消息记录 → {msg_path}", file=sys.stderr) + return collect_all( name=args.name,