feat: add private chat (p2p) message collection via user_access_token

- Add user_access_token support to api_get/api_post for user-identity API calls
- Add fetch_p2p_messages() to collect both sides of a private conversation
- Extend collect_messages() to combine p2p + group chat messages
- Add --exchange-code to convert OAuth code to user_access_token
- Add --user-token, --p2p-chat-id, --open-id CLI flags
- Update SKILL.md with p2p collection flow and permission requirements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
titanwings
2026-04-02 20:18:58 +08:00
parent ce30311a72
commit a2b6ef3903
2 changed files with 331 additions and 44 deletions

View File

@@ -104,7 +104,7 @@ allowed-tools: Read, Write, Edit, Bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --setup
```
配置完成后,只需输入姓名,自动完成所有采集
**群聊采集**(使用 tenant_access_token需 bot 在群内)
```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--name "{name}" \
@@ -113,19 +113,51 @@ python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--doc-limit 20
```
**私聊采集**(需要 user_access_token + 私聊 chat_id
私聊消息需要用户本人授权,步骤:
1. 飞书应用需开通**用户权限**`im:message``im:chat`
2. 生成 OAuth 授权链接(替换 APP_ID
```
https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat
```
3. 用户在浏览器打开授权,从回调 URL 复制 code
4. 换取 token
```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --exchange-code {CODE}
```
5. 采集私聊(需提供私聊 chat_id可在发送消息的 API 返回值中获取):
```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--name "{name}" \
--p2p-chat-id oc_xxx \
--output-dir ./knowledge/{slug} \
--msg-limit 1000
```
6. 也可以直接指定 open_id 跳过用户搜索:
```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--open-id ou_xxx \
--p2p-chat-id oc_xxx \
--name "{name}"
```
自动采集内容:
- 所有与他共同群聊中他发出的消息(过滤系统消息、表情包)
- 群聊:所有与他共同群聊中他发出的消息(过滤系统消息、表情包)
- 私聊:与他的私聊完整对话(含双方消息,用于理解对话语境)
- 他创建/编辑的飞书文档和 Wiki
- 相关多维表格(如有权限)
采集完成后用 `Read` 读取输出目录下的文件:
- `knowledge/{slug}/messages.txt` → 消息记录
- `knowledge/{slug}/messages.txt` → 消息记录(群聊 + 私聊)
- `knowledge/{slug}/docs.txt` → 文档内容
- `knowledge/{slug}/collection_summary.json` → 采集摘要
如果采集失败(权限不足 / bot 未加群),告知用户需要
1. 将飞书 App bot 添加到相关群聊
2. 或改用方式 B/C
如果采集失败,告知用户检查
- 群聊采集:bot 是否已添加到相关群聊
- 私聊采集:是否已配置 user_access_token 和 p2p_chat_id
- 权限:应用是否开通了 im:message 和 im:chat 用户权限
- 或改用方式 B/C
---
@@ -521,7 +553,7 @@ First-time setup:
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --setup
```
After setup, just enter the name:
**Group chat collection** (uses tenant_access_token, bot must be in the group):
```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--name "{name}" \
@@ -530,19 +562,50 @@ python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--doc-limit 20
```
**Private chat (P2P) collection** (requires user_access_token + p2p chat_id):
Private messages require user authorization:
1. Enable **user scopes** in Feishu app: `im:message`, `im:chat`
2. Generate OAuth URL (replace APP_ID):
```
https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat
```
3. User opens URL in browser, authorizes, copies code from callback URL
4. Exchange code for token:
```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --exchange-code {CODE}
```
5. Collect with p2p chat_id:
```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--name "{name}" \
--p2p-chat-id oc_xxx \
--output-dir ./knowledge/{slug}
```
6. Or skip user search by providing open_id directly:
```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--open-id ou_xxx \
--p2p-chat-id oc_xxx \
--name "{name}"
```
Auto-collected content:
- All messages sent by them in shared group chats (system messages and stickers filtered)
- Group chats: messages sent by them (system messages and stickers filtered)
- Private chats: full conversation with both parties (for context understanding)
- Feishu docs and Wikis they created/edited
- Related spreadsheets (if accessible)
After collection, `Read` the output files:
- `knowledge/{slug}/messages.txt` → messages
- `knowledge/{slug}/messages.txt` → messages (group + private)
- `knowledge/{slug}/docs.txt` → document content
- `knowledge/{slug}/collection_summary.json` → collection summary
If collection fails (insufficient permissions / bot not in chat), inform user to:
1. Add the Feishu App bot to relevant group chats
2. Or switch to Option B/C
If collection fails, check:
- Group chat: bot must be added to relevant group chats
- Private chat: user_access_token and p2p_chat_id must be configured
- Permissions: app must have im:message and im:chat user scopes enabled
- Or switch to Option B/C
---

View File

@@ -5,17 +5,38 @@
输入同事姓名,自动:
1. 搜索飞书用户,获取 user_id
2. 找到与他共同的群聊,拉取他的消息记录
3. 搜索他创建/编辑的文档和 Wiki
4. 拉取文档内容
5. 拉取多维表格(如有)
6. 输出统一格式,直接进 create-colleague 分析流程
3. 拉取私聊消息(需要 user_access_token
4. 搜索他创建/编辑的文档和 Wiki
5. 拉取文档内容
6. 拉取多维表格(如有)
7. 输出统一格式,直接进 create-colleague 分析流程
前置:
python3 feishu_auto_collector.py --setup # 配置 App ID / Secret一次性
私聊采集(需额外步骤):
1. 飞书应用开通用户权限im:message, im:chat
2. 获取 OAuth 授权码:
浏览器打开: https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat
授权后从地址栏复制 code
3. 换取 token
python3 feishu_auto_collector.py --exchange-code {CODE}
4. 采集时指定私聊 chat_id
python3 feishu_auto_collector.py --name "张三" --p2p-chat-id oc_xxx
用法:
# 群聊采集(原有方式)
python3 feishu_auto_collector.py --name "张三" --output-dir ./knowledge/zhangsan
python3 feishu_auto_collector.py --name "张三" --msg-limit 1000 --doc-limit 20
# 私聊采集
python3 feishu_auto_collector.py --name "张三" --p2p-chat-id oc_xxx
# 直接指定 open_id + 私聊(跳过用户搜索)
python3 feishu_auto_collector.py --open-id ou_xxx --p2p-chat-id oc_xxx --name "张三"
# 换取 user_access_token
python3 feishu_auto_collector.py --exchange-code {CODE}
"""
from __future__ import annotations
@@ -57,11 +78,15 @@ def setup_config() -> None:
print("=== 飞书自动采集配置 ===\n")
print("请前往 https://open.feishu.cn 创建企业自建应用,开通以下权限:")
print()
print(" 消息类:")
print(" 消息类(应用权限,用于群聊采集)")
print(" im:message:readonly 读取消息")
print(" im:chat:readonly 读取群聊信息")
print(" im:chat.members:readonly 读取群成员")
print()
print(" 消息类(用户权限,用于私聊采集):")
print(" im:message 以用户身份读取/发送消息")
print(" im:chat 以用户身份读取会话列表")
print()
print(" 用户类:")
print(" contact:user.base:readonly 读取用户基本信息")
print(" contact:department.base:readonly 遍历部门查找用户(按姓名搜索必需)")
@@ -74,11 +99,26 @@ def setup_config() -> None:
print(" 多维表格:")
print(" bitable:app:readonly 读取多维表格")
print()
print(" ─── 私聊采集说明 ───")
print(" 私聊消息必须通过 user_access_token 获取(应用身份无权访问私聊)。")
print(" 获取方式OAuth 授权,授权链接格式:")
print(" https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri={REDIRECT}&scope=im:message%20im:chat")
print(" 授权后从回调 URL 中取 code用 --exchange-code 换取 token。")
print()
app_id = input("App ID (cli_xxx): ").strip()
app_secret = input("App Secret: ").strip()
config = {"app_id": app_id, "app_secret": app_secret}
print("\n是否配置 user_access_token用于私聊消息采集可跳过")
user_token = input("user_access_token (留空跳过): ").strip()
if user_token:
config["user_access_token"] = user_token
p2p_chat_id = input("私聊 chat_id (留空跳过): ").strip()
if p2p_chat_id:
config["p2p_chat_id"] = p2p_chat_id
save_config(config)
print(f"\n✅ 配置已保存到 {CONFIG_PATH}")
@@ -110,8 +150,11 @@ def get_tenant_token(config: dict) -> str:
return token
def api_get(path: str, params: dict, config: dict) -> dict:
token = get_tenant_token(config)
def api_get(path: str, params: dict, config: dict, use_user_token: bool = False) -> dict:
if use_user_token and config.get("user_access_token"):
token = config["user_access_token"]
else:
token = get_tenant_token(config)
resp = requests.get(
f"{BASE_URL}{path}",
params=params,
@@ -121,8 +164,11 @@ def api_get(path: str, params: dict, config: dict) -> dict:
return resp.json()
def api_post(path: str, body: dict, config: dict) -> dict:
token = get_tenant_token(config)
def api_post(path: str, body: dict, config: dict, use_user_token: bool = False) -> dict:
if use_user_token and config.get("user_access_token"):
token = config["user_access_token"]
else:
token = get_tenant_token(config)
resp = requests.post(
f"{BASE_URL}{path}",
json=body,
@@ -132,6 +178,22 @@ def api_post(path: str, body: dict, config: dict) -> dict:
return resp.json()
def exchange_code_for_token(code: str, config: dict) -> dict:
"""用 OAuth 授权码换取 user_access_token"""
app_token = get_tenant_token(config)
resp = requests.post(
f"{BASE_URL}/authen/v1/oidc/access_token",
headers={"Authorization": f"Bearer {app_token}"},
json={"grant_type": "authorization_code", "code": code},
timeout=10,
)
data = resp.json()
if data.get("code") != 0:
print(f"换取 token 失败:{data}", file=sys.stderr)
return {}
return data.get("data", {})
# ─── 用户搜索 ─────────────────────────────────────────────────────────────────
def _find_user_by_contact(name: str, config: dict) -> Optional[dict]:
@@ -421,42 +483,156 @@ def fetch_messages_from_chat(
return messages[:limit]
def fetch_p2p_messages(
chat_id: str,
user_open_id: str,
limit: int,
config: dict,
) -> list:
"""使用 user_access_token 从私聊会话拉取消息(包含双方所有消息)"""
messages = []
page_token = None
while len(messages) < limit:
params = {
"container_id_type": "chat",
"container_id": chat_id,
"page_size": 50,
"sort_type": "ByCreateTimeDesc",
}
if page_token:
params["page_token"] = page_token
data = api_get("/im/v1/messages", params, config, use_user_token=True)
if data.get("code") != 0:
print(f" 拉取私聊消息失败code={data.get('code')}{data.get('msg')}", file=sys.stderr)
break
items = data.get("data", {}).get("items", [])
if not items:
break
for item in items:
sender = item.get("sender", {})
sender_id = sender.get("id") or sender.get("open_id", "")
# 解析消息内容
content_raw = item.get("body", {}).get("content", "")
try:
content_obj = json.loads(content_raw)
if isinstance(content_obj, dict):
# 纯文本消息
if "text" in content_obj:
content = content_obj["text"]
else:
# 富文本消息
text_parts = []
for line in content_obj.get("content", []):
for seg in line:
if seg.get("tag") in ("text", "a"):
text_parts.append(seg.get("text", ""))
content = " ".join(text_parts)
else:
content = str(content_obj)
except Exception:
content = content_raw
content = content.strip()
if not content or content in ("[图片]", "[文件]", "[表情]", "[语音]"):
continue
ts = item.get("create_time", "")
if ts:
try:
ts = datetime.fromtimestamp(int(ts) / 1000).strftime("%Y-%m-%d %H:%M")
except Exception:
pass
is_target = (sender_id == user_open_id)
messages.append({
"content": content,
"time": ts,
"sender_id": sender_id,
"is_target": is_target,
})
if not data.get("data", {}).get("has_more"):
break
page_token = data.get("data", {}).get("page_token")
return messages[:limit]
def collect_messages(
user: dict,
msg_limit: int,
config: dict,
) -> str:
"""采集目标用户的所有消息记录"""
"""采集目标用户的所有消息记录(群聊 + 私聊)"""
user_open_id = user.get("open_id") or user.get("user_id", "")
name = user.get("name", "")
chats = get_chats_with_user(user_open_id, config)
if not chats:
return f"# 消息记录\n\n未找到与 {name} 共同的群聊(请确认 bot 已被添加到相关群)\n"
all_messages = []
per_chat_limit = max(100, msg_limit // len(chats))
chat_sources = []
for chat in chats:
chat_id = chat.get("chat_id")
chat_name = chat.get("name", chat_id)
print(f" 拉取「{chat_name}」消息 ...", file=sys.stderr)
# ── 私聊采集(需要 user_access_token + p2p_chat_id──
p2p_chat_id = config.get("p2p_chat_id", "")
user_token = config.get("user_access_token", "")
msgs = fetch_messages_from_chat(chat_id, user_open_id, per_chat_limit, config)
for m in msgs:
m["chat"] = chat_name
all_messages.extend(msgs)
print(f" 获取 {len(msgs)}", file=sys.stderr)
if user_token and p2p_chat_id:
print(f" 📱 采集私聊消息chat_id: {p2p_chat_id}...", file=sys.stderr)
p2p_msgs = fetch_p2p_messages(p2p_chat_id, user_open_id, msg_limit, config)
for m in p2p_msgs:
m["chat"] = "私聊"
all_messages.extend(p2p_msgs)
chat_sources.append(f"私聊({len(p2p_msgs)} 条)")
print(f" 获取 {len(p2p_msgs)} 条私聊消息", file=sys.stderr)
elif user_token and not p2p_chat_id:
print(f" ⚠️ 有 user_access_token 但未配置 p2p_chat_id跳过私聊采集", file=sys.stderr)
print(f" 请在配置中添加 p2p_chat_id通过发送消息 API 返回值获取)", file=sys.stderr)
# ── 群聊采集(使用 tenant_access_token──
remaining = msg_limit - len(all_messages)
if remaining > 0:
chats = get_chats_with_user(user_open_id, config)
if chats:
per_chat_limit = max(100, remaining // len(chats))
for chat in chats:
chat_id = chat.get("chat_id")
chat_name = chat.get("name", chat_id)
print(f" 拉取「{chat_name}」消息 ...", file=sys.stderr)
msgs = fetch_messages_from_chat(chat_id, user_open_id, per_chat_limit, config)
for m in msgs:
m["chat"] = chat_name
all_messages.extend(msgs)
chat_sources.append(f"{chat_name}{len(msgs)} 条)")
print(f" 获取 {len(msgs)}", file=sys.stderr)
if not all_messages:
tips = f"# 消息记录\n\n未找到 {name} 的消息记录。\n\n"
tips += "可能原因:\n"
tips += " - 群聊采集bot 未被添加到相关群聊\n"
tips += " - 私聊采集:未配置 user_access_token 或 p2p_chat_id\n"
tips += "\n私聊采集配置方法:\n"
tips += " 1. 在飞书开放平台开通 im:message 和 im:chat 用户权限\n"
tips += " 2. 通过 OAuth 授权获取 user_access_token--exchange-code\n"
tips += " 3. 配置 p2p_chat_id私聊会话 ID\n"
return tips
# 分类输出
long_msgs = [m for m in all_messages if len(m.get("content", "")) > 50]
short_msgs = [m for m in all_messages if len(m.get("content", "")) <= 50]
# 私聊消息包含双方对话,标注发言人
target_msgs = [m for m in all_messages if m.get("is_target", True)]
other_msgs = [m for m in all_messages if not m.get("is_target", True)]
long_msgs = [m for m in target_msgs if len(m.get("content", "")) > 50]
short_msgs = [m for m in target_msgs if len(m.get("content", "")) <= 50]
lines = [
f"# 飞书消息记录(自动采集)",
f"目标:{name}",
f"来源群聊{', '.join(c.get('name', '') for c in chats)}",
f"{len(all_messages)} 条消息",
f"来源:{', '.join(chat_sources)}",
f"{len(all_messages)} 条消息(目标用户 {len(target_msgs)} 条,对话方 {len(other_msgs)} 条)",
"",
"---",
"",
@@ -471,6 +647,16 @@ def collect_messages(
for m in short_msgs[:300]:
lines.append(f"[{m.get('time', '')}] {m['content']}")
# 私聊对话上下文(保留双方对话,便于理解语境)
p2p_msgs = [m for m in all_messages if m.get("chat") == "私聊"]
if p2p_msgs:
lines += ["", "---", "", "## 私聊对话上下文(含双方消息)", ""]
# 按时间正序
p2p_sorted = sorted(p2p_msgs, key=lambda x: x.get("time", ""))
for m in p2p_sorted[:500]:
who = f"[{name}]" if m.get("is_target") else "[对方]"
lines.append(f"[{m.get('time', '')}] {who} {m['content']}")
return "\n".join(lines)
@@ -707,6 +893,10 @@ def main() -> None:
parser.add_argument("--output-dir", default=None, help="输出目录(默认 ./knowledge/{name}")
parser.add_argument("--msg-limit", type=int, default=1000, help="最多采集消息条数(默认 1000")
parser.add_argument("--doc-limit", type=int, default=20, help="最多采集文档篇数(默认 20")
parser.add_argument("--exchange-code", metavar="CODE", help="用 OAuth 授权码换取 user_access_token 并保存到配置")
parser.add_argument("--user-token", metavar="TOKEN", help="直接指定 user_access_token覆盖配置文件")
parser.add_argument("--p2p-chat-id", metavar="CHAT_ID", help="私聊会话 ID覆盖配置文件")
parser.add_argument("--open-id", metavar="OPEN_ID", help="直接指定目标用户的 open_id跳过用户搜索")
args = parser.parse_args()
@@ -714,11 +904,45 @@ def main() -> None:
setup_config()
return
if not args.name:
parser.error("请提供 --name")
config = load_config()
output_dir = Path(args.output_dir) if args.output_dir else Path(f"./knowledge/{args.name}")
# 换取 user_access_token
if args.exchange_code:
token_data = exchange_code_for_token(args.exchange_code, config)
if token_data:
config["user_access_token"] = token_data["access_token"]
config["refresh_token"] = token_data.get("refresh_token", "")
save_config(config)
print(f"✅ user_access_token 已保存scope: {token_data.get('scope', '')}")
print(f" token: {token_data['access_token'][:20]}...")
else:
print("❌ 换取失败,请检查 code 是否有效")
return
if not args.name and not args.open_id:
parser.error("请提供 --name 或 --open-id")
# 命令行参数覆盖配置
if args.user_token:
config["user_access_token"] = args.user_token
if args.p2p_chat_id:
config["p2p_chat_id"] = args.p2p_chat_id
output_dir = Path(args.output_dir) if args.output_dir else Path(f"./knowledge/{args.name or 'target'}")
# 如果提供了 open_id跳过用户搜索
if args.open_id:
user = {"open_id": args.open_id, "name": args.name or "target"}
output_dir.mkdir(parents=True, exist_ok=True)
print(f"\n🔍 使用指定 open_id: {args.open_id}\n", file=sys.stderr)
# 只采集消息
print(f"📨 采集消息记录(上限 {args.msg_limit} 条)...", file=sys.stderr)
msg_content = collect_messages(user, args.msg_limit, config)
msg_path = output_dir / "messages.txt"
msg_path.write_text(msg_content, encoding="utf-8")
print(f" ✅ 消息记录 → {msg_path}", file=sys.stderr)
return
collect_all(
name=args.name,