Compare commits

2 Commits

Author SHA1 Message Date
titanwings
03f5da5439 docs: improve p2p collection guide - teach model to obtain chat_id and tokens dynamically
- Add detailed OAuth flow with step-by-step instructions
- Document how to obtain chat_id via send message API (GET /im/v1/chats doesn't return p2p)
- Add flexibility principle: model can write scripts directly instead of relying on collector
- Include full Feishu API reference for token, message, and contact endpoints
- Add contact/v3/scopes for open_id discovery

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 20:40:31 +08:00
titanwings
a2b6ef3903 feat: add private chat (p2p) message collection via user_access_token
- Add user_access_token support to api_get/api_post for user-identity API calls
- Add fetch_p2p_messages() to collect both sides of a private conversation
- Extend collect_messages() to combine p2p + group chat messages
- Add --exchange-code to convert OAuth code to user_access_token
- Add --user-token, --p2p-chat-id, --open-id CLI flags
- Update SKILL.md with p2p collection flow and permission requirements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 20:18:58 +08:00
2 changed files with 434 additions and 44 deletions

190
SKILL.md

@@ -104,7 +104,7 @@ allowed-tools: Read, Write, Edit, Bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --setup python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --setup
``` ```
After setup, just enter a name and everything is collected automatically: **Group chat collection** (uses tenant_access_token; the bot must be in the group):
```bash ```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \ python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--name "{name}" \ --name "{name}" \
@@ -113,19 +113,102 @@ python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--doc-limit 20 --doc-limit 20
``` ```
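**Private chat collection** (requires user_access_token + the private chat's chat_id):
Private messages can only be read with user identity (user_access_token); app identity has no access to private chats.
**Prerequisites**:
The user must provide the following:
1. **Feishu app credentials**: `app_id` and `app_secret` (obtained by creating a custom app on the Feishu Open Platform)
2. **User scopes**: the app must have these user scopes enabled:
- `im:message`: read/send messages as the user
- `im:chat`: read the chat list as the user
3. **OAuth authorization code (code)**: taken from the callback URL after the user completes OAuth authorization in a browser
If the user is missing any of these, walk them through the setup. Do not assume it is already configured.
**Full flow for obtaining a user_access_token**:
Once the user has provided app_id and app_secret and confirmed that the user scopes are enabled:
1. Generate the OAuth authorization link for the user: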
```
https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat
```
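> ⚠️ Note: `http://www.example.com` must be added as the `redirect_uri` under "Security Settings → Redirect URLs" in the Feishu app
2. The user opens the link in a browser, logs in, and authorizes
3. The page redirects to `http://www.example.com?code=xxx`; the user copies the code and gives it to you
4. Exchange the code for a token: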
```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --exchange-code {CODE}
```
Or write your own Python script that calls the Feishu API to do the exchange:
```python
# 1. Get app_access_token
POST https://open.feishu.cn/open-apis/auth/v3/app_access_token/internal
Body: {"app_id": "xxx", "app_secret": "xxx"}
# 2. Exchange the code for a user_access_token
POST https://open.feishu.cn/open-apis/authen/v1/oidc/access_token
Header: Authorization: Bearer {app_access_token}
Body: {"grant_type": "authorization_code", "code": "xxx"}
```
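**Getting the private chat's chat_id**:
Users usually do not know the chat_id. When the user has a user_access_token but no chat_id, **write a Python script yourself** to obtain it:
- **Method**: use the user_access_token to send a message to the other party's open_id; the response contains the chat_id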
```python
POST https://open.feishu.cn/open-apis/im/v1/messages?receive_id_type=open_id
Header: Authorization: Bearer {user_access_token}
Body: {"receive_id": "{target_open_id}", "msg_type": "text", "content": "{\"text\":\"hello\"}"}
# The chat_id in the response is the private chat ID
```
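- **Note**: `GET /im/v1/chats` does not return private chats. This is a Feishu API limitation, not a permission problem, so do not use this endpoint to look for private chats
- If the user does not know the other party's open_id, search via the contacts API with a tenant_access_token: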
```python
GET https://open.feishu.cn/open-apis/contact/v3/scopes
# Returns the user IDs (open_id by default) and department IDs in the app's contact scope
```
**Running the collection**:
Once you have the user_access_token and chat_id:
```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--open-id {target_open_id} \
--p2p-chat-id {chat_id} \
--user-token {user_access_token} \
--name "{name}" \
--output-dir ./knowledge/{slug} \
--msg-limit 1000
```
**Flexibility principle**: these API calls do not have to go through the collector script. If the script fails or does not fit the scenario, write a Python script that calls the Feishu API directly. Core API reference:
- Get tokens: `POST /auth/v3/app_access_token/internal`, `POST /authen/v1/oidc/access_token`
- Send a message (to get the chat_id): `POST /im/v1/messages?receive_id_type=open_id`
- Fetch messages: `GET /im/v1/messages?container_id_type=chat&container_id={chat_id}`
- Look up contacts: `GET /contact/v3/scopes`, `GET /contact/v3/users/{user_id}`
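Auto-collected content: Auto-collected content:
- All messages they sent in group chats shared with them (system messages and stickers filtered out) - Group chats: all messages they sent in group chats shared with them (system messages and stickers filtered out)
- Private chats: the full private conversation with them (both sides' messages, to understand the conversational context)
- Feishu docs and Wikis they created/edited - Feishu docs and Wikis they created/edited
- Related Bitable tables (if accessible) - Related Bitable tables (if accessible)
After collection, use `Read` to read the files in the output directory: After collection, use `Read` to read the files in the output directory:
- `knowledge/{slug}/messages.txt` → message records - `knowledge/{slug}/messages.txt` → message records (group chats + private chats)
- `knowledge/{slug}/docs.txt` → document content - `knowledge/{slug}/docs.txt` → document content
- `knowledge/{slug}/collection_summary.json` → collection summary - `knowledge/{slug}/collection_summary.json` → collection summary
If collection fails (insufficient permissions / bot not in the group), tell the user to: If collection fails, work out the cause from the error and try to fix it. Common problems:
1. Add the Feishu App bot to the relevant group chats - Group chat collection: the bot has not been added to the group chat
2. Or switch to option B/C - Private chat collection: the user_access_token has expired (valid for 2 hours; refresh it with the refresh_token)
- Insufficient permissions: guide the user to enable the required scopes on the Feishu Open Platform and re-authorize
- Or switch to option B/C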
--- ---
@@ -521,7 +604,7 @@ First-time setup:
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --setup python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --setup
``` ```
After setup, just enter the name: **Group chat collection** (uses tenant_access_token, bot must be in the group):
```bash ```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \ python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--name "{name}" \ --name "{name}" \
@@ -530,19 +613,102 @@ python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--doc-limit 20 --doc-limit 20
``` ```
**Private chat (P2P) collection** (requires user_access_token + p2p chat_id):
Private messages can only be accessed via user identity (user_access_token). App identity cannot access private chats.
**Prerequisites**:
The user needs to provide:
1. **Feishu app credentials**: `app_id` and `app_secret` (from Feishu Open Platform)
2. **User scopes**: The app must have these user scopes enabled:
- `im:message` — read/send messages as user
- `im:chat` — read chat list as user
3. **OAuth authorization code**: obtained after user completes OAuth in browser
If the user is missing any of these, guide them through setup. Don't assume anything is pre-configured.
**Getting user_access_token**:
Once the user provides app_id, app_secret, and confirms scopes are enabled:
1. Generate the OAuth URL for them:
```
https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat
```
> ⚠️ The redirect_uri must be added in the app's "Security Settings → Redirect URLs"
2. User opens URL, logs in, authorizes
3. Page redirects to `http://www.example.com?code=xxx`, user copies the code
4. Exchange code for token:
```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --exchange-code {CODE}
```
Or write a Python script to call the Feishu API directly:
```python
# 1. Get app_access_token
POST https://open.feishu.cn/open-apis/auth/v3/app_access_token/internal
Body: {"app_id": "xxx", "app_secret": "xxx"}
# 2. Exchange code for user_access_token
POST https://open.feishu.cn/open-apis/authen/v1/oidc/access_token
Header: Authorization: Bearer {app_access_token}
Body: {"grant_type": "authorization_code", "code": "xxx"}
```
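The OAuth steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the collector's implementation: `build_authorize_url`, `extract_code`, and `exchange_code` are hypothetical helper names, and `post` is an assumed callable (for example a thin wrapper around `requests.post(...).json()`) so the flow can be exercised without network access. The endpoints and request bodies are the ones listed above; the `app_access_token` and `data` response fields are assumptions about the response shape.

```python
from typing import Callable, Optional
from urllib.parse import parse_qs, quote, urlencode, urlparse

BASE_URL = "https://open.feishu.cn/open-apis"

# Hypothetical signature: post(url, headers, json_body) -> parsed JSON dict.
PostFn = Callable[[str, dict, dict], dict]


def build_authorize_url(app_id: str, redirect_uri: str, scopes: list) -> str:
    """Build the authorization link, percent-encoding every query value."""
    query = urlencode(
        {"app_id": app_id, "redirect_uri": redirect_uri, "scope": " ".join(scopes)},
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return f"{BASE_URL}/authen/v1/authorize?{query}"


def extract_code(redirected_url: str) -> Optional[str]:
    """Pull the `code` query parameter out of the URL the browser lands on."""
    values = parse_qs(urlparse(redirected_url).query).get("code", [])
    return values[0] if values else None


def exchange_code(post: PostFn, app_id: str, app_secret: str, code: str) -> dict:
    """Two-step exchange: app credentials -> app token -> user token payload."""
    app = post(
        f"{BASE_URL}/auth/v3/app_access_token/internal",
        {},
        {"app_id": app_id, "app_secret": app_secret},
    )
    result = post(
        f"{BASE_URL}/authen/v1/oidc/access_token",
        {"Authorization": f"Bearer {app['app_access_token']}"},
        {"grant_type": "authorization_code", "code": code},
    )
    if result.get("code") != 0:
        raise RuntimeError(f"token exchange failed: {result}")
    return result.get("data", {})
```

With `requests` installed, `post` could be `lambda url, headers, body: requests.post(url, headers=headers, json=body, timeout=10).json()`. Letting the user paste the whole redirected URL into `extract_code` avoids copy errors when picking the code out by hand.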
**Getting the p2p chat_id**:
Users typically don't know their chat_id. When the user has a user_access_token but no chat_id, **write a Python script yourself** to obtain it:
- **Method**: Send a message to the other user's open_id — the response includes the chat_id
```python
POST https://open.feishu.cn/open-apis/im/v1/messages?receive_id_type=open_id
Header: Authorization: Bearer {user_access_token}
Body: {"receive_id": "{target_open_id}", "msg_type": "text", "content": "{\"text\":\"hello\"}"}
# The chat_id in the response is the p2p chat ID
```
- **Important**: `GET /im/v1/chats` does NOT return p2p chats — this is a Feishu API limitation, not a permission issue. Do not try to use it for finding private chats.
- If the user doesn't know the target's open_id, use tenant_access_token to search contacts:
```python
GET https://open.feishu.cn/open-apis/contact/v3/scopes
# Returns the user IDs (open_id by default) and department IDs in the app's contact scope
```
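The send-a-message method above might look like this in Python. It is a sketch only: `find_p2p_chat_id` is a hypothetical helper, and `post` is an assumed callable wrapping whatever HTTP client is in use, which keeps the function testable offline. The endpoint, query parameter, and body come from the block above; reading `data.chat_id` follows the description that the response includes the chat_id.

```python
import json
from typing import Callable

BASE_URL = "https://open.feishu.cn/open-apis"

# Hypothetical signature: post(url, params, headers, json_body) -> parsed JSON dict.
PostFn = Callable[[str, dict, dict, dict], dict]


def find_p2p_chat_id(
    post: PostFn,
    user_access_token: str,
    target_open_id: str,
    text: str = "hello",
) -> str:
    """Send one text message as the user and return the chat_id from the response."""
    response = post(
        f"{BASE_URL}/im/v1/messages",
        {"receive_id_type": "open_id"},
        {"Authorization": f"Bearer {user_access_token}"},
        {
            "receive_id": target_open_id,
            "msg_type": "text",
            # `content` is itself a JSON-encoded string, not a nested object.
            "content": json.dumps({"text": text}, ensure_ascii=False),
        },
    )
    if response.get("code") != 0:
        raise RuntimeError(
            f"send failed: code={response.get('code')} msg={response.get('msg')}"
        )
    chat_id = response.get("data", {}).get("chat_id")
    if not chat_id:
        raise RuntimeError("response did not include a chat_id")
    return chat_id
```

This delivers a real message to the other person, so it is worth agreeing on the text with the user before running it.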
**Running collection**:
Once you have user_access_token and chat_id:
```bash
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
--open-id {target_open_id} \
--p2p-chat-id {chat_id} \
--user-token {user_access_token} \
--name "{name}" \
--output-dir ./knowledge/{slug} \
--msg-limit 1000
```
**Flexibility principle**: The above API calls don't have to go through the collector script. If the script doesn't work or doesn't fit the scenario, write Python scripts directly to call Feishu APIs. Key API reference:
- Get token: `POST /auth/v3/app_access_token/internal`, `POST /authen/v1/oidc/access_token`
- Send message (get chat_id): `POST /im/v1/messages?receive_id_type=open_id`
- Fetch messages: `GET /im/v1/messages?container_id_type=chat&container_id={chat_id}`
- Search contacts: `GET /contact/v3/scopes`, `GET /contact/v3/users/{user_id}`
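When calling the fetch-messages endpoint directly, pagination is the part that is easiest to get wrong. Below is a hedged sketch of a paging loop: `iter_chat_messages` is a hypothetical name, `get` is an assumed callable that performs the authenticated GET and returns parsed JSON, and the `items`, `has_more`, and `page_token` fields mirror what the collector in this change reads from the response.

```python
from typing import Callable, Iterator, Optional

# Hypothetical signature: get(path, params) -> parsed JSON dict.
GetFn = Callable[[str, dict], dict]


def iter_chat_messages(
    get: GetFn, chat_id: str, limit: int, page_size: int = 50
) -> Iterator[dict]:
    """Yield up to `limit` raw message items from one chat, newest first."""
    yielded = 0
    page_token: Optional[str] = None
    while yielded < limit:
        params = {
            "container_id_type": "chat",
            "container_id": chat_id,
            "page_size": page_size,
            "sort_type": "ByCreateTimeDesc",
        }
        if page_token:
            params["page_token"] = page_token
        payload = get("/im/v1/messages", params)
        if payload.get("code") != 0:
            raise RuntimeError(
                f"fetch failed: code={payload.get('code')} msg={payload.get('msg')}"
            )
        data = payload.get("data", {})
        for item in data.get("items", []):
            yield item
            yielded += 1
            if yielded >= limit:
                return
        # Stop when the server reports no further pages, or gives no token
        # to continue with (which would otherwise refetch the first page).
        if not data.get("has_more") or not data.get("page_token"):
            return
        page_token = data["page_token"]
```

Writing it as a generator lets the caller stop early without fetching pages it will not use.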
Auto-collected content: Auto-collected content:
- All messages sent by them in shared group chats (system messages and stickers filtered) - Group chats: messages sent by them (system messages and stickers filtered)
- Private chats: full conversation with both parties (for context understanding)
- Feishu docs and Wikis they created/edited - Feishu docs and Wikis they created/edited
- Related spreadsheets (if accessible) - Related spreadsheets (if accessible)
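Turning a raw message into the text that ends up in the output can be sketched as follows. This is an illustration modelled on how the collector in this change handles `body.content`, not a copy of it: `message_text` is a hypothetical name, and the assumed shapes are a JSON string holding either `{"text": ...}` for plain text or a `content` list of lines, each a list of tagged segments, for rich text.

```python
import json


def message_text(raw_content: str) -> str:
    """Reduce a message's JSON-encoded content to plain text ('' if none)."""
    try:
        obj = json.loads(raw_content)
    except (TypeError, ValueError):
        # Not JSON: fall back to the raw string, as-is.
        return raw_content.strip() if isinstance(raw_content, str) else ""
    if not isinstance(obj, dict):
        return str(obj).strip()
    if "text" in obj:
        # Plain text message.
        return str(obj["text"]).strip()
    lines = obj.get("content")
    if not isinstance(lines, list):
        # e.g. image or file payloads that carry no text.
        return ""
    parts = []
    for line in lines:
        if not isinstance(line, list):
            continue
        for seg in line:
            # Keep only text and link segments; skip mentions, images, etc.
            if isinstance(seg, dict) and seg.get("tag") in ("text", "a"):
                parts.append(seg.get("text", ""))
    return " ".join(parts).strip()
```

An empty result is a convenient signal for the caller to drop the message, which matches the filtering described in the list above.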
After collection, `Read` the output files: After collection, `Read` the output files:
- `knowledge/{slug}/messages.txt` → messages - `knowledge/{slug}/messages.txt` → messages (group + private)
- `knowledge/{slug}/docs.txt` → document content - `knowledge/{slug}/docs.txt` → document content
- `knowledge/{slug}/collection_summary.json` → collection summary - `knowledge/{slug}/collection_summary.json` → collection summary
If collection fails (insufficient permissions / bot not in chat), inform user to: If collection fails, diagnose the error and attempt to fix it. Common issues:
1. Add the Feishu App bot to relevant group chats - Group chat: bot not added to the group
2. Or switch to Option B/C - Private chat: user_access_token expired (2-hour TTL, refresh with refresh_token)
- Insufficient permissions: guide user to enable scopes and re-authorize
- Or switch to Option B/C
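For the expired-token case, a refresh step might be sketched like this. Several things here are assumptions and should be checked against the Feishu documentation: the refresh endpoint path `/authen/v1/oidc/refresh_access_token`, the `expires_in` response field, and the 7200-second fallback (taken from the 2-hour TTL mentioned above). `UserToken` and `refresh_user_token` are hypothetical names, and `post` is an assumed callable standing in for the HTTP client.

```python
from dataclasses import dataclass
from typing import Callable

BASE_URL = "https://open.feishu.cn/open-apis"

# Hypothetical signature: post(url, headers, json_body) -> parsed JSON dict.
PostFn = Callable[[str, dict, dict], dict]


@dataclass
class UserToken:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix time in seconds

    def is_expired(self, now: float, skew: float = 60.0) -> bool:
        """Treat the token as expired slightly early to avoid mid-request expiry."""
        return now >= self.expires_at - skew


def refresh_user_token(
    post: PostFn, app_access_token: str, token: UserToken, now: float
) -> UserToken:
    """Trade the refresh_token for a new access token (assumed endpoint)."""
    result = post(
        f"{BASE_URL}/authen/v1/oidc/refresh_access_token",  # assumption
        {"Authorization": f"Bearer {app_access_token}"},
        {"grant_type": "refresh_token", "refresh_token": token.refresh_token},
    )
    if result.get("code") != 0:
        raise RuntimeError(f"refresh failed: {result}")
    data = result["data"]
    return UserToken(
        access_token=data["access_token"],
        # Keep the old refresh_token if the response does not rotate it.
        refresh_token=data.get("refresh_token", token.refresh_token),
        expires_at=now + float(data.get("expires_in", 7200)),  # assumed field
    )
```

Passing `now` in explicitly (for example `time.time()`) keeps the expiry check deterministic and easy to test.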
--- ---

@@ -5,17 +5,38 @@
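Given a colleague's name, automatically: Given a colleague's name, automatically:
1. Search Feishu users to get the user_id 1. Search Feishu users to get the user_id
2. Find the group chats shared with them and fetch their messages 2. Find the group chats shared with them and fetch their messages
3. Search the docs and Wikis they created/edited 3. Fetch private chat messages (requires user_access_token)
4. Fetch document content 4. Search the docs and Wikis they created/edited
5. Fetch Bitable tables (if any) 5. Fetch document content
6. Emit a unified format that feeds directly into the create-colleague analysis flow 6. Fetch Bitable tables (if any)
7. Emit a unified format that feeds directly into the create-colleague analysis flow
Prerequisite: Prerequisite:
python3 feishu_auto_collector.py --setup # configure App ID / Secret (one-time) python3 feishu_auto_collector.py --setup # configure App ID / Secret (one-time)
Private chat collection (extra steps required):
1. Enable the user scopes on the Feishu app (im:message, im:chat)
2. Get an OAuth authorization code:
Open in a browser: https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat
After authorizing, copy the code from the address bar
3. Exchange it for a token:
python3 feishu_auto_collector.py --exchange-code {CODE}
4. Specify the private chat's chat_id when collecting:
python3 feishu_auto_collector.py --name "张三" --p2p-chat-id oc_xxx
Usage: Usage:
# Group chat collection (the original mode)
python3 feishu_auto_collector.py --name "张三" --output-dir ./knowledge/zhangsan python3 feishu_auto_collector.py --name "张三" --output-dir ./knowledge/zhangsan
python3 feishu_auto_collector.py --name "张三" --msg-limit 1000 --doc-limit 20 python3 feishu_auto_collector.py --name "张三" --msg-limit 1000 --doc-limit 20
# Private chat collection
python3 feishu_auto_collector.py --name "张三" --p2p-chat-id oc_xxx
# Specify open_id + private chat directly (skips user search)
python3 feishu_auto_collector.py --open-id ou_xxx --p2p-chat-id oc_xxx --name "张三"
# Exchange a code for a user_access_token
python3 feishu_auto_collector.py --exchange-code {CODE}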
""" """
from __future__ import annotations from __future__ import annotations
@@ -57,11 +78,15 @@ def setup_config() -> None:
print("=== 飞书自动采集配置 ===\n") print("=== 飞书自动采集配置 ===\n")
print("请前往 https://open.feishu.cn 创建企业自建应用,开通以下权限:") print("请前往 https://open.feishu.cn 创建企业自建应用,开通以下权限:")
print() print()
print(" 消息类:") print(" 消息类(应用权限,用于群聊采集)")
print(" im:message:readonly 读取消息") print(" im:message:readonly 读取消息")
print(" im:chat:readonly 读取群聊信息") print(" im:chat:readonly 读取群聊信息")
print(" im:chat.members:readonly 读取群成员") print(" im:chat.members:readonly 读取群成员")
print() print()
print(" 消息类(用户权限,用于私聊采集):")
print(" im:message 以用户身份读取/发送消息")
print(" im:chat 以用户身份读取会话列表")
print()
print(" 用户类:") print(" 用户类:")
print(" contact:user.base:readonly 读取用户基本信息") print(" contact:user.base:readonly 读取用户基本信息")
print(" contact:department.base:readonly 遍历部门查找用户(按姓名搜索必需)") print(" contact:department.base:readonly 遍历部门查找用户(按姓名搜索必需)")
@@ -74,11 +99,26 @@ def setup_config() -> None:
print(" 多维表格:") print(" 多维表格:")
print(" bitable:app:readonly 读取多维表格") print(" bitable:app:readonly 读取多维表格")
print() print()
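print(" ─── Private chat collection ───")
print(" Private messages must be fetched with a user_access_token (app identity cannot access private chats).")
print(" How to get one: OAuth authorization. Authorization link format:")
print(" https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri={REDIRECT}&scope=im:message%20im:chat")
print(" After authorizing, take the code from the callback URL and exchange it for a token with --exchange-code.")
print()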
app_id = input("App ID (cli_xxx): ").strip() app_id = input("App ID (cli_xxx): ").strip()
app_secret = input("App Secret: ").strip() app_secret = input("App Secret: ").strip()
config = {"app_id": app_id, "app_secret": app_secret} config = {"app_id": app_id, "app_secret": app_secret}
print("\nConfigure a user_access_token? (used for private chat collection; optional)")
user_token = input("user_access_token (leave blank to skip): ").strip()
if user_token:
config["user_access_token"] = user_token
p2p_chat_id = input("Private chat chat_id (leave blank to skip): ").strip()
if p2p_chat_id:
config["p2p_chat_id"] = p2p_chat_id
save_config(config) save_config(config)
print(f"\n✅ 配置已保存到 {CONFIG_PATH}") print(f"\n✅ 配置已保存到 {CONFIG_PATH}")
@@ -110,8 +150,11 @@ def get_tenant_token(config: dict) -> str:
return token return token
def api_get(path: str, params: dict, config: dict) -> dict: def api_get(path: str, params: dict, config: dict, use_user_token: bool = False) -> dict:
token = get_tenant_token(config) if use_user_token and config.get("user_access_token"):
token = config["user_access_token"]
else:
token = get_tenant_token(config)
resp = requests.get( resp = requests.get(
f"{BASE_URL}{path}", f"{BASE_URL}{path}",
params=params, params=params,
@@ -121,8 +164,11 @@ def api_get(path: str, params: dict, config: dict) -> dict:
return resp.json() return resp.json()
def api_post(path: str, body: dict, config: dict) -> dict: def api_post(path: str, body: dict, config: dict, use_user_token: bool = False) -> dict:
token = get_tenant_token(config) if use_user_token and config.get("user_access_token"):
token = config["user_access_token"]
else:
token = get_tenant_token(config)
resp = requests.post( resp = requests.post(
f"{BASE_URL}{path}", f"{BASE_URL}{path}",
json=body, json=body,
@@ -132,6 +178,22 @@ def api_post(path: str, body: dict, config: dict) -> dict:
return resp.json() return resp.json()
def exchange_code_for_token(code: str, config: dict) -> dict:
    """Exchange an OAuth authorization code for a user_access_token."""
    app_token = get_tenant_token(config)
    resp = requests.post(
        f"{BASE_URL}/authen/v1/oidc/access_token",
        headers={"Authorization": f"Bearer {app_token}"},
        json={"grant_type": "authorization_code", "code": code},
        timeout=10,
    )
    data = resp.json()
    # A non-zero `code` in the response body signals an API-level failure.
    if data.get("code") != 0:
        print(f"Failed to exchange token: {data}", file=sys.stderr)
        return {}
    return data.get("data", {})
# ─── 用户搜索 ───────────────────────────────────────────────────────────────── # ─── 用户搜索 ─────────────────────────────────────────────────────────────────
def _find_user_by_contact(name: str, config: dict) -> Optional[dict]: def _find_user_by_contact(name: str, config: dict) -> Optional[dict]:
@@ -421,42 +483,156 @@ def fetch_messages_from_chat(
return messages[:limit] return messages[:limit]
def fetch_p2p_messages(
    chat_id: str,
    user_open_id: str,
    limit: int,
    config: dict,
) -> list:
    """Fetch messages from a private chat with the user_access_token (both sides' messages)."""
    messages = []
    page_token = None
    while len(messages) < limit:
        params = {
            "container_id_type": "chat",
            "container_id": chat_id,
            "page_size": 50,
            "sort_type": "ByCreateTimeDesc",
        }
        if page_token:
            params["page_token"] = page_token
        data = api_get("/im/v1/messages", params, config, use_user_token=True)
        if data.get("code") != 0:
            print(f" Failed to fetch private chat messages (code={data.get('code')}): {data.get('msg')}", file=sys.stderr)
            break
        items = data.get("data", {}).get("items", [])
        if not items:
            break
        for item in items:
            sender = item.get("sender", {})
            sender_id = sender.get("id") or sender.get("open_id", "")
            # Parse the message content
            content_raw = item.get("body", {}).get("content", "")
            try:
                content_obj = json.loads(content_raw)
                if isinstance(content_obj, dict):
                    # Plain text message
                    if "text" in content_obj:
                        content = content_obj["text"]
                    else:
                        # Rich text message
                        text_parts = []
                        for line in content_obj.get("content", []):
                            for seg in line:
                                if seg.get("tag") in ("text", "a"):
                                    text_parts.append(seg.get("text", ""))
                        content = " ".join(text_parts)
                else:
                    content = str(content_obj)
            except Exception:
                content = content_raw
            content = content.strip()
            # Skip empty messages and media placeholders
            if not content or content in ("[图片]", "[文件]", "[表情]", "[语音]"):
                continue
            ts = item.get("create_time", "")
            if ts:
                try:
                    # create_time is in milliseconds
                    ts = datetime.fromtimestamp(int(ts) / 1000).strftime("%Y-%m-%d %H:%M")
                except Exception:
                    pass
            is_target = (sender_id == user_open_id)
            messages.append({
                "content": content,
                "time": ts,
                "sender_id": sender_id,
                "is_target": is_target,
            })
        if not data.get("data", {}).get("has_more"):
            break
        page_token = data.get("data", {}).get("page_token")
    return messages[:limit]
def collect_messages( def collect_messages(
user: dict, user: dict,
msg_limit: int, msg_limit: int,
config: dict, config: dict,
) -> str: ) -> str:
"""采集目标用户的所有消息记录""" """采集目标用户的所有消息记录(群聊 + 私聊)"""
user_open_id = user.get("open_id") or user.get("user_id", "") user_open_id = user.get("open_id") or user.get("user_id", "")
name = user.get("name", "") name = user.get("name", "")
chats = get_chats_with_user(user_open_id, config)
if not chats:
return f"# 消息记录\n\n未找到与 {name} 共同的群聊(请确认 bot 已被添加到相关群)\n"
all_messages = [] all_messages = []
per_chat_limit = max(100, msg_limit // len(chats)) chat_sources = []
for chat in chats: # ── 私聊采集(需要 user_access_token + p2p_chat_id──
chat_id = chat.get("chat_id") p2p_chat_id = config.get("p2p_chat_id", "")
chat_name = chat.get("name", chat_id) user_token = config.get("user_access_token", "")
print(f" 拉取「{chat_name}」消息 ...", file=sys.stderr)
msgs = fetch_messages_from_chat(chat_id, user_open_id, per_chat_limit, config) if user_token and p2p_chat_id:
for m in msgs: print(f" 📱 采集私聊消息chat_id: {p2p_chat_id}...", file=sys.stderr)
m["chat"] = chat_name p2p_msgs = fetch_p2p_messages(p2p_chat_id, user_open_id, msg_limit, config)
all_messages.extend(msgs) for m in p2p_msgs:
print(f" 获取 {len(msgs)}", file=sys.stderr) m["chat"] = "私聊"
        all_messages.extend(p2p_msgs)
        chat_sources.append(f"private chat ({len(p2p_msgs)} messages)")
        print(f" Fetched {len(p2p_msgs)} private chat messages", file=sys.stderr)
    elif user_token and not p2p_chat_id:
        print(" ⚠️ user_access_token is set but p2p_chat_id is not; skipping private chat collection", file=sys.stderr)
        print(" Add p2p_chat_id to the config (taken from the send-message API response)", file=sys.stderr)
    # ── Group chat collection (uses tenant_access_token) ──
    remaining = msg_limit - len(all_messages)
    if remaining > 0:
        chats = get_chats_with_user(user_open_id, config)
        if chats:
            per_chat_limit = max(100, remaining // len(chats))
            for chat in chats:
                chat_id = chat.get("chat_id")
                chat_name = chat.get("name", chat_id)
                print(f" Fetching messages from \"{chat_name}\" ...", file=sys.stderr)
                msgs = fetch_messages_from_chat(chat_id, user_open_id, per_chat_limit, config)
                for m in msgs:
                    m["chat"] = chat_name
                all_messages.extend(msgs)
                chat_sources.append(f"{chat_name} ({len(msgs)} messages)")
                print(f" Fetched {len(msgs)} messages", file=sys.stderr)
    if not all_messages:
        tips = f"# Message records\n\nNo messages found for {name}.\n\n"
        tips += "Possible causes:\n"
        tips += " - Group chat collection: the bot has not been added to the relevant group chats\n"
        tips += " - Private chat collection: user_access_token or p2p_chat_id is not configured\n"
        tips += "\nHow to configure private chat collection:\n"
        tips += " 1. Enable the im:message and im:chat user scopes on the Feishu Open Platform\n"
        tips += " 2. Obtain a user_access_token through OAuth authorization (--exchange-code)\n"
        tips += " 3. Configure p2p_chat_id (the private chat ID)\n"
        return tips
# 分类输出 # 分类输出
long_msgs = [m for m in all_messages if len(m.get("content", "")) > 50] # 私聊消息包含双方对话,标注发言人
short_msgs = [m for m in all_messages if len(m.get("content", "")) <= 50] target_msgs = [m for m in all_messages if m.get("is_target", True)]
other_msgs = [m for m in all_messages if not m.get("is_target", True)]
long_msgs = [m for m in target_msgs if len(m.get("content", "")) > 50]
short_msgs = [m for m in target_msgs if len(m.get("content", "")) <= 50]
lines = [ lines = [
f"# 飞书消息记录(自动采集)", f"# 飞书消息记录(自动采集)",
f"目标:{name}", f"目标:{name}",
f"来源群聊{', '.join(c.get('name', '') for c in chats)}", f"来源:{', '.join(chat_sources)}",
f"{len(all_messages)} 条消息", f"{len(all_messages)} 条消息(目标用户 {len(target_msgs)} 条,对话方 {len(other_msgs)} 条)",
"", "",
"---", "---",
"", "",
@@ -471,6 +647,16 @@ def collect_messages(
for m in short_msgs[:300]: for m in short_msgs[:300]:
lines.append(f"[{m.get('time', '')}] {m['content']}") lines.append(f"[{m.get('time', '')}] {m['content']}")
    # Private chat context (keeps both sides of the conversation to preserve context)
    p2p_msgs = [m for m in all_messages if m.get("chat") == "私聊"]
    if p2p_msgs:
        lines += ["", "---", "", "## Private chat context (messages from both sides)", ""]
        # Chronological order
        p2p_sorted = sorted(p2p_msgs, key=lambda x: x.get("time", ""))
        for m in p2p_sorted[:500]:
            who = f"[{name}]" if m.get("is_target") else "[other party]"
            lines.append(f"[{m.get('time', '')}] {who} {m['content']}")
return "\n".join(lines) return "\n".join(lines)
@@ -707,6 +893,10 @@ def main() -> None:
parser.add_argument("--output-dir", default=None, help="输出目录(默认 ./knowledge/{name}") parser.add_argument("--output-dir", default=None, help="输出目录(默认 ./knowledge/{name}")
parser.add_argument("--msg-limit", type=int, default=1000, help="最多采集消息条数(默认 1000") parser.add_argument("--msg-limit", type=int, default=1000, help="最多采集消息条数(默认 1000")
parser.add_argument("--doc-limit", type=int, default=20, help="最多采集文档篇数(默认 20") parser.add_argument("--doc-limit", type=int, default=20, help="最多采集文档篇数(默认 20")
parser.add_argument("--exchange-code", metavar="CODE", help="Exchange an OAuth authorization code for a user_access_token and save it to the config")
parser.add_argument("--user-token", metavar="TOKEN", help="Specify the user_access_token directly (overrides the config file)")
parser.add_argument("--p2p-chat-id", metavar="CHAT_ID", help="Private chat ID (overrides the config file)")
parser.add_argument("--open-id", metavar="OPEN_ID", help="Specify the target user's open_id directly (skips user search)")
args = parser.parse_args() args = parser.parse_args()
@@ -714,11 +904,45 @@ def main() -> None:
setup_config() setup_config()
return return
if not args.name:
parser.error("请提供 --name")
config = load_config() config = load_config()
output_dir = Path(args.output_dir) if args.output_dir else Path(f"./knowledge/{args.name}")
    # Exchange the code for a user_access_token
    if args.exchange_code:
        token_data = exchange_code_for_token(args.exchange_code, config)
        if token_data:
            config["user_access_token"] = token_data["access_token"]
            config["refresh_token"] = token_data.get("refresh_token", "")
            save_config(config)
            print(f"✅ user_access_token saved (scope: {token_data.get('scope', '')})")
            print(f" token: {token_data['access_token'][:20]}...")
        else:
            print("❌ Exchange failed; check that the code is valid")
        return
    if not args.name and not args.open_id:
        parser.error("please provide --name or --open-id")
    # Command-line arguments override the config
    if args.user_token:
        config["user_access_token"] = args.user_token
    if args.p2p_chat_id:
        config["p2p_chat_id"] = args.p2p_chat_id
    output_dir = Path(args.output_dir) if args.output_dir else Path(f"./knowledge/{args.name or 'target'}")
    # If an open_id was given, skip the user search
    if args.open_id:
        user = {"open_id": args.open_id, "name": args.name or "target"}
        output_dir.mkdir(parents=True, exist_ok=True)
        print(f"\n🔍 Using the specified open_id: {args.open_id}\n", file=sys.stderr)
        # Collect messages only
        print(f"📨 Collecting messages (limit {args.msg_limit})...", file=sys.stderr)
        msg_content = collect_messages(user, args.msg_limit, config)
        msg_path = output_dir / "messages.txt"
        msg_path.write_text(msg_content, encoding="utf-8")
        print(f" ✅ Messages → {msg_path}", file=sys.stderr)
        return
collect_all( collect_all(
name=args.name, name=args.name,