mirror of
https://github.com/titanwings/colleague-skill.git
synced 2026-04-04 22:59:06 +08:00
Compare commits: ce30311a72...main (2 commits)

| SHA1 |
|---|
| 03f5da5439 |
| a2b6ef3903 |

SKILL.md (190 changes)
@@ -104,7 +104,7 @@ allowed-tools: Read, Write, Edit, Bash
|
||||
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --setup
|
||||
```
|
||||
|
||||
配置完成后,只需输入姓名,自动完成所有采集:
|
||||
**群聊采集**(使用 tenant_access_token,需 bot 在群内):
|
||||
```bash
|
||||
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
|
||||
--name "{name}" \
|
||||
@@ -113,19 +113,102 @@ python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
|
||||
--doc-limit 20
|
||||
```
|
||||
|
||||
**私聊采集**(需要 user_access_token + 私聊 chat_id):
|
||||
|
||||
私聊消息只能通过用户身份(user_access_token)获取,应用身份无权访问私聊。
|
||||
|
||||
**前置条件**:
|
||||
|
||||
用户需要提供以下信息:
|
||||
1. **飞书应用凭证**:`app_id` 和 `app_secret`(在飞书开放平台创建自建应用获取)
|
||||
2. **用户权限**:应用需开通以下用户权限(scope):
|
||||
- `im:message` — 以用户身份读取/发送消息
|
||||
- `im:chat` — 以用户身份读取会话列表
|
||||
3. **OAuth 授权码(code)**:用户在浏览器中完成 OAuth 授权后,从回调 URL 中获取
|
||||
|
||||
如果用户缺少以上任何信息,引导他们完成配置。不要假设用户已经配好了。
|
||||
|
||||
**获取 user_access_token 的完整流程**:
|
||||
|
||||
当用户提供了 app_id、app_secret,并确认已开通用户权限后:
|
||||
|
||||
1. 帮用户生成 OAuth 授权链接:
|
||||
```
|
||||
https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat
|
||||
```
|
||||
> ⚠️ 注意:`redirect_uri` 需要在飞书应用的「安全设置 → 重定向 URL」中添加 `http://www.example.com`
|
||||
|
||||
2. 用户在浏览器打开链接,登录并授权
|
||||
3. 页面会跳转到 `http://www.example.com?code=xxx`,用户复制 code 给你
|
||||
4. 用 code 换取 token:
|
||||
```bash
|
||||
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --exchange-code {CODE}
|
||||
```
|
||||
或者你自己写 Python 脚本调飞书 API 换取:
|
||||
```python
|
||||
# 1. 获取 app_access_token
|
||||
POST https://open.feishu.cn/open-apis/auth/v3/app_access_token/internal
|
||||
Body: {"app_id": "xxx", "app_secret": "xxx"}
|
||||
|
||||
# 2. 用 code 换 user_access_token
|
||||
POST https://open.feishu.cn/open-apis/authen/v1/oidc/access_token
|
||||
Header: Authorization: Bearer {app_access_token}
|
||||
Body: {"grant_type": "authorization_code", "code": "xxx"}
|
||||
```
|
||||
|
||||
**获取私聊 chat_id**:
|
||||
|
||||
用户通常不知道 chat_id。当用户有了 user_access_token 但没有 chat_id 时,你应该**自己写 Python 脚本**来获取:
|
||||
|
||||
- **方法**:用 user_access_token 向对方的 open_id 发一条消息,返回值中会包含 chat_id
|
||||
```python
|
||||
POST https://open.feishu.cn/open-apis/im/v1/messages?receive_id_type=open_id
|
||||
Header: Authorization: Bearer {user_access_token}
|
||||
Body: {"receive_id": "{对方open_id}", "msg_type": "text", "content": "{\"text\":\"你好\"}"}
|
||||
# 返回值中的 chat_id 就是私聊会话 ID
|
||||
```
|
||||
- **注意**:`GET /im/v1/chats` 不会返回私聊会话,这是飞书 API 的限制,不是权限问题,不要尝试用这个接口找私聊
|
||||
- 如果用户不知道对方的 open_id,可以用 tenant_access_token 调通讯录 API 搜索:
|
||||
```python
|
||||
GET https://open.feishu.cn/open-apis/contact/v3/scopes
|
||||
# 返回应用可见范围内所有用户的 open_id
|
||||
```
|
||||
|
||||
**执行采集**:
|
||||
|
||||
拿到 user_access_token 和 chat_id 后:
|
||||
```bash
|
||||
python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
|
||||
--open-id {对方open_id} \
|
||||
--p2p-chat-id {chat_id} \
|
||||
--user-token {user_access_token} \
|
||||
--name "{name}" \
|
||||
--output-dir ./knowledge/{slug} \
|
||||
--msg-limit 1000
|
||||
```
|
||||
|
||||
**灵活性原则**:以上 API 调用不一定要用 collector 脚本,如果脚本跑不通或者场景不匹配,你可以直接写 Python 脚本调飞书 API 完成任务。核心 API 参考:
|
||||
- 获取 token:`POST /auth/v3/app_access_token/internal`、`POST /authen/v1/oidc/access_token`
|
||||
- 发消息(获取 chat_id):`POST /im/v1/messages?receive_id_type=open_id`
|
||||
- 拉消息:`GET /im/v1/messages?container_id_type=chat&container_id={chat_id}`
|
||||
- 查通讯录:`GET /contact/v3/scopes`、`GET /contact/v3/users/{user_id}`
|
||||
|
||||
自动采集内容:
|
||||
- 所有与他共同群聊中他发出的消息(过滤系统消息、表情包)
|
||||
- 群聊:所有与他共同群聊中他发出的消息(过滤系统消息、表情包)
|
||||
- 私聊:与他的私聊完整对话(含双方消息,用于理解对话语境)
|
||||
- 他创建/编辑的飞书文档和 Wiki
|
||||
- 相关多维表格(如有权限)
|
||||
|
||||
采集完成后用 `Read` 读取输出目录下的文件:
|
||||
- `knowledge/{slug}/messages.txt` → 消息记录
|
||||
- `knowledge/{slug}/messages.txt` → 消息记录(群聊 + 私聊)
|
||||
- `knowledge/{slug}/docs.txt` → 文档内容
|
||||
- `knowledge/{slug}/collection_summary.json` → 采集摘要
|
||||
|
||||
如果采集失败(权限不足 / bot 未加群),告知用户需要:
|
||||
1. 将飞书 App bot 添加到相关群聊
|
||||
2. 或改用方式 B/C
|
||||
如果采集失败,根据报错自行判断原因并尝试修复,常见问题:
|
||||
- 群聊采集:bot 未添加到群聊
|
||||
- 私聊采集:user_access_token 过期(有效期 2 小时,可用 refresh_token 刷新)
|
||||
- 权限不足:引导用户在飞书开放平台开通对应权限并重新授权
|
||||
- 或改用方式 B/C
|
||||
|
||||
---
|
||||
|
||||
@@ -521,7 +604,7 @@ First-time setup:
 python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --setup
 ```

-After setup, just enter the name:
+**Group chat collection** (uses tenant_access_token, bot must be in the group):
 ```bash
 python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
   --name "{name}" \
@@ -530,19 +613,102 @@ python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
   --doc-limit 20
 ```

+**Private chat (P2P) collection** (requires user_access_token + p2p chat_id):
+
+Private messages can only be accessed via user identity (user_access_token). App identity cannot access private chats.
+
+**Prerequisites**:
+
+The user needs to provide:
+1. **Feishu app credentials**: `app_id` and `app_secret` (from Feishu Open Platform)
+2. **User scopes**: The app must have these user scopes enabled:
+   - `im:message` — read/send messages as user
+   - `im:chat` — read chat list as user
+3. **OAuth authorization code**: obtained after user completes OAuth in browser
+
+If the user is missing any of these, guide them through setup. Don't assume anything is pre-configured.
+
+**Getting user_access_token**:
+
+Once the user provides app_id, app_secret, and confirms scopes are enabled:
+
+1. Generate the OAuth URL for them:
+```
+https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat
+```
+> ⚠️ The redirect_uri must be added in the app's "Security Settings → Redirect URLs"
+
+2. User opens URL, logs in, authorizes
+3. Page redirects to `http://www.example.com?code=xxx`, user copies the code
+4. Exchange code for token:
+```bash
+python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py --exchange-code {CODE}
+```
+Or write a Python script to call the Feishu API directly:
+```python
+# 1. Get app_access_token
+POST https://open.feishu.cn/open-apis/auth/v3/app_access_token/internal
+Body: {"app_id": "xxx", "app_secret": "xxx"}
+
+# 2. Exchange code for user_access_token
+POST https://open.feishu.cn/open-apis/authen/v1/oidc/access_token
+Header: Authorization: Bearer {app_access_token}
+Body: {"grant_type": "authorization_code", "code": "xxx"}
+```
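The four OAuth steps above can be sketched in Python with only the standard library. This is a minimal illustration under stated assumptions, not the collector's own code: `build_authorize_url`, `extract_code`, and `build_exchange_request` are hypothetical helper names, while the endpoint paths and body fields are the ones quoted in the steps. The exchange request is only constructed here, not sent.

```python
import json
from typing import Sequence
from urllib.parse import parse_qs, quote, urlencode, urlsplit
from urllib.request import Request

BASE_URL = "https://open.feishu.cn/open-apis"


def build_authorize_url(app_id: str, redirect_uri: str, scopes: Sequence[str]) -> str:
    """Build the step-1 authorization URL (hypothetical helper)."""
    query = urlencode(
        {"app_id": app_id, "redirect_uri": redirect_uri, "scope": " ".join(scopes)},
        quote_via=quote,  # encode the space between scopes as %20 rather than '+'
    )
    return f"{BASE_URL}/authen/v1/authorize?{query}"


def extract_code(redirected_url: str) -> str:
    """Pull the `code` query parameter out of the URL the browser lands on (step 3)."""
    values = parse_qs(urlsplit(redirected_url).query).get("code", [])
    if not values:
        raise ValueError("no 'code' parameter in the redirect URL")
    return values[0]


def build_exchange_request(app_access_token: str, code: str) -> Request:
    """Prepare, but do not send, the step-4 token exchange request."""
    body = json.dumps({"grant_type": "authorization_code", "code": code}).encode("utf-8")
    return Request(
        f"{BASE_URL}/authen/v1/oidc/access_token",
        data=body,
        headers={
            "Authorization": f"Bearer {app_access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder app_id; a real one comes from the Feishu Open Platform console.
    print(build_authorize_url("cli_demo", "http://www.example.com", ["im:message", "im:chat"]))
```

Passing the prepared request to `urllib.request.urlopen` would perform the exchange. Obtaining the `app_access_token` is a separate call, as in the first pseudo-request above.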
+
+**Getting the p2p chat_id**:
+
+Users typically don't know their chat_id. When the user has a user_access_token but no chat_id, **write a Python script yourself** to obtain it:
+
+- **Method**: Send a message to the other user's open_id — the response includes the chat_id
+```python
+POST https://open.feishu.cn/open-apis/im/v1/messages?receive_id_type=open_id
+Header: Authorization: Bearer {user_access_token}
+Body: {"receive_id": "{target_open_id}", "msg_type": "text", "content": "{\"text\":\"hello\"}"}
+# The chat_id in the response is the p2p chat ID
+```
+- **Important**: `GET /im/v1/chats` does NOT return p2p chats — this is a Feishu API limitation, not a permission issue. Do not try to use it for finding private chats.
+- If the user doesn't know the target's open_id, use tenant_access_token to search contacts:
+```python
+GET https://open.feishu.cn/open-apis/contact/v3/scopes
+# Returns open_ids of all users visible to the app
+```
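The send-a-message method above hinges on two details: `content` is a JSON string nested inside the JSON body, and the chat_id has to be read back from the response. A minimal sketch of both follows. The helper names are hypothetical, and the assumption that `chat_id` sits under a top-level `data` object (with `code == 0` on success) is inferred from how the collector in this diff reads other responses, so it should be checked against a real response.

```python
import json


def build_text_message(receive_open_id: str, text: str) -> dict:
    """Body for POST /im/v1/messages?receive_id_type=open_id.

    `content` is itself a JSON-encoded string, so it is serialized separately.
    """
    return {
        "receive_id": receive_open_id,
        "msg_type": "text",
        "content": json.dumps({"text": text}, ensure_ascii=False),
    }


def chat_id_from_response(response: dict) -> str:
    """Read data.chat_id from a parsed response (assumed shape); raise on API errors."""
    if response.get("code") != 0:
        raise RuntimeError(
            f"send failed: code={response.get('code')} msg={response.get('msg')}"
        )
    chat_id = response.get("data", {}).get("chat_id", "")
    if not chat_id:
        raise RuntimeError("response carried no chat_id")
    return chat_id


if __name__ == "__main__":
    # Hypothetical open_id and a hand-written response, for illustration only.
    print(build_text_message("ou_demo", "hello"))
    print(chat_id_from_response({"code": 0, "data": {"chat_id": "oc_demo"}}))
```

This call delivers a real message to the other person, so the text is worth choosing with the user before sending.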
+
+**Running collection**:
+
+Once you have user_access_token and chat_id:
+```bash
+python3 ${CLAUDE_SKILL_DIR}/tools/feishu_auto_collector.py \
+  --open-id {target_open_id} \
+  --p2p-chat-id {chat_id} \
+  --user-token {user_access_token} \
+  --name "{name}" \
+  --output-dir ./knowledge/{slug} \
+  --msg-limit 1000
+```
+
+**Flexibility principle**: The above API calls don't have to go through the collector script. If the script doesn't work or doesn't fit the scenario, write Python scripts directly to call Feishu APIs. Key API reference:
+- Get token: `POST /auth/v3/app_access_token/internal`, `POST /authen/v1/oidc/access_token`
+- Send message (get chat_id): `POST /im/v1/messages?receive_id_type=open_id`
+- Fetch messages: `GET /im/v1/messages?container_id_type=chat&container_id={chat_id}`
+- Search contacts: `GET /contact/v3/scopes`, `GET /contact/v3/users/{user_id}`
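When calling the fetch-messages endpoint directly, the results arrive in pages. The loop below is a sketch of that pagination, assuming the response fields `items`, `has_more`, and `page_token` that the collector's `fetch_p2p_messages` reads in this diff. `get_page` is a hypothetical callable standing in for the authenticated HTTP GET, which keeps the loop testable without network access.

```python
from typing import Callable, Dict, List, Optional


def fetch_all_messages(
    get_page: Callable[[Dict[str, object]], dict],
    chat_id: str,
    limit: int,
    page_size: int = 50,
) -> List[dict]:
    """Collect up to `limit` raw message items from one chat, page by page."""
    items: List[dict] = []
    page_token: Optional[str] = None

    while len(items) < limit:
        params: Dict[str, object] = {
            "container_id_type": "chat",
            "container_id": chat_id,
            "page_size": page_size,
        }
        if page_token:
            params["page_token"] = page_token

        data = get_page(params).get("data", {})
        batch = data.get("items", [])
        if not batch:
            break  # empty page: nothing more to read
        items.extend(batch)

        if not data.get("has_more"):
            break  # server reports this was the last page
        page_token = data.get("page_token")

    return items[:limit]


if __name__ == "__main__":
    # Two canned pages keyed by the page_token that requests them.
    pages = {
        None: {"data": {"items": [{"id": 1}, {"id": 2}], "has_more": True, "page_token": "t1"}},
        "t1": {"data": {"items": [{"id": 3}], "has_more": False}},
    }
    print(fetch_all_messages(lambda p: pages[p.get("page_token")], "oc_demo", 10))
```

Injecting the page fetcher also makes it straightforward to switch between tenant and user tokens, since the token choice lives entirely inside `get_page`.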
+
 Auto-collected content:
-- All messages sent by them in shared group chats (system messages and stickers filtered)
+- Group chats: messages sent by them (system messages and stickers filtered)
+- Private chats: full conversation with both parties (for context understanding)
 - Feishu docs and Wikis they created/edited
 - Related spreadsheets (if accessible)

 After collection, `Read` the output files:
-- `knowledge/{slug}/messages.txt` → messages
+- `knowledge/{slug}/messages.txt` → messages (group + private)
 - `knowledge/{slug}/docs.txt` → document content
 - `knowledge/{slug}/collection_summary.json` → collection summary

-If collection fails (insufficient permissions / bot not in chat), inform user to:
-1. Add the Feishu App bot to relevant group chats
-2. Or switch to Option B/C
+If collection fails, diagnose the error and attempt to fix it. Common issues:
+- Group chat: bot not added to the group
+- Private chat: user_access_token expired (2-hour TTL, refresh with refresh_token)
+- Insufficient permissions: guide user to enable scopes and re-authorize
+- Or switch to Option B/C
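For the expired-token case in the list above, one option is to track when the token was issued and refresh shortly before the 2-hour TTL runs out. The sketch below assumes that approach. `UserToken`, `ensure_fresh`, and the 60-second safety margin are illustrative choices, and `refresh` is a hypothetical callable that would wrap whichever refresh endpoint the Feishu documentation specifies.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

TOKEN_TTL_SECONDS = 2 * 60 * 60  # the 2-hour TTL mentioned in the list above


@dataclass
class UserToken:
    access_token: str
    refresh_token: str
    obtained_at: float  # Unix time at which the token was issued

    def is_expired(self, now: Optional[float] = None, skew: float = 60.0) -> bool:
        """True once the token is within `skew` seconds of its TTL."""
        current = time.time() if now is None else now
        return current >= self.obtained_at + TOKEN_TTL_SECONDS - skew


def ensure_fresh(
    token: UserToken,
    refresh: Callable[[str], UserToken],
    now: Optional[float] = None,
) -> UserToken:
    """Return `token` unchanged while valid, otherwise the result of `refresh`."""
    if token.is_expired(now):
        return refresh(token.refresh_token)
    return token


if __name__ == "__main__":
    tok = UserToken("u-demo", "r-demo", obtained_at=time.time())
    print(tok.is_expired())
```

Refreshing before each collection run, rather than reacting to a failed request, avoids a half-finished pagination loop when the token lapses midway.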

 ---

@@ -5,17 +5,38 @@
|
||||
输入同事姓名,自动:
|
||||
1. 搜索飞书用户,获取 user_id
|
||||
2. 找到与他共同的群聊,拉取他的消息记录
|
||||
3. 搜索他创建/编辑的文档和 Wiki
|
||||
4. 拉取文档内容
|
||||
5. 拉取多维表格(如有)
|
||||
6. 输出统一格式,直接进 create-colleague 分析流程
|
||||
3. 拉取私聊消息(需要 user_access_token)
|
||||
4. 搜索他创建/编辑的文档和 Wiki
|
||||
5. 拉取文档内容
|
||||
6. 拉取多维表格(如有)
|
||||
7. 输出统一格式,直接进 create-colleague 分析流程
|
||||
|
||||
前置:
|
||||
python3 feishu_auto_collector.py --setup # 配置 App ID / Secret(一次性)
|
||||
|
||||
私聊采集(需额外步骤):
|
||||
1. 飞书应用开通用户权限:im:message, im:chat
|
||||
2. 获取 OAuth 授权码:
|
||||
浏览器打开: https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri=http://www.example.com&scope=im:message%20im:chat
|
||||
授权后从地址栏复制 code
|
||||
3. 换取 token:
|
||||
python3 feishu_auto_collector.py --exchange-code {CODE}
|
||||
4. 采集时指定私聊 chat_id:
|
||||
python3 feishu_auto_collector.py --name "张三" --p2p-chat-id oc_xxx
|
||||
|
||||
用法:
|
||||
# 群聊采集(原有方式)
|
||||
python3 feishu_auto_collector.py --name "张三" --output-dir ./knowledge/zhangsan
|
||||
python3 feishu_auto_collector.py --name "张三" --msg-limit 1000 --doc-limit 20
|
||||
|
||||
# 私聊采集
|
||||
python3 feishu_auto_collector.py --name "张三" --p2p-chat-id oc_xxx
|
||||
|
||||
# 直接指定 open_id + 私聊(跳过用户搜索)
|
||||
python3 feishu_auto_collector.py --open-id ou_xxx --p2p-chat-id oc_xxx --name "张三"
|
||||
|
||||
# 换取 user_access_token
|
||||
python3 feishu_auto_collector.py --exchange-code {CODE}
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
@@ -57,11 +78,15 @@ def setup_config() -> None:
|
||||
print("=== 飞书自动采集配置 ===\n")
|
||||
print("请前往 https://open.feishu.cn 创建企业自建应用,开通以下权限:")
|
||||
print()
|
||||
print(" 消息类:")
|
||||
print(" 消息类(应用权限,用于群聊采集):")
|
||||
print(" im:message:readonly 读取消息")
|
||||
print(" im:chat:readonly 读取群聊信息")
|
||||
print(" im:chat.members:readonly 读取群成员")
|
||||
print()
|
||||
print(" 消息类(用户权限,用于私聊采集):")
|
||||
print(" im:message 以用户身份读取/发送消息")
|
||||
print(" im:chat 以用户身份读取会话列表")
|
||||
print()
|
||||
print(" 用户类:")
|
||||
print(" contact:user.base:readonly 读取用户基本信息")
|
||||
print(" contact:department.base:readonly 遍历部门查找用户(按姓名搜索必需)")
|
||||
@@ -74,11 +99,26 @@ def setup_config() -> None:
|
||||
print(" 多维表格:")
|
||||
print(" bitable:app:readonly 读取多维表格")
|
||||
print()
|
||||
print(" ─── 私聊采集说明 ───")
|
||||
print(" 私聊消息必须通过 user_access_token 获取(应用身份无权访问私聊)。")
|
||||
print(" 获取方式:OAuth 授权,授权链接格式:")
|
||||
print(" https://open.feishu.cn/open-apis/authen/v1/authorize?app_id={APP_ID}&redirect_uri={REDIRECT}&scope=im:message%20im:chat")
|
||||
print(" 授权后从回调 URL 中取 code,用 --exchange-code 换取 token。")
|
||||
print()
|
||||
|
||||
app_id = input("App ID (cli_xxx): ").strip()
|
||||
app_secret = input("App Secret: ").strip()
|
||||
|
||||
config = {"app_id": app_id, "app_secret": app_secret}
|
||||
|
||||
print("\n是否配置 user_access_token?(用于私聊消息采集,可跳过)")
|
||||
user_token = input("user_access_token (留空跳过): ").strip()
|
||||
if user_token:
|
||||
config["user_access_token"] = user_token
|
||||
p2p_chat_id = input("私聊 chat_id (留空跳过): ").strip()
|
||||
if p2p_chat_id:
|
||||
config["p2p_chat_id"] = p2p_chat_id
|
||||
|
||||
save_config(config)
|
||||
print(f"\n✅ 配置已保存到 {CONFIG_PATH}")
|
||||
|
||||
@@ -110,8 +150,11 @@ def get_tenant_token(config: dict) -> str:
     return token


-def api_get(path: str, params: dict, config: dict) -> dict:
-    token = get_tenant_token(config)
+def api_get(path: str, params: dict, config: dict, use_user_token: bool = False) -> dict:
+    if use_user_token and config.get("user_access_token"):
+        token = config["user_access_token"]
+    else:
+        token = get_tenant_token(config)
     resp = requests.get(
         f"{BASE_URL}{path}",
         params=params,
@@ -121,8 +164,11 @@ def api_get(path: str, params: dict, config: dict) -> dict:
     return resp.json()


-def api_post(path: str, body: dict, config: dict) -> dict:
-    token = get_tenant_token(config)
+def api_post(path: str, body: dict, config: dict, use_user_token: bool = False) -> dict:
+    if use_user_token and config.get("user_access_token"):
+        token = config["user_access_token"]
+    else:
+        token = get_tenant_token(config)
     resp = requests.post(
         f"{BASE_URL}{path}",
         json=body,
@@ -132,6 +178,22 @@ def api_post(path: str, body: dict, config: dict) -> dict:
     return resp.json()


+def exchange_code_for_token(code: str, config: dict) -> dict:
+    """Exchange an OAuth authorization code for a user_access_token"""
+    app_token = get_tenant_token(config)
+    resp = requests.post(
+        f"{BASE_URL}/authen/v1/oidc/access_token",
+        headers={"Authorization": f"Bearer {app_token}"},
+        json={"grant_type": "authorization_code", "code": code},
+        timeout=10,
+    )
+    data = resp.json()
+    if data.get("code") != 0:
+        print(f"换取 token 失败:{data}", file=sys.stderr)
+        return {}
+    return data.get("data", {})
+
+
 # ─── User search ─────────────────────────────────────────────────────────────────

 def _find_user_by_contact(name: str, config: dict) -> Optional[dict]:
@@ -421,42 +483,156 @@ def fetch_messages_from_chat(
     return messages[:limit]


+def fetch_p2p_messages(
+    chat_id: str,
+    user_open_id: str,
+    limit: int,
+    config: dict,
+) -> list:
+    """Fetch messages from a private chat using user_access_token (includes all messages from both parties)"""
+    messages = []
+    page_token = None
+
+    while len(messages) < limit:
+        params = {
+            "container_id_type": "chat",
+            "container_id": chat_id,
+            "page_size": 50,
+            "sort_type": "ByCreateTimeDesc",
+        }
+        if page_token:
+            params["page_token"] = page_token
+
+        data = api_get("/im/v1/messages", params, config, use_user_token=True)
+        if data.get("code") != 0:
+            print(f" 拉取私聊消息失败(code={data.get('code')}):{data.get('msg')}", file=sys.stderr)
+            break
+
+        items = data.get("data", {}).get("items", [])
+        if not items:
+            break
+
+        for item in items:
+            sender = item.get("sender", {})
+            sender_id = sender.get("id") or sender.get("open_id", "")
+
+            # Parse the message content
+            content_raw = item.get("body", {}).get("content", "")
+            try:
+                content_obj = json.loads(content_raw)
+                if isinstance(content_obj, dict):
+                    # Plain-text message
+                    if "text" in content_obj:
+                        content = content_obj["text"]
+                    else:
+                        # Rich-text message
+                        text_parts = []
+                        for line in content_obj.get("content", []):
+                            for seg in line:
+                                if seg.get("tag") in ("text", "a"):
+                                    text_parts.append(seg.get("text", ""))
+                        content = " ".join(text_parts)
+                else:
+                    content = str(content_obj)
+            except Exception:
+                content = content_raw
+
+            content = content.strip()
+            if not content or content in ("[图片]", "[文件]", "[表情]", "[语音]"):
+                continue
+
+            ts = item.get("create_time", "")
+            if ts:
+                try:
+                    ts = datetime.fromtimestamp(int(ts) / 1000).strftime("%Y-%m-%d %H:%M")
+                except Exception:
+                    pass
+
+            is_target = (sender_id == user_open_id)
+            messages.append({
+                "content": content,
+                "time": ts,
+                "sender_id": sender_id,
+                "is_target": is_target,
+            })
+
+        if not data.get("data", {}).get("has_more"):
+            break
+        page_token = data.get("data", {}).get("page_token")
+
+    return messages[:limit]
+
+
 def collect_messages(
     user: dict,
     msg_limit: int,
     config: dict,
 ) -> str:
-    """Collect all message records of the target user"""
+    """Collect all message records of the target user (group chats + private chat)"""
     user_open_id = user.get("open_id") or user.get("user_id", "")
     name = user.get("name", "")

-    chats = get_chats_with_user(user_open_id, config)
-    if not chats:
-        return f"# 消息记录\n\n未找到与 {name} 共同的群聊(请确认 bot 已被添加到相关群)\n"
-
     all_messages = []
-    per_chat_limit = max(100, msg_limit // len(chats))
+    chat_sources = []

-    for chat in chats:
-        chat_id = chat.get("chat_id")
-        chat_name = chat.get("name", chat_id)
-        print(f" 拉取「{chat_name}」消息 ...", file=sys.stderr)
+    # ── Private-chat collection (requires user_access_token + p2p_chat_id) ──
+    p2p_chat_id = config.get("p2p_chat_id", "")
+    user_token = config.get("user_access_token", "")

-        msgs = fetch_messages_from_chat(chat_id, user_open_id, per_chat_limit, config)
-        for m in msgs:
-            m["chat"] = chat_name
-        all_messages.extend(msgs)
-        print(f" 获取 {len(msgs)} 条", file=sys.stderr)
+    if user_token and p2p_chat_id:
+        print(f" 📱 采集私聊消息(chat_id: {p2p_chat_id})...", file=sys.stderr)
+        p2p_msgs = fetch_p2p_messages(p2p_chat_id, user_open_id, msg_limit, config)
+        for m in p2p_msgs:
+            m["chat"] = "私聊"
+        all_messages.extend(p2p_msgs)
+        chat_sources.append(f"私聊({len(p2p_msgs)} 条)")
+        print(f" 获取 {len(p2p_msgs)} 条私聊消息", file=sys.stderr)
+    elif user_token and not p2p_chat_id:
+        print(f" ⚠️ 有 user_access_token 但未配置 p2p_chat_id,跳过私聊采集", file=sys.stderr)
+        print(f" 请在配置中添加 p2p_chat_id(通过发送消息 API 返回值获取)", file=sys.stderr)
+
+    # ── Group-chat collection (uses tenant_access_token) ──
+    remaining = msg_limit - len(all_messages)
+    if remaining > 0:
+        chats = get_chats_with_user(user_open_id, config)
+        if chats:
+            per_chat_limit = max(100, remaining // len(chats))
+            for chat in chats:
+                chat_id = chat.get("chat_id")
+                chat_name = chat.get("name", chat_id)
+                print(f" 拉取「{chat_name}」消息 ...", file=sys.stderr)
+
+                msgs = fetch_messages_from_chat(chat_id, user_open_id, per_chat_limit, config)
+                for m in msgs:
+                    m["chat"] = chat_name
+                all_messages.extend(msgs)
+                chat_sources.append(f"{chat_name}({len(msgs)} 条)")
+                print(f" 获取 {len(msgs)} 条", file=sys.stderr)

+    if not all_messages:
+        tips = f"# 消息记录\n\n未找到 {name} 的消息记录。\n\n"
+        tips += "可能原因:\n"
+        tips += " - 群聊采集:bot 未被添加到相关群聊\n"
+        tips += " - 私聊采集:未配置 user_access_token 或 p2p_chat_id\n"
+        tips += "\n私聊采集配置方法:\n"
+        tips += " 1. 在飞书开放平台开通 im:message 和 im:chat 用户权限\n"
+        tips += " 2. 通过 OAuth 授权获取 user_access_token(--exchange-code)\n"
+        tips += " 3. 配置 p2p_chat_id(私聊会话 ID)\n"
+        return tips
+
     # Categorized output
-    long_msgs = [m for m in all_messages if len(m.get("content", "")) > 50]
-    short_msgs = [m for m in all_messages if len(m.get("content", "")) <= 50]
+    # Private-chat messages include both parties, so tag who is speaking
+    target_msgs = [m for m in all_messages if m.get("is_target", True)]
+    other_msgs = [m for m in all_messages if not m.get("is_target", True)]
+
+    long_msgs = [m for m in target_msgs if len(m.get("content", "")) > 50]
+    short_msgs = [m for m in target_msgs if len(m.get("content", "")) <= 50]

     lines = [
         f"# 飞书消息记录(自动采集)",
         f"目标:{name}",
-        f"来源群聊:{', '.join(c.get('name', '') for c in chats)}",
-        f"共 {len(all_messages)} 条消息",
+        f"来源:{', '.join(chat_sources)}",
+        f"共 {len(all_messages)} 条消息(目标用户 {len(target_msgs)} 条,对话方 {len(other_msgs)} 条)",
         "",
         "---",
         "",
@@ -471,6 +647,16 @@ def collect_messages(
     for m in short_msgs[:300]:
         lines.append(f"[{m.get('time', '')}] {m['content']}")

+    # Private-chat context (keeps both parties' messages so the exchange can be followed)
+    p2p_msgs = [m for m in all_messages if m.get("chat") == "私聊"]
+    if p2p_msgs:
+        lines += ["", "---", "", "## 私聊对话上下文(含双方消息)", ""]
+        # Chronological order
+        p2p_sorted = sorted(p2p_msgs, key=lambda x: x.get("time", ""))
+        for m in p2p_sorted[:500]:
+            who = f"[{name}]" if m.get("is_target") else "[对方]"
+            lines.append(f"[{m.get('time', '')}] {who} {m['content']}")
+
     return "\n".join(lines)


@@ -707,6 +893,10 @@ def main() -> None:
     parser.add_argument("--output-dir", default=None, help="输出目录(默认 ./knowledge/{name})")
     parser.add_argument("--msg-limit", type=int, default=1000, help="最多采集消息条数(默认 1000)")
     parser.add_argument("--doc-limit", type=int, default=20, help="最多采集文档篇数(默认 20)")
+    parser.add_argument("--exchange-code", metavar="CODE", help="用 OAuth 授权码换取 user_access_token 并保存到配置")
+    parser.add_argument("--user-token", metavar="TOKEN", help="直接指定 user_access_token(覆盖配置文件)")
+    parser.add_argument("--p2p-chat-id", metavar="CHAT_ID", help="私聊会话 ID(覆盖配置文件)")
+    parser.add_argument("--open-id", metavar="OPEN_ID", help="直接指定目标用户的 open_id(跳过用户搜索)")

     args = parser.parse_args()

@@ -714,11 +904,45 @@ def main() -> None:
         setup_config()
         return

-    if not args.name:
-        parser.error("请提供 --name")
-
     config = load_config()
-    output_dir = Path(args.output_dir) if args.output_dir else Path(f"./knowledge/{args.name}")

+    # Exchange the authorization code for a user_access_token
+    if args.exchange_code:
+        token_data = exchange_code_for_token(args.exchange_code, config)
+        if token_data:
+            config["user_access_token"] = token_data["access_token"]
+            config["refresh_token"] = token_data.get("refresh_token", "")
+            save_config(config)
+            print(f"✅ user_access_token 已保存(scope: {token_data.get('scope', '')})")
+            print(f" token: {token_data['access_token'][:20]}...")
+        else:
+            print("❌ 换取失败,请检查 code 是否有效")
+        return
+
+    if not args.name and not args.open_id:
+        parser.error("请提供 --name 或 --open-id")
+
+    # Command-line arguments override the config file
+    if args.user_token:
+        config["user_access_token"] = args.user_token
+    if args.p2p_chat_id:
+        config["p2p_chat_id"] = args.p2p_chat_id
+
+    output_dir = Path(args.output_dir) if args.output_dir else Path(f"./knowledge/{args.name or 'target'}")
+
+    # If an open_id was provided, skip the user search
+    if args.open_id:
+        user = {"open_id": args.open_id, "name": args.name or "target"}
+        output_dir.mkdir(parents=True, exist_ok=True)
+        print(f"\n🔍 使用指定 open_id: {args.open_id}\n", file=sys.stderr)
+
+        # Collect messages only
+        print(f"📨 采集消息记录(上限 {args.msg_limit} 条)...", file=sys.stderr)
+        msg_content = collect_messages(user, args.msg_limit, config)
+        msg_path = output_dir / "messages.txt"
+        msg_path.write_text(msg_content, encoding="utf-8")
+        print(f" ✅ 消息记录 → {msg_path}", file=sys.stderr)
+        return
+
     collect_all(
         name=args.name,