Add browser tool

violettools
2026-01-28 17:18:11 +08:00
parent 70486331b7
commit 9dc80e3eb1
38 changed files with 18750 additions and 1 deletion

.github/workflows/build-browser.yml (new file, +145 lines)

@@ -0,0 +1,145 @@
name: Build CFspider Smart Browser
on:
  push:
    tags:
      - 'browser-v*'
  workflow_dispatch:
    inputs:
      version:
        description: 'Version tag (e.g., 1.0.0)'
        required: false
        default: ''
jobs:
  build-windows:
    runs-on: windows-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
          cache-dependency-path: cfspider-browser/package-lock.json
      - name: Install dependencies
        working-directory: cfspider-browser
        run: npm ci
      - name: Build Electron app
        working-directory: cfspider-browser
        run: npm run electron:build-win
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Upload Windows artifacts
        uses: actions/upload-artifact@v4
        with:
          name: cfspider-browser-windows
          path: cfspider-browser/release/*.exe
          retention-days: 30
  build-macos:
    runs-on: macos-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
          cache-dependency-path: cfspider-browser/package-lock.json
      - name: Install dependencies
        working-directory: cfspider-browser
        run: npm ci
      - name: Build Electron app
        working-directory: cfspider-browser
        run: npm run electron:build-mac
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Upload macOS artifacts
        uses: actions/upload-artifact@v4
        with:
          name: cfspider-browser-macos
          path: |
            cfspider-browser/release/*.dmg
            cfspider-browser/release/*.zip
          retention-days: 30
  build-linux:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
          cache-dependency-path: cfspider-browser/package-lock.json
      - name: Install dependencies
        working-directory: cfspider-browser
        run: npm ci
      - name: Build Electron app
        working-directory: cfspider-browser
        run: npm run electron:build-linux
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Upload Linux artifacts
        uses: actions/upload-artifact@v4
        with:
          name: cfspider-browser-linux
          path: |
            cfspider-browser/release/*.AppImage
            cfspider-browser/release/*.deb
          retention-days: 30
  release:
    needs: [build-windows, build-macos, build-linux]
    runs-on: ubuntu-latest
    if: startsWith(github.ref, 'refs/tags/')
    permissions:
      contents: write
    steps:
      - name: Download all artifacts
        uses: actions/download-artifact@v4
        with:
          path: artifacts
      - name: Create Release
        uses: softprops/action-gh-release@v1
        with:
          name: CFspider Smart Browser ${{ github.ref_name }}
          body: |
            ## CFspider Smart Browser ${{ github.ref_name }}
            An AI-driven smart browser that automates browsing through natural-language conversation.
            ### Downloads
            - **Windows**: cfspider-browser-Setup-*.exe
            - **macOS**: cfspider-browser-*.dmg
            - **Linux**: cfspider-browser-*.AppImage
            ### Features
            - Control the browser with natural language
            - Supports many AI models through Ollama, OpenAI, DeepSeek, and other providers
            - Human-like interaction with a visible virtual mouse
            - Website safety checks
          files: |
            artifacts/**/*
          draft: false
          prerelease: false
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

.gitignore (modified, +23 lines)

@@ -66,3 +66,26 @@ cfspider教程.mp4
# Remotion video project
cfspider-video/
# ========================================
# Residential proxy configuration docs (contain sensitive information)
# ========================================
docs/xray-residential-proxy.md
# ========================================
# Tauri version (in development, not committed yet)
# ========================================
cfspider-tauri/
# ========================================
# cfspider-browser Electron project
# ========================================
cfspider-browser/node_modules/
cfspider-browser/dist/
cfspider-browser/dist-electron/
cfspider-browser/release/
cfspider-browser/.vscode/
cfspider-browser/*.log
# Cursor workspace
.cursor/


@@ -1,4 +1,4 @@
# CFspider - Cloudflare Workers Spider
[![PyPI version](https://img.shields.io/pypi/v/cfspider)](https://pypi.org/project/cfspider/)
[![Python](https://img.shields.io/pypi/pyversions/cfspider)](https://pypi.org/project/cfspider/)
@@ -2563,6 +2563,43 @@ Apache License 2.0
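This project is licensed under the Apache License 2.0, which already includes disclaimer clauses (Sections 7 and 8). Please read the [LICENSE](LICENSE) file carefully.
## CFspider Smart Browser
The CFspider project now includes an AI-driven smart browser application (cfspider-browser) that lets you control the browser through natural-language conversation.
### Core Features
- **AI assistant**: control the browser through natural-language conversation, with support for multiple AI models
- **Human-like interaction**: the AI clicks, types, and scrolls the way a real person would, and every step is shown on screen
- **Virtual mouse**: animated mouse movements and clicks
- **Multi-tab browsing**: open, close, and switch tabs
### Supported AI Providers
| Provider | Notes |
|--------|------|
| **Ollama** | Runs locally, no API key needed (recommended) |
| OpenAI | GPT-4o, GPT-4, GPT-3.5 |
| DeepSeek | deepseek-chat, deepseek-reasoner |
| Groq | Very fast inference |
| Moonshot | Kimi models |
| Zhipu AI | GLM-4 series |
| Tongyi Qianwen | Qwen series |
| SiliconFlow | Aggregation platform for Chinese models |
| Custom | Any OpenAI-compatible API |

**Custom model names supported**: you can type any model name directly. You are not limited to the preset list.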
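Every provider above is reached through an OpenAI-compatible chat-completions API. A request therefore needs only a model name, the message list, and an optional bearer token. The sketch below illustrates that request shape using the fields the app's `ai:chat` IPC handler sends: `model`, `messages`, and `stream`. The `tools` field is omitted for brevity, and an `Authorization: Bearer` header is added only when a key is set. `buildChatRequest` and the model name `qwen2.5:7b` are hypothetical examples and are not part of the project.

```typescript
// Hypothetical helper: builds fetch() options for an OpenAI-compatible
// chat-completions endpoint (tools omitted for brevity).
interface ChatMessage {
  role: string
  content: string
}

interface ChatRequestInit {
  method: 'POST'
  headers: Record<string, string>
  body: string
}

function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  apiKey?: string,
  stream = false
): ChatRequestInit {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' }
  // Keyless local services such as Ollama get no Authorization header
  if (apiKey) {
    headers['Authorization'] = `Bearer ${apiKey}`
  }
  return {
    method: 'POST',
    headers,
    // Any model name is passed through unchanged, so custom names just work
    body: JSON.stringify({ model, messages, stream })
  }
}

const init = buildChatRequest('qwen2.5:7b', [{ role: 'user', content: 'Open GitHub' }])
console.log(init.headers, init.body)
```

You can pass the returned object to `fetch(endpoint, init)`. For a local Ollama server, the OpenAI-compatible endpoint is usually `http://localhost:11434/v1/chat/completions`.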
### Quick Start
```bash
cd cfspider-browser
npm install
npm run electron:dev
```
For detailed documentation, see [cfspider-browser/README.md](cfspider-browser/README.md)
## Links
- GitHub: https://github.com/violettoolssite/CFspider

baidu_result.png (new binary file, 192 KiB, not shown)

bing_result.png (new binary file, 943 KiB, not shown)

cfspider-browser/.gitignore (new file, +23 lines)

@@ -0,0 +1,23 @@
# Dependencies
node_modules/
# Build outputs
dist/
dist-electron/
release/
# IDE
.vscode/
.idea/
# OS
.DS_Store
Thumbs.db
# Logs
*.log
npm-debug.log*
# Environment
.env
.env.local

cfspider-browser/README.md (new file, +130 lines)

@@ -0,0 +1,130 @@
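# cfspider-智能浏览器 (cfspider Smart Browser)
An AI-driven smart browser: control the browser through natural-language conversation and let it operate web pages the way a real person would.
## Features
### Core Features
- **AI assistant**: control the browser through natural-language conversation, with support for multiple AI models
- **Human-like interaction**: the AI clicks, types, and scrolls the way a real person would, and every step is shown on screen
- **Virtual mouse**: animated mouse movements and clicks make the AI's actions easy to follow
### Browser Features
- **Multiple tabs**: open, close, and switch tabs (Ctrl+T / Ctrl+W)
- **History**: visits are recorded automatically and can be viewed or cleared
- **Search engine switching**: Bing, Google, Baidu, and DuckDuckGo
- **Auto-click verification**: automatically clicks through age-verification prompts, cookie-consent dialogs, and similar pop-ups
### Keyboard Shortcuts
- `Ctrl+T` - New tab
- `Ctrl+W` - Close the current tab
- `Ctrl+R` / `F5` - Reload the page
- `Alt+←` / `Alt+→` - Back / Forward
- `Ctrl+L` - Focus the address bar
- `F12` - Developer tools for the current page
- `Ctrl+Shift+I` - Developer tools for the app itself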
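The Electron main process catches these shortcuts through the window's `before-input-event`. It forwards most of them to the renderer as IPC messages (`new-tab`, `close-tab`, `reload-webview`, and so on), while `F12` and `Ctrl+Shift+I` toggle DevTools directly. The sketch below restates that dispatch as a lookup table. `KeyInput`, `shortcuts`, `channelFor`, and `press` are hypothetical names, and `KeyInput` is a simplified stand-in for Electron's `Input` type.

```typescript
// Simplified stand-in for Electron's `Input` object (only the fields used here)
interface KeyInput {
  type: 'keyDown' | 'keyUp'
  key: string
  control: boolean
  shift: boolean
  alt: boolean
}

// Hypothetical lookup table: key combination -> IPC channel sent to the renderer.
// The first matching entry wins.
const shortcuts: Array<{ matches: (i: KeyInput) => boolean; channel: string }> = [
  { matches: i => i.key === 'F5' || (i.control && i.key.toLowerCase() === 'r'), channel: 'reload-webview' },
  { matches: i => i.alt && i.key === 'ArrowLeft', channel: 'navigate-back' },
  { matches: i => i.alt && i.key === 'ArrowRight', channel: 'navigate-forward' },
  { matches: i => i.control && i.key.toLowerCase() === 'l', channel: 'focus-addressbar' },
  { matches: i => i.control && i.key.toLowerCase() === 't', channel: 'new-tab' },
  { matches: i => i.control && i.key.toLowerCase() === 'w', channel: 'close-tab' }
]

// Return the channel for a key press, or null if it is not a shortcut.
// Key releases are ignored so that each press triggers exactly once.
function channelFor(input: KeyInput): string | null {
  if (input.type !== 'keyDown') return null
  const hit = shortcuts.find(s => s.matches(input))
  return hit ? hit.channel : null
}

// Helper: build a keyDown event with optional modifier overrides
const press = (key: string, mods: Partial<KeyInput> = {}): KeyInput =>
  ({ type: 'keyDown', key, control: false, shift: false, alt: false, ...mods })

console.log(channelFor(press('t', { control: true })))    // prints new-tab
console.log(channelFor(press('ArrowLeft', { alt: true }))) // prints navigate-back
```

A table keeps each combination in one place. That makes side effects easier to spot: for instance, `Ctrl+Shift+R` also hits the reload entry, because only `control` and the key are checked.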
## Quick Start
### Install dependencies
```bash
npm install
```
### Development mode
```bash
npm run electron:dev
```
### Build the app
```bash
# Windows
npm run electron:build-win
# macOS
npm run electron:build-mac
# Linux
npm run electron:build-linux
```
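## Usage
### 1. Configure the AI
1. Click the settings button in the top-right corner
2. Choose an AI provider or enter a custom API endpoint
3. Enter your API key
4. Choose a model

Supported AI providers:
- **Ollama** - runs locally, no API key needed (recommended)
- OpenAI (GPT-4, GPT-3.5)
- DeepSeek
- Groq
- Moonshot (Kimi)
- Zhipu AI
- Tongyi Qianwen (Qwen)
- SiliconFlow
- Any other OpenAI-compatible API

**Custom model names supported**: you can type any model name directly into the model drop-down.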
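The app waives the API-key requirement for local or LAN endpoints such as Ollama. Below is a minimal sketch of such a check. It assumes "local" means `localhost`, port 11434 (Ollama's default), or a loopback or private IPv4 address. `isLocalEndpoint` here is an illustrative standalone function, not the project's exported API. It parses the URL and inspects the hostname, so a public host whose name merely contains `10.` is not misclassified.

```typescript
// Decide whether an endpoint is a local/LAN service that needs no API key.
// Assumption: local = localhost, port 11434, or a loopback/private IPv4 host.
function isLocalEndpoint(endpoint: string): boolean {
  let url: URL
  try {
    url = new URL(endpoint)
  } catch {
    return false // not a valid absolute URL
  }
  const host = url.hostname
  // 11434 is Ollama's default port
  if (host === 'localhost' || url.port === '11434') return true
  const octets = host.split('.').map(Number)
  const isIPv4 = octets.length === 4 && octets.every(n => Number.isInteger(n) && n >= 0 && n <= 255)
  if (!isIPv4) return false
  const [a, b] = octets
  return a === 127 ||                 // loopback 127.0.0.0/8
    a === 10 ||                       // private 10.0.0.0/8
    (a === 192 && b === 168) ||       // private 192.168.0.0/16
    (a === 172 && b >= 16 && b <= 31) // private 172.16.0.0/12
}

console.log(isLocalEndpoint('http://localhost:11434/v1/chat/completions'))
console.log(isLocalEndpoint('https://api.openai.com/v1/chat/completions'))
```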
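### 2. Chat with the AI
Click the blue button in the bottom-right corner to open the AI chat panel, then type an instruction in natural language:
- "Open GitHub" - the AI searches for it with the search engine and clicks the result
- "Search for Python tutorials" - searches with the current search engine
- "Change the search engine to Google" - switches the default search engine
- "Search for vue on GitHub" - opens GitHub first, then searches there
- "Go back" - clicks the Back button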
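Replies appear in the chat panel while they are being generated. The main process reads the provider's Server-Sent Events stream, keeps only `data: ` lines, skips `data: [DONE]`, and forwards `choices[0].delta.content`. The sketch below shows that parsing step in isolation, assuming OpenAI-style streaming chunks. `parseSSE` and `ParseResult` are hypothetical names.

```typescript
// Hypothetical helper: split buffered SSE text into complete lines and
// extract the text deltas from OpenAI-style streaming chunks.
interface ParseResult {
  contents: string[] // text deltas found in complete lines
  rest: string       // trailing partial line, kept for the next chunk
}

function parseSSE(buffer: string): ParseResult {
  const lines = buffer.split('\n')
  const rest = lines.pop() ?? '' // the last element may be an incomplete line
  const contents: string[] = []
  for (const line of lines) {
    const trimmed = line.trim()
    if (!trimmed.startsWith('data: ') || trimmed === 'data: [DONE]') continue
    try {
      const json = JSON.parse(trimmed.slice(6))
      const content = json.choices?.[0]?.delta?.content
      if (typeof content === 'string' && content) contents.push(content)
    } catch {
      // Skip malformed JSON lines
    }
  }
  return { contents, rest }
}

// A network chunk can end mid-line, so carry `rest` forward between reads
let pending = ''
const output: string[] = []
for (const chunk of [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}\ndata: {"choi',
  'ces":[{"delta":{"content":"lo"}}]}\ndata: [DONE]\n'
]) {
  const { contents, rest } = parseSSE(pending + chunk)
  output.push(...contents)
  pending = rest
}
console.log(output.join('')) // prints Hello
```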
### 3. Search engine settings
1. Open Settings → Search Engine
2. Choose the default search engine
3. The setting is saved automatically
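If no setting has been saved, the main process falls back to `bing` as the search engine. Below is a sketch of how the engine setting could turn typed input into a URL. The `SEARCH_URLS` entries are the engines' public search URLs. `resolveAddressBarInput` and its "looks like a URL" heuristic are assumptions for illustration, not the app's actual logic.

```typescript
type SearchEngine = 'bing' | 'google' | 'baidu' | 'duckduckgo'

// Public search URL prefixes; the query string is appended URL-encoded
const SEARCH_URLS: Record<SearchEngine, string> = {
  bing: 'https://www.bing.com/search?q=',
  google: 'https://www.google.com/search?q=',
  baidu: 'https://www.baidu.com/s?wd=',
  duckduckgo: 'https://duckduckgo.com/?q='
}

// Hypothetical: treat input as a URL if it looks like one, otherwise search for it
function resolveAddressBarInput(input: string, engine: SearchEngine = 'bing'): string {
  const text = input.trim()
  if (/^https?:\/\//i.test(text)) return text
  // Heuristic: a single token containing a dot (e.g. "github.com") is a host name
  if (!/\s/.test(text) && text.includes('.')) return `https://${text}`
  return SEARCH_URLS[engine] + encodeURIComponent(text)
}

console.log(resolveAddressBarInput('Python tutorial', 'google'))
// prints https://www.google.com/search?q=Python%20tutorial
```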
## Tech Stack
- **Electron** - desktop application framework
- **React 18** - UI framework
- **TypeScript** - type safety
- **Tailwind CSS** - styling
- **Zustand** - state management
- **Vite** - build tool
## Project Structure
```
cfspider-browser/
├── electron/                      # Electron main process
│   ├── main.ts                    # Main-process entry point
│   └── preload.ts                 # Preload script
├── src/
│   ├── components/                # React components
│   │   ├── Browser/               # Browser panel
│   │   │   ├── Browser.tsx
│   │   │   ├── TabBar.tsx         # Tab bar
│   │   │   ├── Toolbar.tsx        # Toolbar
│   │   │   ├── AddressBar.tsx     # Address bar
│   │   │   └── VirtualMouse.tsx   # Virtual mouse
│   │   ├── AIChat/                # AI chat
│   │   └── Settings/              # Settings
│   ├── services/                  # Service layer
│   │   └── ai.ts                  # AI API and tools
│   └── store/                     # Zustand state management
└── package.json
```
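## Data Storage
Application data is stored in the app's per-user data directory (Electron's `app.getPath('userData')`):
- `ai-config.json` - AI configuration
- `saved-configs.json` - saved AI configurations
- `browser-settings.json` - browser settings (search engine, etc.)
- `history.json` - browsing history
- `rules.json` - saved rules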
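Each of these files is read with a fallback. If a file is missing or cannot be parsed, the main process returns built-in defaults, for example `searchEngine: 'bing'` and `defaultZoom: 100` for browser settings. Saves are written as pretty-printed JSON. Below is a synchronous sketch of that pattern, using a temporary directory in place of `app.getPath('userData')`. `loadJson` and `saveJson` are hypothetical helpers, not functions exported by the app.

```typescript
import { readFileSync, writeFileSync, mkdtempSync } from 'fs'
import { join } from 'path'
import { tmpdir } from 'os'

// Read a JSON file, returning `fallback` if it is missing or not valid JSON
function loadJson<T>(filePath: string, fallback: T): T {
  try {
    return JSON.parse(readFileSync(filePath, 'utf-8')) as T
  } catch {
    return fallback
  }
}

// Write data as pretty-printed JSON (2-space indent)
function saveJson(filePath: string, data: unknown): void {
  writeFileSync(filePath, JSON.stringify(data, null, 2))
}

// A temporary directory stands in for app.getPath('userData')
const dir = mkdtempSync(join(tmpdir(), 'cfspider-'))
const settingsPath = join(dir, 'browser-settings.json')
const defaults = { searchEngine: 'bing', homepage: 'https://www.bing.com', defaultZoom: 100 }

console.log(loadJson(settingsPath, defaults).searchEngine) // prints bing (file does not exist yet)
saveJson(settingsPath, { ...defaults, searchEngine: 'google' })
console.log(loadJson(settingsPath, defaults).searchEngine) // prints google
```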
## License
MIT


@@ -0,0 +1,56 @@
{
"$schema": "https://raw.githubusercontent.com/electron-userland/electron-builder/master/packages/app-builder-lib/scheme.json",
"appId": "com.cfspider.browser",
"productName": "cfspider-智能浏览器",
"directories": {
"output": "release",
"buildResources": "build"
},
"files": [
"dist/**/*",
"dist-electron/**/*"
],
"extraMetadata": {
"main": "dist-electron/main.js"
},
"win": {
"target": [
{
"target": "nsis",
"arch": ["x64"]
},
{
"target": "portable",
"arch": ["x64"]
}
],
"icon": "build/icon.ico",
"artifactName": "${productName}-${version}-win-${arch}.${ext}"
},
"nsis": {
"oneClick": false,
"allowToChangeInstallationDirectory": true,
"createDesktopShortcut": true,
"createStartMenuShortcut": true
},
"mac": {
"target": [
{
"target": "dmg",
"arch": ["x64", "arm64"]
}
],
"icon": "build/icon.icns",
"artifactName": "${productName}-${version}-mac-${arch}.${ext}"
},
"linux": {
"target": [
{
"target": "AppImage",
"arch": ["x64"]
}
],
"icon": "build/icons",
"artifactName": "${productName}-${version}-linux-${arch}.${ext}"
}
}


@@ -0,0 +1,612 @@
import { app, BrowserWindow, ipcMain, session, Menu, webContents, dialog } from 'electron'
import { join } from 'path'
import { writeFile, mkdir } from 'fs/promises'
import { existsSync } from 'fs'
import https from 'https'
import http from 'http'
let mainWindow: BrowserWindow | null = null
let webviewContents: Electron.WebContents | null = null
function createWindow() {
// Hide the menu bar
Menu.setApplicationMenu(null)
mainWindow = new BrowserWindow({
width: 1400,
height: 900,
minWidth: 1000,
minHeight: 700,
title: 'cfspider-智能浏览器',
autoHideMenuBar: true,
backgroundColor: '#ffffff',
webPreferences: {
preload: join(__dirname, 'preload.js'),
nodeIntegration: false,
contextIsolation: true,
webviewTag: true
}
})
// In development (or when unpackaged), load the Vite dev server
if (process.env.NODE_ENV === 'development' || !app.isPackaged) {
mainWindow.loadURL('http://localhost:5173')
} else {
mainWindow.loadFile(join(__dirname, '../dist/index.html'))
}
mainWindow.on('closed', () => {
mainWindow = null
})
// Register keyboard shortcuts
registerShortcuts()
}
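// Register keyboard shortcuts
function registerShortcuts() {
if (!mainWindow) return
// Listen for shortcut keys. before-input-event fires for both keyDown and keyUp,
// so act on keyDown only; otherwise toggles such as F12 would run twice per press.
mainWindow.webContents.on('before-input-event', (event, input) => {
if (input.type !== 'keyDown') return
// F12 - toggle DevTools for the webview (docked at the bottom)
if (input.key === 'F12') {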
if (webviewContents && !webviewContents.isDestroyed()) {
if (webviewContents.isDevToolsOpened()) {
webviewContents.closeDevTools()
} else {
// 'bottom' mode docks DevTools at the bottom, like a regular browser
webviewContents.openDevTools({ mode: 'bottom' })
}
}
event.preventDefault()
}
// Ctrl+Shift+I - toggle DevTools for the main window (to debug the Electron app itself)
if (input.control && input.shift && input.key.toLowerCase() === 'i') {
if (mainWindow?.webContents.isDevToolsOpened()) {
mainWindow.webContents.closeDevTools()
} else {
mainWindow?.webContents.openDevTools({ mode: 'right' })
}
event.preventDefault()
}
// F5 or Ctrl+R - reload the webview
if (input.key === 'F5' || (input.control && input.key.toLowerCase() === 'r')) {
mainWindow?.webContents.send('reload-webview')
event.preventDefault()
}
// Alt+Left - go back
if (input.alt && input.key === 'ArrowLeft') {
mainWindow?.webContents.send('navigate-back')
event.preventDefault()
}
// Alt+Right - go forward
if (input.alt && input.key === 'ArrowRight') {
mainWindow?.webContents.send('navigate-forward')
event.preventDefault()
}
// Ctrl+L - focus the address bar
if (input.control && input.key.toLowerCase() === 'l') {
mainWindow?.webContents.send('focus-addressbar')
event.preventDefault()
}
// Ctrl+T - new tab
if (input.control && input.key.toLowerCase() === 't') {
mainWindow?.webContents.send('new-tab')
event.preventDefault()
}
// Ctrl+W - close the current tab
if (input.control && input.key.toLowerCase() === 'w') {
mainWindow?.webContents.send('close-tab')
event.preventDefault()
}
})
}
app.whenReady().then(() => {
// Give the webview its own session (the persist: prefix stores its data on disk)
const webviewSession = session.fromPartition('persist:cfspider')
// Use a regular desktop Chrome User-Agent string
const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
webviewSession.setUserAgent(userAgent)
// Strip X-Frame-Options and CSP response headers so any site can be embedded in the webview.
// Header names are compared case-insensitively because servers vary in capitalization.
// Note: this disables those protections for every page loaded in this session.
const blockedHeaders = new Set(['x-frame-options', 'content-security-policy', 'content-security-policy-report-only'])
webviewSession.webRequest.onHeadersReceived((details, callback) => {
const headers = { ...details.responseHeaders }
for (const name of Object.keys(headers)) {
if (blockedHeaders.has(name.toLowerCase())) {
delete headers[name]
}
}
callback({ responseHeaders: headers })
})
// Grant every permission request without prompting
webviewSession.setPermissionRequestHandler((_webContents, _permission, callback) => {
callback(true)
})
// Handle new-window requests coming from webviews
app.on('web-contents-created', (_event, contents) => {
// Only handle webContents of type 'webview'
if (contents.getType() === 'webview') {
// Keep a reference to the (most recently created) webview's webContents
webviewContents = contents
// Intercept new-window requests and open them in the current webview instead
contents.setWindowOpenHandler(({ url }) => {
// Deny the new window and navigate the current page instead
if (url && !url.startsWith('javascript:')) {
contents.loadURL(url)
}
return { action: 'deny' }
})
// Clear the reference when the webview is destroyed
contents.on('destroyed', () => {
if (webviewContents === contents) {
webviewContents = null
}
})
}
})
createWindow()
app.on('activate', () => {
if (BrowserWindow.getAllWindows().length === 0) {
createWindow()
}
})
})
app.on('window-all-closed', () => {
if (process.platform !== 'darwin') {
app.quit()
}
})
// IPC handler: AI API call (non-streaming, used for tool calling)
ipcMain.handle('ai:chat', async (_event, { endpoint, apiKey, model, messages, tools }) => {
try {
// Validate the endpoint
if (!endpoint || typeof endpoint !== 'string') {
throw new Error('请先配置 API 地址')
}
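// Local/LAN services (Ollama etc.) do not require an API key.
// Compare the parsed hostname rather than substring-matching the whole URL,
// so public hosts such as api.v10.example.com are not treated as local.
const isLocalEndpoint = (url: string) => {
let parsed: URL
try {
parsed = new URL(url)
} catch {
return false
}
const host = parsed.hostname
const isIPv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(host)
return host === 'localhost' ||
(isIPv4 && /^(127\.|10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)/.test(host)) ||
parsed.port === '11434' // Ollama default port
}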
if (!isLocalEndpoint(endpoint) && (!apiKey || typeof apiKey !== 'string')) {
throw new Error('请先配置 API Key')
}
// Add a timeout
const controller = new AbortController()
const timeout = setTimeout(() => controller.abort(), 60000) // 60-second timeout
// Build request headers
const headers: Record<string, string> = {
'Content-Type': 'application/json'
}
if (apiKey) {
headers['Authorization'] = `Bearer ${apiKey}`
}
try {
const response = await fetch(endpoint, {
method: 'POST',
headers,
body: JSON.stringify({
model,
messages,
tools,
stream: false
}),
signal: controller.signal
})
clearTimeout(timeout)
if (!response.ok) {
const errorText = await response.text().catch(() => '')
throw new Error(`API 错误 ${response.status}: ${errorText.slice(0, 100) || response.statusText}`)
}
return await response.json()
} catch (fetchError) {
clearTimeout(timeout)
if (fetchError instanceof Error && fetchError.name === 'AbortError') {
throw new Error('请求超时,请检查网络连接')
}
throw fetchError
}
} catch (error) {
console.error('AI API error:', error)
const message = error instanceof Error ? error.message : '未知错误'
// Map low-level network errors to a friendlier message
if (message.includes('fetch failed') || message.includes('ECONNREFUSED') || message.includes('ENOTFOUND')) {
throw new Error('网络连接失败,请检查:\n1. 网络是否正常\n2. API 地址是否正确\n3. 是否需要代理')
}
throw new Error(message)
}
})
// IPC handler: streaming AI API call
ipcMain.on('ai:chat-stream', async (event, { requestId, endpoint, apiKey, model, messages }) => {
try {
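// Local/LAN services do not require an API key.
// Compare the parsed hostname rather than substring-matching the whole URL,
// so public hosts such as api.v10.example.com are not treated as local.
const isLocalEndpoint = (url: string) => {
let parsed: URL
try {
parsed = new URL(url)
} catch {
return false
}
const host = parsed.hostname
const isIPv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(host)
return host === 'localhost' ||
(isIPv4 && /^(127\.|10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)/.test(host)) ||
parsed.port === '11434' // Ollama default port
}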
if (!endpoint || (!isLocalEndpoint(endpoint) && !apiKey)) {
event.sender.send('ai:chat-stream-error', { requestId, error: '请先配置 API 地址和 Key' })
return
}
// Add a timeout (covers the wait for response headers)
const controller = new AbortController()
const timeout = setTimeout(() => controller.abort(), 60000)
// Build request headers
const headers: Record<string, string> = {
'Content-Type': 'application/json'
}
if (apiKey) {
headers['Authorization'] = `Bearer ${apiKey}`
}
let response: Response
try {
response = await fetch(endpoint, {
method: 'POST',
headers,
body: JSON.stringify({
model,
messages,
stream: true
}),
signal: controller.signal
})
clearTimeout(timeout)
} catch (fetchError) {
clearTimeout(timeout)
const msg = fetchError instanceof Error && fetchError.name === 'AbortError'
? '请求超时'
: '网络连接失败,请检查网络和 API 配置'
event.sender.send('ai:chat-stream-error', { requestId, error: msg })
return
}
if (!response.ok) {
const errorText = await response.text().catch(() => '')
event.sender.send('ai:chat-stream-error', { requestId, error: `API 错误 ${response.status}: ${errorText.slice(0, 100) || response.statusText}` })
return
}
const reader = response.body?.getReader()
if (!reader) {
event.sender.send('ai:chat-stream-error', { requestId, error: 'No response body' })
return
}
const decoder = new TextDecoder()
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop() || ''
for (const line of lines) {
const trimmed = line.trim()
if (!trimmed || trimmed === 'data: [DONE]') continue
if (!trimmed.startsWith('data: ')) continue
try {
const json = JSON.parse(trimmed.slice(6))
const content = json.choices?.[0]?.delta?.content
if (content) {
event.sender.send('ai:chat-stream-data', { requestId, content })
}
} catch (e) {
// Skip lines that are not valid JSON
}
}
}
event.sender.send('ai:chat-stream-end', { requestId })
} catch (error) {
console.error('AI stream error:', error)
event.sender.send('ai:chat-stream-error', {
requestId,
error: error instanceof Error ? error.message : 'Unknown error'
})
}
})
// IPC handler: save a file (the user chooses the path)
ipcMain.handle('file:save', async (_event, { filename, content, type, isBase64 }) => {
const fs = await import('fs/promises')
// Choose save-dialog filters based on the file type
let filters: Electron.FileFilter[]
switch (type) {
case 'json':
filters = [{ name: 'JSON 文件', extensions: ['json'] }]
break
case 'csv':
filters = [{ name: 'CSV 文件', extensions: ['csv'] }]
break
case 'excel':
filters = [{ name: 'Excel 文件', extensions: ['xlsx'] }]
break
case 'txt':
filters = [{ name: '文本文件', extensions: ['txt'] }]
break
default:
filters = [{ name: '所有文件', extensions: ['*'] }]
}
// Show a save dialog so the user can choose the path
const result = await dialog.showSaveDialog(mainWindow!, {
title: '保存文件',
defaultPath: filename,
filters,
properties: ['showOverwriteConfirmation']
})
if (!result.canceled && result.filePath) {
try {
// Decode base64-encoded content (used for Excel files)
if (isBase64) {
const buffer = Buffer.from(content, 'base64')
await fs.writeFile(result.filePath, buffer)
} else {
await fs.writeFile(result.filePath, content, 'utf-8')
}
return { success: true, filePath: result.filePath }
} catch (error) {
return { success: false, error: `保存失败: ${error}` }
}
}
return { success: false, canceled: true }
})
// IPC handler: load saved rules
ipcMain.handle('rules:load', async () => {
const fs = await import('fs/promises')
const rulesPath = join(app.getPath('userData'), 'rules.json')
try {
const content = await fs.readFile(rulesPath, 'utf-8')
return JSON.parse(content)
} catch {
return []
}
})
// IPC handler: save rules
ipcMain.handle('rules:save', async (_event, rules) => {
const fs = await import('fs/promises')
const rulesPath = join(app.getPath('userData'), 'rules.json')
await fs.writeFile(rulesPath, JSON.stringify(rules, null, 2))
return true
})
// IPC handler: load the AI configuration
ipcMain.handle('config:load', async () => {
const fs = await import('fs/promises')
const configPath = join(app.getPath('userData'), 'ai-config.json')
try {
const content = await fs.readFile(configPath, 'utf-8')
return JSON.parse(content)
} catch {
return {
endpoint: 'https://api.openai.com/v1/chat/completions',
apiKey: '',
model: 'gpt-4'
}
}
})
// IPC handler: save the AI configuration
ipcMain.handle('config:save', async (_event, config) => {
const fs = await import('fs/promises')
const configPath = join(app.getPath('userData'), 'ai-config.json')
await fs.writeFile(configPath, JSON.stringify(config, null, 2))
return true
})
// IPC handler: load the list of saved configurations
ipcMain.handle('saved-configs:load', async () => {
const fs = await import('fs/promises')
const configsPath = join(app.getPath('userData'), 'saved-configs.json')
try {
const content = await fs.readFile(configsPath, 'utf-8')
return JSON.parse(content)
} catch {
return []
}
})
// IPC handler: save the list of configurations
ipcMain.handle('saved-configs:save', async (_event, configs) => {
const fs = await import('fs/promises')
const configsPath = join(app.getPath('userData'), 'saved-configs.json')
await fs.writeFile(configsPath, JSON.stringify(configs, null, 2))
return true
})
// IPC handler: load browser settings
ipcMain.handle('browser-settings:load', async () => {
const fs = await import('fs/promises')
const settingsPath = join(app.getPath('userData'), 'browser-settings.json')
try {
const content = await fs.readFile(settingsPath, 'utf-8')
return JSON.parse(content)
} catch {
return {
searchEngine: 'bing',
homepage: 'https://www.bing.com',
defaultZoom: 100
}
}
})
// IPC handler: save browser settings
ipcMain.handle('browser-settings:save', async (_event, settings) => {
const fs = await import('fs/promises')
const settingsPath = join(app.getPath('userData'), 'browser-settings.json')
await fs.writeFile(settingsPath, JSON.stringify(settings, null, 2))
return true
})
// IPC handler: load browsing history
ipcMain.handle('history:load', async () => {
const fs = await import('fs/promises')
const historyPath = join(app.getPath('userData'), 'history.json')
try {
const content = await fs.readFile(historyPath, 'utf-8')
return JSON.parse(content)
} catch {
return []
}
})
// IPC handler: save browsing history
ipcMain.handle('history:save', async (_event, history) => {
const fs = await import('fs/promises')
const historyPath = join(app.getPath('userData'), 'history.json')
await fs.writeFile(historyPath, JSON.stringify(history, null, 2))
return true
})
// Fetch a URL into a Buffer, following up to maxRedirects HTTP redirects.
// A relative Location header is resolved against the current URL.
function fetchBuffer(url: string, maxRedirects = 5): Promise<Buffer> {
return new Promise((resolve, reject) => {
const urlObj = new URL(url)
const protocol = urlObj.protocol === 'https:' ? https : http
const request = protocol.get(url, {
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Accept': 'image/*,*/*;q=0.8',
'Referer': urlObj.origin
}
}, (response) => {
const status = response.statusCode ?? 0
const location = response.headers.location
if ([301, 302, 303, 307, 308].includes(status) && location && maxRedirects > 0) {
// Discard the redirect body and follow the new location
response.resume()
resolve(fetchBuffer(new URL(location, url).toString(), maxRedirects - 1))
return
}
if (status !== 200) {
response.resume()
reject(new Error(`HTTP ${status}`))
return
}
const chunks: Buffer[] = []
response.on('data', (chunk: Buffer) => chunks.push(chunk))
response.on('end', () => resolve(Buffer.concat(chunks)))
response.on('error', reject)
})
request.on('error', reject)
request.setTimeout(30000, () => {
request.destroy(new Error('下载超时'))
})
})
}
// IPC handler: download an image into <Downloads>/cfspider-images
ipcMain.handle('download:image', async (_event, url: string, filename: string) => {
try {
// Create the download directory
const downloadsPath = join(app.getPath('downloads'), 'cfspider-images')
if (!existsSync(downloadsPath)) {
await mkdir(downloadsPath, { recursive: true })
}
// Take the extension from the URL path, defaulting to .jpg
const urlObj = new URL(url)
let ext = '.jpg'
const pathExt = urlObj.pathname.split('.').pop()?.toLowerCase()
if (pathExt && ['jpg', 'jpeg', 'png', 'gif', 'webp', 'bmp', 'svg'].includes(pathExt)) {
ext = `.${pathExt}`
}
// Replace characters that are not allowed in file names
const cleanFilename = filename.replace(/[<>:"/\\|?*]/g, '_')
const fullFilename = `${cleanFilename}${ext}`
const filePath = join(downloadsPath, fullFilename)
// Download the image (following redirects) and write it to disk
const buffer = await fetchBuffer(url)
await writeFile(filePath, buffer)
return { success: true, filename: fullFilename, path: filePath }
} catch (error) {
const message = error instanceof Error ? error.message : String(error)
return { success: false, error: `下载失败: ${message}` }
}
})
// IPC handler: open the download folder
ipcMain.handle('download:openFolder', async () => {
const { shell } = await import('electron')
const downloadsPath = join(app.getPath('downloads'), 'cfspider-images')
if (!existsSync(downloadsPath)) {
await mkdir(downloadsPath, { recursive: true })
}
shell.openPath(downloadsPath)
return true
})


@@ -0,0 +1,144 @@
import { contextBridge, ipcRenderer } from 'electron'
// Expose a safe API to the renderer process
contextBridge.exposeInMainWorld('electronAPI', {
// AI (non-streaming)
aiChat: (params: {
endpoint: string
apiKey: string
model: string
messages: Array<{ role: string; content: string }>
tools?: Array<object>
}) => ipcRenderer.invoke('ai:chat', params),
// AI (streaming)
aiChatStream: (params: {
requestId: string
endpoint: string
apiKey: string
model: string
messages: Array<{ role: string; content: string }>
}) => ipcRenderer.send('ai:chat-stream', params),
// Stream event listeners
onStreamData: (callback: (data: { requestId: string; content: string }) => void) => {
ipcRenderer.on('ai:chat-stream-data', (_event, data) => callback(data))
},
onStreamEnd: (callback: (data: { requestId: string }) => void) => {
ipcRenderer.on('ai:chat-stream-end', (_event, data) => callback(data))
},
onStreamError: (callback: (data: { requestId: string; error: string }) => void) => {
ipcRenderer.on('ai:chat-stream-error', (_event, data) => callback(data))
},
removeStreamListeners: () => {
ipcRenderer.removeAllListeners('ai:chat-stream-data')
ipcRenderer.removeAllListeners('ai:chat-stream-end')
ipcRenderer.removeAllListeners('ai:chat-stream-error')
},
// File operations (the user chooses the save path)
saveFile: (params: {
filename: string;
content: string;
type: string;
isBase64?: boolean;
}) => ipcRenderer.invoke('file:save', params),
// Rules
loadRules: () => ipcRenderer.invoke('rules:load'),
saveRules: (rules: object[]) => ipcRenderer.invoke('rules:save', rules),
// AI configuration
loadConfig: () => ipcRenderer.invoke('config:load'),
saveConfig: (config: object) => ipcRenderer.invoke('config:save', config),
// Saved AI configurations
loadSavedConfigs: () => ipcRenderer.invoke('saved-configs:load'),
saveSavedConfigs: (configs: object[]) => ipcRenderer.invoke('saved-configs:save', configs),
// Browser settings
loadBrowserSettings: () => ipcRenderer.invoke('browser-settings:load'),
saveBrowserSettings: (settings: object) => ipcRenderer.invoke('browser-settings:save', settings),
// History
loadHistory: () => ipcRenderer.invoke('history:load'),
saveHistory: (history: object[]) => ipcRenderer.invoke('history:save', history),
// Downloads
downloadImage: (url: string, filename: string) => ipcRenderer.invoke('download:image', url, filename),
openDownloadFolder: () => ipcRenderer.invoke('download:openFolder'),
// Shortcut events
onToggleDevtools: (callback: () => void) => {
ipcRenderer.on('toggle-devtools', () => callback())
},
onReloadWebview: (callback: () => void) => {
ipcRenderer.on('reload-webview', () => callback())
},
onNavigateBack: (callback: () => void) => {
ipcRenderer.on('navigate-back', () => callback())
},
onNavigateForward: (callback: () => void) => {
ipcRenderer.on('navigate-forward', () => callback())
},
onFocusAddressbar: (callback: () => void) => {
ipcRenderer.on('focus-addressbar', () => callback())
},
onNewTab: (callback: () => void) => {
ipcRenderer.on('new-tab', () => callback())
},
onCloseTab: (callback: () => void) => {
ipcRenderer.on('close-tab', () => callback())
}
})
// Type declarations
declare global {
interface Window {
electronAPI: {
aiChat: (params: {
endpoint: string
apiKey: string
model: string
messages: Array<{ role: string; content: string }>
tools?: Array<object>
}) => Promise<object>
aiChatStream: (params: {
requestId: string
endpoint: string
apiKey: string
model: string
messages: Array<{ role: string; content: string }>
}) => void
onStreamData: (callback: (data: { requestId: string; content: string }) => void) => void
onStreamEnd: (callback: (data: { requestId: string }) => void) => void
onStreamError: (callback: (data: { requestId: string; error: string }) => void) => void
removeStreamListeners: () => void
saveFile: (params: {
filename: string;
content: string;
type: string;
isBase64?: boolean;
}) => Promise<{ success: boolean; filePath?: string; error?: string; canceled?: boolean }>
loadRules: () => Promise<object[]>
saveRules: (rules: object[]) => Promise<boolean>
loadConfig: () => Promise<{ endpoint: string; apiKey: string; model: string }>
saveConfig: (config: object) => Promise<boolean>
loadSavedConfigs: () => Promise<object[]>
saveSavedConfigs: (configs: object[]) => Promise<boolean>
loadBrowserSettings: () => Promise<object>
saveBrowserSettings: (settings: object) => Promise<boolean>
loadHistory: () => Promise<object[]>
saveHistory: (history: object[]) => Promise<boolean>
downloadImage: (url: string, filename: string) => Promise<{ success: boolean; filename?: string; path?: string; error?: string }>
openDownloadFolder: () => Promise<boolean>
onToggleDevtools: (callback: () => void) => void
onReloadWebview: (callback: () => void) => void
onNavigateBack: (callback: () => void) => void
onNavigateForward: (callback: () => void) => void
onFocusAddressbar: (callback: () => void) => void
onNewTab: (callback: () => void) => void
onCloseTab: (callback: () => void) => void
}
}
}

cfspider-browser/package-lock.json (generated, new file, +8780 lines; diff suppressed because it is too large)


@@ -0,0 +1,76 @@
{
"name": "cfspider-browser",
"version": "1.0.0",
"description": "cfspider-智能浏览器 - visual crawler: AI-driven, point and click to scrape",
"main": "dist-electron/main.js",
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview",
"electron:dev": "npm run electron:build-main && concurrently \"vite --port 5173 --strictPort\" \"wait-on http://localhost:5173 && cross-env NODE_ENV=development electron .\"",
"electron:build-main": "esbuild electron/main.ts --bundle --platform=node --target=node18 --outfile=dist-electron/main.js --external:electron && esbuild electron/preload.ts --bundle --platform=node --target=node18 --outfile=dist-electron/preload.js --external:electron",
"electron:build": "npm run build && npm run electron:build-main && electron-builder",
"electron:build-win": "npm run build && npm run electron:build-main && electron-builder --win",
"electron:build-mac": "npm run build && npm run electron:build-main && electron-builder --mac",
"electron:build-linux": "npm run build && npm run electron:build-main && electron-builder --linux"
},
"dependencies": {
"@types/react-syntax-highlighter": "^15.5.13",
"file-saver": "^2.0.5",
"lucide-react": "^0.294.0",
"react": "^18.2.0",
"react-dom": "^18.2.0",
"react-markdown": "^10.1.0",
"react-syntax-highlighter": "^16.1.0",
"xlsx": "^0.18.5",
"zustand": "^4.4.7"
},
"devDependencies": {
"@types/file-saver": "^2.0.7",
"@types/react": "^18.2.43",
"@types/react-dom": "^18.2.17",
"@vitejs/plugin-react": "^4.2.1",
"autoprefixer": "^10.4.16",
"concurrently": "^8.2.2",
"cross-env": "^7.0.3",
"electron": "^28.0.0",
"electron-builder": "^24.9.1",
"esbuild": "^0.19.10",
"postcss": "^8.4.32",
"tailwindcss": "^3.3.6",
"typescript": "^5.3.3",
"vite": "^5.0.8",
"wait-on": "^7.2.0"
},
"build": {
"appId": "com.cfspider.browser",
"productName": "cfspider-智能浏览器",
"directories": {
"output": "release"
},
"files": [
"dist/**/*",
"dist-electron/**/*"
],
"win": {
"target": [
"nsis"
],
"icon": "public/icon.ico"
},
"mac": {
"target": [
"dmg"
],
"icon": "public/icon.icns"
},
"linux": {
"target": [
"AppImage",
"deb"
],
"icon": "public/icon.png",
"category": "Network"
}
}
}


@@ -0,0 +1,9 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512">
<rect width="512" height="512" rx="100" fill="#0f0f23"/>
<circle cx="256" cy="200" r="80" fill="none" stroke="#00ff88" stroke-width="20"/>
<path d="M180 280 L140 380 M332 280 L372 380 M256 280 L256 400" stroke="#00ff88" stroke-width="16" stroke-linecap="round"/>
<circle cx="140" cy="400" r="20" fill="#00ff88"/>
<circle cx="372" cy="400" r="20" fill="#00ff88"/>
<circle cx="256" cy="420" r="20" fill="#00ff88"/>
<text x="256" y="480" text-anchor="middle" fill="#00ff88" font-family="Arial" font-size="60" font-weight="bold">CFSpider</text>
</svg>



@@ -0,0 +1,184 @@
import { useState, useEffect } from 'react'
import { MessageCircle, X, History, Trash2, Plus, ChevronDown } from 'lucide-react'
import Browser from './components/Browser/Browser'
import AIChat from './components/AIChat/AIChat'
import SettingsModal from './components/Settings/SettingsModal'
import { useStore } from './store'
// Derive the AI assistant's display name from the model name
function getAIName(model: string): string {
if (!model) return 'AI 助手'
const lowerModel = model.toLowerCase()
if (lowerModel.includes('gpt-4')) return 'GPT-4'
if (lowerModel.includes('gpt-3')) return 'GPT-3.5'
if (lowerModel.includes('claude')) return 'Claude'
if (lowerModel.includes('gemini')) return 'Gemini'
if (lowerModel.includes('deepseek')) return 'DeepSeek'
if (lowerModel.includes('qwen')) return '通义千问'
if (lowerModel.includes('glm')) return 'ChatGLM'
if (lowerModel.includes('llama')) return 'LLaMA'
if (lowerModel.includes('mistral')) return 'Mistral'
// Fallback: show the first segment of the model name
return model.split('/').pop()?.split('-')[0] || 'AI 助手'
}
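getAIName tests substrings in a fixed order and returns the first match, so a more specific pattern has to come before a more general one. Below is a minimal sketch, assuming the same rules and fallback. It keeps the patterns in one ordered table so that a new model family is a single-line change. `MODEL_RULES` and `getAINameFromRules` are hypothetical names and are not part of the commit.

```typescript
// Ordered (pattern, display name) pairs; the first pattern found in the
// lower-cased model name wins, mirroring the if-chain in getAIName.
const MODEL_RULES: ReadonlyArray<readonly [string, string]> = [
  ['gpt-4', 'GPT-4'],
  ['gpt-3', 'GPT-3.5'],
  ['claude', 'Claude'],
  ['gemini', 'Gemini'],
  ['deepseek', 'DeepSeek'],
  ['qwen', '通义千问'],
  ['glm', 'ChatGLM'],
  ['llama', 'LLaMA'],
  ['mistral', 'Mistral'],
]

function getAINameFromRules(model: string): string {
  if (!model) return 'AI 助手'
  const lower = model.toLowerCase()
  for (const [pattern, name] of MODEL_RULES) {
    if (lower.includes(pattern)) return name
  }
  // Fallback: last path segment, cut at the first '-'
  return model.split('/').pop()?.split('-')[0] || 'AI 助手'
}
```

For example, `moonshot-v1-8k` matches no rule, so the fallback shows `moonshot`.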
function App() {
const [showSettings, setShowSettings] = useState(false)
const [showAI, setShowAI] = useState(false)
const [showHistory, setShowHistory] = useState(false)
const [isReady, setIsReady] = useState(false)
const [unreadCount, setUnreadCount] = useState(0)
const [lastMessageCount, setLastMessageCount] = useState(0)
const {
loadConfig, loadSavedConfigs, loadBrowserSettings, loadChatSessions,
messages, aiConfig, chatSessions, clearMessages, newChatSession,
switchChatSession, deleteChatSession
} = useStore()
const aiName = getAIName(aiConfig.model)
useEffect(() => {
// Load all configuration in parallel
Promise.all([
loadConfig(),
loadSavedConfigs(),
loadBrowserSettings(),
loadChatSessions()
]).then(() => setIsReady(true))
}, [])
// Watch for new messages and update the unread count
useEffect(() => {
if (messages.length > lastMessageCount) {
// New messages have arrived
if (!showAI) {
// The chat window is closed, so add them to the unread count
setUnreadCount(prev => prev + (messages.length - lastMessageCount))
}
}
setLastMessageCount(messages.length)
}, [messages.length, showAI, lastMessageCount])
// Opening the chat window clears the unread count
const handleOpenChat = () => {
setShowAI(true)
setUnreadCount(0)
}
// Wait for the settings to finish loading (minimal loading screen)
if (!isReady) {
return (
<div className="h-screen bg-white flex items-center justify-center">
<div className="text-gray-400 text-sm">...</div>
</div>
)
}
return (
<div className="h-screen bg-white">
{/* Browser: fills the whole window */}
<Browser onSettingsClick={() => setShowSettings(true)} />
{/* Floating AI button */}
{!showAI && (
<button
onClick={handleOpenChat}
className="fixed bottom-6 right-6 w-14 h-14 bg-blue-500 hover:bg-blue-600 text-white rounded-full shadow-lg flex items-center justify-center z-50"
>
<MessageCircle size={24} />
{unreadCount > 0 && (
<span className="absolute -top-1 -right-1 w-5 h-5 bg-red-500 rounded-full text-xs flex items-center justify-center animate-pulse">
{unreadCount > 9 ? '9+' : unreadCount}
</span>
)}
</button>
)}
{/* Floating AI chat window */}
{showAI && (
<div className="fixed bottom-6 right-6 w-[420px] h-[600px] bg-white rounded-2xl shadow-2xl border border-gray-200 flex flex-col overflow-hidden z-50">
{/* Chat window header */}
<div className="flex items-center justify-between px-4 py-3 bg-blue-500 text-white">
<div className="flex items-center gap-2">
<span className="font-medium">{aiName}</span>
{/* History dropdown */}
<div className="relative">
<button
onClick={() => setShowHistory(!showHistory)}
className="p-1 hover:bg-white/20 rounded flex items-center gap-1 text-sm"
title="历史记录"
>
<History size={16} />
<ChevronDown size={14} />
</button>
{showHistory && (
<div className="absolute top-full left-0 mt-1 w-64 bg-white rounded-lg shadow-xl border border-gray-200 py-1 z-50 max-h-80 overflow-auto">
<div className="px-3 py-2 border-b border-gray-100 flex items-center justify-between">
<span className="text-xs text-gray-500"></span>
<button
onClick={() => { newChatSession(); setShowHistory(false); }}
className="text-xs text-blue-500 hover:text-blue-600 flex items-center gap-1"
>
<Plus size={12} />
</button>
</div>
{chatSessions.length === 0 ? (
<div className="px-3 py-4 text-center text-gray-400 text-xs"></div>
) : (
chatSessions.map(session => (
<div
key={session.id}
className="px-3 py-2 hover:bg-gray-50 cursor-pointer flex items-center justify-between group"
onClick={() => { switchChatSession(session.id); setShowHistory(false); }}
>
<div className="flex-1 min-w-0">
<div className="text-sm text-gray-700 truncate">{session.title}</div>
<div className="text-xs text-gray-400">{new Date(session.updatedAt).toLocaleDateString()}</div>
</div>
<button
onClick={(e) => { e.stopPropagation(); deleteChatSession(session.id); }}
className="p-1 text-gray-400 hover:text-red-500 opacity-0 group-hover:opacity-100"
>
<Trash2 size={14} />
</button>
</div>
))
)}
</div>
)}
</div>
</div>
<div className="flex items-center gap-1">
{/* Clear conversation button */}
<button
onClick={clearMessages}
className="p-1 hover:bg-white/20 rounded"
title="清空对话"
>
<Trash2 size={16} />
</button>
{/* Close button */}
<button
onClick={() => setShowAI(false)}
className="p-1 hover:bg-white/20 rounded"
title="关闭"
>
<X size={18} />
</button>
</div>
</div>
{/* Conversation content */}
<div className="flex-1 overflow-hidden">
<AIChat />
</div>
</div>
)}
{/* Settings */}
{showSettings && <SettingsModal onClose={() => setShowSettings(false)} />}
</div>
)
}
export default App
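App tracks unread messages by comparing `messages.length` with the count it saw on the previous render. While the chat window is closed, any increase is added to the badge. The open handler resets the badge to zero, and the button shows `9+` above nine. Below is a minimal, React-free sketch of that rule written as pure functions so it is easy to unit-test. `nextUnread`, `badgeText`, and `UnreadState` are hypothetical names.

```typescript
interface UnreadState {
  unread: number
  lastCount: number
}

// Apply one observation of the message count, mirroring App's effect:
// growth while the chat is closed adds to the badge, and the seen count is
// always updated (including when messages are cleared and the count drops).
function nextUnread(state: UnreadState, messageCount: number, chatOpen: boolean): UnreadState {
  const added = messageCount - state.lastCount
  const unread = added > 0 && !chatOpen ? state.unread + added : state.unread
  return { unread, lastCount: messageCount }
}

// Badge text as rendered on the floating button; null means no badge.
function badgeText(unread: number): string | null {
  if (unread <= 0) return null
  return unread > 9 ? '9+' : String(unread)
}
```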

@@ -0,0 +1,74 @@
import { useRef, useEffect, useState } from 'react'
import { Shield } from 'lucide-react'
import MessageList from './MessageList'
import InputBox from './InputBox'
import { useStore } from '../../store'
import { sendAIMessage, manualSafetyCheck } from '../../services/ai'
export default function AIChat() {
const { messages, isAILoading } = useStore()
const scrollRef = useRef<HTMLDivElement>(null)
const [checkingStatus, setCheckingStatus] = useState<string | null>(null)
// Auto-scroll to the bottom
useEffect(() => {
if (scrollRef.current) {
scrollRef.current.scrollTop = scrollRef.current.scrollHeight
}
}, [messages])
const handleSend = async (content: string) => {
await sendAIMessage(content)
}
const handleSafetyCheck = async () => {
setCheckingStatus('checking')
try {
const result = await manualSafetyCheck()
setCheckingStatus(result.includes('WARNING') ? 'warning' : 'safe')
} catch {
// Re-enable the button if the check itself fails
setCheckingStatus(null)
return
}
setTimeout(() => setCheckingStatus(null), 3000)
}
return (
<div className="flex flex-col h-full bg-white">
{/* Quick-action bar */}
<div className="flex items-center gap-2 px-3 py-2 border-b border-gray-100 bg-gray-50">
<button
onClick={handleSafetyCheck}
disabled={checkingStatus === 'checking'}
className={`flex items-center gap-1.5 px-3 py-1.5 text-xs font-medium rounded-md transition-all ${
checkingStatus === 'checking'
? 'bg-gray-200 text-gray-500 cursor-wait'
: checkingStatus === 'safe'
? 'bg-green-100 text-green-700'
: checkingStatus === 'warning'
? 'bg-red-100 text-red-700'
: 'bg-blue-100 text-blue-700 hover:bg-blue-200'
}`}
title="Check current website safety"
>
<Shield size={14} />
{checkingStatus === 'checking' ? 'Checking...' :
checkingStatus === 'safe' ? 'Safe ✓' :
checkingStatus === 'warning' ? 'Warning!' :
'Safety Check'}
</button>
<span className="text-xs text-gray-400">Click to check current page</span>
</div>
{/* Message list */}
<div ref={scrollRef} className="flex-1 overflow-auto">
{messages.length === 0 ? (
<div className="flex flex-col items-center justify-center h-full text-gray-400 text-sm px-6">
<p></p>
<p className="text-xs mt-2 text-gray-300">例如: 搜索京东</p>
</div>
) : (
<MessageList messages={messages} />
)}
</div>
{/* Input box */}
<InputBox onSend={handleSend} disabled={isAILoading} />
</div>
)
}
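The safety-check button has four states: idle, checking, safe, and warning. AIChat encodes them as `string | null` and selects each state's look through a nested ternary. Below is a minimal sketch of the same state machine using a string-literal union and a lookup table, with the labels and Tailwind classes copied from the button above. `CheckStatus`, `STATUS_VIEW`, and `statusFromResult` are hypothetical names.

```typescript
type CheckStatus = 'idle' | 'checking' | 'safe' | 'warning'

// Label and Tailwind classes per state, matching the ternary chain in AIChat.
const STATUS_VIEW: Record<CheckStatus, { label: string; className: string }> = {
  idle: { label: 'Safety Check', className: 'bg-blue-100 text-blue-700 hover:bg-blue-200' },
  checking: { label: 'Checking...', className: 'bg-gray-200 text-gray-500 cursor-wait' },
  safe: { label: 'Safe ✓', className: 'bg-green-100 text-green-700' },
  warning: { label: 'Warning!', className: 'bg-red-100 text-red-700' },
}

// AIChat treats any result text containing 'WARNING' as a warning.
function statusFromResult(result: string): CheckStatus {
  return result.includes('WARNING') ? 'warning' : 'safe'
}
```

Because `STATUS_VIEW` is a `Record` over the union, the compiler reports an error if any state lacks an entry.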

@@ -0,0 +1,73 @@
import { useState, KeyboardEvent, useRef, useEffect } from 'react'
import { Send, Square } from 'lucide-react'
import { useStore } from '../../store'
interface InputBoxProps {
onSend: (content: string) => void
disabled?: boolean
}
export default function InputBox({ onSend, disabled }: InputBoxProps) {
const [input, setInput] = useState('')
const textareaRef = useRef<HTMLTextAreaElement>(null)
const { stopAI } = useStore()
// Auto-resize the textarea to fit its content
useEffect(() => {
if (textareaRef.current) {
textareaRef.current.style.height = 'auto'
const scrollHeight = textareaRef.current.scrollHeight
// Cap the height at 120px (about 5 lines)
textareaRef.current.style.height = Math.min(scrollHeight, 120) + 'px'
}
}, [input])
const handleSend = () => {
if (!input.trim() || disabled) return
onSend(input.trim())
setInput('')
}
const handleKeyDown = (e: KeyboardEvent<HTMLTextAreaElement>) => {
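// Enter sends and Shift+Enter inserts a newline; an Enter that confirms an IME composition is ignored
if (e.key === 'Enter' && !e.shiftKey && !e.nativeEvent.isComposing) {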
e.preventDefault()
handleSend()
}
}
return (
<div className="p-3 border-t border-gray-100">
<div className="flex items-end gap-2">
<textarea
ref={textareaRef}
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={handleKeyDown}
disabled={disabled}
placeholder="输入指令... (Enter发送, Shift+Enter换行)"
rows={1}
className="flex-1 text-sm bg-gray-100 text-gray-800 border-0 rounded-xl px-4 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500 placeholder-gray-400 resize-none overflow-y-auto"
style={{ minHeight: '40px', maxHeight: '120px' }}
/>
{disabled ? (
<button
onClick={stopAI}
className="p-2 bg-red-500 text-white rounded-full hover:bg-red-600 flex-shrink-0 animate-pulse"
title="Stop AI"
>
<Square size={16} fill="white" />
</button>
) : (
<button
onClick={handleSend}
disabled={!input.trim()}
className="p-2 bg-blue-500 text-white rounded-full hover:bg-blue-600 disabled:opacity-30 disabled:cursor-not-allowed flex-shrink-0"
>
<Send size={16} />
</button>
)}
</div>
</div>
)
}
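InputBox sends on Enter and inserts a newline on Shift+Enter. Chinese, Japanese, and Korean input methods also use Enter to confirm a composition. Chromium, which Electron uses, reports that keydown with `isComposing === true`, so sending at that moment would submit half-typed text. Below is a minimal sketch of the send decision as a pure function. It assumes the caller passes `e.key`, `e.shiftKey`, and `e.nativeEvent.isComposing` from React's keyboard event. `shouldSubmit` and `KeyInfo` are hypothetical names.

```typescript
interface KeyInfo {
  key: string
  shiftKey: boolean
  isComposing: boolean
}

// True when Enter should send the message, rather than insert a newline
// or confirm an IME composition.
function shouldSubmit({ key, shiftKey, isComposing }: KeyInfo): boolean {
  return key === 'Enter' && !shiftKey && !isComposing
}
```

In `handleKeyDown`, this becomes `if (shouldSubmit({ key: e.key, shiftKey: e.shiftKey, isComposing: e.nativeEvent.isComposing })) { e.preventDefault(); handleSend() }`.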

@@ -0,0 +1,651 @@
import { useState } from 'react'
import { Loader2, Check, FileJson, FileSpreadsheet, FileText, MousePointer2, Wand2 } from 'lucide-react'
import ReactMarkdown from 'react-markdown'
import { Prism as SyntaxHighlighter } from 'react-syntax-highlighter'
import { oneDark } from 'react-syntax-highlighter/dist/esm/styles/prism'
import { Message, useStore } from '../../store'
import { toExcel, toJSON } from '../../services/extractor'
import { sendAIMessage } from '../../services/ai'
interface MessageListProps {
messages: Message[]
}
// Crawl result card component
function CrawlResultCard({ content }: { content: string }) {
const store = useStore()
// Parse the crawl result
const lines = content.split('\n').filter(l => l.trim())
// Extract the data items, keeping their original indices
const items: { index: number; value: string }[] = []
for (let i = 0; i < lines.length; i++) {
const line = lines[i]
if (line.startsWith('提示:')) break
// Match two line formats:
// 1. "**<n>. <title>**" (full mode)
// 2. "<n>. <content>" (normal mode)
const fullMatch = line.match(/^\*\*(\d+)\.\s+(.+?)\*\*$/)
const simpleMatch = line.match(/^(\d+)\.\s+(.+)$/)
if (fullMatch) {
items.push({
index: parseInt(fullMatch[1]),
value: fullMatch[2]
})
} else if (simpleMatch) {
items.push({
index: parseInt(simpleMatch[1]),
value: simpleMatch[2]
})
}
}
// Use the number of items actually parsed, not the count stated in the heading
const count = items.length
// Export (the user chooses where to save)
const handleExport = async (format: 'json' | 'csv' | 'excel' | 'txt') => {
const { extractedData } = store
if (!extractedData || extractedData.length === 0) return
const flatData = extractedData.flatMap(d => d.values)
const timestamp = Date.now()
let exportContent = ''
let filename = `crawl_data_${timestamp}`
switch (format) {
case 'json':
exportContent = toJSON(extractedData)
filename += '.json'
break
case 'csv':
exportContent = 'index,value\n' + flatData.map((v, i) => `${i + 1},"${v.replace(/"/g, '""')}"`).join('\n')
filename += '.csv'
break
case 'txt':
exportContent = flatData.join('\n')
filename += '.txt'
break
case 'excel': {
filename += '.xlsx'
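// Generate the Excel file with the xlsx library
const blob = await toExcel(extractedData)
if (blob && window.electronAPI?.saveFile) {
const bytes = new Uint8Array(await blob.arrayBuffer())
// Build the binary string in 32 KiB chunks: spreading a whole file into
// String.fromCharCode can exceed the engine's argument-count limit
let binary = ''
for (let i = 0; i < bytes.length; i += 0x8000) {
binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000))
}
const base64 = btoa(binary)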
await window.electronAPI.saveFile({
content: base64,
filename,
type: 'excel',
isBase64: true
})
}
return
}
}
// Save the file (the user chooses the path)
if (window.electronAPI?.saveFile) {
await window.electronAPI.saveFile({
content: exportContent,
filename,
type: format
})
} else {
// Fallback for a plain browser environment
const blob = new Blob([exportContent], { type: 'text/plain' })
const url = URL.createObjectURL(blob)
const a = document.createElement('a')
a.href = url
a.download = filename
a.click()
URL.revokeObjectURL(url)
}
}
return (
<div className="bg-yellow-50 border border-yellow-200 rounded-lg p-3 mt-2">
<div className="flex items-center justify-between mb-2">
<div className="text-yellow-800 font-medium flex items-center gap-2">
<div className="w-2 h-2 bg-yellow-500 rounded-full animate-pulse" />
{count}
</div>
</div>
{/* Data list */}
<div className="space-y-1 max-h-48 overflow-y-auto">
{items.slice(0, 10).map((item, i) => (
<div key={i} className="bg-white px-2 py-1.5 rounded text-sm flex items-start gap-2 border border-yellow-100">
<span className="text-yellow-600 font-mono text-xs bg-yellow-100 px-1.5 py-0.5 rounded flex-shrink-0">
{item.index}
</span>
<span className="font-mono text-gray-700 break-all text-xs leading-relaxed">
{item.value.length > 150 ? item.value.slice(0, 150) + '...' : item.value}
</span>
</div>
))}
{items.length > 10 && (
<div className="text-center text-xs text-gray-500 py-1">
... {items.length - 10}
</div>
)}
</div>
{/* Export buttons */}
<div className="mt-3 pt-2 border-t border-yellow-200 flex flex-wrap gap-2">
<button
onClick={() => handleExport('json')}
className="flex items-center gap-1 px-2 py-1 text-xs bg-blue-500 text-white rounded hover:bg-blue-600 transition-colors"
>
<FileJson size={12} />
JSON
</button>
<button
onClick={() => handleExport('csv')}
className="flex items-center gap-1 px-2 py-1 text-xs bg-green-500 text-white rounded hover:bg-green-600 transition-colors"
>
<FileSpreadsheet size={12} />
CSV
</button>
<button
onClick={() => handleExport('excel')}
className="flex items-center gap-1 px-2 py-1 text-xs bg-emerald-500 text-white rounded hover:bg-emerald-600 transition-colors"
>
<FileSpreadsheet size={12} />
Excel
</button>
<button
onClick={() => handleExport('txt')}
className="flex items-center gap-1 px-2 py-1 text-xs bg-gray-500 text-white rounded hover:bg-gray-600 transition-colors"
>
<FileText size={12} />
TXT
</button>
</div>
</div>
)
}
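CrawlResultCard recovers the items from the assistant's text. It reads numbered lines in either the `**n. title**` form or the `n. value` form, and it stops at the first line that starts with `提示:` ("Tip:"). The CSV export quotes every value and doubles any embedded quotes. Below is a minimal, DOM-free sketch of both steps for testing. `parseCrawlItems` and `toIndexValueCsv` are hypothetical names that mirror the inline logic above.

```typescript
interface CrawlItem {
  index: number
  value: string
}

// Parse numbered lines, stopping at the first "提示:" (tip) line.
function parseCrawlItems(content: string): CrawlItem[] {
  const items: CrawlItem[] = []
  for (const line of content.split('\n').filter(l => l.trim())) {
    if (line.startsWith('提示:')) break
    // "**<n>. <title>**" (full mode) is tried before "<n>. <value>"
    const m = line.match(/^\*\*(\d+)\.\s+(.+?)\*\*$/) ?? line.match(/^(\d+)\.\s+(.+)$/)
    if (m) items.push({ index: parseInt(m[1], 10), value: m[2] })
  }
  return items
}

// "index,value" CSV: every value is quoted and '"' is doubled, as in RFC 4180.
function toIndexValueCsv(values: string[]): string {
  const rows = values.map((v, i) => `${i + 1},"${v.replace(/"/g, '""')}"`)
  return 'index,value\n' + rows.join('\n')
}
```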
// Element selection card component
function ElementSelectionCard({ data }: { data: { id: string; purpose: string; suggestedSelector: string } }) {
const store = useStore()
const [selected, setSelected] = useState<'auto' | 'manual' | null>(null)
const [isManualMode, setIsManualMode] = useState(false)
const [isProcessing, setIsProcessing] = useState(false)
const handleAutoSelect = async (e: React.MouseEvent) => {
e.preventDefault()
e.stopPropagation()
if (isProcessing || selected) return
setIsProcessing(true)
setSelected('auto')
// Make sure manual selection mode is off
store.setSelectMode(false)
store.setElementSelectionRequest(null)
// Let the AI continue crawling with the suggested selector
try {
await sendAIMessage(`使用选择器 "${data.suggestedSelector}" 进行爬取`)
} finally {
setIsProcessing(false)
}
}
const handleManualSelect = (e: React.MouseEvent) => {
e.preventDefault()
e.stopPropagation()
if (isProcessing || selected) return
setSelected('manual')
setIsManualMode(true)
// Clear any previous selection
const state = useStore.getState()
if (state.clearSelectedElements) {
state.clearSelectedElements()
}
store.setSelectMode(true)
}
const handleConfirmManual = async (e: React.MouseEvent) => {
e.preventDefault()
e.stopPropagation()
const { selectedElements } = useStore.getState()
if (selectedElements.length > 0) {
store.setSelectMode(false)
setIsManualMode(false)
store.setElementSelectionRequest(null)
// Group the chosen selectors by role (elements without a role count as 'auto')
const titleSelectors = selectedElements.filter(el => el.role === 'title').map(el => el.selector)
const contentSelectors = selectedElements.filter(el => el.role === 'content').map(el => el.selector)
const linkSelectors = selectedElements.filter(el => el.role === 'link').map(el => el.selector)
const autoSelectors = selectedElements.filter(el => el.role === 'auto' || !el.role).map(el => el.selector)
let message = '用户已选择以下元素:\n'
if (titleSelectors.length > 0) message += `标题选择器: ${titleSelectors.join(', ')}\n`
if (contentSelectors.length > 0) message += `内容选择器: ${contentSelectors.join(', ')}\n`
if (linkSelectors.length > 0) message += `链接选择器: ${linkSelectors.join(', ')}\n`
if (autoSelectors.length > 0) message += `其他选择器: ${autoSelectors.join(', ')}\n`
message += '请使用这些选择器进行爬取'
await sendAIMessage(message)
} else {
alert('请先在页面上右键点击选择元素')
}
}
if (selected === 'auto') {
return (
<div className="bg-green-50 border border-green-200 rounded-lg p-3 mt-2">
<div className="flex items-center gap-2 text-green-700">
<Check size={16} />
<span>...</span>
</div>
</div>
)
}
if (isManualMode) {
const selectedElements = useStore.getState().selectedElements
const titleCount = selectedElements.filter(e => e.role === 'title').length
const contentCount = selectedElements.filter(e => e.role === 'content').length
const linkCount = selectedElements.filter(e => e.role === 'link').length
const totalCount = selectedElements.length
return (
<div className="bg-blue-50 border border-blue-200 rounded-lg p-3 mt-2">
<div className="text-blue-800 font-medium mb-2 flex items-center gap-2">
<MousePointer2 size={16} />
-
</div>
<p className="text-sm text-blue-600 mb-2">
</p>
{/* Counts of selected elements */}
<div className="flex gap-3 mb-3 text-xs">
<span className="flex items-center gap-1">
<span className="w-2 h-2 rounded-full bg-amber-500"></span>
: {titleCount}
</span>
<span className="flex items-center gap-1">
<span className="w-2 h-2 rounded-full bg-emerald-500"></span>
: {contentCount}
</span>
<span className="flex items-center gap-1">
<span className="w-2 h-2 rounded-full bg-pink-500"></span>
: {linkCount}
</span>
</div>
{/* List of selected elements */}
{totalCount > 0 && (
<div className="mb-3 max-h-32 overflow-y-auto">
{selectedElements.map((el, idx) => (
<div key={el.id} className="flex items-center gap-2 text-xs py-1 border-b border-blue-100 last:border-0">
<span className={`w-2 h-2 rounded-full ${
el.role === 'title' ? 'bg-amber-500' :
el.role === 'content' ? 'bg-emerald-500' :
el.role === 'link' ? 'bg-violet-500' : 'bg-gray-400'
}`}></span>
<span className="text-blue-700 truncate flex-1" title={el.selector}>
{el.selector}
</span>
<button
onClick={() => store.removeSelectedElement?.(el.id)}
className="text-red-400 hover:text-red-600 px-1"
>
x
</button>
</div>
))}
</div>
)}
<div className="flex gap-2">
<button
onClick={handleConfirmManual}
disabled={totalCount === 0}
className={`px-4 py-2 rounded-lg transition-colors text-sm font-medium ${
totalCount > 0
? 'bg-blue-500 text-white hover:bg-blue-600'
: 'bg-gray-300 text-gray-500 cursor-not-allowed'
}`}
>
({totalCount})
</button>
<button
onClick={() => {
setIsManualMode(false)
setSelected(null)
store.setSelectMode(false)
useStore.getState().clearSelectedElements?.()
}}
className="px-4 py-2 bg-gray-200 text-gray-700 rounded-lg hover:bg-gray-300 transition-colors text-sm"
>
</button>
</div>
</div>
)
}
return (
<div className="bg-purple-50 border border-purple-200 rounded-lg p-3 mt-2">
<div className="text-purple-800 font-medium mb-2">
{data.purpose}
</div>
<p className="text-sm text-purple-600 mb-3">
</p>
<div className="flex gap-2">
<button
onClick={handleAutoSelect}
className="flex-1 flex items-center justify-center gap-2 px-4 py-3 bg-gradient-to-r from-purple-500 to-blue-500 text-white rounded-lg hover:from-purple-600 hover:to-blue-600 transition-all text-sm font-medium shadow-md hover:shadow-lg"
>
<Wand2 size={18} />
<span></span>
</button>
<button
onClick={handleManualSelect}
className="flex-1 flex items-center justify-center gap-2 px-4 py-3 bg-white border-2 border-purple-300 text-purple-700 rounded-lg hover:bg-purple-50 hover:border-purple-400 transition-all text-sm font-medium"
>
<MousePointer2 size={18} />
<span></span>
</button>
</div>
<p className="text-xs text-purple-400 mt-2 text-center">
使 AI ·
</p>
</div>
)
}
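After a manual selection, handleConfirmManual groups the chosen selectors by role and sends them to the AI as a single prompt. The groups appear in a fixed order: title, content, link, then other, where "other" also covers elements with no role. Below is a minimal sketch of that grouping as a pure function. It assumes an element shape with `selector` and an optional `role`; the real `SelectedElement` type lives in `store`, which is not shown. The prompt text copies the Chinese strings in the code. `buildSelectionMessage` and `SelectedElementLike` are hypothetical names.

```typescript
type Role = 'title' | 'content' | 'link' | 'auto'

interface SelectedElementLike {
  selector: string
  role?: Role
}

// Prompt label per role, in the order handleConfirmManual emits them.
const ROLE_LABELS: ReadonlyArray<readonly [Role, string]> = [
  ['title', '标题选择器'],
  ['content', '内容选择器'],
  ['link', '链接选择器'],
  ['auto', '其他选择器'],
]

function buildSelectionMessage(elements: SelectedElementLike[]): string {
  let message = '用户已选择以下元素:\n'
  for (const [role, label] of ROLE_LABELS) {
    const selectors = elements
      .filter(el => (el.role ?? 'auto') === role)
      .map(el => el.selector)
    if (selectors.length > 0) message += `${label}: ${selectors.join(', ')}\n`
  }
  return message + '请使用这些选择器进行爬取'
}
```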
// Get a human-friendly description of a tool call
function getToolDescription(toolName: string, args: Record<string, unknown>): string {
switch (toolName) {
case 'navigate_to': {
const url = (args.url as string) || ''
// Shorten the URL for display
try {
const urlObj = new URL(url.startsWith('http') ? url : 'https://' + url)
return `跳转: ${urlObj.hostname}${urlObj.pathname.length > 20 ? urlObj.pathname.slice(0, 20) + '...' : urlObj.pathname}`
} catch {
return `跳转: ${url.slice(0, 30)}${url.length > 30 ? '...' : ''}`
}
}
case 'click_element': {
const selector = (args.selector as string) || ''
// Recognize common search buttons
if (selector.includes('search') || selector.includes('submit') || selector === '#su' || selector === '#search_icon') {
return '点击搜索按钮'
}
if (selector.includes('button') || selector.includes('btn')) {
return '点击按钮'
}
if (selector.includes('input') || selector.includes('#kw') || selector.includes('#q')) {
return '点击输入框'
}
return '点击元素'
}
case 'click_text': {
const text = (args.text as string) || ''
return `点击: ${text.slice(0, 20)}${text.length > 20 ? '...' : ''}`
}
case 'input_text': {
const text = (args.text as string) || ''
return `输入: ${text.slice(0, 15)}${text.length > 15 ? '...' : ''}`
}
case 'scroll_page': {
const dirs: Record<string, string> = { up: '上滚', down: '下滚', top: '顶部', bottom: '底部' }
return dirs[args.direction as string] || '滚动'
}
case 'wait':
return `等待 ${((args.ms as number) || 1000) / 1000}s`
case 'extract_elements':
return '提取内容'
case 'get_page_info':
return '获取页面信息'
case 'go_back':
return '返回上一页'
case 'go_forward':
return '前进到下一页'
case 'get_images':
return '获取图片列表'
case 'get_main_image':
return '获取主图URL'
case 'download_image': {
const filename = (args.filename as string) || ''
return `下载: ${filename}`
}
case 'add_selector':
return '添加选择器'
case 'set_search_engine': {
const engineNames: Record<string, string> = {
'bing': 'Bing',
'google': 'Google',
'baidu': '百度',
'duckduckgo': 'DuckDuckGo'
}
const engine = args.engine as string
return `设置搜索引擎: ${engineNames[engine] || engine}`
}
case 'get_settings':
return '获取设置'
case 'crawl_elements': {
const selector = (args.selector as string) || ''
const type = (args.type as string) || 'text'
const typeNames: Record<string, string> = { text: '文本', link: '链接', image: '图片', attribute: '属性' }
return `爬取${typeNames[type] || type}: ${selector.slice(0, 20)}${selector.length > 20 ? '...' : ''}`
}
case 'export_data': {
const format = (args.format as string) || ''
return `导出 ${format.toUpperCase()}`
}
case 'clear_highlight':
return '清除高亮'
case 'request_element_selection': {
const purpose = (args.purpose as string) || ''
return `请求元素选择: ${purpose}`
}
case 'click_search_button':
return '点击搜索按钮'
case 'press_enter':
return '按下回车键'
case 'analyze_page':
return '分析页面'
case 'scan_interactive_elements':
return '扫描交互元素'
case 'get_page_content':
return '获取页面内容'
case 'find_element': {
const desc = (args.description as string) || ''
return `查找元素: ${desc.slice(0, 15)}${desc.length > 15 ? '...' : ''}`
}
case 'check_element_exists': {
const selector = (args.selector as string) || ''
return `检查元素: ${selector.slice(0, 15)}${selector.length > 15 ? '...' : ''}`
}
case 'verify_action':
return '验证操作结果'
case 'retry_with_alternative':
return '尝试其他方法'
case 'check_website_safety':
return '检查网站安全'
default:
return toolName
}
}
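For `navigate_to`, the description first adds `https://` when the URL does not start with `http`. It then keeps the hostname and truncates any path longer than 20 characters. If `new URL` rejects the input, it falls back to the first 30 characters. Below is a standalone sketch of that rule under the hypothetical name `describeNavigation`. The `跳转` ("go to") prefix is copied from the code.

```typescript
// Shorten a navigation target for display, mirroring the navigate_to case.
function describeNavigation(url: string): string {
  try {
    // Bare hosts such as "jd.com" get an https:// prefix so URL can parse them
    const u = new URL(url.startsWith('http') ? url : 'https://' + url)
    const path = u.pathname.length > 20 ? u.pathname.slice(0, 20) + '...' : u.pathname
    return `跳转: ${u.hostname}${path}`
  } catch {
    // Unparseable input: show at most 30 raw characters
    return `跳转: ${url.slice(0, 30)}${url.length > 30 ? '...' : ''}`
  }
}
```

The `startsWith('http')` check has one quirk. A bare host that happens to begin with "http", such as `httpbin.org/get`, does not get the prefix, so `new URL` throws and the raw-text fallback is used instead.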
export default function MessageList({ messages }: MessageListProps) {
return (
<div className="p-3 space-y-3">
{messages.map((message) => {
const isThinking = message.content === 'thinking'
const hasToolsExecuting = message.toolCalls?.some(t => t.result === '执行中...')
const hasTools = message.toolCalls && message.toolCalls.length > 0
const hasContent = message.content && message.content !== 'thinking'
return (
<div key={message.id}>
{/* User message */}
{message.role === 'user' && (
<div className="flex justify-end">
<div className="bg-blue-500 text-white px-3 py-2 rounded-2xl rounded-br-md text-sm max-w-[85%] break-words whitespace-pre-wrap overflow-hidden">
{message.content}
</div>
</div>
)}
{/* AI message */}
{message.role === 'assistant' && (
<div className="text-sm space-y-2">
{/* Thinking */}
{isThinking && (
<div className="flex items-center gap-2 text-gray-400">
<Loader2 size={12} className="animate-spin" />
<span>...</span>
</div>
)}
{/* Tool calls: one line per tool, with the AI's comment */}
{hasTools && (
<div className="space-y-1 bg-gray-50 rounded-lg p-2">
{message.toolCalls!.map((tool, index) => {
const isExecuting = tool.result === '执行中...'
const isSuccess = tool.result && tool.result !== '执行中...' && !tool.result.startsWith('错误') && !tool.result.startsWith('未找到')
const description = getToolDescription(tool.name, tool.arguments as Record<string, unknown>)
const comment = (tool as any).comment as string | undefined
return (
<div key={index} className="space-y-0.5">
{/* AI comment before this tool call */}
{comment && (
<div className="text-xs text-blue-600 italic pl-3 py-0.5 border-l-2 border-blue-300 bg-blue-50/50 rounded-r">
{comment}
</div>
)}
{/* Tool execution status */}
<div className="flex items-center gap-1.5 text-xs py-0.5">
{isExecuting ? (
<Loader2 size={10} className="text-blue-500 animate-spin flex-shrink-0" />
) : isSuccess ? (
<Check size={10} className="text-green-500 flex-shrink-0" />
) : (
<span className="text-gray-400 flex-shrink-0 w-2.5"></span>
)}
<span className={isSuccess ? 'text-gray-700' : 'text-gray-400'}>
{description}
</span>
</div>
</div>
)
})}
</div>
)}
{/* Element selection card */}
{message.toolCalls?.some(t => t.result?.includes('__ELEMENT_SELECTION_REQUEST__')) && (() => {
const tool = message.toolCalls?.find(t => t.result?.includes('__ELEMENT_SELECTION_REQUEST__'))
if (tool?.result) {
const match = tool.result.match(/__ELEMENT_SELECTION_REQUEST__(.+?)__END__/)
if (match) {
try {
const data = JSON.parse(match[1])
return <ElementSelectionCard data={data} />
} catch {
return null
}
}
}
return null
})()}
{/* Crawl result card */}
{hasContent && message.content?.includes('【爬取结果】') && (
<CrawlResultCard content={message.content} />
)}
{/* Final AI message - normal Markdown rendering */}
{hasContent && !message.content?.includes('【爬取结果】') && !message.content?.includes('__ELEMENT_SELECTION_REQUEST__') && (
<div className="text-gray-700 leading-relaxed mt-2 break-words overflow-hidden text-sm">
<ReactMarkdown
components={{
p: ({ children }) => <p className="mb-2 last:mb-0 whitespace-pre-wrap break-words">{children}</p>,
ul: ({ children }) => <ul className="list-disc pl-4 mb-2 space-y-1">{children}</ul>,
ol: ({ children }) => <ol className="list-decimal pl-4 mb-2 space-y-1">{children}</ol>,
li: ({ children }) => <li className="break-words">{children}</li>,
code: ({ children, className, ...props }) => {
const match = /language-(\w+)/.exec(className || '')
const codeString = String(children).replace(/\n$/, '')
if (match) {
return (
<div className="my-2 rounded-lg overflow-hidden">
<div className="bg-gray-800 px-3 py-1 text-xs text-gray-400 border-b border-gray-700">
{match[1]}
</div>
<SyntaxHighlighter
style={oneDark}
language={match[1]}
PreTag="div"
customStyle={{
margin: 0,
padding: '12px',
fontSize: '12px',
borderRadius: '0 0 8px 8px',
}}
{...props}
>
{codeString}
</SyntaxHighlighter>
</div>
)
}
return (
<code className="bg-gray-100 text-red-600 px-1.5 py-0.5 rounded text-xs font-mono border border-gray-200">
{children}
</code>
)
},
pre: ({ children }) => <>{children}</>,
a: ({ href, children }) => (
<a href={href} className="text-blue-500 hover:underline break-all" target="_blank" rel="noopener noreferrer">
{children}
</a>
),
h1: ({ children }) => <h1 className="text-base font-bold mb-2 mt-3">{children}</h1>,
h2: ({ children }) => <h2 className="text-sm font-bold mb-2 mt-2">{children}</h2>,
h3: ({ children }) => <h3 className="text-sm font-semibold mb-1 mt-2">{children}</h3>,
strong: ({ children }) => <strong className="font-semibold">{children}</strong>,
blockquote: ({ children }) => (
<blockquote className="border-l-[3px] border-blue-400 pl-3 py-1 text-gray-600 my-2 bg-blue-50 rounded-r">{children}</blockquote>
),
}}
>
{message.content}
</ReactMarkdown>
</div>
)}
{/* Processing */}
{!hasContent && !isThinking && hasTools && !hasToolsExecuting && (
<div className="flex items-center gap-2 text-gray-400">
<Loader2 size={12} className="animate-spin" />
<span>...</span>
</div>
)}
</div>
)}
</div>
)
})}
</div>
)
}
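MessageList chooses each tool call's icon from its result string. The placeholder `执行中...` ("running...") means the call is in progress. A result starting with `错误` ("error") or `未找到` ("not found") is not a success, and any other non-empty result is. Below is a minimal sketch of that classification under the hypothetical name `toolStatus`. It assumes these marker strings are exactly what the AI service (not shown) writes into `tool.result`.

```typescript
type ToolStatus = 'running' | 'success' | 'other'

const RUNNING = '执行中...'
const FAILURE_PREFIXES = ['错误', '未找到']

// Mirrors isExecuting / isSuccess in MessageList. 'other' covers failures
// and missing results; both render with the grey placeholder icon.
function toolStatus(result: string | undefined): ToolStatus {
  if (result === RUNNING) return 'running'
  if (result && !FAILURE_PREFIXES.some(p => result.startsWith(p))) return 'success'
  return 'other'
}
```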

@@ -0,0 +1,43 @@
import { useState, useEffect, KeyboardEvent } from 'react'
import { Globe, Lock } from 'lucide-react'
interface AddressBarProps {
url: string
onNavigate: (url: string) => void
}
export default function AddressBar({ url, onNavigate }: AddressBarProps) {
const [inputValue, setInputValue] = useState(url)
useEffect(() => {
setInputValue(url)
}, [url])
const handleKeyDown = (e: KeyboardEvent<HTMLInputElement>) => {
// Ignore the Enter that confirms an IME composition
if (e.key === 'Enter' && !e.nativeEvent.isComposing) {
onNavigate(inputValue)
}
}
const isSecure = url.startsWith('https://')
return (
<div className="flex items-center gap-2 px-3 py-2 bg-gray-100">
<div className="flex items-center flex-1 gap-2 px-4 py-2 bg-white rounded-full border border-gray-200 hover:shadow-sm transition-shadow">
{isSecure ? (
<Lock size={14} className="text-green-600 flex-shrink-0" />
) : (
<Globe size={14} className="text-gray-400 flex-shrink-0" />
)}
<input
type="text"
value={inputValue}
onChange={(e) => setInputValue(e.target.value)}
onKeyDown={handleKeyDown}
className="flex-1 bg-transparent border-none outline-none text-sm text-gray-800"
placeholder="搜索或输入网址"
/>
</div>
</div>
)
}
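AddressBar passes the raw input to `onNavigate`. The code that turns it into a URL lives in Browser, whose diff is suppressed in this view. Below is one common approach, sketched as an assumption rather than the commit's actual behavior. Input with an `http(s)://` scheme, or anything that looks like a host name with no spaces, is opened as a URL; everything else becomes a query for the chosen search engine. `toNavigableUrl`, the host heuristic, and the `bing` default are all hypothetical. The engine keys are the ones listed for `set_search_engine` in MessageList.

```typescript
type SearchEngine = 'bing' | 'google' | 'baidu' | 'duckduckgo'

// Query URL prefixes for each engine.
const SEARCH_URLS: Record<SearchEngine, string> = {
  bing: 'https://www.bing.com/search?q=',
  google: 'https://www.google.com/search?q=',
  baidu: 'https://www.baidu.com/s?wd=',
  duckduckgo: 'https://duckduckgo.com/?q=',
}

// Hypothetical heuristic: an explicit http(s) URL, or "name.tld[/...]" without
// whitespace, is navigated to; anything else is sent to the search engine.
function toNavigableUrl(input: string, engine: SearchEngine = 'bing'): string {
  const text = input.trim()
  if (/^https?:\/\//i.test(text)) return text
  if (!/\s/.test(text) && /^[^/]+\.[a-z]{2,}(\/|$)/i.test(text)) return 'https://' + text
  return SEARCH_URLS[engine] + encodeURIComponent(text)
}
```

The heuristic is deliberately simple. For example, it treats `localhost:3000` as a search query, so a real implementation would need more rules.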

File diff suppressed because it is too large

@@ -0,0 +1,60 @@
import { Plus, X } from 'lucide-react'
import { useStore } from '../../store'
export default function TabBar() {
const { tabs, activeTabId, addTab, closeTab, setActiveTab } = useStore()
return (
<div className="flex items-center bg-gray-100 border-b border-gray-200 h-9 px-1">
{/* Tab list */}
<div className="flex items-center flex-1 overflow-x-auto scrollbar-hide">
{tabs.map((tab) => (
<div
key={tab.id}
onClick={() => setActiveTab(tab.id)}
className={`
group flex items-center gap-1 px-3 py-1.5 min-w-[120px] max-w-[200px]
cursor-pointer rounded-t-lg
${tab.id === activeTabId
? 'bg-white border-t border-l border-r border-gray-200'
: 'hover:bg-gray-200'
}
`}
>
{/* Loading indicator */}
{tab.isLoading && (
<div className="w-3 h-3 border-2 border-blue-500 border-t-transparent rounded-full animate-spin" />
)}
{/* Title */}
<span className="flex-1 text-xs truncate text-gray-700">
{tab.title || '新标签页'}
</span>
{/* Close button */}
{tabs.length > 1 && (
<button
onClick={(e) => {
e.stopPropagation()
closeTab(tab.id)
}}
className="p-0.5 rounded hover:bg-gray-300 opacity-0 group-hover:opacity-100"
>
<X size={12} className="text-gray-500" />
</button>
)}
</div>
))}
</div>
{/* New tab button */}
<button
onClick={() => addTab()}
className="p-1.5 rounded hover:bg-gray-200 ml-1"
title="新建标签页 (Ctrl+T)"
>
<Plus size={16} className="text-gray-600" />
</button>
</div>
)
}
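TabBar hides the close button when only one tab is left. Choosing which tab becomes active after a close happens in the store's `closeTab`, which is not part of this excerpt. Below is one common policy, sketched as an assumption: activate the tab that slides into the closed slot, or the new last tab if the closed tab was last. `closeTabPure`, `TabLike`, and `TabsState` are hypothetical names, not the store's API.

```typescript
interface TabLike {
  id: string
}

interface TabsState {
  tabs: TabLike[]
  activeTabId: string
}

function closeTabPure(state: TabsState, id: string): TabsState {
  const idx = state.tabs.findIndex(t => t.id === id)
  // Unknown id, or the last remaining tab (TabBar never offers to close it)
  if (idx === -1 || state.tabs.length <= 1) return state
  const tabs = state.tabs.filter(t => t.id !== id)
  // Closing a background tab leaves the active tab unchanged
  if (id !== state.activeTabId) return { tabs, activeTabId: state.activeTabId }
  // Prefer the right neighbour (now at idx), otherwise the new last tab
  const next = tabs[Math.min(idx, tabs.length - 1)]
  return { tabs, activeTabId: next.id }
}
```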

@@ -0,0 +1,76 @@
import {
ArrowLeft,
ArrowRight,
RotateCw,
X,
Settings
} from 'lucide-react'
import { useStore } from '../../store'
interface ToolbarProps {
onBack: () => void
onForward: () => void
onReload: () => void
onStop: () => void
onSettingsClick: () => void
}
export default function Toolbar({
onBack,
onForward,
onReload,
onStop,
onSettingsClick
}: ToolbarProps) {
const { isLoading } = useStore()
return (
<div className="flex items-center gap-1 px-2 py-1 bg-gray-100 border-b border-gray-200">
{/* Navigation buttons */}
<button
onClick={onBack}
className="p-2 hover:bg-gray-200 rounded-full text-gray-600 hover:text-gray-900"
title="后退"
>
<ArrowLeft size={18} />
</button>
<button
onClick={onForward}
className="p-2 hover:bg-gray-200 rounded-full text-gray-600 hover:text-gray-900"
title="前进"
>
<ArrowRight size={18} />
</button>
{isLoading ? (
<button
onClick={onStop}
className="p-2 hover:bg-gray-200 rounded-full text-gray-600 hover:text-gray-900"
title="停止"
>
<X size={18} />
</button>
) : (
<button
onClick={onReload}
className="p-2 hover:bg-gray-200 rounded-full text-gray-600 hover:text-gray-900"
title="刷新"
>
<RotateCw size={18} />
</button>
)}
<div className="flex-1" />
{/* Settings button */}
<button
onClick={onSettingsClick}
className="p-2 hover:bg-gray-200 rounded-full text-gray-600 hover:text-gray-900"
title="AI 设置"
>
<Settings size={18} />
</button>
</div>
)
}

@@ -0,0 +1,105 @@
import { useEffect, useState, useRef } from 'react'
import { useStore } from '../../store'
export default function VirtualMouse() {
const { mouseState } = useStore()
const [position, setPosition] = useState({ x: -100, y: -100 })
const [isClicking, setIsClicking] = useState(false)
const animationRef = useRef<number>()
// Move smoothly to the target position
useEffect(() => {
if (!mouseState.visible) return
const targetX = mouseState.x
const targetY = mouseState.y
const startX = position.x < 0 ? targetX : position.x
const startY = position.y < 0 ? targetY : position.y
const startTime = Date.now()
const duration = mouseState.duration || 300
const animate = () => {
const elapsed = Date.now() - startTime
const progress = Math.min(elapsed / duration, 1)
// easeOutCubic easing makes the motion look more natural
const eased = 1 - Math.pow(1 - progress, 3)
const currentX = startX + (targetX - startX) * eased
const currentY = startY + (targetY - startY) * eased
setPosition({ x: currentX, y: currentY })
if (progress < 1) {
animationRef.current = requestAnimationFrame(animate)
}
}
if (animationRef.current) {
cancelAnimationFrame(animationRef.current)
}
animationRef.current = requestAnimationFrame(animate)
return () => {
if (animationRef.current) {
cancelAnimationFrame(animationRef.current)
}
}
}, [mouseState.x, mouseState.y, mouseState.visible, mouseState.duration])
// Click animation
useEffect(() => {
if (mouseState.clicking) {
setIsClicking(true)
const timer = setTimeout(() => setIsClicking(false), 150)
return () => clearTimeout(timer)
}
}, [mouseState.clicking, mouseState.clickId])
if (!mouseState.visible) return null
// Offset of the cursor tip within the SVG (the path starts at 5.5, 3.21)
const tipOffsetX = 5.5
const tipOffsetY = 3.21
return (
<div
className="pointer-events-none fixed z-[99999] transition-opacity duration-200"
style={{
left: position.x - tipOffsetX,
top: position.y - tipOffsetY,
opacity: mouseState.visible ? 1 : 0,
}}
>
{/* Cursor SVG */}
<svg
width="24"
height="24"
viewBox="0 0 24 24"
className={`drop-shadow-lg transition-transform duration-100 ${
isClicking ? 'scale-90' : 'scale-100'
}`}
style={{
filter: 'drop-shadow(0 2px 4px rgba(0,0,0,0.3))',
}}
>
{/* Cursor body */}
<path
d="M5.5 3.21V20.8c0 .45.54.67.85.35l4.86-4.86a.5.5 0 0 1 .35-.15h6.87c.48 0 .72-.58.38-.92L6.35 2.76a.5.5 0 0 0-.85.45Z"
fill={isClicking ? '#00cc66' : '#00ff88'}
stroke="#000"
strokeWidth="1.5"
strokeLinecap="round"
strokeLinejoin="round"
/>
</svg>
{/* Click ripple effect */}
{isClicking && (
<div className="absolute left-0 top-0">
<div className="w-6 h-6 rounded-full bg-primary/50 animate-ping" />
</div>
)}
</div>
)
}
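On every animation frame, the movement effect in `VirtualMouse` computes `progress = min(elapsed / duration, 1)`, eases it with easeOutCubic, `1 - (1 - t)^3`, and linearly interpolates between the start and target points. The same math as a pure function is easier to unit-test than the `requestAnimationFrame` loop. This is a sketch; `cursorPositionAt` and `Point` are hypothetical names, not part of the component.

```typescript
interface Point {
  x: number
  y: number
}

// easeOutCubic: fast start, gentle stop. Maps t in [0, 1] onto [0, 1].
function easeOutCubic(t: number): number {
  return 1 - Math.pow(1 - t, 3)
}

// Cursor position `elapsedMs` after a move started, mirroring the
// per-frame computation in VirtualMouse's effect.
function cursorPositionAt(start: Point, target: Point, elapsedMs: number, durationMs: number): Point {
  // Clamp progress to [0, 1] so the cursor never overshoots the target;
  // a non-positive duration jumps straight to the target.
  const progress = durationMs > 0 ? Math.min(Math.max(elapsedMs / durationMs, 0), 1) : 1
  const eased = easeOutCubic(progress)
  return {
    x: start.x + (target.x - start.x) * eased,
    y: start.y + (target.y - start.y) * eased,
  }
}
```

Halfway through a move (t = 0.5) the eased progress is already 0.875, which is why the cursor appears to decelerate into its target.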

@@ -0,0 +1,346 @@
import { useState } from 'react'
import { Trash2, Play, X, Type, Link, Image, Code, Save, FileJson, FileSpreadsheet, FileText } from 'lucide-react'
import { useStore, SelectedElement } from '../../store'
import { saveCurrentAsRule } from '../../services/rules'
import { toTXT, toExcel, toJSON } from '../../services/extractor'
export default function DataPanel() {
const {
selectedElements,
removeSelectedElement,
clearSelectedElements,
updateElementType,
extractedData,
setExtractedData
} = useStore()
const [activeTab, setActiveTab] = useState<'elements' | 'data'>('elements')
const [showSaveDialog, setShowSaveDialog] = useState(false)
const [ruleName, setRuleName] = useState('')
// Save the current selection as a rule
const handleSaveRule = () => {
if (!ruleName.trim()) return
saveCurrentAsRule(ruleName.trim())
setRuleName('')
setShowSaveDialog(false)
}
// Extract data
const handleExtract = async () => {
// Get the webview and run the extraction script inside it
const webview = document.querySelector('webview') as Electron.WebviewTag
if (!webview || selectedElements.length === 0) return
try {
const result = await webview.executeJavaScript(`
window.__cfspiderExtract(${JSON.stringify(selectedElements)})
`)
setExtractedData(result)
setActiveTab('data')
} catch (error) {
console.error('Extract error:', error)
}
}
// Export data
const handleExport = async (format: 'json' | 'csv' | 'excel' | 'txt') => {
if (extractedData.length === 0) return
const timestamp = Date.now()
let content: string | Blob
let filename: string
let mimeType: string
switch (format) {
case 'json':
content = toJSON(extractedData)
filename = `cfspider-data-${timestamp}.json`
mimeType = 'application/json'
break
case 'csv': {
// Convert to CSV: one column per selector, one row per value index
const rows: string[][] = []
const headers = extractedData.map(d => d.selector)
rows.push(headers)
const maxLength = Math.max(...extractedData.map(d => d.values.length))
for (let i = 0; i < maxLength; i++) {
rows.push(extractedData.map(d => d.values[i] || ''))
}
content = rows.map(row => row.map(cell => `"${cell.replace(/"/g, '""')}"`).join(',')).join('\n')
filename = `cfspider-data-${timestamp}.csv`
mimeType = 'text/csv'
break
}
case 'txt':
content = toTXT(extractedData)
filename = `cfspider-data-${timestamp}.txt`
mimeType = 'text/plain'
break
case 'excel': {
const blob = await toExcel(extractedData)
if (!blob) {
console.error('Failed to generate Excel file')
return
}
content = blob
filename = `cfspider-data-${timestamp}.xlsx`
mimeType = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
break
}
}
// Save the file (the user chooses the path)
if (window.electronAPI) {
let result
if (content instanceof Blob) {
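// Excel blobs are sent to the main process as base64.
// Convert in chunks: spreading a large Uint8Array into String.fromCharCode
// can exceed the engine's argument limit and throw a RangeError.
const bytes = new Uint8Array(await content.arrayBuffer())
let binary = ''
for (let i = 0; i < bytes.length; i += 0x8000) {
binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000))
}
const base64 = btoa(binary)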
result = await window.electronAPI.saveFile({
filename,
content: base64,
type: 'excel',
isBase64: true
})
} else {
result = await window.electronAPI.saveFile({ filename, content, type: format })
}
// Report the save result
if (result?.success) {
console.log(`File saved to: ${result.filePath}`)
} else if (result?.error) {
console.error(result.error)
}
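// If the user cancelled, do nothing
} else {
// Plain browser environment: trigger a download
const blob = content instanceof Blob ? content : new Blob([content], { type: mimeType })
const url = URL.createObjectURL(blob)
const a = document.createElement('a')
a.href = url
a.download = filename
a.click()
// Defer revoking the object URL so the browser can start the download first
setTimeout(() => URL.revokeObjectURL(url), 0)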
}
}
const getTypeIcon = (type: SelectedElement['type']) => {
switch (type) {
case 'text': return <Type size={14} />
case 'link': return <Link size={14} />
case 'image': return <Image size={14} />
case 'attribute': return <Code size={14} />
}
}
return (
<div className="flex flex-col h-full bg-dark-100">
{/* Tab bar */}
<div className="flex items-center justify-between px-3 py-2 border-b border-gray-700">
<div className="flex gap-2">
<button
onClick={() => setActiveTab('elements')}
className={`px-3 py-1 text-sm rounded-lg ${
activeTab === 'elements'
? 'bg-primary text-black'
: 'text-gray-400 hover:text-white'
}`}
>
Elements ({selectedElements.length})
</button>
<button
onClick={() => setActiveTab('data')}
className={`px-3 py-1 text-sm rounded-lg ${
activeTab === 'data'
? 'bg-primary text-black'
: 'text-gray-400 hover:text-white'
}`}
>
Data ({extractedData.reduce((a, d) => a + d.values.length, 0)})
</button>
</div>
<div className="flex gap-2">
{activeTab === 'elements' ? (
<>
<button
onClick={handleExtract}
disabled={selectedElements.length === 0}
className="flex items-center gap-1 px-3 py-1 text-sm bg-primary text-black rounded-lg hover:bg-green-400 disabled:opacity-50"
>
<Play size={14} />
Extract
</button>
<button
onClick={() => setShowSaveDialog(true)}
disabled={selectedElements.length === 0}
className="flex items-center gap-1 px-3 py-1 text-sm bg-blue-500/20 text-blue-400 rounded-lg hover:bg-blue-500/30 disabled:opacity-50"
>
<Save size={14} />
Save rule
</button>
<button
onClick={clearSelectedElements}
disabled={selectedElements.length === 0}
className="flex items-center gap-1 px-3 py-1 text-sm bg-red-500/20 text-red-400 rounded-lg hover:bg-red-500/30 disabled:opacity-50"
>
<Trash2 size={14} />
Clear
</button>
</>
) : (
<div className="flex flex-wrap gap-1">
<button
onClick={() => handleExport('json')}
disabled={extractedData.length === 0}
className="flex items-center gap-1 px-2 py-1 text-xs bg-blue-500 text-white rounded hover:bg-blue-600 disabled:opacity-50"
title="Export as JSON"
>
<FileJson size={12} />
JSON
</button>
<button
onClick={() => handleExport('csv')}
disabled={extractedData.length === 0}
className="flex items-center gap-1 px-2 py-1 text-xs bg-green-500 text-white rounded hover:bg-green-600 disabled:opacity-50"
title="Export as CSV"
>
<FileSpreadsheet size={12} />
CSV
</button>
<button
onClick={() => handleExport('excel')}
disabled={extractedData.length === 0}
className="flex items-center gap-1 px-2 py-1 text-xs bg-emerald-500 text-white rounded hover:bg-emerald-600 disabled:opacity-50"
title="Export as Excel"
>
<FileSpreadsheet size={12} />
Excel
</button>
<button
onClick={() => handleExport('txt')}
disabled={extractedData.length === 0}
className="flex items-center gap-1 px-2 py-1 text-xs bg-gray-500 text-white rounded hover:bg-gray-600 disabled:opacity-50"
title="Export as plain text"
>
<FileText size={12} />
TXT
</button>
</div>
)}
</div>
</div>
{/* Content area */}
<div className="flex-1 overflow-auto p-3">
{activeTab === 'elements' ? (
selectedElements.length === 0 ? (
<div className="flex items-center justify-center h-full text-gray-500">
No elements selected. Use "Select elements" to pick elements on the page.
</div>
) : (
<div className="space-y-2">
{selectedElements.map((el) => (
<div
key={el.id}
className="flex items-center gap-3 p-2 bg-dark-300 rounded-lg"
>
<div className="flex items-center gap-2 flex-1 min-w-0">
<span className="text-primary">{getTypeIcon(el.type)}</span>
<code className="text-xs text-gray-400 truncate">
{el.selector}
</code>
</div>
<span className="text-sm text-gray-300 truncate max-w-[200px]">
{el.preview}
</span>
<select
value={el.type}
onChange={(e) => updateElementType(el.id, e.target.value as SelectedElement['type'])}
className="text-xs bg-dark-200 border border-gray-600 rounded px-2 py-1"
>
<option value="text">Text</option>
<option value="link">Link</option>
<option value="image">Image</option>
<option value="attribute">Attribute</option>
</select>
<button
onClick={() => removeSelectedElement(el.id)}
className="p-1 text-gray-500 hover:text-red-400"
>
<X size={14} />
</button>
</div>
))}
</div>
)
) : (
extractedData.length === 0 ? (
<div className="flex items-center justify-center h-full text-gray-500">
No data yet. Click "Extract" to extract data from the selected elements.
</div>
) : (
<div className="space-y-3">
{extractedData.map((data, index) => (
<div key={index} className="bg-dark-300 rounded-lg p-3">
<div className="text-xs text-gray-400 mb-2 font-mono">
{data.selector}
</div>
<div className="space-y-1">
{data.values.slice(0, 5).map((value, i) => (
<div key={i} className="text-sm text-gray-200 truncate">
{value}
</div>
))}
{data.values.length > 5 && (
<div className="text-xs text-gray-500">
... and {data.values.length - 5} more
</div>
)}
</div>
</div>
))}
</div>
)
)}
</div>
{/* Save-rule dialog */}
{showSaveDialog && (
<div className="fixed inset-0 bg-black/50 flex items-center justify-center z-50">
<div className="bg-dark-100 rounded-xl p-6 w-80">
<h3 className="text-lg font-medium mb-4">Save rule</h3>
<input
type="text"
value={ruleName}
onChange={(e) => setRuleName(e.target.value)}
placeholder="Enter a rule name..."
className="w-full mb-4"
autoFocus
onKeyDown={(e) => e.key === 'Enter' && handleSaveRule()}
/>
<div className="flex justify-end gap-2">
<button
onClick={() => setShowSaveDialog(false)}
className="px-4 py-2 text-gray-400 hover:text-white"
>
Cancel
</button>
<button
onClick={handleSaveRule}
disabled={!ruleName.trim()}
className="px-4 py-2 bg-primary text-black rounded-lg hover:bg-green-400 disabled:opacity-50"
>
Save
</button>
</div>
</div>
</div>
)}
</div>
)
}
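`handleExport('csv')` treats each `ExtractedData` entry as a column (header = its selector) and the i-th value of every entry as row i, padding shorter columns with empty cells. A standalone sketch of that transpose, assuming `ExtractedData` has the `{ selector, values }` shape `DataPanel` reads; `columnsToCSV` and `quoteCell` are hypothetical names. Every field is quoted and embedded double quotes are doubled, following the RFC 4180 quoting rule.

```typescript
// Assumed shape, matching how DataPanel reads extractedData.
interface ExtractedData {
  selector: string
  values: string[]
}

// Quote a CSV field, escaping embedded double quotes by doubling them.
function quoteCell(cell: string): string {
  return `"${cell.replace(/"/g, '""')}"`
}

// Columns (one per selector) -> CSV text with one row per value index.
function columnsToCSV(columns: ExtractedData[]): string {
  if (columns.length === 0) return ''
  const rows: string[][] = [columns.map(c => c.selector)]
  const rowCount = Math.max(...columns.map(c => c.values.length))
  for (let i = 0; i < rowCount; i++) {
    // Pad shorter columns with empty cells.
    rows.push(columns.map(c => c.values[i] ?? ''))
  }
  return rows.map(row => row.map(quoteCell).join(',')).join('\n')
}
```

Quoting every cell unconditionally is simpler than deciding per cell, and spreadsheet applications read both forms the same way.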

@@ -0,0 +1,475 @@
import { useState, useEffect } from 'react'
import { X, Search, Bot, ChevronRight, Check, ChevronDown, Clock, Trash2 } from 'lucide-react'
import { useStore, SEARCH_ENGINES } from '../../store'
// Common models (offered in custom mode)
const COMMON_MODELS = [
'gpt-4o', 'gpt-4o-mini', 'gpt-4-turbo', 'gpt-3.5-turbo',
'claude-3-5-sonnet-20241022', 'claude-3-opus-20240229',
'deepseek-chat', 'deepseek-coder', 'deepseek-reasoner',
'gemini-pro', 'gemini-1.5-pro',
'llama-3.3-70b-versatile', 'llama-3.1-8b-instant',
'qwen-max', 'qwen-plus', 'qwen-turbo',
'glm-4-plus', 'glm-4'
]
// AI provider presets
const AI_PRESETS = [
{ id: 'custom', name: 'Custom', endpoint: '', models: COMMON_MODELS, description: 'Custom API endpoint; choose from common models' },
{ id: 'ollama', name: 'Ollama', endpoint: 'http://localhost:11434/v1/chat/completions', models: ['llama3.2', 'llama3.1', 'qwen2.5', 'deepseek-r1', 'mistral', 'codellama', 'phi3'], description: 'Runs locally, no API key required' },
{ id: 'deepseek', name: 'DeepSeek', endpoint: 'https://api.deepseek.com/v1/chat/completions', models: ['deepseek-chat', 'deepseek-coder', 'deepseek-reasoner'], description: 'Chinese LLM, cost-effective' },
{ id: 'openai', name: 'OpenAI', endpoint: 'https://api.openai.com/v1/chat/completions', models: ['gpt-4o', 'gpt-4o-mini', 'gpt-4-turbo', 'gpt-3.5-turbo', 'o1-preview', 'o1-mini'], description: 'Official ChatGPT API' },
{ id: 'anthropic', name: 'Anthropic', endpoint: 'https://api.anthropic.com/v1/messages', models: ['claude-3-5-sonnet-20241022', 'claude-3-opus-20240229', 'claude-3-haiku-20240307'], description: 'Claude models' },
{ id: 'groq', name: 'Groq', endpoint: 'https://api.groq.com/openai/v1/chat/completions', models: ['llama-3.3-70b-versatile', 'llama-3.1-8b-instant', 'mixtral-8x7b-32768'], description: 'Very fast inference' },
{ id: 'google', name: 'Google AI', endpoint: 'https://generativelanguage.googleapis.com/v1beta/models', models: ['gemini-1.5-pro', 'gemini-1.5-flash', 'gemini-pro'], description: 'Gemini models' },
{ id: 'moonshot', name: 'Moonshot', endpoint: 'https://api.moonshot.cn/v1/chat/completions', models: ['moonshot-v1-8k', 'moonshot-v1-32k', 'moonshot-v1-128k'], description: 'Kimi models' },
{ id: 'zhipu', name: 'Zhipu AI', endpoint: 'https://open.bigmodel.cn/api/paas/v4/chat/completions', models: ['glm-4-plus', 'glm-4', 'glm-4-flash'], description: 'ChatGLM models' },
{ id: 'qwen', name: 'Tongyi Qianwen', endpoint: 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions', models: ['qwen-max', 'qwen-plus', 'qwen-turbo'], description: 'Alibaba Cloud LLM' },
{ id: 'siliconflow', name: 'SiliconFlow', endpoint: 'https://api.siliconflow.cn/v1/chat/completions', models: ['deepseek-ai/DeepSeek-V3', 'Qwen/Qwen2.5-72B-Instruct', 'meta-llama/Llama-3.3-70B-Instruct'], description: 'Aggregator for Chinese models' }
]
]
interface SettingsModalProps {
onClose: () => void
}
type SettingsSection = 'search' | 'ai' | 'saved' | 'history'
export default function SettingsModal({ onClose }: SettingsModalProps) {
const {
aiConfig, setAIConfig, saveConfig,
browserSettings, setBrowserSettings,
savedConfigs, addSavedConfig, deleteSavedConfig, applySavedConfig, loadSavedConfigs,
history, clearHistory, setUrl
} = useStore()
const [activeSection, setActiveSection] = useState<SettingsSection>('search')
const [localConfig, setLocalConfig] = useState(aiConfig)
const [selectedPreset, setSelectedPreset] = useState('custom')
const [showPresetDropdown, setShowPresetDropdown] = useState(false)
const [showModelDropdown, setShowModelDropdown] = useState(false)
const [toast, setToast] = useState<string | null>(null)
// Show a toast, then close the modal automatically
const showToastAndClose = (message: string) => {
setToast(message)
setTimeout(() => {
setToast(null)
onClose()
}, 1000)
}
// Show a toast without closing the modal
const showToast = (message: string) => {
setToast(message)
setTimeout(() => setToast(null), 2000)
}
useEffect(() => {
setLocalConfig(aiConfig)
const matched = AI_PRESETS.find(p =>
p.endpoint && aiConfig.endpoint.includes(p.endpoint.replace('/chat/completions', ''))
)
setSelectedPreset(matched?.id || 'custom')
}, [aiConfig])
useEffect(() => {
loadSavedConfigs()
}, [loadSavedConfigs])
const currentPreset = AI_PRESETS.find(p => p.id === selectedPreset) || AI_PRESETS[0]
const handlePresetSelect = (presetId: string) => {
setSelectedPreset(presetId)
setShowPresetDropdown(false)
const preset = AI_PRESETS.find(p => p.id === presetId)
if (preset && preset.endpoint) {
setLocalConfig({
...localConfig,
endpoint: preset.endpoint,
model: preset.models[0] || ''
})
}
}
const handleModelSelect = (model: string) => {
setLocalConfig({ ...localConfig, model })
setShowModelDropdown(false)
}
const handleSaveAI = async () => {
setAIConfig(localConfig)
await saveConfig()
// Also add it to the saved-config list
const existingConfig = savedConfigs.find(
c => c.endpoint === localConfig.endpoint && c.model === localConfig.model
)
if (!existingConfig && localConfig.apiKey) {
const presetName = AI_PRESETS.find(p => p.endpoint === localConfig.endpoint)?.name || 'Custom'
addSavedConfig(`${presetName} - ${localConfig.model}`)
}
// Show a success toast and close the modal
showToastAndClose('AI settings saved')
}
const handleSearchEngineChange = (engineId: string) => {
const engine = SEARCH_ENGINES.find(e => e.id === engineId)
if (engine) {
setBrowserSettings({
searchEngine: engineId,
// Strip the longest suffix first; otherwise '?q=%s' would turn '/search?q=%s' into '/search'
homepage: engine.url.replace('/search?q=%s', '').replace('?q=%s', '').replace('?wd=%s', '')
})
}
}
return (
<div className="fixed inset-0 bg-black/30 flex items-center justify-center z-50">
{/* Toast notification */}
{toast && (
<div className="fixed top-1/2 left-1/2 -translate-x-1/2 -translate-y-1/2 bg-blue-600 text-white px-6 py-3 rounded-lg shadow-xl z-[60] flex items-center gap-2">
<Check size={18} />
{toast}
</div>
)}
<div className="bg-white rounded-2xl w-[700px] h-[500px] flex shadow-2xl overflow-hidden">
{/* Left navigation */}
<div className="w-52 bg-gray-50 border-r border-gray-200 p-4">
<div className="flex items-center justify-between mb-6">
<h2 className="text-lg font-semibold text-gray-800">Settings</h2>
<button onClick={onClose} className="text-gray-400 hover:text-gray-600">
<X size={18} />
</button>
</div>
<nav className="space-y-1">
<button
onClick={() => setActiveSection('search')}
className={`w-full flex items-center gap-3 px-3 py-2 rounded-lg text-left ${
activeSection === 'search' ? 'bg-blue-100 text-blue-600' : 'text-gray-700 hover:bg-gray-100'
}`}
>
<Search size={18} />
<span>Search engine</span>
<ChevronRight size={16} className="ml-auto opacity-50" />
</button>
<button
onClick={() => setActiveSection('ai')}
className={`w-full flex items-center gap-3 px-3 py-2 rounded-lg text-left ${
activeSection === 'ai' ? 'bg-blue-100 text-blue-600' : 'text-gray-700 hover:bg-gray-100'
}`}
>
<Bot size={18} />
<span>AI settings</span>
<ChevronRight size={16} className="ml-auto opacity-50" />
</button>
<button
onClick={() => setActiveSection('saved')}
className={`w-full flex items-center gap-3 px-3 py-2 rounded-lg text-left ${
activeSection === 'saved' ? 'bg-blue-100 text-blue-600' : 'text-gray-700 hover:bg-gray-100'
}`}
>
<span className="w-[18px] h-[18px] flex items-center justify-center text-xs bg-blue-500 text-white rounded">
{savedConfigs.length}
</span>
<span>Saved configs</span>
<ChevronRight size={16} className="ml-auto opacity-50" />
</button>
<button
onClick={() => setActiveSection('history')}
className={`w-full flex items-center gap-3 px-3 py-2 rounded-lg text-left ${
activeSection === 'history' ? 'bg-blue-100 text-blue-600' : 'text-gray-700 hover:bg-gray-100'
}`}
>
<Clock size={18} />
<span>History</span>
<ChevronRight size={16} className="ml-auto opacity-50" />
</button>
</nav>
</div>
{/* Right-hand content */}
<div className="flex-1 p-6 overflow-auto">
{/* Search engine settings */}
{activeSection === 'search' && (
<div>
<h3 className="text-xl font-semibold text-gray-900 mb-2">Search engine</h3>
<p className="text-sm text-gray-600 mb-6"></p>
<div className="space-y-2">
{SEARCH_ENGINES.map(engine => (
<label
key={engine.id}
className={`flex items-center gap-3 p-4 rounded-lg border cursor-pointer transition-all ${
browserSettings.searchEngine === engine.id
? 'border-blue-500 bg-blue-50'
: 'border-gray-200 hover:border-gray-300'
}`}
>
<input
type="radio"
name="searchEngine"
value={engine.id}
checked={browserSettings.searchEngine === engine.id}
onChange={() => handleSearchEngineChange(engine.id)}
className="sr-only"
/>
<div className={`w-5 h-5 rounded-full border-2 flex items-center justify-center ${
browserSettings.searchEngine === engine.id
? 'border-blue-500 bg-blue-500'
: 'border-gray-300'
}`}>
{browserSettings.searchEngine === engine.id && (
<Check size={12} className="text-white" />
)}
</div>
<div className="flex-1">
<div className="font-medium text-gray-900">{engine.name}</div>
<div className="text-sm text-gray-500">{engine.url.replace('%s', 'keyword')}</div>
</div>
{browserSettings.searchEngine === engine.id && (
<span className="text-xs text-green-600 bg-green-50 px-2 py-1 rounded">Current</span>
)}
</label>
))}
</div>
<div className="mt-6 p-3 bg-gray-50 rounded-lg text-sm text-gray-600">
</div>
</div>
)}
{/* AI configuration */}
{activeSection === 'ai' && (
<div>
<h3 className="text-xl font-semibold text-gray-900 mb-2">AI settings</h3>
<p className="text-sm text-gray-600 mb-6">Configure the API used by the AI assistant</p>
<div className="space-y-5">
{/* Provider selection */}
<div>
<label className="block text-sm font-medium text-gray-700 mb-2">AI provider</label>
<div className="relative">
<button
onClick={() => setShowPresetDropdown(!showPresetDropdown)}
className="w-full flex items-center justify-between px-4 py-3 bg-white border border-gray-300 rounded-lg hover:border-blue-500"
>
<div className="text-left">
<div className="font-semibold text-gray-900">{currentPreset.name}</div>
<div className="text-sm text-blue-600">{currentPreset.description}</div>
</div>
<ChevronDown size={16} className={`text-gray-500 transition-transform ${showPresetDropdown ? 'rotate-180' : ''}`} />
</button>
{showPresetDropdown && (
<div className="absolute top-full left-0 right-0 mt-1 bg-white border border-gray-200 rounded-lg shadow-xl z-10 max-h-64 overflow-auto">
{AI_PRESETS.map((preset) => (
<button
key={preset.id}
onClick={() => handlePresetSelect(preset.id)}
className="w-full flex items-center gap-3 px-4 py-3 hover:bg-gray-50 text-left"
>
<div className="w-5 flex items-center justify-center">
{selectedPreset === preset.id && <Check size={14} className="text-blue-500" />}
</div>
<div>
<div className="font-semibold text-gray-900">{preset.name}</div>
<div className="text-sm text-gray-600">{preset.description}</div>
</div>
</button>
))}
</div>
)}
</div>
</div>
{/* API endpoint */}
<div>
<label className="block text-sm font-medium text-gray-700 mb-2">API endpoint</label>
<input
type="text"
value={localConfig.endpoint}
onChange={(e) => setLocalConfig({ ...localConfig, endpoint: e.target.value })}
className="w-full px-4 py-3 bg-white border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500 text-gray-900"
placeholder="https://api.example.com/v1/chat/completions"
/>
</div>
{/* API Key */}
<div>
<label className="block text-sm font-medium text-gray-700 mb-2">API Key</label>
<input
type="password"
value={localConfig.apiKey}
onChange={(e) => setLocalConfig({ ...localConfig, apiKey: e.target.value })}
className="w-full px-4 py-3 bg-white border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500 font-mono text-gray-900"
placeholder="sk-..."
/>
</div>
{/* Model selection */}
<div>
<label className="block text-sm font-medium text-gray-700 mb-2">Model</label>
<div className="relative">
<div className="flex gap-2">
{/* Dropdown */}
<button
onClick={() => setShowModelDropdown(!showModelDropdown)}
className="flex-1 flex items-center justify-between px-4 py-3 bg-white border border-gray-300 rounded-lg hover:border-blue-500"
>
<span className="text-gray-900 font-medium truncate">{localConfig.model || 'Select a model'}</span>
<ChevronDown size={16} className="text-gray-500 flex-shrink-0" />
</button>
</div>
{showModelDropdown && (
<div className="absolute top-full left-0 right-0 mt-1 bg-white border border-gray-200 rounded-lg shadow-xl z-10 max-h-64 overflow-auto">
{/* Manual entry */}
<div className="p-2 border-b border-gray-100">
<input
type="text"
value={localConfig.model}
onChange={(e) => setLocalConfig({ ...localConfig, model: e.target.value })}
className="w-full px-3 py-2 bg-gray-50 border border-gray-200 rounded text-sm text-gray-900 focus:outline-none focus:ring-1 focus:ring-blue-500"
placeholder="Enter a custom model name..."
onClick={(e) => e.stopPropagation()}
/>
</div>
{/* Preset model list */}
{currentPreset.models.map((model) => (
<button
key={model}
onClick={() => handleModelSelect(model)}
className="w-full flex items-center gap-3 px-4 py-2.5 hover:bg-gray-50 text-left"
>
<div className="w-5 flex items-center justify-center">
{localConfig.model === model && <Check size={14} className="text-blue-500" />}
</div>
<span className="text-gray-800 text-sm">{model}</span>
</button>
))}
</div>
)}
</div>
</div>
{/* Save button */}
<button
onClick={handleSaveAI}
className="w-full py-2 bg-blue-500 text-white rounded-lg hover:bg-blue-600 transition-colors"
>
Save
</button>
</div>
</div>
)}
{/* Saved configs */}
{activeSection === 'saved' && (
<div>
<h3 className="text-lg font-medium text-gray-800 mb-4">Saved AI configs</h3>
<p className="text-sm text-gray-500 mb-4">Switch between saved AI configurations</p>
{savedConfigs.length === 0 ? (
<div className="text-center py-8 text-gray-400">
No saved configs yet
</div>
) : (
<div className="space-y-2">
{savedConfigs.map(config => (
<div
key={config.id}
className="flex items-center gap-3 p-3 rounded-lg border border-gray-200 hover:border-gray-300"
>
<div className="flex-1">
<div className="font-medium text-gray-800">{config.name}</div>
<div className="text-xs text-gray-500">{config.model}</div>
</div>
<button
onClick={() => {
applySavedConfig(config.id)
showToastAndClose('Config applied')
}}
className="px-3 py-1 text-sm bg-blue-500 text-white rounded hover:bg-blue-600"
>
Use
</button>
<button
onClick={() => {
deleteSavedConfig(config.id)
showToast('Deleted')
}}
className="px-3 py-1 text-sm text-red-500 hover:text-red-600"
>
Delete
</button>
</div>
))}
</div>
)}
</div>
)}
{/* History */}
{activeSection === 'history' && (
<div>
<div className="flex items-center justify-between mb-4">
<div>
<h3 className="text-xl font-semibold text-gray-900">History</h3>
<p className="text-sm text-gray-600">Pages you have visited</p>
</div>
{history.length > 0 && (
<button
onClick={() => {
clearHistory()
showToast('History cleared')
}}
className="flex items-center gap-1 px-3 py-1.5 text-sm text-red-500 hover:text-red-600 hover:bg-red-50 rounded-lg transition-colors"
>
<Trash2 size={14} />
Clear
</button>
)}
</div>
{history.length === 0 ? (
<div className="text-center py-12 text-gray-400">
<Clock size={48} className="mx-auto mb-3 opacity-50" />
<p>No browsing history yet</p>
</div>
) : (
<div className="space-y-1 max-h-[350px] overflow-auto">
{history.map(item => (
<button
key={item.id}
onClick={() => {
setUrl(item.url)
onClose()
}}
className="w-full flex items-center gap-3 p-3 rounded-lg hover:bg-gray-50 text-left transition-colors"
>
<Clock size={14} className="text-gray-400 flex-shrink-0" />
<div className="flex-1 min-w-0">
<div className="font-medium text-gray-800 truncate">{item.title}</div>
<div className="text-xs text-gray-500 truncate">{item.url}</div>
</div>
<div className="text-xs text-gray-400 flex-shrink-0">
{new Date(item.visitedAt).toLocaleDateString()}
</div>
</button>
))}
</div>
)}
</div>
)}
</div>
</div>
</div>
)
}
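`handleSearchEngineChange` derives the homepage by stripping known query suffixes from the engine's URL template, so the result depends on the order of the `replace` calls. An order-independent alternative is to parse the template and keep only its origin. A sketch, assuming templates look like `https://www.google.com/search?q=%s` (the actual `SEARCH_ENGINES` entries are not shown in this diff); `homepageFromTemplate` is a hypothetical name.

```typescript
// Derive a homepage (scheme + host) from a search URL template such as
// "https://www.google.com/search?q=%s". Returns null if the template
// cannot be parsed as a URL.
function homepageFromTemplate(template: string): string | null {
  try {
    // Substitute a harmless placeholder so the template parses cleanly.
    const url = new URL(template.replace('%s', 'x'))
    return url.origin
  } catch {
    return null
  }
}
```

Note that this keeps only scheme and host, so a path such as Baidu's `/s` is dropped as well; that differs slightly from the string-stripping approach, which leaves such paths in place.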

@@ -0,0 +1,116 @@
@tailwind base;
@tailwind components;
@tailwind utilities;
* {
box-sizing: border-box;
}
body {
margin: 0;
padding: 0;
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
background: #fff;
color: #333;
overflow: hidden;
}
#root {
width: 100vw;
height: 100vh;
}
/* Font smoothing */
* {
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
}
/* Webview styles */
webview {
width: 100%;
height: 100%;
border: none;
}
/* Scrollbar styles */
::-webkit-scrollbar {
width: 6px;
height: 6px;
}
::-webkit-scrollbar-track {
background: transparent;
}
::-webkit-scrollbar-thumb {
background: rgba(0, 0, 0, 0.2);
border-radius: 3px;
}
::-webkit-scrollbar-thumb:hover {
background: rgba(0, 0, 0, 0.3);
}
/* Button styles */
button {
cursor: pointer;
}
button:disabled {
opacity: 0.5;
cursor: not-allowed;
}
/* Hide the scrollbar while keeping the element scrollable */
.scrollbar-hide {
-ms-overflow-style: none;
scrollbar-width: none;
}
.scrollbar-hide::-webkit-scrollbar {
display: none;
}
/* Wrap long words in AI chat text */
.break-words {
word-wrap: break-word;
overflow-wrap: break-word;
word-break: break-word;
hyphens: auto;
}
/* Code block styles */
pre, code {
white-space: pre-wrap;
word-wrap: break-word;
overflow-wrap: break-word;
}
/* Code block container */
pre > div {
max-width: 100%;
overflow-x: auto;
}
/* Syntax-highlighted code blocks */
.react-syntax-highlighter-line-number {
min-width: 2em;
}
/* Keep code block content from overflowing */
pre code {
display: block;
max-width: 100%;
}
/* Markdown content */
.markdown-content {
max-width: 100%;
overflow-wrap: break-word;
}
/* Links */
a {
word-break: break-all;
}

@@ -0,0 +1,10 @@
import React from 'react'
import ReactDOM from 'react-dom/client'
import App from './App'
import './index.css'
ReactDOM.createRoot(document.getElementById('root')!).render(
<React.StrictMode>
<App />
</React.StrictMode>
)

File diff suppressed because it is too large

@@ -0,0 +1,217 @@
import { SelectedElement, ExtractedData } from '../store'
// Run the extraction inside the webview
export async function extractData(
webview: Electron.WebviewTag,
elements: SelectedElement[]
): Promise<ExtractedData[]> {
if (!webview || elements.length === 0) {
return []
}
try {
const result = await webview.executeJavaScript(`
(function() {
const selectors = ${JSON.stringify(elements)};
return selectors.map(s => {
const elements = document.querySelectorAll(s.selector);
return {
selector: s.selector,
values: Array.from(elements).map(el => {
if (s.type === 'link') return el.href || el.textContent?.trim();
if (s.type === 'image') return el.src;
if (s.type === 'attribute' && s.attribute) return el.getAttribute(s.attribute);
return el.textContent?.trim();
}).filter(v => v)
};
});
})()
`)
return result
} catch (error) {
console.error('Extract error:', error)
return []
}
}
// Convert to CSV format
export function toCSV(data: ExtractedData[]): string {
if (data.length === 0) return ''
// Selectors such as `a[href*="x"], b` contain quotes and commas, so headers need escaping too
const headers = data.map(d => escapeCSV(d.selector))
const maxLength = Math.max(...data.map(d => d.values.length))
const rows: string[][] = [headers]
for (let i = 0; i < maxLength; i++) {
rows.push(data.map(d => escapeCSV(d.values[i] || '')))
}
return rows.map(row => row.join(',')).join('\n')
}
// Escape a CSV field: quote it if it contains a comma, a quote, or a line break
function escapeCSV(value: string): string {
if (value.includes(',') || value.includes('"') || value.includes('\n') || value.includes('\r')) {
return `"${value.replace(/"/g, '""')}"`
}
return value
}
// Convert to JSON (user-friendly layout)
export function toJSON(data: ExtractedData[]): string {
// Reshape the data into friendlier records
const result: {
exportTime: string;
totalItems: number;
data: Array<{
index: number;
title?: string;
content?: string;
link?: string;
}>;
} = {
exportTime: new Date().toLocaleString('zh-CN'),
totalItems: 0,
data: []
}
let globalIndex = 1
data.forEach(d => {
d.values.forEach(value => {
// Try to parse the value as JSON (a full-mode record)
try {
if (value.startsWith('{')) {
const parsed = JSON.parse(value)
if (parsed.title !== undefined || parsed.link !== undefined) {
// Full mode: use the parsed object's fields directly
result.data.push({
index: globalIndex++,
title: parsed.title || '',
content: parsed.content || '',
link: parsed.link || ''
})
return
}
}
} catch {
// Not JSON: treat it as plain text
}
// Plain-text mode
result.data.push({
index: globalIndex++,
content: value
})
})
})
result.totalItems = result.data.length
return JSON.stringify(result, null, 2)
}
// Convert to compact JSON (a flat array of values only)
export function toSimpleJSON(data: ExtractedData[]): string {
const items: string[] = []
data.forEach(d => {
items.push(...d.values)
})
return JSON.stringify(items, null, 2)
}
// Convert to a table (used for previews)
export function toTable(data: ExtractedData[]): { headers: string[]; rows: string[][] } {
const headers = data.map(d => d.selector)
const maxLength = Math.max(...data.map(d => d.values.length), 0)
const rows: string[][] = []
for (let i = 0; i < maxLength; i++) {
rows.push(data.map(d => d.values[i] || ''))
}
return { headers, rows }
}
// Convert to plain text
export function toTXT(data: ExtractedData[]): string {
if (data.length === 0) return ''
const sections: string[] = []
data.forEach(d => {
const selectorName = d.selector.replace(/[^a-zA-Z0-9\u4e00-\u9fa5]/g, ' ').trim()
sections.push(`=== ${selectorName} ===`)
d.values.forEach((v, i) => {
sections.push(`${i + 1}. ${v}`)
})
sections.push('')
})
return sections.join('\n')
}
// Convert to Excel-ready data (returns the header row and data rows)
export function toExcelData(data: ExtractedData[]): { headers: string[]; rows: (string | number)[][] } {
const headers = ['No.', ...data.map(d => d.selector)]
const maxLength = Math.max(...data.map(d => d.values.length), 0)
const rows: (string | number)[][] = []
for (let i = 0; i < maxLength; i++) {
rows.push([i + 1, ...data.map(d => d.values[i] || '')])
}
return { headers, rows }
}
// Generate an Excel file (using the xlsx library)
export async function toExcel(data: ExtractedData[]): Promise<Blob | null> {
try {
// Import xlsx dynamically
const XLSX = await import('xlsx')
const { headers, rows } = toExcelData(data)
// Build the worksheet
const wsData = [headers, ...rows]
const ws = XLSX.utils.aoa_to_sheet(wsData)
// Set column widths from the longest cell, capped at 50 characters
const colWidths = headers.map((h, i) => {
const maxLen = Math.max(
h.length,
...rows.map(r => String(r[i] || '').length)
)
return { wch: Math.min(maxLen + 2, 50) }
})
ws['!cols'] = colWidths
// Build the workbook
const wb = XLSX.utils.book_new()
XLSX.utils.book_append_sheet(wb, ws, 'Scraped data')
// Produce the Blob
const excelBuffer = XLSX.write(wb, { bookType: 'xlsx', type: 'array' })
return new Blob([excelBuffer], { type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' })
} catch (error) {
console.error('Excel generation error:', error)
return null
}
}
// Flatten the data (merge every selector's values into a single array)
export function flattenData(data: ExtractedData[]): { index: number; selector: string; value: string }[] {
const result: { index: number; selector: string; value: string }[] = []
let globalIndex = 1
data.forEach(d => {
d.values.forEach(v => {
result.push({
index: globalIndex++,
selector: d.selector,
value: v
})
})
})
return result
}
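`toJSON` accepts two kinds of values: plain strings, and JSON-encoded objects carrying `title`/`content`/`link` fields (the "full mode" records). The classification step in isolation, as a sketch: `normalizeValue` and `ExportItem` are hypothetical names, while the rules (the value starts with `{`, parses as JSON, and has a `title` or `link`) are the ones `toJSON` applies.

```typescript
interface ExportItem {
  index: number
  title?: string
  content?: string
  link?: string
}

// Turn one extracted value into an export record, following toJSON's rules.
function normalizeValue(value: string, index: number): ExportItem {
  if (value.startsWith('{')) {
    try {
      const parsed = JSON.parse(value)
      // Treat it as a structured record only if it has a title or a link.
      if (parsed && (parsed.title !== undefined || parsed.link !== undefined)) {
        return {
          index,
          title: parsed.title || '',
          content: parsed.content || '',
          link: parsed.link || '',
        }
      }
    } catch {
      // Not valid JSON: fall through and keep it as plain text.
    }
  }
  return { index, content: value }
}
```

Values that merely look like JSON (start with `{` but fail to parse, or lack both `title` and `link`) are kept verbatim as plain-text `content`.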

@@ -0,0 +1,98 @@
import { useStore, Rule, SelectedElement } from '../store'
// Save the current selection as a rule
export function saveCurrentAsRule(name: string): Rule | null {
const store = useStore.getState()
const { selectedElements, url, addRule } = store
if (selectedElements.length === 0) {
return null
}
// Derive a URL pattern from the current URL
const urlPattern = generateUrlPattern(url)
const rule: Rule = {
id: Date.now().toString(),
name,
urlPattern,
elements: [...selectedElements],
createdAt: Date.now()
}
addRule(rule)
return rule
}
// Build a URL pattern from a URL (keep the origin, wildcard the path)
function generateUrlPattern(url: string): string {
try {
const parsed = new URL(url)
// Simple mode: keep only the origin
return `${parsed.origin}/*`
} catch {
return url
}
}
// Return the first rule whose URL pattern matches the URL
export function matchRule(url: string, rules: Rule[]): Rule | null {
for (const rule of rules) {
if (matchUrlPattern(url, rule.urlPattern)) {
return rule
}
}
return null
}
// Match a URL against a wildcard pattern
function matchUrlPattern(url: string, pattern: string): boolean {
// Simple wildcard matching: escape regex metacharacters, then turn '*' into '.*'
const regex = new RegExp(
'^' + pattern
.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
.replace(/\*/g, '.*') + '$'
)
return regex.test(url)
}
// Apply a rule
export function applyRule(rule: Rule) {
const store = useStore.getState()
// Clear the current selection
store.clearSelectedElements()
// Add the rule's elements, each with a fresh id
rule.elements.forEach((el: SelectedElement) => {
store.addSelectedElement({
...el,
id: Date.now().toString() + Math.random()
})
})
}
// Export rules as JSON
export function exportRules(): string {
const store = useStore.getState()
return JSON.stringify(store.rules, null, 2)
}
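// Import rules; returns false if the JSON is invalid or is not an array
export function importRules(json: string): boolean {
try {
const parsed: unknown = JSON.parse(json)
if (!Array.isArray(parsed)) return false
const rules = parsed as Rule[]
const store = useStore.getState()
rules.forEach(rule => {
store.addRule({
...rule,
id: Date.now().toString() + Math.random() // assign a fresh ID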
})
})
return true
} catch {
return false
}
}
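`matchUrlPattern` escapes every regex metacharacter except `*`, expands each `*` into `.*`, and anchors the result at both ends. A standalone sketch of the same idea follows; `wildcardToRegExp`, `matchesPattern`, and `firstMatch` are hypothetical names, with `firstMatch` mirroring the first-match-wins order of `matchRule`.

```typescript
// Convert a wildcard pattern such as "https://example.com/*" into an anchored RegExp.
// Every regex metacharacter except '*' is escaped first, then each '*' becomes '.*'.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$')
}

function matchesPattern(url: string, pattern: string): boolean {
  return wildcardToRegExp(pattern).test(url)
}

// Hypothetical helper: return the first matching pattern, or null when none matches.
function firstMatch(url: string, patterns: string[]): string | null {
  for (const p of patterns) {
    if (matchesPattern(url, p)) return p
  }
  return null
}
```

One consequence of this scheme: a pattern produced by `generateUrlPattern`, such as `https://example.com/*`, requires a slash after the origin, so it matches `https://example.com/` but not the bare `https://example.com`.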

@@ -0,0 +1,794 @@
import { create } from 'zustand'
// Browser tab
export interface Tab {
id: string
url: string
title: string
isLoading: boolean
}
// History entry
export interface HistoryItem {
id: string
url: string
title: string
visitedAt: number
}
export interface SelectedElement {
id: string
selector: string
text: string
type: 'text' | 'link' | 'image' | 'attribute'
attribute?: string
preview?: string
tag?: string // HTML tag name
role?: 'title' | 'content' | 'link' | 'auto' // element role
}
export interface ExtractedData {
selector: string
values: string[]
}
export interface Rule {
id: string
name: string
urlPattern: string
elements: SelectedElement[]
createdAt: number
}
export interface Message {
id: string
role: 'user' | 'assistant' | 'system'
content: string
timestamp: number
toolCalls?: Array<{
name: string
arguments: object
result?: string
comment?: string // AI's commentary for this specific tool call
}>
}
// Chat session
export interface ChatSession {
id: string
title: string
messages: Message[]
createdAt: number
updatedAt: number
}
export interface AIConfig {
endpoint: string
apiKey: string
model: string
}
export interface SavedAIConfig extends AIConfig {
id: string
name: string
createdAt: number
}
export interface MouseState {
visible: boolean
x: number
y: number
clicking: boolean
clickId: number // incremented to trigger the click animation
duration: number // duration of the move animation
}
// Search engine definition
export interface SearchEngine {
id: string
name: string
url: string // contains %s as the search-term placeholder
icon?: string
}
// Downloaded image
export interface DownloadedImage {
filename: string
path: string
url: string
timestamp: number
}
// Element selection request
export interface ElementSelectionRequest {
id: string
purpose: string // what the selection is for, e.g. "crawl the news list"
status: 'pending' | 'auto' | 'manual' | 'completed' | 'cancelled'
selector?: string // the chosen CSS selector
}
// Browser settings
export interface BrowserSettings {
searchEngine: string // search engine ID
homepage: string
defaultZoom: number
}
// Built-in search engines
export const SEARCH_ENGINES: SearchEngine[] = [
{ id: 'bing', name: 'Bing', url: 'https://www.bing.com/search?q=%s' },
{ id: 'google', name: 'Google', url: 'https://www.google.com/search?q=%s' },
{ id: 'baidu', name: '百度', url: 'https://www.baidu.com/s?wd=%s' },
{ id: 'duckduckgo', name: 'DuckDuckGo', url: 'https://duckduckgo.com/?q=%s' },
]
// Search engine homepage URLs
export const SEARCH_ENGINE_HOMEPAGES: Record<string, string> = {
'bing': 'https://www.bing.com',
'google': 'https://www.google.com',
'baidu': 'https://www.baidu.com',
'duckduckgo': 'https://duckduckgo.com',
}
interface AppState {
// Tabs
tabs: Tab[]
activeTabId: string
// History
history: HistoryItem[]
// Browser state
url: string
isLoading: boolean
selectMode: boolean
// Browser settings
browserSettings: BrowserSettings
// Selected elements
selectedElements: SelectedElement[]
// Extracted data
extractedData: ExtractedData[]
// Rules
rules: Rule[]
// AI chat
messages: Message[]
isAILoading: boolean
aiStopRequested: boolean
chatSessions: ChatSession[]
currentSessionId: string | null
// AI configuration
aiConfig: AIConfig
savedConfigs: SavedAIConfig[]
// Virtual mouse cursor
mouseState: MouseState
// Downloaded images
downloadedImages: DownloadedImage[]
// Pending element selection request
elementSelectionRequest: ElementSelectionRequest | null
// Tab actions
addTab: (url?: string) => void
closeTab: (id: string) => void
setActiveTab: (id: string) => void
updateTab: (id: string, updates: Partial<Tab>) => void
// History actions
addHistory: (url: string, title: string) => void
clearHistory: () => void
loadHistory: () => Promise<void>
saveHistory: () => Promise<void>
// Actions
setUrl: (url: string) => void
setLoading: (loading: boolean) => void
setSelectMode: (mode: boolean) => void
addSelectedElement: (element: SelectedElement) => void
removeSelectedElement: (id: string) => void
clearSelectedElements: () => void
updateElementType: (id: string, type: SelectedElement['type'], attribute?: string) => void
setExtractedData: (data: ExtractedData[]) => void
clearExtractedData: () => void
addRule: (rule: Rule) => void
deleteRule: (id: string) => void
loadRules: () => Promise<void>
saveRules: () => Promise<void>
addMessage: (message: Omit<Message, 'id' | 'timestamp'>) => void
updateLastMessage: (content: string) => void
updateLastMessageWithToolCalls: (content: string, toolCalls: Array<{ name: string; arguments: object; result?: string }>) => void
clearMessages: () => void
setAILoading: (loading: boolean) => void
stopAI: () => void
resetAIStop: () => void
// Chat session management
newChatSession: () => void
switchChatSession: (id: string) => void
deleteChatSession: (id: string) => void
saveChatSessions: () => Promise<void>
loadChatSessions: () => Promise<void>
autoSaveCurrentSession: () => void
setAIConfig: (config: Partial<AIConfig>) => void
loadConfig: () => Promise<void>
saveConfig: () => Promise<void>
// Saved configurations
addSavedConfig: (name: string) => void
deleteSavedConfig: (id: string) => void
applySavedConfig: (id: string) => void
loadSavedConfigs: () => Promise<void>
saveSavedConfigs: () => Promise<void>
// Mouse control
showMouse: () => void
hideMouse: () => void
moveMouse: (x: number, y: number, duration?: number) => void
clickMouse: () => void
// Browser settings
setBrowserSettings: (settings: Partial<BrowserSettings>, navigateToHomepage?: boolean) => void
loadBrowserSettings: () => Promise<void>
saveBrowserSettings: () => Promise<void>
// Download management
setDownloadedImages: (images: DownloadedImage[]) => void
clearDownloadedImages: () => void
// Element selection request
setElementSelectionRequest: (request: ElementSelectionRequest | null) => void
respondToElementSelection: (mode: 'auto' | 'manual', selector?: string) => void
}
export const useStore = create<AppState>((set, get) => ({
// Initial state - tabs (URL left empty until loadBrowserSettings sets it)
tabs: [{ id: 'tab-1', url: '', title: '新标签页', isLoading: false }],
activeTabId: 'tab-1',
// History
history: [],
// Browser state (URL left empty to avoid a duplicate navigation)
url: '',
isLoading: false,
selectMode: false,
browserSettings: {
searchEngine: 'bing',
homepage: 'https://www.bing.com',
defaultZoom: 100
},
selectedElements: [],
extractedData: [],
rules: [],
messages: [],
isAILoading: false,
aiStopRequested: false,
chatSessions: [],
currentSessionId: null,
aiConfig: {
endpoint: 'https://api.openai.com/v1/chat/completions',
apiKey: '',
model: 'gpt-4'
},
savedConfigs: [],
mouseState: {
visible: false,
x: 0,
y: 0,
clicking: false,
clickId: 0,
duration: 300
},
downloadedImages: [],
elementSelectionRequest: null,
// Tab actions
addTab: (url) => {
const homepage = SEARCH_ENGINE_HOMEPAGES[get().browserSettings.searchEngine] || 'https://www.bing.com'
const newTab: Tab = {
id: `tab-${Date.now()}`,
url: url || homepage,
title: '新标签页',
isLoading: false
}
set((state) => ({
tabs: [...state.tabs, newTab],
activeTabId: newTab.id,
url: newTab.url
}))
},
closeTab: (id) => {
const { tabs, activeTabId } = get()
if (tabs.length <= 1) return // always keep at least one tab
const newTabs = tabs.filter(t => t.id !== id)
let newActiveId = activeTabId
// If the active tab is being closed, switch to a neighboring tab
if (id === activeTabId) {
const closedIndex = tabs.findIndex(t => t.id === id)
const newIndex = closedIndex >= newTabs.length ? newTabs.length - 1 : closedIndex
newActiveId = newTabs[newIndex].id
}
const activeTab = newTabs.find(t => t.id === newActiveId)
set({
tabs: newTabs,
activeTabId: newActiveId,
url: activeTab?.url || ''
})
},
setActiveTab: (id) => {
const tab = get().tabs.find(t => t.id === id)
if (tab) {
set({ activeTabId: id, url: tab.url })
}
},
updateTab: (id, updates) => {
set((state) => ({
tabs: state.tabs.map(t => t.id === id ? { ...t, ...updates } : t)
}))
// If the active tab's URL changed, keep the global url in sync
if (updates.url && id === get().activeTabId) {
set({ url: updates.url })
}
},
// History actions
addHistory: (url, title) => {
const newItem: HistoryItem = {
id: `history-${Date.now()}`,
url,
title: title || url,
visitedAt: Date.now()
}
set((state) => ({
history: [newItem, ...state.history.filter(h => h.url !== url)].slice(0, 100)
}))
get().saveHistory()
},
clearHistory: () => {
set({ history: [] })
get().saveHistory()
},
loadHistory: async () => {
if (window.electronAPI?.loadHistory) {
try {
const history = await window.electronAPI.loadHistory()
if (Array.isArray(history)) {
set({ history })
}
} catch (e) {
console.error('[CFSpider] 加载历史记录失败:', e)
}
}
},
saveHistory: async () => {
if (window.electronAPI?.saveHistory) {
await window.electronAPI.saveHistory(get().history)
}
},
// Actions
setUrl: (url) => {
set({ url })
// Keep the active tab in sync
const { activeTabId } = get()
get().updateTab(activeTabId, { url })
},
setLoading: (isLoading) => {
set({ isLoading })
// Keep the active tab in sync
const { activeTabId } = get()
get().updateTab(activeTabId, { isLoading })
},
setSelectMode: (selectMode) => set({ selectMode }),
addSelectedElement: (element) => set((state) => ({
selectedElements: [...state.selectedElements, element]
})),
removeSelectedElement: (id) => set((state) => ({
selectedElements: state.selectedElements.filter((e) => e.id !== id)
})),
clearSelectedElements: () => set({ selectedElements: [], extractedData: [] }),
updateElementType: (id, type, attribute) => set((state) => ({
selectedElements: state.selectedElements.map((e) =>
e.id === id ? { ...e, type, attribute } : e
)
})),
setExtractedData: (data) => set({ extractedData: data }),
clearExtractedData: () => set({ extractedData: [] }),
addRule: (rule) => {
set((state) => ({ rules: [...state.rules, rule] }))
get().saveRules()
},
deleteRule: (id) => {
set((state) => ({ rules: state.rules.filter((r) => r.id !== id) }))
get().saveRules()
},
loadRules: async () => {
if (window.electronAPI) {
const rules = await window.electronAPI.loadRules()
set({ rules: rules as Rule[] })
}
},
saveRules: async () => {
if (window.electronAPI) {
await window.electronAPI.saveRules(get().rules)
}
},
addMessage: (message) => set((state) => ({
messages: [...state.messages, {
...message,
id: `${Date.now()}-${Math.random().toString(36).slice(2, 9)}`,
timestamp: Date.now()
}]
})),
updateLastMessage: (content) => {
set((state) => {
const messages = [...state.messages]
if (messages.length > 0) {
messages[messages.length - 1] = {
...messages[messages.length - 1],
content
}
}
return { messages }
})
// Auto-save the current session
get().autoSaveCurrentSession()
},
updateLastMessageWithToolCalls: (content, toolCalls) => {
set((state) => {
const messages = [...state.messages]
if (messages.length > 0) {
messages[messages.length - 1] = {
...messages[messages.length - 1],
content,
toolCalls
}
}
return { messages }
})
// Auto-save the current session
get().autoSaveCurrentSession()
},
clearMessages: () => {
const { messages, currentSessionId, chatSessions } = get()
// Archive the current conversation into the session list
if (messages.length > 0) {
const title = messages.find(m => m.role === 'user')?.content.slice(0, 20) || '新对话'
const now = Date.now()
const existingIndex = chatSessions.findIndex(s => s.id === currentSessionId)
const newSession: ChatSession = {
id: currentSessionId || `session-${now}`,
title,
messages: [...messages],
// Keep the original creation time when updating an existing session
createdAt: existingIndex >= 0 ? chatSessions[existingIndex].createdAt : now,
updatedAt: now
}
// Update the existing session or add a new one (keeping at most 20)
if (existingIndex >= 0) {
const updated = [...chatSessions]
updated[existingIndex] = newSession
set({ chatSessions: updated, messages: [], currentSessionId: null })
} else {
set({ chatSessions: [newSession, ...chatSessions].slice(0, 20), messages: [], currentSessionId: null })
}
get().saveChatSessions()
} else {
set({ messages: [], currentSessionId: null })
}
// Clear the temporarily saved current session
try {
localStorage.removeItem('cfspider-current-session')
} catch {}
},
setAILoading: (isAILoading) => set({ isAILoading }),
stopAI: () => set({ aiStopRequested: true, isAILoading: false }),
resetAIStop: () => set({ aiStopRequested: false }),
// Chat session management
newChatSession: () => {
const { messages, currentSessionId, chatSessions } = get()
// Archive the current session
if (messages.length > 0) {
const title = messages.find(m => m.role === 'user')?.content.slice(0, 20) || '新对话'
const now = Date.now()
const existingIndex = chatSessions.findIndex(s => s.id === currentSessionId)
const newSession: ChatSession = {
id: currentSessionId || `session-${now}`,
title,
messages: [...messages],
// Keep the original creation time when updating an existing session
createdAt: existingIndex >= 0 ? chatSessions[existingIndex].createdAt : now,
updatedAt: now
}
if (existingIndex >= 0) {
const updated = [...chatSessions]
updated[existingIndex] = newSession
set({ chatSessions: updated, messages: [], currentSessionId: null })
} else {
set({ chatSessions: [newSession, ...chatSessions].slice(0, 20), messages: [], currentSessionId: null })
}
get().saveChatSessions()
} else {
set({ messages: [], currentSessionId: null })
}
// Clear the temporary snapshot
try {
localStorage.removeItem('cfspider-current-session')
} catch {}
},
switchChatSession: (id) => {
const { messages, currentSessionId, chatSessions } = get()
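// Save the current session first
if (messages.length > 0 && currentSessionId) {
const existingIndex = chatSessions.findIndex(s => s.id === currentSessionId)
const now = Date.now()
if (existingIndex >= 0) {
const updated = [...chatSessions]
updated[existingIndex] = {
...updated[existingIndex],
messages: [...messages],
updatedAt: now
}
set({ chatSessions: updated })
} else {
// autoSaveCurrentSession assigns an id without archiving; archive it now so it is not lost
const title = messages.find(m => m.role === 'user')?.content.slice(0, 20) || '新对话'
set({
chatSessions: [
{ id: currentSessionId, title, messages: [...messages], createdAt: now, updatedAt: now },
...chatSessions
].slice(0, 20)
})
}
get().saveChatSessions()
}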
// Switch to the target session
const session = get().chatSessions.find(s => s.id === id)
if (session) {
set({ messages: [...session.messages], currentSessionId: id })
// Persist the newly active session
get().autoSaveCurrentSession()
}
},
deleteChatSession: (id) => {
set((state) => ({
chatSessions: state.chatSessions.filter(s => s.id !== id)
}))
get().saveChatSessions()
},
saveChatSessions: async () => {
try {
localStorage.setItem('cfspider-chat-sessions', JSON.stringify(get().chatSessions))
} catch {}
},
loadChatSessions: async () => {
try {
const data = localStorage.getItem('cfspider-chat-sessions')
if (data) {
const sessions = JSON.parse(data) as ChatSession[]
set({ chatSessions: sessions })
}
// Restore the in-progress session that has not been archived yet
const currentData = localStorage.getItem('cfspider-current-session')
if (currentData) {
const current = JSON.parse(currentData)
if (current.messages && current.messages.length > 0) {
set({ messages: current.messages, currentSessionId: current.id })
}
}
} catch {}
},
// Auto-save the current session (called whenever a message is updated)
autoSaveCurrentSession: () => {
const { messages, currentSessionId } = get()
if (messages.length === 0) return
try {
const sessionId = currentSessionId || `session-${Date.now()}`
if (!currentSessionId) {
set({ currentSessionId: sessionId })
}
localStorage.setItem('cfspider-current-session', JSON.stringify({
id: sessionId,
messages: messages
}))
} catch {}
},
setAIConfig: (config) => set((state) => ({
aiConfig: { ...state.aiConfig, ...config }
})),
loadConfig: async () => {
if (window.electronAPI) {
const config = await window.electronAPI.loadConfig()
set({ aiConfig: config })
}
},
saveConfig: async () => {
if (window.electronAPI) {
await window.electronAPI.saveConfig(get().aiConfig)
}
},
// Saved configurations
addSavedConfig: (name) => {
const { aiConfig, savedConfigs } = get()
const newConfig: SavedAIConfig = {
...aiConfig,
id: Date.now().toString(),
name,
createdAt: Date.now()
}
set({ savedConfigs: [...savedConfigs, newConfig] })
get().saveSavedConfigs()
},
deleteSavedConfig: (id) => {
set((state) => ({
savedConfigs: state.savedConfigs.filter((c) => c.id !== id)
}))
get().saveSavedConfigs()
},
applySavedConfig: (id) => {
const config = get().savedConfigs.find((c) => c.id === id)
if (config) {
set({
aiConfig: {
endpoint: config.endpoint,
apiKey: config.apiKey,
model: config.model
}
})
get().saveConfig()
}
},
loadSavedConfigs: async () => {
if (window.electronAPI) {
try {
const configs = await window.electronAPI.loadSavedConfigs()
set({ savedConfigs: configs as SavedAIConfig[] })
} catch {
set({ savedConfigs: [] })
}
}
},
saveSavedConfigs: async () => {
if (window.electronAPI) {
await window.electronAPI.saveSavedConfigs(get().savedConfigs)
}
},
// Mouse control
showMouse: () => set((state) => ({
mouseState: { ...state.mouseState, visible: true }
})),
hideMouse: () => set((state) => ({
mouseState: { ...state.mouseState, visible: false }
})),
moveMouse: (x, y, duration = 300) => set((state) => ({
mouseState: { ...state.mouseState, x, y, duration, visible: true }
})),
clickMouse: () => set((state) => ({
mouseState: {
...state.mouseState,
clicking: true,
clickId: state.mouseState.clickId + 1
}
})),
// Browser settings
setBrowserSettings: (settings, navigateToHomepage = true) => {
set((state) => ({
browserSettings: { ...state.browserSettings, ...settings }
}))
get().saveBrowserSettings()
// If the search engine changed, navigate to its homepage
if (settings.searchEngine && navigateToHomepage) {
const homepage = SEARCH_ENGINE_HOMEPAGES[settings.searchEngine]
if (homepage) {
// setUrl also keeps the active tab's url in sync
get().setUrl(homepage)
// Point the webview at the new homepage
const webview = document.querySelector('webview') as HTMLElement & { src?: string }
if (webview) {
webview.src = homepage
}
}
}
},
loadBrowserSettings: async () => {
if (window.electronAPI?.loadBrowserSettings) {
try {
const settings = await window.electronAPI.loadBrowserSettings()
console.log('[CFSpider] 加载浏览器设置:', settings)
if (settings && typeof settings === 'object') {
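// Merge over the current values so a partially saved file keeps the remaining fields
const browserSettings: BrowserSettings = { ...get().browserSettings, ...(settings as Partial<BrowserSettings>) }
// Derive the homepage URL from the search engine
const homepage = SEARCH_ENGINE_HOMEPAGES[browserSettings.searchEngine] || 'https://www.bing.com'
// Also update the first tab's URL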
const { tabs } = get()
const updatedTabs = tabs.length > 0
? [{ ...tabs[0], url: homepage }, ...tabs.slice(1)]
: [{ id: 'tab-1', url: homepage, title: '新标签页', isLoading: false }]
set({
browserSettings,
url: homepage,
tabs: updatedTabs
})
console.log('[CFSpider] 设置首页 URL:', homepage)
}
} catch (e) {
console.error('[CFSpider] 加载浏览器设置失败:', e)
}
}
},
saveBrowserSettings: async () => {
if (window.electronAPI?.saveBrowserSettings) {
const settings = get().browserSettings
console.log('[CFSpider] 保存浏览器设置:', settings)
await window.electronAPI.saveBrowserSettings(settings)
}
},
// Download management
setDownloadedImages: (images) => set({ downloadedImages: images }),
clearDownloadedImages: () => set({ downloadedImages: [] }),
// Element selection request
setElementSelectionRequest: (request) => set({ elementSelectionRequest: request }),
respondToElementSelection: (mode, selector) => {
const request = get().elementSelectionRequest
if (request) {
set({
elementSelectionRequest: {
...request,
status: mode === 'manual' ? 'manual' : (selector ? 'completed' : 'auto'),
selector: selector
}
})
}
}
}))
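The tab-switching rule in `closeTab` (activate the tab that slides into the closed slot, or the new last tab if the closed one was at the end) is easiest to see as a pure function. A minimal sketch; `nextActiveAfterClose` is a hypothetical name and the `Tab` interface is copied from the store above.

```typescript
// Same shape as the store's Tab interface.
interface Tab {
  id: string
  url: string
  title: string
  isLoading: boolean
}

// Hypothetical pure helper mirroring closeTab: returns the remaining tabs and the new active id.
function nextActiveAfterClose(
  tabs: Tab[],
  activeId: string,
  closingId: string
): { tabs: Tab[]; activeId: string } {
  const closedIndex = tabs.map(t => t.id).indexOf(closingId)
  // Keep at least one tab, and ignore ids that are not open.
  if (tabs.length <= 1 || closedIndex === -1) return { tabs, activeId }
  const remaining = tabs.filter(t => t.id !== closingId)
  // Closing a background tab leaves the active tab unchanged.
  if (closingId !== activeId) return { tabs: remaining, activeId }
  // The right-hand neighbor now sits at closedIndex; if there is none, take the new last tab.
  const newIndex = closedIndex >= remaining.length ? remaining.length - 1 : closedIndex
  return { tabs: remaining, activeId: remaining[newIndex].id }
}
```

Keeping the rule pure like this would let it be unit-tested without instantiating the Zustand store.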

@@ -0,0 +1,25 @@
{
"compilerOptions": {
"target": "ES2020",
"useDefineForClassFields": true,
"lib": ["ES2020", "DOM", "DOM.Iterable"],
"module": "ESNext",
"skipLibCheck": true,
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"resolveJsonModule": true,
"isolatedModules": true,
"noEmit": true,
"jsx": "react-jsx",
"strict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"noFallthroughCasesInSwitch": true,
"baseUrl": ".",
"paths": {
"@/*": ["src/*"]
}
},
"include": ["src"],
"references": [{ "path": "./tsconfig.node.json" }]
}

@@ -0,0 +1,12 @@
{
"compilerOptions": {
"composite": true,
"skipLibCheck": true,
"module": "ESNext",
"moduleResolution": "bundler",
"allowSyntheticDefaultImports": true,
"strict": true,
"outDir": "dist-electron"
},
"include": ["vite.config.ts", "electron/**/*"]
}

@@ -0,0 +1,20 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import { resolve } from 'path'
export default defineConfig({
plugins: [react()],
base: './',
resolve: {
alias: {
'@': resolve(__dirname, 'src')
}
},
build: {
outDir: 'dist',
emptyOutDir: true
},
server: {
port: 5173
}
})

@@ -111,6 +111,12 @@ from .workers_manager import (
WorkersManager
)
# Browser with human-behavior simulation
from .human_browser import HumanBrowser, HumanBrowserSync
# AI-driven smart browser
from .ai_browser import AIBrowser, AIBrowserSync, CrawlResult, ExecuteResult, PRESET_APIS
# Async API (based on httpx)
from .async_api import (
aget, apost, aput, adelete, ahead, aoptions, apatch,
@@ -286,4 +292,8 @@ __all__ = [
"make_workers", "list_workers", "delete_workers", "WorkersManager",
# Data processing
"DataFrame", "read", "read_csv", "read_json", "read_excel",
# Browser with human-behavior simulation
"HumanBrowser", "HumanBrowserSync",
# AI smart browser
"AIBrowser", "AIBrowserSync", "CrawlResult", "ExecuteResult", "PRESET_APIS",
]

802
cfspider/ai_browser.py Normal file
@@ -0,0 +1,802 @@
"""
CFspider AI Browser - AI 驱动的智能浏览器
通过大模型 API 驱动浏览器自动完成任务,支持:
- 爬虫模式:自动分析页面结构,智能提取数据
- 操作模式:理解用户指令,自动完成网页操作
支持任意 OpenAI 兼容 API
- DeepSeek (免费额度)
- 通义千问 (免费额度)
- Moonshot (免费额度)
- OpenAI
- 本地模型 (Ollama)
使用方法:
>>> import cfspider
>>>
>>> # 配置 AI
>>> browser = cfspider.AIBrowser(
... base_url="https://api.deepseek.com/v1",
... api_key="your-api-key",
... model="deepseek-chat"
... )
>>>
>>> # 爬虫模式:自动提取数据
>>> data = await browser.crawl(
... "https://news.ycombinator.com",
... goal="提取首页所有新闻标题和链接"
... )
>>>
>>> # 操作模式:完成复杂任务
>>> await browser.execute(
... "https://github.com",
... task="搜索 cfspider 项目,点击第一个结果,获取 star 数量"
... )
"""
import asyncio
import json
import re
from typing import Optional, List, Dict, Any, Union, Callable
from dataclasses import dataclass
try:
import aiohttp
except ImportError:
aiohttp = None
from .human_browser import HumanBrowser
# Free / low-cost LLM API presets
PRESET_APIS = {
"nvidia": {
"base_url": "https://integrate.api.nvidia.com/v1",
"model": "nvidia/llama-3.1-nemotron-70b-instruct",
"description": "NVIDIA NIM免费额度 1000 请求/天)"
},
"nvidia-glm": {
"base_url": "https://integrate.api.nvidia.com/v1",
"model": "z-ai/glm4.7",
"description": "NVIDIA GLM4.7(免费)"
},
"nvidia-minimax": {
"base_url": "https://integrate.api.nvidia.com/v1",
"model": "minimaxai/minimax-m2.1",
"description": "NVIDIA Minimax M2.1(免费)"
},
"modelscope": {
"base_url": "https://api-inference.modelscope.cn/v1",
"model": "Qwen/Qwen2.5-Coder-32B-Instruct",
"description": "ModelScope 魔搭(免费 Qwen2.5-Coder-32B"
},
"deepseek": {
"base_url": "https://api.deepseek.com/v1",
"model": "deepseek-chat",
"description": "DeepSeek免费额度 500万 tokens"
},
"qwen": {
"base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
"model": "qwen-turbo",
"description": "通义千问(免费额度 100万 tokens"
},
"moonshot": {
"base_url": "https://api.moonshot.cn/v1",
"model": "moonshot-v1-8k",
"description": "Moonshot免费额度 15元"
},
"glm": {
"base_url": "https://open.bigmodel.cn/api/paas/v4",
"model": "glm-4-flash",
"description": "智谱 GLM-4-Flash完全免费"
},
"ollama": {
"base_url": "http://localhost:11434/v1",
"model": "llama3.2",
"description": "本地 Ollama完全免费"
},
"openai": {
"base_url": "https://api.openai.com/v1",
"model": "gpt-4o-mini",
"description": "OpenAI GPT-4o-mini"
}
}
@dataclass
class CrawlResult:
"""爬虫结果"""
success: bool
data: Any
steps: List[str]
html: str
error: Optional[str] = None
@dataclass
class ExecuteResult:
"""操作结果"""
success: bool
result: str
steps: List[str]
screenshots: List[bytes]
error: Optional[str] = None
class AIBrowser:
"""
AI 驱动的智能浏览器
通过大模型理解网页结构和用户意图,自动完成爬取和操作任务。
"""
def __init__(
self,
# AI settings
base_url: Optional[str] = None,
api_key: Optional[str] = None,
model: Optional[str] = None,
preset: Optional[str] = None,  # name of a PRESET_APIS entry
# Browser settings
cf_proxies: Optional[str] = None,
uuid: Optional[str] = None,
headless: bool = False,
human_like: bool = True,
# AI behavior settings
max_steps: int = 20,
screenshot_each_step: bool = False,
verbose: bool = True
):
"""
初始化 AI 浏览器
Args:
base_url: API 基础 URL如 https://api.deepseek.com/v1
api_key: API 密钥
model: 模型名称(如 deepseek-chat
preset: 使用预设 APIdeepseek/qwen/moonshot/glm/ollama/openai
cf_proxies: CFspider Workers 代理
uuid: VLESS UUID
headless: 是否无头模式
human_like: 是否启用人类行为模拟
max_steps: 最大操作步数
screenshot_each_step: 是否每步截图
verbose: 是否输出详细日志
"""
# 处理预设
if preset and preset in PRESET_APIS:
config = PRESET_APIS[preset]
self.base_url = base_url or config["base_url"]
self.model = model or config["model"]
else:
self.base_url = base_url
self.model = model
self.api_key = api_key
self.cf_proxies = cf_proxies
self.uuid = uuid
self.headless = headless
self.human_like = human_like
self.max_steps = max_steps
self.screenshot_each_step = screenshot_each_step
self.verbose = verbose
self._browser: Optional[HumanBrowser] = None
self._conversation: List[Dict] = []
if not self.base_url or not self.api_key:
raise ValueError(
"请配置 API\n"
" AIBrowser(base_url='...', api_key='...', model='...')\n"
"或使用预设:\n"
" AIBrowser(preset='deepseek', api_key='...')\n\n"
"支持的预设:" + ", ".join(PRESET_APIS.keys())
)
def _log(self, msg: str):
"""输出日志"""
if self.verbose:
print(f"[AIBrowser] {msg}")
async def _call_llm(self, messages: List[Dict], tools: Optional[List[Dict]] = None) -> Dict:
"""Call the chat completions endpoint of the configured OpenAI-compatible API."""
if not aiohttp:
raise ImportError("请安装 aiohttp: pip install aiohttp")
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {self.api_key}"
}
payload = {
"model": self.model,
"messages": messages,
"temperature": 0.7,
"max_tokens": 4096
}
if tools:
payload["tools"] = tools
payload["tool_choice"] = "auto"
async with aiohttp.ClientSession() as session:
async with session.post(
f"{self.base_url.rstrip('/')}/chat/completions",
headers=headers,
json=payload,
timeout=aiohttp.ClientTimeout(total=60)
) as resp:
if resp.status != 200:
error = await resp.text()
raise Exception(f"API 错误 {resp.status}: {error}")
return await resp.json()
async def _get_page_context(self) -> str:
"""获取当前页面上下文(用于 AI 分析)"""
# 获取简化的页面结构
script = """
(function() {
const elements = [];
const interactable = document.querySelectorAll(
'a, button, input, select, textarea, [onclick], [role="button"]'
);
interactable.forEach((el, idx) => {
const rect = el.getBoundingClientRect();
if (rect.width > 0 && rect.height > 0) {
let text = el.innerText || el.value || el.placeholder || '';
text = text.slice(0, 100).replace(/\\s+/g, ' ').trim();
const attrs = [];
if (el.id) attrs.push(`id="${el.id}"`);
if (el.className) attrs.push(`class="${el.className.toString().slice(0, 50)}"`);
if (el.name) attrs.push(`name="${el.name}"`);
if (el.type) attrs.push(`type="${el.type}"`);
if (el.href) attrs.push(`href="${el.href.slice(0, 100)}"`);
elements.push({
index: idx,
tag: el.tagName.toLowerCase(),
attrs: attrs.join(' '),
text: text,
selector: el.id ? `#${el.id}` :
el.className ? `.${el.className.toString().split(' ')[0]}` :
`${el.tagName.toLowerCase()}:nth-of-type(${idx + 1})`
});
}
});
return {
title: document.title,
url: window.location.href,
elements: elements.slice(0, 50) // cap the list at 50 elements
};
})()
"""
result = await self._browser.evaluate(script)
return json.dumps(result, ensure_ascii=False, indent=2)
async def _start_browser(self):
"""启动浏览器"""
if self._browser is None:
self._browser = HumanBrowser(
cf_proxies=self.cf_proxies,
uuid=self.uuid,
headless=self.headless,
human_like=self.human_like
)
await self._browser.start()
async def crawl(
self,
url: str,
goal: str,
output_format: str = "json"
) -> CrawlResult:
"""
爬虫模式:自动分析页面并提取数据
Args:
url: 目标 URL
goal: 爬取目标描述(如 "提取所有商品名称和价格"
output_format: 输出格式 (json/text/list)
Returns:
CrawlResult: 爬取结果
Example:
>>> result = await browser.crawl(
... "https://news.ycombinator.com",
... goal="提取首页前10条新闻的标题和链接"
... )
>>> print(result.data)
"""
await self._start_browser()
steps = []
try:
# Open the page
self._log(f"打开页面: {url}")
await self._browser.goto(url)
steps.append(f"打开页面: {url}")
# Collect the page context
context = await self._get_page_context()
html = await self._browser.html()
# Build the prompts
system_prompt = """你是一个智能网页数据提取助手。
用户会给你一个网页的结构信息和提取目标。
请分析页面结构,编写 JavaScript 代码来提取数据。
返回格式:
```javascript
// 你的提取代码
```
代码应该返回提取的数据JSON 格式)。
只返回代码,不要解释。"""
user_prompt = f"""页面信息:
{context}
提取目标:{goal}
输出格式:{output_format}
请编写 JavaScript 代码提取数据。"""
# Call the AI
self._log("AI 分析页面结构...")
response = await self._call_llm([
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
])
content = response["choices"][0]["message"]["content"]
steps.append("AI 分析完成")
# Extract the JavaScript from a fenced code block; \s* also tolerates "\r\n" and trailing spaces
code_match = re.search(r'```(?:javascript|js)?\s*\n(.*?)\n?```', content, re.DOTALL)
if code_match:
js_code = code_match.group(1)
else:
js_code = content
# Run the extraction code in the page
self._log("执行数据提取...")
data = await self._browser.evaluate(js_code)
steps.append("数据提取完成")
return CrawlResult(
success=True,
data=data,
steps=steps,
html=html
)
except Exception as e:
self._log(f"爬取错误: {e}")
return CrawlResult(
success=False,
data=None,
steps=steps,
html="",
error=str(e)
)
async def execute(
self,
url: str,
task: str,
on_step: Optional[Callable[[str], None]] = None
) -> ExecuteResult:
"""
操作模式:让 AI 理解并完成复杂任务
Args:
url: 起始 URL
task: 任务描述(如 "登录账号,搜索商品,加入购物车"
on_step: 每步回调函数
Returns:
ExecuteResult: 操作结果
Example:
>>> result = await browser.execute(
... "https://github.com",
... task="搜索 cfspider点击第一个结果告诉我 star 数量"
... )
>>> print(result.result)
"""
await self._start_browser()
steps = []
screenshots = []
# Tools exposed to the model (OpenAI function-calling format)
tools = [
{
"type": "function",
"function": {
"name": "click",
"description": "点击页面元素",
"parameters": {
"type": "object",
"properties": {
"selector": {
"type": "string",
"description": "CSS 选择器"
}
},
"required": ["selector"]
}
}
},
{
"type": "function",
"function": {
"name": "type_text",
"description": "在输入框中输入文本",
"parameters": {
"type": "object",
"properties": {
"selector": {
"type": "string",
"description": "CSS 选择器"
},
"text": {
"type": "string",
"description": "要输入的文本"
}
},
"required": ["selector", "text"]
}
}
},
{
"type": "function",
"function": {
"name": "scroll",
"description": "滚动页面",
"parameters": {
"type": "object",
"properties": {
"direction": {
"type": "string",
"enum": ["up", "down"],
"description": "滚动方向"
}
},
"required": ["direction"]
}
}
},
{
"type": "function",
"function": {
"name": "wait",
"description": "等待一段时间",
"parameters": {
"type": "object",
"properties": {
"seconds": {
"type": "number",
"description": "等待秒数"
}
},
"required": ["seconds"]
}
}
},
{
"type": "function",
"function": {
"name": "get_text",
"description": "获取元素的文本内容",
"parameters": {
"type": "object",
"properties": {
"selector": {
"type": "string",
"description": "CSS 选择器"
}
},
"required": ["selector"]
}
}
},
{
"type": "function",
"function": {
"name": "done",
"description": "任务完成,返回结果",
"parameters": {
"type": "object",
"properties": {
"result": {
"type": "string",
"description": "任务结果"
}
},
"required": ["result"]
}
}
}
]
try:
# Open the page
self._log(f"打开页面: {url}")
await self._browser.goto(url)
steps.append(f"打开页面: {url}")
if self.screenshot_each_step:
screenshots.append(await self._browser.screenshot())
# Initialize the conversation
system_prompt = """你是一个网页自动化助手,通过工具来完成用户的任务。
可用工具:
- click(selector): 点击元素
- type_text(selector, text): 输入文本
- scroll(direction): 滚动页面 (up/down)
- wait(seconds): 等待
- get_text(selector): 获取文本
- done(result): 完成任务并返回结果
每次我会给你当前页面的结构信息,你决定下一步操作。
一步一步完成任务,完成后调用 done() 返回结果。"""
messages = [{"role": "system", "content": system_prompt}]
# Main agent loop
for step in range(self.max_steps):
# Collect the page context
context = await self._get_page_context()
user_msg = f"""当前页面:
{context}
任务:{task}
已完成的步骤:
{chr(10).join(steps)}
请决定下一步操作。"""
messages.append({"role": "user", "content": user_msg})
# Call the AI
self._log(f"步骤 {step + 1}: 分析中...")
response = await self._call_llm(messages, tools)
choice = response["choices"][0]
message = choice["message"]
messages.append(message)
# 检查是否有工具调用
if "tool_calls" not in message or not message["tool_calls"]:
# 没有工具调用,可能是对话回复
content = message.get("content", "")
if content:
self._log(f"AI: {content}")
break
# 执行工具调用
for tool_call in message["tool_calls"]:
func_name = tool_call["function"]["name"]
func_args = json.loads(tool_call["function"]["arguments"])
self._log(f"执行: {func_name}({func_args})")
if on_step:
on_step(f"{func_name}({func_args})")
# 执行操作
result = await self._execute_tool(func_name, func_args)
step_desc = f"{func_name}({func_args}) -> {result}"
steps.append(step_desc)
# 检查是否完成
if func_name == "done":
return ExecuteResult(
success=True,
result=func_args.get("result", ""),
steps=steps,
screenshots=screenshots
)
# 添加工具结果
messages.append({
"role": "tool",
"tool_call_id": tool_call["id"],
"content": str(result)
})
if self.screenshot_each_step:
screenshots.append(await self._browser.screenshot())
await asyncio.sleep(1)
return ExecuteResult(
success=True,
result="达到最大步数限制",
steps=steps,
screenshots=screenshots
)
except Exception as e:
self._log(f"执行错误: {e}")
return ExecuteResult(
success=False,
result="",
steps=steps,
screenshots=screenshots,
error=str(e)
)
async def _execute_tool(self, name: str, args: Dict) -> str:
"""Execute one tool call and return a short status string for the model"""
try:
if name == "click":
await self._browser.human_click(args["selector"])
await asyncio.sleep(1)
return "Clicked"
elif name == "type_text":
await self._browser.human_type(args["selector"], args["text"])
return "Typed"
elif name == "scroll":
await self._browser.human_scroll(args["direction"])
return "Scrolled"
elif name == "wait":
await asyncio.sleep(args["seconds"])
return f"Waited {args['seconds']} s"
elif name == "get_text":
# json.dumps yields a valid JS string literal, so selectors containing quotes do not break the expression
text = await self._browser.evaluate(
f"document.querySelector({json.dumps(args['selector'])})?.innerText || ''"
)
return text[:500] if text else "Element not found or empty"
elif name == "done":
return args.get("result", "Done")
else:
return f"Unknown tool: {name}"
except Exception as e:
return f"Error: {e}"
async def chat(self, message: str) -> str:
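"""
Chat mode: talk to the AI about the current page and ask it for help operating the browser.
The reply is text only; this method does not itself perform browser actions.
Args:
message: User message
Returns:
The AI's reply
Example:
>>> await browser.goto("https://github.com")
>>> response = await browser.chat("Help me search for cfspider")
>>> print(response)
"""
await self._start_browser()
# Get the page context
context = await self._get_page_context()
# Append the user message to the conversation history
self._conversation.append({
"role": "user",
"content": f"Current page:\n{context}\n\nUser: {message}"
})
# Call the AI
system = """You are a browser assistant. The user will ask questions about the current page,
or ask you to help operate it. Answer concisely; if an action is needed, tell the user what you will do."""
messages = [{"role": "system", "content": system}] + self._conversation
response = await self._call_llm(messages)
content = response["choices"][0]["message"]["content"]
self._conversation.append({"role": "assistant", "content": content})
return content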
async def goto(self, url: str) -> str:
"""Navigate to a URL"""
await self._start_browser()
return await self._browser.goto(url)
async def screenshot(self, path: str = None) -> bytes:
"""Take a screenshot"""
await self._start_browser()
return await self._browser.screenshot(path)
async def close(self):
"""Close the browser"""
if self._browser:
await self._browser.close()
async def __aenter__(self):
await self._start_browser()
return self
async def __aexit__(self, exc_type, exc_val, exc_tb):
await self.close()
@staticmethod
def list_presets() -> Dict[str, Dict]:
"""List all preset APIs"""
return PRESET_APIS
# Synchronous version
class AIBrowserSync:
"""
Synchronous AI browser
Example:
>>> browser = cfspider.AIBrowserSync(preset="deepseek", api_key="...")
>>> result = browser.crawl("https://example.com", goal="Extract all links")
"""
def __init__(self, *args, **kwargs):
self._browser = AIBrowser(*args, **kwargs)
self._loop = None
def _get_loop(self):
if self._loop is None:
try:
self._loop = asyncio.get_event_loop()
except RuntimeError:
self._loop = asyncio.new_event_loop()
asyncio.set_event_loop(self._loop)
return self._loop
def _run(self, coro):
return self._get_loop().run_until_complete(coro)
def crawl(self, url: str, goal: str, output_format: str = "json") -> CrawlResult:
return self._run(self._browser.crawl(url, goal, output_format))
def execute(self, url: str, task: str, on_step=None) -> ExecuteResult:
return self._run(self._browser.execute(url, task, on_step))
def chat(self, message: str) -> str:
return self._run(self._browser.chat(message))
def goto(self, url: str) -> str:
return self._run(self._browser.goto(url))
def screenshot(self, path: str = None) -> bytes:
return self._run(self._browser.screenshot(path))
def close(self):
return self._run(self._browser.close())
def __enter__(self):
self._run(self._browser._start_browser())
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.close()
@staticmethod
def list_presets() -> Dict[str, Dict]:
return PRESET_APIS
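The `execute` loop in `AIBrowser` uses the OpenAI-style tool-calling protocol. An assistant message may carry `tool_calls`, each with a JSON-encoded `arguments` string, and every call is answered with a `role: "tool"` message that echoes its `tool_call_id`. The sketch below pulls that dispatch step out on its own, so it can be exercised without a browser or an API key. `dispatch_tool_calls` and the handler mapping are hypothetical names, and the message shape is assumed from the way `execute` reads the response. This is an illustration, not part of cfspider.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Tuple


def dispatch_tool_calls(
    message: Dict[str, Any],
    handlers: Dict[str, Callable[..., Any]],
) -> Tuple[List[Dict[str, str]], Optional[str]]:
    """Run the tool calls in one assistant message (hypothetical helper).

    Returns the tool-result messages to append to the conversation, plus the
    final result if the model called `done` (otherwise None).
    """
    tool_messages: List[Dict[str, str]] = []
    for call in message.get("tool_calls") or []:
        name = call["function"]["name"]
        # `arguments` arrives as a JSON string, not a dict
        args = json.loads(call["function"].get("arguments") or "{}")
        if name == "done":
            # Stop at the first `done` call, as execute() does
            return tool_messages, args.get("result", "")
        handler = handlers.get(name)
        try:
            result = handler(**args) if handler else f"Unknown tool: {name}"
        except Exception as exc:
            # Report the failure back to the model instead of aborting the loop
            result = f"Error: {exc}"
        tool_messages.append(
            {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
        )
    return tool_messages, None


# Offline usage with a stand-in handler instead of a real browser
reply = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_1", "function": {"name": "scroll", "arguments": '{"direction": "down"}'}},
    ],
}
results, final = dispatch_tool_calls(reply, {"scroll": lambda direction: f"scrolled {direction}"})
print(results)  # [{'role': 'tool', 'tool_call_id': 'call_1', 'content': 'scrolled down'}]
print(final)    # None
```

Handler exceptions are caught and returned to the model as tool output. This lets the model pick another selector or action instead of ending the whole run on the first failure.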

cfspider/ai_browser_v2.py Normal file

@@ -0,0 +1,392 @@
"""
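CFspider AI Browser V2 - AI-driven smart browser built on Playwright
A more stable implementation that uses Playwright instead of raw CDP.
Supports a crawl mode and an action mode, with tasks driven by the AI.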
"""
import asyncio
import json
import re
import random
from typing import Optional, List, Dict, Any, Callable
from dataclasses import dataclass
try:
from playwright.async_api import async_playwright, Page, Browser
PLAYWRIGHT_AVAILABLE = True
except ImportError:
PLAYWRIGHT_AVAILABLE = False
try:
import aiohttp
except ImportError:
aiohttp = None
# Preset APIs
PRESET_APIS = {
"nvidia": {
"base_url": "https://integrate.api.nvidia.com/v1",
"model": "nvidia/llama-3.1-nemotron-70b-instruct",
"description": "NVIDIA NIM"
},
"nvidia-glm": {
"base_url": "https://integrate.api.nvidia.com/v1",
"model": "z-ai/glm4.7",
"description": "NVIDIA GLM4.7"
},
"nvidia-minimax": {
"base_url": "https://integrate.api.nvidia.com/v1",
"model": "minimaxai/minimax-m2.1",
"description": "NVIDIA Minimax"
},
"modelscope": {
"base_url": "https://api-inference.modelscope.cn/v1",
"model": "Qwen/Qwen2.5-Coder-32B-Instruct",
"description": "ModelScope Qwen"
},
"deepseek": {
"base_url": "https://api.deepseek.com/v1",
"model": "deepseek-chat",
"description": "DeepSeek"
},
"glm": {
"base_url": "https://open.bigmodel.cn/api/paas/v4",
"model": "glm-4-flash",
"description": "Zhipu GLM-4-Flash"
},
}
@dataclass
class TaskResult:
"""Task result"""
success: bool
result: str
steps: List[str]
error: Optional[str] = None
class AIBrowserV2:
"""
AI-driven smart browser V2
Built on Playwright; more stable and reliable.
"""
def __init__(
self,
base_url: str = None,
api_key: str = None,
model: str = None,
preset: str = None,
headless: bool = False,
slow_mo: int = 100, # delay between operations (ms)
verbose: bool = True
):
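if not PLAYWRIGHT_AVAILABLE:
raise ImportError("Please install playwright: pip install playwright && playwright install")
# Resolve the preset; fail fast on a typo instead of continuing with base_url=None
if preset and preset not in PRESET_APIS:
raise ValueError(f"Unknown preset {preset!r}; available presets: {', '.join(PRESET_APIS)}")
if preset: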
config = PRESET_APIS[preset]
self.base_url = base_url or config["base_url"]
self.model = model or config["model"]
else:
self.base_url = base_url
self.model = model
self.api_key = api_key
self.headless = headless
self.slow_mo = slow_mo
self.verbose = verbose
self._playwright = None
self._browser: Browser = None
self._page: Page = None
def _log(self, msg: str):
if self.verbose:
print(f"[AI Browser] {msg}")
async def _call_llm(self, messages: List[Dict]) -> str:
"""Call the LLM through an OpenAI-compatible /chat/completions endpoint and return the reply text"""
if not aiohttp:
raise ImportError("Please install aiohttp: pip install aiohttp")
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {self.api_key}"
}
payload = {
"model": self.model,
"messages": messages,
"temperature": 0.3,
"max_tokens": 2000
}
async with aiohttp.ClientSession() as session:
async with session.post(
f"{self.base_url.rstrip('/')}/chat/completions",
headers=headers,
json=payload,
timeout=aiohttp.ClientTimeout(total=120)
) as resp:
if resp.status != 200:
error = await resp.text()
raise Exception(f"API error (HTTP {resp.status}): {error}")
data = await resp.json()
return data["choices"][0]["message"]["content"]
async def start(self):
"""Launch the browser"""
self._playwright = await async_playwright().start()
self._browser = await self._playwright.chromium.launch(
headless=self.headless,
slow_mo=self.slow_mo
)
self._page = await self._browser.new_page()
# Set the viewport
await self._page.set_viewport_size({"width": 1280, "height": 800})
self._log("Browser started")
async def goto(self, url: str):
"""Navigate to a URL"""
await self._page.goto(url, wait_until="domcontentloaded")
self._log(f"Opened page: {url}")
await asyncio.sleep(1)
async def _get_page_elements(self) -> str:
"""Collect the visible interactive elements on the page (de-duplicated, at most 40)"""
elements = await self._page.evaluate("""
() => {
const results = [];
const selectors = 'a, button, input, select, textarea, [onclick], [role="button"], [role="search"], [type="search"], [aria-label*="search" i], [placeholder*="search" i], [placeholder*="搜索" i]';
const seen = new Set();
document.querySelectorAll(selectors).forEach((el) => {
const rect = el.getBoundingClientRect();
if (rect.width > 0 && rect.height > 0 && results.length < 40) {
// Skip duplicates (same tag, id and class)
const key = el.tagName + el.id + el.className;
if (seen.has(key)) return;
seen.add(key);
let text = (el.innerText || el.value || el.placeholder || el.title || el.getAttribute('aria-label') || '').slice(0, 50).trim();
let selector = '';
if (el.id) selector = '#' + el.id;
else if (el.name) selector = '[name="' + el.name + '"]';
else if (el.placeholder) selector = '[placeholder*="' + el.placeholder.slice(0,10) + '"]';
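// On SVG elements className is an SVGAnimatedString, not a string, so check the type first
else if (typeof el.className === 'string' && el.className.trim()) selector = '.' + el.className.trim().split(' ')[0];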
else selector = el.tagName.toLowerCase();
results.push({
idx: results.length,
tag: el.tagName.toLowerCase(),
text: text,
selector: selector,
type: el.type || '',
placeholder: el.placeholder || ''
});
}
});
return results;
}
""")
# Format the elements as text lines for the prompt
lines = []
for el in elements:
desc = f"[{el['idx']}] <{el['tag']}> {el['selector']}"
if el.get('text'):
desc += f" \"{el['text']}\""
if el.get('placeholder'):
desc += f" placeholder=\"{el['placeholder']}\""
if el.get('type'):
desc += f" (type={el['type']})"
lines.append(desc)
return "\n".join(lines)
async def execute(self, task: str, max_steps: int = 10) -> TaskResult:
"""
Execute a task
Args:
task: Task description
max_steps: Maximum number of steps
Returns:
TaskResult
"""
steps = []
system_prompt = """You are a browser automation assistant. Given the page elements and the task, return the next action.
Action format (return exactly one action each time):
- CLICK [idx] - click an element
- TYPE [idx] "text" - type into an input field
- SCROLL down/up - scroll the page
- WAIT - wait for the page to load
- DONE "result" - the task is complete
Return only the action command, with no explanation."""
for step in range(max_steps):
# Collect page information
title = await self._page.title()
url = self._page.url
elements = await self._get_page_elements()
user_msg = f"""Current page: {title}
URL: {url}
Interactive elements:
{elements}
Task: {task}
Completed steps: {steps}
Next action:"""
self._log(f"Step {step + 1}: analyzing page...")
try:
response = await self._call_llm([
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_msg}
])
# Clean up the response (strip <think> reasoning blocks)
response = re.sub(r'<think>.*?</think>', '', response, flags=re.DOTALL).strip()
self._log(f"AI decided: {response}")
# Parse and execute the action (only the first line is used)
action = response.strip().split('\n')[0]
steps.append(action)
if action.startswith("CLICK"):
match = re.search(r'CLICK\s*\[?(\d+)\]?', action)
if match:
idx = int(match.group(1))
await self._click_by_index(idx)
await asyncio.sleep(1)
elif action.startswith("TYPE"):
# \2 must repeat the opening quote, so the text may contain the other quote character (e.g. "it's")
match = re.search(r'TYPE\s*\[?(\d+)\]?\s*(["\'])(.+)\2', action)
if match:
idx = int(match.group(1))
text = match.group(3)
await self._type_by_index(idx, text)
await asyncio.sleep(0.5)
elif action.startswith("SCROLL"):
direction = "down" if "down" in action.lower() else "up"
await self._page.evaluate(f"window.scrollBy(0, {'300' if direction == 'down' else '-300'})")
await asyncio.sleep(0.5)
elif action.startswith("WAIT"):
await asyncio.sleep(2)
elif action.startswith("DONE"):
match = re.search(r'DONE\s*(["\'])(.+)\1', action)
result = match.group(2) if match else "Task completed"
return TaskResult(success=True, result=result, steps=steps)
except Exception as e:
self._log(f"Step error: {e}")
steps.append(f"Error: {e}")
# Stop if the page has been closed
if "closed" in str(e).lower():
break
return TaskResult(
success=True,
result="Reached the maximum number of steps",
steps=steps
)
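async def _click_by_index(self, idx: int):
"""Click element idx from the list built by _get_page_elements"""
# Mirror _get_page_elements (same selectors, visibility check, de-duplication, 40-item cap)
# so the AI's index refers to the same element; a plain querySelectorAll index would not
handle = await self._page.evaluate_handle("""
(idx) => {
const selectors = 'a, button, input, select, textarea, [onclick], [role="button"], [role="search"], [type="search"], [aria-label*="search" i], [placeholder*="search" i], [placeholder*="搜索" i]';
const seen = new Set(), visible = [];
for (const el of document.querySelectorAll(selectors)) {
const rect = el.getBoundingClientRect();
if (visible.length >= 40 || rect.width <= 0 || rect.height <= 0) continue;
const key = el.tagName + el.id + el.className;
if (!seen.has(key)) { seen.add(key); visible.push(el); }
}
return visible[idx] || null;
}
""", idx)
element = handle.as_element()
if element is None:
raise ValueError(f"No interactive element at index {idx}")
# DOM click (el.click()), which also reaches elements covered by overlays
await element.evaluate("el => el.click()")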
async def _type_by_index(self, idx: int, text: str):
"""Type text into element idx"""
# Click first to give the element focus
await self._click_by_index(idx)
await asyncio.sleep(0.3)
# Type one character at a time (simulating human typing)
for char in text:
await self._page.keyboard.type(char, delay=random.randint(50, 150))
async def crawl(self, goal: str) -> Dict:
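"""
Crawl mode: have the AI analyze the page and extract data
Args:
goal: Extraction goal
Returns:
The extracted data, or {"raw": <reply>} if the reply is not valid JSON
"""
html = await self._page.content()
prompt = f"""Analyze this HTML page: {goal}
HTML (first 5000 characters):
{html[:5000]}
Return the data as JSON. Return only JSON, with no explanation."""
self._log("AI is analyzing the page...")
response = await self._call_llm([
{"role": "user", "content": prompt}
])
# Strip <think> reasoning blocks before extracting the JSON
response = re.sub(r'<think>.*?</think>', '', response, flags=re.DOTALL).strip()
# Try to parse the JSON
try:
# Prefer the contents of a Markdown code fence if the model used one
json_match = re.search(r'```(?:json)?\s*([\s\S]*?)\s*```', response)
if json_match:
return json.loads(json_match.group(1))
return json.loads(response)
except ValueError: # json.JSONDecodeError is a subclass of ValueError
return {"raw": response}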
async def screenshot(self, path: str = "screenshot.png"):
"""Take a screenshot and save it to path"""
await self._page.screenshot(path=path)
self._log(f"Screenshot saved: {path}")
async def close(self):
"""Close the browser and stop Playwright"""
if self._browser:
await self._browser.close()
if self._playwright:
await self._playwright.stop()
self._log("Browser closed")
async def __aenter__(self):
await self.start()
return self
async def __aexit__(self, *args):
await self.close()
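`AIBrowserV2.execute` asks the model for a single line in a small command grammar (`CLICK [idx]`, `TYPE [idx] "text"`, `SCROLL down/up`, `WAIT`, `DONE "result"`) and parses that line with regular expressions. Below is a stand-alone sketch of such a parser that can be unit-tested without Playwright. `Action` and `parse_action` are hypothetical names. The sketch assumes the model puts the command on the first line after any `<think>` block. The quoted-text patterns use a backreference, so the text may contain the other quote character, for example `"it's"`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns for the commands that carry arguments (anchored at the start of the line)
CLICK_RE = re.compile(r"^CLICK\s*\[?(\d+)\]?")
# Group 2 captures the opening quote; \2 requires the same quote to close the text
TYPE_RE = re.compile(r"^TYPE\s*\[?(\d+)\]?\s*([\"'])(.*)\2")
DONE_RE = re.compile(r"^DONE\s*([\"'])(.*)\1")


@dataclass
class Action:
    kind: str                    # "CLICK", "TYPE", "SCROLL", "WAIT" or "DONE"
    index: Optional[int] = None  # element index for CLICK / TYPE
    text: Optional[str] = None   # typed text, scroll direction or DONE result


def parse_action(response: str) -> Optional[Action]:
    """Parse the first command line of a model reply; return None if unrecognised."""
    # Drop <think>...</think> reasoning blocks, then keep only the first line
    cleaned = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    line = cleaned.split("\n")[0].strip()
    if m := CLICK_RE.match(line):
        return Action("CLICK", index=int(m.group(1)))
    if m := TYPE_RE.match(line):
        return Action("TYPE", index=int(m.group(1)), text=m.group(3))
    if line.startswith("SCROLL"):
        return Action("SCROLL", text="down" if "down" in line.lower() else "up")
    if line.startswith("WAIT"):
        return Action("WAIT")
    if line.startswith("DONE"):
        m = DONE_RE.match(line)
        return Action("DONE", text=m.group(2) if m else "Task completed")
    return None


print(parse_action('<think>find the box</think>\nTYPE [3] "it\'s here"'))
# Action(kind='TYPE', index=3, text="it's here")
```

Returning `None` for unrecognised replies leaves the caller free to re-prompt the model. The loop above does something different: it records the line as a step and moves on.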

cfspider/human_browser.py Normal file

@@ -0,0 +1,803 @@
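"""
CFspider Human Browser - a browser that simulates real human behavior
Controls a real Chrome browser through the Chrome DevTools Protocol (CDP),
simulating human interaction to get past automation detection.
Core features:
- Bézier-curve mouse movement (realistic mouse trajectories)
- Randomized typing delays (simulating human typing speed)
- Natural scrolling (random pauses and speed changes)
- Randomized click offsets (not always clicking the exact center)
- Page dwell time (simulating reading)
Usage:
>>> import cfspider
>>>
>>> # Basic usage (start() launches Chrome and connects; call it before goto())
>>> browser = cfspider.HumanBrowser()
>>> await browser.start()
>>> await browser.goto("https://example.com")
>>> await browser.human_click("#button")
>>> await browser.human_type("#input", "hello")
>>> await browser.close()
>>>
>>> # With a CF Workers proxy
>>> workers = cfspider.make_workers(api_token="...", account_id="...")
>>> browser = cfspider.HumanBrowser(cf_proxies=workers)
Dependencies:
pip install aiohttp websockets
Chrome DevTools MCP configuration:
{
"mcpServers": {
"chrome-devtools": {
"command": "npx",
"args": ["chrome-devtools-mcp@latest", "--headless=false"]
}
}
}
"""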
import asyncio
import random
import math
import time
import json
import subprocess
import platform
import os
from typing import Optional, List, Tuple, Dict, Any, Union
from pathlib import Path
# Bézier curve helpers
def _bezier_curve(points: List[Tuple[float, float]], t: float) -> Tuple[float, float]:
"""Evaluate the Bézier curve defined by the control points at parameter t in [0, 1]"""
n = len(points) - 1
x, y = 0.0, 0.0
for i, (px, py) in enumerate(points):
# Bernstein basis polynomial: C(n, i) * t^i * (1 - t)^(n - i)
coef = math.comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
x += coef * px
y += coef * py
return x, y
def _generate_bezier_path(
start: Tuple[float, float],
end: Tuple[float, float],
num_points: int = 50,
randomness: float = 0.3
) -> List[Tuple[float, float]]:
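"""
Generate a Bézier-curve path from start to end
Args:
start: Start coordinates (x, y)
end: End coordinates (x, y)
num_points: Number of points on the path
randomness: Degree of randomness (0-1); scales the control-point offsets relative to the distance
Returns:
List of path points
"""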
sx, sy = start
ex, ey = end
# Straight-line distance
distance = math.sqrt((ex - sx) ** 2 + (ey - sy) ** 2)
# Generate 2-4 intermediate control points
num_controls = random.randint(2, 4)
control_points = [start]
for i in range(num_controls):
# Place control points evenly along the start-end segment
t = (i + 1) / (num_controls + 1)
base_x = sx + t * (ex - sx)
base_y = sy + t * (ey - sy)
# Displace each one by a random offset in a random direction
offset = distance * randomness * random.uniform(-1, 1)
angle = random.uniform(0, 2 * math.pi)
ctrl_x = base_x + offset * math.cos(angle)
ctrl_y = base_y + offset * math.sin(angle)
control_points.append((ctrl_x, ctrl_y))
control_points.append(end)
# Sample the path points
path = []
for i in range(num_points):
t = i / (num_points - 1)
# Cosine easing: slow at the start and end, fast in the middle
t_adjusted = 0.5 - 0.5 * math.cos(t * math.pi)
point = _bezier_curve(control_points, t_adjusted)
path.append(point)
return path
def _random_delay(min_ms: int = 50, max_ms: int = 200) -> float:
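"""Generate a random delay (seconds)"""
# Normal (Gaussian) distribution around the midpoint, clamped to [min_ms, max_ms], to approximate human timing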
mean = (min_ms + max_ms) / 2
std = (max_ms - min_ms) / 4
delay = random.gauss(mean, std)
delay = max(min_ms, min(max_ms, delay))
return delay / 1000
def _typing_delay() -> float:
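"""Simulate a human typing delay (seconds)"""
# Most keystrokes take 50-150 ms, with occasional longer pauses.
# A single random draw keeps the bucket probabilities exact (10% / 20% / 70%)
r = random.random()
if r < 0.1: # 10%: long pause
return random.uniform(0.2, 0.5)
elif r < 0.3: # 20%: short pause
return random.uniform(0.1, 0.2)
else:
return random.uniform(0.05, 0.15)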
class HumanBrowser:
"""
Human-behavior simulation browser
Controls Chrome via CDP and simulates real human interaction.
"""
def __init__(
self,
cf_proxies: Optional[str] = None,
uuid: Optional[str] = None,
headless: bool = False,
chrome_path: Optional[str] = None,
remote_debugging_port: int = 9222,
user_data_dir: Optional[str] = None,
auto_start_chrome: bool = True,
human_like: bool = True,
viewport: Tuple[int, int] = (1920, 1080)
):
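"""
Initialize the human-behavior simulation browser
Args:
cf_proxies: CFspider Workers URL or a WorkersManager object
uuid: VLESS UUID (required when using the VLESS proxy)
headless: Whether to run headless (False is recommended for more realistic behavior)
chrome_path: Path to the Chrome executable (auto-detected if omitted)
remote_debugging_port: CDP remote debugging port
user_data_dir: User data directory (a temporary directory is used if omitted)
auto_start_chrome: Whether to launch Chrome automatically
human_like: Whether to enable human-behavior simulation
viewport: Viewport size (width, height)
"""
self.cf_proxies = cf_proxies
self.uuid = uuid
self.headless = headless
# Only auto-detect Chrome when this class is going to launch it
self.chrome_path = chrome_path or (self._find_chrome() if auto_start_chrome else None)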
self.remote_debugging_port = remote_debugging_port
self.user_data_dir = user_data_dir
self.auto_start_chrome = auto_start_chrome
self.human_like = human_like
self.viewport = viewport
self._chrome_process = None
self._ws_url = None
self._session = None
self._page_id = None
self._mouse_position = (0, 0)
self._connected = False
# Try to import websockets now; _connect() retries and raises if it is missing
try:
import websockets
self._websockets = websockets
except ImportError:
self._websockets = None
def _find_chrome(self) -> str:
"""Locate the Chrome executable"""
system = platform.system()
if system == "Windows":
paths = [
os.path.expandvars(r"%ProgramFiles%\Google\Chrome\Application\chrome.exe"),
os.path.expandvars(r"%ProgramFiles(x86)%\Google\Chrome\Application\chrome.exe"),
os.path.expandvars(r"%LocalAppData%\Google\Chrome\Application\chrome.exe"),
]
elif system == "Darwin": # macOS
paths = [
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary",
]
else: # Linux
paths = [
"/usr/bin/google-chrome",
"/usr/bin/google-chrome-stable",
"/usr/bin/chromium-browser",
"/usr/bin/chromium",
]
for path in paths:
if os.path.exists(path):
return path
# Fall back to searching PATH
import shutil
for name in ["google-chrome", "google-chrome-stable", "chromium", "chrome"]:
path = shutil.which(name)
if path:
return path
raise FileNotFoundError("Chrome not found; please specify chrome_path manually")
async def start(self):
"""Start the browser and connect to it"""
if self.auto_start_chrome:
await self._start_chrome()
await self._connect()
await self._setup_page()
async def _start_chrome(self):
"""Launch the Chrome process"""
args = [
self.chrome_path,
f"--remote-debugging-port={self.remote_debugging_port}",
]
if self.headless:
args.append("--headless=new")
if self.user_data_dir:
args.append(f"--user-data-dir={self.user_data_dir}")
else:
# Use a temporary directory
import tempfile
temp_dir = tempfile.mkdtemp(prefix="cfspider_chrome_")
args.append(f"--user-data-dir={temp_dir}")
# Suppress automation-detection signals
args.extend([
"--disable-blink-features=AutomationControlled",
"--disable-infobars",
"--no-first-run",
"--no-default-browser-check",
f"--window-size={self.viewport[0]},{self.viewport[1]}",
])
# Route traffic through the proxy if one is configured
if self.cf_proxies:
proxy_url = await self._setup_proxy()
if proxy_url:
args.append(f"--proxy-server={proxy_url}")
self._chrome_process = subprocess.Popen(
args,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL
)
# Give Chrome time to start
await asyncio.sleep(2)
async def _setup_proxy(self) -> Optional[str]:
"""Set up the proxy and return its local URL (or None)"""
if not self.cf_proxies:
return None
# If it is a WorkersManager object
if hasattr(self.cf_proxies, 'url'):
workers_url = self.cf_proxies.url
if not self.uuid and hasattr(self.cf_proxies, 'uuid'):
self.uuid = self.cf_proxies.uuid
else:
workers_url = self.cf_proxies
# Start a local VLESS proxy
try:
from .vless_client import LocalVlessProxy
proxy = LocalVlessProxy(
workers_url=workers_url,
uuid=self.uuid
)
await proxy.start()
return f"socks5://127.0.0.1:{proxy.local_port}"
except Exception as e:
print(f"[HumanBrowser] Proxy setup failed: {e}")
return None
async def _connect(self):
"""Connect to Chrome DevTools"""
import aiohttp
# Make sure websockets is installed
if not self._websockets:
try:
import websockets
self._websockets = websockets
except ImportError:
raise ImportError("Please install websockets: pip install websockets")
# Fetch the browser-level WebSocket URL
url = f"http://127.0.0.1:{self.remote_debugging_port}/json/version"
for _ in range(10): # retry up to 10 times while Chrome starts
try:
async with aiohttp.ClientSession() as session:
async with session.get(url) as resp:
if resp.status == 200:
data = await resp.json()
self._ws_url = data.get("webSocketDebuggerUrl")
break
except aiohttp.ClientError:
# Chrome is not accepting connections yet
await asyncio.sleep(0.5)
if not self._ws_url:
raise ConnectionError(f"Unable to connect to Chrome DevTools (port {self.remote_debugging_port})")
# Connect over WebSocket
self._session = await self._websockets.connect(self._ws_url)
self._connected = True
async def _send_command(self, method: str, params: Dict = None) -> Dict:
"""Send a CDP command and wait for the response with the matching id"""
if not self._connected:
raise ConnectionError("Not connected to the browser")
msg_id = random.randint(1, 1000000)
message = {
"id": msg_id,
"method": method,
"params": params or {}
}
await self._session.send(json.dumps(message))
# Wait for the response; events and other messages without this id are discarded
while True:
response = await self._session.recv()
data = json.loads(response)
if data.get("id") == msg_id:
if "error" in data:
raise Exception(f"CDP error: {data['error']}")
return data.get("result", {})
async def _setup_page(self):
"""Attach to a page target and prepare it (viewport, stealth script)"""
# Fetch the list of DevTools targets
import aiohttp
url = f"http://127.0.0.1:{self.remote_debugging_port}/json"
async with aiohttp.ClientSession() as session:
async with session.get(url) as resp:
targets = await resp.json()
# /json also lists non-page targets (e.g. service workers); keep real pages only
pages = [t for t in targets if t.get("type") == "page"]
if pages:
self._page_id = pages[0].get("id")
page_ws_url = pages[0].get("webSocketDebuggerUrl")
# Reconnect to the page target
if self._session:
await self._session.close()
if page_ws_url and self._websockets:
self._session = await self._websockets.connect(page_ws_url)
self._connected = True
# Set the viewport
await self._send_command("Emulation.setDeviceMetricsOverride", {
"width": self.viewport[0],
"height": self.viewport[1],
"deviceScaleFactor": 1,
"mobile": False
})
# Hide automation fingerprints
await self._send_command("Page.addScriptToEvaluateOnNewDocument", {
"source": """
// Hide the webdriver flag
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
});
// Hide Chrome automation: report a non-empty plugin list
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5]
});
// Spoof navigator.languages
Object.defineProperty(navigator, 'languages', {
get: () => ['zh-CN', 'zh', 'en-US', 'en']
});
// Fix Chrome detection: make sure window.chrome exists
window.chrome = {
runtime: {}
};
// Fix permission detection for notifications; call the original with the
// Permissions object as `this`, since a detached call throws "Illegal invocation"
const originalQuery = window.navigator.permissions.query;
window.navigator.permissions.query = (parameters) => (
parameters.name === 'notifications' ?
Promise.resolve({ state: Notification.permission }) :
originalQuery.call(window.navigator.permissions, parameters)
);
"""
})
async def goto(self, url: str, wait_until: str = "load") -> str:
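"""
Navigate to a URL
Args:
url: Target URL
wait_until: Wait condition ("load", "domcontentloaded", "networkidle"); currently
not used, as the method simply waits a fixed 2 seconds after navigating
Returns:
Page HTML
"""
await self._send_command("Page.enable")
await self._send_command("Page.navigate", {"url": url})
# Wait for the page to load
await asyncio.sleep(2)
# Simulate reading if human-like behavior is enabled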
if self.human_like:
await self._simulate_reading()
return await self.html()
async def html(self) -> str:
"""Get the page HTML"""
result = await self._send_command("Runtime.evaluate", {
"expression": "document.documentElement.outerHTML"
})
return result.get("result", {}).get("value", "")
async def _get_element_center(self, selector: str) -> Tuple[float, float]:
"""Get the center coordinates of the element matching selector"""
result = await self._send_command("Runtime.evaluate", {
"expression": f"""
(function() {{
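// json.dumps produces a JS string literal, so quotes in the selector cannot break the expression
const el = document.querySelector({json.dumps(selector)});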
if (!el) return null;
const rect = el.getBoundingClientRect();
return {{
x: rect.left + rect.width / 2,
y: rect.top + rect.height / 2,
width: rect.width,
height: rect.height
}};
}})()
""",
"returnByValue": True
})
value = result.get("result", {}).get("value")
if not value:
raise ValueError(f"Element not found: {selector}")
return value["x"], value["y"]
async def human_move_to(self, x: float, y: float):
"""
Move the mouse like a human (along a Bézier curve)
Args:
x: Target x coordinate
y: Target y coordinate
"""
if not self.human_like:
self._mouse_position = (x, y)
await self._send_command("Input.dispatchMouseEvent", {
"type": "mouseMoved",
"x": x,
"y": y
})
return
# Generate the Bézier path
path = _generate_bezier_path(
self._mouse_position,
(x, y),
num_points=random.randint(30, 60),
randomness=random.uniform(0.2, 0.4)
)
# Move along the path
for px, py in path:
await self._send_command("Input.dispatchMouseEvent", {
"type": "mouseMoved",
"x": int(px),
"y": int(py)
})
# Random delay between moves
await asyncio.sleep(random.uniform(0.005, 0.02))
self._mouse_position = (x, y)
async def human_click(self, selector: str, button: str = "left"):
"""
Click like a human
Args:
selector: CSS selector
button: Mouse button ("left", "right", "middle")
"""
# Get the element position
center_x, center_y = await self._get_element_center(selector)
# Add a random offset (don't always hit the exact center)
if self.human_like:
offset_x = random.uniform(-10, 10)
offset_y = random.uniform(-5, 5)
target_x = center_x + offset_x
target_y = center_y + offset_y
else:
target_x, target_y = center_x, center_y
# Move the mouse
await self.human_move_to(target_x, target_y)
# Brief pause before clicking
if self.human_like:
await asyncio.sleep(random.uniform(0.05, 0.15))
# Mouse down
await self._send_command("Input.dispatchMouseEvent", {
"type": "mousePressed",
"x": int(target_x),
"y": int(target_y),
"button": button,
"clickCount": 1
})
# Hold duration
await asyncio.sleep(random.uniform(0.05, 0.15))
# Mouse up
await self._send_command("Input.dispatchMouseEvent", {
"type": "mouseReleased",
"x": int(target_x),
"y": int(target_y),
"button": button,
"clickCount": 1
})
# Brief wait after clicking
if self.human_like:
await asyncio.sleep(random.uniform(0.1, 0.3))
async def human_type(self, selector: str, text: str, clear: bool = True):
"""
Type like a human
Args:
selector: CSS selector
text: Text to type
clear: Whether to clear the input field first
"""
# Click the input field first
await self.human_click(selector)
# Clear the existing content
if clear:
# Select the current value through the DOM (works for <input>/<textarea> on any OS,
# whereas a synthetic Ctrl+A is not select-all on macOS), then delete it
await self.evaluate(f"document.querySelector({json.dumps(selector)})?.select?.()")
await asyncio.sleep(0.1)
# Backspace is sent the way Puppeteer does (rawKeyDown with windowsVirtualKeyCode 8);
# without the key code Chrome may not perform the deletion
await self._send_command("Input.dispatchKeyEvent", {
"type": "rawKeyDown",
"key": "Backspace",
"code": "Backspace",
"windowsVirtualKeyCode": 8
})
await self._send_command("Input.dispatchKeyEvent", {
"type": "keyUp",
"key": "Backspace",
"code": "Backspace",
"windowsVirtualKeyCode": 8
})
await asyncio.sleep(0.1)
# Type character by character
for char in text:
# Occasionally mistype and correct it (more realistic)
if self.human_like and random.random() < 0.03:
wrong_char = random.choice('abcdefghijklmnopqrstuvwxyz')
await self._send_command("Input.insertText", {"text": wrong_char})
await asyncio.sleep(_typing_delay())
await self._send_command("Input.dispatchKeyEvent", {
"type": "rawKeyDown",
"key": "Backspace",
"code": "Backspace",
"windowsVirtualKeyCode": 8
})
await self._send_command("Input.dispatchKeyEvent", {
"type": "keyUp",
"key": "Backspace",
"code": "Backspace",
"windowsVirtualKeyCode": 8
})
await asyncio.sleep(_typing_delay())
# Insert the intended character
await self._send_command("Input.insertText", {"text": char})
# Typing delay
if self.human_like:
await asyncio.sleep(_typing_delay())
async def human_scroll(self, direction: str = "down", distance: int = None):
"""
Scroll like a human
Args:
direction: Scroll direction ("up", "down")
distance: Scroll distance in pixels (random if None)
"""
if distance is None:
distance = random.randint(200, 600)
if direction == "up":
distance = -distance
# Scroll in several small wheel steps
num_steps = random.randint(5, 15)
step_distance = distance / num_steps
for _ in range(num_steps):
await self._send_command("Input.dispatchMouseEvent", {
"type": "mouseWheel",
"x": self._mouse_position[0],
"y": self._mouse_position[1],
"deltaX": 0,
"deltaY": step_distance
})
# Random delay between steps
if self.human_like:
await asyncio.sleep(random.uniform(0.02, 0.08))
# Pause after scrolling
if self.human_like:
await asyncio.sleep(random.uniform(0.3, 1.0))
async def _simulate_reading(self):
"""Simulate reading behavior"""
# Move the mouse to a few random points
for _ in range(random.randint(2, 5)):
x = random.randint(100, self.viewport[0] - 100)
y = random.randint(100, self.viewport[1] - 100)
await self.human_move_to(x, y)
await asyncio.sleep(random.uniform(0.5, 2.0))
# Scroll down with 70% probability
if random.random() < 0.7:
await self.human_scroll("down")
async def wait_for_selector(self, selector: str, timeout: int = 30):
"""Wait for an element to appear (timeout in seconds)"""
start = time.time()
while time.time() - start < timeout:
result = await self._send_command("Runtime.evaluate", {
# json.dumps produces a JS string literal, so quotes in the selector are safe
"expression": f"document.querySelector({json.dumps(selector)}) !== null"
})
if result.get("result", {}).get("value"):
return True
await asyncio.sleep(0.5)
raise TimeoutError(f"Timed out waiting for element: {selector}")
async def screenshot(self, path: str = None) -> bytes:
"""Take a PNG screenshot; optionally save it to path"""
result = await self._send_command("Page.captureScreenshot", {
"format": "png"
})
import base64
data = base64.b64decode(result.get("data", ""))
if path:
with open(path, "wb") as f:
f.write(data)
return data
async def evaluate(self, expression: str) -> Any:
"""Evaluate a JavaScript expression and return its value"""
result = await self._send_command("Runtime.evaluate", {
"expression": expression,
"returnByValue": True
})
return result.get("result", {}).get("value")
async def close(self):
"""Close the browser"""
if self._session:
await self._session.close()
if self._chrome_process:
self._chrome_process.terminate()
self._chrome_process.wait()
async def __aenter__(self):
await self.start()
return self
async def __aexit__(self, exc_type, exc_val, exc_tb):
await self.close()
# Synchronous wrapper
class HumanBrowserSync:
"""
Synchronous human-behavior simulation browser
Usage:
>>> browser = cfspider.HumanBrowserSync()
>>> browser.start()
>>> browser.goto("https://example.com")
>>> browser.human_click("#button")
>>> browser.close()
"""
def __init__(self, *args, **kwargs):
self._browser = HumanBrowser(*args, **kwargs)
self._loop = None
def _get_loop(self):
if self._loop is None:
try:
self._loop = asyncio.get_event_loop()
except RuntimeError:
self._loop = asyncio.new_event_loop()
asyncio.set_event_loop(self._loop)
return self._loop
def _run(self, coro):
return self._get_loop().run_until_complete(coro)
def start(self):
return self._run(self._browser.start())
def goto(self, url: str, wait_until: str = "load") -> str:
return self._run(self._browser.goto(url, wait_until))
def html(self) -> str:
return self._run(self._browser.html())
def human_click(self, selector: str, button: str = "left"):
return self._run(self._browser.human_click(selector, button))
def human_type(self, selector: str, text: str, clear: bool = True):
return self._run(self._browser.human_type(selector, text, clear))
def human_scroll(self, direction: str = "down", distance: int = None):
return self._run(self._browser.human_scroll(direction, distance))
def human_move_to(self, x: float, y: float):
return self._run(self._browser.human_move_to(x, y))
def wait_for_selector(self, selector: str, timeout: int = 30):
return self._run(self._browser.wait_for_selector(selector, timeout))
def screenshot(self, path: str = None) -> bytes:
return self._run(self._browser.screenshot(path))
def evaluate(self, expression: str) -> Any:
return self._run(self._browser.evaluate(expression))
def close(self):
return self._run(self._browser.close())
def __enter__(self):
self.start()
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.close()
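`_generate_bezier_path` builds its curve in three steps. It scatters 2 to 4 inner control points around the straight start-end segment, evaluates the curve in Bernstein form, and remaps the parameter with cosine easing so the cursor speeds up and then slows down. Below is a dependency-free sketch of the same idea that accepts an injectable `random.Random`, which makes paths reproducible in tests. `bezier_point` and `mouse_path` are illustrative names, not part of cfspider. At t = 0 every Bernstein basis polynomial except the first vanishes, and at t = 1 all but the last vanish. The path therefore begins and ends exactly on the requested points, wherever the random control points land.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def bezier_point(control: List[Point], t: float) -> Point:
    """Evaluate a Bezier curve at t in [0, 1] using the Bernstein form."""
    n = len(control) - 1
    x = y = 0.0
    for i, (px, py) in enumerate(control):
        weight = math.comb(n, i) * t**i * (1 - t) ** (n - i)
        x += weight * px
        y += weight * py
    return x, y


def mouse_path(
    start: Point,
    end: Point,
    num_points: int = 40,
    randomness: float = 0.3,
    rng: Optional[random.Random] = None,
) -> List[Point]:
    """Sample an eased Bezier path from start to end (num_points >= 2)."""
    rng = rng or random.Random()
    distance = math.dist(start, end)
    k = rng.randint(2, 4)  # number of inner control points
    control = [start]
    for i in range(1, k + 1):
        t = i / (k + 1)
        # Point on the straight segment, pushed away in a random direction
        base_x = start[0] + t * (end[0] - start[0])
        base_y = start[1] + t * (end[1] - start[1])
        offset = distance * randomness * rng.uniform(-1, 1)
        angle = rng.uniform(0, 2 * math.pi)
        control.append((base_x + offset * math.cos(angle), base_y + offset * math.sin(angle)))
    control.append(end)
    # Cosine easing: parameter steps are small near both ends, large in the middle
    return [
        bezier_point(control, 0.5 - 0.5 * math.cos(math.pi * i / (num_points - 1)))
        for i in range(num_points)
    ]


path = mouse_path((0, 0), (300, 120), rng=random.Random(42))
print(len(path), path[0], path[-1])  # 40 (0.0, 0.0) (300.0, 120.0)
```

Passing a seeded `random.Random` fixes the shape of each path for a given seed. In production the default unseeded generator keeps every trajectory different.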


@@ -33,6 +33,9 @@ dependencies = [
"openpyxl>=3.0.0",
# Progress bar display
"tqdm>=4.60.0",
# CDP connection (human-behavior simulation)
"aiohttp>=3.8.0",
"websockets>=10.0",
]
[project.optional-dependencies]
@@ -53,6 +56,11 @@ extract = [
"openpyxl>=3.0.0",
"tqdm>=4.60.0",
]
# Human-behavior simulation browser
human = [
"aiohttp>=3.8.0",
"websockets>=10.0",
]
# All optional features
all = [
"playwright>=1.40.0",
@@ -60,6 +68,8 @@ all = [
"jsonpath-ng>=1.5.0",
"openpyxl>=3.0.0",
"tqdm>=4.60.0",
"aiohttp>=3.8.0",
"websockets>=10.0",
]
[project.scripts]