v1.8.2: Simplified API, UUID security enhancement, auto new IP

- Simplified API: only the Workers URL is needed; the UUID config is fetched automatically
- UUID security: custom UUIDs are no longer exposed via /api/config and must be passed manually via the uuid parameter
- Default UUID warning: a security warning is shown when the default UUID is in use
- Auto new IP: each request automatically gets a new exit IP
- Dynamic code examples: the Workers UI shows the correct Python code for the current UUID config
- Updated the README with a UUID configuration guide
Author: test01
Date: 2026-01-21 22:04:07 +08:00
Parent: 8565e30475
Commit: 696ddf43ea
10 changed files with 3132 additions and 2034 deletions

.gitignore (vendored)

@@ -25,4 +25,18 @@ obfuscate_pages.py
obfuscate_config.json
# example files
examples/
# video generation scripts
create_video.py
temp_obfuscate.js
# video files (exclude the plain version, keep the highlighted/blurred one)
media/videos/1080p60/CameraFollowCursorCVScene.mp4
# allow committing the highlighted/blurred version
!media/videos/1080p60/CameraFollowCursorCV.mp4
# video output directories
media/images/
media/text/
media/videos/1080p60/partial_movie_files/

README.md

@@ -4,7 +4,7 @@
## ⚡ Core advantage: a dynamic IP pool
-> **CFspider is a dynamic IP pool**: each request may use a different Cloudflare IP, automatically selecting the best of 300+ global nodes.
+> **CFspider is a dynamic IP pool**: each request automatically gets a new exit IP, selected from 300+ global nodes. All Cloudflare fingerprints (CF-Ray, CF-Worker headers, etc.) are hidden, giving a truly anonymous proxy.
### 🎯 Advantages of a dynamic IP pool
@@ -21,10 +21,10 @@
# static IP proxy: one fixed IP, easily blocked
proxies = {"http": "1.2.3.4:8080"}  # fixed IP
-# CFspider dynamic IP pool: may differ on every request
-response = cfspider.get("https://example.com", cf_proxies="your-workers.dev")
-print(response.cf_colo)  # may show NRT, SIN, LAX, or other nodes
-# each request may use a different Cloudflare IP
+# CFspider dynamic IP pool: a new exit IP on every request
+response = cfspider.get("https://example.com", cf_proxies="https://your-workers.dev")
+print(response.json()['origin'])  # a different exit IP every time
+# CF fingerprints fully hidden; the target site cannot tell Cloudflare is involved
```
## 📸 Screenshots
@@ -196,15 +196,16 @@ Cloudflare Workers free tier allows 100,000 requests per day, no credit card, no payment required
```
**Workflow:**
-1. Your application calls `cfspider.get(url, cf_proxies="workers.dev")`
-2. CFspider sends the request to your Cloudflare Workers
+1. Your application calls `cfspider.get(url, cf_proxies="https://your-workers.dev")`
+2. CFspider connects to your Cloudflare Workers over the VLESS protocol
3. Workers routes automatically to the edge node closest to the target site (dynamic IP)
-4. Each request may use a different Cloudflare IP (chosen from 300+ nodes)
-5. The response comes back; the target site sees a Cloudflare IP, not yours
+4. Each request automatically gets a new exit IP (chosen from 300+ nodes)
+5. The response comes back; the target site sees a clean request (no CF-Ray or CF-Worker headers)

## Features

-- **Dynamic IP pool**: each request may use a different Cloudflare IP, auto-selected from 300+ global nodes
+- **Dynamic IP pool**: each request automatically gets a new exit IP, auto-selected from 300+ global nodes
+- **Fully hidden CF fingerprints**: uses the VLESS protocol, so target sites cannot detect CF-Ray, CF-Worker, or other Cloudflare headers
- Uses IPs from Cloudflare's 300+ global edge nodes
- Identical syntax to the requests library, zero learning curve
- Supports GET, POST, PUT, DELETE, and all other HTTP methods
@@ -263,39 +264,42 @@ Cloudflare Workers free tier allows 100,000 requests per day, no credit card, no payment required
If you need a custom domain, add it under Worker Settings → Triggers → Custom Domain.

-### Token authentication (optional)
+### UUID configuration (recommended)

-To improve security, you can configure token authentication for your Workers:
+To improve security, configuring a custom UUID is strongly recommended:

1. In Worker Settings → Variables and Secrets, add an environment variable
-2. Name: `TOKEN`
-3. Value: your token (multiple tokens supported, comma-separated, e.g. `token1,token2,token3`)
+2. Name: `UUID`
+3. Value: your UUID (standard UUID format, e.g. `xxxxxxxx-xxxx-4xxx-8xxx-xxxxxxxxxxxx`)
4. Save and redeploy the Worker

-With a token configured, every API request (except the home and debug pages) must supply a valid token.
+**How the UUID relates to the Python library:**
+
+| Workers configuration | Python library usage |
+|-------------|--------------|
+| No `UUID` env var (default UUID in use) | No `uuid` parameter needed: `cfspider.get(url, cf_proxies="...")` |
+| Custom `UUID` env var configured | `uuid` parameter is **required**: `cfspider.get(url, cf_proxies="...", uuid="your-UUID")` |

**Example:**
```python
import cfspider

-# pass the token with the request
+# if the Workers uses the default UUID (no env var configured)
+response = cfspider.get("https://httpbin.org/ip", cf_proxies="https://your-workers.dev")

+# if the Workers has a custom UUID env var configured
response = cfspider.get(
    "https://httpbin.org/ip",
    cf_proxies="https://your-workers.dev",
-    token="your-token"  # passed as a query parameter
+    uuid="xxxxxxxx-xxxx-4xxx-8xxx-xxxxxxxxxxxx"  # must match the configured UUID
)

-# or set the token on a Session
-with cfspider.Session(
-    cf_proxies="https://your-workers.dev",
-    token="your-token"
-) as session:
-    response = session.get("https://httpbin.org/ip")
```

**Notes:**
-- Without a `TOKEN` env var, all requests are accepted (no authentication)
-- The token can be passed via the `?token=xxx` query parameter or the `Authorization: Bearer xxx` header
-- Multiple comma-separated tokens are supported
+- Without a `UUID` env var, the Workers falls back to the default UUID and the UI shows a security warning
+- A custom UUID is strongly recommended in production
+- Once a custom UUID is configured, the Python library must send the same UUID or the connection fails
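If you need a value for the `UUID` environment variable, Python's standard library can mint one. This is a generic sketch, not part of cfspider:

```python
import re
import uuid

# generate a random version-4 UUID, the format the Workers variable expects
new_uuid = str(uuid.uuid4())
print(new_uuid)

# sanity-check it against the standard UUIDv4 shape
UUID4_RE = re.compile(
    r'^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$'
)
assert UUID4_RE.match(new_uuid)
```

Paste the printed value into the `UUID` variable in the Workers dashboard, then pass the same string as the `uuid` parameter in Python.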
## Installation

@@ -360,11 +364,13 @@ cfspider install

```python
import cfspider

-cf_proxies = "https://your-workers.dev"
-response = cfspider.get("https://httpbin.org/ip", cf_proxies=cf_proxies)
-print(response.text)
-# {"origin": "2a06:98c0:3600::103, 172.71.24.151"}  # Cloudflare IP
+# only the Workers URL is needed; each request automatically gets a new IP
+for i in range(5):
+    response = cfspider.get(
+        "https://httpbin.org/ip",
+        cf_proxies="https://your-workers.dev"
+    )
+    print(response.json()['origin'])  # a different IP every time
```
### Browser mode

@@ -372,26 +378,17 @@ print(response.text)
```python
import cfspider

-# use a local HTTP proxy
-browser = cfspider.Browser(cf_proxies="127.0.0.1:9674")
+# simplified usage: just the Workers URL (UUID auto-fetched)
+browser = cfspider.Browser(cf_proxies="https://your-workers.dev")
html = browser.html("https://httpbin.org/ip")
-print(html)
+print(html)  # shows the dynamic IP
browser.close()

-# use a VLESS link (recommended, no UUID needed)
+# use a VLESS link
browser = cfspider.Browser(
    cf_proxies="vless://your-uuid@v2.example.com:443?path=/"
)
html = browser.html("https://httpbin.org/ip")
-print(html)  # returns a Cloudflare IP
browser.close()

-# edgetunnel domain + UUID (legacy)
-browser = cfspider.Browser(
-    cf_proxies="v2.example.com",
-    vless_uuid="your-vless-uuid"
-)
-html = browser.html("https://httpbin.org/ip")
-browser.close()

# no-proxy mode
@@ -763,13 +760,13 @@ with cfspider.StealthSession(
```python
import cfspider

-# stealth mode + Cloudflare IP exit
+# stealth mode + dynamic IP (a new exit IP on every request)
response = cfspider.get(
    "https://httpbin.org/headers",
    cf_proxies="https://your-workers.dev",
    stealth=True
)
-print(response.cf_colo)  # Cloudflare node code
+print(response.json())  # the full browser-style request headers

# stealth session + Workers proxy
with cfspider.StealthSession(
@@ -777,7 +774,7 @@ with cfspider.StealthSession(
    browser='chrome'
) as session:
    r1 = session.get("https://example.com")
-    r2 = session.get("https://example.com/api")
+    r2 = session.get("https://example.com/api")  # Cookie and Referer carried automatically
```
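The Sec-Fetch-Site handling mentioned above can be approximated with a small standard-library helper. This is an illustrative sketch only: `classify_fetch_site` is a made-up name, not a cfspider API, and the same-site check is a rough eTLD+1 guess rather than a full public-suffix lookup:

```python
from typing import Optional
from urllib.parse import urlparse

def classify_fetch_site(referer: Optional[str], target: str) -> str:
    """Rough Sec-Fetch-Site value: none / same-origin / same-site / cross-site."""
    if not referer:
        return "none"                      # a first navigation has no referer
    r, t = urlparse(referer), urlparse(target)
    if (r.scheme, r.netloc) == (t.scheme, t.netloc):
        return "same-origin"               # identical scheme + host + port
    # compare the last two host labels as a crude eTLD+1 approximation
    site = lambda host: ".".join(host.split(".")[-2:])
    if site(r.hostname or "") == site(t.hostname or ""):
        return "same-site"
    return "cross-site"

print(classify_fetch_site(None, "https://example.com"))                       # none
print(classify_fetch_site("https://example.com", "https://example.com/p"))    # same-origin
print(classify_fetch_site("https://a.example.com", "https://b.example.com"))  # same-site
```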
### With TLS fingerprint impersonation

@@ -1354,25 +1351,19 @@ cfspider install
```python
import cfspider

-# 1. HTTP proxy (IP:PORT format)
-browser = cfspider.Browser(cf_proxies="127.0.0.1:9674")
+# 1. CFspider Workers (recommended, UUID auto-fetched)
+browser = cfspider.Browser(cf_proxies="https://your-workers.dev")

-# 2. HTTP proxy (full form)
-browser = cfspider.Browser(cf_proxies="http://127.0.0.1:9674")
-
-# 3. SOCKS5 proxy
-browser = cfspider.Browser(cf_proxies="socks5://127.0.0.1:1080")
-
-# 4. VLESS link (recommended, no UUID needed)
+# 2. VLESS link
browser = cfspider.Browser(cf_proxies="vless://uuid@v2.example.com:443?path=/")

-# 5. edgetunnel domain + UUID (legacy)
-browser = cfspider.Browser(
-    cf_proxies="v2.example.com",
-    vless_uuid="your-vless-uuid"
-)
+# 3. HTTP proxy
+browser = cfspider.Browser(cf_proxies="http://127.0.0.1:9674")

-# 6. no proxy
+# 4. SOCKS5 proxy
browser = cfspider.Browser(cf_proxies="socks5://127.0.0.1:1080")

+# 5. no proxy
browser = cfspider.Browser()
```
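The dispatch order above (Workers URL, VLESS link, HTTP, SOCKS5, none) can be sketched as a tiny classifier. This is illustrative only; the real `Browser` constructor does far more than this, and `classify_proxy` is a hypothetical helper name:

```python
from typing import Optional

def classify_proxy(cf_proxies: Optional[str]) -> str:
    """Guess which proxy mode a cf_proxies string selects."""
    if cf_proxies is None:
        return "direct"                           # no proxy at all
    if cf_proxies.startswith("vless://"):
        return "vless"                            # full VLESS link
    if "workers.dev" in cf_proxies:
        return "workers"                          # CFspider Workers URL
    if cf_proxies.startswith(("http://", "https://", "socks5://")):
        return cf_proxies.split("://", 1)[0]      # http / https / socks5
    host, _, port = cf_proxies.partition(":")
    if port.isdigit():
        return "http"                             # bare IP:PORT defaults to HTTP
    return "unknown"

print(classify_proxy("https://your-workers.dev"))  # workers
print(classify_proxy("socks5://127.0.0.1:1080"))   # socks5
print(classify_proxy("127.0.0.1:9674"))            # http
```

Note the Workers check runs before the generic scheme check, mirroring how a `https://...workers.dev` URL is routed through the VLESS path rather than treated as a plain HTTP proxy.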


@@ -52,7 +52,7 @@ CFspider - Cloudflare proxy IP pool Python library
from .api import (
get, post, put, delete, head, options, patch, request,
clear_map_records, get_map_collector
clear_map_records, get_map_collector, stop_vless_proxies
)
from .session import Session
from .cli import install_browser
@@ -101,45 +101,50 @@ from .stealth import (
# lazy-import Browser to avoid a hard dependency on playwright
def Browser(cf_proxies=None, headless=True, timeout=30, vless_uuid=None):
def Browser(cf_proxies=None, headless=True, timeout=30, uuid=None):
"""
Create a browser instance
创建浏览器实例 / Create browser instance
封装 Playwright支持通过 Cloudflare Workers 代理浏览器流量。
Wraps Playwright with Cloudflare Workers proxy support.
Args:
cf_proxies: proxy address; supported formats:
- VLESS link: "vless://uuid@host:port?path=/xxx#name" (recommended)
- HTTP proxy: "http://ip:port" or "ip:port"
- SOCKS5 proxy: "socks5://ip:port"
- edgetunnel domain: "v2.example.com" (used together with vless_uuid)
If omitted, the local network is used directly
headless: headless mode, default True
timeout: request timeout in seconds, default 30
vless_uuid: VLESS UUID, needed only with the domain form;
not needed when using a full VLESS link
cf_proxies (str, optional): 代理地址 / Proxy address
- CFspider Workers URL推荐: "https://cfspider.violetqqcom.workers.dev"
UUID 将自动从 Workers 获取 / UUID auto-fetched from Workers
- VLESS 链接: "vless://uuid@host:port?path=/xxx#name"
- HTTP 代理: "http://ip:port""ip:port"
- SOCKS5 代理: "socks5://ip:port"
不填则直接使用本地网络 / None for direct connection
headless (bool): 是否无头模式,默认 True / Headless mode (default: True)
timeout (int): 请求超时时间(秒),默认 30 / Timeout in seconds (default: 30)
uuid (str, optional): VLESS UUID可选不填则自动获取
/ VLESS UUID (optional, auto-fetched)
Returns:
Browser: 浏览器实例
Browser: 浏览器实例 / Browser instance
Example:
>>> import cfspider
>>> # use a full VLESS link (recommended, no vless_uuid needed)
>>>
>>> # simplified usage (recommended): just the Workers URL, UUID auto-fetched
>>> browser = cfspider.Browser(
... cf_proxies="vless://uuid@v2.example.com:443?path=/"
... cf_proxies="https://cfspider.violetqqcom.workers.dev"
... )
>>> html = browser.html("https://example.com")
>>> browser.close()
>>>
>>> # domain + UUID (legacy)
>>> # manually specify the UUID
>>> browser = cfspider.Browser(
... cf_proxies="v2.example.com",
... vless_uuid="your-vless-uuid"
... cf_proxies="https://cfspider.violetqqcom.workers.dev",
... uuid="c373c80c-58e4-4e64-8db5-40096905ec58"
... )
>>>
>>> # direct use (no proxy)
>>> browser = cfspider.Browser()
"""
from .browser import Browser as _Browser
return _Browser(cf_proxies, headless, timeout, vless_uuid)
return _Browser(cf_proxies, headless, timeout, uuid)
def parse_vless_link(vless_link):
@@ -205,11 +210,11 @@ class PlaywrightNotInstalledError(CFSpiderError):
pass
__version__ = "1.8.0"
__version__ = "1.8.2"
__all__ = [
# synchronous API (requests-style)
"get", "post", "put", "delete", "head", "options", "patch", "request",
"Session", "Browser", "install_browser", "parse_vless_link",
"Session", "Browser", "install_browser", "parse_vless_link", "stop_vless_proxies",
"CFSpiderError", "BrowserNotInstalledError", "PlaywrightNotInstalledError",
# asynchronous API (httpx-style)
"aget", "apost", "aput", "adelete", "ahead", "aoptions", "apatch",

File diff suppressed because it is too large.


@@ -89,14 +89,16 @@ class PlaywrightNotInstalledError(Exception):
class Browser:
"""
CFspider Browser class
CFspider 浏览器类 / CFspider Browser class
封装 Playwright支持通过 Cloudflare Workers (edgetunnel) 代理浏览器流量
Wraps Playwright with Cloudflare Workers (edgetunnel) proxy support
Example:
>>> import cfspider
>>> # proxy through an edgetunnel Workers
>>> browser = cfspider.Browser(cf_proxies="wss://v2.kami666.xyz")
>>>
>>> # simplified usage: just the Workers URL (UUID auto-fetched)
>>> browser = cfspider.Browser(cf_proxies="https://cfspider.violetqqcom.workers.dev")
>>> html = browser.html("https://example.com")
>>> browser.close()
>>>
@@ -106,34 +108,38 @@ class Browser:
>>> browser.close()
"""
def __init__(self, cf_proxies=None, headless=True, timeout=30, vless_uuid=None):
def __init__(self, cf_proxies=None, headless=True, timeout=30, uuid=None):
"""
Initialize the browser
初始化浏览器 / Initialize browser
Args:
cf_proxies: proxy address (optional); supported formats:
- VLESS link: "vless://uuid@host:port?path=/xxx#name" (recommended)
- HTTP proxy: "http://ip:port" or "ip:port"
- SOCKS5 proxy: "socks5://ip:port"
- edgetunnel domain: "v2.example.com" (used together with vless_uuid)
If omitted, the local network is used directly
headless: headless mode, default True
timeout: request timeout in seconds, default 30
vless_uuid: VLESS UUID (optional), needed when using the domain form;
not needed when using a full VLESS link
cf_proxies (str, optional): 代理地址 / Proxy address
- CFspider Workers URL推荐: "https://cfspider.violetqqcom.workers.dev"
UUID 将自动从 Workers 获取 / UUID auto-fetched from Workers
- VLESS 链接: "vless://uuid@host:port?path=/xxx#name"
- HTTP 代理: "http://ip:port""ip:port"
- SOCKS5 代理: "socks5://ip:port"
不填则直接使用本地网络 / None for direct connection
headless (bool): 是否无头模式,默认 True / Headless mode (default: True)
timeout (int): 请求超时时间(秒),默认 30 / Timeout in seconds (default: 30)
uuid (str, optional): VLESS UUID可选不填则自动获取
/ VLESS UUID (optional, auto-fetched)
Examples:
# use a full VLESS link (recommended, no vless_uuid needed)
browser = Browser(cf_proxies="vless://uuid@v2.example.com:443?path=/")
# domain + UUID (legacy)
browser = Browser(cf_proxies="v2.example.com", vless_uuid="your-uuid")
# use an HTTP proxy
browser = Browser(cf_proxies="127.0.0.1:8080")
# use a SOCKS5 proxy
browser = Browser(cf_proxies="socks5://127.0.0.1:1080")
>>> # simplified usage (recommended)
>>> browser = Browser(cf_proxies="https://cfspider.violetqqcom.workers.dev")
>>>
>>> # manually specify the UUID
>>> browser = Browser(
... cf_proxies="https://cfspider.violetqqcom.workers.dev",
... uuid="c373c80c-58e4-4e64-8db5-40096905ec58"
... )
>>>
>>> # use a VLESS link
>>> browser = Browser(cf_proxies="vless://uuid@v2.example.com:443?path=/")
>>>
>>> # use an HTTP proxy
>>> browser = Browser(cf_proxies="127.0.0.1:8080")
"""
if not PLAYWRIGHT_AVAILABLE:
raise PlaywrightNotInstalledError(
@@ -158,20 +164,60 @@
proxy_url = f"http://127.0.0.1:{port}"
# 2. HTTP/SOCKS5 proxy format
elif cf_proxies.startswith('http://') or cf_proxies.startswith('https://') or cf_proxies.startswith('socks5://'):
proxy_url = cf_proxies
# if this is a CFspider Workers URL, try to fetch its UUID
if 'workers.dev' in cf_proxies or not uuid:
uuid = uuid or self._get_workers_uuid(cf_proxies)
if uuid:
# route traffic through the local VLESS proxy
hostname = cf_proxies.replace('https://', '').replace('http://', '').split('/')[0]
ws_url = f'wss://{hostname}/{uuid}'
self._vless_proxy = LocalVlessProxy(ws_url, uuid)
port = self._vless_proxy.start()
proxy_url = f"http://127.0.0.1:{port}"
else:
# use the HTTP proxy directly
proxy_url = cf_proxies
# 3. IP:PORT format
elif ':' in cf_proxies and cf_proxies.replace('.', '').replace(':', '').isdigit():
proxy_url = f"http://{cf_proxies}"
# 4. domain + UUID (legacy)
elif vless_uuid:
hostname = cf_proxies.replace('https://', '').replace('http://', '').replace('wss://', '').replace('ws://', '').split('/')[0]
ws_url = f'wss://{hostname}/{vless_uuid}'
self._vless_proxy = LocalVlessProxy(ws_url, vless_uuid)
port = self._vless_proxy.start()
proxy_url = f"http://127.0.0.1:{port}"
# 5. fall back to treating it as an HTTP proxy
# 4. bare domain (try to auto-fetch the UUID)
else:
proxy_url = f"http://{cf_proxies}"
hostname = cf_proxies.replace('wss://', '').replace('ws://', '').split('/')[0]
uuid = uuid or self._get_workers_uuid(f"https://{hostname}")
if uuid:
ws_url = f'wss://{hostname}/{uuid}'
self._vless_proxy = LocalVlessProxy(ws_url, uuid)
port = self._vless_proxy.start()
proxy_url = f"http://127.0.0.1:{port}"
else:
proxy_url = f"http://{cf_proxies}"
def _get_workers_uuid(self, workers_url):
"""从 Workers 获取 UUID / Get UUID from Workers"""
import requests
import re
try:
# try the /api/config endpoint first
config_url = f"{workers_url.rstrip('/')}/api/config"
resp = requests.get(config_url, timeout=10)
if resp.status_code == 200:
config = resp.json()
return config.get('uuid')
except Exception:
pass
try:
# fall back to parsing the homepage HTML
resp = requests.get(workers_url, timeout=10)
if resp.status_code == 200:
match = re.search(r'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-4[0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12})', resp.text)
if match:
return match.group(1).lower()
except Exception:
pass
return None
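The homepage fallback above hinges on a UUIDv4 regex; its behavior can be checked offline on sample HTML (a self-contained sketch, no network involved):

```python
import re

# the same UUIDv4 pattern used by the fallback parser above
UUID_RE = re.compile(
    r'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-4[0-9a-fA-F]{3}'
    r'-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12})'
)

html = '<p>Current UUID: C373C80C-58E4-4E64-8DB5-40096905EC58</p>'
match = UUID_RE.search(html)
print(match.group(1).lower())  # c373c80c-58e4-4e64-8db5-40096905ec58

# strings that do not have the version-4 marker are rejected
assert UUID_RE.search('12345678-1234-1234-1234-123456789012') is None
```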
# start Playwright
self._playwright = sync_playwright().start()


@@ -334,7 +334,8 @@ def export_sqlite(data: Union[Dict, List[Dict]],
# insert the rows
placeholders = ", ".join(["?" for _ in fieldnames])
insert_sql = f"INSERT INTO {table} ({', '.join([f'\"{n}\"' for n in fieldnames])}) VALUES ({placeholders})"
fieldnames_str = ', '.join([f'"{n}"' for n in fieldnames])
insert_sql = f"INSERT INTO {table} ({fieldnames_str}) VALUES ({placeholders})"
for row in rows:
if isinstance(row, dict):
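The two-step string build above avoids putting escaped double quotes inside an f-string expression, which is a SyntaxError on Python versions before 3.12. A self-contained check of the same construction against an in-memory SQLite database (illustrative only, not the full `export_sqlite`):

```python
import sqlite3

fieldnames = ["name", "order"]          # "order" is an SQL keyword, hence the quoting
rows = [("widget", 1), ("gadget", 2)]

placeholders = ", ".join(["?" for _ in fieldnames])
fieldnames_str = ', '.join([f'"{n}"' for n in fieldnames])
insert_sql = f"INSERT INTO items ({fieldnames_str}) VALUES ({placeholders})"

conn = sqlite3.connect(":memory:")
conn.execute(f'CREATE TABLE items ({fieldnames_str})')
conn.executemany(insert_sql, rows)

print(conn.execute('SELECT "name" FROM items ORDER BY "order"').fetchall())
# [('widget',), ('gadget',)]
```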


@@ -2,6 +2,7 @@
CFspider Session module
Provides session management: keeps proxy configuration, headers, and cookies across multiple requests.
Simplified API: only the Workers URL is needed; the UUID and config are fetched automatically.
"""
from .api import request
@@ -17,26 +18,27 @@ class Session:
Suitable for scenarios requiring login state or consecutive requests.
Attributes:
cf_proxies (str): Workers 代理地址 / Workers proxy address
cf_proxies (str): Workers 代理地址(自动获取 UUID 配置)
/ Workers proxy address (auto-fetches UUID config)
uuid (str, optional): VLESS UUID可选不填则自动获取
/ VLESS UUID (optional, auto-fetched if not provided)
headers (dict): 会话级别的默认请求头 / Session-level default headers
cookies (dict): 会话级别的 Cookie / Session-level cookies
token (str, optional): Workers API 鉴权 token / Workers API authentication token
Example:
>>> import cfspider
>>>
>>> # 创建会话 / Create session
>>> with cfspider.Session(cf_proxies="https://your-workers.dev", token="your-token") as session:
... # 设置会话级别的请求头 / Set session-level headers
... session.headers['Authorization'] = 'Bearer token'
...
... # 所有请求都会使用相同的代理和请求头
... # All requests use the same proxy and headers
... response1 = session.get("https://api.example.com/user")
... response2 = session.post("https://api.example.com/data", json={"key": "value"})
...
... # Cookie 会自动保持 / Cookies are automatically maintained
... print(session.cookies)
>>> # simplified usage: just the Workers URL (UUID auto-fetched)
>>> with cfspider.Session(cf_proxies="https://cfspider.violetqqcom.workers.dev") as session:
... response = session.get("https://api.example.com/user")
... print(f"Cookies: {session.cookies}")
>>>
>>> # manually specify the UUID
>>> with cfspider.Session(
... cf_proxies="https://cfspider.violetqqcom.workers.dev",
... uuid="c373c80c-58e4-4e64-8db5-40096905ec58"
... ) as session:
... response = session.get("https://httpbin.org/ip")
Note:
If you need stealth-mode session consistency (automatic Referer, random delays, etc.),
@@ -45,39 +47,145 @@ class Session:
please use cfspider.StealthSession.
"""
def __init__(self, cf_proxies=None, token=None):
def __init__(self, cf_proxies=None, uuid=None):
"""
初始化会话 / Initialize session
Args:
cf_proxies (str): Workers 代理地址(必填)
/ Workers proxy address (required)
例如:"https://your-workers.dev"
e.g., "https://your-workers.dev"
token (str, optional): Workers API 鉴权 token
/ Workers API authentication token
当 Workers 端配置了 TOKEN 环境变量时,必须提供有效的 token
Required when Workers has TOKEN environment variable configured
例如:"https://cfspider.violetqqcom.workers.dev"
e.g., "https://cfspider.violetqqcom.workers.dev"
UUID 将自动从 Workers 获取
UUID will be auto-fetched from Workers
uuid (str, optional): VLESS UUID可选
如果不填写,会自动从 Workers 首页获取
If not provided, will be auto-fetched from Workers homepage
Raises:
ValueError: 当 cf_proxies 为空时
/ When cf_proxies is empty
Example:
>>> session = cfspider.Session(cf_proxies="https://your-workers.dev", token="your-token")
>>> # simplified usage (recommended)
>>> session = cfspider.Session(cf_proxies="https://cfspider.violetqqcom.workers.dev")
>>>
>>> # manually specify the UUID
>>> session = cfspider.Session(
... cf_proxies="https://cfspider.violetqqcom.workers.dev",
... uuid="c373c80c-58e4-4e64-8db5-40096905ec58"
... )
"""
if not cf_proxies:
raise ValueError(
"cf_proxies 是必填参数。\n"
"请提供 CFspider Workers 地址,例如:\n"
" session = cfspider.Session(cf_proxies='https://your-workers.dev')\n\n"
" session = cfspider.Session(cf_proxies='https://cfspider.violetqqcom.workers.dev')\n\n"
"UUID 将自动从 Workers 获取,无需手动指定。\n"
"如果不需要代理,可以直接使用 cfspider.get() 等函数。\n"
"如果需要隐身模式会话,请使用 cfspider.StealthSession。"
)
self.cf_proxies = cf_proxies.rstrip("/")
self.token = token
self.cf_proxies = cf_proxies.rstrip("/") if cf_proxies else None
self.uuid = uuid
self.headers = {}
self.cookies = {}
self._base_headers = {}  # kept for API compatibility with StealthSession
@property
def _cookies(self):
"""_cookies alias for StealthSession compatibility"""
return self.cookies
@_cookies.setter
def _cookies(self, value):
"""_cookies alias for StealthSession compatibility"""
self.cookies = value
def _update_cookies(self, response):
"""
从响应中更新 cookies / Update cookies from response
Two sources are supported:
1. response.cookies (when requesting directly)
2. the Set-Cookie response headers (when proxied through Workers)
"""
# source 1: response.cookies
if hasattr(response, 'cookies'):
try:
for cookie in response.cookies:
if hasattr(cookie, 'name') and hasattr(cookie, 'value'):
self.cookies[cookie.name] = cookie.value
elif isinstance(cookie, str):
if '=' in cookie:
name, value = cookie.split('=', 1)
self.cookies[name.strip()] = value.strip()
except TypeError:
if hasattr(response.cookies, 'items'):
for name, value in response.cookies.items():
self.cookies[name] = value
# source 2: parse the Set-Cookie headers (needed when proxied through Workers)
if hasattr(response, 'headers'):
self._parse_set_cookie_headers(response.headers)
def _parse_set_cookie_headers(self, headers):
"""
Parse Set-Cookie from the response headers.
The Workers proxy passes the target site's Set-Cookie headers through
unchanged, but the requests library does not turn them into cookies,
so they are parsed manually here.
"""
# collect every Set-Cookie header
set_cookie_headers = []
# try the different header-container APIs
if hasattr(headers, 'get_all'):
# httpx style
set_cookie_headers = headers.get_all('set-cookie') or []
elif hasattr(headers, 'getlist'):
# urllib3 style
set_cookie_headers = headers.getlist('set-cookie') or []
else:
# requests style: multiple Set-Cookie headers may be merged into one
# comma-joined string (beware the commas inside Expires attributes)
cookie_header = headers.get('set-cookie', '')
if cookie_header:
# split on ", " only when it is followed by a cookie-name= pattern,
# e.g. "a=1; Path=/, b=2; Path=/"
import re
parts = re.split(r',\s*(?=[A-Za-z_][A-Za-z0-9_-]*=)', cookie_header)
set_cookie_headers = [p.strip() for p in parts if p.strip()]
# parse each Set-Cookie header
for cookie_str in set_cookie_headers:
self._parse_single_cookie(cookie_str)
def _parse_single_cookie(self, cookie_str):
"""
Parse a single Set-Cookie string. Example:
__Host-authjs.csrf-token=xxx%7Cyyy; Path=/; Secure; HttpOnly
"""
if not cookie_str:
return
# split into attribute parts
parts = cookie_str.split(';')
if not parts:
return
# the first part is name=value
first_part = parts[0].strip()
if '=' not in first_part:
return
name, value = first_part.split('=', 1)
name = name.strip()
value = value.strip()
if name:
self.cookies[name] = value
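The comma-splitting regex above can be exercised on a merged header value in isolation (a sketch of just the parsing step, outside the Session class):

```python
import re

# requests may merge multiple Set-Cookie headers into one comma-joined string
merged = "sessionid=abc123; Path=/; HttpOnly, theme=dark; Path=/"

# split on ", " only when the next chunk looks like a new cookie-name=
parts = re.split(r',\s*(?=[A-Za-z_][A-Za-z0-9_-]*=)', merged)

cookies = {}
for part in parts:
    name, _, value = part.split(';', 1)[0].partition('=')
    if name.strip():
        cookies[name.strip()] = value.strip()

print(cookies)  # {'sessionid': 'abc123', 'theme': 'dark'}
```

The lookahead keeps attribute pairs like `Path=/` attached to their cookie, since they never follow a comma.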
def request(self, method, url, **kwargs):
"""
@@ -94,6 +202,9 @@ class Session:
- data (dict/str): 表单数据 / Form data
- json (dict): JSON 数据 / JSON data
- timeout (int/float): 超时时间(秒) / Timeout (seconds)
- stealth (bool): 启用隐身模式 / Enable stealth mode
- impersonate (str): TLS 指纹模拟 / TLS fingerprint impersonation
- http2 (bool): 启用 HTTP/2 / Enable HTTP/2
- 其他参数与 requests 库兼容
- Other parameters compatible with requests library
@@ -105,22 +216,30 @@ class Session:
Session-level headers and cookies are automatically added to requests,
但请求级别的参数优先级更高。
but request-level parameters have higher priority.
响应中的 Set-Cookie 会自动保存到会话中。
Set-Cookie from response will be automatically saved to session.
"""
headers = self.headers.copy()
headers.update(self._base_headers)  # apply the base headers
headers.update(kwargs.pop("headers", {}))
cookies = self.cookies.copy()
cookies.update(kwargs.pop("cookies", {}))
return request(
response = request(
method,
url,
cf_proxies=self.cf_proxies,
token=self.token,
uuid=self.uuid,
headers=headers,
cookies=cookies,
**kwargs
)
# automatically update cookies from the response
self._update_cookies(response)
return response
def get(self, url, **kwargs):
"""


@@ -220,29 +220,38 @@ def update_sec_fetch_headers(headers: Dict, site_type: str = 'none') -> Dict:
class StealthSession:
"""
Stealth session class
隐身会话类 / Stealth Session class
提供完整的会话一致性管理,解决反爬虫检测的三大问题:
Provides complete session consistency management, solving three major anti-crawler issues:
1. 固定 User-Agent整个会话使用同一个浏览器指纹
Fixed User-Agent: Uses the same browser fingerprint throughout the session
2. 自动管理 Cookie响应中的 Cookie 自动保存并在后续请求中发送
Auto Cookie Management: Cookies from responses are saved and sent in subsequent requests
3. 自动添加 Referer页面跳转时自动添加来源信息
Auto Referer: Automatically adds origin information during page navigation
4. 随机延迟:每次请求前随机等待,模拟人类行为
Random Delay: Random wait before each request, simulating human behavior
5. 自动更新 Sec-Fetch-Site根据 Referer 判断同站/跨站访问
Auto Sec-Fetch-Site: Updates based on Referer to indicate same-site/cross-site access
Attributes:
browser (str): the browser type in use
cf_proxies (str): proxy address
delay (tuple): random delay range
auto_referer (bool): whether to auto-add the Referer
last_url (str): URL of the previous request
request_count (int): cumulative request count for the session
browser (str): 当前使用的浏览器类型 / Current browser type
cf_proxies (str): Workers 代理地址 / Workers proxy address
uuid (str): VLESS UUID可选自动获取 / VLESS UUID (optional, auto-fetched)
delay (tuple): 随机延迟范围 / Random delay range
auto_referer (bool): 是否自动添加 Referer / Whether to auto-add Referer
last_url (str): 上一次请求的 URL / Last requested URL
request_count (int): 会话累计请求次数 / Session cumulative request count
Example:
>>> import cfspider
>>>
>>> # basic usage
>>> with cfspider.StealthSession(browser='chrome') as session:
>>> # basic usage (with the Workers proxy)
>>> with cfspider.StealthSession(
... cf_proxies="https://cfspider.violetqqcom.workers.dev"
... ) as session:
... # first request: Sec-Fetch-Site: none
... r1 = session.get("https://example.com")
...
@@ -251,14 +260,17 @@ class StealthSession:
... r2 = session.get("https://example.com/page2")
>>>
>>> # with random delay
>>> with cfspider.StealthSession(delay=(1, 3)) as session:
>>> with cfspider.StealthSession(
... cf_proxies="https://cfspider.violetqqcom.workers.dev",
... delay=(1, 3)
... ) as session:
... for url in urls:
... # wait a random 1-3 seconds before each request
... response = session.get(url)
>>>
>>> # with a proxy
>>> # 完整配置
>>> with cfspider.StealthSession(
... cf_proxies="https://your-workers.dev",
... cf_proxies="https://cfspider.violetqqcom.workers.dev",
... browser='firefox',
... delay=(0.5, 2.0)
... ) as session:
@@ -268,59 +280,63 @@ class StealthSession:
Note:
StealthSession 与普通 Session 的区别:
- Session: 仅保持代理配置和基本请求头
Differences between StealthSession and regular Session:
- Session: 仅保持代理配置和基本请求头 / Only maintains proxy config and basic headers
- StealthSession: 完整的隐身模式包括浏览器指纹、Cookie 管理、
自动 Referer、随机延迟、Sec-Fetch-* 更新
Complete stealth mode including browser fingerprint, Cookie management,
auto Referer, random delay, Sec-Fetch-* updates
"""
def __init__(
self,
browser: str = 'chrome',
cf_proxies: str = None,
cf_workers: bool = True,
uuid: str = None,
delay: Tuple[float, float] = None,
auto_referer: bool = True,
token: str = None,
**kwargs
):
"""
初始化隐身会话
初始化隐身会话 / Initialize stealth session
Args:
browser (str): browser type; selects the User-Agent and header template
- 'chrome': Chrome 131 (recommended, fullest header set, 15 headers)
- 'firefox': Firefox 133 (includes the Sec-GPC privacy header, 12 headers)
- 'safari': Safari 18 (macOS style, 5 headers)
- 'edge': Edge 131 (similar to Chrome, 14 headers)
- 'chrome_mobile': Chrome Mobile (Android, 10 headers)
cf_proxies (str, optional): proxy address
- if omitted, the target URL is requested directly
- with a Workers address, pair with cf_workers=True
- with an ordinary proxy, pair with cf_workers=False
cf_workers (bool): whether to use the CFspider Workers API, default True
/ Browser type, determines User-Agent and header template
- 'chrome': Chrome 131推荐最完整的请求15 个)/ Recommended, 15 headers
- 'firefox': Firefox 133含 Sec-GPC 隐私头12 个)/ Includes privacy headers
- 'safari': Safari 18macOS 风格5 个)/ macOS style
- 'edge': Edge 131类似 Chrome14 个)/ Similar to Chrome
- 'chrome_mobile': Chrome MobileAndroid10 个)/ Android mobile
cf_proxies (str, optional): Workers 代理地址
/ Workers proxy address
- "https://cfspider.violetqqcom.workers.dev"
- 不指定则直接请求目标 URL / If not specified, requests directly
- UUID 自动从 Workers 获取 / UUID auto-fetched from Workers
uuid (str, optional): VLESS UUID可选不填则自动获取
/ VLESS UUID (optional, auto-fetched if not provided)
delay (tuple, optional): 请求间随机延迟范围(秒)
/ Random delay range between requests (seconds)
- 如 (1, 3) 表示每次请求前随机等待 1-3 秒
- 第一次请求不会延迟
- helps avoid detection from overly frequent requests
- e.g., (1, 3) means random wait 1-3 seconds before each request
- 第一次请求不会延迟 / First request won't be delayed
auto_referer (bool): 是否自动添加 Referer默认 True
- True: automatically use the previous URL as the Referer
- False: do not add it automatically (it can still be set manually)
**kwargs: 保留参数,用于未来扩展
/ Whether to auto-add Referer (default: True)
**kwargs: 保留参数,用于未来扩展 / Reserved for future extensions
Example:
>>> session = cfspider.StealthSession(
... browser='chrome',
... cf_proxies='https://your-workers.dev',
... cf_proxies='https://cfspider.violetqqcom.workers.dev',
... delay=(1, 3),
... auto_referer=True
... )
"""
self.browser = browser
self.cf_proxies = cf_proxies
self.cf_workers = cf_workers
self.uuid = uuid
self.delay = delay
self.auto_referer = auto_referer
self.token = token
self.last_url = None
self.request_count = 0
self._extra_kwargs = kwargs
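The `delay=(low, high)` behavior can be sketched in isolation (illustrative only; cfspider's actual `random_delay` helper may differ in signature and return value):

```python
import random
import time

def random_delay(low: float, high: float) -> float:
    """Sleep for a uniformly random duration in [low, high] seconds and return it."""
    wait = random.uniform(low, high)
    time.sleep(wait)
    return wait

waited = random_delay(0.01, 0.05)   # small bounds to keep the demo fast
assert 0.01 <= waited <= 0.05
```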
@@ -361,21 +377,75 @@ class StealthSession:
random_delay(self.delay[0], self.delay[1])
def _update_cookies(self, response):
"""更新 Cookie"""
"""
从响应中更新 cookies
支持两种方式:
1. 从 response.cookies 获取(直接请求时)
2. 从响应头 Set-Cookie 解析(通过 Workers 代理时)
"""
# 方式1从 response.cookies 获取
if hasattr(response, 'cookies'):
for cookie in response.cookies:
self._cookies[cookie.name] = cookie.value
try:
for cookie in response.cookies:
if hasattr(cookie, 'name') and hasattr(cookie, 'value'):
self._cookies[cookie.name] = cookie.value
except TypeError:
if hasattr(response.cookies, 'items'):
for name, value in response.cookies.items():
self._cookies[name] = value
# 方式2从响应头 Set-Cookie 解析Workers 代理时需要)
if hasattr(response, 'headers'):
self._parse_set_cookie_headers(response.headers)
def _parse_set_cookie_headers(self, headers):
"""从响应头中解析 Set-Cookie"""
set_cookie_headers = []
if hasattr(headers, 'get_all'):
set_cookie_headers = headers.get_all('set-cookie') or []
elif hasattr(headers, 'getlist'):
set_cookie_headers = headers.getlist('set-cookie') or []
else:
cookie_header = headers.get('set-cookie', '')
if cookie_header:
import re
parts = re.split(r',\s*(?=[A-Za-z_][A-Za-z0-9_-]*=)', cookie_header)
set_cookie_headers = [p.strip() for p in parts if p.strip()]
for cookie_str in set_cookie_headers:
self._parse_single_cookie(cookie_str)
def _parse_single_cookie(self, cookie_str):
"""解析单个 Set-Cookie 字符串"""
if not cookie_str:
return
parts = cookie_str.split(';')
if not parts:
return
first_part = parts[0].strip()
if '=' not in first_part:
return
name, value = first_part.split('=', 1)
name = name.strip()
value = value.strip()
if name:
self._cookies[name] = value
def get(self, url: str, **kwargs) -> Any:
"""
发送 GET 请求
发送 GET 请求 / Send GET request
Args:
url: target URL
**kwargs: other parameters
url (str): 目标 URL / Target URL
**kwargs: 其他参数 / Other parameters
- impersonate (str): TLS 指纹模拟 / TLS fingerprint impersonation
- http2 (bool): 启用 HTTP/2 / Enable HTTP/2
- 其他参数与 requests 库兼容 / Compatible with requests library
Returns:
the response object
CFSpiderResponse: 响应对象 / Response object
"""
from .api import get as _get
@@ -390,8 +460,7 @@ class StealthSession:
response = _get(
url,
cf_proxies=self.cf_proxies,
cf_workers=self.cf_workers,
token=self.token,
uuid=self.uuid,
headers=headers,
cookies=cookies,
**kwargs
@@ -404,7 +473,16 @@ class StealthSession:
return response
def post(self, url: str, **kwargs) -> Any:
"""发送 POST 请求"""
"""
发送 POST 请求 / Send POST request
Args:
url (str): 目标 URL / Target URL
**kwargs: 其他参数 / Other parameters
Returns:
CFSpiderResponse: 响应对象 / Response object
"""
from .api import post as _post
self._apply_delay()
@@ -421,8 +499,7 @@ class StealthSession:
response = _post(
url,
cf_proxies=self.cf_proxies,
cf_workers=self.cf_workers,
token=self.token,
uuid=self.uuid,
headers=headers,
cookies=cookies,
**kwargs
@@ -435,7 +512,7 @@ class StealthSession:
return response
def put(self, url: str, **kwargs) -> Any:
"""发送 PUT 请求"""
"""发送 PUT 请求 / Send PUT request"""
from .api import put as _put
self._apply_delay()
@@ -445,8 +522,7 @@ class StealthSession:
response = _put(
url,
cf_proxies=self.cf_proxies,
cf_workers=self.cf_workers,
token=self.token,
uuid=self.uuid,
headers=headers,
cookies=cookies,
**kwargs
@@ -457,7 +533,7 @@ class StealthSession:
return response
def delete(self, url: str, **kwargs) -> Any:
"""发送 DELETE 请求"""
"""发送 DELETE 请求 / Send DELETE request"""
from .api import delete as _delete
self._apply_delay()
@@ -467,8 +543,7 @@ class StealthSession:
response = _delete(
url,
cf_proxies=self.cf_proxies,
cf_workers=self.cf_workers,
token=self.token,
uuid=self.uuid,
headers=headers,
cookies=cookies,
**kwargs
@@ -479,7 +554,7 @@ class StealthSession:
return response
def head(self, url: str, **kwargs) -> Any:
"""发送 HEAD 请求"""
"""发送 HEAD 请求 / Send HEAD request"""
from .api import head as _head
self._apply_delay()
@@ -489,8 +564,7 @@ class StealthSession:
response = _head(
url,
cf_proxies=self.cf_proxies,
cf_workers=self.cf_workers,
token=self.token,
uuid=self.uuid,
headers=headers,
cookies=cookies,
**kwargs


@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "cfspider"
version = "1.8.0"
version = "1.8.2"
description = "Cloudflare Workers proxy IP pool client"
readme = "README.md"
license = {text = "Apache-2.0"}
@@ -23,6 +23,16 @@ dependencies = [
"httpx[http2]>=0.25.0",
"curl_cffi>=0.5.0",
"beautifulsoup4>=4.9.0",
# browser automation
"playwright>=1.40.0",
# XPath data extraction
"lxml>=4.9.0",
# JSONPath data extraction
"jsonpath-ng>=1.5.0",
# Excel export
"openpyxl>=3.0.0",
# progress bar display
"tqdm>=4.60.0",
]
[project.optional-dependencies]

workers.js

File diff suppressed because it is too large.