mirror of
https://github.com/violettoolssite/CFspider.git
synced 2026-04-05 03:09:01 +08:00
v1.8.2: Simplified API, UUID security enhancement, auto new IP
- Simplified API: only the Workers URL is needed; the UUID config is fetched automatically
- UUID security: a custom UUID is no longer exposed via /api/config and must be passed manually via the uuid parameter
- Default UUID warning: a security warning is shown when the default UUID is in use
- Auto new IP: each request automatically gets a new exit IP
- Dynamic code examples: the Workers UI shows the correct Python code for the current UUID config
- Updated README with a UUID configuration guide
16  .gitignore (vendored)
@@ -25,4 +25,18 @@ obfuscate_pages.py
obfuscate_config.json

# Example files
examples/

# Video generation scripts
create_video.py
temp_obfuscate.js

# Video files (exclude the plain version, keep the highlighted/blurred version)
media/videos/1080p60/CameraFollowCursorCVScene.mp4
# Allow committing the highlighted/blurred version
!media/videos/1080p60/CameraFollowCursorCV.mp4

# Video file directories
media/images/
media/text/
media/videos/1080p60/partial_movie_files/
117  README.md
@@ -4,7 +4,7 @@

## ⚡ Core advantage: dynamic IP pool

> **CFspider is a dynamic IP pool**: each request may use a different Cloudflare IP, automatically selected from 300+ global nodes.
> **CFspider is a dynamic IP pool**: each request automatically gets a new exit IP, selected from 300+ global nodes. Cloudflare fingerprints (CF-Ray, CF-Worker and similar headers) are fully hidden, for a truly anonymous proxy.

### 🎯 Why a dynamic IP pool

@@ -21,10 +21,10 @@

# Static IP proxy: fixed IP, easily blocked
proxies = {"http": "1.2.3.4:8080"}  # fixed IP

# CFspider dynamic IP pool: may differ on every request
response = cfspider.get("https://example.com", cf_proxies="your-workers.dev")
print(response.cf_colo)  # may show NRT, SIN, LAX and other nodes
# each request may use a different Cloudflare IP
# CFspider dynamic IP pool: a new IP is fetched automatically on every request
response = cfspider.get("https://example.com", cf_proxies="https://your-workers.dev")
print(response.json()['origin'])  # a different exit IP every time
# CF fingerprints are fully hidden; the target site cannot detect Cloudflare
```

## 📸 Project screenshots
@@ -196,15 +196,16 @@ The Cloudflare Workers free tier allows 100,000 requests per day, no credit card, no pay

```

**Workflow:**
1. Your app calls `cfspider.get(url, cf_proxies="workers.dev")`
2. CFspider sends the request to your Cloudflare Workers
1. Your app calls `cfspider.get(url, cf_proxies="https://your-workers.dev")`
2. CFspider connects to your Cloudflare Workers over the VLESS protocol
3. Workers automatically routes to the edge node closest to the target site (dynamic IP pool)
4. Each request may use a different Cloudflare IP (chosen from 300+ nodes)
5. The response returns; the target site sees a Cloudflare IP, not yours
4. Each request automatically gets a new exit IP (chosen from 300+ nodes)
5. The response returns; the target site sees a clean request (no CF-Ray, CF-Worker or similar headers)

## Features

- **Dynamic IP pool**: each request may use a different Cloudflare IP, auto-selected from 300+ global nodes
- **Dynamic IP pool**: each request automatically gets a new exit IP, auto-selected from 300+ global nodes
- **Fully hidden CF fingerprint**: uses the VLESS protocol, so target sites cannot detect CF-Ray, CF-Worker or other Cloudflare headers
- Uses Cloudflare's 300+ global edge node IPs
- Same syntax as the requests library, zero learning curve
- Supports GET, POST, PUT, DELETE and all other HTTP methods
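The "clean request" claim in step 5 can be checked mechanically by inspecting the headers the target site receives. A minimal stdlib-only sketch (the helper name `leaked_cf_headers` and the sample header dicts are illustrative, not part of CFspider):

```python
# Headers that would reveal a Cloudflare Worker sitting in the middle.
CF_MARKERS = ("cf-ray", "cf-worker", "cf-connecting-ip", "cf-ipcountry")

def leaked_cf_headers(headers):
    """Return the Cloudflare-specific headers present in a header dict."""
    return [k for k in headers if k.lower().startswith("cf-") or k.lower() in CF_MARKERS]

# Example: headers as a target site might see them.
clean = {"Host": "example.com", "User-Agent": "Mozilla/5.0", "Accept": "*/*"}
direct_cf = {"Host": "example.com", "CF-Ray": "8a1b2c3d4e5f-NRT", "CF-Worker": "your-workers.dev"}

print(leaked_cf_headers(clean))      # []
print(leaked_cf_headers(direct_cf))  # ['CF-Ray', 'CF-Worker']
```

In practice you would run the same check against `cfspider.get("https://httpbin.org/headers", ...).json()` to confirm nothing Cloudflare-specific reaches the target.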
@@ -263,39 +264,42 @@ The Cloudflare Workers free tier allows 100,000 requests per day, no credit card, no pay

For a custom domain, add one under Worker → Settings → Triggers → Custom Domain.

### Token authentication (optional)
### UUID configuration (recommended)

To improve security, you can configure token authentication for the Workers:
To improve security, configuring a custom UUID is strongly recommended:

1. In Worker → Settings → Variables and Secrets, add an environment variable
2. Name: `TOKEN`
3. Value: your token (multiple tokens supported, comma-separated, e.g. `token1,token2,token3`)
2. Name: `UUID`
3. Value: your UUID (standard UUID format, e.g. `xxxxxxxx-xxxx-4xxx-8xxx-xxxxxxxxxxxx`)
4. Save and redeploy the Worker

Once a token is configured, every API request (except the homepage and the debug page) must supply a valid token:
**How the UUID relates to the Python library:**

| Workers config | Python library usage |
|----------------|----------------------|
| No `UUID` environment variable (default UUID) | No `uuid` parameter needed; just call `cfspider.get(url, cf_proxies="...")` |
| Custom `UUID` environment variable set | The `uuid` parameter is **required**: `cfspider.get(url, cf_proxies="...", uuid="your-UUID")` |

**Example:**

```python
import cfspider

# pass the token with the request
# if the Workers uses the default UUID (no environment variable configured)
response = cfspider.get("https://httpbin.org/ip", cf_proxies="https://your-workers.dev")

# if the Workers has a custom UUID environment variable
response = cfspider.get(
    "https://httpbin.org/ip",
    cf_proxies="https://your-workers.dev",
    token="your-token"  # passed as a query parameter
    uuid="xxxxxxxx-xxxx-4xxx-8xxx-xxxxxxxxxxxx"  # must match the configured UUID
)

# or set the token on a Session
with cfspider.Session(
    cf_proxies="https://your-workers.dev",
    token="your-token"
) as session:
    response = session.get("https://httpbin.org/ip")
```

**Notes:**
- Without a `TOKEN` environment variable, all requests are allowed (no authentication)
- The token can be passed via the query parameter `?token=xxx` or the header `Authorization: Bearer xxx`
- Multiple comma-separated tokens are supported
- Without a `UUID` environment variable, the Workers falls back to the default UUID and the UI shows a security warning
- A custom UUID is strongly recommended in production
- Once a custom UUID is configured, the Python library must supply the same UUID, otherwise the connection fails

## Installation

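The "standard UUID format" the guide asks for is a random version-4 UUID, which Python's stdlib can generate directly. A minimal sketch for producing a value for the Workers `UUID` environment variable (the regex just mirrors the `xxxxxxxx-xxxx-4xxx-8xxx-...` shape shown above):

```python
import re
import uuid

# Generate a random version-4 UUID to paste into the Workers `UUID` variable.
custom_uuid = str(uuid.uuid4())
print(custom_uuid)

# The xxxxxxxx-xxxx-4xxx-8xxx-... shape from the guide: version nibble is 4,
# variant nibble is one of 8/9/a/b.
UUID4_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)
assert UUID4_RE.match(custom_uuid)
```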
@@ -360,11 +364,13 @@ cfspider install

```python
import cfspider

cf_proxies = "https://your-workers.dev"

response = cfspider.get("https://httpbin.org/ip", cf_proxies=cf_proxies)
print(response.text)
# {"origin": "2a06:98c0:3600::103, 172.71.24.151"}  # Cloudflare IP
# only the Workers URL is needed; a new IP is fetched automatically on every request
for i in range(5):
    response = cfspider.get(
        "https://httpbin.org/ip",
        cf_proxies="https://your-workers.dev"
    )
    print(response.json()['origin'])  # a different IP every time
```

### Browser mode
@@ -372,26 +378,17 @@ print(response.text)

```python
import cfspider

# use a local HTTP proxy
browser = cfspider.Browser(cf_proxies="127.0.0.1:9674")
# simplified usage: just the Workers URL (UUID fetched automatically)
browser = cfspider.Browser(cf_proxies="https://your-workers.dev")
html = browser.html("https://httpbin.org/ip")
print(html)
print(html)  # returns a dynamic IP
browser.close()

# use a VLESS link (recommended, no UUID needed)
# use a VLESS link
browser = cfspider.Browser(
    cf_proxies="vless://your-uuid@v2.example.com:443?path=/"
)
html = browser.html("https://httpbin.org/ip")
print(html)  # returns a Cloudflare IP
browser.close()

# use an edgetunnel domain + UUID (legacy)
browser = cfspider.Browser(
    cf_proxies="v2.example.com",
    vless_uuid="your-vless-uuid"
)
html = browser.html("https://httpbin.org/ip")
browser.close()

# no-proxy mode
@@ -763,13 +760,13 @@ with cfspider.StealthSession(

```python
import cfspider

# stealth mode + Cloudflare IP exit
# stealth mode + dynamic IP (a new IP on every request)
response = cfspider.get(
    "https://httpbin.org/headers",
    cf_proxies="https://your-workers.dev",
    stealth=True
)
print(response.cf_colo)  # Cloudflare colo code
print(response.json())  # full browser-style request headers

# stealth session + Workers proxy
with cfspider.StealthSession(

@@ -777,7 +774,7 @@ with cfspider.StealthSession(
    browser='chrome'
) as session:
    r1 = session.get("https://example.com")
    r2 = session.get("https://example.com/api")
    r2 = session.get("https://example.com/api")  # Cookie and Referer carried automatically
```

### With TLS fingerprint impersonation
@@ -1354,25 +1351,19 @@ cfspider install

```python
import cfspider

# 1. HTTP proxy (IP:PORT format)
browser = cfspider.Browser(cf_proxies="127.0.0.1:9674")
# 1. CFspider Workers (recommended, UUID fetched automatically)
browser = cfspider.Browser(cf_proxies="https://your-workers.dev")

# 2. HTTP proxy (full format)
browser = cfspider.Browser(cf_proxies="http://127.0.0.1:9674")

# 3. SOCKS5 proxy
browser = cfspider.Browser(cf_proxies="socks5://127.0.0.1:1080")

# 4. VLESS link (recommended, no UUID needed)
# 2. VLESS link
browser = cfspider.Browser(cf_proxies="vless://uuid@v2.example.com:443?path=/")

# 5. edgetunnel domain + UUID (legacy)
browser = cfspider.Browser(
    cf_proxies="v2.example.com",
    vless_uuid="your-vless-uuid"
)
# 3. HTTP proxy
browser = cfspider.Browser(cf_proxies="http://127.0.0.1:9674")

# 6. no proxy
# 4. SOCKS5 proxy
browser = cfspider.Browser(cf_proxies="socks5://127.0.0.1:1080")

# 5. no proxy
browser = cfspider.Browser()
```

@@ -52,7 +52,7 @@ CFspider - Cloudflare proxy IP pool Python library

from .api import (
    get, post, put, delete, head, options, patch, request,
    clear_map_records, get_map_collector
    clear_map_records, get_map_collector, stop_vless_proxies
)
from .session import Session
from .cli import install_browser
@@ -101,45 +101,50 @@ from .stealth import (


# Lazy-import Browser to avoid a hard dependency on playwright
def Browser(cf_proxies=None, headless=True, timeout=30, vless_uuid=None):
def Browser(cf_proxies=None, headless=True, timeout=30, uuid=None):
    """
    Create a browser instance
    创建浏览器实例 / Create browser instance

    Wraps Playwright; supports proxying browser traffic through Cloudflare Workers.
    Wraps Playwright with Cloudflare Workers proxy support.

    Args:
        cf_proxies: proxy address, in one of the following formats:
            - VLESS link: "vless://uuid@host:port?path=/xxx#name" (recommended)
            - HTTP proxy: "http://ip:port" or "ip:port"
            - SOCKS5 proxy: "socks5://ip:port"
            - edgetunnel domain: "v2.example.com" (requires vless_uuid)
            If omitted, the local network is used directly
        headless: headless mode, default True
        timeout: request timeout in seconds, default 30
        vless_uuid: VLESS UUID, only needed with the domain form;
            not needed when a full VLESS link is used
        cf_proxies (str, optional): 代理地址 / Proxy address
            - CFspider Workers URL(推荐): "https://cfspider.violetqqcom.workers.dev"
              UUID 将自动从 Workers 获取 / UUID auto-fetched from Workers
            - VLESS 链接: "vless://uuid@host:port?path=/xxx#name"
            - HTTP 代理: "http://ip:port" 或 "ip:port"
            - SOCKS5 代理: "socks5://ip:port"
            不填则直接使用本地网络 / None for direct connection
        headless (bool): 是否无头模式,默认 True / Headless mode (default: True)
        timeout (int): 请求超时时间(秒),默认 30 / Timeout in seconds (default: 30)
        uuid (str, optional): VLESS UUID(可选,不填则自动获取)
            / VLESS UUID (optional, auto-fetched)

    Returns:
        Browser: browser instance
        Browser: 浏览器实例 / Browser instance

    Example:
        >>> import cfspider
        >>> # use a full VLESS link (recommended, no vless_uuid needed)
        >>>
        >>> # simplified usage (recommended): Workers URL only, UUID fetched automatically
        >>> browser = cfspider.Browser(
        ...     cf_proxies="vless://uuid@v2.example.com:443?path=/"
        ...     cf_proxies="https://cfspider.violetqqcom.workers.dev"
        ... )
        >>> html = browser.html("https://example.com")
        >>> browser.close()
        >>>
        >>> # domain + UUID (legacy)
        >>> # specify the UUID manually
        >>> browser = cfspider.Browser(
        ...     cf_proxies="v2.example.com",
        ...     vless_uuid="your-vless-uuid"
        ...     cf_proxies="https://cfspider.violetqqcom.workers.dev",
        ...     uuid="c373c80c-58e4-4e64-8db5-40096905ec58"
        ... )
        >>>
        >>> # direct use (no proxy)
        >>> browser = cfspider.Browser()
    """
    from .browser import Browser as _Browser
    return _Browser(cf_proxies, headless, timeout, vless_uuid)
    return _Browser(cf_proxies, headless, timeout, uuid)


def parse_vless_link(vless_link):
@@ -205,11 +210,11 @@ class PlaywrightNotInstalledError(CFSpiderError):
    pass


__version__ = "1.8.0"
__version__ = "1.8.2"
__all__ = [
    # sync API (requests)
    "get", "post", "put", "delete", "head", "options", "patch", "request",
    "Session", "Browser", "install_browser", "parse_vless_link",
    "Session", "Browser", "install_browser", "parse_vless_link", "stop_vless_proxies",
    "CFSpiderError", "BrowserNotInstalledError", "PlaywrightNotInstalledError",
    # async API (httpx)
    "aget", "apost", "aput", "adelete", "ahead", "aoptions", "apatch",

936  cfspider/api.py
File diff suppressed because it is too large
@@ -89,14 +89,16 @@ class PlaywrightNotInstalledError(Exception):

class Browser:
    """
    CFspider browser class
    CFspider 浏览器类 / CFspider Browser class

    Wraps Playwright; proxies browser traffic through Cloudflare Workers (edgetunnel)
    Wraps Playwright with Cloudflare Workers (edgetunnel) proxy support

    Example:
        >>> import cfspider
        >>> # proxy through an edgetunnel Workers
        >>> browser = cfspider.Browser(cf_proxies="wss://v2.kami666.xyz")
        >>>
        >>> # simplified usage: Workers URL only (UUID fetched automatically)
        >>> browser = cfspider.Browser(cf_proxies="https://cfspider.violetqqcom.workers.dev")
        >>> html = browser.html("https://example.com")
        >>> browser.close()
        >>>
@@ -106,34 +108,38 @@ class Browser:
        >>> browser.close()
    """

    def __init__(self, cf_proxies=None, headless=True, timeout=30, vless_uuid=None):
    def __init__(self, cf_proxies=None, headless=True, timeout=30, uuid=None):
        """
        Initialize the browser
        初始化浏览器 / Initialize browser

        Args:
            cf_proxies: proxy address (optional), in one of the following formats:
                - VLESS link: "vless://uuid@host:port?path=/xxx#name" (recommended)
                - HTTP proxy: "http://ip:port" or "ip:port"
                - SOCKS5 proxy: "socks5://ip:port"
                - edgetunnel domain: "v2.example.com" (requires vless_uuid)
                If omitted, the local network is used directly
            headless: headless mode, default True
            timeout: request timeout in seconds, default 30
            vless_uuid: VLESS UUID (optional), needed with the domain form;
                not needed when a full VLESS link is used
            cf_proxies (str, optional): 代理地址 / Proxy address
                - CFspider Workers URL(推荐): "https://cfspider.violetqqcom.workers.dev"
                  UUID 将自动从 Workers 获取 / UUID auto-fetched from Workers
                - VLESS 链接: "vless://uuid@host:port?path=/xxx#name"
                - HTTP 代理: "http://ip:port" 或 "ip:port"
                - SOCKS5 代理: "socks5://ip:port"
                不填则直接使用本地网络 / None for direct connection
            headless (bool): 是否无头模式,默认 True / Headless mode (default: True)
            timeout (int): 请求超时时间(秒),默认 30 / Timeout in seconds (default: 30)
            uuid (str, optional): VLESS UUID(可选,不填则自动获取)
                / VLESS UUID (optional, auto-fetched)

        Examples:
            # full VLESS link (recommended, no vless_uuid needed)
            browser = Browser(cf_proxies="vless://uuid@v2.example.com:443?path=/")

            # domain + UUID (legacy)
            browser = Browser(cf_proxies="v2.example.com", vless_uuid="your-uuid")

            # HTTP proxy
            browser = Browser(cf_proxies="127.0.0.1:8080")

            # SOCKS5 proxy
            browser = Browser(cf_proxies="socks5://127.0.0.1:1080")
            >>> # simplified usage (recommended)
            >>> browser = Browser(cf_proxies="https://cfspider.violetqqcom.workers.dev")
            >>>
            >>> # specify the UUID manually
            >>> browser = Browser(
            ...     cf_proxies="https://cfspider.violetqqcom.workers.dev",
            ...     uuid="c373c80c-58e4-4e64-8db5-40096905ec58"
            ... )
            >>>
            >>> # VLESS link
            >>> browser = Browser(cf_proxies="vless://uuid@v2.example.com:443?path=/")
            >>>
            >>> # HTTP proxy
            >>> browser = Browser(cf_proxies="127.0.0.1:8080")
        """
        if not PLAYWRIGHT_AVAILABLE:
            raise PlaywrightNotInstalledError(
@@ -158,20 +164,60 @@ class Browser:
            proxy_url = f"http://127.0.0.1:{port}"
        # 2. HTTP/SOCKS5 proxy format
        elif cf_proxies.startswith('http://') or cf_proxies.startswith('https://') or cf_proxies.startswith('socks5://'):
            proxy_url = cf_proxies
            # if this is a CFspider Workers URL, try to fetch the UUID
            if 'workers.dev' in cf_proxies or not uuid:
                uuid = uuid or self._get_workers_uuid(cf_proxies)
            if uuid:
                # use a VLESS proxy
                hostname = cf_proxies.replace('https://', '').replace('http://', '').split('/')[0]
                ws_url = f'wss://{hostname}/{uuid}'
                self._vless_proxy = LocalVlessProxy(ws_url, uuid)
                port = self._vless_proxy.start()
                proxy_url = f"http://127.0.0.1:{port}"
            else:
                # use it directly as an HTTP proxy
                proxy_url = cf_proxies
        # 3. IP:PORT format
        elif ':' in cf_proxies and cf_proxies.replace('.', '').replace(':', '').isdigit():
            proxy_url = f"http://{cf_proxies}"
        # 4. domain + UUID (legacy)
        elif vless_uuid:
            hostname = cf_proxies.replace('https://', '').replace('http://', '').replace('wss://', '').replace('ws://', '').split('/')[0]
            ws_url = f'wss://{hostname}/{vless_uuid}'
            self._vless_proxy = LocalVlessProxy(ws_url, vless_uuid)
            port = self._vless_proxy.start()
            proxy_url = f"http://127.0.0.1:{port}"
        # 5. fall back to treating it as an HTTP proxy
        # 4. domain form (try to fetch the UUID automatically)
        else:
            proxy_url = f"http://{cf_proxies}"
            hostname = cf_proxies.replace('wss://', '').replace('ws://', '').split('/')[0]
            uuid = uuid or self._get_workers_uuid(f"https://{hostname}")
            if uuid:
                ws_url = f'wss://{hostname}/{uuid}'
                self._vless_proxy = LocalVlessProxy(ws_url, uuid)
                port = self._vless_proxy.start()
                proxy_url = f"http://127.0.0.1:{port}"
            else:
                proxy_url = f"http://{cf_proxies}"

    def _get_workers_uuid(self, workers_url):
        """Fetch the UUID from the Workers / Get UUID from Workers"""
        import requests
        import re

        try:
            # try /api/config first
            config_url = f"{workers_url.rstrip('/')}/api/config"
            resp = requests.get(config_url, timeout=10)
            if resp.status_code == 200:
                config = resp.json()
                return config.get('uuid')
        except Exception:
            pass

        try:
            # fall back to parsing the homepage HTML
            resp = requests.get(workers_url, timeout=10)
            if resp.status_code == 200:
                match = re.search(r'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-4[0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12})', resp.text)
                if match:
                    return match.group(1).lower()
        except Exception:
            pass

        return None

        # start Playwright
        self._playwright = sync_playwright().start()

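The homepage-HTML fallback in `_get_workers_uuid` relies on a regex that picks the first version-4 UUID out of arbitrary markup. That pattern can be exercised standalone (the HTML snippet below is illustrative):

```python
import re

# Same pattern as the homepage-HTML fallback in _get_workers_uuid:
# a version-4 UUID (version nibble 4, variant nibble 8/9/a/b), case-insensitive.
UUID_RE = re.compile(
    r'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-4[0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12})'
)

html = '<script>const uuid = "C373C80C-58E4-4E64-8DB5-40096905EC58";</script>'
match = UUID_RE.search(html)
print(match.group(1).lower())  # c373c80c-58e4-4e64-8db5-40096905ec58
```

The `.lower()` mirrors what the method returns, so uppercase UUIDs rendered by the Workers UI still match the value configured in the environment variable.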
@@ -334,7 +334,8 @@ def export_sqlite(data: Union[Dict, List[Dict]],

        # insert the rows
        placeholders = ", ".join(["?" for _ in fieldnames])
        insert_sql = f"INSERT INTO {table} ({', '.join([f'\"{n}\"' for n in fieldnames])}) VALUES ({placeholders})"
        fieldnames_str = ', '.join([f'"{n}"' for n in fieldnames])
        insert_sql = f"INSERT INTO {table} ({fieldnames_str}) VALUES ({placeholders})"

        for row in rows:
            if isinstance(row, dict):

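The change above hoists the quoted column list out of the f-string: Python before 3.12 rejects backslashes (such as `\"`) inside f-string expressions, so building `fieldnames_str` first sidesteps the restriction while keeping identifiers quoted. A self-contained sketch (the `items` table and column names are illustrative):

```python
import sqlite3

fieldnames = ["id", "name", "select"]  # "select" must be quoted: it is an SQL keyword
table = "items"

# Hoisting the join out of the f-string avoids the backslash-in-expression
# restriction of f-strings on Python < 3.12.
placeholders = ", ".join(["?" for _ in fieldnames])
fieldnames_str = ", ".join([f'"{n}"' for n in fieldnames])
insert_sql = f"INSERT INTO {table} ({fieldnames_str}) VALUES ({placeholders})"
print(insert_sql)  # INSERT INTO items ("id", "name", "select") VALUES (?, ?, ?)

conn = sqlite3.connect(":memory:")
conn.execute(f'CREATE TABLE {table} ("id", "name", "select")')
conn.execute(insert_sql, (1, "a", "b"))
print(conn.execute(f"SELECT * FROM {table}").fetchall())  # [(1, 'a', 'b')]
```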
@@ -2,6 +2,7 @@
CFspider Session module

Provides session management: keeps proxy config, headers and cookies across requests.
Simplified API: just provide the Workers URL; the UUID and config are fetched automatically.
"""

from .api import request
@@ -17,26 +18,27 @@ class Session:
    Suitable for scenarios requiring login state or consecutive requests.

    Attributes:
        cf_proxies (str): Workers 代理地址 / Workers proxy address
        cf_proxies (str): Workers 代理地址(自动获取 UUID 配置)
            / Workers proxy address (auto-fetches UUID config)
        uuid (str, optional): VLESS UUID(可选,不填则自动获取)
            / VLESS UUID (optional, auto-fetched if not provided)
        headers (dict): 会话级别的默认请求头 / Session-level default headers
        cookies (dict): 会话级别的 Cookie / Session-level cookies
        token (str, optional): Workers API 鉴权 token / Workers API authentication token

    Example:
        >>> import cfspider
        >>>
        >>> # create a session / Create session
        >>> with cfspider.Session(cf_proxies="https://your-workers.dev", token="your-token") as session:
        ...     # set session-level headers / Set session-level headers
        ...     session.headers['Authorization'] = 'Bearer token'
        ...
        ...     # all requests use the same proxy and headers
        ...     # All requests use the same proxy and headers
        ...     response1 = session.get("https://api.example.com/user")
        ...     response2 = session.post("https://api.example.com/data", json={"key": "value"})
        ...
        ...     # cookies are kept automatically / Cookies are automatically maintained
        ...     print(session.cookies)
        >>> # simplified usage: Workers URL only (UUID fetched automatically)
        >>> with cfspider.Session(cf_proxies="https://cfspider.violetqqcom.workers.dev") as session:
        ...     response = session.get("https://api.example.com/user")
        ...     print(f"Cookies: {session.cookies}")
        >>>
        >>> # specify the UUID manually
        >>> with cfspider.Session(
        ...     cf_proxies="https://cfspider.violetqqcom.workers.dev",
        ...     uuid="c373c80c-58e4-4e64-8db5-40096905ec58"
        ... ) as session:
        ...     response = session.get("https://httpbin.org/ip")

    Note:
        For session consistency in stealth mode (auto Referer, random delay, etc.),
@@ -45,39 +47,145 @@ class Session:
        please use cfspider.StealthSession.
    """

    def __init__(self, cf_proxies=None, token=None):
    def __init__(self, cf_proxies=None, uuid=None):
        """
        初始化会话 / Initialize session

        Args:
            cf_proxies (str): Workers 代理地址(必填)
                / Workers proxy address (required)
                例如:"https://your-workers.dev"
                e.g., "https://your-workers.dev"
            token (str, optional): Workers API 鉴权 token
                / Workers API authentication token
                当 Workers 端配置了 TOKEN 环境变量时,必须提供有效的 token
                Required when Workers has TOKEN environment variable configured
                例如:"https://cfspider.violetqqcom.workers.dev"
                e.g., "https://cfspider.violetqqcom.workers.dev"
                UUID 将自动从 Workers 获取
                UUID will be auto-fetched from Workers
            uuid (str, optional): VLESS UUID(可选)
                如果不填写,会自动从 Workers 首页获取
                If not provided, will be auto-fetched from Workers homepage

        Raises:
            ValueError: 当 cf_proxies 为空时
                / When cf_proxies is empty

        Example:
            >>> session = cfspider.Session(cf_proxies="https://your-workers.dev", token="your-token")
            >>> # simplified usage (recommended)
            >>> session = cfspider.Session(cf_proxies="https://cfspider.violetqqcom.workers.dev")
            >>>
            >>> # specify the UUID manually
            >>> session = cfspider.Session(
            ...     cf_proxies="https://cfspider.violetqqcom.workers.dev",
            ...     uuid="c373c80c-58e4-4e64-8db5-40096905ec58"
            ... )
        """
        if not cf_proxies:
            raise ValueError(
                "cf_proxies is required.\n"
                "Please provide a CFspider Workers URL, e.g.:\n"
                "  session = cfspider.Session(cf_proxies='https://your-workers.dev')\n\n"
                "  session = cfspider.Session(cf_proxies='https://cfspider.violetqqcom.workers.dev')\n\n"
                "The UUID is fetched from the Workers automatically; no need to specify it.\n"
                "If you don't need a proxy, use cfspider.get() and friends directly.\n"
                "For a stealth-mode session, use cfspider.StealthSession."
            )
        self.cf_proxies = cf_proxies.rstrip("/")
        self.token = token
        self.cf_proxies = cf_proxies.rstrip("/") if cf_proxies else None
        self.uuid = uuid
        self.headers = {}
        self.cookies = {}
        self._base_headers = {}  # compatibility with the StealthSession API

    @property
    def _cookies(self):
        """_cookies property for StealthSession compatibility"""
        return self.cookies

    @_cookies.setter
    def _cookies(self, value):
        """_cookies property for StealthSession compatibility"""
        self.cookies = value

    def _update_cookies(self, response):
        """
        Update cookies from a response / Update cookies from response

        Two sources are supported:
        1. response.cookies (direct requests)
        2. the Set-Cookie response headers (when proxied through Workers)
        """
        # source 1: response.cookies
        if hasattr(response, 'cookies'):
            try:
                for cookie in response.cookies:
                    if hasattr(cookie, 'name') and hasattr(cookie, 'value'):
                        self.cookies[cookie.name] = cookie.value
                    elif isinstance(cookie, str):
                        if '=' in cookie:
                            name, value = cookie.split('=', 1)
                            self.cookies[name.strip()] = value.strip()
            except TypeError:
                if hasattr(response.cookies, 'items'):
                    for name, value in response.cookies.items():
                        self.cookies[name] = value

        # source 2: parse the Set-Cookie headers (needed when proxied through Workers)
        if hasattr(response, 'headers'):
            self._parse_set_cookie_headers(response.headers)

    def _parse_set_cookie_headers(self, headers):
        """
        Parse Set-Cookie from the response headers

        The Workers proxy forwards the target site's Set-Cookie headers verbatim,
        but requests does not parse them into cookies automatically, so we do it by hand.
        """
        # collect all Set-Cookie headers
        set_cookie_headers = []

        # try the various header-container APIs
        if hasattr(headers, 'get_all'):
            # httpx style
            set_cookie_headers = headers.get_all('set-cookie') or []
        elif hasattr(headers, 'getlist'):
            # urllib3 style
            set_cookie_headers = headers.getlist('set-cookie') or []
        else:
            # requests style: multiple Set-Cookie values may be merged into one header
            # split on commas (carefully, since Expires dates contain commas)
            cookie_header = headers.get('set-cookie', '')
            if cookie_header:
                # split on ", " only when followed by a cookie-name-like token
                # e.g. "a=1; Path=/, b=2; Path=/"
                import re
                parts = re.split(r',\s*(?=[A-Za-z_][A-Za-z0-9_-]*=)', cookie_header)
                set_cookie_headers = [p.strip() for p in parts if p.strip()]

        # parse each Set-Cookie header
        for cookie_str in set_cookie_headers:
            self._parse_single_cookie(cookie_str)

    def _parse_single_cookie(self, cookie_str):
        """
        Parse a single Set-Cookie string

        Format example:
            __Host-authjs.csrf-token=xxx%7Cyyy; Path=/; Secure; HttpOnly
        """
        if not cookie_str:
            return

        # split into attribute parts
        parts = cookie_str.split(';')
        if not parts:
            return

        # the first part is name=value
        first_part = parts[0].strip()
        if '=' not in first_part:
            return

        name, value = first_part.split('=', 1)
        name = name.strip()
        value = value.strip()

        if name:
            self.cookies[name] = value

    def request(self, method, url, **kwargs):
        """
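The comma-splitting heuristic in `_parse_set_cookie_headers` can be exercised standalone; it splits a merged Set-Cookie header on ", " only when what follows looks like a `name=` token, so the comma inside an Expires date survives (the cookie values below are illustrative):

```python
import re

# A merged Set-Cookie header, as the requests-style branch might see it.
merged = (
    "a=1; Path=/; Expires=Wed, 21 Oct 2026 07:28:00 GMT, "
    "__Host-token=xxx%7Cyyy; Path=/; Secure; HttpOnly"
)

# Split only at ", " followed by something that looks like "name=".
parts = re.split(r',\s*(?=[A-Za-z_][A-Za-z0-9_-]*=)', merged)

cookies = {}
for part in parts:
    first = part.split(';')[0].strip()  # name=value comes before the attributes
    if '=' in first:
        name, value = first.split('=', 1)
        cookies[name.strip()] = value.strip()

print(cookies)  # {'a': '1', '__Host-token': 'xxx%7Cyyy'}
```

Note that the lookahead only covers letters, digits, `_` and `-`, so a cookie name containing a dot (like the `__Host-authjs.csrf-token` in the docstring example) is only recognized when it starts the merged header, not after a split point.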
@@ -94,6 +202,9 @@ class Session:
            - data (dict/str): 表单数据 / Form data
            - json (dict): JSON 数据 / JSON data
            - timeout (int/float): 超时时间(秒) / Timeout (seconds)
            - stealth (bool): 启用隐身模式 / Enable stealth mode
            - impersonate (str): TLS 指纹模拟 / TLS fingerprint impersonation
            - http2 (bool): 启用 HTTP/2 / Enable HTTP/2
            - 其他参数与 requests 库兼容
            - Other parameters compatible with requests library

@@ -105,22 +216,30 @@ class Session:
        Session-level headers and cookies are automatically added to requests,
        但请求级别的参数优先级更高。
        but request-level parameters have higher priority.
        响应中的 Set-Cookie 会自动保存到会话中。
        Set-Cookie from response will be automatically saved to session.
        """
        headers = self.headers.copy()
        headers.update(self._base_headers)  # apply the base headers
        headers.update(kwargs.pop("headers", {}))

        cookies = self.cookies.copy()
        cookies.update(kwargs.pop("cookies", {}))

        return request(
        response = request(
            method,
            url,
            cf_proxies=self.cf_proxies,
            token=self.token,
            uuid=self.uuid,
            headers=headers,
            cookies=cookies,
            **kwargs
        )

        # update cookies from the response automatically
        self._update_cookies(response)

        return response

    def get(self, url, **kwargs):
        """

@@ -220,29 +220,38 @@ def update_sec_fetch_headers(headers: Dict, site_type: str = 'none') -> Dict:
|
||||
|
||||
class StealthSession:
|
||||
"""
|
||||
隐身会话类
|
||||
隐身会话类 / Stealth Session class
|
||||
|
||||
提供完整的会话一致性管理,解决反爬虫检测的三大问题:
|
||||
Provides complete session consistency management, solving three major anti-crawler issues:
|
||||
|
||||
1. 固定 User-Agent:整个会话使用同一个浏览器指纹
|
||||
Fixed User-Agent: Uses the same browser fingerprint throughout the session
|
||||
2. 自动管理 Cookie:响应中的 Cookie 自动保存并在后续请求中发送
|
||||
Auto Cookie Management: Cookies from responses are saved and sent in subsequent requests
|
||||
3. 自动添加 Referer:页面跳转时自动添加来源信息
|
||||
Auto Referer: Automatically adds origin information during page navigation
|
||||
4. 随机延迟:每次请求前随机等待,模拟人类行为
|
||||
Random Delay: Random wait before each request, simulating human behavior
|
||||
5. 自动更新 Sec-Fetch-Site:根据 Referer 判断同站/跨站访问
|
||||
Auto Sec-Fetch-Site: Updates based on Referer to indicate same-site/cross-site access
|
||||
|
||||
Attributes:
|
||||
browser (str): 当前使用的浏览器类型
|
||||
cf_proxies (str): 代理地址
|
||||
delay (tuple): 随机延迟范围
|
||||
auto_referer (bool): 是否自动添加 Referer
|
||||
last_url (str): 上一次请求的 URL
|
||||
request_count (int): 会话累计请求次数
|
||||
browser (str): 当前使用的浏览器类型 / Current browser type
|
||||
cf_proxies (str): Workers 代理地址 / Workers proxy address
|
||||
uuid (str): VLESS UUID(可选,自动获取) / VLESS UUID (optional, auto-fetched)
|
||||
delay (tuple): 随机延迟范围 / Random delay range
|
||||
auto_referer (bool): 是否自动添加 Referer / Whether to auto-add Referer
|
||||
last_url (str): 上一次请求的 URL / Last requested URL
|
||||
request_count (int): 会话累计请求次数 / Session cumulative request count
|
||||
|
||||
Example:
|
||||
>>> import cfspider
|
||||
>>>
|
||||
>>> # 基本用法
|
||||
>>> with cfspider.StealthSession(browser='chrome') as session:
|
||||
>>> # 基本用法(使用 Workers 代理)
|
||||
>>> with cfspider.StealthSession(
|
||||
... cf_proxies="https://cfspider.violetqqcom.workers.dev"
|
||||
... ) as session:
|
||||
... # 第一次请求:Sec-Fetch-Site: none
|
||||
... r1 = session.get("https://example.com")
|
||||
...
|
||||
@@ -251,14 +260,17 @@ class StealthSession:
|
||||
... r2 = session.get("https://example.com/page2")
|
||||
>>>
|
||||
>>> # 带随机延迟
|
||||
>>> with cfspider.StealthSession(delay=(1, 3)) as session:
|
||||
>>> with cfspider.StealthSession(
|
||||
... cf_proxies="https://cfspider.violetqqcom.workers.dev",
|
||||
... delay=(1, 3)
|
||||
... ) as session:
|
||||
... for url in urls:
|
||||
... # 每次请求前随机等待 1-3 秒
|
||||
... response = session.get(url)
|
||||
>>>
|
||||
>>> # 结合代理使用
|
||||
>>> # 完整配置
|
||||
>>> with cfspider.StealthSession(
|
||||
... cf_proxies="https://your-workers.dev",
|
||||
... cf_proxies="https://cfspider.violetqqcom.workers.dev",
|
||||
        ...     browser='firefox',
        ...     delay=(0.5, 2.0)
        ... ) as session:
@@ -268,59 +280,63 @@ class StealthSession:

    Note:
        StealthSession 与普通 Session 的区别:
        - Session: 仅保持代理配置和基本请求头
        Differences between StealthSession and regular Session:
        - Session: 仅保持代理配置和基本请求头 / Only maintains proxy config and basic headers
        - StealthSession: 完整的隐身模式,包括浏览器指纹、Cookie 管理、
          自动 Referer、随机延迟、Sec-Fetch-* 更新
          Complete stealth mode including browser fingerprint, Cookie management,
          auto Referer, random delay, Sec-Fetch-* updates
    """

    def __init__(
        self,
        browser: str = 'chrome',
        cf_proxies: str = None,
        cf_workers: bool = True,
        uuid: str = None,
        delay: Tuple[float, float] = None,
        auto_referer: bool = True,
        token: str = None,
        **kwargs
    ):
        """
        初始化隐身会话
        初始化隐身会话 / Initialize stealth session

        Args:
            browser (str): 浏览器类型,决定使用的 User-Agent 和请求头模板
                - 'chrome': Chrome 131(推荐,最完整的请求头,15 个)
                - 'firefox': Firefox 133(含 Sec-GPC 隐私头,12 个)
                - 'safari': Safari 18(macOS 风格,5 个)
                - 'edge': Edge 131(类似 Chrome,14 个)
                - 'chrome_mobile': Chrome Mobile(Android,10 个)
            cf_proxies (str, optional): 代理地址
                - 不指定则直接请求目标 URL
                - 指定 Workers 地址时配合 cf_workers=True
                - 指定普通代理时配合 cf_workers=False
            cf_workers (bool): 是否使用 CFspider Workers API(默认 True)
                / Browser type, determines User-Agent and header template
                - 'chrome': Chrome 131(推荐,最完整的请求头,15 个)/ Recommended, 15 headers
                - 'firefox': Firefox 133(含 Sec-GPC 隐私头,12 个)/ Includes privacy headers
                - 'safari': Safari 18(macOS 风格,5 个)/ macOS style
                - 'edge': Edge 131(类似 Chrome,14 个)/ Similar to Chrome
                - 'chrome_mobile': Chrome Mobile(Android,10 个)/ Android mobile
            cf_proxies (str, optional): Workers 代理地址
                / Workers proxy address
                - 如 "https://cfspider.violetqqcom.workers.dev"
                - 不指定则直接请求目标 URL / If not specified, requests directly
                - UUID 自动从 Workers 获取 / UUID auto-fetched from Workers
            uuid (str, optional): VLESS UUID(可选,不填则自动获取)
                / VLESS UUID (optional, auto-fetched if not provided)
            delay (tuple, optional): 请求间随机延迟范围(秒)
                / Random delay range between requests (seconds)
                - 如 (1, 3) 表示每次请求前随机等待 1-3 秒
                - 第一次请求不会延迟
                - 用于避免请求频率过高被检测
                - e.g., (1, 3) means random wait 1-3 seconds before each request
                - 第一次请求不会延迟 / First request won't be delayed
            auto_referer (bool): 是否自动添加 Referer(默认 True)
                - True: 自动使用上一个 URL 作为 Referer
                - False: 不自动添加(但可以手动指定)
            **kwargs: 保留参数,用于未来扩展
                / Whether to auto-add Referer (default: True)
            **kwargs: 保留参数,用于未来扩展 / Reserved for future extensions

        Example:
            >>> session = cfspider.StealthSession(
            ...     browser='chrome',
            ...     cf_proxies='https://your-workers.dev',
            ...     cf_proxies='https://cfspider.violetqqcom.workers.dev',
            ...     delay=(1, 3),
            ...     auto_referer=True
            ... )
        """
        self.browser = browser
        self.cf_proxies = cf_proxies
        self.cf_workers = cf_workers
        self.uuid = uuid
        self.delay = delay
        self.auto_referer = auto_referer
        self.token = token
        self.last_url = None
        self.request_count = 0
        self._extra_kwargs = kwargs
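The state initialized above (delay window, auto-Referer chain, request counter) drives the per-request behavior documented in the Args section. A toy sketch of that bookkeeping — `MiniStealth` and `build_headers` are illustrative names, not the library's API:

```python
import random
import time

class MiniStealth:
    """Toy model of the stealth bookkeeping: random delay between requests
    (skipping the first) and the previous URL reused as Referer."""

    def __init__(self, delay=None, auto_referer=True):
        self.delay = delay            # e.g. (1, 3) seconds, or None
        self.auto_referer = auto_referer
        self.last_url = None          # previous URL, becomes the next Referer
        self.request_count = 0

    def build_headers(self, url):
        headers = {}
        # First request: no previous URL, so no Referer and no delay.
        if self.request_count > 0:
            if self.delay:
                time.sleep(random.uniform(*self.delay))
            if self.auto_referer and self.last_url:
                headers['Referer'] = self.last_url
        self.last_url = url
        self.request_count += 1
        return headers

s = MiniStealth(auto_referer=True)
print(s.build_headers("https://example.com/a"))  # {} - first request
print(s.build_headers("https://example.com/b"))  # {'Referer': 'https://example.com/a'}
```

This mirrors why the docstring says the first request is never delayed: the counter is still zero when the first headers are built.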
@@ -361,21 +377,75 @@ class StealthSession:
            random_delay(self.delay[0], self.delay[1])

    def _update_cookies(self, response):
        """更新 Cookie"""
        """
        从响应中更新 cookies

        支持两种方式:
        1. 从 response.cookies 获取(直接请求时)
        2. 从响应头 Set-Cookie 解析(通过 Workers 代理时)
        """
        # 方式1:从 response.cookies 获取
        if hasattr(response, 'cookies'):
            for cookie in response.cookies:
                self._cookies[cookie.name] = cookie.value
            try:
                for cookie in response.cookies:
                    if hasattr(cookie, 'name') and hasattr(cookie, 'value'):
                        self._cookies[cookie.name] = cookie.value
            except TypeError:
                if hasattr(response.cookies, 'items'):
                    for name, value in response.cookies.items():
                        self._cookies[name] = value

        # 方式2:从响应头 Set-Cookie 解析(Workers 代理时需要)
        if hasattr(response, 'headers'):
            self._parse_set_cookie_headers(response.headers)

    def _parse_set_cookie_headers(self, headers):
        """从响应头中解析 Set-Cookie"""
        set_cookie_headers = []

        if hasattr(headers, 'get_all'):
            set_cookie_headers = headers.get_all('set-cookie') or []
        elif hasattr(headers, 'getlist'):
            set_cookie_headers = headers.getlist('set-cookie') or []
        else:
            cookie_header = headers.get('set-cookie', '')
            if cookie_header:
                import re
                parts = re.split(r',\s*(?=[A-Za-z_][A-Za-z0-9_-]*=)', cookie_header)
                set_cookie_headers = [p.strip() for p in parts if p.strip()]

        for cookie_str in set_cookie_headers:
            self._parse_single_cookie(cookie_str)

    def _parse_single_cookie(self, cookie_str):
        """解析单个 Set-Cookie 字符串"""
        if not cookie_str:
            return
        parts = cookie_str.split(';')
        if not parts:
            return
        first_part = parts[0].strip()
        if '=' not in first_part:
            return
        name, value = first_part.split('=', 1)
        name = name.strip()
        value = value.strip()
        if name:
            self._cookies[name] = value
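The fallback branch above splits a comma-joined `Set-Cookie` header with a lookahead regex so that commas inside attribute values (e.g. `Expires=Wed, 21-Oct-...`, where a digit follows the comma) are left alone. A standalone sketch of that parsing — the function names here are illustrative, not methods of the library:

```python
import re

def split_set_cookie(header_value):
    """Split a combined Set-Cookie header only at commas that start a new
    name= pair, using the same lookahead pattern as the fallback branch."""
    parts = re.split(r',\s*(?=[A-Za-z_][A-Za-z0-9_-]*=)', header_value)
    return [p.strip() for p in parts if p.strip()]

def parse_cookie_pair(cookie_str):
    """Keep only the name=value pair before the first ';', dropping
    attributes such as Path, HttpOnly, Max-Age."""
    first = cookie_str.split(';')[0].strip()
    if '=' not in first:
        return None
    name, value = first.split('=', 1)
    return name.strip(), value.strip()

combined = "sid=abc123; Path=/; HttpOnly, theme=dark; Max-Age=3600"
cookies = dict(parse_cookie_pair(c) for c in split_set_cookie(combined))
print(cookies)  # {'sid': 'abc123', 'theme': 'dark'}
```

The lookahead `(?=[A-Za-z_][A-Za-z0-9_-]*=)` is what prevents splitting inside date-valued attributes, since `21-Oct-...` starts with a digit and therefore does not match.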

    def get(self, url: str, **kwargs) -> Any:
        """
        发送 GET 请求
        发送 GET 请求 / Send GET request

        Args:
            url: 目标 URL
            **kwargs: 其他参数
            url (str): 目标 URL / Target URL
            **kwargs: 其他参数 / Other parameters
                - impersonate (str): TLS 指纹模拟 / TLS fingerprint impersonation
                - http2 (bool): 启用 HTTP/2 / Enable HTTP/2
                - 其他参数与 requests 库兼容 / Compatible with requests library

        Returns:
            响应对象
            CFSpiderResponse: 响应对象 / Response object
        """
        from .api import get as _get

@@ -390,8 +460,7 @@ class StealthSession:
        response = _get(
            url,
            cf_proxies=self.cf_proxies,
            cf_workers=self.cf_workers,
            token=self.token,
            uuid=self.uuid,
            headers=headers,
            cookies=cookies,
            **kwargs
@@ -404,7 +473,16 @@ class StealthSession:
        return response

    def post(self, url: str, **kwargs) -> Any:
        """发送 POST 请求"""
        """
        发送 POST 请求 / Send POST request

        Args:
            url (str): 目标 URL / Target URL
            **kwargs: 其他参数 / Other parameters

        Returns:
            CFSpiderResponse: 响应对象 / Response object
        """
        from .api import post as _post

        self._apply_delay()
@@ -421,8 +499,7 @@ class StealthSession:
        response = _post(
            url,
            cf_proxies=self.cf_proxies,
            cf_workers=self.cf_workers,
            token=self.token,
            uuid=self.uuid,
            headers=headers,
            cookies=cookies,
            **kwargs
@@ -435,7 +512,7 @@ class StealthSession:
        return response

    def put(self, url: str, **kwargs) -> Any:
        """发送 PUT 请求"""
        """发送 PUT 请求 / Send PUT request"""
        from .api import put as _put

        self._apply_delay()
@@ -445,8 +522,7 @@ class StealthSession:
        response = _put(
            url,
            cf_proxies=self.cf_proxies,
            cf_workers=self.cf_workers,
            token=self.token,
            uuid=self.uuid,
            headers=headers,
            cookies=cookies,
            **kwargs
@@ -457,7 +533,7 @@ class StealthSession:
        return response

    def delete(self, url: str, **kwargs) -> Any:
        """发送 DELETE 请求"""
        """发送 DELETE 请求 / Send DELETE request"""
        from .api import delete as _delete

        self._apply_delay()
@@ -467,8 +543,7 @@ class StealthSession:
        response = _delete(
            url,
            cf_proxies=self.cf_proxies,
            cf_workers=self.cf_workers,
            token=self.token,
            uuid=self.uuid,
            headers=headers,
            cookies=cookies,
            **kwargs
@@ -479,7 +554,7 @@ class StealthSession:
        return response

    def head(self, url: str, **kwargs) -> Any:
        """发送 HEAD 请求"""
        """发送 HEAD 请求 / Send HEAD request"""
        from .api import head as _head

        self._apply_delay()
@@ -489,8 +564,7 @@ class StealthSession:
        response = _head(
            url,
            cf_proxies=self.cf_proxies,
            cf_workers=self.cf_workers,
            token=self.token,
            uuid=self.uuid,
            headers=headers,
            cookies=cookies,
            **kwargs

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "cfspider"
version = "1.8.0"
version = "1.8.2"
description = "Cloudflare Workers proxy IP pool client"
readme = "README.md"
license = {text = "Apache-2.0"}
@@ -23,6 +23,16 @@ dependencies = [
    "httpx[http2]>=0.25.0",
    "curl_cffi>=0.5.0",
    "beautifulsoup4>=4.9.0",
    # 浏览器自动化
    "playwright>=1.40.0",
    # XPath 数据提取
    "lxml>=4.9.0",
    # JSONPath 数据提取
    "jsonpath-ng>=1.5.0",
    # Excel 导出
    "openpyxl>=3.0.0",
    # 进度条显示
    "tqdm>=4.60.0",
]

[project.optional-dependencies]
3556
workers.js
File diff suppressed because it is too large