Compare commits

48 Commits

Author SHA1 Message Date
CodingOnStar
0c29b67e22 Merge remote-tracking branch 'origin/main' into refactor/configuration 2026-01-27 11:43:36 +08:00
CodingOnStar
c080c48aba refactor(debug): extract hooks and components, add comprehensive tests
Extract reusable hooks and components from debug/index.tsx:
- useInputValidation, useFormattingChangeConfirm, useModalWidth hooks
- useTextCompletion hook for text completion logic
- DebugHeader component for header UI
- TextCompletionResult component for completion display

Add comprehensive test coverage for debug-with-multiple-model:
- chat-item.spec.tsx (23 tests)
- debug-item.spec.tsx (25 tests)
- model-parameter-trigger.spec.tsx (14 tests)
- text-generation-item.spec.tsx (16 tests)
- index.spec.tsx expanded (84 tests)

Total: 183 tests passing with 95%+ coverage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 11:42:09 +08:00
lif
d13638f6e4 test: wrap test cleanup in act() to prevent window is not defined error (#31558)
Signed-off-by: majiayu000 <1835304752@qq.com>
2026-01-27 11:25:14 +08:00
hj24
b4eef76c14 fix: billing account deletion (#31556) 2026-01-27 11:18:23 +08:00
dependabot[bot]
cbf7f646d9 chore(deps): bump pypdf from 6.6.0 to 6.6.2 in /api (#31568)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
2026-01-27 11:06:13 +08:00
Coding On Star
c58647d39c refactor(web): extract MCP components and add comprehensive tests (#31517)
Co-authored-by: CodingOnStar <hanxujiang@dify.ai>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Co-authored-by: CodingOnStar <hanxujiang@dify.com>
2026-01-27 11:05:59 +08:00
E.G
f6be9cd90d refactor: replace request.args.get with Pydantic BaseModel validation (#31104)
Co-authored-by: GlobalStar117 <GlobalStar117@users.noreply.github.com>
Co-authored-by: Asuka Minato <i@asukaminato.eu.org>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-27 10:48:42 +08:00
dependabot[bot]
360f3bb32f chore(deps): bump pycryptodome from 3.19.1 to 3.23.0 in /api (#31504)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-27 10:43:05 +08:00
lif
8519b16cfc docs: add ESLint guide to AGENTS.md (#31559)
Signed-off-by: majiayu000 <1835304752@qq.com>
2026-01-27 09:32:55 +08:00
盐粒 Yanli
f00d823f9f chore: move agent notes into docstrings (#31560) 2026-01-27 09:32:26 +08:00
wangxiaolei
e48419937b feat: chatflow support multimodal (#31293)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-01-27 00:24:48 +08:00
Asuka Minato
5eaf0c733a fix: service api doc can not gen (#31549) 2026-01-26 21:59:02 +09:00
yyh
f561656a89 chore: follow-up fixes for storybook vite migration (#31545) 2026-01-26 20:20:14 +08:00
Junyan Qin (Chin)
f01f555146 chore: increase plugin cache ttl to 1 hour (#31552) 2026-01-26 19:48:33 +08:00
Stephen Zhou
47d0e400ae chore: update to story book nextjs-vite (#31536) 2026-01-26 17:07:20 +08:00
wangxiaolei
8724ba04aa fix: fix Cannot read properties of null (reading 'credential_form_sch… (#31117) 2026-01-26 15:52:53 +08:00
Stephen Zhou
6fd001c660 chore: eslint prune-suppressions (#31526) 2026-01-26 15:31:19 +08:00
coopercoder
e8e386a6b9 fix: Add vertical scrolling support for floating elements. (#30897)
Co-authored-by: zhaiguangpeng <zhaiguangpeng@didiglobal.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-26 15:17:42 +08:00
Asuka Minato
eba5eac3fa refactor: api/controllers/console/setup.py to ov3 (#31465)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-01-26 15:04:33 +08:00
Asuka Minato
19008dce13 refactor: api/controllers/console/version.py to v3 (#31463)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-26 15:04:25 +08:00
盐粒 Yanli
92011d0a31 refactor: LLM plugin invoke parsing (#31499)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-01-26 14:59:57 +08:00
Xiangxuan Qu
a51ced0a4f refactor: pass BaseModel instances instead of dict (#31514)
Co-authored-by: fghpdf <fghpdf@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-01-26 14:50:14 +08:00
yyh
dad8e408b0 fix(web): upgrade tanstack devtools to fix seroval RCE vulnerability (#31515) 2026-01-26 14:49:58 +08:00
Coding On Star
d941201a3e refactor(tool-selector): remove unused components and consolidate import (#31018)
Co-authored-by: CodingOnStar <hanxujiang@dify.ai>
2026-01-26 14:24:00 +08:00
Coding On Star
dd988d42c2 feat: enhance quota panel to support additional model providers and integrate trial models feature (#31443)
Co-authored-by: CodingOnStar <hanxujiang@dify.ai>
2026-01-26 14:04:12 +08:00
Coding On Star
a43d2ec4f0 refactor: restructure Completed component (#31435)
Co-authored-by: CodingOnStar <hanxujiang@dify.ai>
2026-01-26 14:03:51 +08:00
zyssyz123
7c12e923b6 feat: add trial model list in system features (#31313)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: hj24 <mambahj24@gmail.com>
2026-01-26 11:52:05 +08:00
Asuka Minato
b9f1d65d4f refactor: example of refine dict / Mapping (#31498) 2026-01-26 10:23:38 +08:00
dependabot[bot]
b4e2af96e2 chore(deps): bump @lexical/utils from 0.38.2 to 0.39.0 in /web (#31503)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-26 10:17:04 +08:00
dependabot[bot]
9d38af6d99 chore(deps): bump pyasn1 from 0.6.1 to 0.6.2 in /api (#31140)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-24 10:31:56 +08:00
TomoOkuyama
0772d49257 fix(api): fix IRIS hybrid search returning zero results (#31309)
Co-authored-by: Tomo Okuyama <tomo.okuyama@intersystems.com>
2026-01-24 10:29:19 +08:00
-LAN-
67eb8c052d refactor: single-node workflow runner helpers (#31472) 2026-01-24 10:27:44 +08:00
Asuka Minato
5c4028d557 refactor: port AppModelConfig (#30919)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-24 10:25:51 +08:00
lif
55e6bca11c fix(http-request): prevent UUID truncation in JSON body (#31444)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-01-24 10:21:21 +08:00
盐粒 Yanli
67657c2f48 chore: Update dev setup scripts and API README (#31415)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-01-24 10:20:47 +08:00
fenglin
e8f9d64651 fix(tools): fix ToolInvokeMessage Union type parsing issue (#31450)
Co-authored-by: qiaofenglin <qiaofenglin@baidu.com>
2026-01-24 10:18:06 +08:00
wangxiaolei
1f8c730259 feat: optimize http status code (#31430) 2026-01-24 10:16:16 +08:00
Asuka Minato
8d45755303 feat: init fastopenapi (#30453)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-01-23 21:07:52 +09:00
Asuka Minato
6342d196e8 refactor: split changes for api/controllers/web/workflow.py (#29852) 2026-01-23 19:06:21 +09:00
Asuka Minato
5dc5709d58 refactor: split changes for api/controllers/web/login.py (#29854)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-01-23 19:06:04 +09:00
QuantumGhost
99d19cd3db docs(api): clarity SystemFeatureApi for webapp is unauthenticated by design (#31432)
The `/api/system-features` is required for the web app initialization.
Authentication would create circular dependency (can't authenticate without web app loading).

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 16:03:12 +08:00
非法操作
fa92548cf6 feat: archive workflow run logs backend (#31310)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-01-23 13:11:56 +08:00
lif
41428432cc ci: enable ESLint autofix in autofix bot (#31428)
Signed-off-by: majiayu000 <1835304752@qq.com>
2026-01-23 13:05:51 +08:00
Cursx
b3a869b91b refactor: optimize system features response payload for unauthenticated clients (#31392)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: QuantumGhost <obelisk.reg+git@gmail.com>
2026-01-23 12:12:11 +08:00
Stephen Zhou
f911199c8e chore: disable serwist in dev (#31424)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 11:35:14 +08:00
wangxiaolei
056095238b fix: fix create-by-file doc_form (#31346) 2026-01-23 11:34:47 +08:00
盐粒 Yanli
c8ae6e39d2 fix: NextStep crash when target node is missing (#31416) 2026-01-23 10:15:20 +08:00
QuantumGhost
61f8647f37 docs(api): mark SystemFeatureApi as unauthenticated by design (#31417)
The `/console/api/system-features` is required for the dashboard initialization. Authentication would create circular dependency (can't login without dashboard loading).

ref: CVE-2025-63387

Related: #31368
2026-01-22 22:33:59 +08:00
406 changed files with 42642 additions and 11734 deletions

View File

@@ -79,6 +79,29 @@ jobs:
find . -name "*.py" -type f -exec sed -i.bak -E 's/"([^"]+)" \| None/Optional["\1"]/g; s/'"'"'([^'"'"']+)'"'"' \| None/Optional['"'"'\1'"'"']/g' {} \;
find . -name "*.py.bak" -type f -delete
- name: Install pnpm
uses: pnpm/action-setup@v4
with:
package_json_file: web/package.json
run_install: false
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 24
cache: pnpm
cache-dependency-path: ./web/pnpm-lock.yaml
- name: Install web dependencies
run: |
cd web
pnpm install --frozen-lockfile
- name: ESLint autofix
run: |
cd web
pnpm lint:fix || true
# mdformat breaks YAML front matter in markdown files. Add --exclude for directories containing YAML front matter.
- name: mdformat
run: |

View File

@@ -25,6 +25,30 @@ pnpm type-check:tsgo
pnpm test
```
### Frontend Linting
ESLint is used for frontend code quality. Available commands:
```bash
# Lint all files (report only)
pnpm lint
# Lint and auto-fix issues
pnpm lint:fix
# Lint specific files or directories
pnpm lint:fix app/components/base/button/
pnpm lint:fix app/components/base/button/index.tsx
# Lint quietly (errors only, no warnings)
pnpm lint:quiet
# Check code complexity
pnpm lint:complexity
```
**Important**: Always run `pnpm lint:fix` before committing. The pre-commit hook runs `lint-staged` which only lints staged files.
## Testing & Quality Practices
- Follow TDD: red → green → refactor.

View File

View File

@@ -1,97 +1,47 @@
# API Agent Guide
## Agent Notes (must-check)
## Notes for Agent (must-check)
Before you start work on any backend file under `api/`, you MUST check whether a related note exists under:
Before changing any backend code under `api/`, you MUST read the surrounding docstrings and comments. These notes contain required context (invariants, edge cases, trade-offs) and are treated as part of the spec.
- `agent-notes/<same-relative-path-as-target-file>.md`
Look for:
Rules:
- The module (file) docstring at the top of a source code file
- Docstrings on classes and functions/methods
- Paragraph/block comments for non-obvious logic
- **Path mapping**: for a target file `<path>/<name>.py`, the note must be `agent-notes/<path>/<name>.py.md` (same folder structure, same filename, plus `.md`).
- **Before working**:
- If the note exists, read it first and follow any constraints/decisions recorded there.
- If the note conflicts with the current code, or references an "origin" file/path that has been deleted, renamed, or migrated, treat the **code as the single source of truth** and update the note to match reality.
- If the note does not exist, create it with a short architecture/intent summary and any relevant invariants/edge cases.
- **During working**:
- Keep the note in sync as you discover constraints, make decisions, or change approach.
- If you move/rename a file, migrate its note to the new mapped path (and fix any outdated references inside the note).
- Record non-obvious edge cases, trade-offs, and the test/verification plan as you go (not just at the end).
- Keep notes **coherent**: integrate new findings into the relevant sections and rewrite for clarity; avoid append-only “recent fix” / changelog-style additions unless the note is explicitly intended to be a changelog.
- **When finishing work**:
- Update the related note(s) to reflect what changed, why, and any new edge cases/tests.
- If a file is deleted, remove or clearly deprecate the corresponding note so it cannot be mistaken as current guidance.
- Keep notes concise and accurate; they are meant to prevent repeated rediscovery.
### What to write where
## Skill Index
- Keep notes scoped: module notes cover module-wide context, class notes cover class-wide context, function/method notes cover behavioural contracts, and paragraph/block comments cover local “why”. Avoid duplicating the same content across scopes unless repetition prevents misuse.
- **Module (file) docstring**: purpose, boundaries, key invariants, and “gotchas” that a new reader must know before editing.
- Include cross-links to the key collaborators (modules/services) when discovery is otherwise hard.
- Prefer stable facts (invariants, contracts) over ephemeral “today we…” notes.
- **Class docstring**: responsibility, lifecycle, invariants, and how it should be used (or not used).
- If the class is intentionally stateful, note what state exists and what methods mutate it.
- If concurrency/async assumptions matter, state them explicitly.
- **Function/method docstring**: behavioural contract.
- Document arguments, return shape, side effects (DB writes, external I/O, task dispatch), and raised domain exceptions.
- Add examples only when they prevent misuse.
- **Paragraph/block comments**: explain *why* (trade-offs, historical constraints, surprising edge cases), not what the code already states.
- Keep comments adjacent to the logic they justify; delete or rewrite comments that no longer match reality.
Start with the section that best matches your need. Each entry lists the problems it solves plus key files/concepts so you know what to expect before opening it.
### Rules (must follow)
### Platform Foundations
In this section, “notes” means module/class/function docstrings plus any relevant paragraph/block comments.
#### [Infrastructure Overview](agent_skills/infra.md)
- **When to read this**
- You need to understand where a feature belongs in the architecture.
- You're wiring storage, Redis, vector stores, or OTEL.
- You're about to add CLI commands or async jobs.
- **What it covers**
- Configuration stack (`configs/app_config.py`, remote settings)
- Storage entry points (`extensions/ext_storage.py`, `core/file/file_manager.py`)
- Redis conventions (`extensions/ext_redis.py`)
- Plugin runtime topology
- Vector-store factory (`core/rag/datasource/vdb/*`)
- Observability hooks
- SSRF proxy usage
- Core CLI commands
### Plugin & Extension Development
#### [Plugin Systems](agent_skills/plugin.md)
- **When to read this**
- Youre building or debugging a marketplace plugin.
- You need to know how manifests, providers, daemons, and migrations fit together.
- **What it covers**
- Plugin manifests (`core/plugin/entities/plugin.py`)
- Installation/upgrade flows (`services/plugin/plugin_service.py`, CLI commands)
- Runtime adapters (`core/plugin/impl/*` for tool/model/datasource/trigger/endpoint/agent)
- Daemon coordination (`core/plugin/entities/plugin_daemon.py`)
- How provider registries surface capabilities to the rest of the platform
#### [Plugin OAuth](agent_skills/plugin_oauth.md)
- **When to read this**
- You must integrate OAuth for a plugin or datasource.
- Youre handling credential encryption or refresh flows.
- **Topics**
- Credential storage
- Encryption helpers (`core/helper/provider_encryption.py`)
- OAuth client bootstrap (`services/plugin/oauth_service.py`, `services/plugin/plugin_parameter_service.py`)
- How console/API layers expose the flows
### Workflow Entry & Execution
#### [Trigger Concepts](agent_skills/trigger.md)
- **When to read this**
- You're debugging why a workflow didn't start.
- You're adding a new trigger type or hook.
- You need to trace async execution, draft debugging, or webhook/schedule pipelines.
- **Details**
- Start-node taxonomy
- Webhook & schedule internals (`core/workflow/nodes/trigger_*`, `services/trigger/*`)
- Async orchestration (`services/async_workflow_service.py`, Celery queues)
- Debug event bus
- Storage/logging interactions
## General Reminders
- All skill docs assume you follow the coding style rules below—run the lint/type/test commands before submitting changes.
- When you cannot find an answer in these briefs, search the codebase using the paths referenced (e.g., `core/plugin/impl/tool.py`, `services/dataset_service.py`).
- If you run into cross-cutting concerns (tenancy, configuration, storage), check the infrastructure guide first; it links to most supporting modules.
- Keep multi-tenancy and configuration central: everything flows through `configs.dify_config` and `tenant_id`.
- When touching plugins or triggers, consult both the system overview and the specialised doc to ensure you adjust lifecycle, storage, and observability consistently.
- **Before working**
- Read the notes in the area you'll touch; treat them as part of the spec.
- If a docstring or comment conflicts with the current code, treat the **code as the single source of truth** and update the docstring or comment to match reality.
- If important intent/invariants/edge cases are missing, add them in the closest docstring or comment (module for overall scope, function for behaviour).
- **During working**
- Keep the notes in sync as you discover constraints, make decisions, or change approach.
- If you move/rename responsibilities across modules/classes, update the affected docstrings and comments so readers can still find the “why” and the invariants.
- Record non-obvious edge cases, trade-offs, and the test/verification plan in the nearest docstring or comment that will stay correct.
- Keep the notes **coherent**: integrate new findings into the relevant docstrings and comments; avoid append-only “recent fix” / changelog-style additions.
- **When finishing**
- Update the notes to reflect what changed, why, and any new edge cases/tests.
- Remove or rewrite any comments that could be mistaken as current guidance but no longer apply.
- Keep docstrings and comments concise and accurate; they are meant to prevent repeated rediscovery.
## Coding Style
@@ -226,7 +176,7 @@ Before opening a PR / submitting:
- Controllers: parse input via Pydantic, invoke services, return serialised responses; no business logic.
- Services: coordinate repositories, providers, background tasks; keep side effects explicit.
- Document non-obvious behaviour with concise comments.
- Document non-obvious behaviour with concise docstrings and comments.
### Miscellaneous

View File

@@ -1,6 +1,6 @@
# Dify Backend API
## Usage
## Setup and Run
> [!IMPORTANT]
>
@@ -8,48 +8,77 @@
> [`uv`](https://docs.astral.sh/uv/) as the package manager
> for Dify API backend service.
1. Start the docker-compose stack
`uv` and `pnpm` are required to run the setup and development commands below.
The backend requires some middleware, including PostgreSQL, Redis, and Weaviate, which can be started together using `docker-compose`.
### Using scripts (recommended)
The scripts resolve paths relative to their location, so you can run them from anywhere.
1. Run setup (copies env files and installs dependencies).
```bash
cd ../docker
cp middleware.env.example middleware.env
# change the profile to mysql if you are not using postgres, change the profile to another vector database if you are not using weaviate
docker compose -f docker-compose.middleware.yaml --profile postgresql --profile weaviate -p dify up -d
cd ../api
./dev/setup
```
1. Copy `.env.example` to `.env`
1. Review `api/.env`, `web/.env.local`, and `docker/middleware.env` values (see the `SECRET_KEY` note below).
```cli
cp .env.example .env
1. Start middleware (PostgreSQL/Redis/Weaviate).
```bash
./dev/start-docker-compose
```
> [!IMPORTANT]
>
> When the frontend and backend run on different subdomains, set COOKIE_DOMAIN to the site's top-level domain (e.g., `example.com`). The frontend and backend must be under the same top-level domain in order to share authentication cookies.
1. Start backend (runs migrations first).
1. Generate a `SECRET_KEY` in the `.env` file.
bash for Linux
```bash for Linux
sed -i "/^SECRET_KEY=/c\SECRET_KEY=$(openssl rand -base64 42)" .env
```bash
./dev/start-api
```
bash for Mac
1. Start Dify [web](../web) service.
```bash for Mac
secret_key=$(openssl rand -base64 42)
sed -i '' "/^SECRET_KEY=/c\\
SECRET_KEY=${secret_key}" .env
```bash
./dev/start-web
```
1. Create environment.
1. Set up your application by visiting `http://localhost:3000`.
Dify API service uses [UV](https://docs.astral.sh/uv/) to manage dependencies.
First, you need to install the uv package manager if you don't have it already.
1. Optional: start the worker service (async tasks, runs from `api`).
```bash
./dev/start-worker
```
1. Optional: start Celery Beat (scheduled tasks).
```bash
./dev/start-beat
```
### Manual commands
<details>
<summary>Show manual setup and run steps</summary>
These commands assume you start from the repository root.
1. Start the docker-compose stack.
The backend requires middleware, including PostgreSQL, Redis, and Weaviate, which can be started together using `docker-compose`.
```bash
cp docker/middleware.env.example docker/middleware.env
# Use mysql or another vector database profile if you are not using postgres/weaviate.
docker compose -f docker/docker-compose.middleware.yaml --profile postgresql --profile weaviate -p dify up -d
```
1. Copy env files.
```bash
cp api/.env.example api/.env
cp web/.env.example web/.env.local
```
1. Install UV if needed.
```bash
pip install uv
@@ -57,60 +86,96 @@
brew install uv
```
1. Install dependencies
1. Install API dependencies.
```bash
uv sync --dev
cd api
uv sync --group dev
```
1. Run migrate
Before the first launch, migrate the database to the latest version.
1. Install web dependencies.
```bash
cd web
pnpm install
cd ..
```
1. Start backend (runs migrations first, in a new terminal).
```bash
cd api
uv run flask db upgrade
```
1. Start backend
```bash
uv run flask run --host 0.0.0.0 --port=5001 --debug
```
1. Start Dify [web](../web) service.
1. Start Dify [web](../web) service (in a new terminal).
1. Set up your application by visiting `http://localhost:3000`.
```bash
cd web
pnpm dev:inspect
```
1. If you need to handle and debug async tasks (e.g. dataset importing and document indexing), start the worker service.
1. Set up your application by visiting `http://localhost:3000`.
```bash
uv run celery -A app.celery worker -P threads -c 2 --loglevel INFO -Q dataset,priority_dataset,priority_pipeline,pipeline,mail,ops_trace,app_deletion,plugin,workflow_storage,conversation,workflow,schedule_poller,schedule_executor,triggered_workflow_dispatcher,trigger_refresh_executor,retention
```
1. Optional: start the worker service (async tasks, in a new terminal).
Additionally, if you want to debug the celery scheduled tasks, you can run the following command in another terminal to start the beat service:
```bash
cd api
uv run celery -A app.celery worker -P threads -c 2 --loglevel INFO -Q dataset,priority_dataset,priority_pipeline,pipeline,mail,ops_trace,app_deletion,plugin,workflow_storage,conversation,workflow,schedule_poller,schedule_executor,triggered_workflow_dispatcher,trigger_refresh_executor,retention
```
```bash
uv run celery -A app.celery beat
```
1. Optional: start Celery Beat (scheduled tasks, in a new terminal).
```bash
cd api
uv run celery -A app.celery beat
```
</details>
### Environment notes
> [!IMPORTANT]
>
> When the frontend and backend run on different subdomains, set COOKIE_DOMAIN to the site's top-level domain (e.g., `example.com`). The frontend and backend must be under the same top-level domain in order to share authentication cookies.
- Generate a `SECRET_KEY` in the `.env` file.
bash for Linux
```bash
sed -i "/^SECRET_KEY=/c\\SECRET_KEY=$(openssl rand -base64 42)" .env
```
bash for Mac
```bash
secret_key=$(openssl rand -base64 42)
sed -i '' "/^SECRET_KEY=/c\\
SECRET_KEY=${secret_key}" .env
```
## Testing
1. Install dependencies for both the backend and the test environment
```bash
uv sync --dev
cd api
uv sync --group dev
```
1. Run the tests locally with mocked system environment variables from the `tool.pytest_env` section in `pyproject.toml`; see [Claude.md](../CLAUDE.md) for more details.
```bash
cd api
uv run pytest # Run all tests
uv run pytest tests/unit_tests/ # Unit tests only
uv run pytest tests/integration_tests/ # Integration tests
# Code quality
../dev/reformat # Run all formatters and linters
uv run ruff check --fix ./ # Fix linting issues
uv run ruff format ./ # Format code
uv run basedpyright . # Type checking
./dev/reformat # Run all formatters and linters
uv run ruff check --fix ./ # Fix linting issues
uv run ruff format ./ # Format code
uv run basedpyright . # Type checking
```

View File

@@ -81,6 +81,7 @@ def initialize_extensions(app: DifyApp):
ext_commands,
ext_compress,
ext_database,
ext_fastopenapi,
ext_forward_refs,
ext_hosting_provider,
ext_import_modules,
@@ -128,6 +129,7 @@ def initialize_extensions(app: DifyApp):
ext_proxy_fix,
ext_blueprints,
ext_commands,
ext_fastopenapi,
ext_otel,
ext_request_logging,
ext_session_factory,

View File

@@ -950,6 +950,346 @@ def clean_workflow_runs(
)
@click.command(
"archive-workflow-runs",
help="Archive workflow runs for paid plan tenants to S3-compatible storage.",
)
@click.option("--tenant-ids", default=None, help="Optional comma-separated tenant IDs for grayscale rollout.")
@click.option("--before-days", default=90, show_default=True, help="Archive runs older than N days.")
@click.option(
"--from-days-ago",
default=None,
type=click.IntRange(min=0),
help="Lower bound in days ago (older). Must be paired with --to-days-ago.",
)
@click.option(
"--to-days-ago",
default=None,
type=click.IntRange(min=0),
help="Upper bound in days ago (newer). Must be paired with --from-days-ago.",
)
@click.option(
"--start-from",
type=click.DateTime(formats=["%Y-%m-%d", "%Y-%m-%dT%H:%M:%S"]),
default=None,
help="Archive runs created at or after this timestamp (UTC if no timezone).",
)
@click.option(
"--end-before",
type=click.DateTime(formats=["%Y-%m-%d", "%Y-%m-%dT%H:%M:%S"]),
default=None,
help="Archive runs created before this timestamp (UTC if no timezone).",
)
@click.option("--batch-size", default=100, show_default=True, help="Batch size for processing.")
@click.option("--workers", default=1, show_default=True, type=int, help="Concurrent workflow runs to archive.")
@click.option("--limit", default=None, type=int, help="Maximum number of runs to archive.")
@click.option("--dry-run", is_flag=True, help="Preview without archiving.")
@click.option("--delete-after-archive", is_flag=True, help="Delete runs and related data after archiving.")
def archive_workflow_runs(
tenant_ids: str | None,
before_days: int,
from_days_ago: int | None,
to_days_ago: int | None,
start_from: datetime.datetime | None,
end_before: datetime.datetime | None,
batch_size: int,
workers: int,
limit: int | None,
dry_run: bool,
delete_after_archive: bool,
):
"""
Archive workflow runs for paid plan tenants older than the specified days.
This command archives the following tables to storage:
- workflow_node_executions
- workflow_node_execution_offload
- workflow_pauses
- workflow_pause_reasons
- workflow_trigger_logs
The workflow_runs and workflow_app_logs tables are preserved for UI listing.
"""
from services.retention.workflow_run.archive_paid_plan_workflow_run import WorkflowRunArchiver
run_started_at = datetime.datetime.now(datetime.UTC)
click.echo(
click.style(
f"Starting workflow run archiving at {run_started_at.isoformat()}.",
fg="white",
)
)
if (start_from is None) ^ (end_before is None):
click.echo(click.style("start-from and end-before must be provided together.", fg="red"))
return
if (from_days_ago is None) ^ (to_days_ago is None):
click.echo(click.style("from-days-ago and to-days-ago must be provided together.", fg="red"))
return
if from_days_ago is not None and to_days_ago is not None:
if start_from or end_before:
click.echo(click.style("Choose either day offsets or explicit dates, not both.", fg="red"))
return
if from_days_ago <= to_days_ago:
click.echo(click.style("from-days-ago must be greater than to-days-ago.", fg="red"))
return
now = datetime.datetime.now()
start_from = now - datetime.timedelta(days=from_days_ago)
end_before = now - datetime.timedelta(days=to_days_ago)
before_days = 0
if start_from and end_before and start_from >= end_before:
click.echo(click.style("start-from must be earlier than end-before.", fg="red"))
return
if workers < 1:
click.echo(click.style("workers must be at least 1.", fg="red"))
return
archiver = WorkflowRunArchiver(
days=before_days,
batch_size=batch_size,
start_from=start_from,
end_before=end_before,
workers=workers,
tenant_ids=[tid.strip() for tid in tenant_ids.split(",")] if tenant_ids else None,
limit=limit,
dry_run=dry_run,
delete_after_archive=delete_after_archive,
)
summary = archiver.run()
click.echo(
click.style(
f"Summary: processed={summary.total_runs_processed}, archived={summary.runs_archived}, "
f"skipped={summary.runs_skipped}, failed={summary.runs_failed}, "
f"time={summary.total_elapsed_time:.2f}s",
fg="cyan",
)
)
run_finished_at = datetime.datetime.now(datetime.UTC)
elapsed = run_finished_at - run_started_at
click.echo(
click.style(
f"Workflow run archiving completed. start={run_started_at.isoformat()} "
f"end={run_finished_at.isoformat()} duration={elapsed}",
fg="green",
)
)
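A minimal usage sketch based on the options defined above, assuming the command is exposed through the Flask CLI as `uv run flask archive-workflow-runs` (matching the `uv run flask ...` invocations in the README changes below); the tenant ID is a placeholder.
```bash
cd api
# Preview which runs would be archived for one tenant, without writing anything.
uv run flask archive-workflow-runs \
  --tenant-ids "<tenant-id>" \
  --before-days 90 \
  --batch-size 100 \
  --dry-run
# Archive runs older than 90 days with 4 workers and delete the originals after archiving.
uv run flask archive-workflow-runs --before-days 90 --workers 4 --delete-after-archive
```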
@click.command(
"restore-workflow-runs",
help="Restore archived workflow runs from S3-compatible storage.",
)
@click.option(
"--tenant-ids",
required=False,
help="Tenant IDs (comma-separated).",
)
@click.option("--run-id", required=False, help="Workflow run ID to restore.")
@click.option(
"--start-from",
type=click.DateTime(formats=["%Y-%m-%d", "%Y-%m-%dT%H:%M:%S"]),
default=None,
help="Optional lower bound (inclusive) for created_at; must be paired with --end-before.",
)
@click.option(
"--end-before",
type=click.DateTime(formats=["%Y-%m-%d", "%Y-%m-%dT%H:%M:%S"]),
default=None,
help="Optional upper bound (exclusive) for created_at; must be paired with --start-from.",
)
@click.option("--workers", default=1, show_default=True, type=int, help="Concurrent workflow runs to restore.")
@click.option("--limit", type=int, default=100, show_default=True, help="Maximum number of runs to restore.")
@click.option("--dry-run", is_flag=True, help="Preview without restoring.")
def restore_workflow_runs(
tenant_ids: str | None,
run_id: str | None,
start_from: datetime.datetime | None,
end_before: datetime.datetime | None,
workers: int,
limit: int,
dry_run: bool,
):
"""
Restore an archived workflow run from storage to the database.
This restores the following tables:
- workflow_node_executions
- workflow_node_execution_offload
- workflow_pauses
- workflow_pause_reasons
- workflow_trigger_logs
"""
from services.retention.workflow_run.restore_archived_workflow_run import WorkflowRunRestore
parsed_tenant_ids = None
if tenant_ids:
parsed_tenant_ids = [tid.strip() for tid in tenant_ids.split(",") if tid.strip()]
if not parsed_tenant_ids:
raise click.BadParameter("tenant-ids must not be empty")
if (start_from is None) ^ (end_before is None):
raise click.UsageError("--start-from and --end-before must be provided together.")
if run_id is None and (start_from is None or end_before is None):
raise click.UsageError("--start-from and --end-before are required for batch restore.")
if workers < 1:
raise click.BadParameter("workers must be at least 1")
start_time = datetime.datetime.now(datetime.UTC)
click.echo(
click.style(
f"Starting restore of workflow run {run_id} at {start_time.isoformat()}.",
fg="white",
)
)
restorer = WorkflowRunRestore(dry_run=dry_run, workers=workers)
if run_id:
results = [restorer.restore_by_run_id(run_id)]
else:
assert start_from is not None
assert end_before is not None
results = restorer.restore_batch(
parsed_tenant_ids,
start_date=start_from,
end_date=end_before,
limit=limit,
)
end_time = datetime.datetime.now(datetime.UTC)
elapsed = end_time - start_time
successes = sum(1 for result in results if result.success)
failures = len(results) - successes
if failures == 0:
click.echo(
click.style(
f"Restore completed successfully. success={successes} duration={elapsed}",
fg="green",
)
)
else:
click.echo(
click.style(
f"Restore completed with failures. success={successes} failed={failures} duration={elapsed}",
fg="red",
)
)
@click.command(
"delete-archived-workflow-runs",
help="Delete archived workflow runs from the database.",
)
@click.option(
"--tenant-ids",
required=False,
help="Tenant IDs (comma-separated).",
)
@click.option("--run-id", required=False, help="Workflow run ID to delete.")
@click.option(
"--start-from",
type=click.DateTime(formats=["%Y-%m-%d", "%Y-%m-%dT%H:%M:%S"]),
default=None,
help="Optional lower bound (inclusive) for created_at; must be paired with --end-before.",
)
@click.option(
"--end-before",
type=click.DateTime(formats=["%Y-%m-%d", "%Y-%m-%dT%H:%M:%S"]),
default=None,
help="Optional upper bound (exclusive) for created_at; must be paired with --start-from.",
)
@click.option("--limit", type=int, default=100, show_default=True, help="Maximum number of runs to delete.")
@click.option("--dry-run", is_flag=True, help="Preview without deleting.")
def delete_archived_workflow_runs(
tenant_ids: str | None,
run_id: str | None,
start_from: datetime.datetime | None,
end_before: datetime.datetime | None,
limit: int,
dry_run: bool,
):
"""
Delete archived workflow runs from the database.
"""
from services.retention.workflow_run.delete_archived_workflow_run import ArchivedWorkflowRunDeletion
parsed_tenant_ids = None
if tenant_ids:
parsed_tenant_ids = [tid.strip() for tid in tenant_ids.split(",") if tid.strip()]
if not parsed_tenant_ids:
raise click.BadParameter("tenant-ids must not be empty")
if (start_from is None) ^ (end_before is None):
raise click.UsageError("--start-from and --end-before must be provided together.")
if run_id is None and (start_from is None or end_before is None):
raise click.UsageError("--start-from and --end-before are required for batch delete.")
start_time = datetime.datetime.now(datetime.UTC)
target_desc = f"workflow run {run_id}" if run_id else "workflow runs"
click.echo(
click.style(
f"Starting delete of {target_desc} at {start_time.isoformat()}.",
fg="white",
)
)
deleter = ArchivedWorkflowRunDeletion(dry_run=dry_run)
if run_id:
results = [deleter.delete_by_run_id(run_id)]
else:
assert start_from is not None
assert end_before is not None
results = deleter.delete_batch(
parsed_tenant_ids,
start_date=start_from,
end_date=end_before,
limit=limit,
)
for result in results:
if result.success:
click.echo(
click.style(
f"{'[DRY RUN] Would delete' if dry_run else 'Deleted'} "
f"workflow run {result.run_id} (tenant={result.tenant_id})",
fg="green",
)
)
else:
click.echo(
click.style(
f"Failed to delete workflow run {result.run_id}: {result.error}",
fg="red",
)
)
end_time = datetime.datetime.now(datetime.UTC)
elapsed = end_time - start_time
successes = sum(1 for result in results if result.success)
failures = len(results) - successes
if failures == 0:
click.echo(
click.style(
f"Delete completed successfully. success={successes} duration={elapsed}",
fg="green",
)
)
else:
click.echo(
click.style(
f"Delete completed with failures. success={successes} failed={failures} duration={elapsed}",
fg="red",
)
)
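Companion restore and delete invocations, sketched under the same assumption about the Flask CLI entry point; run IDs, tenant IDs, and dates are placeholders.
```bash
cd api
# Restore a single archived run back into the database.
uv run flask restore-workflow-runs --run-id "<workflow-run-id>"
# Batch-restore archived runs for a tenant within a window (dry run; end date is exclusive).
uv run flask restore-workflow-runs \
  --tenant-ids "<tenant-id>" \
  --start-from 2026-01-01 --end-before 2026-02-01 \
  --limit 100 --dry-run
# Delete an archived run's database rows once the archive is confirmed.
uv run flask delete-archived-workflow-runs --run-id "<workflow-run-id>" --dry-run
```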
@click.option("-f", "--force", is_flag=True, help="Skip user confirmation and force the command to execute.")
@click.command("clear-orphaned-file-records", help="Clear orphaned file records.")
def clear_orphaned_file_records(force: bool):

View File

@@ -82,13 +82,13 @@ class ProviderNotSupportSpeechToTextError(BaseHTTPException):
class DraftWorkflowNotExist(BaseHTTPException):
error_code = "draft_workflow_not_exist"
description = "Draft workflow need to be initialized."
code = 400
code = 404
class DraftWorkflowNotSync(BaseHTTPException):
error_code = "draft_workflow_not_sync"
description = "Workflow graph might have been modified, please refresh and resubmit."
code = 400
code = 409
class TracingConfigNotExist(BaseHTTPException):

View File

@@ -470,7 +470,7 @@ class AdvancedChatDraftRunLoopNodeApi(Resource):
Run draft workflow loop node
"""
current_user, _ = current_account_with_tenant()
args = LoopNodeRunPayload.model_validate(console_ns.payload or {}).model_dump(exclude_none=True)
args = LoopNodeRunPayload.model_validate(console_ns.payload or {})
try:
response = AppGenerateService.generate_single_loop(
@@ -508,7 +508,7 @@ class WorkflowDraftRunLoopNodeApi(Resource):
Run draft workflow loop node
"""
current_user, _ = current_account_with_tenant()
args = LoopNodeRunPayload.model_validate(console_ns.payload or {}).model_dump(exclude_none=True)
args = LoopNodeRunPayload.model_validate(console_ns.payload or {})
try:
response = AppGenerateService.generate_single_loop(
@@ -999,6 +999,7 @@ class DraftWorkflowTriggerRunApi(Resource):
if not event:
return jsonable_encoder({"status": "waiting", "retry_in": LISTENING_RETRY_IN})
workflow_args = dict(event.workflow_args)
workflow_args[SKIP_PREPARE_USER_INPUTS_KEY] = True
return helper.compact_generate_response(
AppGenerateService.generate(
@@ -1147,6 +1148,7 @@ class DraftWorkflowTriggerRunAllApi(Resource):
try:
workflow_args = dict(trigger_debug_event.workflow_args)
workflow_args[SKIP_PREPARE_USER_INPUTS_KEY] = True
response = AppGenerateService.generate(
app_model=app_model,

View File

@@ -11,7 +11,10 @@ from controllers.console.app.wraps import get_app_model
from controllers.console.wraps import account_initialization_required, setup_required
from core.workflow.enums import WorkflowExecutionStatus
from extensions.ext_database import db
from fields.workflow_app_log_fields import build_workflow_app_log_pagination_model
from fields.workflow_app_log_fields import (
build_workflow_app_log_pagination_model,
build_workflow_archived_log_pagination_model,
)
from libs.login import login_required
from models import App
from models.model import AppMode
@@ -61,6 +64,7 @@ console_ns.schema_model(
# Register model for flask_restx to avoid dict type issues in Swagger
workflow_app_log_pagination_model = build_workflow_app_log_pagination_model(console_ns)
workflow_archived_log_pagination_model = build_workflow_archived_log_pagination_model(console_ns)
@console_ns.route("/apps/<uuid:app_id>/workflow-app-logs")
@@ -99,3 +103,33 @@ class WorkflowAppLogApi(Resource):
)
return workflow_app_log_pagination
@console_ns.route("/apps/<uuid:app_id>/workflow-archived-logs")
class WorkflowArchivedLogApi(Resource):
@console_ns.doc("get_workflow_archived_logs")
@console_ns.doc(description="Get workflow archived execution logs")
@console_ns.doc(params={"app_id": "Application ID"})
@console_ns.expect(console_ns.models[WorkflowAppLogQuery.__name__])
@console_ns.response(200, "Workflow archived logs retrieved successfully", workflow_archived_log_pagination_model)
@setup_required
@login_required
@account_initialization_required
@get_app_model(mode=[AppMode.WORKFLOW])
@marshal_with(workflow_archived_log_pagination_model)
def get(self, app_model: App):
"""
Get workflow archived logs
"""
args = WorkflowAppLogQuery.model_validate(request.args.to_dict(flat=True)) # type: ignore
workflow_app_service = WorkflowAppService()
with Session(db.engine) as session:
workflow_app_log_pagination = workflow_app_service.get_paginate_workflow_archive_logs(
session=session,
app_model=app_model,
page=args.page,
limit=args.limit,
)
return workflow_app_log_pagination
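A hedged request sketch for the new listing endpoint, assuming the usual `/console/api` prefix, the dev port 5001 from the README changes below, and bearer-token console authentication; adjust to however your console session is actually authenticated.
```bash
# List archived workflow run logs for an app (paginated).
curl -s "http://localhost:5001/console/api/apps/$APP_ID/workflow-archived-logs?page=1&limit=20" \
  -H "Authorization: Bearer $CONSOLE_ACCESS_TOKEN"
```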

View File

@@ -1,12 +1,15 @@
from datetime import UTC, datetime, timedelta
from typing import Literal, cast
from flask import request
from flask_restx import Resource, fields, marshal_with
from pydantic import BaseModel, Field, field_validator
from sqlalchemy import select
from controllers.console import console_ns
from controllers.console.app.wraps import get_app_model
from controllers.console.wraps import account_initialization_required, setup_required
from extensions.ext_database import db
from fields.end_user_fields import simple_end_user_fields
from fields.member_fields import simple_account_fields
from fields.workflow_run_fields import (
@@ -19,14 +22,17 @@ from fields.workflow_run_fields import (
workflow_run_node_execution_list_fields,
workflow_run_pagination_fields,
)
from libs.archive_storage import ArchiveStorageNotConfiguredError, get_archive_storage
from libs.custom_inputs import time_duration
from libs.helper import uuid_value
from libs.login import current_user, login_required
from models import Account, App, AppMode, EndUser, WorkflowRunTriggeredFrom
from models import Account, App, AppMode, EndUser, WorkflowArchiveLog, WorkflowRunTriggeredFrom
from services.retention.workflow_run.constants import ARCHIVE_BUNDLE_NAME
from services.workflow_run_service import WorkflowRunService
# Workflow run status choices for filtering
WORKFLOW_RUN_STATUS_CHOICES = ["running", "succeeded", "failed", "stopped", "partial-succeeded"]
EXPORT_SIGNED_URL_EXPIRE_SECONDS = 3600
# Register models for flask_restx to avoid dict type issues in Swagger
# Register in dependency order: base models first, then dependent models
@@ -93,6 +99,15 @@ workflow_run_node_execution_list_model = console_ns.model(
"WorkflowRunNodeExecutionList", workflow_run_node_execution_list_fields_copy
)
workflow_run_export_fields = console_ns.model(
"WorkflowRunExport",
{
"status": fields.String(description="Export status: success/failed"),
"presigned_url": fields.String(description="Pre-signed URL for download", required=False),
"presigned_url_expires_at": fields.String(description="Pre-signed URL expiration time", required=False),
},
)
DEFAULT_REF_TEMPLATE_SWAGGER_2_0 = "#/definitions/{model}"
@@ -181,6 +196,56 @@ class AdvancedChatAppWorkflowRunListApi(Resource):
return result
@console_ns.route("/apps/<uuid:app_id>/workflow-runs/<uuid:run_id>/export")
class WorkflowRunExportApi(Resource):
@console_ns.doc("get_workflow_run_export_url")
@console_ns.doc(description="Generate a download URL for an archived workflow run.")
@console_ns.doc(params={"app_id": "Application ID", "run_id": "Workflow run ID"})
@console_ns.response(200, "Export URL generated", workflow_run_export_fields)
@setup_required
@login_required
@account_initialization_required
@get_app_model()
def get(self, app_model: App, run_id: str):
tenant_id = str(app_model.tenant_id)
app_id = str(app_model.id)
run_id_str = str(run_id)
run_created_at = db.session.scalar(
select(WorkflowArchiveLog.run_created_at)
.where(
WorkflowArchiveLog.tenant_id == tenant_id,
WorkflowArchiveLog.app_id == app_id,
WorkflowArchiveLog.workflow_run_id == run_id_str,
)
.limit(1)
)
if not run_created_at:
return {"code": "archive_log_not_found", "message": "workflow run archive not found"}, 404
prefix = (
f"{tenant_id}/app_id={app_id}/year={run_created_at.strftime('%Y')}/"
f"month={run_created_at.strftime('%m')}/workflow_run_id={run_id_str}"
)
archive_key = f"{prefix}/{ARCHIVE_BUNDLE_NAME}"
try:
archive_storage = get_archive_storage()
except ArchiveStorageNotConfiguredError as e:
return {"code": "archive_storage_not_configured", "message": str(e)}, 500
presigned_url = archive_storage.generate_presigned_url(
archive_key,
expires_in=EXPORT_SIGNED_URL_EXPIRE_SECONDS,
)
expires_at = datetime.now(UTC) + timedelta(seconds=EXPORT_SIGNED_URL_EXPIRE_SECONDS)
return {
"status": "success",
"presigned_url": presigned_url,
"presigned_url_expires_at": expires_at.isoformat(),
}, 200
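A matching sketch for the export endpoint defined above, under the same prefix and authentication assumptions; the response shape follows `workflow_run_export_fields`.
```bash
# Request a pre-signed download URL for an archived workflow run.
curl -s "http://localhost:5001/console/api/apps/$APP_ID/workflow-runs/$RUN_ID/export" \
  -H "Authorization: Bearer $CONSOLE_ACCESS_TOKEN"
# => {"status": "success", "presigned_url": "...", "presigned_url_expires_at": "2026-..."}
```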
@console_ns.route("/apps/<uuid:app_id>/advanced-chat/workflow-runs/count")
class AdvancedChatAppWorkflowRunCountApi(Resource):
@console_ns.doc("get_advanced_chat_workflow_runs_count")

View File

@@ -36,6 +36,16 @@ class NotionEstimatePayload(BaseModel):
doc_language: str = Field(default="English")
class DataSourceNotionListQuery(BaseModel):
dataset_id: str | None = Field(default=None, description="Dataset ID")
credential_id: str = Field(..., description="Credential ID", min_length=1)
datasource_parameters: dict[str, Any] | None = Field(default=None, description="Datasource parameters JSON string")
class DataSourceNotionPreviewQuery(BaseModel):
credential_id: str = Field(..., description="Credential ID", min_length=1)
register_schema_model(console_ns, NotionEstimatePayload)
@@ -136,26 +146,15 @@ class DataSourceNotionListApi(Resource):
def get(self):
current_user, current_tenant_id = current_account_with_tenant()
dataset_id = request.args.get("dataset_id", default=None, type=str)
credential_id = request.args.get("credential_id", default=None, type=str)
if not credential_id:
raise ValueError("Credential id is required.")
query = DataSourceNotionListQuery.model_validate(request.args.to_dict())
# Get datasource_parameters from query string (optional, for GitHub and other datasources)
datasource_parameters_str = request.args.get("datasource_parameters", default=None, type=str)
datasource_parameters = {}
if datasource_parameters_str:
try:
datasource_parameters = json.loads(datasource_parameters_str)
if not isinstance(datasource_parameters, dict):
raise ValueError("datasource_parameters must be a JSON object.")
except json.JSONDecodeError:
raise ValueError("Invalid datasource_parameters JSON format.")
datasource_parameters = query.datasource_parameters or {}
datasource_provider_service = DatasourceProviderService()
credential = datasource_provider_service.get_datasource_credentials(
tenant_id=current_tenant_id,
credential_id=credential_id,
credential_id=query.credential_id,
provider="notion_datasource",
plugin_id="langgenius/notion_datasource",
)
@@ -164,8 +163,8 @@ class DataSourceNotionListApi(Resource):
exist_page_ids = []
with Session(db.engine) as session:
# import notion in the exist dataset
if dataset_id:
dataset = DatasetService.get_dataset(dataset_id)
if query.dataset_id:
dataset = DatasetService.get_dataset(query.dataset_id)
if not dataset:
raise NotFound("Dataset not found.")
if dataset.data_source_type != "notion_import":
@@ -173,7 +172,7 @@ class DataSourceNotionListApi(Resource):
documents = session.scalars(
select(Document).filter_by(
dataset_id=dataset_id,
dataset_id=query.dataset_id,
tenant_id=current_tenant_id,
data_source_type="notion_import",
enabled=True,
@@ -240,13 +239,12 @@ class DataSourceNotionApi(Resource):
def get(self, page_id, page_type):
_, current_tenant_id = current_account_with_tenant()
credential_id = request.args.get("credential_id", default=None, type=str)
if not credential_id:
raise ValueError("Credential id is required.")
query = DataSourceNotionPreviewQuery.model_validate(request.args.to_dict())
datasource_provider_service = DatasourceProviderService()
credential = datasource_provider_service.get_datasource_credentials(
tenant_id=current_tenant_id,
credential_id=credential_id,
credential_id=query.credential_id,
provider="notion_datasource",
plugin_id="langgenius/notion_datasource",
)

View File

@@ -176,7 +176,18 @@ class IndexingEstimatePayload(BaseModel):
return result
register_schema_models(console_ns, DatasetCreatePayload, DatasetUpdatePayload, IndexingEstimatePayload)
class ConsoleDatasetListQuery(BaseModel):
page: int = Field(default=1, description="Page number")
limit: int = Field(default=20, description="Number of items per page")
keyword: str | None = Field(default=None, description="Search keyword")
include_all: bool = Field(default=False, description="Include all datasets")
ids: list[str] = Field(default_factory=list, description="Filter by dataset IDs")
tag_ids: list[str] = Field(default_factory=list, description="Filter by tag IDs")
register_schema_models(
console_ns, DatasetCreatePayload, DatasetUpdatePayload, IndexingEstimatePayload, ConsoleDatasetListQuery
)
def _get_retrieval_methods_by_vector_type(vector_type: str | None, is_mock: bool = False) -> dict[str, list[str]]:
@@ -275,18 +286,19 @@ class DatasetListApi(Resource):
@enterprise_license_required
def get(self):
current_user, current_tenant_id = current_account_with_tenant()
page = request.args.get("page", default=1, type=int)
limit = request.args.get("limit", default=20, type=int)
ids = request.args.getlist("ids")
query = ConsoleDatasetListQuery.model_validate(request.args.to_dict(flat=False))
# provider = request.args.get("provider", default="vendor")
search = request.args.get("keyword", default=None, type=str)
tag_ids = request.args.getlist("tag_ids")
include_all = request.args.get("include_all", default="false").lower() == "true"
if ids:
datasets, total = DatasetService.get_datasets_by_ids(ids, current_tenant_id)
if query.ids:
datasets, total = DatasetService.get_datasets_by_ids(query.ids, current_tenant_id)
else:
datasets, total = DatasetService.get_datasets(
page, limit, current_tenant_id, current_user, search, tag_ids, include_all
query.page,
query.limit,
current_tenant_id,
current_user,
query.keyword,
query.tag_ids,
query.include_all,
)
# check embedding setting
@@ -318,7 +330,13 @@ class DatasetListApi(Resource):
else:
item.update({"partial_member_list": []})
response = {"data": data, "has_more": len(datasets) == limit, "limit": limit, "total": total, "page": page}
response = {
"data": data,
"has_more": len(datasets) == query.limit,
"limit": query.limit,
"total": total,
"page": query.page,
}
return response, 200
@console_ns.doc("create_dataset")

View File

@@ -98,12 +98,19 @@ class BedrockRetrievalPayload(BaseModel):
knowledge_id: str
class ExternalApiTemplateListQuery(BaseModel):
page: int = Field(default=1, description="Page number")
limit: int = Field(default=20, description="Number of items per page")
keyword: str | None = Field(default=None, description="Search keyword")
register_schema_models(
console_ns,
ExternalKnowledgeApiPayload,
ExternalDatasetCreatePayload,
ExternalHitTestingPayload,
BedrockRetrievalPayload,
ExternalApiTemplateListQuery,
)
@@ -124,19 +131,17 @@ class ExternalApiTemplateListApi(Resource):
@account_initialization_required
def get(self):
_, current_tenant_id = current_account_with_tenant()
page = request.args.get("page", default=1, type=int)
limit = request.args.get("limit", default=20, type=int)
search = request.args.get("keyword", default=None, type=str)
query = ExternalApiTemplateListQuery.model_validate(request.args.to_dict())
external_knowledge_apis, total = ExternalDatasetService.get_external_knowledge_apis(
page, limit, current_tenant_id, search
query.page, query.limit, current_tenant_id, query.keyword
)
response = {
"data": [item.to_dict() for item in external_knowledge_apis],
"has_more": len(external_knowledge_apis) == limit,
"limit": limit,
"has_more": len(external_knowledge_apis) == query.limit,
"limit": query.limit,
"total": total,
"page": page,
"page": query.page,
}
return response, 200

View File

@@ -3,7 +3,7 @@ from typing import Any
from flask import request
from flask_restx import Resource, marshal_with
from pydantic import BaseModel
from pydantic import BaseModel, Field
from sqlalchemy import and_, select
from werkzeug.exceptions import BadRequest, Forbidden, NotFound
@@ -28,6 +28,10 @@ class InstalledAppUpdatePayload(BaseModel):
is_pinned: bool | None = None
class InstalledAppsListQuery(BaseModel):
app_id: str | None = Field(default=None, description="App ID to filter by")
logger = logging.getLogger(__name__)
@@ -37,13 +41,13 @@ class InstalledAppsListApi(Resource):
@account_initialization_required
@marshal_with(installed_app_list_fields)
def get(self):
app_id = request.args.get("app_id", default=None, type=str)
query = InstalledAppsListQuery.model_validate(request.args.to_dict())
current_user, current_tenant_id = current_account_with_tenant()
if app_id:
if query.app_id:
installed_apps = db.session.scalars(
select(InstalledApp).where(
and_(InstalledApp.tenant_id == current_tenant_id, InstalledApp.app_id == app_id)
and_(InstalledApp.tenant_id == current_tenant_id, InstalledApp.app_id == query.app_id)
)
).all()
else:

View File

@@ -1,6 +1,7 @@
from flask_restx import Resource, fields
from werkzeug.exceptions import Unauthorized
from libs.login import current_account_with_tenant, login_required
from libs.login import current_account_with_tenant, current_user, login_required
from services.feature_service import FeatureService
from . import console_ns
@@ -39,5 +40,21 @@ class SystemFeatureApi(Resource):
),
)
def get(self):
"""Get system-wide feature configuration"""
return FeatureService.get_system_features().model_dump()
"""Get system-wide feature configuration
NOTE: This endpoint is unauthenticated by design, as it provides system features
data required for dashboard initialization.
Authentication would create circular dependency (can't login without dashboard loading).
Only non-sensitive configuration data should be returned by this endpoint.
"""
# NOTE(QuantumGhost): ideally we should access `current_user.is_authenticated`
# without a try-catch. However, due to the implementation of user loader (the `load_user_from_request`
# in api/extensions/ext_login.py), accessing `current_user.is_authenticated` will
# raise `Unauthorized` exception if authentication token is not provided.
try:
is_authenticated = current_user.is_authenticated
except Unauthorized:
is_authenticated = False
return FeatureService.get_system_features(is_authenticated=is_authenticated).model_dump()

View File

@@ -1,17 +1,17 @@
from flask_restx import Resource, fields
from pydantic import BaseModel, Field
from . import console_ns
from controllers.fastopenapi import console_router
@console_ns.route("/ping")
class PingApi(Resource):
@console_ns.doc("health_check")
@console_ns.doc(description="Health check endpoint for connection testing")
@console_ns.response(
200,
"Success",
console_ns.model("PingResponse", {"result": fields.String(description="Health check result", example="pong")}),
)
def get(self):
"""Health check endpoint for connection testing"""
return {"result": "pong"}
class PingResponse(BaseModel):
result: str = Field(description="Health check result", examples=["pong"])
@console_router.get(
"/ping",
response_model=PingResponse,
tags=["console"],
)
def ping() -> PingResponse:
"""Health check endpoint for connection testing."""
return PingResponse(result="pong")
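A quick smoke test for the migrated route, assuming the FastOpenAPI console router keeps the existing `/console/api` prefix and the API dev server runs on port 5001 as in the README.
```bash
curl -s http://localhost:5001/console/api/ping
# => {"result": "pong"}
```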

View File

@@ -1,20 +1,19 @@
from typing import Literal
from flask import request
from flask_restx import Resource, fields
from pydantic import BaseModel, Field, field_validator
from configs import dify_config
from controllers.fastopenapi import console_router
from libs.helper import EmailStr, extract_remote_ip
from libs.password import valid_password
from models.model import DifySetup, db
from services.account_service import RegisterService, TenantService
from . import console_ns
from .error import AlreadySetupError, NotInitValidateError
from .init_validate import get_init_validate_status
from .wraps import only_edition_self_hosted
DEFAULT_REF_TEMPLATE_SWAGGER_2_0 = "#/definitions/{model}"
class SetupRequestPayload(BaseModel):
email: EmailStr = Field(..., description="Admin email address")
@@ -28,78 +27,66 @@ class SetupRequestPayload(BaseModel):
return valid_password(value)
console_ns.schema_model(
SetupRequestPayload.__name__,
SetupRequestPayload.model_json_schema(ref_template=DEFAULT_REF_TEMPLATE_SWAGGER_2_0),
class SetupStatusResponse(BaseModel):
step: Literal["not_started", "finished"] = Field(description="Setup step status")
setup_at: str | None = Field(default=None, description="Setup completion time (ISO format)")
class SetupResponse(BaseModel):
result: str = Field(description="Setup result", examples=["success"])
@console_router.get(
"/setup",
response_model=SetupStatusResponse,
tags=["console"],
)
def get_setup_status_api() -> SetupStatusResponse:
"""Get system setup status."""
if dify_config.EDITION == "SELF_HOSTED":
setup_status = get_setup_status()
if setup_status and not isinstance(setup_status, bool):
return SetupStatusResponse(step="finished", setup_at=setup_status.setup_at.isoformat())
if setup_status:
return SetupStatusResponse(step="finished")
return SetupStatusResponse(step="not_started")
return SetupStatusResponse(step="finished")
@console_ns.route("/setup")
class SetupApi(Resource):
@console_ns.doc("get_setup_status")
@console_ns.doc(description="Get system setup status")
@console_ns.response(
200,
"Success",
console_ns.model(
"SetupStatusResponse",
{
"step": fields.String(description="Setup step status", enum=["not_started", "finished"]),
"setup_at": fields.String(description="Setup completion time (ISO format)", required=False),
},
),
@console_router.post(
"/setup",
response_model=SetupResponse,
tags=["console"],
status_code=201,
)
@only_edition_self_hosted
def setup_system(payload: SetupRequestPayload) -> SetupResponse:
"""Initialize system setup with admin account."""
if get_setup_status():
raise AlreadySetupError()
tenant_count = TenantService.get_tenant_count()
if tenant_count > 0:
raise AlreadySetupError()
if not get_init_validate_status():
raise NotInitValidateError()
normalized_email = payload.email.lower()
RegisterService.setup(
email=normalized_email,
name=payload.name,
password=payload.password,
ip_address=extract_remote_ip(request),
language=payload.language,
)
def get(self):
"""Get system setup status"""
if dify_config.EDITION == "SELF_HOSTED":
setup_status = get_setup_status()
# Check if setup_status is a DifySetup object rather than a bool
if setup_status and not isinstance(setup_status, bool):
return {"step": "finished", "setup_at": setup_status.setup_at.isoformat()}
elif setup_status:
return {"step": "finished"}
return {"step": "not_started"}
return {"step": "finished"}
@console_ns.doc("setup_system")
@console_ns.doc(description="Initialize system setup with admin account")
@console_ns.expect(console_ns.models[SetupRequestPayload.__name__])
@console_ns.response(
201, "Success", console_ns.model("SetupResponse", {"result": fields.String(description="Setup result")})
)
@console_ns.response(400, "Already setup or validation failed")
@only_edition_self_hosted
def post(self):
"""Initialize system setup with admin account"""
# is set up
if get_setup_status():
raise AlreadySetupError()
# is tenant created
tenant_count = TenantService.get_tenant_count()
if tenant_count > 0:
raise AlreadySetupError()
if not get_init_validate_status():
raise NotInitValidateError()
args = SetupRequestPayload.model_validate(console_ns.payload)
normalized_email = args.email.lower()
# setup
RegisterService.setup(
email=normalized_email,
name=args.name,
password=args.password,
ip_address=extract_remote_ip(request),
language=args.language,
)
return {"result": "success"}, 201
return SetupResponse(result="success")
def get_setup_status():
def get_setup_status() -> DifySetup | bool | None:
if dify_config.EDITION == "SELF_HOSTED":
return db.session.query(DifySetup).first()
else:
return True
return True
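
For reference, the payload model now performs the field-level checks that reqparse used to do, at validation time. A sketch with illustrative values only (valid_password raises on passwords that fail the policy):

payload = SetupRequestPayload.model_validate(
    {
        "email": "admin@example.com",  # EmailStr validates the address format
        "name": "Admin",
        "password": "Difyai123456",    # checked by the valid_password field validator
        "language": "en-US",
    }
)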

View File

@@ -40,6 +40,7 @@ register_schema_models(
TagBasePayload,
TagBindingPayload,
TagBindingRemovePayload,
TagListQueryParam,
)

View File

@@ -1,15 +1,11 @@
import json
import logging
import httpx
from flask import request
from flask_restx import Resource, fields
from packaging import version
from pydantic import BaseModel, Field
from configs import dify_config
from . import console_ns
from controllers.fastopenapi import console_router
logger = logging.getLogger(__name__)
@@ -18,69 +14,61 @@ class VersionQuery(BaseModel):
current_version: str = Field(..., description="Current application version")
console_ns.schema_model(
VersionQuery.__name__,
VersionQuery.model_json_schema(ref_template="#/definitions/{model}"),
class VersionFeatures(BaseModel):
can_replace_logo: bool = Field(description="Whether logo replacement is supported")
model_load_balancing_enabled: bool = Field(description="Whether model load balancing is enabled")
class VersionResponse(BaseModel):
version: str = Field(description="Latest version number")
release_date: str = Field(description="Release date of latest version")
release_notes: str = Field(description="Release notes for latest version")
can_auto_update: bool = Field(description="Whether auto-update is supported")
features: VersionFeatures = Field(description="Feature flags and capabilities")
@console_router.get(
"/version",
response_model=VersionResponse,
tags=["console"],
)
def check_version_update(query: VersionQuery) -> VersionResponse:
"""Check for application version updates."""
check_update_url = dify_config.CHECK_UPDATE_URL
@console_ns.route("/version")
class VersionApi(Resource):
@console_ns.doc("check_version_update")
@console_ns.doc(description="Check for application version updates")
@console_ns.expect(console_ns.models[VersionQuery.__name__])
@console_ns.response(
200,
"Success",
console_ns.model(
"VersionResponse",
{
"version": fields.String(description="Latest version number"),
"release_date": fields.String(description="Release date of latest version"),
"release_notes": fields.String(description="Release notes for latest version"),
"can_auto_update": fields.Boolean(description="Whether auto-update is supported"),
"features": fields.Raw(description="Feature flags and capabilities"),
},
result = VersionResponse(
version=dify_config.project.version,
release_date="",
release_notes="",
can_auto_update=False,
features=VersionFeatures(
can_replace_logo=dify_config.CAN_REPLACE_LOGO,
model_load_balancing_enabled=dify_config.MODEL_LB_ENABLED,
),
)
def get(self):
"""Check for application version updates"""
args = VersionQuery.model_validate(request.args.to_dict(flat=True)) # type: ignore
check_update_url = dify_config.CHECK_UPDATE_URL
result = {
"version": dify_config.project.version,
"release_date": "",
"release_notes": "",
"can_auto_update": False,
"features": {
"can_replace_logo": dify_config.CAN_REPLACE_LOGO,
"model_load_balancing_enabled": dify_config.MODEL_LB_ENABLED,
},
}
if not check_update_url:
return result
try:
response = httpx.get(
check_update_url,
params={"current_version": args.current_version},
timeout=httpx.Timeout(timeout=10.0, connect=3.0),
)
except Exception as error:
logger.warning("Check update version error: %s.", str(error))
result["version"] = args.current_version
return result
content = json.loads(response.content)
if _has_new_version(latest_version=content["version"], current_version=f"{args.current_version}"):
result["version"] = content["version"]
result["release_date"] = content["releaseDate"]
result["release_notes"] = content["releaseNotes"]
result["can_auto_update"] = content["canAutoUpdate"]
if not check_update_url:
return result
try:
response = httpx.get(
check_update_url,
params={"current_version": query.current_version},
timeout=httpx.Timeout(timeout=10.0, connect=3.0),
)
content = response.json()
except Exception as error:
logger.warning("Check update version error: %s.", str(error))
result.version = query.current_version
return result
latest_version = content.get("version", result.version)
if _has_new_version(latest_version=latest_version, current_version=f"{query.current_version}"):
result.version = latest_version
result.release_date = content.get("releaseDate", "")
result.release_notes = content.get("releaseNotes", "")
result.can_auto_update = content.get("canAutoUpdate", False)
return result
def _has_new_version(*, latest_version: str, current_version: str) -> bool:
try:

View File
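
The body of _has_new_version is truncated in the hunk above. As a rough, standalone sketch of a semantic-version comparison with packaging (not a reconstruction of the elided code):

from packaging import version

def newer(latest: str, current: str) -> bool:
    # packaging parses PEP 440 style versions; InvalidVersion is raised for malformed input
    try:
        return version.parse(latest) > version.parse(current)
    except version.InvalidVersion:
        return False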

@@ -0,0 +1,3 @@
from fastopenapi.routers import FlaskRouter
console_router = FlaskRouter()

View File

@@ -87,6 +87,14 @@ class TagUnbindingPayload(BaseModel):
target_id: str
class DatasetListQuery(BaseModel):
page: int = Field(default=1, description="Page number")
limit: int = Field(default=20, description="Number of items per page")
keyword: str | None = Field(default=None, description="Search keyword")
include_all: bool = Field(default=False, description="Include all datasets")
tag_ids: list[str] = Field(default_factory=list, description="Filter by tag IDs")
register_schema_models(
service_api_ns,
DatasetCreatePayload,
@@ -96,6 +104,7 @@ register_schema_models(
TagDeletePayload,
TagBindingPayload,
TagUnbindingPayload,
DatasetListQuery,
)
@@ -113,15 +122,11 @@ class DatasetListApi(DatasetApiResource):
)
def get(self, tenant_id):
"""Resource for getting datasets."""
page = request.args.get("page", default=1, type=int)
limit = request.args.get("limit", default=20, type=int)
query = DatasetListQuery.model_validate(request.args.to_dict(flat=False))
# provider = request.args.get("provider", default="vendor")
search = request.args.get("keyword", default=None, type=str)
tag_ids = request.args.getlist("tag_ids")
include_all = request.args.get("include_all", default="false").lower() == "true"
datasets, total = DatasetService.get_datasets(
page, limit, tenant_id, current_user, search, tag_ids, include_all
query.page, query.limit, tenant_id, current_user, query.keyword, query.tag_ids, query.include_all
)
# check embedding setting
provider_manager = ProviderManager()
@@ -147,7 +152,13 @@ class DatasetListApi(DatasetApiResource):
item["embedding_available"] = False
else:
item["embedding_available"] = True
response = {"data": data, "has_more": len(datasets) == limit, "limit": limit, "total": total, "page": page}
response = {
"data": data,
"has_more": len(datasets) == query.limit,
"limit": query.limit,
"total": total,
"page": query.page,
}
return response, 200
@service_api_ns.expect(service_api_ns.models[DatasetCreatePayload.__name__])

View File
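
DatasetListQuery replaces the individual request.args.get calls; Pydantic's lax coercion turns the string query parameters into typed fields with declared defaults. An illustrative check (values made up):

q = DatasetListQuery.model_validate({"page": "2", "limit": "10", "tag_ids": ["t1", "t2"]})
assert (q.page, q.limit, q.keyword, q.include_all) == (2, 10, None, False)
assert q.tag_ids == ["t1", "t2"]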

@@ -69,7 +69,14 @@ class DocumentTextUpdate(BaseModel):
return self
for m in [ProcessRule, RetrievalModel, DocumentTextCreatePayload, DocumentTextUpdate]:
class DocumentListQuery(BaseModel):
page: int = Field(default=1, description="Page number")
limit: int = Field(default=20, description="Number of items per page")
keyword: str | None = Field(default=None, description="Search keyword")
status: str | None = Field(default=None, description="Document status filter")
for m in [ProcessRule, RetrievalModel, DocumentTextCreatePayload, DocumentTextUpdate, DocumentListQuery]:
service_api_ns.schema_model(m.__name__, m.model_json_schema(ref_template=DEFAULT_REF_TEMPLATE_SWAGGER_2_0)) # type: ignore
@@ -261,17 +268,6 @@ class DocumentAddByFileApi(DatasetApiResource):
@cloud_edition_billing_rate_limit_check("knowledge", "dataset")
def post(self, tenant_id, dataset_id):
"""Create document by upload file."""
args = {}
if "data" in request.form:
args = json.loads(request.form["data"])
if "doc_form" not in args:
args["doc_form"] = "text_model"
if "doc_language" not in args:
args["doc_language"] = "English"
# get dataset info
dataset_id = str(dataset_id)
tenant_id = str(tenant_id)
dataset = db.session.query(Dataset).where(Dataset.tenant_id == tenant_id, Dataset.id == dataset_id).first()
if not dataset:
@@ -280,6 +276,18 @@ class DocumentAddByFileApi(DatasetApiResource):
if dataset.provider == "external":
raise ValueError("External datasets are not supported.")
args = {}
if "data" in request.form:
args = json.loads(request.form["data"])
if "doc_form" not in args:
args["doc_form"] = dataset.chunk_structure or "text_model"
if "doc_language" not in args:
args["doc_language"] = "English"
# get dataset info
dataset_id = str(dataset_id)
tenant_id = str(tenant_id)
indexing_technique = args.get("indexing_technique") or dataset.indexing_technique
if not indexing_technique:
raise ValueError("indexing_technique is required.")
@@ -370,17 +378,6 @@ class DocumentUpdateByFileApi(DatasetApiResource):
@cloud_edition_billing_rate_limit_check("knowledge", "dataset")
def post(self, tenant_id, dataset_id, document_id):
"""Update document by upload file."""
args = {}
if "data" in request.form:
args = json.loads(request.form["data"])
if "doc_form" not in args:
args["doc_form"] = "text_model"
if "doc_language" not in args:
args["doc_language"] = "English"
# get dataset info
dataset_id = str(dataset_id)
tenant_id = str(tenant_id)
dataset = db.session.query(Dataset).where(Dataset.tenant_id == tenant_id, Dataset.id == dataset_id).first()
if not dataset:
@@ -389,6 +386,18 @@ class DocumentUpdateByFileApi(DatasetApiResource):
if dataset.provider == "external":
raise ValueError("External datasets are not supported.")
args = {}
if "data" in request.form:
args = json.loads(request.form["data"])
if "doc_form" not in args:
args["doc_form"] = dataset.chunk_structure or "text_model"
if "doc_language" not in args:
args["doc_language"] = "English"
# get dataset info
dataset_id = str(dataset_id)
tenant_id = str(tenant_id)
# indexing_technique is already set in dataset since this is an update
args["indexing_technique"] = dataset.indexing_technique
@@ -458,34 +467,33 @@ class DocumentListApi(DatasetApiResource):
def get(self, tenant_id, dataset_id):
dataset_id = str(dataset_id)
tenant_id = str(tenant_id)
page = request.args.get("page", default=1, type=int)
limit = request.args.get("limit", default=20, type=int)
search = request.args.get("keyword", default=None, type=str)
status = request.args.get("status", default=None, type=str)
query_params = DocumentListQuery.model_validate(request.args.to_dict())
dataset = db.session.query(Dataset).where(Dataset.tenant_id == tenant_id, Dataset.id == dataset_id).first()
if not dataset:
raise NotFound("Dataset not found.")
query = select(Document).filter_by(dataset_id=str(dataset_id), tenant_id=tenant_id)
if status:
query = DocumentService.apply_display_status_filter(query, status)
if query_params.status:
query = DocumentService.apply_display_status_filter(query, query_params.status)
if search:
search = f"%{search}%"
if query_params.keyword:
search = f"%{query_params.keyword}%"
query = query.where(Document.name.like(search))
query = query.order_by(desc(Document.created_at), desc(Document.position))
paginated_documents = db.paginate(select=query, page=page, per_page=limit, max_per_page=100, error_out=False)
paginated_documents = db.paginate(
select=query, page=query_params.page, per_page=query_params.limit, max_per_page=100, error_out=False
)
documents = paginated_documents.items
response = {
"data": marshal(documents, document_fields),
"has_more": len(documents) == limit,
"limit": limit,
"has_more": len(documents) == query_params.limit,
"limit": query_params.limit,
"total": paginated_documents.total,
"page": page,
"page": query_params.page,
}
return response
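
A sketch of how the query parameters shape the filter above (illustrative values): keyword becomes a LIKE pattern, while page and limit feed db.paginate with their declared defaults when omitted.

params = DocumentListQuery.model_validate({"keyword": "report", "status": "indexing"})
like_pattern = f"%{params.keyword}%"  # "%report%"
assert (params.page, params.limit) == (1, 20)  # defaults apply when not supplied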

View File

@@ -11,7 +11,9 @@ from controllers.service_api.wraps import DatasetApiResource, cloud_edition_bill
from fields.dataset_fields import dataset_metadata_fields
from services.dataset_service import DatasetService
from services.entities.knowledge_entities.knowledge_entities import (
DocumentMetadataOperation,
MetadataArgs,
MetadataDetail,
MetadataOperationData,
)
from services.metadata_service import MetadataService
@@ -22,7 +24,13 @@ class MetadataUpdatePayload(BaseModel):
register_schema_model(service_api_ns, MetadataUpdatePayload)
register_schema_models(service_api_ns, MetadataArgs, MetadataOperationData)
register_schema_models(
service_api_ns,
MetadataArgs,
MetadataDetail,
DocumentMetadataOperation,
MetadataOperationData,
)
@service_api_ns.route("/datasets/<uuid:dataset_id>/metadata")

View File

@@ -17,5 +17,15 @@ class SystemFeatureApi(Resource):
Returns:
dict: System feature configuration object
This endpoint is akin to the `SystemFeatureApi` endpoint in api/controllers/console/feature.py,
except it is intended for use by the web app, instead of the console dashboard.
NOTE: This endpoint is unauthenticated by design, as it provides system features
data required for webapp initialization.
Authentication would create a circular dependency (users cannot authenticate until the webapp has loaded).
Only non-sensitive configuration data should be returned by this endpoint.
"""
return FeatureService.get_system_features().model_dump()

View File

@@ -1,9 +1,11 @@
from flask import make_response, request
from flask_restx import Resource, reqparse
from flask_restx import Resource
from jwt import InvalidTokenError
from pydantic import BaseModel, Field, field_validator
import services
from configs import dify_config
from controllers.common.schema import register_schema_models
from controllers.console.auth.error import (
AuthenticationFailedError,
EmailCodeError,
@@ -18,7 +20,7 @@ from controllers.console.wraps import (
)
from controllers.web import web_ns
from controllers.web.wraps import decode_jwt_token
from libs.helper import email
from libs.helper import EmailStr
from libs.passport import PassportService
from libs.password import valid_password
from libs.token import (
@@ -30,10 +32,35 @@ from services.app_service import AppService
from services.webapp_auth_service import WebAppAuthService
class LoginPayload(BaseModel):
email: EmailStr
password: str
@field_validator("password")
@classmethod
def validate_password(cls, value: str) -> str:
return valid_password(value)
class EmailCodeLoginSendPayload(BaseModel):
email: EmailStr
language: str | None = None
class EmailCodeLoginVerifyPayload(BaseModel):
email: EmailStr
code: str
token: str = Field(min_length=1)
register_schema_models(web_ns, LoginPayload, EmailCodeLoginSendPayload, EmailCodeLoginVerifyPayload)
@web_ns.route("/login")
class LoginApi(Resource):
"""Resource for web app email/password login."""
@web_ns.expect(web_ns.models[LoginPayload.__name__])
@setup_required
@only_edition_enterprise
@web_ns.doc("web_app_login")
@@ -50,15 +77,10 @@ class LoginApi(Resource):
@decrypt_password_field
def post(self):
"""Authenticate user and login."""
parser = (
reqparse.RequestParser()
.add_argument("email", type=email, required=True, location="json")
.add_argument("password", type=valid_password, required=True, location="json")
)
args = parser.parse_args()
payload = LoginPayload.model_validate(web_ns.payload or {})
try:
account = WebAppAuthService.authenticate(args["email"], args["password"])
account = WebAppAuthService.authenticate(payload.email, payload.password)
except services.errors.account.AccountLoginError:
raise AccountBannedError()
except services.errors.account.AccountPasswordError:
@@ -145,6 +167,7 @@ class EmailCodeLoginSendEmailApi(Resource):
@only_edition_enterprise
@web_ns.doc("send_email_code_login")
@web_ns.doc(description="Send email verification code for login")
@web_ns.expect(web_ns.models[EmailCodeLoginSendPayload.__name__])
@web_ns.doc(
responses={
200: "Email code sent successfully",
@@ -153,19 +176,14 @@ class EmailCodeLoginSendEmailApi(Resource):
}
)
def post(self):
parser = (
reqparse.RequestParser()
.add_argument("email", type=email, required=True, location="json")
.add_argument("language", type=str, required=False, location="json")
)
args = parser.parse_args()
payload = EmailCodeLoginSendPayload.model_validate(web_ns.payload or {})
if args["language"] is not None and args["language"] == "zh-Hans":
if payload.language == "zh-Hans":
language = "zh-Hans"
else:
language = "en-US"
account = WebAppAuthService.get_user_through_email(args["email"])
account = WebAppAuthService.get_user_through_email(payload.email)
if account is None:
raise AuthenticationFailedError()
else:
@@ -179,6 +197,7 @@ class EmailCodeLoginApi(Resource):
@only_edition_enterprise
@web_ns.doc("verify_email_code_login")
@web_ns.doc(description="Verify email code and complete login")
@web_ns.expect(web_ns.models[EmailCodeLoginVerifyPayload.__name__])
@web_ns.doc(
responses={
200: "Email code verified and login successful",
@@ -189,17 +208,11 @@ class EmailCodeLoginApi(Resource):
)
@decrypt_code_field
def post(self):
parser = (
reqparse.RequestParser()
.add_argument("email", type=str, required=True, location="json")
.add_argument("code", type=str, required=True, location="json")
.add_argument("token", type=str, required=True, location="json")
)
args = parser.parse_args()
payload = EmailCodeLoginVerifyPayload.model_validate(web_ns.payload or {})
user_email = args["email"].lower()
user_email = payload.email.lower()
token_data = WebAppAuthService.get_email_code_login_data(args["token"])
token_data = WebAppAuthService.get_email_code_login_data(payload.token)
if token_data is None:
raise InvalidTokenError()
@@ -210,10 +223,10 @@ class EmailCodeLoginApi(Resource):
if normalized_token_email != user_email:
raise InvalidEmailError()
if token_data["code"] != args["code"]:
if token_data["code"] != payload.code:
raise EmailCodeError()
WebAppAuthService.revoke_email_code_login_token(args["token"])
WebAppAuthService.revoke_email_code_login_token(payload.token)
account = WebAppAuthService.get_user_through_email(token_email)
if not account:
raise AuthenticationFailedError()

View File
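
The three payload models above move the field checks into parsing. A minimal sketch of what a malformed body now produces, assuming EmailStr enforces address format as its name suggests (illustrative values, Pydantic v2 API):

from pydantic import ValidationError

try:
    LoginPayload.model_validate({"email": "not-an-email", "password": "short"})
except ValidationError as exc:
    # one error per failing field: the email format check and the valid_password validator
    print(exc.error_count())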

@@ -1,8 +1,10 @@
import logging
from typing import Any
from flask_restx import reqparse
from pydantic import BaseModel, Field
from werkzeug.exceptions import InternalServerError
from controllers.common.schema import register_schema_models
from controllers.web import web_ns
from controllers.web.error import (
CompletionRequestError,
@@ -27,19 +29,22 @@ from models.model import App, AppMode, EndUser
from services.app_generate_service import AppGenerateService
from services.errors.llm import InvokeRateLimitError
class WorkflowRunPayload(BaseModel):
inputs: dict[str, Any] = Field(description="Input variables for the workflow")
files: list[dict[str, Any]] | None = Field(default=None, description="Files to be processed by the workflow")
logger = logging.getLogger(__name__)
register_schema_models(web_ns, WorkflowRunPayload)
@web_ns.route("/workflows/run")
class WorkflowRunApi(WebApiResource):
@web_ns.doc("Run Workflow")
@web_ns.doc(description="Execute a workflow with provided inputs and files.")
@web_ns.doc(
params={
"inputs": {"description": "Input variables for the workflow", "type": "object", "required": True},
"files": {"description": "Files to be processed by the workflow", "type": "array", "required": False},
}
)
@web_ns.expect(web_ns.models[WorkflowRunPayload.__name__])
@web_ns.doc(
responses={
200: "Success",
@@ -58,12 +63,8 @@ class WorkflowRunApi(WebApiResource):
if app_mode != AppMode.WORKFLOW:
raise NotWorkflowAppError()
parser = (
reqparse.RequestParser()
.add_argument("inputs", type=dict, required=True, nullable=False, location="json")
.add_argument("files", type=list, required=False, location="json")
)
args = parser.parse_args()
payload = WorkflowRunPayload.model_validate(web_ns.payload or {})
args = payload.model_dump(exclude_none=True)
try:
response = AppGenerateService.generate(

View File
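
model_dump(exclude_none=True) keeps the downstream args dict in its previous shape when files is omitted; an illustrative check:

payload = WorkflowRunPayload.model_validate({"inputs": {"query": "hi"}})
assert payload.model_dump(exclude_none=True) == {"inputs": {"query": "hi"}}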

@@ -1,9 +1,11 @@
from __future__ import annotations
import contextvars
import logging
import threading
import uuid
from collections.abc import Generator, Mapping
from typing import Any, Literal, Union, overload
from typing import TYPE_CHECKING, Any, Literal, Union, overload
from flask import Flask, current_app
from pydantic import ValidationError
@@ -13,6 +15,9 @@ from sqlalchemy.orm import Session, sessionmaker
import contexts
from configs import dify_config
from constants import UUID_NIL
if TYPE_CHECKING:
from controllers.console.app.workflow import LoopNodeRunPayload
from core.app.app_config.features.file_upload.manager import FileUploadConfigManager
from core.app.apps.advanced_chat.app_config_manager import AdvancedChatAppConfigManager
from core.app.apps.advanced_chat.app_runner import AdvancedChatAppRunner
@@ -304,7 +309,7 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
workflow: Workflow,
node_id: str,
user: Account | EndUser,
args: Mapping,
args: LoopNodeRunPayload,
streaming: bool = True,
) -> Mapping[str, Any] | Generator[str | Mapping[str, Any], Any, None]:
"""
@@ -320,7 +325,7 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
if not node_id:
raise ValueError("node_id is required")
if args.get("inputs") is None:
if args.inputs is None:
raise ValueError("inputs is required")
# convert to app config
@@ -338,7 +343,7 @@ class AdvancedChatAppGenerator(MessageBasedAppGenerator):
stream=streaming,
invoke_from=InvokeFrom.DEBUGGER,
extras={"auto_generate_conversation_name": False},
single_loop_run=AdvancedChatAppGenerateEntity.SingleLoopRunEntity(node_id=node_id, inputs=args["inputs"]),
single_loop_run=AdvancedChatAppGenerateEntity.SingleLoopRunEntity(node_id=node_id, inputs=args.inputs),
)
contexts.plugin_tool_providers.set({})
contexts.plugin_tool_providers_lock.set(threading.Lock())
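
The TYPE_CHECKING import above exists only for the type checker; together with `from __future__ import annotations` the controller module is never imported at runtime, which sidesteps the circular import between the controller and the generator. The same pattern in a minimal standalone sketch (function name illustrative):

from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # imported only during static analysis, never at runtime
    from controllers.console.app.workflow import LoopNodeRunPayload

def single_loop_generate(args: LoopNodeRunPayload) -> None:
    # annotations remain strings at runtime, so the name need not be importable here
    ...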

View File

@@ -236,4 +236,7 @@ class AgentChatAppRunner(AppRunner):
queue_manager=queue_manager,
stream=application_generate_entity.stream,
agent=True,
message_id=message.id,
user_id=application_generate_entity.user_id,
tenant_id=app_config.tenant_id,
)

View File

@@ -1,6 +1,8 @@
import base64
import logging
import time
from collections.abc import Generator, Mapping, Sequence
from mimetypes import guess_extension
from typing import TYPE_CHECKING, Any, Union
from core.app.app_config.entities import ExternalDataVariableEntity, PromptTemplateEntity
@@ -11,10 +13,16 @@ from core.app.entities.app_invoke_entities import (
InvokeFrom,
ModelConfigWithCredentialsEntity,
)
from core.app.entities.queue_entities import QueueAgentMessageEvent, QueueLLMChunkEvent, QueueMessageEndEvent
from core.app.entities.queue_entities import (
QueueAgentMessageEvent,
QueueLLMChunkEvent,
QueueMessageEndEvent,
QueueMessageFileEvent,
)
from core.app.features.annotation_reply.annotation_reply import AnnotationReplyFeature
from core.app.features.hosting_moderation.hosting_moderation import HostingModerationFeature
from core.external_data_tool.external_data_fetch import ExternalDataFetch
from core.file.enums import FileTransferMethod, FileType
from core.memory.token_buffer_memory import TokenBufferMemory
from core.model_manager import ModelInstance
from core.model_runtime.entities.llm_entities import LLMResult, LLMResultChunk, LLMResultChunkDelta, LLMUsage
@@ -22,6 +30,7 @@ from core.model_runtime.entities.message_entities import (
AssistantPromptMessage,
ImagePromptMessageContent,
PromptMessage,
TextPromptMessageContent,
)
from core.model_runtime.entities.model_entities import ModelPropertyKey
from core.model_runtime.errors.invoke import InvokeBadRequestError
@@ -29,7 +38,10 @@ from core.moderation.input_moderation import InputModeration
from core.prompt.advanced_prompt_transform import AdvancedPromptTransform
from core.prompt.entities.advanced_prompt_entities import ChatModelMessage, CompletionModelPromptTemplate, MemoryConfig
from core.prompt.simple_prompt_transform import ModelMode, SimplePromptTransform
from models.model import App, AppMode, Message, MessageAnnotation
from core.tools.tool_file_manager import ToolFileManager
from extensions.ext_database import db
from models.enums import CreatorUserRole
from models.model import App, AppMode, Message, MessageAnnotation, MessageFile
if TYPE_CHECKING:
from core.file.models import File
@@ -203,6 +215,9 @@ class AppRunner:
queue_manager: AppQueueManager,
stream: bool,
agent: bool = False,
message_id: str | None = None,
user_id: str | None = None,
tenant_id: str | None = None,
):
"""
Handle invoke result
@@ -210,21 +225,41 @@ class AppRunner:
:param queue_manager: application queue manager
:param stream: stream
:param agent: agent
:param message_id: message id for multimodal output
:param user_id: user id for multimodal output
:param tenant_id: tenant id for multimodal output
:return:
"""
if not stream and isinstance(invoke_result, LLMResult):
self._handle_invoke_result_direct(invoke_result=invoke_result, queue_manager=queue_manager, agent=agent)
self._handle_invoke_result_direct(
invoke_result=invoke_result,
queue_manager=queue_manager,
)
elif stream and isinstance(invoke_result, Generator):
self._handle_invoke_result_stream(invoke_result=invoke_result, queue_manager=queue_manager, agent=agent)
self._handle_invoke_result_stream(
invoke_result=invoke_result,
queue_manager=queue_manager,
agent=agent,
message_id=message_id,
user_id=user_id,
tenant_id=tenant_id,
)
else:
raise NotImplementedError(f"unsupported invoke result type: {type(invoke_result)}")
def _handle_invoke_result_direct(self, invoke_result: LLMResult, queue_manager: AppQueueManager, agent: bool):
def _handle_invoke_result_direct(
self,
invoke_result: LLMResult,
queue_manager: AppQueueManager,
):
"""
Handle invoke result direct
:param invoke_result: invoke result
:param queue_manager: application queue manager
:param agent: agent
:param message_id: message id for multimodal output
:param user_id: user id for multimodal output
:param tenant_id: tenant id for multimodal output
:return:
"""
queue_manager.publish(
@@ -235,13 +270,22 @@ class AppRunner:
)
def _handle_invoke_result_stream(
self, invoke_result: Generator[LLMResultChunk, None, None], queue_manager: AppQueueManager, agent: bool
self,
invoke_result: Generator[LLMResultChunk, None, None],
queue_manager: AppQueueManager,
agent: bool,
message_id: str | None = None,
user_id: str | None = None,
tenant_id: str | None = None,
):
"""
Handle invoke result
:param invoke_result: invoke result
:param queue_manager: application queue manager
:param agent: agent
:param message_id: message id for multimodal output
:param user_id: user id for multimodal output
:param tenant_id: tenant id for multimodal output
:return:
"""
model: str = ""
@@ -259,12 +303,26 @@ class AppRunner:
text += message.content
elif isinstance(message.content, list):
for content in message.content:
if not isinstance(content, str):
# TODO(QuantumGhost): Add multimodal output support for easy ui.
_logger.warning("received multimodal output, type=%s", type(content))
if isinstance(content, str):
text += content
elif isinstance(content, TextPromptMessageContent):
text += content.data
elif isinstance(content, ImagePromptMessageContent):
if message_id and user_id and tenant_id:
try:
self._handle_multimodal_image_content(
content=content,
message_id=message_id,
user_id=user_id,
tenant_id=tenant_id,
queue_manager=queue_manager,
)
except Exception:
_logger.exception("Failed to handle multimodal image output")
else:
_logger.warning("Received multimodal output but missing required parameters")
else:
text += content # failback to str
text += content.data if hasattr(content, "data") else str(content)
if not model:
model = result.model
@@ -289,6 +347,101 @@ class AppRunner:
PublishFrom.APPLICATION_MANAGER,
)
def _handle_multimodal_image_content(
self,
content: ImagePromptMessageContent,
message_id: str,
user_id: str,
tenant_id: str,
queue_manager: AppQueueManager,
):
"""
Handle multimodal image content from LLM response.
Save the image and create a MessageFile record.
:param content: ImagePromptMessageContent instance
:param message_id: message id
:param user_id: user id
:param tenant_id: tenant id
:param queue_manager: queue manager
:return:
"""
_logger.info("Handling multimodal image content for message %s", message_id)
image_url = content.url
base64_data = content.base64_data
_logger.info("Image URL: %s, Base64 data present: %s", image_url, base64_data)
if not image_url and not base64_data:
_logger.warning("Image content has neither URL nor base64 data")
return
tool_file_manager = ToolFileManager()
# Save the image file
try:
if image_url:
# Download image from URL
_logger.info("Downloading image from URL: %s", image_url)
tool_file = tool_file_manager.create_file_by_url(
user_id=user_id,
tenant_id=tenant_id,
file_url=image_url,
conversation_id=None,
)
_logger.info("Image saved successfully, tool_file_id: %s", tool_file.id)
elif base64_data:
if base64_data.startswith("data:"):
base64_data = base64_data.split(",", 1)[1]
image_binary = base64.b64decode(base64_data)
mimetype = content.mime_type or "image/png"
extension = guess_extension(mimetype) or ".png"
tool_file = tool_file_manager.create_file_by_raw(
user_id=user_id,
tenant_id=tenant_id,
conversation_id=None,
file_binary=image_binary,
mimetype=mimetype,
filename=f"generated_image{extension}",
)
_logger.info("Image saved successfully, tool_file_id: %s", tool_file.id)
else:
return
except Exception:
_logger.exception("Failed to save image file")
return
# Create MessageFile record
message_file = MessageFile(
message_id=message_id,
type=FileType.IMAGE,
transfer_method=FileTransferMethod.TOOL_FILE,
belongs_to="assistant",
url=f"/files/tools/{tool_file.id}",
upload_file_id=tool_file.id,
created_by_role=(
CreatorUserRole.ACCOUNT
if queue_manager.invoke_from in {InvokeFrom.DEBUGGER, InvokeFrom.EXPLORE}
else CreatorUserRole.END_USER
),
created_by=user_id,
)
db.session.add(message_file)
db.session.commit()
db.session.refresh(message_file)
# Publish QueueMessageFileEvent
queue_manager.publish(
QueueMessageFileEvent(message_file_id=message_file.id),
PublishFrom.APPLICATION_MANAGER,
)
_logger.info("QueueMessageFileEvent published for message_file_id: %s", message_file.id)
def moderation_for_inputs(
self,
*,

View File
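
For the base64 branch of _handle_multimodal_image_content, the data-URI prefix handling and extension lookup behave as in this small sketch (payload value illustrative):

import base64
from mimetypes import guess_extension

base64_data = "data:image/png;base64,iVBORw0KGgo="
if base64_data.startswith("data:"):
    # strip the "data:<mime>;base64," prefix before decoding
    base64_data = base64_data.split(",", 1)[1]
image_binary = base64.b64decode(base64_data)
extension = guess_extension("image/png") or ".png"  # ".png"
filename = f"generated_image{extension}"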

@@ -226,5 +226,10 @@ class ChatAppRunner(AppRunner):
# handle invoke result
self._handle_invoke_result(
invoke_result=invoke_result, queue_manager=queue_manager, stream=application_generate_entity.stream
invoke_result=invoke_result,
queue_manager=queue_manager,
stream=application_generate_entity.stream,
message_id=message.id,
user_id=application_generate_entity.user_id,
tenant_id=app_config.tenant_id,
)

View File

@@ -184,5 +184,10 @@ class CompletionAppRunner(AppRunner):
# handle invoke result
self._handle_invoke_result(
invoke_result=invoke_result, queue_manager=queue_manager, stream=application_generate_entity.stream
invoke_result=invoke_result,
queue_manager=queue_manager,
stream=application_generate_entity.stream,
message_id=message.id,
user_id=application_generate_entity.user_id,
tenant_id=app_config.tenant_id,
)

View File

@@ -1,9 +1,11 @@
from __future__ import annotations
import contextvars
import logging
import threading
import uuid
from collections.abc import Generator, Mapping, Sequence
from typing import Any, Literal, Union, overload
from typing import TYPE_CHECKING, Any, Literal, Union, overload
from flask import Flask, current_app
from pydantic import ValidationError
@@ -40,6 +42,9 @@ from models import Account, App, EndUser, Workflow, WorkflowNodeExecutionTrigger
from models.enums import WorkflowRunTriggeredFrom
from services.workflow_draft_variable_service import DraftVarLoader, WorkflowDraftVariableService
if TYPE_CHECKING:
from controllers.console.app.workflow import LoopNodeRunPayload
SKIP_PREPARE_USER_INPUTS_KEY = "_skip_prepare_user_inputs"
logger = logging.getLogger(__name__)
@@ -381,7 +386,7 @@ class WorkflowAppGenerator(BaseAppGenerator):
workflow: Workflow,
node_id: str,
user: Account | EndUser,
args: Mapping[str, Any],
args: LoopNodeRunPayload,
streaming: bool = True,
) -> Mapping[str, Any] | Generator[str | Mapping[str, Any], None, None]:
"""
@@ -397,7 +402,7 @@ class WorkflowAppGenerator(BaseAppGenerator):
if not node_id:
raise ValueError("node_id is required")
if args.get("inputs") is None:
if args.inputs is None:
raise ValueError("inputs is required")
# convert to app config
@@ -413,7 +418,7 @@ class WorkflowAppGenerator(BaseAppGenerator):
stream=streaming,
invoke_from=InvokeFrom.DEBUGGER,
extras={"auto_generate_conversation_name": False},
single_loop_run=WorkflowAppGenerateEntity.SingleLoopRunEntity(node_id=node_id, inputs=args["inputs"]),
single_loop_run=WorkflowAppGenerateEntity.SingleLoopRunEntity(node_id=node_id, inputs=args.inputs or {}),
workflow_execution_id=str(uuid.uuid4()),
)
contexts.plugin_tool_providers.set({})

View File

@@ -166,18 +166,22 @@ class WorkflowBasedAppRunner:
# Determine which type of single node execution and get graph/variable_pool
if single_iteration_run:
graph, variable_pool = self._get_graph_and_variable_pool_of_single_iteration(
graph, variable_pool = self._get_graph_and_variable_pool_for_single_node_run(
workflow=workflow,
node_id=single_iteration_run.node_id,
user_inputs=dict(single_iteration_run.inputs),
graph_runtime_state=graph_runtime_state,
node_type_filter_key="iteration_id",
node_type_label="iteration",
)
elif single_loop_run:
graph, variable_pool = self._get_graph_and_variable_pool_of_single_loop(
graph, variable_pool = self._get_graph_and_variable_pool_for_single_node_run(
workflow=workflow,
node_id=single_loop_run.node_id,
user_inputs=dict(single_loop_run.inputs),
graph_runtime_state=graph_runtime_state,
node_type_filter_key="loop_id",
node_type_label="loop",
)
else:
raise ValueError("Neither single_iteration_run nor single_loop_run is specified")
@@ -314,44 +318,6 @@ class WorkflowBasedAppRunner:
return graph, variable_pool
def _get_graph_and_variable_pool_of_single_iteration(
self,
workflow: Workflow,
node_id: str,
user_inputs: dict[str, Any],
graph_runtime_state: GraphRuntimeState,
) -> tuple[Graph, VariablePool]:
"""
Get variable pool of single iteration
"""
return self._get_graph_and_variable_pool_for_single_node_run(
workflow=workflow,
node_id=node_id,
user_inputs=user_inputs,
graph_runtime_state=graph_runtime_state,
node_type_filter_key="iteration_id",
node_type_label="iteration",
)
def _get_graph_and_variable_pool_of_single_loop(
self,
workflow: Workflow,
node_id: str,
user_inputs: dict[str, Any],
graph_runtime_state: GraphRuntimeState,
) -> tuple[Graph, VariablePool]:
"""
Get variable pool of single loop
"""
return self._get_graph_and_variable_pool_for_single_node_run(
workflow=workflow,
node_id=node_id,
user_inputs=user_inputs,
graph_runtime_state=graph_runtime_state,
node_type_filter_key="loop_id",
node_type_label="loop",
)
def _handle_event(self, workflow_entry: WorkflowEntry, event: GraphEngineEvent):
"""
Handle event

View File

@@ -39,6 +39,7 @@ from core.app.entities.task_entities import (
MessageAudioEndStreamResponse,
MessageAudioStreamResponse,
MessageEndStreamResponse,
StreamEvent,
StreamResponse,
)
from core.app.task_pipeline.based_generate_task_pipeline import BasedGenerateTaskPipeline
@@ -70,6 +71,7 @@ class EasyUIBasedGenerateTaskPipeline(BasedGenerateTaskPipeline):
_task_state: EasyUITaskState
_application_generate_entity: Union[ChatAppGenerateEntity, CompletionAppGenerateEntity, AgentChatAppGenerateEntity]
_precomputed_event_type: StreamEvent | None = None
def __init__(
self,
@@ -342,11 +344,15 @@ class EasyUIBasedGenerateTaskPipeline(BasedGenerateTaskPipeline):
self._task_state.llm_result.message.content = current_content
if isinstance(event, QueueLLMChunkEvent):
event_type = self._message_cycle_manager.get_message_event_type(message_id=self._message_id)
# Determine the event type once, on first LLM chunk, and reuse for subsequent chunks
if not hasattr(self, "_precomputed_event_type") or self._precomputed_event_type is None:
self._precomputed_event_type = self._message_cycle_manager.get_message_event_type(
message_id=self._message_id
)
yield self._message_cycle_manager.message_to_stream_response(
answer=cast(str, delta_text),
message_id=self._message_id,
event_type=event_type,
event_type=self._precomputed_event_type,
)
else:
yield self._agent_message_to_stream_response(

View File

@@ -5,7 +5,7 @@ from threading import Thread
from typing import Union
from flask import Flask, current_app
from sqlalchemy import exists, select
from sqlalchemy import select
from sqlalchemy.orm import Session
from configs import dify_config
@@ -30,6 +30,7 @@ from core.app.entities.task_entities import (
StreamEvent,
WorkflowTaskState,
)
from core.db.session_factory import session_factory
from core.llm_generator.llm_generator import LLMGenerator
from core.tools.signature import sign_tool_file
from extensions.ext_database import db
@@ -57,13 +58,15 @@ class MessageCycleManager:
self._message_has_file: set[str] = set()
def get_message_event_type(self, message_id: str) -> StreamEvent:
# Fast path: cached determination from prior QueueMessageFileEvent
if message_id in self._message_has_file:
return StreamEvent.MESSAGE_FILE
with Session(db.engine, expire_on_commit=False) as session:
has_file = session.query(exists().where(MessageFile.message_id == message_id)).scalar()
# Use SQLAlchemy 2.x style session.scalar(select(...))
with session_factory.create_session() as session:
message_file = session.scalar(select(MessageFile).where(MessageFile.message_id == message_id))
if has_file:
if message_file:
self._message_has_file.add(message_id)
return StreamEvent.MESSAGE_FILE
@@ -199,6 +202,8 @@ class MessageCycleManager:
message_file = session.scalar(select(MessageFile).where(MessageFile.id == event.message_file_id))
if message_file and message_file.url is not None:
self._message_has_file.add(message_file.message_id)
# get tool file id
tool_file_id = message_file.url.split("/")[-1]
# trim extension

View File

@@ -1,4 +1,4 @@
from collections.abc import Generator, Mapping
from collections.abc import Generator
from typing import Any
from core.datasource.__base.datasource_plugin import DatasourcePlugin
@@ -34,7 +34,7 @@ class OnlineDocumentDatasourcePlugin(DatasourcePlugin):
def get_online_document_pages(
self,
user_id: str,
datasource_parameters: Mapping[str, Any],
datasource_parameters: dict[str, Any],
provider_type: str,
) -> Generator[OnlineDocumentPagesMessage, None, None]:
manager = PluginDatasourceManager()

View File

@@ -1,7 +1,7 @@
import logging
import time
import uuid
from collections.abc import Generator, Sequence
from collections.abc import Callable, Generator, Iterator, Sequence
from typing import Union
from pydantic import ConfigDict
@@ -30,6 +30,142 @@ def _gen_tool_call_id() -> str:
return f"chatcmpl-tool-{str(uuid.uuid4().hex)}"
def _run_callbacks(callbacks: Sequence[Callback] | None, *, event: str, invoke: Callable[[Callback], None]) -> None:
if not callbacks:
return
for callback in callbacks:
try:
invoke(callback)
except Exception as e:
if callback.raise_error:
raise
logger.warning("Callback %s %s failed with error %s", callback.__class__.__name__, event, e)
def _get_or_create_tool_call(
existing_tools_calls: list[AssistantPromptMessage.ToolCall],
tool_call_id: str,
) -> AssistantPromptMessage.ToolCall:
"""
Get or create a tool call by ID.
If `tool_call_id` is empty, returns the most recently created tool call.
"""
if not tool_call_id:
if not existing_tools_calls:
raise ValueError("tool_call_id is empty but no existing tool call is available to apply the delta")
return existing_tools_calls[-1]
tool_call = next((tool_call for tool_call in existing_tools_calls if tool_call.id == tool_call_id), None)
if tool_call is None:
tool_call = AssistantPromptMessage.ToolCall(
id=tool_call_id,
type="function",
function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="", arguments=""),
)
existing_tools_calls.append(tool_call)
return tool_call
def _merge_tool_call_delta(
tool_call: AssistantPromptMessage.ToolCall,
delta: AssistantPromptMessage.ToolCall,
) -> None:
if delta.id:
tool_call.id = delta.id
if delta.type:
tool_call.type = delta.type
if delta.function.name:
tool_call.function.name = delta.function.name
if delta.function.arguments:
tool_call.function.arguments += delta.function.arguments
def _build_llm_result_from_first_chunk(
model: str,
prompt_messages: Sequence[PromptMessage],
chunks: Iterator[LLMResultChunk],
) -> LLMResult:
"""
Build a single `LLMResult` from the first returned chunk.
This is used for `stream=False` because the plugin side may still return the response as a chunked stream.
"""
content = ""
content_list: list[PromptMessageContentUnionTypes] = []
usage = LLMUsage.empty_usage()
system_fingerprint: str | None = None
tools_calls: list[AssistantPromptMessage.ToolCall] = []
first_chunk = next(chunks, None)
if first_chunk is not None:
if isinstance(first_chunk.delta.message.content, str):
content += first_chunk.delta.message.content
elif isinstance(first_chunk.delta.message.content, list):
content_list.extend(first_chunk.delta.message.content)
if first_chunk.delta.message.tool_calls:
_increase_tool_call(first_chunk.delta.message.tool_calls, tools_calls)
usage = first_chunk.delta.usage or LLMUsage.empty_usage()
system_fingerprint = first_chunk.system_fingerprint
return LLMResult(
model=model,
prompt_messages=prompt_messages,
message=AssistantPromptMessage(
content=content or content_list,
tool_calls=tools_calls,
),
usage=usage,
system_fingerprint=system_fingerprint,
)
def _invoke_llm_via_plugin(
*,
tenant_id: str,
user_id: str,
plugin_id: str,
provider: str,
model: str,
credentials: dict,
model_parameters: dict,
prompt_messages: Sequence[PromptMessage],
tools: list[PromptMessageTool] | None,
stop: Sequence[str] | None,
stream: bool,
) -> Union[LLMResult, Generator[LLMResultChunk, None, None]]:
from core.plugin.impl.model import PluginModelClient
plugin_model_manager = PluginModelClient()
return plugin_model_manager.invoke_llm(
tenant_id=tenant_id,
user_id=user_id,
plugin_id=plugin_id,
provider=provider,
model=model,
credentials=credentials,
model_parameters=model_parameters,
prompt_messages=list(prompt_messages),
tools=tools,
stop=list(stop) if stop else None,
stream=stream,
)
def _normalize_non_stream_plugin_result(
model: str,
prompt_messages: Sequence[PromptMessage],
result: Union[LLMResult, Iterator[LLMResultChunk]],
) -> LLMResult:
if isinstance(result, LLMResult):
return result
return _build_llm_result_from_first_chunk(model=model, prompt_messages=prompt_messages, chunks=result)
def _increase_tool_call(
new_tool_calls: list[AssistantPromptMessage.ToolCall], existing_tools_calls: list[AssistantPromptMessage.ToolCall]
):
@@ -40,42 +176,13 @@ def _increase_tool_call(
:param existing_tools_calls: List of existing tool calls to be modified IN-PLACE.
"""
def get_tool_call(tool_call_id: str):
"""
Get or create a tool call by ID
:param tool_call_id: tool call ID
:return: existing or new tool call
"""
if not tool_call_id:
return existing_tools_calls[-1]
_tool_call = next((_tool_call for _tool_call in existing_tools_calls if _tool_call.id == tool_call_id), None)
if _tool_call is None:
_tool_call = AssistantPromptMessage.ToolCall(
id=tool_call_id,
type="function",
function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="", arguments=""),
)
existing_tools_calls.append(_tool_call)
return _tool_call
for new_tool_call in new_tool_calls:
# generate ID for tool calls with function name but no ID to track them
if new_tool_call.function.name and not new_tool_call.id:
new_tool_call.id = _gen_tool_call_id()
# get tool call
tool_call = get_tool_call(new_tool_call.id)
# update tool call
if new_tool_call.id:
tool_call.id = new_tool_call.id
if new_tool_call.type:
tool_call.type = new_tool_call.type
if new_tool_call.function.name:
tool_call.function.name = new_tool_call.function.name
if new_tool_call.function.arguments:
tool_call.function.arguments += new_tool_call.function.arguments
tool_call = _get_or_create_tool_call(existing_tools_calls, new_tool_call.id)
_merge_tool_call_delta(tool_call, new_tool_call)
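
Taken together, the two helpers reassemble a tool call that arrives split across stream chunks; a sketch with illustrative deltas:

existing: list[AssistantPromptMessage.ToolCall] = []
deltas = [
    AssistantPromptMessage.ToolCall(
        id="call_1",
        type="function",
        function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="get_weather", arguments='{"city": '),
    ),
    AssistantPromptMessage.ToolCall(
        id="",  # continuation chunk: no id, so the delta is applied to the last tool call
        type="function",
        function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="", arguments='"Paris"}'),
    ),
]
_increase_tool_call(deltas, existing)
assert existing[0].function.arguments == '{"city": "Paris"}'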
class LargeLanguageModel(AIModel):
@@ -141,10 +248,7 @@ class LargeLanguageModel(AIModel):
result: Union[LLMResult, Generator[LLMResultChunk, None, None]]
try:
from core.plugin.impl.model import PluginModelClient
plugin_model_manager = PluginModelClient()
result = plugin_model_manager.invoke_llm(
result = _invoke_llm_via_plugin(
tenant_id=self.tenant_id,
user_id=user or "unknown",
plugin_id=self.plugin_id,
@@ -154,38 +258,13 @@ class LargeLanguageModel(AIModel):
model_parameters=model_parameters,
prompt_messages=prompt_messages,
tools=tools,
stop=list(stop) if stop else None,
stop=stop,
stream=stream,
)
if not stream:
content = ""
content_list = []
usage = LLMUsage.empty_usage()
system_fingerprint = None
tools_calls: list[AssistantPromptMessage.ToolCall] = []
for chunk in result:
if isinstance(chunk.delta.message.content, str):
content += chunk.delta.message.content
elif isinstance(chunk.delta.message.content, list):
content_list.extend(chunk.delta.message.content)
if chunk.delta.message.tool_calls:
_increase_tool_call(chunk.delta.message.tool_calls, tools_calls)
usage = chunk.delta.usage or LLMUsage.empty_usage()
system_fingerprint = chunk.system_fingerprint
break
result = LLMResult(
model=model,
prompt_messages=prompt_messages,
message=AssistantPromptMessage(
content=content or content_list,
tool_calls=tools_calls,
),
usage=usage,
system_fingerprint=system_fingerprint,
result = _normalize_non_stream_plugin_result(
model=model, prompt_messages=prompt_messages, result=result
)
except Exception as e:
self._trigger_invoke_error_callbacks(
@@ -425,27 +504,21 @@ class LargeLanguageModel(AIModel):
:param user: unique user id
:param callbacks: callbacks
"""
if callbacks:
for callback in callbacks:
try:
callback.on_before_invoke(
llm_instance=self,
model=model,
credentials=credentials,
prompt_messages=prompt_messages,
model_parameters=model_parameters,
tools=tools,
stop=stop,
stream=stream,
user=user,
)
except Exception as e:
if callback.raise_error:
raise e
else:
logger.warning(
"Callback %s on_before_invoke failed with error %s", callback.__class__.__name__, e
)
_run_callbacks(
callbacks,
event="on_before_invoke",
invoke=lambda callback: callback.on_before_invoke(
llm_instance=self,
model=model,
credentials=credentials,
prompt_messages=prompt_messages,
model_parameters=model_parameters,
tools=tools,
stop=stop,
stream=stream,
user=user,
),
)
def _trigger_new_chunk_callbacks(
self,
@@ -473,26 +546,22 @@ class LargeLanguageModel(AIModel):
:param stream: is stream response
:param user: unique user id
"""
if callbacks:
for callback in callbacks:
try:
callback.on_new_chunk(
llm_instance=self,
chunk=chunk,
model=model,
credentials=credentials,
prompt_messages=prompt_messages,
model_parameters=model_parameters,
tools=tools,
stop=stop,
stream=stream,
user=user,
)
except Exception as e:
if callback.raise_error:
raise e
else:
logger.warning("Callback %s on_new_chunk failed with error %s", callback.__class__.__name__, e)
_run_callbacks(
callbacks,
event="on_new_chunk",
invoke=lambda callback: callback.on_new_chunk(
llm_instance=self,
chunk=chunk,
model=model,
credentials=credentials,
prompt_messages=prompt_messages,
model_parameters=model_parameters,
tools=tools,
stop=stop,
stream=stream,
user=user,
),
)
def _trigger_after_invoke_callbacks(
self,
@@ -521,28 +590,22 @@ class LargeLanguageModel(AIModel):
:param user: unique user id
:param callbacks: callbacks
"""
if callbacks:
for callback in callbacks:
try:
callback.on_after_invoke(
llm_instance=self,
result=result,
model=model,
credentials=credentials,
prompt_messages=prompt_messages,
model_parameters=model_parameters,
tools=tools,
stop=stop,
stream=stream,
user=user,
)
except Exception as e:
if callback.raise_error:
raise e
else:
logger.warning(
"Callback %s on_after_invoke failed with error %s", callback.__class__.__name__, e
)
_run_callbacks(
callbacks,
event="on_after_invoke",
invoke=lambda callback: callback.on_after_invoke(
llm_instance=self,
result=result,
model=model,
credentials=credentials,
prompt_messages=prompt_messages,
model_parameters=model_parameters,
tools=tools,
stop=stop,
stream=stream,
user=user,
),
)
def _trigger_invoke_error_callbacks(
self,
@@ -571,25 +634,19 @@ class LargeLanguageModel(AIModel):
:param user: unique user id
:param callbacks: callbacks
"""
if callbacks:
for callback in callbacks:
try:
callback.on_invoke_error(
llm_instance=self,
ex=ex,
model=model,
credentials=credentials,
prompt_messages=prompt_messages,
model_parameters=model_parameters,
tools=tools,
stop=stop,
stream=stream,
user=user,
)
except Exception as e:
if callback.raise_error:
raise e
else:
logger.warning(
"Callback %s on_invoke_error failed with error %s", callback.__class__.__name__, e
)
_run_callbacks(
callbacks,
event="on_invoke_error",
invoke=lambda callback: callback.on_invoke_error(
llm_instance=self,
ex=ex,
model=model,
credentials=credentials,
prompt_messages=prompt_messages,
model_parameters=model_parameters,
tools=tools,
stop=stop,
stream=stream,
user=user,
),
)

View File

@@ -154,7 +154,7 @@ class IrisConnectionPool:
# Add to cache to skip future checks
self._schemas_initialized.add(schema)
except Exception as e:
except Exception:
conn.rollback()
logger.exception("Failed to ensure schema %s exists", schema)
raise
@@ -177,6 +177,9 @@ class IrisConnectionPool:
class IrisVector(BaseVector):
"""IRIS vector database implementation using native VECTOR type and HNSW indexing."""
# Fallback score for full-text search when Rank function unavailable or TEXT_INDEX disabled
_FULL_TEXT_FALLBACK_SCORE = 0.5
def __init__(self, collection_name: str, config: IrisVectorConfig) -> None:
super().__init__(collection_name)
self.config = config
@@ -272,41 +275,131 @@ class IrisVector(BaseVector):
return docs
def search_by_full_text(self, query: str, **kwargs: Any) -> list[Document]:
"""Search documents by full-text using iFind index or fallback to LIKE search."""
"""Search documents by full-text using iFind index with BM25 relevance scoring.
When IRIS_TEXT_INDEX is enabled, this method uses the auto-generated Rank
function from %iFind.Index.Basic to calculate BM25 relevance scores. The Rank
function is automatically created with the naming pattern {schema}.{table_name}_{index}Rank
Args:
query: Search query string
**kwargs: Optional parameters including top_k, document_ids_filter
Returns:
List of Document objects with relevance scores in metadata["score"]
"""
top_k = kwargs.get("top_k", 5)
document_ids_filter = kwargs.get("document_ids_filter")
with self._get_cursor() as cursor:
if self.config.IRIS_TEXT_INDEX:
# Use iFind full-text search with index
# Use iFind full-text search with auto-generated Rank function
text_index_name = f"idx_{self.table_name}_text"
# IRIS removes underscores from function names
table_no_underscore = self.table_name.replace("_", "")
index_no_underscore = text_index_name.replace("_", "")
rank_function = f"{self.schema}.{table_no_underscore}_{index_no_underscore}Rank"
# Build WHERE clause with document ID filter if provided
where_clause = f"WHERE %ID %FIND search_index({text_index_name}, ?)"
# First param for Rank function, second for FIND
params = [query, query]
if document_ids_filter:
# Add document ID filter
placeholders = ",".join("?" * len(document_ids_filter))
where_clause += f" AND JSON_VALUE(meta, '$.document_id') IN ({placeholders})"
params.extend(document_ids_filter)
sql = f"""
SELECT TOP {top_k} id, text, meta
SELECT TOP {top_k}
id,
text,
meta,
{rank_function}(%ID, ?) AS score
FROM {self.schema}.{self.table_name}
WHERE %ID %FIND search_index({text_index_name}, ?)
{where_clause}
ORDER BY score DESC
"""
cursor.execute(sql, (query,))
logger.debug(
"iFind search: query='%s', index='%s', rank='%s'",
query,
text_index_name,
rank_function,
)
try:
cursor.execute(sql, params)
except Exception: # pylint: disable=broad-exception-caught
# Fallback to query without Rank function if it fails
logger.warning(
"Rank function '%s' failed, using fixed score",
rank_function,
exc_info=True,
)
sql_fallback = f"""
SELECT TOP {top_k} id, text, meta, {self._FULL_TEXT_FALLBACK_SCORE} AS score
FROM {self.schema}.{self.table_name}
{where_clause}
"""
# Skip first param (for Rank function)
cursor.execute(sql_fallback, params[1:])
else:
# Fallback to LIKE search (inefficient for large datasets)
# Escape special characters for LIKE clause to prevent SQL injection
from libs.helper import escape_like_pattern
# Fallback to LIKE search (IRIS_TEXT_INDEX disabled)
from libs.helper import ( # pylint: disable=import-outside-toplevel
escape_like_pattern,
)
escaped_query = escape_like_pattern(query)
query_pattern = f"%{escaped_query}%"
# Build WHERE clause with document ID filter if provided
where_clause = "WHERE text LIKE ? ESCAPE '\\\\'"
params = [query_pattern]
if document_ids_filter:
placeholders = ",".join("?" * len(document_ids_filter))
where_clause += f" AND JSON_VALUE(meta, '$.document_id') IN ({placeholders})"
params.extend(document_ids_filter)
sql = f"""
SELECT TOP {top_k} id, text, meta
SELECT TOP {top_k} id, text, meta, {self._FULL_TEXT_FALLBACK_SCORE} AS score
FROM {self.schema}.{self.table_name}
WHERE text LIKE ? ESCAPE '\\'
{where_clause}
ORDER BY LENGTH(text) ASC
"""
cursor.execute(sql, (query_pattern,))
logger.debug(
"LIKE fallback (TEXT_INDEX disabled): query='%s'",
query_pattern,
)
cursor.execute(sql, params)
docs = []
for row in cursor.fetchall():
if len(row) >= 3:
metadata = json.loads(row[2]) if row[2] else {}
docs.append(Document(page_content=row[1], metadata=metadata))
# Expecting 4 columns: id, text, meta, score
if len(row) >= 4:
text_content = row[1]
meta_str = row[2]
score_value = row[3]
metadata = json.loads(meta_str) if meta_str else {}
# Add score to metadata for hybrid search compatibility
score = float(score_value) if score_value is not None else 0.0
metadata["score"] = score
docs.append(Document(page_content=text_content, metadata=metadata))
logger.info(
"Full-text search completed: query='%s', results=%d/%d",
query,
len(docs),
top_k,
)
if not docs:
logger.info("Full-text search for '%s' returned no results", query)
logger.warning("Full-text search for '%s' returned no results", query)
return docs
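
The Rank function name derivation described in the docstring, shown in isolation (schema and table names illustrative):

schema = "SQLUser"
table_name = "embedding_abc123"
text_index_name = f"idx_{table_name}_text"
# IRIS strips underscores when generating the %iFind Rank function name
rank_function = f"{schema}.{table_name.replace('_', '')}_{text_index_name.replace('_', '')}Rank"
# -> "SQLUser.embeddingabc123_idxembeddingabc123textRank"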
@@ -370,7 +463,11 @@ class IrisVector(BaseVector):
AS %iFind.Index.Basic
(LANGUAGE = '{language}', LOWER = 1, INDEXOPTION = 0)
"""
logger.info("Creating text index: %s with language: %s", text_index_name, language)
logger.info(
"Creating text index: %s with language: %s",
text_index_name,
language,
)
logger.info("SQL for text index: %s", sql_text_index)
cursor.execute(sql_text_index)
logger.info("Text index created successfully: %s", text_index_name)

View File

@@ -130,7 +130,7 @@ class ToolInvokeMessage(BaseModel):
text: str
class JsonMessage(BaseModel):
json_object: dict
json_object: dict | list
suppress_output: bool = Field(default=False, description="Whether to suppress JSON output in result string")
class BlobMessage(BaseModel):
@@ -144,7 +144,14 @@ class ToolInvokeMessage(BaseModel):
end: bool = Field(..., description="Whether the chunk is the last chunk")
class FileMessage(BaseModel):
pass
file_marker: str = Field(default="file_marker")
@model_validator(mode="before")
@classmethod
def validate_file_message(cls, values):
if isinstance(values, dict) and "file_marker" not in values:
raise ValueError("Invalid FileMessage: missing file_marker")
return values
class VariableMessage(BaseModel):
variable_name: str = Field(..., description="The name of the variable")
@@ -234,10 +241,22 @@ class ToolInvokeMessage(BaseModel):
@field_validator("message", mode="before")
@classmethod
def decode_blob_message(cls, v):
def decode_blob_message(cls, v, info: ValidationInfo):
# Handle blob decoding
if isinstance(v, dict) and "blob" in v:
with contextlib.suppress(Exception):
v["blob"] = base64.b64decode(v["blob"])
# Force correct message type based on type field
# Only wrap dict types to avoid wrapping already parsed Pydantic model objects
if info.data and isinstance(info.data, dict) and isinstance(v, dict):
msg_type = info.data.get("type")
if msg_type == cls.MessageType.JSON:
if "json_object" not in v:
v = {"json_object": v}
elif msg_type == cls.MessageType.FILE:
v = {"file_marker": "file_marker"}
return v
@field_serializer("message")
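A hedged illustration of what the new normalization in decode_blob_message does (raw payload shapes are assumptions; only the rewriting rules come from the validator above):

# Illustrative only: how dict payloads are normalized based on the type field.
raw = {"answer": 42}                           # JSON-typed message missing "json_object"
wrapped = {"json_object": raw}                 # rewritten so it parses as JsonMessage
file_raw = {"anything": "goes"}                # FILE-typed message
file_wrapped = {"file_marker": "file_marker"}  # forced shape satisfying the FileMessage validator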

View File

@@ -494,7 +494,7 @@ class AgentNode(Node[AgentNodeData]):
text = ""
files: list[File] = []
json_list: list[dict] = []
json_list: list[dict | list] = []
agent_logs: list[AgentLogEvent] = []
agent_execution_metadata: Mapping[WorkflowNodeExecutionMetadataKey, Any] = {}
@@ -568,13 +568,18 @@ class AgentNode(Node[AgentNodeData]):
elif message.type == ToolInvokeMessage.MessageType.JSON:
assert isinstance(message.message, ToolInvokeMessage.JsonMessage)
if node_type == NodeType.AGENT:
msg_metadata: dict[str, Any] = message.message.json_object.pop("execution_metadata", {})
llm_usage = LLMUsage.from_metadata(cast(LLMUsageMetadata, msg_metadata))
agent_execution_metadata = {
WorkflowNodeExecutionMetadataKey(key): value
for key, value in msg_metadata.items()
if key in WorkflowNodeExecutionMetadataKey.__members__.values()
}
if isinstance(message.message.json_object, dict):
msg_metadata: dict[str, Any] = message.message.json_object.pop("execution_metadata", {})
llm_usage = LLMUsage.from_metadata(cast(LLMUsageMetadata, msg_metadata))
agent_execution_metadata = {
WorkflowNodeExecutionMetadataKey(key): value
for key, value in msg_metadata.items()
if key in WorkflowNodeExecutionMetadataKey.__members__.values()
}
else:
msg_metadata = {}
llm_usage = LLMUsage.empty_usage()
agent_execution_metadata = {}
if message.message.json_object:
json_list.append(message.message.json_object)
elif message.type == ToolInvokeMessage.MessageType.LINK:
@@ -683,7 +688,7 @@ class AgentNode(Node[AgentNodeData]):
yield agent_log
# Add agent_logs to outputs['json'] so the frontend can access the thinking process
json_output: list[dict[str, Any]] = []
json_output: list[dict[str, Any] | list[Any]] = []
# Step 1: append each agent log as its own dict.
if agent_logs:

View File

@@ -301,7 +301,7 @@ class DatasourceNode(Node[DatasourceNodeData]):
text = ""
files: list[File] = []
json: list[dict] = []
json: list[dict | list] = []
variables: dict[str, Any] = {}

View File

@@ -244,7 +244,7 @@ class ToolNode(Node[ToolNodeData]):
text = ""
files: list[File] = []
json: list[dict] = []
json: list[dict | list] = []
variables: dict[str, Any] = {}
@@ -400,7 +400,7 @@ class ToolNode(Node[ToolNodeData]):
message.message.metadata = dict_metadata
# Add agent_logs to outputs['json'] so the frontend can access the thinking process
json_output: list[dict[str, Any]] = []
json_output: list[dict[str, Any] | list[Any]] = []
# Step 2: normalize JSON into {"data": [...]} (convert json to list[dict]).
if json:

View File

@@ -0,0 +1,21 @@
from enum import StrEnum
class HostedTrialProvider(StrEnum):
"""
Enum representing hosted model provider names for trial access.
"""
OPENAI = "langgenius/openai/openai"
ANTHROPIC = "langgenius/anthropic/anthropic"
GEMINI = "langgenius/gemini/google"
X = "langgenius/x/x"
DEEPSEEK = "langgenius/deepseek/deepseek"
TONGYI = "langgenius/tongyi/tongyi"
@property
def config_key(self) -> str:
"""Return the config key used in dify_config (e.g., HOSTED_{config_key}_PAID_ENABLED)."""
if self == HostedTrialProvider.X:
return "XAI"
return self.name
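A brief illustration of the mapping this enum encodes (flag names follow the HOSTED_{config_key}_*_ENABLED pattern stated in the docstring and are shown here only as examples):

HostedTrialProvider.OPENAI.value        # "langgenius/openai/openai"
HostedTrialProvider.OPENAI.config_key   # "OPENAI" -> HOSTED_OPENAI_PAID_ENABLED / HOSTED_OPENAI_TRIAL_ENABLED
HostedTrialProvider.X.config_key        # "XAI"    -> HOSTED_XAI_PAID_ENABLED / HOSTED_XAI_TRIAL_ENABLED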

View File

@@ -4,6 +4,7 @@ from dify_app import DifyApp
def init_app(app: DifyApp):
from commands import (
add_qdrant_index,
archive_workflow_runs,
clean_expired_messages,
clean_workflow_runs,
cleanup_orphaned_draft_variables,
@@ -11,6 +12,7 @@ def init_app(app: DifyApp):
clear_orphaned_file_records,
convert_to_agent_apps,
create_tenant,
delete_archived_workflow_runs,
extract_plugins,
extract_unique_plugins,
file_usage,
@@ -24,6 +26,7 @@ def init_app(app: DifyApp):
reset_email,
reset_encrypt_key_pair,
reset_password,
restore_workflow_runs,
setup_datasource_oauth_client,
setup_system_tool_oauth_client,
setup_system_trigger_oauth_client,
@@ -58,6 +61,9 @@ def init_app(app: DifyApp):
setup_datasource_oauth_client,
transform_datasource_credentials,
install_rag_pipeline_plugins,
archive_workflow_runs,
delete_archived_workflow_runs,
restore_workflow_runs,
clean_workflow_runs,
clean_expired_messages,
]

View File

@@ -0,0 +1,45 @@
from fastopenapi.routers import FlaskRouter
from flask_cors import CORS
from configs import dify_config
from controllers.fastopenapi import console_router
from dify_app import DifyApp
from extensions.ext_blueprints import AUTHENTICATED_HEADERS, EXPOSED_HEADERS
DOCS_PREFIX = "/fastopenapi"
def init_app(app: DifyApp) -> None:
docs_enabled = dify_config.SWAGGER_UI_ENABLED
docs_url = f"{DOCS_PREFIX}/docs" if docs_enabled else None
redoc_url = f"{DOCS_PREFIX}/redoc" if docs_enabled else None
openapi_url = f"{DOCS_PREFIX}/openapi.json" if docs_enabled else None
router = FlaskRouter(
app=app,
docs_url=docs_url,
redoc_url=redoc_url,
openapi_url=openapi_url,
openapi_version="3.0.0",
title="Dify API (FastOpenAPI PoC)",
version="1.0",
description="FastOpenAPI proof of concept for Dify API",
)
# Ensure route decorators are evaluated.
import controllers.console.ping as ping_module
from controllers.console import setup
_ = ping_module
_ = setup
router.include_router(console_router, prefix="/console/api")
CORS(
app,
resources={r"/console/api/*": {"origins": dify_config.CONSOLE_CORS_ALLOW_ORIGINS}},
supports_credentials=True,
allow_headers=list(AUTHENTICATED_HEADERS),
methods=["GET", "PUT", "POST", "DELETE", "OPTIONS", "PATCH"],
expose_headers=list(EXPOSED_HEADERS),
)
app.extensions["fastopenapi"] = router

View File

@@ -2,7 +2,12 @@ from flask_restx import Namespace, fields
from fields.end_user_fields import build_simple_end_user_model, simple_end_user_fields
from fields.member_fields import build_simple_account_model, simple_account_fields
from fields.workflow_run_fields import build_workflow_run_for_log_model, workflow_run_for_log_fields
from fields.workflow_run_fields import (
build_workflow_run_for_archived_log_model,
build_workflow_run_for_log_model,
workflow_run_for_archived_log_fields,
workflow_run_for_log_fields,
)
from libs.helper import TimestampField
workflow_app_log_partial_fields = {
@@ -34,6 +39,33 @@ def build_workflow_app_log_partial_model(api_or_ns: Namespace):
return api_or_ns.model("WorkflowAppLogPartial", copied_fields)
workflow_archived_log_partial_fields = {
"id": fields.String,
"workflow_run": fields.Nested(workflow_run_for_archived_log_fields, allow_null=True),
"trigger_metadata": fields.Raw,
"created_by_account": fields.Nested(simple_account_fields, attribute="created_by_account", allow_null=True),
"created_by_end_user": fields.Nested(simple_end_user_fields, attribute="created_by_end_user", allow_null=True),
"created_at": TimestampField,
}
def build_workflow_archived_log_partial_model(api_or_ns: Namespace):
"""Build the workflow archived log partial model for the API or Namespace."""
workflow_run_model = build_workflow_run_for_archived_log_model(api_or_ns)
simple_account_model = build_simple_account_model(api_or_ns)
simple_end_user_model = build_simple_end_user_model(api_or_ns)
copied_fields = workflow_archived_log_partial_fields.copy()
copied_fields["workflow_run"] = fields.Nested(workflow_run_model, allow_null=True)
copied_fields["created_by_account"] = fields.Nested(
simple_account_model, attribute="created_by_account", allow_null=True
)
copied_fields["created_by_end_user"] = fields.Nested(
simple_end_user_model, attribute="created_by_end_user", allow_null=True
)
return api_or_ns.model("WorkflowArchivedLogPartial", copied_fields)
workflow_app_log_pagination_fields = {
"page": fields.Integer,
"limit": fields.Integer,
@@ -51,3 +83,21 @@ def build_workflow_app_log_pagination_model(api_or_ns: Namespace):
copied_fields = workflow_app_log_pagination_fields.copy()
copied_fields["data"] = fields.List(fields.Nested(workflow_app_log_partial_model))
return api_or_ns.model("WorkflowAppLogPagination", copied_fields)
workflow_archived_log_pagination_fields = {
"page": fields.Integer,
"limit": fields.Integer,
"total": fields.Integer,
"has_more": fields.Boolean,
"data": fields.List(fields.Nested(workflow_archived_log_partial_fields)),
}
def build_workflow_archived_log_pagination_model(api_or_ns: Namespace):
"""Build the workflow archived log pagination model for the API or Namespace."""
workflow_archived_log_partial_model = build_workflow_archived_log_partial_model(api_or_ns)
copied_fields = workflow_archived_log_pagination_fields.copy()
copied_fields["data"] = fields.List(fields.Nested(workflow_archived_log_partial_model))
return api_or_ns.model("WorkflowArchivedLogPagination", copied_fields)

View File

@@ -23,6 +23,19 @@ def build_workflow_run_for_log_model(api_or_ns: Namespace):
return api_or_ns.model("WorkflowRunForLog", workflow_run_for_log_fields)
workflow_run_for_archived_log_fields = {
"id": fields.String,
"status": fields.String,
"triggered_from": fields.String,
"elapsed_time": fields.Float,
"total_tokens": fields.Integer,
}
def build_workflow_run_for_archived_log_model(api_or_ns: Namespace):
return api_or_ns.model("WorkflowRunForArchivedLog", workflow_run_for_archived_log_fields)
workflow_run_for_list_fields = {
"id": fields.String,
"version": fields.String,

View File

@@ -7,7 +7,6 @@ to S3-compatible object storage.
import base64
import datetime
import gzip
import hashlib
import logging
from collections.abc import Generator
@@ -39,7 +38,7 @@ class ArchiveStorage:
"""
S3-compatible storage client for archiving or exporting.
This client provides methods for storing and retrieving archived data in JSONL+gzip format.
This client provides methods for storing and retrieving archived data in JSONL format.
"""
def __init__(self, bucket: str):
@@ -69,7 +68,10 @@ class ArchiveStorage:
aws_access_key_id=dify_config.ARCHIVE_STORAGE_ACCESS_KEY,
aws_secret_access_key=dify_config.ARCHIVE_STORAGE_SECRET_KEY,
region_name=dify_config.ARCHIVE_STORAGE_REGION,
config=Config(s3={"addressing_style": "path"}),
config=Config(
s3={"addressing_style": "path"},
max_pool_connections=64,
),
)
# Verify bucket accessibility
@@ -100,12 +102,18 @@ class ArchiveStorage:
"""
checksum = hashlib.md5(data).hexdigest()
try:
self.client.put_object(
response = self.client.put_object(
Bucket=self.bucket,
Key=key,
Body=data,
ContentMD5=self._content_md5(data),
)
etag = response.get("ETag")
if not etag:
raise ArchiveStorageError(f"Missing ETag for '{key}'")
normalized_etag = etag.strip('"')
if normalized_etag != checksum:
raise ArchiveStorageError(f"ETag mismatch for '{key}': expected={checksum}, actual={normalized_etag}")
logger.debug("Uploaded object: %s (size=%d, checksum=%s)", key, len(data), checksum)
return checksum
except ClientError as e:
@@ -240,19 +248,18 @@ class ArchiveStorage:
return base64.b64encode(hashlib.md5(data).digest()).decode()
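Context for the checksum verification added above: S3 expects the Content-MD5 header as base64 of the raw digest, while the ETag returned for single-part uploads is the hex digest wrapped in quotes, hence the strip('"') before comparing. A minimal sketch:

import base64
import hashlib

data = b"example"
hashlib.md5(data).hexdigest()                           # compared against the stripped ETag
base64.b64encode(hashlib.md5(data).digest()).decode()   # sent as the ContentMD5 header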
@staticmethod
def serialize_to_jsonl_gz(records: list[dict[str, Any]]) -> bytes:
def serialize_to_jsonl(records: list[dict[str, Any]]) -> bytes:
"""
Serialize records to gzipped JSONL format.
Serialize records to JSONL format.
Args:
records: List of dictionaries to serialize
Returns:
Gzipped JSONL bytes
JSONL bytes
"""
lines = []
for record in records:
# Convert datetime objects to ISO format strings
serialized = ArchiveStorage._serialize_record(record)
lines.append(orjson.dumps(serialized))
@@ -260,23 +267,22 @@ class ArchiveStorage:
if jsonl_content:
jsonl_content += b"\n"
return gzip.compress(jsonl_content)
return jsonl_content
@staticmethod
def deserialize_from_jsonl_gz(data: bytes) -> list[dict[str, Any]]:
def deserialize_from_jsonl(data: bytes) -> list[dict[str, Any]]:
"""
Deserialize gzipped JSONL data to records.
Deserialize JSONL data to records.
Args:
data: Gzipped JSONL bytes
data: JSONL bytes
Returns:
List of dictionaries
"""
jsonl_content = gzip.decompress(data)
records = []
for line in jsonl_content.splitlines():
for line in data.splitlines():
if line:
records.append(orjson.loads(line))
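A minimal round-trip sketch of the renamed plain-JSONL helpers (record contents are illustrative; the datetime handling follows the _serialize_record comment above):

import datetime

records = [{"id": "run-1", "created_at": datetime.datetime(2026, 1, 1)}]
payload = ArchiveStorage.serialize_to_jsonl(records)      # newline-delimited JSON bytes, no gzip
restored = ArchiveStorage.deserialize_from_jsonl(payload)
# restored[0]["created_at"] is an ISO-8601 string, since datetimes are
# converted to strings before orjson.dumps.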

View File

@@ -0,0 +1,95 @@
"""create workflow_archive_logs
Revision ID: 9d77545f524e
Revises: f9f6d18a37f9
Create Date: 2026-01-06 17:18:56.292479
"""
from alembic import op
import models as models
import sqlalchemy as sa
def _is_pg(conn):
return conn.dialect.name == "postgresql"
# revision identifiers, used by Alembic.
revision = '9d77545f524e'
down_revision = 'f9f6d18a37f9'
branch_labels = None
depends_on = None
def upgrade():
# ### commands auto generated by Alembic - please adjust! ###
conn = op.get_bind()
if _is_pg(conn):
op.create_table('workflow_archive_logs',
sa.Column('id', models.types.StringUUID(), server_default=sa.text('uuidv7()'), nullable=False),
sa.Column('log_id', models.types.StringUUID(), nullable=True),
sa.Column('tenant_id', models.types.StringUUID(), nullable=False),
sa.Column('app_id', models.types.StringUUID(), nullable=False),
sa.Column('workflow_id', models.types.StringUUID(), nullable=False),
sa.Column('workflow_run_id', models.types.StringUUID(), nullable=False),
sa.Column('created_by_role', sa.String(length=255), nullable=False),
sa.Column('created_by', models.types.StringUUID(), nullable=False),
sa.Column('log_created_at', sa.DateTime(), nullable=True),
sa.Column('log_created_from', sa.String(length=255), nullable=True),
sa.Column('run_version', sa.String(length=255), nullable=False),
sa.Column('run_status', sa.String(length=255), nullable=False),
sa.Column('run_triggered_from', sa.String(length=255), nullable=False),
sa.Column('run_error', models.types.LongText(), nullable=True),
sa.Column('run_elapsed_time', sa.Float(), server_default=sa.text('0'), nullable=False),
sa.Column('run_total_tokens', sa.BigInteger(), server_default=sa.text('0'), nullable=False),
sa.Column('run_total_steps', sa.Integer(), server_default=sa.text('0'), nullable=True),
sa.Column('run_created_at', sa.DateTime(), nullable=False),
sa.Column('run_finished_at', sa.DateTime(), nullable=True),
sa.Column('run_exceptions_count', sa.Integer(), server_default=sa.text('0'), nullable=True),
sa.Column('trigger_metadata', models.types.LongText(), nullable=True),
sa.Column('archived_at', sa.DateTime(), server_default=sa.text('CURRENT_TIMESTAMP'), nullable=False),
sa.PrimaryKeyConstraint('id', name='workflow_archive_log_pkey')
)
else:
op.create_table('workflow_archive_logs',
sa.Column('id', models.types.StringUUID(), nullable=False),
sa.Column('log_id', models.types.StringUUID(), nullable=True),
sa.Column('tenant_id', models.types.StringUUID(), nullable=False),
sa.Column('app_id', models.types.StringUUID(), nullable=False),
sa.Column('workflow_id', models.types.StringUUID(), nullable=False),
sa.Column('workflow_run_id', models.types.StringUUID(), nullable=False),
sa.Column('created_by_role', sa.String(length=255), nullable=False),
sa.Column('created_by', models.types.StringUUID(), nullable=False),
sa.Column('log_created_at', sa.DateTime(), nullable=True),
sa.Column('log_created_from', sa.String(length=255), nullable=True),
sa.Column('run_version', sa.String(length=255), nullable=False),
sa.Column('run_status', sa.String(length=255), nullable=False),
sa.Column('run_triggered_from', sa.String(length=255), nullable=False),
sa.Column('run_error', models.types.LongText(), nullable=True),
sa.Column('run_elapsed_time', sa.Float(), server_default=sa.text('0'), nullable=False),
sa.Column('run_total_tokens', sa.BigInteger(), server_default=sa.text('0'), nullable=False),
sa.Column('run_total_steps', sa.Integer(), server_default=sa.text('0'), nullable=True),
sa.Column('run_created_at', sa.DateTime(), nullable=False),
sa.Column('run_finished_at', sa.DateTime(), nullable=True),
sa.Column('run_exceptions_count', sa.Integer(), server_default=sa.text('0'), nullable=True),
sa.Column('trigger_metadata', models.types.LongText(), nullable=True),
sa.Column('archived_at', sa.DateTime(), server_default=sa.text('CURRENT_TIMESTAMP'), nullable=False),
sa.PrimaryKeyConstraint('id', name='workflow_archive_log_pkey')
)
with op.batch_alter_table('workflow_archive_logs', schema=None) as batch_op:
batch_op.create_index('workflow_archive_log_app_idx', ['tenant_id', 'app_id'], unique=False)
batch_op.create_index('workflow_archive_log_run_created_at_idx', ['run_created_at'], unique=False)
batch_op.create_index('workflow_archive_log_workflow_run_id_idx', ['workflow_run_id'], unique=False)
# ### end Alembic commands ###
def downgrade():
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table('workflow_archive_logs', schema=None) as batch_op:
batch_op.drop_index('workflow_archive_log_workflow_run_id_idx')
batch_op.drop_index('workflow_archive_log_run_created_at_idx')
batch_op.drop_index('workflow_archive_log_app_idx')
op.drop_table('workflow_archive_logs')
# ### end Alembic commands ###

View File

@@ -103,6 +103,7 @@ from .workflow import (
Workflow,
WorkflowAppLog,
WorkflowAppLogCreatedFrom,
WorkflowArchiveLog,
WorkflowNodeExecutionModel,
WorkflowNodeExecutionOffload,
WorkflowNodeExecutionTriggeredFrom,
@@ -203,6 +204,7 @@ __all__ = [
"Workflow",
"WorkflowAppLog",
"WorkflowAppLogCreatedFrom",
"WorkflowArchiveLog",
"WorkflowNodeExecutionModel",
"WorkflowNodeExecutionOffload",
"WorkflowNodeExecutionTriggeredFrom",

View File

@@ -315,40 +315,48 @@ class App(Base):
return None
class AppModelConfig(Base):
class AppModelConfig(TypeBase):
__tablename__ = "app_model_configs"
__table_args__ = (sa.PrimaryKeyConstraint("id", name="app_model_config_pkey"), sa.Index("app_app_id_idx", "app_id"))
id = mapped_column(StringUUID, default=lambda: str(uuid4()))
app_id = mapped_column(StringUUID, nullable=False)
provider = mapped_column(String(255), nullable=True)
model_id = mapped_column(String(255), nullable=True)
configs = mapped_column(sa.JSON, nullable=True)
created_by = mapped_column(StringUUID, nullable=True)
created_at = mapped_column(sa.DateTime, nullable=False, server_default=func.current_timestamp())
updated_by = mapped_column(StringUUID, nullable=True)
updated_at = mapped_column(
sa.DateTime, nullable=False, server_default=func.current_timestamp(), onupdate=func.current_timestamp()
id: Mapped[str] = mapped_column(StringUUID, default=lambda: str(uuid4()), init=False)
app_id: Mapped[str] = mapped_column(StringUUID, nullable=False)
provider: Mapped[str | None] = mapped_column(String(255), nullable=True, default=None)
model_id: Mapped[str | None] = mapped_column(String(255), nullable=True, default=None)
configs: Mapped[Any | None] = mapped_column(sa.JSON, nullable=True, default=None)
created_by: Mapped[str | None] = mapped_column(StringUUID, nullable=True, default=None)
created_at: Mapped[datetime] = mapped_column(
sa.DateTime, nullable=False, server_default=func.current_timestamp(), init=False
)
opening_statement = mapped_column(LongText)
suggested_questions = mapped_column(LongText)
suggested_questions_after_answer = mapped_column(LongText)
speech_to_text = mapped_column(LongText)
text_to_speech = mapped_column(LongText)
more_like_this = mapped_column(LongText)
model = mapped_column(LongText)
user_input_form = mapped_column(LongText)
dataset_query_variable = mapped_column(String(255))
pre_prompt = mapped_column(LongText)
agent_mode = mapped_column(LongText)
sensitive_word_avoidance = mapped_column(LongText)
retriever_resource = mapped_column(LongText)
prompt_type = mapped_column(String(255), nullable=False, server_default=sa.text("'simple'"))
chat_prompt_config = mapped_column(LongText)
completion_prompt_config = mapped_column(LongText)
dataset_configs = mapped_column(LongText)
external_data_tools = mapped_column(LongText)
file_upload = mapped_column(LongText)
updated_by: Mapped[str | None] = mapped_column(StringUUID, nullable=True, default=None)
updated_at: Mapped[datetime] = mapped_column(
sa.DateTime,
nullable=False,
server_default=func.current_timestamp(),
onupdate=func.current_timestamp(),
init=False,
)
opening_statement: Mapped[str | None] = mapped_column(LongText, default=None)
suggested_questions: Mapped[str | None] = mapped_column(LongText, default=None)
suggested_questions_after_answer: Mapped[str | None] = mapped_column(LongText, default=None)
speech_to_text: Mapped[str | None] = mapped_column(LongText, default=None)
text_to_speech: Mapped[str | None] = mapped_column(LongText, default=None)
more_like_this: Mapped[str | None] = mapped_column(LongText, default=None)
model: Mapped[str | None] = mapped_column(LongText, default=None)
user_input_form: Mapped[str | None] = mapped_column(LongText, default=None)
dataset_query_variable: Mapped[str | None] = mapped_column(String(255), default=None)
pre_prompt: Mapped[str | None] = mapped_column(LongText, default=None)
agent_mode: Mapped[str | None] = mapped_column(LongText, default=None)
sensitive_word_avoidance: Mapped[str | None] = mapped_column(LongText, default=None)
retriever_resource: Mapped[str | None] = mapped_column(LongText, default=None)
prompt_type: Mapped[str] = mapped_column(
String(255), nullable=False, server_default=sa.text("'simple'"), default="simple"
)
chat_prompt_config: Mapped[str | None] = mapped_column(LongText, default=None)
completion_prompt_config: Mapped[str | None] = mapped_column(LongText, default=None)
dataset_configs: Mapped[str | None] = mapped_column(LongText, default=None)
external_data_tools: Mapped[str | None] = mapped_column(LongText, default=None)
file_upload: Mapped[str | None] = mapped_column(LongText, default=None)
@property
def app(self) -> App | None:
@@ -810,8 +818,8 @@ class Conversation(Base):
override_model_configs = json.loads(self.override_model_configs)
if "model" in override_model_configs:
app_model_config = AppModelConfig()
app_model_config = app_model_config.from_model_config_dict(override_model_configs)
# where is app_id?
app_model_config = AppModelConfig(app_id=self.app_id).from_model_config_dict(override_model_configs)
model_config = app_model_config.to_dict()
else:
model_config["configs"] = override_model_configs

View File

@@ -226,8 +226,7 @@ class Workflow(Base): # bug
#
# Currently, the following functions / methods would mutate the returned dict:
#
# - `_get_graph_and_variable_pool_of_single_iteration`.
# - `_get_graph_and_variable_pool_of_single_loop`.
# - `_get_graph_and_variable_pool_for_single_node_run`.
return json.loads(self.graph) if self.graph else {}
def get_node_config_by_id(self, node_id: str) -> Mapping[str, Any]:
@@ -1163,6 +1162,69 @@ class WorkflowAppLog(TypeBase):
}
class WorkflowArchiveLog(TypeBase):
"""
Workflow archive log.
Stores essential workflow run snapshot data for archived app logs.
Field sources:
- Shared fields (tenant/app/workflow/run ids, created_by*): from WorkflowRun for consistency.
- log_* fields: from WorkflowAppLog when present; null if the run has no app log.
- run_* fields: workflow run snapshot fields from WorkflowRun.
- trigger_metadata: snapshot from WorkflowTriggerLog when present.
"""
__tablename__ = "workflow_archive_logs"
__table_args__ = (
sa.PrimaryKeyConstraint("id", name="workflow_archive_log_pkey"),
sa.Index("workflow_archive_log_app_idx", "tenant_id", "app_id"),
sa.Index("workflow_archive_log_workflow_run_id_idx", "workflow_run_id"),
sa.Index("workflow_archive_log_run_created_at_idx", "run_created_at"),
)
id: Mapped[str] = mapped_column(
StringUUID, insert_default=lambda: str(uuidv7()), default_factory=lambda: str(uuidv7()), init=False
)
tenant_id: Mapped[str] = mapped_column(StringUUID, nullable=False)
app_id: Mapped[str] = mapped_column(StringUUID, nullable=False)
workflow_id: Mapped[str] = mapped_column(StringUUID, nullable=False)
workflow_run_id: Mapped[str] = mapped_column(StringUUID, nullable=False)
created_by_role: Mapped[str] = mapped_column(String(255), nullable=False)
created_by: Mapped[str] = mapped_column(StringUUID, nullable=False)
log_id: Mapped[str | None] = mapped_column(StringUUID, nullable=True)
log_created_at: Mapped[datetime | None] = mapped_column(DateTime, nullable=True)
log_created_from: Mapped[str | None] = mapped_column(String(255), nullable=True)
run_version: Mapped[str] = mapped_column(String(255), nullable=False)
run_status: Mapped[str] = mapped_column(String(255), nullable=False)
run_triggered_from: Mapped[str] = mapped_column(String(255), nullable=False)
run_error: Mapped[str | None] = mapped_column(LongText, nullable=True)
run_elapsed_time: Mapped[float] = mapped_column(sa.Float, nullable=False, server_default=sa.text("0"))
run_total_tokens: Mapped[int] = mapped_column(sa.BigInteger, server_default=sa.text("0"))
run_total_steps: Mapped[int] = mapped_column(sa.Integer, server_default=sa.text("0"), nullable=True)
run_created_at: Mapped[datetime] = mapped_column(DateTime, nullable=False)
run_finished_at: Mapped[datetime | None] = mapped_column(DateTime, nullable=True)
run_exceptions_count: Mapped[int] = mapped_column(sa.Integer, server_default=sa.text("0"), nullable=True)
trigger_metadata: Mapped[str | None] = mapped_column(LongText, nullable=True)
archived_at: Mapped[datetime] = mapped_column(
DateTime, nullable=False, server_default=func.current_timestamp(), init=False
)
@property
def workflow_run_summary(self) -> dict[str, Any]:
return {
"id": self.workflow_run_id,
"status": self.run_status,
"triggered_from": self.run_triggered_from,
"elapsed_time": self.run_elapsed_time,
"total_tokens": self.run_total_tokens,
}
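These keys mirror workflow_run_for_archived_log_fields earlier in this diff, so an archived log can stand in for the nested workflow_run object in log listings. Illustrative shape (values are assumptions):

log.workflow_run_summary
# {"id": "<run-id>", "status": "succeeded", "triggered_from": "app-run",
#  "elapsed_time": 1.23, "total_tokens": 456}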
class ConversationVariable(TypeBase):
__tablename__ = "workflow_conversation_variables"

View File

@@ -31,7 +31,7 @@ dependencies = [
"gunicorn~=23.0.0",
"httpx[socks]~=0.27.0",
"jieba==0.42.1",
"json-repair>=0.41.1",
"json-repair>=0.55.1",
"jsonschema>=4.25.1",
"langfuse~=2.51.3",
"langsmith~=0.1.77",
@@ -64,7 +64,7 @@ dependencies = [
"pandas[excel,output-formatting,performance]~=2.2.2",
"psycogreen~=1.0.2",
"psycopg2-binary~=2.9.6",
"pycryptodome==3.19.1",
"pycryptodome==3.23.0",
"pydantic~=2.11.4",
"pydantic-extra-types~=2.10.3",
"pydantic-settings~=2.11.0",
@@ -93,6 +93,7 @@ dependencies = [
"weaviate-client==4.17.0",
"apscheduler>=3.11.0",
"weave>=0.52.16",
"fastopenapi[flask]>=0.7.0",
]
# Before adding new dependency, consider place it in
# alphabet order (a-z) and suitable group.

View File

@@ -8,6 +8,7 @@
],
"typeCheckingMode": "strict",
"allowedUntypedLibraries": [
"fastopenapi",
"flask_restx",
"flask_login",
"opentelemetry.instrumentation.celery",

View File

@@ -16,7 +16,7 @@ from typing import Protocol
from sqlalchemy.orm import Session
from core.workflow.repositories.workflow_node_execution_repository import WorkflowNodeExecutionRepository
from models.workflow import WorkflowNodeExecutionModel
from models.workflow import WorkflowNodeExecutionModel, WorkflowNodeExecutionOffload
class DifyAPIWorkflowNodeExecutionRepository(WorkflowNodeExecutionRepository, Protocol):
@@ -209,3 +209,23 @@ class DifyAPIWorkflowNodeExecutionRepository(WorkflowNodeExecutionRepository, Pr
The number of executions deleted
"""
...
def get_offloads_by_execution_ids(
self,
session: Session,
node_execution_ids: Sequence[str],
) -> Sequence[WorkflowNodeExecutionOffload]:
"""
Get offload records by node execution IDs.
This method retrieves workflow node execution offload records
that belong to the given node execution IDs.
Args:
session: The database session to use
node_execution_ids: List of node execution IDs to filter by
Returns:
A sequence of WorkflowNodeExecutionOffload instances
"""
...

View File

@@ -45,7 +45,7 @@ from core.workflow.enums import WorkflowType
from core.workflow.repositories.workflow_execution_repository import WorkflowExecutionRepository
from libs.infinite_scroll_pagination import InfiniteScrollPagination
from models.enums import WorkflowRunTriggeredFrom
from models.workflow import WorkflowRun
from models.workflow import WorkflowAppLog, WorkflowArchiveLog, WorkflowPause, WorkflowPauseReason, WorkflowRun
from repositories.entities.workflow_pause import WorkflowPauseEntity
from repositories.types import (
AverageInteractionStats,
@@ -270,6 +270,58 @@ class APIWorkflowRunRepository(WorkflowExecutionRepository, Protocol):
"""
...
def get_archived_run_ids(
self,
session: Session,
run_ids: Sequence[str],
) -> set[str]:
"""
Fetch workflow run IDs that already have archive log records.
"""
...
def get_archived_logs_by_time_range(
self,
session: Session,
tenant_ids: Sequence[str] | None,
start_date: datetime,
end_date: datetime,
limit: int,
) -> Sequence[WorkflowArchiveLog]:
"""
Fetch archived workflow logs by time range for restore.
"""
...
def get_archived_log_by_run_id(
self,
run_id: str,
) -> WorkflowArchiveLog | None:
"""
Fetch a workflow archive log by workflow run ID.
"""
...
def delete_archive_log_by_run_id(
self,
session: Session,
run_id: str,
) -> int:
"""
Delete archive log by workflow run ID.
Used after restoring a workflow run to remove the archive log record,
allowing the run to be archived again if needed.
Args:
session: Database session
run_id: Workflow run ID
Returns:
Number of records deleted (0 or 1)
"""
...
def delete_runs_with_related(
self,
runs: Sequence[WorkflowRun],
@@ -282,6 +334,61 @@ class APIWorkflowRunRepository(WorkflowExecutionRepository, Protocol):
"""
...
def get_pause_records_by_run_id(
self,
session: Session,
run_id: str,
) -> Sequence[WorkflowPause]:
"""
Fetch workflow pause records by workflow run ID.
"""
...
def get_pause_reason_records_by_run_id(
self,
session: Session,
pause_ids: Sequence[str],
) -> Sequence[WorkflowPauseReason]:
"""
Fetch workflow pause reason records by pause IDs.
"""
...
def get_app_logs_by_run_id(
self,
session: Session,
run_id: str,
) -> Sequence[WorkflowAppLog]:
"""
Fetch workflow app logs by workflow run ID.
"""
...
def create_archive_logs(
self,
session: Session,
run: WorkflowRun,
app_logs: Sequence[WorkflowAppLog],
trigger_metadata: str | None,
) -> int:
"""
Create archive log records for a workflow run.
"""
...
def get_archived_runs_by_time_range(
self,
session: Session,
tenant_ids: Sequence[str] | None,
start_date: datetime,
end_date: datetime,
limit: int,
) -> Sequence[WorkflowRun]:
"""
Return workflow runs that already have archive logs, for cleanup of `workflow_runs`.
"""
...
def count_runs_with_related(
self,
runs: Sequence[WorkflowRun],

View File

@@ -351,3 +351,27 @@ class DifyAPISQLAlchemyWorkflowNodeExecutionRepository(DifyAPIWorkflowNodeExecut
)
return int(node_executions_count), int(offloads_count)
@staticmethod
def get_by_run(
session: Session,
run_id: str,
) -> Sequence[WorkflowNodeExecutionModel]:
"""
Fetch node executions for a run using workflow_run_id.
"""
stmt = select(WorkflowNodeExecutionModel).where(WorkflowNodeExecutionModel.workflow_run_id == run_id)
return list(session.scalars(stmt))
def get_offloads_by_execution_ids(
self,
session: Session,
node_execution_ids: Sequence[str],
) -> Sequence[WorkflowNodeExecutionOffload]:
if not node_execution_ids:
return []
stmt = select(WorkflowNodeExecutionOffload).where(
WorkflowNodeExecutionOffload.node_execution_id.in_(node_execution_ids)
)
return list(session.scalars(stmt))

View File

@@ -40,14 +40,7 @@ from libs.infinite_scroll_pagination import InfiniteScrollPagination
from libs.time_parser import get_time_threshold
from libs.uuid_utils import uuidv7
from models.enums import WorkflowRunTriggeredFrom
from models.workflow import (
WorkflowAppLog,
WorkflowPauseReason,
WorkflowRun,
)
from models.workflow import (
WorkflowPause as WorkflowPauseModel,
)
from models.workflow import WorkflowAppLog, WorkflowArchiveLog, WorkflowPause, WorkflowPauseReason, WorkflowRun
from repositories.api_workflow_run_repository import APIWorkflowRunRepository
from repositories.entities.workflow_pause import WorkflowPauseEntity
from repositories.types import (
@@ -369,6 +362,53 @@ class DifyAPISQLAlchemyWorkflowRunRepository(APIWorkflowRunRepository):
return session.scalars(stmt).all()
def get_archived_run_ids(
self,
session: Session,
run_ids: Sequence[str],
) -> set[str]:
if not run_ids:
return set()
stmt = select(WorkflowArchiveLog.workflow_run_id).where(WorkflowArchiveLog.workflow_run_id.in_(run_ids))
return set(session.scalars(stmt).all())
def get_archived_log_by_run_id(
self,
run_id: str,
) -> WorkflowArchiveLog | None:
with self._session_maker() as session:
stmt = select(WorkflowArchiveLog).where(WorkflowArchiveLog.workflow_run_id == run_id).limit(1)
return session.scalar(stmt)
def delete_archive_log_by_run_id(
self,
session: Session,
run_id: str,
) -> int:
stmt = delete(WorkflowArchiveLog).where(WorkflowArchiveLog.workflow_run_id == run_id)
result = session.execute(stmt)
return cast(CursorResult, result).rowcount or 0
def get_pause_records_by_run_id(
self,
session: Session,
run_id: str,
) -> Sequence[WorkflowPause]:
stmt = select(WorkflowPause).where(WorkflowPause.workflow_run_id == run_id)
return list(session.scalars(stmt))
def get_pause_reason_records_by_run_id(
self,
session: Session,
pause_ids: Sequence[str],
) -> Sequence[WorkflowPauseReason]:
if not pause_ids:
return []
stmt = select(WorkflowPauseReason).where(WorkflowPauseReason.pause_id.in_(pause_ids))
return list(session.scalars(stmt))
def delete_runs_with_related(
self,
runs: Sequence[WorkflowRun],
@@ -396,9 +436,8 @@ class DifyAPISQLAlchemyWorkflowRunRepository(APIWorkflowRunRepository):
app_logs_result = session.execute(delete(WorkflowAppLog).where(WorkflowAppLog.workflow_run_id.in_(run_ids)))
app_logs_deleted = cast(CursorResult, app_logs_result).rowcount or 0
pause_ids = session.scalars(
select(WorkflowPauseModel.id).where(WorkflowPauseModel.workflow_run_id.in_(run_ids))
).all()
pause_stmt = select(WorkflowPause.id).where(WorkflowPause.workflow_run_id.in_(run_ids))
pause_ids = session.scalars(pause_stmt).all()
pause_reasons_deleted = 0
pauses_deleted = 0
@@ -407,7 +446,7 @@ class DifyAPISQLAlchemyWorkflowRunRepository(APIWorkflowRunRepository):
delete(WorkflowPauseReason).where(WorkflowPauseReason.pause_id.in_(pause_ids))
)
pause_reasons_deleted = cast(CursorResult, pause_reasons_result).rowcount or 0
pauses_result = session.execute(delete(WorkflowPauseModel).where(WorkflowPauseModel.id.in_(pause_ids)))
pauses_result = session.execute(delete(WorkflowPause).where(WorkflowPause.id.in_(pause_ids)))
pauses_deleted = cast(CursorResult, pauses_result).rowcount or 0
trigger_logs_deleted = delete_trigger_logs(session, run_ids) if delete_trigger_logs else 0
@@ -427,6 +466,124 @@ class DifyAPISQLAlchemyWorkflowRunRepository(APIWorkflowRunRepository):
"pause_reasons": pause_reasons_deleted,
}
def get_app_logs_by_run_id(
self,
session: Session,
run_id: str,
) -> Sequence[WorkflowAppLog]:
stmt = select(WorkflowAppLog).where(WorkflowAppLog.workflow_run_id == run_id)
return list(session.scalars(stmt))
def create_archive_logs(
self,
session: Session,
run: WorkflowRun,
app_logs: Sequence[WorkflowAppLog],
trigger_metadata: str | None,
) -> int:
if not app_logs:
archive_log = WorkflowArchiveLog(
log_id=None,
log_created_at=None,
log_created_from=None,
tenant_id=run.tenant_id,
app_id=run.app_id,
workflow_id=run.workflow_id,
workflow_run_id=run.id,
created_by_role=run.created_by_role,
created_by=run.created_by,
run_version=run.version,
run_status=run.status,
run_triggered_from=run.triggered_from,
run_error=run.error,
run_elapsed_time=run.elapsed_time,
run_total_tokens=run.total_tokens,
run_total_steps=run.total_steps,
run_created_at=run.created_at,
run_finished_at=run.finished_at,
run_exceptions_count=run.exceptions_count,
trigger_metadata=trigger_metadata,
)
session.add(archive_log)
return 1
archive_logs = [
WorkflowArchiveLog(
log_id=app_log.id,
log_created_at=app_log.created_at,
log_created_from=app_log.created_from,
tenant_id=run.tenant_id,
app_id=run.app_id,
workflow_id=run.workflow_id,
workflow_run_id=run.id,
created_by_role=run.created_by_role,
created_by=run.created_by,
run_version=run.version,
run_status=run.status,
run_triggered_from=run.triggered_from,
run_error=run.error,
run_elapsed_time=run.elapsed_time,
run_total_tokens=run.total_tokens,
run_total_steps=run.total_steps,
run_created_at=run.created_at,
run_finished_at=run.finished_at,
run_exceptions_count=run.exceptions_count,
trigger_metadata=trigger_metadata,
)
for app_log in app_logs
]
session.add_all(archive_logs)
return len(archive_logs)
def get_archived_runs_by_time_range(
self,
session: Session,
tenant_ids: Sequence[str] | None,
start_date: datetime,
end_date: datetime,
limit: int,
) -> Sequence[WorkflowRun]:
"""
Retrieves WorkflowRun records by joining workflow_archive_logs.
Used to identify runs that are already archived and ready for deletion.
"""
stmt = (
select(WorkflowRun)
.join(WorkflowArchiveLog, WorkflowArchiveLog.workflow_run_id == WorkflowRun.id)
.where(
WorkflowArchiveLog.run_created_at >= start_date,
WorkflowArchiveLog.run_created_at < end_date,
)
.order_by(WorkflowArchiveLog.run_created_at.asc(), WorkflowArchiveLog.workflow_run_id.asc())
.limit(limit)
)
if tenant_ids:
stmt = stmt.where(WorkflowArchiveLog.tenant_id.in_(tenant_ids))
return list(session.scalars(stmt))
def get_archived_logs_by_time_range(
self,
session: Session,
tenant_ids: Sequence[str] | None,
start_date: datetime,
end_date: datetime,
limit: int,
) -> Sequence[WorkflowArchiveLog]:
# Returns WorkflowArchiveLog rows directly; use this when workflow_runs may be deleted.
stmt = (
select(WorkflowArchiveLog)
.where(
WorkflowArchiveLog.run_created_at >= start_date,
WorkflowArchiveLog.run_created_at < end_date,
)
.order_by(WorkflowArchiveLog.run_created_at.asc(), WorkflowArchiveLog.workflow_run_id.asc())
.limit(limit)
)
if tenant_ids:
stmt = stmt.where(WorkflowArchiveLog.tenant_id.in_(tenant_ids))
return list(session.scalars(stmt))
def count_runs_with_related(
self,
runs: Sequence[WorkflowRun],
@@ -459,7 +616,7 @@ class DifyAPISQLAlchemyWorkflowRunRepository(APIWorkflowRunRepository):
)
pause_ids = session.scalars(
select(WorkflowPauseModel.id).where(WorkflowPauseModel.workflow_run_id.in_(run_ids))
select(WorkflowPause.id).where(WorkflowPause.workflow_run_id.in_(run_ids))
).all()
pauses_count = len(pause_ids)
pause_reasons_count = 0
@@ -511,9 +668,7 @@ class DifyAPISQLAlchemyWorkflowRunRepository(APIWorkflowRunRepository):
ValueError: If workflow_run_id is invalid or workflow run doesn't exist
RuntimeError: If workflow is already paused or in invalid state
"""
previous_pause_model_query = select(WorkflowPauseModel).where(
WorkflowPauseModel.workflow_run_id == workflow_run_id
)
previous_pause_model_query = select(WorkflowPause).where(WorkflowPause.workflow_run_id == workflow_run_id)
with self._session_maker() as session, session.begin():
# Get the workflow run
workflow_run = session.get(WorkflowRun, workflow_run_id)
@@ -538,7 +693,7 @@ class DifyAPISQLAlchemyWorkflowRunRepository(APIWorkflowRunRepository):
# Upload the state file
# Create the pause record
pause_model = WorkflowPauseModel()
pause_model = WorkflowPause()
pause_model.id = str(uuidv7())
pause_model.workflow_id = workflow_run.workflow_id
pause_model.workflow_run_id = workflow_run.id
@@ -710,13 +865,13 @@ class DifyAPISQLAlchemyWorkflowRunRepository(APIWorkflowRunRepository):
"""
with self._session_maker() as session, session.begin():
# Get the pause model by ID
pause_model = session.get(WorkflowPauseModel, pause_entity.id)
pause_model = session.get(WorkflowPause, pause_entity.id)
if pause_model is None:
raise _WorkflowRunError(f"WorkflowPause not found: {pause_entity.id}")
self._delete_pause_model(session, pause_model)
@staticmethod
def _delete_pause_model(session: Session, pause_model: WorkflowPauseModel):
def _delete_pause_model(session: Session, pause_model: WorkflowPause):
storage.delete(pause_model.state_object_key)
# Delete the pause record
@@ -751,15 +906,15 @@ class DifyAPISQLAlchemyWorkflowRunRepository(APIWorkflowRunRepository):
_limit: int = limit or 1000
pruned_record_ids: list[str] = []
cond = or_(
WorkflowPauseModel.created_at < expiration,
WorkflowPause.created_at < expiration,
and_(
WorkflowPauseModel.resumed_at.is_not(null()),
WorkflowPauseModel.resumed_at < resumption_expiration,
WorkflowPause.resumed_at.is_not(null()),
WorkflowPause.resumed_at < resumption_expiration,
),
)
# First, collect pause records to delete with their state files
# Expired pauses (created before expiration time)
stmt = select(WorkflowPauseModel).where(cond).limit(_limit)
stmt = select(WorkflowPause).where(cond).limit(_limit)
with self._session_maker(expire_on_commit=False) as session:
# Old resumed pauses (resumed more than resumption_duration ago)
@@ -770,7 +925,7 @@ class DifyAPISQLAlchemyWorkflowRunRepository(APIWorkflowRunRepository):
# Delete state files from storage
for pause in pauses_to_delete:
with self._session_maker(expire_on_commit=False) as session, session.begin():
# todo: this issues a separate query for each WorkflowPauseModel record.
# todo: this issues a separate query for each WorkflowPause record.
# consider batching this lookup.
try:
storage.delete(pause.state_object_key)
@@ -1022,7 +1177,7 @@ class _PrivateWorkflowPauseEntity(WorkflowPauseEntity):
def __init__(
self,
*,
pause_model: WorkflowPauseModel,
pause_model: WorkflowPause,
reason_models: Sequence[WorkflowPauseReason],
human_input_form: Sequence = (),
) -> None:

View File

@@ -46,6 +46,11 @@ class SQLAlchemyWorkflowTriggerLogRepository(WorkflowTriggerLogRepository):
return self.session.scalar(query)
def list_by_run_id(self, run_id: str) -> Sequence[WorkflowTriggerLog]:
"""List trigger logs for a workflow run."""
query = select(WorkflowTriggerLog).where(WorkflowTriggerLog.workflow_run_id == run_id)
return list(self.session.scalars(query).all())
def get_failed_for_retry(
self, tenant_id: str, max_retry_count: int = 3, limit: int = 100
) -> Sequence[WorkflowTriggerLog]:

View File

@@ -521,12 +521,10 @@ class AppDslService:
raise ValueError("Missing model_config for chat/agent-chat/completion app")
# Initialize or update model config
if not app.app_model_config:
app_model_config = AppModelConfig().from_model_config_dict(model_config)
app_model_config = AppModelConfig(
app_id=app.id, created_by=account.id, updated_by=account.id
).from_model_config_dict(model_config)
app_model_config.id = str(uuid4())
app_model_config.app_id = app.id
app_model_config.created_by = account.id
app_model_config.updated_by = account.id
app.app_model_config_id = app_model_config.id
self._session.add(app_model_config)
@@ -783,15 +781,16 @@ class AppDslService:
return dependencies
@classmethod
def get_leaked_dependencies(cls, tenant_id: str, dsl_dependencies: list[dict]) -> list[PluginDependency]:
def get_leaked_dependencies(
cls, tenant_id: str, dsl_dependencies: list[PluginDependency]
) -> list[PluginDependency]:
"""
Returns the leaked dependencies in the current workspace
"""
dependencies = [PluginDependency.model_validate(dep) for dep in dsl_dependencies]
if not dependencies:
if not dsl_dependencies:
return []
return DependenciesAnalysisService.get_leaked_dependencies(tenant_id=tenant_id, dependencies=dependencies)
return DependenciesAnalysisService.get_leaked_dependencies(tenant_id=tenant_id, dependencies=dsl_dependencies)
@staticmethod
def _generate_aes_key(tenant_id: str) -> bytes:

View File

@@ -1,6 +1,8 @@
from __future__ import annotations
import uuid
from collections.abc import Generator, Mapping
from typing import Any, Union
from typing import TYPE_CHECKING, Any, Union
from configs import dify_config
from core.app.apps.advanced_chat.app_generator import AdvancedChatAppGenerator
@@ -18,6 +20,9 @@ from services.errors.app import QuotaExceededError, WorkflowIdFormatError, Workf
from services.errors.llm import InvokeRateLimitError
from services.workflow_service import WorkflowService
if TYPE_CHECKING:
from controllers.console.app.workflow import LoopNodeRunPayload
class AppGenerateService:
@classmethod
@@ -165,7 +170,9 @@ class AppGenerateService:
raise ValueError(f"Invalid app mode {app_model.mode}")
@classmethod
def generate_single_loop(cls, app_model: App, user: Account, node_id: str, args: Any, streaming: bool = True):
def generate_single_loop(
cls, app_model: App, user: Account, node_id: str, args: LoopNodeRunPayload, streaming: bool = True
):
if app_model.mode == AppMode.ADVANCED_CHAT:
workflow = cls._get_workflow(app_model, InvokeFrom.DEBUGGER)
return AdvancedChatAppGenerator.convert_to_event_stream(

View File

@@ -150,10 +150,9 @@ class AppService:
db.session.flush()
if default_model_config:
app_model_config = AppModelConfig(**default_model_config)
app_model_config.app_id = app.id
app_model_config.created_by = account.id
app_model_config.updated_by = account.id
app_model_config = AppModelConfig(
**default_model_config, app_id=app.id, created_by=account.id, updated_by=account.id
)
db.session.add(app_model_config)
db.session.flush()

View File

@@ -131,7 +131,7 @@ class BillingService:
headers = {"Content-Type": "application/json", "Billing-Api-Secret-Key": cls.secret_key}
url = f"{cls.base_url}{endpoint}"
response = httpx.request(method, url, json=json, params=params, headers=headers)
response = httpx.request(method, url, json=json, params=params, headers=headers, follow_redirects=True)
if method == "GET" and response.status_code != httpx.codes.OK:
raise ValueError("Unable to retrieve billing information. Please try again later or contact support.")
if method == "PUT":
@@ -143,6 +143,9 @@ class BillingService:
raise ValueError("Invalid arguments.")
if method == "POST" and response.status_code != httpx.codes.OK:
raise ValueError(f"Unable to send request to {url}. Please try again later or contact support.")
if method == "DELETE" and response.status_code != httpx.codes.OK:
logger.error("billing_service: DELETE response: %s %s", response.status_code, response.text)
raise ValueError(f"Unable to process delete request {url}. Please try again later or contact support.")
return response.json()
@staticmethod
@@ -165,7 +168,7 @@ class BillingService:
def delete_account(cls, account_id: str):
"""Delete account."""
params = {"account_id": account_id}
return cls._send_request("DELETE", "/account/", params=params)
return cls._send_request("DELETE", "/account", params=params)
@classmethod
def is_email_in_freeze(cls, email: str) -> bool:

View File

@@ -4,6 +4,7 @@ from pydantic import BaseModel, ConfigDict, Field
from configs import dify_config
from enums.cloud_plan import CloudPlan
from enums.hosted_provider import HostedTrialProvider
from services.billing_service import BillingService
from services.enterprise.enterprise_service import EnterpriseService
@@ -170,6 +171,7 @@ class SystemFeatureModel(BaseModel):
plugin_installation_permission: PluginInstallationPermissionModel = PluginInstallationPermissionModel()
enable_change_email: bool = True
plugin_manager: PluginManagerModel = PluginManagerModel()
trial_models: list[str] = []
enable_trial_app: bool = False
enable_explore_banner: bool = False
@@ -202,7 +204,7 @@ class FeatureService:
return knowledge_rate_limit
@classmethod
def get_system_features(cls) -> SystemFeatureModel:
def get_system_features(cls, is_authenticated: bool = False) -> SystemFeatureModel:
system_features = SystemFeatureModel()
cls._fulfill_system_params_from_env(system_features)
@@ -212,7 +214,7 @@ class FeatureService:
system_features.webapp_auth.enabled = True
system_features.enable_change_email = False
system_features.plugin_manager.enabled = True
cls._fulfill_params_from_enterprise(system_features)
cls._fulfill_params_from_enterprise(system_features, is_authenticated)
if dify_config.MARKETPLACE_ENABLED:
system_features.enable_marketplace = True
@@ -227,9 +229,21 @@ class FeatureService:
system_features.is_allow_register = dify_config.ALLOW_REGISTER
system_features.is_allow_create_workspace = dify_config.ALLOW_CREATE_WORKSPACE
system_features.is_email_setup = dify_config.MAIL_TYPE is not None and dify_config.MAIL_TYPE != ""
system_features.trial_models = cls._fulfill_trial_models_from_env()
system_features.enable_trial_app = dify_config.ENABLE_TRIAL_APP
system_features.enable_explore_banner = dify_config.ENABLE_EXPLORE_BANNER
@classmethod
def _fulfill_trial_models_from_env(cls) -> list[str]:
return [
provider.value
for provider in HostedTrialProvider
if (
getattr(dify_config, f"HOSTED_{provider.config_key}_PAID_ENABLED", False)
and getattr(dify_config, f"HOSTED_{provider.config_key}_TRIAL_ENABLED", False)
)
]
@classmethod
def _fulfill_params_from_env(cls, features: FeatureModel):
features.can_replace_logo = dify_config.CAN_REPLACE_LOGO
@@ -310,7 +324,7 @@ class FeatureService:
features.next_credit_reset_date = billing_info["next_credit_reset_date"]
@classmethod
def _fulfill_params_from_enterprise(cls, features: SystemFeatureModel):
def _fulfill_params_from_enterprise(cls, features: SystemFeatureModel, is_authenticated: bool = False):
enterprise_info = EnterpriseService.get_info()
if "SSOEnforcedForSignin" in enterprise_info:
@@ -347,19 +361,14 @@ class FeatureService:
)
features.webapp_auth.sso_config.protocol = enterprise_info.get("SSOEnforcedForWebProtocol", "")
if "License" in enterprise_info:
license_info = enterprise_info["License"]
if is_authenticated and (license_info := enterprise_info.get("License")):
features.license.status = LicenseStatus(license_info.get("status", LicenseStatus.INACTIVE))
features.license.expired_at = license_info.get("expiredAt", "")
if "status" in license_info:
features.license.status = LicenseStatus(license_info.get("status", LicenseStatus.INACTIVE))
if "expiredAt" in license_info:
features.license.expired_at = license_info["expiredAt"]
if "workspaces" in license_info:
features.license.workspaces.enabled = license_info["workspaces"]["enabled"]
features.license.workspaces.limit = license_info["workspaces"]["limit"]
features.license.workspaces.size = license_info["workspaces"]["used"]
if workspaces_info := license_info.get("workspaces"):
features.license.workspaces.enabled = workspaces_info.get("enabled", False)
features.license.workspaces.limit = workspaces_info.get("limit", 0)
features.license.workspaces.size = workspaces_info.get("used", 0)
if "PluginInstallationPermission" in enterprise_info:
plugin_installation_info = enterprise_info["PluginInstallationPermission"]

View File

@@ -261,10 +261,9 @@ class MessageService:
else:
conversation_override_model_configs = json.loads(conversation.override_model_configs)
app_model_config = AppModelConfig(
id=conversation.app_model_config_id,
app_id=app_model.id,
)
app_model_config.id = conversation.app_model_config_id
app_model_config = app_model_config.from_model_config_dict(conversation_override_model_configs)
if not app_model_config:
raise ValueError("did not find app model config")

View File

@@ -870,15 +870,16 @@ class RagPipelineDslService:
return dependencies
@classmethod
def get_leaked_dependencies(cls, tenant_id: str, dsl_dependencies: list[dict]) -> list[PluginDependency]:
def get_leaked_dependencies(
cls, tenant_id: str, dsl_dependencies: list[PluginDependency]
) -> list[PluginDependency]:
"""
Returns the leaked dependencies in the current workspace
"""
dependencies = [PluginDependency.model_validate(dep) for dep in dsl_dependencies]
if not dependencies:
if not dsl_dependencies:
return []
return DependenciesAnalysisService.get_leaked_dependencies(tenant_id=tenant_id, dependencies=dependencies)
return DependenciesAnalysisService.get_leaked_dependencies(tenant_id=tenant_id, dependencies=dsl_dependencies)
def _generate_aes_key(self, tenant_id: str) -> bytes:
"""Generate AES key based on tenant_id"""

View File

@@ -44,7 +44,7 @@ class RagPipelineTransformService:
doc_form = dataset.doc_form
if not doc_form:
return self._transform_to_empty_pipeline(dataset)
retrieval_model = dataset.retrieval_model
retrieval_model = RetrievalSetting.model_validate(dataset.retrieval_model) if dataset.retrieval_model else None
pipeline_yaml = self._get_transform_yaml(doc_form, datasource_type, indexing_technique)
# deal dependencies
self._deal_dependencies(pipeline_yaml, dataset.tenant_id)
@@ -154,7 +154,12 @@ class RagPipelineTransformService:
return node
def _deal_knowledge_index(
self, dataset: Dataset, doc_form: str, indexing_technique: str | None, retrieval_model: dict, node: dict
self,
dataset: Dataset,
doc_form: str,
indexing_technique: str | None,
retrieval_model: RetrievalSetting | None,
node: dict,
):
knowledge_configuration_dict = node.get("data", {})
knowledge_configuration = KnowledgeConfiguration.model_validate(knowledge_configuration_dict)
@@ -163,10 +168,9 @@ class RagPipelineTransformService:
knowledge_configuration.embedding_model = dataset.embedding_model
knowledge_configuration.embedding_model_provider = dataset.embedding_model_provider
if retrieval_model:
retrieval_setting = RetrievalSetting.model_validate(retrieval_model)
if indexing_technique == "economy":
retrieval_setting.search_method = RetrievalMethod.KEYWORD_SEARCH
knowledge_configuration.retrieval_model = retrieval_setting
retrieval_model.search_method = RetrievalMethod.KEYWORD_SEARCH
knowledge_configuration.retrieval_model = retrieval_model
else:
dataset.retrieval_model = knowledge_configuration.retrieval_model.model_dump()

View File

@@ -0,0 +1 @@
"""Workflow run retention services."""

View File

@@ -0,0 +1,531 @@
"""
Archive Paid Plan Workflow Run Logs Service.
This service archives workflow run logs older than the configured retention period
(default: 90 days) for paid plan users to S3-compatible storage.
Archived tables:
- workflow_runs
- workflow_app_logs
- workflow_node_executions
- workflow_node_execution_offload
- workflow_pauses
- workflow_pause_reasons
- workflow_trigger_logs
"""
import datetime
import io
import json
import logging
import time
import zipfile
from collections.abc import Sequence
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any
import click
from sqlalchemy import inspect
from sqlalchemy.orm import Session, sessionmaker
from configs import dify_config
from core.workflow.enums import WorkflowType
from enums.cloud_plan import CloudPlan
from extensions.ext_database import db
from libs.archive_storage import (
ArchiveStorage,
ArchiveStorageNotConfiguredError,
get_archive_storage,
)
from models.workflow import WorkflowAppLog, WorkflowRun
from repositories.api_workflow_node_execution_repository import DifyAPIWorkflowNodeExecutionRepository
from repositories.api_workflow_run_repository import APIWorkflowRunRepository
from repositories.sqlalchemy_workflow_trigger_log_repository import SQLAlchemyWorkflowTriggerLogRepository
from services.billing_service import BillingService
from services.retention.workflow_run.constants import ARCHIVE_BUNDLE_NAME, ARCHIVE_SCHEMA_VERSION
logger = logging.getLogger(__name__)
@dataclass
class TableStats:
"""Statistics for a single archived table."""
table_name: str
row_count: int
checksum: str
size_bytes: int
@dataclass
class ArchiveResult:
"""Result of archiving a single workflow run."""
run_id: str
tenant_id: str
success: bool
tables: list[TableStats] = field(default_factory=list)
error: str | None = None
elapsed_time: float = 0.0
@dataclass
class ArchiveSummary:
"""Summary of the entire archive operation."""
total_runs_processed: int = 0
runs_archived: int = 0
runs_skipped: int = 0
runs_failed: int = 0
total_elapsed_time: float = 0.0
class WorkflowRunArchiver:
"""
Archive workflow run logs for paid plan users.
Storage Layout:
{tenant_id}/app_id={app_id}/year={YYYY}/month={MM}/workflow_run_id={run_id}/
└── archive.v1.0.zip
├── manifest.json
├── workflow_runs.jsonl
├── workflow_app_logs.jsonl
├── workflow_node_executions.jsonl
├── workflow_node_execution_offload.jsonl
├── workflow_pauses.jsonl
├── workflow_pause_reasons.jsonl
└── workflow_trigger_logs.jsonl
"""
ARCHIVED_TYPE = [
WorkflowType.WORKFLOW,
WorkflowType.RAG_PIPELINE,
]
ARCHIVED_TABLES = [
"workflow_runs",
"workflow_app_logs",
"workflow_node_executions",
"workflow_node_execution_offload",
"workflow_pauses",
"workflow_pause_reasons",
"workflow_trigger_logs",
]
start_from: datetime.datetime | None
end_before: datetime.datetime
def __init__(
self,
days: int = 90,
batch_size: int = 100,
start_from: datetime.datetime | None = None,
end_before: datetime.datetime | None = None,
workers: int = 1,
tenant_ids: Sequence[str] | None = None,
limit: int | None = None,
dry_run: bool = False,
delete_after_archive: bool = False,
workflow_run_repo: APIWorkflowRunRepository | None = None,
):
"""
Initialize the archiver.
Args:
days: Archive runs older than this many days
batch_size: Number of runs to process per batch
start_from: Optional start time (inclusive) for archiving
end_before: Optional end time (exclusive) for archiving
workers: Number of concurrent workflow runs to archive
tenant_ids: Optional tenant IDs for grayscale rollout
limit: Maximum number of runs to archive (None for unlimited)
dry_run: If True, only preview without making changes
delete_after_archive: If True, delete runs and related data after archiving
workflow_run_repo: Optional workflow run repository override; created via the repository factory when omitted
"""
self.days = days
self.batch_size = batch_size
if start_from or end_before:
if start_from is None or end_before is None:
raise ValueError("start_from and end_before must be provided together")
if start_from >= end_before:
raise ValueError("start_from must be earlier than end_before")
self.start_from = start_from.replace(tzinfo=datetime.UTC)
self.end_before = end_before.replace(tzinfo=datetime.UTC)
else:
self.start_from = None
self.end_before = datetime.datetime.now(datetime.UTC) - datetime.timedelta(days=days)
if workers < 1:
raise ValueError("workers must be at least 1")
self.workers = workers
self.tenant_ids = sorted(set(tenant_ids)) if tenant_ids else []
self.limit = limit
self.dry_run = dry_run
self.delete_after_archive = delete_after_archive
self.workflow_run_repo = workflow_run_repo
def run(self) -> ArchiveSummary:
"""
Main archiving loop.
Returns:
ArchiveSummary with statistics about the operation
"""
summary = ArchiveSummary()
start_time = time.time()
click.echo(
click.style(
self._build_start_message(),
fg="white",
)
)
# Initialize archive storage (will raise if not configured)
try:
if not self.dry_run:
storage = get_archive_storage()
else:
storage = None
except ArchiveStorageNotConfiguredError as e:
click.echo(click.style(f"Archive storage not configured: {e}", fg="red"))
return summary
session_maker = sessionmaker(bind=db.engine, expire_on_commit=False)
repo = self._get_workflow_run_repo()
def _archive_with_session(run: WorkflowRun) -> ArchiveResult:
with session_maker() as session:
return self._archive_run(session, storage, run)
last_seen: tuple[datetime.datetime, str] | None = None
archived_count = 0
with ThreadPoolExecutor(max_workers=self.workers) as executor:
while True:
# Check limit
if self.limit and archived_count >= self.limit:
click.echo(click.style(f"Reached limit of {self.limit} runs", fg="yellow"))
break
# Fetch batch of runs
runs = self._get_runs_batch(last_seen)
if not runs:
break
run_ids = [run.id for run in runs]
with session_maker() as session:
archived_run_ids = repo.get_archived_run_ids(session, run_ids)
last_seen = (runs[-1].created_at, runs[-1].id)
# Filter to paid tenants only
tenant_ids = {run.tenant_id for run in runs}
paid_tenants = self._filter_paid_tenants(tenant_ids)
runs_to_process: list[WorkflowRun] = []
for run in runs:
summary.total_runs_processed += 1
# Skip non-paid tenants
if run.tenant_id not in paid_tenants:
summary.runs_skipped += 1
continue
# Skip already archived runs
if run.id in archived_run_ids:
summary.runs_skipped += 1
continue
# Check limit
if self.limit and archived_count + len(runs_to_process) >= self.limit:
break
runs_to_process.append(run)
if not runs_to_process:
continue
results = list(executor.map(_archive_with_session, runs_to_process))
for run, result in zip(runs_to_process, results):
if result.success:
summary.runs_archived += 1
archived_count += 1
click.echo(
click.style(
f"{'[DRY RUN] Would archive' if self.dry_run else 'Archived'} "
f"run {run.id} (tenant={run.tenant_id}, "
f"tables={len(result.tables)}, time={result.elapsed_time:.2f}s)",
fg="green",
)
)
else:
summary.runs_failed += 1
click.echo(
click.style(
f"Failed to archive run {run.id}: {result.error}",
fg="red",
)
)
summary.total_elapsed_time = time.time() - start_time
click.echo(
click.style(
f"{'[DRY RUN] ' if self.dry_run else ''}Archive complete: "
f"processed={summary.total_runs_processed}, archived={summary.runs_archived}, "
f"skipped={summary.runs_skipped}, failed={summary.runs_failed}, "
f"time={summary.total_elapsed_time:.2f}s",
fg="white",
)
)
return summary
def _get_runs_batch(
self,
last_seen: tuple[datetime.datetime, str] | None,
) -> Sequence[WorkflowRun]:
"""Fetch a batch of workflow runs to archive."""
repo = self._get_workflow_run_repo()
return repo.get_runs_batch_by_time_range(
start_from=self.start_from,
end_before=self.end_before,
last_seen=last_seen,
batch_size=self.batch_size,
run_types=self.ARCHIVED_TYPE,
tenant_ids=self.tenant_ids or None,
)
def _build_start_message(self) -> str:
range_desc = f"before {self.end_before.isoformat()}"
if self.start_from:
range_desc = f"between {self.start_from.isoformat()} and {self.end_before.isoformat()}"
return (
f"{'[DRY RUN] ' if self.dry_run else ''}Starting workflow run archiving "
f"for runs {range_desc} "
f"(batch_size={self.batch_size}, tenant_ids={','.join(self.tenant_ids) or 'all'})"
)
def _filter_paid_tenants(self, tenant_ids: set[str]) -> set[str]:
"""Filter tenant IDs to only include paid tenants."""
if not dify_config.BILLING_ENABLED:
# If billing is not enabled, treat all tenants as paid
return tenant_ids
if not tenant_ids:
return set()
try:
bulk_info = BillingService.get_plan_bulk_with_cache(list(tenant_ids))
except Exception:
logger.exception("Failed to fetch billing plans for tenants")
# On error, skip all tenants in this batch
return set()
# Filter to paid tenants (PROFESSIONAL or TEAM plans)
paid = set()
for tid, info in bulk_info.items():
if info and info.get("plan") in (CloudPlan.PROFESSIONAL, CloudPlan.TEAM):
paid.add(tid)
return paid
def _archive_run(
self,
session: Session,
storage: ArchiveStorage | None,
run: WorkflowRun,
) -> ArchiveResult:
"""Archive a single workflow run."""
start_time = time.time()
result = ArchiveResult(run_id=run.id, tenant_id=run.tenant_id, success=False)
try:
# Extract data from all tables
table_data, app_logs, trigger_metadata = self._extract_data(session, run)
if self.dry_run:
# In dry run, just report what would be archived
for table_name in self.ARCHIVED_TABLES:
records = table_data.get(table_name, [])
result.tables.append(
TableStats(
table_name=table_name,
row_count=len(records),
checksum="",
size_bytes=0,
)
)
result.success = True
else:
if storage is None:
raise ArchiveStorageNotConfiguredError("Archive storage not configured")
archive_key = self._get_archive_key(run)
# Serialize tables for the archive bundle
table_stats: list[TableStats] = []
table_payloads: dict[str, bytes] = {}
for table_name in self.ARCHIVED_TABLES:
records = table_data.get(table_name, [])
data = ArchiveStorage.serialize_to_jsonl(records)
table_payloads[table_name] = data
checksum = ArchiveStorage.compute_checksum(data)
table_stats.append(
TableStats(
table_name=table_name,
row_count=len(records),
checksum=checksum,
size_bytes=len(data),
)
)
# Generate and upload archive bundle
manifest = self._generate_manifest(run, table_stats)
manifest_data = json.dumps(manifest, indent=2, default=str).encode("utf-8")
archive_data = self._build_archive_bundle(manifest_data, table_payloads)
storage.put_object(archive_key, archive_data)
repo = self._get_workflow_run_repo()
archived_log_count = repo.create_archive_logs(session, run, app_logs, trigger_metadata)
session.commit()
deleted_counts = None
if self.delete_after_archive:
deleted_counts = repo.delete_runs_with_related(
[run],
delete_node_executions=self._delete_node_executions,
delete_trigger_logs=self._delete_trigger_logs,
)
logger.info(
"Archived workflow run %s: tables=%s, archived_logs=%s, deleted=%s",
run.id,
{s.table_name: s.row_count for s in table_stats},
archived_log_count,
deleted_counts,
)
result.tables = table_stats
result.success = True
except Exception as e:
logger.exception("Failed to archive workflow run %s", run.id)
result.error = str(e)
session.rollback()
result.elapsed_time = time.time() - start_time
return result
def _extract_data(
self,
session: Session,
run: WorkflowRun,
) -> tuple[dict[str, list[dict[str, Any]]], Sequence[WorkflowAppLog], str | None]:
table_data: dict[str, list[dict[str, Any]]] = {}
table_data["workflow_runs"] = [self._row_to_dict(run)]
repo = self._get_workflow_run_repo()
app_logs = repo.get_app_logs_by_run_id(session, run.id)
table_data["workflow_app_logs"] = [self._row_to_dict(row) for row in app_logs]
node_exec_repo = self._get_workflow_node_execution_repo(session)
node_exec_records = node_exec_repo.get_executions_by_workflow_run(
tenant_id=run.tenant_id,
app_id=run.app_id,
workflow_run_id=run.id,
)
node_exec_ids = [record.id for record in node_exec_records]
offload_records = node_exec_repo.get_offloads_by_execution_ids(session, node_exec_ids)
table_data["workflow_node_executions"] = [self._row_to_dict(row) for row in node_exec_records]
table_data["workflow_node_execution_offload"] = [self._row_to_dict(row) for row in offload_records]
repo = self._get_workflow_run_repo()
pause_records = repo.get_pause_records_by_run_id(session, run.id)
pause_ids = [pause.id for pause in pause_records]
pause_reason_records = repo.get_pause_reason_records_by_run_id(
session,
pause_ids,
)
table_data["workflow_pauses"] = [self._row_to_dict(row) for row in pause_records]
table_data["workflow_pause_reasons"] = [self._row_to_dict(row) for row in pause_reason_records]
trigger_repo = SQLAlchemyWorkflowTriggerLogRepository(session)
trigger_records = trigger_repo.list_by_run_id(run.id)
table_data["workflow_trigger_logs"] = [self._row_to_dict(row) for row in trigger_records]
trigger_metadata = trigger_records[0].trigger_metadata if trigger_records else None
return table_data, app_logs, trigger_metadata
@staticmethod
def _row_to_dict(row: Any) -> dict[str, Any]:
mapper = inspect(row).mapper
return {str(column.name): getattr(row, mapper.get_property_by_column(column).key) for column in mapper.columns}
def _get_archive_key(self, run: WorkflowRun) -> str:
"""Get the storage key for the archive bundle."""
created_at = run.created_at
prefix = (
f"{run.tenant_id}/app_id={run.app_id}/year={created_at.strftime('%Y')}/"
f"month={created_at.strftime('%m')}/workflow_run_id={run.id}"
)
return f"{prefix}/{ARCHIVE_BUNDLE_NAME}"
def _generate_manifest(
self,
run: WorkflowRun,
table_stats: list[TableStats],
) -> dict[str, Any]:
"""Generate a manifest for the archived workflow run."""
return {
"schema_version": ARCHIVE_SCHEMA_VERSION,
"workflow_run_id": run.id,
"tenant_id": run.tenant_id,
"app_id": run.app_id,
"workflow_id": run.workflow_id,
"created_at": run.created_at.isoformat(),
"archived_at": datetime.datetime.now(datetime.UTC).isoformat(),
"tables": {
stat.table_name: {
"row_count": stat.row_count,
"checksum": stat.checksum,
"size_bytes": stat.size_bytes,
}
for stat in table_stats
},
}
def _build_archive_bundle(self, manifest_data: bytes, table_payloads: dict[str, bytes]) -> bytes:
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
archive.writestr("manifest.json", manifest_data)
for table_name in self.ARCHIVED_TABLES:
data = table_payloads.get(table_name)
if data is None:
raise ValueError(f"Missing archive payload for {table_name}")
archive.writestr(f"{table_name}.jsonl", data)
return buffer.getvalue()
def _delete_trigger_logs(self, session: Session, run_ids: Sequence[str]) -> int:
trigger_repo = SQLAlchemyWorkflowTriggerLogRepository(session)
return trigger_repo.delete_by_run_ids(run_ids)
def _delete_node_executions(self, session: Session, runs: Sequence[WorkflowRun]) -> tuple[int, int]:
run_ids = [run.id for run in runs]
return self._get_workflow_node_execution_repo(session).delete_by_runs(session, run_ids)
def _get_workflow_node_execution_repo(
self,
session: Session,
) -> DifyAPIWorkflowNodeExecutionRepository:
from repositories.factory import DifyAPIRepositoryFactory
session_maker = sessionmaker(bind=session.get_bind(), expire_on_commit=False)
return DifyAPIRepositoryFactory.create_api_workflow_node_execution_repository(session_maker)
def _get_workflow_run_repo(self) -> APIWorkflowRunRepository:
if self.workflow_run_repo is not None:
return self.workflow_run_repo
from repositories.factory import DifyAPIRepositoryFactory
session_maker = sessionmaker(bind=db.engine, expire_on_commit=False)
self.workflow_run_repo = DifyAPIRepositoryFactory.create_api_workflow_run_repository(session_maker)
return self.workflow_run_repo
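As a usage sketch (not part of the diff above), the archiver could be driven from a maintenance script roughly as follows; only the constructor arguments and ArchiveSummary fields shown in the diff are taken from the source, the surrounding wiring is an assumption:
# Editorial sketch; all values are placeholders.
archiver = WorkflowRunArchiver(
    days=90,                     # archive runs older than 90 days
    batch_size=100,              # fetch candidate runs in batches of 100
    workers=4,                   # archive up to 4 runs concurrently
    tenant_ids=None,             # None means all paid tenants
    limit=1000,                  # stop after 1000 archived runs
    dry_run=True,                # preview only: nothing is uploaded or deleted
    delete_after_archive=False,  # keep source rows even after archiving
)
summary = archiver.run()
print(summary.runs_archived, summary.runs_skipped, summary.runs_failed)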

View File

@@ -0,0 +1,2 @@
ARCHIVE_SCHEMA_VERSION = "1.0"
ARCHIVE_BUNDLE_NAME = f"archive.v{ARCHIVE_SCHEMA_VERSION}.zip"
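For reference, combining these constants with the storage layout documented in WorkflowRunArchiver yields object keys of the following shape; the tenant, app, and run IDs below are placeholders, not real values:
# Editorial sketch: ARCHIVE_BUNDLE_NAME resolves to "archive.v1.0.zip".
key = f"tenant-123/app_id=app-456/year=2025/month=10/workflow_run_id=run-789/{ARCHIVE_BUNDLE_NAME}"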

View File

@@ -0,0 +1,134 @@
"""
Delete Archived Workflow Run Service.
This service deletes archived workflow run data from the database while keeping
archive logs intact.
"""
import time
from collections.abc import Sequence
from dataclasses import dataclass, field
from datetime import datetime
from sqlalchemy.orm import Session, sessionmaker
from extensions.ext_database import db
from models.workflow import WorkflowRun
from repositories.api_workflow_run_repository import APIWorkflowRunRepository
from repositories.sqlalchemy_workflow_trigger_log_repository import SQLAlchemyWorkflowTriggerLogRepository
@dataclass
class DeleteResult:
run_id: str
tenant_id: str
success: bool
deleted_counts: dict[str, int] = field(default_factory=dict)
error: str | None = None
elapsed_time: float = 0.0
class ArchivedWorkflowRunDeletion:
def __init__(self, dry_run: bool = False):
self.dry_run = dry_run
self.workflow_run_repo: APIWorkflowRunRepository | None = None
def delete_by_run_id(self, run_id: str) -> DeleteResult:
start_time = time.time()
result = DeleteResult(run_id=run_id, tenant_id="", success=False)
repo = self._get_workflow_run_repo()
session_maker = sessionmaker(bind=db.engine, expire_on_commit=False)
with session_maker() as session:
run = session.get(WorkflowRun, run_id)
if not run:
result.error = f"Workflow run {run_id} not found"
result.elapsed_time = time.time() - start_time
return result
result.tenant_id = run.tenant_id
if not repo.get_archived_run_ids(session, [run.id]):
result.error = f"Workflow run {run_id} is not archived"
result.elapsed_time = time.time() - start_time
return result
result = self._delete_run(run)
result.elapsed_time = time.time() - start_time
return result
def delete_batch(
self,
tenant_ids: list[str] | None,
start_date: datetime,
end_date: datetime,
limit: int = 100,
) -> list[DeleteResult]:
session_maker = sessionmaker(bind=db.engine, expire_on_commit=False)
results: list[DeleteResult] = []
repo = self._get_workflow_run_repo()
with session_maker() as session:
runs = list(
repo.get_archived_runs_by_time_range(
session=session,
tenant_ids=tenant_ids,
start_date=start_date,
end_date=end_date,
limit=limit,
)
)
for run in runs:
results.append(self._delete_run(run))
return results
def _delete_run(self, run: WorkflowRun) -> DeleteResult:
start_time = time.time()
result = DeleteResult(run_id=run.id, tenant_id=run.tenant_id, success=False)
if self.dry_run:
result.success = True
result.elapsed_time = time.time() - start_time
return result
repo = self._get_workflow_run_repo()
try:
deleted_counts = repo.delete_runs_with_related(
[run],
delete_node_executions=self._delete_node_executions,
delete_trigger_logs=self._delete_trigger_logs,
)
result.deleted_counts = deleted_counts
result.success = True
except Exception as e:
result.error = str(e)
result.elapsed_time = time.time() - start_time
return result
@staticmethod
def _delete_trigger_logs(session: Session, run_ids: Sequence[str]) -> int:
trigger_repo = SQLAlchemyWorkflowTriggerLogRepository(session)
return trigger_repo.delete_by_run_ids(run_ids)
@staticmethod
def _delete_node_executions(
session: Session,
runs: Sequence[WorkflowRun],
) -> tuple[int, int]:
from repositories.factory import DifyAPIRepositoryFactory
run_ids = [run.id for run in runs]
repo = DifyAPIRepositoryFactory.create_api_workflow_node_execution_repository(
session_maker=sessionmaker(bind=session.get_bind(), expire_on_commit=False)
)
return repo.delete_by_runs(session, run_ids)
def _get_workflow_run_repo(self) -> APIWorkflowRunRepository:
if self.workflow_run_repo is not None:
return self.workflow_run_repo
from repositories.factory import DifyAPIRepositoryFactory
self.workflow_run_repo = DifyAPIRepositoryFactory.create_api_workflow_run_repository(
sessionmaker(bind=db.engine, expire_on_commit=False)
)
return self.workflow_run_repo
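A minimal usage sketch for the deletion service, assuming it is invoked from an operational script; the run ID and dates are placeholders:
from datetime import datetime, timezone

# Preview the deletion of a single archived run.
deleter = ArchivedWorkflowRunDeletion(dry_run=True)
preview = deleter.delete_by_run_id("run-789")  # placeholder run ID

# Delete archived runs for all tenants within a date window, 100 at a time.
deleter = ArchivedWorkflowRunDeletion(dry_run=False)
results = deleter.delete_batch(
    tenant_ids=None,
    start_date=datetime(2025, 1, 1, tzinfo=timezone.utc),
    end_date=datetime(2025, 4, 1, tzinfo=timezone.utc),
    limit=100,
)
failed = [r for r in results if not r.success]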

View File

@@ -0,0 +1,481 @@
"""
Restore Archived Workflow Run Service.
This service restores archived workflow run data from S3-compatible storage
back to the database.
"""
import io
import json
import logging
import time
import zipfile
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime
from typing import Any, cast
import click
from sqlalchemy.dialects.postgresql import insert as pg_insert
from sqlalchemy.engine import CursorResult
from sqlalchemy.orm import DeclarativeBase, Session, sessionmaker
from extensions.ext_database import db
from libs.archive_storage import (
ArchiveStorage,
ArchiveStorageNotConfiguredError,
get_archive_storage,
)
from models.trigger import WorkflowTriggerLog
from models.workflow import (
WorkflowAppLog,
WorkflowArchiveLog,
WorkflowNodeExecutionModel,
WorkflowNodeExecutionOffload,
WorkflowPause,
WorkflowPauseReason,
WorkflowRun,
)
from repositories.api_workflow_run_repository import APIWorkflowRunRepository
from repositories.factory import DifyAPIRepositoryFactory
from services.retention.workflow_run.constants import ARCHIVE_BUNDLE_NAME
logger = logging.getLogger(__name__)
# Mapping of table names to SQLAlchemy models
TABLE_MODELS = {
"workflow_runs": WorkflowRun,
"workflow_app_logs": WorkflowAppLog,
"workflow_node_executions": WorkflowNodeExecutionModel,
"workflow_node_execution_offload": WorkflowNodeExecutionOffload,
"workflow_pauses": WorkflowPause,
"workflow_pause_reasons": WorkflowPauseReason,
"workflow_trigger_logs": WorkflowTriggerLog,
}
SchemaMapper = Callable[[dict[str, Any]], dict[str, Any]]
SCHEMA_MAPPERS: dict[str, dict[str, SchemaMapper]] = {
"1.0": {},
}
@dataclass
class RestoreResult:
"""Result of restoring a single workflow run."""
run_id: str
tenant_id: str
success: bool
restored_counts: dict[str, int]
error: str | None = None
elapsed_time: float = 0.0
class WorkflowRunRestore:
"""
Restore archived workflow run data from storage to database.
This service reads archived data from storage and restores it to the
database tables. It handles idempotency by skipping records that already
exist in the database.
"""
def __init__(self, dry_run: bool = False, workers: int = 1):
"""
Initialize the restore service.
Args:
dry_run: If True, only preview without making changes
workers: Number of concurrent workflow runs to restore
"""
self.dry_run = dry_run
if workers < 1:
raise ValueError("workers must be at least 1")
self.workers = workers
self.workflow_run_repo: APIWorkflowRunRepository | None = None
def _restore_from_run(
self,
run: WorkflowRun | WorkflowArchiveLog,
*,
session_maker: sessionmaker,
) -> RestoreResult:
start_time = time.time()
run_id = run.workflow_run_id if isinstance(run, WorkflowArchiveLog) else run.id
created_at = run.run_created_at if isinstance(run, WorkflowArchiveLog) else run.created_at
result = RestoreResult(
run_id=run_id,
tenant_id=run.tenant_id,
success=False,
restored_counts={},
)
if not self.dry_run:
click.echo(
click.style(
f"Starting restore for workflow run {run_id} (tenant={run.tenant_id})",
fg="white",
)
)
try:
storage = get_archive_storage()
except ArchiveStorageNotConfiguredError as e:
result.error = str(e)
click.echo(click.style(f"Archive storage not configured: {e}", fg="red"))
result.elapsed_time = time.time() - start_time
return result
prefix = (
f"{run.tenant_id}/app_id={run.app_id}/year={created_at.strftime('%Y')}/"
f"month={created_at.strftime('%m')}/workflow_run_id={run_id}"
)
archive_key = f"{prefix}/{ARCHIVE_BUNDLE_NAME}"
try:
archive_data = storage.get_object(archive_key)
except FileNotFoundError:
result.error = f"Archive bundle not found: {archive_key}"
click.echo(click.style(result.error, fg="red"))
result.elapsed_time = time.time() - start_time
return result
with session_maker() as session:
try:
with zipfile.ZipFile(io.BytesIO(archive_data), mode="r") as archive:
try:
manifest = self._load_manifest_from_zip(archive)
except ValueError as e:
result.error = f"Archive bundle invalid: {e}"
click.echo(click.style(result.error, fg="red"))
return result
tables = manifest.get("tables", {})
schema_version = self._get_schema_version(manifest)
for table_name, info in tables.items():
row_count = info.get("row_count", 0)
if row_count == 0:
result.restored_counts[table_name] = 0
continue
if self.dry_run:
result.restored_counts[table_name] = row_count
continue
member_path = f"{table_name}.jsonl"
try:
data = archive.read(member_path)
except KeyError:
click.echo(
click.style(
f" Warning: Table data not found in archive: {member_path}",
fg="yellow",
)
)
result.restored_counts[table_name] = 0
continue
records = ArchiveStorage.deserialize_from_jsonl(data)
restored = self._restore_table_records(
session,
table_name,
records,
schema_version=schema_version,
)
result.restored_counts[table_name] = restored
if not self.dry_run:
click.echo(
click.style(
f" Restored {restored}/{len(records)} records to {table_name}",
fg="white",
)
)
# Verify row counts match manifest
manifest_total = sum(info.get("row_count", 0) for info in tables.values())
restored_total = sum(result.restored_counts.values())
if not self.dry_run:
# Note: restored count might be less than manifest count if records already exist
logger.info(
"Restore verification: manifest_total=%d, restored_total=%d",
manifest_total,
restored_total,
)
# Delete the archive log record after successful restore
repo = self._get_workflow_run_repo()
repo.delete_archive_log_by_run_id(session, run_id)
session.commit()
result.success = True
if not self.dry_run:
click.echo(
click.style(
f"Completed restore for workflow run {run_id}: restored={result.restored_counts}",
fg="green",
)
)
except Exception as e:
logger.exception("Failed to restore workflow run %s", run_id)
result.error = str(e)
session.rollback()
click.echo(click.style(f"Restore failed: {e}", fg="red"))
result.elapsed_time = time.time() - start_time
return result
def _get_workflow_run_repo(self) -> APIWorkflowRunRepository:
if self.workflow_run_repo is not None:
return self.workflow_run_repo
self.workflow_run_repo = DifyAPIRepositoryFactory.create_api_workflow_run_repository(
sessionmaker(bind=db.engine, expire_on_commit=False)
)
return self.workflow_run_repo
@staticmethod
def _load_manifest_from_zip(archive: zipfile.ZipFile) -> dict[str, Any]:
try:
data = archive.read("manifest.json")
except KeyError as e:
raise ValueError("manifest.json missing from archive bundle") from e
return json.loads(data.decode("utf-8"))
def _restore_table_records(
self,
session: Session,
table_name: str,
records: list[dict[str, Any]],
*,
schema_version: str,
) -> int:
"""
Restore records to a table.
Uses INSERT ... ON CONFLICT DO NOTHING for idempotency.
Args:
session: Database session
table_name: Name of the table
records: List of record dictionaries
schema_version: Archived schema version from manifest
Returns:
Number of records actually inserted
"""
if not records:
return 0
model = TABLE_MODELS.get(table_name)
if not model:
logger.warning("Unknown table: %s", table_name)
return 0
column_names, required_columns, non_nullable_with_default = self._get_model_column_info(model)
unknown_fields: set[str] = set()
# Apply schema mapping, filter to current columns, then convert datetimes
converted_records = []
for record in records:
mapped = self._apply_schema_mapping(table_name, schema_version, record)
unknown_fields.update(set(mapped.keys()) - column_names)
filtered = {key: value for key, value in mapped.items() if key in column_names}
for key in non_nullable_with_default:
if key in filtered and filtered[key] is None:
filtered.pop(key)
missing_required = [key for key in required_columns if key not in filtered or filtered.get(key) is None]
if missing_required:
missing_cols = ", ".join(sorted(missing_required))
raise ValueError(
f"Missing required columns for {table_name} (schema_version={schema_version}): {missing_cols}"
)
converted = self._convert_datetime_fields(filtered, model)
converted_records.append(converted)
if unknown_fields:
logger.warning(
"Dropped unknown columns for %s (schema_version=%s): %s",
table_name,
schema_version,
", ".join(sorted(unknown_fields)),
)
# Use INSERT ... ON CONFLICT DO NOTHING for idempotency
stmt = pg_insert(model).values(converted_records)
stmt = stmt.on_conflict_do_nothing(index_elements=["id"])
result = session.execute(stmt)
return cast(CursorResult, result).rowcount or 0
def _convert_datetime_fields(
self,
record: dict[str, Any],
model: type[DeclarativeBase] | Any,
) -> dict[str, Any]:
"""Convert ISO datetime strings to datetime objects."""
from sqlalchemy import DateTime
result = dict(record)
for column in model.__table__.columns:
if isinstance(column.type, DateTime):
value = result.get(column.key)
if isinstance(value, str):
try:
result[column.key] = datetime.fromisoformat(value)
except ValueError:
pass
return result
def _get_schema_version(self, manifest: dict[str, Any]) -> str:
schema_version = manifest.get("schema_version")
if not schema_version:
logger.warning("Manifest missing schema_version; defaulting to 1.0")
schema_version = "1.0"
schema_version = str(schema_version)
if schema_version not in SCHEMA_MAPPERS:
raise ValueError(f"Unsupported schema_version {schema_version}. Add a mapping before restoring.")
return schema_version
def _apply_schema_mapping(
self,
table_name: str,
schema_version: str,
record: dict[str, Any],
) -> dict[str, Any]:
# Keep hook for forward/backward compatibility when schema evolves.
mapper = SCHEMA_MAPPERS.get(schema_version, {}).get(table_name)
if mapper is None:
return dict(record)
return mapper(record)
def _get_model_column_info(
self,
model: type[DeclarativeBase] | Any,
) -> tuple[set[str], set[str], set[str]]:
columns = list(model.__table__.columns)
column_names = {column.key for column in columns}
required_columns = {
column.key
for column in columns
if not column.nullable
and column.default is None
and column.server_default is None
and not column.autoincrement
}
non_nullable_with_default = {
column.key
for column in columns
if not column.nullable
and (column.default is not None or column.server_default is not None or column.autoincrement)
}
return column_names, required_columns, non_nullable_with_default
def restore_batch(
self,
tenant_ids: list[str] | None,
start_date: datetime,
end_date: datetime,
limit: int = 100,
) -> list[RestoreResult]:
"""
Restore multiple workflow runs by time range.
Args:
tenant_ids: Optional tenant IDs
start_date: Start date filter
end_date: End date filter
limit: Maximum number of runs to restore (default: 100)
Returns:
List of RestoreResult objects
"""
results: list[RestoreResult] = []
if tenant_ids is not None and not tenant_ids:
return results
session_maker = sessionmaker(bind=db.engine, expire_on_commit=False)
repo = self._get_workflow_run_repo()
with session_maker() as session:
archive_logs = repo.get_archived_logs_by_time_range(
session=session,
tenant_ids=tenant_ids,
start_date=start_date,
end_date=end_date,
limit=limit,
)
click.echo(
click.style(
f"Found {len(archive_logs)} archived workflow runs to restore",
fg="white",
)
)
def _restore_with_session(archive_log: WorkflowArchiveLog) -> RestoreResult:
return self._restore_from_run(
archive_log,
session_maker=session_maker,
)
with ThreadPoolExecutor(max_workers=self.workers) as executor:
results = list(executor.map(_restore_with_session, archive_logs))
total_counts: dict[str, int] = {}
for result in results:
for table_name, count in result.restored_counts.items():
total_counts[table_name] = total_counts.get(table_name, 0) + count
success_count = sum(1 for result in results if result.success)
if self.dry_run:
click.echo(
click.style(
f"[DRY RUN] Would restore {len(results)} workflow runs: totals={total_counts}",
fg="yellow",
)
)
else:
click.echo(
click.style(
f"Restored {success_count}/{len(results)} workflow runs: totals={total_counts}",
fg="green",
)
)
return results
def restore_by_run_id(
self,
run_id: str,
) -> RestoreResult:
"""
Restore a single workflow run by run ID.
"""
repo = self._get_workflow_run_repo()
archive_log = repo.get_archived_log_by_run_id(run_id)
if not archive_log:
click.echo(click.style(f"Workflow run archive {run_id} not found", fg="red"))
return RestoreResult(
run_id=run_id,
tenant_id="",
success=False,
restored_counts={},
error=f"Workflow run archive {run_id} not found",
)
session_maker = sessionmaker(bind=db.engine, expire_on_commit=False)
result = self._restore_from_run(archive_log, session_maker=session_maker)
if self.dry_run and result.success:
click.echo(
click.style(
f"[DRY RUN] Would restore workflow run {run_id}: totals={result.restored_counts}",
fg="yellow",
)
)
return result
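Two editorial sketches for the restore service, assuming it is called from an operational script. The first mirrors the methods above; the second illustrates the SCHEMA_MAPPERS hook with a hypothetical future schema version and a hypothetical column rename, neither of which exists in this diff:
# Restore a single run, or a batch by time range (arguments mirror the methods above).
restorer = WorkflowRunRestore(dry_run=False, workers=2)
result = restorer.restore_by_run_id("run-789")  # placeholder run ID

# Hypothetical forward-compatibility mapper: if a future schema "1.1" stored a renamed
# workflow_runs column, restore would translate archived records back to current names.
def _map_workflow_runs_v1_1(record: dict) -> dict:
    record = dict(record)
    if "old_column" in record:  # hypothetical archived field name
        record["new_column"] = record.pop("old_column")  # hypothetical current field name
    return record

SCHEMA_MAPPERS["1.1"] = {"workflow_runs": _map_workflow_runs_v1_1}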

View File

@@ -7,7 +7,7 @@ from sqlalchemy import and_, func, or_, select
from sqlalchemy.orm import Session
from core.workflow.enums import WorkflowExecutionStatus
from models import Account, App, EndUser, WorkflowAppLog, WorkflowRun
from models import Account, App, EndUser, WorkflowAppLog, WorkflowArchiveLog, WorkflowRun
from models.enums import AppTriggerType, CreatorUserRole
from models.trigger import WorkflowTriggerLog
from services.plugin.plugin_service import PluginService
@@ -173,7 +173,80 @@ class WorkflowAppService:
"data": items,
}
def handle_trigger_metadata(self, tenant_id: str, meta_val: str) -> dict[str, Any]:
def get_paginate_workflow_archive_logs(
self,
*,
session: Session,
app_model: App,
page: int = 1,
limit: int = 20,
):
"""
Get paginated workflow archive logs using the SQLAlchemy 2.0 style.
"""
stmt = select(WorkflowArchiveLog).where(
WorkflowArchiveLog.tenant_id == app_model.tenant_id,
WorkflowArchiveLog.app_id == app_model.id,
WorkflowArchiveLog.log_id.isnot(None),
)
stmt = stmt.order_by(WorkflowArchiveLog.run_created_at.desc())
count_stmt = select(func.count()).select_from(stmt.subquery())
total = session.scalar(count_stmt) or 0
offset_stmt = stmt.offset((page - 1) * limit).limit(limit)
logs = list(session.scalars(offset_stmt).all())
account_ids = {log.created_by for log in logs if log.created_by_role == CreatorUserRole.ACCOUNT}
end_user_ids = {log.created_by for log in logs if log.created_by_role == CreatorUserRole.END_USER}
accounts_by_id = {}
if account_ids:
accounts_by_id = {
account.id: account
for account in session.scalars(select(Account).where(Account.id.in_(account_ids))).all()
}
end_users_by_id = {}
if end_user_ids:
end_users_by_id = {
end_user.id: end_user
for end_user in session.scalars(select(EndUser).where(EndUser.id.in_(end_user_ids))).all()
}
items = []
for log in logs:
if log.created_by_role == CreatorUserRole.ACCOUNT:
created_by_account = accounts_by_id.get(log.created_by)
created_by_end_user = None
elif log.created_by_role == CreatorUserRole.END_USER:
created_by_account = None
created_by_end_user = end_users_by_id.get(log.created_by)
else:
created_by_account = None
created_by_end_user = None
items.append(
{
"id": log.id,
"workflow_run": log.workflow_run_summary,
"trigger_metadata": self.handle_trigger_metadata(app_model.tenant_id, log.trigger_metadata),
"created_by_account": created_by_account,
"created_by_end_user": created_by_end_user,
"created_at": log.log_created_at,
}
)
return {
"page": page,
"limit": limit,
"total": total,
"has_more": total > page * limit,
"data": items,
}
def handle_trigger_metadata(self, tenant_id: str, meta_val: str | None) -> dict[str, Any]:
metadata: dict[str, Any] | None = self._safe_json_loads(meta_val)
if not metadata:
return {}
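A usage sketch for the archive-log pagination added above; how the service instance and the App model are obtained is an assumption and not part of this diff:
# Editorial sketch; service instantiation and app_model resolution are assumptions.
with Session(db.engine) as session:
    page_data = WorkflowAppService().get_paginate_workflow_archive_logs(
        session=session,
        app_model=app_model,  # an App instance resolved by the caller
        page=1,
        limit=20,
    )
# has_more uses the same convention as the surrounding pagination code: total > page * limit.
if page_data["has_more"]:
    next_page = page_data["page"] + 1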

View File

@@ -17,7 +17,7 @@ logger = logging.getLogger(__name__)
RETRY_TIMES_OF_ONE_PLUGIN_IN_ONE_TENANT = 3
CACHE_REDIS_KEY_PREFIX = "plugin_autoupgrade_check_task:cached_plugin_manifests:"
CACHE_REDIS_TTL = 60 * 15 # 15 minutes
CACHE_REDIS_TTL = 60 * 60 # 1 hour
def _get_redis_cache_key(plugin_id: str) -> str:

View File

@@ -11,8 +11,10 @@ from sqlalchemy.engine import CursorResult
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.orm import sessionmaker
from configs import dify_config
from core.db.session_factory import session_factory
from extensions.ext_database import db
from libs.archive_storage import ArchiveStorageNotConfiguredError, get_archive_storage
from models import (
ApiToken,
AppAnnotationHitHistory,
@@ -43,6 +45,7 @@ from models.workflow import (
ConversationVariable,
Workflow,
WorkflowAppLog,
WorkflowArchiveLog,
)
from repositories.factory import DifyAPIRepositoryFactory
@@ -67,6 +70,9 @@ def remove_app_and_related_data_task(self, tenant_id: str, app_id: str):
_delete_app_workflow_runs(tenant_id, app_id)
_delete_app_workflow_node_executions(tenant_id, app_id)
_delete_app_workflow_app_logs(tenant_id, app_id)
if dify_config.BILLING_ENABLED and dify_config.ARCHIVE_STORAGE_ENABLED:
_delete_app_workflow_archive_logs(tenant_id, app_id)
_delete_archived_workflow_run_files(tenant_id, app_id)
_delete_app_conversations(tenant_id, app_id)
_delete_app_messages(tenant_id, app_id)
_delete_workflow_tool_providers(tenant_id, app_id)
@@ -252,6 +258,45 @@ def _delete_app_workflow_app_logs(tenant_id: str, app_id: str):
)
def _delete_app_workflow_archive_logs(tenant_id: str, app_id: str):
def del_workflow_archive_log(workflow_archive_log_id: str):
db.session.query(WorkflowArchiveLog).where(WorkflowArchiveLog.id == workflow_archive_log_id).delete(
synchronize_session=False
)
_delete_records(
"""select id from workflow_archive_logs where tenant_id=:tenant_id and app_id=:app_id limit 1000""",
{"tenant_id": tenant_id, "app_id": app_id},
del_workflow_archive_log,
"workflow archive log",
)
def _delete_archived_workflow_run_files(tenant_id: str, app_id: str):
prefix = f"{tenant_id}/app_id={app_id}/"
try:
archive_storage = get_archive_storage()
except ArchiveStorageNotConfiguredError as e:
logger.info("Archive storage not configured, skipping archive file cleanup: %s", e)
return
try:
keys = archive_storage.list_objects(prefix)
except Exception:
logger.exception("Failed to list archive files for app %s", app_id)
return
deleted = 0
for key in keys:
try:
archive_storage.delete_object(key)
deleted += 1
except Exception:
logger.exception("Failed to delete archive object %s", key)
logger.info("Deleted %s archive objects for app %s", deleted, app_id)
def _delete_app_conversations(tenant_id: str, app_id: str):
def del_conversation(session, conversation_id: str):
session.query(PinnedConversation).where(PinnedConversation.conversation_id == conversation_id).delete(

View File

@@ -172,7 +172,6 @@ class TestAgentService:
# Create app model config
app_model_config = AppModelConfig(
id=fake.uuid4(),
app_id=app.id,
provider="openai",
model_id="gpt-3.5-turbo",
@@ -180,6 +179,7 @@ class TestAgentService:
model="gpt-3.5-turbo",
agent_mode=json.dumps({"enabled": True, "strategy": "react", "tools": []}),
)
app_model_config.id = fake.uuid4()
db.session.add(app_model_config)
db.session.commit()
@@ -413,7 +413,6 @@ class TestAgentService:
# Create app model config
app_model_config = AppModelConfig(
id=fake.uuid4(),
app_id=app.id,
provider="openai",
model_id="gpt-3.5-turbo",
@@ -421,6 +420,7 @@ class TestAgentService:
model="gpt-3.5-turbo",
agent_mode=json.dumps({"enabled": True, "strategy": "react", "tools": []}),
)
app_model_config.id = fake.uuid4()
db.session.add(app_model_config)
db.session.commit()
@@ -485,7 +485,6 @@ class TestAgentService:
# Create app model config
app_model_config = AppModelConfig(
id=fake.uuid4(),
app_id=app.id,
provider="openai",
model_id="gpt-3.5-turbo",
@@ -493,6 +492,7 @@ class TestAgentService:
model="gpt-3.5-turbo",
agent_mode=json.dumps({"enabled": True, "strategy": "react", "tools": []}),
)
app_model_config.id = fake.uuid4()
db.session.add(app_model_config)
db.session.commit()

View File

@@ -226,26 +226,27 @@ class TestAppDslService:
app, account = self._create_test_app_and_account(db_session_with_containers, mock_external_service_dependencies)
# Create model config for the app
model_config = AppModelConfig()
model_config.id = fake.uuid4()
model_config.app_id = app.id
model_config.provider = "openai"
model_config.model_id = "gpt-3.5-turbo"
model_config.model = json.dumps(
{
"provider": "openai",
"name": "gpt-3.5-turbo",
"mode": "chat",
"completion_params": {
"max_tokens": 1000,
"temperature": 0.7,
},
}
model_config = AppModelConfig(
app_id=app.id,
provider="openai",
model_id="gpt-3.5-turbo",
model=json.dumps(
{
"provider": "openai",
"name": "gpt-3.5-turbo",
"mode": "chat",
"completion_params": {
"max_tokens": 1000,
"temperature": 0.7,
},
}
),
pre_prompt="You are a helpful assistant.",
prompt_type="simple",
created_by=account.id,
updated_by=account.id,
)
model_config.pre_prompt = "You are a helpful assistant."
model_config.prompt_type = "simple"
model_config.created_by = account.id
model_config.updated_by = account.id
model_config.id = fake.uuid4()
# Set the app_model_config_id to link the config
app.app_model_config_id = model_config.id

View File

@@ -4,7 +4,13 @@ import pytest
from faker import Faker
from enums.cloud_plan import CloudPlan
from services.feature_service import FeatureModel, FeatureService, KnowledgeRateLimitModel, SystemFeatureModel
from services.feature_service import (
FeatureModel,
FeatureService,
KnowledgeRateLimitModel,
LicenseStatus,
SystemFeatureModel,
)
class TestFeatureService:
@@ -274,7 +280,7 @@ class TestFeatureService:
mock_config.PLUGIN_MAX_PACKAGE_SIZE = 100
# Act: Execute the method under test
result = FeatureService.get_system_features()
result = FeatureService.get_system_features(is_authenticated=True)
# Assert: Verify the expected outcomes
assert result is not None
@@ -324,6 +330,61 @@ class TestFeatureService:
# Verify mock interactions
mock_external_service_dependencies["enterprise_service"].get_info.assert_called_once()
def test_get_system_features_unauthenticated(self, db_session_with_containers, mock_external_service_dependencies):
"""
Test system features retrieval for an unauthenticated user.
This test verifies that:
- The response payload is minimized (e.g., verbose license details are excluded).
- Essential UI configuration (Branding, SSO, Marketplace) remains available.
- The response structure adheres to the public schema for unauthenticated clients.
"""
# Arrange: Set up test data with the exact same config as the success test
with patch("services.feature_service.dify_config") as mock_config:
mock_config.ENTERPRISE_ENABLED = True
mock_config.MARKETPLACE_ENABLED = True
mock_config.ENABLE_EMAIL_CODE_LOGIN = True
mock_config.ENABLE_EMAIL_PASSWORD_LOGIN = True
mock_config.ENABLE_SOCIAL_OAUTH_LOGIN = False
mock_config.ALLOW_REGISTER = False
mock_config.ALLOW_CREATE_WORKSPACE = False
mock_config.MAIL_TYPE = "smtp"
mock_config.PLUGIN_MAX_PACKAGE_SIZE = 100
# Act: Execute with is_authenticated=False
result = FeatureService.get_system_features(is_authenticated=False)
# Assert: Basic structure
assert result is not None
assert isinstance(result, SystemFeatureModel)
# --- 1. Verify Response Payload Optimization (Data Minimization) ---
# Ensure only essential UI flags are returned to unauthenticated clients
# to keep the payload lightweight and adhere to architectural boundaries.
assert result.license.status == LicenseStatus.NONE
assert result.license.expired_at == ""
assert result.license.workspaces.enabled is False
assert result.license.workspaces.limit == 0
assert result.license.workspaces.size == 0
# --- 2. Verify Public UI Configuration Availability ---
# Ensure that data required for frontend rendering remains accessible.
# Branding should match the mock data
assert result.branding.enabled is True
assert result.branding.application_title == "Test Enterprise"
assert result.branding.login_page_logo == "https://example.com/logo.png"
# SSO settings should be visible for login page rendering
assert result.sso_enforced_for_signin is True
assert result.sso_enforced_for_signin_protocol == "saml"
# General auth settings should be visible
assert result.enable_email_code_login is True
# Marketplace should be visible
assert result.enable_marketplace is True
def test_get_system_features_basic_config(self, db_session_with_containers, mock_external_service_dependencies):
"""
Test system features retrieval with basic configuration (no enterprise).
@@ -1031,7 +1092,7 @@ class TestFeatureService:
}
# Act: Execute the method under test
result = FeatureService.get_system_features()
result = FeatureService.get_system_features(is_authenticated=True)
# Assert: Verify the expected outcomes
assert result is not None
@@ -1400,7 +1461,7 @@ class TestFeatureService:
}
# Act: Execute the method under test
result = FeatureService.get_system_features()
result = FeatureService.get_system_features(is_authenticated=True)
# Assert: Verify the expected outcomes
assert result is not None

View File

@@ -925,24 +925,24 @@ class TestWorkflowService:
# Create app model config (required for conversion)
from models.model import AppModelConfig
app_model_config = AppModelConfig()
app_model_config.id = fake.uuid4()
app_model_config.app_id = app.id
app_model_config.tenant_id = app.tenant_id
app_model_config.provider = "openai"
app_model_config.model_id = "gpt-3.5-turbo"
# Set the model field directly - this is what model_dict property returns
app_model_config.model = json.dumps(
{
"provider": "openai",
"name": "gpt-3.5-turbo",
"completion_params": {"max_tokens": 1000, "temperature": 0.7},
}
app_model_config = AppModelConfig(
app_id=app.id,
provider="openai",
model_id="gpt-3.5-turbo",
# Set the model field directly - this is what model_dict property returns
model=json.dumps(
{
"provider": "openai",
"name": "gpt-3.5-turbo",
"completion_params": {"max_tokens": 1000, "temperature": 0.7},
}
),
# Set pre_prompt for PromptTemplateConfigManager
pre_prompt="You are a helpful assistant.",
created_by=account.id,
updated_by=account.id,
)
# Set pre_prompt for PromptTemplateConfigManager
app_model_config.pre_prompt = "You are a helpful assistant."
app_model_config.created_by = account.id
app_model_config.updated_by = account.id
app_model_config.id = fake.uuid4()
from extensions.ext_database import db
@@ -987,24 +987,24 @@ class TestWorkflowService:
# Create app model config (required for conversion)
from models.model import AppModelConfig
app_model_config = AppModelConfig()
app_model_config.id = fake.uuid4()
app_model_config.app_id = app.id
app_model_config.tenant_id = app.tenant_id
app_model_config.provider = "openai"
app_model_config.model_id = "gpt-3.5-turbo"
# Set the model field directly - this is what model_dict property returns
app_model_config.model = json.dumps(
{
"provider": "openai",
"name": "gpt-3.5-turbo",
"completion_params": {"max_tokens": 1000, "temperature": 0.7},
}
app_model_config = AppModelConfig(
app_id=app.id,
provider="openai",
model_id="gpt-3.5-turbo",
# Set the model field directly - this is what model_dict property returns
model=json.dumps(
{
"provider": "openai",
"name": "gpt-3.5-turbo",
"completion_params": {"max_tokens": 1000, "temperature": 0.7},
}
),
# Set pre_prompt for PromptTemplateConfigManager
pre_prompt="Complete the following text:",
created_by=account.id,
updated_by=account.id,
)
# Set pre_prompt for PromptTemplateConfigManager
app_model_config.pre_prompt = "Complete the following text:"
app_model_config.created_by = account.id
app_model_config.updated_by = account.id
app_model_config.id = fake.uuid4()
from extensions.ext_database import db

View File

@@ -0,0 +1,27 @@
import builtins
import pytest
from flask import Flask
from flask.views import MethodView
from extensions import ext_fastopenapi
if not hasattr(builtins, "MethodView"):
builtins.MethodView = MethodView # type: ignore[attr-defined]
@pytest.fixture
def app() -> Flask:
app = Flask(__name__)
app.config["TESTING"] = True
return app
def test_console_ping_fastopenapi_returns_pong(app: Flask):
ext_fastopenapi.init_app(app)
client = app.test_client()
response = client.get("/console/api/ping")
assert response.status_code == 200
assert response.get_json() == {"result": "pong"}

View File

@@ -0,0 +1,56 @@
import builtins
from unittest.mock import patch
import pytest
from flask import Flask
from flask.views import MethodView
from extensions import ext_fastopenapi
if not hasattr(builtins, "MethodView"):
builtins.MethodView = MethodView # type: ignore[attr-defined]
@pytest.fixture
def app() -> Flask:
app = Flask(__name__)
app.config["TESTING"] = True
return app
def test_console_setup_fastopenapi_get_not_started(app: Flask):
ext_fastopenapi.init_app(app)
with (
patch("controllers.console.setup.dify_config.EDITION", "SELF_HOSTED"),
patch("controllers.console.setup.get_setup_status", return_value=None),
):
client = app.test_client()
response = client.get("/console/api/setup")
assert response.status_code == 200
assert response.get_json() == {"step": "not_started", "setup_at": None}
def test_console_setup_fastopenapi_post_success(app: Flask):
ext_fastopenapi.init_app(app)
payload = {
"email": "admin@example.com",
"name": "Admin",
"password": "Passw0rd1",
"language": "en-US",
}
with (
patch("controllers.console.wraps.dify_config.EDITION", "SELF_HOSTED"),
patch("controllers.console.setup.get_setup_status", return_value=None),
patch("controllers.console.setup.TenantService.get_tenant_count", return_value=0),
patch("controllers.console.setup.get_init_validate_status", return_value=True),
patch("controllers.console.setup.RegisterService.setup"),
):
client = app.test_client()
response = client.post("/console/api/setup", json=payload)
assert response.status_code == 201
assert response.get_json() == {"result": "success"}

View File

@@ -0,0 +1,35 @@
import builtins
from unittest.mock import patch
import pytest
from flask import Flask
from flask.views import MethodView
from configs import dify_config
from extensions import ext_fastopenapi
if not hasattr(builtins, "MethodView"):
builtins.MethodView = MethodView # type: ignore[attr-defined]
@pytest.fixture
def app() -> Flask:
app = Flask(__name__)
app.config["TESTING"] = True
return app
def test_console_version_fastopenapi_returns_current_version(app: Flask):
ext_fastopenapi.init_app(app)
with patch("controllers.console.version.dify_config.CHECK_UPDATE_URL", None):
client = app.test_client()
response = client.get("/console/api/version", query_string={"current_version": "0.0.0"})
assert response.status_code == 200
data = response.get_json()
assert data["version"] == dify_config.project.version
assert data["release_date"] == ""
assert data["release_notes"] == ""
assert data["can_auto_update"] is False
assert "features" in data

View File

@@ -1,39 +0,0 @@
from types import SimpleNamespace
from unittest.mock import patch
from controllers.console.setup import SetupApi
class TestSetupApi:
def test_post_lowercases_email_before_register(self):
"""Ensure setup registration normalizes email casing."""
payload = {
"email": "Admin@Example.com",
"name": "Admin User",
"password": "ValidPass123!",
"language": "en-US",
}
setup_api = SetupApi(api=None)
mock_console_ns = SimpleNamespace(payload=payload)
with (
patch("controllers.console.setup.console_ns", mock_console_ns),
patch("controllers.console.setup.get_setup_status", return_value=False),
patch("controllers.console.setup.TenantService.get_tenant_count", return_value=0),
patch("controllers.console.setup.get_init_validate_status", return_value=True),
patch("controllers.console.setup.extract_remote_ip", return_value="127.0.0.1"),
patch("controllers.console.setup.request", object()),
patch("controllers.console.setup.RegisterService.setup") as mock_register,
):
response, status = setup_api.post()
assert response == {"result": "success"}
assert status == 201
mock_register.assert_called_once_with(
email="admin@example.com",
name=payload["name"],
password=payload["password"],
ip_address="127.0.0.1",
language=payload["language"],
)

View File

@@ -0,0 +1,454 @@
"""Test multimodal image output handling in BaseAppRunner."""
from unittest.mock import MagicMock, patch
from uuid import uuid4
import pytest
from core.app.apps.base_app_queue_manager import PublishFrom
from core.app.apps.base_app_runner import AppRunner
from core.app.entities.app_invoke_entities import InvokeFrom
from core.app.entities.queue_entities import QueueMessageFileEvent
from core.file.enums import FileTransferMethod, FileType
from core.model_runtime.entities.message_entities import ImagePromptMessageContent
from models.enums import CreatorUserRole
class TestBaseAppRunnerMultimodal:
"""Test that BaseAppRunner correctly handles multimodal image content."""
@pytest.fixture
def mock_user_id(self):
"""Mock user ID."""
return str(uuid4())
@pytest.fixture
def mock_tenant_id(self):
"""Mock tenant ID."""
return str(uuid4())
@pytest.fixture
def mock_message_id(self):
"""Mock message ID."""
return str(uuid4())
@pytest.fixture
def mock_queue_manager(self):
"""Create a mock queue manager."""
manager = MagicMock()
manager.invoke_from = InvokeFrom.SERVICE_API
return manager
@pytest.fixture
def mock_tool_file(self):
"""Create a mock tool file."""
tool_file = MagicMock()
tool_file.id = str(uuid4())
return tool_file
@pytest.fixture
def mock_message_file(self):
"""Create a mock message file."""
message_file = MagicMock()
message_file.id = str(uuid4())
return message_file
def test_handle_multimodal_image_content_with_url(
self,
mock_user_id,
mock_tenant_id,
mock_message_id,
mock_queue_manager,
mock_tool_file,
mock_message_file,
):
"""Test handling image from URL."""
# Arrange
image_url = "http://example.com/image.png"
content = ImagePromptMessageContent(
url=image_url,
format="png",
mime_type="image/png",
)
with patch("core.app.apps.base_app_runner.ToolFileManager") as mock_mgr_class:
# Setup mock tool file manager
mock_mgr = MagicMock()
mock_mgr.create_file_by_url.return_value = mock_tool_file
mock_mgr_class.return_value = mock_mgr
with patch("core.app.apps.base_app_runner.MessageFile") as mock_msg_file_class:
# Setup mock message file
mock_msg_file_class.return_value = mock_message_file
with patch("core.app.apps.base_app_runner.db.session") as mock_session:
mock_session.add = MagicMock()
mock_session.commit = MagicMock()
mock_session.refresh = MagicMock()
# Act
# Create a mock runner with the method bound
runner = MagicMock()
method = AppRunner._handle_multimodal_image_content
runner._handle_multimodal_image_content = lambda *args, **kwargs: method(runner, *args, **kwargs)
runner._handle_multimodal_image_content(
content=content,
message_id=mock_message_id,
user_id=mock_user_id,
tenant_id=mock_tenant_id,
queue_manager=mock_queue_manager,
)
# Assert
# Verify tool file was created from URL
mock_mgr.create_file_by_url.assert_called_once_with(
user_id=mock_user_id,
tenant_id=mock_tenant_id,
file_url=image_url,
conversation_id=None,
)
# Verify message file was created with correct parameters
mock_msg_file_class.assert_called_once()
call_kwargs = mock_msg_file_class.call_args[1]
assert call_kwargs["message_id"] == mock_message_id
assert call_kwargs["type"] == FileType.IMAGE
assert call_kwargs["transfer_method"] == FileTransferMethod.TOOL_FILE
assert call_kwargs["belongs_to"] == "assistant"
assert call_kwargs["created_by"] == mock_user_id
# Verify database operations
mock_session.add.assert_called_once_with(mock_message_file)
mock_session.commit.assert_called_once()
mock_session.refresh.assert_called_once_with(mock_message_file)
# Verify event was published
mock_queue_manager.publish.assert_called_once()
publish_call = mock_queue_manager.publish.call_args
assert isinstance(publish_call[0][0], QueueMessageFileEvent)
assert publish_call[0][0].message_file_id == mock_message_file.id
# publish_from might be passed as positional or keyword argument
assert (
publish_call[0][1] == PublishFrom.APPLICATION_MANAGER
or publish_call.kwargs.get("publish_from") == PublishFrom.APPLICATION_MANAGER
)
def test_handle_multimodal_image_content_with_base64(
self,
mock_user_id,
mock_tenant_id,
mock_message_id,
mock_queue_manager,
mock_tool_file,
mock_message_file,
):
"""Test handling image from base64 data."""
# Arrange
import base64
# Create a small test image (1x1 PNG)
test_image_data = base64.b64encode(
b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x01\x00\x00\x00\x01\x08\x02\x00\x00\x00\x90wS\xde"
).decode()
content = ImagePromptMessageContent(
base64_data=test_image_data,
format="png",
mime_type="image/png",
)
with patch("core.app.apps.base_app_runner.ToolFileManager") as mock_mgr_class:
# Setup mock tool file manager
mock_mgr = MagicMock()
mock_mgr.create_file_by_raw.return_value = mock_tool_file
mock_mgr_class.return_value = mock_mgr
with patch("core.app.apps.base_app_runner.MessageFile") as mock_msg_file_class:
# Setup mock message file
mock_msg_file_class.return_value = mock_message_file
with patch("core.app.apps.base_app_runner.db.session") as mock_session:
mock_session.add = MagicMock()
mock_session.commit = MagicMock()
mock_session.refresh = MagicMock()
# Act
# Create a mock runner with the method bound
runner = MagicMock()
method = AppRunner._handle_multimodal_image_content
runner._handle_multimodal_image_content = lambda *args, **kwargs: method(runner, *args, **kwargs)
runner._handle_multimodal_image_content(
content=content,
message_id=mock_message_id,
user_id=mock_user_id,
tenant_id=mock_tenant_id,
queue_manager=mock_queue_manager,
)
# Assert
# Verify tool file was created from base64
mock_mgr.create_file_by_raw.assert_called_once()
call_kwargs = mock_mgr.create_file_by_raw.call_args[1]
assert call_kwargs["user_id"] == mock_user_id
assert call_kwargs["tenant_id"] == mock_tenant_id
assert call_kwargs["conversation_id"] is None
assert "file_binary" in call_kwargs
assert call_kwargs["mimetype"] == "image/png"
assert call_kwargs["filename"].startswith("generated_image")
assert call_kwargs["filename"].endswith(".png")
# Verify message file was created
mock_msg_file_class.assert_called_once()
# Verify database operations
mock_session.add.assert_called_once()
mock_session.commit.assert_called_once()
mock_session.refresh.assert_called_once()
# Verify event was published
mock_queue_manager.publish.assert_called_once()
def test_handle_multimodal_image_content_with_base64_data_uri(
self,
mock_user_id,
mock_tenant_id,
mock_message_id,
mock_queue_manager,
mock_tool_file,
mock_message_file,
):
"""Test handling image from base64 data with URI prefix."""
# Arrange
# Data URI format: data:image/png;base64,<base64_data>
test_image_data = (
"iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
)
content = ImagePromptMessageContent(
base64_data=f"data:image/png;base64,{test_image_data}",
format="png",
mime_type="image/png",
)
with patch("core.app.apps.base_app_runner.ToolFileManager") as mock_mgr_class:
# Setup mock tool file manager
mock_mgr = MagicMock()
mock_mgr.create_file_by_raw.return_value = mock_tool_file
mock_mgr_class.return_value = mock_mgr
with patch("core.app.apps.base_app_runner.MessageFile") as mock_msg_file_class:
# Setup mock message file
mock_msg_file_class.return_value = mock_message_file
with patch("core.app.apps.base_app_runner.db.session") as mock_session:
mock_session.add = MagicMock()
mock_session.commit = MagicMock()
mock_session.refresh = MagicMock()
# Act
# Create a mock runner with the method bound
runner = MagicMock()
method = AppRunner._handle_multimodal_image_content
runner._handle_multimodal_image_content = lambda *args, **kwargs: method(runner, *args, **kwargs)
runner._handle_multimodal_image_content(
content=content,
message_id=mock_message_id,
user_id=mock_user_id,
tenant_id=mock_tenant_id,
queue_manager=mock_queue_manager,
)
# Assert - verify that base64 data was extracted correctly (without prefix)
mock_mgr.create_file_by_raw.assert_called_once()
call_kwargs = mock_mgr.create_file_by_raw.call_args[1]
# The base64 data should be decoded, so we check the binary was passed
assert "file_binary" in call_kwargs
def test_handle_multimodal_image_content_without_url_or_base64(
self,
mock_user_id,
mock_tenant_id,
mock_message_id,
mock_queue_manager,
):
"""Test handling image content without URL or base64 data."""
# Arrange
content = ImagePromptMessageContent(
url="",
base64_data="",
format="png",
mime_type="image/png",
)
with patch("core.app.apps.base_app_runner.ToolFileManager") as mock_mgr_class:
with patch("core.app.apps.base_app_runner.MessageFile") as mock_msg_file_class:
with patch("core.app.apps.base_app_runner.db.session") as mock_session:
# Act
# Create a mock runner with the method bound
runner = MagicMock()
method = AppRunner._handle_multimodal_image_content
runner._handle_multimodal_image_content = lambda *args, **kwargs: method(runner, *args, **kwargs)
runner._handle_multimodal_image_content(
content=content,
message_id=mock_message_id,
user_id=mock_user_id,
tenant_id=mock_tenant_id,
queue_manager=mock_queue_manager,
)
# Assert - should not create any files or publish events
mock_mgr_class.assert_not_called()
mock_msg_file_class.assert_not_called()
mock_session.add.assert_not_called()
mock_queue_manager.publish.assert_not_called()
def test_handle_multimodal_image_content_with_error(
self,
mock_user_id,
mock_tenant_id,
mock_message_id,
mock_queue_manager,
):
"""Test handling image content when an error occurs."""
# Arrange
image_url = "http://example.com/image.png"
content = ImagePromptMessageContent(
url=image_url,
format="png",
mime_type="image/png",
)
with patch("core.app.apps.base_app_runner.ToolFileManager") as mock_mgr_class:
# Setup mock to raise exception
mock_mgr = MagicMock()
mock_mgr.create_file_by_url.side_effect = Exception("Network error")
mock_mgr_class.return_value = mock_mgr
with patch("core.app.apps.base_app_runner.MessageFile") as mock_msg_file_class:
with patch("core.app.apps.base_app_runner.db.session") as mock_session:
# Act
# Create a mock runner with the method bound
runner = MagicMock()
method = AppRunner._handle_multimodal_image_content
runner._handle_multimodal_image_content = lambda *args, **kwargs: method(runner, *args, **kwargs)
# Should not raise exception, just log it
runner._handle_multimodal_image_content(
content=content,
message_id=mock_message_id,
user_id=mock_user_id,
tenant_id=mock_tenant_id,
queue_manager=mock_queue_manager,
)
# Assert - should not create message file or publish event on error
mock_msg_file_class.assert_not_called()
mock_session.add.assert_not_called()
mock_queue_manager.publish.assert_not_called()
def test_handle_multimodal_image_content_debugger_mode(
self,
mock_user_id,
mock_tenant_id,
mock_message_id,
mock_queue_manager,
mock_tool_file,
mock_message_file,
):
"""Test that debugger mode sets correct created_by_role."""
# Arrange
image_url = "http://example.com/image.png"
content = ImagePromptMessageContent(
url=image_url,
format="png",
mime_type="image/png",
)
mock_queue_manager.invoke_from = InvokeFrom.DEBUGGER
with patch("core.app.apps.base_app_runner.ToolFileManager") as mock_mgr_class:
# Setup mock tool file manager
mock_mgr = MagicMock()
mock_mgr.create_file_by_url.return_value = mock_tool_file
mock_mgr_class.return_value = mock_mgr
with patch("core.app.apps.base_app_runner.MessageFile") as mock_msg_file_class:
# Setup mock message file
mock_msg_file_class.return_value = mock_message_file
with patch("core.app.apps.base_app_runner.db.session") as mock_session:
mock_session.add = MagicMock()
mock_session.commit = MagicMock()
mock_session.refresh = MagicMock()
# Act
# Create a mock runner with the method bound
runner = MagicMock()
method = AppRunner._handle_multimodal_image_content
runner._handle_multimodal_image_content = lambda *args, **kwargs: method(runner, *args, **kwargs)
runner._handle_multimodal_image_content(
content=content,
message_id=mock_message_id,
user_id=mock_user_id,
tenant_id=mock_tenant_id,
queue_manager=mock_queue_manager,
)
# Assert - verify created_by_role is ACCOUNT for debugger mode
call_kwargs = mock_msg_file_class.call_args[1]
assert call_kwargs["created_by_role"] == CreatorUserRole.ACCOUNT
def test_handle_multimodal_image_content_service_api_mode(
self,
mock_user_id,
mock_tenant_id,
mock_message_id,
mock_queue_manager,
mock_tool_file,
mock_message_file,
):
"""Test that service API mode sets correct created_by_role."""
# Arrange
image_url = "http://example.com/image.png"
content = ImagePromptMessageContent(
url=image_url,
format="png",
mime_type="image/png",
)
mock_queue_manager.invoke_from = InvokeFrom.SERVICE_API
with patch("core.app.apps.base_app_runner.ToolFileManager") as mock_mgr_class:
# Setup mock tool file manager
mock_mgr = MagicMock()
mock_mgr.create_file_by_url.return_value = mock_tool_file
mock_mgr_class.return_value = mock_mgr
with patch("core.app.apps.base_app_runner.MessageFile") as mock_msg_file_class:
# Setup mock message file
mock_msg_file_class.return_value = mock_message_file
with patch("core.app.apps.base_app_runner.db.session") as mock_session:
mock_session.add = MagicMock()
mock_session.commit = MagicMock()
mock_session.refresh = MagicMock()
# Act
# Create a mock runner with the method bound
runner = MagicMock()
method = AppRunner._handle_multimodal_image_content
runner._handle_multimodal_image_content = lambda *args, **kwargs: method(runner, *args, **kwargs)
runner._handle_multimodal_image_content(
content=content,
message_id=mock_message_id,
user_id=mock_user_id,
tenant_id=mock_tenant_id,
queue_manager=mock_queue_manager,
)
# Assert - verify created_by_role is END_USER for service API
call_kwargs = mock_msg_file_class.call_args[1]
assert call_kwargs["created_by_role"] == CreatorUserRole.END_USER


@@ -1,7 +1,6 @@
"""Unit tests for the message cycle manager optimization."""
from types import SimpleNamespace
from unittest.mock import ANY, Mock, patch
from unittest.mock import Mock, patch
import pytest
from flask import current_app
@@ -28,17 +27,14 @@ class TestMessageCycleManagerOptimization:
def test_get_message_event_type_with_message_file(self, message_cycle_manager):
"""Test get_message_event_type returns MESSAGE_FILE when message has files."""
with (
patch("core.app.task_pipeline.message_cycle_manager.Session") as mock_session_class,
patch("core.app.task_pipeline.message_cycle_manager.db", new=SimpleNamespace(engine=Mock())),
):
with patch("core.app.task_pipeline.message_cycle_manager.session_factory") as mock_session_factory:
# Setup mock session and message file
mock_session = Mock()
mock_session_class.return_value.__enter__.return_value = mock_session
mock_session_factory.create_session.return_value.__enter__.return_value = mock_session
mock_message_file = Mock()
# Current implementation uses session.query(...).scalar()
mock_session.query.return_value.scalar.return_value = mock_message_file
# Current implementation uses session.scalar(select(...))
mock_session.scalar.return_value = mock_message_file
# Execute
with current_app.app_context():
@@ -46,19 +42,16 @@ class TestMessageCycleManagerOptimization:
# Assert
assert result == StreamEvent.MESSAGE_FILE
mock_session.query.return_value.scalar.assert_called_once()
mock_session.scalar.assert_called_once()
def test_get_message_event_type_without_message_file(self, message_cycle_manager):
"""Test get_message_event_type returns MESSAGE when message has no files."""
with (
patch("core.app.task_pipeline.message_cycle_manager.Session") as mock_session_class,
patch("core.app.task_pipeline.message_cycle_manager.db", new=SimpleNamespace(engine=Mock())),
):
with patch("core.app.task_pipeline.message_cycle_manager.session_factory") as mock_session_factory:
# Setup mock session and no message file
mock_session = Mock()
mock_session_class.return_value.__enter__.return_value = mock_session
# Current implementation uses session.query(...).scalar()
mock_session.query.return_value.scalar.return_value = None
mock_session_factory.create_session.return_value.__enter__.return_value = mock_session
# Current implementation uses session.scalar(select(...))
mock_session.scalar.return_value = None
# Execute
with current_app.app_context():
@@ -66,21 +59,18 @@ class TestMessageCycleManagerOptimization:
# Assert
assert result == StreamEvent.MESSAGE
mock_session.query.return_value.scalar.assert_called_once()
mock_session.scalar.assert_called_once()
def test_message_to_stream_response_with_precomputed_event_type(self, message_cycle_manager):
"""MessageCycleManager.message_to_stream_response expects a valid event_type; callers should precompute it."""
with (
patch("core.app.task_pipeline.message_cycle_manager.Session") as mock_session_class,
patch("core.app.task_pipeline.message_cycle_manager.db", new=SimpleNamespace(engine=Mock())),
):
with patch("core.app.task_pipeline.message_cycle_manager.session_factory") as mock_session_factory:
# Setup mock session and message file
mock_session = Mock()
mock_session_class.return_value.__enter__.return_value = mock_session
mock_session_factory.create_session.return_value.__enter__.return_value = mock_session
mock_message_file = Mock()
# Current implementation uses session.query(...).scalar()
mock_session.query.return_value.scalar.return_value = mock_message_file
# Current implementation uses session.scalar(select(...))
mock_session.scalar.return_value = mock_message_file
# Execute: compute event type once, then pass to message_to_stream_response
with current_app.app_context():
@@ -94,11 +84,11 @@ class TestMessageCycleManagerOptimization:
assert result.answer == "Hello world"
assert result.id == "test-message-id"
assert result.event == StreamEvent.MESSAGE_FILE
mock_session.query.return_value.scalar.assert_called_once()
mock_session.scalar.assert_called_once()
def test_message_to_stream_response_with_event_type_skips_query(self, message_cycle_manager):
"""Test that message_to_stream_response skips database query when event_type is provided."""
with patch("core.app.task_pipeline.message_cycle_manager.Session") as mock_session_class:
with patch("core.app.task_pipeline.message_cycle_manager.session_factory") as mock_session_factory:
# Execute with event_type provided
result = message_cycle_manager.message_to_stream_response(
answer="Hello world", message_id="test-message-id", event_type=StreamEvent.MESSAGE
@@ -109,8 +99,8 @@ class TestMessageCycleManagerOptimization:
assert result.answer == "Hello world"
assert result.id == "test-message-id"
assert result.event == StreamEvent.MESSAGE
# Should not query database when event_type is provided
mock_session_class.assert_not_called()
# Should not open a session when event_type is provided
mock_session_factory.create_session.assert_not_called()
def test_message_to_stream_response_with_from_variable_selector(self, message_cycle_manager):
"""Test message_to_stream_response with from_variable_selector parameter."""
@@ -130,24 +120,21 @@ class TestMessageCycleManagerOptimization:
def test_optimization_usage_example(self, message_cycle_manager):
"""Test the optimization pattern that should be used by callers."""
# Step 1: Get event type once (this queries database)
with (
patch("core.app.task_pipeline.message_cycle_manager.Session") as mock_session_class,
patch("core.app.task_pipeline.message_cycle_manager.db", new=SimpleNamespace(engine=Mock())),
):
with patch("core.app.task_pipeline.message_cycle_manager.session_factory") as mock_session_factory:
mock_session = Mock()
mock_session_class.return_value.__enter__.return_value = mock_session
# Current implementation uses session.query(...).scalar()
mock_session.query.return_value.scalar.return_value = None # No files
mock_session_factory.create_session.return_value.__enter__.return_value = mock_session
# Current implementation uses session.scalar(select(...))
mock_session.scalar.return_value = None # No files
with current_app.app_context():
event_type = message_cycle_manager.get_message_event_type("test-message-id")
# Should query database once
mock_session_class.assert_called_once_with(ANY, expire_on_commit=False)
# Should open session once
mock_session_factory.create_session.assert_called_once()
assert event_type == StreamEvent.MESSAGE
# Step 2: Use event_type for multiple calls (no additional queries)
with patch("core.app.task_pipeline.message_cycle_manager.Session") as mock_session_class:
mock_session_class.return_value.__enter__.return_value = Mock()
with patch("core.app.task_pipeline.message_cycle_manager.session_factory") as mock_session_factory:
mock_session_factory.create_session.return_value.__enter__.return_value = Mock()
chunk1_response = message_cycle_manager.message_to_stream_response(
answer="Chunk 1", message_id="test-message-id", event_type=event_type
@@ -157,8 +144,8 @@ class TestMessageCycleManagerOptimization:
answer="Chunk 2", message_id="test-message-id", event_type=event_type
)
# Should not query database again
mock_session_class.assert_not_called()
# Should not open session again when event_type provided
mock_session_factory.create_session.assert_not_called()
assert chunk1_response.event == StreamEvent.MESSAGE
assert chunk2_response.event == StreamEvent.MESSAGE
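The last test spells out the caller-side pattern these changes optimize for: resolve the event type once (one session, one query), then reuse it for every streamed chunk. A small runnable sketch of that usage with a stand-in manager; the real MessageCycleManager has the same call shape but talks to the database:

from typing import NamedTuple


class StreamResponse(NamedTuple):
    event: str
    id: str
    answer: str


class FakeCycleManager:
    def get_message_event_type(self, message_id: str) -> str:
        # The real manager opens one session here and checks whether the message has files.
        return "message"

    def message_to_stream_response(self, *, answer: str, message_id: str, event_type: str) -> StreamResponse:
        # With event_type supplied, no session is opened at all.
        return StreamResponse(event=event_type, id=message_id, answer=answer)


manager = FakeCycleManager()
event_type = manager.get_message_event_type("test-message-id")          # query once
chunks = [
    manager.message_to_stream_response(answer=text, message_id="test-message-id", event_type=event_type)
    for text in ("Chunk 1", "Chunk 2")                                   # reuse for every chunk
]
assert all(response.event == "message" for response in chunks)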


@@ -1,5 +1,7 @@
from unittest.mock import MagicMock, patch
import pytest
from core.model_runtime.entities.message_entities import AssistantPromptMessage
from core.model_runtime.model_providers.__base.large_language_model import _increase_tool_call
@@ -97,3 +99,14 @@ def test__increase_tool_call():
mock_id_generator.side_effect = [_exp_case.id for _exp_case in EXPECTED_CASE_4]
with patch("core.model_runtime.model_providers.__base.large_language_model._gen_tool_call_id", mock_id_generator):
_run_case(INPUTS_CASE_4, EXPECTED_CASE_4)
def test__increase_tool_call__no_id_no_name_first_delta_should_raise():
inputs = [
ToolCall(id="", type="function", function=ToolCall.ToolCallFunction(name="", arguments='{"arg1": ')),
ToolCall(id="", type="function", function=ToolCall.ToolCallFunction(name="func_foo", arguments='"value"}')),
]
actual: list[ToolCall] = []
with patch("core.model_runtime.model_providers.__base.large_language_model._gen_tool_call_id", MagicMock()):
with pytest.raises(ValueError):
_increase_tool_call(inputs, actual)


@@ -0,0 +1,103 @@
from core.model_runtime.entities.llm_entities import LLMResult, LLMResultChunk, LLMResultChunkDelta, LLMUsage
from core.model_runtime.entities.message_entities import (
AssistantPromptMessage,
TextPromptMessageContent,
UserPromptMessage,
)
from core.model_runtime.model_providers.__base.large_language_model import _normalize_non_stream_plugin_result
def _make_chunk(
*,
model: str = "test-model",
content: str | list[TextPromptMessageContent] | None,
tool_calls: list[AssistantPromptMessage.ToolCall] | None = None,
usage: LLMUsage | None = None,
system_fingerprint: str | None = None,
) -> LLMResultChunk:
message = AssistantPromptMessage(content=content, tool_calls=tool_calls or [])
delta = LLMResultChunkDelta(index=0, message=message, usage=usage)
return LLMResultChunk(model=model, delta=delta, system_fingerprint=system_fingerprint)
def test__normalize_non_stream_plugin_result__from_first_chunk_str_content_and_tool_calls():
prompt_messages = [UserPromptMessage(content="hi")]
tool_calls = [
AssistantPromptMessage.ToolCall(
id="1",
type="function",
function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="func_foo", arguments=""),
),
AssistantPromptMessage.ToolCall(
id="",
type="function",
function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="", arguments='{"arg1": '),
),
AssistantPromptMessage.ToolCall(
id="",
type="function",
function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="", arguments='"value"}'),
),
]
usage = LLMUsage.empty_usage().model_copy(update={"prompt_tokens": 1, "total_tokens": 1})
chunk = _make_chunk(content="hello", tool_calls=tool_calls, usage=usage, system_fingerprint="fp-1")
result = _normalize_non_stream_plugin_result(
model="test-model", prompt_messages=prompt_messages, result=iter([chunk])
)
assert result.model == "test-model"
assert result.prompt_messages == prompt_messages
assert result.message.content == "hello"
assert result.usage.prompt_tokens == 1
assert result.system_fingerprint == "fp-1"
assert result.message.tool_calls == [
AssistantPromptMessage.ToolCall(
id="1",
type="function",
function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="func_foo", arguments='{"arg1": "value"}'),
)
]
def test__normalize_non_stream_plugin_result__from_first_chunk_list_content():
prompt_messages = [UserPromptMessage(content="hi")]
content_list = [TextPromptMessageContent(data="a"), TextPromptMessageContent(data="b")]
chunk = _make_chunk(content=content_list, usage=LLMUsage.empty_usage())
result = _normalize_non_stream_plugin_result(
model="test-model", prompt_messages=prompt_messages, result=iter([chunk])
)
assert result.message.content == content_list
def test__normalize_non_stream_plugin_result__passthrough_llm_result():
prompt_messages = [UserPromptMessage(content="hi")]
llm_result = LLMResult(
model="test-model",
prompt_messages=prompt_messages,
message=AssistantPromptMessage(content="ok"),
usage=LLMUsage.empty_usage(),
)
assert (
_normalize_non_stream_plugin_result(model="test-model", prompt_messages=prompt_messages, result=llm_result)
== llm_result
)
def test__normalize_non_stream_plugin_result__empty_iterator_defaults():
prompt_messages = [UserPromptMessage(content="hi")]
result = _normalize_non_stream_plugin_result(model="test-model", prompt_messages=prompt_messages, result=iter([]))
assert result.model == "test-model"
assert result.prompt_messages == prompt_messages
assert result.message.content == []
assert result.message.tool_calls == []
assert result.usage == LLMUsage.empty_usage()
assert result.system_fingerprint is None
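These cases pin down three behaviours: an LLMResult passes through untouched, a chunk iterator is collapsed using the first chunk's content, usage, and fingerprint (with streamed tool-call deltas merged into complete calls), and an empty iterator yields empty defaults. A hedged sketch of that shape, reusing the imports at the top of this file; constructor details may differ, and the real helper also merges tool-call argument fragments, which this sketch leaves out:

from collections.abc import Iterator

from core.model_runtime.entities.llm_entities import LLMResult, LLMResultChunk, LLMUsage
from core.model_runtime.entities.message_entities import AssistantPromptMessage


def normalize_plugin_result_sketch(
    model: str, prompt_messages: list, result: LLMResult | Iterator[LLMResultChunk]
) -> LLMResult:
    if isinstance(result, LLMResult):
        return result                                   # passthrough case
    chunks = list(result)
    if not chunks:                                      # empty iterator -> empty defaults
        return LLMResult(
            model=model,
            prompt_messages=prompt_messages,
            message=AssistantPromptMessage(content=[]),
            usage=LLMUsage.empty_usage(),
        )
    first = chunks[0]
    return LLMResult(
        model=model,
        prompt_messages=prompt_messages,
        message=first.delta.message,                    # the real code also merges tool-call deltas here
        usage=first.delta.usage or LLMUsage.empty_usage(),
        system_fingerprint=first.system_fingerprint,
    )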


@@ -475,3 +475,130 @@ def test_valid_api_key_works():
headers = executor._assembling_headers()
assert "Authorization" in headers
assert headers["Authorization"] == "Bearer valid-api-key-123"
def test_executor_with_json_body_and_unquoted_uuid_variable():
"""Test that unquoted UUID variables are correctly handled in JSON body.
This test verifies the fix for issue #31436 where json_repair would truncate
certain UUID patterns (like 57eeeeb1-...) when they appeared as unquoted values.
"""
# UUID that triggers the json_repair truncation bug
test_uuid = "57eeeeb1-450b-482c-81b9-4be77e95dee2"
variable_pool = VariablePool(
system_variables=SystemVariable.empty(),
user_inputs={},
)
variable_pool.add(["pre_node_id", "uuid"], test_uuid)
node_data = HttpRequestNodeData(
title="Test JSON Body with Unquoted UUID Variable",
method="post",
url="https://api.example.com/data",
authorization=HttpRequestNodeAuthorization(type="no-auth"),
headers="Content-Type: application/json",
params="",
body=HttpRequestNodeBody(
type="json",
data=[
BodyData(
key="",
type="text",
# UUID variable without quotes - this is the problematic case
value='{"rowId": {{#pre_node_id.uuid#}}}',
)
],
),
)
executor = Executor(
node_data=node_data,
timeout=HttpRequestNodeTimeout(connect=10, read=30, write=30),
variable_pool=variable_pool,
)
# The UUID should be preserved in full, not truncated
assert executor.json == {"rowId": test_uuid}
assert len(executor.json["rowId"]) == len(test_uuid)
def test_executor_with_json_body_and_unquoted_uuid_with_newlines():
"""Test that unquoted UUID variables with newlines in JSON are handled correctly.
This is a specific case from issue #31436 where the JSON body contains newlines.
"""
test_uuid = "57eeeeb1-450b-482c-81b9-4be77e95dee2"
variable_pool = VariablePool(
system_variables=SystemVariable.empty(),
user_inputs={},
)
variable_pool.add(["pre_node_id", "uuid"], test_uuid)
node_data = HttpRequestNodeData(
title="Test JSON Body with Unquoted UUID and Newlines",
method="post",
url="https://api.example.com/data",
authorization=HttpRequestNodeAuthorization(type="no-auth"),
headers="Content-Type: application/json",
params="",
body=HttpRequestNodeBody(
type="json",
data=[
BodyData(
key="",
type="text",
# JSON with newlines and unquoted UUID variable
value='{\n"rowId": {{#pre_node_id.uuid#}}\n}',
)
],
),
)
executor = Executor(
node_data=node_data,
timeout=HttpRequestNodeTimeout(connect=10, read=30, write=30),
variable_pool=variable_pool,
)
# The UUID should be preserved in full
assert executor.json == {"rowId": test_uuid}
def test_executor_with_json_body_preserves_numbers_and_strings():
"""Test that numbers are preserved and string values are properly quoted."""
variable_pool = VariablePool(
system_variables=SystemVariable.empty(),
user_inputs={},
)
variable_pool.add(["node", "count"], 42)
variable_pool.add(["node", "id"], "abc-123")
node_data = HttpRequestNodeData(
title="Test JSON Body with mixed types",
method="post",
url="https://api.example.com/data",
authorization=HttpRequestNodeAuthorization(type="no-auth"),
headers="",
params="",
body=HttpRequestNodeBody(
type="json",
data=[
BodyData(
key="",
type="text",
value='{"count": {{#node.count#}}, "id": {{#node.id#}}}',
)
],
),
)
executor = Executor(
node_data=node_data,
timeout=HttpRequestNodeTimeout(connect=10, read=30, write=30),
variable_pool=variable_pool,
)
assert executor.json["count"] == 42
assert executor.json["id"] == "abc-123"


@@ -30,3 +30,12 @@ class TestWorkflowExecutionStatus:
for status in non_ended_statuses:
assert not status.is_ended(), f"{status} should not be considered ended"
def test_ended_values(self):
"""Test ended_values returns the expected status values."""
assert set(WorkflowExecutionStatus.ended_values()) == {
WorkflowExecutionStatus.SUCCEEDED.value,
WorkflowExecutionStatus.FAILED.value,
WorkflowExecutionStatus.PARTIAL_SUCCEEDED.value,
WorkflowExecutionStatus.STOPPED.value,
}
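A minimal stand-in showing the relationship between is_ended() and ended_values() that the two tests in this file exercise; the member names come from the assertions above, while the string values are assumptions:

from enum import Enum


class WorkflowExecutionStatus(str, Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    PARTIAL_SUCCEEDED = "partial-succeeded"
    STOPPED = "stopped"

    @classmethod
    def ended_values(cls) -> list[str]:
        # Every terminal status value, matching the set asserted above.
        return [cls.SUCCEEDED.value, cls.FAILED.value, cls.PARTIAL_SUCCEEDED.value, cls.STOPPED.value]

    def is_ended(self) -> bool:
        return self.value in self.ended_values()


assert not WorkflowExecutionStatus.RUNNING.is_ended()
assert WorkflowExecutionStatus.STOPPED.is_ended()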


@@ -37,6 +37,20 @@ def _client_error(code: str) -> ClientError:
def _mock_client(monkeypatch):
client = MagicMock()
client.head_bucket.return_value = None
# Configure put_object to return a proper ETag that matches the MD5 hash
# The ETag format is typically the MD5 hash wrapped in quotes
def mock_put_object(**kwargs):
md5_hash = kwargs.get("Body", b"")
if isinstance(md5_hash, bytes):
md5_hash = hashlib.md5(md5_hash).hexdigest()
else:
md5_hash = hashlib.md5(md5_hash.encode()).hexdigest()
response = MagicMock()
response.get.return_value = f'"{md5_hash}"'
return response
client.put_object.side_effect = mock_put_object
boto_client = MagicMock(return_value=client)
monkeypatch.setattr(storage_module.boto3, "client", boto_client)
return client, boto_client
@@ -254,8 +268,8 @@ def test_serialization_roundtrip():
{"id": "2", "value": 123},
]
data = ArchiveStorage.serialize_to_jsonl_gz(records)
decoded = ArchiveStorage.deserialize_from_jsonl_gz(data)
data = ArchiveStorage.serialize_to_jsonl(records)
decoded = ArchiveStorage.deserialize_from_jsonl(data)
assert decoded[0]["id"] == "1"
assert decoded[0]["payload"]["nested"] == "value"


@@ -0,0 +1,54 @@
"""
Unit tests for workflow run archiving functionality.
This module contains tests for:
- Archive service
- Rollback service
"""
from datetime import datetime
from unittest.mock import MagicMock, patch
from services.retention.workflow_run.constants import ARCHIVE_BUNDLE_NAME
class TestWorkflowRunArchiver:
"""Tests for the WorkflowRunArchiver class."""
@patch("services.retention.workflow_run.archive_paid_plan_workflow_run.dify_config")
@patch("services.retention.workflow_run.archive_paid_plan_workflow_run.get_archive_storage")
def test_archiver_initialization(self, mock_get_storage, mock_config):
"""Test archiver can be initialized with various options."""
from services.retention.workflow_run.archive_paid_plan_workflow_run import WorkflowRunArchiver
mock_config.BILLING_ENABLED = False
archiver = WorkflowRunArchiver(
days=90,
batch_size=100,
tenant_ids=["test-tenant"],
limit=50,
dry_run=True,
)
assert archiver.days == 90
assert archiver.batch_size == 100
assert archiver.tenant_ids == ["test-tenant"]
assert archiver.limit == 50
assert archiver.dry_run is True
def test_get_archive_key(self):
"""Test archive key generation."""
from services.retention.workflow_run.archive_paid_plan_workflow_run import WorkflowRunArchiver
archiver = WorkflowRunArchiver.__new__(WorkflowRunArchiver)
mock_run = MagicMock()
mock_run.tenant_id = "tenant-123"
mock_run.app_id = "app-999"
mock_run.id = "run-456"
mock_run.created_at = datetime(2024, 1, 15, 12, 0, 0)
key = archiver._get_archive_key(mock_run)
assert key == f"tenant-123/app_id=app-999/year=2024/month=01/workflow_run_id=run-456/{ARCHIVE_BUNDLE_NAME}"


@@ -171,22 +171,26 @@ class TestBillingServiceSendRequest:
"status_code", [httpx.codes.BAD_REQUEST, httpx.codes.INTERNAL_SERVER_ERROR, httpx.codes.NOT_FOUND]
)
def test_delete_request_non_200_with_valid_json(self, mock_httpx_request, mock_billing_config, status_code):
"""Test DELETE request with non-200 status code but valid JSON response.
"""Test DELETE request with non-200 status code raises ValueError.
DELETE doesn't check status code, so it returns the error JSON.
DELETE now checks status code and raises ValueError for non-200 responses.
"""
# Arrange
error_response = {"detail": "Error message"}
mock_response = MagicMock()
mock_response.status_code = status_code
mock_response.text = "Error message"
mock_response.json.return_value = error_response
mock_httpx_request.return_value = mock_response
# Act
result = BillingService._send_request("DELETE", "/test", json={"key": "value"})
# Assert
assert result == error_response
# Act & Assert
with patch("services.billing_service.logger") as mock_logger:
with pytest.raises(ValueError) as exc_info:
BillingService._send_request("DELETE", "/test", json={"key": "value"})
assert "Unable to process delete request" in str(exc_info.value)
# Verify error logging
mock_logger.error.assert_called_once()
assert "DELETE response" in str(mock_logger.error.call_args)
@pytest.mark.parametrize(
"status_code", [httpx.codes.BAD_REQUEST, httpx.codes.INTERNAL_SERVER_ERROR, httpx.codes.NOT_FOUND]
@@ -210,9 +214,9 @@ class TestBillingServiceSendRequest:
"status_code", [httpx.codes.BAD_REQUEST, httpx.codes.INTERNAL_SERVER_ERROR, httpx.codes.NOT_FOUND]
)
def test_delete_request_non_200_with_invalid_json(self, mock_httpx_request, mock_billing_config, status_code):
"""Test DELETE request with non-200 status code and invalid JSON response raises exception.
"""Test DELETE request with non-200 status code raises ValueError before JSON parsing.
DELETE doesn't check status code, so it calls response.json() which raises JSONDecodeError
DELETE now checks status code before calling response.json(), so ValueError is raised
when the response cannot be parsed as JSON (e.g., empty response).
"""
# Arrange
@@ -223,8 +227,13 @@ class TestBillingServiceSendRequest:
mock_httpx_request.return_value = mock_response
# Act & Assert
with pytest.raises(json.JSONDecodeError):
BillingService._send_request("DELETE", "/test", json={"key": "value"})
with patch("services.billing_service.logger") as mock_logger:
with pytest.raises(ValueError) as exc_info:
BillingService._send_request("DELETE", "/test", json={"key": "value"})
assert "Unable to process delete request" in str(exc_info.value)
# Verify error logging
mock_logger.error.assert_called_once()
assert "DELETE response" in str(mock_logger.error.call_args)
def test_retry_on_request_error(self, mock_httpx_request, mock_billing_config):
"""Test that _send_request retries on httpx.RequestError."""
@@ -789,7 +798,7 @@ class TestBillingServiceAccountManagement:
# Assert
assert result == expected_response
mock_send_request.assert_called_once_with("DELETE", "/account/", params={"account_id": account_id})
mock_send_request.assert_called_once_with("DELETE", "/account", params={"account_id": account_id})
def test_is_email_in_freeze_true(self, mock_send_request):
"""Test checking if email is frozen (returns True)."""


@@ -0,0 +1,180 @@
"""
Unit tests for archived workflow run deletion service.
"""
from unittest.mock import MagicMock, patch
class TestArchivedWorkflowRunDeletion:
def test_delete_by_run_id_returns_error_when_run_missing(self):
from services.retention.workflow_run.delete_archived_workflow_run import ArchivedWorkflowRunDeletion
deleter = ArchivedWorkflowRunDeletion()
repo = MagicMock()
session = MagicMock()
session.get.return_value = None
session_maker = MagicMock()
session_maker.return_value.__enter__.return_value = session
session_maker.return_value.__exit__.return_value = None
mock_db = MagicMock()
mock_db.engine = MagicMock()
with (
patch("services.retention.workflow_run.delete_archived_workflow_run.db", mock_db),
patch(
"services.retention.workflow_run.delete_archived_workflow_run.sessionmaker", return_value=session_maker
),
patch.object(deleter, "_get_workflow_run_repo", return_value=repo),
):
result = deleter.delete_by_run_id("run-1")
assert result.success is False
assert result.error == "Workflow run run-1 not found"
repo.get_archived_run_ids.assert_not_called()
def test_delete_by_run_id_returns_error_when_not_archived(self):
from services.retention.workflow_run.delete_archived_workflow_run import ArchivedWorkflowRunDeletion
deleter = ArchivedWorkflowRunDeletion()
repo = MagicMock()
repo.get_archived_run_ids.return_value = set()
run = MagicMock()
run.id = "run-1"
run.tenant_id = "tenant-1"
session = MagicMock()
session.get.return_value = run
session_maker = MagicMock()
session_maker.return_value.__enter__.return_value = session
session_maker.return_value.__exit__.return_value = None
mock_db = MagicMock()
mock_db.engine = MagicMock()
with (
patch("services.retention.workflow_run.delete_archived_workflow_run.db", mock_db),
patch(
"services.retention.workflow_run.delete_archived_workflow_run.sessionmaker", return_value=session_maker
),
patch.object(deleter, "_get_workflow_run_repo", return_value=repo),
patch.object(deleter, "_delete_run") as mock_delete_run,
):
result = deleter.delete_by_run_id("run-1")
assert result.success is False
assert result.error == "Workflow run run-1 is not archived"
mock_delete_run.assert_not_called()
def test_delete_by_run_id_calls_delete_run(self):
from services.retention.workflow_run.delete_archived_workflow_run import ArchivedWorkflowRunDeletion
deleter = ArchivedWorkflowRunDeletion()
repo = MagicMock()
repo.get_archived_run_ids.return_value = {"run-1"}
run = MagicMock()
run.id = "run-1"
run.tenant_id = "tenant-1"
session = MagicMock()
session.get.return_value = run
session_maker = MagicMock()
session_maker.return_value.__enter__.return_value = session
session_maker.return_value.__exit__.return_value = None
mock_db = MagicMock()
mock_db.engine = MagicMock()
with (
patch("services.retention.workflow_run.delete_archived_workflow_run.db", mock_db),
patch(
"services.retention.workflow_run.delete_archived_workflow_run.sessionmaker", return_value=session_maker
),
patch.object(deleter, "_get_workflow_run_repo", return_value=repo),
patch.object(deleter, "_delete_run", return_value=MagicMock(success=True)) as mock_delete_run,
):
result = deleter.delete_by_run_id("run-1")
assert result.success is True
mock_delete_run.assert_called_once_with(run)
def test_delete_batch_uses_repo(self):
from services.retention.workflow_run.delete_archived_workflow_run import ArchivedWorkflowRunDeletion
deleter = ArchivedWorkflowRunDeletion()
repo = MagicMock()
run1 = MagicMock()
run1.id = "run-1"
run1.tenant_id = "tenant-1"
run2 = MagicMock()
run2.id = "run-2"
run2.tenant_id = "tenant-1"
repo.get_archived_runs_by_time_range.return_value = [run1, run2]
session = MagicMock()
session_maker = MagicMock()
session_maker.return_value.__enter__.return_value = session
session_maker.return_value.__exit__.return_value = None
start_date = MagicMock()
end_date = MagicMock()
mock_db = MagicMock()
mock_db.engine = MagicMock()
with (
patch("services.retention.workflow_run.delete_archived_workflow_run.db", mock_db),
patch(
"services.retention.workflow_run.delete_archived_workflow_run.sessionmaker", return_value=session_maker
),
patch.object(deleter, "_get_workflow_run_repo", return_value=repo),
patch.object(
deleter, "_delete_run", side_effect=[MagicMock(success=True), MagicMock(success=True)]
) as mock_delete_run,
):
results = deleter.delete_batch(
tenant_ids=["tenant-1"],
start_date=start_date,
end_date=end_date,
limit=2,
)
assert len(results) == 2
repo.get_archived_runs_by_time_range.assert_called_once_with(
session=session,
tenant_ids=["tenant-1"],
start_date=start_date,
end_date=end_date,
limit=2,
)
assert mock_delete_run.call_count == 2
def test_delete_run_dry_run(self):
from services.retention.workflow_run.delete_archived_workflow_run import ArchivedWorkflowRunDeletion
deleter = ArchivedWorkflowRunDeletion(dry_run=True)
run = MagicMock()
run.id = "run-1"
run.tenant_id = "tenant-1"
with patch.object(deleter, "_get_workflow_run_repo") as mock_get_repo:
result = deleter._delete_run(run)
assert result.success is True
mock_get_repo.assert_not_called()
def test_delete_run_calls_repo(self):
from services.retention.workflow_run.delete_archived_workflow_run import ArchivedWorkflowRunDeletion
deleter = ArchivedWorkflowRunDeletion()
run = MagicMock()
run.id = "run-1"
run.tenant_id = "tenant-1"
repo = MagicMock()
repo.delete_runs_with_related.return_value = {"runs": 1}
with patch.object(deleter, "_get_workflow_run_repo", return_value=repo):
result = deleter._delete_run(run)
assert result.success is True
assert result.deleted_counts == {"runs": 1}
repo.delete_runs_with_related.assert_called_once()
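The six tests walk through the deletion preconditions in order: the run must exist, it must be in the archived set, dry runs report success without touching the repository, and a real deletion delegates to the repo and returns per-table counts. A condensed sketch of that decision flow with the session and repository wiring abstracted away; the parameter shapes are assumptions:

from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class DeletionResult:
    success: bool
    error: str | None = None
    deleted_counts: dict[str, int] = field(default_factory=dict)


def delete_archived_run(run, archived_run_ids: set[str],
                        delete_run: Callable[[object], dict[str, int]],
                        dry_run: bool = False) -> DeletionResult:
    if run is None:
        return DeletionResult(success=False, error="Workflow run not found")
    if run.id not in archived_run_ids:
        return DeletionResult(success=False, error=f"Workflow run {run.id} is not archived")
    if dry_run:
        return DeletionResult(success=True)            # report success without touching storage
    return DeletionResult(success=True, deleted_counts=delete_run(run))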


@@ -0,0 +1,65 @@
"""
Unit tests for workflow run restore functionality.
"""
from datetime import datetime
from unittest.mock import MagicMock
class TestWorkflowRunRestore:
"""Tests for the WorkflowRunRestore class."""
def test_restore_initialization(self):
"""Restore service should respect dry_run flag."""
from services.retention.workflow_run.restore_archived_workflow_run import WorkflowRunRestore
restore = WorkflowRunRestore(dry_run=True)
assert restore.dry_run is True
def test_convert_datetime_fields(self):
"""ISO datetime strings should be converted to datetime objects."""
from models.workflow import WorkflowRun
from services.retention.workflow_run.restore_archived_workflow_run import WorkflowRunRestore
record = {
"id": "test-id",
"created_at": "2024-01-01T12:00:00",
"finished_at": "2024-01-01T12:05:00",
"name": "test",
}
restore = WorkflowRunRestore()
result = restore._convert_datetime_fields(record, WorkflowRun)
assert isinstance(result["created_at"], datetime)
assert result["created_at"].year == 2024
assert result["created_at"].month == 1
assert result["name"] == "test"
def test_restore_table_records_returns_rowcount(self):
"""Restore should return inserted rowcount."""
from services.retention.workflow_run.restore_archived_workflow_run import WorkflowRunRestore
session = MagicMock()
session.execute.return_value = MagicMock(rowcount=2)
restore = WorkflowRunRestore()
records = [{"id": "p1", "workflow_run_id": "r1", "created_at": "2024-01-01T00:00:00"}]
restored = restore._restore_table_records(session, "workflow_pauses", records, schema_version="1.0")
assert restored == 2
session.execute.assert_called_once()
def test_restore_table_records_unknown_table(self):
"""Unknown table names should be ignored gracefully."""
from services.retention.workflow_run.restore_archived_workflow_run import WorkflowRunRestore
session = MagicMock()
restore = WorkflowRunRestore()
restored = restore._restore_table_records(session, "unknown_table", [{"id": "x1"}], schema_version="1.0")
assert restored == 0
session.execute.assert_not_called()
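The restore tests describe a small normalization step: ISO-8601 strings in datetime columns are converted back to datetime objects while every other field passes through, and records for unknown tables are ignored. A stand-alone sketch of the datetime conversion; the real service derives the datetime columns from the SQLAlchemy model rather than an explicit set:

from datetime import datetime


def convert_datetime_fields(record: dict, datetime_columns: set[str]) -> dict:
    converted = dict(record)
    for column in datetime_columns:
        value = converted.get(column)
        if isinstance(value, str):
            converted[column] = datetime.fromisoformat(value)   # "2024-01-01T12:00:00" -> datetime
    return converted


row = convert_datetime_fields(
    {"id": "test-id", "created_at": "2024-01-01T12:00:00", "name": "test"},
    {"created_at", "finished_at"},
)
assert isinstance(row["created_at"], datetime) and row["name"] == "test"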


@@ -2,7 +2,11 @@ from unittest.mock import ANY, MagicMock, call, patch
import pytest
from libs.archive_storage import ArchiveStorageNotConfiguredError
from models.workflow import WorkflowArchiveLog
from tasks.remove_app_and_related_data_task import (
_delete_app_workflow_archive_logs,
_delete_archived_workflow_run_files,
_delete_draft_variable_offload_data,
_delete_draft_variables,
delete_draft_variables_batch,
@@ -324,3 +328,68 @@ class TestDeleteDraftVariableOffloadData:
# Verify error was logged
mock_logging.exception.assert_called_once_with("Error deleting draft variable offload data:")
class TestDeleteWorkflowArchiveLogs:
@patch("tasks.remove_app_and_related_data_task._delete_records")
@patch("tasks.remove_app_and_related_data_task.db")
def test_delete_app_workflow_archive_logs_calls_delete_records(self, mock_db, mock_delete_records):
tenant_id = "tenant-1"
app_id = "app-1"
_delete_app_workflow_archive_logs(tenant_id, app_id)
mock_delete_records.assert_called_once()
query_sql, params, delete_func, name = mock_delete_records.call_args[0]
assert "workflow_archive_logs" in query_sql
assert params == {"tenant_id": tenant_id, "app_id": app_id}
assert name == "workflow archive log"
mock_query = MagicMock()
mock_delete_query = MagicMock()
mock_query.where.return_value = mock_delete_query
mock_db.session.query.return_value = mock_query
delete_func("log-1")
mock_db.session.query.assert_called_once_with(WorkflowArchiveLog)
mock_query.where.assert_called_once()
mock_delete_query.delete.assert_called_once_with(synchronize_session=False)
class TestDeleteArchivedWorkflowRunFiles:
@patch("tasks.remove_app_and_related_data_task.get_archive_storage")
@patch("tasks.remove_app_and_related_data_task.logger")
def test_delete_archived_workflow_run_files_not_configured(self, mock_logger, mock_get_storage):
mock_get_storage.side_effect = ArchiveStorageNotConfiguredError("missing config")
_delete_archived_workflow_run_files("tenant-1", "app-1")
assert mock_logger.info.call_count == 1
assert "Archive storage not configured" in mock_logger.info.call_args[0][0]
@patch("tasks.remove_app_and_related_data_task.get_archive_storage")
@patch("tasks.remove_app_and_related_data_task.logger")
def test_delete_archived_workflow_run_files_list_failure(self, mock_logger, mock_get_storage):
storage = MagicMock()
storage.list_objects.side_effect = Exception("list failed")
mock_get_storage.return_value = storage
_delete_archived_workflow_run_files("tenant-1", "app-1")
storage.list_objects.assert_called_once_with("tenant-1/app_id=app-1/")
storage.delete_object.assert_not_called()
mock_logger.exception.assert_called_once_with("Failed to list archive files for app %s", "app-1")
@patch("tasks.remove_app_and_related_data_task.get_archive_storage")
@patch("tasks.remove_app_and_related_data_task.logger")
def test_delete_archived_workflow_run_files_success(self, mock_logger, mock_get_storage):
storage = MagicMock()
storage.list_objects.return_value = ["key-1", "key-2"]
mock_get_storage.return_value = storage
_delete_archived_workflow_run_files("tenant-1", "app-1")
storage.list_objects.assert_called_once_with("tenant-1/app_id=app-1/")
storage.delete_object.assert_has_calls([call("key-1"), call("key-2")], any_order=False)
mock_logger.info.assert_called_with("Deleted %s archive objects for app %s", 2, "app-1")
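These three cases fix the cleanup flow for archived run files: a missing archive-storage configuration is logged and skipped, a listing failure is logged with logger.exception and aborts, and on success every object under the tenant/app prefix is deleted and the count logged. A compact sketch of the success and failure paths; the not-configured branch sits one level up in the real task:

import logging

logger = logging.getLogger(__name__)


def delete_archive_objects(storage, tenant_id: str, app_id: str) -> int:
    prefix = f"{tenant_id}/app_id={app_id}/"
    try:
        keys = storage.list_objects(prefix)
    except Exception:
        logger.exception("Failed to list archive files for app %s", app_id)
        return 0
    for key in keys:
        storage.delete_object(key)
    logger.info("Deleted %s archive objects for app %s", len(keys), app_id)
    return len(keys)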

api/uv.lock (generated, 4678 changed lines): diff suppressed because it is too large.

dev/setup (new executable file, +28 lines)

@@ -0,0 +1,28 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(dirname "$(realpath "$0")")"
ROOT="$(dirname "$SCRIPT_DIR")"
API_ENV_EXAMPLE="$ROOT/api/.env.example"
API_ENV="$ROOT/api/.env"
WEB_ENV_EXAMPLE="$ROOT/web/.env.example"
WEB_ENV="$ROOT/web/.env.local"
MIDDLEWARE_ENV_EXAMPLE="$ROOT/docker/middleware.env.example"
MIDDLEWARE_ENV="$ROOT/docker/middleware.env"
# 1) Copy api/.env.example -> api/.env
cp "$API_ENV_EXAMPLE" "$API_ENV"
# 2) Copy web/.env.example -> web/.env.local
cp "$WEB_ENV_EXAMPLE" "$WEB_ENV"
# 3) Copy docker/middleware.env.example -> docker/middleware.env
cp "$MIDDLEWARE_ENV_EXAMPLE" "$MIDDLEWARE_ENV"
# 4) Install deps
cd "$ROOT/api"
uv sync --group dev
cd "$ROOT/web"
pnpm install


@@ -3,8 +3,9 @@
set -x
SCRIPT_DIR="$(dirname "$(realpath "$0")")"
cd "$SCRIPT_DIR/.."
cd "$SCRIPT_DIR/../api"
uv run flask db upgrade
uv --directory api run \
uv run \
flask run --host 0.0.0.0 --port=5001 --debug

dev/start-docker-compose (new executable file, +8 lines)

@@ -0,0 +1,8 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(dirname "$(realpath "$0")")"
ROOT="$(dirname "$SCRIPT_DIR")"
cd "$ROOT/docker"
docker compose -f docker-compose.middleware.yaml --profile postgresql --profile weaviate -p dify up -d

Some files were not shown because too many files have changed in this diff.