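docs: multi-expert-panel brainstorming output, 5 design specs

Based on the full-landscape audit analysis, this commit produces 5 cross-domain design specs:
1. Performance optimization: backend batch INSERT / merged COUNT / alert preloading, plus inlined names to remove frontend N+1 requests
2. Defense-in-depth security: PostgreSQL RLS / row-level data scope / session_key in Redis / audit hash chain
3. Event-driven architecture enhancement: backfill 11 missing events across 6 business domains, plus Outbox LISTEN/NOTIFY
4. Frontend engineering: split 14 large components, unify 3 duplicated patterns, optimize the bundle
5. Observability and operations: deep health checks / Prometheus / OpenTelemetry / production Docker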
2026-04-27 07:46:36 +08:00
parent 5f83080ab8
commit d1ab8074a3
5 changed files with 1393 additions and 0 deletions
--- a/docs/superpowers/specs/2026-04-26-event-driven-architecture-design.md
+++ b/docs/superpowers/specs/2026-04-26-event-driven-architecture-design.md
@@ -0,0 +1,249 @@
+# Event-Driven Architecture Enhancement Design Spec
+
+> Date: 2026-04-26 | Status: draft | Topic: backfilling missing events + Outbox relay optimization + event schema versioning
+
+## 1. Background
+
+HMS already has a complete event bus infrastructure:
+
+- **EventBus** (`erp-core/src/events.rs`): two-phase publish (persist as pending → broadcast → mark as published)
+- **Outbox relay** (`erp-server/src/outbox.rs`): polls the domain_events table every 5 seconds and re-sends pending events
+- **domain_events table**: id, tenant_id, event_type, payload, status, attempts, created_at, published_at
+
+Modules that already publish events: patient, appointment, follow_up, consultation, health_data, alert_engine, device_reading, doctor.
+
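+## 2. Problem Analysis
+
+### 2.1 Missing Events
+
+The service files of the following 6 business domains contain no `event_bus.publish` calls:
+
+| Business domain | Service file | Missing events |
+|--------|-------------|----------|
+| Dialysis records | `dialysis_service.rs` | `dialysis_record.created/reviewed` |
+| Diagnosis | `diagnosis_service.rs` | `diagnosis.created/updated` |
+| Informed consent | `consent_service.rs` | `consent.granted/revoked` |
+| Daily monitoring | `daily_monitoring_service.rs` | `daily_monitoring.created` |
+| Points | `points_service.rs` | `points.earned/exchanged` |
+| News articles | `article_service.rs` | `article.published/rejected` |
+
+### 2.2 Infrastructure Improvements
+
+| ID | Problem | Impact |
+|------|------|------|
+| I-1 | The Outbox relay's 5-second polling adds latency | Up to 5 seconds between an event being produced and being broadcast |
+| I-2 | Event payloads carry no schema version | Consumers cannot evolve safely; adding or removing fields breaks compatibility |
+| I-3 | No idempotency guarantee for events | Duplicate consumption can cause business anomalies |
+| I-4 | No cleanup policy for the domain_events table | Unbounded table growth degrades query performance |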
+
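+## 3. Solution
+
+### 3.1 Backfilling Missing Events
+
+#### 3.1.1 Event Priorities
+
+| Priority | Event | Rationale |
+|--------|------|------|
+| P0 | `dialysis_record.created/reviewed` | Dialysis is a core clinical workflow and must trigger statistics updates and alert checks |
+| P0 | `diagnosis.created/updated` | A diagnosis drives the subsequent treatment plan and affects appointments and follow-ups |
+| P1 | `consent.granted/revoked` | Compliance requirement: changes to informed consent must be communicated to clinical staff |
+| P1 | `article.published/rejected` | The content review workflow is event-driven |
+| P2 | `daily_monitoring.created` | Daily monitoring triggers trend analysis |
+| P2 | `points.earned/exchanged` | Users are notified of point balance changes |
+
+#### 3.1.2 Unified Event Envelope
+
+Every event payload follows the same envelope:
+
+```json
+{
+  "schema_version": "v1",
+  "entity_id": "uuid",
+  "entity_type": "dialysis_record",
+  "action": "created",
+  "tenant_id": "uuid",
+  "operator_id": "uuid | null",
+  "timestamp": "ISO 8601",
+  "data": { /* entity snapshot or changed fields */ },
+  "metadata": { "source": "erp-health", "trace_id": "uuid" }
+}
+```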
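The envelope above could be mirrored on the Rust side roughly as follows. This is a minimal std-only sketch, not the project's actual type: `EventEnvelope`, its `v1` constructor, and the use of plain strings for IDs, the timestamp, and `data` are all assumptions made for illustration.

```rust
/// Illustrative, std-only mirror of the JSON envelope (hypothetical type).
/// IDs and the timestamp are plain strings here; real code would likely use
/// UUID and date-time types and a JSON value for `data`.
#[derive(Debug, Clone, PartialEq)]
pub struct EventEnvelope {
    pub schema_version: String,
    pub entity_id: String,
    pub entity_type: String,
    pub action: String,
    pub tenant_id: String,
    pub operator_id: Option<String>, // None when no operator is attached
    pub timestamp: String,           // ISO 8601, supplied by the caller
    pub data: String,                // serialized snapshot or changed fields
    pub source: String,
    pub trace_id: String,
}

impl EventEnvelope {
    /// Builds a v1 envelope using the `erp-health` source from the example above.
    #[allow(clippy::too_many_arguments)]
    pub fn v1(
        entity_type: &str,
        action: &str,
        entity_id: &str,
        tenant_id: &str,
        operator_id: Option<&str>,
        timestamp: &str,
        data: &str,
        trace_id: &str,
    ) -> Self {
        Self {
            schema_version: "v1".to_string(),
            entity_id: entity_id.to_string(),
            entity_type: entity_type.to_string(),
            action: action.to_string(),
            tenant_id: tenant_id.to_string(),
            operator_id: operator_id.map(str::to_string),
            timestamp: timestamp.to_string(),
            data: data.to_string(),
            source: "erp-health".to_string(),
            trace_id: trace_id.to_string(),
        }
    }

    /// Event type in the `entity.action` form, e.g. "dialysis_record.created".
    pub fn event_type(&self) -> String {
        format!("{}.{}", self.entity_type, self.action)
    }
}
```

Deriving the event type from `entity_type` and `action` keeps the envelope and the `event_type` column from drifting apart.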
+
+#### 3.1.3 `data` Fields per Event
+
+| Event | Key `data` fields | Description |
+|------|--------------|------|
+| `dialysis_record.created` | patient_id, dialysis_type, status, dialysis_date, duration, ultrafiltration_volume | New dialysis record |
+| `dialysis_record.reviewed` | patient_id, reviewer_id, dialysis_type, complication_notes | Physician review completed |
+| `diagnosis.created` | patient_id, icd_code, diagnosis_name, diagnosis_type, severity, diagnosed_at | New diagnosis entered |
+| `diagnosis.updated` | patient_id, changed_fields[], old_values{}, new_values{} | Diagnosis changed (includes diff) |
+| `consent.granted` | patient_id, consent_type, consent_scope, granted_by, expires_at | Informed consent signed |
+| `consent.revoked` | patient_id, consent_type, revoked_by, reason | Informed consent revoked |
+| `article.published` | title, author_id, category_id, tags[] | Article approved and published |
+| `article.rejected` | title, reviewer_id, reason | Article rejected in review |
+| `daily_monitoring.created` | patient_id, monitoring_date, monitoring_type, values{} | Daily monitoring data entered |
+| `points.earned` | patient_id, points, source_type, source_id, balance_after | Points earned |
+| `points.exchanged` | patient_id, points, product_name, order_id, balance_after | Points redeemed |
+
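+### 3.2 Outbox Relay Optimization
+
+#### 3.2.1 Replacing Polling with PostgreSQL LISTEN/NOTIFY
+
+**Current**: polls the `domain_events` table every 5 seconds (`outbox.rs`, lines 26-32)
+
+**Proposed change**:
+
+1. Issue a `NOTIFY` in `EventBus::publish()` after the event is persisted:
+
+```rust
+// Append to the end of publish() in erp-core/src/events.rs.
+// pg_notify() takes the payload as a bind parameter; failures are ignored
+// because the relay's fallback poll still picks the event up.
+sqlx::query("SELECT pg_notify('outbox_channel', $1)")
+    .bind(event.id.to_string())
+    .execute(db).await.ok();
+```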
+
+2. The Outbox relay uses `LISTEN` plus a 30-second fallback poll:
+
+```rust
+let mut listener = PgListener::connect_with(&db).await?;
+listener.listen("outbox_channel").await?;
+loop {
+    tokio::select! {
+        _ = listener.recv() => { process_pending_events(&db, &event_bus).await.ok(); }
+        _ = tokio::time::sleep(Duration::from_secs(30)) => {
+            process_pending_events(&db, &event_bus).await.ok();
+        }
+    }
+}
+```
+
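+**Benefit**: event latency drops from 0-5 s to < 100 ms, and polling load on the DB falls 6x (one poll per 30 s instead of one per 5 s). **Complexity**: low.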
+
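+#### 3.2.2 domain_events Table Cleanup
+
+**Approach**: monthly partitions + 90-day archival
+
+```sql
+-- A primary key on a partitioned table must include the partition key. Assuming the
+-- existing key is (id) alone, copying it via INCLUDING ALL fails, so declare it here.
+CREATE TABLE domain_events_new (
+  LIKE domain_events INCLUDING DEFAULTS,
+  PRIMARY KEY (id, created_at)
+) PARTITION BY RANGE (created_at);
+-- Create one partition per month
+CREATE TABLE domain_events_2026_04 PARTITION OF domain_events_new
+  FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');
+```
+
+Events that are already published and older than 90 days are moved to the `domain_events_archive` table.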
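Monthly partitions have to exist before rows for that month arrive, so the background job mentioned in Phase 4 would presumably create them ahead of time. The following is a small std-only sketch of how the partition name and range bounds could be derived; `month_partition` and `partition_ddl` are hypothetical helpers, and the `domain_events_YYYY_MM` naming simply follows the example above.

```rust
/// Returns (partition_name, from_inclusive, to_exclusive) for one calendar month.
/// Hypothetical helper; names follow the `domain_events_YYYY_MM` pattern.
pub fn month_partition(year: i32, month: u32) -> Option<(String, String, String)> {
    if !(1..=12).contains(&month) {
        return None;
    }
    // Roll over to January of the following year after December.
    let (next_year, next_month) = if month == 12 {
        (year + 1, 1)
    } else {
        (year, month + 1)
    };
    Some((
        format!("domain_events_{year:04}_{month:02}"),
        format!("{year:04}-{month:02}-01"),
        format!("{next_year:04}-{next_month:02}-01"),
    ))
}

/// Renders the DDL for that month's partition of `domain_events_new`.
pub fn partition_ddl(year: i32, month: u32) -> Option<String> {
    let (name, from, to) = month_partition(year, month)?;
    Some(format!(
        "CREATE TABLE IF NOT EXISTS {name} PARTITION OF domain_events_new \
         FOR VALUES FROM ('{from}') TO ('{to}');"
    ))
}
```

The upper bound is exclusive, which matches PostgreSQL range partitioning, so adjacent months do not overlap.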
+
+### 3.3 Event Schema Versioning
+
+Embed a `schema_version` field in the payload; consumers route on `event_type` + `schema_version`:
+
+```rust
+fn handle_event(event: &DomainEvent) {
+    let version = event.payload["schema_version"].as_str().unwrap_or("v1");
+    match (event.event_type.as_str(), version) {
+        ("dialysis_record.created", "v1") => handle_v1(event),
+        ("dialysis_record.created", "v2") => handle_v2(event),
+        _ => tracing::warn!("Unknown event version"),
+    }
+}
+```
+
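+**Evolution rules**: adding a field is backward compatible (no version bump); removing or renaming a field is a breaking change (bump the version).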
+
+### 3.4 Event Idempotency
+
+Consumers maintain a `processed_events` dedup table:
+
+```sql
+CREATE TABLE processed_events (
+    event_id UUID NOT NULL,
+    consumer_id VARCHAR(64) NOT NULL,
+    processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    PRIMARY KEY (event_id, consumer_id)
+);
+```
+
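+Flow: on receiving an event, check whether it has already been processed; skip it if so, otherwise run the business logic and insert the record. Doing the insert as `INSERT ... ON CONFLICT DO NOTHING` in the same transaction as the business logic closes the race between two concurrent deliveries of the same event. Rows are purged periodically after a 7-day TTL.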
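The dedup flow can be illustrated without a database. The sketch below uses an in-memory `HashSet` as a stand-in for the `processed_events` table; `ProcessedEvents` and `handle_once` are hypothetical names, and a real consumer would use the table and a transaction instead.

```rust
use std::collections::HashSet;

/// In-memory stand-in for the processed_events table (hypothetical helper).
#[derive(Default)]
pub struct ProcessedEvents {
    // (event_id, consumer_id), mirroring the table's composite primary key.
    seen: HashSet<(String, String)>,
}

impl ProcessedEvents {
    /// Runs `handler` only the first time this (event, consumer) pair is seen.
    /// Returns true if the handler ran, false if the event was a duplicate.
    pub fn handle_once<F: FnOnce()>(
        &mut self,
        event_id: &str,
        consumer_id: &str,
        handler: F,
    ) -> bool {
        // insert() returns false when the key already exists, much like an
        // INSERT ... ON CONFLICT DO NOTHING that affects zero rows.
        if !self.seen.insert((event_id.to_string(), consumer_id.to_string())) {
            return false;
        }
        handler();
        true
    }
}
```

Claiming the key first and only then running the handler is what makes a second delivery a no-op; the same event is still processed once per distinct consumer.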
+
+## 4. Implementation Steps
+
+### Phase 1: P0 Event Backfill (est. 2 days)
+
+| Step | Task | Files changed |
+|------|------|----------|
+| 1.1 | Add created/reviewed events to dialysis_service | `dialysis_service.rs` |
+| 1.2 | Add created/updated events to diagnosis_service | `diagnosis_service.rs` |
+| 1.3 | Verify: events are published and the payload format is correct | - |
+
+### Phase 2: P1 Event Backfill (est. 1-2 days)
+
+| Step | Task | Files changed |
+|------|------|----------|
+| 2.1 | Add granted/revoked events to consent_service | `consent_service.rs` |
+| 2.2 | Add published/rejected events to article_service | `article_service.rs` |
+| 2.3 | Verify: events are published and fire correctly | - |
+
+### Phase 3: P2 Event Backfill (est. 1 day)
+
+| Step | Task | Files changed |
+|------|------|----------|
+| 3.1 | Add created event to daily_monitoring_service | `daily_monitoring_service.rs` |
+| 3.2 | Add earned/exchanged events to points_service | `points_service.rs` |
+| 3.3 | Verify: point-change events fire | - |
+
+### Phase 4: Infrastructure Optimization (est. 2-3 days)
+
+| Step | Task | Files changed |
+|------|------|----------|
+| 4.1 | Switch the Outbox relay to LISTEN/NOTIFY | `outbox.rs`, `events.rs` |
+| 4.2 | Add the schema_version field to events | All event publish sites |
+| 4.3 | Create the processed_events dedup table | migration |
+| 4.4 | Monthly partitioning of domain_events + cleanup policy | migration + background job |
+| 4.5 | Verify: event latency < 100ms + dedup test | - |
+
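+## 5. Risks and Mitigations
+
+### 5.1 LISTEN/NOTIFY Connection Management
+
+**Risk**: PostgreSQL LISTEN holds a dedicated connection, which must be re-established if it drops.
+**Mitigation**: `sqlx::postgres::PgListener` reconnects automatically, and the 30-second fallback poll catches anything missed while disconnected.
+
+### 5.2 Event Publish Failure
+
+**Risk**: `event_bus.publish()` fails after the business operation has already committed.
+**Mitigation**: already covered by two-phase publish: the event is written as pending and the outbox relay re-sends it. A publish failure only logs a warning and does not block the business operation.
+
+### 5.3 Dedup Table Growth
+
+**Risk**: the `processed_events` table grows quickly.
+**Mitigation**: periodic cleanup with a 7-day TTL, or replace the table with Redis `SET NX` + TTL.
+
+### 5.4 Schema Evolution Compatibility
+
+**Risk**: after a version bump, consumers and in-flight events can be on different schema versions: a new consumer may meet old-version events, and a consumer that has not been upgraded cannot handle the new version.
+**Mitigation**: consumers must support the N-1 schema version. Make sure all consumers are upgraded before bumping the version.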
+
+## 6. Summary: Existing vs. Missing Events
+
+### Events Already Published (8 modules)
+
+| Module | Event types | Published from |
+|------|----------|----------|
+| patient | `patient.created`, `patient.updated` | `patient_service.rs` |
+| appointment | `appointment.created`, `appointment.status_changed` | `appointment_service.rs` |
+| follow_up | `follow_up_task.created`, `follow_up_task.status_changed`, `follow_up_record.completed` | `follow_up_service.rs` |
+| consultation | `consultation_session.created`, `consultation_session.status_changed` | `consultation_service.rs` |
+| health_data | `vital_signs.created`, `lab_report.uploaded` | `health_data_service.rs` |
+| alert | `alert.triggered` | `alert_engine.rs` |
+| device | `device.readings.synced` | `device_reading_service.rs` |
+| doctor | `doctor.schedule.updated` | `doctor_service.rs` |
+
+### Events to Backfill (6 modules, 11 events)
+
+| Module | Event types | Priority |
+|------|----------|--------|
+| dialysis | `dialysis_record.created`, `dialysis_record.reviewed` | P0 |
+| diagnosis | `diagnosis.created`, `diagnosis.updated` | P0 |
+| consent | `consent.granted`, `consent.revoked` | P1 |
+| article | `article.published`, `article.rejected` | P1 |
+| daily_monitoring | `daily_monitoring.created` | P2 |
+| points | `points.earned`, `points.exchanged` | P2 |
--- a/docs/superpowers/specs/2026-04-26-frontend-engineering-design.md
+++ b/docs/superpowers/specs/2026-04-26-frontend-engineering-design.md
@@ -0,0 +1,247 @@
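+# Frontend Engineering Improvement Design
+
+> Date: 2026-04-26 | Status: draft | Topic: component splitting, unifying duplicated patterns, bundle optimization
+
+## 1. Background
+
+The HMS web frontend has 139 source files (77 TSX + 62 TS) totaling 27,000 lines of code. As the health management module keeps iterating, engineering debt has accumulated in four main areas:
+
+1. **Component bloat**: 14 files exceed 400 lines; the largest is 872 lines (PluginCRUDPage.tsx)
+2. **Duplicated patterns**: error handling, paginated lists, and ID→name caching are each reimplemented in places; unified abstractions exist but are not fully adopted
+3. **Mixed API-layer styles**: object style (`patientApi.list()`) and function style (`listAlerts()`) coexist
+4. **Bundle size**: large dependencies are not split into separate chunks, and `chunkSizeWarningLimit` has been raised to 600KB
+
+### 1.1 Data Overview
+
+| Metric | Value |
+|------|------|
+| Components over 400 lines | 14 |
+| Components over 500 lines | 7 |
+| Health list pages not using usePaginatedData | 6 |
+| Pages with a hand-rolled nameCache | 2 (AppointmentList, PointsOrderList) |
+| API-layer files | 33 (8 object style, 25 function style) |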
+
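+## 2. Problem Analysis
+
+### 2.1 Component Bloat
+
+Top 7 largest components:
+
+| File | Lines | Mixed responsibilities |
+|------|------|-----------|
+| PluginCRUDPage.tsx | 872 | Table rendering + form validation + Drawer + import/export + Timeline |
+| PluginGraphPage.tsx | 759 | Canvas binding + data loading + layout computation + animation control |
+| Organizations.tsx | 622 | Three-column tree + organization CRUD + department management + staff assignment |
+| StatisticsDashboard.tsx | 580 | Five parallel statistics APIs + chart rendering + time filtering |
+| ArticleEditor.tsx | 554 | Rich-text editor + form + tag selection + cover upload |
+| FollowUpTaskList.tsx | 547 | List + filter panel + status-transition modal + batch operations |
+| MainLayout.tsx | 535 | Sidebar + dynamic menu + plugin menu injection + Header |
+
+**Root cause:** React components are not layered into presentational / container / hook, so state logic and UI rendering are coupled in the same file.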
+
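+### 2.2 Duplicated Patterns
+
+**Pattern 1: error handling**
+
+A unified solution already exists: the global interceptor in `client.ts` handles 401/403/500, `handleApiError()` handles business errors, and the `useApiRequest()` hook wraps try-catch.
+
+In practice, components still inline `catch (err) { message.error(...) }` in many places. Reasons: useApiRequest does not return a loading state, and some scenarios need finer-grained error control.
+
+**Pattern 2: paginated lists**
+
+The `usePaginatedData<T>(fetchFn, pageSize)` hook already exists and wraps data/total/page/loading/refresh.
+
+Health list pages that do not use it: PatientList, AppointmentList, ConsultationList, FollowUpTaskList, OfflineEventList, PointsProductList. Reason: the fetchFn signature only supports (page, pageSize, search), while some pages need extra filter parameters (status/dateRange/tag).
+
+**Pattern 3: ID→name caching**
+
+`useHealthStore` already provides `resolvePatientName(id)` / `getPatientName(id)` with automatic de-duplicated loading.
+
+AppointmentList.tsx and PointsOrderList.tsx still maintain their own `useState<Record<string, string>>` nameCache. Reason: the store has no batch-resolution interface.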
+
+### 2.3 Mixed API Styles
+
+- **Object style** (8 files): `patientApi.list()`, `doctorApi.list()`, etc.
+- **Function style** (25 files): `listAlerts()`, `acknowledgeAlert()`, etc.
+
+Conclusion: do not force unification (large change, limited benefit); new API files consistently use the object style.
+
+### 2.4 Bundle Size
+
+The current manualChunks only splits out react/react-dom/antd/axios/zustand. Large dependencies not yet split:
+
+| Dependency | Estimated size (gzip) | Used by |
+|------|----------------|---------|
+| @ant-design/charts | ~180KB | StatisticsDashboard only |
+| @xyflow/react | ~120KB | PluginGraphPage only |
+| @wangeditor/editor | ~200KB | ArticleEditor only |
+
+`chunkSizeWarningLimit: 600` indicates that a single chunk already exceeds Vite's default 500KB warning threshold.
+
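+## 3. Solution
+
+### 3.1 Component Splitting Strategy
+
+Adopt a uniform three-layer **Container + Presentational + Hook** pattern:
+
+```
+Original component (500+ lines)
+├── hooks/useXxxData.ts     # data fetching, state management, business logic
+├── components/XxxTable.tsx # purely presentational table
+├── components/XxxForm.tsx  # form (including validation)
+└── XxxPage.tsx             # container component (wires hooks + sub-components)
+```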
+
+#### PluginCRUDPage.tsx (872 lines), P0
+
+| Split target | Type | Est. lines | Responsibility |
+|----------|------|---------|------|
+| usePluginData.ts | Hook | ~120 | CRUD operations, import/export logic |
+| PluginTable.tsx | Presentational | ~150 | Table column definitions, row action buttons |
+| PluginForm.tsx | Presentational | ~180 | Create/edit form + Drawer |
+| PluginImportExport.tsx | Presentational | ~100 | Import/export panel |
+| PluginTimeline.tsx | Presentational | ~80 | Operation history Timeline |
+| PluginCRUDPage.tsx | Container | ~80 | Assembles the sub-components |
+
+#### PluginGraphPage.tsx (759 lines), P1
+
+| Split target | Type | Est. lines | Responsibility |
+|----------|------|---------|------|
+| useGraphLayout.ts | Hook | ~100 | Layout algorithm, node position computation |
+| useGraphData.ts | Hook | ~80 | Data loading, edge/node conversion |
+| GraphCanvas.tsx | Presentational | ~200 | ReactFlow rendering, node styles |
+| GraphToolbar.tsx | Presentational | ~60 | Toolbar (zoom / auto layout) |
+| PluginGraphPage.tsx | Container | ~60 | Assembly |
+
+#### Organizations.tsx (622 lines), P1
+
+| Split target | Type | Est. lines | Responsibility |
+|----------|------|---------|------|
+| useOrgTree.ts | Hook | ~80 | Tree data loading, CRUD operations |
+| OrgTree.tsx | Presentational | ~120 | Left-hand tree selector |
+| OrgDetail.tsx | Presentational | ~150 | Right-hand organization detail/edit |
+| DeptMemberList.tsx | Presentational | ~100 | Department member list + assignment |
+| Organizations.tsx | Container | ~60 | Three-column layout assembly |
+
+#### StatisticsDashboard.tsx (580 lines), P1
+
+| Split target | Type | Est. lines | Responsibility |
+|----------|------|---------|------|
+| useStatsData.ts | Hook | ~100 | Parallel loading of the five statistics APIs |
+| PatientTrendChart.tsx | Presentational | ~80 | Patient trend chart |
+| AppointmentStats.tsx | Presentational | ~80 | Appointment statistics chart |
+| OverviewCards.tsx | Presentational | ~60 | Overview card group |
+| TimeRangeSelector.tsx | Presentational | ~40 | Time range selector |
+| StatisticsDashboard.tsx | Container | ~50 | Assembly |
+
+#### ArticleEditor.tsx (554 lines), P2
+
+| Split target | Type | Est. lines | Responsibility |
+|----------|------|---------|------|
+| useArticleEditor.ts | Hook | ~100 | Article load/save/publish logic |
+| RichTextEditor.tsx | Presentational | ~150 | WangEditor wrapper |
+| ArticleMetaForm.tsx | Presentational | ~120 | Title/category/tags/cover form |
+| ArticleEditor.tsx | Container | ~60 | Assembly |
+
+#### FollowUpTaskList.tsx (547 lines), P2
+
+| Split target | Type | Est. lines | Responsibility |
+|----------|------|---------|------|
+| useFollowUpTasks.ts | Hook | ~100 | List loading / filtering / status transitions |
+| FollowUpTable.tsx | Presentational | ~120 | Table + row actions |
+| FollowUpFilter.tsx | Presentational | ~80 | Filter panel |
+| TaskStatusModal.tsx | Presentational | ~80 | Status change modal |
+| FollowUpTaskList.tsx | Container | ~50 | Assembly |
+
+#### MainLayout.tsx (535 lines), P1
+
+| Split target | Type | Est. lines | Responsibility |
+|----------|------|---------|------|
+| useMenuBuilder.ts | Hook | ~100 | Menu data construction, plugin menu merging |
+| AppSidebar.tsx | Presentational | ~120 | Sidebar rendering |
+| AppHeader.tsx | Presentational | ~80 | Top Header |
+| MainLayout.tsx | Container | ~60 | Layout skeleton |
+
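+### 3.2 Unifying Duplicated Patterns
+
+#### 3.2.1 Enhance useApiRequest (P0)
+
+Current problem: no loading state. Enhance to:
+
+```typescript
+interface UseApiRequestReturn {
+  execute: <T>(fn: () => Promise<T>, successMsg?: string) => Promise<T | null>;
+  loading: boolean;
+}
+```
+
+Change size: ~10 lines. Existing call sites need no changes.
+
+#### 3.2.2 Enhance usePaginatedData (P1)
+
+Current problem: the fetchFn signature only supports (page, pageSize, search). Enhance it to accept generic filter parameters:
+
+```typescript
+interface UsePaginatedDataOptions<T, F> {
+  fetchFn: (page: number, pageSize: number, filters: F) => Promise<{ data: T[]; total: number }>;
+  pageSize?: number;
+  defaultFilters: F;
+  autoFetch?: boolean;
+}
+```
+
+Change size: ~30 lines. Keep the old signature working (function overloads) and migrate the 6 list pages incrementally.
+
+#### 3.2.3 Add Batch Resolution to useHealthStore (P2)
+
+Add `batchResolvePatientNames(ids)` / `batchResolveDoctorNames(ids)`. Internals: de-duplicate → fetch in concurrent batches (concurrency limited to 5) → write to the cache.
+
+Change size: ~30 lines in stores/health.ts. Remove the hand-rolled nameCache code from AppointmentList/PointsOrderList.
+
+### 3.3 API Style Policy
+
+Do not force unification of existing code. New rule: new API files use the object style; when modifying an existing file, migrate it along the way (Boy Scout Rule).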
+
+### 3.4 Bundle Optimization
+
+Add splits for the heavy dependencies to manualChunks in vite.config.ts:
+
+```typescript
+if (id.includes('@ant-design/charts') || id.includes('@antv/')) return 'vendor-charts';
+if (id.includes('@xyflow/react') || id.includes('@reactflow/')) return 'vendor-flow';
+if (id.includes('@wangeditor/')) return 'vendor-editor';
+```
+
+Combined with route-level `React.lazy()` loading, each separate chunk is downloaded only when the corresponding page is visited. The main chunk is expected to drop from > 600KB to < 400KB, after which chunkSizeWarningLimit can be lowered back to 500.
+
+## 4. Implementation Steps
+
+| Phase | Tasks | Duration | Priority |
+|-------|------|------|--------|
+| 1: Infrastructure enhancements | Enhance useApiRequest + usePaginatedData + manualChunks + route lazy loading | 1 day | P0-P1 |
+| 2: Core component splitting | PluginCRUDPage + MainLayout + PluginGraphPage + Organizations | 2-3 days | P0-P1 |
+| 3: Health module splitting | StatisticsDashboard + ArticleEditor + FollowUpTaskList | 2 days | P1-P2 |
+| 4: Duplicated pattern migration | Migrate 6 list pages to usePaginatedData + migrate 2 nameCache usages | 2 days | P2 |
+| 5: Verification and regression | `pnpm build` check + functional regression + ESLint `max-lines` rule | 1 day | P2 |
+
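+## 5. Risks and Mitigations
+
+| Risk | Mitigation |
+|------|---------|
+| Splitting introduces re-render performance regressions | Wrap presentational components in React.memo; verify with the DevTools Profiler |
+| The generic refactor of usePaginatedData breaks existing call sites | Keep the old signature working (function overloads); migrate incrementally |
+| Changed import paths after splitting cause circular dependencies | Run `pnpm build` immediately after each split |
+| Over-splitting the bundle increases the number of requests | Limited impact under HTTP/2 multiplexing |
+| The WangEditor wrapper conflicts with the editor lifecycle | Manage the editor instance with useRef; strict cleanup |
+
+**Out of scope:** no rewrite of existing components; no forced unification of API style; no new state management library; no SSR/SSG.
+
+**Success metrics:**
+
+| Metric | Current | Target |
+|------|--------|--------|
+| Components > 400 lines | 14 | <= 5 |
+| Components > 500 lines | 7 | 0 |
+| Main chunk size | > 600KB | < 400KB |
+| usePaginatedData coverage | ~30% | > 80% |
+| useApiRequest coverage | ~20% | > 60% |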
--- a/docs/superpowers/specs/2026-04-26-observability-and-ops-design.md
+++ b/docs/superpowers/specs/2026-04-26-observability-and-ops-design.md
@@ -0,0 +1,215 @@
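+# Observability and Operations Infrastructure Design
+
+> Date: 2026-04-26 | Status: draft | Topic: health checks, Prometheus metrics, distributed tracing, production Docker, log aggregation
+
+## 1. Background
+
+The HMS backend is built on Axum 0.8 + SeaORM 1.1 + Redis 0.27. Current gaps in operational capability:
+
+| Capability | Current state | Gap |
+|------|------|------|
+| Structured logging | tracing + tracing-subscriber, JSON format | Implemented; no aggregation |
+| Health check | GET /health returns { status, version, modules } | Does not verify DB/Redis connectivity |
+| Metrics | No Prometheus endpoint | Build from scratch |
+| Distributed tracing | No OpenTelemetry | Build from scratch |
+| Production Docker | Development docker-compose only (PostgreSQL + Redis) | No Dockerfile for the Rust application |
+| Log aggregation | No ELK/Loki integration | Build from scratch |
+
+Tech stack: tower-http 0.6 (trace feature enabled); custom rate_limit/JWT middleware registered via `axum::middleware::from_fn`.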
+
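+## 2. Problem Analysis
+
+### 2.1 Insufficient Health Check
+
+The current `/health` endpoint only returns static information (version, list of module names) and does not verify connectivity to external dependencies. A container orchestrator cannot use it to tell whether the service is actually available.
+
+### 2.2 No Observability Metrics
+
+Key runtime metrics are missing: request latency distribution (P50/P95/P99), error rate, QPS, DB connection pool utilization, event outbox backlog, and so on.
+
+### 2.3 No Distributed Tracing
+
+No trace_id links Axum handler -> SeaORM query -> Redis command. Diagnosing cross-module problems (appointment creation -> workflow start -> message notification) requires aligning log timestamps by hand.
+
+### 2.4 No Production-Grade Container Image
+
+The Rust application is started directly with `cargo run`; there is no multi-stage Dockerfile, no health check instruction, and no non-root user.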
+
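+## 3. Solution
+
+### 3.1 Deep Health Checks
+
+**Crate selection**: no extra dependencies; use the existing sea_orm + redis.
+
+**Approach**: extend `HealthResponse` into tiered checks and add two sub-paths, `/health/live` (liveness probe) and `/health/ready` (readiness probe).
+
+Checks:
+
+| Component | Check | Timeout | Criticality |
+|------|---------|------|--------|
+| PostgreSQL | `SELECT 1` via SeaORM | 2s | Critical (failure returns 503) |
+| Redis | `PING` via redis::Client | 1s | Non-critical (failure marks degraded) |
+| Module status | Iterate the registry and check that on_startup has completed | n/a (in-memory, no I/O) | Non-critical |
+
+Status: all checks pass → `healthy`; a non-critical component fails → `degraded` (200); a critical component fails → `unhealthy` (503).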
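The status rule above reduces to a small pure function. The sketch below is std-only and the names (`Health`, `CheckResult`, `overall`, `http_status`) are hypothetical; the actual `HealthResponse` shape in `handlers/health.rs` is not shown in this spec.

```rust
/// Aggregate service health (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Healthy,
    Degraded,
    Unhealthy,
}

/// Outcome of one dependency check.
pub struct CheckResult {
    pub name: &'static str,
    pub ok: bool,
    pub critical: bool,
}

/// Any failed critical check wins; otherwise any failure means degraded.
pub fn overall(checks: &[CheckResult]) -> Health {
    if checks.iter().any(|c| c.critical && !c.ok) {
        Health::Unhealthy
    } else if checks.iter().any(|c| !c.ok) {
        Health::Degraded
    } else {
        Health::Healthy
    }
}

/// HTTP status for the readiness probe: only `Unhealthy` maps to 503.
pub fn http_status(health: Health) -> u16 {
    match health {
        Health::Unhealthy => 503,
        Health::Healthy | Health::Degraded => 200,
    }
}
```

Keeping the rule separate from the I/O makes it easy to unit-test without a database or Redis instance.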
+
+**Impact on existing code**: only `handlers/health.rs` changes (~40 lines); `AppState` is unchanged.
+
+### 3.2 Prometheus Metrics
+
+**Crate selection**: `metrics` 0.24 + `metrics-exporter-prometheus` 0.16
+
+Rationale: `metrics` is the metrics facade crate of the Rust ecosystem (a decoupled design similar to `log`/`tracing`), and the exporter ships its own HTTP server, so it does not intrude on the Axum routes.
+
+**Metric design:**
+
+| Category | Metric name | Type |
+|------|--------|------|
+| Requests | `http_request_duration_seconds{method,path,status}` | histogram |
+| Requests | `http_requests_total{method,path,status}` | counter |
+| Database | `db_pool_connections{state}` | gauge |
+| Database | `db_query_duration_seconds{operation}` | histogram |
+| Events | `eventbus_published_total{event_type}` | counter |
+| Events | `eventbus_outbox_pending_count` | gauge |
+| Runtime | `process_memory_rss_bytes` | gauge |
+
+**Axum middleware integration notes:**
+
+- Add `middleware/metrics.rs` (~40 lines) that records method / normalized path / status / duration for every request
+- Path normalization: `/api/v1/patients/xxx` → `/api/v1/patients/{id}` (the Axum 0.8 route template), to avoid high-cardinality labels
+- main.rs initializes the exporter on a separate port, 9090
+- Add `.layer(axum::middleware::from_fn(metrics_middleware))` where the router is assembled
+
+**Impact on existing code**: main.rs ~10 lines, new middleware ~40 lines, instrumentation on the outbox/event_bus hot paths ~20 lines.
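For matched routes, Axum's `MatchedPath` extractor already yields the route template, so no string processing is needed there. Where no template is available (for example unmatched requests), a fallback normalizer along these lines could keep label cardinality bounded. This is a std-only sketch; `normalize_path` and `is_id_like` are hypothetical helpers, and treating only numeric and UUID-shaped segments as IDs is an assumption.

```rust
/// Replaces ID-like path segments with `{id}` so the `path` label stays low-cardinality.
pub fn normalize_path(path: &str) -> String {
    path.split('/')
        .map(|seg| if is_id_like(seg) { "{id}" } else { seg })
        .collect::<Vec<_>>()
        .join("/")
}

/// A segment counts as an ID if it is all digits or has the 8-4-4-4-12 UUID shape.
fn is_id_like(seg: &str) -> bool {
    let is_number = !seg.is_empty() && seg.chars().all(|c| c.is_ascii_digit());
    let is_uuid = seg.len() == 36
        && seg.chars().enumerate().all(|(i, c)| match i {
            8 | 13 | 18 | 23 => c == '-',
            _ => c.is_ascii_hexdigit(),
        });
    is_number || is_uuid
}
```

Version segments such as `v1` are left alone because they are not purely numeric.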
+
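+### 3.3 OpenTelemetry Distributed Tracing
+
+**Crate selection**: `opentelemetry` 0.27 + `opentelemetry-otlp` 0.27 + `tracing-opentelemetry` 0.28
+
+**Maturity assessment:**
+- The Rust OTel SDK API is stabilizing as of 0.27+
+- `tracing-opentelemetry` works well with tracing-subscriber
+- SeaORM/Redis have no native span support; manual instrumentation is required
+- Risk: SDK initialization adds roughly 100-200ms to startup
+
+**Integration plan:**
+
+1. Refactor tracing initialization in main.rs to be conditional: the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable decides whether it is enabled
+2. Use the existing `tower-http` TraceLayer (already a project dependency) to inject the trace_id
+3. Manually create `tracing::info_span!("db.query", db.operation = "xxx")` at key SeaORM query sites
+4. Export over OTLP, which Jaeger and Tempo ingest directly (Zipkin via an OpenTelemetry Collector)
+
+**Impact on existing code**: main.rs ~30 lines, 4 new dependencies in Cargo.toml, span annotations on service functions added incrementally.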
+
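+### 3.4 Production Docker Image
+
+**Multi-stage build strategy:**
+
+| Stage | Base image | Purpose |
+|------|---------|------|
+| Build | rust:1.85-bookworm | cargo build --release |
+| Runtime | debian:bookworm-slim | Binary + ca-certificates only |
+
+Key design points:
+- **Layer cache optimization**: copy all Cargo.toml files first → create empty source files → build dependencies → copy the real sources → build the application. The dependency build cache is reused while dependencies are unchanged.
+- **Security**: run as a non-root user (erp:erp)
+- **Health check**: `curl -f http://localhost:3000/health/live` (curl is not part of debian:bookworm-slim, so it has to be installed in the runtime stage or replaced by another probe)
+- **Expected runtime image size**: ~50-80MB
+
+**docker-compose.prod.yml**: erp-server service + environment variable injection + depends_on health condition + health check pointing at `/health/ready`.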
+
+### 3.5 Log Aggregation
+
+**Approach: Grafana Loki**
+
+Rationale: it belongs to the same Grafana ecosystem as Prometheus, queries by label instead of building a full-text index, and uses far fewer resources than ELK. The JSON output of tracing-subscriber fits Loki's label model naturally.
+
+Deployment: Loki 3.0 + Grafana 11.0 + Prometheus 2.52, managed separately via `docker-compose.monitoring.yml`. Logs are collected from Docker container stdout by Grafana Alloy.
+
+### 3.6 Alert Rules
+
+5 core alerts based on Prometheus metrics:
+
+| Rule | Condition | Severity |
+|------|------|------|
+| HighErrorRate | 5xx ratio > 5% for 2m | critical |
+| HighLatencyP99 | P99 > 2s for 5m | warning |
+| DatabasePoolExhaustion | Connection pool utilization > 85% for 3m | warning |
+| OutboxBacklog | Outbox backlog > 100 for 5m | warning |
+| HealthCheckFailed | Service up == 0 for 1m | critical |
+
+## 4. Implementation Steps
+
+### Phase 1: Deep Health Checks + Production Docker (1-2 days)
+
+| Task | Scope of change | Priority |
+|------|---------|--------|
+| Extend HealthResponse + DB/Redis checks | handlers/health.rs ~60 lines | P0 |
+| Add /health/live and /health/ready | handlers/health.rs ~20 lines | P0 |
+| Write the production Dockerfile | New file ~50 lines | P0 |
+| Write docker-compose.prod.yml | New file ~40 lines | P0 |
+
+### Phase 2: Prometheus Metrics (2 days)
+
+| Task | Scope of change | Priority |
+|------|---------|--------|
+| Add the metrics crates | Cargo.toml ~4 lines | P0 |
+| Implement the metrics middleware | New file ~40 lines | P0 |
+| Register the middleware + initialize the exporter | main.rs ~15 lines | P0 |
+| SeaORM/EventBus metric instrumentation | ~40 lines | P1 |
+| Prometheus + Grafana Docker configuration | New file ~60 lines | P1 |
+
+### Phase 3: OpenTelemetry Integration (2-3 days)
+
+| Task | Scope of change | Priority |
+|------|---------|--------|
+| Add the opentelemetry crates | Cargo.toml ~6 lines | P2 |
+| Refactor tracing initialization to be conditional | main.rs ~30 lines | P2 |
+| Add TraceLayer | main.rs ~5 lines | P2 |
+| Span annotations on service functions | Incremental | P2 |
+
+### Phase 4: Log Aggregation + Alerting (1-2 days)
+
+| Task | Scope of change | Priority |
+|------|---------|--------|
+| Loki + Grafana deployment configuration | New file ~40 lines | P2 |
+| Grafana Alloy log collection configuration | New file ~30 lines | P2 |
+| Prometheus alert rules | New file ~50 lines | P2 |
+
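+## 5. Risks and Mitigations
+
+| Risk | Impact | Mitigation |
+|------|------|---------|
+| OTel SDK breaking changes | Hard to upgrade | Pin version 0.27; enable conditionally behind a feature flag |
+| Metric collection adds latency | Performance regression | Lock-free histogram implementation; a single record takes < 50ns |
+| Log volume bloats storage | Storage cost | Loki retention of 30 days; JSON compresses well |
+| Docker build cache invalidation | Long CI times | Separate the Cargo.toml layer from the source layer |
+| Prometheus exposes internal information | Security risk | Separate port 9090; restrict access with network policy |
+| Health check timeouts block | Slow /health | Short timeouts (DB 2s / Redis 1s); run checks in parallel |
+
+**Crate selection comparison:**
+
+| Option | Pros | Cons | Decision |
+|------|------|------|------|
+| `prometheus` crate (native) | Full-featured | Heavier API | Rejected |
+| `metrics` + exporter | Lightweight facade, decoupled | Needs an extra crate | Recommended |
+| Direct Jaeger export | Simple | Deprecated | Rejected |
+| OTLP + Tempo/Jaeger | Common standard | Needs a Collector | Recommended |
+
+**Performance impact assessment:**
+
+| Component | Added latency | Added memory | Added startup time |
+|------|---------|---------|-------------|
+| Prometheus middleware | < 0.1ms/req | ~5MB | < 50ms |
+| OpenTelemetry (10% sampling) | < 0.5ms/req | ~20MB | 100-200ms |
+| Health check (DB ping) | /health only | None | None |
+
+**Success metrics:**
+
+| Metric | Current | Target |
+|------|--------|--------|
+| /health covers external dependencies | No | DB + Redis |
+| Prometheus endpoint | None | :9090/metrics |
+| Distributed tracing | None | Full request→DB→Redis chain |
+| Production image size | None | < 80MB |
+| Number of alert rules | 0 | >= 5 |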
--- a/docs/superpowers/specs/2026-04-26-performance-optimization-design.md
+++ b/docs/superpowers/specs/2026-04-26-performance-optimization-design.md
@@ -0,0 +1,320 @@
+# Performance Optimization Design Spec
+
+> Date: 2026-04-26 | Status: draft | Topic: backend database queries + frontend N+1 and rendering optimization
+
+## 1. Background
+
+The HMS platform has finished core feature development (34 health entities, 22 frontend pages) and is entering the performance tuning stage. Code review and runtime analysis identified the following bottlenecks:
+
+- **Backend**: row-by-row INSERTs in several places, serial COUNT queries, dynamically assembled SQL, serial DB calls
+- **Frontend**: N+1 request patterns, re-renders caused by a circular nameCache dependency, vendor chunks not split
+
+## 2. Problem Analysis
+
+### 2.1 Backend Bottlenecks (5)
+
+| ID | Problem | Location | Priority | Estimated gain |
+|------|------|----------|--------|----------|
+| B-1 | device_reading_service inserts row by row (up to 500 DB round trips) | `crates/erp-health/src/service/device_reading_service.rs` `batch_insert_readings()` | P0 | 500 round trips → 1; latency down 95%+ |
+| B-2 | stats_service issues multiple independent COUNTs (e.g. 4 queries in `get_follow_up_statistics`) | `crates/erp-health/src/service/stats_service.rs` `get_follow_up_statistics()` and others | P1 | 4-7 queries → 1 GROUP BY; latency down 60-75% |
+| B-3 | alert_engine queries the DB separately for each rule (cooldown + condition evaluation) | `crates/erp-health/src/service/alert_engine.rs` `evaluate_rules()` | P1 | N rules × 2 queries → 1+1 batched queries; latency drops linearly |
+| B-4 | patient_service `get_health_summary` runs 4 queries serially | `crates/erp-health/src/service/patient_service.rs` `get_health_summary()` | P1 | 4 serial → 4 concurrent; latency down by up to ~75% |
+| B-5 | `compute_avg_field` assembles SQL with dynamic `format!`, which works against prepared-statement caching | `crates/erp-health/src/service/stats_service.rs` `compute_avg_field()` | P2 | Makes use of the PG prepared-statement cache; lower CPU on hot paths |
+
+### 2.2 Frontend Bottlenecks (5)
+
+| ID | Problem | Location | Priority | Estimated gain |
+|------|------|----------|--------|----------|
+| F-1 | N+1 requests: AppointmentList/ConsultationList/PointsOrderList fetch patient/doctor names one by one | `apps/web/src/pages/health/AppointmentList.tsx` and several other files | P0 | Page load goes from O(N) requests → O(1); first-screen time down 70-80% |
+| F-2 | nameCache in useState causes fetchData to be rebuilt in a loop | useEffect dependencies in `AppointmentList.tsx` and `PointsOrderList.tsx` | P0 | Removes the risk of an infinite loop; 50%+ fewer requests |
+| F-3 | PluginCRUDPage columns are not memoized | `apps/web/src/pages/PluginCRUDPage.tsx` | P2 | Fewer unnecessary Table re-renders |
+| F-4 | PluginGraphPage redraws continuously via requestAnimationFrame | `apps/web/src/pages/PluginGraphPage.tsx` | P2 | Lower CPU usage; redraw only when data changes |
+| F-5 | @ant-design/charts / @xyflow/react / @wangeditor/editor are not split into separate chunks | `apps/web/vite.config.ts` | P2 | First-screen JS reduced by 200-400KB (gzip) |
+
+## 3. Solution
+
+### 3.1 B-1: Batch INSERT in device_reading_service
+
+**Current implementation** (`batch_insert_readings`, lines 194-229):
+- A for loop calls `model.insert(db).await` per row, one DB round trip each
+- 500 records = 500 INSERTs
+
+**Proposed change**:
+- Use SeaORM `Entity::insert_many()` to insert everything at once
+- Handle unique-constraint conflicts with `ON CONFLICT DO NOTHING` (needs raw SQL or sea_query's `on_conflict`)
+
+```rust
+// Direction: build a Vec of ActiveModels and call insert_many
+let models: Vec<device_readings::ActiveModel> = parsed_readings
+    .iter()
+    .map(|(r, measured_at)| device_readings::ActiveModel { ... })
+    .collect();
+
+// insert_many + ON CONFLICT DO NOTHING.
+// exec_without_returning yields the affected-row count, i.e. rows actually inserted;
+// plain exec() is expected to report DbErr::RecordNotInserted when every row conflicts.
+let inserted: u64 = device_readings::Entity::insert_many(models)
+    .on_conflict(
+        sea_query::OnConflict::columns([
+            device_readings::Column::PatientId,
+            device_readings::Column::DeviceType,
+            device_readings::Column::MeasuredAt,
+        ])
+        .do_nothing()
+        .to_owned()
+    )
+    .exec_without_returning(&state.db)
+    .await?;
+```
+
+**Scope**: only the `batch_insert_readings()` function in `device_reading_service.rs` changes.
+
+### 3.2 B-2: Merge COUNT Queries in stats_service
+
+**Current implementation** (`get_follow_up_statistics`, lines 93-139):
+- `total_tasks`: 1 COUNT
+- `completed`: 1 COUNT(status='completed')
+- `pending`: 1 COUNT(status='pending')
+- `overdue`: 1 COUNT(status='overdue')
+- 4 separate DB queries in total
+
+**Proposed change**: merge into a single `GROUP BY status` query and aggregate in the application layer
+
+```sql
+SELECT status, COUNT(*) AS cnt
+FROM follow_up_task
+WHERE tenant_id = $1 AND deleted_at IS NULL
+GROUP BY status
+```
+
+The application layer reads each field from a HashMap<status, count>.
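The application-layer step can be sketched as a fold over the `(status, cnt)` rows. This is std-only and illustrative: `FollowUpStats` and `fold_status_counts` are hypothetical names, and counting rows with a NULL status toward the total (but toward no named bucket) is an assumption chosen to match what a plain `COUNT(*)` would return.

```rust
use std::collections::HashMap;

/// Summary assembled from one GROUP BY query (hypothetical struct).
#[derive(Debug, Default, PartialEq)]
pub struct FollowUpStats {
    pub total: u64,
    pub completed: u64,
    pub pending: u64,
    pub overdue: u64,
}

/// Folds `(status, cnt)` rows into a summary. `None` models a NULL status.
pub fn fold_status_counts(rows: &[(Option<String>, u64)]) -> FollowUpStats {
    let mut by_status: HashMap<&str, u64> = HashMap::new();
    let mut total: u64 = 0;
    for (status, cnt) in rows {
        // Every row counts toward the total, including NULL-status rows.
        total += *cnt;
        if let Some(s) = status {
            *by_status.entry(s.as_str()).or_insert(0) += *cnt;
        }
    }
    // Statuses absent from the result set simply had zero rows.
    let get = |key: &str| by_status.get(key).copied().unwrap_or(0);
    FollowUpStats {
        total,
        completed: get("completed"),
        pending: get("pending"),
        overdue: get("overdue"),
    }
}
```

Defaulting missing statuses to zero matters because `GROUP BY` omits groups with no rows, whereas the original per-status `COUNT` queries returned 0.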
+
+The same approach applies to:
+- `get_patient_statistics` (4 queries → 2: 1 GROUP BY on the patient table + 1 on points_transaction)
+- `get_consultation_statistics` (3 queries → 1 GROUP BY + 1 for average response time)
+- `get_dialysis_statistics` (3 queries → 1 GROUP BY)
+- `get_lab_report_statistics` (4 queries → 1 GROUP BY)
+
+**Scope**: 6 statistics functions in `stats_service.rs`.
+
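+### 3.3 B-3: Preloading + Batch Evaluation in alert_engine
+
+**Current implementation** (`evaluate_rules`, lines 12-58):
+- Each rule queries independently: 1 query in `is_in_cooldown()` + 1-2 for condition evaluation
+- N rules = 2N+ DB queries
+
+**Proposed change**:
+1. Query all active rules at once (already done)
+2. Batch-query all of the patient's alerts within the recent cooldown window and build a HashSet<rule_id>
+3. Condition evaluation (single_threshold) only needs the latest hourly record, which can be fetched in bulk per device_type and matched in memory
+
+```rust
+// Batched cooldown check.
+// cooldown_start must cover the longest cooldown among the rules; if rules have
+// different cooldowns, compare each alert's created_at per rule in memory.
+let recent_alerts = alerts::Entity::find()
+    .filter(alerts::Column::TenantId.eq(tenant_id))
+    .filter(alerts::Column::PatientId.eq(patient_id))
+    .filter(alerts::Column::CreatedAt.gt(cooldown_start))
+    .filter(alerts::Column::DeletedAt.is_null())
+    .all(&state.db)
+    .await?;
+let cooldown_set: HashSet<Uuid> = recent_alerts.iter().map(|a| a.rule_id).collect();
+```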
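If rules can carry different cooldown lengths (an assumption; the spec does not say), a plain set of rule IDs is not enough, and the in-memory step would need the latest alert time per rule. A std-only sketch follows, with `u32` rule IDs and Unix-second timestamps standing in for the real `Uuid` and date-time types; both function names are hypothetical.

```rust
use std::collections::HashMap;

/// Builds rule_id -> latest alert time from the rows of one batched query.
/// Each tuple is (rule_id, created_at as Unix seconds).
pub fn latest_alert_by_rule(recent: &[(u32, i64)]) -> HashMap<u32, i64> {
    let mut latest: HashMap<u32, i64> = HashMap::new();
    for &(rule_id, created_at) in recent {
        let entry = latest.entry(rule_id).or_insert(created_at);
        if created_at > *entry {
            *entry = created_at;
        }
    }
    latest
}

/// A rule is cooling down if its latest alert is younger than its own cooldown.
pub fn in_cooldown(
    latest: &HashMap<u32, i64>,
    rule_id: u32,
    cooldown_secs: i64,
    now: i64,
) -> bool {
    latest
        .get(&rule_id)
        .map_or(false, |&last| now - last < cooldown_secs)
}
```

With this shape the database is queried once per patient, and each rule's own cooldown is applied in memory.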
+
+**Scope**: `evaluate_rules()` and its helper functions in `alert_engine.rs`.
+
+### 3.4 B-4: Run get_health_summary Queries Concurrently
+
+**Current implementation** (`get_health_summary`, lines 423-475):
+- 4 `.await` calls executed serially
+- Total latency = sum of the 4 query latencies
+
+**Proposed change**: run them concurrently with `tokio::join!`
+
+```rust
+// Each element is a Result<_, DbErr>: handle it with `?` afterwards,
+// or use tokio::try_join! to stop at the first error.
+let (latest_vitals, latest_lab, upcoming, pending_follow_ups) = tokio::join!(
+    // Latest vital signs
+    vital_signs::Entity::find()
+        .filter(...)
+        .one(&state.db),
+    // Latest lab report
+    lab_report::Entity::find()
+        .filter(...)
+        .one(&state.db),
+    // Pending appointments
+    appointment::Entity::find()
+        .filter(...)
+        .count(&state.db),
+    // Pending follow-ups
+    follow_up_task::Entity::find()
+        .filter(...)
+        .count(&state.db),
+);
+```
+
+**Scope**: only `get_health_summary()` in `patient_service.rs` changes.
+
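+### 3.5 B-5: Static SQL for compute_avg_field
+
+**Current implementation** (`compute_avg_field`, lines 423-464):
+- `format!("SELECT AVG({field})...")` assembles the SQL dynamically
+- Each distinct field produces a different SQL text. Statement caches are keyed by SQL text, so with a fixed set of fields each variant can still be cached; the firmer gain from static constants is a compile-time whitelist that rules out injection through `field`
+
+**Proposed change**: generate a separate static SQL constant for each allowed field
+
+```rust
+macro_rules! avg_field_sql {
+    ($field:literal) => {
+        concat!(
+            "SELECT AVG(", $field, ")::FLOAT8 AS avg_val FROM dialysis_record ",
+            "WHERE tenant_id = $1 AND deleted_at IS NULL AND ", $field, " IS NOT NULL ",
+            "AND created_at >= date_trunc('month', NOW())"
+        )
+    };
+}
+
+// Only whitelisted fields map to a statement; anything else is rejected.
+let sql: &'static str = match field {
+    "ultrafiltration_volume" => avg_field_sql!("ultrafiltration_volume"),
+    "dialysis_duration" => avg_field_sql!("dialysis_duration"),
+    // ... remaining fields
+    _ => return Err(...),
+};
+```
+
+**Scope**: only `compute_avg_field()` in `stats_service.rs` changes.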
+
+### 3.6 F-1: Inline Name Fields in Backend List APIs
+
+**Current problem**: after fetching a list, the frontend loops over it calling `patientApi.get(id)` / `doctorApi.get(id)` to get names.
+
+**Proposed change**: the backend list query returns `patient_name` / `doctor_name` via a JOIN or subquery.
+
+**Handlers/services to change**:
+
+| Module | List API | Fields to inline |
+|------|----------|-----------|
+| appointment | list_appointments | patient_name, doctor_name |
+| consultation_session | list_sessions | patient_name, doctor_name |
+| follow_up_task | list_tasks | patient_name |
+| points_order | list_orders | patient_name |
+| lab_report | list_reports | patient_name |
+
+**Implementation**: use SeaORM's `select_only()` + `expr_as()` to add the joined fields, or run a supplementary query at the DTO layer.
+
+```rust
+// Direction: LEFT JOIN the users table when listing to obtain display_name.
+// find_also_related yields the related model as Option<user::Model>.
+let rows: Vec<(appointment::Model, Option<user::Model>)> = appointment::Entity::find()
+    .filter(...)
+    .find_also_related(user::Entity)  // requires a relation to be defined
+    .all(&state.db)
+    .await?;
+```
+
+**Scope**: 5 handlers + the corresponding service functions + DTO structs.
+
+### 3.7 F-2: Remove the nameCache Dependency
+
+**Current problem**: the `useEffect` in `AppointmentList.tsx` and `PointsOrderList.tsx` depends on `nameCache`; updating nameCache rebuilds fetchData, which forms a loop.
+
+**Proposed change**:
+1. Once the backend inlines names (F-1), the nameCache mechanism can be removed entirely
+2. As an interim step, move the nameCache lookup into a separate useEffect so that it is not a dependency of the list data fetch
+
+**Scope**: `AppointmentList.tsx`, `PointsOrderList.tsx`, `ConsultationList.tsx`.
+
+### 3.8 F-5: Vendor Chunk Splitting
+
+**Current problem**: @ant-design/charts (~200KB), @xyflow/react (~150KB), and @wangeditor/editor (~300KB) are bundled into the main bundle.
+
+**Proposed change**: Vite `manualChunks` configuration
+
+```typescript
+// vite.config.ts build.rollupOptions.output.manualChunks
+manualChunks: {
+  'vendor-charts': ['@ant-design/charts'],
+  'vendor-flow': ['@xyflow/react'],
+  'vendor-editor': ['@wangeditor/editor'],
+}
+```
+
+**Scope**: `apps/web/vite.config.ts`, combined with route-level React.lazy loading.
+
+### 3.9 F-3/F-4: Rendering Optimization
+
+**F-3 PluginCRUDPage columns useMemo**:
+```typescript
+const columns = useMemo(() => [...], [schema]);
+```
+
+**F-4 PluginGraphPage on-demand redraw**:
+- Replace the continuous `requestAnimationFrame` loop with a single redraw triggered by data changes
+- Use `ResizeObserver` to watch for container size changes
+
+## 4. Implementation Steps
+
+### Phase 1: P0 Urgent Optimizations (est. 2-3 days)
+
+| Step | Task | Files changed |
+|------|------|----------|
+| 1.1 | B-1: switch `batch_insert_readings` to `insert_many()` | `device_reading_service.rs` |
+| 1.2 | F-1: inline patient_name/doctor_name in backend list APIs | 5 handlers + services + DTOs |
+| 1.3 | F-2: remove the frontend nameCache dependency | `AppointmentList.tsx`, `PointsOrderList.tsx` |
+| 1.4 | Verify: cargo test + frontend page load comparison | - |
+
+### Phase 2: P1 Important Optimizations (est. 2-3 days)
+
+| Step | Task | Files changed |
+|------|------|----------|
+| 2.1 | B-2: merge stats_service COUNTs → GROUP BY | `stats_service.rs` |
+| 2.2 | B-3: alert_engine preloading + batch evaluation | `alert_engine.rs` |
+| 2.3 | B-4: run `get_health_summary` queries concurrently | `patient_service.rs` |
+| 2.4 | Verify: benchmark comparison (latency before and after) | - |
+
+### Phase 3: P2 Minor Optimizations (est. 1-2 days)
+
+| Step | Task | Files changed |
+|------|------|----------|
+| 3.1 | B-5: static SQL for `compute_avg_field` | `stats_service.rs` |
+| 3.2 | F-5: vendor chunk splitting | `vite.config.ts` + lazy routes |
+| 3.3 | F-3: memoize PluginCRUDPage columns | `PluginCRUDPage.tsx` |
+| 3.4 | F-4: on-demand redraw in PluginGraphPage | `PluginGraphPage.tsx` |
+
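+## 5. Risks and mitigations
+
+### 5.1 B-1 Conflict handling for batch INSERT
+
+**Risk**: `insert_many()` + `ON CONFLICT DO NOTHING` may not be able to report exact inserted vs. duplicate counts.
+**Mitigation**: First query the records that already exist (by patient_id + device_type + measured_at), then insert only the net new rows after filtering. Alternatively, use `exec_with_returning` to obtain the number of rows actually inserted.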
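+
+A minimal sketch of the pre-filter approach, under stated assumptions: `ReadingKey` and `split_new_readings` are hypothetical names, and plain strings plus an integer timestamp stand in for the real column types. It splits a batch into net new rows and a duplicate count before anything reaches `insert_many()`, so both numbers are known without relying on the database's conflict handling.
+
+```rust
+use std::collections::HashSet;
+
+/// Natural key of a reading: (patient_id, device_type, measured_at).
+/// Simplified stand-in types; the real columns are not shown in this spec.
+type ReadingKey = (String, String, i64);
+
+/// Splits a batch into the keys that are genuinely new and a duplicate count.
+/// `existing` holds the keys already stored; repeats inside the batch itself
+/// are counted as duplicates as well.
+fn split_new_readings(
+    batch: &[ReadingKey],
+    existing: &HashSet<ReadingKey>,
+) -> (Vec<ReadingKey>, usize) {
+    let mut seen: HashSet<&ReadingKey> = HashSet::new();
+    let mut fresh = Vec::new();
+    let mut duplicates = 0;
+    for key in batch {
+        // A key is a duplicate if it is already stored or was seen earlier in this batch.
+        if existing.contains(key) || !seen.insert(key) {
+            duplicates += 1;
+        } else {
+            fresh.push(key.clone());
+        }
+    }
+    (fresh, duplicates)
+}
+```
+
+For a batch holding one already-stored key, one new key, and a repeat of that new key, the function returns one row to insert and a duplicate count of 2. A concurrent writer could still insert the same key between the lookup and the insert, so keeping `ON CONFLICT DO NOTHING` as a backstop seems prudent.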
+
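+### 5.2 F-1 Backend JOIN performance
+
+**Risk**: Joining the users table in list APIs may become slow with large data volumes.
+**Mitigation**: The users table is typically on the order of thousands of rows, so JOIN performance is acceptable. If it becomes a bottleneck, the list query can use a subquery `(SELECT display_name FROM users WHERE id = ...)` instead of the JOIN.
+
+### 5.3 B-2 Statistical accuracy
+
+**Risk**: GROUP BY aggregation results may not exactly match the multiple COUNT queries (for example, when status is NULL).
+**Mitigation**: Make sure `GROUP BY status` handles NULL status, and compare results before and after the optimization in tests to confirm they are consistent.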
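+
+One way to keep NULL status rows visible can be sketched as follows. This is an illustration only: `fold_status_counts` and the `"unknown"` bucket name are hypothetical, and the input is assumed to be the `(status, count)` rows returned by a single `GROUP BY status` query, with a NULL status arriving as `None`.
+
+```rust
+use std::collections::HashMap;
+
+/// Folds `GROUP BY status` rows into per-status counts plus a grand total.
+/// A NULL status (None) is kept under the "unknown" bucket instead of being
+/// dropped, so the buckets always add up to the total row count.
+fn fold_status_counts(rows: &[(Option<String>, u64)]) -> (HashMap<String, u64>, u64) {
+    let mut counts = HashMap::new();
+    let mut total = 0;
+    for (status, n) in rows {
+        let key = status.clone().unwrap_or_else(|| "unknown".to_string());
+        *counts.entry(key).or_insert(0) += n;
+        total += n;
+    }
+    (counts, total)
+}
+```
+
+Comparing the returned total against the old standalone `COUNT(*)` in a test is a cheap check that no rows were lost in the merge.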
+
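+### 5.4 Transition for removing the frontend nameCache
+
+**Risk**: Removing nameCache before the backend has inlined every name causes pages to display UUIDs.
+**Mitigation**: Proceed incrementally: inline the name for one API at a time and remove nameCache from the corresponding page, verifying each API independently.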
+
+## 6. Performance benchmarks
+
+### 6.1 Backend estimates before and after optimization
+
+| Endpoint | Before | After | Improvement |
+|------|--------|--------|------|
+| POST /device-readings/batch (500 rows) | ~2500ms (500×5ms) | ~50ms (1 INSERT) | 50x |
+| GET /stats/follow-up | ~40ms (4 queries) | ~10ms (1 GROUP BY) | 4x |
+| GET /stats/dashboard | ~200ms (4 stats in series) | ~80ms (parallel + merged) | 2.5x |
+| GET /patient/{id}/health-summary | ~60ms (4 serial queries) | ~20ms (4 parallel queries) | 3x |
+| GET /alert/evaluate | ~100ms (10 rules × 10ms) | ~30ms (batched) | 3x |
+
+### 6.2 Frontend estimates before and after optimization
+
+| Page | Before | After | Improvement |
+|------|--------|--------|------|
+| AppointmentList (20 rows) | ~3s (1+20+20 requests) | ~300ms (1 request) | 10x |
+| ConsultationList (20 rows) | ~2.5s | ~300ms | 8x |
+| PointsOrderList (20 rows) | ~2s | ~300ms | 7x |
+| Initial JS bundle size | ~1.2MB gzip | ~0.8MB gzip | 33% |
--- a/docs/superpowers/specs/2026-04-26-security-defense-in-depth-design.md
+++ b/docs/superpowers/specs/2026-04-26-security-defense-in-depth-design.md
@@ -0,0 +1,362 @@
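+# Security Defense-in-Depth Design Spec
+
+> Date: 2026-04-26 | Status: draft | Topic: database RLS + row-level permissions + distributed session_key + audit enhancements
+
+## 1. Background
+
+The HMS platform already has a solid security foundation: JWT authentication, RBAC permissions, Argon2 password hashing, PII field encryption, API rate limiting, and CORS configuration. As a medical SaaS platform, however, it needs defense in depth so that data stays protected even if a vulnerability appears at the application layer.
+
+This spec focuses on 6 security enhancements, ordered by their impact on medical compliance.
+
+## 2. Problem analysis
+
+### 2.1 Security enhancement items
+
+| ID | Issue | Medical compliance impact | Implementation complexity | Scope of impact |
+|------|------|-------------|-----------|----------|
+| S-1 | PostgreSQL RLS safety net | **High**: cross-tenant data leakage is a red line for medical data compliance | Medium | All tables + migration + middleware |
+| S-2 | Row-level data scope not implemented | **High**: doctors seeing only the patients of their own department is a basic compliance requirement | Medium | rbac + all health handlers |
+| S-3 | WeChat session_key kept in an in-memory HashMap | **Medium**: breaks under multi-instance deployment and interrupts login | Low | wechat_service.rs |
+| S-4 | Mini program stores openid in plaintext | **Medium**: a local storage leak can be linked to the user's identity | Low | miniprogram storage |
+| S-5 | Audit log integrity | **High**: medical compliance requires tamper-proof operation audit | Medium | audit module + migration |
+| S-6 | Health check endpoint does not verify dependencies | **Low**: operational reliability, not a compliance requirement | Low | handlers/health.rs |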
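+### 2.2 Detailed analysis of each item
+
+**S-1 PostgreSQL RLS safety net**
+
+Multi-tenancy currently relies entirely on application-layer `tenant_id` filtering. Every `Entity::find()` in the code manually adds `.filter(Column::TenantId.eq(tenant_id))`, but if any handler omits it (for example, a new handler that forgets the filter), data leaks across tenants. PostgreSQL Row Level Security (RLS) can serve as a database-level safety net.
+
+**S-2 Row-level data scope**
+
+The `role_permissions` table already has a `data_scope` column (migration m000036, values: all/self/department/department_tree), and `TenantContext` already has a `department_ids` field, but the `require_permission()` function does not use data_scope for department-level filtering. The requirement that doctor A can only see patients in their own department is currently not implemented.
+
+**S-3 session_key in an in-memory HashMap**
+
+Lines 31-34 of `wechat_service.rs` use `LazyLock<Mutex<HashMap<String, SessionEntry>>>` to cache the session_key. This works on a single instance, but under multi-instance deployment:
+- Instance A caches the session_key
+- The user's request is routed to instance B → bind_phone cannot find the session_key → login fails
+
+**S-4 Mini program stores openid in plaintext**
+
+The mini program stores the openid in plaintext with `Taro.setStorageSync('openid', openid)`. If the phone is lost or jailbroken, an attacker can read the openid directly and link it to the user's identity.
+
+**S-5 Audit log integrity**
+
+The current `audit_logs` table is reasonably structured (it includes tenant_id, user_id, action, resource_type, old_value, new_value, ip_address), but it has the following gaps:
+- The erp-health module (34 entities) writes no audit logs; only erp-auth/erp-workflow/erp-message are audited
+- There is no tamper-protection mechanism for audit logs (no signature or hash chain)
+- There is no archival or retention policy for audit logs
+
+**S-6 Health check endpoint**
+
+The `/health` endpoint in `handlers/health.rs` only returns the in-memory module list and does not verify DB/Redis connectivity. It still returns `status: "ok"` when the DB is unavailable, so K8s/Docker health checks cannot detect the failure.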
+
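+## 3. Solution
+
+### 3.1 S-1: PostgreSQL RLS safety net
+
+**Implementation steps**:
+
+1. Create a migration that enables RLS on every table containing `tenant_id`:
+
+```sql
+-- Using the patient table as an example
+-- Note: the table owner is exempt from RLS unless FORCE ROW LEVEL SECURITY is also set
+ALTER TABLE patient ENABLE ROW LEVEL SECURITY;
+
+-- Policy: application connections are filtered by the tenant context variable.
+-- missing_ok = true plus NULLIF turns an unset or empty variable into NULL, so no rows match.
+CREATE POLICY tenant_isolation ON patient
+  USING (tenant_id = NULLIF(current_setting('app.current_tenant_id', true), '')::uuid);
+-- The admin and migration roles bypass the tenant filter (true superusers bypass RLS regardless)
+CREATE POLICY tenant_bypass ON patient
+  USING (current_user IN ('erp_admin', 'erp_migration'));
+```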
+
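+2. Set `current_setting` in the middleware:
+
+In the Axum tenant middleware, execute the following at the start of each request:
+
+```sql
+-- SET does not accept bind parameters; when the value must be parameterized,
+-- SELECT set_config('app.current_tenant_id', '<tenant_id_from_jwt>', true) has the same transaction-local effect.
+SET LOCAL app.current_tenant_id = '<tenant_id_from_jwt>';
+```
+
+Using `SET LOCAL` ensures the value is reset automatically when the transaction ends.
+
+3. Enable RLS in bulk for all 30+ base tables and the 34 health tables.
+
+**Migration strategy**:
+- Add a single new migration file `m000073_enable_rls_all_tables`
+- Run `ENABLE ROW LEVEL SECURITY` + `CREATE POLICY` for each table
+- Create a database role `erp_app` (non-superuser) for application connections
+- Switch the existing connection string to the `erp_app` role
+
+**Notes**:
+- `SET LOCAL` must be issued inside a transaction, so the SeaORM queries have to run within that same transaction
+- Performance impact: PG RLS filters through indexes, and `tenant_id` is already indexed, so the expected impact is < 5%
+- The migration connection needs to use a superuser role (which bypasses RLS)
+
+**Scope of impact**: No handler needs to change (the middleware injects the setting transparently), but the database connection configuration must be adjusted.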
+
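+### 3.2 S-2: Row-level data scope
+
+**Implementation steps**:
+
+1. Extend `TenantContext` with `data_scope` information:
+
+```rust
+// erp-core/src/types.rs
+use std::collections::HashMap;
+use uuid::Uuid;
+
+pub struct TenantContext {
+    pub tenant_id: Uuid,
+    pub user_id: Uuid,
+    pub roles: Vec<String>,
+    pub permissions: Vec<String>,
+    pub department_ids: Vec<Uuid>,
+    // New: permission code → data_scope mapping
+    pub permission_data_scopes: HashMap<String, DataScope>,
+}
+
+pub enum DataScope {
+    All,                    // all data
+    SelfOnly,               // only records created by the user (`Self` is a reserved keyword, so it cannot name a variant)
+    Department,             // the user's own department
+    DepartmentTree,         // the user's own department and its descendants
+}
+```
+
+2. In the JWT middleware, query `role_permissions.data_scope` and populate `permission_data_scopes`.
+
+3. Create an `apply_data_scope` helper function:
+
+```rust
+// erp-core/src/rbac.rs
+use sea_orm::{ColumnTrait, EntityTrait, QueryFilter, Select};
+
+pub fn apply_data_scope<E, C>(
+    query: Select<E>,
+    ctx: &TenantContext,
+    permission: &str,
+    owner_column: C,
+    dept_column: Option<C>,
+) -> Select<E>
+where
+    E: EntityTrait,
+    C: ColumnTrait,
+{
+    match ctx.permission_data_scopes.get(permission) {
+        Some(DataScope::All) => query,
+        Some(DataScope::SelfOnly) => query.filter(owner_column.eq(ctx.user_id)),
+        Some(DataScope::Department) | Some(DataScope::DepartmentTree) => match dept_column {
+            // For DepartmentTree, department_ids is assumed to already include descendant departments
+            Some(col) => query.filter(col.is_in(ctx.department_ids.clone())),
+            // No department column: fall back to the owner filter instead of panicking on unwrap()
+            None => query.filter(owner_column.eq(ctx.user_id)),
+        },
+        None => query, // no data_scope configured: defaults to all
+    }
+}
+```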
+
+4. Call it in the erp-health handlers:
+
+```rust
+let query = apply_data_scope(
+    patient::Entity::find(),
+    &ctx,
+    "patient.list",
+    patient::Column::CreatedBy,
+    None, // patient has no department column
+);
+```
+
+**Scope of impact**: `erp-core/types.rs`, `erp-core/rbac.rs`, the JWT middleware, and the erp-health handlers.
+
+**Complexity**: Medium. The JWT middleware must be changed to query data_scope, and the filter must be applied handler by handler.
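+
+The `department_tree` scope covers a department and everything below it, so the department ID list has to be expanded to descendants somewhere before filtering. A minimal sketch of that expansion, under assumptions: `expand_department_tree` is a hypothetical helper, `u32` stands in for `Uuid`, and the hierarchy is assumed to be available as a child → parent map (the spec does not describe how departments are stored).
+
+```rust
+use std::collections::{HashMap, HashSet, VecDeque};
+
+/// Expands a user's departments to include all descendant departments.
+/// `parent_of` maps child department ID → parent department ID.
+fn expand_department_tree(own: &[u32], parent_of: &HashMap<u32, u32>) -> HashSet<u32> {
+    // Invert the child → parent map into parent → children.
+    let mut children: HashMap<u32, Vec<u32>> = HashMap::new();
+    for (&child, &parent) in parent_of {
+        children.entry(parent).or_default().push(child);
+    }
+    // Breadth-first walk; the visited set also guards against cycles in bad data.
+    let mut visited: HashSet<u32> = HashSet::new();
+    let mut queue: VecDeque<u32> = own.iter().copied().collect();
+    while let Some(dept) = queue.pop_front() {
+        if visited.insert(dept) {
+            if let Some(kids) = children.get(&dept) {
+                queue.extend(kids.iter().copied());
+            }
+        }
+    }
+    visited
+}
+```
+
+Doing this expansion once, when `TenantContext` is built, would keep the per-query filter a plain `IN` list for both department scopes.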
+
+### 3.3 S-3: Move session_key to Redis
+
+**Implementation steps**:
+
+1. Add a Redis connection pool to `AppState` in `erp-server` (the `redis` dependency already exists).
+
+2. Replace `SESSION_CACHE` in `wechat_service.rs`:
+
+```rust
+// Direction: use Redis SET with a TTL
+redis::cmd("SET")
+    .arg(format!("wechat:session:{openid}"))
+    .arg(&session_key)
+    .arg("EX")
+    .arg(300) // 5-minute TTL
+    .exec_async(&mut redis_conn)
+    .await?;
+```
+
+3. Read from Redis in `bind_phone`:
+
+```rust
+let session_key: Option<String> = redis::cmd("GET")
+    .arg(format!("wechat:session:{openid}"))
+    .query_async(&mut redis_conn)
+    .await?;
+// Delete immediately after reading (single use); GETDEL (Redis 6.2+) would do both atomically
+redis::cmd("DEL").arg(format!("wechat:session:{openid}"))
+    .exec_async(&mut redis_conn).await?;
+```
+
+**Scope of impact**: Only `wechat_service.rs`, about 30 lines changed. The Redis connection pool needs to be passed into AuthState.
+
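+### 3.4 S-4: Encrypted storage of openid in the mini program
+
+**Implementation steps**:
+
+1. Use AES-encrypted storage on the mini program side:
+
+```typescript
+// utils/secure-storage.ts
+import Taro from '@tarojs/taro';
+
+const ENCRYPTION_KEY = '<encryption key obtained from the server>'; // returned with the token at login
+
+// aesEncrypt / aesDecrypt are AES helpers assumed to be provided elsewhere; this spec does not define them.
+export function setSecure(key: string, value: string): void {
+  const encrypted = aesEncrypt(value, ENCRYPTION_KEY);
+  Taro.setStorageSync(key, encrypted);
+}
+
+export function getSecure(key: string): string | null {
+  const encrypted = Taro.getStorageSync(key);
+  if (!encrypted) return null;
+  return aesDecrypt(encrypted, ENCRYPTION_KEY);
+}
+```
+
+2. The backend login endpoint returns a `storage_key` field (randomly generated on each login).
+
+**Scope of impact**: A new encryption utility under `utils/` in the mini program, and adjusted storage calls in `stores/auth.ts`.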
+
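+### 3.5 S-5: Audit log integrity enhancements
+
+**Implementation steps**:
+
+**3.5.1 Fill in erp-health audit logs**
+
+Add audit logs for the key erp-health operations:
+- Patient create/update/delete
+- Appointment create/cancel/confirm
+- Diagnosis create/update
+- Lab report upload/review
+- Informed consent signing/revocation
+- Prescription creation
+
+Bring `erp_core::audit::AuditLog` into `erp-health` and record entries in each service's create/update/delete functions.
+
+**3.5.2 Tamper protection via a hash chain**
+
+```sql
+-- Add hash chain columns to the audit log table
+ALTER TABLE audit_logs ADD COLUMN prev_hash TEXT;
+ALTER TABLE audit_logs ADD COLUMN record_hash TEXT;
+```
+
+```rust
+// Compute the hash of the current record.
+// `sha256` is assumed to be a helper returning a hex digest; it is not defined in this spec.
+// Only the fields listed in `input` are covered by the chain.
+fn compute_audit_hash(log: &AuditLog, prev_hash: &str) -> String {
+    let input = format!("{}:{}:{}:{}:{}:{}",
+        log.id, log.action, log.resource_type,
+        log.resource_id.map_or("".into(), |id| id.to_string()),
+        log.created_at.to_rfc3339(),
+        prev_hash,
+    );
+    sha256(input.as_bytes())
+}
+```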
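+
+To illustrate how such a chain is verified, here is a minimal, self-contained sketch. Everything in it is hypothetical: `Entry` is a simplified record, and `digest` uses the standard library's `DefaultHasher` purely as a stand-in so the example has no dependencies. `DefaultHasher` is not cryptographic; a real chain would use SHA-256 as described above.
+
+```rust
+use std::collections::hash_map::DefaultHasher;
+use std::hash::{Hash, Hasher};
+
+/// Simplified audit entry; only the fields needed to illustrate chaining.
+struct Entry {
+    payload: String,
+    prev_hash: String,
+    record_hash: String,
+}
+
+/// Stand-in digest. DefaultHasher is NOT cryptographic; use SHA-256 in practice.
+fn digest(payload: &str, prev_hash: &str) -> String {
+    let mut h = DefaultHasher::new();
+    payload.hash(&mut h);
+    prev_hash.hash(&mut h);
+    format!("{:016x}", h.finish())
+}
+
+/// Appends an entry whose prev_hash is the previous record_hash ("" for the first entry).
+fn append(chain: &mut Vec<Entry>, payload: &str) {
+    let prev_hash = chain.last().map_or(String::new(), |e| e.record_hash.clone());
+    let record_hash = digest(payload, &prev_hash);
+    chain.push(Entry { payload: payload.to_string(), prev_hash, record_hash });
+}
+
+/// Returns the index of the first entry that breaks the chain, or None if it is intact.
+fn first_broken(chain: &[Entry]) -> Option<usize> {
+    let mut expected_prev = String::new();
+    for (i, e) in chain.iter().enumerate() {
+        // Each entry must link to its predecessor and match its own recomputed hash.
+        if e.prev_hash != expected_prev || e.record_hash != digest(&e.payload, &e.prev_hash) {
+            return Some(i);
+        }
+        expected_prev = e.record_hash.clone();
+    }
+    None
+}
+```
+
+Editing the payload of a stored entry, or deleting an entry from the middle, makes `first_broken` report the position where the chain stops matching. Truncating the tail is not detected by this check alone, which is one reason to anchor the latest hash somewhere outside the table.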
+
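+**3.5.3 Archival policy**
+
+- Create an `audit_logs_archive` table (partitioned by quarter)
+- A background job moves logs older than 1 year into the archive table every quarter
+- The archive table is read-only to prevent tampering
+
+**Scope of impact**: `erp-core/audit.rs`, a new migration, and the erp-health service functions.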
+
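+### 3.6 S-6: Health check enhancement
+
+**Implementation steps**:
+
+Modify `handlers/health.rs` to add a DB connectivity check:
+
+```rust
+pub async fn health_check(State(state): State<AppState>) -> Json<HealthResponse> {
+    // Assumes state.db is a sea_orm::DatabaseConnection; ping() makes a round trip to the database
+    let db_ok = state.db.ping().await.is_ok();
+
+    Json(HealthResponse {
+        status: if db_ok { "ok" } else { "degraded" }.to_string(),
+        version: env!("CARGO_PKG_VERSION").to_string(),
+        modules, // existing in-memory module list, built as in the current handler
+        database: if db_ok { "connected" } else { "unreachable" }.to_string(),
+    })
+}
+```
+
+**Scope of impact**: Only `handlers/health.rs`, about 10 lines changed. Because K8s/Docker probes typically key off the HTTP status code, returning a non-2xx status (such as 503) when the DB is unreachable may also be needed for the failure to be detected.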
+
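+## 4. Implementation steps
+
+### Phase 1: High compliance impact (estimated 3-5 days)
+
+| Step | Task | Files to modify | Prerequisites |
+|------|------|----------|----------|
+| 1.1 | S-1: create the RLS migration + enable RLS on all tables | New migration | Create the `erp_app` DB role |
+| 1.2 | S-1: change the tenant middleware to inject `SET LOCAL` | tenant middleware | None |
+| 1.3 | S-2: extend TenantContext + query data_scope | `erp-core/types.rs`, JWT middleware | None |
+| 1.4 | S-2: apply data_scope to each health handler | erp-health handler layer | 1.3 complete |
+| 1.5 | S-5: fill in erp-health audit logs | erp-health service layer | None |
+| 1.6 | Verification: cross-tenant data isolation tests + audit log completeness tests | - | - |
+
+### Phase 2: Medium compliance impact (estimated 2-3 days)
+
+| Step | Task | Files to modify | Prerequisites |
+|------|------|----------|----------|
+| 2.1 | S-3: move session_key to Redis | `wechat_service.rs` | Redis connection pool available |
+| 2.2 | S-4: encrypted storage of openid in the mini program | miniprogram `utils/` + `stores/auth.ts` | None |
+| 2.3 | S-5: audit hash chain + archival policy | `erp-core/audit.rs` + migration | None |
+| 2.4 | Verification: multi-instance session_key test + audit hash chain validation | - | - |
+
+### Phase 3: Operational enhancements (estimated 0.5 days)
+
+| Step | Task | Files to modify | Prerequisites |
+|------|------|----------|----------|
+| 3.1 | S-6: add DB connectivity verification to the health check | `handlers/health.rs` | None |
+| 3.2 | Verification: the health check returns degraded when the DB is unavailable | - | - |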
+
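+## 5. Risks and mitigations
+
+### 5.1 RLS performance impact
+
+**Risk**: RLS policies add query overhead.
+**Mitigation**: All RLS policies use the `tenant_id` column (already indexed), and the PG optimizer can fold the RLS condition into the query plan. Benchmarks show an impact of < 5%.
+
+### 5.2 RLS transaction boundary
+
+**Risk**: `SET LOCAL` only takes effect inside a transaction, so it may not work under SeaORM's default autocommit mode.
+**Mitigation**: Wrap the work in an explicit transaction (`BEGIN` + `SET LOCAL` + queries + `COMMIT`), or set a session-level variable at the connection pool layer. SeaORM's transaction behavior needs to be verified.
+
+### 5.3 data_scope compatibility
+
+**Risk**: All existing handlers effectively use `data_scope = 'all'`, so behavior does not change once the feature is enabled. However, if the data_scope call is missed in some handler, access may end up too restrictive or not restrictive enough.
+**Mitigation**: The default behavior is `all` (permission codes without a configured data_scope are not restricted), and the feature is enabled incrementally. Write integration tests to verify each handler.
+
+### 5.4 Redis dependency
+
+**Risk**: After session_key moves to Redis, WeChat login fails when Redis is unavailable.
+**Mitigation**: Keep the in-memory HashMap as a fallback (degrade to the in-memory cache when Redis fails), and add Redis health monitoring.
+
+### 5.5 Audit hash chain performance
+
+**Risk**: Each audit log entry needs to look up the previous entry's hash, which adds DB queries.
+**Mitigation**: Cache the hashes of the most recent 1000 log entries in memory and compute the chain during batch writes. Alternatively, use windowed hashing (one checkpoint per 1000 entries).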
+
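+## 6. Medical compliance reference
+
+| Requirement | Corresponding item | Status |
+|------|--------|------|
+| Tenant isolation of patient data | S-1 RLS | Pending |
+| Least privilege (department level) | S-2 data_scope | Pending |
+| Tamper-proof operation audit | S-5 audit hash chain | Pending |
+| Encrypted storage of sensitive data | Already implemented (PII encryption) | Done |
+| Access log retention | S-5 archival policy | Pending |
+| Local security of personal information | S-4 openid encryption | Pending |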