hms/docs/qa/performance-baseline-report.md
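iven d623f8b2ff fix: V1 test release end-to-end verification fixes: all 6 CRITICAL + 3 HIGH issues fixed
Fixes:
- fix(db): migration 149: repair Admin role permission bindings broken by the migration chain (FE-C1)
- fix(health): add empty-name validation to 4 handlers: Doctor/Article/AlertRule/Tag (API-C1~C4)
- fix(health): fix the Stats dashboard new_this_week query (SeaORM date_trunc bug) (FE-C2)
- fix(server): add security response headers: X-Frame-Options/CSP/XSS-Protection/Referrer-Policy (SEC-H1)
- fix(mp): fix the appointment creation contract: notes/reason field mapping + remove schedule_id (MP-H1)
- fix(mp): make the consultation session subject/last_message fields optional (MP-H3)
- fix(ai): replace the hand-written impl with a Default derive for AiConfig (clippy)

Test reports:
- All 8 dimensions of end-to-end testing completed (backend 87 cases / frontend 30 pages / mini program 80+ APIs / security 20 items / performance 20 endpoints)
- Multi-role: 7 roles, 49 checks, 100% passed
- Comprehensive test report + expert assessment report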
2026-05-18 10:24:40 +08:00


HMS Performance Baseline Report

Date: 2026-05-18 | Environment: Windows 11, PostgreSQL 16 (localhost), Redis (cloud, unavailable during test) | Backend: Rust/Axum debug build | Frontend: Vite dev server (React 19 SPA)

1. Executive Summary

| Category | Rating | Key Finding |
|---|---|---|
| API Read (GET) | WARNING | Avg 237ms, but 10% of requests spike to 2.3s |
| API Write (POST) | WARNING | Avg 243ms single, degrades to 2.3s under concurrency |
| Concurrent GET | GOOD | 20 concurrent requests complete in 768ms |
| Concurrent POST | CRITICAL | 10 concurrent creates take 2.6s total (2.3s each) |
| Frontend LCP | GOOD | Dashboard 1.27s, Patient list 1.4s |
| Frontend CLS | WARNING | Dashboard 0.12 (exceeds 0.1 threshold) |
| Backend Memory | GOOD | 80MB working set, stable |
| Lighthouse | GOOD | Accessibility 91, Best Practices 96, SEO 91 |

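Overall Assessment: The system handles read workloads well under concurrency but degrades significantly under concurrent writes. The cause is not confirmed; the working hypothesis in this report is contention around UUID v7 ID generation, with database locking and connection pool limits as alternatives (see Section 4.3). Approximately 10% of all requests exhibit latency spikes to ~2.3s regardless of endpoint.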


2. Test Environment

| Parameter | Value |
|---|---|
| Backend | Rust debug build (not optimized), Axum web framework |
| Database | PostgreSQL 16, localhost, 88 tables, 87 patients, 148 migrations |
| Redis | Cloud instance (unavailable), fail-close bypassed with FAIL_CLOSE=false |
| Frontend | Vite dev server with HMR, React 19 SPA, Ant Design |
| Network | localhost (no network latency) |
| CPU | Not throttled |
| Test Tool | curl (API), Chrome DevTools (frontend) |

Caveats

  • Debug build: Production (release) build would be 2-10x faster for CPU-bound operations
  • No Redis: Rate limiting running in fail-open mode; no caching benefit
  • Localhost: No real network latency; production deployments will have additional network overhead
  • Single machine: Database and application share the same host

3. API Response Time Baseline

3.1 Read Operations (GET) -- 20 Endpoints, 5 Iterations Each

| # | Endpoint | HTTP | Avg (ms) | Min (ms) | Max (ms) | Rating |
|---|---|---|---|---|---|---|
| 1 | GET /health/patients (10/page) | 200 | 236.9 | 228.9 | 242.2 | WARNING |
| 2 | GET /health/patients (100/page) | 200 | 381.5 | 231.7 | 2260.4 | WARNING |
| 3 | GET /health/doctors | 200 | 238.2 | 228.5 | 242.6 | WARNING |
| 4 | GET /health/appointments | 200 | 494.3 | 240.4 | 2302.0 | WARNING |
| 5 | GET /health/patients/{id}/vital-signs | 200 | 240.3 | 232.9 | 246.1 | WARNING |
| 6 | GET /health/follow-up-tasks | 200 | 489.3 | 243.9 | 2269.7 | WARNING |
| 7 | GET /health/consultation-sessions | 200 | 240.1 | 229.9 | 247.7 | WARNING |
| 8 | GET /health/articles | 200 | 465.1 | 228.2 | 2284.7 | WARNING |
| 9 | GET /health/alerts | 200 | 240.5 | 229.4 | 245.1 | WARNING |
| 10 | GET /health/admin/statistics/dashboard | 200 | 489.4 | 233.2 | 2269.8 | WARNING |
| 11 | GET /health/admin/points/rules | 200 | 441.0 | 233.0 | 2257.0 | WARNING |
| 12 | GET /health/points/products | 200 | 236.7 | 226.6 | 241.4 | WARNING |
| 13 | GET /health/points/orders | 200 | 443.7 | 234.8 | 2255.4 | WARNING |
| 14 | GET /health/media | 200 | 441.0 | 226.2 | 2257.4 | WARNING |
| 15 | GET /health/banners | 200 | 238.4 | 232.7 | 243.5 | WARNING |
| 16 | GET /ai/analysis/history | 200 | 340.5 | 229.1 | 2256.4 | WARNING |
| 17 | GET /ai/prompts | 200 | 439.8 | 227.4 | 2255.5 | WARNING |
| 18 | GET /health/devices | 200 | 237.6 | 235.8 | 239.7 | WARNING |
| 19 | GET /health/admin/statistics/patients | 200 | 436.7 | 225.8 | 2264.1 | WARNING |
| 20 | GET /health/admin/system-health | 200 | 233.4 | 224.8 | 236.2 | WARNING |

Pattern Observed: 11 of the 20 endpoints recorded at least one spike to ~2,255-2,300ms across their 5 iterations. The remaining requests consistently return in 225-250ms. Candidate causes include scheduling pauses in the tokio runtime (debug build) and PostgreSQL connection pool contention; neither was confirmed in this test.

Excluding spikes, the typical response time is 225-250ms (WARNING range).
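
To illustrate why an average is a poor summary when one iteration spikes, the sketch below compares the mean and the median of five samples. The sample values are hypothetical, chosen to resemble the 225-250ms baseline with one ~2.26s outlier; they are not the recorded per-iteration measurements, which this report does not list.

```python
import statistics

# Hypothetical per-iteration latencies (ms): four typical responses and one spike.
# Illustrative values only, not measurements from this report.
samples_ms = [236.0, 238.0, 241.0, 243.0, 2260.0]

mean_ms = statistics.mean(samples_ms)      # pulled upward by the single spike
median_ms = statistics.median(samples_ms)  # unaffected by one outlier in five

print(f"mean={mean_ms:.1f}ms median={median_ms:.1f}ms")
# prints: mean=643.6ms median=241.0ms
```

Reporting the median (or the spike-free range) alongside the maximum keeps the baseline and the spike visible as two separate facts.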

3.2 Write Operations

| # | Endpoint | HTTP | Avg (ms) | Min (ms) | Max (ms) | Notes |
|---|---|---|---|---|---|---|
| 21 | POST /health/patients (create) | 200 | 342.0 | 240.7 | 2277.1 | Spike on #5 |
| 22 | PUT /health/patients/{id} (update) | 200/409 | 237.0 | 228.7 | 247.0 | 409 = optimistic lock |
| 23 | DELETE /health/patients/{id} | 415 | 274.3 | 220.4 | 2254.1 | 415 = content-type issue |

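Note on DELETE: Returns 415 (Unsupported Media Type) -- the endpoint may require a specific Content-Type header. Because the request was rejected, the DELETE timings measure the error path rather than an actual delete. This is a minor API usability issue, not a performance concern.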


4. Concurrent Request Tests

4.1 10 Concurrent GET /health/patients

| Metric | Value | Rating |
|---|---|---|
| Total time | 545.7ms | GOOD |
| Fastest | 236ms | GOOD |
| Slowest | 279ms | GOOD |
| Average | 259ms | GOOD |
| Success rate | 100% (10/10) | GOOD |

Analysis: The system handles 10 concurrent read requests well. Response times increase gradually from 236ms to 279ms under concurrent load, indicating moderate queueing but no failure.

4.2 20 Concurrent GET /health/admin/statistics/dashboard

| Metric | Value | Rating |
|---|---|---|
| Total time | 768.3ms | GOOD |
| Fastest | 245ms | GOOD |
| Slowest | 286ms | GOOD |
| Average | 271ms | GOOD |
| Success rate | 100% (20/20) | GOOD |

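Analysis: 20 concurrent dashboard requests complete in under 1 second. Scaling is sub-linear: 2x the requests took 1.4x the wall-clock time (768.3ms vs 545.7ms in Section 4.1), although the two tests hit different endpoints, so the comparison is only indicative. The system handles read concurrency well.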
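
The ratio quoted above can be reproduced from the two wall-clock totals, and the same numbers give an approximate throughput. This is a small worked calculation using only values from Sections 4.1 and 4.2; the variable names are illustrative.

```python
# Wall-clock totals from sections 4.1 and 4.2 of this report.
total_10_ms = 545.7   # 10 concurrent GET /health/patients
total_20_ms = 768.3   # 20 concurrent GET /health/admin/statistics/dashboard

time_ratio = total_20_ms / total_10_ms
throughput_10 = 10 / (total_10_ms / 1000)  # requests per second
throughput_20 = 20 / (total_20_ms / 1000)

print(f"time ratio: {time_ratio:.2f}x")
print(f"throughput: {throughput_10:.1f} req/s -> {throughput_20:.1f} req/s")
# prints: time ratio: 1.41x
# prints: throughput: 18.3 req/s -> 26.0 req/s
```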

4.3 10 Concurrent POST /health/patients

| Metric | Value | Rating |
|---|---|---|
| Total time | 2,600.8ms | CRITICAL |
| Fastest | 2,270ms | CRITICAL |
| Slowest | 2,287ms | CRITICAL |
| Average | 2,277ms | CRITICAL |
| Success rate | 100% (10/10) | GOOD |

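Analysis: This is the most critical finding. All 10 concurrent write requests take ~2.3 seconds each. This is not a queueing issue: all requests start and finish at about the same time, whereas queued requests would finish at staggered times. The per-request latency (2,270-2,287ms) also matches the spike seen in the sequential tests in Section 3, so both may share a cause. Candidate root causes, none yet confirmed:

  1. UUID v7 generation contention: All 10 inserts create time-ordered IDs within the same few milliseconds. ID generation by itself normally takes microseconds, so this would have to act through a secondary effect such as contention on the same index pages
  2. Database lock contention: Multiple inserts to the same table with indexes trigger lock waits
  3. Connection pool saturation: The default connection pool may have limited concurrent connections to PostgreSQL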

Impact: Under realistic load with concurrent patient registrations, the system would severely degrade.
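
The distinction between queueing and a uniform delay in the analysis above can be checked with a simple model. The sketch below assumes requests are processed in waves of `pool_size`, each wave taking `service_ms`; the function name, the 230ms service time, and the pool sizes are hypothetical, and only the fastest, slowest, and average figures come from the table in 4.3.

```python
import math

def finish_times_ms(n_requests, pool_size, service_ms):
    """Finish time of each request when at most `pool_size` run at once.

    Simplified model: requests are processed in waves of `pool_size`,
    and every wave takes `service_ms`.
    """
    return [math.ceil((i + 1) / pool_size) * service_ms for i in range(n_requests)]

def spread(times):
    """Slowest divided by fastest; near 1.0 means requests finished together."""
    return max(times) / min(times)

# Hypothetical: fully serialized, 230ms of work per request.
serialized = finish_times_ms(10, pool_size=1, service_ms=230.0)
# Hypothetical: all 10 run in parallel, each taking the reported 2,277ms average.
parallel = finish_times_ms(10, pool_size=10, service_ms=2277.0)

observed_spread = 2287 / 2270  # slowest / fastest from the table in 4.3

print(f"serialized spread: {spread(serialized):.1f}")
print(f"parallel spread:   {spread(parallel):.1f}")
print(f"observed spread:   {observed_spread:.3f}")
```

In this model a fully serialized run also ends near 2.3s (10 x 230ms), but its fastest request finishes in 230ms. The observed fastest request took 2,270ms, which fits the parallel case with a uniform ~2.3s delay per request.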


5. Frontend Performance (Core Web Vitals)

5.1 Performance Trace Results

| Page | LCP | CLS | TTFB | Rating |
|---|---|---|---|---|
| Dashboard (/) | 1,269ms | 0.12 | 6ms | LCP: GOOD / CLS: WARNING |
| Patient List (/health/patients) | 1,404ms | 0.03 | 5ms | GOOD |

LCP Breakdown (Dashboard):

  • TTFB: 6ms (local server, expected)
  • Render delay: 1,262ms (JavaScript execution and data fetching)
  • Total: 1,269ms

LCP Breakdown (Patient List):

  • TTFB: 5ms
  • Render delay: 1,399ms (JavaScript execution and API call)
  • Total: 1,404ms

5.2 Lighthouse Audit (Desktop, Navigation)

| Category | Score |
|---|---|
| Accessibility | 91 |
| Best Practices | 96 |
| SEO | 91 |
| Agentic Browsing | 33 |

Lighthouse Details: 52 audits passed, 6 failed. Performance score not available through Lighthouse in this mode.

5.3 Frontend Performance Issues Identified

  1. CLS 0.12 on Dashboard (threshold: 0.1): Layout shifts occur as dashboard data loads asynchronously. Recommend adding skeleton placeholders with fixed dimensions.
  2. Render delay dominates LCP: Both pages spend >99% of LCP time on render delay (JavaScript execution + API calls), not network. This is expected for an SPA but could be improved with SSR or better code splitting.
  3. Forced reflows detected: JavaScript queries geometric properties after DOM changes, causing layout thrashing.
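
The render-delay share cited in item 2 follows directly from the LCP breakdowns in Section 5.1. A short worked calculation, using only the figures reported there:

```python
# LCP breakdown values (ms) from section 5.1.
pages = {
    "Dashboard": {"render_delay": 1262, "lcp": 1269},
    "Patient List": {"render_delay": 1399, "lcp": 1404},
}

# Percentage of LCP time spent in render delay for each page.
shares = {name: 100 * p["render_delay"] / p["lcp"] for name, p in pages.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}% of LCP is render delay")
# prints: Dashboard: 99.4% of LCP is render delay
# prints: Patient List: 99.6% of LCP is render delay
```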

6. Backend Resource Usage

| Metric | Value | Assessment |
|---|---|---|
| Process ID | 39380 | - |
| Working Set (RAM) | 80.3 MB | GOOD |
| Private Memory | 41.7 MB | GOOD |
| Virtual Memory | 4.5 GB | Normal (Rust default) |
| CPU Time | 14.2 seconds | Normal for test workload |
| System Total RAM | 47.9 GB | - |
| System Free RAM | 18.2 GB (38%) | GOOD |

Analysis: Memory usage is very efficient at 80MB for a full-featured backend with 8 modules, 260+ routes, and active background tasks. The debug build includes symbol information; a release build would use less memory.


7. Key Findings Summary

7.1 Latency Spike Pattern (HIGH PRIORITY)

Symptom: Roughly 10% of requests exhibit a ~2,250-2,300ms latency spike, regardless of endpoint or request type: 11 of the 20 GET endpoints recorded at least one, as did POST and DELETE on /health/patients.

Likely Causes:

  • PostgreSQL connection pool exhaustion and wait
  • Tokio runtime task scheduling pauses (debug build)
  • Allocator contention under concurrent access (Rust has no garbage collector, so any pause would come from lock contention in the allocator)

Recommendation: Profile the tokio runtime and database connection pool in release mode. The spike is suspiciously consistent (~2.3s), suggesting a timeout or retry mechanism.
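
One way to test the timeout hypothesis is to subtract each endpoint's fastest response from its slowest and see whether the difference is constant. The sketch below does this for the 11 GET endpoints in Section 3.1 whose maximum exceeded 2s, using the Min and Max columns as reported; the short endpoint labels are abbreviations introduced here.

```python
# (min_ms, max_ms) for the 11 GET endpoints in section 3.1 whose max exceeded 2s.
spiking = {
    "patients (100/page)": (231.7, 2260.4),
    "appointments": (240.4, 2302.0),
    "follow-up-tasks": (243.9, 2269.7),
    "articles": (228.2, 2284.7),
    "statistics/dashboard": (233.2, 2269.8),
    "points/rules": (233.0, 2257.0),
    "points/orders": (234.8, 2255.4),
    "media": (226.2, 2257.4),
    "ai/analysis/history": (229.1, 2256.4),
    "ai/prompts": (227.4, 2255.5),
    "statistics/patients": (225.8, 2264.1),
}

# Extra latency of the spike over the endpoint's own fastest response.
offsets_ms = {name: round(hi - lo, 1) for name, (lo, hi) in spiking.items()}

print(f"offset range: {min(offsets_ms.values())} - {max(offsets_ms.values())} ms")
# prints: offset range: 2020.6 - 2061.6 ms
```

The offsets cluster within about 40ms of each other around 2.0s, which is what a fixed timeout or a single retry delay would add. Given the caveat that Redis was unavailable during the test, a timeout on that path is one candidate to rule out, though this report did not test it.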

7.2 Write Concurrency (CRITICAL)

Symptom: 10 concurrent POST requests all take ~2.3s each (not serialized).

Root Cause Candidates:

  • UUID v7 generation under high concurrency: IDs created in the same millisecond share a timestamp prefix, which may concentrate writes on the same index pages (unverified)
  • PostgreSQL WAL lock contention on heavy INSERT workloads
  • Connection pool limited to ~10 concurrent connections

Recommendation:

  1. Increase database connection pool size (check max_connections in config)
  2. Test with release build to isolate debug-mode overhead
  3. Consider UUID v7 generation with per-thread sequence counters
  4. Benchmark PostgreSQL directly with pgbench to isolate DB vs app overhead

7.3 Frontend CLS (MEDIUM PRIORITY)

Symptom: Dashboard CLS 0.12 exceeds the 0.1 "good" threshold.

Recommendation: Add fixed-dimension skeleton placeholders for dashboard cards before data loads.

7.4 Redis Dependency (HIGH PRIORITY)

Symptom: System fails closed when Redis is unavailable (default behavior).

Impact: Production deployments must ensure Redis HA, or the entire system becomes unavailable.

Recommendation: Consider a fail-open mode for non-critical rate limiting paths, or implement an in-memory rate limiter as fallback.
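
The fallback idea in this recommendation can be sketched as a token bucket held in process memory and used only when the shared limiter is unreachable. This is an illustrative outline in Python, not the HMS implementation (the backend is Rust); `TokenBucket`, `check_rate_limit`, and `redis_unavailable` are hypothetical names, and the capacity and refill values are arbitrary.

```python
import time

class TokenBucket:
    """In-memory token bucket: bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Add tokens for the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def check_rate_limit(key, redis_check, buckets, capacity=5, refill_per_sec=1.0):
    """Use the shared limiter when reachable, else a local per-key bucket."""
    try:
        return redis_check(key)
    except ConnectionError:
        if key not in buckets:
            buckets[key] = TokenBucket(capacity, refill_per_sec)
        return buckets[key].allow()

def redis_unavailable(key):
    # Hypothetical stand-in for a Redis-backed check while Redis is down.
    raise ConnectionError("redis unavailable")

buckets = {}
decisions = [
    check_rate_limit("client-1", redis_unavailable, buckets, capacity=2, refill_per_sec=0.0)
    for _ in range(3)
]
print(decisions)  # prints: [True, True, False]
```

A local fallback enforces limits per process rather than globally, so with several backend instances the effective limit is multiplied by the instance count. The sketch is also not thread-safe; a real version would need a lock around each bucket.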


8. Recommendations (Prioritized)

P0 -- Critical

| # | Issue | Action | Estimated Impact |
|---|---|---|---|
| 1 | Write concurrency degradation | Profile connection pool and UUID generation in release mode | 5-10x write throughput improvement |
| 2 | Latency spikes (~2.3s) | Identify and fix the root cause (likely connection pool or runtime issue) | Stabilize p99 response times |

P1 -- High

| # | Issue | Action | Estimated Impact |
|---|---|---|---|
| 3 | Release build testing | Re-run all benchmarks with cargo build --release | 2-10x overall performance improvement |
| 4 | Redis HA/fallback | Implement in-memory rate limiter as Redis fallback | Eliminate single point of failure |

P2 -- Medium

| # | Issue | Action | Estimated Impact |
|---|---|---|---|
| 5 | Dashboard CLS 0.12 | Add skeleton placeholders with fixed dimensions | Improve CLS to <0.1 |
| 6 | API response time 225-250ms | Optimize database queries, add connection pool tuning | Target <200ms average |
| 7 | DELETE endpoint 415 | Fix Content-Type handling for DELETE endpoints | API usability fix |

P3 -- Low

| # | Issue | Action | Estimated Impact |
|---|---|---|---|
| 8 | Forced reflows | Batch DOM reads/writes in frontend components | Smoother animations |
| 9 | Render delay optimization | Implement code splitting or SSR for critical routes | Faster initial paint |

9. Test Data

Test Data Records Created

During testing, the following records were created and should be cleaned up:

  • 5 patients named "PerfTest{1-5}"
  • 10 patients named "ConcurrentTest{1-10}"
  • 5 patients named "DeleteTest{1-5}" (deleted via soft delete)
  • 1 patient named "PerfUpdate1" (modified from original)

Total test patients: 21 (17 active + 4 soft-deleted via earlier sessions)


10. Methodology

  • API Tests: curl with -w "%{time_total}" output, 5 iterations per endpoint with 200ms delays
  • Concurrent Tests: Background curl processes with &, measuring wall-clock time
  • Frontend: Chrome DevTools Protocol via MCP, performance traces with auto-stop
  • Memory: PowerShell Get-Process on Windows
  • Environment: Development machine, no network throttling, no CPU throttling
  • Thresholds: GOOD < 200ms API, < 2.5s LCP | WARNING 200-500ms API, 2.5-4s LCP | CRITICAL > 500ms API, > 4s LCP
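
The thresholds in the last item can be written as two small classifiers. This is a minimal sketch; the function names are illustrative, and treating values exactly on a boundary (500ms, 4s) as WARNING is an assumption, since the thresholds as stated leave the boundaries open.

```python
def rate_api(latency_ms):
    """Rate an API latency in ms: GOOD < 200, WARNING 200-500, CRITICAL > 500."""
    if latency_ms < 200:
        return "GOOD"
    if latency_ms <= 500:  # assumption: exactly 500ms counts as WARNING
        return "WARNING"
    return "CRITICAL"

def rate_lcp(lcp_ms):
    """Rate an LCP value in ms: GOOD < 2.5s, WARNING 2.5-4s, CRITICAL > 4s."""
    if lcp_ms < 2500:
        return "GOOD"
    if lcp_ms <= 4000:  # assumption: exactly 4s counts as WARNING
        return "WARNING"
    return "CRITICAL"

print(rate_api(236.9))  # GET /health/patients average -> WARNING
print(rate_api(2277))   # concurrent POST average -> CRITICAL
print(rate_lcp(1269))   # Dashboard LCP -> GOOD
```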