iven/hms


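iven d623f8b2ff fix: V1 test build end-to-end verification fixes: all 6 CRITICAL + 3 HIGH issues fixed

Fixes:
- fix(db): migration 149, repair Admin role permission bindings broken by the migration chain (FE-C1)
- fix(health): add empty-name validation to 4 handlers: Doctor/Article/AlertRule/Tag (API-C1~C4)
- fix(health): fix the Stats dashboard new_this_week query (SeaORM date_trunc bug) (FE-C2)
- fix(server): add security response headers: X-Frame-Options/CSP/XSS-Protection/Referrer-Policy (SEC-H1)
- fix(mp): fix the appointment creation contract: notes/reason field mapping + remove schedule_id (MP-H1)
- fix(mp): make consultation session subject/last_message fields optional (MP-H3)
- fix(ai): derive Default for AiConfig instead of a hand-written impl (clippy)

Test reports:
- End-to-end testing completed across all 8 dimensions (backend 87 cases / frontend 30 pages / mini program 80+ APIs / security 20 items / performance 20 endpoints)
- Multi-role: 7 roles, 49 checks, 100% pass
- Comprehensive test report + expert assessment report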

2026-05-18 10:24:40 +08:00


HMS Performance Baseline Report

Date: 2026-05-18 | Environment: Windows 11, PostgreSQL 16 (localhost), Redis (cloud, unavailable during test) | Backend: Rust/Axum debug build | Frontend: Vite dev server (React 19 SPA)

1. Executive Summary

| Category | Rating | Key Finding |
|---|---|---|
| API Read (GET) | WARNING | Typically ~237ms, but ~10% of requests spike to ~2.3s |
| API Write (POST) | WARNING | ~243ms for a single request, degrades to ~2.3s under concurrency |
| Concurrent GET | GOOD | 20 concurrent requests complete in 768ms |
| Concurrent POST | CRITICAL | 10 concurrent creates take 2.6s total (~2.3s each) |
| Frontend LCP | GOOD | Dashboard 1.27s, Patient list 1.4s |
| Frontend CLS | WARNING | Dashboard 0.12 (exceeds 0.1 threshold) |
| Backend Memory | GOOD | 80MB working set, stable |
| Lighthouse | GOOD | Accessibility 91, Best Practices 96, SEO 91 |

Overall Assessment: The system handles read workloads well under concurrency but has significant write concurrency issues. Their cause is not yet confirmed. Candidates include UUID v7 ID generation, database lock contention and connection pool limits (see 4.3 and 7.2). In addition, roughly 10% of requests, on both read and write endpoints, exhibit latency spikes to ~2.3s.

2. Test Environment

| Parameter | Value |
|---|---|
| Backend | Rust debug build (not optimized), Axum web framework |
| Database | PostgreSQL 16, localhost, 88 tables, 87 patients, 148 migrations |
| Redis | Cloud instance (unavailable), fail-close bypassed with FAIL_CLOSE=false |
| Frontend | Vite dev server with HMR, React 19 SPA, Ant Design |
| Network | localhost (no network latency) |
| CPU | Not throttled |
| Test Tool | curl (API), Chrome DevTools (frontend) |

Caveats

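- Debug build: a production (release) build would typically be 2-10x faster for CPU-bound operations.
- No Redis: rate limiting ran in fail-open mode, with no caching benefit.
- Localhost: there is no real network latency, so production deployments will add network overhead.
- Single machine: the database and the application share the same host.
- Vite dev server: the frontend was served unbundled with HMR, so LCP figures are likely higher than for a production build.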

3. API Response Time Baseline

3.1 Read Operations (GET) -- 20 Endpoints, 5 Iterations Each

| # | Endpoint | HTTP | Avg (ms) | Min (ms) | Max (ms) | Rating |
|---|---|---|---|---|---|---|
| 1 | GET /health/patients (10/page) | 200 | 236.9 | 228.9 | 242.2 | WARNING |
| 2 | GET /health/patients (100/page) | 200 | 381.5 | 231.7 | 2260.4 | WARNING |
| 3 | GET /health/doctors | 200 | 238.2 | 228.5 | 242.6 | WARNING |
| 4 | GET /health/appointments | 200 | 494.3 | 240.4 | 2302.0 | WARNING |
| 5 | GET /health/patients/{id}/vital-signs | 200 | 240.3 | 232.9 | 246.1 | WARNING |
| 6 | GET /health/follow-up-tasks | 200 | 489.3 | 243.9 | 2269.7 | WARNING |
| 7 | GET /health/consultation-sessions | 200 | 240.1 | 229.9 | 247.7 | WARNING |
| 8 | GET /health/articles | 200 | 465.1 | 228.2 | 2284.7 | WARNING |
| 9 | GET /health/alerts | 200 | 240.5 | 229.4 | 245.1 | WARNING |
| 10 | GET /health/admin/statistics/dashboard | 200 | 489.4 | 233.2 | 2269.8 | WARNING |
| 11 | GET /health/admin/points/rules | 200 | 441.0 | 233.0 | 2257.0 | WARNING |
| 12 | GET /health/points/products | 200 | 236.7 | 226.6 | 241.4 | WARNING |
| 13 | GET /health/points/orders | 200 | 443.7 | 234.8 | 2255.4 | WARNING |
| 14 | GET /health/media | 200 | 441.0 | 226.2 | 2257.4 | WARNING |
| 15 | GET /health/banners | 200 | 238.4 | 232.7 | 243.5 | WARNING |
| 16 | GET /ai/analysis/history | 200 | 340.5 | 229.1 | 2256.4 | WARNING |
| 17 | GET /ai/prompts | 200 | 439.8 | 227.4 | 2255.5 | WARNING |
| 18 | GET /health/devices | 200 | 237.6 | 235.8 | 239.7 | WARNING |
| 19 | GET /health/admin/statistics/patients | 200 | 436.7 | 225.8 | 2264.1 | WARNING |
| 20 | GET /health/admin/system-health | 200 | 233.4 | 224.8 | 236.2 | WARNING |

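Pattern Observed: 11 of the 20 endpoints recorded a maximum of ~2,255-2,300ms. Every minimum, and the maximum of every non-spiking endpoint, fell within roughly 225-250ms. If each affected endpoint spiked once in its five iterations, that is about 11 of 100 requests (~10%). The cause is unconfirmed; tokio scheduling pauses and connection pool behavior are among the candidates listed in 7.1. However, the requests were sent sequentially, so contention between test requests is unlikely. The near-constant spike duration points more toward a fixed timeout or retry than toward scheduler jitter.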

Excluding spikes, the typical response time is 225-250ms (WARNING range).
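The table reports only avg/min/max, so spike frequency can only be inferred from the Max column. With raw per-iteration samples it could be computed directly. The following is a minimal Rust sketch under that assumption. `summarize`, `overall_spike_rate` and the 1,000ms spike threshold are illustrative choices, not part of the HMS codebase.

```rust
/// Latency samples for one endpoint, split at a spike threshold.
#[derive(Debug, PartialEq)]
pub struct LatencySummary {
    pub samples: usize,
    pub spikes: usize,
    /// Mean of the non-spike samples in ms (None if every sample spiked).
    pub baseline_avg_ms: Option<f64>,
}

/// Classifies samples above `spike_threshold_ms` as spikes and averages the rest.
pub fn summarize(samples_ms: &[f64], spike_threshold_ms: f64) -> LatencySummary {
    let (spikes, baseline): (Vec<f64>, Vec<f64>) = samples_ms
        .iter()
        .partition(|&&ms| ms > spike_threshold_ms);
    let baseline_avg_ms = if baseline.is_empty() {
        None
    } else {
        Some(baseline.iter().sum::<f64>() / baseline.len() as f64)
    };
    LatencySummary {
        samples: samples_ms.len(),
        spikes: spikes.len(),
        baseline_avg_ms,
    }
}

/// Fraction of all samples, across endpoints, that were spikes.
pub fn overall_spike_rate(summaries: &[LatencySummary]) -> f64 {
    let total: usize = summaries.iter().map(|s| s.samples).sum();
    let spikes: usize = summaries.iter().map(|s| s.spikes).sum();
    if total == 0 {
        0.0
    } else {
        spikes as f64 / total as f64
    }
}
```

Suppose each of the 11 spiking endpoints had exactly one spike in five iterations and the other 9 had none. In that case `overall_spike_rate` returns 0.11, about 11% of the 100 GET requests. Unlike the Avg column, the baseline average is not skewed by the spike.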

3.2 Write Operations

| # | Endpoint | HTTP | Avg (ms) | Min (ms) | Max (ms) | Notes |
|---|---|---|---|---|---|---|
| 21 | POST /health/patients (create) | 200 | 342.0 | 240.7 | 2277.1 | Spike on #5 |
| 22 | PUT /health/patients/{id} (update) | 200/409 | 237.0 | 228.7 | 247.0 | 409 = optimistic lock |
| 23 | DELETE /health/patients/{id} | 415 | 274.3 | 220.4 | 2254.1 | 415 = content-type issue |

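Note on DELETE: The endpoint returns 415 (Unsupported Media Type), so it appears to require a specific Content-Type header. A likely cause is a JSON body extractor on the DELETE handler: Axum's `Json` extractor rejects requests without a `Content-Type: application/json` header with 415. This is a minor API usability issue, not a performance concern.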

4. Concurrent Request Tests

4.1 10 Concurrent GET /health/patients

| Metric | Value | Rating |
|---|---|---|
| Total time | 545.7ms | GOOD |
| Fastest | 236ms | GOOD |
| Slowest | 279ms | GOOD |
| Average | 259ms | GOOD |
| Success rate | 100% (10/10) | GOOD |

Analysis: The system handles 10 concurrent read requests well. Individual response times range from 236ms to 279ms, only modestly above the ~237ms sequential baseline in 3.1. This indicates some queueing but no failures.

4.2 20 Concurrent GET /health/admin/statistics/dashboard

| Metric | Value | Rating |
|---|---|---|
| Total time | 768.3ms | GOOD |
| Fastest | 245ms | GOOD |
| Slowest | 286ms | GOOD |
| Average | 271ms | GOOD |
| Success rate | 100% (20/20) | GOOD |

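Analysis: 20 concurrent dashboard requests complete in under 1 second. Scaling is sub-linear: doubling the request count (20 vs 10, though on a different endpoint than 4.1) increased total wall-clock time by only about 1.4x (768.3ms vs 545.7ms). The system handles read concurrency well.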
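The two read bursts can also be expressed as effective throughput (requests divided by burst wall-clock time). These are single-burst figures that include curl process start-up and come from two different endpoints. They indicate headroom rather than sustained capacity:

```latex
\text{throughput} = \frac{N}{T_{\text{wall}}}:\qquad
\frac{10}{0.5457\,\text{s}} \approx 18.3\ \text{req/s}\ (4.1),\qquad
\frac{20}{0.7683\,\text{s}} \approx 26.0\ \text{req/s}\ (4.2),\qquad
\frac{T_{20}}{T_{10}} = \frac{768.3}{545.7} \approx 1.41
```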

4.3 10 Concurrent POST /health/patients

| Metric | Value | Rating |
|---|---|---|
| Total time | 2,600.8ms | CRITICAL |
| Fastest | 2,270ms | CRITICAL |
| Slowest | 2,287ms | CRITICAL |
| Average | 2,277ms | CRITICAL |
| Success rate | 100% (10/10) | GOOD |

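Analysis: This is the most critical finding. All 10 concurrent write requests take ~2.3 seconds each. This is NOT a queueing issue. The fastest (2,270ms) and slowest (2,287ms) requests differ by only 17ms, so the requests ran in parallel rather than waiting on one another. The per-request latency also matches the ~2.3s spike duration seen in the sequential tests (3.1, 7.1), which suggests the same mechanism. Candidate root causes (unconfirmed):

- UUID v7 generation: all 10 inserts generate timestamp-based IDs at nearly the same moment. Generation happens in-process in microseconds, so this alone is unlikely to add seconds.
- Database lock contention: multiple inserts into the same indexed table may trigger lock waits.
- Connection pool saturation: the default connection pool may allow only a limited number of concurrent connections to PostgreSQL.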

Impact: Under realistic load with concurrent patient registrations, the system would likely degrade severely.
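The "not a queueing issue" reading can be checked with a single number: the ratio of the slowest to the fastest client-observed latency in a burst. Suppose the server handled N simultaneously started requests one at a time, each costing the same. The last request would then wait for all the others, and the ratio would approach N. If the requests overlapped, the ratio stays near 1.

This is a minimal sketch. It assumes every request in the burst started at about the same moment, as with the background curl processes described in section 10. `slowest_to_fastest` and `serialized_latencies` are hypothetical helpers.

```rust
/// Slowest / fastest latency in a burst of requests started at the same moment.
/// Returns None for an empty burst or a non-positive fastest latency.
pub fn slowest_to_fastest(latencies_ms: &[f64]) -> Option<f64> {
    if latencies_ms.is_empty() {
        return None;
    }
    let fastest = latencies_ms.iter().copied().fold(f64::INFINITY, f64::min);
    let slowest = latencies_ms.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    if fastest <= 0.0 {
        None
    } else {
        Some(slowest / fastest)
    }
}

/// Client-observed latencies if `n` requests arrive together and the server
/// handles them strictly one at a time, each taking `service_ms`:
/// request k (1-based) waits for k-1 others, so it completes after k * service_ms.
pub fn serialized_latencies(n: usize, service_ms: f64) -> Vec<f64> {
    (1..=n).map(|k| k as f64 * service_ms).collect()
}
```

With the 4.3 extremes (2,270ms and 2,287ms) the ratio is about 1.007, whereas ten fully serialized requests of equal cost would give 10. The metric only rules out serialization. It cannot tell a shared fixed delay apart from uniformly slow processing.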

5. Frontend Performance (Core Web Vitals)

5.1 Performance Trace Results

| Page | LCP | CLS | TTFB | Rating |
|---|---|---|---|---|
| Dashboard (/) | 1,269ms | 0.12 | 6ms | LCP: GOOD / CLS: WARNING |
| Patient List (/health/patients) | 1,404ms | 0.03 | 5ms | GOOD |

LCP Breakdown (Dashboard):

- TTFB: 6ms (local server, expected)
- Render delay: 1,262ms (JavaScript execution and data fetching)
- Total: 1,269ms

LCP Breakdown (Patient List):

- TTFB: 5ms
- Render delay: 1,399ms (JavaScript execution and API call)
- Total: 1,404ms
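TTFB is only 5-6ms on localhost, so almost all of LCP is render delay. The shares follow directly from these breakdowns:

```latex
\frac{\text{render delay}}{\text{LCP}}:\qquad
\frac{1262}{1269} \approx 99.4\%\ \text{(Dashboard)},\qquad
\frac{1399}{1404} \approx 99.6\%\ \text{(Patient List)}
```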

5.2 Lighthouse Scores

| Category | Score |
|---|---|
| Accessibility | 91 |
| Best Practices | 96 |
| SEO | 91 |
| Agentic Browsing | 33 |

Lighthouse Details: 52 audits passed, 6 failed. Performance score not available through Lighthouse in this mode.

5.3 Frontend Performance Issues Identified

- CLS 0.12 on Dashboard (threshold: 0.1): Layout shifts occur as dashboard data loads asynchronously. Recommend adding skeleton placeholders with fixed dimensions.
- Render delay dominates LCP: Both pages spend >99% of LCP time on render delay (JavaScript execution + API calls), not network. This is expected for an SPA but could be improved with SSR or better code splitting.
- Forced reflows detected: JavaScript queries geometric properties after DOM changes, causing layout thrashing.

6. Backend Resource Usage

| Metric | Value | Assessment |
|---|---|---|
| Process ID | 39380 | - |
| Working Set (RAM) | 80.3 MB | GOOD |
| Private Memory | 41.7 MB | GOOD |
| Virtual Memory | 4.5 GB | Normal (reserved address space, not RAM in use) |
| CPU Time | 14.2 seconds | Normal for test workload |
| System Total RAM | 47.9 GB | - |
| System Free RAM | 18.2 GB (38%) | GOOD |

Analysis: Memory usage is very efficient at 80MB for a full-featured backend with 8 modules, 260+ routes and active background tasks. Debug builds are unoptimized, so a release build would likely use somewhat less memory.

7. Key Findings Summary

7.1 Latency Spike Pattern (HIGH PRIORITY)

Symptom: About 10% of requests exhibit a ~2,250-2,300ms latency spike. Spikes appeared on 11 of the 20 GET endpoints and 2 of the 3 write endpoints, with no obvious dependence on endpoint or request type.

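Candidate Causes (unconfirmed):

- PostgreSQL connection pool exhaustion and wait
- Tokio runtime task scheduling pauses (debug build)
- A connection or command timeout against the unreachable Redis instance before the FAIL_CLOSE=false bypass takes effect (possible, given that Redis was unavailable throughout testing)
- Allocator stalls under concurrent access (unlikely, because Rust has no garbage collector and therefore no GC pauses)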

Recommendation: Profile the tokio runtime and database connection pool in release mode. The spike is suspiciously consistent (~2.3s), suggesting a timeout or retry mechanism.
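A near-constant ~2.3s delay is easiest to localize by timing each stage of a slow request. Below is a minimal, std-only stage timer. It assumes `mark` is called at stage boundaries inside a handler or middleware. The stage names used with it are hypothetical. In production, the `tracing` crate's spans would be the more idiomatic tool.

```rust
use std::time::{Duration, Instant};

/// Records the duration of consecutive named stages within one request.
pub struct StageTimer {
    started: Instant,
    last: Instant,
    stages: Vec<(&'static str, Duration)>,
}

impl StageTimer {
    pub fn new() -> Self {
        let now = Instant::now();
        StageTimer { started: now, last: now, stages: Vec::new() }
    }

    /// Ends the current stage, records it under `name`, and starts the next one.
    pub fn mark(&mut self, name: &'static str) {
        let now = Instant::now();
        self.stages.push((name, now.duration_since(self.last)));
        self.last = now;
    }

    /// Stages at or above `threshold`, slowest first.
    pub fn slow_stages(&self, threshold: Duration) -> Vec<(&'static str, Duration)> {
        let mut slow: Vec<(&'static str, Duration)> = self
            .stages
            .iter()
            .copied()
            .filter(|&(_, d)| d >= threshold)
            .collect();
        slow.sort_by(|a, b| b.1.cmp(&a.1));
        slow
    }

    /// Time from creation to the most recent `mark`.
    pub fn total(&self) -> Duration {
        self.last.duration_since(self.started)
    }
}
```

In a slow request, the stage whose duration sits at ~2.3s identifies where the fixed delay lives. Pool acquisition and a Redis round trip are examples, both hypothetical here.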

7.2 Write Concurrency (CRITICAL)

Symptom: 10 concurrent POST requests all take ~2.3s each (not serialized).

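Root Cause Candidates:

- UUID v7 generation under high concurrency (unlikely on its own: IDs are generated in-process, and the random bits make collisions negligible even within one millisecond)
- PostgreSQL WAL lock contention on concurrent INSERT workloads
- Connection pool limited to ~10 concurrent connections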

Recommendation:

- Increase database connection pool size (check max_connections in config)
- Test with release build to isolate debug-mode overhead
- Consider using uuid::v7 with per-thread sequence counters
- Benchmark PostgreSQL directly with pgbench to isolate DB vs app overhead
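If profiling does implicate ID generation, the per-thread sequence counter suggested above can be sketched without any cross-thread lock. Each thread keeps its own last timestamp and a 12-bit counter in the `rand_a` field of the RFC 9562 UUID v7 layout. This is an illustrative std-only sketch, not the project's code, and `uuid_v7_per_thread` is a hypothetical name. The random bits come from std `RandomState` hashing, which stands in for a proper RNG. In practice the `uuid` crate's `Uuid::now_v7()` (feature `v7`) would be the natural choice.

```rust
use std::cell::Cell;
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::time::{SystemTime, UNIX_EPOCH};

thread_local! {
    // Per-thread (last timestamp in ms, 12-bit counter); no lock is shared across threads.
    static V7_STATE: Cell<(u64, u16)> = Cell::new((0, 0));
}

/// 62 pseudo-random bits from std's randomly keyed SipHash.
/// Stand-in only: not a cryptographically secure RNG.
fn random_62_bits() -> u64 {
    RandomState::new().build_hasher().finish() >> 2
}

/// Returns a UUID v7 (RFC 9562 bit layout) as a u128. The 12-bit `rand_a`
/// field holds a per-thread counter, so IDs from one thread strictly increase.
pub fn uuid_v7_per_thread() -> u128 {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_millis() as u64;

    let (ts_ms, counter) = V7_STATE.with(|state| {
        let (last_ms, last_counter) = state.get();
        let next = if now_ms > last_ms {
            (now_ms, 0) // new millisecond: reset the counter
        } else if last_counter < 0x0FFF {
            (last_ms, last_counter + 1) // same ms (or clock went back): bump the counter
        } else {
            (last_ms + 1, 0) // counter exhausted: borrow the next millisecond
        };
        state.set(next);
        next
    });

    let ts48 = (ts_ms as u128) & ((1u128 << 48) - 1);
    (ts48 << 80)                         // bits 127..80: unix_ts_ms
        | (0x7u128 << 76)                // bits 79..76: version 7
        | ((counter as u128) << 64)      // bits 75..64: rand_a, used as counter
        | (0b10u128 << 62)               // bits 63..62: RFC 9562 variant
        | random_62_bits() as u128       // bits 61..0: rand_b
}
```

Within one thread, the timestamp and counter alone guarantee ordering. Across threads, uniqueness rests on the 62 random bits, as in standard UUID v7.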

7.3 Frontend CLS (MEDIUM PRIORITY)

Symptom: Dashboard CLS 0.12 exceeds the 0.1 "good" threshold.

Recommendation: Add fixed-dimension skeleton placeholders for dashboard cards before data loads.

7.4 Redis Dependency (HIGH PRIORITY)

Symptom: System fails closed when Redis is unavailable (default behavior).

Impact: Production deployments must ensure Redis HA, or the entire system becomes unavailable.

Recommendation: Consider a fail-open mode for non-critical rate limiting paths, or implement an in-memory rate limiter as fallback.
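The in-memory fallback recommended above could be as small as a per-key token bucket. This is a minimal std-only sketch. It assumes rate limits are keyed by a string such as a client IP or user ID. `InMemoryRateLimiter` and its parameters are hypothetical, since this report does not show the project's actual Redis limiter interface.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::Instant;

/// Per-key token bucket held in process memory, used when Redis is unreachable.
pub struct InMemoryRateLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: Mutex<HashMap<String, (f64, Instant)>>, // key -> (tokens, last refill)
}

impl InMemoryRateLimiter {
    pub fn new(capacity: u32, refill_per_sec: f64) -> Self {
        InMemoryRateLimiter {
            capacity: capacity as f64,
            refill_per_sec,
            buckets: Mutex::new(HashMap::new()),
        }
    }

    /// Returns true if the request identified by `key` may proceed.
    pub fn check(&self, key: &str) -> bool {
        self.check_at(key, Instant::now())
    }

    /// Same as `check`, with an explicit clock value (useful in tests).
    pub fn check_at(&self, key: &str, now: Instant) -> bool {
        // Recover the map even if another thread panicked while holding the lock.
        let mut buckets = self.buckets.lock().unwrap_or_else(|e| e.into_inner());
        let (tokens, last) = buckets
            .entry(key.to_string())
            .or_insert((self.capacity, now));
        // Refill in proportion to elapsed time, capped at capacity.
        let elapsed = now.saturating_duration_since(*last).as_secs_f64();
        *tokens = (*tokens + elapsed * self.refill_per_sec).min(self.capacity);
        *last = now;
        if *tokens >= 1.0 {
            *tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

When a Redis call fails, the middleware would consult this limiter instead of either rejecting everything (fail-close) or allowing everything (fail-open). Limits enforced this way are per process. With several backend instances, the effective limit is multiplied by the instance count.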

8. Recommendations (Prioritized)

P0 -- Critical

| # | Issue | Action | Estimated Impact |
|---|---|---|---|
| 1 | Write concurrency degradation | Profile connection pool and UUID generation in release mode | 5-10x write throughput improvement |
| 2 | Latency spikes (~2.3s) | Identify and fix the root cause (likely connection pool or runtime issue) | Stabilize p99 response times |

P1 -- High

| # | Issue | Action | Estimated Impact |
|---|---|---|---|
| 3 | Release build testing | Re-run all benchmarks with `cargo build --release` | Up to 2-10x for CPU-bound operations |
| 4 | Redis HA/fallback | Implement in-memory rate limiter as Redis fallback | Eliminate single point of failure |

P2 -- Medium

| # | Issue | Action | Estimated Impact |
|---|---|---|---|
| 5 | Dashboard CLS 0.12 | Add skeleton placeholders with fixed dimensions | Improve CLS to <0.1 |
| 6 | API response time 225-250ms | Optimize database queries, add connection pool tuning | Target <200ms average |
| 7 | DELETE endpoint 415 | Fix Content-Type handling for DELETE endpoints | API usability fix |

P3 -- Low

| # | Issue | Action | Estimated Impact |
|---|---|---|---|
| 8 | Forced reflows | Batch DOM reads/writes in frontend components | Smoother animations |
| 9 | Render delay optimization | Implement code splitting or SSR for critical routes | Faster initial paint |

9. Test Data

Test Data Records Created

During testing, the following records were created and should be cleaned up:

- 5 patients named "PerfTest{1-5}"
- 10 patients named "ConcurrentTest{1-10}"
- 5 patients named "DeleteTest{1-5}" (deleted via soft delete; the DELETE requests in 3.2 returned 415, so confirm their deletion state before cleanup)
- 1 patient named "PerfUpdate1" (modified from original)

Total test patients: 21 (17 active + 4 soft-deleted via earlier sessions)

10. Methodology

- API Tests: curl with -w "%{time_total}" output, 5 iterations per endpoint with 200ms delays
- Concurrent Tests: Background curl processes with &, measuring wall-clock time
- Frontend: Chrome DevTools Protocol via MCP, performance traces with auto-stop
- Memory: PowerShell Get-Process on Windows
- Environment: Development machine, no network throttling, no CPU throttling
- Thresholds: GOOD < 200ms API, < 2.5s LCP | WARNING 200-500ms API, 2.5-4s LCP | CRITICAL > 500ms API, > 4s LCP
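The thresholds above map directly onto a rating function. This is a minimal sketch. It treats exactly 200ms and 500ms (and 2.5s and 4s for LCP) as WARNING, which is an assumption, because the stated ranges do not say which band owns the boundaries.

```rust
/// Rating bands used throughout this report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rating {
    Good,
    Warning,
    Critical,
}

/// API latency: GOOD < 200ms, WARNING 200-500ms, CRITICAL > 500ms.
pub fn rate_api_ms(ms: f64) -> Rating {
    if ms < 200.0 {
        Rating::Good
    } else if ms <= 500.0 {
        Rating::Warning
    } else {
        Rating::Critical
    }
}

/// LCP: GOOD < 2.5s, WARNING 2.5-4s, CRITICAL > 4s.
pub fn rate_lcp_ms(ms: f64) -> Rating {
    if ms < 2_500.0 {
        Rating::Good
    } else if ms <= 4_000.0 {
        Rating::Warning
    } else {
        Rating::Critical
    }
}
```

These bands rate the ~225-250ms single-request baseline in 3.1 as WARNING, matching the table. Under the same rule, the individual concurrent-GET latencies in 4.1 and 4.2 (236-286ms) would also fall in the WARNING band.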
