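Fixes:
- fix(db): migration 149, repair the Admin role permission bindings broken by the migration chain (FE-C1)
- fix(health): add empty-name validation to 4 handlers: Doctor/Article/AlertRule/Tag (API-C1~C4)
- fix(health): fix the Stats dashboard new_this_week query, SeaORM date_trunc bug (FE-C2)
- fix(server): add security response headers: X-Frame-Options/CSP/XSS-Protection/Referrer-Policy (SEC-H1)
- fix(mp): fix the appointment-creation contract: notes/reason field mapping + remove schedule_id (MP-H1)
- fix(mp): make the consultation session subject/last_message fields optional (MP-H3)
- fix(ai): AiConfig Default derive replaces the hand-written impl (clippy)

Test reports:
- All 8 dimensions of end-to-end testing completed (backend 87 cases / frontend 30 pages / mini-program 80+ APIs / security 20 items / performance 20 endpoints)
- Multi-role testing: 7 roles, 49 checks, 100% passed
- Comprehensive test report + expert assessment report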
# HMS Performance Baseline Report

> Date: 2026-05-18 | Environment: Windows 11, PostgreSQL 16 (localhost), Redis (cloud, unavailable during test)
> Backend: Rust/Axum debug build | Frontend: Vite dev server (React 19 SPA)

## 1. Executive Summary

| Category | Rating | Key Finding |
|----------|--------|-------------|
| API Read (GET) | WARNING | Avg ~237ms without spikes, but roughly 10-20% of requests spike to ~2.3s |
| API Write (POST) | WARNING | Avg 243ms single, degrades to 2.3s under concurrency |
| Concurrent GET | GOOD | 20 concurrent requests complete in 768ms |
| Concurrent POST | CRITICAL | 10 concurrent creates take 2.6s total (2.3s each) |
| Frontend LCP | GOOD | Dashboard 1.27s, Patient list 1.4s |
| Frontend CLS | WARNING | Dashboard 0.12 (exceeds 0.1 threshold) |
| Backend Memory | GOOD | 80MB working set, stable |
| Lighthouse | GOOD | Accessibility 91, Best Practices 96, SEO 91 |

**Overall Assessment: The system handles read workloads well under concurrency but has significant write concurrency issues. Candidate causes include UUID v7 generation contention, database lock contention, and connection pool saturation (see Section 4.3); none is confirmed yet. Roughly 10-20% of all requests exhibit latency spikes to ~2.3s regardless of endpoint.**

---

## 2. Test Environment

| Parameter | Value |
|-----------|-------|
| Backend | Rust debug build (not optimized), Axum web framework |
| Database | PostgreSQL 16, localhost, 88 tables, 87 patients, 148 migrations |
| Redis | Cloud instance (unavailable), fail-close bypassed with FAIL_CLOSE=false |
| Frontend | Vite dev server with HMR, React 19 SPA, Ant Design |
| Network | localhost (no network latency) |
| CPU | Not throttled |
| Test Tool | curl (API), Chrome DevTools (frontend) |

### Caveats

- **Debug build**: Production (release) build would be 2-10x faster for CPU-bound operations
- **No Redis**: Rate limiting running in fail-open mode; no caching benefit
- **Localhost**: No real network latency; production deployments will have additional network overhead
- **Single machine**: Database and application share the same host

---

## 3. API Response Time Baseline

### 3.1 Read Operations (GET) -- 20 Endpoints, 5 Iterations Each

| # | Endpoint | HTTP | Avg (ms) | Min (ms) | Max (ms) | Rating |
|---|----------|------|----------|----------|----------|--------|
| 1 | GET /health/patients (10/page) | 200 | 236.9 | 228.9 | 242.2 | WARNING |
| 2 | GET /health/patients (100/page) | 200 | 381.5 | 231.7 | 2260.4 | WARNING |
| 3 | GET /health/doctors | 200 | 238.2 | 228.5 | 242.6 | WARNING |
| 4 | GET /health/appointments | 200 | 494.3 | 240.4 | 2302.0 | WARNING |
| 5 | GET /health/patients/{id}/vital-signs | 200 | 240.3 | 232.9 | 246.1 | WARNING |
| 6 | GET /health/follow-up-tasks | 200 | 489.3 | 243.9 | 2269.7 | WARNING |
| 7 | GET /health/consultation-sessions | 200 | 240.1 | 229.9 | 247.7 | WARNING |
| 8 | GET /health/articles | 200 | 465.1 | 228.2 | 2284.7 | WARNING |
| 9 | GET /health/alerts | 200 | 240.5 | 229.4 | 245.1 | WARNING |
| 10 | GET /health/admin/statistics/dashboard | 200 | 489.4 | 233.2 | 2269.8 | WARNING |
| 11 | GET /health/admin/points/rules | 200 | 441.0 | 233.0 | 2257.0 | WARNING |
| 12 | GET /health/points/products | 200 | 236.7 | 226.6 | 241.4 | WARNING |
| 13 | GET /health/points/orders | 200 | 443.7 | 234.8 | 2255.4 | WARNING |
| 14 | GET /health/media | 200 | 441.0 | 226.2 | 2257.4 | WARNING |
| 15 | GET /health/banners | 200 | 238.4 | 232.7 | 243.5 | WARNING |
| 16 | GET /ai/analysis/history | 200 | 340.5 | 229.1 | 2256.4 | WARNING |
| 17 | GET /ai/prompts | 200 | 439.8 | 227.4 | 2255.5 | WARNING |
| 18 | GET /health/devices | 200 | 237.6 | 235.8 | 239.7 | WARNING |
| 19 | GET /health/admin/statistics/patients | 200 | 436.7 | 225.8 | 2264.1 | WARNING |
| 20 | GET /health/admin/system-health | 200 | 233.4 | 224.8 | 236.2 | WARNING |

**Pattern Observed**: 11 of the 20 endpoints recorded at least one request that spiked to ~2,255-2,300ms, consistent with roughly 10-20% of requests being affected. The remaining requests consistently return in 225-250ms. The cause is not yet confirmed; candidates are scheduling pauses in the tokio runtime's work-stealing scheduler and PostgreSQL connection pool contention, although pool contention would be unexpected under sequential testing.

**Excluding spikes, the typical response time is 225-250ms (WARNING range).**

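The split between typical latency and spikes described above can be computed mechanically. The sketch below is a minimal illustration, assuming per-request timings are available as a list of milliseconds. The function name `split_spikes`, the 1,000ms cutoff, and the sample values are illustrative assumptions, not data from this report.

```python
from statistics import mean


def split_spikes(samples_ms, cutoff_ms=1000.0):
    """Separate typical latencies from spikes using a fixed cutoff.

    Returns (typical_mean_ms, spike_count, spike_share).
    The cutoff is an assumption chosen to sit between the ~250ms
    typical range and the ~2,300ms spikes.
    """
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    typical = [s for s in samples_ms if s < cutoff_ms]
    spikes = [s for s in samples_ms if s >= cutoff_ms]
    typical_mean = mean(typical) if typical else float("nan")
    return typical_mean, len(spikes), len(spikes) / len(samples_ms)


# Hypothetical timings for one endpoint: four typical requests, one spike.
samples = [231.0, 238.0, 242.0, 235.0, 2260.0]
typical_mean, spike_count, spike_share = split_spikes(samples)
print(f"typical mean: {typical_mean:.1f}ms, spikes: {spike_count} ({spike_share:.0%})")
```

Reporting the spike share next to the spike-free mean keeps a single outlier from hiding inside the average.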
### 3.2 Write Operations

| # | Endpoint | HTTP | Avg (ms) | Min (ms) | Max (ms) | Notes |
|---|----------|------|----------|----------|----------|-------|
| 21 | POST /health/patients (create) | 200 | 342.0 | 240.7 | 2277.1 | Spike on #5 |
| 22 | PUT /health/patients/{id} (update) | 200/409 | 237.0 | 228.7 | 247.0 | 409 = optimistic lock |
| 23 | DELETE /health/patients/{id} | 415 | 274.3 | 220.4 | 2254.1 | 415 = content-type issue |

**Note on DELETE**: The endpoint returns 415 (Unsupported Media Type), which suggests it requires a specific Content-Type header. The timings in row 23 therefore measure the rejection path rather than an actual delete. This is a minor API usability issue, not a performance concern.

---

## 4. Concurrent Request Tests

### 4.1 10 Concurrent GET /health/patients

| Metric | Value | Rating |
|--------|-------|--------|
| Total time | 545.7ms | GOOD |
| Fastest | 236ms | GOOD |
| Slowest | 279ms | GOOD |
| Average | 259ms | GOOD |
| Success rate | 100% (10/10) | GOOD |

**Analysis**: The system handles 10 concurrent read requests well. Response times increase gradually from 236ms to 279ms under concurrent load, indicating moderate queueing but no failure.

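A measurement like the one above can also be scripted. The sketch below is an illustration under stated assumptions, not the method used for this report (which used background curl processes): `run_concurrently`, `summarize`, and `fake_request` are hypothetical names, and `fake_request` stands in for a real HTTP call so the example runs without a server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(request_fn, n):
    """Call request_fn n times in parallel threads.

    Returns (wall_ms, latencies_ms): the wall-clock time for the whole
    batch and the individual latency of each call.
    """
    def timed_call(_index):
        start = time.perf_counter()
        request_fn()
        return (time.perf_counter() - start) * 1000.0

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        latencies_ms = list(pool.map(timed_call, range(n)))
    wall_ms = (time.perf_counter() - wall_start) * 1000.0
    return wall_ms, latencies_ms


def summarize(latencies_ms):
    """Return the fastest, slowest, and average latency in milliseconds."""
    return {
        "fastest": min(latencies_ms),
        "slowest": max(latencies_ms),
        "average": sum(latencies_ms) / len(latencies_ms),
    }


def fake_request():
    """Hypothetical stand-in for an HTTP GET; sleeps for 50ms."""
    time.sleep(0.05)


wall_ms, latencies_ms = run_concurrently(fake_request, 10)
print(f"total: {wall_ms:.0f}ms", summarize(latencies_ms))
```

If the wall-clock total is close to the slowest single latency, the requests ran in parallel; if it approaches the sum of the latencies, they were effectively serialized.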
### 4.2 20 Concurrent GET /health/admin/statistics/dashboard

| Metric | Value | Rating |
|--------|-------|--------|
| Total time | 768.3ms | GOOD |
| Fastest | 245ms | GOOD |
| Slowest | 286ms | GOOD |
| Average | 271ms | GOOD |
| Success rate | 100% (20/20) | GOOD |

**Analysis**: 20 concurrent dashboard requests complete in under 1 second. Compared with the 10-request test in 4.1, 2x the requests took about 1.4x the wall-clock time (768.3ms vs 545.7ms), which is better than linear scaling, although the two tests target different endpoints. The system handles read concurrency well.

### 4.3 10 Concurrent POST /health/patients

| Metric | Value | Rating |
|--------|-------|--------|
| Total time | 2,600.8ms | CRITICAL |
| Fastest | 2,270ms | CRITICAL |
| Slowest | 2,287ms | CRITICAL |
| Average | 2,277ms | CRITICAL |
| Success rate | 100% (10/10) | GOOD |

**Analysis**: This is the most critical finding. All 10 concurrent write requests take ~2.3 seconds each. This is NOT a queueing issue: the fastest and slowest requests differ by only 17ms (2,270ms vs 2,287ms), so all requests start and finish around the same time. The per-request latency also matches the ~2.3s spike seen on sequential requests in Section 3. The root cause is likely one of:

1. **UUID v7 generation contention**: All 10 inserts compete for the same timestamp-based sequence
2. **Database lock contention**: Multiple inserts to the same table with indexes trigger lock waits
3. **Connection pool saturation**: The default connection pool may have limited concurrent connections to PostgreSQL

**Impact**: If this behavior persists in a release build, concurrent patient registrations under realistic load would severely degrade the system.

---

## 5. Frontend Performance (Core Web Vitals)

### 5.1 Performance Trace Results

| Page | LCP | CLS | TTFB | Rating |
|------|-----|-----|------|--------|
| Dashboard (/) | 1,269ms | 0.12 | 6ms | LCP: GOOD / CLS: WARNING |
| Patient List (/health/patients) | 1,404ms | 0.03 | 5ms | GOOD |

**LCP Breakdown (Dashboard)**:
- TTFB: 6ms (local server, expected)
- Render delay: 1,262ms (JavaScript execution and data fetching)
- Total: 1,269ms

**LCP Breakdown (Patient List)**:
- TTFB: 5ms
- Render delay: 1,399ms (JavaScript execution and API call)
- Total: 1,404ms

### 5.2 Lighthouse Audit (Desktop, Navigation)

| Category | Score |
|----------|-------|
| Accessibility | 91 |
| Best Practices | 96 |
| SEO | 91 |
| Agentic Browsing | 33 |

**Lighthouse Details**: 52 audits passed, 6 failed. Performance score not available through Lighthouse in this mode.

### 5.3 Frontend Performance Issues Identified

1. **CLS 0.12 on Dashboard** (threshold: 0.1): Layout shifts occur as dashboard data loads asynchronously. Recommend adding skeleton placeholders with fixed dimensions.
2. **Render delay dominates LCP**: Both pages spend >99% of LCP time on render delay (JavaScript execution + API calls), not network. This is expected for an SPA but could be improved with SSR or better code splitting.
3. **Forced reflows detected**: JavaScript reads geometric properties (such as element sizes or offsets) right after DOM changes, which forces a synchronous layout and can cause layout thrashing.

---

## 6. Backend Resource Usage

| Metric | Value | Assessment |
|--------|-------|------------|
| Process ID | 39380 | - |
| Working Set (RAM) | 80.3 MB | GOOD |
| Private Memory | 41.7 MB | GOOD |
| Virtual Memory | 4.5 GB | Normal (Rust default) |
| CPU Time | 14.2 seconds | Normal for test workload |
| System Total RAM | 47.9 GB | - |
| System Free RAM | 18.2 GB (38%) | GOOD |

**Analysis**: Memory usage is very efficient at 80MB for a full-featured backend with 8 modules, 260+ routes, and active background tasks. The debug build includes symbol information; a release build would use less memory.

---

## 7. Key Findings Summary

### 7.1 Latency Spike Pattern (HIGH PRIORITY)

**Symptom**: Approximately 10-20% of all requests exhibit a ~2,260-2,300ms latency spike, regardless of endpoint or request type.

**Likely Causes**:
- PostgreSQL connection pool exhaustion and wait
- Tokio runtime task scheduling pauses (debug build)
- Allocator contention under concurrent access (Rust has no garbage collector, so GC-style pauses are unlikely)

**Recommendation**: Profile the tokio runtime and database connection pool in release mode. The spike is suspiciously consistent (~2.3s), suggesting a timeout or retry mechanism. One candidate worth checking is a timed-out connection attempt to the unavailable Redis instance.

### 7.2 Write Concurrency (CRITICAL)

**Symptom**: 10 concurrent POST requests all take ~2.3s each (not serialized).

**Root Cause Candidates**:
- UUID v7 generation under high concurrency may cause timestamp collisions
- PostgreSQL WAL lock contention on heavy INSERT workloads
- Connection pool possibly limited to ~10 concurrent connections

**Recommendation**:
1. Increase database connection pool size (check `max_connections` in config)
2. Test with release build to isolate debug-mode overhead
3. If UUID generation is confirmed as a bottleneck, consider generating UUID v7 values (the `uuid` crate's `v7` feature) with per-thread sequence counters
4. Benchmark PostgreSQL directly with `pgbench` to isolate DB vs app overhead

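To help tell the candidate causes apart, it can be useful to work out what pure connection pool saturation would look like. The sketch below is a deliberately simplified model, assuming each request holds one connection for a fixed service time and waits in FIFO order when the pool is full. The names `pool_wave_latencies` and `total_time_ms` and all parameter values are hypothetical and are not taken from the HMS configuration.

```python
import math


def pool_wave_latencies(n_requests, pool_size, service_ms):
    """Model latencies when n_requests arrive at once at a fixed-size pool.

    Simplifying assumptions: every request holds exactly one connection
    for service_ms, and waiting requests are served first-in, first-out.
    Request i (0-based) is served in wave i // pool_size, so its latency
    is (wave + 1) * service_ms.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return [((i // pool_size) + 1) * service_ms for i in range(n_requests)]


def total_time_ms(n_requests, pool_size, service_ms):
    """Wall-clock time for the whole batch under the same model."""
    return math.ceil(n_requests / pool_size) * service_ms


# Hypothetical example: 10 simultaneous requests, pool of 5, 240ms each.
latencies = pool_wave_latencies(10, 5, 240.0)
print(latencies)                     # five requests at 240.0, five at 480.0
print(total_time_ms(10, 5, 240.0))   # 480.0
```

Under this model a saturated pool produces staggered latencies (waves), whereas Section 4.3 reports near-identical latencies of 2,270-2,287ms for all 10 requests. That pattern fits a fixed delay affecting every request more naturally than pool queueing alone, though the model ignores timeouts and retries.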
### 7.3 Frontend CLS (MEDIUM PRIORITY)

**Symptom**: Dashboard CLS 0.12 exceeds the 0.1 "good" threshold.

**Recommendation**: Add fixed-dimension skeleton placeholders for dashboard cards before data loads.

### 7.4 Redis Dependency (HIGH PRIORITY)

**Symptom**: System fails closed when Redis is unavailable (default behavior).

**Impact**: Production deployments must ensure Redis HA, or the entire system becomes unavailable.

**Recommendation**: Consider a fail-open mode for non-critical rate limiting paths, or implement an in-memory rate limiter as fallback.

---

## 8. Recommendations (Prioritized)

### P0 -- Critical

| # | Issue | Action | Estimated Impact |
|---|-------|--------|------------------|
| 1 | Write concurrency degradation | Profile connection pool and UUID generation in release mode | 5-10x write throughput improvement |
| 2 | Latency spikes (~2.3s) | Identify and fix the root cause (likely connection pool or runtime issue) | Stabilize p99 response times |

### P1 -- High

| # | Issue | Action | Estimated Impact |
|---|-------|--------|------------------|
| 3 | Release build testing | Re-run all benchmarks with `cargo build --release` | 2-10x overall performance improvement |
| 4 | Redis HA/fallback | Implement in-memory rate limiter as Redis fallback | Eliminate single point of failure |

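The in-memory fallback in item 4 could take the form of a token bucket. The following is a minimal single-process sketch, written in Python for brevity (the HMS backend is Rust, so this shows the logic only). The class name `TokenBucket`, its parameters, and the injected clock are assumptions made for illustration. An in-memory limiter is per-instance, so limits would not be shared across multiple backend processes.

```python
import threading
import time


class TokenBucket:
    """Thread-safe in-memory token bucket rate limiter."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self._clock = clock  # injectable so the limiter can be tested
        self._tokens = float(capacity)
        self._last = clock()
        self._lock = threading.Lock()

    def allow(self, cost=1.0):
        """Return True and consume tokens if the request is within the limit."""
        with self._lock:
            now = self._clock()
            elapsed = max(0.0, now - self._last)
            self._last = now
            # Refill in proportion to elapsed time, capped at capacity.
            self._tokens = min(
                self.capacity, self._tokens + elapsed * self.refill_per_sec
            )
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False


# Hypothetical limit: bursts of 3 requests, refilled at 1 request per second.
bucket = TokenBucket(capacity=3, refill_per_sec=1.0)
print([bucket.allow() for _ in range(5)])
```

A fallback like this would only be consulted when Redis is unreachable, so normal operation keeps the shared Redis-backed limits.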
### P2 -- Medium

| # | Issue | Action | Estimated Impact |
|---|-------|--------|------------------|
| 5 | Dashboard CLS 0.12 | Add skeleton placeholders with fixed dimensions | Improve CLS to <0.1 |
| 6 | API response time 225-250ms | Optimize database queries, add connection pool tuning | Target <200ms average |
| 7 | DELETE endpoint 415 | Fix Content-Type handling for DELETE endpoints | API usability fix |

### P3 -- Low

| # | Issue | Action | Estimated Impact |
|---|-------|--------|------------------|
| 8 | Forced reflows | Batch DOM reads/writes in frontend components | Smoother animations |
| 9 | Render delay optimization | Implement code splitting or SSR for critical routes | Faster initial paint |

---

## 9. Test Data

### Test Data Records Created

During testing, the following records were created and should be cleaned up:
- 5 patients named "PerfTest{1-5}"
- 10 patients named "ConcurrentTest{1-10}"
- 5 patients named "DeleteTest{1-5}" (deleted via soft delete)
- 1 patient named "PerfUpdate1" (modified from original)

Total test patients: 21 (17 active + 4 soft-deleted via earlier sessions)

---

## 10. Methodology

- **API Tests**: curl with `-w "%{time_total}"` output, 5 iterations per endpoint with 200ms delays
- **Concurrent Tests**: Background curl processes with `&`, measuring wall-clock time
- **Frontend**: Chrome DevTools Protocol via MCP, performance traces with auto-stop
- **Memory**: PowerShell `Get-Process` on Windows
- **Environment**: Development machine, no network throttling, no CPU throttling
- **Thresholds**: GOOD < 200ms API, < 2.5s LCP | WARNING 200-500ms API, 2.5-4s LCP | CRITICAL > 500ms API, > 4s LCP
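The thresholds above can be expressed as a small classifier, which helps rate new measurements consistently. This is a sketch: the function names are hypothetical, and treating the boundary values (exactly 500ms, exactly 4s) as WARNING is an assumption, since the thresholds do not say which side they fall on.

```python
def rate(value, good_below, critical_above):
    """Classify a measurement as GOOD, WARNING, or CRITICAL."""
    if value < good_below:
        return "GOOD"
    if value > critical_above:
        return "CRITICAL"
    return "WARNING"


def rate_api_ms(latency_ms):
    """API response time: GOOD < 200ms, WARNING 200-500ms, CRITICAL > 500ms."""
    return rate(latency_ms, 200.0, 500.0)


def rate_lcp_ms(lcp_ms):
    """LCP: GOOD < 2.5s, WARNING 2.5-4s, CRITICAL > 4s."""
    return rate(lcp_ms, 2500.0, 4000.0)


# Values taken from the tables in this report.
print(rate_api_ms(236.9))   # GET /health/patients average -> WARNING
print(rate_api_ms(2277.0))  # concurrent POST average -> CRITICAL
print(rate_lcp_ms(1269.0))  # Dashboard LCP -> GOOD
```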