# 🔍 Troubleshooting Guide

## Common Issues

### 1. Database connection failure

**Problem**: `failed to connect to database`

**Solution**:

```bash
# Check the database configuration
grep -A 10 database config.yaml

# Test the database connection
psql -h localhost -U postgres -d tyapi_dev

# Check the environment variables
env | grep DB_
```

### 2. Redis connection failure

**Problem**: `failed to connect to redis`

**Solution**:

```bash
# Check that Redis is responding
redis-cli ping

# Check the configuration
grep -A 5 redis config.yaml

# Restart Redis
docker restart tyapi-redis
```

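### 3. JWT token validation failure

**Problem**: `invalid token`

**Solution**:

```bash
# Check the JWT secret configuration (this prints the secret; avoid on shared terminals)
echo "$JWT_SECRET"

# Inspect the token payload. JWT segments are base64url-encoded and usually
# unpadded, so base64 may report invalid input for some tokens.
echo "your-token" | cut -d. -f2 | base64 -d
```
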
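JWT segments use the URL-safe base64 alphabet and drop the trailing `=` padding, which is why a plain `base64 -d` can fail on them. The following is a minimal sketch of a decoder, not part of the project: `decode_jwt_segment` and `jwt_payload` are hypothetical helper names, and `base64 -d` assumes GNU coreutils or a compatible implementation.

```shell
# Hypothetical helper: decode one base64url segment of a JWT.
decode_jwt_segment() {
  # Map the URL-safe alphabet ('-' and '_') back to standard base64 ('+' and '/').
  seg=$(printf '%s' "$1" | tr '_-' '/+')
  # Restore the '=' padding that JWTs omit.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}

# Hypothetical helper: print the decoded payload (second segment) of a token.
jwt_payload() {
  decode_jwt_segment "$(printf '%s' "$1" | cut -d. -f2)"
}
```

Calling `jwt_payload "your-token"` prints the claims JSON so that fields such as `exp` can be inspected. It only decodes the payload; it does not verify the signature.
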
### 4. High memory usage

**Problem**: The application's memory usage keeps growing

**Solution**:

```bash
# Capture a heap profile with pprof (the pprof endpoints must be enabled)
go tool pprof http://localhost:8080/debug/pprof/heap

# Check for goroutine leaks
go tool pprof http://localhost:8080/debug/pprof/goroutine

# Tune the database connection pool:
# adjust max_open_conns and max_idle_conns in config.yaml
```

### 5. Port conflict

**Problem**: `bind: address already in use`

**Solution**:

```bash
# Find the process that is using the port
netstat -tlnp | grep :8080
lsof -i :8080

# Stop that process (try a plain kill first; use kill -9 only if it does not exit)
kill <PID>

# Or change the configuration to use a different port
```

### 6. Permission problems

**Problem**: `permission denied`

**Solution**:

```bash
# Check file permissions
ls -la config.yaml
ls -la logs/

# Fix permissions
chmod 644 config.yaml
chmod 755 logs/
chown -R "$(whoami)" logs/
```

## Log Analysis

### 1. Application logs

```bash
# Follow the application log
tail -f logs/app.log

# Filter error entries
grep "ERROR" logs/app.log

# Analyze request latency (assumes the duration is the last field on each line)
grep "request_duration" logs/app.log | awk '{print $NF}' | sort -n
```

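A sorted list of durations is easier to act on once it is reduced to a few numbers. The sketch below summarizes numeric samples read from standard input, one per line, using the nearest-rank method for the 95th percentile. `latency_summary` is a hypothetical helper name, and it assumes the extracted field is a plain number with no unit suffix.

```shell
# Hypothetical helper: summarize numeric latency samples read from stdin.
latency_summary() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) { print "no samples"; exit }
      # Nearest-rank p95: index = ceil(0.95 * NR).
      idx = int((NR * 95 + 99) / 100)
      printf "count=%d avg=%.2f p95=%s max=%s\n", NR, sum / NR, v[idx], v[NR]
    }'
}
```

For example, `grep "request_duration" logs/app.log | awk '{print $NF}' | latency_summary` prints one summary line for the whole log.
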
### 2. Database logs

```bash
# PostgreSQL logs
docker logs tyapi-postgres 2>&1 | grep ERROR

# Slow query analysis (assumes the duration in ms is the third field;
# adjust the field number to match your log line prefix)
grep "duration:" logs/postgresql.log | awk '$3 > 1000'
```

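Filtering by field position breaks as soon as the log line prefix changes. A sketch that matches the `duration: <n> ms` text itself, wherever it appears on the line, could look like this. `slow_queries` is a hypothetical helper name, and the pattern assumes PostgreSQL's usual `duration: 1500.123 ms` wording.

```shell
# Hypothetical helper: print log lines whose "duration: <n> ms" exceeds a threshold.
# Usage: slow_queries <threshold_ms> < logfile
slow_queries() {
  threshold_ms=$1
  awk -v t="$threshold_ms" '
    match($0, /duration: [0-9.]+ ms/) {
      # Skip the 10-character "duration: " prefix and the 3-character " ms" suffix.
      d = substr($0, RSTART + 10, RLENGTH - 13)
      if (d + 0 > t + 0) print
    }'
}
```

For example, `slow_queries 1000 < logs/postgresql.log` lists statements that took longer than one second.
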
### 3. Performance monitoring

```bash
# View the exported metrics
curl http://localhost:8080/metrics

# Example Prometheus queries (PromQL; run these in the Prometheus UI, not the shell)
# HTTP request rate (QPS)
rate(http_requests_total[5m])

# Average response time
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
```

## Container Issues

### 1. Container fails to start

```bash
# Check container status
docker-compose ps

# View container logs
docker-compose logs <service_name>

# Rebuild the images
docker-compose build --no-cache
```

### 2. Network connectivity problems

```bash
# Check the network configuration
docker network ls
docker network inspect tyapi-network

# Test connectivity between containers
docker exec -it tyapi-server ping postgres
```

### 3. Data persistence problems

```bash
# Check the data volumes
docker volume ls
docker volume inspect postgres_data

# Back up the data
docker exec tyapi-postgres pg_dump -U postgres tyapi_dev > backup.sql
```

## Performance Issues

### 1. Slow response times

**Diagnostic steps**:

```bash
# Enable verbose logging (restart the service for the change to take effect)
export LOG_LEVEL=debug

# Look for slow queries
grep "slow query" logs/app.log

# List the database indexes
psql -h localhost -U postgres -d tyapi_dev -c "\di"
```

### 2. Memory leaks

**Diagnostic steps**:

```bash
# Monitor memory usage (-d, joins multiple PIDs with commas, as top -p expects)
top -p "$(pgrep -d, tyapi-server)"

# Open the heap profile in the pprof web UI
go tool pprof -http=:6060 http://localhost:8080/debug/pprof/heap
```

### 3. High CPU usage

**Diagnostic steps**:

```bash
# Capture a CPU profile (samples for 30 seconds by default)
go tool pprof http://localhost:8080/debug/pprof/profile

# Check system load
uptime
iostat 1 5
```

## Development Environment Issues

### 1. Development server problems

```bash
# Stop the current development server: press Ctrl+C in its terminal

# Restart the development server
make dev

# Check the Go module state
go mod tidy
go mod download
```

### 2. Test failures

```bash
# Run a specific set of tests
go test -v ./internal/domains/user/...

# Clear the test cache
go clean -testcache

# Run the integration tests
go test -tags=integration ./test/...
```

## Production Environment Issues

### 1. Health check failures

```bash
# Test the health check manually
curl -f http://localhost:8080/api/v1/health

# Check dependent services
curl -f http://localhost:8080/api/v1/health/ready

# Show detailed errors
curl -v http://localhost:8080/api/v1/health
```

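After a restart, a single probe can fail simply because the service is still starting. A small retry wrapper makes the check repeatable. This is a sketch under stated assumptions: `wait_for` is a hypothetical helper name, and the attempt count and delay shown in the usage note are illustrative values, not project defaults.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for 10 3 curl -fsS http://localhost:8080/api/v1/health/ready` probes the readiness endpoint up to ten times, three seconds apart, and returns non-zero if it never succeeds.
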
### 2. Load balancer problems

```bash
# Validate the nginx configuration and check the service status
nginx -t
systemctl status nginx

# View the load balancer logs
tail -f /var/log/nginx/access.log
tail -f /var/log/nginx/error.log
```

### 3. Certificate problems

```bash
# Inspect the SSL certificate
openssl x509 -in /etc/ssl/certs/server.crt -text -noout

# Check whether the certificate expires within the next 24 hours (86400 seconds)
openssl x509 -in /etc/ssl/certs/server.crt -checkend 86400

# Test the HTTPS connection
curl -I https://api.yourdomain.com
```

## Debugging Tools

### 1. Log viewing tools

```bash
# Follow the logs in real time
journalctl -u tyapi-server -f

# Show only entries at error priority or higher
journalctl -u tyapi-server -p err

# Show logs since a given time
journalctl -u tyapi-server --since "2024-01-01 00:00:00"
```

### 2. Network debugging

```bash
# Check which process is listening on the port
ss -tlnp | grep :8080

# Test network connectivity
telnet localhost 8080
nc -zv localhost 8080

# Test DNS resolution
nslookup api.yourdomain.com
dig api.yourdomain.com
```

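### 3. Database debugging

```bash
# Connect to the database
psql -h localhost -U postgres -d tyapi_dev

# The statements below are SQL; run them inside the psql session.
# View active connections
SELECT * FROM pg_stat_activity;

# View the slowest queries (requires the pg_stat_statements extension;
# on PostgreSQL 12 and earlier the column is named mean_time)
SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 5;
```
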
## Emergency Response Procedures

### 1. Service outage

1. **Quick recovery**:

```bash
# Restart the service
systemctl restart tyapi-server

# Or, with Docker
docker-compose restart tyapi-server
```

2. **Roll back the deployment**:

```bash
# Kubernetes rollback
kubectl rollout undo deployment/tyapi-server

# Docker: recreate the stack (this restores the previous version only if
# the compose file or image tag already points at it)
docker-compose down
docker-compose up -d --scale tyapi-server=3
```

### 2. Database problems

1. **Primary/replica failover**:

```bash
# Promote the replica to primary
sudo -u postgres /usr/lib/postgresql/13/bin/pg_ctl promote -D /var/lib/postgresql/13/main
```

2. **Data recovery**:

```bash
# Restore from a backup
psql -h localhost -U postgres -d tyapi_dev < backup_latest.sql
```

### 3. Contacting support

When you run into a problem you cannot resolve:

1. Collect the error messages and logs
2. Record the steps to reproduce the problem
3. Gather information about the system environment
4. Contact the technical support team

**Support information collection script**:

```bash
#!/bin/bash
echo "=== TYAPI Server Debug Info ===" > debug_info.txt
echo "Date: $(date)" >> debug_info.txt
echo "Version: $(cat VERSION 2>/dev/null || echo 'unknown')" >> debug_info.txt
echo "" >> debug_info.txt

echo "=== System Info ===" >> debug_info.txt
uname -a >> debug_info.txt
echo "" >> debug_info.txt

echo "=== Docker Status ===" >> debug_info.txt
docker-compose ps >> debug_info.txt
echo "" >> debug_info.txt

echo "=== Recent Logs ===" >> debug_info.txt
tail -n 50 logs/app.log >> debug_info.txt
echo "" >> debug_info.txt

echo "Debug info collected in debug_info.txt"
```