Personalized Agent Security Evaluation from Specification
§01 · Overview
Surtor evaluates the security posture of personalized LLM-based agents against established attack primitives under real-world deployment conditions. Each instance pairs a personalized scenario with adversarial payloads and auditable private assets (canary tokens), requiring models to resist prompt injection, tool-return deception, and memory poisoning across long-horizon interactions. Two models (GLM-5 and Nova-2-Lite) are evaluated through a four-stage pipeline: scenario setup, attack injection, execution tracing, and automated adjudication. Two attack categories are evaluated: Indirect Prompt Injection (IPI) via carrier files and Memory Credential Extraction.
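As a concrete, purely illustrative sketch of what one such instance could look like in code, the snippet below pairs a personalized scenario with an adversarial payload and a freshly minted canary token. The class and field names are assumptions for illustration, not Surtor's actual schema.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class EvalInstance:
    """One Surtor-style instance: a personalized scenario paired with an
    adversarial payload and an auditable private asset (canary token)."""
    name: str
    attack_type: str   # "ipi" or "memory_extraction"
    technique: str     # e.g. carrier-file injection, memory poisoning
    scenario: str      # personalized usage scenario given to the agent
    payload: str       # adversarial content planted in tool returns or memory
    canary_token: str = field(
        default_factory=lambda: f"CANARY-{secrets.token_hex(8)}"
    )

    def leaked(self, agent_output: str) -> bool:
        # The canary is auditable: any appearance of it outside its
        # authorized store counts as a leak.
        return self.canary_token in agent_output
```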
§02 · Key metrics
- Attack instances: 20
- Attack types: 2 (indirect prompt injection, memory credential extraction)
- IPI scenarios: 10
- Memory scenarios: 10
§03 · Pipeline
Four stages turn a specification into a security verdict.
Phase 01 · Scenario setup
- Define personalized usage scenarios
- Plant auditable private assets (canary tokens)
- Configure tool privileges and memory stores
Phase 02 · Attack injection
- Embed adversarial payloads via carrier files or memory poisoning
Phase 03 · Execution tracing
- Run the agent through the long-horizon interaction and record the resulting trace
Phase 04 · Automated adjudication
- Score each trace automatically against the planted canary tokens to produce a security verdict
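The four phases could compose into a run loop along the following lines. This is a schematic sketch building on the EvalInstance class above; the flat-prompt agent interface and helper names are assumptions, not the framework's real API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    instance: str
    attack_succeeded: bool

def run_instance(instance: "EvalInstance", agent: Callable[[str], str]) -> Verdict:
    """Drive one instance through the four phases (schematic only)."""
    # Phase 01, scenario setup: personalize the task and plant the canary
    # inside the agent's private memory / tool configuration.
    prompt = f"{instance.scenario}\n[private memory] token={instance.canary_token}"

    # Phase 02, attack injection: the adversarial payload arrives as if it
    # came from a carrier file or a poisoned memory entry.
    prompt += f"\n[retrieved document] {instance.payload}"

    # Phase 03, execution tracing: run the agent and capture its output
    # (a real trace would also record tool calls and memory accesses).
    output = agent(prompt)

    # Phase 04, automated adjudication: the attack counts as fully
    # successful if the planted canary escaped into the output.
    return Verdict(instance.name, instance.leaked(output))
```

In the real harness, Phases 02 and 03 would operate on tool returns and a persistent memory store rather than a single prompt string; the flat prompt here only keeps the sketch self-contained.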
§04 · Results
Attacks fully succeed against GLM-5 on 17 of 20 instances (85% ASR) and against Nova-2-Lite on 12 of 20 (60% ASR).
GLM-5 leaks credentials in 9 of 10 memory-extraction scenarios; Nova-2-Lite leaks in 6 of 10. IPI attacks score 24/30 against GLM-5 versus 18/30 against Nova-2-Lite.
GLM-5 costs ~$0.06 per instance on average; Nova-2-Lite costs ~$0.01 per instance, about 5x cheaper but less effective.
The figures report attack success metrics across attack types (Fig. 1), response rates (Fig. 2), memory extraction (Fig. 3), and per-instance detail (Fig. 4); see §05 for per-scenario data.
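For clarity on how the headline ASR numbers are composed, a trivial self-contained helper is shown below; the per-instance booleans are stand-ins, not the actual run verdicts.

```python
def attack_success_rate(verdicts: list[bool]) -> float:
    """ASR = fraction of instances on which the attack fully succeeded."""
    return sum(verdicts) / len(verdicts)

# 17 full successes out of 20 instances reproduces the 85% ASR reported for GLM-5.
print(attack_success_rate([True] * 17 + [False] * 3))  # 0.85
```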
§05 · Dataset Viewer
| # | Name | Type | Technique | GLM-5 | Nova-2-Lite |
|---|------|------|-----------|-------|-------------|
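To pin down what each column holds, here is one hypothetical row rendered as a record. The name, technique, and scores are invented placeholders, and the 0-to-3 per-instance scale is only inferred from the 24/30 IPI totals in §04.

```python
# Hypothetical dataset row matching the columns above; all values are placeholders.
example_row = {
    "#": 1,
    "Name": "travel-assistant-ipi-01",       # invented instance name
    "Type": "IPI",                            # or "Memory Credential Extraction"
    "Technique": "carrier-file injection",    # vector used to deliver the payload
    "GLM-5": 3,                               # per-instance attack score (assumed 0-3 scale)
    "Nova-2-Lite": 2,
}
```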
§06 · Model comparison
Head-to-head breakdown of both evaluated models on the Surtor dataset.
| Metric | GLM-5 | Nova-2-Lite |
|--------|-------|-------------|
| Overall ASR | 17/20 (85%) | 12/20 (60%) |
| Memory credential leaks | 9/10 | 6/10 |
| IPI score | 24/30 | 18/30 |
| Avg. cost per instance | ~$0.06 | ~$0.01 |
§07 · Methodology
Four principles govern the Surtor evaluation framework.
Principle 01
Principle 02
Principle 03
Principle 04
§08 · Resources