Phase 01
Scenario Setup
- Define personalized usage scenarios
- Plant auditable private assets (canary tokens)
- Configure tool privileges and memory stores
Personalized Agent Security Evaluation from Specification
§01 · Overview
Pax evaluates the security posture of personalized LLM-based agents against established attack primitives under real-world deployment conditions. Each instance pairs a personalized scenario with adversarial payloads and auditable private assets (canary tokens), requiring models to resist prompt injection, tool-return deception, and memory poisoning across long-horizon interactions. Two models (GLM-5 and Nova-2-Lite) are evaluated through a four-stage pipeline: scenario setup, attack injection, execution tracing, and automated adjudication. Three attack categories are evaluated: Indirect Prompt Injection (IPI) via carrier files, Memory Credential Extraction, and Tool-Return Deception (TRD) via poisoned tool responses.
§02 · Key metrics
Attack Instances
Attack Types
GLM-5 Success Rate
72%
Nova-2-Lite Success Rate
51%
Difficulty Levels
§03 · Pipeline
Four stages turn a specification into a security verdict.
Phase 01
Phase 02
Phase 03
Phase 04
§04 · Results
GLM-5 achieves ASR 0.722. Nova-2-Lite achieves ASR 0.511.
GLM-5 succeeds on 80% IPI, 90% MEM, 30% TRD. Nova-2-Lite succeeds on 40% IPI, 60% MEM, 10% TRD.
30 scenarios · 3 attack types · GLM-5 41% more vulnerable — at 4× the cost.
§05 · Dataset Viewer
| # | Name | Type | Technique | GLM-5 | Nova |
|---|
§06 · Model comparison
2 models evaluated across 30 instances covering 3 attack types (IPI, MEM, TRD). 30 unique attack techniques targeting 10 credential types and 6 tool surfaces. GLM-5 is 41% more vulnerable overall.
GLM-5
Nova-2-Lite
§07 · Methodology
Four principles govern the Pax evaluation framework.
Evaluation framework
Principle 01
Principle 02
Principle 03
Principle 04
§08 · Resources