Cisco Talos releases EvidenceForge for correlated synthetic logs
Cisco Talos released EvidenceForge on May 27, 2026. The open-source tool generates causally consistent synthetic security logs across 20+ formats and includes AI-assisted scenario authoring.
The project is open source and available on GitHub. It produces correlated synthetic security logs across more than 20 Windows, Linux and network formats for training, testing and dataset creation.
The tool generates synchronized telemetry and provides ground-truth documentation to help security teams train analysts, validate detections and create labeled datasets. Scenarios produce both an analyst briefing and a detailed ground-truth document with a narrative, timeline and indicators of compromise.
EvidenceForge uses a single canonical SecurityEvent model that carries timestamps and composable context objects such as ProcessContext, NetworkContext, AuthContext, DnsContext and HttpContext. Emitters read only the fields relevant to their format so emitted logs share the same identifiers and timing across sources.
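The canonical-event idea can be illustrated with a minimal Python sketch. Only the names SecurityEvent, ProcessContext and NetworkContext come from the description above; every field name, the `event_id` attribute and both emitter functions are assumptions made for illustration and are not EvidenceForge's actual API.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ProcessContext:
    # Hypothetical fields; the real context object may differ.
    pid: int
    image: str
    command_line: str


@dataclass(frozen=True)
class NetworkContext:
    # Hypothetical fields; the real context object may differ.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


@dataclass(frozen=True)
class SecurityEvent:
    # One canonical record; contexts are optional and composable.
    event_id: str
    timestamp: datetime
    host: str
    process: Optional[ProcessContext] = None
    network: Optional[NetworkContext] = None


def emit_process_log(ev: SecurityEvent) -> Optional[str]:
    """Hypothetical endpoint-style emitter: reads only the process context."""
    if ev.process is None:
        return None  # nothing for this format to report
    return json.dumps({
        "ts": ev.timestamp.isoformat(),
        "event_id": ev.event_id,
        "host": ev.host,
        "pid": ev.process.pid,
        "image": ev.process.image,
        "cmdline": ev.process.command_line,
    })


def emit_network_log(ev: SecurityEvent) -> Optional[str]:
    """Hypothetical network-style emitter: reads only the network context."""
    if ev.network is None:
        return None
    n = ev.network
    return json.dumps({
        "ts": ev.timestamp.isoformat(),
        "event_id": ev.event_id,
        "src": f"{n.src_ip}:{n.src_port}",
        "dst": f"{n.dst_ip}:{n.dst_port}",
        "proto": n.proto,
    })


if __name__ == "__main__":
    event = SecurityEvent(
        event_id="evt-1",
        timestamp=datetime(2026, 5, 27, 12, 0, tzinfo=timezone.utc),
        host="ws01",
        process=ProcessContext(4242, "powershell.exe", "powershell -nop"),
        network=NetworkContext("10.0.0.5", 49152, "10.0.0.9", 445, "tcp"),
    )
    print(emit_process_log(event))
    print(emit_network_log(event))
```

Because both emitters read from the same record, the timestamp and identifier agree across the two outputs without any reconciliation step.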
Users describe environments in a YAML scenario file that lists hosts, users and network topology and may include an attack narrative. From that file the engine produces outputs such as Windows Security Events (about 30 event IDs), Sysmon (around 10 event IDs), EDR/XDR telemetry, Linux syslog, bash history, Zeek JSON logs, Snort IDS alerts, firewall logs, web server access logs and forward proxy logs.
Causality is an explicit feature. A composable rule engine auto-generates prerequisite events with realistic timing offsets so dependent evidence appears in expected order; for example, Kerberos ticket exchanges can be generated before domain logons and DNS queries before hostname connections.
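A rough Python sketch of how such prerequisite rules might work follows. The rule table, the event kind names and the offset ranges are all hypothetical; the sketch shows only the general technique of back-dating prerequisite events by a randomized offset, not the tool's real rule engine.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    kind: str
    timestamp: datetime
    detail: str


# Hypothetical rules: trigger kind -> [(prerequisite kind, min_s, max_s)].
# Each prerequisite is placed between min_s and max_s seconds BEFORE the trigger.
PREREQ_RULES = {
    "domain_logon": [("kerberos_tgt_request", 0.05, 0.5)],
    "hostname_connection": [("dns_query", 0.01, 0.2)],
}


def expand(event: Event, rng: random.Random) -> list:
    """Return the event preceded by any prerequisites its kind requires."""
    out = []
    for kind, lo, hi in PREREQ_RULES.get(event.kind, []):
        offset = timedelta(seconds=rng.uniform(lo, hi))
        prereq = Event(kind, event.timestamp - offset, event.detail)
        # Rules compose: a prerequisite may have prerequisites of its own.
        out.extend(expand(prereq, rng))
    out.append(event)
    return out


def build_timeline(events: list, seed: int = 0) -> list:
    """Expand every event and return one timeline sorted by timestamp."""
    rng = random.Random(seed)
    timeline = []
    for ev in events:
        timeline.extend(expand(ev, rng))
    return sorted(timeline, key=lambda e: e.timestamp)


if __name__ == "__main__":
    t0 = datetime(2026, 5, 27, 9, 0, tzinfo=timezone.utc)
    scenario = [
        Event("domain_logon", t0, "alice@CORP"),
        Event("hostname_connection", t0 + timedelta(seconds=10), "files.corp.example"),
    ]
    for ev in build_timeline(scenario, seed=1):
        print(ev.timestamp.isoformat(), ev.kind, ev.detail)
```

Because the minimum offset is strictly positive, each prerequisite always lands before the event that depends on it.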
Network visibility is modelled in the scenario. Authors can declare sensor placement such as SPAN or TAP ports, monitored segments and direction. The engine emits network logs only where a declared sensor could observe traffic, producing monitoring gaps that reflect the configured topology.
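The visibility logic can be approximated in a few lines of Python. This is a sketch under stated assumptions: the `Sensor` and `Flow` types, the direction values ("ingress", "egress", "both") and their exact semantics are invented here and may not match how EvidenceForge models sensors.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Sensor:
    name: str
    segment: str    # CIDR of the monitored segment (assumed representation)
    direction: str  # "ingress", "egress" or "both" (assumed values)


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str


def observes(sensor: Sensor, flow: Flow) -> bool:
    """Return True if the sensor could see the flow under the assumed semantics."""
    net = ip_network(sensor.segment)
    src_in = ip_address(flow.src_ip) in net
    dst_in = ip_address(flow.dst_ip) in net
    if sensor.direction == "ingress":
        return dst_in and not src_in  # traffic entering the segment
    if sensor.direction == "egress":
        return src_in and not dst_in  # traffic leaving the segment
    return src_in or dst_in           # "both": anything touching the segment


def visible_flows(sensors: list, flows: list) -> dict:
    """Map each sensor name to the flows it would log."""
    return {s.name: [f for f in flows if observes(s, f)] for s in sensors}


def unobserved(sensors: list, flows: list) -> list:
    """Flows no declared sensor can see: the monitoring gaps."""
    return [f for f in flows if not any(observes(s, f) for s in sensors)]
```

In this sketch a flow that stays inside an unmonitored segment appears in no network log at all, which is the kind of gap the scenario topology is meant to reproduce.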
The baseline generator adds background noise including legitimate lateral movement patterns, normal user and application activity, per-user command diversity and benign red herrings. EvidenceForge exposes timing parameters and uses three timing models: a Hawkes self-exciting process for bursty user behavior, a periodic envelope for weekday cycles, and periodic intervals with jitter for scheduled automated events.
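Two of these timing models can be sketched in plain Python. The code below simulates a Hawkes process with an exponential kernel using Ogata's thinning method, plus a simple periodic schedule with uniform jitter. The parameter names (`mu`, `alpha`, `beta`, `period`, `jitter`) and the choice of kernel are assumptions for illustration; the article does not state which kernel or parameters EvidenceForge uses.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, rng):
    """Simulate event times on (0, horizon) for a Hawkes process.

    Intensity: lambda(t) = mu + sum(alpha * exp(-beta * (t - t_i))) over past
    events t_i. Each event raises the rate by alpha, and the boost decays at
    rate beta, which produces bursts. Requires mu > 0 and alpha < beta.
    """
    def intensity(t, events):
        return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)

    events = []
    t = 0.0
    while True:
        # The kernel only decays between events, so the intensity at the
        # current time is an upper bound until the next accepted event.
        lam_bar = intensity(t, events)
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            break
        # Thinning step: accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events):
            events.append(t)
    return events


def periodic_with_jitter(period, jitter, horizon, rng):
    """Scheduled events every `period` seconds, each shifted by up to +/- jitter."""
    times = []
    k = 1
    while k * period < horizon:
        times.append(k * period + rng.uniform(-jitter, jitter))
        k += 1
    return times


if __name__ == "__main__":
    rng = random.Random(7)
    bursts = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.6, horizon=100.0, rng=rng)
    cron = periodic_with_jitter(period=60.0, jitter=5.0, horizon=600.0, rng=rng)
    print(len(bursts), "bursty user events")
    print([round(t, 1) for t in cron])
```

The ratio `alpha / beta` controls how strongly one action triggers follow-on actions; keeping it below 1 prevents the simulated activity from growing without bound.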
Scenario creation is assisted by AI tools that run guided interviews, research tactics and techniques, and produce a validated YAML configuration. After validation, a Python script performs schema and cross-reference checks and deterministically emits the correlated logs. Randomness is seeded so repeated runs produce identical output, and the AI can propose fixes for schema errors.
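The combination of cross-reference checks and seeded output can be illustrated with a short Python sketch. The scenario is represented as a plain dictionary standing in for the parsed YAML, and its keys (`hosts`, `users`, `workstation`) as well as the functions below are hypothetical; they are not EvidenceForge's schema or script.

```python
import hashlib
import json
import random


def cross_reference_errors(scenario: dict) -> list:
    """Report users whose workstation is not a declared host."""
    hosts = {h["name"] for h in scenario.get("hosts", [])}
    errors = []
    for user in scenario.get("users", []):
        if user.get("workstation") not in hosts:
            errors.append(
                f"user {user.get('name')!r} references unknown host "
                f"{user.get('workstation')!r}"
            )
    return errors


def generate(scenario: dict, seed: int) -> list:
    """Validate, then emit log lines from a locally seeded RNG."""
    errors = cross_reference_errors(scenario)
    if errors:
        raise ValueError("; ".join(errors))
    # A private Random instance avoids dependence on global RNG state.
    rng = random.Random(seed)
    lines = []
    # Sorting fixes the iteration order so output does not depend on input order.
    for user in sorted(scenario["users"], key=lambda u: u["name"]):
        for _ in range(3):
            record = {
                "offset_s": rng.randint(0, 3600),
                "user": user["name"],
                "host": user["workstation"],
            }
            lines.append(json.dumps(record, sort_keys=True))
    return lines


def digest(lines: list) -> str:
    """Hash the output so two runs can be compared cheaply."""
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    scenario = {
        "hosts": [{"name": "ws01"}, {"name": "ws02"}],
        "users": [
            {"name": "alice", "workstation": "ws01"},
            {"name": "bob", "workstation": "ws02"},
        ],
    }
    print(digest(generate(scenario, seed=42)) == digest(generate(scenario, seed=42)))
```

Comparing digests across runs is one simple way to confirm that a scenario and seed still reproduce the same dataset after the scenario file has been shared or version-controlled.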
EvidenceForge runs from a command-line interface or a guided conversation workflow. Typical uses include building SOC analyst training programs, testing detection logic against labeled datasets, generating training data for machine learning models and stress-testing SIEM pipelines. Scenarios are version-controllable and shareable.
Cisco Talos notes that purely synthetic datasets will not fool expert analysts in every case and that EvidenceForge is intended for training, testing and development rather than as a substitute for production telemetry.








