This is a transcription of the talk **Deductive Engine: Human-Inspired Taint Reasoning** at Offensive AI Con (OAIC), a fantastic conference with great talks! (One of my favorite venues: Oceanside, San Diego, sunshine. You can see the beach and hear seagulls right from the balcony.)
A short abstract: We propose a reimagined taint analysis system built upon reverse-engineering deductive reasoning, counter-intuitive philosophy, and human heuristics from real-world bug finding; and we tell the story of how deductive reasoning became the engine behind conceptualizing a bespoke dataflow system, enabling AIs to perform autonomous grey-box vulnerability discovery in binaries.
This slide leans toward the under-the-hood, philosophical angle of Deductive Engine, and why it's proposed and architected this way. Setting the technical aspects aside, I wanted Deductive Engine to tell a story of conceptualizing and architecting a "thinking" system that is native to LLMs for security research.
In April last year, I came across Backward Taint Propagation, a surprisingly intuitive yet logical (even slightly philosophical) methodological system borrowed from SASTs, which seems to perfectly follow what we do in our heads during bug finding. It intrigued me how this system of interpretation can also be read from a cognitive angle of "finding out," and how much it has to do with the essence of programming itself (program ontology). The papers call it contamination, but it is essentially the association of program slices by relationships (data dependency); "finding" a bug by connecting two "tainted" nodes is essentially making explicit a specific program relationship between a "source" and a "sink," deriving the sink's dependency on the given source (much like the gradient in backpropagation for neural networks, I recently realized). Or, borrowing dataflow philosophy, what's interesting is that you can interpret every slice of code in a program as playing a role in the program's data pipeline: "sources" as entry points, "intermediates" as modifiers / branch points that change data, and "sinks" as where data meets and performs an action on the given information. Programs are essentially big boxes of I/O; or programs can be big boxes of restrictions, or of sequences (controls).
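To make the source/intermediate/sink association concrete, here is a minimal sketch of backward taint propagation over a toy data-dependency graph. All names (`sink:memcpy`, `source:recv`, etc.) are illustrative assumptions, not the actual Deductive Engine:

```python
# Toy sketch of backward taint propagation (illustrative only;
# not the actual Deductive Engine). Each program slice depends
# on the slices that produce its data.
# deps[node] = set of nodes whose data flows into `node`.
deps = {
    "sink:memcpy": {"intermediate:len_calc", "source:recv"},
    "intermediate:len_calc": {"source:recv"},
    "source:recv": set(),
}

def backward_taint(sink, deps):
    """Walk data dependencies backward from the sink and collect
    every slice the sink transitively depends on."""
    seen, stack = set(), [sink]
    while stack:
        node = stack.pop()
        for dep in deps.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

tainted = backward_taint("sink:memcpy", deps)
# The sink transitively depends on the attacker-controlled source,
# so the source-to-sink relationship is "found".
print("source:recv" in tainted)  # True
```

The "bug" here is exactly the implicit relationship made explicit: the sink's dependency set contains the source.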
And that's when I decided this system is the direction I want to focus on: a bit philosophical (program ontology), which is good for an over-thinker like me, and a bit cognitive, which is good because I always over-reflect on myself. These abstractions are exactly what made this little system of "Backward Taint Propagation" so playful and intriguing to me (I could talk about it all day), and why it's possibly capable of serving as the foundation for a problem I care about a lot: ML with program analysis (where vulnerability research is one type of task among many).
I am happy to bring Boschko, who is not only a fantastic AI and binary security researcher but also a great friend of mine, along on this journey of finding out and thinking. We met in an interesting way too, a few years ago, finding bugs in router firmware; he's not just the nicest but also brilliant.
So here's our talk! We encourage you to leave comments (via Notion) or reach out about anything you have questions on or simply find interesting. As a little teaser, Deductive Engine ended up finding 12 CVEs in a grey-boxed router binary in one day of benchmarking before the conference, and two overflows in the UniTree Robotics BLE protocol on the first day of the conference! I hope our talk helps you; please reach out!
Ruikai: Hey folks, before we start, we just wanted to say that we're really happy and honored to be speaking today at Offensive AI Con. I've got to admit that even after my Black Hat presentation, I still get so nervous talking about anything on stage.
My name is Ruikai. I'm a 16-year-old high school sophomore. I've been working on binary exploitation, the innermost workings of computers, since I was 11. I've focused on AI/ML security with binary exploitation for the past three years. I'm currently founding a company, Pwno, focused on AI/ML with low-level security for the innermost workings of computers.
Today, I'm here to introduce one of our most interesting works, Deductive Engine: Human Insight, Thinking, Reasoning. This evolved from a project we worked on a year prior, where deduction, the superpower we've had since birth, became the key.
All of this starts with one question: If I delete your AST, CFG, sanitizer database, and symbol tables, can we still find bugs? Of course we can, because we're human. This talk is about teaching LLMs to do the same, or even better, by stealing our own cognitive playbook.
Ruikai: Tree-of-AST was a project that we worked on a year prior for Black Hat, which is also where most of the philosophy of the Deductive Engine came from. It's a project where we put Google Deepmind's Tree-of-Thought strategic decision-making process for LLMs into security, under Taint Analysis.
Taint Analysis is the security research methodology that most closely resembles how we find bugs at a cognitive level. Specifically, under the umbrella of Taint Pruning, we treat vulnerability research as an exploration problem where we generate ASTs or CFGs as the position trees and deliberately explore them to connect the sink to the source. By exploring around this project, we ran into some other very interesting questions and gained a deeper understanding of why this combination of Taint Analysis and LLMs can be interesting.
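The exploration framing above can be sketched in a few lines. This is a hedged toy example, not the actual Tree-of-AST implementation: the tree, node names, and the `carries_taint` oracle (which in Tree-of-AST would be an LLM's judgment) are all invented for illustration:

```python
# Toy sketch of taint pruning as tree exploration
# (illustrative only; not the actual Tree-of-AST system).

# A tiny exploration tree: each node lists the nodes we could visit next.
tree = {
    "sink": ["parse_len", "log_msg"],
    "parse_len": ["recv_input"],  # carries tainted data
    "log_msg": [],                # dead branch: no data dependency
    "recv_input": [],             # the source
}

# Hypothetical oracle answering "does this branch touch tainted data?";
# in Tree-of-AST, an LLM makes this strategic judgment.
carries_taint = {"sink", "parse_len", "recv_input"}

def explore(node, target, path):
    """DFS from the sink toward the source, pruning subtrees
    the oracle says carry no tainted data."""
    if node == target:
        return path + [node]
    if node not in carries_taint:
        return None  # prune this subtree
    for child in tree.get(node, []):
        found = explore(child, target, path + [node])
        if found:
            return found
    return None

print(explore("sink", "recv_input", []))
# ['sink', 'parse_len', 'recv_input']
```

Pruning is what makes the search tractable: the `log_msg` branch is never expanded because the oracle rules it out.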
Ruikai: During the R&D process of Tree-of-ASTs, asking ourselves how the Dataflow Engine works (since we had to upgrade our own Dataflow Engine along the way) led us to the discovery of simple, general heuristics by taking a counterintuitive perspective. In the meantime, by asking ourselves how to apply ASTs, we found this security research methodology that we as security researchers use naturally and intuitively. This methodology comes with pages of explanation of how it can be analytically advantageous, even for LLMs.