1. Introduction:

    1. Straight into the takeaway the audience will be expected to walk out of the room with (a bit of a reveal of what this talk is about): the way to put AIs into security research (vulnerability discovery, exploit development) is to observe, at the lowest level, how we do this task in a human, anthropic, even slightly philosophical way. This changes, and makes new, a lot of things.

    2. Into the topic; the idea from the Black Hat project

      1. Tree-of-AST, where we took Tree-of-Thoughts's thought-and-decision-making process (built for LLMs on the "Game of 24" task) and brought it to dataflow analysis. Why dataflow's pathfinding problem? Because it is the closest match to how we find bugs at a cognitive level. In Tree-of-AST we surprisingly found this task to be a better fit for the reasoning model than Tree-of-Thoughts's original task (that sort of stateful, context-relevant decision making). Instead of generating new thoughts, Tree-of-AST runs on states computed from code and data structures (AST-based analysis).

        However, in that computation we noticed we were one step away from a generic LLM security-research methodology: the dependency on context-based computation. For Deductive Engine, we discovered (induced) a way to drop the dependency on programmatic computation, the cage that has kept papers on LLMs in vulnerability research (e.g., LATTE from Tsinghua) from being generic enough for every piece of software: environment-agnostic, from fully white box down to binary level with stripped symbols.
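        A minimal sketch of that idea, for illustration: a Tree-of-Thoughts-style beam search whose candidate states are computed from code structure (a toy def-use table standing in for AST-derived dataflow edges) rather than generated as free-form thoughts. `DataflowState`, `score_state` (where an LLM evaluator would sit), and the table contents are hypothetical, not the actual Tree-of-AST implementation:

        ```python
        # Tree-of-Thoughts-style search over *computed* states: successors come
        # from AST-derived def-use edges, not from generated text. Toy example.
        from collections import namedtuple

        DataflowState = namedtuple("DataflowState", ["node", "path"])

        # Hypothetical def-use edges an AST pass would compute (sink <- ... <- source).
        DEF_USE = {
            "memcpy_len": ["parse_header", "default_len"],
            "parse_header": ["recv_packet"],  # attacker-controlled input
            "default_len": [],                # constant, dead end
            "recv_packet": [],
        }

        def expand(state):
            # Successor states are computed from structure, never hallucinated.
            return [DataflowState(n, state.path + [n]) for n in DEF_USE[state.node]]

        def score_state(state):
            # Stand-in for the ToT evaluator; an LLM would rank promise here.
            return 1.0 if "recv" in state.node else 0.5

        def tree_of_ast(sink, is_source, beam=2):
            frontier = [DataflowState(sink, [sink])]
            while frontier:
                frontier.sort(key=score_state, reverse=True)
                frontier = frontier[:beam]          # prune, ToT-style
                state = frontier.pop(0)
                if is_source(state.node):
                    return state.path               # sink-to-source path found
                frontier.extend(expand(state))
            return None

        print(tree_of_ast("memcpy_len", lambda n: n == "recv_packet"))
        # ['memcpy_len', 'parse_header', 'recv_packet']
        ```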

    3. Open with a paper (just like how I did at Black Hat), but this time for a past approach: LATTE

      1. LATTE: finding bugs without predefined dataflows and sinks; similar to the problem Tree-of-AST is solving

        1. They jump straight into binary-level, greybox, information-lacking scenarios (no predefined dataflow inputs / sinks) and use LLMs (as a medium, or explicitly) to make up the context needed for dataflow analysis

          They predefine these by looking at the code; we start at easily identifiable points of interest and dynamically pathfind to prove connectivity

      2. The improvement of this simple concept (the innovation hides in this simple goal): making up enough context for us to apply computational approaches *(by this I mean the SAST way).

        1. MCP, tool calling
      3. In scenarios lacking information (e.g., binaries), we use LLMs to recover / make up for this context to a degree where a traditional SAST can "compute for bugs". This has become so efficient that there's not much left to improve or talk about (no room for novelty)

      4. At the end of the day, this dependency on "computing for bugs" (in the case of Tree-of-AST, the computer needs to see the AST to compute) is a cage (or a pain-in-the-ass) that no one is able to escape or avoid talking about (or chooses not to talk about)

        1. In other words, even though we're just talking about LLMs with traditional SAST, the efficiency of security-research LLMs is already boxed in by SAST

        We directly replace the computer's "compute" with LLMs

        1. Because these methodologies are rooted in traditional SAST frameworks with the LLM in the role of assistant (this is why we're trapped in this cage of dependency), just like how "assistant" is the default role ChatGPT takes when talking to you (even though it's just predicting the next word)
    4. Here comes the first ingenious, intriguing part of the main ideology of Deductive Engine: replace the part usually done by programmatic analysis, by computers, with LLMs. Instead of leaving LLMs in the role of assistant, we put LLMs, heuristic programming, entirely in the lead of the process.

      Spoiler: the purpose is to make the process completely heuristically led, just as a human leads it in this task. Even though we lose a bit of computational advantage, the efficiency of this heuristically led exploration (how we did it) will manifest at scale.

      1. This sounds crazy, insane, but we will dive right into it in just a second.
  2. Let's jump right into a completely different thing, even a bit philosophical: deductive reasoning.

    1. Not sure if you have ever heard of, read, or watched "Sherlock Holmes". I loved that show so much when I was young that I even bought a set of the autopsy tools Sherlock used.
      1. One concept that really intrigued me is "deductive" reasoning.
        1. Sherlock uses "deduction" to reason out who the murderer is

        2. The word is attractive in a mysterious way: translated into Chinese, it literally carries a color of "expository" or "acting" (taken literally, the Chinese term means "finding through acting").

        3. What is deduction? [Insert overly long and hard-to-understand definition...] In other words: a fundamental way for us to learn (to know the new) through sequential, vertical inferential chains of nodes called premises (dependent information). A fundamental way of how we think.

          1. From the general to the particular.

          A cognitive logical tool we're given from birth, alongside "heuristic reasoning" (inspirational, intuitive)

        4. e.g., murders: "Silver Blaze" from Sherlock Holmes, a classic syllogism

          1. Major Premise: Dogs bark at strangers
          2. Minor Premise: The dog did not bark that night
          3. Conclusion: The visitor was not a stranger (an insider committed the crime)
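
          For the formally inclined, this syllogism is just modus tollens; a one-line Lean 4 formalization (the proposition names are ours, for illustration only):

          ```lean
          -- Silver Blaze as modus tollens: (Stranger → Barks) and ¬Barks give ¬Stranger.
          example (Stranger Barks : Prop)
              (major : Stranger → Barks)  -- dogs bark at strangers
              (minor : ¬Barks)            -- the dog did not bark that night
              : ¬Stranger :=              -- the visitor was not a stranger
            fun h => minor (major h)
          ```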

          Everything up to this point has been explaining the epistemological system of "reasoning".

        5. This simple, intuitive (to the degree that we're not even aware we're using it) way of reasoning, given to us from birth, is how roughly 80% of the cases we see in crime podcasts, movies, and series get solved, with a bit of help from our other cognitive tools.

        6. Spoiler: we're shifting the center of gravity of the talk into cognitive science, and will then prove why dataflow analysis is the most "cognitive", a.k.a. human, way of reasoning, to establish its approachability: dataflow analysis's "sink-to-source" analysis is ultimately "abductive reasoning", the reverse of "deductive reasoning".

      2. Pivot into abductive reasoning -> misunderstanding -> potential
      3. But why are we even talking about the meta-cognitive level, the epistemological system behind how we reason and make sense of things? You will see why in a minute. For now, let's jump into security research and bug bounties.
  3. Security research, bug hunting, and the dataflow technique from us (humans).

    All you need is wheels (tools), a frame (a framework of thought), and an engine (a reasoning brain) to find every single possible bug in this world.

    1. What's specifically interesting behind all this complexity, from Chromium browser exploitation to the Apple 0-click exploits we've seen recently: the primitives (not memory primitives) of the "finding" process (how we do this research) are theoretically simple.
    2. e.g., look at how we find bugs in, say, the Linux kernel (fuzzing aside). Although the research methodology frameworks may differ, all researchers need is a code browser. Why? As we induced, security research back then can be generalized into these specific heuristics:
      1. We start with one specific entry point:
        1. it can be a seemingly dangerous function, a sink (e.g., kfree, memcpy, copy_from_user...)
        2. or a point of interest concluded from, for example, analyzing diff files or bug classes seen previously.
      2. We then approach this entry point with a specific technique:
        1. e.g., taint analysis (sink-to-source): manually connect the sink with a source, accounting for sanitizers that may block the path in between; if connectivity is proven, reachability is proven (while exploitability was already proven by the existence of the sink)
        2. or mutation analysis: explore the differences against the similarities with a previous vulnerability to learn whether this is an exploitable variant of that previously discovered vulnerability.
      3. Validate whether the point of interest and the plan work:
        1. for taint analysis (reverse propagation), as we mentioned, use the connectivity between the source of taint (the sink) and the source of data
        2. e.g., if program slice A-3 fails to sanitize / inspect well enough, just like previously discovered vulnerable component B-2, and shares the other traits that defined previously discovered vulnerable component B, we can argue that component A is a variant (B') of B,
        3. or just the simplest ASAN output that tells you a heap overflow exists here.
    3. Steps ii and iii are where things start to get trickier, where we're exposed to the complex relationship of what makes a vulnerability a vulnerability: its exploitability and reachability. This might start to seem problematic, but looking back at how we actually did security research, the answer lies in the only tool we used (analytically), from stack overflows in CTFs to sandbox-escape RCEs in Chromium: code browsers, and the two tools we have direct access to in VSCode
      1. Within code navigation, two tools stand out for this need, the only two symbolic tools: find definition and cross-referencing. They are exactly what logical reasoning and relationship analysis require, since these code-navigation functions act as logical tools to reason with (by finding relationships between abstractions). Definition and Reference is All You Need

        The concept: a successful sink-to-source analysis can be conducted with, and only with, two fundamental symbolic navigation tools (definition & reference) plus heuristic guidance (when to use which, when to stop). That covers the "wheels" part; a sketch follows.
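        A minimal sketch of the concept, under stated assumptions: the code index below is a hypothetical stand-in for a real LSP / code-browser backend, with reference results pre-filtered to the taint-relevant ones; only `find_definition` and `find_references` touch the code, while a small heuristic loop decides which to call and when to stop:

        ```python
        # Sink-to-source with only two symbolic tools plus heuristic guidance.
        # DEFINITIONS / REFERENCES form a toy index (hypothetical, not a real backend).
        DEFINITIONS = {
            "buf_len": "len = pkt->hdr.size",
            "pkt": "pkt = recv_packet(sock)",
        }
        REFERENCES = {          # symbols that feed data into the key symbol
            "buf_len": ["pkt"],
            "pkt": [],
        }
        SOURCES = {"recv_packet"}  # data entering from the outside world

        def find_definition(symbol):
            return DEFINITIONS.get(symbol, "")

        def find_references(symbol):
            return REFERENCES.get(symbol, [])

        def sink_to_source(tainted_symbol, max_steps=16):
            # Heuristic operator: read the definition, check for a source,
            # otherwise chase references backwards. No SAST computation involved.
            trail, work = [], [tainted_symbol]
            for _ in range(max_steps):       # heuristic: when to stop
                if not work:
                    break
                sym = work.pop()
                defn = find_definition(sym)  # tool 1: find definition
                trail.append((sym, defn))
                if any(src in defn for src in SOURCES):
                    return trail             # connectivity (reachability) proven
                work.extend(find_references(sym))  # tool 2: cross-reference
            return None

        # e.g. backtracing the length argument of a memcpy sink:
        for sym, defn in sink_to_source("buf_len"):
            print(f"{sym}: {defn}")
        ```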

  4. (Quick break, the implications at a larger scope, and the answer to the earlier question.) What can be achieved with these two tools alone is just interesting. It says there might be an angle for putting LLMs into security that we haven't seen before, one that can be particularly interesting (as mentioned previously): other than directly throwing chunks of code at LLMs, or having them play the assistant beside a SAST, using LLMs to replace the role of the SAST inside the SAST-purposed methodology system (designed for machines) can be unimaginably simple and effective.

    Just like the attention mechanism in Attention Is All You Need

    1. (Back to the main narrative): giving a banana a set of code navigation tools doesn't make it able to find heap overflows in Apple AirPlay. These tools still need a reasoning "operator" behind them deciding where to start, when to stop, and when to call "find definition" or "find references". This is what we call the fundamental heuristics.

      1. "It is said that to explain is to explain away. This maxim is nowhere so well fulfilled as in the area of computer programming, especially in what is called heuristic programming and artificial intelligence" - a quote I really like from Joseph Weizenbaum, creator of first chatbot "Eliza", while this explains the purpose and the destiny of this talk ("to explain is to explain away."), I like the parallelized relationship between "Heuristic Programming" and "Artifical Intelligence" (Putting the equal sign in between)
    2. These heuristics are the Engine behind that set of Wheels of Definition and Reference, and also the determining characteristic of what level of bug one can find.

      Two set of "heuristics" or "engine" - "Parameter-first heuristic search" to generate, explore (limit, prune to program slices with relevant, and "Abductive reasoning" (reverse-engineered way of deduction, thoughts) for pathfinding and strategic decisions making

    3. How to build the engine behind these wheels is the million-dollar question; it's only the engine that tells two "cars" apart (maybe a bad analogy here). Similar to how we discovered that code navigation tools are all a human needs, we derive the research from ourselves and look for answers (at a deeper, lower cognitive level).

      1. [5:50] *Finding the rules behind it: why we as humans do "this" specifically, what shaped the way we do it in this specific way; finding where those heuristics, patterns, inspirations, that thought about when to use which tool, came from. We might not even need the entire human brain, only the heuristics driving it ahead
      2. And this led us back to what we're most familiar with: taint analysis, the methodology that most resembles how a human mind acts in security research. Humans are dataflow engines
        1. There's plenty of philosophical reasoning on why this is so, regarding how we interpret data, and this is what's interesting: how do we interpret taint flow? How do we make that decision (the heuristics driving it)?

        2. And what's also interesting: once you put that equals sign between human and taint engine, you find the taint engine has something fun and counter-intuitive

          1. To design a turing-complete (as in identifies-all-branches) taint engine heuristically, you have to put your focus on the parameters rather than the methods (e.g., for "callers" in taint engines, the right way is finding the parameter's initializer, not the usual function/method caller), learned from how we did it ourselves (see the sketch at the end of this section)
          2. Which is the reverse of the OOP philosophy developed since the first line of C++ code.
        3. At the lowest level of a taint framework, the emphasis is on data and "potential context" (reference the BH talk) instead of the top-down execution flow. Thus everything is in reverse, just like the relationship between deduction (dataflow) and abductive reasoning (taint) (planting the seed for abductive reasoning).

          Interesting note: it's funny that the title of this talk is "Deductive Engine" even though it's more like an "abductive engine". This is a kudo to Arthur Conan Doyle's misuse of "deduction" for the abductive reasoning Sherlock has actually been using all along; we guess Conan Doyle did it on purpose for the cult of science in the Victorian era, but we also just stuck with Deductive Engine because we can't change it.

          1. What this means: when backtracing (abducting), when we trace the caller, we trace the initializer of the variable in the current namespace... (we already designed a set of heuristics for this back when preparing BH)
          2. The effectiveness of this approach was proven by experimenting with backtracing scenarios during the implementation stage of Tree-of-AST. The traditional top-down heuristic caught only around 1/4 of evenly distributed cases (equally different), while this abductive reasoning was able to catch them all (turing complete in the identifies-all-branches sense mentioned previously).
        4. This set of heuristics under abductive reasoning (reversed deduction), within the framework of taint analysis, covers pretty much every heuristic needed for driving the wheels of find definition and cross-reference. We got it by reverse-engineering ourselves.

          1. These heuristics are all we need, instead of actually getting LLMs (complex reasoning, next-token prediction machines trained on...) involved.
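
          A hedged sketch of the parameter-first heuristic against the traditional one; the `copy_data` call-site table below is a made-up toy, not real tooling:

          ```python
          # Backtracing a tainted parameter across a call boundary: chase the
          # *argument's initializer* at each call site (parameter-first) instead
          # of merely enumerating callers of the method (method-first).

          # Call sites of copy_data(dst, n): (caller, expression bound to parameter n)
          CALL_SITES = [
              ("handle_msg", "n = msg->len"),     # attacker-influenced data
              ("flush_cache", "n = CACHE_LINE"),  # compile-time constant
          ]

          def method_first_callers(func):
              # Traditional top-down heuristic: who calls this function?
              return [caller for caller, _ in CALL_SITES]

          def parameter_first_callers(func):
              # Parameter-first: what initializes the tainted argument at each site?
              # Constant-bound sites are pruned; only data-bearing branches survive.
              survivors = []
              for caller, init in CALL_SITES:
                  rhs = init.split("=")[1].strip()
                  if rhs.isupper():       # crude "is a constant macro" check
                      continue            # prune: no taint can flow through here
                  survivors.append((caller, init))
              return survivors

          print(method_first_callers("copy_data"))     # ['handle_msg', 'flush_cache']
          print(parameter_first_callers("copy_data"))  # [('handle_msg', 'n = msg->len')]
          ```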
  5. But this is only the first layer of the engine. This layer lets us generate possible states and explore the entire program without the dependency issue we mentioned at the very beginning of this talk, but it doesn't mean we've succeeded, not yet.

    1. Just like the Google DeepMind paper that I love to reference from our Black Hat talk, we have only solved the problem of generating stateful graphs through exploration, but the rest is easier: decision making and path-finding.
    2. [Based on these heuristically generated paths, we use abduction to pathfind, completing the security-research heuristics we laid out earlier for kernel security research, i.e. TBC...] (a sketch follows)
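
      A speculative sketch of this second layer, assuming the first layer already produced a state graph: work backwards from the effect (the sink) toward the best explanation (a source), best-first by joint plausibility. The graph contents and `plausibility` scores are illustrative assumptions only:

      ```python
      # Abductive path-finding over the state graph the first layer generated:
      # greedily follow the most plausible premise from effect back to cause.
      import heapq

      # state -> [(predecessor_state, plausibility 0..1)]; toy graph.
      GRAPH = {
          "sink:memcpy": [("len=hdr.size", 0.9), ("len=PAGE_SIZE", 0.1)],
          "len=hdr.size": [("hdr=parse(pkt)", 0.8)],
          "hdr=parse(pkt)": [("pkt=recv()", 0.9)],
          "len=PAGE_SIZE": [],
          "pkt=recv()": [],
      }

      def abduce_path(sink, is_source):
          # Best-first search: the explanation with the highest joint
          # plausibility is expanded first (scores negated for the min-heap).
          heap = [(-1.0, sink, [sink])]
          while heap:
              neg_score, state, path = heapq.heappop(heap)
              if is_source(state):
                  return path, -neg_score
              for prev, p in GRAPH.get(state, []):
                  heapq.heappush(heap, (neg_score * p, prev, path + [prev]))
          return None, 0.0

      path, score = abduce_path("sink:memcpy", lambda s: s == "pkt=recv()")
      print(path, round(score, 3))
      # ['sink:memcpy', 'len=hdr.size', 'hdr=parse(pkt)', 'pkt=recv()'] 0.648
      ```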
  6. This is Deductive Engine, and the only place we drew inspiration from is humans ourselves. We looked into how we did it, and we found that you don't need a completely functioning human brain to do it; a set of heuristics is enough.

  7. Based on the premises we proposed (Definition and Reference is All You Need, parameter-first heuristic search...), we deduced our way to this Deductive Engine: an environment-agnostic, generic security-research framework. Instead of taking the role of assistant, we merge LLMs right into the job supposedly done by programs, by SASTs, induced from the cognitive level of us doing security research ourselves. This enables AIs to explore far more efficiently what we failed to see, within a set of systems entirely created by us, and to do it from the lowest level, just like the attention mechanism: from simple heuristics derived from how we work, with the one tool we were born with, deductive reasoning


Thoughts: