https://github.com/angr/angr uses a concolic execution engine: it can run a binary concretely, break, then mark an input as unknown and solve for what that input should be to trigger a different breakpoint. E.g. what should the “password” pointer be pointing to in order to trigger the “you’re in” branch of code.
Note: it still can’t reverse hashes. If you try to reverse md5 using this approach it’ll consume petabytes of RAM.
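To make the password example concrete, here is a toy sketch in plain Python. It is not angr's API; `KEY`, `TARGET`, and the per-byte XOR check are hypothetical stand-ins for what a binary might do. The point is that when each branch constrains one byte on its own, the solver needs about 256 × n guesses rather than 256ⁿ. A hash like md5 mixes every input bit into every output bit, so the constraints don't split apart like this.

```python
from typing import Callable, List, Optional

# Hypothetical check: each input byte is XORed with a key byte and compared
# on its own branch. KEY and TARGET are made up for this illustration.
KEY = b"\x13\x37\x42\x99\x07\x5a"
TARGET = bytes(b ^ k for b, k in zip(b"sesame", KEY))


def branch_conditions() -> List[Callable[[int], bool]]:
    """One predicate per branch on the path to the "you're in" block."""
    return [lambda byte, k=k, t=t: (byte ^ k) == t for k, t in zip(KEY, TARGET)]


def solve_path(conditions: List[Callable[[int], bool]]) -> Optional[bytes]:
    """Solve each branch constraint independently, one byte at a time."""
    solution = bytearray()
    for cond in conditions:
        match = next((b for b in range(256) if cond(b)), None)
        if match is None:
            return None  # this path is infeasible
        solution.append(match)
    return bytes(solution)


def check_password(candidate: bytes) -> bool:
    """Concrete version of the same check, as the binary would run it."""
    return len(candidate) == len(TARGET) and all(
        (b ^ k) == t for b, k, t in zip(candidate, KEY, TARGET)
    )


password = solve_path(branch_conditions())
print(password, check_password(password))
```

A real engine hands constraints like these to an SMT solver instead of looping over byte values, but the shape of the problem is the same.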
I think radare2 was looking into integrating with angr but I don’t know the status of the integration.
Any trends in the world of security research or malware analysis that security-minded people should be paying attention to?
Concolic execution research is speeding up, though slowly. At CGC we showed that automation could find plenty of 90s-era vulnerabilities. At the same time, at the end of CGC the best machine was pitted against humans in the DEFCON capture the flag, and it placed second-to-last. So old-school vulnerabilities can now be found automatically, but we also have all-purpose mitigations for them now, like no-execute memory pages, stack canaries, and address-space layout randomization. Once automation is able to reason about those general-purpose mitigations we will probably see many zero-days in existing code bases. I think that day is about 10 years away.
Thanks for doing this AMA!
Without revealing too much, who are your customers, or is it purely research-based?
A second question: is the code often vulnerable because of programming languages that have “known” problems, or do the problems mostly come from bad coding habits?
I was associated with this so you can infer clients from there.
Overall - no, even memory-safe languages can let you write vulnerable code. Heck, even SQL, which is a database query language, can have SQL injections. Developers write code to reason over infinite possible data. We can’t reason over infinite data, so we make assumptions about it. Vulnerabilities happen when our assumptions can be broken. Theoretically, if you formalize all of your assumptions you can have a computer check whether those assumptions hold, but then what if you forgot to list an assumption? There is an infinite number of possible assumptions too, so even fully formalized approaches can’t help you 100% (though they can make your code a lot more resilient).
Better coding practices essentially help developers manage assumptions better. But what happens if the requirements changed and you didn’t account for old assumptions in the new code? Or what if you’re the new developer and you don’t know what assumptions the code holds? It’s hard. Automation can make it easier, but I doubt we’ll ever get 100% vulnerability-free code.
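The SQL injection case is a small example of a broken assumption. Here is a minimal sketch using Python's built-in `sqlite3`; the `users` table and both lookup functions are made up for illustration. The unsafe version quietly assumes a name never contains a quote character, and the payload breaks exactly that assumption.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def lookup_unsafe(name: str) -> list:
    # Hidden assumption: `name` never contains a single quote.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(name: str) -> list:
    # Parameter binding drops the assumption: the value is never parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


payload = "nobody' OR '1'='1"
print(sorted(lookup_unsafe(payload)))  # every secret leaks
print(lookup_safe(payload))            # no user has that literal name
```

Python here is memory-safe and it makes no difference; the bug lives entirely in the assumption about the data.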
What was your journey like getting into this as a career? What have been some of the toughest challenges you’ve faced as a researcher? Why did you specialise in automated binary analysis?
I think I technically started by trying to cheat in Diablo 1 using cheat’o’matic when I was 12😅 Then I started learning programming, and I got an electrical engineering bachelor’s, which got my understanding close to the wires inside of the CPU. Then I got my PhD in engineering with a concentration in cyber security. I think my toughest challenge was just the sheer amount of domain-specific research there is in binary analysis. For example, preventing stack overflows, SQL injections, cross-site scripting, or unauthorized access are all completely disjoint areas.
One DARPA PM said that binary analysis feels like using an electron tunneling microscope to scan a whole baseball field and trying to figure out the rules of baseball based on the scans.
I’m an incident responder/malware analyst. Mostly do static analysis and reverse engineering. What would you say the benefit of your research and this binary analysis is compared to other offerings? What do you do about highly obfuscated or ‘benign’ looking binaries that aren’t?
I’m not too sure about the chain of command during incident response. Theoretically this research is going to make finding vulnerabilities and attack vectors easier. Once you have the malicious binary (and once we have solved some open problems) you can ask “what input caused this malicious binary to call ptrace” and the automation will say “if socket X read ‘write \0\0\0 to stdin of pid 3738’ then the binary will eventually call ptrace”. The analysis is dynamic and works on stripped binaries, so generally obfuscation isn’t a concern. Currently the biggest challenge is variable-sized loops where the size is symbolic (as in, the path to ptrace depends on the iteration count). The automation needs domain-specific knowledge about reasoning over variable-sized loops. (E.g. the automation needs to be taught how to invert strlen().)
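A rough sketch of what “inverting strlen()” means, in plain Python with made-up helper names (this is not how any particular engine implements it). The loop in `c_strlen` has one exit per possible length, so a naive explorer has to try one path per iteration count. A domain-specific summary builds the buffer for a requested length directly.

```python
from typing import Optional, Tuple


def c_strlen(buf: bytes) -> int:
    """Concrete strlen: count bytes before the first NUL."""
    n = 0
    while n < len(buf) and buf[n] != 0:
        n += 1  # each iteration is another branch for a symbolic engine
    return n


def invert_strlen(length: int, fill: int = 0x41) -> bytes:
    """Summary: return a buffer whose strlen is exactly `length`."""
    if length < 0 or not 1 <= fill <= 255:
        raise ValueError("length must be >= 0 and fill a non-NUL byte")
    return bytes([fill]) * length + b"\x00"


def invert_by_enumeration(length: int, max_len: int) -> Tuple[Optional[bytes], int]:
    """Naive stand-in for path forking: one candidate per loop exit."""
    tried = 0
    for n in range(max_len + 1):
        tried += 1
        candidate = b"A" * n + b"\x00"
        if c_strlen(candidate) == length:
            return candidate, tried
    return None, tried


buf = invert_strlen(3738)
print(c_strlen(buf))
print(invert_by_enumeration(5, 100)[1])  # candidates tried before a match
```

With one loop the enumeration is only linear, but nested or chained symbolic loops multiply the path count, which is where the summaries start to matter.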
deleted by creator
deleted by creator
Uh… I think I agree, but… wrong thread?
Yes, thank you! My screen hiccuped and I don’t know how my comment landed here!