Q: How does one find vulnerabilities?
A: I'll start by noting that this question is quite high-level - e.g. it doesn't reveal the technology of interest. More importantly, it's not clear whether we're discussing a system vulnerability (i.e. a configuration weakness or a known-but-unpatched bug in an installed service) that one usually looks for during a regular network-wide pentest, or discovering a previously unknown vulnerability in an application, service, driver / kernel module, operating system, firmware, etc. Given that I'm more into vulnerability research than penetration testing, I'll assume it's the latter. Also, the answer will be as high-level as the question, but it should give one a general idea.
My personal pet theory is that there are three* main groups of methods (I'll go into more detail on each below):
* If I missed anything, please let me know in the comments; as said, it's just a pet theory (or actually a pet hypothesis).
1. Code review (this also includes code that had to be reverse-engineered).
2. Black box (this includes using automated tools like scanners, fuzzers, etc).
3. Documentation research.
All of the above methods have their own requirements and limitations, and each is better at some things than others. There is no "best method" that always works - it's target specific, I would say. In practice, a combination of the above methods is usually used during a review of a target anyway.
1. Code review
Requirements:
• Knowledge of vulnerability classes** specific to the given technology, as well as universal bug classes.
• [In case of binary targets] Reverse engineering skills.
Benefits:
• Ability to find quite complicated bugs.
Limitations:
• Takes a lot of time and focus.
** - A vulnerability class is a type of vulnerability that is usually well known, has known mitigations, and sometimes even known patterns/ways/tools to discover it. Examples include a stack-based buffer overflow, a reflected XSS or an SQL injection. Sometimes security bugs occur due to a couple of problems mixed together (a common example is an integer overflow leading to a buffer overflow). Please note that not all vulnerabilities being found have actually been classified (as in "classification", not "kept secret") - you might encounter application-specific or technology-specific bugs which don't fall into any common category. For comprehensive lists of classes please check out these three links: Common Weakness Enumeration (MITRE), Adversarial Tactics, Techniques & Common Knowledge (MITRE) and OWASP Periodic Table of Vulnerabilities (special thanks to these folks on twitter and cody for links).
The general idea is to analyze the code and try to pinpoint both errors in logic and classic vulnerability classes (e.g. XSSes, buffer overflows, wrong ACLs, etc). This method is basically as good as the researcher is - i.e. if one loses focus (and just skims through the code instead of trying to truly understand it) or a given vulnerability class is unknown to them, then a bug will be missed.
This method doesn't really scale - the smaller the project, the easier it is to do a full code review. The more lines of code there are, the more you need to limit the review to only the interesting sections (usually the ones where we expect the most bugs to be, e.g. protocol / file format parsers, places where user input is used, etc).
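To make "pinpointing classic vulnerability classes" a bit more concrete, here is a toy example of the kind of thing a reviewer looks for - an SQL injection in a small Python handler (an illustrative snippet of my own, not code from any particular project):

```python
import sqlite3

def get_user(db: sqlite3.Connection, username: str):
    # BUG: user-controlled 'username' is concatenated directly into the query,
    # so an input like  ' OR '1'='1  changes the query's logic (SQL injection).
    query = "SELECT id, email FROM users WHERE name = '" + username + "'"
    return db.execute(query).fetchone()

def get_user_fixed(db: sqlite3.Connection, username: str):
    # The usual fix: pass the value as a bound parameter, never as SQL text.
    return db.execute(
        "SELECT id, email FROM users WHERE name = ?", (username,)
    ).fetchone()
```

The recurring pattern to look for during a review is user input crossing into another "language" (SQL, HTML, a shell command, a binary length field) without being validated, escaped or bounded.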
2. Black box
Requirements:
• Knowledge of vulnerability classes specific to the given technology, as well as universal bug classes.
• [Automated black box] Knowledge of how a given tool works and how to set it up.
Benefits:
• [Automated black box] Scales well.
• [Manual black box] If you trigger a bug, you found a bug. The rate of false-positives will be limited (where e.g. during code review you might think you've found a bug, but later discover that something is checked/sanitized/escaped in a lower/upper layer of the code).
Limitations:
• [Automated black box] Scanners/fuzzers are great tools, but they are pretty much limited to low-hanging fruits. Most complicated bugs won't get discovered by them.
• [Manual black box] Takes less time than a code review, but still doesn't scale as well as e.g. automated black box.
• [Manual black box] Limited by the imagination of the researcher (i.e. a lot of complicated bugs will probably not be discovered this way).
The idea here is to "poke" at the application without looking at the code itself, focusing on the interactive side instead. This method focuses on attacking the application from the surface and observing its responses, instead of going through its internals as is the usual approach during a code review.
Manual black box is usually what I would start with when working with a web application, as it gives one a general overview of the tested target. In case of other technologies, setting up the application and poking around (checking ACLs, etc.) doesn't hurt either.
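To give a trivial example of what "poking around" can mean outside of web apps: something as simple as listing world-writable files under a service's install directory (a classic local privilege escalation smell) is already useful. A minimal POSIX-only sketch - the default path below is made up:

```python
import os
import stat
import sys

# Walk an install directory and flag world-writable files.
root = sys.argv[1] if len(sys.argv) > 1 else "/opt/some_service"  # placeholder path

for dirpath, _dirnames, filenames in os.walk(root):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            mode = os.stat(path).st_mode
        except OSError:
            continue
        if mode & stat.S_IWOTH:
            print(f"world-writable: {path}")
```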
Automated black box usually scales very well. From my experience, it's good to set up a fuzzer / scanner and let it run in the background while using another (manual) method at the same time. Also, don't forget to review the findings - depending on the technology and the tool there might be a lot of false positives or duplicates. And remember that these tools are usually limited to only a subset of classes and won't find anything beyond that.
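As an illustration of the "set it up and let it run in the background" part, a bare-bones mutation fuzzer is surprisingly little code. The sketch below assumes a seed file and a hypothetical ./target_parser binary that takes a file path as its argument, and treats being killed by a signal (POSIX) as a crash - real fuzzers (AFL++, libFuzzer, honggfuzz) add coverage feedback, smarter mutations and crash deduplication on top of this:

```python
import os
import random
import subprocess

SEED_FILE = "seed.bin"        # any valid sample input (assumed to exist)
TARGET = ["./target_parser"]  # hypothetical binary reading the file given as argv[1]

def mutate(data: bytes) -> bytes:
    buf = bytearray(data)
    # Flip a handful of random bytes - the classic "dumb" mutation strategy.
    for _ in range(random.randint(1, 8)):
        buf[random.randrange(len(buf))] = random.randrange(256)
    return bytes(buf)

def run_once(i: int, seed: bytes) -> None:
    sample = mutate(seed)
    path = f"case_{i}.bin"
    with open(path, "wb") as f:
        f.write(sample)
    proc = subprocess.run(TARGET + [path], capture_output=True)
    if proc.returncode < 0:   # negative = killed by a signal, e.g. SIGSEGV
        os.rename(path, f"crash_{i}_sig{-proc.returncode}.bin")
    else:
        os.remove(path)

if __name__ == "__main__":
    seed = open(SEED_FILE, "rb").read()
    for i in range(100_000):
        run_once(i, seed)
```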
3. Documentation research
Requirements:
• Experience as a programmer, or knowledge of how a programmer thinks.
• Knowledge of vulnerability classes specific to the given technology, as well as universal bug classes.
Benefits:
• Possibility of discovering the same vulnerability in a set of similar targets.
Limitations:
• Limited by the imagination of the researcher.
• Focuses on a single protocol / format / part of the system.
The general idea is to go through the documentation / specification / reference manual and pinpoint places where the programmer implementing the target might understand something in an incorrect way. In contrast to both code review and black box review there is no interaction with the tested implementation until later phases. To give you an example, please consider the following:
Specification of a format says:
The output codes are of variable length, starting at <code size>+1 bits per code, up to 12 bits per code.
What some programmers might think (e.g. when having a bad day or due to lack of experience):
No need to check the code's length - it's guaranteed to be 12 bits or less.
The correct interpretation:
Verify that the length is 12 or less, otherwise bad things may happen.
And yes, this is a real example from the GIF specification and vulnerabilities related to this were discovered.
Having found an interesting bit, one usually creates a set of inputs that break the rule specified in the documentation and then uses them to test a given implementation (and perhaps other implementations as well) to check if any problem is triggered (alternatively, one can also do a selective code review to check whether the programmer got this right).
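As a sketch of that process for the GIF example above: starting from a valid file, one can generate variants that violate the documented code-size limit and feed them to different decoders. The helper below is a simplified illustration of my own - it assumes a minimal single-image GIF with a global color table and no extension blocks, so it's a quick test-case generator, not a robust GIF parser:

```python
def patch_lzw_min_code_size(gif: bytes, new_value: int) -> bytes:
    """Overwrite the LZW minimum code size byte of a minimal GIF.

    Assumes a GIF87a/89a file with a global color table, a single image
    descriptor and no extension blocks - a deliberately simplified layout."""
    assert gif[:3] == b"GIF"
    packed = gif[10]                        # packed field of the Logical Screen Descriptor
    offset = 13                             # first byte after the LSD
    if packed & 0x80:                       # global color table present
        offset += 3 * (2 << (packed & 0x07))
    assert gif[offset] == 0x2C              # image descriptor (',')
    offset += 10                            # skip the 10-byte image descriptor
    out = bytearray(gif)
    out[offset] = new_value                 # the "<code size>" the spec talks about
    return bytes(out)

if __name__ == "__main__":
    original = open("valid.gif", "rb").read()   # any valid, minimal GIF
    for bad in (0, 12, 13, 255):                # edge values at and beyond the 12-bit limit
        with open(f"bad_code_size_{bad}.gif", "wb") as f:
            f.write(patch_lzw_min_code_size(original, bad))
```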
In some cases it's possible to automate browsing the documentation as was shown e.g. by Mateusz "j00ru" Jurczyk and (a few years later) by James Forshaw.
Final notes
Please remember that once you think you've discovered a vulnerability the work is not yet done. Usually the following steps need to be taken:
0. [Prerequisite] A potential vulnerability must be discovered.
1. [Usually not needed in black box review] An input triggering the vulnerability must be created to make sure the code path to the vulnerability is actually reachable (it happens surprisingly often that at this phase it turns out the vulnerability is not triggerable for various reasons).
2. A proof of concept exploit (commonly referred to as a PoC) must be created to prove the actual effect of the vulnerability (is it really code execution? or maybe for some reason its effects are limited to a denial of service?).
3. [Optional] During penetration testing one usually would also create a fully weaponized exploit (spawning a shell, limiting the damage to the process, etc).
And also, please remember that not every bug is a security bug. The rule of thumb here is the following:
• A security bug (i.e. a vulnerability) breaks a context boundary (e.g. it is triggerable from a low-privileged context but affects a high-privileged context that is normally inaccessible to the attacker).
• A non-security bug stays within the same context (e.g. it influences only the attacker's domain - things the attacker could do anyway or things that affect only the attacker).
One more note is that even security bugs are not always the end of the world (hint: when reporting a bug, don't try to oversell it) - there are several more things one needs to take into consideration, like the severity (in the worst-case scenario, what can a skilled attacker do with such a bug?) and the risk of exploitation (i.e. will anyone even care to exploit this in the real world? or maybe one has to make a billion attempts before hitting the correct conditions, and with each attempt the server reboots?). An example of a high-severity, high-risk bug is a stable remote code execution in the Apache web server. On the other hand, an example of a low-severity, low-risk bug is changing the UI language in a web framework used by 10 people by exploiting an XSRF that requires guessing a 16-bit number (sure, an attacker could do it, but why bother?). A common mistake made by junior researchers (and this includes me in the past as well) is to claim high severity and high risk for every bug, when that's obviously not the case.
And here you go - a high-level answer to a high-level question.
In practice, start by learning vulnerability classes specific to the technology you're most interested in (and in time extend this to other technologies and universal vulnerability classes as well, of course), and then try all of the above methods or a mix of them. And don't be afraid to dedicate A LOT of time to this - even great researchers might go weeks without finding anything (though one still learns through the process), so don't expect to find a good bug in a few hours.
Comments:
Great article. One thing that I am missing here, connected to code reviews and a bit to automation, is using tools that analyze the AST (Abstract Syntax Tree) of a given program.
I have tested a commercial tool like that on some web apps at my recent job and it turned out it can work pretty well. After it parsed the AST it found all the possible inputs - request query params, headers, request body and so on. Then it tracked every place where those were used and pointed out e.g. "here is a parameter that goes into an SQL query without being parameterized or sanitized".
Of course it had a database of the standard ways of doing things right for given web frameworks/libraries, e.g. how to make a parameterized SQL query or how to sanitize/validate data before deserializing it or rendering it to the user. One could also add more rules - either for classifying new bugs or for marking particular methods as sanitization/validation against particular bugs.
I think there is a lot of room for improvement here in terms of compilers and security linters (*looks at Rust, which seems to fix a lot of the places where a programmer can hurt themselves when writing native code*).
Btw, an example of such a security linter is Bandit - https://github.com/openstack/bandit - not as robust as what I described above, as it's for Python, where it's much harder to analyze everything statically, but still - good to include in CI builds or to use when doing a review/audit of the code.
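To sketch the mechanism with nothing but Python's built-in ast module: a toy rule that flags execute() calls whose query argument is built dynamically instead of being a constant string with bound parameters. This is a made-up mini-linter, nowhere near the data-flow tracking described above, and it will produce false positives - but it shows where tools like the one above (or Bandit) start from:

```python
import ast
import sys

class SqlStringCheck(ast.NodeVisitor):
    """Flag <something>.execute(...) calls whose first argument isn't a constant string."""

    def __init__(self, filename: str):
        self.filename = filename

    def visit_Call(self, node: ast.Call) -> None:
        is_execute = isinstance(node.func, ast.Attribute) and node.func.attr == "execute"
        if is_execute and node.args and not isinstance(node.args[0], ast.Constant):
            # f-strings, '+' concatenation, .format() or plain variables all land here,
            # i.e. the query text itself may contain user input - worth a manual look.
            print(f"{self.filename}:{node.lineno}: execute() called with a dynamically built query")
        self.generic_visit(node)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            SqlStringCheck(path).visit(ast.parse(f.read(), filename=path))
```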
Good point and I agree. I didn't mention any ways to automate code reviews, though these of course exist.
That said, I did intend to keep this answer high-level and not go into specific solutions or tools that can be used for a review of a product using specific technology. Feel free to do it in the comments if you like of course.
Just check TAOSSA's TOC.
A good question - I've added an explanation to the post (see the ** footnote above; the links there are clickable, of course).
What is your go-to tool set for this?
Personally I use a mix of Chrome dev tools, Fiddler (though burp/zap would work too) and ad-hoc Python scripts.
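For reference, a typical "ad-hoc Python script" here is often nothing more than replaying one interesting request with a handful of tampered values and eyeballing the differences. A minimal sketch (it assumes the requests library; the endpoint, cookie and payload list are made up):

```python
import requests

URL = "https://target.example/search"   # placeholder endpoint
PAYLOADS = ["test", "'", '"><script>alert(1)</script>', "../../etc/passwd"]

session = requests.Session()
session.cookies.set("session", "COOKIE_COPIED_FROM_DEV_TOOLS")

for payload in PAYLOADS:
    resp = session.get(URL, params={"q": payload}, timeout=10)
    # Compare status codes, response lengths and whether the payload comes back
    # unescaped (a quick reflected-XSS indicator).
    print(f"{resp.status_code} len={len(resp.text)} reflected={payload in resp.text} q={payload!r}")
```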
Ah, I need to translate that old blog post about math...
You can try your luck with google translate: https://gynvael.coldwind.pl/?id=428
But long story short: no, you don't, unless you're doing crypto.
But there are some areas like probability or statistics that are useful (even if only on an intuitive level), so that one knows e.g. that brute-forcing 128 bits is a no-go while 32 bits is usually bruteforceable (and can calculate how long that will actually take).
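The back-of-the-envelope calculation is really simple - e.g., assuming a made-up round rate of 10^9 guesses per second:

```python
RATE = 10**9                          # assumed: one billion guesses per second
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

for bits in (32, 64, 128):
    seconds = 2**bits / RATE
    if seconds < SECONDS_PER_YEAR:
        print(f"{bits}-bit keyspace: ~{seconds:.1f} seconds")
    else:
        print(f"{bits}-bit keyspace: ~{seconds / SECONDS_PER_YEAR:.1e} years")
```

Which comes out to roughly 4.3 seconds for 32 bits and on the order of 10^22 years for 128 bits - hence "a no-go".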