Advanced Crawling: A Crucial Step in Dynamic Web Application Security Testing

By Jasper Libranda Manuel | September 11, 2022

In the early days of the internet, most websites were simple and static. Protecting them from cyberattacks was easier, and most visitors did not care much about security anyway, since these sites handled little sensitive information.

Today, there seems to be a web application for almost everything, from e-commerce, banking, and finance to real estate, healthcare, and entertainment. You name it, and you will most likely find a web application for it. The more we use these applications to handle personal and business information, the more attractive they become to attackers trying to steal the valuable data and credentials that reside in them.

From the early basic static websites to today’s complex web applications, web development has come a long way and has transformed into a technological marvel. Dynamic web applications are now powered by stacks of complex technologies to handle huge amounts of data. This makes proactively securing modern applications more complex, and each layer adds potential attack surface for cyber adversaries to target. Because of this digital transformation and web application evolution, automated dynamic web application security testing tools, like FortiPenTest, are more relevant than ever. FortiPenTest is built with the unique needs of dynamic web application security testing in mind.

What is FortiPenTest?

FortiPenTest is a cloud-native penetration-testing-as-a-service tool. It can be used to find issues before they are exploited by cyber adversaries. FortiPenTest leverages extensive FortiGuard Labs threat research and data to test target systems for security vulnerabilities.

The Role of a Crawler in Web Application Security Testing

Crawlers systematically browse a target website or application to search for documents and resources. While crawlers, sometimes called spiders, are best known for their use by search engines in web indexing, crawlers in the web security testing context are used to find potential attack vectors.

Each URL in a web application can be a potential entry point for attacks that take advantage of underlying system weaknesses. For this reason, a good crawler should be able to find as many URLs as possible in a web application to minimize blind spots in the security testing. Imagine that only one URL in a web application has vulnerabilities, and the crawler of a penetration testing tool is not able to find it. For that application, the tool is useless even if it has the best vulnerability coverage.

The Need for an Advanced Crawler

Today, JavaScript-heavy web applications are becoming more and more prevalent. Single Page Applications (SPAs), for example, are a popular choice for building web applications. They offer a great user experience in the browser with no page reloading, loading new information into a single page upon user request instead of loading other pages. The dynamic nature of an SPA's content makes it difficult for crawlers that use older technologies to crawl these applications properly.

Grepping for links in flat static files is a fundamental technique in web crawling, and it is still very useful today. However, a page of a dynamic web application may not contain any URLs or links to other pages at all. Instead, JavaScript event handlers may have been attached to the page’s elements. Crawlers that rely on this grepping technique alone therefore miss a big part of the attack surface, and tools that use these crawlers potentially miss vulnerabilities as well.
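To make the limitation concrete, here is a minimal sketch of link extraction from static HTML using only the Python standard library. The `LinkCollector` class, the `extract_links` helper, the `app.example` host, and both sample pages are hypothetical and exist only for illustration; this is not how any particular crawler is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects URLs found in href, src, and action attributes."""

    URL_ATTRS = {"href", "src", "action"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                # Resolve relative links against the page URL.
                self.links.add(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return sorted(parser.links)


# Hypothetical pages: one with ordinary links, one with only a button
# whose navigation is wired up elsewhere in JavaScript.
STATIC_PAGE = '<a href="/about">About</a> <form action="/search"></form>'
DYNAMIC_PAGE = '<button id="reports" class="menu-item">Reports</button>'

static_links = extract_links(STATIC_PAGE, "https://app.example/")
dynamic_links = extract_links(DYNAMIC_PAGE, "https://app.example/")
print(static_links)   # both links are found
print(dynamic_links)  # nothing to find: the markup carries no URL
```

The second page yields an empty list even though clicking the button would navigate somewhere, which is exactly the blind spot described above.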

For example, in the figure below, the button on the left menu brings us to the path /flooranalytics when clicked. When inspecting the source of the dynamically loaded content, nowhere in the source can you find a link to the /flooranalytics path.

Fig 1. Inspecting the source of the “Floor Analytics” button element

This is because an event handler attached to the “Floor Analytics” button element generates the path to visit when the button is clicked. What if we grep for links in the JavaScript file instead? That might work, but not always. Most web applications nowadays minify and obfuscate their JavaScript. The purpose is usually not to conceal the code but to optimize it and make it lighter, yet the result is harder to search for paths. A good crawler should be able to identify these events and capture the paths they generate when triggered.
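The sketch below illustrates why grepping a JavaScript bundle only sometimes works. The regular expression, the `grep_paths` helper, and the minified snippet are assumptions made up for this example; the real application's code is not shown in this article. One path appears as a string literal, while the other is assembled at run time.

```python
import re

# Matches quoted string literals that look like absolute paths, e.g. "/orders".
PATH_LITERAL = re.compile(r"""["'](/[A-Za-z0-9_\-./]+)["']""")


def grep_paths(js_source):
    """Return the path-like string literals found in a JavaScript source."""
    return sorted(set(PATH_LITERAL.findall(js_source)))


# Hypothetical minified bundle: a() navigates to a literal path,
# b() builds its path from pieces when the handler runs.
MINIFIED_JS = 'function a(){n("/dashboard")}function b(t){n("/"+t.k+"analytics")}'

found_paths = grep_paths(MINIFIED_JS)
print(found_paths)
```

Only the literal path is reported. A path such as /flooranalytics that exists only after the handler executes never appears as a single string in the source, so pattern matching cannot recover it.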

Another issue for web crawlers is duplicate pages, which can send a poorly designed crawler into an infinite loop. One example is a web application with different language versions (English, Chinese, Japanese, etc.). Each version has the same paths as the others, so crawling more than one is unnecessary. Redirections to a “page not found” page can cause the same issue. Some web applications return HTTP status code 200 with a “page not found” page instead of returning HTTP status code 404 when a path does not exist. When a crawler brute-forces hidden paths from a word list, most of the generated paths don’t exist. A crawler should be able to detect that the page loaded for one non-existent path is the same page loaded for the others, so that it is not crawled again.
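One simple way to approach this, sketched here under stated assumptions, is to fingerprint page content and compare each response against a baseline taken from a path that almost certainly does not exist. The `fingerprint` and `find_real_paths` functions, the probe path, and the fake site are all hypothetical. Exact hashing is a deliberate simplification: real pages often embed timestamps or tokens, so a practical crawler would likely need a fuzzier similarity measure.

```python
import hashlib
import re


def fingerprint(html):
    """Hash the page body after collapsing whitespace and letter case."""
    normalized = re.sub(r"\s+", " ", html).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_real_paths(fetch, candidates, probe_path="/this-path-should-not-exist-7f3a"):
    """Return candidates whose content is neither a soft 404 nor a duplicate."""
    # Baseline: what the site serves for a path that should not exist.
    seen = {fingerprint(fetch(probe_path)[1])}
    real = []
    for path in candidates:
        status, body = fetch(path)
        if status != 200:
            continue
        fp = fingerprint(body)
        if fp in seen:
            continue  # soft 404, or same content as a page already kept
        seen.add(fp)
        real.append(path)
    return real


# Hypothetical site that answers every unknown path with HTTP 200.
PAGES = {
    "/admin": "<h1>Admin login</h1>",
    "/admin/index": "<h1>Admin  Login</h1>",  # same page, trivially different markup
}


def fake_fetch(path):
    return 200, PAGES.get(path, "<h1>Page not found</h1>")


real_paths = find_real_paths(fake_fetch, ["/backup", "/admin", "/admin/index", "/old"])
print(real_paths)
```

Here only /admin survives: the two guessed paths match the "page not found" baseline, and /admin/index is recognized as a copy of a page already seen.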

Speaking of redirections, the crawler sometimes also needs to report every path in a chain of multiple redirections. There are many ways to perform a redirection, and crawlers that simply use an HTTP library with the follow-redirects option turned on do not capture all of these paths.
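As an illustration, the sketch below follows a chain one hop at a time and records every URL along the way. It handles two redirection mechanisms, HTTP 3xx responses with a Location header and HTML meta refresh tags, and ignores others such as JavaScript-driven navigation. The `redirect_chain` function, the `fetch` callback signature, and the sample responses are assumptions for this example, and header lookup is simplified to an exact-case dictionary key.

```python
import re
from urllib.parse import urljoin

# Captures the target of <meta http-equiv="refresh" content="0; url=...">.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']refresh["\'][^>]+content=["\'][^"\']*url=([^"\'>]+)',
    re.IGNORECASE,
)


def redirect_chain(fetch, start_url, max_hops=10):
    """Follow redirects hop by hop and return every URL visited, in order."""
    chain, url = [start_url], start_url
    for _ in range(max_hops):
        status, headers, body = fetch(url)
        if status in (301, 302, 303, 307, 308) and "Location" in headers:
            target = headers["Location"]
        else:
            match = META_REFRESH.search(body)
            if not match:
                break  # final page reached
            target = match.group(1).strip()
        url = urljoin(url, target)
        if url in chain:
            break  # redirect loop
        chain.append(url)
    return chain


# Hypothetical responses: an HTTP redirect followed by a meta refresh.
RESPONSES = {
    "https://app.example/old": (302, {"Location": "/login"}, ""),
    "https://app.example/login": (
        200, {}, '<meta http-equiv="refresh" content="0; url=/sso/start">'),
    "https://app.example/sso/start": (200, {}, "<h1>Sign in</h1>"),
}

chain = redirect_chain(lambda url: RESPONSES[url], "https://app.example/old")
print(chain)
```

An HTTP client that only follows Location headers would stop at /login and never report /sso/start, because the second hop is expressed in the page body rather than in the response status.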

Lastly, although there are more difficulties in building a good crawler than those covered here, a crawler should be able to discover web API endpoints. In an SPA, most of the traffic consists of calls to web API endpoints, so these are very important points to test for weaknesses. A target web application may not always have an API definition file, and when it does, the file is sometimes incomplete. A good crawler needs to discover web API endpoints and determine their parameters even if no API definition is provided.
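One possible approach, shown here only as a sketch, is to observe the requests the application makes while it is being crawled and then group them by method and path, collecting the parameter names seen in query strings and JSON bodies. The `summarize_api_calls` function, the request tuple format, and the captured traffic are hypothetical and are not a description of how FortiPenTest works.

```python
import json
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def summarize_api_calls(requests):
    """Group observed requests by (method, path) and collect parameter names."""
    endpoints = defaultdict(set)
    for method, url, body in requests:
        parts = urlsplit(url)
        # Parameter names from the query string.
        params = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
        # Top-level keys from a JSON object body, if there is one.
        if body:
            try:
                payload = json.loads(body)
            except ValueError:
                payload = None
            if isinstance(payload, dict):
                params.update(payload.keys())
        endpoints[(method.upper(), parts.path)].update(params)
    return {key: sorted(names) for key, names in endpoints.items()}


# Hypothetical traffic captured while the application runs in a browser.
OBSERVED = [
    ("GET", "https://app.example/api/floors?building=3&page=1", None),
    ("GET", "https://app.example/api/floors?building=4&sort=name", None),
    ("POST", "https://app.example/api/floors", '{"name": "Lobby", "level": 0}'),
]

api_map = summarize_api_calls(OBSERVED)
for endpoint, names in api_map.items():
    print(endpoint, names)
```

Merging parameters across several calls to the same endpoint matters because any single request usually exercises only a subset of what the endpoint accepts.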

FortiPenTest Advanced Crawler

The FortiPenTest crawler is built to address all the above issues and more. You don’t have to worry if an application requires authentication as FortiPenTest can be easily configured to fully access protected web applications. With this, the crawler can do authenticated crawling.

The FortiPenTest crawler simulates real user interaction with web application pages in the browser: clicking a button, selecting an item in a drop-down menu, or entering text in a form. By analyzing the output of all these interactions, it gathers the information it needs on every possible URL path that can be tested for weaknesses.

Evolving Your Web Application Security Testing with FortiPenTest

Web development has come a long way. Basic static websites have given way to modern web applications that are JavaScript-heavy and dynamic, which prevents them from being fully scanned by security scanners that rely on older crawling techniques.

FortiPenTest is built with the complexities of scanning modern applications in mind. Its crawling and scanning methods behave the way real users, penetration testers, and hackers would when interacting with the target web application.

The FortiPenTest crawler plays a crucial role in making sure that all possible entry points of attacks are known. With its advanced crawling capability, all branches and paths in a web application can be accurately scanned.  

Start doing Dynamic Web Application Security Testing now with FortiPenTest to see if web applications are vulnerable to attacks and to know what to do about it.