Google Dorking in Depth

Vaibhav Kumar Srivastava
4 min read · Feb 11, 2021

In simple words, if I had to define Dorking, I would say it is a weakness in the way the Google search engine works that can be exploited and turned into a threat. You may be amused to learn that this exploitation doesn’t even require any high-tech security tools. I know it can be hard to digest that one of the most sophisticated search engines can be exploited so easily! The most surprising thing is that Dorking was documented in the early 2000s and, in 2021, it is still active. If I said that Dorking cannot be eliminated, I would unfortunately be right, because this vulnerability is itself one of the most prominent features of the Google search engine.

Now let us dive deep to understand what Google Dorking is and what impact it has. You can also visit my channel to get hands-on experience with dorking methods: Codewithvamp. In technical terms, Google Dorking means combining search operators and specific keywords into an “advanced search” query to dig up information that is present on the web but was not intended for public viewing.
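To make “a combination of operators and keywords” concrete, here is a minimal sketch that only assembles a query string. The helper name build_dork and the domain example.com are my own placeholders; site:, filetype:, intitle: and inurl: are standard Google search operators. Nothing is sent to Google here.

```python
def build_dork(keywords, site=None, filetype=None, intitle=None, inurl=None):
    """Combine plain keywords with search operators into one query string."""
    parts = []
    if site:
        parts.append(f"site:{site}")          # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results to one file extension
    if intitle:
        parts.append(f'intitle:"{intitle}"')  # phrase must appear in the page title
    if inurl:
        parts.append(f"inurl:{inurl}")        # text must appear in the URL
    parts.extend(keywords)
    return " ".join(parts)


# example.com is a placeholder; only build queries like this for sites you own.
query = build_dork(["password"], site="example.com", filetype="txt")
print(query)  # site:example.com filetype:txt password
```

Pasting a string like this into the search bar is all the “tooling” a dork needs, which is why I said no high-tech security tools are required.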

This is easier to understand with an example. Suppose you own a website with some pages that you have made public on the internet. Alongside them, you have some pages that hold the credentials and configuration required to operate your website, and these are not meant for public viewing for obvious reasons. If you haven’t taken any security measures to protect that data, there is a high probability that a person who knows Google Dorking can retrieve your hidden pages just by typing a few combinations of “keywords” into the Google search bar. Yes! I’m not kidding, it is possible, and hackers frequently use this method as part of passive reconnaissance. You will find dozens of videos on YouTube that teach the Dorking operators, but hardly anyone explains the reason behind this weird phenomenon. Let us try to understand how Google itself is responsible for this hacking method.

The answer is hidden in the question: how does Google search work? You have probably noticed that whenever you search for anything on Google, it fetches the results in a fraction of a second. It would be naive to think that Google searches the complete internet every time you hit enter. So how does Google manage to fetch results so quickly and accurately?

Well, the two major concepts behind Google search are “Web Crawlers”, which follow links and gather information, and “Indexing”, which gives faster access to the stored data. Web Crawlers are software programs built to gather information from web pages. A crawler starts by fetching a few web pages, follows the links available on those pages, and keeps going until a large heap of data is collected. You may have come across the term Indexing in Operating Systems or Database Management Systems, and it serves the same purpose here.
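The crawling loop described above can be sketched in a few lines. This is only a toy model, not how Google’s crawler is actually built: the “web” is a hypothetical in-memory dictionary (TOY_WEB), so no network requests are made, and the page names and contents are invented for illustration.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "/home": ("welcome to my site", ["/blog", "/about"]),
    "/blog": ("posts about dorking", ["/home", "/backup/config.txt"]),
    "/about": ("about the author", []),
    "/backup/config.txt": ("db_password=hunter2", []),
}


def crawl(seed, web):
    """Breadth-first crawl: fetch a page, queue its unseen links, repeat."""
    seen = {seed}
    queue = deque([seed])
    fetched = {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        fetched[url] = text
        for link in links:
            # Follow every link that exists and has not been queued yet.
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


pages = crawl("/home", TOY_WEB)
print(list(pages))  # ['/home', '/blog', '/about', '/backup/config.txt']
```

Notice that the crawler reaches /backup/config.txt simply because one page links to it. Nothing in the loop asks whether a page was meant to be public.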

Suppose you have searched for something like “vamp”. Instead of searching the whole web, Google first checks its index table for the keyword “vamp”. If the keyword is there, Google follows the links stored in the table and returns the results. If “vamp” is not present in the index table, Google does not crawl the web on the spot for you. Its Web Crawlers run continuously in the background, so a page that mentions the keyword becomes searchable only after it has been crawled and an entry has been saved in the index table. From then on, the keyword is served directly from the index.
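Here is a minimal sketch of such an index table, assuming a simple word-to-URLs mapping (often called an inverted index). The page contents and the function names build_index and search are hypothetical, and a real search engine’s index is far more elaborate. In this toy version a keyword that is absent from the index simply returns no results.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> text.
PAGES = {
    "/home": "code with vamp",
    "/blog": "vamp writes about dorking",
    "/about": "about the author",
}


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, keyword):
    """Look the keyword up in the index; no page is re-read at query time."""
    return sorted(index.get(keyword.lower(), set()))


index = build_index(PAGES)
print(search(index, "vamp"))     # ['/blog', '/home']
print(search(index, "missing"))  # []
```

The expensive work (reading every page) happens once, when the index is built. Each search is then just a dictionary lookup, which is why results can come back in a fraction of a second.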

Now, can you guess where the loophole is in this whole procedure?

Let me tell you: Web Crawlers don’t differentiate between types of data. So whether it is your web page, your password credentials, or maybe your credit card details, Web Crawlers will index anything they can reach. Because of that, Google unintentionally indexes information that was not intended for public viewing. Therefore, with a few operators and keywords, you can retrieve sensitive information.
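To see why operators are so effective against an index that stores everything, consider this small sketch. The INDEXED list, its URLs, and the run_dork helper are all hypothetical, and the filters only imitate what site: and filetype: do. This is not Google’s actual matching logic.

```python
from urllib.parse import urlparse

# Hypothetical index entries, stored without any notion of "sensitive" or "public".
INDEXED = [
    {"url": "https://example.com/index.html", "text": "welcome to our shop"},
    {"url": "https://example.com/backup/users.txt", "text": "admin password list"},
    {"url": "https://example.org/notes.txt", "text": "password tips for users"},
]


def run_dork(entries, keyword, site=None, filetype=None):
    """Return URLs that contain the keyword and pass the optional filters."""
    hits = []
    for entry in entries:
        url = entry["url"]
        if site and urlparse(url).hostname != site:
            continue  # imitates the site: operator
        if filetype and not url.endswith("." + filetype):
            continue  # imitates the filetype: operator
        if keyword.lower() in entry["text"].lower().split():
            hits.append(url)
    return hits


print(run_dork(INDEXED, "password", site="example.com", filetype="txt"))
# ['https://example.com/backup/users.txt']
```

The plain keyword “password” matches two entries, but adding the two filters narrows the result to exactly the forgotten backup file.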

There is a well-known public database called the “Google Hacking Database” (GHDB), which collects popular Dorking strings that you can use to test your own website. To protect data from Dorking, governments and companies have taken various steps, one of which is the “robots.txt” file, which tells crawlers which paths they may crawl and which they may not. Google may also temporarily block your requests or show you a CAPTCHA if you send Dorking queries too frequently from the same IP address. Whether Dorking is legal or not is a different topic to debate. But I can feel the words by McAfee: “Go, Dork, yourself !! because Hackers are already Dorking you”.
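The robots.txt rules mentioned above can be checked with Python’s standard urllib.robotparser module. The file contents and the example.com URLs below are hypothetical. Keep in mind that robots.txt is only a request to well-behaved crawlers and is itself publicly readable, so truly sensitive pages still need authentication.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network request

# Ask whether a crawler with any user agent may fetch each URL.
blog_allowed = parser.can_fetch("*", "https://example.com/blog/post.html")
backup_allowed = parser.can_fetch("*", "https://example.com/backup/config.txt")

print(blog_allowed)    # True
print(backup_allowed)  # False
```

A crawler that honours these rules will skip everything under /admin/ and /backup/, so those paths are much less likely to end up in the index.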

Stay curious, stay protected!!

Youtube Channel: Codewithvamp
