Google needs no introduction: it is quite possibly the most powerful search engine in use today, and many of us even use it just to check our connectivity. Yet the power behind that single search bar has become a source of concern for many people, and for those it has not, it should; we will see why!
This non-exhaustive list of solutions may help you protect yourself against search engines, and against Google in particular. Be very careful, though, when changing how Googlebot (or any other search engine crawler) sees your website, or your pages may disappear from the search results completely!
Google queries are not case sensitive, so lower case, upper case, or any combination of the two makes no difference: Security, SECURITY and SeCuriTY return exactly the same results. The one exception to this rule is logical operators.
Logical operators and symbols
Google understands three logical operators: AND, NOT and OR. They must be written in upper case: Google recognizes "OR" as the operator, but treats "Or", "oR" and "or" as ordinary search keywords.
- The AND operator includes more than one keyword in a single search query and can be replaced by a single space " ", although the results of the two forms differ slightly, as you can see by searching for "reverse AND engineering AND tutorials" and then "reverse engineering tutorials".
- The NOT operator is extremely useful for eliminating keywords from the results of a query. It is equivalent to the minus sign "-" placed directly in front of a keyword. To see the effect, try searching for "email service" and then "email service -marketing" (note that there is no space between "-" and "marketing").
- The OR operator returns results that contain either one keyword or the other (pages containing both also match), and is equivalent to "|"; for example, "reverse OR engineering" means exactly the same to Google as "reverse|engineering" (try it, then try "reverse engineering" to see the difference).
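The three operators above can be sketched as a small query builder. This is only an illustration: the function name build_query and its parameters are hypothetical, and the search URL is the familiar public one rather than any official API.

```python
from urllib.parse import quote_plus


def build_query(all_of=(), any_of=(), none_of=()):
    """Compose a search string from required, alternative and excluded keywords."""
    parts = list(all_of)  # a plain space between keywords acts like AND
    if any_of:
        # OR must be upper case to be treated as an operator
        parts.append("(" + " OR ".join(any_of) + ")")
    # the minus sign is attached to the keyword, with no space in between
    parts.extend("-" + word for word in none_of)
    return " ".join(parts)


query = build_query(all_of=["email", "service"], none_of=["marketing"])
print(query)
print("https://www.google.com/search?q=" + quote_plus(query))
```

The first line printed is the "email service -marketing" query from the list above; the second is the same query URL-encoded so it can be pasted into a browser.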
In addition to these operators, Google gives a special meaning to some symbols, such as ~, +, * and "".
- Using the tilde "~"
This little character includes in the results of a query the desired keyword, its synonyms and words similar to it. For example, if you search for "it security ~tools", the results will be more comprehensive than those for "it security tools", since Google will also consider terms such as "Software" and show them among the returned results.
- Using the plus sign "+"
Google tends to ignore punctuation and to drop little words like "we", "the", "to" and "of". Putting a plus sign before a word tells Google to keep it in the search query, so, for instance, the results of the query "security is never complete" will definitely differ from those of "security +is never complete".
- Use of quotation marks "" (or exact phrase search)
If you are sure that you have typed a word correctly but Google keeps suggesting spelling corrections, or if you want to search for a phrase, a quote or an error message, putting your query between quotation marks gives you more relevant results. For example, try searching for "Debugging DLLs" with and without the quotes.
- Using the asterisk "*", also called the wildcard or joker
The wildcard helps a lot when you want to search for something but one or more words are missing (it is generally used with exact phrase search). For example, if you want to find the title of the movie "Get the Gringo" but you only remember "Get The", you can try "Get The * movie"; try also "the art of *" hacking book.
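Exact-phrase search and the wildcard are often used together, as in the last example. A minimal sketch of that combination follows; the helper names exact_phrase and phrase_with_gap are made up for illustration.

```python
def exact_phrase(phrase):
    """Wrap a phrase in double quotes for an exact-phrase search."""
    # Strip embedded double quotes so the phrase stays one quoted unit
    return '"' + phrase.replace('"', "") + '"'


def phrase_with_gap(before, after=""):
    """Exact phrase with an asterisk standing in for the forgotten word(s)."""
    words = [before.strip(), "*", after.strip()]
    return exact_phrase(" ".join(w for w in words if w))


print(exact_phrase("Debugging DLLs"))
print(phrase_with_gap("the art of") + " hacking book")
```

The second call rebuilds the "the art of *" hacking book query: the quoted part must match as a phrase, while the remaining words are ordinary keywords.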
Now that we know a little more about how the Google search bar interprets what we type in, let's see some more interesting operators and keywords, especially when talking about security!
The define: query returns the definition of the given word from the most reliable sources (websites). For example: define:Security
Using filetype: you can restrict your search to files with a specific extension. Note that there is no space between filetype: and the following word; for example, we can search for database backups using "backup filetype:sql".
The ext: operator plays more or less the same role as filetype: above, except that using ext: to look for uncommon extensions (like dmp, ks, key …) returns deeper and more accurate results.
The intitle: keyword lets you search for a single word or a whole phrase in the title of web pages, and it is commonly used to find directory listings. For example: intitle:index of "Last modified"
You can also use allintitle:keyword1 keyword2 keyword3 … to find results with all these different elements / keywords in web page titles.
Like intitle and allintitle, inurl and allinurl can be used to find one or more keywords present in web page URLs. This operator is widely used and can reveal a lot of sensitive information, as with the query inurl:cgi-bin/etc/
Intext:keyword / Allintext:keyword1 keyword2 keyword3 …
Allintext and intext search for keywords present in the body of web pages or documents, and can be very helpful for finding interesting things, for example: allintext:"Control Panel" "login"
The site: keyword restricts the results to a particular domain or website. You can filter by top-level domain (site:com, site:fr, site:gov …) or limit your query to a specific website: "reverse engineering site:infosecinstitute.com"
Once a website has been indexed by Google, there is a good chance that a copy is kept in the Google cache, so the cache: operator can return old information even after the website has been updated, or in some cases even if the website is no longer available.
Info:www.site.com
This query returns links to pages containing information about the website or web page in question. For example info:infosecinstitute.com
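All the operators described so far share one syntax rule: the operator name, a colon, then the value, with no space in between. The sketch below shows one way to enforce that when composing queries; the helpers op and combine are hypothetical names, and quoting multi-word values is a choice made here, not something Google requires.

```python
def op(name, value):
    """Format one operator term; there must be no space after the colon."""
    value = value.strip()
    if " " in value:
        value = '"' + value + '"'  # multi-word values are quoted as a phrase
    return name.lower() + ":" + value


def combine(*terms):
    """Join keywords and operator terms into a single query string."""
    return " ".join(t for t in terms if t)


print(combine("backup", op("filetype", "sql")))
print(combine("reverse engineering", op("site", "infosecinstitute.com")))
print(op("intitle", "index of"))
```

The first two lines printed are the filetype and site examples from the text; the third shows a multi-word title search written as a quoted phrase.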
Google is not only good at finding stuff, it can even do math!
So far there is nothing bad here, but we will see that by combining different operators and keywords, and by knowing exactly what we want to find, the results usually exceed our expectations, especially when we are looking for vulnerabilities or some "private" data. This is conventionally called Google Hacking.
According to the Wikipedia definition, Google hacking involves using advanced operators in the Google search engine to locate specific strings of text within search results. Some of the more popular examples are finding specific versions of vulnerable web applications. The following search query would locate all web pages that have that particular text contained within them. It is normal for default installations of applications to include their running version in every page they serve, e.g., "Powered by XOOPS 2.2.3 Final".
We will use Google to find files containing user names, which is useful for building dictionaries, for example: allintext:username filetype:log. Here is part of one such file, which has more than 2209 rows:
Error Retrieving RSS File: username:picklepeople user_id:7321 rss:http://a*******l.org/feed XML Processing Error: 4Empty document username:inferno user_id:240 rss:http://r*****o.l******n.com/rss/ XML Processing Error: 9Invalid character username:rishey user_id:338 rss:http://feeds.feedburner.com/____dio.xml
Using the same query, I also found a log of an SQL injection attack:
2012-08-15 03:48:50 213.xxx.xx.229 cid http://www.h*****.at/index.php?option=com_yelp&controller=showdetail&task=showdetail&cid=-1+UNION+ALL+SELECT+1,2,3,concat(0x26,0x26,0x26,0x25,0x25,0x25,username,0x3a,password,0x25,0x25,0x25,0x26,0x26,0x26),5,6,7,8,9,10,11,12,13,14,15,16,17+FROM+jos_users--
2012-08-21 04:48:01 61.xxx.xxx.72 id http://www.h*****.at/index.php?option=com_recipes&[email protected]&func=detail&id=-1/**/union/**/select/**/0,1,concat(username,0x3a,password),username,0x3a,5,6,7,8,9,10,11,12,0x3a,0x3a,0x3a,username,username,0x3a,0x3a,0x3a,21,0x3a/**/from/**/mos_users/*
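From the defender's side, the same signature can be looked for in your own logs. The following is a rough sketch, not a complete detection rule: it only matches the UNION ... SELECT pattern seen in the two entries above, with "+", spaces, "%20" or /**/ comments as separators. The sample lines are shortened, made-up variants of those entries on a placeholder host.

```python
import re

# Separators attackers commonly use between SQL keywords in a URL
_SEP = r"(?:\s|\+|%20|/\*.*?\*/)+"
UNION_SELECT = re.compile(
    r"union" + _SEP + r"(?:all" + _SEP + r")?select", re.IGNORECASE
)


def suspicious_lines(lines):
    """Return the log lines that contain a UNION [ALL] SELECT pattern."""
    return [line for line in lines if UNION_SELECT.search(line)]


# Hypothetical sample log lines (example.com is a placeholder host)
sample = [
    "2012-08-15 03:48:50 cid http://example.com/index.php?cid=-1+UNION+ALL+SELECT+1,2,3+FROM+jos_users--",
    "2012-08-21 04:48:01 id http://example.com/index.php?id=-1/**/union/**/select/**/0,1/**/from/**/mos_users/*",
    "2012-08-22 10:00:00 id http://example.com/index.php?id=12",
]

for line in suspicious_lines(sample):
    print(line)
```

Only the first two sample lines are reported; the third is an ordinary request. A pattern this simple is easy to evade, so it is better treated as a quick first check than as protection.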
Collecting email addresses
With the query allintext:email OR mail +*gmail.com filetype:txt, I was really surprised: the first result (not to mention the very interesting host it was found on) was a text file containing 35,572 email addresses and passwords.
Finding sensitive files and directories
intitle:"index of" inurl:ftp (pub OR incoming)
intitle:"Index of" phpMyAdmin
intitle:index of inurl:config* intext:last modified
intitle:"index of" AND password OR passwd OR pwd intext:"last modified"
All these queries return interesting results; we just need to know what we want to find and how to tell Google to look for it. Example of a result returned by one of these queries:
define("MYSQL_HOST", "mysql106.db.******.***.jp");
define("MYSQL_ID" , "na***o-hoso");
define("MYSQL_PASS", "mJtp2XfG");
define("DBNAME", "na***o-hoso");
Finding error messages (eg finding some websites vulnerable to SQL Injection)
allintext:"Warning: mysql_connect(): Access denied for user: '*@*" "on line" -help -forum -tuto*
inurl:"id=" & intext:"Warning: mysql_num_rows()" -help -forum
We can find almost anything we want using Google if we sharpen our queries enough. I enjoyed building some queries from different combinations of keywords and operators; see some of the results below:
Full information about one website's customers: names, addresses, postal codes, cities, phone and mobile numbers, and email addresses
You can see that things are getting more serious. As you have probably guessed, no one escapes Google's indexing spiders and crawlers! Google is certainly everyone's friend, including people with malicious intent. Before uploading a file, a directory or any other information that is not supposed to be public, remember to check who can access your sensitive files and folders.
Placing an empty index.html file in a directory is a simple and useful way to suppress directory listing. Think also about applying the correct CHMOD to your sensitive directories, and limit or remove access to your uploaded backups.
The robots.txt file can also help protect the privacy of your data: you can prevent Google or any other search engine from indexing your website, files or directories by filling in a robots.txt file correctly.
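These checks can be automated on the server itself. Below is a minimal sketch that walks a web root and reports directories without an index.html file, plus backup files that are readable by every user. The function name audit_webroot and the list of backup extensions are assumptions chosen for illustration, and the permission check relies on POSIX-style file modes.

```python
import os
import stat

# Assumed list of extensions that usually indicate a backup or dump file
BACKUP_EXTENSIONS = (".sql", ".bak", ".dmp", ".zip")


def audit_webroot(root):
    """Return (dirs_without_index, world_readable_backups) found under root."""
    no_index, exposed = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        # A directory without an index file may be shown as a listing
        if "index.html" not in filenames:
            no_index.append(dirpath)
        for name in filenames:
            if name.lower().endswith(BACKUP_EXTENSIONS):
                path = os.path.join(dirpath, name)
                # S_IROTH is the "readable by others" permission bit
                if os.stat(path).st_mode & stat.S_IROTH:
                    exposed.append(path)
    return no_index, exposed
```

Calling audit_webroot on your document root (for example a hypothetical /var/www/html) returns two lists to review by hand. Whether a missing index.html actually leads to a listing also depends on the web server configuration, which this sketch does not inspect.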
The following tips may help:
Preventing Google from indexing your site:
User-agent: Googlebot
Disallow: /
Preventing every search engine from indexing your site:
User-agent: *
Disallow: /
You can also prohibit Google from indexing a specific file type:
User-agent: Googlebot
Disallow: /*.sql$
To prohibit a directory and all its content from being indexed by Google:
User-agent: Googlebot
Disallow: /directoryName/
To prohibit a specific page from being indexed by Google:
User-agent: Googlebot
Disallow: /confidential.html
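Since a mistake in these rules can hide the wrong pages, it is worth testing them before publishing. The sketch below uses Python's standard urllib.robotparser module to check the directory and page rules shown above; example.com is a placeholder host and is_allowed is a hypothetical helper. As far as I know, this module only does plain prefix matching, so the wildcard rule /*.sql$ is left out of the test.

```python
from urllib.robotparser import RobotFileParser

# The directory and page rules from the examples above, in one file
RULES = """\
User-agent: Googlebot
Disallow: /directoryName/
Disallow: /confidential.html
"""


def is_allowed(rules_text, user_agent, url):
    """Return True if the given crawler may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


for path in ("/directoryName/backup.sql", "/confidential.html", "/index.html"):
    print(path, is_allowed(RULES, "Googlebot", "http://example.com" + path))
```

The first two paths are reported as not allowed for Googlebot and the third as allowed. A crawler with a different user agent is not matched by these rules at all, which is a reminder to add a "User-agent: *" block if the restriction is meant for every search engine.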
original source: http://resources.infosecinstitute.com