
Gigablast Help

Table of Contents

  1. Search Query Syntax
  2. Business Services Technical Support
    1. XML Search Feed
    2. Web Search
    3. Site Search
    4. Custom Topic Search

Query Syntax

Standard Query Description
cat dog Search results have the word cat and the word dog in them. Some preference is given to results that also have the phrase cat dog.
cat .. dog Use two periods to separate concepts. Like above, but no weight is given to the phrase cat dog.
mp3 "take five" Search results have the word mp3 and the phrase take five in them.
"john smith" -"bob dole" Search results have the phrase john smith but NOT the phrase bob dole in them.
bmx -game Search results have the word bmx but not game.
john | smith Query refinement: Gigablast performs a search for john, then performs another search on just those results for smith. So all results have both john and smith, but they are scored solely by smith.
suburl:edu title:university Search results have university in their title and edu in their url.
site:www.ibm.com "big blue" Search results are from www.ibm.com and have the phrase big blue in them.
url:www.yahoo.com The only possible search result is the page http://www.yahoo.com; if that page is not indexed, there are no results.
title:"the news" -"weather report" Search results have the phrase the news in their title, and do NOT have the phrase weather report anywhere in their content.
ip:216.32.120 cars Search results have the ip 216.32.120.* and have the word cars in their content.
link:www.yahoo.com Search results link to http://www.yahoo.com/
-link:www.yahoo.com/clubs/ clubs Search results do NOT link to http://www.yahoo.com/clubs/ and have the word clubs in their content.
type:pdf nutrition Search results are PDF (Portable Document Format) documents that contain the word nutrition.
type:doc Search results are Microsoft Word documents.
type:xls Search results are Microsoft Excel documents.
type:ppt Search results are Microsoft PowerPoint documents.
type:ps Search results are PostScript documents.
type:text Search results are plain text documents.
type:pdf Search results are PDF documents.
 
Boolean Query Description
Note: boolean operators must be in UPPER CASE.
Note: boolean queries CANNOT have minus signs.
cat AND dog Search results have the word cat AND the word dog in them.
cat OR dog Search results have the word cat OR the word dog in them, but preference is given to the results that have both words.
cat dog OR pig Search results have the two words cat and dog OR search results have the word pig, but preference is given to results that have all three words. This illustrates how the individual words of one operand are all required for that operand to be true.
"cat dog" OR pig Search results have the phrase "cat dog" in them OR they have the word pig, but preference is given to the results that have both.
title:"cat dog" OR pig Search results have the phrase "cat dog" in their title OR they have the word pig, but preference is given to results with both.
cat OR dog OR pig Search results need only have one word, cat or dog or pig, but preference is given to results that have the most of the words.
cat OR dog AND pig Search results have dog and pig, but they may or may not have cat. Preference is given to results that have all three. To evaluate expressions with more than two operands, as in this case where we have three, you can divide the expression up into sub-expressions that consist of only one operator each. In this case we would have the following two sub-expressions: cat OR dog and dog AND pig. Then, for the original expression to be true, at least one of the sub-expressions that have an OR operator must be true, and, in addition, all of the sub-expressions that have AND operators must be true. Using this logic you can evaluate expressions with more than one boolean operator.
cat AND NOT dog Search results have cat but do not have dog.
cat AND NOT (dog OR pig) Search results have cat but do not have dog and do not have pig. When evaluating a boolean expression that contains parentheses, you can evaluate the sub-expression inside the parentheses first. So if a document has dog or pig or both, then the expression (dog OR pig) is true, and you can substitute true for it to get: cat AND NOT (true) = cat AND false = false. Does anyone actually read this far?
(cat OR dog) AND NOT (cat AND dog) Search results have cat or dog but not both.
left-operand OPERATOR right-operand This is the general format of a boolean expression. The possible operators are: OR and AND. The operands can themselves be boolean expressions and can be optionally enclosed in parentheses. A NOT operator can optionally precede the left or the right operand.
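To make the evaluation rule above concrete, here is a minimal sketch in Python. It is not Gigablast's code: it models a document as a set of lowercase words, ignores quoted phrases, field prefixes such as title:, and the ranking preferences, and only decides true/false. A run of plain words (as in cat dog OR pig) is one operand whose words are all required; a chain of operators is split into one-operator sub-expressions, every AND sub-expression must be true, and at least one OR sub-expression (if any) must be true.

```python
import re

OPERATORS = ("AND", "OR")

def tokenize(query):
    # Parentheses become their own tokens; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", query)

def evaluate(query, doc_words):
    """Return True if a document (a set of lowercase words) matches a boolean
    query under the pairwise sub-expression rule described above (sketch)."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_operand():
        nonlocal pos
        negate = peek() == "NOT"
        if negate:
            pos += 1
        if peek() == "(":
            pos += 1                  # consume "("
            value = parse_expr()
            pos += 1                  # consume ")"
        else:
            # A run of plain words; every word is required (e.g. "cat dog").
            words = []
            while peek() is not None and peek() not in ("AND", "OR", "NOT", "(", ")"):
                words.append(tokens[pos].lower())
                pos += 1
            value = all(w in doc_words for w in words)
        return value != negate        # XOR applies the optional NOT

    def parse_expr():
        nonlocal pos
        operands = [parse_operand()]
        ops = []
        while peek() in OPERATORS:
            ops.append(tokens[pos])
            pos += 1
            operands.append(parse_operand())
        if not ops:
            return operands[0]
        # Split "a OP1 b OP2 c" into the sub-expressions "a OP1 b" and "b OP2 c".
        pairs = [(ops[i], operands[i], operands[i + 1]) for i in range(len(ops))]
        and_ok = all(a and b for op, a, b in pairs if op == "AND")
        or_results = [a or b for op, a, b in pairs if op == "OR"]
        or_ok = not or_results or any(or_results)
        return and_ok and or_ok

    return parse_expr()
```

For example, evaluate("cat OR dog AND pig", {"cat"}) is False because the dog AND pig sub-expression fails, matching the description of that query in the table.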

XML Search Feed

The Input
To get search results from Gigablast use a url like:
http://feed.gigablast.com/search?q=test&sc=0&dr=0&raw=8&nrt=11
where:
q=X
X is the query. Spaces are url-encoded as +'s.
n=X
returns X search results. Default is 10. Max is 50.
s=X
returns results starting at result #X. The first result is #0. Default is 0. Max is 499.
ns=X
returns X summary excerpts in the summary of each search result.
site=X
returned results will have URLs from the site, X.
sites=X
returned results will have URLs from the space-separated list of sites, X. X can be up to 500 sites. A site can include sub folders. This allows you to build a Custom Topic Search Engine.
plus=X
returned results will have all words in X. Like a default AND.
minus=X
returned results will not have any words in X.
rat=1
returned results will have ALL query terms. This is also known as a default AND search. rat means Require All Terms.
sc=X
X can be 0 or 1 to respectively disable or enable site clustering. Default is 1, but 0 if the raw parameter is used.
dr=X
X can be 0 or 1 to respectively disable or enable duplicate result removal. Default is 1, but 0 if the raw parameter is used.
psc=X
X ranges from 0 to 100 and is the 'percent similar cutoff': a search result that is X% similar to a search result above it will be hidden from view. psc is only valid when dr is set to 1 (see above). If psc is 100, only documents that are exactly alike are deduped. Default is 80, but 0 if the raw parameter is used.
raw=X
X ranges from 0 to 9 to specify the format of the search results. raw=8 requests the XML feed. raw=9 requests XML feed in utf8.
raw=2
Just display a list of docids between <pre> tags. One more docid than requested is displayed, if possible, so you know whether more docids are available. Because no summaries are generated, this is a bit faster, especially if you do not perform site clustering or duplicate removal.
qh=X
X can be 0 or 1 to respectively disable or enable highlighting of query terms in the titles and summaries. Default is 1, but 0 if the raw parameter is used.
bq=X
X can be 0 or 1 or 2. 0 means the query is NOT boolean, 1 means the query is boolean and 2 means auto-detect. Default is 2.
dt=X
X is a space-separated string of meta tag names. Do not forget to url-encode the spaces as +'s or %20's. Gigablast will extract the contents of these meta tags from the pages listed in the search results and display that content after each summary. For example, &dt=description will display the meta description of each search result, and &dt=description:32+keywords:64 will display the meta description and meta keywords of each search result, limiting those fields to 32 and 64 characters respectively. When used in an XML feed, a <display name="meta_tag_name">meta_tag_content</display> XML tag conveys each requested meta tag's content.
spell=X
X can be 0 or 1 to respectively disable or enable spell checking. If it is enabled while using the XML feed and Gigablast finds a spelling recommendation, the recommendation will be included in the <spell> XML tag. Default is 0 if using an XML feed, 1 otherwise.
nrt=X
X is the maximum number of related topics, also known as GigaBits, to be displayed.
dsrt=X
X is the number of results that will be used to generate the Gigabits. A default of 30 is appropriate.
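The parameters above are ordinary query-string fields, so a request URL can be assembled with a standard URL encoder. The sketch below is a hypothetical helper, not a Gigablast API: it uses the feed host from the example URL, enforces the documented limits on n (max 50) and s (max 499), and passes any other parameter through unchanged. It also includes a parser for raw=2 replies that assumes the docids inside the <pre> tags are whitespace-separated (the exact separator is an assumption) and uses the extra docid to detect whether more results exist.

```python
import re
from urllib.parse import urlencode

# Host and path taken from the example feed URL above.
FEED_BASE = "http://feed.gigablast.com/search"

def build_feed_url(query, n=10, s=0, raw=8, **extra):
    """Build an XML feed URL; extra keyword arguments become parameters such
    as sc, dr, psc, dt or nrt. Hypothetical helper, not a Gigablast API."""
    if not 0 < n <= 50:       # documented max is 50; the lower bound is our assumption
        raise ValueError("n must be between 1 and 50")
    if not 0 <= s <= 499:     # documented max start index is 499
        raise ValueError("s must be between 0 and 499")
    params = {"q": query, "n": n, "s": s, "raw": raw, **extra}
    # urlencode turns spaces into '+', as the dt and sites parameters expect.
    return FEED_BASE + "?" + urlencode(params)

def parse_raw2(body, requested):
    """Parse a raw=2 reply: docids between <pre> tags (assumed whitespace-
    separated). An extra docid beyond `requested` means more are available."""
    match = re.search(r"<pre>(.*?)</pre>", body, re.DOTALL)
    docids = [int(tok) for tok in match.group(1).split()] if match else []
    return docids[:requested], len(docids) > requested
```

For instance, build_feed_url("test", sc=0, dr=0, nrt=11) yields the same parameters as the example URL, plus explicit n and s values, and build_feed_url("cars", dt="description:32 keywords:64") encodes the space in dt as a +.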

Site Clustering

It is often undesirable to have many results listed from the same site. Site clustering limits the number of returned results from any given site to one or two, but provides a link that says "more results from this site" in case the searcher wants them.
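The behavior described here, and in the <clustered> tag of the XML output, can be sketched as a single pass over the ranked results. This is an illustration under the assumption that "site" means hostname and that two results per site are kept; Gigablast's exact rule may differ. The second result from a host is flagged as clustered, and any further results from that host are dropped.

```python
from urllib.parse import urlsplit

def cluster_by_site(urls, per_site=2):
    """Keep at most per_site results per hostname, flagging repeats as
    clustered (illustrative only; Gigablast's exact rule may differ)."""
    shown = {}
    out = []
    for url in urls:
        host = urlsplit(url).hostname
        count = shown.get(host, 0)
        if count >= per_site:
            continue                   # further results from this host are dropped
        shown[host] = count + 1
        out.append((url, count > 0))   # True plays the role of <clustered>1</clustered>
    return out
```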

Duplicate Results Removal

When duplicate result removal is enabled, Gigablast removes results that have exactly the same content as other results. The psc parameter can be used to also remove documents with similar content.
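The psc cutoff can be illustrated with a simple near-duplicate filter. Gigablast's actual similarity measure is not described here, so this sketch swaps in Jaccard overlap of word sets as a stand-in: a result is hidden when it is at least psc percent similar to a result already shown above it. Note that with this measure psc=100 removes only results with identical word sets, which approximates, but is not the same as, "exactly alike".

```python
def jaccard_percent(a, b):
    """Word-set overlap as a percentage (a stand-in similarity measure)."""
    if not a and not b:
        return 100.0
    return 100.0 * len(a & b) / len(a | b)

def remove_near_duplicates(texts, psc=80):
    """Hide any result at least psc percent similar to a result shown above it."""
    kept, kept_sets = [], []
    for text in texts:
        words = set(text.lower().split())
        if any(jaccard_percent(words, seen) >= psc for seen in kept_sets):
            continue
        kept.append(text)
        kept_sets.append(words)
    return kept
```

With the sample inputs "red green blue cyan magenta" and "red green blue cyan magenta yellow", the overlap is 5/6, about 83%, so the second is hidden at psc=80 but kept at psc=100.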

Cached Web Page Parameters

To get a cached web page from Gigablast use a url like: http://www.gigablast.com/get?d=94555390410&ih=1&q=my+query where:

d=X
X is the docid of the page you want returned. DocIds are 64-bit, so you'll need 8 bytes to hold one. DocIds can be harvested from the XML search feed output.
ih=X
X is 1 to include the Gigablast header in the returned page, and 0 to exclude it.
ibh=X
X is 1 to include the Gigablast BASE HREF tag in the cached page. The default is 1.
q=X
X is the query that, when present, will cause Gigablast to highlight the query terms on the returned page.
cas=X
X can be 0 or 1 to respectively disable or enable click and scroll. Default is 1.
strip=X
X can be 0, 1 or 2. If X is 0 then no stripping is performed. If X is 1 then image and other tags are removed. An X of 2 is another form of tag removal. Default is 0.
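A cached-page URL can be built the same way as a feed URL. The helper below is hypothetical, not part of Gigablast: it uses the host and path from the example above, checks that the docid fits in 64 bits (treating it as non-negative is an assumption), adds q only when a query should be highlighted, and passes other parameters such as ibh, cas or strip through unchanged.

```python
from urllib.parse import urlencode

CACHE_BASE = "http://www.gigablast.com/get"  # from the example cached-page URL above

def build_cached_url(docid, query=None, ih=1, **extra):
    """Build a cached-page URL. Hypothetical helper; extra keyword arguments
    become parameters such as ibh, cas or strip."""
    if not 0 <= docid < 2 ** 64:    # docids are 64-bit; non-negative is our assumption
        raise ValueError("docid must fit in 64 bits")
    params = {"d": docid, "ih": ih}
    if query:
        params["q"] = query          # enables query-term highlighting
    params.update(extra)
    return CACHE_BASE + "?" + urlencode(params)
```

Calling build_cached_url(94555390410, "my query") reproduces the example URL given at the start of this section.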

The Output

Gigablast allows you to receive the search results in a number of formats useful for interfacing to your program. Here is an example of the XML feed.

The XML reply has the following format (the # comment lines are explanatory only and do not appear in the actual reply):


# The XML reply uses the Latin-1 Character Set (ISO 8859-1) when using raw=8
<?xml version="1.0" encoding="ISO-8859-1" ?>

# OR when using raw=9
<?xml version="1.0" encoding="utf-8" ?>

# It consists of one, and only one, response.
<response>

  # If any error was received in processing the request, it will be here.
  <error>Out of memory</error>
  # The numeric code of the error, if any, goes here.
  # See all the Error Codes, but the following errors are most likely:
  # 32771 - A cached page was not found when it should have been.
  #    12 - There was a shortage of memory to properly process the request.
  # 32863 - Queried collection does not exist.
  <errno>32790</errno>

  # Total number of documents in the collection being searched.
  <docsInCollection>2060245584</docsInCollection>
  # An APPROXIMATION of the total number of search results for the query.
  <hits>4838158</hits>
  # This is "1" if more results are available after these, "0" if not.
  <moreResultsFollow>1</moreResultsFollow>

  # If present and value is 1, some words in the query were censored for content.
  <queryCensored>1</queryCensored>
  # If present, the value is the number of results that were censored for content.
  <resultsCensored>3</resultsCensored>
  # If this tag is present, it will hold an alternate spelling recommendation 
  # for the query. The &spell=1 parameter must be present in the query url,
  # however, for you to get a spelling recommendation back.
  <spell>nose</spell>

  # If this tag is present, it contains the list of query words that were 
  # ignored as individual words, but not necessarily as part of a phrase
  <ignoredWords>the in of</ignoredWords>
  # This is how many of the search results contain ALL of the query terms.
  # It is only used for printing the "blue bar" for doing SuperRecall
  <minNumExactMatches>300</minNumExactMatches>

  # The list of related topics, each enclosed by <topic> tags. 
  # You must provide a topics parameter to the query url to get topics.
  <topic>
    # Each topic has a score. A score of 50% or more is considered pretty good.
    <score>63</score>
    # Out of the documents scanned, how many contain this topic.
    <docCount>4</docCount>
    # The topic popularity. A measure of how popular the word or phrase is
    # based on how many web pages contain it overall. Ranges from 0 to 1000.
    # 1000 being the most popular.
    <popularity>16</popularity>
    # The docIds of the documents scanned that contain this topic.
    <docId>9030668134</docId>
    <docId>265962215563</docId>
    <docId>43940265200</docId>
    <docId>264861015824</docId>
    # The topic name.
    <name><![CDATA[Race Cars]]></name>
    # And OPTIONALLY the name of the meta tag it was derived from.
    <from>keywords</from>
  </topic>

  # The list of reference pages for the search results.  Each reference is
  # enclosed in <reference> tags.
  <reference>
    # Each reference has a score based on its relevance to the query.
    <score>93</score>
    # Title of the reference page
    <title></title>
    # Url of the reference page
    <url><![CDATA[http://www.greatreference.com/]]></url>
  </reference>

  # The list of related pages for the search results.  Each related page is
  # enclosed in <related> tags.
  <related>
    # Each related page has a score based on its relevance to the query.
    <score>91</score>
    # Title of the related page.
    <title></title>
    # Url of the related page.
    <url><![CDATA[http://www.similar.com/]]></url>
    # Summary of the related page.
    <sum><![CDATA[This page is similar to the results]]></sum>
  </related>

  # The list of search results, each enclosed in <result> tags.
  <result>

    # Each result has a title. This may be empty if none was found on the page.
    <title><![CDATA[My Homepage]]></title>

    # Each result has a summary. This may be empty. The summary is generated 
    # so as to contain the query terms if possible.
    <sum><![CDATA[All about my interests and hobbies]]></sum>

    # If this result is categorized under the DMOZ Directory, data about each
    # category it is in will be enclosed in a <dmoz> tag.
    <dmoz>
      # The category ID number of this category.
      <dmozCatId>172</dmozCatId>
      # The path of this category in the directory.
      <dmozCat><![CDATA[Health: Dentistry]]></dmozCat>
      # Title of this result as listed in the directory.
      <dmozTitle><![CDATA[My Homepage]]></dmozTitle>
      # Description of this page as listed in the directory.
      <dmozDesc><![CDATA[A Dentist's Home Page]]></dmozDesc>
    </dmoz>
    # If the directory is being given along with the results, this is the number of
    # stars given to this page based on its quality.
    <stars>3</stars>

    # Each result may have a sequence of <display> tags if the feed input
    # contained a dt parameter. This allows you to extract
    # information contained in meta tags in the content of each search result.
    # To obtain the contents of the author meta tag, you would need to pass in
    # dt=author.
    <display name="author"><![CDATA[Contents of the meta author tag]]></display>

    # Each result has a URL. This should never be empty.
    <url><![CDATA[http://www.mydomain.com/mypage.html]]></url>
    # The size of the page in kilobytes. Accurate to the tenth of a kilobyte.
    <size>5.6</size>
    # The time the page was last INDEXED. It may not have been indexed in a 
    # long time if the page's content has not changed. The time is expressed 
    # in seconds since the epoch (Jan 1, 1970, 00:00 UTC).
    <spidered>1064367311</spidered>
    # The time the page was last modified. This is taken from the HTTP reply 
    # of the web server when downloading the page. It is 0 if unknown. The time
    # is expressed in seconds since the epoch (Jan 1, 1970, 00:00 UTC).
    <lastMod>1058477041</lastMod>

    # The assigned docid for this page. This number is unique and used 
    # internally by Gigablast to identify this page. It is used to retrieve the
    # "cached copy" of the page.
    <docId>65990704587</docId>
    # When doing site clustering, this tag will be present if the result is 
    # from the same hostname as a previous result for the same query. It 
    # indicates that you might want to indent the result. Any further results 
    # from this same hostname will be stripped from the feed.
    <clustered>1</clustered>

    # When Topic Clustering is being used, these will display results which 
    # are considered similar to this result and have been clustered under it. 
    # Each similar result is enclosed in a <similar> tag. 
    <similar>
      # The url for the similar result.
      <url><![CDATA[http://www.similar.com/]]></url>
      # The title of the similar result.
      <title><![CDATA[A similar topic]]></title>
    </similar>
    # If this is present and set to 1, there are more similar results beyond 
    # those given here. 
    <moreSimilar>1</moreSimilar>

    # This is a standard HTTP MIME content classification of the result. It is 
    # not present if the page is text/html. Otherwise, it will be one of the
    # following: text/plain
    #            text/xml
    #            application/pdf
    #            application/msword
    #            application/vnd.ms-excel
    #            application/mspowerpoint
    #            application/postscript
    <contentType>text/plain</contentType>
    # The documents are all sorted by this score. This score is generally a
    # product of the WEIGHT of the query term and the COUNT of the query term
    # in this document. The WEIGHT is usually influenced by the term frequency
    # of the query term (rarer terms get more WEIGHT), by the additional weight
    # received by phrases which can be adjusted in the Master Controls, and,
    # possibly, by any user-defined weight in the query (See Weighting Query Terms).
    # This score is normalized by dividing by the maximum
    # score for all documents in the search results and then making it into a
    # percentage, so the score ranges from 0 to 100, and the first result
    # should always have score 100.
    <score>100</score>
    # This is the absolute score. Useful for merging results from other
    # collections or other search engines.
    <absScore>5132</absScore>

    # This is the language the page was detected as.
    <language><![CDATA[English]]></language>
    # The quality of the document as determined by Gigablast. Ranges from 0 to 100.
    <quality>80</quality>
    # The character set this page was originally encoded in. 
    <charset><![CDATA[utf-8]]></charset>

  </result>

  <result>

  ...
  </result>

  ...

  # If the directory has been requested, this node will include the directory
  # structure for the requested category.  Typically this is above the results.
  <directory>
    # Category ID for the displayed directory structure.
    <dirId>172</dirId>
    # Directory path of this category listing.
    <dirName>Health: Dentistry</dirName>

    # Specifies if the directory listing is displayed in a Right-To-Left format.
    <dirIsRTL>1</dirIsRTL>
    # Sub-Categories listed as letters meant to be displayed as a letter bar.
    # Each sub-category will be enclosed in a <letterbar> tag.
    <letterbar><![CDATA[Health/Dentistry/A]]>
      # Every sub-category will include a count of how many urls are listed under it.
      <urlcount>5</urlcount>
    </letterbar>
    # Normal sub-categories listed in groups.  These are listed in order of group
    # and alphabetically within each group. Each sub-category is enclosed in a
    # <narrow2>, <narrow1>, or <narrow> tag.
    <narrow2><![CDATA[Health/Dentistry/Regional]]>
      <urlcount>0</urlcount>
    </narrow2>
    <narrow1><![CDATA[Health/Dentistry/Association]]>
      <urlcount>122</urlcount>
    </narrow1>
    <narrow><![CDATA[Health/Dentistry/Children]]>
      <urlcount>24</urlcount>
    </narrow>
    # Symbolically linked sub-categories physically under a different category.
    # These will be interwoven alphabetically within the respective narrow groups.
    # The name listed before the path is the symbolic name.  Each symbolically linked
    # sub-category is enclosed in a <symbolic2>, <symbolic1>, or
    # <symbolic> tag.
    <symbolic2><![CDATA[Dentophobia:Health/Mental_Health/Disorders/Anxiety/Phobias/Dentophobia]]>
      <urlcount>2</urlcount>
    </symbolic2>
    <symbolic1><![CDATA[Dental_Laboratories:Business/Healthcare/Products_and_Services/Dentistry/]]>
      <urlcount>71</urlcount>
    </symbolic1>
    <symbolic><![CDATA[Products:Shopping/Health/Dental]]>
      <urlcount>71</urlcount>
    </symbolic>
    # Separate categories in the directory which are related to this one.
    <related><![CDATA[Society/Issues/Health/Dentistry]]>
      <urlcount>4</urlcount>
    </related>
    # This category in other languages in the directory.
    <altlang><![CDATA[Basque:World/Euskara/Osasuna/Odontologia]]>
      <urlcount>7</urlcount>
    </altlang>
  </directory>

</response>
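A client can read the reply with any XML parser. The sketch below uses Python's standard ElementTree and reads only a few of the tags documented above (hits, moreResultsFollow, and per-result title, url, docId, score, spidered and display); the function name parse_feed and the shape of the returned dictionary are illustrative choices, not part of Gigablast. Passing the reply as bytes lets the parser honor the encoding declaration for both raw=8 and raw=9. The embedded sample is a trimmed copy of the format above without the comment lines.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<response>
  <hits>4838158</hits>
  <moreResultsFollow>1</moreResultsFollow>
  <result>
    <title><![CDATA[My Homepage]]></title>
    <display name="author"><![CDATA[Contents of the meta author tag]]></display>
    <url><![CDATA[http://www.mydomain.com/mypage.html]]></url>
    <spidered>1064367311</spidered>
    <docId>65990704587</docId>
    <score>100</score>
  </result>
</response>"""

def parse_feed(xml_bytes):
    """Extract the main fields of an XML feed reply (sketch; reads only a
    few of the documented tags)."""
    root = ET.fromstring(xml_bytes)   # bytes, so the encoding declaration is honored
    if root.find("error") is not None:
        raise RuntimeError(f"{root.findtext('error')} (errno {root.findtext('errno')})")
    results = []
    for r in root.findall("result"):
        results.append({
            "title": r.findtext("title", default=""),
            "url": r.findtext("url"),
            "docId": int(r.findtext("docId")),
            "score": int(r.findtext("score", default="0")),
            # <spidered> is expressed in seconds since the Unix epoch.
            "spidered": datetime.fromtimestamp(
                int(r.findtext("spidered", default="0")), tz=timezone.utc),
            # One entry per <display name="..."> tag requested with dt=.
            "display": {d.get("name"): d.text for d in r.findall("display")},
        })
    return {
        "hits": int(root.findtext("hits", default="0")),
        "more": root.findtext("moreResultsFollow") == "1",
        "results": results,
    }
```

Because findtext and findall only look at direct children, the <title> and <url> tags nested inside <dmoz> or <similar> are not confused with the result's own fields.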

Gigablast Web Search

Copy and paste the following html code into your web page to display a search box that searches the web via Gigablast:


 <form method="get"
 action="http://www.gigablast.com/search">
 <input type="text" name="q" size="40">
 <input type="submit" value="search">
 </form>

Gigablast Site Search

See below for instructions on how to add a search box to your web site.

  • Copy and paste the following html code into your web page to display your site search box:
    
      <form method="get" action="http://www.gigablast.com/search">
      <input type="hidden" name="sc" value="0">
      <input type="hidden" name="sites" value="www.mydomain.com">
      <input type="hidden" name="iu" value="http://www.mydomain.com/logo.gif">
      <input type="hidden" name="iw" value="200">
      <input type="hidden" name="ih" value="50">
      <input type="hidden" name="ix" value="http://www.mylink.com/">
      <input type="text" name="q" size="40">
      <input type="submit" value="search">
      </form>
      
  • Replace www.mydomain.com above with your site name, or a space separated list of up to 500 of the sites you want to search. Sites need not be only domains, but can contain file paths as well.
  • Replace www.mydomain.com/logo.gif above with the url of your logo.
  • Replace 200 and 50 with the width and height of your logo in pixels. Alternatively, remove those two lines and the image will be displayed at its native size.
  • Replace www.mylink.com above with the url to visit when your logo is clicked.
  • If you are searching a long list of sites, you may need to replace method="get" above with method="post", since some browsers cannot submit long GET requests.
  • If you are using the site search index, be sure to add your root url(s) via the addurl page. If your root page uses frames and has no outgoing hyperlinks, you will need to add some other pages as well so that your site can be spidered. Wait a few minutes for Gigablast to download your pages so your search box actually has something to search.
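If you maintain the site list in code, the form above can be generated rather than edited by hand. The sketch below is a hypothetical helper: it emits the same hidden fields in the same order, enforces the documented limit of 500 sites, HTML-escapes attribute values, and switches to method="post" when the encoded GET request would exceed an assumed safe length of 2000 characters (that threshold is our assumption, not a Gigablast or browser figure).

```python
from html import escape
from urllib.parse import urlencode

SEARCH_ACTION = "http://www.gigablast.com/search"
MAX_SITES = 500        # documented limit for the sites list
GET_URL_LIMIT = 2000   # assumed safe GET length; not a Gigablast figure

def site_search_form(sites, logo_url=None, width=None, height=None, link_url=None):
    """Generate site-search form HTML like the one above (hypothetical helper)."""
    if not 1 <= len(sites) <= MAX_SITES:
        raise ValueError("between 1 and 500 sites are allowed")
    hidden = {"sc": "0", "sites": " ".join(sites)}
    if logo_url:
        hidden["iu"] = logo_url
    if width and height:
        hidden["iw"], hidden["ih"] = str(width), str(height)
    if link_url:
        hidden["ix"] = link_url
    # Fall back to POST when the encoded GET request would be long.
    query_string = urlencode(hidden)
    too_long = len(SEARCH_ACTION) + 1 + len(query_string) > GET_URL_LIMIT
    method = "post" if too_long else "get"
    lines = [f'<form method="{method}" action="{SEARCH_ACTION}">']
    for name, value in hidden.items():
        lines.append(f'<input type="hidden" name="{name}" value="{escape(value)}">')
    lines.append('<input type="text" name="q" size="40">')
    lines.append('<input type="submit" value="search">')
    lines.append("</form>")
    return "\n".join(lines)
```

Calling site_search_form(["www.mydomain.com"], "http://www.mydomain.com/logo.gif", 200, 50, "http://www.mylink.com/") produces the same fields as the example form; a long vendor list, as used for a custom topic search, would come out with method="post".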

Gigablast Custom Topic Search

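To make your own custom topic engine, build a site search box as described above and list all of the sites on your topic (up to 500) in the sites field.

For instance, the following HTML produces a custom topic search box for graphics board vendors: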


<form method="post" action="http://www.gigablast.com/search">
<input type="text" name="q" size="60">
<input type="submit" value="search">
<input type="hidden" name="sc" value="0">
<input type="hidden" name="iu" value="http://www.gigablast.com/images/logo.gif">
<input type="hidden" name="iw" value="310">
<input type="hidden" name="ih" value="75">
<input type="hidden" name="ix" value="http://gigablast.com/index.php?page=business&subPage=cts">
<input type="hidden" name="sites" value="
www.althon.com
www.appian.com
www.atitech.ca
www.colorgraphic.net
www.videodigital.org
www.americas.creative.com
www.3dlabs.com
www.datapath.co.uk
www.dektec.com
www.diamondmm.com
www.iste.com
www.hercules.com
www.integraltech.com
www.jknelectronics.com
www.lmahd.com
www.matrox.com/mga
www.microlabs.com
www.multiscreenvideo.com
www.nine.com
www.peritek.com
www.pixelsmart.com
www.prolink.com.tw
www.quantum3d.com
www.salientsys.com
www.sparkle.com.tw
www.spectrah.com
www.axle-hk.com
www.xfxgraphics.com
www.x-micro.com">
</form>

This search box searches all indexed pages from all of the sites listed on http://dir.gigablast.com/index.php/Computers/Hardware/Components/Video_Cards/Graphics_Board_Vendors.