com.norconex.collector.http.handler.impl
Class DefaultDocumentFetcher

java.lang.Object
  extended by com.norconex.collector.http.handler.impl.DefaultDocumentFetcher
All Implemented Interfaces:
IHttpDocumentFetcher, IXMLConfigurable, Serializable

public class DefaultDocumentFetcher
extends Object
implements IHttpDocumentFetcher, IXMLConfigurable

Default implementation of IHttpDocumentFetcher.

XML configuration usage:

  <httpDocumentFetcher  
      class="com.norconex.collector.http.handler.impl.DefaultDocumentFetcher">
      <validStatusCodes>200</validStatusCodes>
      <headersPrefix>(string to prefix headers)</headersPrefix>
  </httpDocumentFetcher>
 

The "validStatusCodes" element expects a comma-separated list of HTTP response codes.
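
As an illustration of the comma-separated format, the following minimal sketch shows how such a value might be turned into the int[] accepted by setValidStatusCodes(int[]), and how a response code could be checked against it. The class ValidStatusCodesSketch and its methods parseStatusCodes and isValid are hypothetical names introduced here; they are not part of the Norconex API. How DefaultDocumentFetcher itself parses the value (for example, whether it tolerates whitespace around commas) is not stated in this page, so the trimming below is an assumption.

```java
import java.util.Arrays;

public class ValidStatusCodesSketch {

    // Hypothetical helper (not part of the Norconex API): converts a
    // comma-separated string such as "200, 203" into an int array.
    // Trimming whitespace around each code is an assumption of this sketch.
    public static int[] parseStatusCodes(String csv) {
        if (csv == null || csv.trim().isEmpty()) {
            return new int[0];
        }
        String[] tokens = csv.split(",");
        int[] codes = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Throws NumberFormatException if a token is not an integer.
            codes[i] = Integer.parseInt(tokens[i].trim());
        }
        return codes;
    }

    // Hypothetical helper: true if statusCode appears in validStatusCodes.
    public static boolean isValid(int[] validStatusCodes, int statusCode) {
        for (int code : validStatusCodes) {
            if (code == statusCode) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] codes = parseStatusCodes("200, 203, 304");
        System.out.println(Arrays.toString(codes)); // prints [200, 203, 304]
        System.out.println(isValid(codes, 304));    // prints true
        System.out.println(isValid(codes, 404));    // prints false
    }
}
```

An array produced this way has the same type as the argument of the DefaultDocumentFetcher(int[] validStatusCodes) constructor listed below.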

Author:
Pascal Essiembre
See Also:
Serialized Form

Constructor Summary
DefaultDocumentFetcher()
           
DefaultDocumentFetcher(int[] validStatusCodes)
           
 
Method Summary
 CrawlStatus fetchDocument(org.apache.http.impl.client.DefaultHttpClient httpClient, HttpDocument doc)
          Fetches an HTTP document and saves it to a local file.
 String getHeadersPrefix()
           
 int[] getValidStatusCodes()
           
 void loadFromXML(Reader in)
           
 void saveToXML(Writer out)
           
 void setHeadersPrefix(String headersPrefix)
           
 void setValidStatusCodes(int[] validStatusCodes)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DefaultDocumentFetcher

public DefaultDocumentFetcher()

DefaultDocumentFetcher

public DefaultDocumentFetcher(int[] validStatusCodes)
Method Detail

fetchDocument

public CrawlStatus fetchDocument(org.apache.http.impl.client.DefaultHttpClient httpClient,
                                 HttpDocument doc)
Description copied from interface: IHttpDocumentFetcher
Fetches an HTTP document and saves it to a local file.

Specified by:
fetchDocument in interface IHttpDocumentFetcher
Parameters:
httpClient - the HTTP client
doc - the HttpDocument to fetch and save
Returns:
the crawl status of the fetched URL

getValidStatusCodes

public int[] getValidStatusCodes()

setValidStatusCodes

public final void setValidStatusCodes(int[] validStatusCodes)

getHeadersPrefix

public String getHeadersPrefix()

setHeadersPrefix

public void setHeadersPrefix(String headersPrefix)

loadFromXML

public void loadFromXML(Reader in)
Specified by:
loadFromXML in interface IXMLConfigurable

saveToXML

public void saveToXML(Writer out)
               throws IOException
Specified by:
saveToXML in interface IXMLConfigurable
Throws:
IOException


Copyright © 2009-2013 Norconex Inc. All Rights Reserved.