com.norconex.collector.http
Class HttpCollector

java.lang.Object
  extended by com.norconex.collector.http.HttpCollector
All Implemented Interfaces:
IJobSuiteFactory

public class HttpCollector
extends Object
implements IJobSuiteFactory

Main application class. To use it, you must first configure it, either by providing a populated instance of HttpCollectorConfig or through an XML configuration loaded with HttpCollectorConfigLoader. A single instance of this class can hold several crawlers running at once, which is convenient when crawlers share configuration settings. When you have many crawler jobs defined that have nothing in common, it may be best to configure and run them separately, to make troubleshooting easier. There is no fixed rule for this; experimenting with your target sites will help you decide.
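When unrelated jobs are run separately, it helps to see which job failed without one failure hiding the others. The following is a minimal sketch, not part of the HTTP Collector API: the hypothetical `SeparateJobs.runAll` helper runs each named job on its own thread, waits for all of them, and returns the failures keyed by job name. The job names and stand-in jobs in `main` are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: not part of the Norconex HTTP Collector API.
final class SeparateJobs {

    // Runs each named job on its own pool thread, waits for all of them,
    // and returns the failures keyed by job name (empty map = all succeeded).
    static Map<String, Throwable> runAll(Map<String, Runnable> jobs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, jobs.size()));
        try {
            Map<String, Future<?>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Runnable> job : jobs.entrySet()) {
                futures.put(job.getKey(), pool.submit(job.getValue()));
            }
            Map<String, Throwable> failures = new LinkedHashMap<>();
            for (Map.Entry<String, Future<?>> f : futures.entrySet()) {
                try {
                    f.getValue().get();
                } catch (ExecutionException e) {
                    // Unwrap so the caller sees the job's own exception.
                    failures.put(f.getKey(), e.getCause());
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in jobs; a real job would wrap one HttpCollector instance.
        Map<String, Runnable> jobs = new LinkedHashMap<>();
        jobs.put("site-a", () -> System.out.println("crawling site-a"));
        jobs.put("site-b", () -> { throw new IllegalStateException("bad start URL"); });
        System.out.println("failed jobs: " + runAll(jobs).keySet());
    }
}
```

With the library on the classpath, a job could wrap one collector per configuration, for example `() -> new HttpCollector(new File("site-a.xml"), new File("site-a.variables")).crawl(false)` (file names hypothetical). This assumes separate collector instances can run side by side in one JVM; if that does not hold, separate processes give stronger isolation.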

Author:
Pascal Essiembre

Constructor Summary
HttpCollector()
          Creates a non-configured HTTP collector.
HttpCollector(File configFile, File variablesFile)
          Creates an HTTP Collector configured using the provided configuration file and variables file.
HttpCollector(HttpCollectorConfig collectorConfig)
          Creates and configures an HTTP Collector with the provided configuration.
 
Method Summary
 void crawl(boolean resumeNonCompleted)
          Launches all crawlers defined in the configuration.
 JobSuite createJobSuite()
           
 File getConfigurationFile()
           
 HttpCrawler[] getCrawlers()
           
 File getVariablesFile()
           
static void main(String[] args)
          Invokes the HTTP Collector from the command line.
 void setConfigurationFile(File configurationFile)
           
 void setCrawlers(HttpCrawler[] crawlers)
           
 void setVariablesFile(File variablesFile)
           
 void stop()
          Stops a running instance of this HTTP Collector.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HttpCollector

public HttpCollector()
Creates a non-configured HTTP collector.


HttpCollector

public HttpCollector(File configFile,
                     File variablesFile)
Creates an HTTP Collector configured using the provided configuration file and variables file. Sample configuration files, documentation on configuration options, and an explanation of how a variables file differs from a configuration file are available on the HTTP Collector web site.

Parameters:
configFile - a configuration file
variablesFile - a variables file
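Since this constructor is driven entirely by two files, a quick pre-flight check can fail fast with a readable message before a collector is created. This is a minimal sketch; `ConfigFilesCheck` is a hypothetical helper, not part of the HTTP Collector API, and it treats a null variables file as "not supplied", which is an assumption (this page does not say whether the variables file is optional).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check, not part of the HTTP Collector API.
final class ConfigFilesCheck {

    // Lists problems with the file pair intended for HttpCollector(File, File).
    // Assumption: a null variables file means "not supplied" and is not reported.
    static List<String> problems(File configFile, File variablesFile) {
        List<String> problems = new ArrayList<>();
        if (configFile == null) {
            problems.add("no configuration file given");
        } else if (!configFile.isFile() || !configFile.canRead()) {
            problems.add("configuration file is missing or unreadable: " + configFile);
        }
        if (variablesFile != null && (!variablesFile.isFile() || !variablesFile.canRead())) {
            problems.add("variables file is missing or unreadable: " + variablesFile);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Paths come from the command line; "my-config.xml" is a placeholder default.
        File config = new File(args.length > 0 ? args[0] : "my-config.xml");
        File variables = args.length > 1 ? new File(args[1]) : null;
        List<String> problems = problems(config, variables);
        if (problems.isEmpty()) {
            System.out.println("files look usable");
        } else {
            problems.forEach(System.err::println);
        }
    }
}
```

When the list is empty, the same two `File` objects would be passed to `new HttpCollector(config, variables)`; check the collector's own behavior before relying on a null variables file there.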

HttpCollector

public HttpCollector(HttpCollectorConfig collectorConfig)
Creates and configures an HTTP Collector with the provided configuration.

Parameters:
collectorConfig - HTTP Collector configuration
Method Detail

getConfigurationFile

public File getConfigurationFile()

setConfigurationFile

public void setConfigurationFile(File configurationFile)

getVariablesFile

public File getVariablesFile()

setVariablesFile

public void setVariablesFile(File variablesFile)

getCrawlers

public HttpCrawler[] getCrawlers()

setCrawlers

public void setCrawlers(HttpCrawler[] crawlers)

main

public static void main(String[] args)
Invokes the HTTP Collector from the command line.

Parameters:
args - Invoke it once without any arguments to get a list of command-line options.

crawl

public void crawl(boolean resumeNonCompleted)
Launches all crawlers defined in the configuration.

Parameters:
resumeNonCompleted - whether to resume where the previous crawler aborted (if applicable)
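One way to use the `resumeNonCompleted` flag is a retry policy: start fresh, and on each retry ask the collector to pick up where the aborted run stopped. This is a minimal sketch under two assumptions not stated on this page: that `crawl(boolean)` blocks until the crawl ends, and that an aborted run surfaces as a `RuntimeException`. `ResumingLauncher` and its `Crawl` interface are hypothetical; `Crawl` only mirrors the documented `crawl(boolean)` signature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher, not part of the HTTP Collector API.
final class ResumingLauncher {

    // Mirrors the documented signature HttpCollector#crawl(boolean).
    @FunctionalInterface
    interface Crawl {
        void crawl(boolean resumeNonCompleted);
    }

    // First attempt starts fresh; every retry asks to resume the aborted run.
    // Returns the number of attempts used, or rethrows the last failure.
    static int crawlWithRetries(Crawl crawler, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                crawler.crawl(attempt > 1);
                return attempt;
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        // Stand-in crawler that aborts on its first run only.
        List<Boolean> flags = new ArrayList<>();
        Crawl flaky = resume -> {
            flags.add(resume);
            if (flags.size() == 1) {
                throw new IllegalStateException("simulated abort");
            }
        };
        int attempts = crawlWithRetries(flaky, 3);
        System.out.println("attempts=" + attempts + " flags=" + flags);
    }
}
```

With a real collector, the documented method fits the interface directly: `ResumingLauncher.crawlWithRetries(collector::crawl, 3)`.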

stop

public void stop()
Stops a running instance of this HTTP Collector.
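A common reason to call `stop()` is a time limit: the crawl runs on one thread while another thread stops it if it takes too long. This is a minimal sketch; `CrawlTimeLimit` is a hypothetical helper, not part of the HTTP Collector API, and it takes the crawl and stop actions as plain `Runnable`s so it can be exercised without the library.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of the HTTP Collector API.
final class CrawlTimeLimit {

    // Runs `crawl` on the calling thread. If it is still running after
    // limitMillis, `stop` is invoked from a scheduler thread.
    // Returns true when the time limit triggered the stop.
    static boolean runWithTimeLimit(Runnable crawl, Runnable stop, long limitMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean stopTriggered = new AtomicBoolean(false);
        ScheduledFuture<?> timer = scheduler.schedule(() -> {
            stopTriggered.set(true);
            stop.run();
        }, limitMillis, TimeUnit.MILLISECONDS);
        try {
            crawl.run();
        } finally {
            // The crawl is over: drop the pending timer and release its thread.
            timer.cancel(false);
            scheduler.shutdown();
        }
        return stopTriggered.get();
    }

    public static void main(String[] args) {
        // Stand-in crawl that only ends once its stop flag is raised.
        AtomicBoolean stopRequested = new AtomicBoolean(false);
        boolean timedOut = runWithTimeLimit(
                () -> { while (!stopRequested.get()) { Thread.onSpinWait(); } },
                () -> stopRequested.set(true),
                100);
        System.out.println("stopped by time limit: " + timedOut);
    }
}
```

With a real collector this might read `runWithTimeLimit(() -> collector.crawl(false), collector::stop, TimeUnit.HOURS.toMillis(2))`, assuming `crawl` blocks until the crawl ends and `stop()` may be called from another thread; this page only states that `stop()` stops a running instance.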


createJobSuite

public JobSuite createJobSuite()
Specified by:
createJobSuite in interface IJobSuiteFactory


Copyright © 2009-2013 Norconex Inc. All Rights Reserved.