
Project Description

Heritrix is the Internet Archive's extensible, Web-scale,
archival-quality Web crawler.

System Requirements

System requirements are not defined.

Information regarding Project Releases and Project Resources. Note that the information here is quoted from the Freecode.com page, and the downloads themselves may not be hosted on OSDN.

2009-09-20 07:05
1.14.3

This is a 'micro' release with bug fixes and small requested improvements. The next major release, 2.2, is planned for 2009 and is expected to include updates to the Heritrix 2 configuration system and checkpointing functionality, as well as tools easing the transition from 1.14.x to Heritrix 2.2.

2005-12-02 08:57
1.6.0

This release offers improved remote control and
monitoring via JMX, a crawl-checkpointing
facility, experimental support for Bloom filter
already-included testing, partitioning a crawl
across multiple independent crawlers, and
per-host/domain/queue-grouping collection quotas.
Performance and stability in large crawls were
improved. 39 requested enhancements were included
and 96 reported bugs were fixed. You will need to
tweak your old order files again to make them work
with the new release.
Tags: Major feature enhancements

2005-04-29 08:37
1.4.0

This release features much-improved memory usage, a new experimental scoping/filter model, and a new revisiting frontier. Over 90 bugs were fixed.
Tags: Major feature enhancements

2004-11-17 04:01
1.2.0

This release adds IP-based politeness, configurable
URI-canonicalization, and mid-fetch abort. It also includes
many bug fixes.
Tags: Minor feature enhancements

2004-09-23 20:53
1.0.4

Previously, crawl.log and ARC metadata lines could contain whitespace in URI and MIME type fields; this release fixes that.
Tags: Minor bugfixes

Project Resources