2005-08-08 11:01:47 | Davide Pozza |
examples/googleImages/conf/google_images-config.xml
v 1.4
set preservePath to false |
2005-08-08 11:01:00 | Davide Pozza |
src/main/java/org/smartcrawler/Crawler.java
v 1.18
Renamed var conf to context |
2005-08-08 11:00:09 | Davide Pozza |
examples/run-example.bat
v 1.2
examples/run-example.sh
v 1.2
Fixed messages |
2005-08-05 16:22:21 | Davide Pozza |
src/main/resources/log4j.properties
v 1.5
Logging is now handled programmatically |
2005-08-05 15:55:52 | Davide Pozza |
src/main/java/org/smartcrawler/common/AbstractParametrizableComponent.java
v 1.2
src/main/java/org/smartcrawler/common/ConfigReader.java
v 1.12
src/main/java/org/smartcrawler/common/Context.java
v 1.2
src/main/java/org/smartcrawler/common/Link.java
v 1.8
src/main/java/org/smartcrawler/common/SCLoggerFactory.java
v 1.8
src/main/java/org/smartcrawler/common/SiteConfiguration.java
v 1.5
src/main/java/org/smartcrawler/examples/QuickTest.java
v 1.9
src/main/java/org/smartcrawler/extractor/HtmlURLImpl.java
v 1.10
src/main/java/org/smartcrawler/extractor/MimeTypeTranslator.java
v 1.6
src/main/java/org/smartcrawler/extractor/PatternProvider.java
v 1.6
src/main/java/org/smartcrawler/extractor/RegExpLinksExtractor.java
v 1.8
src/main/java/org/smartcrawler/extractor/pattern/AbstractPattern.java
v 1.5
src/main/java/org/smartcrawler/extractor/pattern/ConcretePattern.java
v 1.5
src/main/java/org/smartcrawler/filter/ContainedWordFilter.java
v 1.3
src/main/java/org/smartcrawler/filter/LinkFilter.java
v 1.4
src/main/java/org/smartcrawler/persistence/FileSystemPersister.java
v 1.15
src/main/java/org/smartcrawler/retriever/Call.java
v 1.3
src/main/java/org/smartcrawler/retriever/HttpCallRetriever.java
v 1.6
src/main/java/org/smartcrawler/retriever/MultiThreadHttpCallRetriever.java
v 1.2
src/main/java/org/smartcrawler/retriever/SmartGetMethod.java
v 1.2
Removed trailing spaces |
2005-08-05 15:54:27 | Davide Pozza |
examples/googleImages/conf/google_images-config.xml
v 1.3
examples/photosig/conf/photosig-config.xml
v 1.2
Layout |
2005-08-05 15:53:47 | Davide Pozza |
xdocs/navigation.xml
v 1.4
xdocs/start/quick-start.xml
v 1.5
xdocs/start/samples.xml
v 1.2
Updated docs |
2005-08-05 15:52:39 | Davide Pozza |
src/main/java/org/smartcrawler/Crawler.java
v 1.17
src/main/java/org/smartcrawler/common/SCLoggerFactory.java
v 1.7
Fixed Log4J issues |
2005-08-05 14:07:40 | Davide Pozza |
xdocs/test/testpage.html
v 1.2
Added text |
2005-08-05 14:06:28 | Davide Pozza |
src/main/java/org/smartcrawler/Crawler.java
v 1.16
src/main/java/org/smartcrawler/DownloadEngine.java
v 1.8
Changed Config to Context |
2005-08-05 14:04:22 | Davide Pozza |
src/main/java/org/smartcrawler/persistence/PersisterFactory.java
v 1.6
The persister is now built by the ConfigReader |
2005-08-05 14:04:59 | Davide Pozza |
src/main/java/org/smartcrawler/examples/QuickTest.java
v 1.8
Added test for photosig |
2005-08-05 14:03:10 | Davide Pozza |
src/main/java/org/smartcrawler/retriever/RetrieverFactory.java
v 1.8
The retriever is now built by the ConfigReader |
2005-08-05 14:01:42 | Davide Pozza |
src/main/java/org/smartcrawler/retriever/MultiThreadHttpCallRetriever.java
v 1.1
This object extends the HttpCallRetriever and uses the multiThread httpClient |
2005-08-05 14:00:21 | Davide Pozza |
src/main/java/org/smartcrawler/retriever/HttpCallRetriever.java
v 1.5
This object now handles only the single-thread httpClient creation and is built from the xml config file. |
2005-08-05 13:58:19 | Davide Pozza |
src/main/java/org/smartcrawler/persistence/FileSystemPersister.java
v 1.14
This object is now thread-safe and is built dynamically by reading the xml config. The parameter "preserveUrl" has been added. |
2005-08-05 13:56:14 | Davide Pozza |
src/main/java/org/smartcrawler/common/Config.java
v 1.7
src/main/java/org/smartcrawler/common/Context.java
v 1.1
Config.java has been replaced by Context.java |
2005-08-05 13:54:59 | Davide Pozza |
src/main/java/org/smartcrawler/common/ConfigReader.java
v 1.11
The configuration has been made more dynamic: the persister and the retriever are now fully configurable |
2005-08-05 13:53:57 | Davide Pozza |
src/main/java/org/smartcrawler/common/AbstractParametrizableComponent.java
v 1.1
Abstract class extended by the objects which are built from the xml config and which need to receive custom parameters at creation time |
2005-08-05 13:51:57 | Davide Pozza |
src/main/java/org/smartcrawler/filter/AbstractFilter.java
v 1.6
Removed; replaced by the more generic AbstractParametrizableComponent |
2005-08-05 13:51:00 | Davide Pozza |
src/main/java/org/smartcrawler/filter/ContainedWordFilter.java
v 1.2
src/main/java/org/smartcrawler/filter/ContentTypeLinkFilter.java
v 1.4
src/main/java/org/smartcrawler/filter/DefaultLinkFilter.java
v 1.6
src/main/java/org/smartcrawler/filter/FilterManager.java
v 1.7
src/main/java/org/smartcrawler/filter/LinkFilter.java
v 1.3
src/main/java/org/smartcrawler/filter/PostFilterLink.java
v 1.4
src/main/java/org/smartcrawler/filter/PrecFilterLink.java
v 1.4
Changed the extended class to AbstractParametrizableComponent |
2005-08-05 13:48:26 | Davide Pozza |
maven.xml
v 1.6
Added examples dir to the dist build |
2005-08-05 13:46:55 | Davide Pozza |
src/bin/conf/smartcrawler-config.xml
v 1.3
Changed the structure of the xml configuration file |
2005-08-05 13:45:47 | Davide Pozza |
src/bin/smartcrawler.bat
v 1.3
Added property definition of custom extractionPatterns |
2005-08-05 13:39:43 | Davide Pozza |
src/test/java/org/smartcrawler/filter/AbstractFilterTest.java
v 1.1
src/test/java/org/smartcrawler/persistence/FileSystemPersisterTest.java
v 1.6
src/test/java/org/smartcrawler/retriever/HttpCallRetrieverTest.java
v 1.4
Changes due to the refactoring of the persister and of the retriever |
2005-08-05 13:38:04 | Davide Pozza |
src/test/resources/extractPatterns.xml
v 1.1
src/test/resources/log4j.properties
v 1.1
Added custom junit resources |
2005-08-05 13:36:50 | Davide Pozza |
project.xml
v 1.19
configured junit resources dir |
2005-08-05 13:35:03 | Davide Pozza |
examples/photosig/run.bat
v 1.1
examples/photosig/run.sh
v 1.1
examples/photosig/conf/extractPatterns.xml
v 1.1
examples/photosig/conf/photosig-config.xml
v 1.1
New crawling example |
2005-08-05 13:35:54 | Davide Pozza |
examples/googleImages/conf/google_images-config.xml
v 1.2
examples/nytRss/conf/nyt_rss-config.xml
v 1.2
examples/others/only-html-config.xml
v 1.2
examples/others/yellowPages-config.xml
v 1.2
Changed the structure of the xml configuration file |
2005-08-03 07:49:14 | Davide Pozza |
src/bin/conf/google_images-config.xml
v 1.3
src/bin/conf/nyt_rss-config.xml
v 1.3
src/bin/conf/only-html-config.xml
v 1.3
src/bin/conf/yellowPages-config.xml
v 1.2
Moved and reorganized into the "examples" dir |
2005-08-03 07:48:39 | Davide Pozza |
examples/run-example.bat
v 1.1
examples/run-example.sh
v 1.1
examples/googleImages/run.bat
v 1.1
examples/googleImages/run.sh
v 1.1
examples/googleImages/conf/extractPatterns.xml
v 1.1
examples/googleImages/conf/google_images-config.xml
v 1.1
examples/nytRss/run.bat
v 1.1
examples/nytRss/run.sh
v 1.1
examples/nytRss/conf/nyt_rss-config.xml
v 1.1
examples/others/only-html-config.xml
v 1.1
examples/others/yellowPages-config.xml
v 1.1
New location and easier usage for examples |
2005-07-29 10:37:21 | Davide Pozza |
src/main/java/org/smartcrawler/filter/LinkFilter.java
v 1.2
Added parameters list handler with placeholders |
2005-07-29 10:34:41 | Davide Pozza |
src/main/java/org/smartcrawler/filter/FilterManager.java
v 1.6
Added check on null links |
2005-07-29 10:33:41 | Davide Pozza |
src/main/java/org/smartcrawler/filter/AbstractFilter.java
v 1.5
added getParameters method |
2005-07-29 10:31:33 | Davide Pozza |
src/main/java/org/smartcrawler/extractor/RegExpLinksExtractor.java
v 1.7
removed commented out section |
2005-07-29 10:30:48 | Davide Pozza |
src/main/java/org/smartcrawler/extractor/PatternProvider.java
v 1.5
Extraction patterns made customizable/overwritable |
2005-07-29 10:20:27 | Davide Pozza |
src/main/java/org/smartcrawler/extractor/HtmlURLImpl.java
v 1.9
Added urldecoder into "clean" method |
2005-07-29 10:18:55 | Davide Pozza |
src/main/java/org/smartcrawler/examples/QuickTest.java
v 1.7
updated |
2005-07-29 10:18:29 | Davide Pozza |
src/main/java/org/smartcrawler/common/SCLogger.java
v 1.7
src/main/java/org/smartcrawler/common/SCLoggerFactory.java
v 1.6
Code beautify |
2005-07-29 10:17:25 | Davide Pozza |
src/main/java/org/smartcrawler/common/Config.java
v 1.6
src/main/java/org/smartcrawler/common/ConfigReader.java
v 1.10
Added single thread support |
2005-07-29 10:15:59 | Davide Pozza |
src/main/java/org/smartcrawler/Crawler.java
v 1.15
Added single thread support |
2005-07-25 08:17:27 | Davide Pozza |
xdocs/index.xml
v 1.2
Changed title |
2005-07-25 08:12:26 | Davide Pozza |
src/main/java/org/smartcrawler/filter/ContainedWordFilter.java
v 1.1
New experimental filter |
2005-07-25 08:11:34 | Davide Pozza |
src/main/java/org/smartcrawler/filter/AbstractFilter.java
v 1.4
Added license header |
2005-07-24 20:06:23 | Davide Pozza |
src/main/java/org/smartcrawler/examples/QuickTest.java
v 1.6
Updated |
2005-07-24 20:06:01 | Davide Pozza |
src/bin/conf/yellowPages-config.xml
v 1.1
New demo xml |
2005-07-24 20:05:28 | Davide Pozza |
src/main/java/org/smartcrawler/extractor/HtmlURLImpl.java
v 1.8
Added cleaning of encoded ampersands |
2005-07-24 20:04:36 | Davide Pozza |
src/main/java/org/smartcrawler/retriever/SmartGetMethod.java
v 1.1
Experimental support of gzipped contents |
2005-07-24 20:03:11 | Davide Pozza |
src/main/java/org/smartcrawler/filter/LinkFilter.java
v 1.1
Added generic PREC filter with parameter "link" |
2005-07-24 20:01:42 | Davide Pozza |
src/main/java/org/smartcrawler/Crawler.java
v 1.14
src/main/java/org/smartcrawler/retriever/HttpCall.java
v 1.4
src/main/java/org/smartcrawler/retriever/HttpCallRetriever.java
v 1.4
Working on new config option "useMultiThread" |
2005-07-24 19:58:33 | Davide Pozza |
project.xml
v 1.18
Updated to httpClient 3.0-rc3 |
2005-07-24 19:57:28 | Davide Pozza |
src/main/java/org/smartcrawler/common/ConfigReader.java
v 1.9
Fixed reader bug |
2005-07-15 13:12:22 | Davide Pozza |
xdocs/index.xml
v 1.1
xdocs/navigation.xml
v 1.3
xdocs/start/configuring.xml
v 1.3
xdocs/start/quick-start.xml
v 1.4
xdocs/start/samples.xml
v 1.1
Text updates |
2005-07-14 14:38:04 | Davide Pozza |
xdocs/navigation.xml
v 1.2
Changed menu items order and label |
2005-07-14 14:37:39 | Davide Pozza |
xdocs/start/configuring.xml
v 1.2
xdocs/start/quick-start.xml
v 1.3
Updated manual |
2005-07-14 14:36:46 | Davide Pozza |
project.xml
v 1.17
Added new version |
2005-07-14 14:10:48 | Davide Pozza |
src/bin/smartcrawler.sh
v 1.5
Added again in order to preserve execution flag |
2005-07-14 14:09:39 | Davide Pozza |
src/bin/smartcrawler.sh
v 1.4
Removed in order to change execution flag |
2005-07-14 14:07:54 | Davide Pozza |
src/bin/smartcrawler.sh
v 1.3
*** empty log message *** |
2005-07-14 13:50:25 | Davide Pozza |
src/bin/smartcrawler.bat
v 1.2
src/bin/smartcrawler.sh
v 1.2
fixed script name |
2005-07-14 13:42:29 | Davide Pozza |
src/main/java/org/smartcrawler/Crawler.java
v 1.13
src/main/java/org/smartcrawler/common/ConfigReader.java
v 1.8
src/main/java/org/smartcrawler/common/Link.java
v 1.7
src/main/java/org/smartcrawler/common/SiteConfiguration.java
v 1.4
src/main/java/org/smartcrawler/examples/QuickTest.java
v 1.5
src/main/java/org/smartcrawler/extractor/MimeTypeTranslator.java
v 1.5
src/main/java/org/smartcrawler/extractor/PatternProvider.java
v 1.4
src/main/java/org/smartcrawler/extractor/pattern/AbstractPattern.java
v 1.4
src/main/java/org/smartcrawler/extractor/pattern/ConcretePattern.java
v 1.4
src/main/java/org/smartcrawler/filter/AbstractFilter.java
v 1.3
src/main/java/org/smartcrawler/persistence/FileSystemPersister.java
v 1.13
src/main/java/org/smartcrawler/retriever/Call.java
v 1.2
src/test/java/org/smartcrawler/common/LinkTest.java
v 1.4
src/test/java/org/smartcrawler/extractor/MimeTypeTranslatorTest.java
v 1.2
src/test/java/org/smartcrawler/extractor/PatternProviderTest.java
v 1.3
src/test/java/org/smartcrawler/persistence/FileSystemPersisterTest.java
v 1.5
src/test/java/org/smartcrawler/retriever/HttpCallRetrieverTest.java
v 1.3
Code beautify |
2005-07-14 13:38:44 | Davide Pozza |
src/bin/run.bat
v 1.3
src/bin/run.sh
v 1.4
src/bin/smartcrawler.bat
v 1.1
src/bin/smartcrawler.sh
v 1.1
file renamed |
2005-07-14 13:20:10 | Davide Pozza |
checkstyle.xml
v 1.4
maven.xml
v 1.5
project.xml
v 1.16
src/bin/conf/google_images-config.xml
v 1.2
src/bin/conf/nyt_rss-config.xml
v 1.2
src/bin/conf/only-html-config.xml
v 1.2
src/bin/conf/smartcrawler-config.xml
v 1.2
fixed license |
2005-07-14 09:19:53 | Davide Pozza |
src/bin/run.bat
v 1.2
src/bin/run.sh
v 1.3
src/main/java/org/smartcrawler/Crawler.java
v 1.12
Path fix |
2005-07-13 15:57:19 | Davide Pozza |
src/bin/run.sh
v 1.2
Fixed for $SMARTCRAWLER_HOME support |
2005-07-13 15:18:29 | Davide Pozza |
src/main/java/org/smartcrawler/persistence/FileSystemPersister.java
v 1.12
Activated renaming system by checking the content type |
2005-07-13 15:17:28 | Davide Pozza |
src/main/java/org/smartcrawler/extractor/MimeTypeTranslator.java
v 1.4
Added all the mimetypes |
2005-07-13 15:16:50 | Davide Pozza |
src/main/java/org/smartcrawler/Crawler.java
v 1.11
Handle property smartcrawler.home |
2005-07-13 15:15:22 | Davide Pozza |
src/test/java/org/smartcrawler/persistence/FileSystemPersisterTest.java
v 1.4
added testLinkToFilePathWithExtension |
2005-07-13 15:13:57 | Davide Pozza |
src/test/java/org/smartcrawler/extractor/MimeTypeTranslatorTest.java
v 1.1
Added testcase |
2005-07-13 15:12:27 | Davide Pozza |
maven.xml
v 1.4
src/bin/cpappend.bat
v 1.1
src/bin/run.bat
v 1.1
src/bin/run.sh
v 1.1
src/bin/conf/google_images-config.xml
v 1.1
src/bin/conf/nyt_rss-config.xml
v 1.1
src/bin/conf/only-html-config.xml
v 1.1
src/bin/conf/smartcrawler-config.xml
v 1.1
Moved startup scripts and configuration files to src/bin |
2005-07-09 16:30:38 | Davide Pozza |
checkstyle.xml
v 1.3
maven.xml
v 1.3
project.xml
v 1.15
src/main/java/org/smartcrawler/extractor/MimeTypeTranslator.java
v 1.3
src/main/java/org/smartcrawler/extractor/PatternProvider.java
v 1.3
src/main/java/org/smartcrawler/extractor/UnhandledMimeTypeException.java
v 1.3
src/main/java/org/smartcrawler/extractor/pattern/ConcretePattern.java
v 1.3
src/main/resources/extractPatterns.xml
v 1.2
src/test/java/org/smartcrawler/common/LinkTest.java
v 1.3
src/test/java/org/smartcrawler/extractor/PatternProviderTest.java
v 1.2
src/test/java/org/smartcrawler/persistence/FileSystemPersisterTest.java
v 1.3
Fixed license text |
2005-07-09 15:56:53 | Davide Pozza |
src/main/java/org/smartcrawler/extractor/PatternProvider.java
v 1.2
src/main/java/org/smartcrawler/extractor/pattern/ConcretePattern.java
v 1.2
Removed trailing spaces |
2005-07-09 15:53:16 | Davide Pozza |
src/test/java/org/smartcrawler/extractor/PatternProviderTest.java
v 1.1
Implementation of the new patterns provider by using an xml file instead of a different class for every pattern: test case |
2005-07-09 15:49:20 | Davide Pozza |
project.xml
v 1.14
modified version id |
2005-07-09 15:48:31 | Davide Pozza |
src/main/java/org/smartcrawler/examples/QuickTest.java
v 1.4
src/main/java/org/smartcrawler/extractor/PatternProvider.java
v 1.1
src/main/java/org/smartcrawler/extractor/RegExpLinksExtractor.java
v 1.6
src/main/java/org/smartcrawler/extractor/pattern/AbstractPattern.java
v 1.3
src/main/java/org/smartcrawler/extractor/pattern/ConcretePattern.java
v 1.1
src/main/resources/extractPatterns.xml
v 1.1
Implementation of the new patterns provider by using an xml file instead of a different class for every pattern |
2005-07-09 15:44:31 | Davide Pozza |
src/main/java/org/smartcrawler/extractor/pattern/AnchorExtrPattern.java
v 1.3
src/main/java/org/smartcrawler/extractor/pattern/AreaExtrPattern.java
v 1.3
src/main/java/org/smartcrawler/extractor/pattern/ImgExtrPattern.java
v 1.3
src/main/java/org/smartcrawler/extractor/pattern/KnownExtensionsPattern.java
v 1.3
src/main/java/org/smartcrawler/extractor/pattern/LinkExtrPattern.java
v 1.3
src/main/java/org/smartcrawler/extractor/pattern/MetaExtrPattern.java
v 1.3
src/main/java/org/smartcrawler/extractor/pattern/ScriptExtrPattern.java
v 1.3
src/main/java/org/smartcrawler/extractor/pattern/StyleExtrPattern.java
v 1.3
Removal due to the new xml pattern configuration system |