Changelog Report

Timeframe: 30 days, Total Commits: 77, Total Number of Files Changed: 198

Date | Author | File/Message
2005-08-08 11:01:47 Davide Pozza

examples/googleImages/conf/google_images-config.xml v 1.4

Set preservePath to false
2005-08-08 11:01:00 Davide Pozza

src/main/java/org/smartcrawler/Crawler.java v 1.18

Renamed var conf to context
2005-08-08 11:00:09 Davide Pozza

examples/run-example.bat v 1.2

examples/run-example.sh v 1.2

Fixed messages
2005-08-05 16:22:21 Davide Pozza

src/main/resources/log4j.properties v 1.5

Logging is now handled programmatically
2005-08-05 15:55:52 Davide Pozza

src/main/java/org/smartcrawler/common/AbstractParametrizableComponent.java v 1.2

src/main/java/org/smartcrawler/common/ConfigReader.java v 1.12

src/main/java/org/smartcrawler/common/Context.java v 1.2

src/main/java/org/smartcrawler/common/Link.java v 1.8

src/main/java/org/smartcrawler/common/SCLoggerFactory.java v 1.8

src/main/java/org/smartcrawler/common/SiteConfiguration.java v 1.5

src/main/java/org/smartcrawler/examples/QuickTest.java v 1.9

src/main/java/org/smartcrawler/extractor/HtmlURLImpl.java v 1.10

src/main/java/org/smartcrawler/extractor/MimeTypeTranslator.java v 1.6

src/main/java/org/smartcrawler/extractor/PatternProvider.java v 1.6

src/main/java/org/smartcrawler/extractor/RegExpLinksExtractor.java v 1.8

src/main/java/org/smartcrawler/extractor/pattern/AbstractPattern.java v 1.5

src/main/java/org/smartcrawler/extractor/pattern/ConcretePattern.java v 1.5

src/main/java/org/smartcrawler/filter/ContainedWordFilter.java v 1.3

src/main/java/org/smartcrawler/filter/LinkFilter.java v 1.4

src/main/java/org/smartcrawler/persistence/FileSystemPersister.java v 1.15

src/main/java/org/smartcrawler/retriever/Call.java v 1.3

src/main/java/org/smartcrawler/retriever/HttpCallRetriever.java v 1.6

src/main/java/org/smartcrawler/retriever/MultiThreadHttpCallRetriever.java v 1.2

src/main/java/org/smartcrawler/retriever/SmartGetMethod.java v 1.2

Removed trailing spaces
2005-08-05 15:54:27 Davide Pozza

examples/googleImages/conf/google_images-config.xml v 1.3

examples/photosig/conf/photosig-config.xml v 1.2

Layout
2005-08-05 15:53:47 Davide Pozza

xdocs/navigation.xml v 1.4

xdocs/start/quick-start.xml v 1.5

xdocs/start/samples.xml v 1.2

Updated docs
2005-08-05 15:52:39 Davide Pozza

src/main/java/org/smartcrawler/Crawler.java v 1.17

src/main/java/org/smartcrawler/common/SCLoggerFactory.java v 1.7

Fixed Log4J issues
2005-08-05 14:07:40 Davide Pozza

xdocs/test/testpage.html v 1.2

Added text
2005-08-05 14:06:28 Davide Pozza

src/main/java/org/smartcrawler/Crawler.java v 1.16

src/main/java/org/smartcrawler/DownloadEngine.java v 1.8

Changed Config to Context
2005-08-05 14:04:22 Davide Pozza

src/main/java/org/smartcrawler/persistence/PersisterFactory.java v 1.6

The persister is now built by the ConfigReader
2005-08-05 14:04:59 Davide Pozza

src/main/java/org/smartcrawler/examples/QuickTest.java v 1.8

Added test for photosig
2005-08-05 14:03:10 Davide Pozza

src/main/java/org/smartcrawler/retriever/RetrieverFactory.java v 1.8

The retriever is now built by the ConfigReader
2005-08-05 14:01:42 Davide Pozza

src/main/java/org/smartcrawler/retriever/MultiThreadHttpCallRetriever.java v 1.1

This object extends HttpCallRetriever and uses the multithreaded httpClient
2005-08-05 14:00:21 Davide Pozza

src/main/java/org/smartcrawler/retriever/HttpCallRetriever.java v 1.5

This object now handles only the creation of the single-threaded httpClient and is built from the xml config file.
2005-08-05 13:58:19 Davide Pozza

src/main/java/org/smartcrawler/persistence/FileSystemPersister.java v 1.14

This object is now thread-safe and is built dynamically by reading the xml config. The parameter "preserveUrl" has been added.
2005-08-05 13:56:14 Davide Pozza

src/main/java/org/smartcrawler/common/Config.java v 1.7

src/main/java/org/smartcrawler/common/Context.java v 1.1

Config.java has been replaced by Context.java
2005-08-05 13:54:59 Davide Pozza

src/main/java/org/smartcrawler/common/ConfigReader.java v 1.11

The configuration has been made more dynamic: the persister and the retriever are now fully configurable
2005-08-05 13:53:57 Davide Pozza

src/main/java/org/smartcrawler/common/AbstractParametrizableComponent.java v 1.1

Abstract class extended by the objects that are built from the xml config and need to receive custom parameters at creation time
2005-08-05 13:51:57 Davide Pozza

src/main/java/org/smartcrawler/filter/AbstractFilter.java v 1.6

Removed; replaced by the newly created, more generic AbstractParametrizableComponent
2005-08-05 13:51:00 Davide Pozza

src/main/java/org/smartcrawler/filter/ContainedWordFilter.java v 1.2

src/main/java/org/smartcrawler/filter/ContentTypeLinkFilter.java v 1.4

src/main/java/org/smartcrawler/filter/DefaultLinkFilter.java v 1.6

src/main/java/org/smartcrawler/filter/FilterManager.java v 1.7

src/main/java/org/smartcrawler/filter/LinkFilter.java v 1.3

src/main/java/org/smartcrawler/filter/PostFilterLink.java v 1.4

src/main/java/org/smartcrawler/filter/PrecFilterLink.java v 1.4

Changed superclass to AbstractParametrizableComponent
2005-08-05 13:48:26 Davide Pozza

maven.xml v 1.6

Added examples dir to the dist build
2005-08-05 13:46:55 Davide Pozza

src/bin/conf/smartcrawler-config.xml v 1.3

Changed the structure of the xml configuration file
2005-08-05 13:45:47 Davide Pozza

src/bin/smartcrawler.bat v 1.3

Added property definition of custom extractionPatterns
2005-08-05 13:39:43 Davide Pozza

src/test/java/org/smartcrawler/filter/AbstractFilterTest.java v 1.1

src/test/java/org/smartcrawler/persistence/FileSystemPersisterTest.java v 1.6

src/test/java/org/smartcrawler/retriever/HttpCallRetrieverTest.java v 1.4

Changes due to the refactoring of the persister and the retriever
2005-08-05 13:38:04 Davide Pozza

src/test/resources/extractPatterns.xml v 1.1

src/test/resources/log4j.properties v 1.1

Added custom JUnit resources
2005-08-05 13:36:50 Davide Pozza

project.xml v 1.19

Configured JUnit resources dir
2005-08-05 13:35:03 Davide Pozza

examples/photosig/run.bat v 1.1

examples/photosig/run.sh v 1.1

examples/photosig/conf/extractPatterns.xml v 1.1

examples/photosig/conf/photosig-config.xml v 1.1

New crawling example
2005-08-05 13:35:54 Davide Pozza

examples/googleImages/conf/google_images-config.xml v 1.2

examples/nytRss/conf/nyt_rss-config.xml v 1.2

examples/others/only-html-config.xml v 1.2

examples/others/yellowPages-config.xml v 1.2

Changed the structure of the xml configuration file
2005-08-03 07:49:14 Davide Pozza

src/bin/conf/google_images-config.xml v 1.3

src/bin/conf/nyt_rss-config.xml v 1.3

src/bin/conf/only-html-config.xml v 1.3

src/bin/conf/yellowPages-config.xml v 1.2

Moved to and reorganized in the "examples" dir
2005-08-03 07:48:39 Davide Pozza

examples/run-example.bat v 1.1

examples/run-example.sh v 1.1

examples/googleImages/run.bat v 1.1

examples/googleImages/run.sh v 1.1

examples/googleImages/conf/extractPatterns.xml v 1.1

examples/googleImages/conf/google_images-config.xml v 1.1

examples/nytRss/run.bat v 1.1

examples/nytRss/run.sh v 1.1

examples/nytRss/conf/nyt_rss-config.xml v 1.1

examples/others/only-html-config.xml v 1.1

examples/others/yellowPages-config.xml v 1.1

New location and easier usage for examples
2005-07-29 10:37:21 Davide Pozza

src/main/java/org/smartcrawler/filter/LinkFilter.java v 1.2

Added parameter list handler with placeholders
2005-07-29 10:34:41 Davide Pozza

src/main/java/org/smartcrawler/filter/FilterManager.java v 1.6

Added check for null links
2005-07-29 10:33:41 Davide Pozza

src/main/java/org/smartcrawler/filter/AbstractFilter.java v 1.5

Added getParameters method
2005-07-29 10:31:33 Davide Pozza

src/main/java/org/smartcrawler/extractor/RegExpLinksExtractor.java v 1.7

Removed commented-out section
2005-07-29 10:30:48 Davide Pozza

src/main/java/org/smartcrawler/extractor/PatternProvider.java v 1.5

Extraction patterns made customizable/overwritable
2005-07-29 10:20:27 Davide Pozza

src/main/java/org/smartcrawler/extractor/HtmlURLImpl.java v 1.9

Added URL decoding to the "clean" method
2005-07-29 10:18:55 Davide Pozza

src/main/java/org/smartcrawler/examples/QuickTest.java v 1.7

Updated
2005-07-29 10:18:29 Davide Pozza

src/main/java/org/smartcrawler/common/SCLogger.java v 1.7

src/main/java/org/smartcrawler/common/SCLoggerFactory.java v 1.6

Code beautification
2005-07-29 10:17:25 Davide Pozza

src/main/java/org/smartcrawler/common/Config.java v 1.6

src/main/java/org/smartcrawler/common/ConfigReader.java v 1.10

Added single thread support
2005-07-29 10:15:59 Davide Pozza

src/main/java/org/smartcrawler/Crawler.java v 1.15

Added single thread support
2005-07-25 08:17:27 Davide Pozza

xdocs/index.xml v 1.2

Changed title
2005-07-25 08:12:26 Davide Pozza

src/main/java/org/smartcrawler/filter/ContainedWordFilter.java v 1.1

New experimental filter
2005-07-25 08:11:34 Davide Pozza

src/main/java/org/smartcrawler/filter/AbstractFilter.java v 1.4

Added license header
2005-07-24 20:06:23 Davide Pozza

src/main/java/org/smartcrawler/examples/QuickTest.java v 1.6

Updated
2005-07-24 20:06:01 Davide Pozza

src/bin/conf/yellowPages-config.xml v 1.1

New demo xml
2005-07-24 20:05:28 Davide Pozza

src/main/java/org/smartcrawler/extractor/HtmlURLImpl.java v 1.8

Added cleaning of encoded &
2005-07-24 20:04:36 Davide Pozza

src/main/java/org/smartcrawler/retriever/SmartGetMethod.java v 1.1

Experimental support for gzipped content
2005-07-24 20:03:11 Davide Pozza

src/main/java/org/smartcrawler/filter/LinkFilter.java v 1.1

Added generic PREC filter with parameter "link"
2005-07-24 20:01:42 Davide Pozza

src/main/java/org/smartcrawler/Crawler.java v 1.14

src/main/java/org/smartcrawler/retriever/HttpCall.java v 1.4

src/main/java/org/smartcrawler/retriever/HttpCallRetriever.java v 1.4

Working on new config option "useMultiThread"
2005-07-24 19:58:33 Davide Pozza

project.xml v 1.18

Updated to httpClient 3.0-rc3
2005-07-24 19:57:28 Davide Pozza

src/main/java/org/smartcrawler/common/ConfigReader.java v 1.9

Fixed reader bug
2005-07-15 13:12:22 Davide Pozza

xdocs/index.xml v 1.1

xdocs/navigation.xml v 1.3

xdocs/start/configuring.xml v 1.3

xdocs/start/quick-start.xml v 1.4

xdocs/start/samples.xml v 1.1

Text updates
2005-07-14 14:38:04 Davide Pozza

xdocs/navigation.xml v 1.2

Changed menu item order and labels
2005-07-14 14:37:39 Davide Pozza

xdocs/start/configuring.xml v 1.2

xdocs/start/quick-start.xml v 1.3

Updated manual
2005-07-14 14:36:46 Davide Pozza

project.xml v 1.17

Added new version
2005-07-14 14:10:48 Davide Pozza

src/bin/smartcrawler.sh v 1.5

Re-added in order to preserve the execution flag
2005-07-14 14:09:39 Davide Pozza

src/bin/smartcrawler.sh v 1.4

Removed in order to change the execution flag
2005-07-14 14:07:54 Davide Pozza

src/bin/smartcrawler.sh v 1.3

*** empty log message ***
2005-07-14 13:50:25 Davide Pozza

src/bin/smartcrawler.bat v 1.2

src/bin/smartcrawler.sh v 1.2

Fixed script name
2005-07-14 13:42:29 Davide Pozza

src/main/java/org/smartcrawler/Crawler.java v 1.13

src/main/java/org/smartcrawler/common/ConfigReader.java v 1.8

src/main/java/org/smartcrawler/common/Link.java v 1.7

src/main/java/org/smartcrawler/common/SiteConfiguration.java v 1.4

src/main/java/org/smartcrawler/examples/QuickTest.java v 1.5

src/main/java/org/smartcrawler/extractor/MimeTypeTranslator.java v 1.5

src/main/java/org/smartcrawler/extractor/PatternProvider.java v 1.4

src/main/java/org/smartcrawler/extractor/pattern/AbstractPattern.java v 1.4

src/main/java/org/smartcrawler/extractor/pattern/ConcretePattern.java v 1.4

src/main/java/org/smartcrawler/filter/AbstractFilter.java v 1.3

src/main/java/org/smartcrawler/persistence/FileSystemPersister.java v 1.13

src/main/java/org/smartcrawler/retriever/Call.java v 1.2

src/test/java/org/smartcrawler/common/LinkTest.java v 1.4

src/test/java/org/smartcrawler/extractor/MimeTypeTranslatorTest.java v 1.2

src/test/java/org/smartcrawler/extractor/PatternProviderTest.java v 1.3

src/test/java/org/smartcrawler/persistence/FileSystemPersisterTest.java v 1.5

src/test/java/org/smartcrawler/retriever/HttpCallRetrieverTest.java v 1.3

Code beautification
2005-07-14 13:38:44 Davide Pozza

src/bin/run.bat v 1.3

src/bin/run.sh v 1.4

src/bin/smartcrawler.bat v 1.1

src/bin/smartcrawler.sh v 1.1

Files renamed
2005-07-14 13:20:10 Davide Pozza

checkstyle.xml v 1.4

maven.xml v 1.5

project.xml v 1.16

src/bin/conf/google_images-config.xml v 1.2

src/bin/conf/nyt_rss-config.xml v 1.2

src/bin/conf/only-html-config.xml v 1.2

src/bin/conf/smartcrawler-config.xml v 1.2

Fixed license
2005-07-14 09:19:53 Davide Pozza

src/bin/run.bat v 1.2

src/bin/run.sh v 1.3

src/main/java/org/smartcrawler/Crawler.java v 1.12

Path fix
2005-07-13 15:57:19 Davide Pozza

src/bin/run.sh v 1.2

Fixed for $SMARTCRAWLER_HOME support
2005-07-13 15:18:29 Davide Pozza

src/main/java/org/smartcrawler/persistence/FileSystemPersister.java v 1.12

Activated renaming system by checking the content type
2005-07-13 15:17:28 Davide Pozza

src/main/java/org/smartcrawler/extractor/MimeTypeTranslator.java v 1.4

Added all the MIME types
2005-07-13 15:16:50 Davide Pozza

src/main/java/org/smartcrawler/Crawler.java v 1.11

Handle property smartcrawler.home
2005-07-13 15:15:22 Davide Pozza

src/test/java/org/smartcrawler/persistence/FileSystemPersisterTest.java v 1.4

Added testLinkToFilePathWithExtension
2005-07-13 15:13:57 Davide Pozza

src/test/java/org/smartcrawler/extractor/MimeTypeTranslatorTest.java v 1.1

Added test case
2005-07-13 15:12:27 Davide Pozza

maven.xml v 1.4

src/bin/cpappend.bat v 1.1

src/bin/run.bat v 1.1

src/bin/run.sh v 1.1

src/bin/conf/google_images-config.xml v 1.1

src/bin/conf/nyt_rss-config.xml v 1.1

src/bin/conf/only-html-config.xml v 1.1

src/bin/conf/smartcrawler-config.xml v 1.1

Moved startup scripts and configuration files to src/bin
2005-07-09 16:30:38 Davide Pozza

checkstyle.xml v 1.3

maven.xml v 1.3

project.xml v 1.15

src/main/java/org/smartcrawler/extractor/MimeTypeTranslator.java v 1.3

src/main/java/org/smartcrawler/extractor/PatternProvider.java v 1.3

src/main/java/org/smartcrawler/extractor/UnhandledMimeTypeException.java v 1.3

src/main/java/org/smartcrawler/extractor/pattern/ConcretePattern.java v 1.3

src/main/resources/extractPatterns.xml v 1.2

src/test/java/org/smartcrawler/common/LinkTest.java v 1.3

src/test/java/org/smartcrawler/extractor/PatternProviderTest.java v 1.2

src/test/java/org/smartcrawler/persistence/FileSystemPersisterTest.java v 1.3

Fixed license text
2005-07-09 15:56:53 Davide Pozza

src/main/java/org/smartcrawler/extractor/PatternProvider.java v 1.2

src/main/java/org/smartcrawler/extractor/pattern/ConcretePattern.java v 1.2

Removed trailing spaces
2005-07-09 15:53:16 Davide Pozza

src/test/java/org/smartcrawler/extractor/PatternProviderTest.java v 1.1

Test case for the new pattern provider, which uses an xml file instead of a separate class for every pattern
2005-07-09 15:49:20 Davide Pozza

project.xml v 1.14

Modified version id
2005-07-09 15:48:31 Davide Pozza

src/main/java/org/smartcrawler/examples/QuickTest.java v 1.4

src/main/java/org/smartcrawler/extractor/PatternProvider.java v 1.1

src/main/java/org/smartcrawler/extractor/RegExpLinksExtractor.java v 1.6

src/main/java/org/smartcrawler/extractor/pattern/AbstractPattern.java v 1.3

src/main/java/org/smartcrawler/extractor/pattern/ConcretePattern.java v 1.1

src/main/resources/extractPatterns.xml v 1.1

Implemented the new pattern provider, which uses an xml file instead of a separate class for every pattern
2005-07-09 15:44:31 Davide Pozza

src/main/java/org/smartcrawler/extractor/pattern/AnchorExtrPattern.java v 1.3

src/main/java/org/smartcrawler/extractor/pattern/AreaExtrPattern.java v 1.3

src/main/java/org/smartcrawler/extractor/pattern/ImgExtrPattern.java v 1.3

src/main/java/org/smartcrawler/extractor/pattern/KnownExtensionsPattern.java v 1.3

src/main/java/org/smartcrawler/extractor/pattern/LinkExtrPattern.java v 1.3

src/main/java/org/smartcrawler/extractor/pattern/MetaExtrPattern.java v 1.3

src/main/java/org/smartcrawler/extractor/pattern/ScriptExtrPattern.java v 1.3

src/main/java/org/smartcrawler/extractor/pattern/StyleExtrPattern.java v 1.3

Removed because of the new xml pattern configuration system