Hi,
Are you referring to the issue that the content might have changed since
the URL was crawled? For the sake of comparability, please use the
version included in the content block of the WARC record. (Or do you have
a strong preference for recrawling?)
Thanks,
Peter
--- In billiontriples@yahoogroups.com, "huanxuezhou" <huanxuezhou@...>
wrote:
>
> Hi Peter, one thing confuses me. A WARC record consists of a subject URI
> and a content block. Since the content in the content block sometimes
> differs from what the URI currently returns, which content should we use?
>