The Web ARChive (WARC) archive format specifies a method for combining multiple digital resources into an aggregate archive file together with related information. The WARC format is a revision of the Internet Archive's ARC File Format,[4] which has traditionally been used to store "web crawls" as sequences of content blocks harvested from the World Wide Web. The WARC format generalizes the older format to better support the harvesting, access, and exchange needs of archiving organizations. Besides the primary content currently recorded, the revision accommodates related secondary content, such as assigned metadata, abbreviated duplicate detection events, and later-date transformations.[5] WARC is now recognised by most national library systems as the standard to follow for web archiving.[6]
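To make the idea of an "aggregate archive file" more concrete, the following is a minimal sketch of the WARC/1.0 record layout: a version line, named header fields, a blank line, the content block, and two CRLFs ending the record. Records are simply concatenated, so a primary `response` record and a secondary `metadata` record can share one file. In the format, duplicate detection events and later transformations correspond to the `revisit` and `conversion` record types. The helper names `make_warc_record` and `read_warc_types` are hypothetical and are used only for illustration. The sketch deliberately omits optional fields such as payload digests, as well as record segmentation and per-record compression, so it should not be taken as a complete implementation.

```python
import uuid
from datetime import datetime, timezone


def make_warc_record(warc_type: str, target_uri: str, block: bytes,
                     content_type: str) -> bytes:
    """Hypothetical helper: serialize one simplified WARC/1.0 record."""
    headers = [
        ("WARC-Type", warc_type),                        # e.g. response, metadata
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(block))),             # length of the block in bytes
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # The content block follows the blank line; two CRLFs close the record.
    return head.encode("utf-8") + block + b"\r\n\r\n"


def read_warc_types(data: bytes) -> list[str]:
    """Hypothetical helper: walk concatenated records and list their WARC-Type."""
    types = []
    pos = 0
    while pos < len(data):
        header_end = data.index(b"\r\n\r\n", pos)
        lines = data[pos:header_end].decode("utf-8").split("\r\n")
        fields = dict(line.split(": ", 1) for line in lines[1:])  # skip "WARC/1.0"
        types.append(fields["WARC-Type"])
        # Content-Length lets the reader skip the block even if it contains blank lines.
        pos = header_end + 4 + int(fields["Content-Length"]) + 4
    return types


if __name__ == "__main__":
    resp = make_warc_record("response", "http://example.com/",
                            b"HTTP/1.1 200 OK\r\n\r\nhello",
                            "application/http; msgtype=response")
    meta = make_warc_record("metadata", "http://example.com/",
                            b"via: http://example.org/\r\n",
                            "application/warc-fields")
    print(read_warc_types(resp + meta))
```

The reader relies on `Content-Length` rather than searching for blank lines. This matters because the captured HTTP response in the example itself contains a blank line between its headers and body.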