Archiving the Web

A slight detour from the usual reviews to have a little chat about something I learned recently about archiving that I found rather interesting. And not just archiving books: for about ten years or so now, people have been trying to archive the web. For something that can exist so briefly, it certainly seems like a strange and vast task, but there is a reason. The ephemeral nature of the web means that sites that were there last week may not be there the next. Domains change, pages are deleted, links are removed, and information is lost.

I mean, this blog post is a new webpage, one that did not exist yesterday and one that may not exist in the future if it gets deleted. If it is not archived, then it is gone forever. I assure you my quaint, recently neglected little website is not going to be up for archiving any time soon, but in the tumultuous degree I insist on partaking in, I am rather amused to find I am actually learning something rather interesting. Actually, properly fascinating information about libraries, and books, and the internet, and the fact that technology is moving so fast that everything is practically obsolete before we have had a chance to appreciate it.

Something I learned about quite recently whilst researching for one of my assignments is that the Domesday Book of 1086, commissioned by William the Conqueror, still exists nearly 1000 years after it was created, as we all expected it should. It is a famous book that was crucial in documenting what is now Britain…or just England. I can’t actually remember, and the Horrible Histories episode that discusses it escapes me just now. But the fact is it still exists, as do the few copies that were made of it in order to preserve the original.

What is more fascinating, however, is that on the 900th anniversary of the Domesday Book, in 1986, the BBC in London ran a Domesday Project. They made a pair of interactive videodiscs to celebrate the anniversary and to try to capture what Britain was like at the time. Yes, videodiscs, not even a beta tape, a floppy, a video tape, or a…some other kind of technology I can’t recall from the 80s. So essentially this time capsule was not that effective, because while the videodiscs will remain readable for many more years, the computers that read them, and the software they used, did not last. They couldn’t even be viewed much in the 80s because the technology was expensive and therefore rare. There are a few working machines left that can read them, but there is the fear that the 1986 Domesday Project will soon be lost. I mean, they tried so hard and it just did not work. They are, however, rushing to try and retrieve this information, but it is just sadly beautiful that it did not work.

That is a key issue in archiving anything (and why books are triumphant if looked after correctly): there needs to be software and technology to be able to access the information. The same goes for everything. I recall an episode of South Park where Cartman freezes himself because he is too impatient to wait for the new Wii. Instead he ends up too far in the future, where he can get the Xbox, but there is no way of playing on it any more as the technology and power sources do not exist.

Books are very good at being archived, but space issues, and the fact that people tend to destroy them, intentionally and unintentionally, are a problem. We had a lecture a few weeks ago about all the risks and occupational hazards books are prone to from being in a library. The basics were those delightful little sleeves that were glued in the front or back of the book for the borrowing card, as well as library stamps and the Dewey number scribbled in there, and that is all before the book is unleashed on the public. So all of this, plus people damaging the books as they read them, leaves books tainted and can make it hard to preserve them.

The lecture also went on a little bit about theft, and how you can conserve and archive when people keep stealing books. My lecturer was making books sound like high-end art deals, the way he was discussing the lengths people had gone to in order to get rare maps or rare books from collections. You only have to see the care people take with white gloves and airtight seals to realise books are precious, but at the same time your mind just goes to that one hardcover book your library has that has been “repaired” by the staff with masking tape to keep the spine on, and the one with the scribbles and underlined paragraphs in thick black pen that has dug into the page so deeply it pops out on the other side, and random stains, so many random stains. Then you cannot imagine anybody secretly using a pen knife to extract anything from books and selling it on the black market as a high-priced item. But it happens.

But as for the web, it has become such a big deal, and everyone upon everyone is using it, not just for trivial things like social interactions. There are historically and culturally significant things being published on the web. Information about the events of September 11, or about the Olympics, or research discoveries, or any other cultural or historical thing that happens would probably begin online, and this must be preserved or future researchers will have nothing to look back on, because the web pages will all have been lost, unpreserved, taking all the information with them. No longer are books the only thing being written about events (well, they are still written, but there is a lot more on the web), and anybody and everybody is contributing.

You never really think about the web as having a lot of importance. There are fifteen different websites for the same song lyrics, there are strange things on Yahoo Answers, and there is a website dedicated just to popping virtual bubble wrap. But there are also other websites out there that discuss evolution, dinosaurs, scientific discoveries old and new, religion, the history of the world, how colonies were formed, how kingdoms were created and lost, and how governments rose and collapsed, and all of these are important and will remain important for a long time. They show us where we have come from, and they do so in an easily accessible manner. They are important because they hold information that captures the culture of the time: what beliefs were, where technology was at, and what we were thinking of as a society.

And the way to keep this information long after the fact is to archive and preserve these sites in ways that keep both the intellectual information and the context and structure of a site if necessary, in a form that can be accessed easily in the future. We have been preserving books for almost as long as they have been around, and now we are recognising that the web has the same potential and are trying to capture it before it disappears. And I just thought all of that was really interesting and wanted to share that with you, even if it isn’t as eloquent and academic as it could be. Who knew the web was such a complex little thing?

1 Comment

  1. allvce
    Nov 26, 2013 @ 16:09:38

    I remember reading something about data mortality a while ago as well that made the point about books also having this happen. It takes a lot longer, but the examples given were lost languages like Latin, and difficulties reading handwriting. I found this really interesting as I sometimes struggle to comprehend handwriting from two or three generations back.
    Lovely article, by the way. Very thought provoking.


