
Wednesday, May 22, 2024

A virtual black hole

The Independent reports on a new study suggesting that the internet is slowly disappearing down a virtual black hole as web pages and online content are lost.

The web is often thought of as a place where content lasts forever. But vast swathes of it are being lost as pages are deleted or moved, according to new research.

Of the webpages that existed in 2013, for instance, 38 per cent are now lost. Even newer pages are disappearing: 8 per cent of pages that existed in 2023 are no longer available.

Those pages tend to disappear when they are deleted or moved. That happens on otherwise functional websites, the study from the Pew Research Center indicated, rather than happening when whole websites disappear.

The effect means that vast amounts of news and important reference content are disappearing. Some 23 per cent of news pages include at least one broken link, and 21 per cent of government websites, it said – and 54 per cent of Wikipedia pages include a link in their references that no longer exists.

Much the same effect is happening on social media. A fifth of tweets disappear from the site within months of being posted.

The study was completed by gathering a random sample of almost a million webpages, taken from Common Crawl, a service that archives parts of the internet. Researchers then looked to see whether those pages continued to exist between 2013 and 2023.

It found that 25 per cent of all pages collected between 2013 and 2023 were no longer available. Of those, 16 per cent of pages came from a website that continues to exist, while 9 per cent were located on websites that no longer exist at all.

This is one of the reasons why I tend to quote at length on this blog rather than rely on hotlinks. There are over 20 years of posts here and the further you go back, the more likely it is that the link to a particular story is broken.
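
For anyone who wants to see how far the rot has spread through their own archive, a rough check along the following lines might do. It is only a sketch using the Python standard library, not the method used in the Pew study: the function names are made up for illustration, and counting any 4xx or 5xx response, or no response at all, as "broken" is an assumption of mine. Some servers refuse HEAD requests, so a link flagged here may still work in a browser.

```python
import http.client
import urllib.error
import urllib.request


def is_broken(status):
    """Assumption: no response, or any 4xx/5xx status, counts as broken."""
    return status is None or status >= 400


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrives."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error such as 404 or 410.
        return error.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, refused connection, timeout or malformed URL.
        return None


def broken_share(statuses):
    """Percentage of the given status codes that count as broken."""
    if not statuses:
        return 0.0
    broken = sum(1 for status in statuses if is_broken(status))
    return 100.0 * broken / len(statuses)
```

Given a list of links pulled from old posts (call it `urls`, which is hypothetical here), `broken_share([fetch_status(u) for u in urls])` gives the percentage that no longer resolve.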

