Google is developing technology that could position it to compete with a new breed of digital media companies that are generating story ideas for the internet by mining online search data for under-covered topics.
Source: Google shadow over new media groups
A single data-mining engineer can easily identify similar content (scraped or crawled, then remixed by someone else). It’s not rocket science for computer science graduates. If Google releases that “new system”, it will be good for those of us bloggers who create unique content. But such a system could be really difficult to implement. Why?
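For readers curious how “identify similar content” might work, here is a minimal sketch of one classic near-duplicate technique: word shingling with Jaccard similarity. It is only an illustration, not Google’s actual method. The function names, the shingle size `k=3` and the `0.5` threshold are assumptions chosen for the example.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.5) -> bool:
    """Flag two texts as near-duplicates when their shingle sets overlap enough."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


original = "Google is developing technology to find under-covered topics in search data."
remixed = "Google is developing technology to find under-covered topics in online search data."
print(is_near_duplicate(original, remixed))  # True
```

Adding one word (“online”) only breaks a few shingles, so the remixed copy still shares 8 of 13 distinct shingles with the original. That is why a light remix does not hide a scraped article from this kind of check.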
Let’s say I create a new site with new content. Then along comes a kid who scrapes my content and posts it on his old autoblog or spam site. What do you think will happen? The old site, which is already indexed by Google, will be seen as the “first poster” because Google found the article there first. But Google has two options it could add to its system, and I’m sure it will. The process could look like this (it’s a little difficult to explain, but I’ll try my best):
–> Google finds the copied article on the autoblog
–> The owner of the article has added a backlink pointing to another part of his own site (and the autoblog copies that link along with the article)
–> Google now has to check whether the site is an autoblog (which autobloggers can be really good at hiding)
–> Following the link in the article, Google may find that the new site has the same article and “think” that the new blog is the real owner of that article.
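Here is a rough sketch of the four steps above. Everything in it is hypothetical: the `Page` record, the `guess_original` function and the `is_autoblog` classifier passed in as a parameter are made up for illustration. The normalized exact-text comparison also stands in for a real near-duplicate check. This is not how Google actually works, just one way to write the idea down.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Page:
    """Hypothetical crawl record for one indexed page."""
    url: str
    text: str
    links: list[str] = field(default_factory=list)  # links inside the article

    @property
    def site(self) -> str:
        return urlparse(self.url).netloc


def same_article(a: str, b: str) -> bool:
    # Stand-in for a real near-duplicate check: normalized exact match.
    return a.strip().lower() == b.strip().lower()


def guess_original(copy: Page, pages_by_site: dict[str, list[Page]],
                   is_autoblog: Callable[[str], bool]) -> Page:
    """Guess who really wrote `copy`; `copy` is the first page Google found."""
    # Step 3: only doubt the "first poster" if its site looks like an autoblog.
    if not is_autoblog(copy.site):
        return copy
    # Step 4: follow every link that leads to a different site, then scan
    # that site's pages for the same article.
    for link in copy.links:
        linked_site = urlparse(link).netloc
        if linked_site == copy.site:
            continue
        for page in pages_by_site.get(linked_site, []):  # cost grows with site size
            if same_article(page.text, copy.text):
                return page
    # No usable link (never added, or removed by the autoblogger):
    # keep the first poster.
    return copy


autoblog = Page("http://old-autoblog.example/post-1", "My unique article.",
                links=["http://new-blog.example/about"])
mine = Page("http://new-blog.example/my-article", "My unique article.")
pages_by_site = {"new-blog.example": [Page("http://new-blog.example/about", "About me."), mine]}
result = guess_original(autoblog, pages_by_site,
                        is_autoblog=lambda site: site == "old-autoblog.example")
print(result.url)  # http://new-blog.example/my-article
```

Notice the inner loop over `pages_by_site`: the bigger the linked site, the more pages have to be compared. And when the copied article carries no link, the function can only fall back to the first poster.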
But there is a real problem with this scenario if the new site has, let’s say, 1,000 pages of content: Google would have to scan all the content on that new site, and that takes a lot of time. And what about webmasters who don’t add links to their content? (There are millions who don’t.) What if the autoblog owner edits the content and removes the link from the article? It will be really difficult for Google to develop an effective system like this. What do you think about this?
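One way a search engine could reduce the cost of scanning a 1,000-page site is to fingerprint every page once, at crawl time, and keep an inverted index from fingerprint to URL. A lookup then only touches pages that share fingerprints with the article. This is a minimal sketch under that assumption, with a hypothetical `FingerprintIndex` class, shingle size `k=5` and overlap threshold `0.5`. It is not a known Google system, and it does nothing about missing or removed links.

```python
import hashlib
import re
from collections import defaultdict


def fingerprints(text: str, k: int = 5) -> set[str]:
    """Hash every k-word shingle of `text` into a 16-hex-digit fingerprint."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    grams = [" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))]
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:16] for g in grams}


class FingerprintIndex:
    """Hypothetical inverted index: fingerprint -> URLs of pages containing it."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        # Runs once per page at crawl time, not on every lookup.
        for fp in fingerprints(text):
            self._postings[fp].add(url)

    def candidates(self, text: str, min_overlap: float = 0.5) -> list[str]:
        """Return URLs sharing at least `min_overlap` of the text's fingerprints."""
        query = fingerprints(text)
        counts: defaultdict[str, int] = defaultdict(int)
        for fp in query:
            for url in self._postings.get(fp, ()):
                counts[url] += 1
        return sorted(url for url, n in counts.items() if n / len(query) >= min_overlap)


article = "Google is developing technology that could compete with new media groups."
index = FingerprintIndex()
index.add("http://new-blog.example/about", "About me and this blog about search engines.")
index.add("http://new-blog.example/my-article", article)
# The rest of the site's pages would be added the same way when crawled.
print(index.candidates(article))  # ['http://new-blog.example/my-article']
```

The trade-off is storage: the index has to hold fingerprints for every crawled page, in exchange for lookups whose cost depends on the article’s length rather than the size of the site.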



