How does Google parse webpages?
When we Google something, it returns documents. As I understand it, those
documents are HTML pages laden with tags. From my parsing experience, how
well structured an HTML page is can vary hugely: some pages are designed
well, with every div identified in a structured way, and others are just a
mess. With the millions of documents that Google indexes, how does it
extract the relevant body of text and present us with the opening part of
each document's text?
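
To make the question concrete, here is a minimal sketch of one naive way to do this kind of extraction. It is not Google's method, which is not described here; it only assumes that "relevant body text" can be approximated by dropping script, style and navigation elements and keeping text blocks above a word-count threshold. The names `TextBlockCollector` and `extract_snippet`, the tag lists, and the `min_words` and `max_chars` defaults are all hypothetical choices made for illustration. It uses Python's standard `html.parser`, which is event-based and so does not require the markup to be well nested.

```python
from html.parser import HTMLParser

# Assumed, non-exhaustive list of elements whose text is treated as non-content.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}
# Assumed list of elements that end the current text block and start a new one.
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section", "h1", "h2", "h3", "br"}


class TextBlockCollector(HTMLParser):
    """Collects visible text in blocks without building a document tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []      # finished text blocks, in document order
        self._current = []    # text pieces of the block being built
        self._skip_depth = 0  # greater than 0 while inside a SKIP_TAGS element

    def _flush(self):
        # Join the pieces and collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._current).split())
        if text:
            self.blocks.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        if tag in BLOCK_TAGS or tag in SKIP_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        if tag in BLOCK_TAGS or tag in SKIP_TAGS:
            self._flush()

    def handle_data(self, data):
        # Text inside skipped elements is ignored entirely.
        if self._skip_depth == 0:
            self._current.append(data)

    def close(self):
        super().close()
        self._flush()  # keep any text left after the last tag


def extract_snippet(html, min_words=8, max_chars=160):
    """Return the start of the page's longer text blocks (a crude heuristic)."""
    parser = TextBlockCollector()
    parser.feed(html)
    parser.close()
    # Short blocks are assumed to be menus, link bars or labels, not body text.
    body = [b for b in parser.blocks if len(b.split()) >= min_words]
    text = " ".join(body)
    if len(text) <= max_chars:
        return text
    # Cut at the last word boundary before the limit.
    return text[:max_chars].rsplit(" ", 1)[0] + "..."
```

Calling `extract_snippet(page_source)` on a page's HTML string returns the opening of whatever text survives the filter. The word-count threshold is the weak point of this sketch: it will drop short but important paragraphs and keep long boilerplate, and an unclosed `nav` or `footer` tag would cause everything after it to be skipped.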