Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites.
Saw this, and a reply from Perplexity on their blog essentially said “cos the user asked us to find the information, we do it on behalf of the user and therefore robots.txt doesn’t apply”
It is different to how Google crawls and makes a database of info, but… Not sure how I feel. It’s a greenfield out there.
There’s no question about “how to feel.”
If the user wants information, they can seek it out themselves. No bots means no bots.
“Themselves” define that. Can I use Python requests?
No, the point of it is only live interactive browsing.
The closest thing would be lynx; anything less than that should respect robots.txt.
Of course as a single user, you don’t really have an impact and no one cares if you decide to ignore it, but once you are talking about automated systems…
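For what it’s worth, if you do script it with something like Python requests, checking robots.txt first is only a few lines with the standard library. A rough sketch, not anyone’s official recommendation: the robots.txt content, the `ExampleBot` name, and the `is_allowed` helper are all made up for illustration, and a real client would download the site’s actual robots.txt before checking.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, hard-coded so the sketch runs offline.
# A real client would fetch it from the site's /robots.txt instead.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleBot", "python-requests/2.31"):
        for url in ("https://example.com/page", "https://example.com/private/x"):
            print(agent, url, is_allowed(EXAMPLE_ROBOTS_TXT, agent, url))
```

The point being that the check keys off whatever user agent you announce, so it only means anything if the client identifies itself honestly.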