One approach that is commonly mentioned in this thread is to simulate the behavior of a normal user as closely as possible, for instance by rendering the full page (including JS, CSS, ...), which is far more resource intensive than just downloading the raw HTML.
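
As a minimal sketch of what that looks like in practice, here is full-page rendering with Playwright (the URL is a placeholder; any headless browser would do):

    # Render the page like a real browser would: execute JS, fetch CSS,
    # wait for network activity to settle. Much heavier than a bare GET.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://example.com/listing", wait_until="networkidle")
        html = page.content()  # the fully rendered DOM, not just the source
        browser.close()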
However, if you're crawling big platforms, there are often ways in that scale and stay undetected for very long periods of time. These include forgotten API endpoints that were built for some application that was later abandoned, mobile interfaces that tap into different endpoints, and obscure platform-specific clients (e.g. PlayStation or some old version of Android). The older and larger the platform, the more likely it is to have many entry points that are policed lightly or not at all.
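
A purely hypothetical sketch of tapping such an endpoint (the path, parameters, and user-agent below are invented; in practice you'd discover the real ones by inspecting the app's traffic with a proxy like mitmproxy):

    # Many sites expose a JSON API for their mobile apps that returns the
    # same data as the HTML pages with far less overhead and scrutiny.
    import requests

    resp = requests.get(
        "https://api.example.com/v2/items",      # placeholder endpoint
        params={"page": 1, "per_page": 50},
        headers={"User-Agent": "ExampleApp/3.1 (Android 6.0)"},  # mimic the old client
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()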
One of the most important rules of scraping is to be patient. Everyone is anxious to get going as soon as they can, but once you start pounding on a website and draining its resources, it will take measures against you and the whole task will get much more complicated. If you have the patience to stay within some limits (hard to guess from the outside), you will eventually be able to amass large datasets.
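
A rough sketch of that discipline: randomized delays between requests and a hard back-off on rate-limit responses (the numbers are arbitrary, since the real limits are unknown from the outside):

    import random
    import time

    import requests

    def fetch_politely(urls, base_delay=10.0):
        for url in urls:
            resp = requests.get(url, timeout=30)
            if resp.status_code in (429, 503):
                # We hit a limit: back off for a long while, don't retry fast.
                time.sleep(15 * 60)
                continue
            yield url, resp.text
            # Randomized gap so the traffic doesn't look like a metronome.
            time.sleep(base_delay + random.uniform(0, base_delay))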