Scraping is usually referred to as screen scraping or, more precisely, "web scraping": website content is extracted, copied and saved, either manually or with the help of software, and can then be reused on one's own website in a different design.
Used legitimately, web scraping offers the opportunity to enrich a website with content from other sites. Used improperly, it violates copyright and is treated as spam.
Scraping can be carried out with various techniques; the most common ones are briefly presented here:
With HTTP manipulation, content from static or dynamic websites is copied via HTTP requests. In the process known as "data mining", individual pieces of content are identified based on the templates and scripts in which they are embedded. The content is then converted by a wrapper and made available to another website; the wrapper acts as a kind of interface between the two systems.
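For illustration, here is a minimal sketch of this request-and-wrapper pattern in Python, assuming the third-party requests library and a placeholder URL; the "wrapper" simply lifts the page title out of the raw markup so another system can use it without knowing the source page's layout.

```python
import requests


def fetch_page(url: str) -> str:
    """Fetch raw HTML from a static or dynamic page via a plain HTTP GET."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def wrap_title(html: str) -> dict:
    """Minimal 'wrapper': extract one field from the raw markup and expose it
    in a neutral format that another system can consume."""
    start = html.find("<title>")
    end = html.find("</title>")
    if start == -1 or end == -1:
        return {"title": ""}
    return {"title": html[start + len("<title>"):end].strip()}


if __name__ == "__main__":
    # example.com is a placeholder URL for this sketch
    page = fetch_page("https://example.com")
    print(wrap_title(page))
```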
Google scraping tools perform a variety of scraping tasks, both automated and manually controlled. The spectrum ranges from copying content to copying entire structures or functionalities.
HTML parsers, such as those used in browsers, pull data from other websites and convert it for other purposes.
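A small sketch of this idea using Python's built-in html.parser module; the parser class, the link-collecting logic and the sample markup are illustrative only.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect every href from a page, illustrating how an HTML parser walks
    the markup structure instead of matching raw text."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


parser = LinkExtractor()
parser.feed('<p>See <a href="https://example.com/a">A</a> and <a href="/b">B</a>.</p>')
print(parser.links)  # ['https://example.com/a', '/b']
```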
Copying content manually is also often referred to as scraping. The spectrum ranges from simply copying text to copying entire source code snippets. Manual scraping is often used when automated scraping programs are blocked, for example via robots.txt.
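Whether automated access is permitted can be checked against robots.txt before fetching; the following sketch uses Python's standard urllib.robotparser, with example.com and the bot name as placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder site; replace with the host you actually want to check.
robots = RobotFileParser("https://example.com/robots.txt")
robots.read()

# can_fetch() tells a well-behaved bot whether the given user agent
# may request the URL according to the site's robots.txt rules.
allowed = robots.can_fetch("MyScraperBot", "https://example.com/private/page.html")
print("fetch allowed:", allowed)
```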
Reading microformats is also part of scraping. With the ongoing development of the semantic web, microformats have become popular components of websites.
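As an illustration, reading the h-card microformat could look like the following sketch, assuming the third-party BeautifulSoup library and fabricated markup.

```python
from bs4 import BeautifulSoup  # third-party library, assumed available for this sketch

# Fabricated snippet using the h-card microformat (class-based markup).
html = """
<div class="h-card">
  <span class="p-name">Jane Doe</span>
  <a class="u-url" href="https://example.com/jane">Profile</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
for card in soup.find_all(class_="h-card"):
    name = card.find(class_="p-name")
    url = card.find(class_="u-url")
    print(name.get_text(strip=True) if name else None,
          url["href"] if url else None)
# -> Jane Doe https://example.com/jane
```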