In this tutorial we will learn how to do web scraping with PHP alone and with cURL and PHP, and we will see what purposes it can serve.
For those unfamiliar with the term, web scraping is a technique used to extract information from websites. With a few lines of code you can retrieve the source of a web page as it appears in the browser, store it in a database, display it identically at your own URL, or extract only the important information from the code, among other uses.
Web scraping with PHP
To follow the steps in this tutorial we must first have XAMPP installed on our Windows operating system. You can use other environments, including other operating systems; you just need to make sure you have PHP available.
For those who have not yet installed XAMPP on Windows, you can follow this tutorial: https://devcode.la/tutoriales/instalar-xampp-en-windows-7/ .
Then, in the XAMPP Control Panel, we activate the Apache module.
Next, in the htdocs folder located inside the XAMPP folder on drive C, we create a webscraping folder, inside which we create the index.php file with the following code:
<?php
$html = file_get_contents('https://devcode.la/'); // Converts the URL's contents into a string
echo $html;
?>
Once the index.php file is created with this code, we open the following URL in our browser: http://localhost/webscraping/ , which is where the result of the previous code is displayed.
We can see that it is an identical copy of the https://devcode.la website. Let's review the PHP code to understand what happened.
The variable $html stores the result of the file_get_contents function, which reads the contents of a file into a string; in this case the "file" is the web page https://devcode.la .
We then display this string in our document using echo.
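As mentioned earlier, scraping often means extracting only the important information rather than echoing the whole page. Below is a minimal sketch of that idea using PHP's built-in DOMDocument class. The inline HTML string is invented for illustration; in practice you would pass it the string returned by file_get_contents:

```php
<?php
// Invented stand-in for the HTML that file_get_contents() would return
$html = '<html><head><title>DevCode</title></head><body><h1>Tutorials</h1></body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings caused by imperfect real-world markup

// Extract only the pieces we care about instead of the whole page
$title = $doc->getElementsByTagName('title')->item(0)->textContent;
$h1    = $doc->getElementsByTagName('h1')->item(0)->textContent;

echo $title . "\n"; // DevCode
echo $h1 . "\n";    // Tutorials
?>
```

The same approach works on a full scraped page: load the string into DOMDocument, then query only the tags you need.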
Web Scraping with cURL and PHP
Now let's see how we can do web scraping using the cURL library. We create a curl.php file in the /webscraping folder created above and write the following code:
<?php
// Define the curl function
function curl($url) {
    $ch = curl_init($url); // Initialize a cURL session
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Tell cURL to return the result as a string
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Tell cURL not to verify the peer certificate, since our URL uses the HTTPS protocol
    $info = curl_exec($ch); // Execute the cURL session and assign the result to $info
    curl_close($ch); // Close the cURL session
    return $info; // Return the result from the function
}
$sitioWeb = curl("https://devcode.la"); // Run the curl function to scrape https://devcode.la and store the result in $sitioWeb
echo $sitioWeb;
?>
The comments in the code explain each line. To sum up: we first create a function called curl (this could also be done directly, without a function); inside it we start a session with curl_init and then make a couple of configurations: CURLOPT_RETURNTRANSFER lets us use the result as a string, and CURLOPT_SSL_VERIFYPEER lets cURL work with our URL even though it uses the HTTPS protocol.
Finally, we use the curl function we created to display our scraped website.
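One advantage of cURL over file_get_contents is how configurable it is. As a hedged sketch, here is a variant of the helper with a few extra options that are commonly useful when scraping; the function name curl_fetch, the timeout value, and the user-agent string are my own illustrative choices, not part of the original tutorial:

```php
<?php
// Sketch of a more configurable cURL helper (names and values are illustrative)
function curl_fetch($url, $timeout = 10) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // Return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // Follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);     // Give up after $timeout seconds
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MyScraper/1.0)'); // Some sites block requests with no user agent
    $info = curl_exec($ch);
    curl_close($ch);
    return $info;
}

// Usage would be the same as before:
// $sitioWeb = curl_fetch("https://devcode.la");
?>
```

Options like these are set with the same curl_setopt mechanism shown above, so the helper can grow as your scraping needs do.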
In conclusion, we can see that there is more than one way to do web scraping with PHP. This tutorial is only an introduction to the topic; there are many more possibilities for how to process the scraped information.
Remember also that on our platform you can find a Fundamentals of PHP course if you are not familiar with this programming language and want to learn it from scratch.