PHP – Retrieving app ratings from Google Play and App Store by means of parsing

For my website dedicated to promoting lesser-known mobile games, I needed to periodically check the ratings and the number of reviews of the submitted apps. With 10 games in total this can be done manually, but what if the site gets more popular and 10 games turn into 100?

I didn’t want to spend the whole evening checking whether the games still meet the site requirements.

Read on to find out how this can be done automatically by parsing the store pages. Unfortunately, the Windows Phone Store has protections that prevent this method from working without additional workarounds.

First of all we need to download the store web page content into a variable.

Downloading web page HTML content into a variable

I am using the following code for that:

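// Downloads the page at $url and returns its HTML (or false on failure).
function getHtml($url) {
	$session = curl_init();
	curl_setopt($session, CURLOPT_URL, $url);
	// Return the response as a string instead of printing it.
	curl_setopt($session, CURLOPT_RETURNTRANSFER, 1);
	// Give up if the connection cannot be established within 5 seconds.
	curl_setopt($session, CURLOPT_CONNECTTIMEOUT, 5);
	$html = curl_exec($session);
	curl_close($session);
	return $html;
}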
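Note that curl_exec returns false when the request fails, and the function above passes that value straight on. A minimal sketch of a more defensive variant could look like the one below. The name getHtmlOrNull, the 15-second overall timeout and the decision to follow redirects are assumptions made for this illustration, not something the store pages are known to require.

```php
<?php
// Hypothetical variant of getHtml(): returns the page body, or null on any failure.
function getHtmlOrNull($url) {
	$session = curl_init();
	curl_setopt($session, CURLOPT_URL, $url);
	curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
	curl_setopt($session, CURLOPT_CONNECTTIMEOUT, 5);
	curl_setopt($session, CURLOPT_TIMEOUT, 15);          // assumed overall limit in seconds
	curl_setopt($session, CURLOPT_FOLLOWLOCATION, true); // assumed: follow redirects

	$html = curl_exec($session);
	$status = curl_getinfo($session, CURLINFO_HTTP_CODE);
	// The handle is released when $session goes out of scope.

	// curl_exec() returns false on transport errors; treat non-200 answers as failures too.
	if ($html === false || $status != 200) {
		return null;
	}
	return $html;
}
```

With this variant the caller can skip a game whose page could not be downloaded instead of running the regular expressions on an empty result.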

Google Play

In this case the aim is to retrieve the data from the following part of the game’s page:

Google Play - Reviews

The corresponding HTML code is as follows:

Rating (depending on the localisation settings):

<div class="score">4.7</div>
<div class="score">4,7</div>

Number of reviews:

<span class="reviews-num">60</span>

… and the PHP code to retrieve the data:

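function getAndroidRating($html) {
	// Score with a dot as the decimal separator, e.g. "4.7".
	if (preg_match('/<div class="score">(\d\.\d)<\/div>/', $html, $matches)) {
		return $matches[1];
	}
	// Score with a comma as the decimal separator, e.g. "4,7"; normalise it to a dot.
	if (preg_match('/<div class="score">(\d),(\d)<\/div>/', $html, $matches)) {
		return $matches[1] . "." . $matches[2];
	}
	return "Not found!";
}

function getAndroidNoOfReviews($html) {
	// The pattern expects the review count as plain digits inside the "reviews-num" span.
	if (preg_match('/<span class="reviews-num">(\d+)<\/span>/', $html, $matches)) {
		return $matches[1];
	}
	return "Not found!";
}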
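Since the two localisations differ only in the decimal separator, both cases can also be covered by a single pattern. The following is just a sketch of that idea: parseAndroidScore is a hypothetical name, and returning null instead of "Not found!" is a choice made for this example only.

```php
<?php
// Hypothetical single-pattern variant: accepts "4.7" as well as "4,7".
function parseAndroidScore($html) {
	if (preg_match('/<div class="score">(\d)[.,](\d)<\/div>/', $html, $matches)) {
		// Always report the score with a dot as the decimal separator.
		return $matches[1] . "." . $matches[2];
	}
	return null;
}

// The two sample snippets from above yield the same result.
var_dump(parseAndroidScore('<div class="score">4.7</div>')); // string(3) "4.7"
var_dump(parseAndroidScore('<div class="score">4,7</div>')); // string(3) "4.7"
```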

App Store

In this case we will also be targeting a selected part of the web page. Unfortunately, there are a few possible variants here.

If the game has no ratings, that part of the page looks as follows:

App Store - No ratings

If there are ratings for the current version only:

App Store - Current version only

No ratings for current version, but ratings exist for all versions combined:

App Store - All versions only

 

The corresponding HTML code is either:

<div>All Versions:</div>

or:

<div>Current Version:</div>

…and then:

<div class="rating" role="img" tabindex="-1" aria-label="4 and a half stars, 9 Ratings">
	<div>
		<span class="rating-star">&nbsp;</span>
		<span class="rating-star">&nbsp;</span>
		<span class="rating-star">&nbsp;</span>
		<span class="rating-star">&nbsp;</span>
		<span class="rating-star half">&nbsp;</span>
	</div>

… plus the code for the number of ratings, which follows immediately:

	<span class="rating-count">44 Ratings</span>

Here the rating cannot be retrieved as easily as on Google Play (I didn’t like handling the different possibilities in “aria-label”), so I decided to count stars instead, i.e. count the number of occurrences of “rating-star” and “rating-star half”.
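As a quick illustration of the counting idea, the snippet below runs substr_count on the five star spans from the sample markup. It is only a sketch; the variable names are made up for this example. The point is that the search strings include the surrounding quotes, so a half star is not counted as a full one.

```php
<?php
// Sample markup copied from the App Store snippet above.
$stars = '<span class="rating-star">&nbsp;</span>'
	. '<span class="rating-star">&nbsp;</span>'
	. '<span class="rating-star">&nbsp;</span>'
	. '<span class="rating-star">&nbsp;</span>'
	. '<span class="rating-star half">&nbsp;</span>';

// With the closing quote included, only full stars match.
$full_stars = substr_count($stars, '"rating-star"');      // 4
$half_stars = substr_count($stars, '"rating-star half"'); // 1

// Without the quotes, the half star would be counted as a full one as well.
$unquoted = substr_count($stars, 'rating-star');          // 5

echo $full_stars . "." . ($half_stars * 5); // prints 4.5
```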

The PHP code is as follows:

function getiOSRating($html) {
	// Prefer the current version's rating block, then fall back to all versions.
	foreach (array('Current Version', 'All Versions') as $label) {
		// Capture everything from the label up to the first closing </div>, i.e. the star spans.
		if (preg_match('/<div>' . $label . ':<\/div>(.+?)<\/div>/s', $html, $matches)) {
			// The surrounding quotes keep "rating-star" from also matching "rating-star half".
			$full_stars = substr_count($matches[1], '"rating-star"');
			$half_stars = substr_count($matches[1], '"rating-star half"');

			if ($full_stars == 0 && $half_stars == 0) {
				return "Not found!";
			}

			// E.g. 4 full stars and 1 half star give "4.5".
			return $full_stars . "." . ($half_stars * 5);
		}
	}
	return "N/A";
}

function getiOSNoOfReviews($html) {
	// Same order of preference as in getiOSRating(): current version first, then all versions.
	foreach (array('Current Version', 'All Versions') as $label) {
		if (preg_match('/<div>' . $label . ':<\/div>.+?<span class="rating-count">(\d+) Rating/s', $html, $matches)) {
			return $matches[1];
		}
	}
	return "N/A";
}
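Coming back to the original motivation, the returned values can then be compared against the site requirements. The sketch below is only an illustration: meetsRequirements and its default thresholds (a rating of at least 4.0 with at least 10 reviews) are invented for this example and are not the actual rules of the site.

```php
<?php
// Hypothetical helper: $rating and $reviews are the strings returned by the
// functions above, so "Not found!" and "N/A" have to be rejected first.
function meetsRequirements($rating, $reviews, $minRating = 4.0, $minReviews = 10) {
	if (!is_numeric($rating) || !is_numeric($reviews)) {
		return false;
	}
	return (float)$rating >= $minRating && (int)$reviews >= $minReviews;
}

var_dump(meetsRequirements("4.5", "44"));         // bool(true)
var_dump(meetsRequirements("4.7", "9"));          // bool(false), too few reviews
var_dump(meetsRequirements("Not found!", "N/A")); // bool(false)
```

Treating a missing value as a failed check is a deliberate simplification here; a real script would probably flag such games for a manual look instead.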

Why did you need to retrieve the scores or ratings?

