SPARQL Query In Code: REST, PHP And JSON [TUTORIAL]

SPARQL lets you query a semantic web (i.e. RDF) data source. This post covers some basics of SPARQL, but it mainly focuses on how to run a SPARQL query in code against a live SPARQL endpoint on the web. The code is PHP and the response format is JSON, but the overall steps are the same in any language.

(Here is the tutorial demo and the full source code for this tutorial.)

The overall steps are…

1. Constructing our SPARQL query
2. Preparing our REST URL
3. Make the HTTP request to the URL
4. Parsing the response
5. Use/Display the results

1. Constructing our SPARQL query

Our example SPARQL query gets the “abstract” section (i.e. a brief description) of the RDF page for Honda Legend (http://dbpedia.org/page/Honda_Legend) in DBPedia.

PREFIX dbp: <http://dbpedia.org/resource/>
PREFIX dbp2: <http://dbpedia.org/ontology/>

SELECT ?abstract
WHERE {
     dbp:Honda_Legend dbp2:abstract ?abstract .
     FILTER langMatches(lang(?abstract), 'en')
}

(See this query run in DBPedia)

In the SPARQL query, ?abstract is a variable. The name is arbitrary (it could be called ?x). Variables in SPARQL are written with a leading ? (question mark); a leading $ is also accepted.

The first line in the WHERE clause is an RDF triple pattern. It’s a “subject predicate object” followed by a dot to end the triple (like the period of a sentence).

In this triple, the subject is dbp:Honda_Legend in its abbreviated form. We could also use its fully expanded form…
<http://dbpedia.org/resource/Honda_Legend>
and then we wouldn’t need the PREFIX dbp: <http://dbpedia.org/resource/> line in the query.

The predicate is dbp2:abstract in its abbreviated form. The predicate in a triple describes the relationship between the subject and the object.

The object in this triple is the variable ?abstract. Or in other words, the object is what we are trying to find with this query.

To explain this query in English, one could say: find the value of the object in every triple that matches the pattern “dbp:Honda_Legend dbp2:abstract object”.

The second line in the WHERE clause applies a filter to the results so that only the English version of the abstract is returned.
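To make the relationship between the abbreviated and fully expanded forms concrete, here is a minimal PHP sketch. The helper expandPrefixedName and the $prefixes array are hypothetical names introduced only for illustration; they are not part of the tutorial's source code, and a SPARQL endpoint does this expansion itself when it reads the PREFIX lines.

```php
<?php
// Hypothetical helper (not in the tutorial source): expands a prefixed
// name such as "dbp:Honda_Legend" into its full form in angle brackets.
function expandPrefixedName($name, array $prefixes)
{
   // split "prefix:local" at the first colon only
   $parts = explode(':', $name, 2);

   if (count($parts) !== 2 || !isset($prefixes[$parts[0]])) {
      return $name; // unknown prefix: leave the name unchanged
   }

   return '<'.$prefixes[$parts[0]].$parts[1].'>';
}

// the two prefixes declared in the example query
$prefixes = array(
   'dbp'  => 'http://dbpedia.org/resource/',
   'dbp2' => 'http://dbpedia.org/ontology/',
);

echo expandPrefixedName('dbp:Honda_Legend', $prefixes), "\n";
// prints: <http://dbpedia.org/resource/Honda_Legend>

echo expandPrefixedName('dbp2:abstract', $prefixes), "\n";
// prints: <http://dbpedia.org/ontology/abstract>
```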

Try these links for more info on the SPARQL query language.

SPARQL by example – Cambridge Semantics

SPARQL Tutorial

2. Preparing our REST URL

The URL we’ll send our request to will start with the SPARQL endpoint. For DBPedia it’s http://dbpedia.org/sparql and we’ll pass it some query string parameters, format and query (which is our SPARQL query).

function getUrlDbpediaAbstract($term)
{
   $format = 'json';

   // $term becomes the local part of the dbp: prefixed name
   $query =
   "PREFIX dbp: <http://dbpedia.org/resource/>
   PREFIX dbp2: <http://dbpedia.org/ontology/>

   SELECT ?abstract
   WHERE {
      dbp:".$term." dbp2:abstract ?abstract .
      FILTER langMatches(lang(?abstract), 'en')
   }";

   // urlencode() makes the query safe to pass as a query string parameter
   $searchUrl = 'http://dbpedia.org/sparql?'
      .'query='.urlencode($query)
      .'&format='.$format;

   return $searchUrl;
}

I’ve taken it a step further by wrapping this in a function, getUrlDbpediaAbstract, which takes a parameter $term so we can plug in any DBPedia (and therefore Wikipedia) page name to get the abstract for. (Note: $term must be an actual page name from DBPedia/Wikipedia, such that the URL http://en.wikipedia.org/wiki/[page name] returns a non-redirected Wikipedia article. The [page name] must be exact and it is case sensitive.)
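Because $term is placed directly into the query text, a stray character in it could break the query or change its meaning. The sketch below shows one possible guard, assuming a hypothetical helper termToIri that is not part of the tutorial's source code. It builds the fully expanded subject form and rejects the characters that SPARQL does not allow between angle brackets.

```php
<?php
// Hypothetical helper: returns the subject in its fully expanded form,
// or false if $term contains a character that is not allowed inside <...>.
function termToIri($term)
{
   // not allowed in a SPARQL IRI reference: < > " { } | ^ ` \
   // and the space and control characters (0x00 to 0x20)
   if ($term === '' || preg_match('/[<>"{}|^`\\\\\x00-\x20]/', $term)) {
      return false;
   }

   return '<http://dbpedia.org/resource/'.$term.'>';
}

echo termToIri('Honda_Legend'), "\n";
// prints: <http://dbpedia.org/resource/Honda_Legend>

var_dump(termToIri('Honda Legend'));
// prints: bool(false), because of the space
```

The returned string could stand in for the dbp:$term part of the query. This check only covers query syntax; it does not tell you whether the page actually exists in DBPedia.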

The following approach to getting and using the data from this REST web service can be used for any REST web service that returns JSON as a response format. I’ve found this approach to be very simple and powerful. (I may write a separate post focusing on this approach).
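The URL-building step generalizes in the same way. As a rough sketch, a hypothetical helper buildRestUrl (my name, not from the tutorial's source code) could build the request URL for any endpoint from an array of parameters, using PHP's built-in http_build_query to do the encoding.

```php
<?php
// Hypothetical generic helper: builds a request URL from an endpoint
// and an associative array of query string parameters.
function buildRestUrl($endpoint, array $params)
{
   // http_build_query() URL-encodes every value;
   // '&' is passed explicitly so the separator does not depend on php.ini
   return $endpoint.'?'.http_build_query($params, '', '&');
}

echo buildRestUrl('http://dbpedia.org/sparql', array(
   'query'  => 'SELECT ?s WHERE { ?s ?p ?o } LIMIT 1',
   'format' => 'json',
)), "\n";
```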

3. Make the HTTP request to the URL

Making HTTP requests in PHP is a standard and easy task with cURL, provided the cURL extension is enabled in your PHP installation (which seems to be the case on most web hosts). The PEAR package HTTP_Request2 is another option, but it requires PEAR to be installed.

function request($url){
 
   // is curl installed?
   if (!function_exists('curl_init')){
      die('CURL is not installed!');
   }
   
   // get curl handle
   $ch= curl_init();

   // set request url
   curl_setopt($ch,
      CURLOPT_URL,
      $url);

   // return response, don't print/echo
   curl_setopt($ch,
      CURLOPT_RETURNTRANSFER,
      true);
 
   /*
   Here you find more options for curl:
   http://www.php.net/curl_setopt
   */
   

   $response = curl_exec($ch);
   
   curl_close($ch);
   
   return $response;
}

Above is a simple function, request, which requests the given $url and returns the response body as a string.
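The request function above returns whatever curl_exec gives back, and curl_exec returns false when the transfer fails. If you want to detect failures, one possible variant is sketched below. The names requestChecked and isUsableResponse, the 10 second timeout, and the decision to accept only status 200 are all my own assumptions, not part of the tutorial's source code.

```php
<?php
// Hypothetical check: a response is usable only if the transfer
// succeeded and the server answered with HTTP status 200.
function isUsableResponse($response, $status)
{
   return $response !== false && $status === 200;
}

// Hypothetical variant of request(): returns the response body,
// or false when the request did not succeed.
function requestChecked($url, $timeoutSeconds = 10)
{
   $ch = curl_init();

   curl_setopt($ch, CURLOPT_URL, $url);
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

   // give up instead of hanging if the endpoint is slow (assumed limit)
   curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

   $response = curl_exec($ch);
   $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

   // the handle is released when $ch goes out of scope
   return isUsableResponse($response, $status) ? $response : false;
}
```

A caller would then test the return value with === false before passing it on to the parsing step.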

4. Parsing the response

The request to our URL above returns a string of JSON data because that’s the format we specified. I chose JSON as the response format because a JSON string is very easy to parse in PHP 5.2 and later: we can simply use json_decode. json_decode has a bool parameter, assoc, which if set to true makes it return a PHP array instead of an object. With this PHP array we can easily access all the data returned in our response. We’ll see how easy this is in the next section.
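Here is a small, self-contained sketch of that parsing step. The $json string is a hand-written, shortened sample in the same shape as the DBPedia response shown later in this post, not a real response, and the json_last_error check is an extra precaution that the tutorial's own code does not include.

```php
<?php
// Shortened, hand-written sample in the shape of the DBPedia response
$json = '{"head":{"vars":["abstract"]},'
   .'"results":{"bindings":[{"abstract":{'
   .'"type":"literal","xml:lang":"en",'
   .'"value":"The Honda Legend is a Mid-size luxury car."}}]}}';

// true = return nested PHP arrays instead of objects
$data = json_decode($json, true);

// json_decode() returns null when the string is not valid JSON
if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
   die('Response was not valid JSON');
}

echo $data['results']['bindings'][0]['abstract']['value'], "\n";
// prints: The Honda Legend is a Mid-size luxury car.
```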

5. Use/Display the results

With our results in a PHP array, we can use a little function I created, printArray, to neatly print the contents of the array.

function printArray($array, $spaces = "")
{
   $retValue = "";
   
   if(is_array($array))
   { 
      // indent each nested level by five non-breaking spaces
      $spaces = $spaces
         ."&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;";

      $retValue = $retValue."<br/>";

      // print each key in bold, then recurse into its value
      foreach(array_keys($array) as $key)
      {
         $retValue = $retValue.$spaces
            ."<strong>".$key."</strong>"
            .printArray($array[$key],
               $spaces);
      }
   }
   else $retValue =
      $retValue." - ".$array."<br/>";
   
   return $retValue;
}

This function can be used in development to see where the data you want is located in the array.
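One thing to keep in mind is that printArray puts keys and values straight into the HTML. That is fine for a quick look during development, but if the output were ever shown to visitors, it would be safer to escape the text first. A minimal sketch, assuming a hypothetical helper named escapeHtml (not part of the tutorial's source code) built on PHP's htmlspecialchars:

```php
<?php
// Hypothetical helper: escapes a value for safe output inside HTML.
function escapeHtml($value)
{
   // ENT_QUOTES also escapes single quotes
   return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

echo escapeHtml('<b>Honda & "Legend"</b>'), "\n";
// prints: &lt;b&gt;Honda &amp; &quot;Legend&quot;&lt;/b&gt;
```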

Putting our functions together now we can display and use the data returned.

<?php
$term = "Honda_Legend";

$requestURL = getUrlDbpediaAbstract($term);

$responseArray = json_decode(
   request($requestURL),
   true);
?>

<h1>DBPedia Abstract for
<?php echo $term ?></h1>

<h3>Request URL:</h3>
<?php echo $requestURL ?>
<br/>

<h3>Parsed Response: </h3>
<?php echo printArray($responseArray); ?>
<br/>

<h3>Abstract: </h3>
<?php echo $responseArray["results"]
    ["bindings"][0]
    ["abstract"]["value"] ?>

(See this run live or check out the full example source code on GitHub)

At the time of writing the results of
echo printArray($responseArray);
look like this…

     head
          link
          vars
               0 – abstract
     results
          distinct
          ordered – 1
          bindings
               0
                    abstract
                         type – literal
                         xml:lang – en
                         value – The Honda Legend is a Mid-size luxury car made by the Japanese automaker Honda. It was originally developed as part of Project XX, a joint venture with the Austin Rover Group of Great Britain and was a twin of the Rover 800 series. The Legend was initially a four-door sedan, with a two-door coupé added later. It was the model which launched Honda’s upscale Acura brand in the United States. Honda was inspired by the word “legend” to create the first Honda vehicle with a V6. The first and second-generation Honda Legend was known as the Acura Legend in North American markets from 1986-1995, and in 1996 the third-generation was renamed as the Acura RL, while the Legend name is still used in Japan and other markets. Honda introduced the Legend as a flagship sedan to compete with the JDM Nissan Cedric and Nissan Gloria twins, the Toyota Crown, and later the Mazda Luce, and Mitsubishi Debonair. Unlike the Nissan twins and the Crown, the Legend is not used for taxi service. In the USA, the Legend competed with larger rear wheel drive V8 sedans Lexus LS and the Infiniti Q45, however, the Legend was marketed towards the slightly smaller Executive car vehicles that include the BMW 5 series, Audi A6, Mercedes-Benz E-Class, and the Jaguar S-type. The Legend hardtop coupe was introduced to compete with the Nissan Leopard coupe, the Toyota Soarer, and Mazda Cosmo.

From this we can discover the path to get the abstract data we want, which is…

<h3>Abstract: </h3>
<?php echo $responseArray["results"]
    ["bindings"][0]
    ["abstract"]["value"] ?>

from the last lines of the code block above.
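The fixed path above assumes that the query returned at least one result. As a hedged sketch of a more defensive way to read the same structure, the hypothetical helper getBindingValues below (my own name, not from the tutorial's source code) loops over every entry in "bindings" and collects the value of one variable, so it also copes with zero or several results.

```php
<?php
// Hypothetical helper: collects the "value" of one variable from every
// binding (result row), skipping rows where that variable is missing.
function getBindingValues(array $responseArray, $variable)
{
   $values = array();

   if (!isset($responseArray['results']['bindings'])
      || !is_array($responseArray['results']['bindings'])) {
      return $values; // no results section: return an empty list
   }

   foreach ($responseArray['results']['bindings'] as $binding) {
      if (isset($binding[$variable]['value'])) {
         $values[] = $binding[$variable]['value'];
      }
   }

   return $values;
}

// hand-written sample in the shape of a decoded response
$sample = array('results' => array('bindings' => array(
   array('abstract' => array('type' => 'literal', 'value' => 'First')),
   array('abstract' => array('type' => 'literal', 'value' => 'Second')),
)));

echo implode(', ', getBindingValues($sample, 'abstract')), "\n";
// prints: First, Second
```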

Conclusion

In this tutorial we’ve seen how to construct a SPARQL query that gets the value of the abstract property of a DBPedia page. We sent the query to a SPARQL endpoint (DBPedia’s), which we can call like any REST web service URI. We parsed the results by converting the JSON string response to a PHP array using json_decode. This approach to consuming REST-based web services is an easy way to access the data in the results. I may write a future post focusing on it.

A working demo of this tutorial is here and the full source code for the demo is on GitHub.

If you have any questions or issues with this tutorial and/or demo source code, please leave them in the comments below.

  • Chantu

    Dear Mr John !!! (SIR)

    I salute you for this lovely code.

    I had spend hours looking for a easy way to extract data of dbpedia without any success.

    And WOW !!!

    You are genius.

    Thank you so much for sharing.

    Kind Regrads
    Chantu of ChantuDotCom

  • Boeckle Martin

    thx for sharing this !!! king style…

  • http://johnwright.me/blog John Wright

    Thanks guys, I’m glad you’ve found it useful!

  • Jako

    Anyway to get this to work when the search term has an accent in it?

  • http://johnwright.me/blog John Wright

    Hmm, I see the problem. After a little investigating I wasn’t able to get it to work with the word “Encyclopédie”. I believe it could work since that word is in a Wikipedia URL http://fr.wikipedia.org/wiki/Encyclopédie . But since it’s a character encoding issue, it could be very difficult.

  • http://johnwright.me/blog John Wright

    In fact it appears the word “Encyclopédie” has caused problems with the Wikipedia link in the comment above. Could be very tricky to get to work.

  • Pamawill

     Nice Work!
    However, if the output is variable and not static (e.g. the path to abstract depends on the “thing” or category you searching - 
    is there any way to point into the array with indicies? 
    I cannot get this run:
    Any Idea?Cheers!

  • http://johnwright.me/blog John Wright

    Thanks Pamawill!

    I see what you’re saying. So far I haven’t found a case where the output isn’t static, but I’d be interested to look at one if you have an example. 

    I’m not sure how to use indices in this case but I don’t think it would prove to be a reliable method if it’s possible. My best suggestion at the moment would be to find as many path cases as possible and check for each one in code. 

    Example…

    if(isset($responseArray["results"]["bindings"][0]["abstract"]["value"]))
    {
        //use it
    }
    else if(isset($responseArray["results"]["bindings"][0]["some-other"]["path"]))
    {
        //use it
    }

    etc..

  • Pamawill

    Hi John! Thanks for the quick answer!

    In my case I am querying user inputs (i.e. the variable $term is given by the user). As I am now looking for properties with a different amount of results, the array size is not fix prior.

    Among others, I am querying Yago classes. The amount of classes defined differer between 0 and sometimes up to 5.
    Same story with dbpedia categories.

    I want to put my results in a table and show the user declared categories and so on.
    So i was thinking if I could just go in a loop through the array I would just print all information (as they are usefull), but I dont know the order when which path name appears.

    I hope that makes it a bit clearer :)

  • http://johnwright.me/blog John Wright

    Ah I see. In this article, the example is a very specific query, which is getting the “abstract” from a DBPedia entry. In fact the “$term” must be a Wikipedia article entry (case sensitive) for it to work (see Section 2). 

    In your case, you would need to create different queries and examine the different kinds of output you are getting for each query and then write function(s) to display the results in a table form. 

  • http://twitter.com/ikhfan Ikhsan Fanani

    thanks for sharing this! ;) is there any way to run a query from multiple data sources (not just dbpedia)? thanks

  • http://johnwright.me/blog John Wright

    I remember having the same question when first learning about SPARQL. I still don’t have the answer worked out. At this point, I only know how to query one SPARQL endpoint at a time.

  • Chris Wren

    thanks for this, the Zend Http_Client extraction gets the wikipedia data in ‘wiki’ format which is difficult to parse, this is much better.

  • http://johnwright.me/blog John Wright

    I’m glad it works for you. One issue with this way, as opposed to getting data directly from Wikipedia, is that now there is another service in the middle, DBPedia (but this may or may not be an issue).