Microformats & the semantic web Web 3.0

28Jan/120

Defining Meaning on the Semantic Web

Mike Bergman recently asked the deceptively simple question, what do things mean on the semantic web? He explains, “The crowning achievement of the semantic Web is the simple use of URIs to identify data. Further, if the URI identifier can resolve to a representation of that data, it now becomes an integral part of the HTTP access protocol of the Web while providing a unique identifier for the data. These innovations provide the basis for distributed data at global scale, all accessible via Web devices such as browsers and smartphones that are now a ubiquitous part of our daily lives.” continued…

New Career Opportunities Daily: The best jobs in media.

27Jan/120

A Positive Take on Google’s New Privacy Policy

Christopher Dawson has commented on Google’s recent changes to their privacy policy. Dawson writes, “I live, eat, breathe, work, and play Google and there aren’t many people more aware of Google’s business model and the amount of data it collects than I. So is it just sheer stupidity and naiveté that has me utterly embracing the Google ecosystem and relatively unconcerned about newly announced privacy policies that have caused so much consternation this week? Before you jump down to the talkbacks to tell me how stupid I really am, read on for another couple paragraphs.” continued…

26Jan/120

‘Semantic Web Software Must be Easy to Use’

Lee Feigenbaum recently argued that “semantic web software must be easy to use.” He explains, “On the surface, this sounds a bit trite. Surely we should demand that all software be easy to use, right? While ease of use is clearly an important goal in software design in general, I’d argue that it’s absolutely crucial to successfully realizing the value from Semantic Web software.” continued…

25Jan/120

Nice reading on Semantic Search

I had a great time reading a paper on Semantic Search[1]. Although the paper covers the details of a specific Semantic Web search engine (DERI’s SWSE), I read it as somebody who is not really familiar with the intricate details of how such a search engine is set up and operated (i.e., I would not dare to give an opinion on whether the choices made by this group are better or worse than those made by the developers of other engines) and who wants to get a good picture of what is happening in general. For that purpose, the paper was really interesting and instructive. It is long (about 50 pages), so I did not even try to understand everything on my first reading, but it did give a great overall impression of what is going on.

One of the “associations” I had, maybe somewhat surprisingly, is with another paper I read lately, namely a report on basic profiles for Linked Data[2]. In that paper Nally et al. look at what “subsets” of current Semantic Web specifications could be defined, as “profiles”, for the purpose of publishing and using Linked Data. This was also a general topic at a W3C Workshop on Linked Data Patterns at the end of last year (see also the final report of the event), and it is no secret that W3C is considering setting up a relevant Working Group in the near future. Well, the experiences of an engine like SWSE might come in very handy here. For example, SWSE uses a subset of the OWL 2 RL Profile for inferencing; that may be a good input for a possible Linked Data profile (although the differences are really minor, if one looks at the appendix of the paper that lists the rule sets the engine uses). The idea of “Authoritative Reasoning” is also interesting and possibly relevant; that approach makes a lot of pragmatic sense, and I wonder whether it should somehow be documented for general use. And I am sure there are more: in general, analyzing the experiences of major Semantic Web search engines in handling Linked Data might provide a great deal of input for such pragmatic work.

I was also wondering about a very different issue. A great deal of work had to be done in SWSE on the proper handling of owl:sameAs. On the other hand, one of the recurring discussions on various mailing lists and elsewhere is whether the usage of this property is semantically o.k. or not (see, e.g., [3]). A possible alternative would be to define (beyond owl:sameAs) a set of properties borrowed from the SKOS Recommendation, like closeMatch, exactMatch, broadMatch, etc. It is almost trivial to generalize these SKOS properties for the general case but, reading this paper, I was wondering: what effect would such predicates have on search? Would they make search more complicated or, in fact, would they make the life of search engines easier by providing “hints” that could be used for the user interface? Or both? Or is it already too late, because the usage of owl:sameAs is already so prevalent that it is not worth touching that stuff? I do not have a clear answer at this moment…

Thanks to the authors!

  1. A. Hogan, et al., “Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine”, Journal of Web Semantics, vol. 9, no. 4, pp. 365-401, December 2011.
  2. M. Nally and S. Speicher, “Toward a Basic Profile for Linked Data”, IBM developerWorks, 2011.
  3. H. Halpin, et al., “When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data”, Proceedings of the International Semantic Web Conference, pp. 305-320, 2010.

25Jan/120

New Open Gov Project: MA’s Open Checkbook

Andy Oram recently commented on Massachusetts’ newest open government venture, Open Checkbook. Oram writes, “On December 5, Massachusetts Governor Deval Patrick joined with state treasurer Steven Grossman to create an open government initiative with the promising moniker Open Checkbook. The site launched to some acclaim and has received over 220,000 hits. I decided to take a look at what’s offered and what’s missing from this site, and to ask someone in the government here in Massachusetts to describe their thinking in creating the site. The results can give us some insight into the effort it takes at each stage to release government data–and even more significantly, what it takes to increase the data’s value.” continued…

21Jan/120

Semantic Web Jobs: Wipro Limited

Wipro Limited is looking for Semantic Web Gurus in Bangalore, India. According to the post, “The Strategic Technology Office at Wipro Limited is looking for Data Scientists & Stream Data Mining, Semantic Web Gurus to join at our Bangalore HQ. We are a multi-disciplinary team of technical architects, infrastructure engineers, software developers & customer evangelists – all striving for a single goal: make Big Data more meaningful to customers.” continued…

20Jan/120

Mapping the English Deprivation Stats: Part 3 – Using JavaScript to build and execute the query

It’s been a while coming, but (finally) I’d like to present the third article in the series. Together, we’re building a Linked Data application to map the English Indices of Multiple Deprivation Stats. The final source code for the app can be found on GitHub. If you missed the last blog post in the series, you can find it here.

Note: These articles use GitHub gists for showing code snippets, and if you’re viewing this in a feed reader they might not show up. So I recommend you read it on the web instead.

What the app will do:

The basic principle of the app is that the user pans or zooms or searches by postcode to change which bit of the map they are looking at. The app detects that, retrieves the required data using SPARQL, then draws it on the map. You can play with the finished app on the OpenDataCommunities site here.

Last time…

In the last article we explored how the deprivation data was stored, wrote a template SPARQL query, and drew a standard Google map centred on a starting location.

By the way, I’ve created a new GitHub repo especially for this blog series, so that you can watch the application evolve through the git commits. Everything that we did in parts 1 and 2 is in commit bce302.

In this article, we’ll write some JavaScript to build and execute the right SPARQL query (based on the template from last time), to retrieve the deprivation data about the LSOAs currently displayed on the map. The code from this article is in commit 17e211.

A note about CORS

On the web server on which OpenDataCommunities is hosted, we’ve enabled Cross-Origin Resource Sharing (CORS), so that Ajax requests for the data can be made from sites not on the same domain. However, for this to work, you’ll need to serve the app from a web server (such as Apache) while you develop it. Just opening the HTML file from your disk in a browser won’t work.

Map Manager

Much of the code we’ll write in this article will be in a file called map-manager.js, in the javascripts/swirrl directory. As you might expect, the MapManager will be responsible for dealing with the interaction with the map. The following code snippet explains the structure of the file:

What’s going to happen?…

We’re going to add some code to the main JavaScript closure (in the HTML file) which will listen for Google maps idle events (i.e. when the map stops being zoomed or dragged).

When we notice that the map has become idle, we’ll ask the MapManager to refresh the map.

The MapManager will report back to the main closure, using jQuery events, so that it can start listening again for idle events.

The MapManager Constructor

The constructor for the MapManager takes the Google Map object and the initial score domain (from the drop down) as its parameters, and sets some stuff up.

Interesting things to note:

  • We assign this into a variable called self so that when this gets set to other things (in jQuery callbacks etc), we can always get a reference to the current MapManager object.
  • The lsoaDataRetrieved event will be triggered by another function when the deprivation data has been retrieved and is ready for use. For now, we’re just going to log out the results (there’s a gist later on with an example log-output), and tell the calling code that we’ve finished.
  • Finally, there’s a little bit of code to clean up if there are errors.
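A minimal sketch of a constructor along these lines, under some loud assumptions: it is not the code from the repo, the tiny EventHub stands in for jQuery’s bind/trigger so that the sketch runs outside a browser, and the event names finished and error (and the lastLogged property) are made up for illustration. Only lsoaDataRetrieved is named in the points above.

```javascript
// Tiny stand-in for jQuery's bind/trigger (assumption: keeps the sketch browser-free).
function EventHub() {
  this.handlers = {};
}
EventHub.prototype.bind = function (name, handler) {
  (this.handlers[name] = this.handlers[name] || []).push(handler);
};
EventHub.prototype.trigger = function (name, payload) {
  (this.handlers[name] || []).forEach(function (handler) {
    handler(payload);
  });
};

// Hypothetical MapManager constructor: takes the map and the initial score domain.
function MapManager(map, scoreDomain) {
  var self = this; // stable reference for use inside callbacks
  self.map = map;
  self.scoreDomain = scoreDomain;
  self.events = new EventHub();
  self.lastLogged = null;

  // When the data is ready, log it and report that we've finished.
  self.events.bind('lsoaDataRetrieved', function (lsoaData) {
    // `this` is not the MapManager in here, which is why `self` is used.
    self.lastLogged = lsoaData;
    console.log(lsoaData);
    self.events.trigger('finished'); // 'finished' is an assumed event name
  });

  // On error, clean up and still report completion so callers can carry on.
  self.events.bind('error', function () {
    self.lastLogged = null;
    self.events.trigger('finished');
  });
}
```

Capturing self once in the constructor means every handler can reach the MapManager without caring what the event library sets this to.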

The refresh function

The refresh function (along with the constructor) will form the public API for the MapManager. Let’s add it to the prototype:

  • When execution of the function begins, we’ll trigger the started event to tell others that we’re starting the process of refreshing the map.
  • If the Google map’s zoom level is acceptable (if we’re too zoomed-out there’ll be too much data to handle), we proceed. Otherwise, we do nothing other than trigger a couple of events.
  • The getTiles function (for brevity, not included in this article – see the source on GitHub for details*) interrogates the Google map and determines which 0.1×0.1 lat/long tiles are visible in the viewport.
  • Next we remove any tiles that are no longer visible in the viewport, and add any new ones by comparing the results from getTiles with the set of tiles from the previous time refresh was called. The reason we do this is for efficiency: we don’t need to request data for tiles for which we already have data.
  • Finally, we call getLsoaData, passing in the set of tiles currently visible.
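The tile bookkeeping in the steps above can be sketched roughly as follows. This is an illustration only, not the getTiles from the repo: the bounds object with south/north/west/east properties, the "lat:long" key format and the diffTiles helper are all assumptions. Tiles are indexed in whole tenths of a degree so that keys can be compared as plain strings.

```javascript
// Hypothetical getTiles: one key per 0.1 x 0.1 degree tile touching the viewport.
// Keys are integer tenths of a degree, e.g. "534:-24" for the tile whose
// bottom-left corner is at 53.4, -2.4.
function getTiles(bounds) {
  var minLat = Math.floor(bounds.south * 10);
  var maxLat = Math.floor(bounds.north * 10);
  var minLng = Math.floor(bounds.west * 10);
  var maxLng = Math.floor(bounds.east * 10);
  var tiles = [];
  for (var lat = minLat; lat <= maxLat; lat += 1) {
    for (var lng = minLng; lng <= maxLng; lng += 1) {
      tiles.push(lat + ':' + lng);
    }
  }
  return tiles;
}

// Compare the tiles from the previous refresh with the current ones, so that
// data is only requested for tiles we have not seen yet.
function diffTiles(previous, current) {
  return {
    added: current.filter(function (key) {
      return previous.indexOf(key) === -1;
    }),
    removed: previous.filter(function (key) {
      return current.indexOf(key) === -1;
    })
  };
}
```

With this shape, a refresh would drop the data held for removed tiles and pass only the added ones on to the data-fetching step.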

Getting the LSOA Data

The getLsoaData function is responsible for building the right SPARQL query to call (based on the template query in the previous article), and executing it against the OpenDataCommunities SPARQL endpoint.

That code might look a bit complicated, but it’s not really. Let me break it down a bit.

  • buildSparql is a nested function (as it won’t be needed outside the scope of getLsoaData). It’s responsible for interpolating the bottom-left and top-right lat/long values of tiles into the template SPARQL query.
  • callAjaxSparqlPaging is another nested function, which calls itself recursively until all the pages of data have been retrieved (the SPARQL endpoint will return the data in 1000-result ‘pages’).
  • For each result of each page (in the $.ajax success callback), we call setLsoaData, which sets the data for an LSOA (such as label, centroid lat/long, score, and URI) into a nested object. The top-level properties are the tile’s corners, and each inner object’s properties are the notations of the LSOAs (e.g. ‘E01005061’). See the example log-output later on for an example of this structure.
  • The code at the end of the getLsoaData function calls the SPARQL query for each tile in our list of tiles.
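To make the interpolation and the recursive paging a bit more concrete, here is a rough sketch under stated assumptions. The query template and its %{name} placeholder syntax are invented (the real template is the one from the previous article), fetchPage is a hypothetical stand-in for the $.ajax call, and numbering pages from 1 and stopping on the first short page are guesses about how the endpoint behaves.

```javascript
// Placeholder template, NOT the article's query: selects things with a lat/long in a box.
var SPARQL_TEMPLATE =
  'PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> ' +
  'SELECT ?lsoa ?lat ?long WHERE { ' +
  '?lsoa geo:lat ?lat ; geo:long ?long . ' +
  'FILTER (?lat >= %{minLat} && ?lat < %{maxLat} && ' +
  '?long >= %{minLong} && ?long < %{maxLong}) }';

var PAGE_SIZE = 1000; // the endpoint returns results in 1000-result pages

// Interpolate a tile's bottom-left and top-right corners into the template.
function buildSparql(tile) {
  return SPARQL_TEMPLATE.replace(/%\{(\w+)\}/g, function (match, name) {
    return String(tile[name]);
  });
}

// fetchPage(query, page, callback) is hypothetical: it must call back with the
// array of results for that page (in the app this would wrap $.ajax).
function getAllPages(query, fetchPage, onResult, onDone) {
  function callPage(page) {
    fetchPage(query, page, function (results) {
      results.forEach(onResult);
      if (results.length === PAGE_SIZE) {
        callPage(page + 1); // a full page suggests there may be more
      } else {
        onDone(); // a short (or empty) page means we have everything
      }
    });
  }
  callPage(1);
}
```

Passing fetchPage in as a parameter keeps the paging logic separate from the Ajax plumbing, which also makes it easy to try out with canned data.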

Calling the refresh function

As I mentioned earlier, the main JavaScript closure in the HTML file is responsible for instantiating a MapManager and calling refresh. Let’s see what that looks like:

  • The first bit of code is what we set up in the previous article to just create the Google map centred on Manchester.
  • Next, we instantiate a MapManager, with the map object.
  • At the end of this code-snippet, we define a bindMapIdle function, and then call it. As I described at the top of the article, this calls the refresh function when we see that the map is idle.
  • The started event listener makes a note of the time, and then removes the idleListener if it exists (so that we only call refresh once at a time). It also shows the busy ‘spinner’.
  • Once refresh has finished, we log how long it took, hide the spinner, and re-bind the idle listener.
  • The zoomToWide and zoomOK handlers just show and hide the zoom warning message (shown if the user zooms out too much and we can’t show all the data).
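The bind/unbind dance around the idle listener could look something like the sketch below. It is an assumption-laden illustration rather than the code from the commit: createIdleBinder is a made-up helper, and the events argument is expected to behave like google.maps.event (addListener returning a listener handle, removeListener taking that handle), passed in so that the sketch can be exercised with a stub.

```javascript
// Hypothetical helper that makes sure at most one idle listener is attached.
function createIdleBinder(events, map, onIdle) {
  var idleListener = null;
  return {
    // Start listening for the map becoming idle (no-op if already bound).
    bind: function () {
      if (idleListener === null) {
        idleListener = events.addListener(map, 'idle', onIdle);
      }
    },
    // Stop listening, e.g. while a refresh is in progress.
    unbind: function () {
      if (idleListener !== null) {
        events.removeListener(idleListener);
        idleListener = null;
      }
    }
  };
}
```

In the page itself this would presumably be created with google.maps.event, the map and a callback that calls refresh on the MapManager, with unbind called from the started handler and bind called again once the refresh has finished.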

Let’s run it!

We’re now ready to see what all our code does.

If you’ve been following along, just open the map.html file (from your web server) in your browser. (If you’ve not been following along, just check out the code from this commit on GitHub.)

Open your browser’s debug/console window (e.g. Web Inspector in Chrome/Safari, or Firebug in Firefox), and hit refresh. You should see a bunch of log-output lines, including information on which requests were made against the SPARQL endpoint and how long the refresh took (“busy duration”).

Just before the busy duration message, there should be an entry that looks like this: [>Object] (in Webkit-based browsers, at least). Click the triangle to expand the object, and you can see what data we have for the LSOAs with centroids in the current viewport. (This is what is logged out from the lsoaDataRetrieved handler in the MapManager constructor.)

As I mentioned earlier, the top-level properties are the tiles. Each tile is an object whose properties correspond to the LSOAs. Each LSOA contains data such as its label, the lat/long, the score for the currently selected domain, and the URI of the LSOA.

For example:
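A hypothetical illustration of that shape, built with a guessed-at version of setLsoaData. Only the notation ‘E01005061’ comes from the article; the tile key format, label, coordinates, score and URI below are invented placeholders, not real log output.

```javascript
// Hypothetical setLsoaData: files one LSOA's details under its tile,
// keyed by the LSOA's notation.
function setLsoaData(store, tileKey, notation, details) {
  if (!store[tileKey]) {
    store[tileKey] = {};
  }
  store[tileKey][notation] = details;
  return store;
}

var lsoaData = {};
// The tile key is assumed to encode the bottom-left and top-right corners.
setLsoaData(lsoaData, '53.4,-2.3,53.5,-2.2', 'E01005061', {
  label: 'Example LSOA',                   // placeholder
  lat: 53.45,                              // placeholder centroid latitude
  long: -2.25,                             // placeholder centroid longitude
  score: 0,                                // placeholder score for the selected domain
  uri: 'http://example.org/lsoa/E01005061' // placeholder URI
});

// Tiles at the top level, LSOA notations inside each tile.
console.log(JSON.stringify(lsoaData, null, 2));
```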

Try dragging and zooming the map, and watch what happens in the console window. The further zoomed out you are, the more tiles of data you will see.

Next time

In the next instalment (I promise not to leave it so long this time), we’ll get the boundary information for all the LSOAs and plot that on the map as polygons. If there’s time, we’ll also make these interactive.


*To be honest, I’m not particularly proud of this bit of code – it’s pretty hacky, but it does the job!

20Jan/120

LexisNexis Releases New Version of Lexis Advance

LexisNexis has announced a new release of Lexis Advance which includes “content enriched using SRA’s industry-leading NetOwl® text and entity analytics technology, delivering a more sophisticated semantic search capability to enable legal professionals to conduct better, faster and more relevant research. As one key part of the Lexis Advance application, NetOwl’s entity and relationship extraction capabilities semantically enrich the vast amounts of text-based content offered to legal professional customers.” continued…

19Jan/120

RDF, Linked Data, and the Library

Karen Coyle recently commented on the growing number of RDF and linked data projects in the field of library data. Coyle writes, “With the newly developed enthusiasm for RDF as the basis for library bibliographic data we are seeing a number of efforts to transform library data into this modern, web-friendly format. This is a positive development in many ways, but we need to be careful to make this transition cleanly without bringing along baggage from our past. Recent efforts have focused on translating library record formats into RDF with the result that we now have: ISBD in RDF, FRBR in RDF, [and] RDA in RDF, and will soon have MODS in RDF.” continued…

18Jan/120

Google’s New Search plus Your World

Google has introduced a new feature to their search engine: Search, plus Your World. The official announcement states, “We’re transforming Google into a search engine that understands not only content, but also people and relationships. We began this transformation with Social Search, and today we’re taking another big step in this direction by introducing three new features: (1) Personal Results, which enable you to find information just for you, such as Google+ photos and posts—both your own and those shared specifically with you, that only you will be able to see on your results page; (2) Profiles in Search, both in autocomplete and results, which enable you to immediately find people you’re close to or might be interested in following; and, (3) People and Pages, which help you find people profiles and Google+ pages related to a specific topic or area of interest, and enable you to follow them with just a few clicks. Because behind most every query is a community.” continued…
