Blog

2009 posts (16)

1. Virtual guide mashup

2009-01-07 18:41:14 by Martynas Jusevičius

Some time ago I was asked to put online a simple business card-like website with information about guide services. I came up with the idea of illustrating the real guide service with a virtual one, that is, presenting notable places of interest on an interactive map. And here is the result: www.guideservice.lt. While it is still in its early stages, the technical solution could be interesting to some.

Basically, the website is a mashup based on Google Maps and DBpedia, implemented in JavaScript and SPARQL. Using Lee Feigenbaum's SPARQL client and about 50 lines of JavaScript code, it queries the SPARQL endpoint of DBpedia (a semantically converted Wikipedia) for resources that have geographical coordinates within the bounds of the map, and places up to 100 of them on the map as markers. When clicked, a marker opens an info window with the title, description, and/or picture of the object. That way one can virtually explore the area.
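To give an idea of what such a bounding-box query might look like, here is a minimal sketch. It is not the code running on the site: the function name buildBoundsQuery and the shape of the bounds object are made up for illustration, and I am assuming the resources carry coordinates in the W3C WGS84 vocabulary (geo:lat, geo:long) plus an rdfs:label. It also ignores map views that cross the 180° meridian.

```javascript
// Hypothetical helper (not from the site): builds a SPARQL query for
// resources whose coordinates lie inside the given map bounds.
// Assumes geo:lat / geo:long (W3C WGS84 vocabulary) and rdfs:label.
function buildBoundsQuery(bounds, limit) {
  const { south, west, north, east } = bounds;
  return [
    "PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
    "SELECT ?resource ?label ?lat ?long WHERE {",
    "  ?resource geo:lat ?lat ;",
    "            geo:long ?long ;",
    "            rdfs:label ?label .",
    `  FILTER (?lat > ${south} && ?lat < ${north} &&`,
    `          ?long > ${west} && ?long < ${east})`,
    "}",
    `LIMIT ${limit}` // cap the number of markers, e.g. 100
  ].join("\n");
}

// Usage: a small box, at most 100 results.
const query = buildBoundsQuery(
  { south: 54.6, west: 25.2, north: 54.7, east: 25.3 },
  100
);
console.log(query);
```

The resulting string would then be handed to whatever SPARQL client is in use, and each result row turned into a marker.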

This release has several known bugs. Sometimes the request to DBpedia times out, and no markers are shown. Another issue is that too many markers accumulate on the screen, and after a while the map becomes pretty slow.
Does anyone know a way to remove markers that fall outside the map bounds? Currently it is done by simply calling clearOverlays() after the map is moved, but that also closes open info windows.
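One possible approach, sketched under assumptions: keep your own list of markers together with their coordinates, and after each move remove only those that fall outside the new bounds, instead of clearing everything. The pruneMarkers function, the marker objects with lat/lng fields, and the removeMarker callback are all hypothetical here; the callback stands for whatever per-marker removal call the maps API offers.

```javascript
// Hypothetical sketch: removes only the markers outside the bounds and
// returns the ones that stay, so open info windows of visible markers
// are left alone. `removeMarker` is a callback supplied by the caller.
function pruneMarkers(markers, bounds, removeMarker) {
  const kept = [];
  for (const marker of markers) {
    const inside =
      marker.lat >= bounds.south && marker.lat <= bounds.north &&
      marker.lng >= bounds.west && marker.lng <= bounds.east;
    if (inside) {
      kept.push(marker);
    } else {
      removeMarker(marker);
    }
  }
  return kept;
}

// Usage with plain objects standing in for real map markers.
const removed = [];
const visible = pruneMarkers(
  [
    { id: "a", lat: 54.65, lng: 25.25 },
    { id: "b", lat: 55.9, lng: 21.1 }
  ],
  { south: 54.6, west: 25.2, north: 54.7, east: 25.3 },
  (marker) => removed.push(marker.id)
);
console.log(visible.length, removed);
```

This also keeps the number of markers on screen bounded by what is actually visible, which might help with the slowdown.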

I think this illustrates well how simple yet powerful DBpedia and SPARQL are. On the other hand, it also proves the need for higher-level semantics. For example, Galle is shown as a place in Lithuania, when in fact it is a crater... on the Moon :)

2. Refresh

2009-01-13 02:16:48 by Martynas Jusevičius

You know you have been programming too long when you paste a file into a folder, and before pressing F5 to refresh Windows Explorer so that all files are sorted nicely again, you think twice about whether that is a safe request or whether you might overwrite something as a side effect.

3. XML to JSON

2009-01-21 23:14:07 by Martynas Jusevičius

Lately I got involved with some AJAX, namely dynamic maps and autocompletion. It is much easier to use JSON as the serialization format than XML because no complex parsing is needed: JSON structures automagically become JavaScript objects. So I knew I needed my webservice endpoints to return JSON, but the DIY Framework is based on XML serialization of objects.
Luckily, that was no problem at all, since JSON and XML are basically different syntaxes for the same data model, and I knew XML to JSON conversion could be done using an XSLT stylesheet. And of course I found several existing ones, xml2json-xslt among them.

I wanted support for XSLT 1.0 and XML attributes and arrays, but none of them really did the job. So I decided to fix the issues of xml2json-xslt by adding attribute and array support, and here is the result:

xml2json.xsl

I'm not really sure if it follows any syntax convention, but it does the job for me. Attributes are serialized in the same fashion as elements; this can be switched off using the include-attrs parameter (on by default). Child elements with the same name are grouped using the Muenchian method and put into arrays.
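To illustrate the two rules (attributes treated like elements, same-named children grouped into arrays), here is a rough JavaScript sketch of the mapping. It is not a port of the stylesheet and its output may differ in details. The input is a hypothetical plain-object tree with name, attrs, children, and text fields, and mixed content (text next to child elements) is simply ignored.

```javascript
// Hypothetical sketch of the conversion rules, not the XSLT itself.
// An element is { name, attrs, children, text }; only `name` is required.
function elementToValue(el, includeAttrs = true) {
  const children = el.children || [];
  const attrs = includeAttrs ? (el.attrs || {}) : {};
  // A leaf without attributes becomes a plain string (or null if empty).
  if (children.length === 0 && Object.keys(attrs).length === 0) {
    return el.text ?? null;
  }
  // Attributes are serialized in the same fashion as child elements.
  const result = { ...attrs };
  for (const child of children) {
    const value = elementToValue(child, includeAttrs);
    if (!Object.prototype.hasOwnProperty.call(result, child.name)) {
      result[child.name] = value;
    } else if (Array.isArray(result[child.name])) {
      result[child.name].push(value);
    } else {
      // Second occurrence of the same name: switch to an array.
      result[child.name] = [result[child.name], value];
    }
  }
  return result;
}

// Usage: two <book> children end up in one array.
const library = {
  name: "library",
  attrs: { city: "Vilnius" },
  children: [
    { name: "book", text: "A" },
    { name: "book", text: "B" },
    { name: "owner", text: "M" }
  ]
};
console.log(JSON.stringify(elementToValue(library)));
```

Passing false as the second argument plays the role of switching include-attrs off.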

I've only tested it on relatively simple and flat XML, so bug reports are welcome :)

4. Auto-login

2009-01-27 03:25:40 by Martynas Jusevičius

I wanted to implement an auto-login feature, also known as “remember me”, on one of the websites. If a user was logged in the last time he/she used the website, then the next time he/she accesses it (in a new browser window) the login should be carried out automatically, without the need to authenticate again. This is common nowadays and found on many login-based Web applications.

Sounds pretty simple, but I looked over PHP's session and cookie documentation and some examples, and had some second thoughts. Can the implementation be as simple as making the (cookie-based) session persistent, so that it never expires unless the user logs out? Or maybe it should expire after some longer time, such as a month, which would probably be safer. This seems to be easily achieved by setting the PHP session cookie lifetime using session_set_cookie_params().
Maybe there are some caveats here? I'm aware of the session fixation exploit, but it seems that a cookie-based solution is one of the safer ones (among those not involving HTTPS), and widely used as well.
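At the HTTP level, a longer session cookie lifetime boils down to the cookie being sent with an expiry instead of as a browser-session cookie. Here is a small sketch of what that header could look like. buildSessionCookie is a hypothetical helper written only for illustration, PHPSESSID is PHP's default session cookie name, and the month-long lifetime is just the example from above. One caveat I am fairly sure of: the server-side session data has to live as long as the cookie (in PHP that is governed by session.gc_maxlifetime), otherwise the cookie outlives the session.

```javascript
// Hypothetical helper: builds a Set-Cookie header value for a session ID.
// A lifetime of 0 means "until the browser is closed" (no Max-Age),
// anything larger makes the cookie persistent.
function buildSessionCookie(name, sessionId, lifetimeSeconds) {
  const parts = [`${name}=${encodeURIComponent(sessionId)}`];
  if (lifetimeSeconds > 0) {
    parts.push(`Max-Age=${lifetimeSeconds}`);
  }
  // HttpOnly keeps the session ID away from page scripts.
  parts.push("Path=/", "HttpOnly");
  return parts.join("; ");
}

// Usage: a "remember me" cookie valid for 30 days.
const thirtyDays = 30 * 24 * 60 * 60;
console.log(buildSessionCookie("PHPSESSID", "abc123", thirtyDays));
```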

5. OpenID

2009-02-07 13:25:46 by Martynas Jusevičius

Yesterday OpenID Denmark organized an event that focused on the practical and commercial advantages of OpenID support. The main speaker, Nat Sakimura, presented the status, opportunities, and roadmap of OpenID, as well as market development and use cases of OpenID in Japan, where it is extremely successful.

Friday also brought a couple of exciting pieces of news for OpenID. Facebook decided to join the OpenID Foundation and is expected to help improve the user experience.
The other news comes from Lithuania, where all new electronic personal identity cards issued after the 1st of January 2009 now have a digital certificate (X.509) and full OpenID 2.0 and PAPE extension support. The National Certificate Center under the Ministry of the Interior will be the national OpenID provider (at openid.vrm.lt). It is currently in testing mode.

6. REST APIs not that RESTful

2009-02-21 16:41:10 by Martynas Jusevičius

I looked at the design of some well-known public APIs for inspiration, especially for RESTful designs.
Last.fm API and Flickr API are pretty neat and quite similar. They offer a set of methods (such as artist.getSimilar to get artists similar to an artist, or photos.getInfo to get information about a photo) and different request methods to call them, such as REST, XML-RPC and SOAP.

However, when I looked into the RESTful method, neither Last.fm REST Requests nor Flickr REST Request Format impressed me because in my eyes the design is not really RESTful. With the aforementioned methods, they go like this:

http://ws.audioscrobbler.com/2.0/?method=artist.getSimilar&api_key=...
http://api.flickr.com/services/rest/?method=flickr.photos.getInfo&api_key=...

I would have expected something along the lines of:

http://ws.audioscrobbler.com/2.0/music/Fridge/similar
http://api.flickr.com/services/rest/photos/2733/?api_key=...
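Such resource URLs would not even require a new backend: a thin routing layer could translate them into the existing method calls. A minimal sketch, assuming the URL layouts I suggested above; the routes table and the resolve function are made up for illustration, and the parameter names (artist, photo_id) are my assumption about what the underlying methods expect.

```javascript
// Hypothetical routing table: resource paths mapped onto the existing
// RPC-style methods. Patterns follow the URLs suggested above.
const routes = [
  { pattern: /^\/music\/([^/]+)\/similar$/, method: "artist.getSimilar", param: "artist" },
  { pattern: /^\/photos\/(\d+)\/?$/, method: "flickr.photos.getInfo", param: "photo_id" }
];

// Returns { method, params } for a known path, or null otherwise.
function resolve(path) {
  for (const route of routes) {
    const match = route.pattern.exec(path);
    if (match) {
      return {
        method: route.method,
        params: { [route.param]: decodeURIComponent(match[1]) }
      };
    }
  }
  return null;
}

// Usage
console.log(resolve("/music/Fridge/similar"));
console.log(resolve("/photos/2733/"));
```

The resource being addressed is then identified by the URL itself, and the HTTP method (GET here) says what to do with it.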

7. War on Internet Explorer 6

2009-03-05 10:58:51 by Martynas Jusevičius

The Internet Explorer 6 browser was first released by Microsoft in 2001 and is badly outdated by today's standards. It has long been hated by Web developers for its bugs and lack of support for Web standards. Recently some of them got fed up, and a campaign has been started to encourage IE 6 users to upgrade to a modern browser such as Firefox, Opera, or Safari. It seems to have started in Norway, but has now spread across Scandinavia and as far as Australia. There are ready-made widgets that Web developers can include in their sites to warn IE 6 users about the problem.

I guess there have been similar attempts before, but this time it is becoming widespread, probably because it has been initiated collectively by some well-known big sites. I have been wanting to do a similar thing for quite a while, but did not dare to annoy the users. Now I might actually have a reason to implement it, in support of the campaign. Hopefully we will all get rid of IE 6 soon.

8. E-book formats

2009-03-11 20:06:27 by Martynas Jusevičius

I was recently doing an analysis of e-book formats for a client, and can share some of my findings here.

Criteria

Here are the main criteria of a good e-book format that I came up with (in descending order of importance):

# | Criterion | Description
1 | Open standard | The format is documented in detail and available publicly free of charge
2 | XML-based | The e-book file format itself (not the source formats) is based on XML and can be produced using standard XML tools only
3 | End-user-oriented | The format is meant to be used for the final workflow product: the e-book file downloaded to the user's (software or hardware) reader
4 | Reader-independent | There is more than one reader for this format, released by different vendors
5 | Workflow-oriented | The format can be easily transformed to and from a variety of different e-book formats and used in the whole workflow (e.g. exchange and storage) of e-book production
6 | Vendor-independent | No vendor-dependent tools are necessary to produce the book file
7 | Reflowable | Support for rearrangement of text with respect to the size of the reader window, instead of zooming and panning it to see the full document

If we stick to the criteria and look for a long-term solution, formats like Mobipocket and iSilo fail at the first criterion for being binary, proprietary, and/or vendor-dependent. They require commercial software to transform source documents into end-user e-book files. That leaves us with basically two file formats.

PDF

Adobe's Portable Document Format (PDF) is the first of them. Actually, it does not meet the criteria above either, but it is so popular and so widely supported that it cannot be ignored in such a consideration.

The main cause of PDF's limitations is the historical relation to printed media. Its binary structure does not preserve many of the semantics of the original layout and simply places characters and graphical elements at specified coordinates on the page's surface. For example, a table cannot be extracted from a PDF document, since it appears as text placed at specific places and a bunch of graphic lines. Alternatively, a table may exist as an image in the file. Both variants result in increased file size.

Since each document is intended for a specific page size, it is problematic to display it on screens of limited size or resolution, such as those of mobile devices. Reflow is only possible if the document was tagged at creation time, which excludes the majority of existing documents. Even then, support for reflow is not guaranteed on the reader.

For these reasons PDF is meant for the final products in the digital publishing workflow, i.e. documents to be printed or displayed to the end user. It cannot easily be used as an intermediary format in the workflow (a format for source documents) and transformed to other formats. There are many software readers available, usually distributed free of charge, the best known of course being Adobe Acrobat. Most hardware readers support PDF as well.

There are also many PDF creation tools. Most of Adobe's products support the format, as do office packages such as OpenOffice and Microsoft Office, as well as TeX and DocBook tools. There are also a number of PDF printers, which create a PDF file instead of actual printer output.

Choices of editing software are limited because of the format's complexities.

PDF supports encryption and DRM.

ePub

ePub is a relatively new XML-based format for reflowable digital books and publications, which seeks to increase interoperability between software as well as hardware tools in the digital publishing industry.

ePub is an open standard, defined not by a single vendor but by a standards body with members from across the industry, namely the International Digital Publishing Forum (IDPF). It is meant to be used both as a workflow format and as an end-user reader format.

ePub is actually a set of 3 related standards:

OPS defines a standard presentation of electronic books that is accessible on different readers and displayed equivalently. OPS 2.0 is based on XML and uses the XHTML 1.1 vocabulary (DTD) with some extensions (such as SVG) to describe content structure, and CSS 2 to describe its style. That means OPS and ePub can be easily produced and consumed using standard XML and CSS tools such as XSLT; no special software is necessary. Readers can also be easily implemented on top of existing Web browser libraries.

OPF defines the mechanism by which the various components of an OPS publication are tied together and provides additional structure and semantics to the electronic publication; for example, it describes its components (markup files, images etc.), metadata, and table of contents.

OCF is a general-purpose container technology based on the widely used ZIP compression format. It defines the standard mechanism by which all components of an electronic publication can be packaged together into a single file for transmission, delivery and archival.

DRM is not integrated into these standards, but may be layered on top (for example, implemented in the reader application). However, many experts discourage the use of DRM (with ePub and in general) and even blame it for the unpopularity of e-books and for the failures already experienced by the music industry. Many publishers are abandoning it as well, or using a so-called “social” DRM.

The format is being adopted by publishers such as Penguin Books and O'Reilly, as well as by public libraries, and is backed by associations in the digital publishing industry such as IDPF and AAP (Association of American Publishers).

A growing variety of readers is available, both software and hardware, including the iPhone with the Stanza application and smartphones running FBReader.

Conclusions

Currently ePub seems to be the most reasonable choice as an e-book file format.

First of all, it is an open standard and not related to any specific vendor. No fees have to be paid for implementing it, and no specific software or hardware readers need to be used, nor any specific creation software needs to be purchased.

Secondly, the format is based on XHTML/XML+CSS and ZIP, technologies that are widely supported and implemented and have a strong tool base. Thirdly, the format is rather unique in the sense that it can be used both as a workflow/source format in the publishing pipeline and as an end-user format, which minimizes the need to support several standards. It does not lose layout semantics when packaged, therefore it can be used to store and/or exchange e-book files.

And finally, the format has received some strong backing from the digital publishing industry as well as from public institutions such as libraries. Several well-known publishing houses are offering ePub among their e-book formats and report its increasing popularity. Software and hardware reader support is constantly increasing, as is support by commercial publishing software.

9. Tim Berners-Lee on Linked Data at TED

2009-03-14 03:14:29 by Martynas Jusevičius

A great and accessible presentation on Linked Data (the core technology behind Semantic Web, together with RDF) and DBpedia at the TED conference by the founder of the World Wide Web:

10. Regexp string replace with PHP XSL

2009-03-19 13:26:52 by Martynas Jusevičius

While XSLT 2.0 has regular expression support, XSLT 1.0 does not. Tasks like string pattern matching and replacement cannot be done in native XSLT 1.0 code; extension functions are needed for that. Luckily, it can be achieved with PHP XSL in the same fashion as URL-encoding: by registering PHP function support on the XSLT processor and calling native PHP functions as XPath functions in the PHP namespace.

One of the common cases for this functionality when using XSLT as View templates would be replacement of http:// links in text (for example, message content from a database) with actual <a> hyperlink elements. Here is a quick-and-dirty PHP function that does that:

abstract class FrontEndView extends XSLTView
{
	// ...

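	/**
	 * Escapes $text, turns URLs in it into <a> elements and returns
	 * the result as a DOM element wrapped in an XHTML <div>.
	 */
	public static function replaceLinks($text)
	{
		// escape markup characters so that the text can be parsed as XML
		$text = htmlspecialchars($text);
		// preg_replace() instead of ereg_replace(), which was removed in PHP 7
		$text = preg_replace('~[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]~', '<a href="\\0">\\0</a>', $text);
		$text = "<div xmlns=\"http://www.w3.org/1999/xhtml\">".$text."</div>";
		$doc = new DOMDocument();
		$doc->loadXML($text);
		return $doc->documentElement;
	}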
}

It replaces link text with hyperlinks at string level, then loads the result into a DOM document and returns its root element. Before the function can be accessed from a stylesheet, registerPhpFunctions() needs to be called on the XSLT processor instance.

The XSLT code then looks like this:

<xsl:copy-of select="php:function('FrontEndView::replaceLinks', string($text))/node()"/>

Notice that while you have to send string content to the function, what you get back is a node list, because the replaced text becomes XML elements, which are mixed with the unreplaced text nodes.
Don't forget to register the PHP namespace (http://php.net/xsl) in your stylesheet and use exclude-result-prefixes="php" for it not to appear in the result document.

I guess similar functions could be used to implement BBCode or wiki syntax support.
