How to make Delicious Library 2 import items from virtually any website

Delicious Library (DL) is a great piece of software. I especially love the barcode scanning feature, which is unfortunately limited to items present in the Amazon catalog. That catalog is pretty huge, but not big enough for everyone.

I needed to scan more than 500 Italian DVDs unavailable in the Amazon store, so I tricked DL into thinking that amazon.ca was in reality my own computer (localhost). Every time I scan a DVD, the call is redirected to a script on my Mac that fetches the DVD information from an arbitrary website instead of the Amazon store.

But let's start from the beginning.

Some theory

Please note that this is just a proof of concept. I'm going to tell you how to import data to DL from virtually any website, but the PHP script I present is really rough and incomplete. I am not responsible for any damage this tutorial blah blah blah...

What we are going to do is:

  1. Redirect all DNS lookups for webservices.amazon.ca (or .jp or .fr or .de, you choose) to our own machine;
  2. Configure a virtual domain on Apache so that our Mac responds to webservices.amazon.ca;
  3. Write a PHP script that reads the query string sent by DL and redirects the call to the website we want to grab the information from. Return an XML file that DL can read;
  4. Start barcode scanning and enjoy :)

It's easier than it seems.

1. DNS redirecting

I like to do it the Linux way: all you need to do is edit the file /etc/hosts. The Mac way is probably to tweak some NetInfo Manager records.

Open Terminal and run the following command:

$ sudo nano /etc/hosts

You'll enter the nano text editor. Add the following on a new line at the end of the file (replace 192.168.123.11 with your actual IP):

192.168.123.11	webservices.amazon.ca

192.168.123.11 is my Mac IP, remember to use yours.

Press CTRL+O to save the changes and CTRL+X to exit.
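
To double check the new entry, here is a minimal PHP sketch that looks up a hostname in hosts-file formatted text. The function name hosts_lookup and the sample content are my own inventions for illustration; only the last sample line mirrors the entry added above.

```php
<?php
// Hypothetical helper: look up a hostname in hosts-file formatted text.
// Returns the first matching IP address, or null when the name is absent.
function hosts_lookup($hostsText, $hostname)
{
    foreach (preg_split('/\R/', $hostsText) as $line) {
        // Drop comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '') {
            continue;
        }
        // Format: an IP address followed by one or more names.
        $fields = preg_split('/\s+/', $line);
        $ip = array_shift($fields);
        if (in_array(strtolower($hostname), array_map('strtolower', $fields), true)) {
            return $ip;
        }
    }
    return null;
}

// Sample content; the last line mirrors the entry added above.
$sample = "127.0.0.1\tlocalhost\n"
        . "# Delicious Library redirect\n"
        . "192.168.123.11\twebservices.amazon.ca\n";

echo hosts_lookup($sample, 'webservices.amazon.ca'), "\n";
```

To check the real file, pass file_get_contents('/etc/hosts') instead of $sample. PHP's gethostbyname('webservices.amazon.ca') asks the system resolver directly and should return the same address once the change is picked up.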

2. Add a virtual host to Apache

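I use MAMP as the web server for my development environment, but you can use the Apache that comes with each and every Mac as well.

Add the following text at the end of your Apache config file (/Applications/MAMP/conf/apache/httpd.conf if you use MAMP). DL connects to the standard HTTP port, so Apache must listen on port 80; MAMP uses port 8888 by default, so change it in the MAMP preferences if needed.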

<VirtualHost *:80>
	<Directory "/Users/*YOURUSER*/Sites/dl">
		AllowOverride None
		Order allow,deny
		Allow from all

		RewriteEngine On
		RewriteRule ^onca/xml$ /index.php [QSA,L]
	</Directory>

	DocumentRoot /Users/*YOURUSER*/Sites/dl
	ServerName webservices.amazon.ca
	ErrorLog logs/dl-error_log
	CustomLog logs/dl-access_log common
</VirtualHost>

Replace *YOURUSER* with your user name. If the following line is commented out, remember to uncomment it (remove the leading #):

NameVirtualHost *

You may configure the virtual host any way you prefer, just remember to add the rewrite rule

RewriteRule ^onca/xml$ /index.php [QSA,L]

DL sends its requests to webservices.amazon.ca/onca/xml?... With that rewrite rule, all the DL calls are handed to our script located in ~/Sites/dl/index.php, and the QSA flag keeps the original query string attached.
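
To make the effect of the rule concrete, here is a rough PHP simulation of it. This is only an illustration of the matching logic, not how Apache works internally, and the function name rewrite_target is made up.

```php
<?php
// Rough simulation of the RewriteRule above, for illustration only.
// In <Directory> context mod_rewrite matches the path without its leading slash.
function rewrite_target($requestUri)
{
    $path  = (string) parse_url($requestUri, PHP_URL_PATH);
    $query = parse_url($requestUri, PHP_URL_QUERY);

    if (!preg_match('#^onca/xml$#', ltrim($path, '/'))) {
        return null; // the rule does not apply to this path
    }
    // QSA appends the original query string to the rewritten URL.
    return '/index.php' . (is_string($query) ? '?' . $query : '');
}

echo rewrite_target('/onca/xml?Operation=ItemLookup&ItemId=8010312033360'), "\n";
```

Any other path, for example /onca/soap, is left alone by the rule and served (or rejected) by Apache as usual.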

3. The PHP script

Here I'll show you a nasty PHP script that searches the Italian website www.internetbookshop.it for DVDs and Blu-rays from a scanned barcode. It does not search items by title/author/etc. The script can be (and should be) rewritten to better implement all the search features offered by DL.

Create a folder named dl under ~/Sites/. Fire up your editor of choice and paste the following code (or, better, download the ready-to-use file):

<?php
$itemId = $_GET['ItemId'];			// EAN
$searchIndex = $_GET['SearchIndex'];		// Books, Video, Music, ...
$associateTag = $_GET['AssociateTag'];		// deliciousmoXX-XX
$version = $_GET['Version'];			// YYYY-MM-DD

$responseGroup = $_GET['ResponseGroup'];
$responseArray = explode(',',$responseGroup);

$result = '';

// I search only for "video"
if( $itemId && $searchIndex=="Video" )
{
	$url = "http://www.internetbookshop.it/dvd/ser/serdsp.asp?e=" . $itemId;
	$result = file_get_contents($url);
	
	// "DVD non presente nel catalogo" is the string I use to understand if the DVD is present in the site's database
	if( $result && stristr($result, "DVD non presente nel catalogo")===false )
	{
		$result = html_entity_decode($result);		// convert html entities back to real characters
		$result = utf8_encode($result);			// convert to UTF8
		
		$chars = 'a-zA-Z0-9 \|\\!"£$%&/\(\)=\?\'^,;\.:\-_\[\]\*\+àèéìòù<>';
		
		// Title
		$regex = '#<b>Titolo</b></td><td width="80%" valign="top" class="lbarrasup">([' . $chars . ']*)</td></tr>#';
		$title = preg_match($regex, $result, $matches) ? $matches[1] : '';
		
		
		// Actors
		$regex = '#<tr><td align="left" valign="top"><b>Principali interpreti</b></td><td valign="top">([' . $chars . ']*)</td></tr>#';
		$actors = preg_match($regex, $result, $matches) ? explode('; ', strip_tags($matches[1])) : array();	// always an array, it is looped over with foreach below
		
		
		// Aspect ratio
		$regex = '#Formato schermo ([12],[0-9]+:1)#i';
		$aspectRatio = preg_match($regex, $result, $matches) ? $matches[1] : '';
		
		
		// Languages
		$regex = '#<b>Lingua audio</b></td><td valign="top">([' . $chars . ']*)</td></tr><tr><td width="20%" align="left" valign="top"><b>Lingua sottotitoli</b>#';
		$languages = preg_match($regex, $result, $matches) ? strip_tags($matches[1]) : '';
		
		
		// Runtime
		$regex = '#<td valign="top">([0-9]*)..min\.</td>#';
		$runtime = preg_match($regex, $result, $matches) ? $matches[1] : '';
		
		// Label & Release date
		$regex = '#<b>Produzione</b></td><td valign="top">([' . $chars . ']*), ([0-9]{4})</td></tr><tr><td width="20%" align="left" valign="top"><b>Dati tecnici</b>#';
		$label = preg_match($regex, $result, $matches) ? $matches[1] : '';
		$releaseDate = preg_match($regex, $result, $matches) ? $matches[2] : '';
		
		
		// Director
		$regex = '#<b>Regia</b></td><td valign="top">([' . $chars . ']*)</td></tr><tr><td align="left" valign="top"><b>Principali interpreti</b>#';
		$director = preg_match($regex, $result, $matches) ? strip_tags($matches[1]) : '';
		
		
		// Items num
		$regex = '#<b>Numero dischi</b></td><td valign="top">([0-9]*) </td></tr>#';
		$itemsNum = preg_match($regex, $result, $matches) ? $matches[1] : '';
		
		
		// is bluray?
		$media = stristr($result, '/dvd/images/im/blueray.jpg')!==false ? "Blu-ray" : "DVD";
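	}
	else
	{
		// Item not in the catalog (or fetch failed): empty $result so the error response is sent
		$result = '';
	}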
}

// No cache please
header("Cache-Control: no-cache, must-revalidate");
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Content-Type: text/xml; charset=utf-8");

// Let's compile the XML. You can use an XML library such as XMLWriter if you want (probably wiser)
echo '<' . '?xml version="1.0" encoding="UTF-8"?' . '>';
?>

<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2007-10-29">
	<OperationRequest>
		<HTTPHeaders>
			<Header Name="UserAgent" Value="<?= htmlspecialchars($_SERVER['HTTP_USER_AGENT'], ENT_QUOTES, "UTF-8") ?>"></Header>
		</HTTPHeaders>
		<RequestId>0C5HDFD8FKZHERVQAQNT</RequestId>
		<Arguments>
			<Argument Name="SearchIndex" Value="Video"></Argument>
			<Argument Name="AssociateTag" Value="<?= htmlspecialchars($associateTag, ENT_QUOTES, "UTF-8") ?>"></Argument>
			<Argument Name="ItemId" Value="<?= htmlspecialchars($itemId, ENT_QUOTES, "UTF-8") ?>"></Argument>
			<Argument Name="ReviewSort" Value="-HelpfulVotes"/>
			<Argument Name="ItemPage" Value="1"></Argument>
			<Argument Name="Service" Value="AWSECommerceService"></Argument>
			<Argument Name="ResponseGroup" Value="<?= htmlspecialchars($responseGroup, ENT_QUOTES, "UTF-8") ?>"></Argument>
			<Argument Name="Operation" Value="ItemLookup"></Argument>
			<Argument Name="IdType" Value="EAN"></Argument>
			<Argument Name="AWSAccessKeyId" Value="0DQQYTA1EG8SZC8F5N02"></Argument>
			<Argument Name="Version" Value="<?= htmlspecialchars($version, ENT_QUOTES, "UTF-8") ?>"></Argument>
		</Arguments>
		<RequestProcessingTime>0.0277700424194336</RequestProcessingTime>
	</OperationRequest>

	<Items>
		<Request>
			<IsValid>True</IsValid>
			<ItemLookupRequest>
				<IdType>EAN</IdType>
				<ItemId><?= htmlspecialchars($itemId, ENT_QUOTES, "UTF-8") ?></ItemId>
				<?php foreach( $responseArray as $value ): ?>
				<ResponseGroup><?= htmlspecialchars($value, ENT_QUOTES, "UTF-8") ?></ResponseGroup>
				<?php endforeach; ?>
				<ReviewSort>-HelpfulVotes</ReviewSort>
				<SearchIndex>Video</SearchIndex>
			</ItemLookupRequest>

<?php if( !$result ): ?>
			<Errors>
				<Error>
					<Code>AWS.InvalidParameterValue</Code>
					<Message><?= htmlspecialchars($itemId, ENT_QUOTES, "UTF-8") ?> is not a valid value for ItemId. Please change this value and retry your request.</Message>
				</Error>
			</Errors>
<?php endif; ?>

		</Request>

<?php if( $result ): ?>
		<Item>
			<ASIN>B0015L4WC0</ASIN>
			<DetailPageURL><?= htmlspecialchars($url, ENT_QUOTES, "UTF-8") ?></DetailPageURL>
			<ImageSets>
				<ImageSet Category="primary">
					<LargeImage>
						<URL><?= htmlspecialchars("http://giotto.internetbookshop.it/cop/copdjc.asp?e=" . $itemId, ENT_QUOTES, "UTF-8") ?></URL>
						<Height Units="pixels">200</Height>
						<Width Units="pixels">287</Width>
					</LargeImage>
				</ImageSet>
			</ImageSets>

			<ItemAttributes>
				<?php foreach( $actors as $value ): ?>
				<Actor><?= htmlspecialchars($value, ENT_QUOTES, "UTF-8") ?></Actor>
				<?php endforeach; ?>
				<AspectRatio><?= $aspectRatio ?></AspectRatio>
				<AudienceRating></AudienceRating>
				<Binding><?= $media ?></Binding>
				<RegionCode>2</RegionCode>
				<Director><?= htmlspecialchars($director, ENT_QUOTES, "UTF-8") ?></Director>
				<EAN><?= htmlspecialchars($itemId, ENT_QUOTES, "UTF-8") ?></EAN>
				<Label><?= htmlspecialchars($label, ENT_QUOTES, "UTF-8") ?></Label>
				<Languages>
					<Language>
						<Name>Italiano</Name>
						<Type>Original Language</Type>
						<AudioFormat>Dolby Digital 5.1</AudioFormat>
					</Language>
				</Languages>
				<Manufacturer><?= htmlspecialchars($label, ENT_QUOTES, "UTF-8") ?></Manufacturer>
				<ProductGroup>DVD</ProductGroup>
				<ProductTypeName>ABIS_DVD</ProductTypeName>
				<Publisher><?= htmlspecialchars($label, ENT_QUOTES, "UTF-8") ?></Publisher>
				<NumberOfItems><?= $itemsNum ?></NumberOfItems>
				<ReleaseDate><?= $releaseDate ?></ReleaseDate>
				<RunningTime Units="minutes"><?= $runtime ?></RunningTime>
				<Title><?= htmlspecialchars($title, ENT_QUOTES, "UTF-8") ?></Title>
				<Studio><?= htmlspecialchars($label, ENT_QUOTES, "UTF-8") ?></Studio>
			</ItemAttributes>
		</Item>
<?php endif; ?>

	</Items>
</ItemLookupResponse>

Save the file as index.php in the dl folder you just created.

All this script does is get the barcode

$itemId = $_GET['ItemId'];

and sending it to www.internetbookshop.it

$url = "http://www.internetbookshop.it/dvd/ser/serdsp.asp?e=" . $itemId;
$result = file_get_contents($url);

Everything else is regex voodoo to grab information from the HTML file returned by internetbookshop.
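
If the regex part looks obscure, here is a smaller sketch of the same idea: find a bold label and grab the table cell that follows it. It is not the code used in index.php. The helper name extract_field is made up, the HTML fragment is invented (only shaped like the markup the patterns above expect), and it uses a lazy match instead of the whitelist of characters.

```php
<?php
// Hypothetical helper: return the text of the table cell that follows a
// bold label, or '' when the label is not found.
function extract_field($html, $label)
{
    $regex = '#<b>' . preg_quote($label, '#') . '</b></td>\s*<td[^>]*>(.*?)</td>#si';
    if (!preg_match($regex, $html, $matches)) {
        return '';
    }
    // Remove nested tags (links, spans) and surrounding whitespace.
    return trim(strip_tags($matches[1]));
}

// Invented fragment, shaped like the markup the script's patterns expect.
$html = '<tr><td><b>Titolo</b></td><td width="80%" valign="top" class="lbarrasup">Titolo di prova</td></tr>'
      . '<tr><td><b>Regia</b></td><td valign="top"><a href="#">Mario Rossi</a></td></tr>';

echo extract_field($html, 'Titolo'), "\n";
echo extract_field($html, 'Regia'), "\n";
```

Scraping like this breaks as soon as the site changes its markup, so expect to adjust the patterns for any other website.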

Start (or restart) MAMP, open DL2 and enjoy!

To test if it works, I prepared an EAN barcode for you. Print it in high quality and let our script do the job. If everything goes well you'll learn the Italian title of the movie Working Girl (I know it's not such a geeky movie... but it's the first EAN I found).

[Image: EAN barcode]

Or you can point your browser to the script on this site to see how the correct Italian data is returned for that given barcode (but it's less fun).

Please note that DL will return an error when scanning codes with this method, but the item is added to the collection nonetheless. This is perfectly normal as we are not providing the "If you liked this movie, you'll also like..." information.

Appendix: How I did it (so you can do it by yourself)

Someone asked how I reverse engineered the DL calls to Amazon.ca. Actually it was not so difficult, and I am in no way a network/php/apache/whatever guru.

The first thing was to find where DL was sending its requests. That was easy. Macs come with a great command-line tool called tcpdump that -huh- dumps everything that passes through a network interface. The following command prints your network traffic on screen as it occurs (en1 is my network interface)

$ sudo tcpdump -v -i en1

When you scan a barcode you'll see DL exchanging packets like these:

18:13:23.743049 IP (tos 0x0, ttl 64, id 34346, offset 0, flags [DF], proto TCP (6), length 40) neb.56133 > webservices.amazon.com.http: ., cksum 0xaec6 (correct), ack 24108 win 65535
18:13:23.743617 IP (tos 0x20, ttl 54, id 45078, offset 0, flags [DF], proto TCP (6), length 932) webservices.amazon.com.http > neb.56133: P 24108:25000(892) ack 3235 win 6876

BINGO! webservices.amazon.com (.ca, .de, .fr, .jp, etc) is the host DL requests item data from. So I redirected webservices.amazon.com to localhost and, by analyzing my local Apache log, I found the exact URL called by DL (you can find it with tcpdump as well). That is:

/onca/xml?Service=AWSECommerceService&AWSAccessKeyId=0DQQYTA1EG8SZC8F5N02&AssociateTag=deliciousmons-22&Version=2007-10-29&Operation=ItemLookup&ResponseGroup=Small,ItemAttributes,Tracks,Images,BrowseNodes,OfferSummary,EditorialReview,Reviews&ReviewSort=-HelpfulVotes&IdType=EAN&ItemId=8010312033360&SearchIndex=Video&ItemPage=1

From there it was easy to grab the barcode (ItemId) and send it to another site. With the same technique you can find what query string is used to search items by title.
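
As a small illustration, this sketch takes the request line shown above and splits it into its parameters, much like PHP does itself when it fills $_GET. Only standard PHP functions are used; the variable names are mine.

```php
<?php
// The request captured from the Apache log, as shown above.
$request = '/onca/xml?Service=AWSECommerceService&AWSAccessKeyId=0DQQYTA1EG8SZC8F5N02'
         . '&AssociateTag=deliciousmons-22&Version=2007-10-29&Operation=ItemLookup'
         . '&ResponseGroup=Small,ItemAttributes,Tracks,Images,BrowseNodes,OfferSummary,EditorialReview,Reviews'
         . '&ReviewSort=-HelpfulVotes&IdType=EAN&ItemId=8010312033360&SearchIndex=Video&ItemPage=1';

// Isolate the query string and decode it into an associative array.
parse_str((string) parse_url($request, PHP_URL_QUERY), $params);

// ResponseGroup is a comma-separated list of the sections DL asks for.
$groups = explode(',', $params['ResponseGroup']);

printf("%-14s %s\n", 'Operation:', $params['Operation']);
printf("%-14s %s\n", 'IdType:', $params['IdType']);
printf("%-14s %s\n", 'ItemId:', $params['ItemId']);
printf("%-14s %s\n", 'SearchIndex:', $params['SearchIndex']);
printf("%-14s %d\n", 'Groups:', count($groups));
```

Capturing a title search the same way and running it through this snippet should show which parameters change, which is what you need to know in order to support that kind of search in the script.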

Lastly I needed to know how Amazon returns the item information. All I had to do was send a valid request to Amazon with my browser. Have a look at what I Am Legend looks like (use Firefox as it formats XML files better than Safari).

Nothing simpler than a plain XML file (the /onca/xml endpoint is the REST flavor of the Amazon web service, no SOAP envelope involved). The rest is history.
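
Since the response is plain XML, it does not have to be assembled by hand the way index.php does it. Here is a rough sketch that builds a fragment of the response with PHP's XMLWriter extension, which takes care of escaping. This is an alternative I am assuming would work for DL, not the code used in this article; the function name item_attributes_xml and all sample values except the EAN are made up.

```php
<?php
// Sketch only: build the <ItemAttributes> fragment with XMLWriter
// instead of an inline template.
function item_attributes_xml(array $item)
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);

    $xml->startElement('ItemAttributes');
    foreach ($item['actors'] as $actor) {
        $xml->writeElement('Actor', $actor);
    }
    $xml->writeElement('Director', $item['director']);
    $xml->writeElement('EAN', $item['ean']);
    $xml->startElement('RunningTime');
    $xml->writeAttribute('Units', 'minutes');
    $xml->text($item['runtime']);
    $xml->endElement(); // RunningTime
    $xml->writeElement('Title', $item['title']);
    $xml->endElement(); // ItemAttributes

    return $xml->outputMemory();
}

// Made-up sample values; note the ampersand in the title.
echo item_attributes_xml(array(
    'actors'   => array('Actor One', 'Actor Two'),
    'director' => 'Some Director',
    'ean'      => '8010312033360',
    'runtime'  => '109',
    'title'    => 'Tom & Jerry',
));
```

The ampersand in the title comes out as &amp;amp; without any call to htmlspecialchars, which is the main reason to prefer a writer over string templates.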


In this article I showed how to serve Delicious Library with information grabbed from an Italian site instead of the Amazon store. The same procedure can be adapted to fetch data from virtually any website.

Please send comments, corrections and ideas to matteo.spinelli > gmail. If there's enough interest, maybe I could add features to the script and make it a little fancier.

Deliciously yours,
Matteo Spinelli
cubiq.org

Document last updated: 2008-05-31 @ 10:50