Before you begin, you should have a Django project created and configured.
Install our library from PyPI, like so:
$ pip install django-memento-framework
settings.py and add this app to your
INSTALLED_APPS = ( # ... # other apps would be above this of course # ... 'memento', )
For the purposes of this introduction, imagine a Django project that archives a list of URLs each hour. A simple set of models could look something like this.
from django.db import models class Page(models.Model): """ A web page that is archived each hour. """ url = models.URLField() class ArchivedHTML(model.Model): """ A HTML snapshot of a single web page. """ page = models.ForeignKey(Page) timestamp = models.DateTimeField() html = models.TextField() def get_absolute_url(self): return '/%s/' % self.pk
We won’t go into all the code you’d need for this archive to actually work, but I bet you can imagine the sort of thing you’d need. Imagine it’s there and it’s been archiving URLs every hour for a few years.
A good first step is create a TimeMap that will index all of the archives in your database. This is much like the sitemaps created for traditional search engines, but expanded to include datetime metadata.
This library includes a generic view that will publish a queryset in Memento’s link format, that can be optionally paginated. It is designed to emulate Django’s built-in feed framework and operates in much the same way.
You could start by create a pattern in
urls.py that will accept a web page’s URL with the goal of returning all the snapshots for that resource in our example archive.
url( r'^timemap/(?P<url>.*)$', views.ExampleTimemapLinkList(), name="timemap-archivedhtml" ),
That URL now needs to connect to a view named
ExampleTimemapLinkList so let’s go make it in your
views.py file. It will inherit from our custom class and include configuration to pull the
Page object from the database that matches the submitted URL and convert a list of all its archived HTML into a Timemap.
from memento.timemap import TimemapLinkList class ExampleTimemapLinkList(TimemapLinkList): paginate_by = 1000 def get_object(self, request, url): """ Take the URL from the request and check if we have it in our Page model. """ return get_object_or_404(Page, url=url) def get_original_url(self, obj): """ Return the original URL being archived from the Page model after its been pulled. """ return obj.url def memento_list(self, obj): """ Pull a queryset set of all the ArchivedHTML records of the Page object. """ return ArchivedHTML.objects.filter(page=obj) def memento_datetime(self, item): """ Return the datetime when the archived was saved from each ArchivedURL object. """ return item.timestamp
Now if you fired up your test server and visited http://localhost:8000/timemap/link/http://example.com/ You’d get a paginated index of all the archives saved from the page
The natural next step is to create the other pillar of the Memento system, a TimeGate. It is a view that when given a request that includes a URL and a timestamp will redirect to the detail page for the nearest archive.
A good first step is to create an URL in
url( r'^timegate/(?P<url>.*)$', views.ExampleTimeGateView.as_view(), name="timegate" ),
Next you need to create the view included there.
class ExampleTimeGateView(TimeGateView): model = ArchivedHTML url_field = 'page__url' # You can walk across ForeignKeys like normal datetime_field = 'timestamp' timemap_pattern_name = "timemap-archived" # The name of the timemap URL we created above
This will now return a 302 redirect when given a URL in the request with a header of Accept-Datetime. A good way to check out if this is working is to use cURL on the command line.
$ url -X HEAD -i http://localhost:8000/timegate/http://archivedsite.com/ --header "Accept-Datetime: Fri, 1 May 2015 00:01:00 GMT" HTTP/1.1 302 Moved Temporarily Server: Apache/2.2.22 (Ubuntu) Link: <http://archivedsite.com/>; rel="original", <http://localhost:800/timemap/link/http://archivedsite.com/>; rel="timemap"; type="application/link-format" Location: http://www.example.com/screenshot/100/ Vary: accept-datetime
Memento detail view¶
The detail page for each snapshot in the archive should also be enriched to include Memento metadata in its response. This is done by using our extended version of Django’s generic DetailView.
from memento.timegate import MementoDetailView class ExampleMementoDetailView(MementoDetailView): model = ArchivedHTML datetime_field = 'timestamp' timemap_pattern_name = "timemap-archivedhtml" timegate_pattern_name = "timegate" def get_original_url(self, obj): return obj.page.url
That can be linked to a pattern in the
urls.py file like any other Django detail view.
url( r'^archived-html/(?P<pk>\d+)/$', views.ExampleMementoDetailView.as_view(), name='archivedhtml-detail' )
And that’s it! Get all that going and your archive is ready to be indexed by http://mementoweb.org/ and integrated into Memento-based services.