Markdown, Python

03 Mar 2016

Simple and robust, Markdown is great for web content. But as a user, I don’t like to remember stuff like the URLs to my product pictures. To make things easier for everyone, let’s write a Markdown extension that looks up those pictures in a database.

The problem

Adding images to Markdown text is often inconvenient. Chances are, your users are not going to enjoy manually finding image URLs and inserting images like the following:

![alt text](/some/path/to/electric-toothbrush-model-2016.jpg)

To make this easier for everyone, we want to enable our Markdown extension to look up product pictures in a database – using regular Markdown image links, but with a special URI scheme that we are going to call products:

![alt text](products:electric-toothbrush)

Matching the URI schemes

First off, let’s write the regular expression that splits a URL into an optional scheme and the rest of the URI.

import re

URI_SCHEME_SPLIT_RE = re.compile(
    r'^((?P<scheme>[^:\/#]+):(?!\/\/))?(?P<src>.*)$')

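This regular expression uses named groups to split URLs into the parts we need. The negative lookahead (?!\/\/) means that a scheme followed by // is not captured, so ordinary http:// URLs pass through untouched: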

http://domain.com/path
# => scheme: None, src: "http://domain.com/path"

mailto:you@domain.com
# => scheme: "mailto", src: "you@domain.com"

products:product-name
# => scheme: "products", src: "product-name"
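
To check the behaviour described above, here is a small self-contained sketch that runs the pattern against the three example URLs. The helper name split_scheme is only an illustration and is not part of the extension itself.

```python
import re

URI_SCHEME_SPLIT_RE = re.compile(
    r'^((?P<scheme>[^:\/#]+):(?!\/\/))?(?P<src>.*)$')


def split_scheme(url):
    """Return (scheme, src); scheme is None when no custom scheme is found."""
    match = URI_SCHEME_SPLIT_RE.match(url)
    if match is None:
        # e.g. a URL containing a line break; treat it as a plain source.
        return None, url
    return match.group('scheme', 'src')


for url in ('http://domain.com/path',
            'mailto:you@domain.com',
            'products:product-name'):
    print(url, '=>', split_scheme(url))
```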

Writing a scheme handler

To look up our pictures, we need a callback function that accepts the parts of the URL that we’ve matched above, then retrieves the appropriate picture from the database, and returns the modified img node. Here is a handler using a Django model:

from myapp.models import Product

def product_picture_handler(el, scheme, product_name):
    try:
        product = Product.objects.get(name=product_name)
        el.set('src', product.picture.url)
    except Product.DoesNotExist:
        # We don't want to crash on bad user input. In this case, the
        # element will be returned unchanged.
        pass
    return el
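
If you want to try the handler contract without a Django project, the following sketch swaps the model lookup for a plain dictionary. The PRODUCT_PICTURES mapping, the picture path in it, and the name dict_picture_handler are all made up for this example; only the (el, scheme, product_name) signature comes from the handler above.

```python
from xml.etree.ElementTree import Element

# Hypothetical stand-in for the database: product name -> picture URL.
PRODUCT_PICTURES = {
    'electric-toothbrush': '/media/products/electric-toothbrush.jpg',
}


def dict_picture_handler(el, scheme, product_name):
    """Same contract as product_picture_handler, backed by a dict."""
    picture_url = PRODUCT_PICTURES.get(product_name)
    if picture_url is not None:
        el.set('src', picture_url)
    # Unknown products leave the element unchanged.
    return el


img = Element('img', {'src': 'products:electric-toothbrush', 'alt': 'alt text'})
img = dict_picture_handler(img, 'products', 'electric-toothbrush')
print(img.get('src'))  # /media/products/electric-toothbrush.jpg
```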

Writing the Markdown extension

We want our extension to be flexible and configurable, since it might be useful not just for product pictures. Ideally, we’ll be able to configure it in our application’s settings file. Here is an example configuration using tuples, like you would do it in a Django environment:

MARKDOWN_MEDIA_SCHEME_HANDLERS = (
    ('products', 'myapp.mymodule.product_picture_handler'),
)

To actually call the handler function, we are going to have to resolve the dotted path, myapp.mymodule.product_picture_handler, and retrieve the actual function from the Python module. The following utility function does just that:

from importlib import import_module

def get_handler_refs(handlers):
    """ Turns a tuple containing elements such as ("key", "some.module.attribute")
        into a dict where the module attribute references have been resolved and
        loaded.
    """
    handler_refs = {}
    for key, handler in dict(handlers).items():
        # On Python 2, check against basestring instead of str.
        if isinstance(handler, str):
            module_name, handler_name = handler.rsplit('.', 1)
            module = import_module(module_name)
            try:
                handler = getattr(module, handler_name)
            except AttributeError:
                raise AttributeError("Module '%s' has no attribute '%s'" % (
                    module_name, handler_name))
        handler_refs[key] = handler
    return handler_refs
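
The core step, turning a dotted path into a callable, can be tried on its own. This sketch uses a standard library function in place of myapp.mymodule.product_picture_handler, and the helper name resolve is an assumption made for the example.

```python
from importlib import import_module


def resolve(dotted_path):
    """Import 'package.module.attr' and return the attribute."""
    module_name, attr_name = dotted_path.rsplit('.', 1)
    return getattr(import_module(module_name), attr_name)


# The standard library stands in for myapp.mymodule here.
handler = resolve('os.path.basename')
print(handler('/some/path/to/picture.jpg'))  # picture.jpg
```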

Now we are ready to write the piece of code that actually processes the img nodes using our configured handler. Let’s wrap it in the following utility class:

class SchemeHandler(object):

    def __init__(self, schemes):
        # get_handler_refs is the module-level utility function shown above.
        self.schemes = get_handler_refs(schemes)

    def handle_schemes(self, el):
        full_src = el.get('src')
        match = URI_SCHEME_SPLIT_RE.match(full_src)
        if match:
            scheme, src = match.group('scheme', 'src')
            # handle a scheme such as "products:product-name"
            if scheme in self.schemes:
                el = self.schemes[scheme](el, scheme, src)
        return el
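
The dispatch logic can be exercised without Markdown at all. The sketch below is a condensed variant of the class, here called MiniSchemeHandler, that takes callables directly instead of dotted paths; demo_handler and the /static/ path it writes are hypothetical.

```python
import re
from xml.etree.ElementTree import Element

URI_SCHEME_SPLIT_RE = re.compile(
    r'^((?P<scheme>[^:\/#]+):(?!\/\/))?(?P<src>.*)$')


class MiniSchemeHandler(object):
    """Condensed SchemeHandler that takes callables instead of dotted paths."""

    def __init__(self, schemes):
        self.schemes = dict(schemes)

    def handle_schemes(self, el):
        match = URI_SCHEME_SPLIT_RE.match(el.get('src', ''))
        if match:
            scheme, src = match.group('scheme', 'src')
            # Only registered schemes are dispatched; others pass through.
            if scheme in self.schemes:
                el = self.schemes[scheme](el, scheme, src)
        return el


def demo_handler(el, scheme, src):
    # Hypothetical handler: point the image at a made-up static path.
    el.set('src', '/static/%s/%s.jpg' % (scheme, src))
    return el


handler = MiniSchemeHandler((('products', demo_handler),))

img = handler.handle_schemes(
    Element('img', {'src': 'products:electric-toothbrush'}))
print(img.get('src'))  # /static/products/electric-toothbrush.jpg

plain = handler.handle_schemes(
    Element('img', {'src': 'http://domain.com/a.jpg'}))
print(plain.get('src'))  # http://domain.com/a.jpg
```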

Finally, we need to let the Markdown library know that we want to process all image links using this scheme handler. We’ll start by subclassing ImagePattern:

from markdown.inlinepatterns import ImagePattern


class ImageSchemesLinkPattern(ImagePattern):

    # Set by the extension; stays None if no handler has been attached.
    scheme_handler = None

    def handleMatch(self, m):
        el = super(ImageSchemesLinkPattern, self).handleMatch(m)
        if self.scheme_handler:
            el = self.scheme_handler.handle_schemes(el)
        return el

Creating the actual extension is straightforward:

from markdown.extensions import Extension
from markdown.inlinepatterns import IMAGE_LINK_RE


class ImageSchemesExtension(Extension):

    def __init__(self, **kwargs):
        self.config = {'schemes': [(), 'Tuple of (scheme, handler) two-tuples']}
        super(ImageSchemesExtension, self).__init__(**kwargs)

    def extendMarkdown(self, md, md_globals):
        scheme_handler = SchemeHandler(self.getConfig('schemes'))

        link_pattern = ImageSchemesLinkPattern(IMAGE_LINK_RE, md)
        link_pattern.scheme_handler = scheme_handler
        md.inlinePatterns['image_link'] = link_pattern

Using the extension

Configure your extension to use your scheme:

import markdown

md = markdown.Markdown(extensions=[
    ImageSchemesExtension(schemes=(
        ('products', 'myapp.mymodule.product_picture_handler'),
    )),
])

Ta da! Now you can use product names in your Markdown to retrieve their pictures, instead of direct URLs. Image URLs without the products: prefix are handled exactly as before.

md.convert('![Electric Toothbrush](products:electric-toothbrush)')