Writing a domain-specific language (DSL) with Python
A domain-specific language is a small language designed to be useful for a specific task in a fixed problem domain. DSLs are gaining popularity because they enhance productivity and the reusability of artifacts. They also let you express and validate concepts at the level of abstraction of the problem domain itself, which is very useful when you need to describe a user interface, a business process, a database, or a flow of information.
The DSL concept isn't new, after all: special-purpose programming languages and all kinds of modeling and specification languages have always existed. The term itself rose with the popularity of domain-specific modeling.
You can easily implement DSLs in Ruby, Java, or even C# if you prefer, but that isn't the main purpose of this article. The essential idea became visible to me while I was implementing a simple test case with Python. Indeed, there are a lot of Python BDD-like frameworks, most of them self-proclaimed silver bullets, that get a lot of basic principles wrong, but like I said, we are talking about DSLs =)
With Python we can easily create a piece of software that expresses some basic desired behavior, like RSpec does, but in a much more Pythonic way.
```python
# coding: pyspec

class Bow:
    def shot(self):
        print "got shot"

    def score(self):
        return 5

describe Bowling:
    it "should score 5":
        bowling = Bow()
        bowling.shot()
        assert bowling.score() == 5
```
We can easily make this test DSL a runnable piece of Python code, without writing incomprehensible regular expressions, just by using Python's codecs and tokenizer modules.
First of all, we need to define a new encoding for pyspec (our predefined spec-file syntax). This neat hack gives us a new path for tokenizing the file:
```python
import tokenize
import codecs
import cStringIO
import encodings
from encodings import utf_8

class StreamReader(utf_8.StreamReader):
    def __init__(self, *args, **kwargs):
        codecs.StreamReader.__init__(self, *args, **kwargs)
        data = tokenize.untokenize(translate(self.stream.readline))
        self.stream = cStringIO.StringIO(data)

def search_function(s):
    if s != 'pyspec':
        return None
    utf8 = encodings.search_function('utf8')  # Assume utf8 encoding
    return codecs.CodecInfo(
        name='pyspec',
        encode=utf8.encode,
        decode=utf8.decode,
        incrementalencoder=utf8.incrementalencoder,
        incrementaldecoder=utf8.incrementaldecoder,
        streamreader=StreamReader,
        streamwriter=utf8.streamwriter)

codecs.register(search_function)
```
Our tiny translate function defines an easy way to turn both describe and it into traditional Python class and method definitions.
```python
def method_for_it(token):
    # '"should score 5"' -> 'should_score_5(self)'
    return token.strip().replace(" ", "_").replace("\"", "") + "(self)"

def translate(readline):
    previous_name = ""
    for type, name, _, _, _ in tokenize.generate_tokens(readline):
        if type == tokenize.NAME and name == 'describe':
            yield tokenize.NAME, 'class'
        elif type == tokenize.NAME and name == 'it':
            yield tokenize.NAME, 'def'
        elif type == tokenize.STRING and previous_name == 'it':
            yield tokenize.STRING, method_for_it(name)
        else:
            yield type, name
        previous_name = name
```
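You can watch the translation work without any of the codec machinery by feeding the generator a spec string directly and compiling the result. This is a Python 3 adaptation of the same two functions (the spec text and `io.StringIO` stand-in are my own, not part of the original project):

```python
import io
import tokenize

def method_for_it(token):
    # '"should score 5"' -> 'should_score_5(self)'
    return token.strip().replace(" ", "_").replace('"', '') + "(self)"

def translate(readline):
    previous_name = ""
    for tok_type, tok_string, _, _, _ in tokenize.generate_tokens(readline):
        if tok_type == tokenize.NAME and tok_string == 'describe':
            yield tokenize.NAME, 'class'
        elif tok_type == tokenize.NAME and tok_string == 'it':
            yield tokenize.NAME, 'def'
        elif tok_type == tokenize.STRING and previous_name == 'it':
            yield tokenize.STRING, method_for_it(tok_string)
        else:
            yield tok_type, tok_string
        previous_name = tok_string

spec = 'describe Bowling:\n    it "should score 5":\n        pass\n'
result = tokenize.untokenize(translate(io.StringIO(spec).readline))

# The translated source is plain Python: compile and exec it.
ns = {}
exec(compile(result, '<spec>', 'exec'), ns)
print(ns['Bowling'])
```

After translation, `describe Bowling:` has become a `class` statement and the `it` string has become a regular method, so the spec compiles and executes as ordinary Python.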
Clever, isn’t it? Now feel free to fork this project on GitHub and finish the job =) Maybe someday we can have a real BDD Python framework. http://github.com/fmeyer/pydsl/tree/master
- codecs — Codec registry and base classes
- tokenize — Tokenizer for Python source