<<DSL development: 7 recommendations
for Domain Specific Language design based on Domain-Driven Design
The term Domain-Specific Language (DSL) is heard a lot nowadays. A DSL
is a language developed to address the need of a given domain. This
domain can be a problem domain (e.g. insurance, healthcare,
transportation) or a system aspect (e.g. data, presentation, business
logic, workflow). The idea is to have a language with limited concepts
which are all focused on a specific domain. This leads to higher level
languages improving developer productivity and communication with
domain experts. In a lot of cases it is even possible to let domain
experts use the DSL and develop applications.
The question for this article is: how
to develop a Domain-Specific Language?
I'll first explain the DSL lifecycle,
consisting of the phases:
decision, analysis, design, implementation, deployment, and
maintenance. Afterwards I'll give 7 recommendations for DSL development
based on my experiences with developing non-trivial DSLs.
The DSL lifecycle
According
to Mernik et al. [1] the DSL life cycle consists of five development
phases: decision, analysis, design, implementation and deployment.
Eelco Visser [2] adds maintenance as the sixth phase in the lifecycle
of DSLs. Note that in practice DSL development isn't a sequential
process; the phases should be applied iteratively.
Let's look at each phase in more detail.
1. Decision
The development of a DSL starts with the decision to develop a DSL, to
reuse an existing one, or to use a general-purpose language (GPL). If a
domain is very fresh and
little knowledge is available, it doesn't make sense to start
developing a DSL. In order to determine the basic concepts of the
field, first the regular software engineering process should be applied
and a code base supported with libraries should be developed [2].
In other words: if you have never developed an application for a
certain domain by hand and you have no existing code base, it isn't
smart to start implementing a DSL and its associated code generators or
execution engine.
The situation differs of course for
non-executable DSLs. However, just as
you need experience with existing code for an executable DSL, you need
a deep understanding of the domain you are modeling for a
non-executable DSL.
2. Analysis
In the analysis phase the problem domain is
identified and domain
knowledge is gathered. The output of formal domain analysis is a domain
model consisting of [1]:
- a domain definition, defining the scope
of the domain,
- domain terminology (vocabulary, ontology),
- descriptions of domain concepts, and
- feature models describing the
commonalities and variabilities of domain concepts and their
interdependencies.
The information gathered in this phase can
be used to develop the
actual DSL. Variabilities indicate what elements should be specified in
the DSL, while commonalities are used to define the execution engine or
domain framework.
If you, for example, analyze a couple of
existing code bases in a
certain domain, you can split the elements of this code in two parts:
the parts that differ and the parts that are the same for each code
base. The static parts (the commonalities) can, depending on your
implementation approach, be part of the execution engine interpreting
the DSL or can be put in a domain framework which is used by the
generated code. The parts that differ (the variabilities) should be
specified in the DSL; these are the parts which a user of the DSL needs
to 'configure'.
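The split can be sketched in a few lines of Python. This is only an
illustration under assumptions: the form domain, the `render_form`
function and the dictionary layout are hypothetical and not taken from
any of the cited work. The function plays the role of the domain
framework (the commonalities), while the dictionaries hold what a DSL
user would configure (the variabilities).

```python
# Commonality: every form in this (hypothetical) domain is rendered the same way.
def render_form(spec):
    """Domain framework: turns a form specification into plain-text output."""
    lines = [f"== {spec['title']} =="]
    for field in spec["fields"]:
        marker = "*" if field.get("required", False) else " "
        lines.append(f"{marker} {field['label']}: [{field['type']}]")
    return "\n".join(lines)


# Variability: the parts that differ per application are plain data,
# which is what a user of the DSL would 'configure'.
customer_form = {
    "title": "Customer",
    "fields": [
        {"label": "Name", "type": "text", "required": True},
        {"label": "Email", "type": "text"},
    ],
}

order_form = {
    "title": "Order",
    "fields": [{"label": "Quantity", "type": "number", "required": True}],
}

if __name__ == "__main__":
    print(render_form(customer_form))
    print(render_form(order_form))
```

Adding a third form only requires a new specification; the framework
code stays untouched.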
Eelco Visser [2] recommends an inductive
approach which, as opposed
to designing the complete DSL before implementation, incrementally
introduces abstractions that capture a set of common
programming patterns in software development for a particular domain.
He also states that developing the DSL in iterations can mitigate the
risk of failure. Instead of a big project that produces a functional
DSL in the end, an iterative process produces useful DSLs for
sub-domains early on.
In the second part of this article I will give 7 additional
recommendations for the analysis and design phases of a DSL, based on my
own experiences.
3. Design
Approaches to DSL design can be characterized along two orthogonal
dimensions: the relationship between the DSL and existing languages,
and the formal nature of the design description [1]. A DSL can be
designed from scratch or it can be easier to base it on an existing
language.
Mernik et al. [1] identify three different patterns of design based on
existing languages:
- piggyback: existing language is
partially used,
- specialization: existing
language is restricted, and
- extension: existing language is
extended.
Besides the relation with existing languages the formal nature can
range between:
- informal: a DSL specified in
natural language and/or with examples, and
- formal: a DSL specified using
one of the available semantic definition methods, e.g. regular
expressions, grammars, etc.
It is important to decide which approach to take. However, it is maybe
even more important to keep this lesson in mind [3]:
Lesson T2: You
are almost never designing a programming language.
Most DSL designers come from language design backgrounds. There the
admirable principles of orthogonality and economy of form are not
necessarily well-applied to DSL design. Especially in catering to the
pre-existing jargon and notations of the domain, one must be careful
not to embellish or over-generalize the language.
Lesson T2 Corollary: Design
only what is necessary. Learn to recognize your tendency to
over-design.
4. Implementation
For executable DSLs the most suitable
implementation approach should
be chosen. Mernik et al. [1] identify seven different implementation
patterns, all with different characteristics:
- interpreter: DSL constructs are recognized and
interpreted using a standard fetch-decode-execute cycle. With this
pattern no transformation takes place; the model is directly executable.
- compiler/application generator:
DSL constructs are translated to base language constructs and library
calls. This is the pattern people usually mean when they talk about
code generation.
- preprocessor:
DSL constructs are translated to constructs in an existing language
(the base language). Static analysis is limited to that done by the
base language processor.
- embedding: DSL constructs
are embedded in an existing GPL (the host language) by defining new
abstract data types and operators. Application libraries are a basic
example. This type of DSL is usually called an internal DSL.
- extensible compiler/interpreter:
a GPL compiler/interpreter is extended with domain-specific
optimization rules and/or domain-specific code generation. While
interpreters are usually relatively easy to extend, extending compilers
is hard unless they were designed with extension in mind.
- commercial off-the-shelf:
existing tools and/or notations are applied to a specific domain. You
don't have to define your DSL, editor and DSL implementation approach
yourself; you just make use of a Model Driven Software Factory.
You can, for example, use the Mendix Model-Driven Enterprise
Application Platform targeted at the domain of Service-Oriented
Business Applications.
- hybrid: a combination of the
above approaches.
Because the different approaches can make a big difference in the total
effort to be invested in DSL development, the choice of a particular
approach is very important.
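To make the difference between the first two patterns concrete, here is
a minimal sketch in Python. It assumes a hypothetical DSL whose models
are lists of pricing rules; the rule format and the names `interpret`
and `generate` are invented for illustration. The same model is either
executed directly (interpreter) or translated to base-language source
code (compiler/application generator).

```python
# A model expressed in a hypothetical pricing DSL: (operation, amount) rules.
model = [("add", 10), ("multiply", 2), ("subtract", 5)]


def interpret(model, price):
    """Interpreter pattern: walk the model and execute each rule directly."""
    for operation, amount in model:
        if operation == "add":
            price += amount
        elif operation == "subtract":
            price -= amount
        elif operation == "multiply":
            price *= amount
        else:
            raise ValueError(f"unknown operation: {operation}")
    return price


def generate(model, function_name="calculate"):
    """Generator pattern: translate the model to base-language source code."""
    symbols = {"add": "+", "subtract": "-", "multiply": "*"}
    lines = [f"def {function_name}(price):"]
    for operation, amount in model:
        if operation not in symbols:
            raise ValueError(f"unknown operation: {operation}")
        lines.append(f"    price = price {symbols[operation]} {amount}")
    lines.append("    return price")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(interpret(model, 100))   # the model itself is executed
    source = generate(model)
    print(source)                  # the generated base-language code
    namespace = {}
    exec(source, namespace)        # stand-in for a separate build step
    print(namespace["calculate"](100))
```

Both routes give the same result for the same model. What differs is
when the model is read: at runtime by the interpreter, or once at
generation time by the generator.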
5. Deployment
In the deployment phase the DSLs and the
applications constructed
with them are used. Developers and/or domain experts use the DSLs to
specify models. These models are implemented with one of the
implementation patterns presented in the previous section (e.g. the
models are interpreted by an engine). Such an implementation results in
working software which is used by end-users.
6. Maintenance
Because domain experts themselves can
understand, validate, and modify
the software by adapting the models expressed in DSLs, modifications
are easier to make and their impact is easier to understand. However,
more substantial changes in the software may involve altering the DSL
implementation. So, like any other element of software, a DSL will
evolve over time. Therefore having a DSL migration strategy is very
important.
Besides migration strategies, I have two recommendations which
alleviate the maintenance risks of DSLs:
Seven Domain-Driven Design based
recommendations for DSL Development
Now that
the lifecycle of a DSL is clear, I want to share some of my experiences
with the analysis and design phases of the DSL lifecycle. The other
phases are left for a future article.
Before going into the details of DSL design let's try to understand the
context of these experiences. First of all, they are focused on
creating multiple connected DSLs,
i.e. you can create models expressed with different DSLs referring to
each other. For example, in a Form model you can refer to elements from
your Data model. More specifically, we are talking about a set of DSLs covering all system aspects of a Service-Oriented
Business Application.
Another important point in the DSLs we're talking about is that they
are all aimed at non-programmer domain experts.
In most cases this means domain experts can create models expressed in
these DSLs; in a few cases it means they can at least read them. This
of course always requires finding a balance between flexibility and
complexity.
Based on my experiences, influenced by the
concepts of Domain-Driven
Design [4], I have the following 7 recommendations for DSL development:
1. Capture domain knowledge in a metamodel
If you talk about models for DSLs you will
come across the term
metamodel. For a lot of people this sounds scary enough to stop
reading. However, it's just a model of the abstract structure of the
language. In other words: a metamodel models the concepts of a language
and their relationships, just as you model the concepts 'Order',
'Product', and 'Customer' if you are building software like an order
entry portal.
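As a rough illustration, the metamodel of a small data-modeling DSL
could be written down as a handful of Python dataclasses. The concepts
`Entity`, `Attribute`, `Association` and `DataModel` are assumptions
chosen for this sketch; a real DSL would have its own concepts. The
last lines show a model expressed in terms of the metamodel.

```python
from dataclasses import dataclass, field


# Metamodel: the concepts of the (hypothetical) language and their relationships.
@dataclass
class Attribute:
    name: str
    type: str  # e.g. "string" or "integer"


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Association:
    source: Entity
    target: Entity
    multiplicity: str = "1..*"


@dataclass
class DataModel:
    entities: list = field(default_factory=list)
    associations: list = field(default_factory=list)


# Model: one concrete use of the language, built from the concepts above.
customer = Entity("Customer", [Attribute("name", "string")])
order = Entity("Order", [Attribute("number", "integer")])
model = DataModel([customer, order], [Association(customer, order)])
```

Note the two levels: 'Customer' and 'Order' are elements of a model,
while 'Entity' and 'Association' are elements of the metamodel.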
A metamodel is essential for constructing a
DSL. It captures the
knowledge of the domain the DSL is aimed at. The model reflects how the
team developing the DSL structures the domain knowledge and what they
see as the most important elements. The binding of model and
implementation ensures that the experiences with earlier versions of
the DSL can be used as feedback in the modeling process.
2. Communicate using a ubiquitous language
The metamodel is also important for
communication purposes. When
designing a DSL a lot of communication is needed between the users of
the language (domain experts) and the developers. The metamodel is the
backbone of a language used by all team members.
Because the model is bound to the implementation, developers can talk
about the DSL in this language. They can communicate with domain
experts without translation.
You should play with the model when talking about the DSL. If you can't
talk in terms of the model about a scenario, the model should be
adapted until you can. If the domain experts don't understand the
model, there is something wrong with the model. Domain experts should
object to terms or structures that are awkward or inadequate to convey
domain understanding. Developers should watch for ambiguity or
inconsistency that will trip up design.
3. Let the metamodel drive the
implementation
Don't forget that a language definition is more than just a metamodel (abstract syntax). A language definition
also contains a concrete syntax and semantics. When designing and implementing a
DSL the concrete syntax is captured in the solution workbench,
i.e. an environment in which you can specify models using the DSL with
either a textual or a graphical concrete syntax. The semantics of the
language are captured in the transformation rules or model interpreter
(depending on the implementation pattern used, see above).
It is important that the metamodel drives
the implementation of the
solution workbench and interpreter, i.e. the metamodel should drive
the implementation of the DSL. If the implementation doesn't map to the
metamodel, the metamodel is of little value. At the same time, complex
mappings between metamodel and implementation are difficult to
understand and in practice difficult to maintain as the design changes.
A deadly gap between metamodel and implementation opens, so that
insight gained in each of those activities does not feed into the
other.
Therefore, design the metamodel in such a
way that it reflects the
implementation in a very literal way. However, demand at the same time
that a single metamodel also serves the purpose of supporting the
ubiquitous language. The implementation must become an expression of
the metamodel, so a change to the code may be a change to the
metamodel and the other way around. Tying the DSL implementation and
metamodel together in such a way usually requires DSL tools
that let you generate big parts of the DSL implementation from the
metamodel. Figure 1 shows such a scenario in which parts of the
solution workbench and interpreter are generated from the metamodel.

Figure 1 - Metamodel-driven DSL implementation
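A very small sketch of this idea in Python, assuming a hypothetical
form DSL whose metamodel is kept as plain data: two workbench
functions, an element palette and a structural validator, are derived
entirely from the metamodel instead of being written per concept. The
names `METAMODEL`, `palette` and `validate` are illustrative only; real
DSL tools generate far more than this.

```python
# Hypothetical metamodel of a small form DSL: concept -> property name -> type.
METAMODEL = {
    "Form": {"title": str, "fields": list},
    "TextField": {"label": str, "max_length": int},
}


def palette(metamodel):
    """Workbench part derived from the metamodel: the concepts a user can pick."""
    return sorted(metamodel)


def validate(metamodel, element):
    """Workbench part derived from the metamodel: structural checks on an element."""
    concept = element.get("concept")
    if concept not in metamodel:
        return [f"unknown concept: {concept}"]
    errors = []
    for name, expected in metamodel[concept].items():
        if name not in element:
            errors.append(f"{concept}: missing property '{name}'")
        elif not isinstance(element[name], expected):
            errors.append(f"{concept}: '{name}' must be {expected.__name__}")
    return errors


if __name__ == "__main__":
    print(palette(METAMODEL))
    print(validate(METAMODEL, {"concept": "TextField", "label": "Name"}))
```

Because both functions read the metamodel, adding a concept or a
property there changes the palette and the validation without touching
their code. That is the kind of tight coupling this recommendation
asks for.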
4. Isolate the domain
As said before, DSLs will evolve over time. In the previous
recommendation we've seen that it is important to tie model and
implementation: the model should drive the implementation. However, to
do so you need to isolate the domain. If the domain code representing
the metamodel is diffused through the rest of the code, it is very
difficult to make changes to it. Changes in the GUI of your modeling
environment or the infrastructure of your interpreter can then actually
change your domain code.
In principle the recommendations for
'normal' software also hold for
DSL implementations. Divide your code into layers and concentrate all
the code related to the domain model in one layer which is isolated
from GUI and infrastructure code. The domain objects should be free of
the responsibility of displaying themselves, storing themselves,
managing application tasks, etc. They should focus on expressing the
domain model.
In the previous recommendation I stated that the model should drive the
implementation, and I meant to do that as literally as possible. This
is possible if you isolate the domain! Using the Generation Gap Pattern
you can generate all the domain code while isolating it from your other
code.
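The Generation Gap Pattern keeps generated and handwritten code in
separate classes: the generator owns a base class, and handwritten
behaviour goes into a subclass that is never overwritten. A minimal
Python sketch follows, with the hypothetical class names `EntityBase`
and `Entity`; in a real project the two classes would live in separate
files.

```python
# Generated code: regenerated from the metamodel, never edited by hand.
class EntityBase:
    """Generated from the (hypothetical) metamodel concept 'Entity'."""

    def __init__(self, name, attributes=None):
        self.name = name
        self.attributes = list(attributes or [])

    def add_attribute(self, attribute_name):
        self.attributes.append(attribute_name)


# Handwritten code: kept apart from the generated class, survives regeneration.
class Entity(EntityBase):
    """Handwritten extension adding a rule the generator cannot derive."""

    def add_attribute(self, attribute_name):
        if attribute_name in self.attributes:
            raise ValueError(f"duplicate attribute: {attribute_name}")
        super().add_attribute(attribute_name)


if __name__ == "__main__":
    order = Entity("Order")
    order.add_attribute("number")
    print(order.attributes)
```

The rest of the code base only refers to `Entity`, so regenerating
`EntityBase` after a metamodel change does not disturb the handwritten
rule.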
So, isolate the domain to ensure that the model can evolve to be rich
enough to express the domain and to keep track of the changes in that
domain.
5. Refactor continuously
Along the same lines you should refactor all
the time. You should
refactor while you're knowledge crunching. You should refactor while
you're communicating using the metamodel. You should refactor while
you're busy with implementing the DSL. You should refactor while you're
generating code from your metamodel. In the words of Eric Evans [4], you
should especially refactor if:
- The design does not express the team's
current understanding of the domain.
- Important concepts are implicit in the
design (and you see a way to make them explicit).
- You see an opportunity to make some
important part of the design suppler.
It goes without saying that such an approach requires the
close involvement of all team members, including the domain
experts.
6. Maintain metamodel integrity
To effectively abstract a complex domain with domain-specific models,
you need more than one DSL. In complex projects multiple DSLs
are usually necessary in order to cope with different concerns. In
other words: multiple domain-specific models (DSMs), specified in
different DSLs are needed to accurately abstract complex systems.
Total unification of the metamodel (remember: the metamodel describes
the concepts of the DSL we are designing) for a large domain will not
be feasible or cost-effective. The most important reason for this is
that attempting to satisfy everyone with a single metamodel (and thus a
single language) will lead to complex options that make the language
difficult to use. Avoiding such complexity is the reason we are
designing a DSL at all!
Different domain experts will need their own domain-specific
language to define their aspect of the system.
So, we need multiple domain specific languages, hence we also need
multiple metamodels. However, the boundaries and relationships between
different metamodels need to be marked consciously. Some
recommendations on multi-DSL development:
- Explicitly define the context for each
metamodel, i.e. define the domain (e.g. system aspect) the DSL is
designed for.
- Continuously
integrate the implementation of a metamodel and make the interfaces to
other metamodels part of the automated tests.
- Model the
points of contact between the metamodels and use that model in your
ubiquitous language. These points of contact define how models
expressed in different DSLs can refer to each other. For example, a GUI
element can refer to an element in the data model.
- Think
about your reference resolution strategy. If you use interpreters /
engines to execute the models expressed in a DSL you can use
late binding, i.e. use soft references and resolve them at runtime. The
advantage of this strategy is flexibility and adaptability. The
approach usually used with code generation is early binding: the
references are explicitly reflected in the generated code. Performance
can be a reason to follow this strategy.
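The two reference resolution strategies can be sketched as follows. The
sketch assumes a hypothetical form DSL that refers to a data model with
'Entity.attribute' strings; the model layout and the function names are
invented for illustration.

```python
# Models expressed in two hypothetical DSLs; the form refers to the data model.
data_model = {"Customer": ["name", "email"]}
form_model = {"title": "Customer form", "fields": ["Customer.name", "Customer.email"]}


def resolve(reference, data_model):
    """Look up an 'Entity.attribute' reference in the data model."""
    entity, attribute = reference.split(".")
    if attribute not in data_model.get(entity, []):
        raise LookupError(f"unresolved reference: {reference}")
    return entity, attribute


def render_late(form_model, data_model, record):
    """Late binding: soft references are resolved each time the form runs."""
    values = []
    for reference in form_model["fields"]:
        _, attribute = resolve(reference, data_model)
        values.append(record[attribute])
    return values


def generate_early(form_model, data_model):
    """Early binding: references are resolved once and baked into generated code."""
    attributes = [resolve(ref, data_model)[1] for ref in form_model["fields"]]
    body = ", ".join(f"record[{attribute!r}]" for attribute in attributes)
    return f"def render(record):\n    return [{body}]\n"


if __name__ == "__main__":
    record = {"name": "Ann", "email": "ann@example.org"}
    print(render_late(form_model, data_model, record))
    print(generate_early(form_model, data_model))
```

With late binding a broken reference only shows up when the form is
used, but the data model can change without regenerating anything. With
early binding the broken reference is caught at generation time and the
generated code does no lookups at runtime.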
7. Use a people-oriented approach
Executing a DSL implementation process, especially in the way
recommended in the previous points, is not easy. It requires an
effective team of developers and domain experts. My last, and most
important, recommendation is to use a people-first approach in DSL
development. DSL development is highly creative and professional work.
Developers need to make the technical decisions; they are the best
people to decide how to conduct their technical work. Domain experts
live the domain, hence they are best suited to decide on the
applicability of the concepts of the language.
Although I strongly recommend the way of
working reflected in the
previous six points, the team has to decide on the process. Accepting a
process requires commitment, and as such needs the active involvement
of the whole team.
Key takeaways for DSL development
- Capture domain knowledge in a metamodel
- Communicate using a ubiquitous language
- Let the metamodel drive the implementation
- Isolate the domain
- Refactor continuously
- Maintain metamodel integrity
- Use a people-oriented approach>>
You can read this at:
http://www.theenterprisearchitect.eu/archive/2009/05/06/dsl-development-7-recommendations-for-domain-specific-language-design-based-on-domain-driven-design
Gervas