Why the Wesnoth Markup Language is bad for you

In the past, I've seen many people from Wesnoth's user community and Development Team advocating the use and implementation of WML in Wesnoth and, potentially, other games.

Even I have participated in some debates over whether WML is good or bad for content developers. I have also expressed my opinions on topics such as the old, scrapped Python AI and the currently thriving embedded Lua support, all of this on IRC, usually in #wesnoth or #wesnoth-dev, and even #wesnoth-umc-dev.

I have to admit that my personal opinions on programming languages have changed over time, mainly because I've learned from real experience maintaining or starting small to medium-sized projects built upon some of them. WML sort of counts as one of those, since I'm the author and former maintainer of Invasion from the Unknown and, more recently, After the Storm. I have also worked on the WML events engine a bit, implementing new features and fixing known and unknown bugs, including some I found while developing my own WML content.

At this point, I think I am qualified to examine the pros and cons of WML thoroughly and from a less partial point of view, so here's a rather lengthy review of both, despite the title of the article, which deliberately sounds like flamewar bait for the audience. 😉

WML advocates tend to refer to this language as if it were a programming language in its own right. I'm afraid I disagree with this classification because WML, as implemented by the Battle for Wesnoth engine nowadays, consists of multiple dialects sharing a single syntax that resembles a mark-up language of the widely used SGML family, such as HTML and XML, with some crucial differences that may go unnoticed by the casual code reader.

Wesnoth's engine uses WML to define aspects of the game's basic configuration and features, such as the path to the title screen logo file, the list of official alternate Multiplayer servers, and practical parameters such as the base income for players, the recall cost, and more. This is clearly not programming, since these particular dialects do not describe instructions to be interpreted by the game engine; instead, the game engine runs its own procedures, taking configuration info from the WML code in question.

Scenarios are modeled by WML nodes that describe the number and characteristics of the game's players, regardless of their class (artificial intelligence-driven players, human players, networked players, uncontrolled or “null” players), the Time-of-Day sequences, the background music playlist, and so forth. This is, again, not a program to be interpreted by the engine or the computer, but a description for the game of how to arrange the various building blocks to create the gameplay containers we know as “scenarios” or “levels.” In particular, a single scenario description could be interpreted in a completely different manner by a Wesnoth-compatible game engine, since neither the syntax nor the contents define exactly what the underlying client program should make of it.

Terrain definitions and the infamous terrain graphics layout dialect are similar in nature to the above, with the latter being a more convoluted case akin to the Scalable Vector Graphics format with regard to its semantics.

Things get complicated when we examine scenarios and, more recently, a particular aspect of unit type definitions. The WML [event] tag, which is expected to describe scripts for the game's behavior and player interactions, is analyzed by a somewhat different component of the game engine that diverges from the main representation of WML.

Most WML dialects used by Wesnoth are handled on the C++ side using associative container objects corresponding to instances of the config class, much in the same vein as the language's STL containers such as std::map<...>. These objects store static information that roughly corresponds to a WML code snippet in a 1:1 relation, save for convenience features such as run-time localization of translatable strings and (only in 1.9.0-svn and later) data type formalization at the C++ engine level.

(Note that for most purposes, WML only handles string literals at the syntax level. The interpretation of attribute values with particular types and semantics is dictated by the engine for every particular unit. Literal values are associated with a key (attribute id), and can be translatable for internationalization purposes. Additionally, WML can represent nodes containing attributes (key-value pairs) in arrays when they share a single tag name; however, this is not part of the language's syntax, and the associated semantics are more of an engine-side facility.)
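To make the idea of a string-only tree more concrete, here is a minimal sketch in Python rather than the engine's C++. The `Config` class, its `attributes` and `children` fields, and the sample scenario values are all hypothetical stand-ins invented for illustration; they are not the real config class API.

```python
class Config:
    """Hypothetical stand-in for a parsed WML node (not the engine's real class)."""

    def __init__(self, attributes=None):
        # Every attribute value is stored as a plain string.
        self.attributes = dict(attributes or {})
        # Child nodes are grouped by tag name, in document order.
        self.children = {}

    def add_child(self, tag, child):
        # Nodes sharing one tag name pile up in a list, which is what
        # lets the consumer treat them as an array.
        self.children.setdefault(tag, []).append(child)
        return child


# Roughly what a scenario node holding two [side] nodes could look like.
scenario = Config({"id": "01_Example", "turns": "24"})
scenario.add_child("side", Config({"side": "1", "gold": "100"}))
scenario.add_child("side", Config({"side": "2", "gold": "75"}))

# The tree knows nothing about numbers; the consumer decides what is one.
turns = int(scenario.attributes["turns"])
total_gold = sum(int(s.attributes["gold"]) for s in scenario.children["side"])
```

Nothing in the tree says that `turns` is a number or that the two `side` nodes form an array. Both readings are choices made by the code that consumes it.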

However, [event] nodes in scenario and unit type descriptions are a very specific and different case because they describe programs that are effectively interpreted by the game events module. This dialect brings rich features such as variables, containers and arrays to the mix, with a more formal syntax implemented on top of the WML config structure. The code's contents and meaning can vary at run-time according to its own descriptions of what the game engine should make of it — hence I tend to refer to this dialect and its instances as managed WML, which is internally handled as instances of the somewhat opaque vconfig class.

At a lower level, managed WML is still a structure built upon associative string containers, in which the attributes' values may change at run-time depending on the interpreter's state. That state contains run-time information including a tree of WML variables, containers and arrays, which can easily be converted back into unmanaged WML config trees, as is done in saved games to preserve the game state in [variables] nodes. There's no inherent “programming” nature at this point.
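A rough way to picture this is as a read-only view that resolves variable references each time an attribute is read. The sketch below is a simplified Python illustration under my own assumptions: `ManagedView` is a hypothetical name, the `$name` notation only loosely mirrors WML variable substitution, and the real vconfig class handles far more cases.

```python
import re

# Simplified reference syntax: a dollar sign followed by an identifier.
_VAR = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


class ManagedView:
    """Hypothetical read-only view over static attributes (loosely vconfig-like)."""

    def __init__(self, attributes, variables):
        self._attributes = attributes  # static strings, never modified
        self._variables = variables    # live interpreter state

    def __getitem__(self, key):
        raw = self._attributes[key]
        # Substitute on every read so the result tracks the current state.
        return _VAR.sub(lambda m: str(self._variables.get(m.group(1), "")), raw)


variables = {"hero_name": "Konrad", "turn": "3"}
static_attributes = {"message": "$hero_name arrived on turn $turn."}
view = ManagedView(static_attributes, variables)

first = view["message"]
variables["turn"] = "7"  # the game state moves on
second = view["message"]
```

The stored string never changes. Only the value seen through the view does, which is why the underlying data can still be written out as a plain tree.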

Looking at its implementation cases from a high-level perspective, however, managed WML is effectively used to describe programs. How does this work? The game events module reads config objects and uses them to create vconfig objects to adjust the WML according to the run-time state of the engine. Then, following certain grammar rules specified at the host (C++) level, the engine makes decisions that alter the game state and execution paths in various ways according to what the managed WML specifies. In current stable and development branches, managed WML can even create more code at run-time based on internal and external factors, thus making scripts written in it capable of evolving and transforming themselves over the course of their execution. This is why I sometimes refer to this dialect as dynamic WML.
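The interpretation step can be sketched as a small dispatcher that walks nodes and hands each one to a host-side procedure. This is a toy model in Python, not the game events module: the `(tag, attributes, children)` tuples, the `HANDLERS` table, and the `if_equals` tag are assumptions made up for the example, with `set_variable` borrowed only loosely from WML.

```python
def run(actions, state):
    """Execute a list of (tag, attributes, children) nodes against state."""
    for tag, attrs, children in actions:
        HANDLERS[tag](attrs, children, state)


def do_set_variable(attrs, children, state):
    # Alters the interpreter state that later nodes will observe.
    state[attrs["name"]] = attrs["value"]


def do_if_equals(attrs, children, state):
    # The host decides whether to descend, based on the current state.
    if state.get(attrs["name"]) == attrs["equals"]:
        run(children, state)


# The "grammar" lives in the host: tag names map to host-side procedures.
HANDLERS = {"set_variable": do_set_variable, "if_equals": do_if_equals}

event = [
    ("set_variable", {"name": "difficulty", "value": "hard"}, []),
    ("if_equals", {"name": "difficulty", "equals": "hard"}, [
        ("set_variable", {"name": "enemy_gold", "value": "200"}, []),
    ]),
    ("if_equals", {"name": "difficulty", "equals": "easy"}, [
        ("set_variable", {"name": "enemy_gold", "value": "50"}, []),
    ]),
]

state = {}
run(event, state)
```

The data itself stays a passive tree. It only behaves like a program because the host assigns a procedure to each tag and follows a different path depending on the state.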

(Notice that nowadays it's possible to embed Lua scripts in WML events, or even define new WML grammar using Lua scripts, which is how various WML tags are now defined in mainline instead of using C++ code. This is an interesting subject, albeit one that's out of the scope of this discussion.)

Much to the dismay of practicing WML coders, managed WML is not used everywhere. There are some implementation-induced reasons for this, since managed WML depends on the game instantiating the state objects required to make vconfig objects as flexible as they are. These entities do not exist at some points of the game's execution, or get reset to a default, empty state in some cases. This is particularly needed to isolate game content, so that the state of one finished campaign does not interact in unexpected ways with another campaign started by the user in the same session. This is considered a feature, similar to how programs in protected-mode operating systems such as Windows, Mac OS X and GNU/Linux can't touch each other's memory segments and execution state except through sanctioned mechanisms such as Interprocess Communication (IPC).

(Of course, this feature has its drawbacks since it's sometimes desirable to have one campaign's outcome affect the gameplay of another campaign, particularly in the case of sequels. There's work underway in 1.9.x to fill this old gap.)

Wesnoth still considers scenario and unit type definitions static entities of their own that cannot change at initialization time, despite the existence of grammar in managed WML that can change parameters such as the next scenario id, player properties, AI behavior, etc. In my opinion, this is a legacy issue that stems from the original WML implementation, which didn't have any such thing as managed/dynamic code. While some of the consequences of this might be fixed in the future to increase the power granted to content authors, doing so would be a huge undertaking of its own due to the way the game's C++ code has evolved over time.

At this point you may wonder why this post's title says that WML is bad for you, when I have done nothing but spout technobabble and concepts that I learned during my work as a mainline developer while hacking on the gruesome innards of WML-using units such as the game events engine.

The problem with all this is that WML is misleading on its own, due to the widespread misconception that it's an actual embedded programming language when it obviously isn't, at least not in all cases. Its relative ease of use also drives inexperienced programmers to make false assumptions about how it works. This leads to unjustified complaints, redundant or far-fetched feature requests, and much pain for WML coders, mainline developers, and community volunteers who attend to support requests.

(Notice that I haven't referred to the oft-overestimated or misunderstood WML preprocessor proper and its syntax. This is deliberate, because I consider it a separate subject deserving more delicate treatment in a broader context. Additionally, I did not make any comparisons with Frogatto's implementation of the basic syntax and the interesting embedded extensions applied on top of it.)

I hope that this article clarifies some of the mysteries behind WML's structure and nature for the more technically oriented segment of my audience, and serves as a lesson for anyone planning to create a game engine (or any other kind of software application, for that matter) that implements a WML-derived language. If this article happens to be accessible to the Regular Joes of the community, I'd appreciate hearing their opinions on the matter.