MacRabbit

Syntaxes

Syntaxes define the regular expressions that Espresso uses to parse documents into hierarchical syntax zones. Syntaxes provide the foundation for most other functionality within Espresso.

Overview

Syntaxes are defined in XML files within a Syntaxes folder in the root Sugar folder. The specific name of the file doesn’t matter; by convention, most Sugars name the file after the syntax (e.g. Textile.xml).

A basic syntax XML file looks like this:

<?xml version="1.0"?>
<syntax name="language-root.textile">

    <zones>
        <!-- Root-level <zone> and <include> elements go here -->
    </zones>

    <!-- The (optional) <library> tag contains collections of reusable zones -->
    <library>
        <!-- <collection> elements go here -->
    </library>

</syntax>

The <zones> and <library> tags are described below. The most important part of the basic syntax definition is the name attribute: this is the value you will use as the <root-zone> in your Languages.xml file. The special keyword “language-root” is necessary if you want to be able to filter actions and so forth based on the language context.

If you are creating a syntax that is embedded within another syntax (like PHP is inside of HTML, for instance), the convention is to extend the base language’s zone name. For PHP, for instance, the root zone name is “language-root.html.with-php”.
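As a rough sketch of this naming convention, a syntax for a template language embedded in HTML might extend the HTML root name as shown here. The name “with-mytemplate” is hypothetical, made up purely for illustration:

```xml
<?xml version="1.0"?>
<!-- Hypothetical name: extends the HTML root zone name, as PHP does -->
<syntax name="language-root.html.with-mytemplate">

    <zones>
        <!-- Zones for the embedded language would go here -->

        <!-- Everything else is parsed as regular HTML -->
        <include syntax="language-root.html"/>
    </zones>

</syntax>
```

Because the name still begins with “language-root.html”, anything that targets the HTML root zone should also apply to documents in the embedded syntax.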

Syntax zones

Syntax zones in Espresso have the following characteristics:

  • They are defined using regular expressions (regex)
  • They can be hierarchically nested (zones can contain other zones)
  • All zones have an identifier, which is used by the rest of the system in Selectors
  • Zone regex can use capture groups to add additional identifiers nested within a zone

Most of the features of regex are available, but for performance reasons all regex are constrained to searching within a single line. This has the following implications:

  • ^ and $ refer to the beginning and end of the line, respectively
  • Lookaheads beyond the line terminating characters (usually \n) will always fail
  • The dot (.) character can never match newlines
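To illustrate the single-line constraint, here is a minimal sketch of a zone that relies on ^ and $ being line anchors. The zone name is an assumption chosen for the example, not one taken from a shipping Sugar:

```xml
<!-- Hypothetical zone: a heading line such as "# Title" -->
<zone name="markup.heading">
    <!-- ^ and $ anchor to the start and end of the current line only,
         and .* stops at the line ending because . never matches newlines -->
    <expression>^#+\s.*$</expression>
</zone>
```

A construct that spans several lines cannot be matched by a single expression like this one; it needs a start/end rule instead.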

To create the syntax zone tree, Espresso runs each line in the file sequentially through the zones defined in the syntax XML to check for matches. If a match is found, Espresso checks for any child zones. If the syntax zone has children, the context changes and Espresso parses only those child zones until the parent zone has ended, at which point it returns to parsing the root-level context.

There are four types of zones you can use:

  1. Simple match
  2. Start/End
  3. Includes
  4. Cut-offs

Any <zone> tag can also use captures or subzones.

Simple match rules

For simple structures like keywords, your zone can match a single piece of text.

<zone name="keyword.storage.type">
    <expression>\bfunction\b</expression>
</zone>

Unlike start/end rules, the name attribute is optional on simple match rules.
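A simple match rule can also cover several keywords with one alternation. The following is a sketch only; the zone name and the keyword list are illustrative assumptions:

```xml
<!-- Hypothetical zone name and keyword list -->
<zone name="keyword.control">
    <!-- \b on both sides keeps "if" from matching inside "elif" or "iffy" -->
    <expression>\b(if|else|while|return)\b</expression>
</zone>
```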

Start/End rules

Start/end rules contain two regex: the first starts the zone, and the second ends it. Anything in between the start and end matches is parsed by the contents of the <subzones> tag. This is the most common way to hierarchically nest zones.

<zone name="string.quoted.single">
    <starts-with>
        <expression>'</expression>
    </starts-with>
    <ends-with>
        <expression>'</expression>
    </ends-with>
</zone>
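Since each individual regex only searches within one line, a start/end rule is the natural fit for constructs that can run over several lines. As a sketch, a C-style block comment might look like this (the zone name is an assumption):

```xml
<!-- Hypothetical zone for /* ... */ comments -->
<zone name="comment.block">
    <starts-with>
        <!-- The asterisk is escaped because * is a regex quantifier -->
        <expression>/\*</expression>
    </starts-with>
    <ends-with>
        <expression>\*/</expression>
    </ends-with>
</zone>
```

Each expression matches within a single line, but the zone itself should stay open on the following lines until the ends-with expression matches.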

Include rules

Includes pull in syntax zones that are stored in a collection or syntax. There are three forms of include rules.

Include current syntax. This will include all root zones from your current syntax if for some reason you need to include your root syntax in a nested zone:

<include syntax="self"/> 

Include external syntax. This will include an entire external syntax (either one defined in your Sugar or a third-party Sugar):

<include syntax="language-root.css"/> 

Include a collection. The first example includes a collection from the current syntax, and the second pulls in a collection from an external syntax:

<include collection="collection.identifier"/>
<include syntax="language-root.css" collection="properties"/>
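One situation where including the current syntax is useful is code interpolated inside a string. The sketch below assumes a language with Ruby-style #{ ... } interpolation; the zone names are illustrative rather than taken from a real Sugar:

```xml
<zone name="string.quoted.double">
    <starts-with>
        <expression>"</expression>
    </starts-with>
    <ends-with>
        <expression>"</expression>
    </ends-with>
    <subzones>
        <!-- Hypothetical interpolation zone: #{ ... } -->
        <zone name="embedded.interpolation">
            <starts-with>
                <expression>#\{</expression>
            </starts-with>
            <ends-with>
                <expression>\}</expression>
            </ends-with>
            <subzones>
                <!-- Parse the interpolated code with the root-level zones -->
                <include syntax="self"/>
            </subzones>
        </zone>
    </subzones>
</zone>
```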

The syntax identifiers for Espresso’s built-in languages are:

  • Apache: language-root.apache
  • CSS: language-root.css
  • HTML: language-root.html
  • JavaScript: language-root.js
  • Markdown: language-root.markdown
  • PHP: language-root.html.with-php (or just php for the syntax without HTML)
  • Python: language-root.python
  • Ruby: language-root.ruby
  • XML: language-root.xml
  • XSL: language-root.xml.xsl

If you need to reference a third-party Sugar syntax or a specific collection, you will need to view the Sugar’s source to get the specific name.

Cut-off rules

Cut-off rules are processing instructions rather than true syntax rules because they never generate a syntax zone. Normally, the regex in the various rules are searched against the rest of the current line. In some situations, however, it may be difficult to write an expression that reliably stops matching at the right point.

A cut-off rule basically says “no search expressions for rules in the current context can go beyond this point”. Note that cut-offs only work in the current context; if Espresso has advanced into a further nested group of subzones or returned to a higher context, the cut-off will have no effect.

<cut-off>
    <expression>&lt;/style&gt;</expression>
</cut-off>
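To show where a cut-off typically sits, here is a sketch of a zone for an HTML style block that hands its contents to the CSS syntax. This is an assumption-laden illustration, not the definition used by HTML.sugar, and the zone name is made up:

```xml
<!-- Hypothetical zone for <style> ... </style> -->
<zone name="embedded.style-block">
    <starts-with>
        <expression>&lt;style[^&gt;]*&gt;</expression>
    </starts-with>
    <ends-with>
        <expression>&lt;/style&gt;</expression>
    </ends-with>
    <subzones>
        <!-- Keep rules in this context from searching past the closing tag -->
        <cut-off>
            <expression>&lt;/style&gt;</expression>
        </cut-off>
        <include syntax="language-root.css"/>
    </subzones>
</zone>
```

The cut-off is placed inside <subzones> so that it applies to the context in which the CSS rules are searched. As noted above, it has no effect once parsing has moved into a more deeply nested zone.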

Captures

<expression> tags within a zone can be combined with <capture> tags to easily define subzones based on capture groups in the regex. For instance, the definition of a PHP source code block looks like this:

<zone name="embedded.language-root.php">
    <starts-with>
        <expression>(&lt;)(\?)(php|=)?</expression>
        <capture number="0" name="delimiter.source.start"/>
        <capture number="1" name="punctuation.bracket.angle.open"/>
        <capture number="2" name="punctuation.delimiter.question-mark"/>
        <capture number="3" name="keyword.definition"/>
    </starts-with>
    <ends-with>
        <expression>(\?)(&gt;)</expression>
        <capture number="0" name="delimiter.source.end"/>
        <capture number="1" name="punctuation.delimiter.question-mark"/>
        <capture number="2" name="punctuation.bracket.angle.close"/>
    </ends-with>
</zone>

Capture number 0 refers to the entire match (defining it is optional), and the numbers from 1 onward refer to the numbered capture groups in the regex.
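Captures work the same way on simple match rules. As a sketch, a function declaration could label the keyword and the function name separately; the zone and capture names here are assumptions chosen for illustration:

```xml
<!-- Hypothetical zone for declarations such as "function doSomething" -->
<zone name="meta.function">
    <expression>\b(function)\s+(\w+)</expression>
    <!-- Capture 0 is omitted here, since naming the whole match is optional -->
    <capture number="1" name="keyword.storage.type"/>
    <capture number="2" name="entity.name.function"/>
</zone>
```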

Subzones

You can nest syntax zones using the <subzones> tag in both <zone> tags and <capture> tags. For instance, a more complete string zone than the example above:

<zone name="string.quoted.single">
    <starts-with>
        <expression>'</expression>
    </starts-with>
    <ends-with>
        <expression>'</expression>
    </ends-with>
    <subzones>
        <include collection="character-escapes"/>
    </subzones>
</zone>

The <subzones> tag can contain <zone>, <include>, or <cut-off> tags.

Using subzones within a capture is very similar. For instance, here is an example from the HTML.sugar that embeds CSS styling within a style attribute:

<zone>
    <expression>\s+(style)(=)("([^"]*)")</expression>
    <capture number="1" name="attribute-name.style"/>
    <capture number="2" name="punctuation.separator.attribute"/>
    <capture number="3" name="string.quoted.double"/>
    <capture number="4" name="embedded.property-list.css">
        <subzones>
            <include syntax="language-root.css" collection="properties"/>
        </subzones>
    </capture>
</zone>

Library and collections

The optional <library> tag allows you to create reusable groups of syntax zones that can then be referenced using include rules. The syntax is as follows:

<library>
    <collection name="collection.identifier">
        <!-- <zone> tags go here -->
    </collection>

    <collection name="another.collection">
        <!-- more <zone> tags -->
    </collection>
</library>
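Putting the pieces together, the following sketch shows a complete syntax file in which a string zone pulls its escape handling from a collection. The syntax name, the collection contents, and the escape zone name are all hypothetical:

```xml
<?xml version="1.0"?>
<!-- Hypothetical syntax name -->
<syntax name="language-root.example">

    <zones>
        <zone name="string.quoted.single">
            <starts-with>
                <expression>'</expression>
            </starts-with>
            <ends-with>
                <expression>'</expression>
            </ends-with>
            <subzones>
                <!-- Refers to the collection defined in <library> below -->
                <include collection="character-escapes"/>
            </subzones>
        </zone>
    </zones>

    <library>
        <collection name="character-escapes">
            <!-- A backslash followed by any single character -->
            <zone name="constant.character.escape">
                <expression>\\.</expression>
            </zone>
        </collection>
    </library>

</syntax>
```

Keeping the escape rule in a collection means other string zones can reuse it with a single include instead of repeating the zone.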

Syntax Injections

If you need to extend a syntax from another Sugar, you can use syntax injections. The basic structure of a syntax injection XML file is very similar to that of a regular syntax file:

<?xml version="1.0" encoding="UTF-8"?>
<injections>

    <injection name="injection.identifier" selector="selector" action="insert-before-children">
        <!-- Zones or include tags go here -->
    </injection>

    <!-- Like syntaxes, <library> is an optional place for reusable collections -->
    <library>
        <!-- Collections here -->
    </library>

</injections>

Each <injection> element contains a list of rules to be injected. The allowed elements are identical to syntaxes and collections. Each injection requires a unique name.

The selector attribute specifies which syntax rules need to be targeted for this particular injection. Note that the selector is interpreted against the syntax rules from the syntax definition, not the resulting syntax zones in a document. You will need to reference the syntax source to construct your injection selectors rather than using the Syntax Inspector. Additionally, collections cannot be targeted, because as far as the syntax engine is concerned they are little more than syntactic sugar to improve the organization of the XML.

The action attribute specifies how the injection zones should be handled with regard to the target in the selector. The available values are:

  • replace-target: replace target with injection rules
  • replace-children: replace target’s subzone rules with injection rules
  • attach-before-target: insert injection rules before target rule (within target rule’s parent)
  • attach-after-target: same as attach-before-target, but after the target rule
  • insert-before-children: insert injection rules before the target rule’s first subzone rule
  • insert-after-children: same as insert-before-children, but after the last subzone rule
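As a sketch of a filled-in injection, the example below adds a marker zone ahead of the existing subzone rules of a comment rule. The injection name, the zone name, and especially the selector are assumptions: the selector is written as if the target syntax defined a rule named comment.block, which you would need to confirm in that syntax’s source:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<injections>

    <!-- Hypothetical name and selector; check the target syntax source -->
    <injection name="injection.todo-markers" selector="comment.block" action="insert-before-children">
        <!-- Hypothetical zone that highlights TODO and FIXME markers -->
        <zone name="keyword.other.todo">
            <expression>\b(TODO|FIXME)\b</expression>
        </zone>
    </injection>

</injections>
```

Using insert-before-children here means the injected rule is placed ahead of the target’s own subzone rules, which are otherwise left untouched.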