Language definitions

Every language is defined as a set of tokens, which are expressed as regular expressions. The CSS language definition is a typical example.

A regular expression literal is the simplest way to express a token. An alternative that supports more options is an object literal. In that notation, the regular expression describing the token is the value of the pattern property:

...
'tokenname': {
	pattern: /regex/
}
...

So far the functionality is exactly the same between the short and extended notations. However, the extended notation allows for additional options:

inside

This property accepts another object literal, with tokens that are allowed to be nested in this token. This makes it easier to define certain languages. However, keep in mind that nested tokens are slower and, if coded poorly, can even result in infinite recursion. For an example of nested tokens, see the Markup language definition.
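To make the nesting behaviour concrete, here is a minimal sketch of a tokenizer that honours inside. It is not Prism's implementation: sketchTokenize and miniMarkup are hypothetical names, the patterns are simplified for illustration, and the sketch assumes non-global regular expressions.

```javascript
// Hypothetical, simplified tokenizer: NOT Prism's real algorithm.
// Each token type is applied in grammar order to the remaining plain strings.
function sketchTokenize(text, grammar) {
  let stream = [text];
  for (const [type, def] of Object.entries(grammar)) {
    const pattern = def instanceof RegExp ? def : def.pattern;
    const inside = def instanceof RegExp ? undefined : def.inside;
    const next = [];
    for (const piece of stream) {
      if (typeof piece !== 'string') { next.push(piece); continue; }
      let rest = piece;
      let match;
      // Stop on empty matches so the loop always makes progress.
      while (rest && (match = pattern.exec(rest)) && match[0].length > 0) {
        if (match.index > 0) next.push(rest.slice(0, match.index));
        // Nested tokens: tokenize the matched text with the inner grammar.
        const content = inside ? sketchTokenize(match[0], inside) : match[0];
        next.push({ type, content });
        rest = rest.slice(match.index + match[0].length);
      }
      if (rest) next.push(rest);
    }
    stream = next;
  }
  return stream;
}

// Illustrative grammar in the spirit of a markup definition (assumed patterns).
const miniMarkup = {
  tag: {
    pattern: /<\/?[a-z]+(?:\s+[a-z]+="[^"]*")*\s*>/,
    inside: {
      'attr-value': /"[^"]*"/,
      'attr-name': /[a-z]+(?==)/,
      punctuation: /[<>\/=]/
    }
  }
};

const tokens = sketchTokenize('<a href="x">hi</a>', miniMarkup);
console.log(JSON.stringify(tokens, null, 2));
```

The top-level result has three entries: a tag token, the plain string hi, and another tag token. The attribute tokens only exist inside the tag tokens, which is the point of nesting.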

lookbehind

This option mitigates the lack of lookbehind support in older JavaScript engines. When set to true, the first capturing group in the regex pattern is discarded when matching this token, so it effectively behaves as if it were a lookbehind. For an example of this, see the comment and class-name tokens in the C-like language definition.
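The effect of discarding the first capturing group can be sketched without Prism itself. The helper matchWithLookbehind below is hypothetical, and the line-comment pattern is only an illustration of the style of pattern that benefits from this option.

```javascript
// Hypothetical helper: emulates "lookbehind: true" by dropping the text
// captured by the first group from the start of the match.
function matchWithLookbehind(pattern, text) {
  const match = pattern.exec(text);
  if (!match) return null;
  const skipped = match[1] ? match[1].length : 0;
  return {
    index: match.index + skipped,
    text: match[0].slice(skipped)
  };
}

// The first group requires that "//" is not preceded by a backslash or colon,
// so the "//" in a URL such as http://x is not treated as a comment.
const lineComment = /(^|[^\\:])\/\/.*/;

const result = matchWithLookbehind(lineComment, 'x = 1; // note');
console.log(result); // { index: 7, text: '// note' }

const urlResult = matchWithLookbehind(lineComment, 'http://x');
console.log(urlResult); // null
```

Without the adjustment, the token would start one character early and include the space before the comment.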

rest

Accepts an object literal with tokens and appends them to the end of the current object literal. Useful for referring to tokens defined elsewhere. For an example where rest is useful, see the Markup language definition.
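A small sketch of what appending means here, assuming rest is simply merged after the grammar's own tokens. The names sharedTokens, restGrammar and expandRest are hypothetical and the patterns are placeholders.

```javascript
// Tokens defined elsewhere that several grammars might want to reuse.
const sharedTokens = {
  string: /"[^"]*"/,
  number: /\b\d+\b/
};

// A grammar that refers to them through "rest".
const restGrammar = {
  comment: /#.*/,
  rest: sharedTokens
};

// Hypothetical helper: returns a copy with the rest tokens appended
// after the grammar's own tokens and the "rest" key itself removed.
function expandRest(grammar) {
  const { rest, ...own } = grammar;
  return { ...own, ...(rest || {}) };
}

const expanded = expandRest(restGrammar);
console.log(Object.keys(expanded)); // [ 'comment', 'string', 'number' ]
```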

alias

This option can be used to define one or more aliases for the matched token. As a result, the styles of the token and its aliases are combined. This is useful for pairing a semantically correct token name with the styling of a well-known token that most themes already support. The option can be set to a string literal or an array of string literals. In the following example, the token name latex-equation is not supported by any theme, but it will be highlighted the same as a string.

{
	'latex-equation': {
		pattern: /\$(\\?.)*?\$/g,
		alias: 'string'
	}
}
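One way to picture how styles get combined is to look at the class list a token could end up with. The sketch below assumes each token is rendered with the class token, followed by its name and then its aliases; tokenClasses is a hypothetical helper, not part of Prism's API.

```javascript
// Hypothetical helper: builds the class attribute for a token,
// accepting the alias as a string, an array of strings, or nothing.
function tokenClasses(type, alias) {
  const aliases = alias === undefined ? [] : [].concat(alias);
  return ['token', type, ...aliases].join(' ');
}

const single = tokenClasses('latex-equation', 'string');
const several = tokenClasses('latex-equation', ['string', 'important']);

console.log(single);  // token latex-equation string
console.log(several); // token latex-equation string important
```

Under that assumption, a theme rule written for .token.string also applies to the latex-equation token.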

greedy

This is a boolean attribute. It is intended to solve a common problem with patterns that match long strings, such as comments, regular expressions, or string literals. For example, comments are parsed first, but if the string /* foo */ appears inside a string, you would not want it to be highlighted as a comment. The greedy property allows a pattern to ignore previous matches of other patterns, and overwrite them when necessary. Use this flag with restraint, as it incurs a small performance overhead. The following example demonstrates its usage:

'string': {
	pattern: /(["'])(\\(?:\r\n|[\s\S])|(?!\1)[^\\\r\n])*\1/,
	greedy: true
}
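The conflict that greedy is meant to resolve can be reproduced with two plain regular expressions. In this sketch the string pattern is the one shown above, while commentPattern is an assumed, typical block-comment pattern used only for illustration.

```javascript
const stringPattern = /(["'])(\\(?:\r\n|[\s\S])|(?!\1)[^\\\r\n])*\1/;
const commentPattern = /\/\*[\s\S]*?\*\//; // assumed block-comment pattern

const code = 'var s = "a /* foo */ b";';

const stringMatch = stringPattern.exec(code);
const commentMatch = commentPattern.exec(code);

// Both patterns match, and the comment match lies entirely inside the string.
const stringStart = stringMatch.index;
const stringEnd = stringMatch.index + stringMatch[0].length;
const commentInsideString =
  commentMatch.index >= stringStart &&
  commentMatch.index + commentMatch[0].length <= stringEnd;

console.log(stringMatch[0]);      // "a /* foo */ b"
console.log(commentMatch[0]);     // /* foo */
console.log(commentInsideString); // true
```

If the comment token is processed first and wins, the string is split around it. Marking the string token as greedy is what lets it reclaim the whole literal.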

Unless explicitly allowed through the inside property, a token cannot contain other tokens, so the order in which tokens are defined is significant. Older editions of the ECMAScript specification did not require objects to keep their properties in a specific order, but in practice every modern browser iterates string keys in the order they were defined.
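The ordering behaviour that token definitions rely on can be checked directly. The sketch below uses placeholder patterns; the one caveat it shows is that integer-like keys are listed before string keys, so token names should not look like array indices.

```javascript
// Placeholder grammar: string keys are listed in definition order.
const orderedGrammar = {
  comment: /\/\/.*/,
  string: /"[^"]*"/,
  keyword: /\b(?:if|else)\b/
};
const grammarOrder = Object.keys(orderedGrammar);
console.log(grammarOrder); // [ 'comment', 'string', 'keyword' ]

// Integer-like keys are moved to the front, regardless of definition order.
const mixedKeys = Object.keys({ b: 1, 2: 'x', a: 2 });
console.log(mixedKeys); // [ '2', 'b', 'a' ]
```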

In most languages there are several ways of declaring the same construct (e.g. comments or strings), and sometimes it is difficult or impractical to match all of them with a single regular expression. To add multiple regular expressions for one token name, use an array:

...
'tokenname': [ /regex0/, /regex1/, { pattern: /regex2/ } ]
...
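Since a token definition may be a regular expression, an object, or an array mixing both, code that consumes a grammar usually normalizes these forms first. The helper patternsFor below is a hypothetical sketch of such a normalization, and the comment patterns are illustrative.

```javascript
// Hypothetical helper: turns any of the three notations into
// an array of { pattern: RegExp, ...options } objects.
function patternsFor(definition) {
  const list = Array.isArray(definition) ? definition : [definition];
  return list.map(item => (item instanceof RegExp ? { pattern: item } : item));
}

const arrayGrammar = {
  // Three ways of writing a comment, under one token name.
  comment: [/\/\/.*/, /#.*/, { pattern: /\/\*[\s\S]*?\*\// }],
  string: /"[^"]*"/
};

const commentPatterns = patternsFor(arrayGrammar.comment);
const stringPatterns = patternsFor(arrayGrammar.string);

// Every sample is recognised by at least one of the comment patterns.
const samples = ['// a', '# b', '/* c */'];
const allMatch = samples.every(s => commentPatterns.some(p => p.pattern.test(s)));

console.log(commentPatterns.length, stringPatterns.length, allMatch); // 3 1 true
```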

`Prism.languages.insertBefore(inside, before, insert, root)`

This is a helper method to ease modifying existing languages. For example, the CSS language definition not only defines CSS highlighting for CSS documents, but also needs to define highlighting for CSS embedded in HTML through <style> elements. To do this, it needs to modify Prism.languages.markup and add the appropriate tokens. However, Prism.languages.markup is a regular JavaScript object literal, so if you do this:

Prism.languages.markup.style = {
	/* tokens */
};

then the style token will be added (and processed) at the end. Prism.languages.insertBefore allows you to insert tokens before existing tokens. For the CSS example above, you would use it like this:

Prism.languages.insertBefore('markup', 'cdata', {
	'style': {
		/* tokens */
	}
});

Parameters

inside: The property of root that contains the object to be modified.
before: Key to insert before (String)
insert: An object containing the key-value pairs to be inserted
root: The root object, i.e. the object that contains the object that will be modified. Optional, default value is Prism.languages.
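The core idea of the helper, rebuilding the object so that the new keys appear before an existing key, can be sketched as follows. This is an illustration under assumptions, not Prism's source: insertBeforeSketch and the stand-in languages object are hypothetical, the patterns are placeholders, and the real helper may do additional work.

```javascript
// Stand-in for the root object; patterns are simplified placeholders.
const languages = {
  markup: {
    comment: /<!--[\s\S]*?-->/,
    cdata: /<!\[CDATA\[[\s\S]*?\]\]>/i,
    tag: /<\/?[^>]+>/
  }
};

// Hypothetical re-implementation with the same parameter order
// as described above: (inside, before, insert, root).
function insertBeforeSketch(inside, before, insert, root = languages) {
  const grammar = root[inside];
  const result = {};
  for (const key of Object.keys(grammar)) {
    // Copy the new tokens in just before the target key.
    if (key === before) Object.assign(result, insert);
    // Keys that are being inserted replace existing ones of the same name.
    if (!Object.prototype.hasOwnProperty.call(insert, key)) {
      result[key] = grammar[key];
    }
  }
  root[inside] = result;
  return result;
}

insertBeforeSketch('markup', 'cdata', {
  style: /<style[\s\S]*?<\/style>/i
});

const markupOrder = Object.keys(languages.markup);
console.log(markupOrder); // [ 'comment', 'style', 'cdata', 'tag' ]
```

Because the object is rebuilt rather than mutated in place, any variable still holding the old object will not see the inserted tokens in this sketch.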

API documentation

`Prism.highlightAll(async, callback)`

This is the highest-level function in Prism’s API. It fetches all the elements that have a .language-xxxx class and then calls Prism.highlightElement() on each one of them.

Parameters

async: Whether to use Web Workers to improve performance and avoid blocking the UI when highlighting very large chunks of code. False by default.
Note: All language definitions required to highlight the code must be included in the main prism.js file for the async highlighting to work. You can build your own bundle on the Download page.
callback: An optional callback to be invoked after the highlighting is done. Mostly useful when async is true, since in that case, the highlighting is done asynchronously.
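A typical call might look like the sketch below. The wrapper highlightPage is a hypothetical name, and the guard is there only so the snippet does nothing when run outside a browser or before Prism has loaded. The callback body is a placeholder, and the sketch makes no assumption about what arguments the callback receives.

```javascript
// Hypothetical wrapper around the documented call.
// Returns false when Prism or the DOM is not available.
function highlightPage() {
  if (typeof Prism === 'undefined' || typeof document === 'undefined') {
    return false;
  }
  // Synchronous highlighting (async = false) with an optional callback.
  Prism.highlightAll(false, function () {
    console.log('highlighting callback invoked');
  });
  return true;
}

const highlightStarted = highlightPage();
console.log(highlightStarted);
```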

`Prism.highlightAllUnder(element, async, callback)`

Fetches all the descendants of element that have a .language-xxxx class and then calls Prism.highlightElement() on each one of them.

Parameters

element: The root element, whose descendants that have a .language-xxxx class will be highlighted.
async: Same as in Prism.highlightAll()
callback: Same as in Prism.highlightAll()

`Prism.highlightElement(element, async, callback)`

Highlights the code inside a single element.

Parameters

element: The element containing the code. It must have a class of language-xxxx to be processed, where xxxx is a valid language identifier.
async: Same as in Prism.highlightAll()
callback: Same as in Prism.highlightAll()

`Prism.highlight(text, grammar)`

Low-level function, only use if you know what you’re doing. It accepts a string of text as input and the language definitions to use, and returns a string with the HTML produced.

Parameters

text: A string with the code to be highlighted.
grammar: An object containing the tokens to use. Usually a language definition like Prism.languages.markup

Returns

The highlighted HTML

`Prism.tokenize(text, grammar)`

This is the heart of Prism, and the lowest-level function you can use. It accepts a string of text as input and the language definitions to use, and returns an array with the tokenized code. When the language definition includes nested tokens, the function is called recursively on each of these tokens. This method could be useful in other contexts as well, as a very crude parser.

Parameters

text: A string with the code to be highlighted.
grammar: An object containing the tokens to use. Usually a language definition like Prism.languages.markup

Returns

An array of strings, tokens (class Prism.Token) and other arrays.
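Because the result mixes strings, tokens and nested arrays, consumers generally walk it recursively. The sketch below uses a hand-written sample in that shape rather than real Prism output, and it assumes that each token object exposes type and content properties. The names sampleStream, plainText and collectTypes are hypothetical.

```javascript
// Hand-written sample mimicking the described return shape (not real output).
const sampleStream = [
  {
    type: 'tag',
    content: [
      { type: 'punctuation', content: '<' },
      'p',
      { type: 'punctuation', content: '>' }
    ]
  },
  'Hello ',
  ['wor', 'ld'] // a nested array of plain strings
];

// Recovers the original text by concatenating all string leaves.
function plainText(node) {
  if (typeof node === 'string') return node;
  if (Array.isArray(node)) return node.map(plainText).join('');
  return plainText(node.content);
}

// Collects token types in document order, including nested ones.
function collectTypes(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach(child => collectTypes(child, found));
  } else if (typeof node !== 'string') {
    found.push(node.type);
    collectTypes(node.content, found);
  }
  return found;
}

const recovered = plainText(sampleStream);
const types = collectTypes(sampleStream);
console.log(recovered); // <p>Hello world
console.log(types);     // [ 'tag', 'punctuation', 'punctuation' ]
```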
