Sometimes the source of a program contains snippets of other languages; HTML + CSS + JavaScript is one example. In that case, text segments written in different languages need to be assigned to different parsers. Traditionally, this is achieved by using narrowing. While tree-sitter works with narrowing (see narrowing), the recommended way is instead to set the regions of buffer text (i.e., ranges) in which a parser will operate. This section describes functions for setting and getting ranges for a parser.
Lisp programs should call treesit-update-ranges to make sure the ranges for each parser are correct before using parsers in a buffer, and call treesit-language-at to figure out the language responsible for the text at some position. These two functions don’t work by themselves: they need major modes to set treesit-range-settings and treesit-language-at-point-function, which do the actual work. These functions and variables are explained in more detail towards the end of the section.
Function: treesit-parser-set-included-ranges parser ranges
This function sets up parser to operate on ranges. The parser will only read the text of the specified ranges. Each range in ranges is a cons cell of the form (beg . end).
The ranges in ranges must come in order and must not overlap. That is, the following form must return non-nil:
(cl-loop for idx from 1 to (1- (length ranges))
         for prev = (nth (1- idx) ranges)
         for next = (nth idx ranges)
         always (<= (car prev) (cdr prev)
                    (car next) (cdr next)))
If ranges violates this constraint, or something else went wrong, this function signals the treesit-range-invalid error. The signal data contains a specific error message and the ranges we are trying to set.
This function can also be used for disabling ranges. If ranges is nil, the parser is set to parse the whole buffer.
Example:
(treesit-parser-set-included-ranges parser '((1 . 9) (16 . 24) (24 . 25)))
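The ordering constraint can be checked before handing ranges to a parser. The following is a minimal sketch, not part of Emacs: the names my-ranges-valid-p and my-set-ranges-safely are hypothetical, and the second function assumes an Emacs built with tree-sitter support.

```elisp
(require 'cl-lib)

(defun my-ranges-valid-p (ranges)
  "Return non-nil if RANGES is ordered and non-overlapping.
RANGES is a list of cons cells (BEG . END).  Hypothetical helper."
  (let ((last-end nil))
    (cl-every
     (lambda (range)
       (let ((beg (car-safe range))
             (end (cdr-safe range)))
         (prog1 (and (integerp beg) (integerp end)
                     ;; Each range must be well-formed...
                     (<= beg end)
                     ;; ...and must not start before the previous one ends.
                     (or (null last-end) (<= last-end beg)))
           (setq last-end end))))
     ranges)))

(defun my-set-ranges-safely (parser ranges)
  "Set RANGES on PARSER; return t on success, nil if they are rejected.
Hypothetical wrapper around `treesit-parser-set-included-ranges'."
  (condition-case err
      (progn (treesit-parser-set-included-ranges parser ranges) t)
    (treesit-range-invalid
     ;; The signal data holds an error message and the offending ranges.
     (message "Invalid ranges: %S" (cdr err))
     nil)))
```

Catching treesit-range-invalid lets a caller fall back to its previous ranges instead of aborting, which may be preferable inside fontification code.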
Function: treesit-parser-included-ranges parser
This function returns the ranges set for parser. The return value is the same as the ranges argument of treesit-parser-set-included-ranges: a list of cons cells of the form (beg . end). If parser doesn’t have any ranges, the return value is nil.
(treesit-parser-included-ranges parser)
    ⇒ ((1 . 9) (16 . 24) (24 . 25))
Function: treesit-query-range source query &optional beg end
This function matches source with query and returns the ranges of captured nodes. The return value is a list of cons cells of the form (beg . end), where beg and end specify the beginning and the end of a region of text.
For convenience, source can be a language symbol, a parser, or a node. If it’s a language symbol, this function matches in the root node of the first parser using that language; if a parser, this function matches in the root node of that parser; if a node, this function matches in that node.
The argument query is the query used to capture nodes (see Pattern Matching Tree-sitter Nodes). The capture names don’t matter. The arguments beg and end, if both non-nil, limit the range in which this function queries.
Like other query functions, this function signals the treesit-query-error error if query is malformed.
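As an illustration of the error behavior, a caller that builds queries dynamically might guard the call. This is only a sketch under the assumption of Emacs 29 or later with tree-sitter support; my-embedded-ranges is a hypothetical name.

```elisp
(require 'treesit)

(defun my-embedded-ranges (source query &optional beg end)
  "Return the ranges captured by QUERY in SOURCE, or nil on a bad query.
SOURCE is a language symbol, a parser, or a node, as accepted by
`treesit-query-range'.  BEG and END optionally limit the search.
Hypothetical helper, not part of Emacs."
  (condition-case err
      (treesit-query-range source query beg end)
    (treesit-query-error
     ;; Report the problem instead of letting the error propagate.
     (message "Malformed query: %S" (cdr err))
     nil)))
```

In a buffer that has an HTML parser, a call such as (my-embedded-ranges 'html '((style_element (raw_text) @capture))) would return the ranges of the captured raw_text nodes.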
It should suffice for general Lisp programs to call the following two functions in order to support program sources that mix multiple languages.
Function: treesit-update-ranges &optional beg end
This function updates ranges for parsers in the buffer. It makes sure the parsers’ ranges are set correctly between beg and end, according to treesit-range-settings. If omitted, beg defaults to the beginning of the buffer, and end defaults to the end of the buffer.
For example, fontification functions use this function before querying for nodes in a region.
Function: treesit-language-at pos
This function returns the language of the text at buffer position pos. Under the hood it calls treesit-language-at-point-function and returns its return value. If treesit-language-at-point-function is nil, this function returns the language of the first parser in the list returned by treesit-parser-list. If there is no parser in the buffer, it returns nil.
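The two functions are typically used together: refresh the ranges for the region of interest, then ask which language owns a position. A minimal sketch, assuming Emacs 29 or later and a major mode that has set treesit-range-settings; the command name is hypothetical.

```elisp
(require 'treesit)

(defun my-language-at-point-message ()
  "Report the language at point after refreshing parser ranges.
Hypothetical command; the name is not part of Emacs."
  (interactive)
  ;; Refresh ranges only around the current line.  This relies on the
  ;; major mode having set `treesit-range-settings'.
  (treesit-update-ranges (line-beginning-position) (line-end-position))
  (message "Language at point: %S" (treesit-language-at (point))))
```

Limiting the update to the current line keeps the call cheap; passing no arguments would update ranges for the whole buffer.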
Normally, in a set of languages that can be mixed together, there is a host language and one or more embedded languages. A Lisp program usually first parses the whole document with the host language’s parser, retrieves some information, sets ranges for the embedded languages with that information, and then parses the embedded languages.
Take a buffer containing HTML, CSS and JavaScript as an example. A Lisp program will first parse the whole buffer with an HTML parser, then query the parser for style_element and script_element nodes, which correspond to CSS and JavaScript text, respectively. Then it sets the ranges of the CSS and JavaScript parsers to the ranges spanned by their corresponding nodes.
Given a simple HTML document:
<html>
  <script>1 + 2</script>
  <style>body { color: "blue"; }</style>
</html>
a Lisp program will first parse with an HTML parser, then set ranges for CSS and JavaScript parsers:
;; Create parsers.
(setq html (treesit-parser-create 'html))
(setq css (treesit-parser-create 'css))
(setq js (treesit-parser-create 'javascript))

;; Set CSS ranges.
(setq css-range
      (treesit-query-range
       'html
       "(style_element (raw_text) @capture)"))
(treesit-parser-set-included-ranges css css-range)

;; Set JavaScript ranges.
(setq js-range
      (treesit-query-range
       'html
       "(script_element (raw_text) @capture)"))
(treesit-parser-set-included-ranges js js-range)
Emacs automates this process in treesit-update-ranges. A multi-language major mode should set treesit-range-settings so that treesit-update-ranges knows how to perform this process automatically. Major modes should use the helper function treesit-range-rules to generate a value that can be assigned to treesit-range-settings. The settings in the following example directly translate into operations shown above.
(setq-local treesit-range-settings
            (treesit-range-rules
             :embed 'javascript
             :host 'html
             '((script_element (raw_text) @capture))

             :embed 'css
             :host 'html
             '((style_element (raw_text) @capture))))
Function: treesit-range-rules &rest query-specs
This function is used to set treesit-range-settings. It takes care of compiling queries and other post-processing, and returns a value suitable for treesit-range-settings.
It takes a series of query-specs, where each query-spec is a query preceded by zero or more keyword/value pairs. Each query is a tree-sitter query in either the string, s-expression or compiled form, or a function.
If query is a tree-sitter query, it should be preceded by two keyword/value pairs, where the :embed keyword specifies the embedded language, and the :host keyword specifies the host language.
treesit-update-ranges uses query to figure out how to set the ranges for parsers for the embedded language. It runs query in a host language parser, computes the ranges spanned by the captured nodes, and applies these ranges to the embedded language parsers.
If query is a function, it doesn’t need any keyword/value pairs. It should be a function that takes 2 arguments, start and end, and sets the ranges for parsers in the current buffer in the region between start and end. It is fine for this function to set ranges in a larger region that encompasses the region between start and end.
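The function form could look like the sketch below. It is an illustration only: my-set-css-ranges and my-setup-css-range-rules are hypothetical names, the code assumes the HTML and CSS grammars are installed, and the empty fallback range is an assumption made here so that a missing style element does not reset the CSS parser to the whole buffer.

```elisp
(require 'treesit)

(defun my-set-css-ranges (_start _end)
  "Set the CSS parser's ranges from the HTML parse tree.
Ignores START and END and recomputes ranges for the whole buffer,
which the range function contract permits.  Hypothetical example."
  (let ((css (treesit-parser-create 'css))
        (ranges (treesit-query-range
                 'html '((style_element (raw_text) @capture)))))
    (treesit-parser-set-included-ranges
     css
     ;; A nil RANGES would make the parser read the whole buffer, so
     ;; fall back to an empty range when nothing was captured.
     (or ranges (list (cons (point-min) (point-min)))))))

(defun my-setup-css-range-rules ()
  "Install `my-set-css-ranges' as this buffer's range settings."
  (setq-local treesit-range-settings
              (treesit-range-rules #'my-set-css-ranges)))
```

A function is mostly useful when the ranges cannot be expressed as a single query, for example when they depend on buffer-local state.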
Variable: treesit-range-settings
This variable helps treesit-update-ranges in updating the ranges for parsers in the buffer. It is a list of settings where the exact format of a setting is considered internal. You should use treesit-range-rules to generate a value that this variable can have.
Variable: treesit-language-at-point-function
This variable’s value should be a function that takes a single argument, pos, which is a buffer position, and returns the language of the buffer text at pos. This variable is used by treesit-language-at.
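For the HTML example above, such a function might be sketched as follows. The function names are hypothetical, and the code assumes the HTML grammar's node types raw_text, script_element and style_element, as used in the queries earlier in this section.

```elisp
(require 'treesit)

(defun my-html-language-at-point (pos)
  "Return the language of the text at POS in an HTML buffer.
Hypothetical example; assumes an HTML parser covers the buffer."
  (let* ((node (treesit-node-at pos 'html))
         (parent (and node (treesit-node-parent node))))
    (if (and parent (equal (treesit-node-type node) "raw_text"))
        ;; Raw text belongs to the language of its enclosing element.
        (pcase (treesit-node-type parent)
          ("script_element" 'javascript)
          ("style_element" 'css)
          (_ 'html))
      'html)))

(defun my-html-setup-language-at-point ()
  "Make `treesit-language-at' use `my-html-language-at-point' here."
  (setq-local treesit-language-at-point-function
              #'my-html-language-at-point))
```

A major mode would call the setup function from its body, alongside setting treesit-range-settings.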