Tree-sitter lets Lisp programs match patterns using a small declarative language. This pattern matching consists of two steps: first tree-sitter matches a pattern against nodes in the syntax tree, then it captures specific nodes that matched the pattern and returns the captured nodes.
We describe first how to write the most basic query pattern and how to capture nodes in a pattern, then the pattern-matching function, and finally the more advanced pattern syntax.
A query consists of one or more patterns. Each pattern is an
s-expression that matches a certain node in the syntax tree. A
pattern has the form (type (child…)).
For example, a pattern that matches a binary_expression node that
contains number_literal child nodes would look like
(binary_expression (number_literal))
To capture a node using the query pattern above, append
@capture-name after the node pattern you want to
capture. For example,
(binary_expression (number_literal) @number-in-exp)
captures number_literal nodes that are inside a
binary_expression node with the capture name
number-in-exp.
We can capture the binary_expression node as well, with, for
example, the capture name biexp:
(binary_expression (number_literal) @number-in-exp) @biexp
Now we can introduce the query functions.
The function treesit-query-capture matches patterns in query within node. The argument query can be either an s-expression, a string, or a compiled query object. For now, we focus on the s-expression syntax; string syntax and compiled queries are described at the end of the section.
The argument node can also be a parser or a language symbol. If it is a parser, its root node is used; if it is a language symbol, Emacs finds or creates a parser for that language in the current buffer and uses that parser’s root node.
The function returns all the captured nodes in an alist with elements
of the form (capture_name . node). If
node-only is non-nil, it returns the list of nodes
instead. By default the entire text of node is searched, but if
beg and end are both non-nil, they specify the
region of buffer text where this function should match nodes. Any
matching node whose span overlaps with the region between beg
and end is captured; it doesn’t have to be completely contained
in the region.
If grouped is non-nil, this function returns a grouped list
of lists of captured nodes. The grouping is determined by query.
Captures in the same match of a pattern in query are grouped
together.
This function raises the treesit-query-error error if
query is malformed. The signal data contains a description of
the specific error. You can use treesit-query-validate to
validate and debug the query.
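A caller that builds queries dynamically may prefer to trap this error rather than let it propagate. The following is a minimal sketch of that idea; my-safe-capture is a hypothetical helper name used for illustration, not part of Emacs.

```elisp
(require 'treesit)

(defun my-safe-capture (node query)
  "Run QUERY on NODE and return the captures.
If QUERY is malformed, return (error . DATA) instead, where DATA is
the signal data.  Hypothetical helper, for illustration only."
  (condition-case err
      (treesit-query-capture node query)
    (treesit-query-error
     ;; (cdr err) is the signal data describing the specific problem.
     (cons 'error (cdr err)))))
```

A normal result is an alist whose elements are cons cells, so a return value whose car is the symbol error can be told apart from a successful capture list.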
For example, suppose node’s text is 1 + 2, and
query is
(setq query
      '((binary_expression
         (number_literal) @number-in-exp) @biexp))
Matching that query would return
(treesit-query-capture node query)
    ⇒ ((biexp . <node for "1 + 2">)
       (number-in-exp . <node for "1">)
       (number-in-exp . <node for "2">))
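To try this end to end, the query can be run against a temporary buffer. The sketch below assumes an Emacs built with tree-sitter support and an installed C grammar whose node types include binary_expression and number_literal; my-capture-demo is a hypothetical name, and the function returns nil when those assumptions do not hold.

```elisp
(require 'treesit)

(defun my-capture-demo ()
  "Return (CAPTURE-NAME . TEXT) pairs for a small C snippet, or nil.
Hypothetical helper; assumes a C grammar is installed."
  (when (and (treesit-available-p)
             (treesit-language-available-p 'c))
    (with-temp-buffer
      (insert "int x = 1 + 2;")
      (let* ((parser (treesit-parser-create 'c))
             (root (treesit-parser-root-node parser))
             (query '((binary_expression
                       (number_literal) @number-in-exp) @biexp)))
        ;; Replace each captured node with the buffer text it spans,
        ;; so the result is still meaningful after the buffer is gone.
        (mapcar (lambda (capture)
                  (cons (car capture)
                        (treesit-node-text (cdr capture) t)))
                (treesit-query-capture root query))))))
```

Converting nodes to text inside with-temp-buffer is a deliberate choice: the nodes refer to the temporary buffer, which no longer exists once the form returns.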
As mentioned earlier, query could contain multiple patterns. For example, it could have two top-level patterns:
(setq query
      '((binary_expression) @biexp
        (number_literal) @number))
The function treesit-query-string parses string as language, matches its root node with query, and returns the result.
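As a rough illustration, the two-pattern query above can be matched against a string directly. This sketch assumes a C grammar is installed; my-capture-names-in-string is a hypothetical helper that returns only the capture names, since the captured nodes belong to a buffer the caller does not control.

```elisp
(require 'treesit)

(defun my-capture-names-in-string (string)
  "Return the capture names produced by matching STRING as C code.
Hypothetical helper; returns nil when no C grammar is available."
  (when (and (treesit-available-p)
             (treesit-language-available-p 'c))
    (mapcar #'car
            (treesit-query-string
             string
             ;; Two top-level patterns in one query.
             '((binary_expression) @biexp
               (number_literal) @number)
             'c))))

(my-capture-names-in-string "int x = 1 + 2;")
```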
Besides node types and capture names, tree-sitter’s pattern syntax can express anonymous nodes, field names, wildcards, quantification, grouping, alternation, anchors, and predicates.
An anonymous node is written verbatim, surrounded by quotes. A
pattern matching (and capturing) keyword return would be
"return" @keyword
In a pattern, ‘(_)’ matches any named node, and ‘_’ matches
any named or anonymous node. For example, to capture any named child
of a binary_expression node, the pattern would be
(binary_expression (_) @in-biexp)
It is possible to capture child nodes that have specific field names.
In the pattern below, declarator and body are field
names, indicated by the colon following them.
(function_definition declarator: (_) @func-declarator body: (_) @func-body)
It is also possible to capture a node that doesn’t have a certain
field, say, a function_definition without a body field:
(function_definition !body) @func-no-body
Tree-sitter recognizes quantification operators ‘:*’, ‘:+’, and ‘:?’. Their meanings are the same as in regular expressions: ‘:*’ matches the preceding pattern zero or more times, ‘:+’ matches one or more times, and ‘:?’ matches zero or one times.
For example, the following pattern matches type_declaration
nodes that have zero or more long keywords.
(type_declaration "long" :*) @long-type
The following pattern matches a type declaration that may or may not
have a long keyword:
(type_declaration "long" :?) @long-type
Similar to groups in regular expressions, we can bundle patterns into groups and apply quantification operators to them. For example, to express a comma-separated list of identifiers, one could write
(identifier) ("," (identifier)) :*
Again, similar to regular expressions, we can express “match any one of these patterns” in a pattern. The syntax is a vector of patterns. For example, to capture some keywords in C, the pattern would be
[ "return" "break" "if" "else" ] @keyword
The anchor operator :anchor can be used to enforce juxtaposition,
i.e., to enforce two things to be directly next to each other. The
two “things” can be two nodes, or a child and the end of its parent.
For example, to capture the first child, the last child, or two
adjacent children:
;; Anchor the child with the end of its parent.
(compound_expression (_) @last-child :anchor)

;; Anchor the child with the beginning of its parent.
(compound_expression :anchor (_) @first-child)

;; Anchor two adjacent children.
(compound_expression (_) @prev-child :anchor (_) @next-child)
Note that the enforcement of juxtaposition ignores any anonymous nodes.
It is possible to add predicate constraints to a pattern. For example, with the following pattern:
( (array :anchor (_) @first (_) @last :anchor) (:eq? @first @last) )
tree-sitter only matches arrays where the first element is equal to
the last element. To attach a predicate to a pattern, we need to
group them together. Currently there are three predicates:
:eq?, :match?, and :pred?.
The :eq? predicate matches if arg1 is equal to arg2. Arguments can be either
strings or capture names. Capture names represent the text that the
captured node spans in the buffer. Note that this is more like
equal in Elisp, but eq? is the convention used by
tree-sitter. Previously we supported the :equal predicate but
it’s now considered deprecated.
The :match? predicate matches if the text that capture-name’s node spans in the buffer
matches regular expression regexp, given as a string literal.
Matching is case-sensitive. The ordering of the arguments doesn’t
matter. Previously we supported the :match predicate but it’s
now considered deprecated.
The :pred? predicate matches if function fn returns non-nil when passed each
node in nodes as arguments. The function runs with the current
buffer set to the buffer of the node being queried. Be very careful when
using this predicate, since it can be expensive when used in a tight
loop. Previously we supported the :pred predicate but it’s now
considered deprecated.
Note that a predicate can only refer to capture names that appear in the same pattern. Indeed, it makes little sense to refer to capture names in other patterns.
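The following sketch shows a predicate attached to a pattern in practice. It assumes an installed C grammar whose identifier nodes have the type identifier, and an Emacs version that supports the :match? predicate name described above; my-identifiers-matching is a hypothetical helper name.

```elisp
(require 'treesit)

(defun my-identifiers-matching (regexp)
  "Return the text of C identifiers in the current buffer matching REGEXP.
Hypothetical helper; assumes a C grammar and :match? support."
  (when (and (treesit-available-p)
             (treesit-language-available-p 'c))
    ;; The pattern and its predicate are grouped in one list.
    (let ((query `(((identifier) @id
                    (:match? ,regexp @id)))))
      (mapcar (lambda (node) (treesit-node-text node t))
              ;; Passing the language symbol finds or creates a parser
              ;; in the current buffer and queries its root node.
              ;; The final t asks for nodes only, without capture names.
              (treesit-query-capture 'c query nil nil t)))))
```

For instance, calling (my-identifiers-matching "^fo") in a buffer containing C code would return only those identifiers whose text starts with "fo".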
Besides s-expressions, Emacs also accepts tree-sitter’s native query syntax, written as a string. It largely resembles the s-expression syntax. For example, the following query
(treesit-query-capture
node '((addition_expression
left: (_) @left
"+" @plus-sign
right: (_) @right) @addition
["return" "break"] @keyword))
is equivalent to
(treesit-query-capture
node "(addition_expression
left: (_) @left
\"+\" @plus-sign
right: (_) @right) @addition
[\"return\" \"break\"] @keyword")
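One way to inspect the string form of an s-expression query is treesit-query-expand, which is mentioned with the other query functions later in this section. The sketch below is illustrative only: my-query-as-string is a hypothetical wrapper that returns nil when Emacs lacks tree-sitter support, and the exact spacing of the resulting string is not assumed here.

```elisp
(require 'treesit)

(defun my-query-as-string (query)
  "Return the string form of s-expression QUERY.
Hypothetical wrapper; returns nil if tree-sitter is unavailable."
  (when (treesit-available-p)
    (treesit-query-expand query)))

;; Expand the s-expression query shown above.
(my-query-as-string
 '((addition_expression
    left: (_) @left
    "+" @plus-sign
    right: (_) @right) @addition
   ["return" "break"] @keyword))
```

Expanding a query this way does not require a grammar to be installed, since no parsing or matching takes place.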
Most patterns can be written directly as s-expressions inside a string. Only a few of them need modification:
:anchor is written as ‘.’.
:?, :* and :+ are written as ‘?’, ‘*’ and ‘+’, respectively.
:eq?, :match? and :pred? are written as
#eq?, #match? and #pred?, respectively. In
general, predicates change the ‘:’ to ‘#’.
For example,
'(( (compound_expression :anchor (_) @first (_) :* @rest) (:match? "love" @first) ))
is written in string form as
"( (compound_expression . (_) @first (_)* @rest) (#match? \"love\" @first) )"
If a query is intended to be used repeatedly, especially in tight loops, it is important to compile that query, because a compiled query is much faster than an uncompiled one. A compiled query can be used anywhere a query is accepted.
The function treesit-query-compile compiles query for language into a compiled query object and returns it.
This function raises the treesit-query-error error if
query is malformed. The signal data contains a description of
the specific error. You can use treesit-query-validate to
validate and debug the query.
By default, Emacs lazily compiles query, meaning query isn’t
actually compiled until it’s used. To compile query immediately,
pass non-nil for eager.
To tell an actually compiled query apart from one that hasn’t been
compiled, use treesit-query-eagerly-compiled-p.
If query is malformed or language can’t be loaded, this function
signals treesit-query-error. This can happen only when eager is
non-nil, since otherwise Emacs doesn’t compile query until it is
first used.
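A common way to reuse a compiled query is to cache it in a variable and compile it on first use. The sketch below assumes an installed C grammar; my-number-query and my-number-nodes are hypothetical names chosen for illustration.

```elisp
(require 'treesit)

(defvar my-number-query nil
  "Cached compiled query for C number literals (hypothetical variable).")

(defun my-number-nodes (node)
  "Return the number_literal nodes under NODE.
The query is compiled once and reused on later calls."
  (unless my-number-query
    (setq my-number-query
          (treesit-query-compile 'c '((number_literal) @number))))
  ;; A compiled query is accepted wherever a query is expected.
  ;; The final t asks for nodes only, without capture names.
  (treesit-query-capture node my-number-query nil nil t))
```

This keeps the compilation cost out of code that runs repeatedly, such as a function called on every redisplay or inside a loop over many nodes.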
There are some additional functions for queries:
treesit-query-language returns the language of a query;
treesit-query-source returns the original string or sexp source
query of a compiled query; treesit-query-valid-p checks whether a
query is valid; treesit-query-expand converts an s-expression
query into the string format; and treesit-pattern-expand does the
same for a single pattern.
Tree-sitter grammars change over time. To support multiple possible
versions of a grammar, a Lisp program can use
treesit-query-first-valid to pick the right query to use. For
example, if a grammar has a (defun) node in one version, and a
later version renames it to (function_definition), a Lisp program can use
(treesit-query-first-valid 'lang '((defun) @defun) '((function_definition) @defun))
to support both versions of the grammar.
For more details, consider reading the tree-sitter project’s documentation about pattern-matching. The documentation can be found at https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries.