Tree-sitter lets Lisp programs match patterns using a small declarative language. This pattern matching consists of two steps: first tree-sitter matches a pattern against nodes in the syntax tree, then it captures specific nodes that matched the pattern and returns the captured nodes.
We describe first how to write the most basic query pattern and how to capture nodes in a pattern, then the pattern-matching function, and finally the more advanced pattern syntax.
A query consists of one or more patterns. Each pattern is an s-expression that matches a certain node in the syntax tree. A pattern has the form (type (child…)).
For example, a pattern that matches a binary_expression node that contains number_literal child nodes would look like
(binary_expression (number_literal))
To capture a node using the query pattern above, append @capture-name after the node pattern you want to capture. For example,
(binary_expression (number_literal) @number-in-exp)
captures number_literal nodes that are inside a binary_expression node with the capture name number-in-exp.
We can capture the binary_expression node as well, with, for example, the capture name biexp:
(binary_expression (number_literal) @number-in-exp) @biexp
Now we can introduce the query functions.
The function treesit-query-capture matches patterns in query within node. The argument query can be either a string, an s-expression, or a compiled query object. For now, we focus on the string syntax; the s-expression syntax and compiled queries are described at the end of the section.
The argument node can also be a parser or a language symbol. A parser stands for its root node; a language symbol means to find or create a parser for that language in the current buffer and use its root node.
The function returns all the captured nodes in a list of the form (capture_name . node). If node-only is non-nil, it returns the list of nodes instead. By default the entire text of node is searched, but if beg and end are both non-nil, they specify the region of buffer text where this function should match nodes. Any matching node whose span overlaps with the region between beg and end is captured; it doesn’t have to be completely within the region.
This function raises the treesit-query-error error if query is malformed. The signal data contains a description of the specific error. You can use treesit-query-validate to validate and debug the query.
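As a minimal sketch of handling a malformed query, the helper below wraps treesit-query-capture in condition-case and logs the signal data instead of aborting. The name my-safe-query-capture is hypothetical and not part of Emacs; the sketch assumes an Emacs built with tree-sitter support.

```elisp
(defun my-safe-query-capture (node query)
  "Return the captures of QUERY in NODE, or nil if QUERY is malformed.
This is a hypothetical helper written for illustration only."
  (condition-case err
      (treesit-query-capture node query)
    ;; The signal data (the cdr of ERR) describes what is wrong with QUERY.
    (treesit-query-error
     (message "Malformed query: %S" (cdr err))
     nil)))
```

A caller can then treat a nil result as "no captures or invalid query" and consult the logged message when debugging.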
For example, suppose node’s text is 1 + 2, and query is
(setq query "(binary_expression (number_literal) @number-in-exp) @biexp")
Matching that query would return
(treesit-query-capture node query)
    ⇒ ((biexp . <node for "1 + 2">)
       (number-in-exp . <node for "1">)
       (number-in-exp . <node for "2">))
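Because each element of the result is a cons of a capture name and a node, the list can be processed with ordinary list functions. The sketch below pairs every capture name with the text of its node; my-capture-texts is a hypothetical helper name, and the code assumes an Emacs built with tree-sitter support and a live buffer behind node.

```elisp
(defun my-capture-texts (node query)
  "Return a list of (CAPTURE-NAME . TEXT) for QUERY matched in NODE.
This is a hypothetical helper written for illustration only."
  (mapcar (lambda (capture)
            ;; Each element of the result is (CAPTURE-NAME . NODE).
            (cons (car capture)
                  ;; Non-nil second argument drops text properties.
                  (treesit-node-text (cdr capture) t)))
          (treesit-query-capture node query)))
```

With the node and query from the example above, this should produce pairs such as (number-in-exp . "1").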
As mentioned earlier, a query can contain multiple patterns. For example, it could have two top-level patterns:
(setq query "(binary_expression) @biexp (number_literal) @number @biexp")
This function parses string with language, matches its root node with query, and returns the result.
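The following sketch queries a string rather than a buffer. It assumes that the function described above is treesit-query-string, called with the string, the query, and the language in that order, and that a C grammar is installed; my-capture-names-in-string is a hypothetical helper name. The sketch only inspects the capture names, not the returned nodes.

```elisp
(defun my-capture-names-in-string (code query)
  "Return the capture names that QUERY produces for the C source CODE.
Hypothetical helper; assumes `treesit-query-string' and a C grammar."
  ;; Each element of the result is (CAPTURE-NAME . NODE); keep the names.
  (mapcar #'car (treesit-query-string code query 'c)))
```

For instance, (my-capture-names-in-string "int x = 1 + 2;" "(number_literal) @number") returns one capture name per number literal found.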
Besides node types and captures, tree-sitter’s pattern syntax can express anonymous nodes, field names, wildcards, quantification, grouping, alternation, anchors, and predicates.
An anonymous node is written verbatim, surrounded by quotes. A pattern matching (and capturing) the keyword return would be
"return" @keyword
In a pattern, ‘(_)’ matches any named node, and ‘_’ matches any named or anonymous node. For example, to capture any named child of a binary_expression node, the pattern would be
(binary_expression (_) @in_biexp)
It is possible to capture child nodes that have specific field names. In the pattern below, declarator and body are field names, indicated by the colon following them.
(function_definition declarator: (_) @func-declarator body: (_) @func-body)
It is also possible to capture a node that doesn’t have a certain field, say, a function_definition without a body field:
(function_definition !body) @func-no-body
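To illustrate field names in practice, the sketch below collects the text of every function declarator in the current buffer. It assumes the buffer contains C code and that a C grammar is installed; my-function-declarators is a hypothetical helper name.

```elisp
(defun my-function-declarators ()
  "Return the text of each function declarator in the current buffer.
Hypothetical helper; assumes C code and an installed C grammar."
  (mapcar (lambda (node) (treesit-node-text node t))
          (treesit-query-capture
           'c   ; a language symbol: use that parser's root node
           "(function_definition declarator: (_) @func-declarator)"
           nil nil
           t))) ; node-only: return nodes without capture names
```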
Tree-sitter recognizes quantification operators ‘*’, ‘+’ and ‘?’. Their meanings are the same as in regular expressions: ‘*’ matches the preceding pattern zero or more times, ‘+’ matches one or more times, and ‘?’ matches zero or one time.
For example, the following pattern matches type_declaration nodes that have zero or more long keywords:
(type_declaration "long"*) @long-type
The following pattern matches a type declaration that has zero or one long keyword:
(type_declaration "long"?) @long-type
Similar to groups in regular expressions, we can bundle patterns into groups and apply quantification operators to them. For example, to express a comma-separated list of identifiers, one could write
(identifier) ("," (identifier))*
Again, similar to regular expressions, we can express “match any one of this group of patterns” in a pattern. The syntax is a list of patterns enclosed in square brackets. For example, to capture some keywords in C, the pattern would be
[ "return" "break" "if" "else" ] @keyword
The anchor operator ‘.’ can be used to enforce juxtaposition, i.e., to require two things to be directly next to each other. The two “things” can be two nodes, or a child and the end of its parent. For example, to capture the first child, the last child, or two adjacent children:
;; Anchor the child with the end of its parent.
(compound_expression (_) @last-child .)
;; Anchor the child with the beginning of its parent.
(compound_expression . (_) @first-child)
;; Anchor two adjacent children.
(compound_expression (_) @prev-child . (_) @next-child)
Note that the enforcement of juxtaposition ignores any anonymous nodes.
It is possible to add predicate constraints to a pattern. For example, with the following pattern:
( (array . (_) @first (_) @last .) (#equal @first @last) )
tree-sitter only matches arrays where the first element equals the last element. To attach a predicate to a pattern, we need to group them together. A predicate always starts with a ‘#’. Currently there are three predicates: #equal, #match, and #pred.
The #equal predicate matches if arg1 equals arg2. Arguments can be either strings or capture names. Capture names represent the text that the captured node spans in the buffer.
The #match predicate matches if the text that capture-name’s node spans in the buffer matches the regular expression regexp. Matching is case-sensitive.
The #pred predicate matches if function fn returns non-nil when passed each node in nodes as arguments.
Note that a predicate can only refer to capture names that appear in the same pattern. Indeed, it makes little sense to refer to capture names in other patterns.
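As a sketch of a predicate in use, the helper below builds a #match query that keeps only identifiers whose text matches a regular expression. It assumes the current buffer holds C code, a C grammar is installed, and the regexp comes before the capture name, as in the #match example later in this section; my-identifiers-matching is a hypothetical helper name.

```elisp
(defun my-identifiers-matching (regexp)
  "Return the text of identifiers in the current C buffer matching REGEXP.
Hypothetical helper; intended for simple regexps."
  ;; %S prints REGEXP as a quoted string, so it can be embedded in the
  ;; string query.  The predicate and pattern are grouped in parentheses.
  (let ((query (format "((identifier) @id (#match %S @id))" regexp)))
    (mapcar (lambda (node) (treesit-node-text node t))
            (treesit-query-capture 'c query nil nil t))))
```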
Besides strings, Emacs provides an s-expression-based syntax for tree-sitter patterns. It largely resembles the string-based syntax. For example, the following query
(treesit-query-capture node "(addition_expression left: (_) @left \"+\" @plus-sign right: (_) @right) @addition [\"return\" \"break\"] @keyword")
is equivalent to
(treesit-query-capture node '((addition_expression left: (_) @left "+" @plus-sign right: (_) @right) @addition ["return" "break"] @keyword))
Most patterns can be written directly as strange but nevertheless valid s-expressions. Only a few of them need modification:
The anchor ‘.’ is written as :anchor.
The quantifiers ‘?’, ‘*’, and ‘+’ are written as ‘:?’, ‘:*’, and ‘:+’.
#equal is written as :equal. In general, predicates change their ‘#’ to ‘:’.
For example,
"( (compound_expression . (_) @first (_)* @rest) (#match \"love\" @first) )"
is written in s-expression as
'(( (compound_expression :anchor (_) @first (_) :* @rest) (:match "love" @first) ))
If a query is intended to be used repeatedly, especially in tight loops, it is important to compile that query, because a compiled query is much faster than an uncompiled one. A compiled query can be used anywhere a query is accepted.
This function compiles query for language into a compiled query object and returns it.
This function raises the treesit-query-error error if query is malformed. The signal data contains a description of the specific error. You can use treesit-query-validate to validate and debug the query.
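The sketch below shows one way to compile a query once and reuse it. It assumes that the compiling function described above is treesit-query-compile, taking the language first and the query second, and that a C grammar is installed; my-number-query and my-count-number-literals are hypothetical names.

```elisp
(defvar my-number-query nil
  "Compiled query for C number literals (hypothetical variable).")

(defun my-count-number-literals ()
  "Return the number of number literals in the current C buffer."
  ;; Compile on first use only; later calls reuse the compiled object.
  (unless my-number-query
    (setq my-number-query
          (treesit-query-compile 'c "(number_literal) @number")))
  ;; A compiled query is accepted wherever a query is accepted.
  (length (treesit-query-capture 'c my-number-query nil nil t)))
```

Caching the compiled object in a variable keeps the compilation cost out of code that runs often, such as a function called on every redisplay or command.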
This function returns the language of query.
This function converts the s-expression query into the string format.
This function converts the s-expression pattern into the string format.
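Converting to the string format is handy for logging or inspecting a query written as an s-expression. The sketch below assumes that the query-converting function described above is treesit-query-expand; my-query-to-string is a hypothetical helper name.

```elisp
(defun my-query-to-string (query)
  "Return QUERY in string form.
QUERY may be a string or an s-expression query.  Hypothetical helper;
assumes `treesit-query-expand' is the conversion function."
  (if (stringp query)
      query                         ; already in string form
    (treesit-query-expand query)))  ; convert the s-expression form
```

For example, (my-query-to-string '((binary_expression) @biexp)) returns the equivalent string query, which can then be compared with a hand-written one.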
For more details, read the tree-sitter project’s documentation about pattern-matching, which can be found at https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries.