It’s often useful to be able to identify and find certain things in a buffer, like function and class definitions, statements, code blocks, strings, comments, etc., in terms of node types defined by the tree-sitter grammar used in the buffer. Emacs allows Lisp programs to define what kinds of tree-sitter nodes correspond to each “thing”. This enables handy features like jumping to the next function, marking the code block at point, transposing two function arguments, etc.
The “things” feature in Emacs is independent of the pattern matching feature of tree-sitter (see Pattern Matching Tree-sitter Nodes), and comparatively less powerful, but more suitable for navigation and traversing the buffer text in terms of the tree-sitter parse tree.
You can define things with treesit-thing-settings, retrieve the
predicate of a defined thing with treesit-thing-definition, and
test if a thing is defined with treesit-thing-defined-p.
The variable treesit-thing-settings is an alist of thing definitions for
each language used in a buffer; it should be set by the buffer’s major
mode (the default value is nil). The key of each entry is a
language symbol (e.g., c for C, cpp for C++, etc.),
and the value is a list of thing definitions of the form
(thing pred), where thing is a symbol
representing the thing, and pred specifies what kinds of
tree-sitter nodes are considered as this thing.
The symbol used to define the thing can be anything meaningful for
the major mode: defun, defclass, sentence,
comment, string, etc. To support tree-sitter based
navigation commands (see Moving over Balanced Expressions), the mode should define two
things: list and sexp.
pred can be a regexp string that matches the type of the node; it
can be a function that takes a node as the argument and returns a
boolean that indicates whether the node qualifies as the thing; or it can
be a cons (regexp . fn), which is a combination
of a regular expression regexp and a function fn; the node
must both match regexp and satisfy fn to qualify as
the thing.
pred can also be recursively defined. It can be (or pred…), meaning that satisfying any one of the preds
qualifies the node as the thing. It can be (and pred…), meaning that satisfying all of the preds
qualifies the node as the thing. It can be (not pred),
meaning that not satisfying pred qualifies the node.
Finally, pred can refer to other things defined in this
list. For example, (or sexp sentence) defines something
that’s either a sexp thing or a sentence thing, as defined
by some other rules in the alist.
There are two pre-defined predicates: named and anonymous,
which qualify, respectively, named and anonymous nodes of the
tree-sitter grammar. They can be combined with and to narrow
down the match.
Here’s an example treesit-thing-settings for C and C++:
((c
  (defun "function_definition")
  (sexp (not "[](),[{}]"))
  (comment "comment")
  (string "raw_string_literal")
  (text (or comment string)))
 (cpp
  (defun ("function_definition" . cpp-ts-mode-defun-valid-p))
  (defclass "class_specifier")
  (comment "comment")))
Note that this example is modified for didactic purposes, and isn’t
exactly how tree-sitter based C and C++ modes define things.
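As a rough sketch of how a major mode might install such settings, the code below sets treesit-thing-settings buffer-locally and then looks up one of the definitions. The names example-ts-defun-valid-p, example-ts-install-things, and example-ts-describe-thing are hypothetical. The code also assumes that the C grammar gives function_definition nodes a "body" field, which this section does not state.

```elisp
(require 'treesit)

(defun example-ts-defun-valid-p (node)
  "Return non-nil if NODE has a body (hypothetical filter).
Assumes the grammar gives function definitions a \"body\" field."
  (and (treesit-node-child-by-field-name node "body") t))

(defun example-ts-install-things ()
  "Install illustrative thing definitions for C in the current buffer."
  (setq-local treesit-thing-settings
              '((c
                 ;; Regexp plus function: both must accept the node.
                 (defun ("function_definition" . example-ts-defun-valid-p))
                 (comment "comment")
                 (string "raw_string_literal")
                 ;; Refers to the two things defined just above.
                 (text (or comment string))
                 ;; Any named node that is not a comment thing.
                 (sexp (and named (not comment)))))))

(defun example-ts-describe-thing (thing language)
  "Return the predicate of THING for LANGUAGE, or nil if undefined."
  (when (treesit-thing-defined-p thing language)
    (treesit-thing-definition thing language)))
```

After example-ts-install-things has run in a buffer, evaluating (example-ts-describe-thing 'comment 'c) there should return the regexp stored for the comment thing, while an undefined thing symbol should yield nil.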
Emacs built-in functions already make use of some thing definitions.
The command treesit-forward-sexp uses the sexp definition if
the major mode defines it (see Moving over Balanced Expressions); treesit-forward-list,
treesit-down-list, treesit-up-list, and
treesit-show-paren-data use the list definition (its
symbol list has the symbol property treesit-thing-symbol
to avoid ambiguity with the function that has the same name);
treesit-forward-sentence uses the sentence definition.
Defun movement functions like treesit-end-of-defun use the
defun definition (the defun definition is overridden by
treesit-defun-type-regexp for backward compatibility). Major
modes can also define comment, string, and text
things (to match comments and strings).
The rest of this section lists a few functions that take advantage of
the thing definitions. Besides the functions below, some other
functions listed elsewhere also utilize the thing feature, e.g.,
tree-traversing functions like treesit-search-forward,
treesit-induce-sparse-tree, etc. See Retrieving Nodes.
The function treesit-node-match-p checks whether node represents thing.
If it does, the function returns non-nil; otherwise it
returns nil. For convenience, if node is nil, this
function just returns nil.
The thing can be either a thing symbol like defun, or
simply a predicate that defines a thing, like
"function_definition", or (or comment string).
By default, if thing is undefined or malformed, this function
signals a treesit-invalid-predicate error. If ignore-missing
is t, this function doesn’t signal the error when thing is
undefined and just returns nil; but it still signals the error if
thing is a malformed predicate.
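The two error behaviors described above can be illustrated with a pair of small wrappers around treesit-node-match-p. This is only a sketch: the wrapper names are hypothetical, and the second one swallows the error for malformed predicates, which is a design choice of the example rather than anything this section recommends.

```elisp
(require 'treesit)

(defun example-node-is-thing-p (node thing)
  "Return non-nil if NODE is THING; return nil if THING is undefined.
A malformed THING still signals `treesit-invalid-predicate'."
  ;; The third argument is IGNORE-MISSING.
  (treesit-node-match-p node thing t))

(defun example-node-is-thing-safe-p (node thing)
  "Like `example-node-is-thing-p', but return nil for a malformed THING too."
  (condition-case nil
      (treesit-node-match-p node thing t)
    (treesit-invalid-predicate nil)))
```

In a buffer that has a parser, a call such as (example-node-is-thing-p (treesit-node-at (point)) "comment") would test the node at point against a plain regexp predicate.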
Functions below are responsible for finding things and moving across them, and they have to deal with the fact that a buffer sometimes contains multiple adjacent or nested parsers. By default, these functions try to be helpful and search in every relevant parser at point, from most specific (deepest embedded) to the least. Lisp programs should be cautious and assess whether this behavior is desired when using these functions as building blocks of other functions; if not, explicitly pass a parser or language.
The function treesit-thing-prev returns the first node before position in the
current buffer that is the specified thing. If no such node
exists, it returns nil.
It’s guaranteed that, if a node is returned, the node’s end position is less than or equal to position. In other words, this function never returns a node that encloses position.
Again, thing can be either a symbol or a predicate.
If parser is non-nil, only use that parser’s parse tree.
Otherwise try each parser covering point, from the most specific
(deepest-embedded) to the least specific. If there are multiple parsers with
the same embed level at position, which parser is tried first is
undefined. If parser is a language symbol, the function limits
the parsers it tries to the ones for that language.
The function treesit-thing-next is similar to treesit-thing-prev, except that it returns
the first node after position that’s the thing. It
also guarantees that if a node is returned, the node’s start position is
greater than or equal to position. The parser parameter is the
same as in treesit-thing-prev.
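To illustrate how the two functions complement each other, here is a minimal sketch that collects the neighboring things on both sides of a position. The helper names are hypothetical, and the calls pass only a position and a thing, leaving parser selection to the default behavior.

```elisp
(require 'treesit)

(defun example-neighbor-things (thing &optional pos)
  "Return (PREV . NEXT), the THING nodes nearest to POS on either side.
POS defaults to point.  Either element may be nil."
  (let ((pos (or pos (point))))
    (cons (treesit-thing-prev pos thing)
          (treesit-thing-next pos thing))))

(defun example-gap-between-things (thing &optional pos)
  "Return (BEG . END) of the gap around POS between neighboring THINGs.
Return nil unless there is a THING on both sides."
  (let* ((pair (example-neighbor-things thing pos))
         (prev (car pair))
         (next (cdr pair)))
    (when (and prev next)
      ;; The documented guarantees imply that PREV ends at or before
      ;; POS and NEXT starts at or after POS.
      (cons (treesit-node-end prev) (treesit-node-start next)))))
```

Because neither function returns a node that encloses the position, a position inside a thing is not part of any gap found this way; the enclosing node has to be looked up separately.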
The function treesit-navigate-thing builds upon treesit-thing-prev and
treesit-thing-next and provides functionality that a navigation
command would find useful. It returns the position after moving across
arg instances of thing from position. If
there aren’t enough things to navigate across, it returns nil. The
function doesn’t move point.
A positive arg means moving forward that many instances of
thing; a negative arg means moving backward. If side is
beg, this function returns the position of the beginning of
thing; if it’s end, it returns the position at the end of
thing.
Like in treesit-thing-prev, thing can be a thing symbol
defined in treesit-thing-settings, or a predicate, and
parser can be either nil, a parser, or a language symbol.
Like in that function, parser decides which parsers or languages
are searched. When there are multiple parsers available, this function
tries each until it succeeds.
tactic determines how this function moves between things. It can
be nested, top-level, restricted,
parent-first, or nil. nested or nil means
normal nested navigation: first try to move across siblings; if there
aren’t any siblings left in the current level, move to the parent, then
its siblings, and so on. top-level means only navigate across
top-level things and ignore nested things. restricted means
movement is restricted within the thing that encloses position, if
there is such a thing. This tactic is useful for commands that want to
stop at the current nesting level and not move up. parent-first
means move to the parent if there is one; and move to siblings if
there’s no parent.
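Since the function only computes a position, a command has to move point itself. The following sketch shows one way to do that with treesit-navigate-thing. The command names are hypothetical, and the argument order used here (position, arg, side, thing, tactic) is an assumption based on the order in which the parameters are described above.

```elisp
(require 'treesit)

(defun example-forward-thing (thing &optional n)
  "Move point to the end of the Nth THING after point (default 1).
Return the new position, or nil (leaving point alone) if there are
not enough THINGs."
  (let ((dest (treesit-navigate-thing (point) (or n 1) 'end thing 'nested)))
    (when dest
      (goto-char dest))))

(defun example-backward-thing (thing &optional n)
  "Move point to the beginning of the Nth THING before point (default 1).
Return the new position, or nil if there are not enough THINGs."
  (let ((dest (treesit-navigate-thing
               (point) (- (or n 1)) 'beg thing 'nested)))
    (when dest
      (goto-char dest))))
```

Replacing nested with restricted in these calls would keep the movement inside the thing that encloses point, as described above.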
The function treesit-thing-at returns the smallest node that’s the thing and
encloses position; if there’s no such node, it returns nil.
The returned node must enclose position, i.e., its start position is less than or equal to position, and its end position is greater than or equal to position.
If strict is non-nil, this function uses strict comparison,
i.e., start position must be strictly smaller than position, and end
position must be strictly greater than position.
thing can be either a thing symbol defined in
treesit-thing-settings, or a predicate.
If parser is non-nil, only use that parser’s parse tree. Otherwise try each parser covering point, from the most specific (deepest-embedded) to the least specific. If there are multiple parsers with the same embed level at position, which parser is tried first is undefined. parser can also be a language symbol.
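As a small illustration, the sketch below uses treesit-thing-at to fetch the text of the smallest enclosing thing at point. The helper name is hypothetical, and the call assumes that strict is the third argument, following the order of the description above.

```elisp
(require 'treesit)

(defun example-enclosing-thing-text (thing &optional strict)
  "Return the text of the smallest THING enclosing point, or nil.
If STRICT is non-nil, point must lie strictly inside the THING."
  (let ((node (treesit-thing-at (point) thing strict)))
    (when node
      ;; A non-nil second argument drops text properties.
      (treesit-node-text node t))))
```

With strict left as nil, point at the very start or end of a thing still counts as enclosed; passing a non-nil value excludes those boundary positions.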
There are also some convenient wrapper functions.
treesit-beginning-of-thing moves point to the beginning of a
thing, treesit-end-of-thing moves to the end of a thing, and
treesit-thing-at-point returns the thing at point.
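A command that marks the thing at point could be sketched on top of these wrappers as follows. The name example-mark-thing is hypothetical, and the code assumes that treesit-thing-at-point takes the thing followed by a tactic, with nested selecting the smallest enclosing thing.

```elisp
(require 'treesit)

(defun example-mark-thing (thing)
  "Put point at the start and the mark at the end of THING at point.
Return the node, or nil if there is no THING at point."
  (let ((node (treesit-thing-at-point thing 'nested)))
    (when node
      (goto-char (treesit-node-start node))
      ;; Third argument non-nil activates the mark.
      (push-mark (treesit-node-end node) nil t)
      node)))
```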
There are also defun commands that specifically use the defun
definition (as a fallback of treesit-defun-type-regexp), like
treesit-beginning-of-defun, treesit-end-of-defun, and
treesit-defun-at-point. In addition, these functions use
treesit-defun-tactic as the navigation tactic. They are
described in more detail in other sections (see Developing major modes with tree-sitter).