User-defined Things (GNU Emacs Lisp Reference Manual)

It’s often useful to be able to identify and find certain things in a buffer, like function and class definitions, statements, code blocks, strings, comments, etc., in terms of node types defined by the tree-sitter grammar used in the buffer. Emacs allows Lisp programs to define what kinds of tree-sitter nodes correspond to each “thing”. This enables handy features like jumping to the next function, marking the code block at point, transposing two function arguments, etc.

The “things” feature in Emacs is independent of the pattern matching feature of tree-sitter (see Pattern Matching Tree-sitter Nodes), and comparatively less powerful, but more suitable for navigation and traversing the buffer text in terms of the tree-sitter parse tree.

You can define things with treesit-thing-settings, retrieve the predicate of a defined thing with treesit-thing-definition, and test if a thing is defined with treesit-thing-defined-p.

Variable: treesit-thing-settings

This is an alist of thing definitions for each language supported by the grammar used in a buffer; it should be defined by the buffer’s major mode (the default value is nil). The key of each entry is a language symbol (e.g., c for C, cpp for C++, etc.), and the value is a list of thing definitions of the form (thing pred), where thing is a symbol representing the thing, and pred specifies what kinds of tree-sitter nodes are considered as this thing.

The symbol used to define the thing can be anything meaningful for the major mode: defun, defclass, sentence, comment, string, etc. To support tree-sitter based navigation commands (see Moving over Balanced Expressions), the mode should define two things: list and sexp.

pred can be a regexp string that matches the type of the node; it can be a function that takes a node as the argument and returns a boolean that indicates whether the node qualifies as the thing; or it can be a cons (regexp . fn), which combines a regular expression regexp and a function fn: the node has to both match regexp and satisfy fn to qualify as the thing.

pred can also be recursively defined. It can be (or pred…), meaning that satisfying any one of the preds qualifies the node as the thing. It can be (and pred…), meaning that satisfying all of the preds qualifies the node as the thing. It can be (not pred), meaning that not satisfying pred qualifies the node.

Finally, pred can refer to other things defined in this list. For example, (or sexp sentence) defines something that’s either a sexp thing or a sentence thing, as defined by some other rules in the alist.

There are two pre-defined predicates: named and anonymous, which qualify, respectively, named and anonymous nodes of the tree-sitter grammar. They can be combined with and to narrow down the match.

Here’s an example treesit-thing-settings for C and C++:

((c
  (defun "function_definition")
  (sexp (not "[](),[{}]"))
  (comment "comment")
  (string "raw_string_literal")
  (text (or comment string)))

 (cpp
  (defun ("function_definition" . cpp-ts-mode-defun-valid-p))
  (defclass "class_specifier")
  (comment "comment")))

Note that this example is modified for didactic purposes, and isn’t exactly how tree-sitter based C and C++ modes define things.
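
As a rough sketch of how a major mode might install such settings, the snippet below sets treesit-thing-settings buffer-locally and then queries it with treesit-thing-defined-p and treesit-thing-definition. The function name my-c-ts-setup-things is hypothetical, the node type names are assumed to match the C grammar, and both query functions are assumed to take the thing symbol followed by the language symbol.

```elisp
(require 'treesit)

;; Hypothetical setup function; a real major mode would do this in
;; its mode body.
(defun my-c-ts-setup-things ()
  "Install illustrative thing definitions for C in the current buffer."
  (setq-local treesit-thing-settings
              '((c
                 (defun "function_definition")
                 (comment "comment")
                 ;; Node type name assumed from the C grammar.
                 (string "string_literal")
                 ;; Refers to the `comment' and `string' things above.
                 (text (or comment string))))))

(with-temp-buffer
  (my-c-ts-setup-things)
  (list (treesit-thing-defined-p 'defun 'c)    ; non-nil: defined above
        (treesit-thing-definition 'text 'c)    ; the predicate for `text'
        (treesit-thing-defined-p 'sexp 'c)))   ; nil: not defined here
```

Only the lookup of the definitions is exercised here, so no parser is created; matching actual nodes against these things requires the C grammar to be installed.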

Emacs built-in functions already make use of some thing definitions. The command treesit-forward-sexp uses the sexp definition if the major mode defines it (see Moving over Balanced Expressions); treesit-forward-list, treesit-down-list, treesit-up-list, and treesit-show-paren-data use the list definition (its symbol list has the symbol property treesit-thing-symbol to avoid ambiguity with the function that has the same name); treesit-forward-sentence uses the sentence definition. Defun movement functions like treesit-end-of-defun use the defun definition (the defun definition is overridden by treesit-defun-type-regexp for backward compatibility). Major modes can also define comment, string, and text things (to match comments and strings).

The rest of this section lists a few functions that take advantage of the thing definitions. Besides the functions below, some other functions listed elsewhere also utilize the thing feature, e.g., tree-traversing functions like treesit-search-forward, treesit-induce-sparse-tree, etc. See Retrieving Nodes.

Function: treesit-node-match-p node thing &optional ignore-missing

This function checks whether node represents thing: it returns non-nil if so, and nil otherwise. For convenience, if node is nil, this function just returns nil.

thing can be either a thing symbol like defun, or simply a predicate that defines a thing, like "function_definition", or (or comment string).

By default, if thing is undefined or malformed, this function signals a treesit-invalid-predicate error. If ignore-missing is t, this function doesn’t signal the error when thing is undefined and just returns nil; but it still signals the error if thing is a malformed predicate.
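
The following minimal sketch shows the different shapes of thing that treesit-node-match-p accepts. The helper my-node-kinds is hypothetical, and the node type names in the regexps are assumptions about the grammar in use.

```elisp
(require 'treesit)

;; Hypothetical helper, not part of Emacs.
(defun my-node-kinds (node)
  "Return a list of symbols describing what NODE qualifies as.
Return nil when NODE is nil or matches none of the checks."
  (delq nil
        (list
         ;; Thing symbol.  Passing t as IGNORE-MISSING means an
         ;; undefined `defun' thing yields nil instead of an error.
         (and (treesit-node-match-p node 'defun t) 'defun)
         ;; Bare regexp predicate, matched against the node type.
         (and (treesit-node-match-p node "comment") 'comment)
         ;; Compound predicate built from two regexps.
         (and (treesit-node-match-p
               node '(or "string_literal" "char_literal"))
              'literal))))
```

A caller would typically obtain node from a function such as treesit-node-at before passing it to this helper.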

Functions below are responsible for finding things and moving across them, and they have to deal with the fact that a buffer sometimes contains multiple adjacent or nested parsers. By default, these functions try to be helpful and search in every relevant parser at point, from most specific (deepest embedded) to the least. Lisp programs should be cautious and assess whether this behavior is desired when using these functions as building blocks of other functions; if not, explicitly pass a parser or language.

Function: treesit-thing-prev position thing &optional parser

This function returns the first node before position in the current buffer that is the specified thing. If no such node exists, it returns nil.

It’s guaranteed that, if a node is returned, the node’s end position is less than or equal to position. In other words, this function never returns a node that encloses position.

Again, thing can be either a symbol or a predicate.

If parser is non-nil, only use that parser’s parse tree. Otherwise try each parser covering point, from the most specific (deepest-embedded) to the least specific. If there are multiple parsers with the same embed level at position, which parser is tried first is undefined. If parser is a language symbol, the function limits the parsers it tries to the ones for that language.

Function: treesit-thing-next position thing &optional parser

This function is similar to treesit-thing-prev, only it returns the first node after position that’s the thing. It also guarantees that if a node is returned, the node’s start position is greater than or equal to position. The parser parameter is the same as in treesit-thing-prev.
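
To illustrate the position guarantees of these two functions, here is a small sketch. The helper my-defun-gap-around is hypothetical; it assumes the current buffer has a tree-sitter parser and that its major mode defines the defun thing.

```elisp
(require 'treesit)

;; Hypothetical helper, not part of Emacs.
(defun my-defun-gap-around (pos)
  "Return (PREV-END . NEXT-START) for the defuns around POS.
Either element is nil when there is no such defun.  Assumes the
buffer has a parser and its major mode defines the `defun' thing."
  (let ((prev (treesit-thing-prev pos 'defun))
        (next (treesit-thing-next pos 'defun)))
    ;; When both nodes exist, the guarantees described above imply
    ;; PREV-END <= POS <= NEXT-START.
    (cons (and prev (treesit-node-end prev))
          (and next (treesit-node-start next)))))
```

The optional parser argument is left out here, so the default search through the parsers at the position applies.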

Function: treesit-navigate-thing position arg side thing &optional tactic

This function builds upon treesit-thing-prev and treesit-thing-next and provides functionality that a navigation command would find useful. It returns the position after moving across arg instances of thing from position. If there aren’t enough things to navigate across, it returns nil. The function doesn’t move point.

A positive arg means moving forward that many instances of thing; negative arg means moving backward. If side is beg, this function returns the position of the beginning of thing; if it’s end, it returns the position at the end of thing.

Like in treesit-thing-prev, thing can be a thing symbol defined in treesit-thing-settings, or a predicate, and parser can be either nil, a parser, or a language symbol. Like in that function, parser decides which parsers or languages are searched. When there are multiple parsers available, this function tries each until it succeeds.

tactic determines how this function moves between things. It can be nested, top-level, restricted, parent-first, or nil. nested or nil means normal nested navigation: first try to move across siblings; if there aren’t any siblings left in the current level, move to the parent, then its siblings, and so on. top-level means only navigate across top-level things and ignore nested things. restricted means movement is restricted within the thing that encloses position, if there is such a thing. This tactic is useful for commands that want to stop at the current nesting level and not move up. parent-first means move to the parent if there is one; and move to siblings if there’s no parent.
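
A navigation command built on this function might look like the sketch below. The command name my-forward-defun-end is hypothetical, and the code assumes the major mode defines the defun thing.

```elisp
(require 'treesit)

;; Hypothetical command, not part of Emacs.
(defun my-forward-defun-end (&optional n)
  "Move point to the end of the Nth defun from point; N defaults to 1.
Signal a user error, leaving point unchanged, if there are not
enough defuns to move across."
  (interactive "p")
  ;; `treesit-navigate-thing' only computes the destination; moving
  ;; point is up to the caller.
  (let ((dest (treesit-navigate-thing (point) (or n 1)
                                      'end 'defun 'nested)))
    (if dest
        (goto-char dest)
      (user-error "No more defuns"))))
```

Passing top-level or restricted instead of nested would change how the command treats nested definitions, as described above.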

Function: treesit-thing-at position thing &optional strict parser

This function returns the smallest node that’s the thing and encloses position; if there’s no such node, it returns nil.

The returned node must enclose position, i.e., its start position is less than or equal to position, and its end position is greater than or equal to position.

If strict is non-nil, this function uses strict comparison, i.e., start position must be strictly smaller than position, and end position must be strictly greater than position.

thing can be either a thing symbol defined in treesit-thing-settings, or a predicate.

If parser is non-nil, only use that parser’s parse tree. Otherwise try each parser covering point, from the most specific (deepest-embedded) to the least specific. If there are multiple parsers with the same embed level at position, which parser is tried first is undefined. parser can also be a language symbol.
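
As an illustration, a “mark the thing at point” helper could be sketched on top of treesit-thing-at as follows. The function name my-mark-thing-at-point is hypothetical, and the code assumes the current buffer has a tree-sitter parser.

```elisp
(require 'treesit)

;; Hypothetical helper, not part of Emacs.
(defun my-mark-thing-at-point (thing)
  "Put point and mark around the smallest THING enclosing point.
THING is a thing symbol or a predicate.  Return the node, or nil
if no such thing encloses point (point and mark are then unchanged)."
  (let ((node (treesit-thing-at (point) thing)))
    (when node
      (goto-char (treesit-node-start node))
      ;; Set the mark at the end of the node and activate the region.
      (push-mark (treesit-node-end node) nil t)
      node)))
```

For example, (my-mark-thing-at-point 'defun) would select the enclosing function definition in a mode that defines the defun thing.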

There are also some convenient wrapper functions. treesit-beginning-of-thing moves point to the beginning of a thing, treesit-end-of-thing moves to the end of a thing, and treesit-thing-at-point returns the thing at point.

There are also defun commands that specifically use the defun definition (as a fallback for treesit-defun-type-regexp), like treesit-beginning-of-defun, treesit-end-of-defun, and treesit-defun-at-point. In addition, these functions use treesit-defun-tactic as the navigation tactic. They are described in more detail in other sections (see Developing major modes with tree-sitter).

38.6 User-defined “Things” and Navigation