Using Parser (GNU Emacs Lisp Reference Manual)


38.2 Using Tree-sitter Parser

This section describes how to create and configure a tree-sitter parser. In Emacs, each tree-sitter parser is associated with a buffer. As the user edits the buffer, the associated parser and syntax tree are automatically kept up-to-date.

Variable: treesit-max-buffer-size

This variable contains the maximum size of buffers in which tree-sitter can be activated. Major modes should check this value when deciding whether to enable tree-sitter features.
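A major mode might perform the size check along the following lines. This is only a minimal sketch: the helper my-mode--treesit-size-ok-p is a hypothetical name, and it assumes that comparing the value of buffer-size against the variable is an adequate test.

```elisp
(require 'treesit)

(defun my-mode--treesit-size-ok-p (&optional buffer)
  "Return non-nil if BUFFER is small enough for tree-sitter features.
BUFFER defaults to the current buffer.  This is a hypothetical helper,
not part of Emacs."
  (<= (buffer-size buffer) treesit-max-buffer-size))
```

A mode would call this helper in its setup code and fall back to its non-tree-sitter features when it returns nil.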

Variable: treesit-languages-require-line-column-tracking

Emacs by default doesn’t keep track of line and column numbers for positions in a buffer. However, some language grammars use line and column information for parsing. If a parser for one of these languages is created in a buffer, Emacs turns on line and column tracking and reports this information to the parser. Once a buffer starts tracking line and column, it never stops doing so. Likewise, once a parser is created as tracking or not tracking line and column, it stays that way regardless of later changes to this variable.

This variable is a list of languages that require line and column tracking. The vast majority of languages don’t need line and column information. So far, only Haskell is known to need it.

Users can call treesit-tracking-line-column-p and treesit-parser-tracking-line-column-p to check whether a buffer or a parser, respectively, is tracking line and column.

Function: treesit-parser-create language &optional buffer no-reuse tag

Create a parser for the specified buffer and language (see Tree-sitter Language Grammar), with tag. If buffer is omitted or nil, it stands for the current buffer.

By default, this function reuses a parser if one already exists for language with tag in buffer, but if no-reuse is non-nil, this function always creates a new parser.

tag can be any symbol except t, and defaults to nil. Different parsers can have the same tag.
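The reuse rules can be illustrated with a short sketch. The function name my-demo-parser-reuse and the tag my-embedded are hypothetical, and the example assumes a grammar for the language c is installed; it returns nil otherwise.

```elisp
(require 'treesit)

(defun my-demo-parser-reuse ()
  "Return a plist showing which `treesit-parser-create' calls reuse a parser.
Return nil when tree-sitter or the C grammar is unavailable.
Hypothetical demonstration function."
  (when (and (fboundp 'treesit-available-p)
             (treesit-available-p)
             (treesit-language-available-p 'c))
    (with-temp-buffer
      (insert "int main(void) { return 0; }\n")
      (let ((default (treesit-parser-create 'c))
            ;; Same language, same (nil) tag, NO-REUSE omitted.
            (again   (treesit-parser-create 'c))
            ;; NO-REUSE is non-nil.
            (fresh   (treesit-parser-create 'c nil t))
            ;; Same language, but a different tag.
            (tagged  (treesit-parser-create 'c nil nil 'my-embedded)))
        (list :reused    (eq default again)
              :no-reuse  (eq default fresh)
              :other-tag (eq default tagged))))))
```

Going by the description above, :reused should be non-nil, while :no-reuse and :other-tag should both be nil.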

Given a parser, we can query information about it.

Function: treesit-parser-buffer parser

This function returns the buffer associated with parser.

Function: treesit-parser-language parser

This function returns the language used by parser.

Function: treesit-parser-p object

This function checks if object is a tree-sitter parser, and returns non-nil if it is, and nil otherwise.
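These accessors can be combined, for example, to summarize a parser for debugging. The helper my-demo-describe-parser below is a hypothetical name used only for illustration.

```elisp
(require 'treesit)

(defun my-demo-describe-parser (object)
  "Return a plist describing OBJECT if it is a tree-sitter parser.
Return nil otherwise.  Hypothetical helper."
  (when (and (fboundp 'treesit-parser-p)   ; absent without tree-sitter support
             (treesit-parser-p object))
    (list :buffer   (treesit-parser-buffer object)
          :language (treesit-parser-language object))))
```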

There is no need to explicitly parse a buffer, because parsing is done automatically and lazily. A parser only parses when a Lisp program queries for a node in its syntax tree. Therefore, when a parser is first created, it doesn’t parse the buffer; it waits until the Lisp program queries for a node for the first time. Similarly, when some change is made in the buffer, a parser doesn’t re-parse immediately.

When a parser does parse, it checks the size of the buffer. Tree-sitter can only handle buffers no larger than about 4GB. If the buffer size exceeds that, Emacs signals the treesit-buffer-too-large error, with the buffer size as the signal data.
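Since the error is signaled when the parser parses rather than when it is created, the place to catch it is wherever a Lisp program first asks for a node. A minimal sketch, assuming treesit-parser-root-node (see Retrieving Nodes) is the call that triggers the parse; my-demo-root-node-or-nil is a hypothetical name.

```elisp
(require 'treesit)

(defun my-demo-root-node-or-nil (parser)
  "Return the root node of PARSER, or nil if its buffer is too large.
Hypothetical helper."
  (condition-case err
      ;; Asking for a node is what makes the lazy parser actually parse.
      (treesit-parser-root-node parser)
    (treesit-buffer-too-large
     ;; The signal data carries the buffer size.
     (message "Buffer too large for tree-sitter: %S" (cdr err))
     nil)))
```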

Once a parser is created, Emacs automatically adds it to the internal parser list. Every time a change is made to the buffer, Emacs updates parsers in this list so they can update their syntax tree incrementally.

Function: treesit-parser-list &optional buffer language tag

This function returns the parser list of buffer, filtered by language and tag. If buffer is nil or omitted, it defaults to the current buffer.

If language is non-nil, the returned list includes only parsers for that language. The list is also restricted to parsers tagged with tag, which defaults to nil. If tag is t, parsers are included regardless of their tag.

Function: treesit-parser-delete parser

This function deletes parser.
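Listing and deleting parsers can be combined, for instance when a mode tears down its tree-sitter setup. The following is a sketch only; my-demo-delete-parsers is a hypothetical name.

```elisp
(require 'treesit)

(defun my-demo-delete-parsers (language &optional buffer)
  "Delete every parser for LANGUAGE in BUFFER, whatever its tag.
BUFFER defaults to the current buffer.  Return the number of parsers
deleted.  Hypothetical helper."
  ;; A TAG of t asks for parsers regardless of their tag.
  (let ((parsers (treesit-parser-list buffer language t)))
    (mapc #'treesit-parser-delete parsers)
    (length parsers)))
```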

Normally, a parser “sees” the whole buffer, but when the buffer is narrowed (see Narrowing), the parser will only see the accessible portion of the buffer. As far as the parser can tell, the hidden region was deleted. When the buffer is later widened, the parser thinks text is inserted at the beginning and at the end. Although parsers respect narrowing, modes should not use narrowing as a means to handle a multi-language buffer; instead, set the ranges in which the parser should operate. See Parsing Text in Multiple Languages.

Because a parser parses lazily, when the user or a Lisp program narrows the buffer, the parser is not affected immediately; as long as the mode doesn’t query for a node while the buffer is narrowed, the parser is oblivious of the narrowing.
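Because the parser sees only the accessible portion at the moment it reparses, a Lisp program that needs nodes for the whole buffer may want to widen temporarily before querying. A minimal sketch under that assumption: my-demo-root-node-widened is a hypothetical name, and treesit-parser-root-node is described in Retrieving Nodes.

```elisp
(require 'treesit)

(defun my-demo-root-node-widened (parser)
  "Return PARSER's root node as parsed from its whole buffer.
Any narrowing in effect is restored afterwards.  Hypothetical helper."
  (with-current-buffer (treesit-parser-buffer parser)
    (save-restriction
      (widen)
      ;; The query happens while the buffer is widened, so if the
      ;; parser reparses now, it sees the entire buffer.
      (treesit-parser-root-node parser))))
```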

Besides creating a parser for a buffer, a Lisp program can also parse a string. Unlike parsing a buffer, parsing a string is a one-off operation, and there is no way to update the result.

Function: treesit-parse-string string language

This function parses string using language, and returns the root node of the generated syntax tree. Do not use this function in a loop: it is a convenience function intended for one-off use, and it isn’t optimized. For heavy workloads, use a temporary buffer instead.
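For a heavier workload, the temporary-buffer approach might look like the sketch below. The name my-demo-parse-many is hypothetical, and treesit-parser-root-node and treesit-node-type are described in later sections. The function returns node types rather than nodes, on the assumption that nodes from an earlier parse should not be relied upon once the buffer text has been replaced.

```elisp
(require 'treesit)

(defun my-demo-parse-many (strings language)
  "Parse each of STRINGS as LANGUAGE; return a list of root node types.
One temporary buffer and one parser are shared by all of STRINGS.
Signal an error if the grammar for LANGUAGE is unavailable.
Hypothetical helper."
  (with-temp-buffer
    (let ((parser (treesit-parser-create language)))
      (mapcar (lambda (string)
                (erase-buffer)
                (insert string)
                ;; Querying the root node triggers a reparse.
                (treesit-node-type (treesit-parser-root-node parser)))
              strings))))
```

For a single string, a plain call such as (treesit-parse-string "int a;" 'c) remains the simpler choice.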

Being notified of changes to the parse tree

A Lisp program might want to be notified of text affected by incremental parsing. For example, inserting a comment-closing token converts text before that token into a comment. Even though the text is not directly edited, it is deemed to be “changed” nevertheless.

Emacs lets a Lisp program register callback functions (a.k.a. notifiers) for these kinds of changes. A notifier function takes two arguments: ranges and parser. ranges is a list of cons cells of the form (start . end), where start and end mark the start and the end positions of a range. parser is the parser issuing the notification.

Every time a parser reparses a buffer, it compares the old and new parse trees, computes the ranges in which nodes have changed, and passes those ranges to the notifier functions. Note that the initial parse also counts as a “change”, so notifier functions are called on the initial parse, with ranges covering the whole buffer.

Function: treesit-parser-add-notifier parser function

This function adds function to parser’s list of after-change notifier functions. function must be a function symbol, not a lambda function (see Anonymous Functions).

Function: treesit-parser-remove-notifier parser function

This function removes function from the list of parser’s after-change notifier functions. function must be a function symbol, rather than a lambda function.

Function: treesit-parser-notifiers parser

This function returns the list of parser’s notifier functions.
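A notifier could be set up as in the following sketch, which simply records the most recently reported ranges. All of the my-demo names, including the variable, are hypothetical.

```elisp
(require 'treesit)

(defvar-local my-demo--changed-ranges nil
  "Ranges from the most recent notification (hypothetical variable).")

(defun my-demo--record-changes (ranges parser)
  "Record RANGES, a list of (START . END), in the buffer of PARSER."
  (with-current-buffer (treesit-parser-buffer parser)
    (setq my-demo--changed-ranges ranges)))

(defun my-demo-watch-parser (parser)
  "Start recording the changed ranges reported by PARSER."
  ;; The notifier is passed as a function symbol, not a lambda.
  (treesit-parser-add-notifier parser #'my-demo--record-changes))

(defun my-demo-unwatch-parser (parser)
  "Stop recording the changed ranges reported by PARSER."
  (treesit-parser-remove-notifier parser #'my-demo--record-changes))
```

Because the notifier is a named function, the same symbol can later be handed to treesit-parser-remove-notifier.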

A Lisp program can also force a parser to reparse and get the changed regions immediately, by calling treesit-parser-changed-regions.

Function: treesit-parser-changed-regions parser

This function forces parser to reparse, and returns the affected regions: a list of (start . end). If the parser has nothing new to reparse, or the affected regions are empty, this function returns nil.
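One possible use is to find out which regions a particular edit affected. The sketch below assumes that an initial call brings the parser up to date, so that the second call reflects only the insertion; my-demo-regions-after-insert is a hypothetical name.

```elisp
(require 'treesit)

(defun my-demo-regions-after-insert (parser text)
  "Insert TEXT at point in PARSER's buffer; return the changed regions.
The value is a list of (START . END), or nil.  Hypothetical helper."
  (with-current-buffer (treesit-parser-buffer parser)
    ;; Reparse first so that earlier edits are not attributed to TEXT.
    (treesit-parser-changed-regions parser)
    (insert text)
    (treesit-parser-changed-regions parser)))
```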

Substitute parser for another language

Sometimes, a grammar for language B is a strict superset of the grammar of another language A. Then it makes sense to reuse configurations (font-lock rules, indentation rules, etc.) of language A for language B. For that purpose, treesit-language-remap-alist allows users to remap language A into language B.

Variable: treesit-language-remap-alist

The value of this variable should be an alist of (language-a . language-b) pairs. When such a pair exists in the alist, creating a parser for language-a actually creates a parser for language-b. By extension, anything that creates a node or makes a query of language-a is redirected to use language-b instead. The mapping is transparent to the caller: the created parser uses the language-b grammar, and so do the nodes created by that parser.

Specifically, the parser created by treesit-parser-create reports whichever language was given to it. For example, if language cpp is mapped to cuda:

(setq treesit-language-remap-alist '((cpp . cuda)))

(treesit-parser-language (treesit-parser-create 'cpp))
  ⇒ cpp

(treesit-parser-language (treesit-parser-create 'cuda))
  ⇒ cuda

Under the hood, however, both parsers are actually cuda parsers.