1 @c -*-texinfo-*-
2 @c This is part of the GNU Emacs Lisp Reference Manual.
3 @c Copyright (C) 1990-1995, 1998-1999, 2001-2012
4 @c Free Software Foundation, Inc.
5 @c See the file elisp.texi for copying conditions.
6 @setfilename ../../info/syntax
7 @node Syntax Tables, Abbrevs, Searching and Matching, Top
8 @chapter Syntax Tables
9 @cindex parsing buffer text
10 @cindex syntax table
11 @cindex text parsing
12
13 A @dfn{syntax table} specifies the syntactic role of each character
14 in a buffer. It can be used to determine where words, symbols, and
15 other syntactic constructs begin and end. This information is used by
16 many Emacs facilities, including Font Lock mode (@pxref{Font Lock
17 Mode}) and the various complex movement commands (@pxref{Motion}).
18
19 @menu
20 * Basics: Syntax Basics. Basic concepts of syntax tables.
21 * Syntax Descriptors:: How characters are classified.
22 * Syntax Table Functions:: How to create, examine and alter syntax tables.
23 * Syntax Properties:: Overriding syntax with text properties.
24 * Motion and Syntax:: Moving over characters with certain syntaxes.
25 * Parsing Expressions:: Parsing balanced expressions
26 using the syntax table.
27 * Standard Syntax Tables:: Syntax tables used by various major modes.
28 * Syntax Table Internals:: How syntax table information is stored.
29 * Categories:: Another way of classifying character syntax.
30 @end menu
31
32 @node Syntax Basics
33 @section Syntax Table Concepts
34
35 A syntax table is a char-table (@pxref{Char-Tables}). The element at
36 index @var{c} describes the character with code @var{c}. The element's
37 value should be a list that encodes the syntax of the character in
38 question.
39
Syntax tables are used only for scanning and moving across buffer
text; the Emacs Lisp reader does not consult them. Emacs Lisp uses
built-in syntactic rules when reading Lisp expressions, and these rules
cannot be changed. (Some Lisp systems provide ways to redefine the read
syntax, but we decided to leave this feature out of Emacs Lisp for
simplicity.)
45
46 Each buffer has its own major mode, and each major mode has its own
47 idea of the syntactic class of various characters. For example, in
48 Lisp mode, the character @samp{;} begins a comment, but in C mode, it
49 terminates a statement. To support these variations, Emacs makes the
50 syntax table local to each buffer. Typically, each major mode has its
51 own syntax table and installs that table in each buffer that uses that
52 mode. Changing this table alters the syntax in all those buffers as
53 well as in any buffers subsequently put in that mode. Occasionally
54 several similar modes share one syntax table. @xref{Example Major
55 Modes}, for an example of how to set up a syntax table.
56
57 A syntax table can inherit the data for some characters from the
58 standard syntax table, while specifying other characters itself. The
59 ``inherit'' syntax class means ``inherit this character's syntax from
60 the standard syntax table.'' Just changing the standard syntax for a
61 character affects all syntax tables that inherit from it.
62
63 @defun syntax-table-p object
64 This function returns @code{t} if @var{object} is a syntax table.
65 @end defun
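
The inheritance described above can be observed directly. The
following is only a minimal sketch: the tables @code{parent} and
@code{child} and the choice of @samp{%} are illustrative, and the
functions used are described in @ref{Syntax Table Functions}.

@example
@group
(let* ((parent (make-syntax-table))
       (child (make-syntax-table parent)))
  ;; Change the parent only; the child has no entry of its own
  ;; for %, so it picks up the new syntax by inheritance.
  (modify-syntax-entry ?% "w" parent)
  (with-syntax-table child
    (list (syntax-table-p child)
          (syntax-table-p "w")
          (string (char-syntax ?%)))))
     @result{} (t nil "w")
@end group
@end example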
66
67 @node Syntax Descriptors
68 @section Syntax Descriptors
69 @cindex syntax class
70
71 The syntactic role of a character is called its @dfn{syntax class}.
72 Each syntax table specifies the syntax class of each character. There
73 is no necessary relationship between the class of a character in one
74 syntax table and its class in any other table.
75
76 Each syntax class is designated by a mnemonic character, which
77 serves as the name of the class when you need to specify a class.
78 Usually, this designator character is one that is often assigned that
79 class; however, its meaning as a designator is unvarying and
80 independent of what syntax that character currently has. Thus,
81 @samp{\} as a designator character always means ``escape character''
82 syntax, regardless of whether the @samp{\} character actually has that
83 syntax in the current syntax table.
84 @ifnottex
85 @xref{Syntax Class Table}, for a list of syntax classes.
86 @end ifnottex
87
88 @cindex syntax descriptor
A @dfn{syntax descriptor} is a Lisp string that describes the syntax
class and other syntactic properties of a character. To modify the
syntax of a character, call the function @code{modify-syntax-entry}
and pass a syntax descriptor as one of its arguments (@pxref{Syntax
Table Functions}).
94
95 The first character in a syntax descriptor designates the syntax
96 class. The second character specifies a matching character (e.g.@: in
97 Lisp, the matching character for @samp{(} is @samp{)}); if there is no
98 matching character, put a space there. Then come the characters for
99 any desired flags.
100
101 If no matching character or flags are needed, only one character
102 (specifying the syntax class) is sufficient.
103
For example, the syntax descriptor for the character @samp{*} in C
mode is @code{". 23"} (i.e., punctuation, matching character slot
unused, second character of a comment-starter, first character of a
comment-ender), and the descriptor for @samp{/} is @code{". 14"} (i.e.,
punctuation, matching character slot unused, first character of a
comment-starter, second character of a comment-ender).
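
One way to inspect how a descriptor string is interpreted is to convert
it with @code{string-to-syntax} (@pxref{Syntax Table Internals}). The
sketch below only illustrates the three parts of a descriptor. In the
results, 4 and 1 are the internal codes for the open parenthesis and
punctuation classes, and 41 is the character code of @samp{)}.

@example
@group
;; Class plus matching character.
(string-to-syntax "()")
     @result{} (4 . 41)
@end group

@group
;; Class, unused matching slot, and flags: the class is still
;; punctuation, and there is no matching character.
(syntax-class (string-to-syntax ". 23"))
     @result{} 1
(cdr (string-to-syntax ". 23"))
     @result{} nil
@end group
@end example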
110
111 @menu
112 * Syntax Class Table:: Table of syntax classes.
113 * Syntax Flags:: Additional flags each character can have.
114 @end menu
115
116 @node Syntax Class Table
117 @subsection Table of Syntax Classes
118
119 Here is a table of syntax classes, the characters that designate
120 them, their meanings, and examples of their use.
121
122 @table @asis
123 @item Whitespace characters: @samp{@ } or @samp{-}
124 Characters that separate symbols and words from each other.
125 Typically, whitespace characters have no other syntactic significance,
126 and multiple whitespace characters are syntactically equivalent to a
127 single one. Space, tab, and formfeed are classified as whitespace in
128 almost all major modes.
129
130 This syntax class can be designated by either @w{@samp{@ }} or
131 @samp{-}. Both designators are equivalent.
132
133 @item Word constituents: @samp{w}
134 Parts of words in human languages. These are typically used in
135 variable and command names in programs. All upper- and lower-case
136 letters, and the digits, are typically word constituents.
137
138 @item Symbol constituents: @samp{_}
139 Extra characters used in variable and command names along with word
140 constituents. Examples include the characters @samp{$&*+-_<>} in Lisp
141 mode, which may be part of a symbol name even though they are not part
142 of English words. In standard C, the only non-word-constituent
143 character that is valid in symbols is underscore (@samp{_}).
144
145 @item Punctuation characters: @samp{.}
146 Characters used as punctuation in a human language, or used in a
147 programming language to separate symbols from one another. Some
148 programming language modes, such as Emacs Lisp mode, have no
149 characters in this class since the few characters that are not symbol
150 or word constituents all have other uses. Other programming language
151 modes, such as C mode, use punctuation syntax for operators.
152
153 @item Open parenthesis characters: @samp{(}
154 @itemx Close parenthesis characters: @samp{)}
155 Characters used in dissimilar pairs to surround sentences or
156 expressions. Such a grouping is begun with an open parenthesis
157 character and terminated with a close. Each open parenthesis
158 character matches a particular close parenthesis character, and vice
159 versa. Normally, Emacs indicates momentarily the matching open
160 parenthesis when you insert a close parenthesis. @xref{Blinking}.
161
162 In human languages, and in C code, the parenthesis pairs are
163 @samp{()}, @samp{[]}, and @samp{@{@}}. In Emacs Lisp, the delimiters
164 for lists and vectors (@samp{()} and @samp{[]}) are classified as
165 parenthesis characters.
166
167 @item String quotes: @samp{"}
168 Characters used to delimit string constants. The same string quote
169 character appears at the beginning and the end of a string. Such
170 quoted strings do not nest.
171
172 The parsing facilities of Emacs consider a string as a single token.
173 The usual syntactic meanings of the characters in the string are
174 suppressed.
175
176 The Lisp modes have two string quote characters: double-quote (@samp{"})
177 and vertical bar (@samp{|}). @samp{|} is not used in Emacs Lisp, but it
178 is used in Common Lisp. C also has two string quote characters:
179 double-quote for strings, and single-quote (@samp{'}) for character
180 constants.
181
182 Human text has no string quote characters. We do not want quotation
183 marks to turn off the usual syntactic properties of other characters
184 in the quotation.
185
186 @item Escape-syntax characters: @samp{\}
187 Characters that start an escape sequence, such as is used in string
188 and character constants. The character @samp{\} belongs to this class
189 in both C and Lisp. (In C, it is used thus only inside strings, but
190 it turns out to cause no trouble to treat it this way throughout C
191 code.)
192
193 Characters in this class count as part of words if
194 @code{words-include-escapes} is non-@code{nil}. @xref{Word Motion}.
195
196 @item Character quotes: @samp{/}
197 Characters used to quote the following character so that it loses its
198 normal syntactic meaning. This differs from an escape character in
199 that only the character immediately following is ever affected.
200
201 Characters in this class count as part of words if
202 @code{words-include-escapes} is non-@code{nil}. @xref{Word Motion}.
203
204 This class is used for backslash in @TeX{} mode.
205
206 @item Paired delimiters: @samp{$}
207 Similar to string quote characters, except that the syntactic
208 properties of the characters between the delimiters are not
209 suppressed. Only @TeX{} mode uses a paired delimiter presently---the
210 @samp{$} that both enters and leaves math mode.
211
212 @item Expression prefixes: @samp{'}
213 Characters used for syntactic operators that are considered as part of
214 an expression if they appear next to one. In Lisp modes, these
215 characters include the apostrophe, @samp{'} (used for quoting), the
216 comma, @samp{,} (used in macros), and @samp{#} (used in the read
217 syntax for certain data types).
218
219 @item Comment starters: @samp{<}
220 @itemx Comment enders: @samp{>}
221 @cindex comment syntax
222 Characters used in various languages to delimit comments. Human text
223 has no comment characters. In Lisp, the semicolon (@samp{;}) starts a
224 comment and a newline or formfeed ends one.
225
226 @item Inherit standard syntax: @samp{@@}
227 This syntax class does not specify a particular syntax. It says to
228 look in the standard syntax table to find the syntax of this
229 character.
230
231 @item Generic comment delimiters: @samp{!}
232 Characters that start or end a special kind of comment. @emph{Any}
233 generic comment delimiter matches @emph{any} generic comment
234 delimiter, but they cannot match a comment starter or comment ender;
235 generic comment delimiters can only match each other.
236
237 This syntax class is primarily meant for use with the
238 @code{syntax-table} text property (@pxref{Syntax Properties}). You
239 can mark any range of characters as forming a comment, by giving the
240 first and last characters of the range @code{syntax-table} properties
241 identifying them as generic comment delimiters.
242
243 @item Generic string delimiters: @samp{|}
244 Characters that start or end a string. This class differs from the
245 string quote class in that @emph{any} generic string delimiter can
246 match any other generic string delimiter; but they do not match
247 ordinary string quote characters.
248
249 This syntax class is primarily meant for use with the
250 @code{syntax-table} text property (@pxref{Syntax Properties}). You
251 can mark any range of characters as forming a string constant, by
252 giving the first and last characters of the range @code{syntax-table}
253 properties identifying them as generic string delimiters.
254 @end table
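
To see several of these classes side by side, you can query one
mode's table with @code{char-syntax} (@pxref{Syntax Table Functions}).
This sketch assumes the usual contents of
@code{emacs-lisp-mode-syntax-table}; other modes give different
answers for the same characters.

@example
@group
(with-syntax-table emacs-lisp-mode-syntax-table
  (mapcar (lambda (c) (string (char-syntax c)))
          '(?a ?- ?\( ?\) ?\" ?\\ ?' ?\; ?\n)))
     @result{} ("w" "_" "(" ")" "\"" "\\" "'" "<" ">")
@end group
@end example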
255
256 @node Syntax Flags
257 @subsection Syntax Flags
258 @cindex syntax flags
259
260 In addition to the classes, entries for characters in a syntax table
261 can specify flags. There are eight possible flags, represented by the
262 characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b}, @samp{c},
263 @samp{n}, and @samp{p}.
264
265 All the flags except @samp{p} are used to describe comment
266 delimiters. The digit flags are used for comment delimiters made up
267 of 2 characters. They indicate that a character can @emph{also} be
268 part of a comment sequence, in addition to the syntactic properties
269 associated with its character class. The flags are independent of the
270 class and each other for the sake of characters such as @samp{*} in
271 C mode, which is a punctuation character, @emph{and} the second
272 character of a start-of-comment sequence (@samp{/*}), @emph{and} the
273 first character of an end-of-comment sequence (@samp{*/}). The flags
274 @samp{b}, @samp{c}, and @samp{n} are used to qualify the corresponding
275 comment delimiter.
276
277 Here is a table of the possible flags for a character @var{c},
278 and what they mean:
279
280 @itemize @bullet
281 @item
282 @samp{1} means @var{c} is the start of a two-character comment-start
283 sequence.
284
285 @item
286 @samp{2} means @var{c} is the second character of such a sequence.
287
288 @item
289 @samp{3} means @var{c} is the start of a two-character comment-end
290 sequence.
291
292 @item
293 @samp{4} means @var{c} is the second character of such a sequence.
294
295 @item
@samp{b} means that @var{c} as a comment delimiter belongs to the
alternative ``b'' comment style. For a two-character comment starter,
this flag is only significant on the second character, and for a
two-character comment ender it is only significant on the first
character.
300
301 @item
302 @samp{c} means that @var{c} as a comment delimiter belongs to the
303 alternative ``c'' comment style. For a two-character comment
304 delimiter, @samp{c} on either character makes it of style ``c''.
305
306 @item
307 @samp{n} on a comment delimiter character specifies
308 that this kind of comment can be nested. For a two-character
309 comment delimiter, @samp{n} on either character makes it
310 nestable.
311
312 Emacs supports several comment styles simultaneously in any one syntax
313 table. A comment style is a set of flags @samp{b}, @samp{c}, and
314 @samp{n}, so there can be up to 8 different comment styles.
315 Each comment delimiter has a style and only matches comment delimiters
316 of the same style. Thus if a comment starts with the comment-start
317 sequence of style ``bn'', it will extend until the next matching
318 comment-end sequence of style ``bn''.
319
320 The appropriate comment syntax settings for C++ can be as follows:
321
322 @table @asis
323 @item @samp{/}
324 @samp{124}
325 @item @samp{*}
326 @samp{23b}
327 @item newline
328 @samp{>}
329 @end table
330
331 This defines four comment-delimiting sequences:
332
333 @table @asis
334 @item @samp{/*}
335 This is a comment-start sequence for ``b'' style because the
336 second character, @samp{*}, has the @samp{b} flag.
337
338 @item @samp{//}
339 This is a comment-start sequence for ``a'' style because the second
340 character, @samp{/}, does not have the @samp{b} flag.
341
342 @item @samp{*/}
343 This is a comment-end sequence for ``b'' style because the first
344 character, @samp{*}, has the @samp{b} flag.
345
346 @item newline
347 This is a comment-end sequence for ``a'' style, because the newline
348 character does not have the @samp{b} flag.
349 @end table
350
351 @item
352 @c Emacs 19 feature
353 @samp{p} identifies an additional ``prefix character'' for Lisp syntax.
354 These characters are treated as whitespace when they appear between
355 expressions. When they appear within an expression, they are handled
356 according to their usual syntax classes.
357
358 The function @code{backward-prefix-chars} moves back over these
359 characters, as well as over characters whose primary syntax class is
360 prefix (@samp{'}). @xref{Motion and Syntax}.
361 @end itemize
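
The C++ settings listed above can be tried out in a scratch table.
The following is a sketch under those assumptions only (punctuation is
assumed as the class for @samp{/} and @samp{*}, and the sample text is
made up); it is not the table that C++ mode actually installs.

@example
@group
(let ((table (make-syntax-table)))
  (modify-syntax-entry ?/ ". 124" table)
  (modify-syntax-entry ?* ". 23b" table)
  (modify-syntax-entry ?\n ">" table)
  (with-temp-buffer
    (set-syntax-table table)
    (insert "/* one */ // two\nx")
    (goto-char (point-min))
    ;; Skip both comments and the whitespace between them.
    (forward-comment (buffer-size))
    (string (char-after))))
     @result{} "x"
@end group
@end example

Point ends up just before @samp{x}, which shows that the ``b'' style
comment was closed by @samp{*/} and the ``a'' style comment by the
newline.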
362
363 @node Syntax Table Functions
364 @section Syntax Table Functions
365
366 In this section we describe functions for creating, accessing and
367 altering syntax tables.
368
369 @defun make-syntax-table &optional table
370 This function creates a new syntax table, with all values initialized
371 to @code{nil}. If @var{table} is non-@code{nil}, it becomes the
372 parent of the new syntax table, otherwise the standard syntax table is
373 the parent. Like all char-tables, a syntax table inherits from its
374 parent. Thus the original syntax of all characters in the returned
375 syntax table is determined by the parent. @xref{Char-Tables}.
376
377 Most major mode syntax tables are created in this way.
378 @end defun
379
380 @defun copy-syntax-table &optional table
381 This function constructs a copy of @var{table} and returns it. If
382 @var{table} is not supplied (or is @code{nil}), it returns a copy of the
383 standard syntax table. Otherwise, an error is signaled if @var{table} is
384 not a syntax table.
385 @end defun
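
A copy is a snapshot: later changes to the original do not show
through, unlike changes to a parent table. A minimal sketch, with
illustrative variable names:

@example
@group
(let* ((orig (make-syntax-table))
       (copy (progn (modify-syntax-entry ?% "w" orig)
                    (copy-syntax-table orig))))
  ;; Changing the original afterwards leaves the copy alone.
  (modify-syntax-entry ?% "." orig)
  (list (with-syntax-table orig (string (char-syntax ?%)))
        (with-syntax-table copy (string (char-syntax ?%)))))
     @result{} ("." "w")
@end group
@end example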
386
387 @deffn Command modify-syntax-entry char syntax-descriptor &optional table
388 This function sets the syntax entry for @var{char} according to
389 @var{syntax-descriptor}. @var{char} must be a character, or a cons
390 cell of the form @code{(@var{min} . @var{max})}; in the latter case,
391 the function sets the syntax entries for all characters in the range
392 between @var{min} and @var{max}, inclusive.
393
394 The syntax is changed only for @var{table}, which defaults to the
395 current buffer's syntax table, and not in any other syntax table.
396
397 The argument @var{syntax-descriptor} is a syntax descriptor for the
398 desired syntax (i.e.@: a string beginning with a class designator
399 character, and optionally containing a matching character and syntax
400 flags). An error is signaled if the first character is not one of the
401 seventeen syntax class designators. @xref{Syntax Descriptors}.
402
403 This function always returns @code{nil}. The old syntax information in
404 the table for this character is discarded.
405
406 @example
407 @group
408 @exdent @r{Examples:}
409
410 ;; @r{Put the space character in class whitespace.}
411 (modify-syntax-entry ?\s " ")
412 @result{} nil
413 @end group
414
415 @group
416 ;; @r{Make @samp{$} an open parenthesis character,}
417 ;; @r{with @samp{^} as its matching close.}
418 (modify-syntax-entry ?$ "(^")
419 @result{} nil
420 @end group
421
422 @group
423 ;; @r{Make @samp{^} a close parenthesis character,}
424 ;; @r{with @samp{$} as its matching open.}
425 (modify-syntax-entry ?^ ")$")
426 @result{} nil
427 @end group
428
429 @group
430 ;; @r{Make @samp{/} a punctuation character,}
431 ;; @r{the first character of a start-comment sequence,}
432 ;; @r{and the second character of an end-comment sequence.}
433 ;; @r{This is used in C mode.}
434 (modify-syntax-entry ?/ ". 14")
435 @result{} nil
436 @end group
437 @end example
438 @end deffn
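
The range form of @var{char} is not covered by the examples above.
Here is a small sketch that uses a scratch table, so that the current
buffer's table is left untouched; treating digits as symbol
constituents is just an illustration.

@example
@group
(let ((table (make-syntax-table)))
  ;; Set the entries for all digits in one call.
  (modify-syntax-entry '(?0 . ?9) "_" table)
  (with-syntax-table table
    (mapcar (lambda (c) (string (char-syntax c)))
            '(?0 ?5 ?9 ?a))))
     @result{} ("_" "_" "_" "w")
@end group
@end example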
439
440 @defun char-syntax character
441 This function returns the syntax class of @var{character}, represented
442 by its mnemonic designator character. This returns @emph{only} the
443 class, not any matching parenthesis or flags.
444
An error is signaled if @var{character} is not a character.
446
447 The following examples apply to C mode. The first example shows that
448 the syntax class of space is whitespace (represented by a space). The
449 second example shows that the syntax of @samp{/} is punctuation. This
450 does not show the fact that it is also part of comment-start and -end
451 sequences. The third example shows that open parenthesis is in the class
452 of open parentheses. This does not show the fact that it has a matching
453 character, @samp{)}.
454
455 @example
456 @group
457 (string (char-syntax ?\s))
458 @result{} " "
459 @end group
460
461 @group
462 (string (char-syntax ?/))
463 @result{} "."
464 @end group
465
466 @group
467 (string (char-syntax ?\())
468 @result{} "("
469 @end group
470 @end example
471
472 We use @code{string} to make it easier to see the character returned by
473 @code{char-syntax}.
474 @end defun
475
476 @defun set-syntax-table table
477 This function makes @var{table} the syntax table for the current buffer.
478 It returns @var{table}.
479 @end defun
480
481 @defun syntax-table
482 This function returns the current syntax table, which is the table for
483 the current buffer.
484 @end defun
485
486 @defmac with-syntax-table @var{table} @var{body}@dots{}
487 This macro executes @var{body} using @var{table} as the current syntax
488 table. It returns the value of the last form in @var{body}, after
489 restoring the old current syntax table.
490
491 Since each buffer has its own current syntax table, we should make that
492 more precise: @code{with-syntax-table} temporarily alters the current
493 syntax table of whichever buffer is current at the time the macro
494 execution starts. Other buffers are not affected.
495 @end defmac
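
As a brief sketch of the temporary nature of @code{with-syntax-table},
the following form checks that the buffer's own table is back in place
afterwards. The table @code{tmp} is illustrative; a temporary buffer
starts out with the standard syntax table, where @samp{_} is a symbol
constituent.

@example
@group
(with-temp-buffer
  (let ((old (syntax-table))
        (tmp (make-syntax-table)))
    (modify-syntax-entry ?_ "w" tmp)
    (list (with-syntax-table tmp (string (char-syntax ?_)))
          (string (char-syntax ?_))
          (eq (syntax-table) old))))
     @result{} ("w" "_" t)
@end group
@end example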
496
497 @node Syntax Properties
498 @section Syntax Properties
499 @kindex syntax-table @r{(text property)}
500
501 When the syntax table is not flexible enough to specify the syntax of
502 a language, you can override the syntax table for specific character
503 occurrences in the buffer, by applying a @code{syntax-table} text
504 property. @xref{Text Properties}, for how to apply text properties.
505
506 The valid values of @code{syntax-table} text property are:
507
508 @table @asis
509 @item @var{syntax-table}
510 If the property value is a syntax table, that table is used instead of
511 the current buffer's syntax table to determine the syntax for the
512 underlying text character.
513
514 @item @code{(@var{syntax-code} . @var{matching-char})}
A cons cell of this format specifies the syntax for the underlying
text character (@pxref{Syntax Table Internals}).
517
518 @item @code{nil}
519 If the property is @code{nil}, the character's syntax is determined from
520 the current syntax table in the usual way.
521 @end table
522
523 @defvar parse-sexp-lookup-properties
524 If this is non-@code{nil}, the syntax scanning functions, like
525 @code{forward-sexp}, pay attention to syntax text properties.
526 Otherwise they use only the current syntax table.
527 @end defvar
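
The effect of this variable can be sketched as follows. The sample
text and positions are made up for illustration: the close parenthesis
at position 4 is given punctuation syntax through a text property, and
@code{scan-sexps} (@pxref{Motion via Parsing}) honors that only when
@code{parse-sexp-lookup-properties} is non-@code{nil}.

@example
@group
(with-temp-buffer
  (insert "(a ) b)")
  (put-text-property 4 5 'syntax-table (string-to-syntax "."))
  (list (let ((parse-sexp-lookup-properties nil))
          (scan-sexps 1 1))
        (let ((parse-sexp-lookup-properties t))
          (scan-sexps 1 1))))
     @result{} (5 8)
@end group
@end example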
528
529 @defvar syntax-propertize-function
530 This variable, if non-@code{nil}, should store a function for applying
531 @code{syntax-table} properties to a specified stretch of text. It is
532 intended to be used by major modes to install a function which applies
533 @code{syntax-table} properties in some mode-appropriate way.
534
535 The function is called by @code{syntax-ppss} (@pxref{Position Parse}),
536 and by Font Lock mode during syntactic fontification (@pxref{Syntactic
537 Font Lock}). It is called with two arguments, @var{start} and
538 @var{end}, which are the starting and ending positions of the text on
539 which it should act. It is allowed to call @code{syntax-ppss} on any
540 position before @var{end}. However, it should not call
541 @code{syntax-ppss-flush-cache}; so, it is not allowed to call
542 @code{syntax-ppss} on some position and later modify the buffer at an
543 earlier position.
544 @end defvar
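
Here is a minimal sketch of such a function. The name
@code{my-propertize-percent} and the convention that @samp{%} delimits
strings are hypothetical, invented for this illustration; a real major
mode would apply its own rules.

@example
@group
;; Hypothetical helper, not part of Emacs: mark each % between
;; START and END as a generic string delimiter.
(defun my-propertize-percent (start end)
  (goto-char start)
  (while (search-forward "%" end t)
    (put-text-property (1- (point)) (point)
                       'syntax-table (string-to-syntax "|"))))
@end group

@group
(with-temp-buffer
  (setq-local syntax-propertize-function
              #'my-propertize-percent)
  (setq-local parse-sexp-lookup-properties t)
  (insert "a %b (c% d")
  ;; Position 6 is just before the open parenthesis, which
  ;; lies between the two % characters.
  (nth 3 (syntax-ppss 6)))
     @result{} t
@end group
@end example

The non-@code{nil} result means that position 6 is inside a string;
it is @code{t} because the string is terminated by a generic string
delimiter.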
545
546 @defvar syntax-propertize-extend-region-functions
547 This abnormal hook is run by the syntax parsing code prior to calling
548 @code{syntax-propertize-function}. Its role is to help locate safe
549 starting and ending buffer positions for passing to
550 @code{syntax-propertize-function}. For example, a major mode can add
551 a function to this hook to identify multi-line syntactic constructs,
552 and ensure that the boundaries do not fall in the middle of one.
553
554 Each function in this hook should accept two arguments, @var{start}
555 and @var{end}. It should return either a cons cell of two adjusted
556 buffer positions, @code{(@var{new-start} . @var{new-end})}, or
557 @code{nil} if no adjustment is necessary. The hook functions are run
558 in turn, repeatedly, until they all return @code{nil}.
559 @end defvar
560
561 @node Motion and Syntax
562 @section Motion and Syntax
563
564 This section describes functions for moving across characters that
565 have certain syntax classes.
566
567 @defun skip-syntax-forward syntaxes &optional limit
568 This function moves point forward across characters having syntax
569 classes mentioned in @var{syntaxes} (a string of syntax class
570 characters). It stops when it encounters the end of the buffer, or
571 position @var{limit} (if specified), or a character it is not supposed
572 to skip.
573
574 If @var{syntaxes} starts with @samp{^}, then the function skips
575 characters whose syntax is @emph{not} in @var{syntaxes}.
576
577 The return value is the distance traveled, which is a nonnegative
578 integer.
579 @end defun
580
581 @defun skip-syntax-backward syntaxes &optional limit
582 This function moves point backward across characters whose syntax
583 classes are mentioned in @var{syntaxes}. It stops when it encounters
584 the beginning of the buffer, or position @var{limit} (if specified), or
585 a character it is not supposed to skip.
586
587 If @var{syntaxes} starts with @samp{^}, then the function skips
588 characters whose syntax is @emph{not} in @var{syntaxes}.
589
590 The return value indicates the distance traveled. It is an integer that
591 is zero or less.
592 @end defun
593
594 @defun backward-prefix-chars
595 This function moves point backward over any number of characters with
596 expression prefix syntax. This includes both characters in the
597 expression prefix syntax class, and characters with the @samp{p} flag.
598 @end defun
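
The return values of the skipping functions can be illustrated with a
short sketch. It assumes the standard syntax table, which a temporary
buffer uses by default and in which @samp{_} is a symbol constituent;
the sample text is arbitrary.

@example
@group
(with-temp-buffer
  (insert "foo_bar   baz")
  (goto-char (point-min))
  (list (skip-syntax-forward "w_")     ; over foo_bar
        (skip-syntax-forward " ")      ; over the three spaces
        (skip-syntax-backward "^w")))  ; back over non-word chars
     @result{} (7 3 -3)
@end group
@end example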
599
600 @node Parsing Expressions
601 @section Parsing Expressions
602
603 This section describes functions for parsing and scanning balanced
604 expressions. We will refer to such expressions as @dfn{sexps},
605 following the terminology of Lisp, even though these functions can act
606 on languages other than Lisp. Basically, a sexp is either a balanced
607 parenthetical grouping, a string, or a ``symbol'' (i.e.@: a sequence
608 of characters whose syntax is either word constituent or symbol
609 constituent). However, characters in the expression prefix syntax
610 class (@pxref{Syntax Class Table}) are treated as part of the sexp if
611 they appear next to it.
612
613 The syntax table controls the interpretation of characters, so these
614 functions can be used for Lisp expressions when in Lisp mode and for C
615 expressions when in C mode. @xref{List Motion}, for convenient
616 higher-level functions for moving over balanced expressions.
617
618 A character's syntax controls how it changes the state of the
619 parser, rather than describing the state itself. For example, a
620 string delimiter character toggles the parser state between
621 ``in-string'' and ``in-code,'' but the syntax of characters does not
622 directly say whether they are inside a string. For example (note that
623 15 is the syntax code for generic string delimiters),
624
625 @example
626 (put-text-property 1 9 'syntax-table '(15 . nil))
627 @end example
628
629 @noindent
630 does not tell Emacs that the first eight chars of the current buffer
631 are a string, but rather that they are all string delimiters. As a
632 result, Emacs treats them as four consecutive empty string constants.
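
To contrast this with the intended usage, the following sketch marks
all eight characters in the first buffer, and only the first and last
of them in the second. The sample text is arbitrary, and
@code{scan-sexps} is described in @ref{Motion via Parsing}.

@example
@group
(list
 ;; Every character is a delimiter: the first sexp is an
 ;; empty string that ends after two characters.
 (with-temp-buffer
   (insert "abcdefgh rest")
   (put-text-property 1 9 'syntax-table '(15 . nil))
   (let ((parse-sexp-lookup-properties t))
     (scan-sexps 1 1)))
 ;; Only the ends are delimiters: the first sexp is one
 ;; string covering all eight characters.
 (with-temp-buffer
   (insert "abcdefgh rest")
   (put-text-property 1 2 'syntax-table '(15 . nil))
   (put-text-property 8 9 'syntax-table '(15 . nil))
   (let ((parse-sexp-lookup-properties t))
     (scan-sexps 1 1))))
     @result{} (3 9)
@end group
@end example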
633
634 @menu
635 * Motion via Parsing:: Motion functions that work by parsing.
636 * Position Parse:: Determining the syntactic state of a position.
637 * Parser State:: How Emacs represents a syntactic state.
638 * Low-Level Parsing:: Parsing across a specified region.
639 * Control Parsing:: Parameters that affect parsing.
640 @end menu
641
642 @node Motion via Parsing
643 @subsection Motion Commands Based on Parsing
644
645 This section describes simple point-motion functions that operate
646 based on parsing expressions.
647
648 @defun scan-lists from count depth
649 This function scans forward @var{count} balanced parenthetical groupings
650 from position @var{from}. It returns the position where the scan stops.
651 If @var{count} is negative, the scan moves backwards.
652
653 If @var{depth} is nonzero, assume that the starting point is already
654 @var{depth} parentheses deep. This function counts out @var{count}
655 number of points where the parenthesis depth goes back to zero, then
656 stops. Thus, a positive value for @var{depth} has the effect of
657 moving out @var{depth} levels of parenthesis, whereas a negative
658 @var{depth} has the effect of moving deeper by @var{-depth} levels of
659 parenthesis.
660
661 Scanning ignores comments if @code{parse-sexp-ignore-comments} is
662 non-@code{nil}.
663
664 If the scan reaches the beginning or end of the buffer (or its
665 accessible portion), and the depth is not zero, an error is signaled.
666 If the depth is zero but the count is not used up, @code{nil} is
667 returned.
668 @end defun
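
The roles of @var{count} and @var{depth} can be sketched on a small
buffer. The text is arbitrary and the standard syntax table is
assumed; position 5 is the @samp{b} inside the inner list.

@example
@group
(with-temp-buffer
  (insert "(a (b c)) (d)")
  (list (scan-lists 1 1 0)     ; over the first top-level list
        (scan-lists 1 2 0)     ; over both top-level lists
        (scan-lists 1 3 0)     ; count not used up
        (scan-lists 5 1 1)     ; out of the list containing b
        (scan-lists 1 1 -1)))  ; down into the first list
     @result{} (10 14 nil 9 2)
@end group
@end example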
669
670 @defun scan-sexps from count
671 This function scans forward @var{count} sexps from position @var{from}.
672 It returns the position where the scan stops. If @var{count} is
673 negative, the scan moves backwards.
674
675 Scanning ignores comments if @code{parse-sexp-ignore-comments} is
676 non-@code{nil}.
677
678 If the scan reaches the beginning or end of (the accessible part of) the
679 buffer while in the middle of a parenthetical grouping, an error is
signaled. If it reaches the beginning or end between groupings but
before @var{count} is used up, @code{nil} is returned.
682 @end defun
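
A short sketch of @code{scan-sexps} on arbitrary text, again assuming
the standard syntax table. The buffer holds three sexps, so asking for
a fourth yields @code{nil}.

@example
@group
(with-temp-buffer
  (insert "foo (bar baz) qux")
  (list (scan-sexps 1 1)               ; over foo
        (scan-sexps 1 2)               ; over foo and the list
        (scan-sexps (point-max) -1)    ; backward over qux
        (scan-sexps 1 4)))             ; only three sexps exist
     @result{} (4 14 15 nil)
@end group
@end example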
683
684 @defun forward-comment count
685 This function moves point forward across @var{count} complete comments
686 (that is, including the starting delimiter and the terminating
687 delimiter if any), plus any whitespace encountered on the way. It
688 moves backward if @var{count} is negative. If it encounters anything
689 other than a comment or whitespace, it stops, leaving point at the
690 place where it stopped. This includes (for instance) finding the end
691 of a comment when moving forward and expecting the beginning of one.
692 The function also stops immediately after moving over the specified
693 number of complete comments. If @var{count} comments are found as
694 expected, with nothing except whitespace between them, it returns
695 @code{t}; otherwise it returns @code{nil}.
696
697 This function cannot tell whether the ``comments'' it traverses are
698 embedded within a string. If they look like comments, it treats them
699 as comments.
700
701 To move forward over all comments and whitespace following point, use
702 @code{(forward-comment (buffer-size))}. @code{(buffer-size)} is a
703 good argument to use, because the number of comments in the buffer
704 cannot exceed that many.
705 @end defun
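
The two kinds of return value can be seen in the following sketch,
which assumes Emacs Lisp comment syntax and uses made-up text. The
first call moves over exactly one comment. The second call runs into
code before its count is used up.

@example
@group
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert ";; one\n  ;; two\n(code)")
  (goto-char (point-min))
  (list (forward-comment 1)
        (point)
        (forward-comment (buffer-size))
        (point)))
     @result{} (t 8 nil 17)
@end group
@end example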
706
707 @node Position Parse
708 @subsection Finding the Parse State for a Position
709
710 For syntactic analysis, such as in indentation, often the useful
711 thing is to compute the syntactic state corresponding to a given buffer
712 position. This function does that conveniently.
713
714 @defun syntax-ppss &optional pos
715 This function returns the parser state that the parser would reach at
716 position @var{pos} starting from the beginning of the buffer.
717 @iftex
718 See the next section for
719 @end iftex
720 @ifnottex
721 @xref{Parser State},
722 @end ifnottex
723 for a description of the parser state.
724
725 The return value is the same as if you called the low-level parsing
726 function @code{parse-partial-sexp} to parse from the beginning of the
727 buffer to @var{pos} (@pxref{Low-Level Parsing}). However,
728 @code{syntax-ppss} uses a cache to speed up the computation. Due to
729 this optimization, element 2 (previous complete subexpression)
730 and element 6 (minimum parenthesis depth) of the returned parser
731 state, counting from 0, are not meaningful.
732
733 This function has a side effect: it adds a buffer-local entry to
734 @code{before-change-functions} (@pxref{Change Hooks}) for
735 @code{syntax-ppss-flush-cache} (see below). This entry keeps the
736 cache consistent as the buffer is modified. However, the cache might
737 not be updated if @code{syntax-ppss} is called while
738 @code{before-change-functions} is temporarily let-bound, or if the
739 buffer is modified without running the hook, such as when using
740 @code{inhibit-modification-hooks}. In those cases, it is necessary to
741 call @code{syntax-ppss-flush-cache} explicitly.
742 @end defun
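
  A common use of @code{syntax-ppss} is to test whether a position lies
inside a string or a comment. The following is a minimal sketch, not a
definitive recipe; it assumes a temporary buffer with made-up text that
uses @code{emacs-lisp-mode-syntax-table}. Elements 3 and 4 of the
returned parser state hold the string and comment information.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(setq x \"a string\") ; a comment")
  (list
   ;; Position 12 is inside the string; the value is the
   ;; character that will terminate it (34 is the code of `"').
   (nth 3 (syntax-ppss 12))
   ;; The end of the buffer is inside the comment.
   (nth 4 (syntax-ppss (point-max)))))
     @result{} (34 t)
@end example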
743
744 @defun syntax-ppss-flush-cache beg &rest ignored-args
745 This function flushes the cache used by @code{syntax-ppss}, starting
746 at position @var{beg}. The remaining arguments, @var{ignored-args},
747 are ignored; this function accepts them so that it can be directly
748 used on hooks such as @code{before-change-functions} (@pxref{Change
749 Hooks}).
750 @end defun
751
752 Major modes can make @code{syntax-ppss} run faster by specifying
753 where it needs to start parsing.
754
755 @defvar syntax-begin-function
756 If this is non-@code{nil}, it should be a function that moves to an
757 earlier buffer position where the parser state is equivalent to
758 @code{nil}---in other words, a position outside of any comment,
759 string, or parenthesis. @code{syntax-ppss} uses it to further
760 optimize its computations, when the cache gives no help.
761 @end defvar
762
763 @node Parser State
764 @subsection Parser State
765 @cindex parser state
766
767 A @dfn{parser state} is a list of ten elements describing the state
768 of the syntactic parser, after it parses the text between a specified
769 starting point and a specified end point in the buffer. Parsing
770 functions such as @code{syntax-ppss}
771 @ifnottex
772 (@pxref{Position Parse})
773 @end ifnottex
774 return a parser state as the value. Some parsing functions accept a
775 parser state as an argument, for resuming parsing.
776
777 Here are the meanings of the elements of the parser state:
778
779 @enumerate 0
780 @item
781 The depth in parentheses, counting from 0. @strong{Warning:} this can
782 be negative if there are more close parens than open parens between
783 the parser's starting point and end point.
784
785 @item
786 @cindex innermost containing parentheses
787 The character position of the start of the innermost parenthetical
788 grouping containing the stopping point; @code{nil} if none.
789
790 @item
791 @cindex previous complete subexpression
792 The character position of the start of the last complete subexpression
793 terminated; @code{nil} if none.
794
795 @item
796 @cindex inside string
797 Non-@code{nil} if inside a string. More precisely, this is the
798 character that will terminate the string, or @code{t} if a generic
799 string delimiter character should terminate it.
800
801 @item
802 @cindex inside comment
803 @code{t} if inside a non-nestable comment (of any comment style;
804 @pxref{Syntax Flags}); or the comment nesting level if inside a
805 comment that can be nested.
806
807 @item
808 @cindex quote character
809 @code{t} if the end point is just after a quote character.
810
811 @item
812 The minimum parenthesis depth encountered during this scan.
813
814 @item
815 What kind of comment is active: @code{nil} if not in a comment or in a
816 comment of style @samp{a}; 1 for a comment of style @samp{b}; 2 for a
817 comment of style @samp{c}; and @code{syntax-table} for a comment that
818 should be ended by a generic comment delimiter character.
819
820 @item
821 The string or comment start position. While inside a comment, this is
822 the position where the comment began; while inside a string, this is the
823 position where the string began. When outside of strings and comments,
824 this element is @code{nil}.
825
826 @item
827 Internal data for continuing the parsing. The meaning of this
828 data is subject to change; it is used if you pass this list
829 as the @var{state} argument to another call.
830 @end enumerate
831
832 Elements 1, 2, and 6 are ignored in a state which you pass as an
833 argument to continue parsing, and elements 8 and 9 are used only in
834 trivial cases. Those elements are mainly used internally by the
835 parser code.
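
  The elements of a parser state are ordinary list elements, so
@code{nth} retrieves them. As an illustrative sketch, assuming a
temporary buffer with made-up, deliberately unbalanced text and
@code{emacs-lisp-mode-syntax-table}:

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(a (b \"c")
  (let ((state (parse-partial-sexp (point-min) (point-max))))
    (list (nth 0 state)      ; depth in parentheses
          (nth 1 state)      ; start of innermost grouping
          (nth 3 state)      ; character that ends the string
          (nth 8 state))))   ; start of the string
     @result{} (2 4 34 7)
@end example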
836
837 One additional piece of useful information is available from a
838 parser state using this function:
839
840 @defun syntax-ppss-toplevel-pos state
841 This function extracts, from parser state @var{state}, the last
842 position scanned in the parse which was at top level in grammatical
843 structure. ``At top level'' means outside of any parentheses,
844 comments, or strings.
845
846 The value is @code{nil} if @var{state} represents a parse which has
847 arrived at a top level position.
848 @end defun
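
  For instance, in the following sketch (which assumes a temporary
buffer with made-up text and @code{emacs-lisp-mode-syntax-table}), the
end of the buffer is nested two levels deep, and the last top-level
position is the start of the outermost grouping. At the beginning of
the buffer the parse is already at top level, so the value is
@code{nil}.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(a (b c")
  (list (syntax-ppss-toplevel-pos (syntax-ppss (point-max)))
        (syntax-ppss-toplevel-pos (syntax-ppss (point-min)))))
     @result{} (1 nil)
@end example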
849
850 @node Low-Level Parsing
851 @subsection Low-Level Parsing
852
853 The most basic way to use the expression parser is to tell it
854 to start at a given position with a certain state, and parse up to
855 a specified end position.
856
857 @defun parse-partial-sexp start limit &optional target-depth stop-before state stop-comment
858 This function parses a sexp in the current buffer starting at
859 @var{start}, not scanning past @var{limit}. It stops at position
860 @var{limit} or when certain criteria described below are met, and sets
861 point to the location where parsing stops. It returns a parser state
862 describing the status of the parse at the point where it stops.
863
864 @cindex parenthesis depth
865 If the third argument @var{target-depth} is non-@code{nil}, parsing
866 stops if the depth in parentheses becomes equal to @var{target-depth}.
867 The depth starts at 0, or at whatever is given in @var{state}.
868
869 If the fourth argument @var{stop-before} is non-@code{nil}, parsing
870 stops when it comes to any character that starts a sexp. If
871 @var{stop-comment} is non-@code{nil}, parsing stops when it comes to the
872 start of a comment. If @var{stop-comment} is the symbol
873 @code{syntax-table}, parsing stops after the start of a comment or a
874 string, or the end of a comment or a string, whichever comes first.
875
876 If @var{state} is @code{nil}, @var{start} is assumed to be at the top
877 level of parenthesis structure, such as the beginning of a function
878 definition. Alternatively, you might wish to resume parsing in the
879 middle of the structure. To do this, you must provide a @var{state}
880 argument that describes the initial status of parsing. The value
881 returned by a previous call to @code{parse-partial-sexp} will do
882 nicely.
883 @end defun
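
  The following sketch illustrates resuming a parse by passing the
state from one call as the @var{state} argument of the next. The
buffer text and the variable names @code{middle}, @code{state1} and
@code{state2} are made up for illustration, and the example assumes
@code{emacs-lisp-mode-syntax-table}.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(a (b c) d)")
  (let* ((middle 6)
         ;; Parse the first part; this stops inside `(b c)'.
         (state1 (parse-partial-sexp (point-min) middle))
         ;; Resume where the first call stopped.
         (state2 (parse-partial-sexp middle (point-max)
                                     nil nil state1)))
    ;; Parenthesis depth after each step.
    (list (nth 0 state1) (nth 0 state2))))
     @result{} (2 0)
@end example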
884
885 @node Control Parsing
886 @subsection Parameters to Control Parsing
887
888 @defvar multibyte-syntax-as-symbol
889 If this variable is non-@code{nil}, @code{scan-sexps} treats all
890 non-@acronym{ASCII} characters as symbol constituents regardless
891 of what the syntax table says about them. (However, text properties
892 can still override the syntax.)
893 @end defvar
894
895 @defopt parse-sexp-ignore-comments
896 @cindex skipping comments
897 If the value is non-@code{nil}, then comments are treated as
898 whitespace by the functions in this section and by @code{forward-sexp},
899 @code{scan-lists} and @code{scan-sexps}.
900 @end defopt
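
  As a small sketch of the effect, assuming a temporary buffer with
made-up text and @code{emacs-lisp-mode-syntax-table}, binding
@code{parse-sexp-ignore-comments} to @code{t} makes @code{scan-sexps}
skip the parenthesized text inside the comment and scan over the
expression that follows it:

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "; (skipped)\n(kept)")
  (let ((parse-sexp-ignore-comments t))
    ;; The result is the position just after `(kept)',
    ;; which here is the end of the buffer.
    (scan-sexps (point-min) 1)))
     @result{} 19
@end example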
901
902 @vindex parse-sexp-lookup-properties
903 The behavior of @code{parse-partial-sexp} is also affected by
904 @code{parse-sexp-lookup-properties} (@pxref{Syntax Properties}).
905
906 You can use @code{forward-comment} to move forward or backward over
907 one comment or several comments.
908
909 @node Standard Syntax Tables
910 @section Some Standard Syntax Tables
911
912 Most of the major modes in Emacs have their own syntax tables. Here
913 are several of them:
914
915 @defun standard-syntax-table
916 This function returns the standard syntax table, which is the syntax
917 table used in Fundamental mode.
918 @end defun
919
920 @defvar text-mode-syntax-table
921 The value of this variable is the syntax table used in Text mode.
922 @end defvar
923
924 @defvar c-mode-syntax-table
925 The value of this variable is the syntax table for C-mode buffers.
926 @end defvar
927
928 @defvar emacs-lisp-mode-syntax-table
929 The value of this variable is the syntax table used in Emacs Lisp mode
930 by editing commands. (It has no effect on the Lisp @code{read}
931 function.)
932 @end defvar
933
934 @node Syntax Table Internals
935 @section Syntax Table Internals
936 @cindex syntax table internals
937
938 Lisp programs don't usually work with the elements of a syntax table
939 directly; the Lisp-level syntax table functions usually work with syntax
940 descriptors (@pxref{Syntax Descriptors}). Nonetheless, here we document
941 the internal format. This format is used mostly when manipulating
942 syntax properties.
943
944 Each element of a syntax table is a cons cell of the form
945 @code{(@var{syntax-code} . @var{matching-char})}. The @sc{car},
946 @var{syntax-code}, is an integer that encodes the syntax class and any
947 flags. The @sc{cdr}, @var{matching-char}, is the matching character if
948 one was specified, and @code{nil} otherwise.
949
950 This table gives the value of @var{syntax-code} which corresponds
951 to each syntactic type.
952
953 @multitable @columnfractions .05 .3 .3 .31
954 @item
955 @tab
956 @i{Integer} @i{Class}
957 @tab
958 @i{Integer} @i{Class}
959 @tab
960 @i{Integer} @i{Class}
961 @item
962 @tab
963 0 @ @ whitespace
964 @tab
965 5 @ @ close parenthesis
966 @tab
967 10 @ @ character quote
968 @item
969 @tab
970 1 @ @ punctuation
971 @tab
972 6 @ @ expression prefix
973 @tab
974 11 @ @ comment-start
975 @item
976 @tab
977 2 @ @ word
978 @tab
979 7 @ @ string quote
980 @tab
981 12 @ @ comment-end
982 @item
983 @tab
984 3 @ @ symbol
985 @tab
986 8 @ @ paired delimiter
987 @tab
988 13 @ @ inherit
989 @item
990 @tab
991 4 @ @ open parenthesis
992 @tab
993 9 @ @ escape
994 @tab
995 14 @ @ generic comment
996 @item
997 @tab
998 15 @ @ generic string
999 @end multitable
1000
1001 For example, the usual syntax value for @samp{(} is @code{(4 . 41)}.
1002 (41 is the character code for @samp{)}.)
1003
1004 The flags are encoded in higher order bits, starting 16 bits from the
1005 least significant bit. This table gives the power of two which
1006 corresponds to each syntax flag.
1007
1008 @multitable @columnfractions .05 .3 .3 .3
1009 @item
1010 @tab
1011 @i{Prefix} @i{Flag}
1012 @tab
1013 @i{Prefix} @i{Flag}
1014 @tab
1015 @i{Prefix} @i{Flag}
1016 @item
1017 @tab
1018 @samp{1} @ @ @code{(lsh 1 16)}
1019 @tab
1020 @samp{4} @ @ @code{(lsh 1 19)}
1021 @tab
1022 @samp{b} @ @ @code{(lsh 1 21)}
1023 @item
1024 @tab
1025 @samp{2} @ @ @code{(lsh 1 17)}
1026 @tab
1027 @samp{p} @ @ @code{(lsh 1 20)}
1028 @tab
1029 @samp{n} @ @ @code{(lsh 1 22)}
1030 @item
1031 @tab
1032 @samp{3} @ @ @code{(lsh 1 18)}
1033 @end multitable
1034
1035 @defun string-to-syntax desc
1036 This function returns the internal form corresponding to the syntax
1037 descriptor string @var{desc}. The value is a cons cell
1038 @code{(@var{syntax-code} . @var{matching-char})}.
1039 @end defun
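
  The following sketch shows a few conversions, including one
descriptor with flags. The descriptor strings are chosen only for
illustration. The last form checks the encoding against the class and
flag tables above.

@example
(string-to-syntax "()")
     @result{} (4 . 41)
(string-to-syntax "w")
     @result{} (2)
;; Punctuation (class 1) with the flags `2' and `3'.
(string-to-syntax ". 23")
     @result{} (393217)
(= (car (string-to-syntax ". 23"))
   (logior 1 (lsh 1 17) (lsh 1 18)))
     @result{} t
@end example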
1040
1041 @defun syntax-after pos
1042 This function returns the internal-form syntax value, a cons cell
1043 @code{(@var{syntax-code} . @var{matching-char})}, of the character after
1044 position @var{pos} in the buffer, taking account of syntax properties as
1045 well as the syntax table. If @var{pos} is outside the buffer's accessible
1046 portion (@pxref{Narrowing, accessible portion}), it returns @code{nil}.
1047 @end defun
1048
1049 @defun syntax-class syntax
1050 This function returns the syntax class, an integer, of the internal-form
1051 syntax value @var{syntax}. (It keeps only the low 16 bits of the @sc{car},
1052 masking off the higher bits that hold the flags.) If @var{syntax} is
1053 @code{nil}, it returns @code{nil}; this is so that evaluating the expression
1054
1055 @example
1056 (syntax-class (syntax-after pos))
1057 @end example
1058
1059 @noindent
1060 where @code{pos} is outside the buffer's accessible portion, will
1061 yield @code{nil} without signaling errors or producing wrong syntax
1062 class codes.
1063 @end defun
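
  As a brief sketch, assuming a temporary buffer that contains the
made-up text @samp{(foo)} and uses @code{emacs-lisp-mode-syntax-table},
the two functions combine like this. Position 100 lies outside the
buffer, so the last element is @code{nil}.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(foo)")
  (list (syntax-after 1)                     ; internal form for `('
        (syntax-class (syntax-after 1))      ; open parenthesis
        (syntax-class (syntax-after 2))      ; word
        (syntax-class (syntax-after 100))))  ; outside the buffer
     @result{} ((4 . 41) 4 2 nil)
@end example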
1064
1065 @node Categories
1066 @section Categories
1067 @cindex categories of characters
1068 @cindex character categories
1069
1070 @dfn{Categories} provide an alternate way of classifying characters
1071 syntactically. You can define several categories as needed, then
1072 independently assign each character to one or more categories. Unlike
1073 syntax classes, categories are not mutually exclusive; it is normal for
1074 one character to belong to several categories.
1075
1076 @cindex category table
1077 Each buffer has a @dfn{category table} which records which categories
1078 are defined and also which characters belong to each category. Each
1079 category table defines its own categories, but normally these are
1080 initialized by copying from the standard categories table, so that the
1081 standard categories are available in all modes.
1082
1083 Each category has a name, which is an @acronym{ASCII} printing character in
1084 the range @w{@samp{ }} to @samp{~}. You specify the name of a category
1085 when you define it with @code{define-category}.
1086
1087 The category table is actually a char-table (@pxref{Char-Tables}).
1088 The element of the category table at index @var{c} is a @dfn{category
1089 set}---a bool-vector---that indicates which categories character @var{c}
1090 belongs to. In this category set, if the element at index @var{cat} is
1091 @code{t}, that means category @var{cat} is a member of the set, and that
1092 character @var{c} belongs to category @var{cat}.
1093
1094 For the next three functions, the optional argument @var{table}
1095 defaults to the current buffer's category table.
1096
1097 @defun define-category char docstring &optional table
1098 This function defines a new category, with name @var{char} and
1099 documentation @var{docstring}, for the category table @var{table}.
1100
1101 Here's an example of defining a new category for characters that have
1102 strong right-to-left directionality (@pxref{Bidirectional Display})
1103 and using it in a special category table:
1104
1105 @example
1106 (defvar special-category-table-for-bidi
1107   (let ((category-table (make-category-table))
1108         (uniprop-table (unicode-property-table-internal 'bidi-class)))
1109     (define-category ?R "Characters of bidi-class R, AL, or RLO"
1110                      category-table)
1111     (map-char-table
1112      #'(lambda (key val)
1113          (if (memq val '(R AL RLO))
1114              (modify-category-entry key ?R category-table)))
1115      uniprop-table)
1116     category-table))
1117 @end example
1118 @end defun
1119
1120 @defun category-docstring category &optional table
1121 This function returns the documentation string of category @var{category}
1122 in category table @var{table}.
1123
1124 @example
1125 (category-docstring ?a)
1126 @result{} "ASCII"
1127 (category-docstring ?l)
1128 @result{} "Latin"
1129 @end example
1130 @end defun
1131
1132 @defun get-unused-category &optional table
1133 This function returns a category name (a character) which is not
1134 currently defined in @var{table}. If all possible categories are in use
1135 in @var{table}, it returns @code{nil}.
1136 @end defun
1137
1138 @defun category-table
1139 This function returns the current buffer's category table.
1140 @end defun
1141
1142 @defun category-table-p object
1143 This function returns @code{t} if @var{object} is a category table,
1144 otherwise @code{nil}.
1145 @end defun
1146
1147 @defun standard-category-table
1148 This function returns the standard category table.
1149 @end defun
1150
1151 @defun copy-category-table &optional table
1152 This function constructs a copy of @var{table} and returns it. If
1153 @var{table} is not supplied (or is @code{nil}), it returns a copy of the
1154 standard category table. Otherwise, an error is signaled if @var{table}
1155 is not a category table.
1156 @end defun
1157
1158 @defun set-category-table table
1159 This function makes @var{table} the category table for the current
1160 buffer. It returns @var{table}.
1161 @end defun
1162
1163 @defun make-category-table
1164 This creates and returns an empty category table. In an empty category
1165 table, no categories have been allocated, and no characters belong to
1166 any categories.
1167 @end defun
1168
1169 @defun make-category-set categories
1170 This function returns a new category set---a bool-vector---whose initial
1171 contents are the categories listed in the string @var{categories}. The
1172 elements of @var{categories} should be category names; the new category
1173 set has @code{t} for each of those categories, and @code{nil} for all
1174 other categories.
1175
1176 @example
1177 (make-category-set "al")
1178 @result{} #&128"\0\0\0\0\0\0\0\0\0\0\0\0\2\20\0\0"
1179 @end example
1180 @end defun
1181
1182 @defun char-category-set char
1183 This function returns the category set for character @var{char} in the
1184 current buffer's category table. This is the bool-vector which
1185 records which categories the character @var{char} belongs to. The
1186 function @code{char-category-set} does not allocate storage, because
1187 it returns the same bool-vector that exists in the category table.
1188
1189 @example
1190 (char-category-set ?a)
1191 @result{} #&128"\0\0\0\0\0\0\0\0\0\0\0\0\2\20\0\0"
1192 @end example
1193 @end defun
1194
1195 @defun category-set-mnemonics category-set
1196 This function converts the category set @var{category-set} into a string
1197 containing the characters that designate the categories that are members
1198 of the set.
1199
1200 @example
1201 (category-set-mnemonics (char-category-set ?a))
1202 @result{} "al"
1203 @end example
1204 @end defun
1205
1206 @defun modify-category-entry char category &optional table reset
1207 This function modifies the category set of @var{char} in category
1208 table @var{table} (which defaults to the current buffer's category
1209 table). @var{char} can be a character, or a cons cell of the form
1210 @code{(@var{min} . @var{max})}; in the latter case, the function
1211 modifies the category sets of all characters in the range between
1212 @var{min} and @var{max}, inclusive.
1213
1214 Normally, it modifies a category set by adding @var{category} to it.
1215 But if @var{reset} is non-@code{nil}, then it deletes @var{category}
1216 instead.
1217 @end defun
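
  The following sketch shows both forms of the @var{char} argument. The
category names @samp{V} and @samp{D} and their documentation strings
are hypothetical, defined in a fresh category table so that the
example does not disturb the standard categories.

@example
(let ((table (make-category-table)))
  ;; `V' and `D' are made-up categories for this example.
  (define-category ?V "Vowels" table)
  (define-category ?D "Digits" table)
  (dolist (c '(?a ?e ?i ?o ?u))
    (modify-category-entry c ?V table))
  ;; A cons cell adds the category to a whole range of characters.
  (modify-category-entry '(?0 . ?9) ?D table)
  (with-temp-buffer
    (set-category-table table)
    (mapcar (lambda (c)
              (category-set-mnemonics (char-category-set c)))
            '(?a ?5 ?b))))
     @result{} ("V" "D" "")
@end example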
1218
1219 @deffn Command describe-categories &optional buffer-or-name
1220 This command describes the category specifications in the current
1221 category table. It inserts the descriptions in a buffer, and then
1222 displays that buffer. If @var{buffer-or-name} is non-@code{nil}, it
1223 describes the category table of that buffer instead.
1224 @end deffn