Merge from emacs-24; up to 2012-04-26T03:04:36Z!cyd@gnu.org
1 @c -*-texinfo-*-
2 @c This is part of the GNU Emacs Lisp Reference Manual.
3 @c Copyright (C) 1990-1995, 1998-1999, 2001-2012
4 @c Free Software Foundation, Inc.
5 @c See the file elisp.texi for copying conditions.
6 @node Syntax Tables
7 @chapter Syntax Tables
8 @cindex parsing buffer text
9 @cindex syntax table
10 @cindex text parsing
11
12 A @dfn{syntax table} specifies the syntactic role of each character
13 in a buffer. It can be used to determine where words, symbols, and
14 other syntactic constructs begin and end. This information is used by
15 many Emacs facilities, including Font Lock mode (@pxref{Font Lock
16 Mode}) and the various complex movement commands (@pxref{Motion}).
17
18 @menu
19 * Basics: Syntax Basics. Basic concepts of syntax tables.
20 * Syntax Descriptors:: How characters are classified.
21 * Syntax Table Functions:: How to create, examine and alter syntax tables.
22 * Syntax Properties:: Overriding syntax with text properties.
23 * Motion and Syntax:: Moving over characters with certain syntaxes.
24 * Parsing Expressions:: Parsing balanced expressions
25 using the syntax table.
26 * Standard Syntax Tables:: Syntax tables used by various major modes.
27 * Syntax Table Internals:: How syntax table information is stored.
28 * Categories:: Another way of classifying character syntax.
29 @end menu
30
31 @node Syntax Basics
32 @section Syntax Table Concepts
33
34 A syntax table is a char-table (@pxref{Char-Tables}). The element at
35 index @var{c} describes the character with code @var{c}. The element's
36 value should be a list that encodes the syntax of the character in
37 question.
38
39 Syntax tables are used only for moving across text, not for the Emacs
40 Lisp reader. Emacs Lisp uses built-in syntactic rules when reading Lisp
41 expressions, and these rules cannot be changed. (Some Lisp systems
42 provide ways to redefine the read syntax, but we decided to leave this
43 feature out of Emacs Lisp for simplicity.)
44
45 Each buffer has its own major mode, and each major mode has its own
46 idea of the syntactic class of various characters. For example, in
47 Lisp mode, the character @samp{;} begins a comment, but in C mode, it
48 terminates a statement. To support these variations, Emacs makes the
49 syntax table local to each buffer. Typically, each major mode has its
50 own syntax table and installs that table in each buffer that uses that
51 mode. Changing this table alters the syntax in all those buffers as
52 well as in any buffers subsequently put in that mode. Occasionally
53 several similar modes share one syntax table. @xref{Example Major
54 Modes}, for an example of how to set up a syntax table.
55
56 A syntax table can inherit the data for some characters from the
57 standard syntax table, while specifying other characters itself. The
58 ``inherit'' syntax class means ``inherit this character's syntax from
the standard syntax table''. Consequently, changing the standard syntax
for a character affects every syntax table that inherits from it.
61
62 @defun syntax-table-p object
63 This function returns @code{t} if @var{object} is a syntax table.
64 @end defun
65
66 @node Syntax Descriptors
67 @section Syntax Descriptors
68 @cindex syntax class
69
70 The syntactic role of a character is called its @dfn{syntax class}.
71 Each syntax table specifies the syntax class of each character. There
72 is no necessary relationship between the class of a character in one
73 syntax table and its class in any other table.
74
75 Each syntax class is designated by a mnemonic character, which
76 serves as the name of the class when you need to specify a class.
77 Usually, this designator character is one that is often assigned that
78 class; however, its meaning as a designator is unvarying and
79 independent of what syntax that character currently has. Thus,
80 @samp{\} as a designator character always means ``escape character''
81 syntax, regardless of whether the @samp{\} character actually has that
82 syntax in the current syntax table.
83 @ifnottex
84 @xref{Syntax Class Table}, for a list of syntax classes.
85 @end ifnottex
86
87 @cindex syntax descriptor
A @dfn{syntax descriptor} is a Lisp string that describes the syntax
class and other syntactic properties of a character. To modify the
syntax of a character, call the function @code{modify-syntax-entry}
and pass a syntax descriptor as one of its arguments (@pxref{Syntax
Table Functions}).
93
94 The first character in a syntax descriptor designates the syntax
95 class. The second character specifies a matching character (e.g.@: in
96 Lisp, the matching character for @samp{(} is @samp{)}); if there is no
97 matching character, put a space there. Then come the characters for
98 any desired flags.
99
100 If no matching character or flags are needed, only one character
101 (specifying the syntax class) is sufficient.
102
103 For example, the syntax descriptor for the character @samp{*} in C
104 mode is @code{". 23"} (i.e., punctuation, matching character slot
105 unused, second character of a comment-starter, first character of a
comment-ender), and the descriptor for @samp{/} is @code{". 14"} (i.e.,
107 punctuation, matching character slot unused, first character of a
108 comment-starter, second character of a comment-ender).
109
110 @menu
111 * Syntax Class Table:: Table of syntax classes.
112 * Syntax Flags:: Additional flags each character can have.
113 @end menu
114
115 @node Syntax Class Table
116 @subsection Table of Syntax Classes
117
118 Here is a table of syntax classes, the characters that designate
119 them, their meanings, and examples of their use.
120
121 @table @asis
122 @item Whitespace characters: @samp{@ } or @samp{-}
123 Characters that separate symbols and words from each other.
124 Typically, whitespace characters have no other syntactic significance,
125 and multiple whitespace characters are syntactically equivalent to a
126 single one. Space, tab, and formfeed are classified as whitespace in
127 almost all major modes.
128
129 This syntax class can be designated by either @w{@samp{@ }} or
130 @samp{-}. Both designators are equivalent.
131
132 @item Word constituents: @samp{w}
133 Parts of words in human languages. These are typically used in
134 variable and command names in programs. All upper- and lower-case
135 letters, and the digits, are typically word constituents.
136
137 @item Symbol constituents: @samp{_}
138 Extra characters used in variable and command names along with word
139 constituents. Examples include the characters @samp{$&*+-_<>} in Lisp
140 mode, which may be part of a symbol name even though they are not part
141 of English words. In standard C, the only non-word-constituent
142 character that is valid in symbols is underscore (@samp{_}).
143
144 @item Punctuation characters: @samp{.}
145 Characters used as punctuation in a human language, or used in a
146 programming language to separate symbols from one another. Some
147 programming language modes, such as Emacs Lisp mode, have no
148 characters in this class since the few characters that are not symbol
149 or word constituents all have other uses. Other programming language
150 modes, such as C mode, use punctuation syntax for operators.
151
152 @item Open parenthesis characters: @samp{(}
153 @itemx Close parenthesis characters: @samp{)}
154 Characters used in dissimilar pairs to surround sentences or
155 expressions. Such a grouping is begun with an open parenthesis
156 character and terminated with a close. Each open parenthesis
157 character matches a particular close parenthesis character, and vice
158 versa. Normally, Emacs indicates momentarily the matching open
159 parenthesis when you insert a close parenthesis. @xref{Blinking}.
160
161 In human languages, and in C code, the parenthesis pairs are
162 @samp{()}, @samp{[]}, and @samp{@{@}}. In Emacs Lisp, the delimiters
163 for lists and vectors (@samp{()} and @samp{[]}) are classified as
164 parenthesis characters.
165
166 @item String quotes: @samp{"}
167 Characters used to delimit string constants. The same string quote
168 character appears at the beginning and the end of a string. Such
169 quoted strings do not nest.
170
171 The parsing facilities of Emacs consider a string as a single token.
172 The usual syntactic meanings of the characters in the string are
173 suppressed.
174
175 The Lisp modes have two string quote characters: double-quote (@samp{"})
176 and vertical bar (@samp{|}). @samp{|} is not used in Emacs Lisp, but it
177 is used in Common Lisp. C also has two string quote characters:
178 double-quote for strings, and single-quote (@samp{'}) for character
179 constants.
180
181 Human text has no string quote characters. We do not want quotation
182 marks to turn off the usual syntactic properties of other characters
183 in the quotation.
184
185 @item Escape-syntax characters: @samp{\}
186 Characters that start an escape sequence, such as is used in string
187 and character constants. The character @samp{\} belongs to this class
188 in both C and Lisp. (In C, it is used thus only inside strings, but
189 it turns out to cause no trouble to treat it this way throughout C
190 code.)
191
192 Characters in this class count as part of words if
193 @code{words-include-escapes} is non-@code{nil}. @xref{Word Motion}.
194
195 @item Character quotes: @samp{/}
196 Characters used to quote the following character so that it loses its
197 normal syntactic meaning. This differs from an escape character in
198 that only the character immediately following is ever affected.
199
200 Characters in this class count as part of words if
201 @code{words-include-escapes} is non-@code{nil}. @xref{Word Motion}.
202
203 This class is used for backslash in @TeX{} mode.
204
205 @item Paired delimiters: @samp{$}
206 Similar to string quote characters, except that the syntactic
207 properties of the characters between the delimiters are not
208 suppressed. Only @TeX{} mode uses a paired delimiter presently---the
209 @samp{$} that both enters and leaves math mode.
210
211 @item Expression prefixes: @samp{'}
212 Characters used for syntactic operators that are considered as part of
213 an expression if they appear next to one. In Lisp modes, these
214 characters include the apostrophe, @samp{'} (used for quoting), the
215 comma, @samp{,} (used in macros), and @samp{#} (used in the read
216 syntax for certain data types).
217
218 @item Comment starters: @samp{<}
219 @itemx Comment enders: @samp{>}
220 @cindex comment syntax
221 Characters used in various languages to delimit comments. Human text
222 has no comment characters. In Lisp, the semicolon (@samp{;}) starts a
223 comment and a newline or formfeed ends one.
224
225 @item Inherit standard syntax: @samp{@@}
226 This syntax class does not specify a particular syntax. It says to
227 look in the standard syntax table to find the syntax of this
228 character.
229
230 @item Generic comment delimiters: @samp{!}
231 Characters that start or end a special kind of comment. @emph{Any}
232 generic comment delimiter matches @emph{any} generic comment
233 delimiter, but they cannot match a comment starter or comment ender;
234 generic comment delimiters can only match each other.
235
236 This syntax class is primarily meant for use with the
237 @code{syntax-table} text property (@pxref{Syntax Properties}). You
238 can mark any range of characters as forming a comment, by giving the
239 first and last characters of the range @code{syntax-table} properties
240 identifying them as generic comment delimiters.
241
242 @item Generic string delimiters: @samp{|}
243 Characters that start or end a string. This class differs from the
244 string quote class in that @emph{any} generic string delimiter can
245 match any other generic string delimiter; but they do not match
246 ordinary string quote characters.
247
248 This syntax class is primarily meant for use with the
249 @code{syntax-table} text property (@pxref{Syntax Properties}). You
250 can mark any range of characters as forming a string constant, by
251 giving the first and last characters of the range @code{syntax-table}
252 properties identifying them as generic string delimiters.
253 @end table
254
255 @node Syntax Flags
256 @subsection Syntax Flags
257 @cindex syntax flags
258
259 In addition to the classes, entries for characters in a syntax table
260 can specify flags. There are eight possible flags, represented by the
261 characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b}, @samp{c},
262 @samp{n}, and @samp{p}.
263
264 All the flags except @samp{p} are used to describe comment
delimiters. The digit flags are used for comment delimiters made up
of two characters. They indicate that a character can @emph{also} be
267 part of a comment sequence, in addition to the syntactic properties
268 associated with its character class. The flags are independent of the
269 class and each other for the sake of characters such as @samp{*} in
270 C mode, which is a punctuation character, @emph{and} the second
271 character of a start-of-comment sequence (@samp{/*}), @emph{and} the
272 first character of an end-of-comment sequence (@samp{*/}). The flags
273 @samp{b}, @samp{c}, and @samp{n} are used to qualify the corresponding
274 comment delimiter.
275
276 Here is a table of the possible flags for a character @var{c},
277 and what they mean:
278
279 @itemize @bullet
280 @item
281 @samp{1} means @var{c} is the start of a two-character comment-start
282 sequence.
283
284 @item
285 @samp{2} means @var{c} is the second character of such a sequence.
286
287 @item
288 @samp{3} means @var{c} is the start of a two-character comment-end
289 sequence.
290
291 @item
292 @samp{4} means @var{c} is the second character of such a sequence.
293
294 @item
295 @samp{b} means that @var{c} as a comment delimiter belongs to the
alternative ``b'' comment style. For a two-character comment starter,
this flag is only significant on the second character, and for a
two-character comment ender it is only significant on the first
character.
299
300 @item
301 @samp{c} means that @var{c} as a comment delimiter belongs to the
302 alternative ``c'' comment style. For a two-character comment
303 delimiter, @samp{c} on either character makes it of style ``c''.
304
305 @item
306 @samp{n} on a comment delimiter character specifies
307 that this kind of comment can be nested. For a two-character
308 comment delimiter, @samp{n} on either character makes it
309 nestable.
310
311 Emacs supports several comment styles simultaneously in any one syntax
312 table. A comment style is a set of flags @samp{b}, @samp{c}, and
313 @samp{n}, so there can be up to 8 different comment styles.
314 Each comment delimiter has a style and only matches comment delimiters
315 of the same style. Thus if a comment starts with the comment-start
316 sequence of style ``bn'', it will extend until the next matching
317 comment-end sequence of style ``bn''.
318
319 The appropriate comment syntax settings for C++ can be as follows:
320
321 @table @asis
322 @item @samp{/}
323 @samp{124}
324 @item @samp{*}
325 @samp{23b}
326 @item newline
327 @samp{>}
328 @end table
329
330 This defines four comment-delimiting sequences:
331
332 @table @asis
333 @item @samp{/*}
334 This is a comment-start sequence for ``b'' style because the
335 second character, @samp{*}, has the @samp{b} flag.
336
337 @item @samp{//}
338 This is a comment-start sequence for ``a'' style because the second
339 character, @samp{/}, does not have the @samp{b} flag.
340
341 @item @samp{*/}
342 This is a comment-end sequence for ``b'' style because the first
343 character, @samp{*}, has the @samp{b} flag.
344
345 @item newline
346 This is a comment-end sequence for ``a'' style, because the newline
347 character does not have the @samp{b} flag.
348 @end table
349
350 @item
351 @c Emacs 19 feature
352 @samp{p} identifies an additional ``prefix character'' for Lisp syntax.
353 These characters are treated as whitespace when they appear between
354 expressions. When they appear within an expression, they are handled
355 according to their usual syntax classes.
356
357 The function @code{backward-prefix-chars} moves back over these
358 characters, as well as over characters whose primary syntax class is
359 prefix (@samp{'}). @xref{Motion and Syntax}.
360 @end itemize
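
As a rough sketch of how the C++ settings listed above behave, the
following form installs them in a fresh syntax table and scans a line
comment followed by a block comment. It assumes a temporary buffer
that starts out with the standard syntax table, and it uses
@code{forward-comment} (@pxref{Motion via Parsing}) only to check the
result.

@example
@group
(with-temp-buffer
  (let ((table (make-syntax-table)))
    ;; The C++ comment settings from the table above.
    (modify-syntax-entry ?/ ". 124" table)
    (modify-syntax-entry ?* ". 23b" table)
    (modify-syntax-entry ?\n ">" table)
    (set-syntax-table table))
  (insert "// one\n/* two */x")
  (goto-char (point-min))
  ;; Skip both comments; point ends up just before the "x".
  (list (forward-comment 2) (string (char-after))))
     @result{} (t "x")
@end group
@end example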
361
362 @node Syntax Table Functions
363 @section Syntax Table Functions
364
365 In this section we describe functions for creating, accessing and
366 altering syntax tables.
367
368 @defun make-syntax-table &optional table
369 This function creates a new syntax table, with all values initialized
370 to @code{nil}. If @var{table} is non-@code{nil}, it becomes the
371 parent of the new syntax table, otherwise the standard syntax table is
372 the parent. Like all char-tables, a syntax table inherits from its
373 parent. Thus the original syntax of all characters in the returned
374 syntax table is determined by the parent. @xref{Char-Tables}.
375
376 Most major mode syntax tables are created in this way.
377 @end defun
378
379 @defun copy-syntax-table &optional table
380 This function constructs a copy of @var{table} and returns it. If
381 @var{table} is not supplied (or is @code{nil}), it returns a copy of the
382 standard syntax table. Otherwise, an error is signaled if @var{table} is
383 not a syntax table.
384 @end defun
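
The difference between inheriting from a table and copying it can be
seen in a small experiment. The sketch below assumes that @samp{+}
has symbol-constituent syntax in the standard syntax table, as it does
in an Emacs session that has not altered that table; the variable
names are only illustrative.

@example
@group
(let* ((parent (make-syntax-table))
       (child (make-syntax-table parent))
       (copy (copy-syntax-table parent)))
  ;; Change PARENT after CHILD and COPY have been created.
  (modify-syntax-entry ?+ "w" parent)
  (list (with-syntax-table child (string (char-syntax ?+)))
        (with-syntax-table copy (string (char-syntax ?+)))))
     @result{} ("w" "_")
@end group
@end example

The table made with @code{make-syntax-table} sees the later change
because it looks the character up in its parent, whereas the copy does
not.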
385
386 @deffn Command modify-syntax-entry char syntax-descriptor &optional table
387 This function sets the syntax entry for @var{char} according to
388 @var{syntax-descriptor}. @var{char} must be a character, or a cons
389 cell of the form @code{(@var{min} . @var{max})}; in the latter case,
390 the function sets the syntax entries for all characters in the range
391 between @var{min} and @var{max}, inclusive.
392
393 The syntax is changed only for @var{table}, which defaults to the
394 current buffer's syntax table, and not in any other syntax table.
395
396 The argument @var{syntax-descriptor} is a syntax descriptor for the
397 desired syntax (i.e.@: a string beginning with a class designator
398 character, and optionally containing a matching character and syntax
399 flags). An error is signaled if the first character is not one of the
400 seventeen syntax class designators. @xref{Syntax Descriptors}.
401
402 This function always returns @code{nil}. The old syntax information in
403 the table for this character is discarded.
404
405 @example
406 @group
407 @exdent @r{Examples:}
408
409 ;; @r{Put the space character in class whitespace.}
410 (modify-syntax-entry ?\s " ")
411 @result{} nil
412 @end group
413
414 @group
415 ;; @r{Make @samp{$} an open parenthesis character,}
416 ;; @r{with @samp{^} as its matching close.}
417 (modify-syntax-entry ?$ "(^")
418 @result{} nil
419 @end group
420
421 @group
422 ;; @r{Make @samp{^} a close parenthesis character,}
423 ;; @r{with @samp{$} as its matching open.}
424 (modify-syntax-entry ?^ ")$")
425 @result{} nil
426 @end group
427
428 @group
429 ;; @r{Make @samp{/} a punctuation character,}
430 ;; @r{the first character of a start-comment sequence,}
431 ;; @r{and the second character of an end-comment sequence.}
432 ;; @r{This is used in C mode.}
433 (modify-syntax-entry ?/ ". 14")
434 @result{} nil
435 @end group
436 @end example
437 @end deffn
438
439 @defun char-syntax character
440 This function returns the syntax class of @var{character}, represented
441 by its mnemonic designator character. This returns @emph{only} the
442 class, not any matching parenthesis or flags.
443
An error is signaled if @var{character} is not a character.
445
446 The following examples apply to C mode. The first example shows that
447 the syntax class of space is whitespace (represented by a space). The
448 second example shows that the syntax of @samp{/} is punctuation. This
449 does not show the fact that it is also part of comment-start and -end
450 sequences. The third example shows that open parenthesis is in the class
451 of open parentheses. This does not show the fact that it has a matching
452 character, @samp{)}.
453
454 @example
455 @group
456 (string (char-syntax ?\s))
457 @result{} " "
458 @end group
459
460 @group
461 (string (char-syntax ?/))
462 @result{} "."
463 @end group
464
465 @group
466 (string (char-syntax ?\())
467 @result{} "("
468 @end group
469 @end example
470
471 We use @code{string} to make it easier to see the character returned by
472 @code{char-syntax}.
473 @end defun
474
475 @defun set-syntax-table table
476 This function makes @var{table} the syntax table for the current buffer.
477 It returns @var{table}.
478 @end defun
479
480 @defun syntax-table
481 This function returns the current syntax table, which is the table for
482 the current buffer.
483 @end defun
484
485 @defmac with-syntax-table @var{table} @var{body}@dots{}
486 This macro executes @var{body} using @var{table} as the current syntax
487 table. It returns the value of the last form in @var{body}, after
488 restoring the old current syntax table.
489
490 Since each buffer has its own current syntax table, we should make that
491 more precise: @code{with-syntax-table} temporarily alters the current
492 syntax table of whichever buffer is current at the time the macro
493 execution starts. Other buffers are not affected.
494 @end defmac
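
For instance, a command might want @samp{_} to count as a word
constituent for the duration of a single motion. The following
sketch assumes a temporary buffer using the standard syntax table, in
which @samp{_} is a symbol constituent; it also checks that the
buffer's own table is back in effect once the macro returns.

@example
@group
(with-temp-buffer
  (insert "foo_bar baz")
  (goto-char (point-min))
  (let ((original (syntax-table))
        (table (make-syntax-table)))
    (modify-syntax-entry ?_ "w" table)
    (with-syntax-table table
      ;; Moves over "foo_bar" as a single word.
      (forward-word 1))
    (list (point) (eq (syntax-table) original))))
     @result{} (8 t)
@end group
@end example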
495
496 @node Syntax Properties
497 @section Syntax Properties
498 @kindex syntax-table @r{(text property)}
499
500 When the syntax table is not flexible enough to specify the syntax of
501 a language, you can override the syntax table for specific character
502 occurrences in the buffer, by applying a @code{syntax-table} text
503 property. @xref{Text Properties}, for how to apply text properties.
504
505 The valid values of @code{syntax-table} text property are:
506
507 @table @asis
508 @item @var{syntax-table}
509 If the property value is a syntax table, that table is used instead of
510 the current buffer's syntax table to determine the syntax for the
511 underlying text character.
512
513 @item @code{(@var{syntax-code} . @var{matching-char})}
A cons cell of this format specifies the syntax for the underlying
text character (@pxref{Syntax Table Internals}).
516
517 @item @code{nil}
518 If the property is @code{nil}, the character's syntax is determined from
519 the current syntax table in the usual way.
520 @end table
521
522 @defvar parse-sexp-lookup-properties
523 If this is non-@code{nil}, the syntax scanning functions, like
524 @code{forward-sexp}, pay attention to syntax text properties.
525 Otherwise they use only the current syntax table.
526 @end defvar
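
As an illustration, the sketch below marks one occurrence of @samp{<}
and one of @samp{>} as a matching pair of parentheses without touching
the syntax table. It relies on @code{string-to-syntax} to convert a
syntax descriptor into the cons cell form (@pxref{Syntax Table
Internals}), and on @code{scan-sexps} (@pxref{Motion via Parsing}) to
observe the effect. It assumes a temporary buffer with the standard
syntax table, where both characters are symbol constituents.

@example
@group
(with-temp-buffer
  (insert "<a b> c")
  (put-text-property 1 2 'syntax-table (string-to-syntax "(>"))
  (put-text-property 5 6 'syntax-table (string-to-syntax ")<"))
  (list
   ;; Properties ignored: the first sexp is the symbol "<a".
   (let ((parse-sexp-lookup-properties nil))
     (scan-sexps 1 1))
   ;; Properties honored: the first sexp is the group "<a b>".
   (let ((parse-sexp-lookup-properties t))
     (scan-sexps 1 1))))
     @result{} (3 6)
@end group
@end example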
527
528 @defvar syntax-propertize-function
529 This variable, if non-@code{nil}, should store a function for applying
530 @code{syntax-table} properties to a specified stretch of text. It is
531 intended to be used by major modes to install a function which applies
532 @code{syntax-table} properties in some mode-appropriate way.
533
534 The function is called by @code{syntax-ppss} (@pxref{Position Parse}),
535 and by Font Lock mode during syntactic fontification (@pxref{Syntactic
536 Font Lock}). It is called with two arguments, @var{start} and
537 @var{end}, which are the starting and ending positions of the text on
538 which it should act. It is allowed to call @code{syntax-ppss} on any
position before @var{end}. However, it should not call
@code{syntax-ppss-flush-cache}; this means that it must not call
@code{syntax-ppss} on some position and later modify the buffer at an
earlier position.
543 @end defvar
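
The following is a minimal sketch of such a function, not taken from
any existing mode. It assumes a hypothetical major mode, here called
@code{my-mode}, in which an apostrophe directly after a word character
is a ``prime'' mark rather than a string quote; the function name and
the regular expression are illustrative only.

@example
@group
(defun my-mode-syntax-propertize (start end)
  "Give punctuation syntax to apostrophes that follow a word character."
  (goto-char start)
  (while (re-search-forward "\\w\\('\\)" end t)
    (put-text-property (match-beginning 1) (match-end 1)
                       'syntax-table (string-to-syntax "."))))
@end group

@group
;; In the body of the hypothetical major mode:
(set (make-local-variable 'syntax-propertize-function)
     #'my-mode-syntax-propertize)
@end group
@end example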
544
545 @defvar syntax-propertize-extend-region-functions
546 This abnormal hook is run by the syntax parsing code prior to calling
547 @code{syntax-propertize-function}. Its role is to help locate safe
548 starting and ending buffer positions for passing to
549 @code{syntax-propertize-function}. For example, a major mode can add
550 a function to this hook to identify multi-line syntactic constructs,
551 and ensure that the boundaries do not fall in the middle of one.
552
553 Each function in this hook should accept two arguments, @var{start}
554 and @var{end}. It should return either a cons cell of two adjusted
555 buffer positions, @code{(@var{new-start} . @var{new-end})}, or
556 @code{nil} if no adjustment is necessary. The hook functions are run
557 in turn, repeatedly, until they all return @code{nil}.
558 @end defvar
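
A function for this hook might look like the sketch below, which
widens the region to whole lines. The function name is hypothetical.
Note that it returns @code{nil} as soon as the region needs no further
adjustment; otherwise the hook would never finish running.

@example
@group
(defun my-mode-extend-to-whole-lines (start end)
  "Return (NEW-START . NEW-END) covering whole lines, or nil if unchanged."
  (let ((new-start (save-excursion (goto-char start)
                                   (line-beginning-position)))
        (new-end (save-excursion (goto-char end)
                                 (line-end-position))))
    ;; Returning nil once nothing changes lets the hook terminate.
    (unless (and (= new-start start) (= new-end end))
      (cons new-start new-end))))
@end group

@group
;; In the body of the hypothetical major mode:
(add-hook 'syntax-propertize-extend-region-functions
          #'my-mode-extend-to-whole-lines nil t)
@end group
@end example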
559
560 @node Motion and Syntax
561 @section Motion and Syntax
562
563 This section describes functions for moving across characters that
564 have certain syntax classes.
565
566 @defun skip-syntax-forward syntaxes &optional limit
567 This function moves point forward across characters having syntax
568 classes mentioned in @var{syntaxes} (a string of syntax class
569 characters). It stops when it encounters the end of the buffer, or
570 position @var{limit} (if specified), or a character it is not supposed
571 to skip.
572
573 If @var{syntaxes} starts with @samp{^}, then the function skips
574 characters whose syntax is @emph{not} in @var{syntaxes}.
575
576 The return value is the distance traveled, which is a nonnegative
577 integer.
578 @end defun
579
580 @defun skip-syntax-backward syntaxes &optional limit
581 This function moves point backward across characters whose syntax
582 classes are mentioned in @var{syntaxes}. It stops when it encounters
583 the beginning of the buffer, or position @var{limit} (if specified), or
584 a character it is not supposed to skip.
585
586 If @var{syntaxes} starts with @samp{^}, then the function skips
587 characters whose syntax is @emph{not} in @var{syntaxes}.
588
589 The return value indicates the distance traveled. It is an integer that
590 is zero or less.
591 @end defun
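
The two functions can be tried out together as follows. This sketch
assumes a temporary buffer with the standard syntax table, in which
letters are word constituents and the space character is whitespace.

@example
@group
(with-temp-buffer
  (insert "foo   bar")
  (goto-char (point-min))
  (list (skip-syntax-forward "w")    ; over "foo"
        (skip-syntax-forward " ")    ; over the three spaces
        (skip-syntax-forward "^ ")   ; over "bar", i.e. non-whitespace
        (skip-syntax-backward "w")   ; back over "bar"
        (point)))
     @result{} (3 3 3 -3 7)
@end group
@end example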
592
593 @defun backward-prefix-chars
594 This function moves point backward over any number of characters with
595 expression prefix syntax. This includes both characters in the
596 expression prefix syntax class, and characters with the @samp{p} flag.
597 @end defun
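
For example, with the Emacs Lisp mode syntax table, in which the
apostrophe has expression prefix syntax, the function moves from just
before an open parenthesis to just before the quote that precedes it.
The sketch uses the variable @code{emacs-lisp-mode-syntax-table} and a
temporary buffer.

@example
@group
(with-temp-buffer
  (insert "a '(b)")
  (with-syntax-table emacs-lisp-mode-syntax-table
    (goto-char 4)              ; just before the open parenthesis
    (backward-prefix-chars)
    (point)))
     @result{} 3
@end group
@end example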
598
599 @node Parsing Expressions
600 @section Parsing Expressions
601
602 This section describes functions for parsing and scanning balanced
603 expressions. We will refer to such expressions as @dfn{sexps},
604 following the terminology of Lisp, even though these functions can act
605 on languages other than Lisp. Basically, a sexp is either a balanced
606 parenthetical grouping, a string, or a ``symbol'' (i.e.@: a sequence
607 of characters whose syntax is either word constituent or symbol
608 constituent). However, characters in the expression prefix syntax
609 class (@pxref{Syntax Class Table}) are treated as part of the sexp if
610 they appear next to it.
611
612 The syntax table controls the interpretation of characters, so these
613 functions can be used for Lisp expressions when in Lisp mode and for C
614 expressions when in C mode. @xref{List Motion}, for convenient
615 higher-level functions for moving over balanced expressions.
616
A character's syntax controls how it changes the state of the
parser, rather than describing the state itself. For example, a
string delimiter character toggles the parser state between
``in-string'' and ``in-code'', but the syntax of characters does not
directly say whether they are inside a string. Thus, the following
form (note that 15 is the syntax code for generic string delimiters)
623
624 @example
625 (put-text-property 1 9 'syntax-table '(15 . nil))
626 @end example
627
628 @noindent
629 does not tell Emacs that the first eight chars of the current buffer
630 are a string, but rather that they are all string delimiters. As a
631 result, Emacs treats them as four consecutive empty string constants.
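
By contrast, to make a stretch of text behave as one string constant,
give the generic string delimiter syntax only to its first and last
characters. The sketch below assumes a temporary buffer, and binds
@code{parse-sexp-lookup-properties} (@pxref{Syntax Properties}) so
that the scanner honors the property.

@example
@group
(with-temp-buffer
  (insert "abcdefgh ij")
  ;; Only the first and eighth characters become delimiters.
  (put-text-property 1 2 'syntax-table '(15 . nil))
  (put-text-property 8 9 'syntax-table '(15 . nil))
  (let ((parse-sexp-lookup-properties t))
    ;; The first eight characters now form a single sexp.
    (scan-sexps 1 1)))
     @result{} 9
@end group
@end example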
632
633 @menu
634 * Motion via Parsing:: Motion functions that work by parsing.
635 * Position Parse:: Determining the syntactic state of a position.
636 * Parser State:: How Emacs represents a syntactic state.
637 * Low-Level Parsing:: Parsing across a specified region.
638 * Control Parsing:: Parameters that affect parsing.
639 @end menu
640
641 @node Motion via Parsing
642 @subsection Motion Commands Based on Parsing
643
644 This section describes simple point-motion functions that operate
645 based on parsing expressions.
646
647 @defun scan-lists from count depth
648 This function scans forward @var{count} balanced parenthetical
649 groupings from position @var{from}. It returns the position where the
650 scan stops. If @var{count} is negative, the scan moves backwards.
651
652 If @var{depth} is nonzero, treat the starting position as being
653 @var{depth} parentheses deep. The scanner moves forward or backward
654 through the buffer until the depth changes to zero @var{count} times.
655 Hence, a positive value for @var{depth} has the effect of moving out
656 @var{depth} levels of parenthesis from the starting position, while a
657 negative @var{depth} has the effect of moving deeper by @var{-depth}
658 levels of parenthesis.
659
660 Scanning ignores comments if @code{parse-sexp-ignore-comments} is
661 non-@code{nil}.
662
663 If the scan reaches the beginning or end of the accessible part of the
664 buffer before it has scanned over @var{count} parenthetical groupings,
665 the return value is @code{nil} if the depth at that point is zero; if
666 the depth is non-zero, a @code{scan-error} error is signaled.
667 @end defun
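
The effect of @var{count} and @var{depth} is perhaps easiest to see on
a small buffer. The sketch below assumes a temporary buffer with the
standard syntax table, in which @samp{(} and @samp{)} are a matching
pair of parentheses.

@example
@group
(with-temp-buffer
  (insert "(a (b c) d) (e)")
  (list (scan-lists 1 1 0)     ; over the first group
        (scan-lists 1 2 0)     ; over both top-level groups
        (scan-lists 5 1 1)     ; from inside "(b c)", out one level
        (scan-lists 1 1 -1)))  ; into the first group
     @result{} (12 16 9 2)
@end group
@end example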
668
669 @defun scan-sexps from count
670 This function scans forward @var{count} sexps from position @var{from}.
671 It returns the position where the scan stops. If @var{count} is
672 negative, the scan moves backwards.
673
674 Scanning ignores comments if @code{parse-sexp-ignore-comments} is
675 non-@code{nil}.
676
677 If the scan reaches the beginning or end of (the accessible part of) the
678 buffer while in the middle of a parenthetical grouping, an error is
signaled. If it reaches the beginning or end between groupings but
before @var{count} is used up, @code{nil} is returned.
681 @end defun
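
Here is a short illustration, again assuming a temporary buffer with
the standard syntax table, in which the double-quote is a string quote
character. The buffer contains three sexps: a symbol, a parenthetical
grouping, and a string.

@example
@group
(with-temp-buffer
  (insert "foo (bar baz) \"str\"")
  (list (scan-sexps 1 1)              ; over the symbol
        (scan-sexps 1 2)              ; and over the grouping
        (scan-sexps 1 3)              ; and over the string
        (scan-sexps (point-max) -1)   ; backward over the string
        (scan-sexps 1 4)))            ; there is no fourth sexp
     @result{} (4 14 20 15 nil)
@end group
@end example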
682
683 @defun forward-comment count
684 This function moves point forward across @var{count} complete comments
685 (that is, including the starting delimiter and the terminating
686 delimiter if any), plus any whitespace encountered on the way. It
687 moves backward if @var{count} is negative. If it encounters anything
688 other than a comment or whitespace, it stops, leaving point at the
689 place where it stopped. This includes (for instance) finding the end
690 of a comment when moving forward and expecting the beginning of one.
691 The function also stops immediately after moving over the specified
692 number of complete comments. If @var{count} comments are found as
693 expected, with nothing except whitespace between them, it returns
694 @code{t}; otherwise it returns @code{nil}.
695
696 This function cannot tell whether the ``comments'' it traverses are
697 embedded within a string. If they look like comments, it treats them
698 as comments.
699
700 To move forward over all comments and whitespace following point, use
701 @code{(forward-comment (buffer-size))}. @code{(buffer-size)} is a
702 good argument to use, because the number of comments in the buffer
703 cannot exceed that many.
704 @end defun
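
The sketch below shows both a fixed count and the @code{(buffer-size)}
idiom. Because a temporary buffer has no comment syntax of its own,
it first installs a table with the C-style entries for @samp{/} and
@samp{*} described in @ref{Syntax Descriptors}.

@example
@group
(with-temp-buffer
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?/ ". 14" table)
    (modify-syntax-entry ?* ". 23" table)
    (set-syntax-table table))
  (insert "/* one */  /* two */ x")
  (goto-char (point-min))
  (list (forward-comment 1)              ; one complete comment
        (point)
        (forward-comment (buffer-size))  ; stops at the "x"
        (point)))
     @result{} (t 10 nil 22)
@end group
@end example

The second call returns @code{nil} because it runs into @samp{x}
before it has moved over the requested number of comments.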
705
706 @node Position Parse
707 @subsection Finding the Parse State for a Position
708
Syntactic analysis, such as that done for indentation, often needs
the syntactic state corresponding to a given buffer position. The
function @code{syntax-ppss} computes that state conveniently.
712
713 @defun syntax-ppss &optional pos
714 This function returns the parser state that the parser would reach at
715 position @var{pos} starting from the beginning of the buffer.
716 @iftex
717 See the next section for
718 @end iftex
719 @ifnottex
720 @xref{Parser State},
721 @end ifnottex
722 for a description of the parser state.
723
724 The return value is the same as if you called the low-level parsing
725 function @code{parse-partial-sexp} to parse from the beginning of the
726 buffer to @var{pos} (@pxref{Low-Level Parsing}). However,
727 @code{syntax-ppss} uses a cache to speed up the computation. Due to
728 this optimization, the second value (previous complete subexpression)
729 and sixth value (minimum parenthesis depth) in the returned parser
730 state are not meaningful.
731
732 This function has a side effect: it adds a buffer-local entry to
733 @code{before-change-functions} (@pxref{Change Hooks}) for
734 @code{syntax-ppss-flush-cache} (see below). This entry keeps the
735 cache consistent as the buffer is modified. However, the cache might
736 not be updated if @code{syntax-ppss} is called while
737 @code{before-change-functions} is temporarily let-bound, or if the
738 buffer is modified without running the hook, such as when using
739 @code{inhibit-modification-hooks}. In those cases, it is necessary to
740 call @code{syntax-ppss-flush-cache} explicitly.
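
A common use is to test whether a position lies inside a string or a
comment.  The helper below is only a sketch; its name,
@code{my-in-string-or-comment-p}, is hypothetical and not part of
Emacs.  It wraps the call in @code{save-excursion} on the assumption
that the parse may leave point at @var{pos}.

@example
(defun my-in-string-or-comment-p (&optional pos)
  "Return non-nil if POS (default: point) is in a string or comment."
  (let ((state (save-excursion (syntax-ppss pos))))
    ;; Element 3 is non-nil inside a string,
    ;; element 4 is non-nil inside a comment.
    (or (nth 3 state) (nth 4 state))))
@end example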
741 @end defun
742
743 @defun syntax-ppss-flush-cache beg &rest ignored-args
744 This function flushes the cache used by @code{syntax-ppss}, starting
745 at position @var{beg}. The remaining arguments, @var{ignored-args},
746 are ignored; this function accepts them so that it can be directly
747 used on hooks such as @code{before-change-functions} (@pxref{Change
748 Hooks}).
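
For instance, code that edits the buffer with change hooks inhibited
could flush the cache itself.  This is a minimal sketch under that
assumption; the function name @code{my-insert-without-hooks} is
hypothetical.

@example
(defun my-insert-without-hooks (string)
  "Insert STRING with change hooks inhibited, then flush the cache."
  (let ((start (point))
        (inhibit-modification-hooks t))
    (insert string)
    ;; The hook entry did not run, so discard the cached
    ;; parser states from START onward by hand.
    (syntax-ppss-flush-cache start)))
@end example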
749 @end defun
750
751 Major modes can make @code{syntax-ppss} run faster by specifying
752 where it needs to start parsing.
753
754 @defvar syntax-begin-function
755 If this is non-@code{nil}, it should be a function that moves to an
756 earlier buffer position where the parser state is equivalent to
757 @code{nil}---in other words, a position outside of any comment,
758 string, or parenthesis. @code{syntax-ppss} uses it to further
759 optimize its computations, when the cache gives no help.
760 @end defvar
761
762 @node Parser State
763 @subsection Parser State
764 @cindex parser state
765
766 A @dfn{parser state} is a list of ten elements describing the state
767 of the syntactic parser, after it parses the text between a specified
768 starting point and a specified end point in the buffer. Parsing
769 functions such as @code{syntax-ppss}
770 @ifnottex
771 (@pxref{Position Parse})
772 @end ifnottex
773 return a parser state as the value. Some parsing functions accept a
774 parser state as an argument, for resuming parsing.
775
776 Here are the meanings of the elements of the parser state:
777
778 @enumerate 0
779 @item
780 The depth in parentheses, counting from 0. @strong{Warning:} this can
781 be negative if there are more close parens than open parens between
782 the parser's starting point and end point.
783
784 @item
785 @cindex innermost containing parentheses
786 The character position of the start of the innermost parenthetical
787 grouping containing the stopping point; @code{nil} if none.
788
789 @item
790 @cindex previous complete subexpression
791 The character position of the start of the last complete subexpression
792 terminated; @code{nil} if none.
793
794 @item
795 @cindex inside string
796 Non-@code{nil} if inside a string. More precisely, this is the
797 character that will terminate the string, or @code{t} if a generic
798 string delimiter character should terminate it.
799
800 @item
801 @cindex inside comment
802 @code{t} if inside a non-nestable comment (of any comment style;
803 @pxref{Syntax Flags}); or the comment nesting level if inside a
804 comment that can be nested.
805
806 @item
807 @cindex quote character
808 @code{t} if the end point is just after a quote character.
809
810 @item
811 The minimum parenthesis depth encountered during this scan.
812
813 @item
814 What kind of comment is active: @code{nil} if not in a comment or in a
815 comment of style @samp{a}; 1 for a comment of style @samp{b}; 2 for a
816 comment of style @samp{c}; and @code{syntax-table} for a comment that
817 should be ended by a generic comment delimiter character.
818
819 @item
820 The string or comment start position. While inside a comment, this is
821 the position where the comment began; while inside a string, this is the
822 position where the string began. When outside of strings and comments,
823 this element is @code{nil}.
824
825 @item
826 Internal data for continuing the parsing. The meaning of this
827 data is subject to change; it is used if you pass this list
828 as the @var{state} argument to another call.
829 @end enumerate
830
831 Elements 1, 2, and 6 are ignored in a state which you pass as an
832 argument to continue parsing, and elements 8 and 9 are used only in
833 trivial cases. Those elements are mainly used internally by the
834 parser code.
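
To illustrate the elements listed above, here is a sketch that parses
a made-up fragment ending inside a string nested two lists deep, and
picks out a few elements with @code{nth}.  It uses
@code{emacs-lisp-mode-syntax-table} so that @samp{"} is a string
delimiter.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(a (b \"c")
  (let ((state (syntax-ppss (point-max))))
    (list (nth 0 state)      ; depth in parentheses
          (nth 1 state)      ; start of innermost list
          (nth 3 state)      ; character that ends the string
          (nth 8 state))))   ; start of the string
     @result{} (2 4 34 7)
@end example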
835
836 One additional piece of useful information is available from a
837 parser state using this function:
838
839 @defun syntax-ppss-toplevel-pos state
840 This function extracts, from parser state @var{state}, the last
841 position scanned in the parse which was at top level in grammatical
842 structure. ``At top level'' means outside of any parentheses,
843 comments, or strings.
844
845 The value is @code{nil} if @var{state} represents a parse which has
846 arrived at a top level position.
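
As a sketch with made-up buffer text, the first call below parses to a
position inside a string within nested lists, so the last top-level
position is the start of the outermost list; the second call parses
to the beginning of the buffer, which is itself at top level.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(defun f () (g \"x")
  (list (syntax-ppss-toplevel-pos (syntax-ppss (point-max)))
        (syntax-ppss-toplevel-pos (syntax-ppss (point-min)))))
     @result{} (1 nil)
@end example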
847 @end defun
848
849 @node Low-Level Parsing
850 @subsection Low-Level Parsing
851
852 The most basic way to use the expression parser is to tell it
853 to start at a given position with a certain state, and parse up to
854 a specified end position.
855
856 @defun parse-partial-sexp start limit &optional target-depth stop-before state stop-comment
857 This function parses a sexp in the current buffer starting at
858 @var{start}, not scanning past @var{limit}. It stops at position
859 @var{limit} or when certain criteria described below are met, and sets
860 point to the location where parsing stops. It returns a parser state
861 describing the status of the parse at the point where it stops.
862
863 @cindex parenthesis depth
864 If the third argument @var{target-depth} is non-@code{nil}, parsing
865 stops if the depth in parentheses becomes equal to @var{target-depth}.
866 The depth starts at 0, or at whatever is given in @var{state}.
867
868 If the fourth argument @var{stop-before} is non-@code{nil}, parsing
869 stops when it comes to any character that starts a sexp. If
870 @var{stop-comment} is non-@code{nil}, parsing stops when it comes to the
871 start of a comment. If @var{stop-comment} is the symbol
872 @code{syntax-table}, parsing stops after the start of a comment or a
873 string, or the end of a comment or a string, whichever comes first.
874
875 If @var{state} is @code{nil}, @var{start} is assumed to be at the top
876 level of parenthesis structure, such as the beginning of a function
877 definition. Alternatively, you might wish to resume parsing in the
878 middle of the structure. To do this, you must provide a @var{state}
879 argument that describes the initial status of parsing. The value
880 returned by a previous call to @code{parse-partial-sexp} will do
881 nicely.
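
The following sketch shows resuming a parse.  The buffer text and the
split position are made up for illustration: the first call stops
inside the inner list, and the second call continues from there using
the state the first one returned.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(a (b c) d")
  (let* ((mid 5)
         ;; Parse from the start up to MID, inside the inner list.
         (state1 (parse-partial-sexp (point-min) mid))
         ;; Resume at MID, passing the earlier state along.
         (state2 (parse-partial-sexp mid (point-max)
                                     nil nil state1)))
    ;; Element 0 of each state is the depth in parentheses.
    (list (nth 0 state1) (nth 0 state2))))
     @result{} (2 1)
@end example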
882 @end defun
883
884 @node Control Parsing
885 @subsection Parameters to Control Parsing
886
887 @defvar multibyte-syntax-as-symbol
888 If this variable is non-@code{nil}, @code{scan-sexps} treats all
889 non-@acronym{ASCII} characters as symbol constituents regardless
890 of what the syntax table says about them. (However, text properties
891 can still override the syntax.)
892 @end defvar
893
894 @defopt parse-sexp-ignore-comments
895 @cindex skipping comments
896 If the value is non-@code{nil}, then comments are treated as
897 whitespace by the functions in this section and by @code{forward-sexp},
898 @code{scan-lists} and @code{scan-sexps}.
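
Here is a sketch of the effect, using made-up text in which a comment
contains an unmatched open parenthesis.  With the variable bound to
@code{t}, @code{scan-sexps} skips the comment and returns the position
after the real expression.  With the variable bound to @code{nil}, the
parenthesis inside the comment would count as real syntax.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "; (ignored\n(real)")
  (let ((parse-sexp-ignore-comments t))
    (scan-sexps (point-min) 1)))
     @result{} 18
@end example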
899 @end defopt
900
901 @vindex parse-sexp-lookup-properties
902 The behavior of @code{parse-partial-sexp} is also affected by
903 @code{parse-sexp-lookup-properties} (@pxref{Syntax Properties}).
904
905 You can use @code{forward-comment} to move forward or backward over
906 one comment or several comments.
907
908 @node Standard Syntax Tables
909 @section Some Standard Syntax Tables
910
911 Most of the major modes in Emacs have their own syntax tables. Here
912 are several of them:
913
914 @defun standard-syntax-table
915 This function returns the standard syntax table, which is the syntax
916 table used in Fundamental mode.
917 @end defun
918
919 @defvar text-mode-syntax-table
920 The value of this variable is the syntax table used in Text mode.
921 @end defvar
922
923 @defvar c-mode-syntax-table
924 The value of this variable is the syntax table for C-mode buffers.
925 @end defvar
926
927 @defvar emacs-lisp-mode-syntax-table
928 The value of this variable is the syntax table used in Emacs Lisp mode
929 by editing commands. (It has no effect on the Lisp @code{read}
930 function.)
931 @end defvar
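
As a brief sketch of how these tables differ, the form below looks up
the syntax class of @samp{;} under two of them with
@code{with-syntax-table}.  In the Emacs Lisp table the character
starts a comment, while in the standard table it is punctuation.

@example
(list (with-syntax-table emacs-lisp-mode-syntax-table
        (string (char-syntax ?\;)))
      (with-syntax-table (standard-syntax-table)
        (string (char-syntax ?\;))))
     @result{} ("<" ".")
@end example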
932
933 @node Syntax Table Internals
934 @section Syntax Table Internals
935 @cindex syntax table internals
936
937 Lisp programs don't usually work with the elements of a syntax table
938 directly; the Lisp-level syntax table functions usually work with syntax descriptors
939 (@pxref{Syntax Descriptors}). Nonetheless, here we document the
940 internal format. This format is used mostly when manipulating
941 syntax properties.
942
943 Each element of a syntax table is a cons cell of the form
944 @code{(@var{syntax-code} . @var{matching-char})}. The @sc{car},
945 @var{syntax-code}, is an integer that encodes the syntax class, and any
946 flags. The @sc{cdr}, @var{matching-char}, is non-@code{nil} if
947 a character to match was specified.
948
949 This table gives the value of @var{syntax-code} which corresponds
950 to each syntactic type.
951
952 @multitable @columnfractions .05 .3 .3 .31
953 @item
954 @tab
955 @i{Integer} @i{Class}
956 @tab
957 @i{Integer} @i{Class}
958 @tab
959 @i{Integer} @i{Class}
960 @item
961 @tab
962 0 @ @ whitespace
963 @tab
964 5 @ @ close parenthesis
965 @tab
966 10 @ @ character quote
967 @item
968 @tab
969 1 @ @ punctuation
970 @tab
971 6 @ @ expression prefix
972 @tab
973 11 @ @ comment-start
974 @item
975 @tab
976 2 @ @ word
977 @tab
978 7 @ @ string quote
979 @tab
980 12 @ @ comment-end
981 @item
982 @tab
983 3 @ @ symbol
984 @tab
985 8 @ @ paired delimiter
986 @tab
987 13 @ @ inherit
988 @item
989 @tab
990 4 @ @ open parenthesis
991 @tab
992 9 @ @ escape
993 @tab
994 14 @ @ generic comment
995 @item
996 @tab
997 15 @ generic string
998 @end multitable
999
1000 For example, the usual syntax value for @samp{(} is @code{(4 . 41)}.
1001 (41 is the character code for @samp{)}.)
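
Since a syntax table is a char-table, these elements can be inspected
with @code{aref}.  The following sketch reads two entries of the
standard syntax table; a word constituent such as @samp{a} has class 2
and no matching character.

@example
(aref (standard-syntax-table) ?\()
     @result{} (4 . 41)
(aref (standard-syntax-table) ?a)
     @result{} (2)
@end example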
1002
1003 The flags are encoded in higher order bits, starting 16 bits from the
1004 least significant bit. This table gives the power of two which
1005 corresponds to each syntax flag.
1006
1007 @multitable @columnfractions .05 .3 .3 .3
1008 @item
1009 @tab
1010 @i{Prefix} @i{Flag}
1011 @tab
1012 @i{Prefix} @i{Flag}
1013 @tab
1014 @i{Prefix} @i{Flag}
1015 @item
1016 @tab
1017 @samp{1} @ @ @code{(lsh 1 16)}
1018 @tab
1019 @samp{4} @ @ @code{(lsh 1 19)}
1020 @tab
1021 @samp{b} @ @ @code{(lsh 1 21)}
1022 @item
1023 @tab
1024 @samp{2} @ @ @code{(lsh 1 17)}
1025 @tab
1026 @samp{p} @ @ @code{(lsh 1 20)}
1027 @tab
1028 @samp{n} @ @ @code{(lsh 1 22)}
1029 @item
1030 @tab
1031 @samp{3} @ @ @code{(lsh 1 18)}
1032 @end multitable
1033
1034 @defun string-to-syntax desc
1035 This function returns the internal form corresponding to the syntax
1036 descriptor @var{desc}; the value is a cons cell @code{(@var{syntax-code}
1037 . @var{matching-char})}.
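
The sketch below converts two descriptors and then decodes the second
one by hand: the low 16 bits give the class, and each flag is tested
against its power of two.  It uses @code{ash} for the shifts; the
descriptor strings are chosen only for illustration.

@example
(string-to-syntax "()")
     @result{} (4 . 41)

(let ((code (car (string-to-syntax ". 23"))))
  (list (logand code 65535)                ; class 1, punctuation
        (/= 0 (logand code (ash 1 17)))    ; flag 2 set?
        (/= 0 (logand code (ash 1 18)))    ; flag 3 set?
        (/= 0 (logand code (ash 1 16)))))  ; flag 1 set?
     @result{} (1 t t nil)
@end example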
1038 @end defun
1039
1040 @defun syntax-after pos
1041 This function returns the syntax descriptor, in the internal form
1042 described above, of the character in the buffer after position @var{pos},
1043 taking account of syntax properties as well as the syntax table. If @var{pos} is outside the buffer's accessible
1044 portion (@pxref{Narrowing, accessible portion}), this function returns
1045 @code{nil}.
1046 @end defun
1047
1048 @defun syntax-class syntax
1049 This function returns the syntax class of @var{syntax}, a syntax
1050 descriptor in the internal form. (It takes the @var{syntax-code} component and masks off the bits above the low 16, which hold the flags.)
1051 If @var{syntax} is @code{nil}, it
1052 returns @code{nil}; this is so evaluating the expression
1053
1054 @example
1055 (syntax-class (syntax-after pos))
1056 @end example
1057
1058 @noindent
1059 where @code{pos} is outside the buffer's accessible portion, will
1060 yield @code{nil} without signaling errors or producing wrong syntax
1061 class codes.
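
A short sketch of the two functions together, on a made-up three
character buffer: position 1 is before an open parenthesis, and
position 100 is outside the buffer.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(x)")
  (list (syntax-after 1)
        (syntax-class (syntax-after 1))
        (syntax-class (syntax-after 100))))
     @result{} ((4 . 41) 4 nil)
@end example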
1062 @end defun
1063
1064 @node Categories
1065 @section Categories
1066 @cindex categories of characters
1067 @cindex character categories
1068
1069 @dfn{Categories} provide an alternate way of classifying characters
1070 syntactically. You can define several categories as needed, then
1071 independently assign each character to one or more categories. Unlike
1072 syntax classes, categories are not mutually exclusive; it is normal for
1073 one character to belong to several categories.
1074
1075 @cindex category table
1076 Each buffer has a @dfn{category table} which records which categories
1077 are defined and also which characters belong to each category. Each
1078 category table defines its own categories, but normally these are
1079 initialized by copying from the standard categories table, so that the
1080 standard categories are available in all modes.
1081
1082 Each category has a name, which is an @acronym{ASCII} printing character in
1083 the range @w{@samp{ }} to @samp{~}. You specify the name of a category
1084 when you define it with @code{define-category}.
1085
1086 The category table is actually a char-table (@pxref{Char-Tables}).
1087 The element of the category table at index @var{c} is a @dfn{category
1088 set}---a bool-vector---that indicates which categories character @var{c}
1089 belongs to. In this category set, if the element at index @var{cat} is
1090 @code{t}, that means category @var{cat} is a member of the set, and that
1091 character @var{c} belongs to category @var{cat}.
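
This structure can be seen directly with ordinary predicates and
@code{aref}.  The sketch below assumes a buffer that uses the standard
category table, in which @samp{a} names the @acronym{ASCII} category.

@example
(char-table-p (category-table))
     @result{} t
(bool-vector-p (char-category-set ?a))
     @result{} t
;; Index the category set by the category name.
(aref (char-category-set ?a) ?a)
     @result{} t
@end example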
1092
1093 For the next three functions, the optional argument @var{table}
1094 defaults to the current buffer's category table.
1095
1096 @defun define-category char docstring &optional table
1097 This function defines a new category, with name @var{char} and
1098 documentation @var{docstring}, for the category table @var{table}.
1099
1100 Here's an example of defining a new category for characters that have
1101 strong right-to-left directionality (@pxref{Bidirectional Display})
1102 and using it in a special category table:
1103
1104 @example
1105 (defvar special-category-table-for-bidi
1106   (let ((category-table (make-category-table))
1107         (uniprop-table (unicode-property-table-internal 'bidi-class)))
1108     (define-category ?R "Characters of bidi-class R, AL, or RLO"
1109                      category-table)
1110     (map-char-table
1111      #'(lambda (key val)
1112          (if (memq val '(R AL RLO))
1113              (modify-category-entry key ?R category-table)))
1114      uniprop-table)
1115     category-table))
1116 @end example
1117 @end defun
1118
1119 @defun category-docstring category &optional table
1120 This function returns the documentation string of category @var{category}
1121 in category table @var{table}.
1122
1123 @example
1124 (category-docstring ?a)
1125 @result{} "ASCII"
1126 (category-docstring ?l)
1127 @result{} "Latin"
1128 @end example
1129 @end defun
1130
1131 @defun get-unused-category &optional table
1132 This function returns a category name (a character) which is not
1133 currently defined in @var{table}. If all possible categories are in use
1134 in @var{table}, it returns @code{nil}.
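
One way to use this, sketched here under the assumption that the
table still has a free name, is to let it pick the name for a new
category.  The docstring and the choice of @samp{x} are made up; the
form works on a copy so the standard category table is left alone.

@example
(let* ((table (copy-category-table))
       (cat (get-unused-category table)))
  ;; CAT is nil when every category name is already taken.
  (when cat
    (define-category cat "Hypothetical example category" table)
    (modify-category-entry ?x cat table)
    (category-docstring cat table)))
@end example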
1135 @end defun
1136
1137 @defun category-table
1138 This function returns the current buffer's category table.
1139 @end defun
1140
1141 @defun category-table-p object
1142 This function returns @code{t} if @var{object} is a category table,
1143 otherwise @code{nil}.
1144 @end defun
1145
1146 @defun standard-category-table
1147 This function returns the standard category table.
1148 @end defun
1149
1150 @defun copy-category-table &optional table
1151 This function constructs a copy of @var{table} and returns it. If
1152 @var{table} is not supplied (or is @code{nil}), it returns a copy of the
1153 standard category table. Otherwise, an error is signaled if @var{table}
1154 is not a category table.
1155 @end defun
1156
1157 @defun set-category-table table
1158 This function makes @var{table} the category table for the current
1159 buffer. It returns @var{table}.
1160 @end defun
1161
1162 @defun make-category-table
1163 This function creates and returns an empty category table. In an empty category
1164 table, no categories have been allocated, and no characters belong to
1165 any categories.
1166 @end defun
1167
1168 @defun make-category-set categories
1169 This function returns a new category set---a bool-vector---whose initial
1170 contents are the categories listed in the string @var{categories}. The
1171 elements of @var{categories} should be category names; the new category
1172 set has @code{t} for each of those categories, and @code{nil} for all
1173 other categories.
1174
1175 @example
1176 (make-category-set "al")
1177 @result{} #&128"\0\0\0\0\0\0\0\0\0\0\0\0\2\20\0\0"
1178 @end example
1179 @end defun
1180
1181 @defun char-category-set char
1182 This function returns the category set for character @var{char} in the
1183 current buffer's category table. This is the bool-vector which
1184 records which categories the character @var{char} belongs to. The
1185 function @code{char-category-set} does not allocate storage, because
1186 it returns the same bool-vector that exists in the category table.
1187
1188 @example
1189 (char-category-set ?a)
1190 @result{} #&128"\0\0\0\0\0\0\0\0\0\0\0\0\2\20\0\0"
1191 @end example
1192 @end defun
1193
1194 @defun category-set-mnemonics category-set
1195 This function converts the category set @var{category-set} into a string
1196 containing the characters that designate the categories that are members
1197 of the set.
1198
1199 @example
1200 (category-set-mnemonics (char-category-set ?a))
1201 @result{} "al"
1202 @end example
1203 @end defun
1204
1205 @defun modify-category-entry char category &optional table reset
1206 This function modifies the category set of @var{char} in category
1207 table @var{table} (which defaults to the current buffer's category
1208 table). @var{char} can be a character, or a cons cell of the form
1209 @code{(@var{min} . @var{max})}; in the latter case, the function
1210 modifies the category sets of all characters in the range between
1211 @var{min} and @var{max}, inclusive.
1212
1213 Normally, it modifies a category set by adding @var{category} to it.
1214 But if @var{reset} is non-@code{nil}, then it deletes @var{category}
1215 instead.
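
The sketch below exercises both forms on a fresh table.  The category
@samp{d} and its docstring are hypothetical.  It first adds the whole
digit range to the category, then removes a single digit again by
passing a non-@code{nil} @var{reset}.

@example
(let ((table (make-category-table)))
  (define-category ?d "Hypothetical digit category" table)
  ;; Add every character from 0 through 9 to category d.
  (modify-category-entry '(?0 . ?9) ?d table)
  ;; Take 5 back out of the category.
  (modify-category-entry ?5 ?d table t)
  (list (aref (aref table ?4) ?d)
        (aref (aref table ?5) ?d)))
     @result{} (t nil)
@end example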
1216 @end defun
1217
1218 @deffn Command describe-categories &optional buffer-or-name
1219 This function describes the category specifications in the current
1220 category table. It inserts the descriptions in a buffer, and then
1221 displays that buffer. If @var{buffer-or-name} is non-@code{nil}, it
1222 describes the category table of that buffer instead.
1223 @end deffn