@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
@c Copyright (C) 1990-1995, 1998-1999, 2001-2012
@c   Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../../info/syntax
@node Syntax Tables, Abbrevs, Searching and Matching, Top
@chapter Syntax Tables
@cindex parsing buffer text
@cindex syntax table
@cindex text parsing

  A @dfn{syntax table} specifies the syntactic role of each character
in a buffer.  It can be used to determine where words, symbols, and
other syntactic constructs begin and end.  This information is used by
many Emacs facilities, including Font Lock mode (@pxref{Font Lock
Mode}) and the various complex movement commands (@pxref{Motion}).

@menu
* Basics: Syntax Basics.     Basic concepts of syntax tables.
* Syntax Descriptors::       How characters are classified.
* Syntax Table Functions::   How to create, examine and alter syntax tables.
* Syntax Properties::        Overriding syntax with text properties.
* Motion and Syntax::        Moving over characters with certain syntaxes.
* Parsing Expressions::      Parsing balanced expressions
                                using the syntax table.
* Standard Syntax Tables::   Syntax tables used by various major modes.
* Syntax Table Internals::   How syntax table information is stored.
* Categories::               Another way of classifying character syntax.
@end menu

@node Syntax Basics
@section Syntax Table Concepts

  A syntax table is a char-table (@pxref{Char-Tables}).  The element at
index @var{c} describes the character with code @var{c}.  The element's
value should be a cons cell that encodes the syntax of the character in
question (@pxref{Syntax Table Internals}), or @code{nil}, in which case
the syntax is taken from the table's parent.

  Syntax tables are used only for moving across text, not for the Emacs
Lisp reader.  Emacs Lisp uses built-in syntactic rules when reading Lisp
expressions, and these rules cannot be changed.  (Some Lisp systems
provide ways to redefine the read syntax, but we decided to leave this
feature out of Emacs Lisp for simplicity.)

  Each buffer has its own major mode, and each major mode has its own
idea of the syntactic class of various characters.  For example, in
Lisp mode, the character @samp{;} begins a comment, but in C mode, it
terminates a statement.  To support these variations, Emacs makes the
syntax table local to each buffer.  Typically, each major mode has its
own syntax table and installs that table in each buffer that uses that
mode.  Changing this table alters the syntax in all those buffers as
well as in any buffers subsequently put in that mode.  Occasionally
several similar modes share one syntax table.  @xref{Example Major
Modes}, for an example of how to set up a syntax table.

  A syntax table can inherit the data for some characters from the
standard syntax table, while specifying other characters itself.  The
``inherit'' syntax class means ``inherit this character's syntax from
the standard syntax table.''  Just changing the standard syntax for a
character affects all syntax tables that inherit from it.

@defun syntax-table-p object
This function returns @code{t} if @var{object} is a syntax table.
@end defun
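
  As a quick illustration, @code{syntax-table-p} accepts both the
current buffer's table and the standard syntax table, but rejects a
string, and also rejects a char-table created for some other purpose.
The subtype @code{foo} below is an arbitrary name chosen only for this
example:

@example
(syntax-table-p (syntax-table))
     @result{} t
(syntax-table-p (standard-syntax-table))
     @result{} t
(syntax-table-p "w")
     @result{} nil
;; @r{@code{foo} is a hypothetical char-table subtype.}
(syntax-table-p (make-char-table 'foo))
     @result{} nil
@end example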

@node Syntax Descriptors
@section Syntax Descriptors
@cindex syntax class

  The syntactic role of a character is called its @dfn{syntax class}.
Each syntax table specifies the syntax class of each character.  There
is no necessary relationship between the class of a character in one
syntax table and its class in any other table.

  Each syntax class is designated by a mnemonic character, which
serves as the name of the class when you need to specify a class.
Usually, this designator character is one that is often assigned that
class; however, its meaning as a designator is unvarying and
independent of what syntax that character currently has.  Thus,
@samp{\} as a designator character always means ``escape character''
syntax, regardless of whether the @samp{\} character actually has that
syntax in the current syntax table.
@ifnottex
@xref{Syntax Class Table}, for a list of syntax classes.
@end ifnottex

@cindex syntax descriptor
  A @dfn{syntax descriptor} is a Lisp string that describes the syntax
class and other syntactic properties of a character.  To modify the
syntax of a character, call the function @code{modify-syntax-entry}
with a syntax descriptor as one of its arguments (@pxref{Syntax Table
Functions}).

  The first character in a syntax descriptor designates the syntax
class.  The second character specifies a matching character (e.g.@: in
Lisp, the matching character for @samp{(} is @samp{)}); if there is no
matching character, put a space there.  Then come the characters for
any desired flags.

  If no matching character or flags are needed, only one character
(specifying the syntax class) is sufficient.

  For example, the syntax descriptor for the character @samp{*} in C
mode is @code{". 23"} (i.e., punctuation, matching character slot
unused, second character of a comment-starter, first character of a
comment-ender), and the entry for @samp{/} is @code{". 14"} (i.e.,
punctuation, matching character slot unused, first character of a
comment-starter, second character of a comment-ender).
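
  To see these two descriptors at work, one can install them in a
fresh syntax table and let @code{forward-comment} (@pxref{Motion via
Parsing}) skip a C-style comment.  This is only a sketch: the
throwaway table holds just the two comment-related entries, not a full
C mode syntax table, and the sample text is arbitrary.

@example
(let ((table (make-syntax-table)))
  (modify-syntax-entry ?/ ". 14" table)
  (modify-syntax-entry ?* ". 23" table)
  (with-temp-buffer
    (set-syntax-table table)
    (insert "/* note */x")
    (goto-char (point-min))
    (forward-comment 1)          ; @r{skip the whole comment}
    (point)))                    ; @r{point is now before @samp{x}}
     @result{} 11
@end example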

@menu
* Syntax Class Table::      Table of syntax classes.
* Syntax Flags::            Additional flags each character can have.
@end menu

@node Syntax Class Table
@subsection Table of Syntax Classes

  Here is a table of syntax classes, the characters that designate
them, their meanings, and examples of their use.

@table @asis
@item Whitespace characters: @samp{@ } or @samp{-}
Characters that separate symbols and words from each other.
Typically, whitespace characters have no other syntactic significance,
and multiple whitespace characters are syntactically equivalent to a
single one.  Space, tab, and formfeed are classified as whitespace in
almost all major modes.

This syntax class can be designated by either @w{@samp{@ }} or
@samp{-}.  Both designators are equivalent.

@item Word constituents: @samp{w}
Parts of words in human languages.  These are typically used in
variable and command names in programs.  All upper- and lower-case
letters, and the digits, are typically word constituents.

@item Symbol constituents: @samp{_}
Extra characters used in variable and command names along with word
constituents.  Examples include the characters @samp{$&*+-_<>} in Lisp
mode, which may be part of a symbol name even though they are not part
of English words.  In standard C, the only non-word-constituent
character that is valid in symbols is underscore (@samp{_}).

@item Punctuation characters: @samp{.}
Characters used as punctuation in a human language, or used in a
programming language to separate symbols from one another.  Some
programming language modes, such as Emacs Lisp mode, have no
characters in this class since the few characters that are not symbol
or word constituents all have other uses.  Other programming language
modes, such as C mode, use punctuation syntax for operators.

@item Open parenthesis characters: @samp{(}
@itemx Close parenthesis characters: @samp{)}
Characters used in dissimilar pairs to surround sentences or
expressions.  Such a grouping is begun with an open parenthesis
character and terminated with a close.  Each open parenthesis
character matches a particular close parenthesis character, and vice
versa.  Normally, Emacs momentarily indicates the matching open
parenthesis when you insert a close parenthesis.  @xref{Blinking}.

In human languages, and in C code, the parenthesis pairs are
@samp{()}, @samp{[]}, and @samp{@{@}}.  In Emacs Lisp, the delimiters
for lists and vectors (@samp{()} and @samp{[]}) are classified as
parenthesis characters.

@item String quotes: @samp{"}
Characters used to delimit string constants.  The same string quote
character appears at the beginning and the end of a string.  Such
quoted strings do not nest.

The parsing facilities of Emacs consider a string as a single token.
The usual syntactic meanings of the characters in the string are
suppressed.

The Lisp modes have two string quote characters: double-quote (@samp{"})
and vertical bar (@samp{|}).  @samp{|} is not used in Emacs Lisp, but it
is used in Common Lisp.  C also has two string quote characters:
double-quote for strings, and single-quote (@samp{'}) for character
constants.

Human text has no string quote characters.  We do not want quotation
marks to turn off the usual syntactic properties of other characters
in the quotation.

@item Escape-syntax characters: @samp{\}
Characters that start an escape sequence, such as is used in string
and character constants.  The character @samp{\} belongs to this class
in both C and Lisp.  (In C, it is used thus only inside strings, but
it turns out to cause no trouble to treat it this way throughout C
code.)

Characters in this class count as part of words if
@code{words-include-escapes} is non-@code{nil}.  @xref{Word Motion}.

@item Character quotes: @samp{/}
Characters used to quote the following character so that it loses its
normal syntactic meaning.  This differs from an escape character in
that only the character immediately following is ever affected.

Characters in this class count as part of words if
@code{words-include-escapes} is non-@code{nil}.  @xref{Word Motion}.

This class is used for backslash in @TeX{} mode.

@item Paired delimiters: @samp{$}
Similar to string quote characters, except that the syntactic
properties of the characters between the delimiters are not
suppressed.  Only @TeX{} mode uses a paired delimiter presently---the
@samp{$} that both enters and leaves math mode.

@item Expression prefixes: @samp{'}
Characters used for syntactic operators that are considered as part of
an expression if they appear next to one.  In Lisp modes, these
characters include the apostrophe, @samp{'} (used for quoting), the
comma, @samp{,} (used in macros), and @samp{#} (used in the read
syntax for certain data types).

@item Comment starters: @samp{<}
@itemx Comment enders: @samp{>}
@cindex comment syntax
Characters used in various languages to delimit comments.  Human text
has no comment characters.  In Lisp, the semicolon (@samp{;}) starts a
comment and a newline or formfeed ends one.

@item Inherit standard syntax: @samp{@@}
This syntax class does not specify a particular syntax.  It says to
look in the standard syntax table to find the syntax of this
character.

@item Generic comment delimiters: @samp{!}
Characters that start or end a special kind of comment.  @emph{Any}
generic comment delimiter matches @emph{any} generic comment
delimiter, but they cannot match a comment starter or comment ender;
generic comment delimiters can only match each other.

This syntax class is primarily meant for use with the
@code{syntax-table} text property (@pxref{Syntax Properties}).  You
can mark any range of characters as forming a comment, by giving the
first and last characters of the range @code{syntax-table} properties
identifying them as generic comment delimiters.

@item Generic string delimiters: @samp{|}
Characters that start or end a string.  This class differs from the
string quote class in that @emph{any} generic string delimiter can
match any other generic string delimiter; but they do not match
ordinary string quote characters.

This syntax class is primarily meant for use with the
@code{syntax-table} text property (@pxref{Syntax Properties}).  You
can mark any range of characters as forming a string constant, by
giving the first and last characters of the range @code{syntax-table}
properties identifying them as generic string delimiters.
@end table

@node Syntax Flags
@subsection Syntax Flags
@cindex syntax flags

  In addition to the classes, entries for characters in a syntax table
can specify flags.  There are eight possible flags, represented by the
characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b}, @samp{c},
@samp{n}, and @samp{p}.

  All the flags except @samp{p} are used to describe comment
delimiters.  The digit flags are used for comment delimiters made up
of 2 characters.  They indicate that a character can @emph{also} be
part of a comment sequence, in addition to the syntactic properties
associated with its character class.  The flags are independent of the
class and each other for the sake of characters such as @samp{*} in
C mode, which is a punctuation character, @emph{and} the second
character of a start-of-comment sequence (@samp{/*}), @emph{and} the
first character of an end-of-comment sequence (@samp{*/}).  The flags
@samp{b}, @samp{c}, and @samp{n} are used to qualify the corresponding
comment delimiter.

  Here is a table of the possible flags for a character @var{c},
and what they mean:

@itemize @bullet
@item
@samp{1} means @var{c} is the start of a two-character comment-start
sequence.

@item
@samp{2} means @var{c} is the second character of such a sequence.

@item
@samp{3} means @var{c} is the start of a two-character comment-end
sequence.

@item
@samp{4} means @var{c} is the second character of such a sequence.

@item
@samp{b} means that @var{c} as a comment delimiter belongs to the
alternative ``b'' comment style.  For a two-character comment starter,
this flag is only significant on the second character, and for a
two-character comment ender it is only significant on the first
character.

@item
@samp{c} means that @var{c} as a comment delimiter belongs to the
alternative ``c'' comment style.  For a two-character comment
delimiter, @samp{c} on either character makes it of style ``c''.

@item
@samp{n} on a comment delimiter character specifies
that this kind of comment can be nested.  For a two-character
comment delimiter, @samp{n} on either character makes it
nestable.

Emacs supports several comment styles simultaneously in any one syntax
table.  A comment style is a set of flags @samp{b}, @samp{c}, and
@samp{n}, so there can be up to 8 different comment styles.
Each comment delimiter has a style and only matches comment delimiters
of the same style.  Thus if a comment starts with the comment-start
sequence of style ``bn'', it will extend until the next matching
comment-end sequence of style ``bn''.

The appropriate comment syntax settings for C++ can be as follows:

@table @asis
@item @samp{/}
@samp{124}
@item @samp{*}
@samp{23b}
@item newline
@samp{>}
@end table

This defines four comment-delimiting sequences:

@table @asis
@item @samp{/*}
This is a comment-start sequence for ``b'' style because the
second character, @samp{*}, has the @samp{b} flag.

@item @samp{//}
This is a comment-start sequence for ``a'' style because the second
character, @samp{/}, does not have the @samp{b} flag.

@item @samp{*/}
This is a comment-end sequence for ``b'' style because the first
character, @samp{*}, has the @samp{b} flag.

@item newline
This is a comment-end sequence for ``a'' style, because the newline
character does not have the @samp{b} flag.
@end table

@item
@c Emacs 19 feature
@samp{p} identifies an additional ``prefix character'' for Lisp syntax.
These characters are treated as whitespace when they appear between
expressions.  When they appear within an expression, they are handled
according to their usual syntax classes.

The function @code{backward-prefix-chars} moves back over these
characters, as well as over characters whose primary syntax class is
prefix (@samp{'}).  @xref{Motion and Syntax}.
@end itemize
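
  The C++ comment settings listed above can be tried out with a
throwaway syntax table that contains just those three entries.  In
this sketch, @samp{/} and @samp{*} are also given punctuation class,
which the flag table leaves unstated, and @code{forward-comment}
(@pxref{Motion via Parsing}) skips one comment of each style.  It is
an illustration only, not the definition used by C++ mode.

@example
(let ((table (make-syntax-table)))
  (modify-syntax-entry ?/ ". 124" table)
  (modify-syntax-entry ?* ". 23b" table)
  (modify-syntax-entry ?\n ">" table)
  (with-temp-buffer
    (set-syntax-table table)
    (insert "// line\n/* block */x")
    (goto-char (point-min))
    (list (forward-comment 1) (point)      ; @r{``a'' style, ends at newline}
          (forward-comment 1) (point))))   ; @r{``b'' style, ends at @samp{*/}}
     @result{} (t 9 t 20)
@end example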

@node Syntax Table Functions
@section Syntax Table Functions

  In this section we describe functions for creating, accessing and
altering syntax tables.

@defun make-syntax-table &optional table
This function creates a new syntax table, with all values initialized
to @code{nil}.  If @var{table} is non-@code{nil}, it becomes the
parent of the new syntax table, otherwise the standard syntax table is
the parent.  Like all char-tables, a syntax table inherits from its
parent.  Thus the original syntax of all characters in the returned
syntax table is determined by the parent.  @xref{Char-Tables}.

Most major mode syntax tables are created in this way.
@end defun
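
  The inheritance described above can be observed directly.  In this
sketch, @code{parent} and @code{child} are just local variable names:
a change made to the parent shows through in the child, while a change
made to the child leaves the parent alone.

@example
(let* ((parent (make-syntax-table))
       (child (make-syntax-table parent)))
  (modify-syntax-entry ?- "w" parent)   ; @r{change the parent only}
  (modify-syntax-entry ?a "." child)    ; @r{change the child only}
  (list (string (with-syntax-table child (char-syntax ?-)))
        (string (with-syntax-table child (char-syntax ?a)))
        (string (with-syntax-table parent (char-syntax ?a)))))
     @result{} ("w" "." "w")
@end example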

@defun copy-syntax-table &optional table
This function constructs a copy of @var{table} and returns it.  If
@var{table} is not supplied (or is @code{nil}), it returns a copy of the
standard syntax table.  Otherwise, an error is signaled if @var{table} is
not a syntax table.
@end defun
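
  For instance, modifying a copy leaves the original table untouched,
as this small check shows (the variable names are arbitrary):

@example
(let* ((orig (make-syntax-table))
       (copy (copy-syntax-table orig)))
  (modify-syntax-entry ?a "." copy)     ; @r{change the copy only}
  (list (string (with-syntax-table orig (char-syntax ?a)))
        (string (with-syntax-table copy (char-syntax ?a)))))
     @result{} ("w" ".")
@end example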

@deffn Command modify-syntax-entry char syntax-descriptor &optional table
This function sets the syntax entry for @var{char} according to
@var{syntax-descriptor}.  @var{char} must be a character, or a cons
cell of the form @code{(@var{min} . @var{max})}; in the latter case,
the function sets the syntax entries for all characters in the range
between @var{min} and @var{max}, inclusive.

The syntax is changed only for @var{table}, which defaults to the
current buffer's syntax table, and not in any other syntax table.

The argument @var{syntax-descriptor} is a syntax descriptor for the
desired syntax (i.e.@: a string beginning with a class designator
character, and optionally containing a matching character and syntax
flags).  An error is signaled if the first character is not one of the
seventeen syntax class designators.  @xref{Syntax Descriptors}.

This function always returns @code{nil}.  The old syntax information in
the table for this character is discarded.

@example
@group
@exdent @r{Examples:}

;; @r{Put the space character in class whitespace.}
(modify-syntax-entry ?\s " ")
     @result{} nil
@end group

@group
;; @r{Make @samp{$} an open parenthesis character,}
;;   @r{with @samp{^} as its matching close.}
(modify-syntax-entry ?$ "(^")
     @result{} nil
@end group

@group
;; @r{Make @samp{^} a close parenthesis character,}
;;   @r{with @samp{$} as its matching open.}
(modify-syntax-entry ?^ ")$")
     @result{} nil
@end group

@group
;; @r{Make @samp{/} a punctuation character,}
;;   @r{the first character of a start-comment sequence,}
;;   @r{and the second character of an end-comment sequence.}
;;   @r{This is used in C mode.}
(modify-syntax-entry ?/ ". 14")
     @result{} nil
@end group
@end example
@end deffn
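
  The cons-cell form of @var{char} sets a whole range at once.  The
following sketch works on a fresh table, so the current buffer's
syntax is unaffected; it gives the ASCII digits symbol-constituent
syntax and leaves letters alone:

@example
(let ((table (make-syntax-table)))
  ;; @r{Make the digits 0 through 9 symbol constituents in @code{table}.}
  (modify-syntax-entry '(?0 . ?9) "_" table)
  (with-syntax-table table
    (list (string (char-syntax ?0))
          (string (char-syntax ?5))
          (string (char-syntax ?a)))))
     @result{} ("_" "_" "w")
@end example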

@defun char-syntax character
This function returns the syntax class of @var{character}, represented
by its mnemonic designator character.  This returns @emph{only} the
class, not any matching parenthesis or flags.

An error is signaled if @var{character} is not a character.

The following examples apply to C mode.  The first example shows that
the syntax class of space is whitespace (represented by a space).  The
second example shows that the syntax of @samp{/} is punctuation.  This
does not show the fact that it is also part of comment-start and -end
sequences.  The third example shows that open parenthesis is in the class
of open parentheses.  This does not show the fact that it has a matching
character, @samp{)}.

@example
@group
(string (char-syntax ?\s))
     @result{} " "
@end group

@group
(string (char-syntax ?/))
     @result{} "."
@end group

@group
(string (char-syntax ?\())
     @result{} "("
@end group
@end example

We use @code{string} to make it easier to see the character returned by
@code{char-syntax}.
@end defun

@defun set-syntax-table table
This function makes @var{table} the syntax table for the current buffer.
It returns @var{table}.
@end defun

@defun syntax-table
This function returns the current syntax table, which is the table for
the current buffer.
@end defun

@defmac with-syntax-table @var{table} @var{body}@dots{}
This macro executes @var{body} using @var{table} as the current syntax
table.  It returns the value of the last form in @var{body}, after
restoring the old current syntax table.

Since each buffer has its own current syntax table, we should make that
more precise: @code{with-syntax-table} temporarily alters the current
syntax table of whichever buffer is current at the time the macro
execution starts.  Other buffers are not affected.
@end defmac
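
  For example, the following sketch temporarily makes @samp{-} a word
constituent, so that @code{skip-syntax-forward} (@pxref{Motion and
Syntax}) moves over a hyphenated word in one step.  It then checks
that the buffer's original syntax table is back in place once the
macro has finished.  The sample text is arbitrary.

@example
(with-temp-buffer
  (insert "foo-bar baz")
  (goto-char (point-min))
  (let ((before (syntax-table))
        (table (make-syntax-table)))
    (modify-syntax-entry ?- "w" table)  ; @r{hyphen becomes a word constituent}
    (list (with-syntax-table table
            (skip-syntax-forward "w"))  ; @r{moves over all of @samp{foo-bar}}
          (eq before (syntax-table)))))
     @result{} (7 t)
@end example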

@node Syntax Properties
@section Syntax Properties
@kindex syntax-table @r{(text property)}

  When the syntax table is not flexible enough to specify the syntax of
a language, you can override the syntax table for specific character
occurrences in the buffer, by applying a @code{syntax-table} text
property.  @xref{Text Properties}, for how to apply text properties.

  The valid values of the @code{syntax-table} text property are:

@table @asis
@item @var{syntax-table}
If the property value is a syntax table, that table is used instead of
the current buffer's syntax table to determine the syntax for the
underlying text character.

@item @code{(@var{syntax-code} . @var{matching-char})}
A cons cell of this format specifies the syntax for the underlying
text character (@pxref{Syntax Table Internals}).

@item @code{nil}
If the property is @code{nil}, the character's syntax is determined from
the current syntax table in the usual way.
@end table

@defvar parse-sexp-lookup-properties
If this is non-@code{nil}, the syntax scanning functions, like
@code{forward-sexp}, pay attention to syntax text properties.
Otherwise they use only the current syntax table.
@end defvar

@defvar syntax-propertize-function
This variable, if non-@code{nil}, should store a function for applying
@code{syntax-table} properties to a specified stretch of text.  It is
intended to be used by major modes to install a function which applies
@code{syntax-table} properties in some mode-appropriate way.

The function is called by @code{syntax-ppss} (@pxref{Position Parse}),
and by Font Lock mode during syntactic fontification (@pxref{Syntactic
Font Lock}).  It is called with two arguments, @var{start} and
@var{end}, which are the starting and ending positions of the text on
which it should act.  It is allowed to call @code{syntax-ppss} on any
position before @var{end}.  However, it should not call
@code{syntax-ppss-flush-cache}; so, it is not allowed to call
@code{syntax-ppss} on some position and later modify the buffer at an
earlier position.
@end defvar
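
  As a sketch of what such a function might look like, suppose a
hypothetical mode wants backquotes to delimit strings.  The function
name @code{my-mode-syntax-propertize} is made up for this example;
@code{string-to-syntax} (@pxref{Syntax Table Internals}) converts the
descriptor @code{"|"} into the cons-cell form that the
@code{syntax-table} property expects.  Remember that motion functions
only honor these properties when @code{parse-sexp-lookup-properties}
is non-@code{nil}.

@example
;; @r{Hypothetical function; not part of any existing mode.}
(defun my-mode-syntax-propertize (start end)
  "Give each backquote between START and END generic string syntax."
  (goto-char start)
  (while (search-forward "`" end t)
    (put-text-property (match-beginning 0) (match-end 0)
                       'syntax-table (string-to-syntax "|"))))

;; @r{In the (hypothetical) major mode's body:}
;;   (set (make-local-variable 'syntax-propertize-function)
;;        #'my-mode-syntax-propertize)
@end example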

@defvar syntax-propertize-extend-region-functions
This abnormal hook is run by the syntax parsing code prior to calling
@code{syntax-propertize-function}.  Its role is to help locate safe
starting and ending buffer positions for passing to
@code{syntax-propertize-function}.  For example, a major mode can add
a function to this hook to identify multi-line syntactic constructs,
and ensure that the boundaries do not fall in the middle of one.

Each function in this hook should accept two arguments, @var{start}
and @var{end}.  It should return either a cons cell of two adjusted
buffer positions, @code{(@var{new-start} . @var{new-end})}, or
@code{nil} if no adjustment is necessary.  The hook functions are run
in turn, repeatedly, until they all return @code{nil}.
@end defvar
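
  For illustration, here is a hook function that widens the region to
whole lines.  Its name, @code{my-syntax-extend-to-lines}, is
hypothetical.  Following the protocol described above, it returns
@code{nil} when the region already consists of whole lines, and a cons
cell of adjusted positions otherwise.

@example
;; @r{Hypothetical hook function; not part of Emacs.}
(defun my-syntax-extend-to-lines (start end)
  "Return START and END widened to whole lines, or nil if unchanged."
  (let ((new-start (save-excursion (goto-char start)
                                   (line-beginning-position)))
        (new-end (save-excursion (goto-char end)
                                 (line-end-position))))
    (unless (and (= new-start start) (= new-end end))
      (cons new-start new-end))))

;; @r{In the (hypothetical) major mode's body:}
;;   (add-hook 'syntax-propertize-extend-region-functions
;;             #'my-syntax-extend-to-lines nil t)
@end example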

@node Motion and Syntax
@section Motion and Syntax

  This section describes functions for moving across characters that
have certain syntax classes.

@defun skip-syntax-forward syntaxes &optional limit
This function moves point forward across characters having syntax
classes mentioned in @var{syntaxes} (a string of syntax class
characters).  It stops when it encounters the end of the buffer, or
position @var{limit} (if specified), or a character it is not supposed
to skip.

If @var{syntaxes} starts with @samp{^}, then the function skips
characters whose syntax is @emph{not} in @var{syntaxes}.

The return value is the distance traveled, which is a nonnegative
integer.
@end defun

@defun skip-syntax-backward syntaxes &optional limit
This function moves point backward across characters whose syntax
classes are mentioned in @var{syntaxes}.  It stops when it encounters
the beginning of the buffer, or position @var{limit} (if specified), or
a character it is not supposed to skip.

If @var{syntaxes} starts with @samp{^}, then the function skips
characters whose syntax is @emph{not} in @var{syntaxes}.

The return value indicates the distance traveled.  It is an integer that
is zero or less.
@end defun
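
  A short illustration of both functions in a temporary buffer.  It
assumes the standard syntax table (which a fresh buffer in Fundamental
mode uses), in which letters are word constituents and the space
character is whitespace; the sample text is arbitrary.

@example
(with-temp-buffer
  (insert "foo   bar")
  (goto-char (point-min))
  (list (skip-syntax-forward "w")     ; @r{over @samp{foo}}
        (skip-syntax-forward " ")     ; @r{over the three spaces}
        (skip-syntax-forward "^ ")    ; @r{over anything but whitespace}
        (skip-syntax-backward "w")))  ; @r{back over @samp{bar}}
     @result{} (3 3 3 -3)
@end example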

@defun backward-prefix-chars
This function moves point backward over any number of characters with
expression prefix syntax.  This includes both characters in the
expression prefix syntax class, and characters with the @samp{p} flag.
@end defun
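
  For example, in a buffer using the Emacs Lisp syntax table
(@code{emacs-lisp-mode-syntax-table}, in which @samp{'} has expression
prefix syntax), calling @code{backward-prefix-chars} with point just
before an open parenthesis moves point back over the quote character.
A small sketch with arbitrary sample text:

@example
(with-temp-buffer
  (with-syntax-table emacs-lisp-mode-syntax-table
    (insert "x '(a b)")
    (goto-char 4)              ; @r{just before the open parenthesis}
    (backward-prefix-chars)
    (point)))                  ; @r{now just before the quote}
     @result{} 3
@end example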

@node Parsing Expressions
@section Parsing Expressions

  This section describes functions for parsing and scanning balanced
expressions.  We will refer to such expressions as @dfn{sexps},
following the terminology of Lisp, even though these functions can act
on languages other than Lisp.  Basically, a sexp is either a balanced
parenthetical grouping, a string, or a ``symbol'' (i.e.@: a sequence
of characters whose syntax is either word constituent or symbol
constituent).  However, characters in the expression prefix syntax
class (@pxref{Syntax Class Table}) are treated as part of the sexp if
they appear next to it.

  The syntax table controls the interpretation of characters, so these
functions can be used for Lisp expressions when in Lisp mode and for C
expressions when in C mode.  @xref{List Motion}, for convenient
higher-level functions for moving over balanced expressions.

  A character's syntax controls how it changes the state of the
parser, rather than describing the state itself.  For example, a
string delimiter character toggles the parser state between
``in-string'' and ``in-code,'' but the syntax of characters does not
directly say whether they are inside a string.  Thus, the form (note
that 15 is the syntax code for generic string delimiters)

@example
(put-text-property 1 9 'syntax-table '(15 . nil))
@end example

@noindent
does not tell Emacs that the first eight characters of the current
buffer are a string, but rather that they are all string delimiters.
As a result, Emacs treats them as four consecutive empty string
constants.
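
  By contrast, marking only the first and last characters of a region
as generic string delimiters makes the whole region a single string.
In the following sketch the buffer contents are arbitrary, and
@code{parse-sexp-lookup-properties} (@pxref{Syntax Properties}) is
bound to @code{t} for the second scan only, since the properties are
ignored otherwise.  @code{scan-sexps} is described in @ref{Motion via
Parsing}.

@example
(with-temp-buffer
  (insert "xfoo barx")
  ;; @r{Give only the first and last characters generic string syntax.}
  (put-text-property 1 2 'syntax-table '(15))
  (put-text-property 9 10 'syntax-table '(15))
  (list (scan-sexps 1 1)                 ; @r{properties ignored}
        (let ((parse-sexp-lookup-properties t))
          (scan-sexps 1 1))))            ; @r{one string, 1 to 10}
     @result{} (5 10)
@end example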

@menu
* Motion via Parsing::       Motion functions that work by parsing.
* Position Parse::           Determining the syntactic state of a position.
* Parser State::             How Emacs represents a syntactic state.
* Low-Level Parsing::        Parsing across a specified region.
* Control Parsing::          Parameters that affect parsing.
@end menu

@node Motion via Parsing
@subsection Motion Commands Based on Parsing

  This section describes simple point-motion functions that operate
based on parsing expressions.

@defun scan-lists from count depth
This function scans forward @var{count} balanced parenthetical groupings
from position @var{from}.  It returns the position where the scan stops.
If @var{count} is negative, the scan moves backwards.

If @var{depth} is nonzero, assume that the starting point is already
@var{depth} parentheses deep.  This function counts out @var{count}
number of points where the parenthesis depth goes back to zero, then
stops.  Thus, a positive value for @var{depth} has the effect of
moving out @var{depth} levels of parenthesis, whereas a negative
@var{depth} has the effect of moving deeper by @var{-depth} levels of
parenthesis.

Scanning ignores comments if @code{parse-sexp-ignore-comments} is
non-@code{nil}.

If the scan reaches the beginning or end of the buffer (or its
accessible portion), and the depth is not zero, an error is signaled.
If the depth is zero but the count is not used up, @code{nil} is
returned.
@end defun
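
  In a buffer containing @samp{(a (b c) d)} and using the standard
syntax table, the following calls correspond roughly to moving forward
over a list, moving up out of the inner list, moving backward over a
list, and moving down into a list.  The positions are specific to this
sample text:

@example
(with-temp-buffer
  (insert "(a (b c) d)")
  (list (scan-lists 1 1 0)     ; @r{over the whole list}
        (scan-lists 5 1 1)     ; @r{out of @samp{(b c)}, starting at @samp{b}}
        (scan-lists 12 -1 0)   ; @r{backward over the whole list}
        (scan-lists 1 1 -1)))  ; @r{into the outer list}
     @result{} (12 9 1 2)
@end example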

@defun scan-sexps from count
This function scans forward @var{count} sexps from position @var{from}.
It returns the position where the scan stops.  If @var{count} is
negative, the scan moves backwards.

Scanning ignores comments if @code{parse-sexp-ignore-comments} is
non-@code{nil}.

If the scan reaches the beginning or end of (the accessible part of) the
buffer while in the middle of a parenthetical grouping, an error is
signaled.  If it reaches the beginning or end between groupings but
before @var{count} is used up, @code{nil} is returned.
@end defun
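
  Unlike @code{scan-lists}, @code{scan-sexps} also counts words and
symbols as sexps.  In this sketch (standard syntax table, sample text
chosen for the example), the last call asks for more sexps than the
text contains, so it returns @code{nil}:

@example
(with-temp-buffer
  (insert "foo (bar baz) qux")
  (list (scan-sexps 1 2)     ; @r{over @samp{foo} and @samp{(bar baz)}}
        (scan-sexps 18 -1)   ; @r{back over @samp{qux}}
        (scan-sexps 1 4)))   ; @r{only three sexps are available}
     @result{} (14 15 nil)
@end example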

@defun forward-comment count
This function moves point forward across @var{count} complete comments
(that is, including the starting delimiter and the terminating
delimiter if any), plus any whitespace encountered on the way.  It
moves backward if @var{count} is negative.  If it encounters anything
other than a comment or whitespace, it stops, leaving point at the
place where it stopped.  This includes (for instance) finding the end
of a comment when moving forward and expecting the beginning of one.
The function also stops immediately after moving over the specified
number of complete comments.  If @var{count} comments are found as
expected, with nothing except whitespace between them, it returns
@code{t}; otherwise it returns @code{nil}.

This function cannot tell whether the ``comments'' it traverses are
embedded within a string.  If they look like comments, it treats them
as comments.

To move forward over all comments and whitespace following point, use
@code{(forward-comment (buffer-size))}.  @code{(buffer-size)} is a
good argument to use, because the number of comments in the buffer
cannot exceed that many.
@end defun
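
  The following sketch borrows @code{emacs-lisp-mode-syntax-table}, in
which @samp{;} starts a comment and newline ends it.  The first call
moves over exactly one comment and returns @code{t}; the second asks
for more comments than exist, so it stops in front of @samp{foo} and
returns @code{nil}:

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert ";; one\n;; two\nfoo")
  (goto-char (point-min))
  (list (forward-comment 1) (point)
        (forward-comment (buffer-size)) (point)))
     @result{} (t 8 nil 15)
@end example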

@node Position Parse
@subsection Finding the Parse State for a Position

  For syntactic analysis, such as in indentation, it is often useful
to compute the syntactic state corresponding to a given buffer
position.  This function does that conveniently.

@defun syntax-ppss &optional pos
This function returns the parser state that the parser would reach at
position @var{pos} starting from the beginning of the buffer.
@iftex
See the next section for
@end iftex
@ifnottex
@xref{Parser State},
@end ifnottex
for a description of the parser state.

The return value is the same as if you call the low-level parsing
function @code{parse-partial-sexp} to parse from the beginning of the
buffer to @var{pos} (@pxref{Low-Level Parsing}).  However,
@code{syntax-ppss} uses a cache to speed up the computation.  Due to
this optimization, elements 2 (previous complete subexpression) and 6
(minimum parenthesis depth) of the returned parser state, counting
from 0, are not meaningful.

This function has a side effect: it adds a buffer-local entry to
@code{before-change-functions} (@pxref{Change Hooks}) for
@code{syntax-ppss-flush-cache} (see below).  This entry keeps the
cache consistent as the buffer is modified.  However, the cache might
not be updated if @code{syntax-ppss} is called while
@code{before-change-functions} is temporarily let-bound, or if the
buffer is modified without running the hook, such as when using
@code{inhibit-modification-hooks}.  In those cases, it is necessary to
call @code{syntax-ppss-flush-cache} explicitly.
@end defun
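
  For instance, with the standard syntax table, where @samp{"}
delimits strings, the parser state at the end of an unterminated
nested form can be picked apart with @code{nth}.  Element 3 is the
character that will end the string, here 34 (@code{?\"}), and element
8 is the position where the string started:

@example
(with-temp-buffer
  (insert "(a (b \"c")
  (let ((state (syntax-ppss (point-max))))
    (list (nth 0 state)      ; depth in parentheses
          (nth 1 state)      ; start of innermost list
          (nth 3 state)      ; character that ends the string
          (nth 8 state))))   ; start of the string
     @result{} (2 4 34 7)
@end example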

@defun syntax-ppss-flush-cache beg &rest ignored-args
This function flushes the cache used by @code{syntax-ppss}, starting
at position @var{beg}.  The remaining arguments, @var{ignored-args},
are ignored; this function accepts them so that it can be directly
used on hooks such as @code{before-change-functions} (@pxref{Change
Hooks}).
@end defun
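
  As a sketch of the situation described above, the following buffer
is modified with @code{inhibit-modification-hooks} bound to @code{t},
so @code{before-change-functions} does not run and the cache is
flushed by hand before @code{syntax-ppss} is called again:

@example
(with-temp-buffer
  (insert "(a b)")
  (syntax-ppss (point-max))            ; fill the cache
  (let ((inhibit-modification-hooks t))
    (goto-char (point-min))
    (insert "("))                      ; no change hooks run
  (syntax-ppss-flush-cache (point-min))
  (car (syntax-ppss (point-max))))
     @result{} 1
@end example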

  Major modes can make @code{syntax-ppss} run faster by specifying
where it needs to start parsing.

@defvar syntax-begin-function
If this is non-@code{nil}, it should be a function that moves to an
earlier buffer position where the parser state is equivalent to
@code{nil}---in other words, a position outside of any comment,
string, or parenthesis.  @code{syntax-ppss} uses it to further
optimize its computations, when the cache gives no help.
@end defvar
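
  A minimal sketch, assuming (as is common in Lisp code but not
guaranteed) that any line beginning with an open parenthesis is at top
level.  The helper @code{my-syntax-begin} is hypothetical, not part of
Emacs; a major mode would install something like it buffer-locally:

@example
;; Hypothetical helper: move back to the previous line that starts
;; with "(", or to the beginning of the buffer if there is none.
(defun my-syntax-begin ()
  (re-search-backward "^(" nil 'move))

(with-temp-buffer
  (insert "(a)\n(b (c")
  (set (make-local-variable 'syntax-begin-function) #'my-syntax-begin)
  (car (syntax-ppss (point-max))))
     @result{} 2
@end example

If the assumption fails, for instance when a string contains a line
that starts with @samp{(}, the states computed by @code{syntax-ppss}
can be wrong.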

@node Parser State
@subsection Parser State
@cindex parser state

  A @dfn{parser state} is a list of ten elements describing the state
of the syntactic parser, after it parses the text between a specified
starting point and a specified end point in the buffer.  Parsing
functions such as @code{syntax-ppss}
@ifnottex
(@pxref{Position Parse})
@end ifnottex
return a parser state as the value.  Some parsing functions accept a
parser state as an argument, for resuming parsing.

  Here are the meanings of the elements of the parser state:

@enumerate 0
@item
The depth in parentheses, counting from 0.  @strong{Warning:} this can
be negative if there are more close parens than open parens between
the parser's starting point and end point.

@item
@cindex innermost containing parentheses
The character position of the start of the innermost parenthetical
grouping containing the stopping point; @code{nil} if none.

@item
@cindex previous complete subexpression
The character position of the start of the last complete subexpression
terminated; @code{nil} if none.

@item
@cindex inside string
Non-@code{nil} if inside a string.  More precisely, this is the
character that will terminate the string, or @code{t} if a generic
string delimiter character should terminate it.

@item
@cindex inside comment
@code{t} if inside a non-nestable comment (of any comment style;
@pxref{Syntax Flags}); or the comment nesting level if inside a
comment that can be nested.

@item
@cindex quote character
@code{t} if the end point is just after a quote character.

@item
The minimum parenthesis depth encountered during this scan.

@item
What kind of comment is active: @code{nil} if not in a comment or in a
comment of style @samp{a}; 1 for a comment of style @samp{b}; 2 for a
comment of style @samp{c}; and @code{syntax-table} for a comment that
should be ended by a generic comment delimiter character.

@item
The string or comment start position.  While inside a comment, this is
the position where the comment began; while inside a string, this is the
position where the string began.  When outside of strings and comments,
this element is @code{nil}.

@item
Internal data for continuing the parsing.  The meaning of this
data is subject to change; it is used if you pass this list
as the @var{state} argument to another call.
@end enumerate

  Elements 1, 2, and 6 are ignored in a state which you pass as an
argument to continue parsing, and elements 8 and 9 are used only in
trivial cases.  Those elements are mainly used internally by the
parser code.
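
  To see several of these elements at once, here is a small sketch
that parses, with @code{emacs-lisp-mode-syntax-table}, up to the end of
an unterminated comment inside a list.  Element 4 is @code{t} because
@samp{;} comments do not nest, element 7 is @code{nil} for a style
@samp{a} comment, and element 8 is the position of the first
@samp{;}:

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(foo ;; c")
  (let ((state (parse-partial-sexp (point-min) (point-max))))
    (list (nth 0 state) (nth 3 state) (nth 4 state)
          (nth 7 state) (nth 8 state))))
     @result{} (1 nil t nil 6)
@end example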

  One additional piece of useful information is available from a
parser state using this function:

@defun syntax-ppss-toplevel-pos state
This function extracts, from parser state @var{state}, the last
position scanned in the parse which was at top level in grammatical
structure.  ``At top level'' means outside of any parentheses,
comments, or strings.

The value is @code{nil} if @var{state} represents a parse which has
arrived at a top level position.
@end defun
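
  For example, inside two unclosed lists the last top-level position
is just before the outer @samp{(}, while at the very beginning of the
buffer the parse is already at top level:

@example
(with-temp-buffer
  (insert "(a (b")
  (list (syntax-ppss-toplevel-pos (syntax-ppss (point-max)))
        (syntax-ppss-toplevel-pos (syntax-ppss (point-min)))))
     @result{} (1 nil)
@end example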

@node Low-Level Parsing
@subsection Low-Level Parsing

  The most basic way to use the expression parser is to tell it
to start at a given position with a certain state, and parse up to
a specified end position.

@defun parse-partial-sexp start limit &optional target-depth stop-before state stop-comment
This function parses a sexp in the current buffer starting at
@var{start}, not scanning past @var{limit}.  It stops at position
@var{limit} or when certain criteria described below are met, and sets
point to the location where parsing stops.  It returns a parser state
describing the status of the parse at the point where it stops.

@cindex parenthesis depth
If the third argument @var{target-depth} is non-@code{nil}, parsing
stops if the depth in parentheses becomes equal to @var{target-depth}.
The depth starts at 0, or at whatever is given in @var{state}.

If the fourth argument @var{stop-before} is non-@code{nil}, parsing
stops when it comes to any character that starts a sexp.  If
@var{stop-comment} is non-@code{nil}, parsing stops when it comes to the
start of a comment.  If @var{stop-comment} is the symbol
@code{syntax-table}, parsing stops after the start of a comment or a
string, or the end of a comment or a string, whichever comes first.

If @var{state} is @code{nil}, @var{start} is assumed to be at the top
level of parenthesis structure, such as the beginning of a function
definition.  Alternatively, you might wish to resume parsing in the
middle of the structure.  To do this, you must provide a @var{state}
argument that describes the initial status of parsing.  The value
returned by a previous call to @code{parse-partial-sexp} will do
nicely.
@end defun
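
  The following sketch, which assumes the standard syntax table, uses
@var{target-depth} to stop just inside the second level of
parentheses, then passes the returned state back in to resume parsing
from where point was left:

@example
(with-temp-buffer
  (insert "(a (b) c)")
  (let* ((s1 (parse-partial-sexp (point-min) (point-max) 2))
         (stop (point))     ; point is left where parsing stopped
         (s2 (parse-partial-sexp stop (point-max) nil nil s1)))
    (list (car s1) stop (car s2))))
     @result{} (2 5 0)
@end example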

@node Control Parsing
@subsection Parameters to Control Parsing

@defvar multibyte-syntax-as-symbol
If this variable is non-@code{nil}, @code{scan-sexps} treats all
non-@acronym{ASCII} characters as symbol constituents regardless
of what the syntax table says about them.  (However, text properties
can still override the syntax.)
@end defvar

@defopt parse-sexp-ignore-comments
@cindex skipping comments
If the value is non-@code{nil}, then comments are treated as
whitespace by the functions in this section and by @code{forward-sexp},
@code{scan-lists} and @code{scan-sexps}.
@end defopt
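
  For example, with @code{emacs-lisp-mode-syntax-table}, scanning one
sexp forward from just after @samp{a} skips the whole comment when
this option is non-@code{nil}, but stops after the word
@samp{comment} inside it when the option is @code{nil}:

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "a ; comment\nb")
  (list (let ((parse-sexp-ignore-comments t))
          (scan-sexps 2 1))
        (let ((parse-sexp-ignore-comments nil))
          (scan-sexps 2 1))))
     @result{} (14 12)
@end example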

@vindex parse-sexp-lookup-properties
The behavior of @code{parse-partial-sexp} is also affected by
@code{parse-sexp-lookup-properties} (@pxref{Syntax Properties}).

You can use @code{forward-comment} to move forward or backward over
one comment or several comments.

@node Standard Syntax Tables
@section Some Standard Syntax Tables

  Most of the major modes in Emacs have their own syntax tables.  Here
are several of them:

@defun standard-syntax-table
This function returns the standard syntax table, which is the syntax
table used in Fundamental mode.
@end defun

@defvar text-mode-syntax-table
The value of this variable is the syntax table used in Text mode.
@end defvar

@defvar c-mode-syntax-table
The value of this variable is the syntax table for C-mode buffers.
@end defvar

@defvar emacs-lisp-mode-syntax-table
The value of this variable is the syntax table used in Emacs Lisp mode
by editing commands.  (It has no effect on the Lisp @code{read}
function.)
@end defvar
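
  As a quick comparison, the same character can have different syntax
in different tables.  Here @code{char-syntax} reports @samp{;} as
punctuation (@code{?.}, printed as 46) in the standard syntax table,
but as a comment starter (@code{?<}, printed as 60) in Emacs Lisp
mode's table:

@example
(list (with-syntax-table (standard-syntax-table)
        (char-syntax ?\;))
      (with-syntax-table emacs-lisp-mode-syntax-table
        (char-syntax ?\;)))
     @result{} (46 60)
@end example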

@node Syntax Table Internals
@section Syntax Table Internals
@cindex syntax table internals

  Lisp programs don't usually work with the elements directly; the
Lisp-level syntax table functions usually work with syntax descriptors
(@pxref{Syntax Descriptors}).  Nonetheless, here we document the
internal format.  This format is used mostly when manipulating
syntax properties.

  Each element of a syntax table is a cons cell of the form
@code{(@var{syntax-code} . @var{matching-char})}.  The @sc{car},
@var{syntax-code}, is an integer that encodes the syntax class, and any
flags.  The @sc{cdr}, @var{matching-char}, is non-@code{nil} if
a character to match was specified.

  This table gives the value of @var{syntax-code} which corresponds
to each syntactic type.

@multitable @columnfractions .05 .3 .3 .31
@item
@tab
@i{Integer} @i{Class}
@tab
@i{Integer} @i{Class}
@tab
@i{Integer} @i{Class}
@item
@tab
0 @ @  whitespace
@tab
5 @ @  close parenthesis
@tab
10 @ @  character quote
@item
@tab
1 @ @  punctuation
@tab
6 @ @  expression prefix
@tab
11 @ @  comment-start
@item
@tab
2 @ @  word
@tab
7 @ @  string quote
@tab
12 @ @  comment-end
@item
@tab
3 @ @  symbol
@tab
8 @ @  paired delimiter
@tab
13 @ @  inherit
@item
@tab
4 @ @  open parenthesis
@tab
9 @ @  escape
@tab
14 @ @  generic comment
@item
@tab
15 @ @  generic string
@end multitable

  For example, the usual syntax value for @samp{(} is @code{(4 . 41)}.
(41 is the character code for @samp{)}.)
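
  You can inspect these raw elements directly with @code{aref}.  In the
standard syntax table, @samp{(} and @samp{a} have the values below;
masking the @var{syntax-code} with 65535 keeps only the class:

@example
(let ((st (standard-syntax-table)))
  (list (aref st ?\()
        (aref st ?a)
        (logand (car (aref st ?\()) 65535)))
     @result{} ((4 . 41) (2) 4)
@end example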

  The flags are encoded in higher order bits, starting 16 bits from the
least significant bit.  This table gives the value, a power of two,
that corresponds to each syntax flag.

@multitable @columnfractions .05 .3 .3 .3
@item
@tab
@i{Flag} @i{Value}
@tab
@i{Flag} @i{Value}
@tab
@i{Flag} @i{Value}
@item
@tab
@samp{1} @ @  @code{(lsh 1 16)}
@tab
@samp{4} @ @  @code{(lsh 1 19)}
@tab
@samp{b} @ @  @code{(lsh 1 21)}
@item
@tab
@samp{2} @ @  @code{(lsh 1 17)}
@tab
@samp{p} @ @  @code{(lsh 1 20)}
@tab
@samp{n} @ @  @code{(lsh 1 22)}
@item
@tab
@samp{3} @ @  @code{(lsh 1 18)}
@tab
@samp{c} @ @  @code{(lsh 1 23)}
@end multitable

@defun string-to-syntax desc
This function returns the internal form corresponding to the syntax
descriptor @var{desc}: a cons cell of the form
@code{(@var{syntax-code} . @var{matching-char})}.
@end defun
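
  For example, a punctuation character with flags @samp{1} and
@samp{2} gets the class 1 plus @code{(lsh 1 16)} and
@code{(lsh 1 17)}, that is 196609.  The helper @code{my-syntax-flags}
below is hypothetical, not part of Emacs; it decodes only the flags
@samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{p}, @samp{b} and
@samp{n}, at bits 16 through 22:

@example
;; Hypothetical helper: return a string of the flag characters
;; whose bits (16 through 22) are set in the raw descriptor RAW.
(defun my-syntax-flags (raw)
  (let ((code (car raw))
        (bit 16)
        (flags ""))
    (dolist (ch '(?1 ?2 ?3 ?4 ?p ?b ?n))
      (unless (zerop (logand code (ash 1 bit)))
        (setq flags (concat flags (string ch))))
      (setq bit (1+ bit)))
    flags))

(list (string-to-syntax "()")
      (string-to-syntax ". 12")
      (my-syntax-flags (string-to-syntax ". 12"))
      (my-syntax-flags (string-to-syntax "< b")))
     @result{} ((4 . 41) (196609) "12" "b")
@end example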

@defun syntax-after pos
This function returns the raw syntax descriptor, a cons cell
@code{(@var{syntax-code} . @var{matching-char})}, for the character in
the buffer after position @var{pos}, taking account of syntax
properties as well as the syntax table.  If @var{pos} is outside the
buffer's accessible portion (@pxref{Narrowing, accessible portion}),
this function returns @code{nil}.
@end defun

@defun syntax-class syntax
This function returns the syntax class of the raw syntax descriptor
@var{syntax}.  It takes the @var{syntax-code} component of
@var{syntax} and keeps only its low 16 bits, discarding the higher
bits that hold the syntax flags.  If @var{syntax} is @code{nil}, it
returns @code{nil}; this is so that evaluating the expression

@example
(syntax-class (syntax-after pos))
@end example

@noindent
where @code{pos} is outside the buffer's accessible portion, will
yield @code{nil} without throwing errors or producing wrong syntax
class codes.
@end defun
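
  For example, in a buffer using the standard syntax table, combining
the two functions yields class 2 (word) for @samp{a}, class 4 (open
parenthesis) for @samp{(}, and @code{nil} at the end of the buffer,
where there is no character:

@example
(with-temp-buffer
  (insert "a(b")
  (list (syntax-class (syntax-after 1))
        (syntax-class (syntax-after 2))
        (syntax-class (syntax-after (point-max)))))
     @result{} (2 4 nil)
@end example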

@node Categories
@section Categories
@cindex categories of characters
@cindex character categories

  @dfn{Categories} provide an alternate way of classifying characters
syntactically.  You can define several categories as needed, then
independently assign each character to one or more categories.  Unlike
syntax classes, categories are not mutually exclusive; it is normal for
one character to belong to several categories.

@cindex category table
  Each buffer has a @dfn{category table} that records which categories
are defined and also which characters belong to each category.  Each
category table defines its own categories, but normally these are
initialized by copying from the standard category table, so that the
standard categories are available in all modes.

  Each category has a name, which is an @acronym{ASCII} printing character in
the range @w{@samp{ }} to @samp{~}.  You specify the name of a category
when you define it with @code{define-category}.

  The category table is actually a char-table (@pxref{Char-Tables}).
The element of the category table at index @var{c} is a @dfn{category
set}---a bool-vector---that indicates which categories character @var{c}
belongs to.  In this category set, if the element at index @var{cat} is
@code{t}, that means category @var{cat} is a member of the set, and that
character @var{c} belongs to category @var{cat}.
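
  Because a category set is a bool-vector indexed by category name,
testing membership is a single @code{aref}.  The helper
@code{my-char-in-category-p} below is hypothetical, not part of Emacs;
it uses @code{char-category-set} (described below) and assumes the
current buffer uses the standard category table, in which
@acronym{ASCII} letters belong to categories @samp{a} and @samp{l}:

@example
;; Hypothetical helper: non-nil if CHAR is in CATEGORY according to
;; the current buffer's category table.
(defun my-char-in-category-p (char category)
  (aref (char-category-set char) category))

(list (my-char-in-category-p ?a ?a)
      (my-char-in-category-p ?a ?l)
      (my-char-in-category-p ?\xe9 ?a))
     @result{} (t t nil)
@end example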

For the next three functions, the optional argument @var{table}
defaults to the current buffer's category table.

@defun define-category char docstring &optional table
This function defines a new category, with name @var{char} and
documentation @var{docstring}, for the category table @var{table}.

Here's an example of defining a new category for characters that have
strong right-to-left directionality (@pxref{Bidirectional Display})
and using it in a special category table:

@example
(defvar special-category-table-for-bidi
  (let ((category-table (make-category-table))
        (uniprop-table (unicode-property-table-internal 'bidi-class)))
    (define-category ?R "Characters of bidi-class R, AL, or RLO"
                     category-table)
    (map-char-table
     #'(lambda (key val)
         (if (memq val '(R AL RLO))
             (modify-category-entry key ?R category-table)))
     uniprop-table)
    category-table))
@end example
@end defun

@defun category-docstring category &optional table
This function returns the documentation string of category @var{category}
in category table @var{table}.

@example
(category-docstring ?a)
     @result{} "ASCII"
(category-docstring ?l)
     @result{} "Latin"
@end example
@end defun

@defun get-unused-category &optional table
This function returns a category name (a character) which is not
currently defined in @var{table}.  If all possible categories are in use
in @var{table}, it returns @code{nil}.
@end defun

@defun category-table
This function returns the current buffer's category table.
@end defun

@defun category-table-p object
This function returns @code{t} if @var{object} is a category table,
otherwise @code{nil}.
@end defun

@defun standard-category-table
This function returns the standard category table.
@end defun

@defun copy-category-table &optional table
This function constructs a copy of @var{table} and returns it.  If
@var{table} is not supplied (or is @code{nil}), it returns a copy of the
standard category table.  Otherwise, an error is signaled if @var{table}
is not a category table.
@end defun

@defun set-category-table table
This function makes @var{table} the category table for the current
buffer.  It returns @var{table}.
@end defun

@defun make-category-table
This function creates and returns an empty category table.  In an empty
category table, no categories have been allocated, and no characters
belong to any categories.
@end defun

@defun make-category-set categories
This function returns a new category set---a bool-vector---whose initial
contents are the categories listed in the string @var{categories}.  The
elements of @var{categories} should be category names; the new category
set has @code{t} for each of those categories, and @code{nil} for all
other categories.

@example
(make-category-set "al")
     @result{} #&128"\0\0\0\0\0\0\0\0\0\0\0\0\2\20\0\0"
@end example
@end defun

@defun char-category-set char
This function returns the category set for character @var{char} in the
current buffer's category table.  This is the bool-vector that
records which categories the character @var{char} belongs to.  The
function @code{char-category-set} does not allocate storage, because
it returns the same bool-vector that exists in the category table.

@example
(char-category-set ?a)
     @result{} #&128"\0\0\0\0\0\0\0\0\0\0\0\0\2\20\0\0"
@end example
@end defun

@defun category-set-mnemonics category-set
This function converts the category set @var{category-set} into a string
containing the characters that designate the categories that are members
of the set.

@example
(category-set-mnemonics (char-category-set ?a))
     @result{} "al"
@end example
@end defun

@defun modify-category-entry char category &optional table reset
This function modifies the category set of @var{char} in category
table @var{table} (which defaults to the current buffer's category
table).  @var{char} can be a character, or a cons cell of the form
@code{(@var{min} . @var{max})}; in the latter case, the function
modifies the category sets of all characters in the range between
@var{min} and @var{max}, inclusive.

Normally, it modifies a category set by adding @var{category} to it.
But if @var{reset} is non-@code{nil}, then it deletes @var{category}
instead.
@end defun
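
  Putting several of these functions together, the following sketch
copies the standard category table, allocates an unused category,
assigns it to the characters @samp{a} through @samp{f}, and installs
the table in a temporary buffer.  The category's docstring is an
arbitrary example, and because the table is a copy, the standard
category table itself is not changed:

@example
(with-temp-buffer
  (let* ((table (copy-category-table))
         (cat (get-unused-category table)))
    (define-category cat "Hexadecimal letters a-f" table)
    (modify-category-entry '(?a . ?f) cat table)
    (set-category-table table)
    (list (aref (char-category-set ?c) cat)
          (aref (char-category-set ?g) cat))))
     @result{} (t nil)
@end example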

@deffn Command describe-categories &optional buffer-or-name
This command describes the category specifications in the current
buffer's category table.  It inserts the descriptions in a buffer, and
then displays that buffer.  If @var{buffer-or-name} is non-@code{nil},
it describes the category table of that buffer instead.
@end deffn