Angstrom
Parser combinators built for speed and memory-efficiency.
Angstrom is a parser-combinator library that provides monadic and applicative interfaces for constructing parsers with unbounded lookahead. Its parsers can consume input incrementally, whether in a blocking or non-blocking environment. To achieve efficient incremental parsing, Angstrom offers both a buffered and an unbuffered interface to input streams, with the Unbuffered interface enabling zero-copy IO. With these features and low-level iteration parser primitives like take_while and skip_while, Angstrom makes it easy to write efficient, expressive, and reusable parsers suitable for high-performance applications.
type bigstring =
(char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t
val peek_char : char option t
peek_char accepts any char and returns it, or returns None if the end of input has been reached.
This parser does not advance the input. Use it for lookahead.
val peek_char_fail : char t
peek_char_fail accepts any char and returns it. If the end of input has been reached, it will fail.
This parser does not advance the input. Use it for lookahead.
val peek_string : int -> string t
peek_string n accepts exactly n characters and returns them as a string. If there is not enough input, it will fail.
This parser does not advance the input. Use it for lookahead.
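A minimal sketch of lookahead, assuming the parse_string ~consume entry point documented further down. Because the peek parsers do not advance the input, whatever runs next still sees the peeked characters. The names is_lower, first_and_word and peek_then_take are illustrative only.

```ocaml
open Angstrom

let is_lower = function 'a' .. 'z' -> true | _ -> false

(* Peek at the first character, then read the whole lowercase word.
   peek_char_fail does not advance the input, so take_while1 starts at the
   same character that was peeked. *)
let first_and_word =
  peek_char_fail >>= fun c ->
  take_while1 is_lower >>| fun word -> (c, word)

(* peek_string 3 sees the first three characters but leaves them unread,
   so the following take 5 still starts at the beginning of the input. *)
let peek_then_take = both (peek_string 3) (take 5)
```

On "hello world", first_and_word yields ('h', "hello"); on "abcdef", peek_then_take yields ("abc", "abcde").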
val char : char -> char t
char c accepts c and returns it.
val not_char : char -> char t
not_char c accepts any character that is not c and returns the matched character.
val any_char : char t
any_char accepts any character and returns it.
val satisfy : (char -> bool) -> char t
satisfy f accepts any character for which f returns true and returns the accepted character. If f returns false, the parser fails, indicating the offending character.
val string : string -> string t
string s accepts s exactly and returns it.
val string_ci : string -> string t
string_ci s accepts s, ignoring case, and returns the matched string, preserving the case of the original input.
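A short sketch of the single-character and string matchers, assuming parse_string ~consume as documented below. The HTTP-style method name and the helper names (is_hex, hex_digit, meth, not_quote) are illustrative, not part of the library.

```ocaml
open Angstrom

let is_hex = function '0' .. '9' | 'a' .. 'f' | 'A' .. 'F' -> true | _ -> false

(* satisfy: one character accepted by a predicate. *)
let hex_digit = satisfy is_hex

(* string_ci matches case-insensitively but returns the input's own casing;
   char ' ' then requires exactly one space, whose result is dropped by <*. *)
let meth = string_ci "get" <* char ' '

(* not_char: any single character except a double quote. *)
let not_quote = not_char '"'
```

For example, meth applied to "GeT /index" returns "GeT", not "get".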
val skip : (char -> bool) -> unit t
skip f accepts any character for which f returns true and discards the accepted character. skip f is equivalent to satisfy f but discards the accepted character.
val skip_while : (char -> bool) -> unit t
skip_while f accepts input as long as f returns true and discards the accepted characters.
val take : int -> string t
take n accepts exactly n characters of input and returns them as a string.
val take_while : (char -> bool) -> string t
take_while f accepts input as long as f returns true and returns the accepted characters as a string.
This parser does not fail. If f returns false on the first character, it will return the empty string.
val take_while1 : (char -> bool) -> string t
take_while1 f accepts input as long as f returns true and returns the accepted characters as a string.
This parser requires that f return true for at least one character of input, and will fail otherwise.
val take_till : (char -> bool) -> string t
take_till f accepts input as long as f returns false and returns the accepted characters as a string.
This parser does not fail. If f returns true on the first character, it will return the empty string.
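A sketch contrasting the three take_* variants on a key=value line, assuming parse_string ~consume as documented below. The format and the names is_digit, pair and leading_digits are hypothetical.

```ocaml
open Angstrom

let is_digit = function '0' .. '9' -> true | _ -> false

(* take_till: the key is everything before the first '='.
   take_while1: the value must contain at least one digit. *)
let pair =
  take_till (fun c -> c = '=') <* char '=' >>= fun key ->
  take_while1 is_digit >>| fun value -> (key, value)

(* take_while never fails: with no leading digits it yields "". *)
let leading_digits = take_while is_digit
```

With this shape, "port=8080" parses to ("port", "8080"), while "port=" fails because take_while1 finds no digit.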
consumed p runs p and returns the contents that were consumed during the parsing as a string.
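A sketch of consumed, assuming parse_string ~consume as documented below: the inner parser checks the shape of a decimal number, and consumed hands back the exact matched text instead of a value rebuilt from the pieces. The name decimal is illustrative.

```ocaml
open Angstrom

let is_digit = function '0' .. '9' -> true | _ -> false

(* Validate "digits.digits" with ordinary combinators (whose results are
   discarded by *>), but return the raw text that was matched. *)
let decimal =
  consumed (take_while1 is_digit *> char '.' *> take_while1 is_digit)
```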
take_bigstring n accepts exactly n characters of input and returns them as a newly allocated bigstring.
take_bigstring_while f accepts input as long as f returns true and returns the accepted characters as a newly allocated bigstring.
This parser does not fail. If f returns false on the first character, it will return the empty bigstring.
take_bigstring_while1 f accepts input as long as f returns true and returns the accepted characters as a newly allocated bigstring.
This parser requires that f return true for at least one character of input, and will fail otherwise.
take_bigstring_till f accepts input as long as f returns false and returns the accepted characters as a newly allocated bigstring.
This parser does not fail. If f returns true on the first character, it will return the empty bigstring.
consumed_bigstring p runs p and returns the contents that were consumed during the parsing as a bigstring.
val advance : int -> unit t
advance n advances the input n characters, failing if the remaining input is less than n.
val end_of_line : unit t
end_of_line accepts either a line feed \n, or a carriage return followed by a line feed \r\n, and returns unit.
val at_end_of_input : bool t
at_end_of_input returns whether the end of input has been reached. This parser always succeeds.
val end_of_input : unit t
end_of_input succeeds if all the input has been consumed, and fails otherwise.
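A sketch of the end-of-line and end-of-input parsers working together, assuming parse_string ~consume as documented below. It requires every line to be terminated; the names line, lines and two_then_end are illustrative.

```ocaml
open Angstrom

(* One line: everything before the terminator, which end_of_line then
   consumes (either "\n" or "\r\n"). *)
let line = take_till (fun c -> c = '\r' || c = '\n') <* end_of_line

(* Every line must be terminated, and nothing may follow the last one. *)
let lines = many line <* end_of_input

(* at_end_of_input never fails; it only reports whether input remains. *)
let two_then_end = take 2 *> at_end_of_input
```

Here "a\r\nb\n" gives ["a"; "b"], while "a\nb" fails: many stops after "a", and end_of_input rejects the unterminated "b".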
val scan : 'state -> ('state -> char -> 'state option) -> (string * 'state) t
scan init f consumes until f returns None. It returns the final state before None and the accumulated string.
val scan_state : 'state -> ('state -> char -> 'state option) -> 'state t
scan_state init f is like scan but only returns the final state before None.
val scan_string : 'state -> ('state -> char -> 'state option) -> string t
scan_string init f is like scan but discards the final state and returns the accumulated string.
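A sketch of stateful scanning, assuming parse_string ~consume as documented below. The state is a boolean recording whether the previous character was a backslash, so an escaped quote does not end the string. Escape sequences are returned verbatim, not decoded. The names string_body, quoted and word_and_length are hypothetical.

```ocaml
open Angstrom

(* State: was the previous character a backslash?
   Scanning stops (f returns None) at the first '"' that is not escaped. *)
let string_body =
  scan_string false (fun escaped c ->
    if escaped then Some false
    else if c = '\\' then Some true
    else if c = '"' then None
    else Some false)

let quoted = char '"' *> string_body <* char '"'

(* scan also returns the final state: here, the length of the first word. *)
let word_and_length =
  scan 0 (fun n c -> if c = ' ' then None else Some (n + 1))
```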
val int8 : int -> int t
int8 i accepts one byte that matches the lower-order byte of i and returns an int.
val any_uint8 : int t
any_uint8 accepts any byte and returns it as an unsigned int8.
val any_int8 : int t
any_int8 accepts any byte and returns it as a signed int8.
module BE : sig ... end
Big endian parsers
module LE : sig ... end
Little endian parsers
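A sketch of the single-byte parsers on a made-up two-byte record: a fixed marker byte 0x7f followed by one data byte. It assumes parse_string ~consume as documented below; the record format and the names signed_delta and raw_byte are hypothetical. The same byte 0xfe reads as -2 signed and 254 unsigned.

```ocaml
open Angstrom

(* int8 checks the marker byte; *> discards its result. *)
let signed_delta = int8 0x7f *> any_int8

(* The same layout, but the data byte is read as unsigned. *)
let raw_byte = int8 0x7f *> any_uint8
```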
option v p runs p, returning the result of p if it succeeds and v if it fails.
both p q runs p followed by q and returns both results in a tuple.
list ps runs each p in ps in sequence, returning a list of the results of each p.
many p runs p zero or more times and returns a list of results from the runs of p.
many1 p runs p one or more times and returns a list of results from the runs of p.
many_till p e runs parser p zero or more times until parser e succeeds, and returns the list of results from the runs of p.
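A sketch of the repetition combinators, assuming parse_string ~consume as documented below. The HTML-style comment syntax and the names string_of_chars, comment, abc and xs are illustrative. Note that many_till consumes the terminator e but does not include it in the result.

```ocaml
open Angstrom

let string_of_chars cs = String.of_seq (List.to_seq cs)

(* Characters up to the closing "-->"; the terminator is consumed, not returned. *)
let comment =
  string "<!--" *> many_till any_char (string "-->") >>| string_of_chars

(* list runs a fixed sequence of parsers; many1 needs at least one match. *)
let abc = list [char 'a'; char 'b'; char 'c']
let xs = many1 (char 'x')
```

For instance, comment applied to "<!-- hi -->rest" returns " hi " and leaves "rest" unread.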
sep_by s p runs p zero or more times, interspersing runs of s in between.
sep_by1 s p runs p one or more times, interspersing runs of s in between.
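A sketch of separated lists and defaults, assuming parse_string ~consume as documented below. The comma-with-optional-spaces format and the names integer, spaces, int_list, int_list1 and maybe_int are hypothetical.

```ocaml
open Angstrom

let is_digit = function '0' .. '9' -> true | _ -> false
let integer = take_while1 is_digit >>| int_of_string
let spaces = skip_while (fun c -> c = ' ')

(* Integers separated by commas, with optional spaces around each comma.
   sep_by accepts an empty list; sep_by1 requires at least one element. *)
let int_list = sep_by (spaces *> char ',' <* spaces) integer
let int_list1 = sep_by1 (spaces *> char ',' <* spaces) integer

(* option supplies a default when the parser fails. *)
let maybe_int = option 0 integer
```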
fix f computes the fixpoint of f and runs the resultant parser. The argument that f receives is the result of fix f, which f must use, paradoxically, to define fix f.
fix is useful when constructing parsers for inductively-defined types such as sequences, trees, etc. Consider for example the implementation of the many combinator defined in this library:
```ocaml
open Angstrom

let cons x xs = x :: xs  (* prepend one result to the rest *)

let many p =
  fix (fun m -> (cons <$> p <*> m) <|> return [])
```
many p is a parser that will run p zero or more times, accumulating the result of every run into a list, returning the result. It's defined by passing fix a function. This function assumes its argument m is a parser that behaves exactly like many p. You can see this in the expression comprising the left-hand side of the alternative operator <|>. This expression runs the parser p followed by the parser m, after which the result of p is cons'd onto the list that m produces. The right-hand side of the alternative operator provides a base case for the combinator: if p fails and the parse cannot proceed, return an empty list.
Another way to illustrate the uses of fix is to construct a JSON parser. Assuming that parsers exist for the basic types such as false, true, null, strings, and numbers, the question then becomes how to define parsers for objects and arrays. Both contain values that are themselves JSON values, so it seems as though it's impossible to write a parser that will accept JSON objects and arrays before writing a parser for JSON values as a whole.
This is the exact situation that fix was made for. By defining the parsers for arrays and objects within the function that you pass to fix, you will gain access to a parser that you can use to parse JSON values, the very parser you are defining!
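```ocaml
let json =
  fix (fun json ->
    (* str and num are the string and number parsers assumed above;
       "..." marks parts elided from this sketch *)
    let arr = char '[' *> sep_by (char ',') json <* char ']' in
    let obj = char '{' *> ... json ... <* char '}' in
    choice [str; num; arr; obj; ...])
```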
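The JSON sketch leaves parts elided, so here is a complete, runnable variant of the same idea on a deliberately tiny grammar: a hypothetical value type with only integers and arrays, and no whitespace handling. It assumes parse_string ~consume as documented below. Inside the function passed to fix, the parser value refers to the very parser being defined, which is what lets arrays nest.

```ocaml
open Angstrom

(* Hypothetical stand-in for a JSON value: integers and arrays only. *)
type value = Int of int | Arr of value list

let integer =
  take_while1 (function '0' .. '9' -> true | _ -> false) >>| fun s ->
  Int (int_of_string s)

let value : value t =
  fix (fun value ->
    (* [value] is the parser fix is defining, so arrays may contain arrays. *)
    let arr = char '[' *> sep_by (char ',') value <* char ']' >>| fun vs -> Arr vs in
    choice [integer; arr])
```

With this grammar, "[1,[2,3],[]]" parses to Arr [Int 1; Arr [Int 2; Int 3]; Arr []].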
p <|> q runs p and returns the result if it succeeds. If p fails, then the input will be reset and q will run instead.
choice ?failure_msg ts runs each parser in ts in order until one succeeds and returns that result. If none of the parsers succeeds, then the parser will fail with the message failure_msg, if provided, or a much less informative message otherwise.
p <?> name associates name with the parser p, which will be reported in the case of failure.
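A sketch of alternation, assuming parse_string ~consume as documented below. In keyword, the left branch consumes "fo" before failing on 'b'; because <|> resets the input, the right branch still sees "fob" from the start. The names keyword and boolean, and the failure message text, are illustrative.

```ocaml
open Angstrom

(* The left branch fails after consuming "fo"; <|> rewinds before trying
   the right branch, so "fob" is matched from its first character. *)
let keyword =
  (string "fo" *> char 'o' *> return `Foo) <|> (string "fob" *> return `Fob)

(* choice tries alternatives in order; failure_msg replaces the default
   message, and <?> names the parser for error reports. *)
let boolean =
  choice ~failure_msg:"expected true or false"
    [ string "true" *> return true; string "false" *> return false ]
  <?> "boolean"
```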
val commit : unit t
commit prevents backtracking beyond the current position of the input, allowing the manager of the input buffer to reuse the preceding bytes for other purposes.
The Unbuffered parsing interface will report directly to the caller the number of bytes committed when returning an Unbuffered.state.Partial state, allowing the caller to reuse those bytes for any purpose. The Buffered interface will keep track of the region of committed bytes in its internal buffer and reuse that region to store additional input when necessary.
val return : 'a -> 'a t
return v creates a parser that will always succeed and return v.
val fail : string -> _ t
fail msg creates a parser that will always fail with the message msg.
p >>= f creates a parser that will run p, pass its result to f, run the parser that f produces, and return its result.
p >>| f creates a parser that will run p, and if it succeeds with result v, will return f v.
p *> q runs p, discards its result, then runs q and returns its result.
p <* q runs p, then runs q, discards its result, and returns the result of p.
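A sketch of the monadic operators, assuming parse_string ~consume as documented below. The length-prefixed format and the names length_prefixed, shouted and nonempty are hypothetical. The key point is that >>= lets the second parser depend on the result of the first.

```ocaml
open Angstrom

(* A length byte, then exactly that many bytes of payload. *)
let length_prefixed = any_uint8 >>= fun n -> take n

(* *> and <* drop the parentheses; >>| transforms the payload. *)
let shouted = char '(' *> length_prefixed <* char ')' >>| String.uppercase_ascii

(* return and fail decide the outcome without reading any input. *)
let nonempty p = p >>= fun s -> if s = "" then fail "empty" else return s
```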
The liftn family of functions promote functions to the parser monad. For any of these functions, the following equivalence holds:
liftn f p1 ... pn = f <$> p1 <*> ... <*> pn
These functions are more efficient than using the applicative interface directly, mostly in terms of memory allocation but also in terms of speed. Prefer them over the applicative interface, even when the arity of the function to be lifted exceeds the maximum n for which there is an implementation for liftn. In other words, if f has an arity of 5 but only lift4 is provided, do the following:
```ocaml
lift4 f m1 m2 m3 m4 <*> m5
```
Even with the partial application, it will be more efficient than the applicative implementation.
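A sketch of the lift functions, assuming parse_string ~consume as documented below. A big-endian 16-bit integer is built by hand from two bytes, once with lift2 and once with the equivalent applicative form, and a five-argument function is lifted with lift4 plus <*>, following the advice above. The names uint16_be, uint16_be_applicative, sum5 and five_bytes are illustrative.

```ocaml
open Angstrom

let combine hi lo = (hi lsl 8) lor lo

(* Preferred: lift2 promotes the two-argument function directly. *)
let uint16_be = lift2 combine any_uint8 any_uint8

(* Equivalent applicative form, per the equivalence above. *)
let uint16_be_applicative = combine <$> any_uint8 <*> any_uint8

(* Arity 5 with only lift4 available: lift four arguments, apply the fifth. *)
let sum5 a b c d e = a + b + c + d + e
let five_bytes = lift4 sum5 any_uint8 any_uint8 any_uint8 any_uint8 <*> any_uint8
```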
The mapn family of functions are just like liftn, with a slightly different interface: the parsers come first and the function is passed last as the labeled argument ~f.
module Let_syntax : sig ... end
The Let_syntax module is intended to be used with the ppx_let pre-processor, and just contains copies of functions described elsewhere.
module Unsafe : sig ... end
Unsafe Operations on Angstrom's Internal Buffer
module Consume : sig ... end
parse_bigstring ~consume t bs runs t on bs. The parser will receive an `Eof after all of bs has been consumed. Passing Prefix in the consume argument allows the parse to successfully complete without reaching eof. To require the parser to reach eof, pass All in the consume argument.
For use-cases requiring that the parser be fed input incrementally, see the Buffered and Unbuffered modules below.
parse_string ~consume t s runs t on s. The parser will receive an `Eof after all of s has been consumed. Passing Prefix in the consume argument allows the parse to successfully complete without reaching eof. To require the parser to reach eof, pass All in the consume argument.
For use-cases requiring that the parser be fed input incrementally, see the Buffered and Unbuffered modules below.
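A sketch of the two one-shot entry points and the effect of the consume argument. The bigstring is built by hand with the standard Bigarray module, whose char kind matches the bigstring type given at the top of this page; the helper names digits and bigstring_of_string are illustrative.

```ocaml
open Angstrom

let digits = take_while1 (function '0' .. '9' -> true | _ -> false)

(* Copy an OCaml string into a freshly allocated bigstring. *)
let bigstring_of_string s =
  let bs = Bigarray.Array1.create Bigarray.char Bigarray.c_layout (String.length s) in
  String.iteri (fun i c -> Bigarray.Array1.set bs i c) s;
  bs

(* Prefix: the parser may stop early, leaving "abc" unread.
   All: the same parser fails because "abc" is left over. *)
let prefix_result = parse_string ~consume:Consume.Prefix digits "123abc"
let all_result = parse_string ~consume:Consume.All digits "123abc"
let big_result = parse_bigstring ~consume:Consume.All digits (bigstring_of_string "42")
```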
module Buffered : sig ... end
Buffered parsing interface.
module Unbuffered : sig ... end
Unbuffered parsing interface.
For people that know what they're doing. If you want to use them, read the code. No further documentation will be provided.
val pos : int t
val available : int t