String.Escaping
Operations for escaping and unescaping strings, with parameterized escape and escapeworthy characters. Escaping/unescaping using this module is more efficient than using Pcre. Benchmark code can be found in core/benchmarks/string_escaping.ml.
val escape_gen_exn :
escapeworthy_map:(char * char) list ->
escape_char:char ->
(string -> string) Staged.t
escape_gen_exn escapeworthy_map escape_char returns a function that will escape a string s as follows: if (c1,c2) is in escapeworthy_map, then all occurrences of c1 are replaced by escape_char concatenated to c2.
Raises an exception if escapeworthy_map is not one-to-one. If escape_char is not in escapeworthy_map, then it will be escaped to itself.
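As a hedged illustration of the rule above, the following sketch stages an escaping function and applies it to a few strings. The map [('%','p'); ('^','c')] and the sample inputs are made up for this example, and the code assumes it is built against the core library.

```ocaml
open Core

(* Illustrative map: '%' is written as "_p" and '^' as "_c". *)
let escape =
  Staged.unstage
    (String.Escaping.escape_gen_exn
       ~escapeworthy_map:[ '%', 'p'; '^', 'c' ]
       ~escape_char:'_')

(* No escapeworthy chars: the string comes back unchanged. *)
let plain = escape "foo"

(* Each mapped char becomes '_' followed by its replacement: "_cfoo_pbar". *)
let mixed = escape "^foo%bar"

(* '_' is not in the map, so it is escaped to itself: "a__b". *)
let underscore = escape "a_b"

let () = List.iter [ plain; mixed; underscore ] ~f:print_endline
```

Unstaging once and reusing the resulting function is the intended pattern here: the map is validated when the function is staged, not on every call.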
val escape_gen :
escapeworthy_map:(char * char) list ->
escape_char:char ->
(string -> string) Or_error.t
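A minimal sketch of how the Or_error variant might be used when the map comes from untrusted input. The helper try_build and both maps are hypothetical; the second map sends two characters to the same replacement, so it is not one-to-one.

```ocaml
open Core

(* Hypothetical helper: build an escaper and apply it to a fixed sample,
   or return None if the map is rejected. *)
let try_build escapeworthy_map =
  match String.Escaping.escape_gen ~escapeworthy_map ~escape_char:'_' with
  | Ok escape -> Some (escape "a%b")
  | Error _ -> None

(* One-to-one map: yields Some "a_pb". *)
let good = try_build [ '%', 'p' ]

(* Both '%' and '^' map to 'x', so the map is rejected: None. *)
let bad = try_build [ '%', 'x'; '^', 'x' ]
```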
val escape :
escapeworthy:char list ->
escape_char:char ->
(string -> string) Staged.t
escape ~escapeworthy ~escape_char is
escape_gen_exn ~escapeworthy_map:(List.zip_exn escapeworthy escapeworthy)
~escape_char
Duplicates and escape_char will be removed from escapeworthy, so no exception will be raised.
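The sketch below illustrates this with a deliberately sloppy escapeworthy list: '%' appears twice and the escape character '_' is listed as well. The list and the input string are invented for the example.

```ocaml
open Core

(* Duplicates and the escape char in the list are tolerated. *)
let escape =
  Staged.unstage
    (String.Escaping.escape ~escapeworthy:[ '%'; '^'; '%'; '_' ] ~escape_char:'_')

(* Every escapeworthy char, and '_' itself, gains a '_' prefix:
   "50_%__off_^". *)
let escaped = escape "50%_off^"

let () = print_endline escaped
```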
val unescape_gen_exn :
escapeworthy_map:(char * char) list ->
escape_char:char ->
(string -> string) Staged.t
unescape_gen_exn is the inverse operation of escape_gen_exn. That is,
let escape = Staged.unstage (escape_gen_exn ~escapeworthy_map ~escape_char) in
let unescape = Staged.unstage (unescape_gen_exn ~escapeworthy_map ~escape_char) in
assert (s = unescape (escape s))
always succeeds when escapeworthy_map does not cause an exception.
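A concrete instance of this round trip, as a sketch: the map, the escape character, and the sample string are chosen only for illustration.

```ocaml
open Core

(* Illustrative one-to-one map: ',' <-> "_c" and newline <-> "_n". *)
let escapeworthy_map = [ ',', 'c'; '\n', 'n' ]
let escape_char = '_'

let escape =
  Staged.unstage (String.Escaping.escape_gen_exn ~escapeworthy_map ~escape_char)

let unescape =
  Staged.unstage (String.Escaping.unescape_gen_exn ~escapeworthy_map ~escape_char)

let original = "a,b\nc_d"

(* "a_cb_nc__d": the comma, the newline, and '_' itself are all escaped. *)
let escaped = escape original

(* Unescaping with the same map and escape char recovers the input. *)
let round_tripped = unescape escaped
```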
val unescape_gen :
escapeworthy_map:(char * char) list ->
escape_char:char ->
(string -> string) Or_error.t
val unescape : escape_char:char -> (string -> string) Staged.t
unescape ~escape_char is defined as unescape_gen_exn ~escapeworthy_map:[] ~escape_char.
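With an empty map, each escaped character stands for itself, so unescaping amounts to dropping the escaping characters. A short sketch with made-up inputs:

```ocaml
open Core

let unescape = Staged.unstage (String.Escaping.unescape ~escape_char:'_')

(* The '_' before '%' is dropped: "foo%bar". *)
let a = unescape "foo_%bar"

(* A doubled escape char collapses to a single one: "a_b". *)
let b = unescape "a__b"

let () = List.iter [ a; b ] ~f:print_endline
```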
Any char in an escaped string is either escaping, escaped, or literal. For example, for escaped string "0_a0__0" with escape_char as '_', pos 1 and 4 are escaping, 2 and 5 are escaped, and the rest are literal.
is_char_escaping s ~escape_char pos returns true if the char at pos is escaping, false otherwise.
is_char_escaped s ~escape_char pos returns true if the char at pos is escaped, false otherwise.
is_char_literal s ~escape_char pos returns true if the char at pos is not escaped or escaping.
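The classification of "0_a0__0" given above can be reproduced with the three predicates. In this sketch, classify is a hypothetical helper that is not part of the module.

```ocaml
open Core

let s = "0_a0__0"

(* Hypothetical helper: name the status of the char at [pos]. *)
let classify pos =
  if String.Escaping.is_char_escaping s ~escape_char:'_' pos
  then "escaping"
  else if String.Escaping.is_char_escaped s ~escape_char:'_' pos
  then "escaped"
  else if String.Escaping.is_char_literal s ~escape_char:'_' pos
  then "literal"
  else "unclassified"

(* Positions 1 and 4 are escaping, 2 and 5 are escaped, the rest literal. *)
let classes = List.init (String.length s) ~f:classify

let () = List.iteri classes ~f:(fun pos c -> printf "%d: %s\n" pos c)
```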
index s ~escape_char char finds the first literal (not escaped) instance of char in s starting from 0.
rindex s ~escape_char char finds the first literal (not escaped) instance of char in s starting from the end of s and proceeding towards 0.
index_from s ~escape_char pos char finds the first literal (not escaped) instance of char in s starting from pos and proceeding towards the end of s.
rindex_from s ~escape_char pos char finds the first literal (not escaped) instance of char in s starting from pos and proceeding towards 0.
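The four searches differ only in starting point and direction. The sketch below runs them over one made-up string in which the first comma is escaped; it assumes the non-raising variants return an int option.

```ocaml
open Core

(* Positions: a=0 _=1 ,=2 b=3 ,=4 c=5 ,=6 d=7.
   The ',' at position 2 is escaped by the '_' at position 1. *)
let s = "a_,b,c,d"

(* Skips the escaped ',' at 2: Some 4. *)
let first = String.Escaping.index s ~escape_char:'_' ','

(* Searching from the end: Some 6. *)
let last = String.Escaping.rindex s ~escape_char:'_' ','

(* Forward from position 5: Some 6. *)
let after_5 = String.Escaping.index_from s ~escape_char:'_' 5 ','

(* Backward from position 5: Some 4. *)
let before_5 = String.Escaping.rindex_from s ~escape_char:'_' 5 ','

(* The only ',' here is escaped, so there is no literal match: None. *)
let missing = String.Escaping.index "a_,b" ~escape_char:'_' ','
```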
split s ~escape_char ~on returns a list of substrings of s that are separated by literal versions of on. Consecutive on characters will cause multiple empty strings in the result. Splitting the empty string returns a list of the empty string, not the empty list.
E.g., split ~escape_char:'_' ~on:',' "foo,bar_,baz" = ["foo"; "bar_,baz"].
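The three cases described for split (escaped separator, consecutive separators, empty input) can be checked side by side. In this sketch, split_csv is a hypothetical wrapper that fixes the separator and escape character.

```ocaml
open Core

(* Hypothetical wrapper: split on literal ',' with '_' as the escape char. *)
let split_csv s = String.Escaping.split s ~escape_char:'_' ~on:','

(* The escaped ',' stays inside its field: ["foo"; "bar_,baz"]. *)
let doc_example = split_csv "foo,bar_,baz"

(* Two adjacent separators produce an empty field: ["a"; ""; "b"]. *)
let consecutive = split_csv "a,,b"

(* The empty string splits into one empty field: [""]. *)
let empty = split_csv ""
```

Note that the fields are returned still escaped; unescaping each field is a separate step.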
split_on_chars s ~on returns a list of all substrings of s that are separated by one of the literal chars from on. The chars in on are not grouped, so a run of them in the source string will produce multiple empty strings in the result.
E.g., split_on_chars ~escape_char:'_' ~on:[',';'|'] "foo_|bar,baz|0" ->
["foo_|bar"; "baz"; "0"].
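A short sketch of both behaviors, an escaped separator being kept and adjacent separators not being grouped. The second input is made up for the example.

```ocaml
open Core

let on = [ ','; '|' ]

(* The '|' after '_' is escaped, so it does not split:
   ["foo_|bar"; "baz"; "0"]. *)
let parts = String.Escaping.split_on_chars "foo_|bar,baz|0" ~escape_char:'_' ~on

(* ',' and '|' are adjacent but not grouped, leaving an empty string
   between them: ["a"; ""; "b"]. *)
let ungrouped = String.Escaping.split_on_chars "a,|b" ~escape_char:'_' ~on
```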
lsplit2 s ~on ~escape_char splits s into a pair on the first literal instance of on (meaning the first unescaped instance) starting from the left.
rsplit2 s ~on ~escape_char splits s into a pair on the first literal instance of on (meaning the first unescaped instance) starting from the right.
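The difference between the two is easiest to see on a string with several literal separators. This sketch uses an invented input and assumes the non-raising variants return an option.

```ocaml
open Core

(* Positions: k=0 _=1 '='=2 1=3 '='=4 v=5 '='=6 w=7.
   The '=' at position 2 is escaped; those at 4 and 6 are literal. *)
let s = "k_=1=v=w"

(* Splits at position 4: Some ("k_=1", "v=w"). *)
let left = String.Escaping.lsplit2 s ~on:'=' ~escape_char:'_'

(* Splits at position 6: Some ("k_=1=v", "w"). *)
let right = String.Escaping.rsplit2 s ~on:'=' ~escape_char:'_'

(* Only an escaped '=' is present, so there is nothing to split on: None. *)
let none = String.Escaping.lsplit2 "k_=1" ~on:'=' ~escape_char:'_'
```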
lstrip_literal, rstrip_literal, and strip_literal are the same as lstrip, rstrip, and strip for generic strings, except that they only drop literal characters; they do not drop characters that are escaping or escaped. This makes sense if you're trying to get rid of junk whitespace (for example), because escaped whitespace seems more likely to be deliberate and not junk.
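A sketch contrasting the escape-aware stripping functions with plain String.strip, assuming they are exposed as lstrip_literal, rstrip_literal, and strip_literal and drop whitespace by default. The input is invented: the space immediately after '_' is escaped, while the two spaces after it are literal.

```ocaml
open Core

(* Positions: ' '=0 ' '=1 f=2 o=3 o=4 _=5 ' '=6 ' '=7 ' '=8.
   The space at position 6 is escaped by the '_' at position 5. *)
let s = "  foo_   "

(* Literal whitespace is dropped on both sides, the escaped space is kept:
   "foo_ ". *)
let both = String.Escaping.strip_literal s ~escape_char:'_'

(* Only the leading whitespace is dropped: "foo_   ". *)
let left = String.Escaping.lstrip_literal s ~escape_char:'_'

(* Only the trailing literal whitespace is dropped: "  foo_ ". *)
let right = String.Escaping.rstrip_literal s ~escape_char:'_'

(* The generic strip knows nothing about escaping and removes the
   escaped space too: "foo_". *)
let generic = String.strip s
```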