gov.nasa.jpl.aerie.json (parsing-utilities API)

package gov.nasa.jpl.aerie.json

A toolkit for authoring bidirectional JSON parsers with parser combinators.

This library provides the basic building blocks for defining serialization formats for any Java type, together with "parsers" which can convert both to and from these formats. A format is defined grammatically, by composing simpler pre-existing formats into more complex assemblies with the help of general-purpose combiners. We prioritize ease-of-use and expressivity over efficiency: parsers defined with this library may be slower than those defined with other libraries, but our goal is for them to be more maintainable in certain dimensions than alternatives.

We take the position that serialization is a separate and orthogonal concern from domain modeling. Domain types ought not favor a particular mode of serialization, as business needs can easily change over time. Formats can be authored using this library without any need to modify the domain types they operate on. Moreover, where other libraries may infer details about the serialization format by inspecting the class definitions of domain types, we accept some potential overhead in repeating ourselves to cleanly separate the concern of what developers call a field from what clients call it -- and whether their data is structurally organized the same way at all.

We eschew reflection and convention in favor of explicit control over how Java values are modeled in JSON. Other libraries may require little to no configuration by relying on reflection and convention, but solutions for needs outside the happy path begin to look very different (and ad hoc) from where you started. This library tries to provide a uniform and consistent authoring experience: if you can author a simple format, odds are you know everything you need to author a much more complex format.

At the same time, we provide a flexible foundation for building custom parsing logic. It is entirely possible to define a reflection-based parser as a custom implementation of JsonParser, then use it as a building block just like any of the provided parsers. A custom parser of any kind need only implement that interface.

Lastly, we do not parse JSON documents out of strings, but rather work with values of type javax.json.JsonValue. Any library that produces and consumes these values can be used to bridge the last gap from this library to the filesystem or network.

Defining a format

As a running example, consider the type of expressions below, which models a DSL of operations on integers and strings. (Note in particular that expressions are classified into integer-valued expressions and string-valued expressions.) This DSL may admit multiple serialized representations, such as a traditional infix representation (#(1 + 1) .. "3") and a tree-formatted JSON representation, so we do not want to privilege one representation by implementing it as a method on the expression type itself.

public sealed interface Expr<T> {
  record Num(int value) implements Expr<Integer> {}
  record Negate(Expr<Integer> operand) implements Expr<Integer> {}
  record Add(Expr<Integer> left, Expr<Integer> right) implements Expr<Integer> {}

  record Str(String value) implements Expr<String> {}
  record ToString(Expr<Integer> operand) implements Expr<String> {}
  record Concat(Expr<String> left, Expr<String> right) implements Expr<String> {}
}
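
As a rough illustration of keeping a representation outside the type, the infix form mentioned above could be produced by a free-standing renderer. This is only a sketch: the Infix class is hypothetical, the prefix "-" for Negate is an assumed syntax (the text above only shows "+", "#", and ".."), and string escaping is ignored. The Expr type is repeated so the sketch compiles on its own.

```java
// Repeated from above so that the sketch is self-contained.
sealed interface Expr<T> {
  record Num(int value) implements Expr<Integer> {}
  record Negate(Expr<Integer> operand) implements Expr<Integer> {}
  record Add(Expr<Integer> left, Expr<Integer> right) implements Expr<Integer> {}

  record Str(String value) implements Expr<String> {}
  record ToString(Expr<Integer> operand) implements Expr<String> {}
  record Concat(Expr<String> left, Expr<String> right) implements Expr<String> {}
}

// Hypothetical renderer; it lives beside the type rather than inside it.
final class Infix {
  static String render(final Expr<?> expr) {
    if (expr instanceof Expr.Num e) return Integer.toString(e.value());
    // Assumption: negation is written as a prefix "-".
    if (expr instanceof Expr.Negate e) return "-" + render(e.operand());
    // Sums are always parenthesized, so no precedence rules are needed.
    if (expr instanceof Expr.Add e) return "(" + render(e.left()) + " + " + render(e.right()) + ")";
    // Assumption: string contents need no escaping.
    if (expr instanceof Expr.Str e) return "\"" + e.value() + "\"";
    if (expr instanceof Expr.ToString e) return "#" + render(e.operand());
    if (expr instanceof Expr.Concat e) return render(e.left()) + " .. " + render(e.right());
    throw new AssertionError("unreachable: Expr is sealed");
  }
}
```

Because the renderer is an ordinary static method, a JSON format can sit next to it without either one being the "official" representation of Expr.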

A JSON format for this type must capture the top-level alternatives amongst the kinds of expression, the mid-level group of fields within each alternative, and the base-level recursion back to the top. In fact, most types can be thought of as a sum (alternatives) of products (fields) of other types; when the "other type" is our original type, we have recursion. Often, there is only one option or only one field, so this hierarchy simplifies for many types.

Because almost any compound type can be modeled as a sum of products, this library provides general-purpose combiners for describing formats that follow this structure. You may build custom combiners for special needs, or even build parsers without using combiners at all -- especially when working with types that do not break down into independent pieces in this way -- but the provided combiners should be useful in most cases.

Now, let's see how to build up a parser for our Expr type.

final JsonParser<Expr.Num> numP
  = intP
  . map(Expr.Num::new, Expr.Num::value);

final JsonParser<Expr.Str> strP
  = stringP
  . map(Expr.Str::new, Expr.Str::value);

The BasicParsers.intP and BasicParsers.stringP parsers can be statically imported for brevity. They work with the Integer and String types, respectively. In order to adapt these to our custom Expr subtypes, we use the JsonParser.map(gov.nasa.jpl.aerie.json.Convert<T, S>) helper method, which takes two functions: a conversion to the new type from the current type, and a conversion from the new type back to the current type. Here, we are only constructing and deconstructing a wrapper around a single value.
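
To make the role of the two conversion functions concrete, here is a minimal self-contained sketch of a bidirectional map. The TwoWay interface, the MapSketch class, and the stand-alone Num record are hypothetical stand-ins rather than this library's types, and a plain Object plays the part of a JSON value.

```java
import java.util.function.Function;

// Hypothetical stand-in for a two-way parser; NOT the library's JsonParser.
interface TwoWay<T> {
  T parse(Object json);     // JSON-like value -> domain value
  Object unparse(T value);  // domain value -> JSON-like value

  // Adapt this parser to a new type S, given one conversion in each direction.
  default <S> TwoWay<S> map(final Function<T, S> to, final Function<S, T> from) {
    final TwoWay<T> self = this;
    return new TwoWay<S>() {
      @Override public S parse(final Object json) { return to.apply(self.parse(json)); }
      @Override public Object unparse(final S value) { return self.unparse(from.apply(value)); }
    };
  }
}

// A wrapper around a single value, like Expr.Num above.
record Num(int value) {}

final class MapSketch {
  // A primitive two-way parser for integers (assumes the input really is an Integer).
  static final TwoWay<Integer> INT = new TwoWay<>() {
    @Override public Integer parse(final Object json) { return (Integer) json; }
    @Override public Object unparse(final Integer value) { return value; }
  };

  // Parsing wraps the integer in a Num; unparsing unwraps it again.
  static final TwoWay<Num> NUM = INT.map(Num::new, Num::value);
}
```

The first function is used only when parsing and the second only when unparsing, which is why both directions must be supplied.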

static JsonObjectParser<Expr.Negate> negateP(final JsonParser<Expr<Integer>> integerExprP) {
  return productP
      . field("operand", integerExprP)
      . map(Expr.Negate::new, Expr.Negate::operand);
}

static JsonObjectParser<Expr.ToString> toStringP(final JsonParser<Expr<Integer>> integerExprP) {
  return productP
      . field("operand", integerExprP)
      . map(Expr.ToString::new, Expr.ToString::operand);
}

Our next two parsers depend on parsers we haven't defined yet -- the top-level integer expression and string expression parsers. Since the top-level parsers, in turn, depend on these individual parsers, we will have a cyclic dependency to handle no matter where we start. Instead of defining these parsers immediately, we defer their construction until later, passing each one the top-level parser it needs as an argument once we have it. (We'll see how to close the cycle momentarily.)

In addition, notice that we are using the BasicParsers.productP combiner here -- which specifies a JSON object whose fields are described by other parsers -- rather than using integerExprP directly. There are two reasons for this! First, we want parsers to be "productive", which means that they should consume some part of the input before descending into a subparser. This is not always a hard-and-fast rule, but since our grammar is recursive, we must make progress on the input before cycling back to the same point in the grammar. Otherwise, we will have an infinite loop on our hands!

The other reason not to use integerExprP directly is that the operators described by these parsers are simply two of many, and we need a way to distinguish these options from the others. When we collect these parsers together into one parser of alternatives, we will extend them with an additional "op" field taking on a unique value. Notice that these methods return a JsonObjectParser rather than the more generic JsonParser: the former allows the format to be extended with additional fields.

static JsonParser<Expr<Integer>> integerExprP(final JsonParser<Expr<Integer>> integerExprP) {
  @SuppressWarnings("unchecked")
  final var intExprClass = (Class<Expr<Integer>>) (Object) Expr.class;

  return chooseP(
      numP,
      sumP("op", intExprClass, List.of(
          new Variant<>("+", Expr.Add.class, addP(integerExprP)),
          new Variant<>("-", Expr.Negate.class, negateP(integerExprP))
      )));
}

final JsonParser<Expr<Integer>> integerExprP
    = recursiveP(selfP -> integerExprP(selfP));

Here, integerExprP is defined as a method. Just like the previous parsers, its construction depends on a parser that doesn't exist yet -- only, in this case, it depends on itself. The BasicParsers.recursiveP(java.util.function.Function) combiner ties the knot on such a dependency cycle, feeding the given factory function a handle to a mutable location that will, eventually, contain a valid parser -- but only once the factory returns one. For this reason, it is important that the factory not invoke the provided parser, only use it to construct a bigger (productive!) parser.
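
The knot-tying idea can be sketched in a few lines. This is an illustration of the technique under simplifying assumptions, not the implementation of recursiveP: the one-way Parser interface, the Knot class, and the DEPTH example are all hypothetical, and plain Java objects stand in for JSON values.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical one-way parser used only for this sketch; not the library's JsonParser.
interface Parser<T> {
  T parse(Object json);
}

final class Knot {
  // Tie the knot: give the factory a forwarding handle, then point the handle
  // at whatever parser the factory builds.
  static <T> Parser<T> recursive(final Function<Parser<T>, Parser<T>> factory) {
    final class Handle implements Parser<T> {
      Parser<T> target; // the mutable location; null until the factory returns

      @Override
      public T parse(final Object json) {
        if (target == null) throw new IllegalStateException("parser invoked during construction");
        return target.parse(json);
      }
    }

    final Handle handle = new Handle();
    handle.target = factory.apply(handle);
    return handle;
  }

  // Example: the nesting depth of a JSON-like value. A list is one level deeper
  // than its deepest element; anything else has depth zero.
  static final Parser<Integer> DEPTH = Knot.<Integer>recursive(self -> json ->
      json instanceof List<?> items
          ? 1 + items.stream().mapToInt(self::parse).max().orElse(0)
          : 0);
}
```

In this sketch the factory only captures self inside a lambda, so self is first invoked after the handle has been filled in. A factory that called self.parse eagerly would hit the IllegalStateException instead.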

The integerExprP parser itself is built using two different combiners for handling alternatives. The first, BasicParsers.chooseP(gov.nasa.jpl.aerie.json.JsonParser[]), models an untagged sum: it can't tell immediately which alternative a particular JSON document is a representation of, so it attempts each subparser in turn until it finds one that works. It is very easy to accidentally define overlapping alternatives; be careful to ensure that values covered by one alternative are not covered by another!
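
The hazard of overlapping alternatives is easy to reproduce in miniature. The sketch below is not chooseP itself: Attempt and Choose are hypothetical names, an empty Optional stands in for a parse failure, and plain Java objects stand in for JSON values.

```java
import java.util.Optional;

// Hypothetical fallible parser for this sketch; an empty Optional signals failure.
interface Attempt<T> {
  Optional<T> parse(Object json);
}

final class Choose {
  // Untagged sum: try each alternative in order and keep the first success.
  @SafeVarargs
  static <T> Attempt<T> choose(final Attempt<T>... options) {
    return json -> {
      for (final Attempt<T> option : options) {
        final Optional<T> result = option.parse(json);
        if (result.isPresent()) return result;
      }
      return Optional.empty();
    };
  }

  // Two deliberately overlapping alternatives: every Integer is also a Number.
  static final Attempt<String> WHOLE =
      json -> json instanceof Integer ? Optional.of("whole") : Optional.empty();
  static final Attempt<String> NUMBER =
      json -> json instanceof Number ? Optional.of("number") : Optional.empty();
}
```

With these definitions, choose(WHOLE, NUMBER) and choose(NUMBER, WHOLE) classify the value 1 differently, so the outcome silently depends on the order of the alternatives. Disjoint alternatives avoid this.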

The SumParsers.sumP(java.lang.String, java.lang.Class, java.util.List) combiner, on the other hand, is a tagged sum: it associates with every alternative an extra field, whose fixed value is distinct for each alternative. This makes it fast and reliable to determine which subparser applies to a particular document, but makes it less general than chooseP. The sumP combiner is also specialized to situations where the domain types are described by a subclass hierarchy; it cannot be used for modeling, say, Optional, whose alternatives are not detected with instanceof.
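
A tagged sum can be pictured as a lookup on the tag followed by a single delegated parse. The following sketch shows that dispatch only; it is not how sumP is implemented. The Op type, the Tagged class, and the use of Map<String, Object> as a stand-in for a JSON object are all assumptions made for the illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical domain type for this sketch.
sealed interface Op {
  record Add(int left, int right) implements Op {}
  record Negate(int operand) implements Op {}
}

final class Tagged {
  // Tagged sum: read the tag field, then hand the object to exactly one variant.
  // An empty Optional means the tag was missing or unrecognized.
  static <T> Function<Map<String, Object>, Optional<T>> tagged(
      final String tagField,
      final Map<String, Function<Map<String, Object>, T>> variants) {
    return object -> {
      final Object tag = object.get(tagField);
      if (tag == null) return Optional.empty();
      final Function<Map<String, Object>, T> variant = variants.get(tag);
      return variant == null ? Optional.empty() : Optional.of(variant.apply(object));
    };
  }

  // The value of "op" decides which of the remaining fields are read.
  static final Function<Map<String, Object>, Optional<Op>> OP = Tagged.<Op>tagged("op", Map.of(
      "+", o -> new Op.Add((Integer) o.get("left"), (Integer) o.get("right")),
      "-", o -> new Op.Negate((Integer) o.get("operand"))));
}
```

Unlike the try-each-in-turn strategy, no variant is ever attempted speculatively: the tag alone selects it.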

Our expression parser uses sumP to describe operations using an "op" field, whose value (either "+" or "-") determines the remaining fields. We use chooseP to allow numbers to be written directly, rather than wrapping them in an extra object like the other cases. The alternatives under chooseP have no overlap in either the JSON format or the Java type hierarchy, so this is a safe use of chooseP.

The case for string expressions is directly analogous:

static JsonParser<Expr<String>> stringExprP(final JsonParser<Expr<String>> stringExprP) {
  @SuppressWarnings("unchecked")
  final var stringExprClass = (Class<Expr<String>>) (Object) Expr.class;

  return chooseP(
      strP,
      sumP("op", stringExprClass, List.of(
          new Variant<>("++", Expr.Concat.class, concatP(stringExprP)),
          new Variant<>("$", Expr.ToString.class, toStringP(integerExprP))
      )));
}

final JsonParser<Expr<String>> stringExprP
    = recursiveP(selfP -> stringExprP(selfP));

Now, if stringExprP models the root of our expression grammar, we can invoke JsonParser.parse(javax.json.JsonValue) on it to convert a JSON document into an Expr<String>, or invoke JsonParser.unparse(java.lang.Object) to convert an Expr<String> into a JSON document. As a bonus, the JsonParser.getSchema() method will produce a JSON Schema-compliant document describing the class of JSON documents modeled by this parser!

Class                 Description
BasicParsers          A namespace for primitive parsers and essential combinators.
Breadcrumb
Breadcrumb.BreadcrumbVisitor<Result>
Convert<S,T>          An infallible two-way conversion between types S and T.
JsonObjectParser<T>
JsonParser<T>         An interface for two-way conversion between JSON documents and domain objects.
JsonParseResult<T>
JsonParseResult.Failure<T>
JsonParseResult.FailureReason
JsonParseResult.Success<T>
PathJsonParser
ProductParsers
ProductParsers.EmptyProductParser
ProductParsers.VariadicProductParser<T>
SchemaCache
SumParsers
SumParsers.Variant<T>
Uncurry               A set of utility functions for transforming the fields of an object parser.
Uncurry.Function3<Result,T1,T2,T3>
Uncurry.Function4<Result,T1,T2,T3,T4>
Uncurry.Function5<Result,T1,T2,T3,T4,T5>
Uncurry.Function6<Result,T1,T2,T3,T4,T5,T6>
Uncurry.Function7<Result,T1,T2,T3,T4,T5,T6,T7>
Uncurry.Function8<Result,T1,T2,T3,T4,T5,T6,T7,T8>
Uncurry.Function9<Result,T1,T2,T3,T4,T5,T6,T7,T8,T9>
Unit                  A type with only one non-null value.
