The Unified Navigation Language (UNL) is designed as an embeddable query language. This specification describes its syntax, data model, type system, and the necessary components of an execution environment for integration into a host programming language. UNL provides a single, coherent syntax to query, navigate, and transform data across multiple structured formats, including XML, JSON, CSV, and RDF.
In an increasingly heterogeneous data ecosystem, developers must master multiple query languages (XPath for XML, JSONPath for JSON, SPARQL for RDF, etc.). UNL was designed to eliminate this complexity by offering a unified abstraction layer.
UNL's key innovations are:
- `?`: A unique syntax to elegantly handle variable or incomplete data structures.
- `@`: Provides fundamental semantic clarity by distinguishing between navigable containers and final values.
- `outer()`, `root()`: Enable complex, type-based navigation jumps that are difficult to achieve with other languages.
- `|format`: The pipe operator allows for on-the-fly data conversion within a single navigation expression.

A UNL query is always evaluated against an initial context. The initial context can be a single document or a sequence of documents.
UNL is designed to be embedded within a host programming language (e.g., Python, JavaScript, Java). A UNL engine implementation MUST be provided with an Execution Environment by the host. This environment supplies the context and configurations necessary to execute a query.
The components of the Execution Environment are:
- To resolve prefixed names (e.g., foaf:name), the host environment MUST provide a mapping of prefixes to their full namespace URI strings. This is typically a hash map or dictionary. The UNL engine uses this map to resolve EQNames.
- A query can reference variables (e.g., $username). The host environment can provide a map of variable names to their values. These values are mapped to UNL's conceptual data types, and can be primitive Leaves (String, Number, Boolean) or structured Nodes (Object, Array).
// Pseudo-code for variable binding
let variables = {
"user_id": 123,
"allowed_roles": ["admin", "editor"]
};
// Use a numeric variable in a predicate
unl.run(context, "users[@id = $user_id]", { variables });
// Use an array variable in a predicate
unl.run(context, "users[role = $allowed_roles]", { variables });
// Pseudo-code for registering and using a custom function
// my_validators.js
const is_internal_email = (email_string) => email_string.endsWith('@example.com');
// Main application
unl.registerFunctionLibrary("validate", { isInternal: is_internal_email });
unl.run(context, "//user[validate:isInternal(@email)]");
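As a rough illustration of how a host might bundle these components, the sketch below models the Execution Environment as a plain Python data structure. The class name `ExecutionEnvironment`, its field names, and `lookup_function` are hypothetical; this specification does not define a host API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ExecutionEnvironment:
    """Hypothetical container for what a host hands to a UNL engine."""
    # prefix -> namespace URI, used to resolve prefixed names
    namespaces: Dict[str, str] = field(default_factory=dict)
    # variable name (without the leading '$') -> host value
    variables: Dict[str, Any] = field(default_factory=dict)
    # library prefix -> {function name -> host callable}
    functions: Dict[str, Dict[str, Callable[..., Any]]] = field(default_factory=dict)

    def lookup_function(self, qualified_name: str) -> Callable[..., Any]:
        """Resolve 'prefix:name' to a registered host callable."""
        prefix, _, name = qualified_name.partition(":")
        return self.functions[prefix][name]


env = ExecutionEnvironment(
    namespaces={"foaf": "http://xmlns.com/foaf/0.1/"},
    variables={"user_id": 123, "allowed_roles": ["admin", "editor"]},
    functions={"validate": {"isInternal": lambda email: email.endswith("@example.com")}},
)
is_internal = env.lookup_function("validate:isInternal")
```

Keeping the three maps in one object means a single value can be passed to each query call, which is one plausible way to keep engine state out of global scope.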
UNL includes a powerful two-stage pipeline model to process sequences of results. This allows for both efficient, low-memory streaming and complex, full-sequence aggregations.
Streaming pipeline (|)
The default pipeline, using the single pipe |, operates in a streaming fashion. Each operator processes items one by one as they arrive, typically with minimal memory overhead: O(1), or O(k) for operators that keep a fixed-size buffer of k items. This model is highly efficient for large datasets. It includes format parsing, decoding, and a class of "streamable aggregates".
These operators can produce their result without needing to store the entire sequence in memory. They operate in the standard streaming pipeline.
| Operator | Description | Type Signature |
|---|---|---|
| head(n) | Takes the first n items from the stream and terminates the pipeline. | Sequence(T) → Sequence(T) |
| tail(n) | Maintains a fixed-size buffer to output the last n items once the stream ends. | Sequence(T) → Sequence(T) |
| count | Counts all items in the stream and outputs a single leaf with the total at the end. | Sequence(T) → Leaf(Number) |
| sum | Calculates the sum of all items in a numeric stream. | Sequence(Leaf(Number)) → Leaf(Number) |
| avg | Calculates the average of all items in a numeric stream. | Sequence(Leaf(Number)) → Leaf(Number) |
| min | Finds the minimum value in a stream of comparable items. | Sequence(T) → T |
| max | Finds the maximum value in a stream of comparable items. | Sequence(T) → T |
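The memory behaviour of these operators can be sketched with Python iterators. This is only an illustration of the streaming idea, not an engine implementation; the function names simply mirror the operator names in the table.

```python
from collections import deque
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def head(items: Iterable[T], n: int) -> Iterator[T]:
    """Yield the first n items, then stop pulling from the source."""
    return islice(items, n)


def tail(items: Iterable[T], n: int) -> Iterator[T]:
    """Keep only the last n items in a bounded buffer (O(n) memory)."""
    return iter(deque(items, maxlen=n))


def count(items: Iterable[T]) -> int:
    """Count items one at a time without storing them."""
    return sum(1 for _ in items)


def avg(items: Iterable[float]) -> float:
    """Running average: only the total and the item count are kept."""
    total, seen = 0.0, 0
    for value in items:
        total += value
        seen += 1
    if seen == 0:
        raise ValueError("avg of an empty stream is undefined in this sketch")
    return total / seen
```

Because `head` stops consuming its input after n items, the rest of the source is never read, which is what allows the pipeline to terminate early.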
Aggregation barrier (||)
The double pipe || acts as a blocking barrier. It instructs the engine to stop streaming, collect all results from the preceding pipeline into a full sequence in memory (O(n)), and then pass that complete sequence to the aggregation operators that follow. This is a conscious trade-off made by the user to enable powerful, whole-sequence operations.
These operators MUST be preceded by the || barrier, as they require the entire sequence to be available to perform their work.
| Operator | Description | Type Signature |
|---|---|---|
| order-by(key) | Sorts the entire sequence based on a key expression. | Sequence(T) → Sequence(T) |
| group-by(key) | Groups items in the sequence based on a key expression. | Sequence(T) → Sequence(Node(Group)) |
| distinct | Removes duplicate items from the sequence. | Sequence(T) → Sequence(T) |
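A minimal Python sketch of the barrier follows. It assumes items are hashable and comparable; `barrier`, `order_by`, and `group_by` are illustrative names for the `||` step and the operators above, not part of any defined API.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def barrier(stream: Iterable[T]) -> List[T]:
    """The '||' step: materialize the whole stream in memory (O(n))."""
    return list(stream)


def distinct(items: List[T]) -> List[T]:
    """Drop duplicates, keeping the first occurrence of each item."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def order_by(items: List[T], key: Callable[[T], object]) -> List[T]:
    """Sort the full sequence by the value of a key function."""
    return sorted(items, key=key)


def group_by(items: List[T], key: Callable[[T], Hashable]) -> List[Tuple[Hashable, List[T]]]:
    """Return (key, items) pairs, in order of first appearance of each key."""
    groups: Dict[Hashable, List[T]] = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return list(groups.items())


authors = (name for name in ["Lee", "Ada", "Lee", "Kim"])
result = order_by(distinct(barrier(authors)), key=lambda a: a)
```

The generator `authors` stands in for a streaming pipeline; nothing is sorted or de-duplicated until `barrier` has collected every item.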
The result of a UNL query is the data produced by the final operator in the pipeline. If the pipeline ends with an aggregation operator like order-by or `distinct`, the result is a sequence (an array of nodes or leaves). The task of serializing this final sequence into a specific document format (e.g., by wrapping it in a root element) is left to the calling application or environment.
UNL's data model is based on three core principles.
- The @ prefix: marks access to a Leaf (a final value), as opposed to a navigable Node (a container).
- The | operator: | allows the value of a Leaf to be re-interpreted as a new data source. This operation type-casts the leaf's value, creating a new, navigable Node structure. This mechanism is the key to nested parsing and is fundamental to UNL's power.
.../Leaf(String) |json → New Node(JSON)
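The same idea can be shown in host-language terms. In this Python sketch, which uses only the standard library, a CSV cell holds a JSON document: the cell is an opaque string until it is parsed again, after which it becomes navigable. The sample data is invented for illustration.

```python
import csv
import io
import json

# Outer data: one CSV record whose 'payload' cell holds a JSON document.
raw = 'id,payload\n1,"{""name"": ""Alice"", ""tags"": [""a"", ""b""]}"\n'

rows = list(csv.DictReader(io.StringIO(raw)))
payload_leaf = rows[0]["payload"]        # a string leaf: not navigable yet
payload_node = json.loads(payload_leaf)  # re-interpreted: now a navigable structure
name = payload_node["name"]
```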
UNL operates on a set of conceptual data types defined in the Data Model Mapping appendix. The behavior of comparison and equality operators depends on these types.
The equality operator compares two values. The inequality operator `!=` is defined as the negation of `=`. The rules are applied in order:
Ordered comparisons are primarily defined for primitive, orderable types (Numbers, Strings).
The `order-by(key)` operator uses these comparison rules to sort a sequence. It evaluates the `key` expression for each item in the sequence and then sorts the items by the resulting key values. Because it needs the whole sequence, it must follow the || barrier. The host implementation SHOULD provide options to specify the data type (e.g., numeric vs. text) for sorting to avoid ambiguity.
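The ambiguity is easy to reproduce in any host language. In this small Python illustration, the same three key values sort differently depending on whether they are compared as text or as numbers.

```python
values = ["10", "9", "100"]

as_text = sorted(values)                # lexicographic comparison
as_numbers = sorted(values, key=float)  # numeric comparison of the same keys
```

Here `as_text` is `["10", "100", "9"]` while `as_numbers` is `["9", "10", "100"]`, which is why an explicit type option is useful.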
Many characters have a special syntactic meaning in UNL (e.g., / ? @ * [ ] ( ) | . :). If a node or leaf name in the source data contains one of these characters, it must be enclosed in single (') or double (") quotes to be treated as a literal name.
This quoting mechanism applies to any path segment that is a name test.
Within a quoted name, a quote character of the same kind MUST be escaped with a backslash (\). A literal backslash MUST also be escaped (\\).
// Example 1: JSON key with a forward slash
// Data: { "a/b": { "c": 1 } }
// Query: "a/b"/c/@value
// Example 2: Filename with special characters
// Resource Locator: my-archive.zip
// Path in zip: reports/report-[v1].xml
my-archive.zip|decomp:zip/reports/"report-[v1].xml"|xml//...
// Example 3: XML element name with dots
// Data: <com.example.Node>Value</com.example.Node>
// Query: "com.example.Node"/text()
// Example 4: Quoting a name containing quotes
// Data: { "node with \"quotes\"": 42 }
// Query: "node with \"quotes\""/@value
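A host that builds queries from raw key names needs to apply these escaping rules programmatically. The helpers below are a minimal sketch of one way to do so; `quote_name` and `unquote_name` are hypothetical names, and a dangling trailing backslash is tolerated rather than rejected.

```python
def quote_name(name: str, quote: str = '"') -> str:
    """Wrap a raw name in quotes, escaping backslashes and the quote character."""
    escaped = name.replace("\\", "\\\\").replace(quote, "\\" + quote)
    return f"{quote}{escaped}{quote}"


def unquote_name(literal: str) -> str:
    """Reverse quote_name: strip the quotes and resolve backslash escapes."""
    if len(literal) < 2 or literal[0] not in "\"'" or literal[-1] != literal[0]:
        raise ValueError("not a quoted name")
    out = []
    chars = iter(literal[1:-1])
    for ch in chars:
        # A backslash makes the next character literal.
        out.append(next(chars, "") if ch == "\\" else ch)
    return "".join(out)
```

Backslashes are escaped before quotes in `quote_name`; doing it in the other order would double the backslashes that were just added for the quotes.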
UNL provides robust support for namespaced data formats like XML and RDF. To ensure unambiguous queries, UNL adopts the EQName (Extended QName) notation from [[XPATH-31]].
- QName (e.g., atom:title): The prefix:local-name syntax. Its use is supported but discouraged in favor of URI-qualified names, as it relies on an external context to map the prefix to a namespace URI.
- URI-qualified name (e.g., Q{http://www.w3.org/2005/Atom}title): The Q{namespace-uri}local-name syntax. This is the recommended approach as it includes the full namespace URI directly within the expression, making queries self-contained and unambiguous.

Because RDF predicates are full URIs, using the EQName syntax is the most precise method for navigating RDF graphs.
// EQName syntax is unambiguous and recommended
Q{http://www.w3.org/2005/Atom}feed/Q{http://www.w3.org/2005/Atom}entry/title
// EQName used to query an RDF property
Q{http://xmlns.com/foaf/0.1/}Person/Q{http://xmlns.com/foaf/0.1/}knows
// Full URI used in a predicate value
*[rdf:type = <http://xmlns.com/foaf/0.1/Person>]
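The relationship between the two notations can be sketched as a small expansion step: given a host-supplied prefix map, a prefixed name is rewritten into its self-contained form. The function name `to_uri_qualified` is hypothetical, and leaving unprefixed names unchanged is an assumption of this sketch rather than a rule stated here.

```python
def to_uri_qualified(name: str, namespaces: dict) -> str:
    """Expand 'prefix:local' to 'Q{uri}local' using a host-supplied prefix map."""
    if name.startswith("Q{"):
        return name  # already self-contained
    prefix, sep, local = name.partition(":")
    if not sep:
        return name  # no prefix: left unchanged in this sketch
    if prefix not in namespaces:
        raise KeyError(f"unbound namespace prefix: {prefix!r}")
    return "Q{" + namespaces[prefix] + "}" + local


ns = {"atom": "http://www.w3.org/2005/Atom", "foaf": "http://xmlns.com/foaf/0.1/"}
expanded = to_uri_qualified("atom:title", ns)
```

An unbound prefix raises an error instead of passing through, since silently keeping it would make the query depend on context that does not exist.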
The pipe operator (|) is used for transformations. As described in the Operating Models, the first pipe in a ResourceQuery serves to load and parse a resource. Subsequent pipes transform the current selection, often by type-casting a Leaf's value into a new Node structure, as defined in the Core Principles.
All transformations described below adhere to the Implicit Iteration principle. When the input is a Sequence(T), the output will be a Sequence(U), where the transformation T → U has been applied to each item.
These transformations parse a string-based or binary Leaf or resource into a navigable Node structure.
| Format | Description | Type Signature | Reference |
|---|---|---|---|
| \|xml | Extensible Markup Language. | (ResourceLocator \| Leaf(String \| Binary)) → Node(XML) | W3C XML 1.0 |
| \|exi | Efficient XML Interchange (binary XML). | (ResourceLocator \| Leaf(Binary)) → Node(XML) | W3C EXI 1.0 |
| \|json | JavaScript Object Notation. | (ResourceLocator \| Leaf(String \| Binary)) → Node(Object \| Array) | RFC 8259 |
| \|csv | Comma-Separated Values. | (ResourceLocator \| Leaf(String \| Binary)) → Node(Array) | RFC 4180 |
| \|rdf | Resource Description Framework. | (ResourceLocator \| Leaf(String \| Binary)) → Node(Graph) | W3C RDF 1.1 |
| \|html | HyperText Markup Language. | (ResourceLocator \| Leaf(String \| Binary)) → Node(HTML) | WHATWG HTML |
| \|text | Plain text. Forces a binary leaf to be interpreted as a string. | (ResourceLocator \| Leaf(Binary)) → Leaf(String) | RFC 2046 |
| \|yaml | YAML Ain't Markup Language. | (ResourceLocator \| Leaf(String \| Binary)) → Node(Object \| Array) | YAML 1.2.2 |
| \|toml | Tom's Obvious, Minimal Language. | (ResourceLocator \| Leaf(String \| Binary)) → Node(Object) | TOML 1.0.0 |
This operator treats a local directory path as a navigable archive-like structure.
| Format | Description | Type Signature | Reference |
|---|---|---|---|
| \|ls | Lists the contents of a local directory. | ResourceLocator(Directory) → Node(Archive) | N/A |
Decompression (decomp: prefix)
These transformations operate on a resource or a binary Leaf, decompressing it to expose a virtual filesystem of Nodes or a raw binary stream.
| Format | Description | Type Signature | Reference |
|---|---|---|---|
| \|decomp:zip | ZIP file format. | (ResourceLocator \| Leaf(Binary)) → Node(Archive) | PKWARE ZIP |
| \|decomp:tar | Tape Archive. | (ResourceLocator \| Leaf(Binary)) → Node(Archive) | POSIX.1-2017 |
| \|decomp:gz | Gzip compression. | (ResourceLocator \| Leaf(Binary)) → Leaf(Binary) | RFC 1952 |
| \|decomp:7z | 7z archive format. | (ResourceLocator \| Leaf(Binary)) → Node(Archive) | 7-Zip Format |
| \|decomp:rar | Roshal Archive. | (ResourceLocator \| Leaf(Binary)) → Node(Archive) | N/A |
| \|decomp:xz | XZ compression. | (ResourceLocator \| Leaf(Binary)) → Leaf(Binary) | XZ Format |
| \|decomp:bz2 | Bzip2 compression. | (ResourceLocator \| Leaf(Binary)) → Leaf(Binary) | Bzip2 Format |
| \|decomp:zstd | Zstandard compression. | (ResourceLocator \| Leaf(Binary)) → Leaf(Binary) | RFC 8878 |
| \|decomp:brotli | Brotli compression. | (ResourceLocator \| Leaf(Binary)) → Leaf(Binary) | RFC 7932 |
Decoding (decode: prefix)
These are intermediate transformations that type-cast a Leaf's value.
| Format | Description | Type Signature | Reference |
|---|---|---|---|
| \|decode:base64 | Base64 decoding. | Leaf(String) → Leaf(Binary) | RFC 4648 |
| \|decode:hex | Hexadecimal decoding. | Leaf(String) → Leaf(Binary) | RFC 4648 |
| \|decode:url | Percent-decoding. | Leaf(String) → Leaf(String) | RFC 3986 |
| \|decode:html-entities | Decodes HTML/XML character entities. | Leaf(String) → Leaf(String) | WHATWG HTML |
| \|decode:json-string | Un-escapes a string that was itself encoded as a JSON string literal. | Leaf(String) → Leaf(String) | RFC 8259 |
| \|decode:punycode | Decodes Punycode strings (IDN). | Leaf(String) → Leaf(String) | RFC 3492 |
| \|decode:quoted-printable | Decodes Quoted-Printable (MIME) content. | Leaf(String) → Leaf(String) | RFC 2045 |
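Decoding, decompression, and parsing steps compose because each one consumes the type the previous one produces. The Python sketch below walks through one such chain (Base64, then gzip, then XML) using only the standard library; the sample payload and element names are invented for illustration.

```python
import base64
import gzip
import json
import xml.etree.ElementTree as ET

# Build a sample response: XML, gzipped, then Base64-encoded into a JSON field.
xml_bytes = b"<report><important>42</important></report>"
response_text = json.dumps(
    {"payload": base64.b64encode(gzip.compress(xml_bytes)).decode("ascii")}
)

payload = json.loads(response_text)["payload"]  # Leaf(String)
compressed = base64.b64decode(payload)          # Leaf(String) -> Leaf(Binary)
raw = gzip.decompress(compressed)               # Leaf(Binary) -> Leaf(Binary)
root = ET.fromstring(raw)                       # Leaf(Binary) -> navigable tree
important = [el.text for el in root.iter("important")]
```

Each line corresponds to one pipe step, and the comments show how the type signatures in the tables line up from one step to the next.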
Predicates, placed between square brackets [...], are used to filter sets of nodes. Path expressions inside a predicate can be absolute (starting with /) or relative to the current node.
// Filter items where category leaf matches a global configuration value
// The path /config/default_category starts from the document root
//item[@category = /config/default_category]
These functions are used within a predicate to filter based on position in a sequence.
| Function | Description | Type Signature |
|---|---|---|
| position() | Returns the 1-based position of the current item in its sequence. `[n]` is a shorthand for `[position()=n]`. | () → Leaf(Number) |
| last() | Returns the total number of items in the current sequence. | () → Leaf(Number) |
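One way to picture these two functions is as values supplied to the predicate for every item. The Python sketch below is an illustration only; `filter_by_position` is a hypothetical helper. Note that `last()` requires the length of the sequence to be known, so this sketch materializes it first.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def filter_by_position(items: Iterable[T], test: Callable[[int, int], bool]) -> List[T]:
    """Keep items for which test(position, last) is true; position starts at 1."""
    materialized = list(items)
    last = len(materialized)
    return [item for position, item in enumerate(materialized, start=1)
            if test(position, last)]


rows = ["a", "b", "c", "d"]
second = filter_by_position(rows, lambda position, last: position == 2)
final = filter_by_position(rows, lambda position, last: position == last)
```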
Built-in functions are called without a namespace prefix (e.g., text()). Custom functions provided by a host language MUST use a namespace prefix (e.g., myfuncs:my_func()).
These functions manipulate or query the structure of the data model.
| Function | Description | Type Signature |
|---|---|---|
| root() | Returns the root node of the document. Equivalent to starting a path with /. | () → Node |
| outer(selector, n) | Navigates upwards n levels, jumping only over nodes that match the selector. | (Node, String, Number) → Sequence(Node) |
| inner(selector) | Navigates to the terminal elements matching the selector within the current context. | (Node, String) → Sequence(Node) |
These functions operate on sequences of nodes to filter them based on their hierarchical relationships.
| Function | Description | Type Signature |
|---|---|---|
| outermost(nodes) | From a set of nodes, keeps only those that are not contained within other nodes in the set. | (Sequence(Node)) → Sequence(Node) |
| innermost(nodes) | From a set of nodes, keeps only those that do not contain any other nodes from the set. | (Sequence(Node)) → Sequence(Node) |
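The two definitions are mirror images of each other, which a short sketch makes concrete. As a simplifying assumption, each node is identified here by the tuple of names on its path from the root, so containment reduces to a proper-prefix test; a real engine would compare node identities instead.

```python
from typing import List, Tuple

Path = Tuple[str, ...]


def contains(ancestor: Path, node: Path) -> bool:
    """True if `ancestor` is a proper ancestor of `node`."""
    return len(ancestor) < len(node) and node[:len(ancestor)] == ancestor


def outermost(nodes: List[Path]) -> List[Path]:
    """Keep nodes that no other node in the set contains."""
    return [n for n in nodes if not any(contains(other, n) for other in nodes)]


def innermost(nodes: List[Path]) -> List[Path]:
    """Keep nodes that contain no other node in the set."""
    return [n for n in nodes if not any(contains(n, other) for other in nodes)]


selected = [("doc", "section"), ("doc", "section", "section"), ("doc", "aside")]
```

With `selected` as input, the nested section is dropped by `outermost` and the enclosing section is dropped by `innermost`; the `aside` node survives both because it neither contains nor is contained by another selected node.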
These functions provide general utility for inspection and logic within queries.
| Function | Description | Type Signature |
|---|---|---|
| only(elements) | Tests if the context contains only the specified elements. | (Node, Sequence(Node)) → Leaf(Boolean) |
| text() | Returns the text content of a node. | (Node) → Leaf(String) |
| lang() | When used on a literal leaf, returns its language tag as a string. | (Leaf) → Leaf(String) |
| count() | Returns the number of elements in a selection. | (Sequence(T)) → Leaf(Number) |
| type() | Returns the node's type as a string (e.g., "element", "object"). | (Node) → Leaf(String) |
| not(expr) | Negation of a predicate expression. | (Leaf(Boolean)) → Leaf(Boolean) |
The following examples illustrate the two operating models and advanced features.
// In-Memory: Get an attribute from an XML node
doc/book/@isbn
// In-Memory: Get the name from the first object in a JSON array
users[1]/@name
// Resource Loading: Get a column from a specific row in a CSV file
data.csv|csv/*[@id="ABC"]/@name
// Resource Loading: Navigate into a ZIP file to get an XML element
http://example.com/data.zip|decomp:zip/docs/report.xml|xml//title
// Recursive Parsing: A leaf's value is piped into a new parser
// Gets the 2nd tag from a comma-separated string within a CSV cell
users.csv|csv/*[1]/@tags|csv/*[1]/@*[2]
// API Chaining: A payload contains gzipped XML data
// The query decodes, decompresses, and parses the data in one pipeline
api/data|json/@payload|decode:base64|decomp:gz|xml//important
// Sequence Processing: Get all unique, sorted authors from a set of files
"data/*.xml"|xml//author/text()||distinct|order-by(.)
This appendix summarizes common data conversion pathways, showing the UNL operators used to transform an input representation into a desired output structure, as defined in the Data Model Mapping appendix.

- "my_data.json"|json/...
- ...@string_leaf|xml/...
- ...@binary_leaf|json/... (Parser auto-detects encoding)
- ...@gzipped_json_leaf|decomp:gz|json/... (Decompress then parse)
- "my_files.zip"|decomp:zip/...
- ...@binary_zip_leaf|decomp:zip/...
- "my_file.txt"|text
- ...@binary_leaf|text
- .../my_node|text (Serializes the node to its default text representation, e.g., outer XML)
- ...@string_leaf|decode:base64
- .../my_node|exi (Serializes an XML node to binary EXI format)

This appendix provides a non-normative grammar for the Unified Navigation Language, with a syntax conforming to the EBNF standard [[ISO-IEC-14977]].
(* A full query can have an optional, final aggregation stage *)
UNLQuery ::= ( ResourceQuery | InMemoryQuery ) ( '||' AggregationPath )?
(* Form 1: Starts with a resource, requires a parsing transformation *)
ResourceQuery ::= ResourceLocator StreamingPipe ( StreamingPipe )*
(* Form 2: Starts with a path, operates on a pre-existing context *)
InMemoryQuery ::= Path ( StreamingPipe )*
StreamingPipe ::= '|' ( DataFormat | FilesystemOp | DecompTransform | DecodeTransform | StreamingAggregate )
AggregationPath ::= BlockingAggregate ( '|' BlockingAggregate )*
Path ::= ( '/' )? Step ( ( '/' | '?' ) Step )*
Step ::= ( PrimaryStep | Axis ) Predicate*
PrimaryStep ::= NameTest | Wildcard | LeafAccess | IdAccess | '.' | '(' Path ')' | FunctionCall
NameTest ::= EQName | Literal
LeafAccess ::= '@' ( EQName | '*' | Literal )
EQName ::= QName | URIQualifiedName
QName ::= ( NCName ':' )? NCName
URIQualifiedName ::= 'Q{' URILiteral '}' NCName
Wildcard ::= '*' | '**' | '?'
IdAccess ::= '#' NCName
Axis ::= '..' | '..' Integer | '...' | '+' | '-' | '~' | '~~'
Predicate ::= '[' FilterExpression ']'
FilterExpression ::= OrExpression
OrExpression ::= AndExpression ( '|' AndExpression )*
AndExpression ::= EqualityExpression ( '&' EqualityExpression )*
EqualityExpression ::= RelationalExpression ( ( '=' | '!=' ) RelationalExpression )?
RelationalExpression ::= PrimaryFilterExpr ( ( '<' | '>' | '<=' | '>=' | '~' ) PrimaryFilterExpr )*
PrimaryFilterExpr ::= Literal | Variable | FunctionCall | Path | LeafAccess | '!' FilterExpression | '(' FilterExpression ')' | Integer
Variable ::= '$' NCName
FunctionCall ::= EQName '(' ( FilterExpression ( ',' FilterExpression )* )? ')'
(* Operator Definitions *)
DataFormat ::= 'xml' | 'exi' | 'json' | 'csv' | 'rdf' | 'html' | 'text' | 'yaml' | 'toml'
FilesystemOp ::= 'ls'
DecompTransform ::= 'decomp:' ( 'zip' | 'tar' | 'gz' | '7z' | 'gzip' | 'rar' | 'xz' | 'bz2' | 'zstd' | 'brotli' )
DecodeTransform ::= 'decode:' ( 'base64' | 'hex' | 'url' | 'html-entities' | 'json-string' | 'punycode' | 'quoted-printable' )
StreamingAggregate ::= 'count' | 'sum' | 'avg' | 'min' | 'max' | 'head' '(' Integer ')' | 'tail' '(' Integer ')'
BlockingAggregate ::= 'distinct' | 'order-by' '(' Path ')' | 'group-by' '(' Path ')'
(* Lexical Definitions (Informal) *)
ResourceLocator ::= (* A literal string representing a URI or file path. It uses standard '/' separators and does not support UNL operators like '?' or '@'. UNL navigation begins after the first pipe. *)
URILiteral ::= (* A string representing a valid URI, conforming to RFC3986 *)
NCName ::= (* A Non-Colonized Name, as defined in [[XML-NAMES]]. It must not contain ':' and should be compliant with the full Unicode character set allowed by that standard. *)
Integer ::= [0-9]+
Literal ::= '"' ( [^"\\] | '\\' . )* '"' | "'" ( [^'\\] | '\\' . )* "'"
This section defines how UNL's abstract concepts of Node and Leaf are mapped onto the concrete structures of each major supported format. UNL is a 1-based language, following the convention of XPath for all positional indexing.
To describe transformations accurately, UNL uses a set of conceptual data types.
Node
- Node(Object): An unordered collection of key-value pairs, similar to a JSON object.
- Node(Array): An ordered collection of other Nodes or Leaves.
- Node(XML | HTML): A structure compliant with the [[XML-INFOSET]].
- Node(Graph): A structure representing RDF triples.
- Node(Archive): A virtual filesystem root, containing file and directory nodes. This is produced by |decomp: operators and |ls.
- Node(Group): A special node produced by the group-by operator, containing a key and a sequence of items.

Leaf
- Leaf(String): A Unicode string. This is the primary type for textual data.
- Leaf(Binary): A sequence of raw bytes. This is the primary type for non-textual data. Text-based parsers like |xml can also consume a Leaf(Binary) directly by auto-detecting character encoding. The |text operator provides an explicit way to interpret binary data as text.
- Leaf(Number), Leaf(Boolean), Leaf(Null): Primitive data types.

Sequence(T)
- A sequence of items of type T, such as Sequence(Node).

ResourceLocator
- A literal string representing a URI or file path, used as the starting point of a ResourceQuery.

The |xml transformation is a strict parser that produces a navigable structure compliant with the [[XML-INFOSET]]. It will fail on malformed documents.
The |html transformation is a lenient parser that mimics browser behavior. It will attempt to fix errors and will always produce a navigable structure compliant with the [[XML-INFOSET]].
From the perspective of subsequent UNL path navigation, a structure parsed from HTML is indistinguishable from one parsed from well-formed XML (like XHTML). The UNL engine operates on the unified Infoset model.
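The root of a parsed JSON document is a JSON Object ({}) or a JSON Array ([]).

The |csv transformation parses data into an array of nodes. All indexing is 1-based.
- The result of the |csv transformation is a single Array Node. Each record (line) in the CSV is mapped to a child Node within this array.
- When the data has a header row, a cell is accessed by its column name, e.g., /@name. This is the most readable method.
- A cell can also be accessed by position with /@*[n], where n is a 1-based integer. @* selects all leaves, and [n] filters for the n-th position.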
// Example 1: CSV with header (in "users.csv")
id,name,role,tags
1,Alice,admin,"a,b,c"
// Query 1: Get the 'role' leaf from rows where 'id' leaf is "1"
users.csv|csv/*[@id="1"]/@role // Returns leaf "admin"
// Query 2: Get the 2nd tag from Alice's record. This requires a nested parse.
users.csv|csv/*[1]/@tags|csv/*[1]/@*[2] // Returns leaf "b"
// Example 2: Headerless CSV (in "logs.csv")
1687354800,ERROR,auth_service
// Get the 2nd column of the 1st record
logs.csv|csv/*[1]/@*[2] // Returns leaf "ERROR"
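The mapping used in these examples can be approximated in Python with the standard `csv` module. This is a sketch under stated assumptions: the caller says whether a header row is present, each record is a list of (name, value) pairs, and the helper names `parse_csv`, `leaf_by_name`, and `leaf_by_position` are hypothetical.

```python
import csv
import io
from typing import List, Optional, Tuple

Record = List[Tuple[Optional[str], str]]


def parse_csv(text: str, has_header: bool) -> List[Record]:
    """Map each CSV record to a list of (column name or None, value) leaves."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows.pop(0) if has_header and rows else None
    records = []
    for row in rows:
        names = header if header is not None else [None] * len(row)
        records.append(list(zip(names, row)))
    return records


def leaf_by_name(record: Record, name: str) -> str:
    """Counterpart of /@name: look a cell up by its header name."""
    for key, value in record:
        if key == name:
            return value
    raise KeyError(name)


def leaf_by_position(record: Record, n: int) -> str:
    """Counterpart of /@*[n]: n is 1-based."""
    if n < 1:
        raise IndexError("positions are 1-based")
    return record[n - 1][1]


users = parse_csv('id,name,role,tags\n1,Alice,admin,"a,b,c"\n', has_header=True)
tags = parse_csv(leaf_by_name(users[0], "tags"), has_header=False)  # nested parse
```

The second `parse_csv` call re-parses a single cell, which is the host-language counterpart of piping `@tags` into `|csv` again.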
UNL navigates an RDF graph by following predicates (properties).
A property whose value is a literal is accessed as a Leaf, using the @ prefix followed by the EQName of the predicate.
// --- Data (in Turtle syntax) ---
@prefix : <http://example.org/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:book1 a :Book ;
rdfs:label "UNL Specification"@en ;
rdfs:label "Spécification UNL"@fr ;
foaf:maker :person1 .
:person1 a foaf:Person ;
foaf:name "John Doe" .
// --- UNL Queries (In-Memory Mode) ---
// Note the use of @ followed by the full EQName of the property
:book1/foaf:maker/@foaf:name
// Returns leaf "John Doe"
// Filter leaves based on their language tag using the lang() function
:book1/@rdfs:label[lang()="fr"]
// Returns leaf "Spécification UNL"
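To make the navigation concrete, the sketch below models a subset of the Turtle data above as plain Python triples and performs the same two lookups by hand. The `Literal` tuple and the `objects` helper are illustrative only; no RDF library is used.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    value: str
    lang: Optional[str] = None


FOAF = "http://xmlns.com/foaf/0.1/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/ns#"

# (subject, predicate, object) triples for the book and its maker
triples = [
    (EX + "book1", RDFS + "label", Literal("UNL Specification", "en")),
    (EX + "book1", RDFS + "label", Literal("Spécification UNL", "fr")),
    (EX + "book1", FOAF + "maker", EX + "person1"),
    (EX + "person1", FOAF + "name", Literal("John Doe")),
]


def objects(subject: str, predicate: str) -> list:
    """Follow one predicate from a subject and return every object found."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow foaf:maker to a resource, then read its foaf:name literal.
maker = objects(EX + "book1", FOAF + "maker")[0]
maker_name = objects(maker, FOAF + "name")[0].value

# Keep only the rdfs:label literals tagged as French.
french = [lit.value for lit in objects(EX + "book1", RDFS + "label") if lit.lang == "fr"]
```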
Archives (via decomp:) and local filesystems (via |ls) are treated as a virtual filesystem. Both files and directories are modeled as Nodes to allow querying their metadata.
- @name: The name of the file or directory.
- @size: The uncompressed size in bytes (files only).
- @compressed_size: The compressed size in bytes (files only).
- @modified_date: The modification timestamp.
- @is_dir: A boolean that is true if the node is a directory.
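For a local directory, most of these leaves map directly onto what the host filesystem API reports. The Python sketch below shows one possible mapping using `os.scandir`; the dictionary keys mirror the leaf names above, `@compressed_size` is omitted because it does not apply to an uncompressed directory, and treating `~` as a regular-expression match in the final filter is an assumption.

```python
import os
import re
import tempfile
from typing import Dict, List


def ls(directory: str) -> List[Dict[str, object]]:
    """List a directory as nodes carrying the metadata leaves described above."""
    nodes = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        info = entry.stat()
        nodes.append({
            "name": entry.name,
            "is_dir": entry.is_dir(),
            "size": None if entry.is_dir() else info.st_size,  # files only
            "modified_date": info.st_mtime,
        })
    return nodes


# Build a throwaway directory so the sketch is self-contained.
with tempfile.TemporaryDirectory() as src:
    os.mkdir(os.path.join(src, "lib"))
    for filename in ("app.js", "util.js", "README.md"):
        with open(os.path.join(src, filename), "w") as handle:
            handle.write("x")
    listing = ls(src)
    js_files = [n for n in listing if not n["is_dir"] and re.search(r"\.js$", n["name"])]
    js_count = len(js_files)
```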
// Example 1: List contents of a local directory
"./src"|ls/*
// Example 2: Get the size of a specific file in a ZIP archive.
my_archive.zip|decomp:zip/docs/report.xml/@size
// Example 3: Filter files in a local directory by metadata, then count the matches.
"./src"|ls/*[@is_dir=false() & @name ~ "\.js$"]|count