Introduction
Software ecosystems have evolved into highly interconnected networks of components, packages, and dependencies. Managing this complexity demands a robust, uniform mechanism to identify and track software packages across diverse ecosystems and tools. Package-URL (PURL) was developed to address this challenge by providing a simple, consistent, and flexible approach to identifying software packages with precision and clarity.
PURL introduces a standardized URL-based syntax that uniquely identifies software packages, independent of their ecosystem or distribution channel. Unlike traditional identification methods, PURL embeds critical metadata directly into its structure, enabling efficient, accurate package identification at scale. This standardization ensures interoperability between tools and ecosystems, fostering greater collaboration and reducing ambiguity in software supply chain management.
Challenges addressed by PURL:
- Ambiguity in Package Identification: With diverse naming conventions across ecosystems, identifying software packages reliably has historically been a challenge. PURL eliminates this ambiguity by creating a universal identifier with a predictable structure.
- Cross-Ecosystem Interoperability: Developers, organizations, and tools often work across multiple ecosystems, each with its own package management systems. PURL harmonizes these differences, enabling seamless interoperability.
- Enhanced Traceability and Risk Management: In an era where supply chain security is critical, PURL provides the foundation for identifying and tracing packages to their origins, dependencies, and potential vulnerabilities.
- Tooling and Automation: By standardizing package identification, PURL simplifies tooling development, automation, and integration for tasks such as software composition analysis, vulnerability management, and license compliance.
As software supply chain security becomes a global priority, formalizing PURL as an international standard ensures its adoption and consistent implementation. Standardization under Ecma International Technical Committee 54 (TC54) positions PURL as a foundational building block for secure, transparent, and efficient software ecosystems worldwide.
By enabling a universally recognized and implementable specification, PURL aligns with global efforts to improve the security, reliability, and accountability of software supply chains. Its adoption ensures that organizations and developers can rely on a common language to manage software packages across the diverse and rapidly evolving software landscape.
What is a PURL
PURL stands for package URL.
A PURL is a URL composed of seven components:
scheme:type/namespace/name@version?qualifiers#subpath
Components are separated by a specific character for unambiguous parsing.
Table 1: Components of a PURL
| Component | Requirement | Description |
|---|---|---|
| scheme | Required | The URL scheme with the constant value of “pkg”. One of the primary reasons for this single scheme is to facilitate the future official registration of the “pkg” scheme for package URLs. |
| type | Required | The package “type” or package “protocol” such as maven, npm, nuget, gem, pypi, etc. |
| namespace | Optional | A name prefix such as a Maven groupid, a Docker image owner, a GitHub user or organization. Namespace is type-specific. |
| name | Required | The name of the package. |
| version | Optional | The version of the package. |
| qualifiers | Optional | Qualifier data for a package such as OS, architecture, repository, etc. Qualifiers are type-specific. |
| subpath | Optional | Subpath within a package, relative to the package root. |
Components are designed such that they form a hierarchy from the most significant on the left to the least significant components on the right.
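As a rough illustration of Table 1, the seven components could be modeled as a small record type. The class name PackageURL, the field defaults, and the choice of a dataclass are assumptions made for this sketch; the specification does not prescribe any data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class PackageURL:
    """Hypothetical container for the components listed in Table 1."""

    type: str                          # required, e.g. "npm" or "maven"
    name: str                          # required
    namespace: Optional[str] = None    # optional, type-specific
    version: Optional[str] = None      # optional
    qualifiers: Dict[str, str] = field(default_factory=dict)  # optional
    subpath: Optional[str] = None      # optional, relative to the package root
    scheme: str = "pkg"                # constant for every PURL


# Components of pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie
curl = PackageURL(
    type="deb",
    namespace="debian",
    name="curl",
    version="7.50.3-1",
    qualifiers={"arch": "i386", "distro": "jessie"},
)
```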
A PURL must not contain a URL Authority, i.e. there is no support for
username, password, host and port components. A namespace
segment may sometimes look like a host, but its interpretation is
specific to a type.
Some PURL examples
pkg:bitbucket/birkenfeld/pygments-main@244fd47e07d1014f0aed9c
pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie
pkg:gem/ruby-advisory-db-check@0.12.4
pkg:github/package-url/purl-spec@244fd47e07d1004f0aed9c
pkg:golang/google.golang.org/genproto#googleapis/api/annotations
pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1?packaging=sources
pkg:npm/foobar@12.3.1
pkg:nuget/EnterpriseLibrary.Common@6.0.1304
pkg:pypi/django@1.11.1
pkg:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25
A PURL is a URL
- A PURL is a valid URL and URI that conforms to the applicable URL definitions and specifications.
- This is a valid URL because it is a locator even though it has no Authority URL component: each type has a default repository location when defined.
- The PURL components are mapped to these URL components:
  - PURL scheme: this is a URL scheme with a constant value: pkg
  - PURL type, namespace, name and version components: these are collectively mapped to a URL path
  - PURL qualifiers: this maps to a URL query
  - PURL subpath: this is a URL fragment
  - In a PURL, there is no support for a URL Authority (e.g. no username, password, host and port components).
- Special URL schemes as defined in https://url.spec.whatwg.org/ such as file://, https://, http:// and ftp:// are not valid PURL types. They are valid URL or URI schemes but they are not PURL. They may be used to reference URLs in separate attributes outside of a PURL or in a PURL qualifier.
- Version control system (VCS) URLs such as git://, svn://, hg:// or as defined in Python pip or SPDX download locations are not valid PURL types. They are valid URL or URI schemes but they are not PURL. They are a closely related, compact and uniform way to reference VCS URLs. They may be used as references in separate attributes outside of a PURL or in a PURL qualifier.
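The mapping between PURL components and URL components can be observed with a general-purpose URL parser. The sketch below uses Python's standard urllib.parse.urlsplit on two of the example PURLs from this document; it is only meant to illustrate the mapping, and a generic URL parser is not a substitute for the PURL parsing rules.

```python
from urllib.parse import urlsplit

maven = urlsplit("pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1?packaging=sources")
golang = urlsplit("pkg:golang/google.golang.org/genproto#googleapis/api/annotations")

print(maven.scheme)     # URL scheme    <- PURL scheme
print(maven.netloc)     # URL Authority <- always empty in a PURL
print(maven.path)       # URL path      <- type/namespace/name@version
print(maven.query)      # URL query     <- qualifiers
print(golang.fragment)  # URL fragment  <- subpath
```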
Rules for each PURL component
A PURL string is an ASCII URL string composed of seven components.
Except as expressly stated otherwise in this section, each component:
- May be composed of any of the characters defined in the “Permitted characters” section
- Must be encoded as defined in the “Character encoding” section
The “lowercase” rules are defined in the “Case folding” section.
The rules for each component are:
- scheme:
  - The scheme is a constant with the value “pkg”.
  - The scheme must be followed by an unencoded colon ‘:’.
  - PURL parsers must accept URLs where the scheme and colon ‘:’ are followed by one or more slash ‘/’ characters, such as ‘pkg://’, and must ignore and remove all such ‘/’ characters.
- type:
  - The package type must be composed only of ASCII letters and numbers, period ‘.’, and dash ‘-’.
  - The type must start with an ASCII letter.
  - The type must not be percent-encoded.
  - The type is case insensitive. The canonical form is lowercase.
- namespace:
  - The namespace is optional, unless required by the package’s type definition.
  - If present, the namespace may contain one or more segments, separated by a single unencoded slash ‘/’ character.
  - All leading and trailing slashes ‘/’ are not significant and should be stripped in the canonical form. They are not part of the namespace.
  - Each namespace segment must be a percent-encoded string.
  - When percent-decoded, a segment:
    - Must not contain any slash ‘/’ characters
    - Must not be empty
    - May contain any Unicode character other than ‘/’ unless the package’s type definition provides otherwise.
  - A URL host or Authority must not be used as a namespace. Use a repository_url qualifier instead. Note, however, that for some types the namespace may look like a host.
- name:
  - The name is prefixed by a single slash ‘/’ separator when the namespace is not empty.
  - All leading and trailing slashes ‘/’ are not significant and should be stripped in the canonical form. They are not part of the name.
  - A name must be a percent-encoded string.
  - When percent-decoded, a name may contain any Unicode character unless the package’s type definition provides otherwise.
- version:
  - The version is prefixed by a ‘@’ separator when not empty.
  - This ‘@’ is not part of the version.
  - A version must be a percent-encoded string.
  - When percent-decoded, a version may contain any Unicode character unless the package’s type definition provides otherwise.
  - A version is a plain and opaque string.
- qualifiers:
  - The qualifiers component must be prefixed by an unencoded question mark ‘?’ separator when not empty. This ‘?’ separator is not part of the qualifiers component.
  - The qualifiers component is composed of one or more key=value pairs. Multiple key=value pairs must be separated by an unencoded ampersand ‘&’. This ‘&’ separator is not part of an individual qualifier.
  - A key and value must be separated by the unencoded equal sign ‘=’ character. This ‘=’ separator is not part of the key or value.
  - A value must not be an empty string: a key=value pair with an empty value is the same as if no key=value pair exists for this key.
  - For each key=value pair:
    - The key must be composed only of lowercase ASCII letters and numbers, period ‘.’, dash ‘-’ and underscore ‘_’.
    - A key must start with an ASCII letter.
    - A key must not be percent-encoded.
    - Each key must be unique among all the keys of the qualifiers component.
    - A value may contain any Unicode character and all characters must be encoded as described in the “Character encoding” section.
- subpath:
  - The subpath string is prefixed by a ‘#’ separator when not empty
  - The ‘#’ is not part of the subpath
  - The subpath contains zero or more segments, separated by slash ‘/’
  - Leading and trailing slashes ‘/’ are not significant and should be stripped in the canonical form
  - Each subpath segment must be a percent-encoded string
  - When percent-decoded, a segment:
    - Must not contain any slash ‘/’ characters
    - Must not be empty
    - Must not be any of ‘..’ or ‘.’
    - May contain any Unicode character other than ‘/’ unless the package’s type definition provides otherwise.
  - The subpath must be interpreted as relative to the root of the package
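Some of the component rules above are simple enough to check mechanically. The following is a minimal sketch covering only the type, the qualifier key, and a percent-decoded subpath segment; the function names and regular expressions are assumptions of this example, and type-specific rules are not considered.

```python
import re

# type: ASCII letters, digits, '.' and '-'; must start with an ASCII letter.
TYPE_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]*\Z")

# qualifier key: lowercase ASCII letters, digits, '.', '-' and '_';
# must start with a letter.
KEY_RE = re.compile(r"[a-z][a-z0-9._-]*\Z")


def is_valid_type(value: str) -> bool:
    return TYPE_RE.match(value) is not None


def is_valid_qualifier_key(key: str) -> bool:
    return KEY_RE.match(key) is not None


def is_valid_subpath_segment(segment: str) -> bool:
    """Check a percent-decoded subpath segment."""
    return segment not in ("", ".", "..") and "/" not in segment
```

Because the type is case insensitive, is_valid_type accepts uppercase letters; producing the canonical lowercase form is a separate step.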
Permitted characters
A canonical PURL is composed of these permitted ASCII characters:
- the Alphanumeric Characters: A to Z, a to z, 0 to 9,
- the Punctuation Characters: .-_~ (period ‘.’, dash ‘-’, underscore ‘_’ and tilde ‘~’),
- the Percent Character: % (percent sign ‘%’), and
- the Separator Characters: :/@?=&# (colon ‘:’, slash ‘/’, at sign ‘@’, question mark ‘?’, equal sign ‘=’, ampersand ‘&’ and hash sign ‘#’).
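The permitted character classes can be written down as sets, which gives a quick necessary (but not sufficient) check for a canonical PURL string. The constant and function names below are assumptions of this sketch.

```python
import string

ALPHANUMERIC = set(string.ascii_letters + string.digits)
PUNCTUATION = set(".-_~")
PERCENT = {"%"}
SEPARATORS = set(":/@?=&#")
PERMITTED = ALPHANUMERIC | PUNCTUATION | PERCENT | SEPARATORS


def uses_only_permitted_characters(purl: str) -> bool:
    """True when every character belongs to one of the permitted classes.

    This does not validate the structure of the PURL, only its alphabet.
    """
    return all(ch in PERMITTED for ch in purl)
```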
Separators
This is how each of the Separator Characters is used:
- ‘:’ (colon) is the separator between scheme and type
- ‘/’ (slash) is the separator between type, namespace and name
- ‘/’ (slash) is the separator between subpath segments
- ‘@’ (at sign) is the separator between name and version
- ‘?’ (question mark) is the separator before qualifiers
- ‘=’ (equals) is the separator between a key and a value of a qualifier
- ‘&’ (ampersand) is the separator between qualifiers (each being a key=value pair)
- ‘#’ (hash sign) is the separator before subpath
Character encoding
- In the “Rules for each PURL component” section, each component defines when and how to apply percent-encoding and decoding to its content.
- When percent-encoding is required by a component definition, the component string must first be encoded as UTF-8.
- In the component string, each “data octet” must be replaced by the percent-encoded “character triplet” applying the percent-encoding mechanism defined in RFC 3986 section 2.1, including the RFC definition of “data octet” and “character triplet”, and using these definitions for RFC’s “allowed set” and “delimiters”:
  - “allowed set” is composed of the Alphanumeric Characters and the Punctuation Characters
  - “delimiters” is composed of the Separator Characters
- The following characters must not be percent-encoded:
  - the Alphanumeric Characters,
  - the Punctuation Characters,
  - the Separator Characters when being used as PURL separators,
  - the colon ‘:’, whether used as a Separator Character or otherwise, and
  - the percent sign ‘%’ when used to represent a percent-encoded character.
- Where the space ‘ ’ is permitted, it must be percent-encoded as ‘%20’.
- With the exception of the percent-encoding mechanism, the rules regarding percent-encoding are defined by this specification alone.
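A minimal sketch of these encoding rules for a single component value (one namespace segment, a name, a version, a qualifier value, or one subpath segment), assuming Python 3.7 or later, where urllib.parse.quote leaves ASCII letters, digits and ‘_’, ‘.’, ‘-’, ‘~’ unencoded. The helper names percent_encode and percent_decode are assumptions of this example. Separators used as separators are added by the caller and never pass through this helper.

```python
from urllib.parse import quote, unquote


def percent_encode(text: str) -> str:
    # quote() first encodes the string as UTF-8, then replaces every octet
    # outside the allowed set with a %XX triplet. safe=":" keeps the colon
    # unencoded and, unlike the default safe="/", encodes any slash.
    return quote(text, safe=":")


def percent_decode(text: str) -> str:
    # unquote() reverses the triplets and decodes the octets as UTF-8.
    return unquote(text)
```

For example, percent_encode("@angular") gives "%40angular" and percent_encode("a b") gives "a%20b".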
Case folding
References to “lowercase” in this specification refer to the culture-invariant full case mapping defined in Section 3.13.2 of the Unicode Standard.
When applied to the ASCII character set, this operation converts
uppercase Latin letters (A to Z) to their corresponding lowercase
forms (a to z). All other ASCII characters remain unchanged.
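For components that are restricted to ASCII, such as the type and qualifier keys, the ASCII behavior described above can be sketched as follows. The function name ascii_lowercase is an assumption of this example, and the sketch deliberately leaves non-ASCII characters untouched, so it is not a general implementation of the Unicode case mapping.

```python
def ascii_lowercase(text: str) -> str:
    # Map only A-Z to a-z; every other character is returned unchanged.
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in text
    )
```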
Package-URL type definitions
Each package manager, platform, type, or ecosystem has its own conventions and protocols to identify, locate, and provision software packages.
The package type is the component of a Package-URL that is used to
capture this information with a short string such as maven, npm,
nuget, gem, pypi, etc.
PURL type definitions are maintained in a set of JSON Schema files with
a separate file for each purl type. Each purl type has a
corresponding file of automatically generated documentation. There is
also a simple index of all currently registered purl types.
Where to find PURL type information
- In individual JSON files, one for each purl type definition at: https://github.com/package-url/purl-spec/tree/main/types
- As Markdown documentation, generated from each purl type JSON definition at: https://github.com/package-url/purl-spec/tree/main/types-doc
- In the JSON Index listing of all registered PURL types at: https://github.com/package-url/purl-spec/tree/main/purl-types-index.json
How PURL Types are maintained
PURL type definitions are maintained as JSON definition files and JSON test files in the PURL specification repository. These JSON files serve as the source of truth and define the structure of each PURL type, including:
- Namespace and name formatting rules
- Supported qualifiers
- Repository requirements
- Mapping of PURL concepts to the native ecosystem concepts
On commit, a job automatically:
- Checks that all JSON files are schema-valid
- Formats all the JSON files
- Generates the purl-types-index.json file containing a list of defined registered PURL types
- Generates human-readable documentation for each type
How to Propose a New PURL Type
To propose a new PURL type, create an issue and a corresponding pull request in the repository with:
- a new JSON definition file under types/.
- a new JSON test file under tests/types/.
Ensure that your proposal follows the PURL Type Definition Schema and includes all required fields. See README-dev.md for details on running the local checks.
How to build a purl string from its components
Building a purl ASCII string works from left to right, from type to
subpath.
Note: some extra type-specific normalizations are required. See the “Registered types section” for details.
To build a purl string from its components:
- Start a purl string with the “pkg:” scheme as a lowercase ASCII string
- Append the type string to the purl as an unencoded lowercase ASCII string
  - Append ‘/’ to the purl
- If the namespace is not empty:
  - Strip the namespace from leading and trailing ‘/’
  - Split on ‘/’ as segments
  - Apply type-specific normalization to each segment if needed
  - UTF-8-encode each segment if needed in your programming language
  - Percent-encode each segment
  - Join the segments with ‘/’
  - Append this to the purl
  - Append ‘/’ to the purl
  - Strip the name from leading and trailing ‘/’
  - Apply type-specific normalization to the name if needed
  - UTF-8-encode the name if needed in your programming language
  - Append the percent-encoded name to the purl
- If the namespace is empty:
  - Apply type-specific normalization to the name if needed
  - UTF-8-encode the name if needed in your programming language
  - Append the percent-encoded name to the purl
- If the version is not empty:
  - Append ‘@’ to the purl
  - UTF-8-encode the version if needed in your programming language
  - Append the percent-encoded version to the purl
- If the qualifiers are not empty and not composed only of key/value pairs where the value is empty:
  - Append ‘?’ to the purl
  - Build a list from all key/value pairs:
    - Discard any pair where the value is empty.
    - UTF-8-encode each value if needed in your programming language
    - If the key is checksum and this is a list of checksums, join this list with a ‘,’ to create this qualifier value
    - Create a string by joining the lowercased key, the equal ‘=’ sign and the percent-encoded value to create a qualifier
  - Sort this list of qualifier strings lexicographically
  - Join this list of qualifier strings with a ‘&’ ampersand
  - Append this string to the purl
- If the subpath is not empty and not composed only of empty, ‘.’ and ‘..’ segments:
  - Append ‘#’ to the purl
  - Strip the subpath from leading and trailing ‘/’
  - Split this on ‘/’ as segments
  - Discard empty, ‘.’ and ‘..’ segments
  - UTF-8-encode each segment if needed in your programming language
  - Percent-encode each segment
  - Join the segments with ‘/’
  - Append this to the purl
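The building steps can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name build_purl and its signature are assumptions, type-specific normalization is left out entirely, and percent-encoding is delegated to urllib.parse.quote with the colon kept unencoded.

```python
from typing import Optional
from urllib.parse import quote


def _enc(text: str) -> str:
    # UTF-8-encode, then percent-encode everything outside the allowed set.
    return quote(text, safe=":")


def build_purl(
    type: str,
    name: str,
    namespace: Optional[str] = None,
    version: Optional[str] = None,
    qualifiers: Optional[dict] = None,
    subpath: Optional[str] = None,
) -> str:
    purl = "pkg:" + type.lower() + "/"

    # namespace segments (if any), then the name
    ns_segments = [s for s in (namespace or "").strip("/").split("/") if s]
    if ns_segments:
        purl += "/".join(_enc(s) for s in ns_segments) + "/"
    purl += _enc(name.strip("/"))

    if version:
        purl += "@" + _enc(version)

    # qualifiers: drop empty values, lowercase keys, sort, join with '&'
    pairs = []
    for key, value in (qualifiers or {}).items():
        key = key.lower()
        if key == "checksum" and isinstance(value, (list, tuple)):
            value = ",".join(value)
        if value:
            pairs.append(key + "=" + _enc(value))
    if pairs:
        purl += "?" + "&".join(sorted(pairs))

    # subpath: drop empty, '.' and '..' segments
    sub_segments = [
        s for s in (subpath or "").strip("/").split("/")
        if s not in ("", ".", "..")
    ]
    if sub_segments:
        purl += "#" + "/".join(_enc(s) for s in sub_segments)
    return purl
```

For example, build_purl("deb", "curl", namespace="debian", version="7.50.3-1", qualifiers={"distro": "jessie", "arch": "i386"}) yields the deb example shown earlier, with the qualifiers sorted by key.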
How to parse a purl string into its components
Parsing a purl ASCII string into its components works from right to
left, from subpath to type.
Note: some extra type-specific normalizations are required. See the “Registered types section” for details.
To parse a purl string into its components:
- Split the purl string once from right on ‘#’
  - The left side is the remainder
  - Strip the right side from leading and trailing ‘/’
  - Split this on ‘/’
  - Discard any empty string segment from that split
  - Percent-decode each segment
  - Discard any ‘.’ or ‘..’ segment from that split
  - UTF-8-decode each segment if needed in your programming language
  - Join segments back with a ‘/’
  - This is the subpath
- Split the remainder once from right on ‘?’
  - The left side is the remainder
  - The right side is the qualifiers string
  - Split the qualifiers on ‘&’. Each part is a key=value pair
  - For each pair, split the key=value once from left on ‘=’:
    - The key is the lowercase left side
    - The value is the percent-decoded right side
    - UTF-8-decode the value if needed in your programming language
    - Discard any key/value pairs where the value is empty
    - If the key is checksum, split the value on ‘,’ to create a list of checksums
  - This list of key/value pairs is the qualifiers object
- Split the remainder once from left on ‘:’
  - The left side lowercased is the scheme
  - The right side is the remainder
- Strip all leading ‘/’ characters (e.g., ‘/’, ‘//’, ‘///’ and so on) from the remainder
  - Split this once from left on ‘/’
  - The left side lowercased is the type
  - The right side is the remainder
- Split the remainder once from right on ‘@’
  - The left side is the remainder
  - Percent-decode the right side
  - UTF-8-decode the result if needed in your programming language
  - This is the version
- Strip all trailing ‘/’ characters (e.g., ‘/’, ‘//’, ‘///’ and so on) from the remainder
  - Split this once from right on ‘/’
  - The left side is the remainder
  - Percent-decode the right side
  - UTF-8-decode the result if needed in your programming language
  - Apply type-specific normalization if needed
  - This is the name
- Split the remainder on ‘/’
  - Discard any empty segment from that split
  - Percent-decode each segment
  - UTF-8-decode each segment if needed in your programming language
  - Apply type-specific normalization to each segment if needed
  - Join segments back with a ‘/’
  - This is the namespace
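The parsing steps can be sketched as follows. The function name parse_purl, the returned dictionary layout, the use of None for absent components, and raising ValueError for a wrong scheme or a missing type or name are all assumptions of this example; type-specific normalization is omitted.

```python
from urllib.parse import unquote


def _rsplit_once(text: str, sep: str):
    """Split once from the right; the right side is '' when sep is absent."""
    head, found, tail = text.rpartition(sep)
    return (head, tail) if found else (text, "")


def parse_purl(purl: str) -> dict:
    # subpath: everything after the last '#'
    remainder, raw_subpath = _rsplit_once(purl, "#")
    segments = [unquote(s) for s in raw_subpath.strip("/").split("/") if s]
    subpath = "/".join(s for s in segments if s not in (".", "..")) or None

    # qualifiers: everything after the last '?'
    remainder, raw_qualifiers = _rsplit_once(remainder, "?")
    qualifiers = {}
    for pair in raw_qualifiers.split("&"):
        key, _, value = pair.partition("=")
        key, value = key.lower(), unquote(value)
        if not value:
            continue  # pairs with an empty value are discarded
        qualifiers[key] = value.split(",") if key == "checksum" else value

    # scheme, then type (leading slashes after 'pkg:' are ignored)
    scheme, _, remainder = remainder.partition(":")
    if scheme.lower() != "pkg":
        raise ValueError("scheme must be 'pkg'")
    type_, _, remainder = remainder.lstrip("/").partition("/")

    # version, name, namespace
    remainder, raw_version = _rsplit_once(remainder, "@")
    head, _, raw_name = remainder.rstrip("/").rpartition("/")
    name = unquote(raw_name)
    if not type_ or not name:
        raise ValueError("type and name are required")
    namespace = "/".join(unquote(s) for s in head.split("/") if s) or None

    return {
        "type": type_.lower(),
        "namespace": namespace,
        "name": name,
        "version": unquote(raw_version) or None,
        "qualifiers": qualifiers or None,
        "subpath": subpath,
    }
```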
Known purl qualifiers key/value pairs
Note: Do not abuse qualifiers: it can be tempting to use many
qualifier keys but their usage should be limited to the bare minimum for
proper package identification to ensure that a purl stays compact and
readable in most cases.
Additional, separate external attributes stored outside of a purl are
the preferred mechanism to convey extra long and optional information
such as a download URL, VCS URL or checksums in an API, database or web
form.
With this warning, the known key and value defined here are valid
for use in all package types:
- vers allows the specification of a version range. The value MUST adhere to the Version Range Specification. This qualifier is mutually exclusive with the version component. For example: pkg:pypi/django?vers=vers:pypi%2F%3E%3D1.11.0%7C%21%3D1.11.1%7C%3C2.0.0
- repository_url is an extra URL for an alternative, non-default package repository or registry. When a package does not come from the default public package repository for its type, a purl may be qualified with this extra URL. The default repository or registry of a type is documented in the “Registered purl types” section.
- download_url is an extra URL for a direct package web download URL to optionally qualify a purl.
- vcs_url is an extra URL for a package version control system URL to optionally qualify a purl. The syntax for this URL should be as defined in Python pip or the SPDX specification. See https://github.com/spdx/spdx-spec/blob/cfa1b9d08903/chapters/3-package-information.md#37-package-download-location
- file_name is an extra file name of a package archive.
- checksum is a qualifier for one or more checksums stored as a comma-separated list. Each item in the value is in the form lowercase_algorithm:hex_encoded_lowercase_value, such as sha1:ad9503c3e994a4f611a4892f2e67ac82df727086. For example (with checksums truncated for brevity): checksum=sha1:ad9503c3e994a4f,sha256:41bf9088b3a1e6c1ef1d
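Two of these qualifiers are easier to follow with a short example. The sketch below reproduces the vers example by percent-encoding the range string, and splits the truncated checksum value into algorithm/value pairs. The decoded range string is simply the percent-decoded form of the example above, and the variable names are assumptions of this sketch.

```python
from urllib.parse import quote, unquote

# vers: the whole range string is percent-encoded into the qualifier value.
vers_range = "vers:pypi/>=1.11.0|!=1.11.1|<2.0.0"
vers_purl = "pkg:pypi/django?vers=" + quote(vers_range, safe=":")

# Decoding the qualifier value gives the range string back.
decoded_range = unquote(vers_purl.split("?vers=", 1)[1])

# checksum: a comma-separated list of algorithm:value items.
checksum_value = "sha1:ad9503c3e994a4f,sha256:41bf9088b3a1e6c1ef1d"
checksums = dict(item.split(":", 1) for item in checksum_value.split(","))
```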
Tests
To support the language-neutral testing of purl implementations, a
test suite is provided as a JSON document named test-suite-data.json.
This JSON document contains an array of objects. Each object represents
a test with these key/value pairs some of which may not be normalized:
- purl: a purl string.
- canonical: the same purl string in canonical, normalized form
- type: the type corresponding to this purl.
- namespace: the namespace corresponding to this purl.
- name: the name corresponding to this purl.
- version: the version corresponding to this purl.
- qualifiers: the qualifiers corresponding to this purl as an object of {key: value} qualifier pairs.
- subpath: the subpath corresponding to this purl.
- is_invalid: a boolean flag set to true if the test should report an error
To test purl parsing and building, a tool can use this test suite and
for every listed test object, run these tests:
- parsing the test canonical purl then re-building a purl from these parsed components should return the test canonical purl
- parsing the test purl should return the components parsed from the test canonical purl
- parsing the test purl then re-building a purl from these parsed components should return the test canonical purl
- building a purl from the test components should return the test canonical purl
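A test driver for one test object might look like the sketch below. It assumes a hypothetical pair of functions supplied by the implementation under test: parse(purl) returns a dict keyed by type, namespace, name, version, qualifiers and subpath and raises ValueError on invalid input, and build(**components) returns a purl string. Neither the function name check_test_case nor these conventions come from the specification.

```python
from typing import Callable, List

FIELDS = ("type", "namespace", "name", "version", "qualifiers", "subpath")


def check_test_case(
    case: dict,
    parse: Callable[[str], dict],
    build: Callable[..., str],
) -> List[str]:
    """Run the four checks on one test object and return failure messages."""
    if case.get("is_invalid"):
        try:
            parse(case["purl"])
        except ValueError:
            return []
        return ["expected an error for " + repr(case["purl"])]

    canonical = case["canonical"]
    components = {name: case.get(name) for name in FIELDS}
    failures = []
    if build(**parse(canonical)) != canonical:
        failures.append("canonical purl does not round-trip")
    if parse(case["purl"]) != parse(canonical):
        failures.append("purl and canonical purl parse differently")
    if build(**parse(case["purl"])) != canonical:
        failures.append("parse then build does not give the canonical purl")
    if build(**components) != canonical:
        failures.append("building from components does not give the canonical purl")
    return failures
```

A caller would load the array from test-suite-data.json and run check_test_case on each object, collecting any failure messages.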