parsav  parsav.md at [8f954221a1]

File parsav.md artifact ae23203c4b part of check-in 8f954221a1


# parsav

**parsav** is a lightweight fediverse server

## backends
parsav is designed to be storage-agnostic, and can draw data from multiple backends at a time. backends can be enabled or disabled at compile time to avoid unnecessary dependencies.

* postgresql

## dependencies

* runtime
  * mongoose
  * json-c
  * mbedtls
  * **postgresql backend:**
    * postgresql-libs 
* compile-time
  * cmark (commonmark implementation), for transformation of the help files, whose source is in commonmark. online documentation transforms these into html and embeds them in the binary; cmark is also used to to produce the troff source which is used to build the offline documentation. disable with `parsav_online_documentation=no parsav_offline_documentation=no`
  * troff implementation (tested with groff but as far as i know we don't need any groff-specific extensions) to produce PDFs and manpages from the cmark-generated intermediate forms. disable with `parsav_offline_documentation=no`

additional preconfigure dependencies are necessary if you are building directly from trunk, rather than from a release tarball that includes certain build artifacts which need to be embedded in the binary:

* inkscape, for rendering out UI graphics
* cwebp (libwebp package), for transforming inkscape PNGs to webp
* sassc, for compiling the SCSS stylesheet into its final CSS

all builds require terra, which, unfortunately, requires installing an older version of llvm, v9 at the latest (which i develop parsav under). with any luck, your distro will be clever enough to package terra and its dependencies properly (it's trivial on nix, tho you'll need to tweak the terra expression to select a more recent llvm package); Arch Linux is one of those distros which is not so clever, and whose (AUR) terra package is totally broken. due to these unfortunate circumstances, terra is distributed not just in source form, but also in the the form of LLVM IR. distributions will also be made in the form of tarballed object code and assembly listings for various common platforms, currently including x86-32/64, arm7hf, aarch64, riscv, mips32/64, and ppc64/64le.

i've noticed that terra (at least with llvm9) seems to get a bit cantankerous and trigger llvm to fail with bizarre errors when you try to cross-compile parsav from x86-64 to any other platform, even x86-32. i don't know if this problem exists on other architectures or in what form, but as a workaround, the current cross-compile process consists of generating LLVM IR (ostensible for x86-64, though this is in reality an architecture-independent language), and then compiling that down to an object file with llc. this is an enormous hassle; hopefully the terra people will fix this eventually.

also note that, while parsav has a flag to build with ASAN, ASAN has proven unusable for most purposes as it routinely reports false positive buffer-heap-overflows. if you figure out how to defuckulate this, i will be overjoyed.

## building

first, either install any missing dependencies as shared libraries, or build them as static libraries with the command `make dep.$LIBRARY`. as a shortcut, `make dep` will build all dependencies as static libraries. note that if the build system finds a static version of a library in the `lib/` folder, it will use that instead of any system library. note that these commands require GNU make (it may be installed as `gmake` on your system), although this is a fairly soft dependency -- if you really need to build it on BSD make, you can probably translate it with a minute or so of work; you'll just have to do some of the various gmake functions' work manually. this may be worthwhile if you're packaging for a BSD.

postgresql-libs must be installed systemwide, as `parsav` does not currently provide for statically compiling and linking it

if you use nixos and wish to build the pdf documentation, you're going to have to do a bit of extra work (but you're used to that, aren't you). for some incomprehensible reason, the groff package on nix is split up, seemingly randomly, with many crucial output devices relegated to the "perl" output of the package, which is not installed by default (and `nix-env -iA nixos.groff.perl` doesn't work either; i don't know why either). you'll have to instantiate and install the outputs directly by path, e.g. `nix-env -i /nix/store/*groff*/` to get everything you need into your profile. alas, the battle is not over: you also need to change the environment variables `GROFF_FONT_PATH` and `GROFF_TMAC_PATH` to point at the `font` and `tmac` subdirs of `~/.nix-profile/share/groff/$groff_version/`. once this is done, invoking `groff -Tpdf` will work as expected.

## configuring

the `parsav` configuration is comprised of two components: the backends list and the config store. the backends list is a simple text file that tells `parsav` which data sources to draw from. the config store is a key-value store which contains the rest of the server's configuration, and is loaded from the backends. the configuration store can be spread across the backends; backends will be checked for configuration keys according to the order in which they are listed. changes to the configuration store affect parsav in real time; you only need to restart the server if you make a change to the backend list.

eventually, we'll add a command-line tool `parsav-cfg` to enable easy modification of the configuration store from the command line; for now, you'll need to modify the database by hand or use the online administration menu. the schema.sql file contains commands to prompt for various important values like the name of your administrative user.

by default, parsav looks for a file called `backend.conf` in the current directory when it is launched. you can override this default with the `parsav_backend_file` environment or with the `-b`/`--backend-file` flag. `backend.conf` lists one backend per line, in the form `id type confstring`. for instance, if you had two postgresql databases, you might write a backend file like

    master   pgsql   host=localhost dbname=parsav
	tweets   pgsql   host=420.69.dread.cloud dbname=content

the form the configuration string takes depends on the specific backend.

### postgresql backend

currently, postgres needs to be configured manually before parsav can make use of it to store data. the first step is to create a database for parsav's use. once you've done that, you need to create the database schema with the command `$ psql (-h $host) -d $database -f schema.sql`. you'll be prompted for some crucial settings to install in the configuration store, such as the name of the relation you want to use for authentication (we'll call it `parsav_auth` from here on out).

parsav separates the storage of user credentials from the storage of other user data, in order to facilitate centralized user accounting. you don't need to take advantage of this feature, and if you don't want to, you can just create a `parsav_auth` table and have done. however, `parsav_auth` can also be a view, collecting a list of authorized users and their various credentials from whatever source you please.

`parsav_auth` has the following schema:

    create table parsav_auth (
		aid bigint primary key,
		uid bigint,
		newname text,
		kind text not null,
		cred bytea not null,
		restrict text[],
		netmask cidr,
		blacklist bool
    )

`aid` is a unique value identifying the authentication method. it must be deterministic -- values based on time of creation or a hash of `uid`+`kind`+`cred` are ideal. `uid` is the identifier of the user the row specifies credentials for. `kind` is a string indicating the credential type, and `cred` is the content of that credential.for the meaning of these fields and use of this structure, see **authentication** below.

## authentication 
in the most basic case, an authentication record would be something like `{uid = 123, kind = "pw-sha512", cred = "\x12bf90…a10e"::bytea}`. but `parsav` is not restricted to username-password authentication, and in addition to various hashing styles, it also will support more esoteric forms of authentcation. any individual user can have as many auth rows as she likes. there is also a `restrict` field, which is normally null, but can be specified in order to restrict a particular credential to certain operations, such as posting tweets or updating a bio. `blacklist` indicates that any attempt to authenticate that matches this row will be denied, regardless of whether it matches other rows. if `netmask` is present, this authentication will only succeed if it comes from the specified IP mask.

`uid` can also be `0` (not null, which matches any user!), indicating that there is not yet a record in `parsav_actors` for this account. if this is the case, `name` must contain the handle of the account to be created when someone attempts to log in with this credential. whether `name` is used in the authentication process depends on whether the authentication method accepts a username. all rows with the same `uid` *must* have the same `name`.

below is a full list of authentication types we intend/hope to one day support. contributors should consider this a to-do list. a checked box indicates the scheme has been implemented.

* ☑ pw-sha{512,384,256,224}: an ordinary password, hashed with the appropriate algorithm
* ☐ pw-{sha1,md5,clear} (insecure, must be manually enabled at compile time with the config variable `parsav_let_me_be_a_dumbass="i know what i'm doing"`)
* ☐ pw-pbkdf2-hmac-sha{…}: a password hashed with the Password-Based Key Derivation Function 2 instead of plain SHA2
* ☐ pw-extern-ldap: try to authenticate by binding against an LDAP server
* ☐ pw-extern-cyrus: try to authenticate against saslauthd
* ☐ pw-extern-dovecot: try to authenticate against a dovecot SASL socket
* ☐ pw-extern-krb5: abuse MIT kerberos as a password verifier
* ☐ pw-extern-imap: abuse an email server as a password verifier
* (extra credit) ☐ pw-extern-radius: verify a user against a radius server
* ☐ api-digest-sha{…}: a value that can be hashed with the current epoch to derive a temporary access key without logging in. these are used for API calls, sent in the header `X-API-Key`.
* ☐ otp-time-sha1: a TOTP PSK: the first two bytes represent the step, the third byte the OTP length, and the remaining ten bytes the secret key
* ☐ tls-cert-fp: a fingerprint of a client certificate
* ☐ tls-cert-ca: a value of the form `fp/key=value` where a client certificate with the property `key=value` (e.g. `uid=cyberlord19`) signed by a certificate authority matching the given fingerprint `fp` can authenticate the user
* ☐ challenge-rsa-sha256: an RSA public key. the user is presented with a challenge and must sign it with the corresponding private key using SHA256.
* ☐ challenge-ecc-sha256: a Curve25519 public key. the user is presented with a challenge and must sign it with the corresponding private key using SHA256.
* ☐ challenge-ecc448-sha256: a Curve448 public key. the user is presented with a challenge and must sign it with the corresponding private key using SHA256.
* ☑ trust: authentication always succeeds. only use in combination with netmask!!!

## license

parsav is released under the terms of the EUPL v1.2. copies of this license are included in the repository. dependencies are produced

## future direction

parsav needs more storage backends, as it currently supports only postgres. some possibilities, in order of priority, are:

* plain text/filesystem storage
* lmdb
* sqlite3
* generic odbc
* lua
* ldap for auth (and maybe actors?)
* cdb (for static content, maybe?)
* mariadb/mysql
* the various nosql horrors, e.g. redis, mongo, and so on