NAME
    ssoma_repository - repository and tree description for ssoma

DESCRIPTION
    WARNING: this does NOT describe the scalable v2 format used by
    public-inbox. Use of ssoma is not recommended for new installations
    due to scalability problems.

    ssoma uses a git repository to store each email as a git blob. The
    tree filename of the blob is based on the SHA-1 hexdigest of the
    first Message-ID header. A commit is made for each message delivered.
    The commit SHA-1 identifier is used by ssoma clients to track
    synchronization state.

PATHNAMES IN TREES
    A Message-ID may be extremely long and may also contain slashes, so
    using it as a path name is challenging. Instead we use the SHA-1
    hexdigest of the Message-ID (excluding the leading "<" and trailing
    ">") to generate a path name. Leading and trailing white space in the
    Message-ID header is ignored for hashing.

    A message with Message-ID of: <20131106023245.GA20224@dcvr.yhbt.net>

    Would be stored as: f2/8c6cfd2b0a65f994c3e1be266105413b3d3f63

    Thus it is easy to look up the contents of a message matching a given
    Message-ID.

CONFLICTS
    A Message-ID is a unique-enough identifier for practical purposes,
    but Message-IDs may still conflict (especially in case of malicious
    clients and timing issues).

    In the case of identical Message-ID and different messages, the blob
    shall become a tree with multiple messages.

    Likewise, if there is a (rare) SHA-1 conflict on different Message-ID
    headers, the tree will contain each message (with different
    Message-ID headers).

    Thus the blobs for conflicting Message-IDs will be named by the SHA-1
    hexdigest of the Subject header and raw body (no extra whitespace
    delimiting the two).

        PFX=21/4527ce3741f50bb9afa65e7c5003c8a8ddc4b1
        $PFX/287d8b67bf8ebdb30e34cb4ca9995dbd465f37aa # first copy
        $PFX/287d8b67bf8ebdb30e34cb4ca9995dbd465f37ab # second copy
        $PFX/287d8b67bf8ebdb30e34cb4ca9995dbd465f37ac # third copy

    Note: public-inbox currently uses "ssoma-mda -1" to disable this
    conflict resolution feature.
    This simplifies the implementation and use of public-inbox.

HEADERS
    The Message-ID (case-insensitive) header is required. "Bytes",
    "Lines" and "Content-Length" headers are stripped and not allowed, as
    they can interfere with further processing.

    When using ssoma with public-inbox-mda, the "Status" mbox header is
    also stripped, as that header makes no sense in a public archive.

LOCKING
    An exclusive flock(2) lock is taken on the empty $GIT_DIR/ssoma.lock
    file for all non-atomic operations.

EXAMPLE INPUT FLOW (SERVER-SIDE MDA)
    1. Message is delivered to a mail transport agent (MTA)

       1a. (optional) reject/discard spam; this should run before
           ssoma-mda

       1b. (optional) reject/strip unwanted attachments

    ssoma-mda handles all steps once invoked.

    2. Mail transport agent invokes ssoma-mda

    3. reads message via stdin, extracting Message-ID

    4. acquires exclusive flock lock on $GIT_DIR/ssoma.lock

    5. creates or updates the blob of associated 2/38 SHA-1 path

    6. updates the index and commits

    7. releases $GIT_DIR/ssoma.lock

    ssoma-mda can also be used as an inotify(7) trigger to monitor
    maildirs, and the ability to monitor IMAP mailboxes using IDLE will
    be available in the future.

GIT REPOSITORIES (SERVERS)
    ssoma uses bare git repositories on both servers and clients.

    Using the git-init(1) command with --bare is the recommended method
    of creating a git repository on a server:

        git init --bare /path/to/wherever/you/want.git

    There are no standardized paths for servers; administrators make all
    the choices regarding git repository locations.

    Special files in $GIT_DIR on the server:

    $GIT_DIR/ssoma.index
        A git index file used for MDA updates. The normal git index (in
        $GIT_DIR/index) is not used at all as there is typically no
        working tree.

    $GIT_DIR/ssoma.lock
        An empty file for flock(2) locking. This is necessary to ensure
        the index and commits are updated consistently and multiple
        processes running MDA do not step on each other.
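
    The path naming and locking rules described above can be sketched in
    a few lines of Python. This is only an illustration, not ssoma's own
    code: the names ssoma_path and with_ssoma_lock are hypothetical, and
    encoding the Message-ID as UTF-8 before hashing is an assumption.

```python
import fcntl
import hashlib
import os


def ssoma_path(message_id: str) -> str:
    """Map a Message-ID header value to a 2/38 tree path (hypothetical helper)."""
    # Leading and trailing white space is ignored for hashing.
    mid = message_id.strip()
    # The leading "<" and trailing ">" are excluded from the hash input.
    if mid.startswith("<"):
        mid = mid[1:]
    if mid.endswith(">"):
        mid = mid[:-1]
    # Assumption: the Message-ID is hashed as UTF-8 encoded bytes.
    digest = hashlib.sha1(mid.encode("utf-8")).hexdigest()
    # Two hex digits for the directory, the remaining 38 for the file name.
    return digest[:2] + "/" + digest[2:]


def with_ssoma_lock(git_dir, action):
    """Run action() while holding an exclusive flock on $GIT_DIR/ssoma.lock."""
    lock_path = os.path.join(git_dir, "ssoma.lock")
    # Append mode creates the file if it is missing and never writes to it,
    # so the lock file stays empty.
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            # Non-atomic work (blob, index and commit updates) would go here.
            return action()
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

    Calling ssoma_path() with the Message-ID shown under PATHNAMES IN
    TREES gives a path that can be compared against the one listed there.
    A delivery agent would wrap steps 5 and 6 of the input flow in
    with_ssoma_lock() so that concurrent deliveries are serialized.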

GIT REPOSITORIES (CLIENTS)
    ssoma uses bare git repositories for clients (as well as servers).

    The default is to use GIT_DIR=~/.ssoma/$LISTNAME.git in the user's
    home directory. This is a bare git repository with two additional
    files:

    $GIT_DIR/ssoma.lock
        empty lock file, same as used by ssoma-mda(1)

    $GIT_DIR/ssoma.state
        a git-config(1) format file used by ssoma(1)

    Each client $GIT_DIR may have multiple mbox/maildir/command targets.
    It is possible for a client to extract the mail stored in the git
    repository to multiple mboxes for compatibility with a variety of
    different tools.

$GIT_DIR/ssoma.state format
        ; "local" is the default name (analogous to "origin" with remotes)
        [target "local"]
                path = /path/to/mbox
                ; this tells ssoma where to start the next import from
                ; this means ssoma will not redundantly import old
                ; messages and the user is free to move/delete old
                ; messages from the mbox.
                last-imported = 33eaf25f43fd73d8f4f7b0a066b689809d733191

        ; "alt" is a user-defined name, in case a user wants to output
        ; the repo in several formats
        [target "alt"]
                ; note the trailing '/' to denote the maildir path,
                ; the Email::LocalDelivery Perl module depends on this
                ; trailing slash to identify it as a maildir
                path = /path/to/maildir/
                last-imported = 950815b313a4e616c6fe39f46b2e894b51d7d62f

        ; users may also choose to pipe to an arbitrary command of their
        ; choice, this filter may behave like an MDA (and implement
        ; filtering). Tools like procmail(1)/maildrop(1) may be
        ; invoked here.
        [target "script"]
                command = /path/to/executable/which/reads-mail-from-stdin
                last-imported = 950815b313a4e616c6fe39f46b2e894b51d7d62f

EXAMPLE OUTPUT FLOW (CLIENT)
    1. clones or fetches to bare git repo
       (GIT_DIR=~/.ssoma/$LISTNAME.git)

    2. checks for last-imported commit in
       ~/.ssoma/$LISTNAME.git/ssoma.state

    3. diffs last-imported commit with current HEAD

    4. imports new emails to mbox/maildir since last-imported up to
       current HEAD

    5. updates last-imported commit

CAVEATS
    It is NOT recommended to check out the working directory of a git
    repository, as there may be many files.

    It is impossible to completely expunge messages, even spam, as git
    retains full history. Projects may (with adequate notice) cycle to
    new repositories/branches with history cleaned up via
    git-filter-branch(1). This is up to the administrators.

COPYRIGHT
    Copyright 2013-2016 all contributors

    License: AGPL-3.0+

SEE ALSO
    gitrepository-layout(5), ssoma(1)