VaMP's Basic Requirements for Valid Files and Directories

April 19th, 2022 by Diana Coman

Given a directory[1] as input, VaMP will loudly complain and promptly exit if it finds any invalid files or directories in it, meaning specifically any of the following: empty files, duplicate files, non-ordinary files, files not ending with a newline, non-hierarchical structure of directories (for instance via a symlink that creates a loop on any level), empty directories. While some might find VaMP to be therefore quite picky or even *too* picky about what it accepts as valid input, there are informed reasons (see below) for everything included in that list of refused items and this makes all the difference: VaMP is not picky but discerning and in the most helpful way, too, as its explicit requirements enforce and help maintain a clean structure and the clarity that comes with it.

Why no empty files?
Because empty files are worse than meaningless: they are, by definition, void of content, pure form if there ever was such a thing. First of all, if a file is empty, there is nothing to include anyway, so there's no loss in deleting it, is there? Second, since VaMP is concerned with *content*, it follows quite directly that files lacking any content whatsoever are not a valid input for it.

Why no duplicate files?
Because there's no acceptable reason to duplicate *content* rather than reference it in a civilised and considerate manner. And there is a high cost to duplicating content, made all the higher for it being generally paid at a later time and by someone other than the person lazily copying instead of appropriately referencing. Yes, VaMP is by design anti-copying and pro-referencing and I firmly intend it to remain this way.
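
As a rough illustration of what a duplicate check of this kind might look like, here is a minimal Python sketch. It is not VaMP's own code: the function name `find_duplicate_files` is made up for the example, and comparing files by a SHA-256 of their full content is an assumption, since the text above does not say how VaMP decides that two files are duplicates.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicate_files(root):
    """Group ordinary files under root by a SHA-256 digest of their content.

    Hypothetical helper for illustration only; the choice of hash is an
    assumption and not taken from VaMP.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            by_digest[digest].append(path)
    # Keep only the groups in which the same content appears more than once.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists the paths that share identical content, which is the information one would need in order to replace the copies with a reference.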

Why only ordinary files?
Because in the "everything is a file" model, this is the only currently available means - if a feeble and unreliable one - for marking in any way the intent of focusing on text meant to be read and comprehended by people first of all. Ideally this filter would be stronger, directly excluding binary files for instance, but the reality is that there is no reliable way for the machine to decide 100% accurately on what is "binary" and what isn't: from the machine's point of view, *all* content is binary anyway and so it's always, out of necessity, the user's decision to make.
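
For a sense of how weak yet simple this filter is, a check for "ordinary file" can be sketched in a few lines of Python. This is an illustration under assumptions, not VaMP's implementation: the name `is_ordinary_file` is hypothetical, and treating a symlink to a regular file as non-ordinary is a choice made for this example rather than something stated above.

```python
import os
import stat


def is_ordinary_file(path):
    """True only for a regular file; directories, symlinks, FIFOs, sockets
    and device nodes all fail the test.

    Hypothetical helper for illustration only.
    """
    # lstat does not follow symlinks, so a link pointing at a regular file
    # is still rejected here (an assumption of this sketch).
    return stat.S_ISREG(os.lstat(path).st_mode)
```

Note that the check says nothing about whether the content is text or "binary"; as argued above, that part remains the user's decision.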

Why only files ending with a newline?
Because a file not ending with a newline is not a valid file according to the very definition of the "file" format[2]. Moreover and in more practical terms - VaMP always creates correctly formatted files and as a result, whenever it applies a patch, it *will* end the file with a newline, meaning that you risk ending up with a different file just because the "original" was an invalid file to start with. It's not VaMP's job to correct files nor is it desirable that such correction occurs silently anyway.
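
If you want to check your own files before handing them over, the test itself is tiny. The following Python sketch is only an illustration (the name `ends_with_newline` is hypothetical and nothing here is taken from VaMP's code); it looks at the last byte of the file and, as a consequence, reports an empty file as invalid too.

```python
import os


def ends_with_newline(path):
    """True if the file's last byte is a newline; an empty file gives False.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False  # nothing in the file, so no final newline either
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) == b"\n"
```

Only the final byte is read, so the check costs the same regardless of the file's size.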

Why only strictly hierarchical directories?
Because any non-hierarchical structure in the given context is utterly broken and ultimately confusing for someone trying to read and understand the content. VaMP is firmly pushing for clarity and meaning at all levels and this starts with an unambiguous structure of the given directories, with as many levels as you require and can handle perhaps but certainly *without* any loops. If your content requires for whatever reason such horror as loops in *how the directories or files are grouped together* then I dare say you have bigger problems to sort out first before worrying about VaMP at all.
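
To make "loop" concrete, here is a minimal Python sketch that finds directories whose real location is one of their own ancestors, which is what a symlink pointing back up the tree produces. It is an illustration only: the name `find_directory_loops` is hypothetical, and the approach (resolving each directory with `os.path.realpath` and comparing against the ancestors on the current path) is an assumption, not a description of how VaMP does it.

```python
import os


def find_directory_loops(root):
    """Return the paths under root that resolve to one of their own ancestors.

    Hypothetical helper for illustration only.
    """
    loops = []

    def walk(path, ancestors):
        real = os.path.realpath(path)
        if real in ancestors:
            loops.append(path)
            return  # do not descend, otherwise the walk would never end
        with os.scandir(path) as entries:
            # Follow symlinks on purpose: they are what can create a loop.
            children = sorted(
                entry.path for entry in entries
                if entry.is_dir(follow_symlinks=True)
            )
        for child in children:
            walk(child, ancestors | {real})

    walk(root, frozenset())
    return loops
```

An empty result means the sketch found no loops; it does not flag other non-hierarchical arrangements, such as two symlinks pointing at the same sibling directory.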

Why no empty directories?
Because empty directories are not content and so don't belong under VaMP. Note that a directory containing only other directories but no files is still considered to be empty and thus rejected by VaMP. The reason for this is that content is in fact strictly stored in files, while the whole set of directories (hence, paths) is merely a temporary mapping of content into some structure that is designed - hopefully - for aiding comprehension. As such, a path that doesn't lead to a file is worse than meaningless as it is a burden on the reader, since it adds complexity without being justified by any part of the existing content. Certainly, it can and it does happen that the content justifying that part of the structure is simply not yet available but in this case, the correct options are to either collapse the structure until the content becomes available (thus *not* burdening the reader with it at all) or otherwise provide at least a placeholder file with a brief description of the intended content (thus providing at least a hook for the reader to hang that bit of added structure on, until such time when the full content is available).
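
The rule that a directory holding only other directories counts as empty can be sketched in Python as below. This is an illustration, not VaMP's code: the name `find_empty_directories` is hypothetical, and the reading that a directory needs at least one file *directly* inside it is an assumption, based on the statement above about directories that contain only other directories.

```python
import os


def find_empty_directories(root):
    """List the directories under root (root included) that hold no file
    of their own, even if their subdirectories do hold files.

    Hypothetical helper for illustration only.
    """
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk puts every non-directory entry in filenames, so an empty
        # list means this directory contains subdirectories at most.
        if not filenames:
            empty.append(dirpath)
    return sorted(empty)
```

Under this reading, a chain of directories with a single file at the very bottom still has every directory above the last one reported as empty.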

More as a continuation than an ending, here's an example run with complaints:

$ vamp diff oldver newver
STATUS: Diff-ing directories oldver and newver to file listing_oldver_newver.diff with hash length -1
STATUS: Entering Walk_Dir with dirname=t1
STATUS: Entering Walk_Dir with dirname=t1/t3
STATUS: Entering Walk_Dir with dirname=t2
STATUS: Entering Walk_Dir with dirname=t2/t4
STATUS: Entering Walk_Dir with dirname=t2/t4/t5
ERROR: empty files/dirs are not acceptable, sorry. Found to be empty:
newver/empty.file
newver/t2/t4
newver/t2
ERROR: VaMP main: Directory newver contains duplicates and/or empty directories and/or special files that are not acceptable, sorry. See the list(s) provided. Kindly clean these up and run VaMP again.

$ tree oldver
oldver
├── manifest.vamp
└── t1
    ├── file.b
    └── t3
        └── file.a

2 directories, 3 files
$ tree newver
newver
├── empty.file
├── manifest.vamp
└── t2
    └── t4
        └── t5
            └── file.a

3 directories, 3 files

  1. Note that the content of this can be anything whatsoever: as long as the content and its authors are important to you, you'll benefit from having it under VaMP. "Code" is at most a *subclass* of text, not the other way around and as such, the first and foremost requirement for it should be that it's clear and readable and that its authors are people you actually trust as competent and reliable for the task, as per your definition of trust, competency and reliability.
  2. This also means, of course, that an "empty file" is even more nonsense than mentioned already since it doesn't even match the format requirements to be called a file in the first place.