TODO

Not everything here makes sense or will necessarily get done.

Short term:

* Fix autoconf macro so that it doesn't depend on locate. Or maybe use
  BuildSystem.
* Change all occurrences of archive to repository.
* Make the default log in _arx/log?
* have a history of which URL's you merged from in the project tree?
  Not really needed with hashed revisions, because you just sync repos?
* archive-cache mirroring is not robust.
* Have a way to make annotate start from a particular revision.
* Make inventory non-recursive by default? What does "cvs status" do?
* Allow use of just revision numbers ",23" when inside a project tree.
  We have to handle all of the --dir options.
* Get rid of default archive names. Get default archive and branch from
  the project tree.
* Use ARCHROOT or local tree instead of my-default-archive.
* Use revision ranges in a lot more places. Something like

    foo.1.2,3-6, foo.1.2,-6, and foo.1.2,3-

  Use it in log, library, replay, get-patch, archive-cache, tree-cache,
  mirror, sig.
* figure out why commit emails are not working
* Better annotate: Needed if we want to support weave merge. Also just
  nice to have. Want to get the contributions from both sides of a
  merge, e.g. we have

    A1
    |
    A2
    | \
    A3 B1
    |  |
    A4 B2
    | /
    A5

  From A5, we annotate through both lines (A and B). We will get
  different results because some things are changed in A and others
  changed in B. They will both have some things in A5 which occur
  because of the merge. We can then compare the annotations side by
  side. If a line is attributed to A5 on one side and to anything else
  on the other, then it is not A5. If a line has both A3 and B1, then
  pick one (A?). This might be caused by cherry-picking, so we need a
  way to figure out how to annotate cherry-picking.

  First thing we have to figure out is if a patch actually affected the
  file. For this to work, we really have to have the inventory id's in
  the log, because the names can change. Otherwise, we would have to
  actually download the whole patch.
  Maybe we can have a single metadata file instead of separate log and
  (orig|mod)-(file|dir) files.

  Given that, we can see if a particular patch hunk works against the
  source file. If it does, then attribute those lines to that patch.
  The problem arises if we have a series of patches. How do we decide
  on an ordering? Time? This is starting to sound like where you get
  into darcs' problems as you try to rearrange patch orderings to get a
  conflict-free result. Given N patches, there are N! different ways to
  arrange the ordering. We have some partial ordering, but I could
  imagine cases where someone deliberately applied a patch out of order
  because it would not conflict. Maybe the proper response in that case
  is "You're out of luck". Or maybe make it a command-line argument.
  Bah.
* Write our own version of patch which does not complain about patches
  already applied. Then we have most of idempotent merge.
* hooks for ediff etc.
* libcurl + libssh, or python with paramiko?
* CIA script
* Combine patch logs in the tree, so that we have a single file at
  patch-logs/branch/,logs. Either serialize a vector of strings, or use
  a real database (sqlite?). Need to modify mkpatch/dopatch so that it
  looks at the single file and not at the file directories, but it
  still reads/writes patches in those directories. This only saves
  space in the checked out tree, not in the archive. I am not sure how
  useful that really is, since everything can be hard linked. If we
  just put all of the patch logs into a single file, then patches would
  truly be invertible.
* Combine the patch-logs within a project tree into a single zip file
  for each branch
* Get rid of patch logs in the tree, and only record the names of the
  patches in a single file. It would be the same as running "ls" on the
  archive, with each node showing its parent. It would also get rid of
  most of the need for the annoying --remote option.

Long term:

* spawn different threads to download patches in parallel.
  Use boost::thread or glib's thread facilities. Probably requires a
  better tempname facility because otherwise temp files will start
  stepping on each other. Or just use the asynchronous download
  facilities in gnome-vfs et al.
* Use timestamps, but check before using by creating and touching a
  file twice. If the file timestamp changes each time, then use
  timestamps. Otherwise, go back to diffing the whole tree.
* More robust tempname facility
* L10N
* get arch-pqm to accept patches mailed directly, have something in ArX
  which emails patches. Add a "send" command.
* Get cscvs or maybe tailor.py working with ArX
* ViewARCH or ArchZOOM
* VCG graph output for ancestry
* Have a status command
* modify xtla for emacs mode?

Archive breakers:

* move _arx to .arx and make it hidden on windows?
* Get rid of the "Summary:" header, and just use the body.
* Get rid of the "Standard-Date:" header
* Change the comma "," prefix to a period "." postfix so that ftp
  upload places with restrictive policies are ok. Or maybe make
  branches have periods on the end, and everything else that used to be
  prefixed with a comma now has nothing.

  From comcast website:

    NOTE: File names must consist of characters from "a-z", "A-Z",
    "0-9", '_' (underscore), '.' (period), '-' (hyphen). No other
    characters (including spaces) can be included in the file name.
    File names must not start with '.' or '-'.
* skip-deltas?

    http://svn.collab.net/repos/svn/trunk/notes/skip-deltas

  It would make it longer to commit. We might also need to make a delta
  combiner. But O(log(n)) to get any revision is tempting. That might
  make it unnecessary to have archive caches. Hmm. Wait until the
  asynchronous downloading is done and go from there. Also, I should
  write a delta-combiner first. That would be useful for other things,
  and would make skip-deltas pretty easy to implement.
  Delta-combiners require the latest continuation revision to be ,0.
  We can enforce that once we have hashes for revisions, because there
  is no longer a need to branch to a higher number.

  Some numbers: For arx.2.2 revisions 0-161

    Size of cached revision ,112:   828 K
    Size of all patches         :  6700 K
    Size of all skip deltas     : 10280 K

  So there is a 30% space penalty. This is for revisions with lots of
  PDF's.

  Benefits:

  * log(N) access to any revision
  * no need for cached revisions except for the beginning of a branch.
    Make that automatic? Because a new branch means that you are
    splitting off and don't intend to merge.

  Drawbacks:

  * commits become more complicated (have to get an older revision and
    diff against that, unless we just use a delta-combiner)
  * Getting a patch for a single revision is no longer simple. This
    makes annotate and replay harder.
  * slight space increase.

  Merge might be faster (though maybe not, because we can generally
  patch from the current tree to the merge revisions with hashed
  revision names).

  We more or less have to have a delta combiner. Otherwise, during
  commit, we have to create another tree. This will be slow, especially
  for no-edit trees. We also have to have a delta splitter. Otherwise,
  annotate will take too long.

  To make a delta combiner/splitter, we have to parse gnu patches
  (possible), but we also have to deal with xdelta patches. We have to
  make sure that the checksums work out. The big problem is when files
  alternate between binary and text. Then, in the combined patch, we
  have to do something like .xdelta1, .patch2, .xdelta3, etc. Annoying,
  but possible.
* Use hashes to uniquify revisions. We already have the hash, so you
  can append the hash to the revision to uniquely specify a revision.
  As in

    foo.bar,23,4239874ab7f

  We only use the first eight hex characters, because that is all that
  is needed to guard against accidental collisions. In the archive, we
  also append the hash of the previous revision. That will allow us to
  determine the entire graph with a single "ls".
  Don't use UUID's, because that doesn't have the self-verifying
  properties that hashes do. When committing, require --force if
  creating divergence?
* need a copy operation

  Call it "cp". Make a special log header: Copied-files with from and
  to. In the patch, it looks almost exactly like a new file. There
  would just be an extra file that is a patch back to the original file
  that was copied from. That makes annotate pretty easy. For the "cp",
  add a new type to the ++changes file which is a copy. It would be
  from the current location of the source file, so you have to figure
  out where it originally came from.

Maybe:

* Get AIX working
* Use new sha-2 code?
* Binaries for Debian, Red Hat, Suse, Mandrake, AIX, HP-UX, OS X, Irix:
  relocatable install
* Write a post-commit hook that does smarter archive caching. Something
  like

    ssh -f foo@bar "(arx archive-cache -a "$ARX_REVISION" && arx archive-cache -d "$ARX_PREVIOUS_REVISION")"
* install a signal handler: We have to have a global list of temp
  directories and archive locks. Then they could be cleaned up there.
  The signal handler in svn is really annoying.
* update to latest boost, maybe use it as is? Have to get copy and
  serialization working.
* have a way of sealing and unsealing an archive. Not like tla which
  has versionfix-N stuff. It just prevents you from committing. Maybe a
  simple chmod on the archive/category/branch/ directory? Can also seal
  entire archives and categories.
* run gcov to figure out what is and is not covered by the test suite.
  Already done, but difficult to figure out.
* Have a command, apply ordinary diff, that first runs "edit" on all of
  the files to be patched, and then patches. Or maybe just tell people
  to run "commit *"
* trackdown, which allows you to run a function on every revision until
  it returns true. Something like

    for i in `arx log --header Revision`; do
      arx get $i temp;
      cd temp
      if(exec function)==true
        print $i
        exit
      cd ..
      rm -rf temp
    done

  Maybe just document it.
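The trackdown loop above could also live in a small script instead of
a new command. A minimal sketch in Python, under stated assumptions:
the "arx get REVISION DIR" call is taken from the pseudocode above and
has not been checked against a real ArX install, and trackdown,
arx_get, and the predicate callback are hypothetical names used only
for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def arx_get(revision, target):
    """Check out `revision` into `target` (assumed "arx get REV DIR" form)."""
    subprocess.run(["arx", "get", revision, str(target)], check=True)


def trackdown(revisions, predicate, fetch=arx_get):
    """Return the first revision whose tree satisfies `predicate`, else None.

    `predicate` is called with the path of the checked-out tree.  Each
    tree lives in a fresh temporary directory that is removed afterwards,
    even if the fetch or the predicate raises.
    """
    for revision in revisions:
        workdir = Path(tempfile.mkdtemp(prefix="trackdown-"))
        tree = workdir / "tree"
        try:
            fetch(revision, tree)
            if predicate(tree):
                return revision
        finally:
            shutil.rmtree(workdir, ignore_errors=True)
    return None
```

The list of revisions would come from something like the
"arx log --header Revision" call in the loop above. Passing the fetch
step in as a parameter keeps the loop testable without an archive.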
* make it possible to force xdelta diffing on a file with a property
  arx:binary? Or use a mime-type?
* Archway? Written in perl, so it ain't gonna be me.
* default regexes for _arx/ignore with "param ignore-default"
* Make ignore regexes work on the entire path?
* reorganize the archive so that we have ,0-100, ,101-200, ,201-300,
  etc. for the patches, and then each patch is within those
  directories. We could use a merkle tree, but then we could end up
  with a broken database. Hmm. Currently, we use 20K for each revision
  (4K per file, patch & sig, sha & sig, and log). This is for something
  that is really only 1.8K big. Waste, waste, waste. Need to get rid of
  the multiple files, but then we can't just add sigs and hashes. Gah.
  Makes me want to use a database. But Monotone's usage was higher.
  Weird. Also, with skip deltas, the deltas themselves get larger,
  making it less useful to combine the files. Hmm. With hashes for
  revisions, we don't have to sign the hash anymore, so that gets rid
  of two files. It would be nice to get rid of the log. Hmm.
* Have a way to mark whether an archive is remote or not (NFS over thin
  pipes)
* Use the linux crypto API if it exists
* Pure-merge, where it applies each patch in turn and then commits with
  the original message. Make it an option to "merge", or maybe
  "replay", since replay is already doing things one at a time. Or
  maybe not worry about it, since with revision hashes, they should
  have been using the same branch?

DONE

* make-archive
* add
* delete
* inventory
* move
* tagging-method
* tree-lint
* dopatch
* mkpatch
* naming-convention
* tree-version
* archives
* my-browser
* my-default-archive
* my-editor
* my-guidiff
* my-id
* register-archive
* whereis-archive
* undo/redo
* hooks
* make-log
* file-diffs, file-undo
* log-ls
* logs
* cat-log
* changelog
* patch-report
* what-changed
* SWIG
* Integrate tla-pqm
* networking
* categories, branches, versions, revisions
* get-patch
* get
* archive-cache-revision etc.
  renames to cache-revision (-d for uncache)
* pristines, delete-pristine, add-pristine
* my-revision-library
* library-find, remove, archives, categories, branches, versions,
  revisions. Get rid of log, file. Replace it all with library-browse.
* browse, to replace categories, branches, versions, revisions and
  *-readme.
* commit
* darcs critique
* create-version -> init-tree
* delete-category/branch
* create-branch -> fork
* tag
* replay
* update
* push-mirror
* Replace create-version with init-tree when versions are removed.
* break-lock
* star-merge
* Use g_spawn instead of system()
* make-sync-tree -> history
* explicit-default, join-branch lasts until the C++ rewrite is done
* arch-pqm
* build/update/replay-config
* Something like cat-library-file and library-find. Maybe file-orig
  --uri?
* update-distributions -> make-dist
* derive arx_error virtually from std::exception
* change parse_package_name to automatically get the archive name
* change =tagging-method to tagging-method
* Change regexes to only four things: source, ignored, temp,
  unrecognized. Everything else is unrecognized, and tree lint will
  complain. Source is the current regex, ignored almost everything
  else, and temp is the junk syntax.
* Make initial patch to base-0 not include any of the directories in
  {arch}.
* change exclude regex to control
* Save the tag, not the type of tag + the tag
* make naming inventory {arch} like everyone else
* Change all arch to arx (arch-params, {arch})
* Remove internal_deprecated
* get rid of (orig|mod)-only-metadata in the patch. This is just
  removing the save_directory_permissions call in make_patch.
* Change smash_non_graphical to url encoding
* when creating logs with commit, don't change the log contents unless
  you have to. So Standard-Date etc. will be preserved, though not
  Revision.
* Make dopatch non-destructive by copying files, not just moving them.
* replace .arch-ids with a single checksum file
* speed up inventory by using an algorithm for names_tagging that
  doesn't require a directory traversal to find the root for every tag.
* Change file-diffs to file-diff
* Change all references from inventory tag to inventory id.
* get rid of names and internal inventory methods, and change
  {arx}/tagging-method to {arx}/ignore. There will be a global ignore
  regex that matches current junk files (,,*). It might also have *.a,
  *.o, and *.so. That can be extended with appropriate regexes in the
  ignore file. New ignore command.
* Change the ++foo files to ,,foo
* move patch-logs to patch-log/archive/branch/revision.
* get rid of locked/unlocked pristine, and change ++pristine-trees to
  ++cached
* Change =README, =meta-info, etc to ,README etc.
* make the default regexes not be so complicated, with all of the CVS,
  RCS, etc. stuff. Make it empty.
* get rid of short_revision in most places.
* change the syntax from a/c--b--v--r to a/c.b.v,r
* Change archive and library layout to cat/branch/patch
* Generalize delete-branch, remove delete-category.
* Allow any branching depth: first remove branch, fix all the errors,
  and change version to branch.
* Make history, list_tree_cached_revisions, list_tree_patch_logs,
  browse, library_browse, and mirror all use the same recursive
  browsing mechanism.
* Have invoke-hook make branches or revisions, not categories or
  versions.
* Make patch-number handle ",0" revisions.
* Clean up the short options
* Fix a problem with relative paths with the --paths option (--paths
  foo doesn't work when in a subdirectory bar, but --paths bar/foo from
  the parent does work)
* tests for (library-)browse and tree-cache.
* configure check for python
* Make everything accept -H
* Make the --paths option require arguments
* I18N: use a serializer from boost, not this -> (url encode everything
  that goes into files, including filenames. Then url decode when we
  print things out.
  But do we want to url-encode Summary: and the body in logs?)
* Make init-tree automatically add all files in the current directory.
* Change {arx} to _arx
* Instead of versioning permissions directly (which you don't usually
  want to do), allow arbitrary properties which can do things on "get"
  and when patching. Do this by adding a map to file_attributes (which
  should be path_attributes) and getting rid of permissions.
* Use bzip instead of gzip (NOT: bzip is slower. For unpacking lots of
  small files (package-framework patches 1-532): gzip ~3.5 s, bzip
  ~5.8 s)
* Get rid of ".original" and ".modified" in patches, and just use
  ".orig" and ".mod"
* Make most boost::filesystem exceptions caught in main.
* update to latest scons
* delete-revision, which deletes the contents of a revision, but not
  the directory. Then you can tag off of an old revision.
* Change move to mv and delete to rm.
* get rid of make-log and init-tree's call to make-log. The interactive
  stuff can cause problems if you aren't using a graphical editor.
* Put dists and arx together. Only after replacing .arch-ids.
* dump/restore
* option to do_patch to speed it up for exact patching
* integrate xdelta
* Consolidate my-* commands into a single my-prog command
* Combine log, changelog, and revisions into one command and have
  --formatted and --remote options
* consolidate library-browse and library-revisions
* get rid of --paths arguments: undo, mkpatch, commit, diff
* automatically get revisions for diffs etc.
* arx edit

  1) in fill_path_list, check whether _arx/++edit exists. If so, read
     from it and _arx/++changes and add all of the items to the path
     list. Need to match up the user-supplied path list.
  2) get --no-edit makes an empty _arx/++edit and chmod's all of the
     files in the inventory
  3) mv, rm, add, and property don't need to do anything special, since
     they are already listed in ++changes.
  4) The _arx/++edit list only gets cleaned up and everything chmod'd
     back during full commits, not during a partial commit. Need to
     make sure that ++changes isn't cleaned up in partial commits
     either.
  5) arx edit adds an entry into the ++edit file with inventory id and
     un-chmod's the file. When fill_path_list reads ++edit, it matches
     the inventory id against any moves or deletes to get the current
     name.
* when getting a revision, replace the comma with a period.
* add --no-edit and --no-pristine option to config --get, fork, merge
  --new-tree.
* Make a way for get_revision to use hard links. Have "get --no-edit"
  use that, and have "arx edit" break the hard link.
* set ARXTREEROOT for hooks
* improve the emacs mode for --no-edit trees.
* Make add_path take a list, so that we don't have to write a file
  10,000 times.
* Get rid of _arx/.arx-project-tree
* Use gnome_vfs_xfer_uri_list when getting a whole lot of patches:
  Tried (arx.2.1,84), it is actually slower for high-latency networks,
  about the same for local. Much more complicated logic.
* Make it so that figuring out the continuation revisions doesn't take
  forever. Something like O(number of continuations) rather than
  O(number of revisions). Add a header when committing
  "Last-continuation" that holds the last continuation before the
  current revision (empty if Continuation header is present). Then we
  can just read that header to find out what the last continuation is.
  That should get rid of any O(number of revisions) behavior.
* With no-edit, if mv then rm, diff will fail. To fix that, when
  reading the files in fill_path_list, we need to save the initial
  state of any moved files (including if they were just added).
* Also, if we rm and then diff against a version that never had that
  file, diff fails because that file is not in any manifest. So we
  really want to just skip it. So we skip if we are doing a straight
  diff with no-edit.
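The hard-link item above ("get --no-edit" shares files through hard
links, "arx edit" breaks the link and un-chmod's the file) amounts to
copy-on-edit. A minimal sketch in Python, assuming a filesystem with
hard links; break_hard_link is a hypothetical helper name and this is
not ArX's actual code:

```python
import os
import shutil
import stat
import tempfile


def break_hard_link(path):
    """Give `path` its own copy of the data and make it user-writable."""
    if os.stat(path).st_nlink > 1:
        directory = os.path.dirname(os.path.abspath(path))
        # Create the temp file next to the original so the rename below
        # stays on one filesystem.
        fd, tmp = tempfile.mkstemp(dir=directory)
        os.close(fd)
        try:
            shutil.copy2(path, tmp)  # copy data, mode bits and timestamps
            os.replace(tmp, path)    # swap the private copy into place
        except BaseException:
            os.unlink(tmp)
            raise
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IWUSR)
```

The other names that pointed at the same data (for example a cached
revision) keep the old contents, so editing the tree cannot corrupt
them.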
* Better-SCM site, dmoz, zooko.com, wikipedia
* Add a --delete-removed option to replay and merge
* Make a temp_file which deletes itself in the destructor.
* Use __DATE__ and __TIME__ in the version string.
* signatures

  1) Add a sign command with --patch, --revision options to sign only
     the patch or revision, a --delete option to remove signatures, a
     --replace option to replace the current signature, and a --verify
     option to verify the signature
  2) Add a --signature option to make-archive. It puts the public keys
     in ,meta-info/sigs/NAME. It can also be used to add public keys
     later. Or maybe just an --archive option to sig.
  3) register-archive automatically downloads all of the possible
     signatures in ,meta-info/sigs/NAME. Stores the location and list
     of valid signatures in .arx/locations/ARCHIVE
  4) Make verify_manifest_checksum also check signature matches one of
     the signatures listed for that archive.
  5) When committing, if an archive is signed, then require a signature
     before committing. Check signature against list of signatures for
     that archive.
  6) Default is off.
  7) Store signatures in archive in patch.sig and revision.sig
  8) Have a param "gpg" that can be used to override what to use for
     the gpg program (e.g. agpg).
  9) Have a param "sign". If true, then every archive should be signed.
* check archive public keys to make sure signature is listed before
  signing.
* Make sig recursive
* combine archives and register-archive into archives (-a -d).
* make-dist doesn't stop if it can't find a config file.
* Ask for confirmation for delete-branch and delete-revision unless
  --force. Print out "arx browse" or "arx log --all" respectively
  before asking. With delete-revision, delete any archive caches. Also
  need to make sure that deleted revisions are not "gotten". Perhaps by
  treating it as a continuation?
* Get plain http working better with something like http-blows and
  automatically updating .listing files, or maybe just use a cgi
  script, or maybe a post-commit hook and "arx fix-listing".
* Get a better error message when trying to mirror a revision that does
  not exist in the master.
* Get OS X and Windows working
* Make archive adding atomic
* make patch-report work on archives and tar.gz files, so you don't
  have to get-patch, untar, and then run patch-report.
* rename init-tree to init
* remove make-dist, add export, and have export print out the target
  directory. Then we can tar manually.
* Have merge and replay do in-place operations by default. Use the
  --new-dir option or make it just the last argument.
* Make configs be stored separately in the archive. Then you can do a
  simple "get", "update", "diff", etc. to the whole tree. That would
  get rid of the --config options, because you can just do the
  operations directly on the configs.

  Commands to fix: tag, config, make-dist

  Fixed commands: commit, fork, file-diff, file-orig, file-undo,
  tree-cache, get-patch, get, missing, archive-cache, sig, merge,
  replay, library

  usage:

    tag revision head (tail_dir tail)...
    tag revision -f FILE

  Tags are just a special case of configs. So a syntax

    tag revision head (tail_dir tail)...

  would specify exact revisions

    tag branch --update

  would just update those revisions

  Store the projects and directories in the header "Tags". It would be
  a list of revisions and directories. We would also need a list of
  revisions and checksums, so that the log file could be checksummed.
  Then we would check the checksum when diff'ing, get'ing, etc. This
  would make tagging very, very cheap, like it used to be.

  tags only store the log. The log is the payload, so it will have to
  be checksummed and signed. You can't commit after a tag, or fork from
  a tag. merge, missing, replay, get, make-dist, archive-cache, work
  recursively. diff, file-diff, file-orig, file-undo do not work with
  tags.
  sig needs to be aware that tags do not have a separate payload.
  Mirror doesn't need anything
* Add a --recursive option to diff
* Rename "browse" to "ls"? No. CVS stores everything in the ,v files,
  so "ls" actually maps well to the storage. That is not the case for
  "browse".
* Get a good copy of the BGL
* diff should be able to be faster. It spends a lot on IO, but it
  should be able to read the whole manifest in one second. Maybe read
  it into a string and then do all of the manipulations. Actually, it
  spends most of its time stat'ing the project, checking for tree-lint
  problems.
* make update-listing take --add and --delete.
* have "arx help" work.
* something like blame, file-history? Search the patch logs to find out
  which patches modify the files, including renames and deletes. Then
  manually scan the patches and put the prefix in front of the
  appropriate lines.
* Have an option to log that only prints out patches that modified a
  file.
* Have an option to log that prints out which patches modified which
  line
* get python autodetect from
  http://autoconf-archive.cryp.to/ax_python.html
* Fix bug when you delete a file with no-edit trees and do a partial
  commit. The file is no longer in the manifest, but still in the
  ++edit file.
* Put note in docs about sftp needing auto-login.
* make dopatch not barf when adding the same file with the same id and
  contents twice. If different contents, then make orig and mod files.
* If we have a revision library, use that exclusively. Need a way to
  set policy on the revision library, so that some branches have
  revisions deleted as new ones are created, while others keep them
  around. The default is to delete. library-policy? That way, the
  linking of branches makes branching even cheaper. Or maybe get rid of
  revision libraries?
* When merging, merging a new file into a directory that had moved does
  not put the new file into the new directory, but uses the old
  directory instead.
  Just make sure that all parent directory id's are included in the
  patch? Then we don't have to change the patch format.
* Make merge error out when there is more than one ancestor, because
  that could be a criss-cross merge. In that case, suggest 2-way merge?
* add a --header option to commit, so that you can set creator and
  date.
* Use -v -v and -q -q instead of --silent, --quiet, --report, --verbose
* Fix merge speed in the default update case so that it doesn't
  download and patch twice
* Add conflict markers so that you can't commit unless you run resolve
* Make an update command that just runs merge --update
* 3-way merge: just use diff3 + a way to deal with conflicting moves.