Short dupmerge tutorial
This tutorial covers the Freitag version of dupmerge (dupmerge2 on sourceforge.net), version 1.73 or later.
Installation
1. Download the package. If it is a zip archive, unpack it with "unzip <package>".
2. Compile (this includes preprocessing, linking etc.): gcc -Wall -O3 -o dupmerge dupmerge.c
gcc should report no warnings or errors (return code 0).
3. Local installation: become root with su and enter the root password, then run
cp dupmerge /usr/bin/
chmod a+x /usr/bin/dupmerge
4. Log out of the root shell: Ctrl+D
Basic usage
Example I
dupmerge -h
This prints the built-in help of dupmerge, with all available options.
Example output:
dupmerge version 1.74
This program can reclaim disk space by linking identical files together.
It can also expand all hard links and reads the list of files from standard input.
Example usage: find ./ -type f -print0 | dupmerge 2>&1 | tee ../dupmerge_log.txt
In nodo/read-only mode the correct numbers of files which can be merged and (512 Byte) blocks which can
be reclaimed is on the todo list; the actual values are approx. 2 times higher.
Options:
-h Show this help and exit.
-V Show version number and exit.
-d delete multiple files and hard links. Default: Preserve the alphabetically first file name.
-q Quiet mode.
-n Nodo mode / read-only mode.
-i Inverse switch: Expand all hard links in normal mode, replace files by their desparsed version if it is bigger.
-s Flag for soft linking (default is hard linking). This option is beta because for linking of all equal files more than one run of dupmerge is necessary and the inverse (expanding of soft links) is untested.
-S Flag for Sparse mode: Replace files by their sparse version if it is smaller.
-c Combo mode: Default mode +sparse mode. With -i it means inverse mode with unlinking and desparsing.
If you do not like typing, you can use dupmerge -h and copy&paste the example, e. g. for
find ./ -type f -print0 | dupmerge
to hard link all duplicates in the current directory (and its subdirectories).
Example II
find ./ -type f -print0 | dupmerge 2>&1 | tee ../dupmerge_log.txt
This will hard link all equal files with size > 0 in this directory and all subdirectories, NOT following symbolic links. The output will be written to standard output and to ../dupmerge_log.txt.
dupmerge prefers to keep the file with fewer blocks (e. g. a sparse file), or if both have the same number of blocks, the older one, or if both have the same age, the one with more (hard) links.
Other linking policies would be possible, but they are neither implemented nor planned, because good backup/syncing programs like rsync (with option -H) can preserve hard links.
File names with spaces, strange characters and even newlines are no problem because of zero termination of the file names.
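The effect of zero termination can be checked without dupmerge. The following is only a sketch: it creates a scratch directory with mktemp (a test setup assumed here, not part of dupmerge) containing three awkwardly named files, and counts the records that find emits in both output modes.

```shell
# Scratch directory with awkward file names (illustrative setup only).
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt" "$demo_dir/with
newline.txt"

# Newline-separated output: the embedded newline splits one name into two lines.
lines=$(find "$demo_dir" -type f -print | wc -l)

# NUL-separated output: exactly one NUL byte per file name.
nuls=$(find "$demo_dir" -type f -print0 | tr -cd '\000' | wc -c)

echo "newline-separated records: $lines"
echo "NUL-separated records: $nuls"
rm -rf -- "$demo_dir"
```

With three files, the newline-separated count is 4 while the NUL-separated count is 3, which is why the examples here always pass -print0 to find.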
It's a good idea to check the output to see the statistics and to verify that everything is ok, because dupmerge does not care e. g. about a corrupt file system or about the hard link count limit of the file system or operating system (this is on the todo list).
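Besides reading the dupmerge output, the result of a run can be checked by hand, since hard-linked files share one inode number and have a link count above 1. The sketch below does not call dupmerge. It uses ln in a scratch directory as a stand-in for a run, and it assumes GNU stat and GNU find; the file names are made up for illustration.

```shell
demo_dir=$(mktemp -d)
printf 'same content\n' > "$demo_dir/a.txt"
printf 'same content\n' > "$demo_dir/b.txt"
printf 'other content\n' > "$demo_dir/c.txt"

# Stand-in for a dupmerge run: replace b.txt by a hard link to a.txt.
ln -f "$demo_dir/a.txt" "$demo_dir/b.txt"

# The same inode number means both names refer to the same data on disk.
inode_a=$(stat -c %i "$demo_dir/a.txt")
inode_b=$(stat -c %i "$demo_dir/b.txt")

# Count the file names that have more than one hard link.
linked=$(find "$demo_dir" -type f -links +1 -print0 | tr -cd '\000' | wc -c)

# Show link count, inode number and name for each of them.
find "$demo_dir" -type f -links +1 -exec stat -c '%h %i %n' {} +
rm -rf -- "$demo_dir"
```

On a real tree, the last find command (with ./ instead of the scratch directory) lists every file name that takes part in a hard link group, together with its link count.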
I've used dupmerge many times on one million files totalling 1 TB.
I also used it for CDs/DVDs and (compressed) backups, because deduplication before compression saves a lot of space. Deletion instead of linking is generally not a good idea, e. g. for equal driver files which are needed by hundreds of drivers in hundreds of different directories.
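Why hard linking before compression saves space can be seen with tar, which stores the data of a hard-linked file only once. The following sketch is a made-up test setup (scratch directory, 100 kB of random data, gzip compression) and creates the hard link manually with ln instead of running dupmerge:

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/copies" "$demo_dir/linked"

# Two independent copies of the same 100 kB of random (incompressible) data.
head -c 100000 /dev/urandom > "$demo_dir/copies/a.bin"
cp "$demo_dir/copies/a.bin" "$demo_dir/copies/b.bin"

# The same data again, but the second name is a hard link.
cp "$demo_dir/copies/a.bin" "$demo_dir/linked/a.bin"
ln "$demo_dir/linked/a.bin" "$demo_dir/linked/b.bin"

# Pack both directories into compressed archives.
tar -C "$demo_dir" -czf "$demo_dir/copies.tar.gz" copies
tar -C "$demo_dir" -czf "$demo_dir/linked.tar.gz" linked

size_copies=$(wc -c < "$demo_dir/copies.tar.gz")
size_linked=$(wc -c < "$demo_dir/linked.tar.gz")
echo "archive with copies:     $size_copies bytes"
echo "archive with hard links: $size_linked bytes"
rm -rf -- "$demo_dir"
```

The archive of the hard-linked directory comes out at roughly half the size, because gzip cannot spot a duplicate that lies 100 kB away in the data stream, whereas tar records the second name as a link.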
In an archive, e. g. a collection of CDs/DVDs copied to HDD, an ordinary user usually has no write/delete permission, so a superuser (root/administrator) usually has to run dupmerge there.
The dupmerge output often shows interesting results, e. g. for the annual DVDs of the German Linux Magazin: dupmerge shows many hidden and equal files like ._006-010_news_09.pdf, which contain only 82 bytes and seem to be leftovers from creating the similarly named file, e. g. 003-003_editorial_09.pdf, in the same directory.
Example III
find ./ -type f -print0 | dupmerge -d
This will delete all duplicate files (including hard links) with size > 0 in the current directory and all subdirectories, NOT following symbolic links.
It preserves the alphabetically first file (including the path).
In the dupmerge output, only the deletion of hard links produces the message "freeing 0 blocks".
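Because -d removes files, it can be reassuring to cross-check beforehand which files have identical content. The following is an independent sketch based on sha256sum and GNU uniq, not dupmerge's own comparison method. The scratch directory and its file names are made up for illustration; on a real tree, "$demo_dir" would be replaced by ./ and the setup and cleanup lines dropped.

```shell
demo_dir=$(mktemp -d)
printf 'duplicate\n' > "$demo_dir/a.txt"
printf 'duplicate\n' > "$demo_dir/b.txt"
printf 'unique\n' > "$demo_dir/c.txt"

# Hash all non-empty regular files, sort by hash, and print every line
# whose hash (the first 64 characters) occurs more than once.
dupes=$(find "$demo_dir" -type f -size +0 -print0 \
  | xargs -0 sha256sum \
  | sort \
  | uniq -w 64 --all-repeated=separate)

echo "$dupes"
rm -rf -- "$demo_dir"
```

Groups of files with equal content are printed together, separated by blank lines. Here only a.txt and b.txt are listed.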
To delete all files of size 0 you can use
find ./ -type f -size 0 -exec rm -- {} \;
but you should first review the list of these files with
find ./ -type f -size 0 -exec ls -ilaF {} \;
and to delete empty directories recursively you can use
find . -depth -type d -empty -exec rmdir -- {} \;
but you should first review the (non-recursive) list of these directories with
find ./ -type d -empty
Example IV
We have some equal files in exampledir and exampledir2, with the same timestamps. To delete the files that exist in both directories only from exampledir2, exampledir would have to come alphabetically after exampledir2, because dupmerge preserves the alphabetically first file (including the path). One workaround would be a soft link that sorts after exampledir2, but then find has to be called with the -L option, which tells it to follow all symbolic links and to report the properties of the file a link points to instead of those of the link itself. So that is not a good solution. A better solution is to rename exampledir temporarily to switch the alphabetical order:
mv -i exampledir z_last
find exampledir2 z_last -type f -print0 | dupmerge -d
mv z_last exampledir
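The effect of the rename on the ordering can be illustrated with sort. This sketch assumes that dupmerge compares paths in a plain byte-wise fashion, as sort does in the C locale; that is an assumption made for illustration, not something taken from the dupmerge source.

```shell
# Byte-wise (C locale) ordering of the directory names used in Example IV.
# Assumption: dupmerge orders paths in a comparable byte-wise fashion.
before=$(printf '%s\n' exampledir2 exampledir | LC_ALL=C sort | head -n 1)
after=$(printf '%s\n' exampledir2 z_last | LC_ALL=C sort | head -n 1)

echo "sorts first before renaming: $before"
echo "sorts first after renaming:  $after"
```

Before the rename exampledir sorts first, so its files would be kept. After the rename exampledir2 sorts first, so the duplicates under z_last are the ones deleted.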
Because dupmerge -d deletes all duplicates, it also deletes duplicates inside exampledir. If you don't want this, you can use a (temporary) working copy or mount exampledir read-only:
mount -r --rbind /tmp/exampledir /tmp/foobar
But the last solution produces error messages, because deleting files on a read-only mount fails.


