A Sed script to remove uuencoded sections from ASCII files

The title says it all! For a long time now, I've been writing huge uuencoded files to my diary, where I write everything that happens (mostly). Things got to the point where a typical diary file might be about 50K, but the embedded binaries would push the size up to about 18M! This obviously could not continue, so today I decoded all of the binaries - but I needed some help from Sed. I came up with this thing:
p;/^begin.*/{
: loop
w output
n
/^end.*/!b loop
w output
}
(Note that sed's `w` takes everything up to the end of the line as its filename, so `w output` has to sit on a line of its own - the script can't be collapsed onto a single line.)
This reads input on stdin and copies it to stdout until it encounters a 'begin' line. From that point it writes every line to the file 'output', and keeps doing that until it finds an 'end' line. Then it goes back to the start and continues writing to stdout.
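To make the behaviour concrete, here is the same logic run inline on a tiny made-up sample (the file names sample.txt and trimmed.txt, and the fake uuencoded section, are invented for this demo):

```shell
# Build a tiny sample file with a fake uuencoded section in the middle.
printf 'before\nbegin 644 foo.bin\nMDEMO\nend\nafter\n' > sample.txt

# Same logic as above, one command per line: sed's 'w' takes the rest
# of its line as the filename, so it can't share a line with 'n'.
sed -n '
p
/^begin.*/{
: loop
w output
n
/^end.*/!b loop
w output
}' sample.txt > trimmed.txt

cat trimmed.txt   # before / begin 644 foo.bin / after
cat output        # begin 644 foo.bin / MDEMO / end
```

One quirk worth knowing: the `p` runs before the `/^begin/` test, so the 'begin' line ends up both in the trimmed output and in the file 'output'.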

Use the script like this:

cat big_file_input.txt | sed -n -f sed_test_1.sed > trimmed_output_file.txt
Here is the commented version.
# This is a sed test script.
# L. 27-Mar-12 (23:51)
# This should find the first occurrence of '^begin', and it should copy
# everything from that point until the next '^end'.  And then delete it.
# Man, this did my head in.
# It's intended to remove great big sections of uuencoded text from the MOO
# logs.
# Call this with this sort of thing:
# cat moo-2012-02-14 | sed -n -f sed_test_1.sed > moo-2012-02-14-trimmed
# The uuencoded sections are written to file 'output'.
# To see the list of uuencoded files:
# grep "^begin" moo-2012-02-14 > moo-2012-02-14.files
# To verify the output with the original file:
# grep "^begin" ../../moo-2012-02-14 | cut -d " " -f 3 | xargs ls -l
# With -n, print each line by default.
p
# When a 'begin' line appears, copy through to the matching 'end'.
/^begin.*/{
: loop
# Write the current line to file 'output'...
w output
# ...fetch the next input line (not printed, because of -n)...
n
# ...and keep looping until an 'end' line turns up.
/^end.*/!b loop
# Write the 'end' line itself, then fall out of the block.
w output
}
You can download the files here:
md5sums:
4d77ee56646dbcbdab52c0a4e91d7a75  sed_test_1_compact.sed
e58853db0aa4b0f5a5ff8c7318839b32  sed_test_1.sed
I hope this will be of use to someone. Oh, I almost forgot the other script: with binaries now appearing alongside these diary files, I wanted some way to keep an md5sum list up to date without re-hashing every file each time. Here's my messy solution to this:
#!/bin/bash
# file_watch.sh
# 27-Mar-12
# This script checks for new files in the LambdaMOO binaries directory.
# If it sees any, it adds them to the MD5SUMS list.

LAMBDAMOO="${HOME}/LambdaMOO"
LOGS="${LAMBDAMOO}/logs"
MOO_BINS="${LOGS}/binaries"
# The subdirs should have the name YYYY-MM-DD (year, month, day).
DIR_PATTERN="????-??-??"
MD5SUMS="${MOO_BINS}/MD5SUMS"
TIMESTAMP="${MOO_BINS}/.timestamp"

# Check that all dirs exist
if [ ! -d "${LOGS}" ]; then
    echo "Error: the LambdaMOO logs dir does not exist - please create it"
    echo "(mkdir ${LOGS})."
    exit 1
fi

if [ ! -d "${MOO_BINS}" ]; then
    echo "Error: the LambdaMOO logfile binaries dir does not exist - please create it."
    exit 1
fi

# If the timestamp doesn't exist, then this script probably hasn't run before.
# If that is the case, the dirs are scanned for files, and an MD5SUMS file is
# built. It then exits.
if [ ! -f "${TIMESTAMP}" ]; then
    find "${MOO_BINS}" -type f -exec md5sum {} ';' > "${MD5SUMS}"
    touch "${TIMESTAMP}"
    exit 0
fi

md5sums_basename=$(basename "${MD5SUMS}")
md5sums_dirname=$(dirname "${MD5SUMS}")

# Check for files that are newer than the timestamp file. If any are found,
# the file's md5sum is added to ${MD5SUMS}, the timestamp file is updated and
# all subsequent runs check for newer files.
for a in "${MOO_BINS}"/${DIR_PATTERN}; do
    find "${a}" -type f -newer "${TIMESTAMP}" \
        -exec md5sum {} ';' \
        -exec touch "${TIMESTAMP}" ';' >> "${MD5SUMS}"
done

# Make sure that the MD5SUMS file doesn't become very large - and I mean
# _very_ large! In this case, the file is compressed - and so, renamed with
# a .gz extension - and a new file is created. This will probably run into
# trouble when the .gz file already exists.
find "${MD5SUMS}" -size +20M \
    -execdir gzip "${md5sums_basename}" ';' \
    -execdir touch "${md5sums_basename}" ';'
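Since the whole point of the script is the MD5SUMS file, it may help to see the round trip: md5sum's standard -c mode can read the list back and verify each recorded file. A minimal sketch with throwaway paths (everything under /tmp here is invented for the demo):

```shell
# Stand-in for the binaries tree (paths invented for this demo).
mkdir -p /tmp/moo_demo/2012-03-27
echo "pretend binary data" > /tmp/moo_demo/2012-03-27/photo.jpg

# Record sums the same way file_watch.sh does: one md5sum line per
# file, with full paths. The sums file lives outside the tree here so
# the scan doesn't end up hashing the sums file itself.
find /tmp/moo_demo -type f -exec md5sum {} ';' > /tmp/moo_demo_MD5SUMS

# Verify later: md5sum -c reports OK or FAILED for each entry.
md5sum -c /tmp/moo_demo_MD5SUMS
```

One subtlety: in file_watch.sh the MD5SUMS file sits inside ${MOO_BINS}, so the initial scan is liable to hash MD5SUMS itself while it is still being written; keeping it elsewhere, or excluding it from the find, avoids a spurious entry.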
Yes, I use LambdaMOO to write my diary - I find it very useful to have a textual virtual reality environment, where I can keep notes. I even have a few books (had to increase my quota for that!).