And You Can Quote Me On That!
This article I wrote back in late 2001. At the time I was proto-blogging via email. I think it stands up pretty well, or at least meets the high usual standard of stuff I post around here. Enjoy.
Quoting filenames so that they can be used on a command line is something that I have been dealing with on and off for ‘kn years now. It is an endless source of frustration as you try to tell the computer where one filename stops and the next one begins. Allow me to illustrate with an example I posed recently on the [now defunct?] Link mailing list.
The discussion was about the relative merits of command-line versus graphical user interfaces. Yes that old hairy chestwig. Yes the discussion on Link is normally of higher caliber than this. But anyway, someone made the old point about how renaming a bunch of files from .bat to .bak was really difficult (or tedious and manual) in a GUI environment. Which is right I guess - though it does remind me of the sort of comment on a BBS that a DOS hardliner might have made about Windows circa 1988 …
Anyway Link is the kind of place where lots of Linux hardliners hang out, and I felt like having a bit of fun, so I posed the question as to how this task would be achieved using common Unix tools. At the same time pointing out that Windows handles this task extremely well: ren *.bat *.bak. Lighting the fire underneath them so to speak :)
Now I don’t profess to be a unix guru by any means, but the following monstrosity is the best I could come up with for achieving the task at hand. Brace yourself.
ls *.bat | sed -e "s/[\\\"$]/\\\&/g" -e "s/\(.*\)\.bat$/mv \"\1.bat\" \"\1.bak\"/" | sh
Yes all those quotes are needed. Beautiful ain’t she.
Actually if you look a bit closer it’s not as bad as all that. The ls *.bat bit simply gets a list of the files to be renamed. Then we have a sed script (the first -e bit) which simply quotes any bad characters like backslashes, double quotes and dollar signs in those filenames. The next sed script takes the entire line minus the .bat bit, wraps it up in an mv command, making sure to quote the arguments (so as to handle whitespace correctly). Finally we pipe through into a shell to do the actual work. You can change the work done by changing that last sed replacement, if you can find it amongst all that other crap.
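To see what the pipeline actually hands to sh, it can help to drop the final stage and read the generated commands first. What follows is only a minimal sketch, assuming a POSIX shell and sed. The function name gen_mv and the sample filenames are made up for illustration, and the sed scripts are written in single quotes so that they reach sed exactly as the double-quoted originals do.

```shell
#!/bin/sh
# gen_mv (hypothetical helper): read filenames on stdin, one per line,
# and print the mv command that the pipeline would otherwise send to sh.
gen_mv() {
    # First script: put a backslash in front of \, " and $.
    # Second script: wrap the name in a double-quoted mv command.
    sed -e 's/[\"$]/\\&/g' \
        -e 's/\(.*\)\.bat$/mv "\1.bat" "\1.bak"/'
}

# Dry run: print the commands instead of executing them.
printf '%s\n' 'foo.bat' 'I love Unix.bat' 'say "hi".bat' | gen_mv
# mv "foo.bat" "foo.bak"
# mv "I love Unix.bat" "I love Unix.bak"
# mv "say \"hi\".bat" "say \"hi\".bak"
```

Backticks are also special inside double quotes and are not escaped by the first script, so it is worth reading the generated commands before piping them into sh.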
I got lots of replies to my little script, all suggesting that a for loop would look a lot prettier. They’re right:
for f in *.bat ; { mv $f `basename $f .bat`.bak ; }
Or even, if you want to get tricky-ricky:
for f in *.bat ; { mv $f ${f%.bat}.bak ; }
Only trouble is, it doesn’t work! Well OK it works for most files but as soon as you get a file with bad characters in it (actually spaces are the culprits here), it all falls in a heap.
The problem as I see it is not the “for” command itself but what happens to $f inside the loop. The shell expands the *.bat bit into a list of filenames and keeps each one intact as a single word, so the loop really does iterate over “foo.bat”, “bar.bat” and “I love Unix.bat”. But when the unquoted $f is expanded in the mv command, it gets split on whitespace, so mv is handed “I”, “love” and “Unix.bat” as three separate arguments. It’s obvious why whitespace in filenames will completely break this technique unless every expansion is wrapped in double quotes. In contrast, my monstrosity starts with each file on a new line and quotes the arguments it generates, hence as long as there are no end-of-lines in the filenames (Dog forbid), it should still work.
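For comparison, here is a sketch of the loop with every expansion wrapped in double quotes, which keeps each filename as a single argument to mv. It assumes a POSIX shell plus the widely available (but not strictly POSIX) mktemp -d. The function name rename_bat and the scratch directory are illustrative only.

```shell
#!/bin/sh
# rename_bat (hypothetical helper): rename every *.bat in the given
# directory to *.bak, keeping each filename as one argument.
rename_bat() {
    (
        # Work inside a subshell so the caller's directory is unchanged.
        cd "$1" || exit 1
        for f in *.bat; do
            # If nothing matches, the pattern stays literal; skip it.
            [ -e "$f" ] || continue
            # ${f%.bat} strips the suffix; -- guards names starting with "-".
            mv -- "$f" "${f%.bat}.bak"
        done
    )
}

# Try it in a scratch directory so no real files are touched.
demo=$(mktemp -d)
touch "$demo/foo.bat" "$demo/I love Unix.bat"
rename_bat "$demo"
ls "$demo"
```

The double quotes around "$f" and "${f%.bat}.bak" are the whole trick: drop either pair and a name containing spaces is split into several arguments again.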
Needless to say, this was not a particularly well-received point amongst the Linux Zealots. Their argument was basically Don’t-Do-That-Then. Specifically, don’t put spaces and bad characters in your filenames. Very sound advice I would think, but not always practical. And it’s not surprising that the zealots’ attitude would be that spaces in filenames are fundamentally evil things, rather than an acknowledged limitation in the standard unix toolset (otherwise they wouldn’t be zealots I suppose).
[Actually to be fair it was pointed out that the ‘mmv’ tool would have handled this task perfectly. I had never heard of it before, though it does look like a useful tool. I’ll leave the definition of “standard unix toolset” (and whether or not it includes mmv) to your imagination.]
I’m sure the origin of all this is basically historical, but the fact remains there is a discrepancy between what the tools provide and what the filesystem supports. I’m no unix historian, but I’d bet 5 of your Earth dollars that the Bourne shell (for instance) can trace its lineage back to the days when unix filesystems would not allow spaces in filenames. The filesystems were improved, but the tools had to remain static.
As unix/linux becomes less of a law-unto-itself, and more of a citizen that has to interact with other systems, the less of an option Don’t-Do-That-Then becomes. I can think of no better example than the recent Apple iTunes installer debacle.
This was an update for the iTunes thingy (an MP3 player/ripper/encoder for Mac OS X) which Apple released recently. Of course they wanted to use the power of the underlying unix environment of Mac OS X to help the upgrade. So part of it was written using shell scripts. And of course there was a bug with the scripts, which didn’t handle spaces in filenames (actually volume names, but same principle applies). Spaces are something from which Mac users have never shied away. They never needed to until now! Hence vast quantities of people’s data were being DELETED all over the world. The whole sorry story is documented in TidBITS.
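The failure mode is easy to reproduce harmlessly. The sketch below is not the installer script. It assumes a POSIX shell and uses a made-up volume path and a function that only counts its arguments, to show how an unquoted variable turns one path into several.

```shell
#!/bin/sh
# count_args (hypothetical helper): print how many arguments it received.
count_args() {
    echo "$#"
}

# A made-up volume path containing spaces.
vol='/Volumes/My Music Disk'

unquoted=$(count_args $vol/Applications)    # split on whitespace
quoted=$(count_args "$vol/Applications")    # kept as one argument

echo "unquoted: $unquoted argument(s)"   # unquoted: 3 argument(s)
echo "quoted:   $quoted argument(s)"     # quoted:   1 argument(s)
```

A command such as rm given the unquoted form would act on three separate paths, none of which is the one intended.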
As a postscript to this, let me point out that I seem to suffer slightly less with the problems of spaces in filenames these days. Maybe it’s because I spend more of my time on Unix systems, and less on Cygwin. Or\ maybe\ I've\ just\ gotten\ used\ to\ it.