Content
This year has been quite busy with lot’s of great but stressful changes in my professional and
personal life. That’s why I did not find a lot of time to write new blog posts. Now that the year is
turning to an end I at least want to summarize some of the unix commands I found helpful over the
year.
As last year I’m going to list 10 unix commands out of a larger collection of little examples I
jotted down. The list has no particular order, just the way they came in handy for me.
1> tr
Whenever you need to do some small text substitutions tr
can come in handy (tr stands for
translate or transliterate). It will take some input, apply a transformation and spit out the
result.
tr
takes 2 parameters, the first one is a set of characters that it should translate, the second
the set of characters that will act as a replacement. So the arguments “abc” “123” would
mean that a gets replaced by 1, b with 2 and so on.
As a simple example, this line changes the case of the characters ‘a’ through ‘z’:
tmp > echo "Hello" | tr "A-Za-z" "a-zA-Z"
hELLO
More realistic example: split your $PATH into it’s elements:
tmp > echo $PATH | tr ":" "\n" | sort
/Users/oliver/.cabal/bin
/Users/oliver/.rvm/bin
/Users/oliver/.rvm/gems/ruby-1.9.3-p0/bin
/Users/oliver/.rvm/gems/ruby-1.9.3-p0@global/bin
/Users/oliver/.rvm/rubies/ruby-1.9.3-p0/bin
/Users/oliver/local/node/bin
/Volumes/macbox_cs/dev/android-sdk-macosx/platform-tools/
...
2> sort
Simple command to sort input in different manners. By default this in alphabetic order, but using
the -n
option will sort in a numeric fashion:
tmp > du /bin/* | sort -n -r | head -4
1320 /bin/ksh
1264 /bin/sh
1264 /bin/bash
592 /bin/zsh
sort
will take multiple files as input and will merge and sort all of the files for you. Some of
the most used options include -r
for sorting in reverse order and -f
for sorting
case-insensitive.
3> uniq
Want to get rid of duplicate lines? uniq
solves this problem efficiently. Note that it will only
compare adjacent lines for equality, so you might want to sort before you use uniq
.
Nice options: -c
will prepend the count of equal elements before a line, -u
will only output
lines that are not repeated and -i
does the whole thing case-insensitive.
Here is an example that combines tr, sort and uniq such that you can get the frequency of all words in a wikipedia article:
tmp > curl http://en.wikipedia.org/wiki/Minimum_spanning_tree \
| tr -cs "A-Za-z" "\n" | tr "A-Z" "a-z" \
| sort | uniq -c | sort -n -r
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 93342 100 93342 0 0 279k 0 --:--:-- --:--:-- --:--:-- 323k
1031 a
568 span
442 href
435 class
308 li
300 b
284 title
229 wiki
211 the
209 cite
206 id
192 spanning
184 i
169 tree
166 minimum
...
This fetches an html-page from wikipedia and first does some preprocessing using tr
:
tr -cs "A-Za-z" "\n"
— split on all non-alphabetic characters
tr "A-Z" "a-z"
— make everything lowercase
sort | uniq -c
— sort, remove dups but remember the count
sort -n -r
— sort numerically in reverse order
4> split and cat
Again a very simple command but can be surprisingly helpful.
This is an example that splits a huge file into 75 MB chunks:
split -b 75m input.zip
This will result in a bunch of files that are named with 3 letters starting from xaa
,xab
,…
To reassemble the lot, all those files have to be concatinated in alphabetic order:
cat `ls x*` > reassembled.zip
Just a quick check to make sure we ended up with the same content:
tmp > ls *.zip | xargs md5
MD5 (input.zip) = d760b448595f844b1162eaa3c04f83d8
MD5 (reassembled.zip) = d760b448595f844b1162eaa3c04f83d8
5> substitution operations
Operations on multiple files are very frequent. Some situation I found myself in several times was
that I needed to extract audio from a bunch of mp4 files.
I found 2 good ways to solve this: my prefered one makes use of substitution operations:
for i in *.mp4; do ffmpeg -i "$i" "${i%.mp4}.mp3"; done
Here the subtitution operator ${i%.mp4}
deletes the shortest possible match from the right side.
This is nice and terse…but there is another variant that might even be a little more explicit: using basename
for i in *.mp4; do ffmpeg -i "$i" "`basename $i .mp4`.mp3"; done
6> calculate the size of all files found by find
There are for sure hundreds of ways to achieve this…I liked the combination of a simple find
with a short and sweet awk
function:
tmp > find . -iname "*.png" -ls | awk '{s += $7} END {print s}'
2076723
As some people on hn pointed out awk is probably not the simplest solution for summing up space usage. So I include an example inspired from this blog.
tmp > find . -iname "*.png" -print0 | xargs -0 du -ch | tail -1
2.2M total
7> df
Classic. Collects some disk space usage information about your system.
tmp > df -h
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk0s2 156Gi 138Gi 17Gi 89% 36247400 4528347 89% /
...
8> dd
Basically dd
is just a form of copying from some input to some output (by default from stdin to
stdout) that let’s you configure the block size used for the copy. It will duplicate a bitstream
from it’s input. I’ve also heard people call it data destroyer ‘cause you can easily shoot
yourself in the foot by inadvertently mixing up input and output…
Turns out there are quite some interesting usecases for it.
A nice one I found here is to securely wipe your drive: overwrite the entire drive with 0s:
dd if=/dev/zero of=/dev/hda
More secure (means harder to recover) is to use random data to wipe the drive:
dd if=/dev/urandom of=/dev/hda
And for the paranoid and the US Government we can repeatedly execute the fun:
for n in `seq 7`; do dd if=/dev/urandom of=/dev/sda bs=8b conv=notrunc; done
Safe MBR
A less destructive example shows how to create an image of the entire master boot record (including the partition table):
tmp > dd if=/dev/sda of=MBR.img bs=512 count=1
Here count=1 means copy only 1 input block, bs=512 sets both input and output block size to 512 bytes.
Generate Randomness
Sometimes very handy is to use dd
to generate some random data for a file:
dd if=/dev/random of=random.bin bs=100 count=1
Tracking Progress
In some instances the process started with dd will take a considerable amount of time. Since you
will not get any fancy progress bars, there is some trick to find out about the progress.
First you need to find out about the process id of the dd process:
tmp > pgrep -l '^dd$'
4523 dd
Then send the USR1 signal to the dd process:
tmp > kill -USR1 4523
When dd
detects the USR1 signal, it will print out the current statistics to its stderr.
tmp > 123122312 bytes (xxx GB) copied, 3965.94 s, 13.9 MB/s
After reporting the status, dd will resume copying. To keep it going use watch:
tmp > watch -n 10 kill -USR1 4523
9> zip
Even though I prefer tar with either gzip or bzip2, the zip format is widely used especially among
windows users. So I frequently use zip
and unzip
as well. Since it works quite differently
compared to tar, I list the main usecases I need:
Most simple case: add some files to a zip-file (called “abc.zip”):
zip abc file1 file2 file3
Of course you can also copy a whole directory “tmp” into “abc.zip”.
zip -r abc tmp
Also quite handy: creating a password protected archives:
zip -e important.zip file1 file2
And finally list the files inside an archive
unzip -l a.zip
10> hexdump
When dealing with binary files it is often necessary to glimps a quick view to the actual data. I
found that having a little command line utility can be very practical for such cases. hexdump
has
exactly what I need.
tmp > hexdump new.zip | head -5
0000000 70 a9 20 8d b1 a3 5c 1c 16 e3 17 b2 ef 94 16 ac
0000010 85 40 59 f9 89 40 45 ed 61 e8 10 f5 6f f5 99 a2
0000020 3a d6 69 62 e0 ab ee 0a 67 b8 c5 21 58 42 4d 52
0000030 2d 78 ae 2a 31 f2 78 c7 1f 22 99 07 e1 6a 55 bb
0000040 68 9a fe 8f c3 e0 e5 a3 4c 7d b3 6b f9 ae de 92
You can instruct it to display also the corresponding ASCII representation:
tmp > hexdump -C new.zip | head -5 00000000 70 a9 20 8d b1 a3 5c 1c 16 e3 17 b2 ef 94 16 ac |p. ...\.........| 00000010 85 40 59 f9 89 40 45 ed 61 e8 10 f5 6f f5 99 a2 |.@[email protected]...| 00000020 3a d6 69 62 e0 ab ee 0a 67 b8 c5 21 58 42 4d 52 |:.ib....g..!XBMR| 00000030 2d 78 ae 2a 31 f2 78 c7 1f 22 99 07 e1 6a 55 bb |-x.*1.x.."...jU.| 00000040 68 9a fe 8f c3 e0 e5 a3 4c 7d b3 6b f9 ae de 92 |h.......L}.k....|
Combining hex and octal output quickly allows for relating the hex values to their octal counterparts:
tmp > hexdump -xb new.zip | head -5
0000000 a970 8d20 a3b1 1c5c e316 b217 94ef ac16
0000000 160 251 040 215 261 243 134 034 026 343 027 262 357 224 026 254
0000010 4085 f959 4089 ed45 e861 f510 f56f a299
0000010 205 100 131 371 211 100 105 355 141 350 020 365 157 365 231 242
0000020 d63a 6269 abe0 0aee b867 21c5 4258 524d
short update: somebody took the time to translate this article into Serbo-Croatian