Exercises
- Consider the files
  Use the command-line tools to inspect these files, i.e., what do `file` and `wc` say about them? Inspect them using `od` with appropriate options; start with `od -tcuC`, for instance. How do they differ? Discuss the pros and cons of the formats. Try `recode` to change from one character encoding to another.
- Using the Python `csv` module, read a file in the default CSV format and output it in TSV format.
- Define separate `grep -E` regular expressions matching lines with
  - a Scandinavian email address.
  - CPR numbers.
  - phone numbers written as 2 groups of 4 digits or 4 groups of 2 digits, with groups separated by one space.
  - dates in the Danish format, e.g., 1/1 1970.
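As a starting point for the phone-number item, here is a minimal Python sketch that checks one candidate pattern against a few hand-written lines. It assumes that Python's `re` module behaves like `grep -E` for this simple pattern (the two dialects differ in other respects). The names `PHONE` and `matching_lines` and the sample lines are made up for illustration. The pattern does not look at what surrounds the digits, so a longer digit sequence that contains a valid number also matches.

```python
import re

# Candidate pattern (an assumption, not the only possible answer):
# two groups of 4 digits, or four groups of 2 digits, separated by single spaces.
PHONE = r"[0-9]{4} [0-9]{4}|([0-9]{2} ){3}[0-9]{2}"


def matching_lines(pattern, lines):
    """Return the lines in which the pattern matches somewhere, as grep -E would."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


if __name__ == "__main__":
    # Invented sample lines: the first two should match, the others should not.
    samples = [
        "call 1234 5678 today",
        "12 34 56 78",
        "12345678",
        "123 456 78",
    ]
    for line in matching_lines(PHONE, samples):
        print(line)
```

The same pattern should work unchanged on the command line as `grep -E '[0-9]{4} [0-9]{4}|([0-9]{2} ){3}[0-9]{2}' FILE`, where `FILE` is a placeholder for your own test file.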
- Using `/usr/share/dict/words` (or similar), define separate `grep -E` regular expressions matching lines (words, since there is only one word per line in that file) with
  - a consecutive repetition of at least three characters.
  - a consecutive repetition of the same sequence of four characters.
  - a repetition of total length 4 and a palindrome of total length 4.
  - words without vowels (a, e, i, o, u, y); use an option.
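The repetition and palindrome items hinge on backreferences, where `\1` matches again whatever the first parenthesised group matched. The following Python sketch illustrates the idea, assuming that `re` treats these backreferences the same way GNU `grep -E` does. The two patterns are one possible reading of the length-4 item, and the function name `classify` is invented for the example.

```python
import re

# (..)\1     : any two characters followed by the same two again (total length 4).
# (.)(.)\2\1 : two characters followed by their mirror image (total length 4).
REPETITION = re.compile(r"(..)\1")
PALINDROME = re.compile(r"(.)(.)\2\1")


def classify(word):
    """Report which of the two length-4 patterns occur somewhere in the word."""
    found = []
    if REPETITION.search(word):
        found.append("repetition")
    if PALINDROME.search(word):
        found.append("palindrome")
    return found
```

For instance, `classify("cocoa")` reports a repetition (`coco`) and `classify("noon")` reports a palindrome. A word such as `aaaa` satisfies both patterns, which is worth keeping in mind when deciding how to interpret the exercise.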
- Define separate `grep -E` regular expressions matching lines with
  - an opening and closing HTML headline tag, e.g., `<h2>My Headline</h2>`; use an option to make it case insensitive, then use an option to print the line number for every match. You may require that headlines are on a line by themselves (and of course not nested).
  - numbers in the range 1000 through 9999.
  - numbers in the range 100 through 9999.
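For the two numeric ranges, the digits themselves are easy; the care goes into making sure the match is not part of a longer run of digits. A hedged Python sketch follows, assuming that `re` and `grep -E` agree on this syntax. The names `LEFT`, `RIGHT`, and `has_number` are chosen for illustration, and the patterns are one possible answer rather than the intended one.

```python
import re

# A number must start at the beginning of the line or after a non-digit,
# and end at the end of the line or before a non-digit.
LEFT = r"(^|[^0-9])"
RIGHT = r"($|[^0-9])"

# First digit 1-9 (no leading zero), then exactly 3, or 2 to 3, more digits.
RANGE_1000_9999 = re.compile(LEFT + r"[1-9][0-9]{3}" + RIGHT)
RANGE_100_9999 = re.compile(LEFT + r"[1-9][0-9]{2,3}" + RIGHT)


def has_number(regex, line):
    """True if the line contains a number accepted by the given pattern."""
    return regex.search(line) is not None
```

With these definitions, `has_number(RANGE_1000_9999, "in 1970 or so")` is true, while `"10000"` is rejected by both patterns because every candidate match is adjacent to another digit. Whether numbers with leading zeros or decimal points should count is left open by the exercise; this sketch ignores both cases.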
- Using `ls -l | grep -E REGULAR_EXPRESSION`, list all files in some directory that
  - others can read or write (it is the 8th and 9th characters that are relevant).
  - were last modified in November (the date that `ls -l` shows) and are PDF files.
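For the permission item, counting characters from the start of each `ls -l` line is the key step: positions 8 and 9 hold the read and write bits for others. Below is a Python sketch of one candidate pattern, assuming the usual long listing format in which each line starts with the 10-character mode string. The function name `others_can_read_or_write` is hypothetical.

```python
import re

# Skip the first 7 characters, then require 'r' in position 8
# or 'w' in position 9 (any character in position 8).
OTHERS_READ_OR_WRITE = re.compile(r"^.{7}(r|.w)")


def others_can_read_or_write(ls_line):
    """True if the mode string at the start of an `ls -l` line grants others r or w."""
    return OTHERS_READ_OR_WRITE.search(ls_line) is not None
```

On the command line the same pattern would be used as `ls -l | grep -E '^.{7}(r|.w)'`. The `total` line that `ls -l` prints first is too short to match in typical cases, but it is worth checking on your own system.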