A recurring topic: line feeds are handled differently across operating systems.
- Main source for this: a recent thread on the tlug.jp mailing list.
What is it actually?
The term “line feed” describes the result we actually want: the text continues on a new line. Example:
line of text. after this is the line feed
and this is after the linefeed.
Of course, what you see is an XML/HTML document in your browser; just imagine the above appears in a text file.
The “line feed”, the wrapping of the line, is presented as above when the text is output, e.g. with cat. In most editors and programming languages the line feed is written as '\n', so the above can also be written like this:
line of text. after this is the line feed\nand this is after the linefeed.
How the line feed is actually stored in files differs among operating systems. The ASCII character encoding provides two relevant control characters: CR ('carriage return', ASCII 13) and LF ('line feed', ASCII 10).
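As a quick check of those two code points, here is a minimal Python sketch. Python is used only for illustration, and the variable names are made up for this example:

```python
# CR and LF are ordinary ASCII control characters.
CR = "\r"  # carriage return
LF = "\n"  # line feed

print(ord(CR), ord(LF))  # prints: 13 10
print(repr(CR + LF))     # the two-character CRLF sequence: '\r\n'
```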
In modern environments (languages like Python, editors like Emacs, etc.) a “universal newlines” convention is becoming established, interpreting all of CR, LF, CRLF, LINE SEPARATOR, and PARAGRAPH SEPARATOR as a line feed. This is covered in the standard UAX #14, “Unicode Line Breaking Algorithm”.
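One place where this behaviour can be observed is Python's built-in str.splitlines(). The sketch below uses an invented sample string that mixes the five terminators named above:

```python
# One logical text, written with five different line terminators.
sample = "a\nb\rc\r\nd\u2028e\u2029f"

# str.splitlines() treats LF, CR, CRLF, LINE SEPARATOR (U+2028) and
# PARAGRAPH SEPARATOR (U+2029) as line boundaries, among a few others.
lines = sample.splitlines()
print(lines)  # prints: ['a', 'b', 'c', 'd', 'e', 'f']
```

Note that CRLF counts as a single boundary here, so no empty line appears between 'c' and 'd'.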
While Unix/Linux store line feeds as the LF character, older Mac OS used CR; since Mac OS X it is also LF. DOS and Windows use CRLF.
Current recommendation for outputting line feeds in Unicode: LINE SEPARATOR for hard line breaks, and PARAGRAPH SEPARATOR where you expect the software to provide appropriate line breaks for you at display time.
How to "see" which line feeds are used?
perl: This Perl snippet shows which ASCII characters are used:
perl -ne 's/\015/<CR>/g; s/\012/<LF>/g; print "$_\n";' filename.txt
Emacs provides an EOL indicator in the mode line, and if you're worried about mixed EOL conventions, you can specify the coding system as “undecided-unix” to enforce Unix EOL, in which case CR displays as “^M” in the buffer.
od or xxd can display the details: 'xxd -g 1 -u filename.txt'
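The same kind of inspection can be sketched in Python. The two helper names below (count_eols, show_eols) are hypothetical, and the sample bytes are invented; to inspect a real file, the data would be read in binary mode so that no newline translation happens first:

```python
import re

def count_eols(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs and lone LFs in raw bytes."""
    counts = {"CRLF": 0, "CR": 0, "LF": 0}
    # The alternation tries CRLF first, so a pair is never counted twice.
    for match in re.finditer(rb"\r\n|\r|\n", data):
        token = match.group()
        if token == b"\r\n":
            counts["CRLF"] += 1
        elif token == b"\r":
            counts["CR"] += 1
        else:
            counts["LF"] += 1
    return counts

def show_eols(data: bytes) -> str:
    """Make CR and LF visible, similar to the perl one-liner above."""
    text = data.decode("latin-1")  # maps every byte to exactly one character
    return text.replace("\r", "<CR>").replace("\n", "<LF>\n")

sample = b"one\r\ntwo\nthree\rfour"
print(count_eols(sample))  # prints: {'CRLF': 1, 'CR': 1, 'LF': 1}
print(show_eols(sample))
```

A file with mixed conventions shows up as more than one non-zero counter.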
How to convert the line feeds to Unix/Linux style?
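This snippet converts the line endings used by other operating systems (CRLF and a lone CR) into our Unix-style LF, keeping a backup of the original in filename.txt.bak:
perl -i.bak -pe 's/\r\n?/\n/g' filename.txt
The more verbose version, with the characters spelled out as octal codes, is this:
perl -i.bak -pe 's/\015\012|\015/\012/g' filename.txt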
Also works for CRLF files: 'dos2unix filename.txt'
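For completeness, the conversion can also be sketched in Python. The function name to_unix_eols is hypothetical, and the sketch assumes an ASCII-compatible encoding such as UTF-8 or Latin-1, where the bytes 13 and 10 only ever mean CR and LF:

```python
def to_unix_eols(data: bytes) -> bytes:
    """Return data with CRLF and lone CR replaced by LF."""
    # Replace CRLF first so the CR of a pair is not converted separately.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

mixed = b"dos line\r\nold mac line\runix line\n"
print(to_unix_eols(mixed))  # prints: b'dos line\nold mac line\nunix line\n'
```

Working on bytes rather than decoded text keeps the rest of the file untouched, whatever characters it contains.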