# File Input/Output, Character Encodings, Binary Input/Output


## Computing Practice I

### 12th lecture, June 28, 2018

http://www.sw.it.aoyama.ac.jp/2018/CP1/lecture12.html

### Martin J. Dürst

© 2005-18 Martin J. Dürst 青山学院大学

# Today's Schedule

• Minitest
• Summary of last lecture
• File input/output
• Character encodings
• Binary input/output
• Today's exercises

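# Minitest

• Log in before the lecture starts
• Note: Moodle has automatically changed to https://moo.sw.it.aoyama.ac.jp
• Collapse the navigation on the left and maximize the browser to full screen
• Before the lecture starts, put textbooks, handouts, pencil cases, wallets, etc. into your bag, and put the bag under your chair
• After finishing the test, wait at your seat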

# Summary of Last Lecture

• When passing arguments to a function, there is a choice between pass-by-value and pass-by-reference
• Arrays are passed by reference automatically, but the length has to be managed separately
• Pointers are used in C for references and indirection
• It is possible to pass a (pointer to a) function to another function

# Results of Previous Exercises

| | 11A1 | 11C1 | 11C2 | 11C3 |
|---|---|---|---|---|
| 100 points | 99 | 50 | 0 | 2 |
| 60 points | 1 | 50 | 44 | 46 |
| partial | - | - | - | 30 |
| errors | - | - | 30 | 4 |
| not submitted | - | - | 26 | 18 |

Recursive creation of permutations

• How to simplify comparison functions:
• For numbers, use the result of subtraction
• Compare total instead of average to avoid loss of information
• For reverse sorting direction, call another comparison function with arguments exchanged
• Example of how to DRY up the program using arrays

# Importance of Q&A Forum

• Still not many questions (week 11: 11 questions; last year: 26 questions)
• Any question is okay (but no program code)
• Asking questions may be necessary for solving problems
• Asking questions is part of the evaluation of this lecture
• Questions and answers are part of the final exam

• Many convenient basic functions are available
• Not necessary to know all details of all functions
• Important to know what areas are covered
• Don't write your own function if a function in the standard library is available

# File Input/Output

• Using redirection, input/output from/to a single file is possible
• How to read/write from/to multiple files?
• How to specify the name of a file inside a program?

# Standard Input and File Input

| source | standard input (`stdin`) | arbitrary file |
|---|---|---|
| preparation | (not needed) | `FILE * f`; `f = fopen(name,"r")` |
| | `getchar()` | `getc(f)` or `fgetc(f)` |
| | `gets(s)` (unsafe); use `fgets(s,l,stdin)` | `fgets(s,l,f)` |
| | `scanf(format,...)` `%s` | `fscanf(f,format,...)` `%s` |
| | (not needed) | `fclose(f)` |

# Standard Output and File Output

| destination | standard output (`stdout`) | arbitrary file |
|---|---|---|
| preparation | (not needed) | `FILE * f`; `f = fopen(name,"w")` |
| | `putchar(c)` | `putc(c,f)` or `fputc(c,f)` |
| | `puts(s)` | `fputs(s,f)` |
| | `printf(format,...)` | `fprintf(f,format,...)` |
| | (not needed) | `fclose(f)` |

• `fflush(f)`: Flushes the output buffer
(important after using `printf` for debugging)
• `sprintf(s,format,...)`: Formatted "output" to a string (buffer) in memory; dangerous if the result is longer than the buffer
• `sscanf(s,format,...)`: Formatted "input" from a string (buffer) in memory

# Standard Error Output (`stderr`)

• Writing error messages to standard output is inconvenient:
Ideally, normal output can be redirected to a file while error messages are still displayed on the screen
• In addition to `stdin` and `stdout`, `stderr` is always available
• For output to `stderr`, the file output functions have to be used
• `stderr` is not buffered (so `fflush` is not necessary)
• In the `malloc` pattern (checking the result of `malloc`), the error message should also go to `stderr`
• To use `exit`, include `stdlib.h`
• Example (important to use in exercises):
`if (!(file = fopen("myfile.txt", "w"))) { fprintf(stderr, "Cannot open file myfile.txt for writing.\n"); exit(1); }`

# Overview of Character Encoding

• Inside a computer, characters are represented as one or more bytes
• Depending on the character encoding, the same character may use a different number of bytes and different byte values
Examples of Character Encoding

| Character | 青 | 山 | A | o | y |
|---|---|---|---|---|---|
| `Shift_JIS` | 90 C2 | 8E 52 | 41 | 6F | 79 |
| `UTF-8` | E9 9D 92 | E5 B1 B1 | 41 | 6F | 79 |
| `UTF-16` (BE) | 97 52 | 5C 71 | 00 41 | 00 6F | 00 79 |

# Japanese Legacy Character Encodings

• "JIS" (`iso-2022-jp`, used for E-mail)
• "SJIS" (`Shift_JIS`, Windows PC, Macintosh)
(`CP932` is a variant of SJIS)
• "EUC" (`EUC-JP`, Unix/Linux)

# World-Wide Character Encodings

Various names: Unicode/ISO 10646/JIS X 0221/...

Encodings for Unicode:

• `UTF-8`: Compatible with US-ASCII; widely used on Internet/Web and internally in Ruby, Perl, Go,...
• `UTF-16`: Almost all characters use 16 bits; used internally in Windows, Java, JavaScript,...
• `UTF-32`: All characters use 32 bits; rarely used

# Character Encodings during Compilation and Execution

How to indicate character encodings for `gcc`:

• `-finput-charset=encoding`: Source encoding
• `-fexec-charset=encoding`: Execution encoding
• UTF-8 is the default for both

Encoding used for display:

• Windows Command Prompt: System Encoding
(`CP932` on Japanese MS Windows systems, may be changed with `chcp` command)
• Cygwin Terminal: Set/change with Options... → Text → Character set (`UTF-8`)

# Binary Input/Output

• Data is not converted to text, but written directly as it is stored in memory
• I/O is fast, but the file format depends on the hardware architecture
• Modes for `fopen`: `"rb"`, `"wb"`, `"r+b"`,... (see p. 318)
• The position inside the file can be changed with the `fseek` function
• Use `fwrite` for output, `fread` for input

# Text Output and Binary Output

| | text | binary |
|---|---|---|
| integers | conversion to decimal representation (number of digits depends on the size of the number) | directly (if internally, `int` uses 4 bytes, then 4 bytes) |
| strings | up to (but not including) the first `'\0'` | fixed length (may include final `'\0'` and garbage) |

Example: fwrite.c

# How to Check Files

Text files:

• Display to console with `cat file.txt`
• Open with a plain text editor (NOT Notepad), switch on display of whitespace/line endings if necessary

Binary files:

• Display to console with `od -hc raw.bin` or `hexdump file.bin`
• Open with a specialized editor

# The `fread` and `fwrite` Functions

• Four arguments:
1. `void *`: pointer to data/space
2. `size_t` (an unsigned integer type, the type of the result of `sizeof`): Size of one record (element, structure) in bytes
3. `size_t`: Number of records
4. `FILE*`: File pointer
• Return value: Number of successfully read/written records
• When switching between reading and writing on the same file, always use `fseek` or `fflush`
• Example: `fwrite (students, sizeof(Student), studentCount, rawFile);`

# The `fseek` Function

• Change the current reading/writing position
• Similar to video recorder (rewind,...)
• Three arguments:
1. `FILE*`: File pointer
2. `long`: Offset (in bytes)
3. `int`: starting point for offset:
• `SEEK_SET`: from start of file
• `SEEK_CUR`: from current position (negative means backwards)
• `SEEK_END`: from end of file (0 or negative)
• Return value: `0` in case of success, otherwise a nonzero value

# Subject Lines for the Q&A Forum

• Problem number
• Main points of error message
• Check number
• Step number (12B1)
• line number
• byte number (12C1/2)

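# Today's Exercises

• 12A1: File output
• 12A2: File input
• 12B1: Character encodings
Experience the relationship between the encoding of the source file, the execution encoding,
and the encoding of the output (including garbled characters)
Investigate the number of bytes per character for each kind of character
(Follow each step precisely! The paper has to be submitted today; also submit the program of the last step)
• 12C1: Binary output (pay attention to indentation)
• 12C2: Student "database" (partial points, advanced problem, deadline is Monday)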

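# Preparation for Next Time

• Complete the remaining exercises as homework
• Prepare for the comprehensive review test (including questions and answers on the Q&A forum)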

# Glossary

standard library

standard error output

character encoding

legacy

compatible

compatibility

default

rewind

offset (ゲタ)