Python Basics 13: Files

Through files and streams, Python can store data permanently and process data that comes from other programs.

Opening files

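open is a built-in function (the same object as io.open), so no import is needed.
It takes a file name as its only required argument and returns a file object. If the file does not exist, opening it for reading raises an exception: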

>>> f = open('demofile.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'demofile.txt'
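
As a minimal sketch of how a program might react to this exception instead of crashing, the helper below falls back to a default value when the file is missing. The name read_text_or_default and the temporary directory are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path

def read_text_or_default(path, default=''):
    """Return the file's text, or `default` if the file does not exist."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return default

# Work inside a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'demofile.txt'
    result_missing = read_text_or_default(target, default='(no file)')
    target.write_text('hello')          # now create the file
    result_present = read_text_or_default(target)

print(result_missing)   # (no file)
print(result_present)   # hello
```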

File modes

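If you call open with only a file name, you get a file object that can only be read. To write to a file, you must specify a mode explicitly.
open accepts a mode argument for this purpose. The following values are available: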

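Value	Description
'r'	Read mode (default)
'w'	Write mode
'x'	Exclusive write mode
'a'	Append mode
'b'	Binary mode (combined with another mode)
't'	Text mode (default; combined with another mode)
'+'	Read/write mode (combined with another mode)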

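Write mode 'w' lets you write to the file and creates it if it does not exist. If the file already exists, its previous contents are discarded. Exclusive mode 'x' instead raises FileExistsError when the file already exists, and append mode 'a' keeps the existing contents and writes at the end.
'+' can be combined with any of the other modes and means that both reading and writing are allowed. Note that 'r+' keeps the existing contents, whereas 'w+' truncates the file first.
The default mode, 'rt', treats the file as encoded Unicode text, so decoding and encoding happen automatically. The default encoding is platform-dependent (it is taken from the locale and is UTF-8 on most Linux and macOS systems), so pass the encoding keyword argument when the encoding matters. The errors keyword argument selects the Unicode error-handling strategy.
If the file contains non-text data, such as an image, use binary mode (for example 'rb') to switch off the text-related features.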
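
The following sketch exercises the modes described above on a scratch file. The file name modes.txt and the temporary directory are assumptions for illustration; the with statement used here is explained in the section on closing files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'modes.txt')

    # 'w' creates the file because it does not exist yet.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('abc')

    # 'a' keeps the existing contents and writes at the end.
    with open(path, 'a', encoding='utf-8') as f:
        f.write('def')

    # 'x' refuses to touch a file that already exists.
    try:
        with open(path, 'x', encoding='utf-8') as f:
            pass
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # 'rb' returns the undecoded bytes.
    with open(path, 'rb') as f:
        raw = f.read()

    # Opening with 'w' again discards the previous contents.
    with open(path, 'w', encoding='utf-8') as f:
        pass
    size_after_truncate = os.path.getsize(path)

print(raw)                  # b'abcdef'
print(exclusive_failed)     # True
print(size_after_truncate)  # 0
```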

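Basic file methods

Once a file is open, the file object offers a handful of basic methods. The same methods are provided by other file-like objects, which are also called streams.
The sys module, for example, contains the three standard streams: sys.stdin, sys.stdout and sys.stderr.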
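
To illustrate what "file-like" means in practice, here is a small sketch in which one function writes to both sys.stdout and an in-memory io.StringIO stream. The function name write_report and its output are made up for this example.

```python
import io
import sys

def write_report(stream):
    """Write two lines to any object that provides a write() method."""
    stream.write('status: ok\n')
    stream.write('items: 3\n')

write_report(sys.stdout)     # goes to the terminal

buffer = io.StringIO()       # an in-memory text stream
write_report(buffer)
captured = buffer.getvalue()
```

Because the function only relies on write, it does not care whether it was handed a real file, a standard stream, or an in-memory buffer.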

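Reading and writing

The most important capability of a file is supplying and receiving data.
Given a file object f, you write data with f.write and read data with f.read. In text mode the data is of type str, and in binary mode it is of type bytes.

Each call to f.write(string) writes the string at the current position, directly after whatever earlier calls wrote, and returns the number of characters written: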

>>> f = open('demofile.txt', 'w')
>>> f.write('hello')
5
>>> f.write('world')
5
>>> f.close()

When reading, you can tell the stream how many characters to read:

>>> f = open('demofile.txt')
>>> f.read(5)
'hello'
>>> f.read(5)
'world'
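
The sessions above use text mode. As a hedged sketch of the str versus bytes distinction, the example below writes bytes in binary mode and reads the same file back both ways. The file name demo.bin and the sample string are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.bin')

    # Binary mode accepts bytes; write() returns the number of bytes written.
    with open(path, 'wb') as f:
        written = f.write('héllo'.encode('utf-8'))

    # Binary mode returns the raw bytes unchanged.
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode decodes the same bytes into a str.
    with open(path, encoding='utf-8') as f:
        text = f.read()

print(written)    # 6, because 'é' takes two bytes in UTF-8
print(raw)        # b'h\xc3\xa9llo'
print(len(text))  # 5 characters
```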

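Piping output

In Bash, several commands can be chained with pipes, as in cat somefile.txt | python somescript.py | sort. How does the Python script in the middle of such a pipeline get at its standard input?
Consider this example: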

import sys

text = sys.stdin.read()
words = text.split()
wordcount = len(words)
print('Wordcount:', wordcount)

The script obtains its data by calling the read method of the sys.stdin stream.
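
One possible way to make the word-count logic testable without a shell pipeline is to move it into a function that accepts any readable stream. The function name count_words and the sample text are assumptions for this sketch.

```python
import io

def count_words(stream):
    """Count whitespace-separated words in any readable text stream."""
    return len(stream.read().split())

# An in-memory stream stands in for standard input here.
sample = io.StringIO('the quick brown fox\njumps over the lazy dog\n')
wordcount = count_words(sample)
print('Wordcount:', wordcount)   # Wordcount: 9

# In a real pipeline the same function would be called as
# count_words(sys.stdin) after importing sys.
```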

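By default a stream is read sequentially from beginning to end. The seek and tell methods let you move around in the file and access only the parts you are interested in, which is known as random access.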

>>> f = open('demofile.txt', 'w')
>>> f.write('0123456789')
10
>>> f.seek(5)
5
>>> f.write('Hello, World!')
13
>>> f.close()
>>> f = open('demofile.txt')
>>> f.read()
'01234Hello, World!'

tell() returns the current position in the file:

>>> f = open('demofile.txt')
>>> f.read(3)
'012'
>>> f.read(2)
'34'
>>> f.tell()
5
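
As a further sketch of random access, the example below opens a scratch file in binary read/write mode, where seek also accepts offsets relative to the current position or to the end of the file. The file name and contents are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.bin')

    with open(path, 'w+b') as f:
        f.write(b'0123456789')
        end_pos = f.tell()          # position after the last byte written

        f.seek(0)                   # rewind to the start
        head = f.read(3)

        f.seek(-2, os.SEEK_END)     # two bytes before the end
        tail = f.read()

        f.seek(4)                   # absolute position 4
        f.seek(2, os.SEEK_CUR)      # two bytes further, position 6
        middle = f.read(2)

print(end_pos)  # 10
print(head)     # b'012'
print(tail)     # b'89'
print(middle)   # b'67'
```

Relative seeks like these are used in binary mode here because text-mode files only support a restricted set of offsets.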

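Reading and writing lines

To read a single line (the text from the current position up to and including the next line separator), use the readline method. You can also pass a non-negative integer to limit the number of characters read.
To read all lines of a file and get them back as a list, use the readlines method.
writelines accepts a list of strings (or any other sequence or iterable of strings) and writes them all to the file or stream. It does not add newlines for you, and there is no writeline method, because write already covers that case.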
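
The three methods can be sketched together as follows. The file name lines.txt and the sample words are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'lines.txt')
    words = ['alpha', 'beta', 'gamma']

    # writelines adds nothing, so the newlines are appended by hand.
    with open(path, 'w', encoding='utf-8') as f:
        f.writelines(word + '\n' for word in words)

    with open(path, encoding='utf-8') as f:
        first = f.readline()      # one complete line
        partial = f.readline(2)   # at most two characters of the next line
        rest = f.readlines()      # everything that is left, as a list

print(repr(first))    # 'alpha\n'
print(repr(partial))  # 'be'
print(rest)           # ['ta\n', 'gamma\n']
```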

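Closing files

File objects are closed automatically when the program exits, so whether you close a file that was only read makes little practical difference.
A file you have written to should always be closed. Python may buffer the written data, and the data is only certain to have left the buffer once the file is closed. If you want to keep the file open but still push the data out, call the flush method.
As a rule, close a file whenever it is convenient to do so.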
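
A minimal sketch of flush follows, assuming a scratch file named buffered.txt. After flush, a second file object opened on the same path can see the data even though the writer is still open. The os.fsync call is an optional extra step that asks the operating system to write its own cache to the storage device.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'buffered.txt')

    writer = open(path, 'w', encoding='utf-8')
    try:
        writer.write('buffered text')
        writer.flush()                # hand Python's buffer to the OS
        os.fsync(writer.fileno())     # ask the OS to write to the device

        # A separate reader now sees the data while the writer stays open.
        with open(path, encoding='utf-8') as reader:
            seen_after_flush = reader.read()
    finally:
        writer.close()

print(seen_after_flush)   # buffered text
```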

To make sure a file is closed even when an error occurs, you can use a try/finally statement:

>>> f = open('demofile.txt', 'w')
>>> try:
...     f.write('hello')
... finally:
...     f.close()
...
5

That is still cumbersome. A file object is also a context manager, so the with statement can close it for you:

>>> with open('demofile.txt', 'w') as f:
...     f.write('world')
...
5

When the end of the with block is reached, the file is closed automatically, even if an exception was raised inside the block.
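
The claim about exceptions can be checked with a short sketch. The ValueError is raised on purpose, and the file name is an assumption for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demofile.txt')

    try:
        with open(path, 'w', encoding='utf-8') as f:
            f.write('partial')
            raise ValueError('simulated failure')   # leave the block early
    except ValueError:
        pass

    closed_after_error = f.closed    # the with statement closed the file

    # Closing also flushed the buffer, so the text reached the file.
    with open(path, encoding='utf-8') as f2:
        content = f2.read()

print(closed_after_error)   # True
print(content)              # partial
```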

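Iterating over file contents

Iterating over the contents of a file is a common operation.

One character (or byte) at a time

Call read(1) in a while loop to walk through the file one character at a time (one byte at a time in binary mode):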

def process(string):
    print('Processing: ', string)

with open('somefile.txt') as f:
    while True:
        char = f.read(1)
        if not char: break
        process(char)
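
As an alternative sketch, the two-argument form of the built-in iter turns the same read(1) calls into a for loop: it keeps calling the function until the sentinel value (the empty string) comes back. An in-memory io.StringIO stands in for the file here, and the chars list exists only so the result can be inspected.

```python
import io
from functools import partial

def process(string):
    print('Processing:', string)

stream = io.StringIO('abc')     # stands in for an opened text file
chars = []

# iter(callable, sentinel) calls stream.read(1) until it returns ''.
for char in iter(partial(stream.read, 1), ''):
    process(char)
    chars.append(char)
```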

One line at a time

Use readline to read one line per pass through the loop:

def process(string):
    print('Processing: ', string)

with open('somefile.txt') as f:
    while True:
        line = f.readline()
        if not line: break
        process(line)
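
On Python 3.8 or later, an assignment expression can fold the read and the end-of-file test into the loop condition. This is only a sketch of an alternative spelling, with io.StringIO standing in for the file and the seen list added for inspection.

```python
import io

def process(string):
    print('Processing:', string)

stream = io.StringIO('one\ntwo\n')   # stands in for an opened text file
seen = []

# readline returns '' at end of file, which ends the loop.
while (line := stream.readline()):
    process(line)
    seen.append(line)
```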

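Reading everything

If the file is not too large, you can read it in one go with read(), or use readlines() to get a list containing every line of the file.

Iterating over characters with read: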

def process(string):
    print('Processing: ', string)

with open('somefile.txt') as f:
    for char in f.read():
        process(char)

Iterating over lines with readlines:

def process(string):
    print('Processing: ', string)

with open('somefile.txt') as f:
    for line in f.readlines():
        process(line)
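
One detail worth seeing side by side: readlines keeps the trailing newline of each line, while read().splitlines() drops it. The sketch below uses in-memory streams with made-up text.

```python
import io

text = 'First line\nSecond line\n'

# readlines keeps the line endings.
with_newlines = io.StringIO(text).readlines()

# read() followed by splitlines() removes them.
without_newlines = io.StringIO(text).read().splitlines()

print(with_newlines)      # ['First line\n', 'Second line\n']
print(without_newlines)   # ['First line', 'Second line']
```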

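Lazy line iteration with fileinput

When you iterate over the lines of a large file, readlines uses a lot of memory. A while loop with readline avoids that, but in Python a for loop is preferred wherever possible. Lazy line iteration makes this possible, because it reads only the parts of the file that are actually needed.

Iterating over lines with fileinput: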

import fileinput

def process(string):
    print('Processing: ', string)

for line in fileinput.input('demofile.txt'):
    process(line)
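
fileinput can also chain several files and report where each line came from. The sketch below assumes two scratch files, a.txt and b.txt, created only for the demonstration.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, 'a.txt')
    second = os.path.join(tmp, 'b.txt')
    with open(first, 'w', encoding='utf-8') as f:
        f.write('a1\na2\n')
    with open(second, 'w', encoding='utf-8') as f:
        f.write('b1\n')

    records = []
    # The files are read lazily, one after the other.
    with fileinput.input(files=[first, second]) as lines:
        for line in lines:
            records.append((
                os.path.basename(lines.filename()),  # file being read
                lines.filelineno(),                  # line number in that file
                lines.lineno(),                      # cumulative line number
                line.rstrip('\n'),
            ))

for record in records:
    print(record)
```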

File iterators

Files are themselves iterable, which means you can use them directly in a for loop to iterate over their lines:

def process(string):
    print('Processing: ', string)

with open('somefile.txt') as f:
    for line in f:
        process(line)

sys.stdin is iterable as well, so to iterate over all the lines of standard input you can write:

import sys

def process(string):
    print('Processing: ', string)

for line in sys.stdin:
    process(line)

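Most things you can do with an iterator you can also do with a file. For example, list(open(filename)) converts the file into a list of its lines, which gives the same result as readlines.

Example: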

>>> f = open('somefile.txt', 'w')
>>> print('First', 'line', file=f)
>>> print('Second', 'line', file=f)
>>> print('Third', 'and final', 'line', file=f)
>>> f.close()

>>> lines = list(open('somefile.txt'))
>>> lines
['First line\n', 'Second line\n', 'Third and final line\n']

>>> first, second, third = open('somefile.txt')
>>> first
'First line\n'
>>> second
'Second line\n'
>>> third
'Third and final line\n'

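Sequence unpacking of an opened file, as in the last assignment, is uncommon because you usually do not know how many lines the file has. Note also that the files opened for reading in this example are never closed explicitly, which is tolerable in a short interactive session but not something to rely on.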
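
When the number of lines is unknown, starred unpacking is one way to take the first line and collect the remainder in a list. The sketch uses an in-memory stream holding the same three lines.

```python
import io

stream = io.StringIO('First line\nSecond line\nThird and final line\n')

# The starred name receives however many lines are left.
first, *rest = stream

print(repr(first))   # 'First line\n'
print(rest)          # ['Second line\n', 'Third and final line\n']
```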
