Deploying Machine Learning Services Quickly with TensorFlow Serving and Docker

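Moving a machine learning model from experimentation to production quickly and simply has always been a challenge. The task is to take a trained model and expose it as a prediction service, and in production that process needs to be reproducible, isolated, and secure. Here we use Docker-based TensorFlow Serving to do this with little effort. TensorFlow Serving has supported Docker deployment since version 1.8, for both CPU and GPU, which makes this very convenient.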

Getting a trained model

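The first step in getting a model is of course to train one, but that is not the focus of this post, so we use a model that has already been trained: ResNet. TensorFlow Serving works with models stored in the SavedModel format. SavedModel is a language-neutral, recoverable, hermetic (self-contained) serialization format that lets higher-level systems and tools produce, consume, and transform TensorFlow models. Here we simply download a pre-trained model: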

$ mkdir /tmp/resnet
$ curl -s https://storage.googleapis.com/download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | tar --strip-components=2 -C /tmp/resnet -xvz

If the model was produced with another framework such as Keras, it has to be converted to the SavedModel format first, for example:

from keras.models import Sequential
from keras import backend as K
import tensorflow as tf

model = Sequential()
# Model construction omitted

# Export the model as a SavedModel (TensorFlow 1.x API)
signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={'input_param': model.input}, outputs={'type': model.output})
builder = tf.saved_model.builder.SavedModelBuilder('/tmp/output_model_path/1/')
builder.add_meta_graph_and_variables(
    sess=K.get_session(),
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            signature
    })
builder.save()

Once the ResNet download has finished, the directory tree under /tmp/resnet looks like this:

$ tree /tmp/resnet
/tmp/resnet
└── 1538687457
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index
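The numeric directory name (1538687457) is what TensorFlow Serving treats as the model version, and by default it serves the highest version it finds under the base path. As a rough sketch of that selection rule, the snippet below scans a base directory for numeric subdirectories. The helper name `latest_version` and the temporary demo layout are illustrative assumptions, not part of TensorFlow Serving.

```python
import os
import tempfile


def latest_version(model_base_path):
    """Return the largest purely numeric subdirectory name as an int, or None."""
    versions = [
        int(name)
        for name in os.listdir(model_base_path)
        if name.isdigit() and os.path.isdir(os.path.join(model_base_path, name))
    ]
    return max(versions) if versions else None


# Build a throwaway layout that mimics /tmp/resnet (hypothetical demo data).
demo_base = tempfile.mkdtemp()
for version in ("1", "1538687457", "not_a_version"):
    os.makedirs(os.path.join(demo_base, version, "variables"))

print(latest_version(demo_base))
```

Pointing `latest_version` at /tmp/resnet after the download should report the same version number that appears in the tree above.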

Deploying the model

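The first step is to install Docker CE, which provides all the tools needed to run and manage Docker containers. Now that we have a model, serving it with Docker is as simple as pulling the latest released TensorFlow Serving image and pointing it at the model.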

Deploy the model service with Docker:

$ docker pull tensorflow/serving
$ docker run -p 8501:8501 --name tfserving_resnet \
    --mount type=bind,source=/tmp/resnet,target=/models/resnet \
    -e MODEL_NAME=resnet -t tensorflow/serving &

… main.cc:327] Running ModelServer at 0.0.0.0:8500…
… main.cc:337] Exporting HTTP/REST API at:localhost:8501 …

Breaking down the command-line arguments:

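  • -p 8501:8501 : publishes the container's port 8501 (where TF Serving answers REST API requests) on the host's port 8501
  • --name tfserving_resnet : names the container "tfserving_resnet" so that we can refer to it later
  • --mount type=bind,source=/tmp/resnet,target=/models/resnet : mounts the host's local directory (/tmp/resnet) into the container at /models/resnet, so that TF Serving can read the model from inside the container
  • -e MODEL_NAME=resnet : tells TensorFlow Serving to load the model named "resnet"
  • -t tensorflow/serving : runs a Docker container based on the serving image "tensorflow/serving" (the -t flag itself allocates a pseudo-terminal for the container)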
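When the same container has to be started from a deployment script, it can help to assemble these arguments programmatically. The following is a minimal sketch under that assumption. The helper `docker_run_args` and its parameter names are made up for illustration, and it only builds the argument list (which could then be handed to `subprocess.run`) rather than starting Docker.

```python
import shlex


def docker_run_args(model_name, host_model_dir, image="tensorflow/serving",
                    rest_port=8501, container_name=None):
    """Assemble the argument list for the `docker run` command shown above."""
    # Publish the container's REST port (8501) on the chosen host port.
    args = ["docker", "run", "-p", f"{rest_port}:8501"]
    if container_name:
        args += ["--name", container_name]
    args += [
        # Bind-mount the host model directory to /models/<model_name>.
        "--mount",
        f"type=bind,source={host_model_dir},target=/models/{model_name}",
        # Tell the server which model to load.
        "-e", f"MODEL_NAME={model_name}",
        "-t", image,
    ]
    return args


args = docker_run_args("resnet", "/tmp/resnet", container_name="tfserving_resnet")
print(" ".join(shlex.quote(a) for a in args))
```

Keeping the arguments as a list avoids shell-quoting problems when the model directory contains spaces.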

Port 8500 is the gRPC port exposed by TensorFlow Serving, and port 8501 is the REST API port. The full output of the command above is:

2019-03-04 02:52:26.610387: I tensorflow_serving/model_servers/server.cc:82] Building single TensorFlow model file config:  model_name: resnet model_base_path: /models/resnet
2019-03-04 02:52:26.618200: I tensorflow_serving/model_servers/server_core.cc:461] Adding/updating models.
2019-03-04 02:52:26.618628: I tensorflow_serving/model_servers/server_core.cc:558] (Re-)adding model: resnet
2019-03-04 02:52:26.745813: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: resnet version: 1538687457}
2019-03-04 02:52:26.745901: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: resnet version: 1538687457}
2019-03-04 02:52:26.745935: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: resnet version: 1538687457}
2019-03-04 02:52:26.747590: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: /models/resnet/1538687457
2019-03-04 02:52:26.747705: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/resnet/1538687457
2019-03-04 02:52:26.795363: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-03-04 02:52:26.828614: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-04 02:52:26.923902: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:162] Restoring SavedModel bundle.
2019-03-04 02:52:28.098479: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:138] Running MainOp with key saved_model_main_op on SavedModel bundle.
2019-03-04 02:52:28.144510: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:259] SavedModel load for tags { serve }; Status: success. Took 1396689 microseconds.
2019-03-04 02:52:28.146646: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:83] No warmup data file found at /models/resnet/1538687457/assets.extra/tf_serving_warmup_requests
2019-03-04 02:52:28.168063: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: resnet version: 1538687457}
2019-03-04 02:52:28.174902: I tensorflow_serving/model_servers/server.cc:286] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2019-03-04 02:52:28.186724: I tensorflow_serving/model_servers/server.cc:302] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 237] RAW: Entering the event loop ...

As the log shows, TensorFlow Serving uses 1538687457 as the model's version number. We can query the running service with curl, which reports the version being served and its state:

$ curl http://localhost:8501/v1/models/resnet
{
 "model_version_status": [
  {
   "version": "1538687457",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}
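A deployment script usually wants to wait until the model reports AVAILABLE before sending traffic. As a small sketch of that check, the function below parses a status document of the shape shown above. The sample text is copied from that response, the function name `available_versions` is an illustrative choice, and the HTTP call itself is left out so the snippet runs offline.

```python
import json

# Status document copied from the curl response above.
SAMPLE_STATUS = """
{
  "model_version_status": [
    {
      "version": "1538687457",
      "state": "AVAILABLE",
      "status": {"error_code": "OK", "error_message": ""}
    }
  ]
}
"""


def available_versions(status_text):
    """Return the version strings whose state is AVAILABLE."""
    status = json.loads(status_text)
    return [
        entry["version"]
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]


print(available_versions(SAMPLE_STATUS))
```

In practice the text would come from a GET request to http://localhost:8501/v1/models/resnet, repeated until the returned list is non-empty.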

Inspecting the model's inputs and outputs

We often need to know the exact form of a model's inputs and outputs. TensorFlow ships a saved_model_cli command that shows them:

$ saved_model_cli show --dir /tmp/resnet/1538687457/ --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_INT64
        shape: (-1)
        name: ArgMax:0
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1001)
        name: softmax_tensor:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_INT64
        shape: (-1)
        name: ArgMax:0
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1001)
        name: softmax_tensor:0
  Method name is: tensorflow/serving/predict

Note the signature_def names and the names, types, and shapes of the inputs and outputs; they are needed for the prediction requests that follow.
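To make the link between the signature and a request concrete, here is a hedged sketch of how those names could map onto a REST request body. It assumes the row ("instances") format of the TensorFlow Serving REST API, in which binary values are wrapped as {"b64": ...} and keyed by the input name. The helper `build_predict_body` and the fake image bytes are illustrative only.

```python
import base64
import json


def build_predict_body(image_bytes, input_name="image_bytes",
                       signature_name="serving_default"):
    """Return a JSON request body for the REST predict endpoint (row format)."""
    # DT_STRING inputs carrying binary data are sent base64-encoded.
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({
        "signature_name": signature_name,
        "instances": [{input_name: {"b64": encoded}}],
    })


# Placeholder bytes; a real request would carry the contents of a JPEG file.
body = build_predict_body(b"fake jpeg bytes")
print(body)
```

Because this model has a single input, the input name can also be dropped and the {"b64": ...} object sent directly as the instance, which is the form the official client script uses.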

Making predictions through the model API: REST and gRPC

TensorFlow Serving accepts requests over both a REST API and gRPC. The following sections walk through each in turn.

REST

We download a client script that fetches a picture of a cat and uses it to measure the request latency of the service.

$ curl -o /tmp/resnet/resnet_client.py https://raw.githubusercontent.com/tensorflow/serving/master/tensorflow_serving/example/resnet_client.py

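The script uses the requests library to call the API. It sends the image as a base64-encoded string, receives the predicted class, and reports the average processing time.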

from __future__ import print_function

import base64
import requests

# The server URL specifies the endpoint of your server running the ResNet
# model with the name "resnet" and using the predict interface.
SERVER_URL = 'http://localhost:8501/v1/models/resnet:predict'

# The image URL is the location of the image we should send to the server
IMAGE_URL = 'https://tensorflow.org/images/blogs/serving/cat.jpg'


def main():
  # Download the image
  dl_request = requests.get(IMAGE_URL, stream=True)
  dl_request.raise_for_status()

  # Compose a JSON Predict request (send JPEG image in base64).
  jpeg_bytes = base64.b64encode(dl_request.content).decode('utf-8')
  predict_request = '{"instances" : [{"b64": "%s"}]}' % jpeg_bytes

  # Send few requests to warm-up the model.
  for _ in range(3):
    response = requests.post(SERVER_URL, data=predict_request)
    response.raise_for_status()

  # Send few actual requests and report average latency.
  total_time = 0
  num_requests = 10
  for _ in range(num_requests):
    response = requests.post(SERVER_URL, data=predict_request)
    response.raise_for_status()
    total_time += response.elapsed.total_seconds()
    prediction = response.json()['predictions'][0]

  print('Prediction class: {}, avg latency: {} ms'.format(
      prediction['classes'], (total_time*1000)/num_requests))


if __name__ == '__main__':
  main()

The output is:

$ python /tmp/resnet/resnet_client.py
Prediction class: 286, avg latency: 210.12310000000002 ms
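The client prints only the top class, but each prediction also carries the full `probabilities` vector listed in the signature. As a sketch of how the most likely classes could be read from it, the helper below ranks the entries. The name `top_k` and the shortened fake response are assumptions for illustration, since the real model returns 1001 probabilities per image.

```python
def top_k(probabilities, k=3):
    """Return (class_index, probability) pairs for the k largest probabilities."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical, shortened response body in the same shape the client script reads.
fake_response = {
    "predictions": [{"classes": 2, "probabilities": [0.05, 0.15, 0.7, 0.1]}]
}
prediction = fake_response["predictions"][0]
print(top_k(prediction["probabilities"], k=2))
```

The first index returned by `top_k` should agree with the `classes` field, since that output is the ArgMax over the same vector.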

gRPC

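Next we download another client script, which calls the service over gRPC, sends an image, and prints the result. It requires the tensorflow-serving-api package. Note that the docker run command shown earlier publishes only port 8501, so to reach the gRPC endpoint from the host the container must also be started with -p 8500:8500.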

$ curl -o /tmp/resnet/resnet_client_grpc.py https://raw.githubusercontent.com/tensorflow/serving/master/tensorflow_serving/example/resnet_client_grpc.py
$ pip install tensorflow-serving-api

The script contents:

from __future__ import print_function

# This is a placeholder for a Google-internal import.

import grpc
import requests
import tensorflow as tf

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# The image URL is the location of the image we should send to the server
IMAGE_URL = 'https://tensorflow.org/images/blogs/serving/cat.jpg'

tf.app.flags.DEFINE_string('server', 'localhost:8500',
                           'PredictionService host:port')
tf.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
FLAGS = tf.app.flags.FLAGS


def main(_):
  if FLAGS.image:
    with open(FLAGS.image, 'rb') as f:
      data = f.read()
  else:
    # Download the image since we weren't given one
    dl_request = requests.get(IMAGE_URL, stream=True)
    dl_request.raise_for_status()
    data = dl_request.content

  channel = grpc.insecure_channel(FLAGS.server)
  stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
  # Send request
  # See prediction_service.proto for gRPC request/response details.
  request = predict_pb2.PredictRequest()
  request.model_spec.name = 'resnet'
  request.model_spec.signature_name = 'serving_default'
  request.inputs['image_bytes'].CopyFrom(
      tf.contrib.util.make_tensor_proto(data, shape=[1]))
  result = stub.Predict(request, 10.0)  # 10 secs timeout
  print(result)


if __name__ == '__main__':
  tf.app.run()

The output shows the image's predicted class, the class probabilities, and information about the model that was used:

$ python /tmp/resnet/resnet_client_grpc.py
outputs {
  key: "classes"
  value {
    dtype: DT_INT64
    tensor_shape {
      dim {
        size: 1
      }
    }
    int64_val: 286
  }
}
outputs {
  key: "probabilities"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 1001
      }
    }
    float_val: 2.4162832232832443e-06
    float_val: 1.9012182974620373e-06
    float_val: 2.7247710022493266e-05
    float_val: 4.426385658007348e-07
    ... (middle omitted)
    float_val: 1.4636580090154894e-05
    float_val: 5.812107133351674e-07
    float_val: 6.599806511076167e-05
    float_val: 0.0012952701654285192
  }
}
model_spec {
  name: "resnet"
  version {
    value: 1538687457
  }
  signature_name: "serving_default"
}

Performance

Improving performance by building an optimized TensorFlow Serving binary

TensorFlow Serving sometimes logs a message like this:

Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
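When the server log has been captured to a file, this hint can be detected automatically. The snippet below is a minimal sketch that pulls the instruction-set names out of such a line. The regular expression is written against the message text shown above, and the function name `unused_cpu_instructions` is an illustrative assumption.

```python
import re

# Matches the tail of the cpu_feature_guard message shown above.
CPU_HINT = re.compile(r"was not compiled to use:\s*(.+)$")


def unused_cpu_instructions(log_line):
    """Return the instruction sets named in the hint, or an empty list."""
    match = CPU_HINT.search(log_line.strip())
    return match.group(1).split() if match else []


line = ("Your CPU supports instructions that this TensorFlow binary "
        "was not compiled to use: AVX2 FMA")
print(unused_cpu_instructions(line))
```

An empty result for every line of the log suggests the message was not emitted.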

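The published TensorFlow Serving Docker images are intended to run on as many CPU architectures as possible, so some optimizations are left out to maximize compatibility. If you do not see this message, your binary is probably already optimized for your CPU. Depending on the operations your model performs, these optimizations can have a significant effect on serving performance. Fortunately, building an optimized TensorFlow Serving binary is straightforward. The project provides automated build files, and the process takes two steps: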

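1. Build the development image. First we build an optimized version of TensorFlow Serving. The easiest way is to build the official TensorFlow Serving development environment Docker image, which has the useful property of automatically producing a TensorFlow Serving binary optimized for the system the image is built on. To tell our images apart from the official ones, we prefix the image name with $USER/. We call this development image $USER/tensorflow-serving-devel:

$ docker build -t $USER/tensorflow-serving-devel -f Dockerfile.devel https://github.com/tensorflow/serving.git#:tensorflow_serving/tools/docker

2. Build the new serving image. Building the development image can take a while, depending on the speed of your machine. Once it is done, we build a new serving image with the optimized binary and call it $USER/tensorflow-serving:

$ docker build -t $USER/tensorflow-serving --build-arg TF_SERVING_BUILD_IMAGE=$USER/tensorflow-serving-devel https://github.com/tensorflow/serving.git#:tensorflow_serving/tools/docker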

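Now that we have the new serving image, we start the server again. The old container is removed first so that its name can be reused: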

$ docker kill tfserving_resnet
$ docker rm tfserving_resnet
$ docker run -p 8501:8501 --name tfserving_resnet \
    --mount type=bind,source=/tmp/resnet,target=/models/resnet \
    -e MODEL_NAME=resnet -t $USER/tensorflow-serving &

Finally, we run the client program again:

$ python /tmp/resnet/resnet_client.py
Prediction class: 282, avg latency: 84.8849 ms

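On our machine, the natively optimized binary made each prediction more than 100 ms faster on average (119%). Depending on your machine and your model, you may see different results.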
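The article does not say how the percentage was derived. One common convention, shown here purely as an assumption, expresses the speedup as the ratio of the old latency to the new latency minus one. The sketch uses round illustrative numbers rather than the measurements above, and the function name `latency_gain` is hypothetical.

```python
def latency_gain(before_ms, after_ms):
    """Return (milliseconds saved, speedup in percent) for two average latencies."""
    saved_ms = before_ms - after_ms
    # A request that takes half as long counts as a 100% speedup under this convention.
    speedup_percent = (before_ms / after_ms - 1.0) * 100.0
    return saved_ms, speedup_percent


saved, percent = latency_gain(200.0, 100.0)
print(f"saved {saved:.1f} ms, {percent:.0f}% faster")
```

Feeding in the average latencies printed by your own two client runs gives the corresponding figures for your machine.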

Finally, stop the TensorFlow Serving container:

$ docker kill tfserving_resnet
