Training a tf.keras model with a basic low-level TensorFlow training loop doesn't work

2023-03-25 15:51 | Source: compiled from the web | Views: 265

Question

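Note: All code for a self-contained example that reproduces my problem can be found below.

I have a tf.keras.models.Model instance and need to train it with a training loop written in the low-level TensorFlow API.

The problem: Training the exact same tf.keras model once with a basic, standard low-level TensorFlow training loop and once with Keras' own model.fit() method produces very different results. I'd like to find out what I'm doing wrong in my low-level TF training loop.

The model is a simple image classification model that I train on Caltech256 (link to the tfrecords below).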

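With the low-level TensorFlow training loop, the training loss decreases at first, but after just 1000 training steps it plateaus and then starts increasing again:

[Image: output of the low-level TensorFlow training loop]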

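Training the same model on the same dataset with the normal Keras training loop, on the other hand, works as expected:

[Image: output of the Keras training loop]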

What am I missing in the low-level TensorFlow training loop?

Here is the code to reproduce the problem (download the TFRecords with the link at the bottom):

    import tensorflow as tf
    from tqdm import trange
    import sys
    import glob
    import os

    sess = tf.Session()
    tf.keras.backend.set_session(sess)

    num_classes = 257
    image_size = (224, 224, 3)

    # Build a tf.data.Dataset from TFRecords.

    tfrecord_directory = 'path/to/tfrecords/directory'
    tfrecord_filennames = glob.glob(os.path.join(tfrecord_directory, '*.tfrecord'))

    feature_schema = {'image': tf.FixedLenFeature([], tf.string),
                      'filename': tf.FixedLenFeature([], tf.string),
                      'label': tf.FixedLenFeature([], tf.int64)}

    dataset = tf.data.Dataset.from_tensor_slices(tfrecord_filennames)
    dataset = dataset.shuffle(len(tfrecord_filennames)) # Shuffle the TFRecord file names.
    dataset = dataset.flat_map(lambda filename: tf.data.TFRecordDataset(filename))
    dataset = dataset.map(lambda single_example_proto: tf.parse_single_example(single_example_proto, feature_schema)) # Deserialize tf.Example objects.
    dataset = dataset.map(lambda sample: (sample['image'], sample['label']))
    dataset = dataset.map(lambda image, label: (tf.image.decode_jpeg(image, channels=3), label)) # Decode JPEG images.
    dataset = dataset.map(lambda image, label: (tf.image.resize_image_with_pad(image, target_height=image_size[0], target_width=image_size[1]), label))
    dataset = dataset.map(lambda image, label: (tf.image.per_image_standardization(image), label))
    dataset = dataset.map(lambda image, label: (image, tf.one_hot(indices=label, depth=num_classes))) # Convert labels to one-hot format.
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.repeat()
    dataset = dataset.batch(32)

    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()

    # Build a simple model.

    input_tensor = tf.keras.layers.Input(shape=image_size)
    x = tf.keras.layers.Conv2D(64, (3,3), strides=(2,2), activation='relu', kernel_initializer='he_normal')(input_tensor)
    x = tf.keras.layers.Conv2D(64, (3,3), strides=(2,2), activation='relu', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.Conv2D(128, (3,3), strides=(2,2), activation='relu', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.Conv2D(256, (3,3), strides=(2,2), activation='relu', kernel_initializer='he_normal')(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(num_classes, activation=None, kernel_initializer='he_normal')(x)
    model = tf.keras.models.Model(input_tensor, x)
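
For readers unfamiliar with the two preprocessing steps at the end of the input pipeline, here is a minimal pure-Python sketch of what tf.image.per_image_standardization and tf.one_hot compute per sample. It assumes a flat list of pixel values instead of an image tensor, and the helper names are hypothetical stand-ins rather than the TensorFlow implementations.

```python
import math


def per_image_standardization(pixels):
    """Scale a flat list of pixel values to zero mean and (at most) unit variance."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # The lower bound 1/sqrt(n) avoids a division by zero for uniform images.
    adjusted_stddev = max(math.sqrt(variance), 1.0 / math.sqrt(n))
    return [(p - mean) / adjusted_stddev for p in pixels]


def one_hot(label, depth):
    """Return a list of length `depth` with a 1.0 at index `label`."""
    return [1.0 if i == label else 0.0 for i in range(depth)]


# Toy inputs (assumed values, not taken from Caltech256).
standardized = per_image_standardization([0.0, 64.0, 128.0, 255.0])
encoded = one_hot(3, 5)
```

In the pipeline above the same operations run on every decoded image and on every integer label, with depth equal to num_classes.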

This is the simple TensorFlow training loop:

    # Build the training-relevant part of the graph.

    model_output = model(features)

    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf.stop_gradient(labels), logits=model_output))

    train_op = tf.train.AdamOptimizer().minimize(loss)

    # The next block is for the metrics.
    with tf.variable_scope('metrics') as scope:
        predictions_argmax = tf.argmax(model_output, axis=-1, output_type=tf.int64)
        labels_argmax = tf.argmax(labels, axis=-1, output_type=tf.int64)
        mean_loss_value, mean_loss_update_op = tf.metrics.mean(loss)
        acc_value, acc_update_op = tf.metrics.accuracy(labels=labels_argmax, predictions=predictions_argmax)
        local_metric_vars = tf.contrib.framework.get_variables(scope=scope, collection=tf.GraphKeys.LOCAL_VARIABLES)
        metrics_reset_op = tf.variables_initializer(var_list=local_metric_vars)

    # Run the training

    epochs = 3
    steps_per_epoch = 1000

    fetch_list = [mean_loss_value,
                  acc_value,
                  train_op,
                  mean_loss_update_op,
                  acc_update_op]

    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())

    with sess.as_default():

        for epoch in range(1, epochs+1):

            tr = trange(steps_per_epoch, file=sys.stdout)
            tr.set_description('Epoch {}/{}'.format(epoch, epochs))

            sess.run(metrics_reset_op)

            for train_step in tr:
                ret = sess.run(fetch_list, feed_dict={tf.keras.backend.learning_phase(): 1})
                tr.set_postfix(ordered_dict={'loss': ret[0],
                                             'accuracy': ret[1]})
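
The metrics block in the loop relies on streaming metrics: tf.metrics.mean returns a value tensor and an update op, both backed by local accumulator variables, and metrics_reset_op re-initializes those variables at the start of each epoch. The following pure-Python sketch illustrates that pattern only. The class name StreamingMean and the batch losses are hypothetical, and this is not the TensorFlow implementation.

```python
class StreamingMean:
    """Running mean backed by two accumulators, with a value/update/reset trio."""

    def __init__(self):
        self.total = 0.0  # Sum of all values seen since the last reset.
        self.count = 0    # Number of values seen since the last reset.

    def update(self, batch_value):
        """Plays the role of the update op: fold one batch value into the accumulators."""
        self.total += batch_value
        self.count += 1

    def value(self):
        """Plays the role of the value tensor: read the mean without changing state."""
        return self.total / self.count if self.count else 0.0

    def reset(self):
        """Plays the role of the reset op: re-initialize the accumulators."""
        self.total = 0.0
        self.count = 0


mean_loss = StreamingMean()
for batch_loss in [4.0, 2.0, 3.0]:  # Assumed per-batch losses for one toy epoch.
    mean_loss.update(batch_loss)
epoch_mean = mean_loss.value()
mean_loss.reset()  # Start the next epoch from empty accumulators.
```

The number shown in the progress bar is therefore the mean over all batches of the current epoch, not the loss of the most recent batch.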

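Below is the standard Keras training loop, which works as expected. Note that the activation of the dense layer in the model above needs to be changed from None to 'softmax' in order for the Keras loop to work.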

    epochs = 3
    steps_per_epoch = 1000

    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    history = model.fit(dataset,
                        epochs=epochs,
                        steps_per_epoch=steps_per_epoch)
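
The activation change is needed because the 'categorical_crossentropy' loss string expects class probabilities, while the low-level loss expects raw logits. As a rough pure-Python sketch of that difference (toy logits, hypothetical helper names, and an assumed clipping constant of 1e-7), both paths give the same loss when each receives the kind of input it expects, and a different one when logits are passed where probabilities are expected:

```python
import math


def softmax(logits):
    """Convert logits to probabilities, shifting by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(labels, logits):
    """Cross-entropy computed directly from logits via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum_exp) for y, z in zip(labels, logits))


def cross_entropy_from_probs(labels, probs, epsilon=1e-7):
    """Cross-entropy computed from probabilities, clipped away from 0 and 1."""
    clipped = [min(max(p, epsilon), 1.0 - epsilon) for p in probs]
    return -sum(y * math.log(p) for y, p in zip(labels, clipped))


labels = [0.0, 1.0, 0.0]   # One-hot label (toy example).
logits = [2.0, 0.5, -1.0]  # Raw outputs of a layer with activation=None (toy values).

loss_from_logits = cross_entropy_from_logits(labels, logits)
loss_from_probs = cross_entropy_from_probs(labels, softmax(logits))
loss_mismatched = cross_entropy_from_probs(labels, logits)  # Logits wrongly treated as probabilities.
```

Here loss_from_logits and loss_from_probs agree up to floating-point error, while loss_mismatched does not.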

You can download the TFRecords for the Caltech256 dataset here (about 850 MB).

UPDATE:

I've managed to solve the problem: Replacing the low-level TF loss function

    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf.stop_gradient(labels), logits=model_output))

with its Keras equivalent

    loss = tf.reduce_mean(tf.keras.backend.categorical_crossentropy(target=labels, output=model_output, from_logits=True))

does the trick. Now the low-level TensorFlow training loop behaves just like model.fit().

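This raises a new question:

What does tf.keras.backend.categorical_crossentropy() do that tf.nn.softmax_cross_entropy_with_logits_v2() doesn't, and that causes the latter to perform so much worse? (I know that the latter needs logits, not softmax output, so that's not the issue.)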

Answers

Replacing the low-level TF loss function

    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf.stop_gradient(labels), logits=model_output))

with its Keras equivalent

    loss = tf.reduce_mean(tf.keras.backend.categorical_crossentropy(target=labels, output=model_output, from_logits=True))

does the trick. Now the low-level TensorFlow training loop behaves just like model.fit().

However, I don't know why this is. If anyone knows why tf.keras.backend.categorical_crossentropy() behaves well while tf.nn.softmax_cross_entropy_with_logits_v2() doesn't work at all, please post an answer.

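Another important note:

In order to train a tf.keras model with a low-level TF training loop and a tf.data.Dataset object, one generally shouldn't call the model on the iterator output. That is, one shouldn't do this:

    model_output = model(features)

Instead, one should create a model in which the input layer is built on top of the iterator output rather than creating a placeholder, like so:

    input_tensor = tf.keras.layers.Input(tensor=features)

This doesn't matter in this example, but it becomes relevant if any layers in the model have internal updates that need to be run during training (e.g. BatchNormalization).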
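
To illustrate what "internal updates" means here, the toy sketch below mimics a batch-normalization-style layer whose moving mean is only refreshed when a separate update step is run alongside the forward pass. It is a deliberately simplified pure-Python illustration: the class TinyBatchNorm, its momentum value, and the input batch are all hypothetical, and it only centers the data. It is not how Keras implements BatchNormalization.

```python
class TinyBatchNorm:
    """Toy 1-D normalization layer that tracks a moving mean for use at inference time."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.moving_mean = 0.0  # Statistic used at inference time.

    def forward_training(self, batch):
        """Center the batch with its own mean; does NOT touch the moving mean."""
        batch_mean = sum(batch) / len(batch)
        return [x - batch_mean for x in batch], batch_mean

    def run_update(self, batch_mean):
        """Separate update step: fold the batch mean into the moving mean."""
        self.moving_mean = (self.momentum * self.moving_mean
                            + (1.0 - self.momentum) * batch_mean)

    def forward_inference(self, batch):
        """Center the batch with the stored moving mean."""
        return [x - self.moving_mean for x in batch]


with_updates = TinyBatchNorm()
without_updates = TinyBatchNorm()

for _ in range(200):
    _, batch_mean = with_updates.forward_training([9.0, 10.0, 11.0])
    with_updates.run_update(batch_mean)                  # Update step is executed.
    without_updates.forward_training([9.0, 10.0, 11.0])  # Update step is never executed.
```

After the loop, with_updates.moving_mean has converged to roughly 10.0, while without_updates.moving_mean is still at its initial value, so the second layer would center inference inputs incorrectly even though its training-time outputs looked identical.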
