How to save the MNIST dataset on disk as images using TensorFlow 2.x

Pedro F. Rodenas
2 min read · Mar 20, 2021
Photo by Samuel Scrimshaw on Unsplash

For those looking to have the dataset as images instead of .npz files, this tutorial will be useful. If you already know how to do that, or you do not need it, please stick around anyway: you can still pick up useful information by following along.

I always try to keep tutorials as simple as possible, but data science concepts cannot be explained without data. The problem is that if a tutorial requires you to download a huge dataset, this may discourage you from finishing the lesson. So I need to explain how to store the MNIST dataset on disk, so that custom input data pipelines can be built on top of it later.

In a deep learning project it is very important to keep the input data pipeline, the training code, and the deployment code as independent modules. Very often the raw data has to be preprocessed before it is suitable for a convolutional neural network to learn from. The dataset is then built in a format that speeds up reading from disk into RAM. The next step is to train the network, whether locally or in the cloud. After several training runs with different configurations, the best model has to be selected. This is where the inference step comes into play: the training runs are compared in order to select the neural network with the highest metric.

Saving MNIST to disk plays the role of preprocessing the raw data. In later tutorials I will cover the other modules and steps described above.

For now I want to keep things simple, so the dataset will be stored in a predefined folder structure inside the working directory where you execute the Python code.
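A minimal sketch of this step might look like the following. The folder layout `<root>/<split>/<label>/<index>.png`, the root folder name `mnist`, and the helper names `save_split` and `export_mnist` are assumptions made for illustration, not necessarily the exact structure used elsewhere in this series. The sketch relies on `tf.keras.datasets.mnist.load_data`, `tf.io.encode_png` and `tf.io.write_file`.

```python
import os

import numpy as np
import tensorflow as tf


def save_split(images, labels, root_dir, split):
    """Write each image to <root_dir>/<split>/<label>/<index>.png (assumed layout)."""
    # Create one folder per class label that appears in this split.
    for label in np.unique(labels):
        os.makedirs(os.path.join(root_dir, split, str(int(label))), exist_ok=True)

    for index, (image, label) in enumerate(zip(images, labels)):
        # encode_png expects [height, width, channels], so add a channel axis
        # to the grayscale image.
        pixels = np.asarray(image, dtype=np.uint8)[..., np.newaxis]
        png_bytes = tf.io.encode_png(pixels)
        file_path = os.path.join(root_dir, split, str(int(label)), f"{index:05d}.png")
        tf.io.write_file(file_path, png_bytes)


def export_mnist(root_dir="mnist"):
    """Download MNIST through Keras and save both splits as PNG files."""
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    save_split(x_train, y_train, root_dir, "train")
    save_split(x_test, y_test, root_dir, "test")
```

Calling `export_mnist()` from the working directory would download the dataset on first use and create `mnist/train/0/`, `mnist/train/1/`, and so on, with one PNG per image. Keeping one folder per label means the class of each image can later be recovered from its path alone.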

I hope this blog has been helpful! Best Regards.


Pedro F. Rodenas

Computer Vision | Deep Learning. Currently working at Robotics & Vision Technologies (Spain). https://www.linkedin.com/in/pedrofrodenas/