Image colorization is a fun application of deep learning - but fun doesn’t mean easy. Training examples for image colorization can be made from any RGB image, so data is plentiful. But coloring an image, intuitively, cannot be done without some amount of image classification: we know that a banana should be colored yellow because bananas are yellow. A neural network has to make the same kind of inference over the pixel region corresponding to the banana.
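To make the "data is plentiful" point concrete, here is a minimal sketch of turning any RGB image into a training pair. It assumes NumPy and Pillow, which aren't mentioned above; the exact preprocessing used in the experiments may differ.

```python
import numpy as np
from PIL import Image

def make_training_pair(path):
    """Turn any RGB image into a (grayscale input, color target) pair."""
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    # Luminance-weighted grayscale; the network has to predict the missing color.
    gray = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return gray[..., None], rgb  # shapes: (H, W, 1) and (H, W, 3)
```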
Wait, this sounds a lot like image segmentation. If we work exclusively with images that have known segmentation masks, can we just shovel that data into our network and see whether performance improves?
A proper colorization network needs a loss that accounts for the ambiguity of object colors (a GAN loss, or a probabilistic loss). For the sake of time, I decided to use L2 loss. This didn’t work wonders for my final results, but it was good enough to test the underlying hypothesis. An FCNN with 6 ResNet blocks was used as the backbone network.
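For reference, here is a minimal sketch of that kind of setup in PyTorch. The channel widths, kernel sizes, and output activation are my assumptions, not the exact configuration used.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class ColorizeFCNN(nn.Module):
    """Fully convolutional colorizer: stem conv, 6 residual blocks, 1x1 color head."""
    def __init__(self, in_ch=1, width=64):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, width, 3, padding=1)
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(6)])
        self.head = nn.Conv2d(width, 3, 1)  # predict RGB per pixel

    def forward(self, x):
        return torch.sigmoid(self.head(self.blocks(self.stem(x))))

# Plain L2 loss between predicted and ground-truth colors.
criterion = nn.MSELoss()
```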
To the surprise of no one, a naive network with L2 loss didn’t do too well with the task. The left image shows the ground truth while the right shows the network output.
Things get more interesting when we take COCO segmentation data and append it to our inputs as a one-hot bitmask. Identified objects take on the average color of that object in the dataset. Things like bananas are colored in a surreal uniform fashion with sharp edges. Some other colorizations are just inexplicable; I’m not sure what was so special about that one cabbage.
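For concreteness, here is a minimal sketch of how the one-hot mask could be appended to the input, again assuming PyTorch; the 80-class COCO label map with a reserved background id is my assumption about the data format.

```python
import torch
import torch.nn.functional as F

def add_segmentation_channels(gray, label_map, num_classes=81):
    """Concatenate a one-hot segmentation mask onto the grayscale input.

    gray:      (B, 1, H, W) float tensor
    label_map: (B, H, W) long tensor of class ids in [0, num_classes),
               assuming id 0 is background and 1..80 are COCO thing classes.
    """
    one_hot = F.one_hot(label_map, num_classes).permute(0, 3, 1, 2).float()  # (B, C, H, W)
    return torch.cat([gray, one_hot], dim=1)
```

The backbone from the earlier sketch would then just need a wider first layer, e.g. `ColorizeFCNN(in_ch=1 + 81)`.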
All in all, segmentation mask data isn’t entirely useless. With a better loss, this might actually work!