Convolutional Neural Network¶
Download codebase from GitHub: https://github.com/PharrellWANG/fdc_resnet_v3
ConvNet architectures make the explicit assumption that the inputs are images/blocks.
Why ResNet? 1. It is easier to train. 2. It converges faster.
Settings¶
- All the models are trained from scratch.
- The datasets for block sizes 32x32 and 64x64 are too small, so no models are trained for them. Instead, we resize those blocks with bilinear interpolation and predict with the model trained for block size 16x16.
- No padding, cropping, or flipping is applied; no data augmentation at all. The original data is distorted enough by nature. See the Data Visualization section to get a taste.
- Momentum optimizer with momentum 0.9.
- Learning rate schedule: 0.01 for steps < 20k, 0.001 for steps < 40k, 0.0001 for steps < 60k, 0.00001 afterwards.
- Weight decay rate: 0.0002.
- Batch size 128.
- Filter counts: [16, 16, 32, 64]; 5 residual units in each of the last three stages.
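The learning rate schedule above can be written as a small piecewise-constant helper. This is a sketch in plain Python (the actual codebase may implement the schedule differently):

```python
def learning_rate(step):
    """Piecewise-constant schedule from the settings above:
    0.01 (< 20k), 0.001 (< 40k), 0.0001 (< 60k), 0.00001 (else)."""
    if step < 20_000:
        return 0.01
    if step < 40_000:
        return 0.001
    if step < 60_000:
        return 0.0001
    return 0.00001
```

At each training step, the returned value would be fed to the momentum optimizer together with the fixed momentum of 0.9.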
Note
- Block size 4x4 applies to PUs, while the smallest CU size is 8x8.
- The training results below show that our model is not well suited to blocks of size 4x4.
- DMM is not applied to size 64x64.
Our deep learning strategy targets CUs from size 8x8 up to size 64x64, for both texture and depth.
Training for blocks of size 4x4¶
Training for blocks of size 8x8¶
Results¶
The model can indeed learn for size 8x8: the top-16 predictions are reliable, which halves the number of angular modes that must be checked.
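Keeping only the top 16 modes amounts to selecting the 16 highest-scoring entries of the network's output. A minimal NumPy sketch (`top_k_modes` is a hypothetical helper, not a function from the codebase):

```python
import numpy as np

def top_k_modes(scores, k=16):
    """Return the indices of the k highest-scoring intra modes."""
    return np.argsort(scores)[::-1][:k]

# Example: scores for 35 HEVC intra modes; the encoder would then
# evaluate only these 16 candidates instead of all angular modes.
scores = np.arange(35, dtype=float)
candidates = top_k_modes(scores)
```

The encoder's rate-distortion search then runs over `candidates` only, which is where the halving of the angular-mode count comes from.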
Training for blocks of size 16x16¶
Results¶
The model can indeed learn for size 16x16: the top-16 predictions are reliable, which halves the number of angular modes that must be checked.
Training for blocks of size 32x32¶
The dataset obtained after pre-processing is too small to train a deep model. Instead, we resize each 32x32 block with bilinear interpolation and reuse the model trained for size 16x16.
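The bilinear down-sampling step can be sketched with NumPy as follows. This is an illustrative implementation under the assumption of single-channel blocks; `bilinear_resize` is a hypothetical helper, not the codebase's actual resize routine:

```python
import numpy as np

def bilinear_resize(block, out_h, out_w):
    """Resize a 2-D block to (out_h, out_w) with bilinear interpolation."""
    in_h, in_w = block.shape
    # Sample positions in the source block for each output pixel.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = block[np.ix_(y0, x0)] * (1 - wx) + block[np.ix_(y0, x1)] * wx
    bot = block[np.ix_(y1, x0)] * (1 - wx) + block[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

# A 32x32 block shrunk to 16x16 before being fed to the 16x16 model.
small = bilinear_resize(np.zeros((32, 32)), 16, 16)
```

The same routine applies to 64x64 blocks; only the input size changes.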
Training for blocks of size 64x64¶
The dataset obtained after pre-processing is too small to train a deep model. Instead, we resize each 64x64 block with bilinear interpolation and reuse the model trained for size 16x16.