GPU training

the network was split across two GPUs due to memory constraints - **Local Response Normalization** (LRN) -- later superseded by batch normalization