Friday, July 8, 2016

Third Fortnight blog post

Hello there,
The PR for the new optimizer is about to be merged, now that all the cleanup tasks are done. There has also been some progress on the CleanUp PR that I started last fortnight: in it, the op_lifter has been tested with the TopoOptimizer and seems to work well with it, paving the way for a possible implementation of the backward pass.

A quick summary of the work done over the last fortnight

1) On the new_graph2gpu PR,
  • I did some cleanups and addressed all the review comments on cleanup, refactoring and optimization.
  • Pascal helped in fixing the TestDnnConv2d test by figuring out that the `get_scalar_constant_value` method doesn't handle a SharedVariable that has shape (1, 1) and is broadcastable (a small repro sketch follows this list).
  • I fixed the failing GpuCumsum Op's test by replacing the flatten() operation with a corresponding call to GpuReshape.
  • Made a few changes to fix local_gpua_eye (handled that optimization similarly to local_gpuaalloc) and its test.
  • Applied the changes needed to the interface after the dilation PR was merged, by making the test cases inside theano/gpuarray/dnn work with the new filter dilation parameter.
  • Line-profiled local_gpua_careduce, the more time-consuming of the optimizers. I initially thought a call to as_gpuarray_variable caused this, until Fred pointed out that the actual reason is a call to gpuarray.GpuKernel. I am currently trying to find a fix for that (the line-profiling sketch after this list shows the kind of setup I use).
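As a small aside, here is a minimal repro sketch of the case Pascal identified (my own illustration, not code from the PR): a shared variable whose value has shape (1, 1) and whose dimensions are both broadcastable, which `get_scalar_constant_value` was not handling.

```python
import numpy as np
import theano
import theano.tensor as T

# A shared variable with shape (1, 1) and both dimensions broadcastable --
# the kind of variable get_scalar_constant_value was not handling.
s = theano.shared(np.ones((1, 1), dtype=theano.config.floatX),
                  broadcastable=(True, True))
print(s.broadcastable)  # (True, True)

# My understanding of the symptom: before the fix this raised
# NotScalarConstantError instead of extracting the underlying scalar.
try:
    print(T.get_scalar_constant_value(s))
except T.NotScalarConstantError:
    print("not handled as a scalar constant")
```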
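On the profiling itself: below is a generic line_profiler recipe of the kind I use. The profiled function is just a stand-in so the sketch stays self-contained and runnable; the real target is local_gpua_careduce in theano/gpuarray/opt.py.

```python
from line_profiler import LineProfiler

def work(n):
    # Stand-in for the optimizer body; profiling the real optimizer
    # works the same way -- add its function here and trigger it by
    # compiling a graph with theano.function().
    total = 0
    for i in range(n):
        total += i * i
    return total

lp = LineProfiler()
lp.add_function(work)
lp.runcall(work, 100000)
lp.print_stats()  # per-line hit counts and timings
```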
2) On the CleanUp PR,
  •  Replaced the calls to HostFromGpu with transfer.
  • Added a register_topo decorator to op_lifter. Created a new LocalGroupDB instance and registered in it all the optimizers to which op_lifter is applied. Finally, registered this LocalGroupDB into gpu_seqopt.
  • I also tried creating a new TopoOptDB, but my implementation was wrong: I modelled it on LocalGroupDB and that didn't work. I then tried a few more ways of implementing it, modelled on SequenceDB, but those didn't work out either.
  • Reverted local_gpua_subtensor to its previous version (as in the current master), since the new version caused some expected transfers to the GPU to not happen.
  • Removed all the separate caching methods, so that caching is integrated with the __props__ of the class (see the __props__ sketch after this list).
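To show why the separate caching code becomes redundant, here is a minimal, hypothetical Op (not one of the Ops touched in the PR): listing the parameters in `__props__` makes Theano derive equality and hashing from them, which is exactly what the caching relied on.

```python
import theano.tensor as T
from theano.gof import Apply, Op

# Hypothetical Op, used only to illustrate the __props__ mechanism.
class ScaleOp(Op):
    # __eq__, __hash__ and __str__ are generated from these properties.
    __props__ = ('scale',)

    def __init__(self, scale):
        self.scale = scale

    def make_node(self, x):
        x = T.as_tensor_variable(x)
        return Apply(self, [x], [x.type()])

    def perform(self, node, inputs, output_storage):
        output_storage[0][0] = inputs[0] * self.scale

print(ScaleOp(2.0) == ScaleOp(2.0))              # True
print(hash(ScaleOp(2.0)) == hash(ScaleOp(2.0)))  # True
print(ScaleOp(2.0) == ScaleOp(3.0))              # False
```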
3) On the Remove_ShapeOpt PR
  • I was able to add exceptions at only one place to completely ignore the fgraph's shape_feature. There are a few optimizers which mandatorily need it, as I have commented on the PR (the sketch after this list shows where shape_feature lives).
  • Skipped all the tests that test infer_shape or contain MakeVector, Shape_i, T.second, T.fill and other optimizations done by ShapeFeature.
  • The profiling results didn't show a significant improvement in optimization time; more work needs to be done on this.
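For context, here is a small illustration (my own, not code from the PR) of where the feature in question lives on a compiled function's graph, together with the profiler flags I use to get the optimization-time numbers mentioned above.

```python
import numpy as np
import theano
import theano.tensor as T

# ShapeFeature attaches itself to the function graph as
# fgraph.shape_feature; with the Remove_ShapeOpt changes it should be
# absent (None here) for graphs that do not need it.
x = T.matrix('x')
f = theano.function([x], (x + 1).sum())
print(getattr(f.maker.fgraph, 'shape_feature', None))

# The optimization-time numbers come from Theano's profiler, enabled
# for example with:
#   THEANO_FLAGS=profile=True,profile_optimizer=True python script.py
f(np.ones((3, 3), dtype=x.dtype))
```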
That's it for now!
Cheers,
