Convolutional neural networks (CNNs) have performed extremely well on many image analysis tasks. However, supervised training of deep CNN architectures requires large amounts of labeled data, which are not available for light field images. In this paper, we leverage synthetic light field images and propose a two-stream CNN that learns to estimate the disparities of multiple correlated neighborhood pixels from their epipolar plane images (EPIs). Because the EPIs are unrelated except at their intersection, the network learns convolution weights separately for each EPI and then combines the outputs of the two streams for disparity estimation. The CNN-estimated disparity map is then refined with a variational technique that uses the central RGB light field image as a prior. We also propose a new real-world data set comprising light field images of 19 objects captured with the Lytro Illum camera in outdoor scenes, together with their corresponding 3D point clouds, captured with the 3dMD scanner, as ground truth. This data set will be made public to allow more precise comparison of algorithms at the 3D point cloud level, which is currently not possible. Experiments on the synthetic and real-world data sets show that our algorithm outperforms existing state-of-the-art methods for depth estimation from light field images.
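To make the EPI input concrete, the following is a minimal NumPy sketch of how two EPIs that intersect at a pixel could be sliced from a 4D light field, and of how disparity appears as a per-view shift inside each EPI. It is an illustration only, not the pipeline used in this work: the array layout `(V, U, H, W)`, the grayscale views, the choice of one horizontal and one vertical EPI, the sign convention of the shift, and the helper names `make_synthetic_light_field` and `extract_epis` are all assumptions made for the example.

```python
import numpy as np


def make_synthetic_light_field(base, disparity, n_views=5):
    """Build a toy light field of a fronto-parallel textured plane.

    Assumption: view (v, u) is the central image circularly shifted by
    `disparity` pixels per unit of angular offset. Real light fields are
    not circular; np.roll is used only to keep the toy example exact.
    Returns an array of shape (V, U, H, W).
    """
    c = n_views // 2
    views = [
        [np.roll(base, shift=(disparity * (v - c), disparity * (u - c)), axis=(0, 1))
         for u in range(n_views)]
        for v in range(n_views)
    ]
    return np.array(views)


def extract_epis(light_field, y, x):
    """Return the (assumed) horizontal and vertical EPIs through pixel (y, x)."""
    v_c = light_field.shape[0] // 2
    u_c = light_field.shape[1] // 2
    # Horizontal EPI: fix the central angular row and image row y -> shape (U, W).
    epi_h = light_field[v_c, :, y, :]
    # Vertical EPI: fix the central angular column and image column x -> shape (V, H).
    epi_v = light_field[:, u_c, :, x]
    return epi_h, epi_v


rng = np.random.default_rng(0)
base = rng.random((12, 16))          # central view, H=12, W=16
lf = make_synthetic_light_field(base, disparity=2, n_views=5)
epi_h, epi_v = extract_epis(lf, y=6, x=8)
print(epi_h.shape, epi_v.shape)
```

In this toy setting each EPI row is the central row shifted in proportion to the angular offset, so a scene point traces a line whose slope encodes its disparity. The two EPIs share only the central-view pixel `(y, x)`, which is consistent with the motivation for processing them in separate streams.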