Efficiently modeling seismic data sets in complex 3D anisotropic media by solving the 3D elastic wave equation is an important challenge in computational geophysics. Using a stress-stiffness formulation on a regular grid, we tested a 3D finite-difference time-domain solver whose stencil is second-order accurate in time and eighth-order accurate in space, and which leverages the massively parallel architecture of graphics processing units (GPUs) to accelerate the computation of key kernels. The relatively small memory of an individual GPU limits the size of the model domain that can be computed on a single device. To circumvent this constraint and move toward modeling industry-sized 3D anisotropic elastic data sets, we parallelized the computation across multiple GPU devices by using domain decomposition and, at each time step, employing an interdevice communication protocol to exchange the data values that fall within the interior boundaries of each subdomain. For two or more GPU devices within a single compute node, we used direct peer-to-peer (i.e., GPU-to-GPU) communication, whereas for networked nodes we used Message Passing Interface (MPI) calls to route data over the network. Our 2D GPU-based anisotropic elastic modeling tests achieved a 10× speedup relative to an OpenMP CPU implementation run on an eight-core machine, whereas our 3D tests using dual-GPU devices produced up to a 28× speedup. The performance boost afforded by the GPU architecture allowed us to model seismic data for 3D anisotropic elastic models at lower hardware cost and in less time than was previously possible. © 2013 Society of Exploration Geophysicists.
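
The domain-decomposition and boundary-exchange idea described above can be illustrated with a minimal 1D sketch in plain Python. It is not the solver from this work: the centered eighth-order first-derivative coefficients are the standard textbook values and are assumed here, the function names (`ddx`, `split_with_halos`, `exchange_halos`, `decomposed_ddx`) are hypothetical, and a simple list copy stands in for the peer-to-peer or MPI transfer. The point is only that an eighth-order stencil needs four ghost cells on each interior boundary, and that refreshing those cells before each stencil evaluation reproduces the single-domain result.

```python
import math

# Standard centered eighth-order first-derivative coefficients (an assumption;
# the exact stencil used by the solver is not stated in the abstract).
COEFFS = (4.0 / 5.0, -1.0 / 5.0, 4.0 / 105.0, -1.0 / 280.0)
HALO = len(COEFFS)  # four ghost cells per side for an eighth-order stencil


def ddx(u, dx):
    """Eighth-order derivative at every point that has a full stencil."""
    out = [0.0] * len(u)
    for i in range(HALO, len(u) - HALO):
        out[i] = sum(
            c * (u[i + k] - u[i - k]) for k, c in enumerate(COEFFS, start=1)
        ) / dx
    return out


def split_with_halos(u):
    """Split u into two subdomains, each padded with HALO ghost cells
    on the side facing the interior boundary."""
    mid = len(u) // 2
    left = u[:mid] + [0.0] * HALO
    right = [0.0] * HALO + u[mid:]
    return left, right


def exchange_halos(left, right):
    """Fill each subdomain's ghost cells with its neighbour's edge values.
    A list copy stands in for the GPU peer-to-peer or MPI transfer."""
    left[-HALO:] = right[HALO:2 * HALO]
    right[:HALO] = left[-2 * HALO:-HALO]


def decomposed_ddx(u, dx):
    """Derivative computed on two subdomains and stitched back together.
    Returns values for global indices HALO .. len(u) - HALO - 1."""
    left, right = split_with_halos(u)
    exchange_halos(left, right)  # in a time loop this runs every step
    d_left = ddx(left, dx)
    d_right = ddx(right, dx)
    return d_left[HALO:-HALO] + d_right[HALO:-HALO]


def sine_field(n, dx):
    """Smooth test field sampled on n points with spacing dx."""
    return [math.sin(i * dx) for i in range(n)]
```

Calling `decomposed_ddx(sine_field(64, 0.1), 0.1)` returns the same interior values as `ddx` applied to the undivided field. In a 3D multi-GPU setting the same pattern would apply to four-cell-thick slabs of each wavefield component rather than to four scalar values.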