3D room reconstruction with sound

This database provides a benchmark for evaluating algorithms that reconstruct room geometry from audio. The data were recorded in a rectangular room with a vaulted ceiling, measuring about 8.5 x 7 x 7.5 m, in which 12 microphones and 17 sources were deployed within a central area of 3 x 3 m. High-precision ground truth positions of the microphones, the sources and the planar surfaces of the room (i.e. the four walls, the ceiling and the floor) are provided. Ground truth source and microphone positions were acquired with a VICON motion capture system, while the room geometry was reconstructed with a Leica C10 3D laser scanner. Common markers were used to register the source and microphone positions with respect to the room geometry.

For each source location, the signal acquired by each microphone in response to a sine sweep pulse transmitted from that location is provided, together with a reference signal acquired very close to the speaker.

For more details, please refer to the arXiv paper “Uncalibrated 3D Room reconstruction from Sound” by Marco Crocco, Andrea Trucco and Alessio Del Bue, available at https://arxiv.org/abs/1606.06258?context=cs.

The same paper has been submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Dataset description

The database contains three .mat files and one .m file:

  • ground_truth.mat
  • acquired_signals.mat
  • reference_signal.mat
  • estimate_errors.m

ground_truth.mat contains three matrices:

  • ground_truth_sources : a 17 x 3 matrix whose rows are the 3D coordinates in meters of the 17 sources.
  • ground_truth_microphones : a 12 x 3 matrix whose rows are the 3D coordinates in meters of the 12 microphones.
  • ground_truth_plane_normals : a 6 x 3 matrix. Each row is a 3-vector normal to the corresponding plane, whose norm equals the distance between the plane and the origin of the coordinate system (which coincides with the center of the room).
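
The plane encoding described above can be unpacked into a unit normal and an offset. The following is a minimal Python sketch, not code shipped with the dataset. The function names are hypothetical, and it assumes that each vector points from the origin toward its plane, which the description does not state explicitly.

```python
import math

def plane_from_normal(n):
    """Split a normal vector whose norm encodes distance into (unit_normal, distance)."""
    d = math.sqrt(sum(c * c for c in n))
    if d == 0.0:
        # A plane through the origin has no direction in this encoding.
        raise ValueError("zero-length normal vector")
    unit = tuple(c / d for c in n)
    return unit, d

def signed_distance(point, n):
    """Signed distance from a 3D point to the plane encoded by n.

    ASSUMPTION: n points from the origin toward the plane, so the plane is
    the set of points x with dot(x, unit_normal) == distance. Under this
    convention the origin (room center) has a negative signed distance.
    """
    unit, d = plane_from_normal(n)
    return sum(p * u for p, u in zip(point, unit)) - d

# Hypothetical example: a horizontal plane 3.5 m above the origin.
unit, d = plane_from_normal((0.0, 0.0, 3.5))
```

With this convention, the vector itself is the point of the plane closest to the origin, so signed_distance(n, n) evaluates to zero up to rounding.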

reference_signal.mat contains a vector chirpGT representing the samples of the reference signal acquired very close to the speaker, when a chirp is transmitted by the speaker itself. The reference signal can be used to produce a matched filter in order to detect peaks in the received signals.
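
As an illustration of the matched filter idea, the sketch below cross-correlates a received signal with a reference and picks the strongest peak. It is a plain Python example on short synthetic lists, not the processing used in the paper. The function names are hypothetical, and for full-length recordings an FFT-based correlation would be a more practical choice than this direct sum.

```python
def matched_filter(received, reference):
    """Cross-correlate `received` with `reference`.

    output[k] = sum_j received[k + j] * reference[j], for every lag k at
    which the reference fits entirely inside the received signal.
    """
    n, m = len(received), len(reference)
    if m == 0 or n < m:
        raise ValueError("reference must be non-empty and no longer than received")
    return [
        sum(received[k + j] * reference[j] for j in range(m))
        for k in range(n - m + 1)
    ]

def strongest_peak(response):
    """Return the lag (in samples) with the largest absolute response."""
    return max(range(len(response)), key=lambda k: abs(response[k]))

# Synthetic example (made-up values): a direct arrival at lag 5 and a
# weaker echo at lag 10.
reference = [1.0, -1.0, 2.0]
received = [0.0] * 5 + [1.0, -1.0, 2.0] + [0.0] * 2 + [0.5, -0.5, 1.0] + [0.0]
response = matched_filter(received, reference)
direct_lag = strongest_peak(response)
```

In a room recording, later peaks of the response would correspond to reflections from the walls, floor and ceiling. Separating them reliably requires more than a single maximum search.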

acquired_signals.mat contains a cell array of 17 cells, one for each of the 17 sources. Each cell is a matrix of size N x 12, where N is the number of samples. The n-th column of the matrix in cell m contains the samples of the signal acquired by microphone n when source m is active.

The sampling frequency of the acquired signals is equal to 16 kHz.
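
Given the sampling frequency, a delay measured in samples can be converted to a time and to a propagation distance. The sketch below shows the arithmetic in Python. The function names are hypothetical, and the speed of sound of 343 m/s is an assumed value (dry air at roughly 20 °C), since the dataset description does not give one. Under that assumption, one sample corresponds to about 2.1 cm of path length.

```python
FS_HZ = 16000.0          # sampling frequency stated in the dataset description
SPEED_OF_SOUND = 343.0   # m/s, ASSUMED value; not provided by the dataset

def lag_to_seconds(lag_samples, fs_hz=FS_HZ):
    """Convert a delay in samples to a delay in seconds."""
    return lag_samples / fs_hz

def lag_to_metres(lag_samples, fs_hz=FS_HZ, c=SPEED_OF_SOUND):
    """Convert a delay in samples to a propagation path length in metres."""
    return c * lag_samples / fs_hz

# Example: a delay of 160 samples is 10 ms, about 3.43 m of path length.
delay_s = lag_to_seconds(160)
path_m = lag_to_metres(160)
```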

The function estimate_errors.m is provided to evaluate the accuracy of the reconstruction, based on the following metrics:

  • Euclidean distance between ground truth and estimated positions of sources and microphones.
  • Absolute difference between the lengths of the ground truth and estimated normal vectors, where each normal vector is a 3-vector normal to a plane, whose norm equals the distance between the plane and the origin of the coordinate system (which coincides with the center of the room).
  • Angle between the ground truth and estimated normal vectors, defined as above.
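
The three metrics listed above can be written down compactly. The following Python sketch only illustrates the definitions and is not the content of estimate_errors.m. The function names are hypothetical, and it assumes that the estimates are already expressed in the same coordinate frame as the ground truth.

```python
import math

def norm(v):
    """Euclidean norm of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))

def position_error(gt, est):
    """Euclidean distance between a ground truth and an estimated position."""
    return norm([a - b for a, b in zip(gt, est)])

def normal_length_error(gt_n, est_n):
    """Absolute difference between the lengths of two plane normal vectors,
    i.e. the error in the plane-to-origin distance."""
    return abs(norm(gt_n) - norm(est_n))

def normal_angle_deg(gt_n, est_n):
    """Angle in degrees between two non-zero plane normal vectors."""
    dot = sum(a * b for a, b in zip(gt_n, est_n))
    cos_angle = dot / (norm(gt_n) * norm(est_n))
    # Clamp to [-1, 1] to guard against rounding slightly outside the domain of acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))

# Made-up values for illustration only.
err_pos = position_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
err_len = normal_length_error((0.0, 0.0, 3.5), (0.0, 0.0, 3.0))
err_ang = normal_angle_deg((1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
```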

 

Download link: 3D Room Reconstruction Dataset