Abstract: The objective of this work is the development of a Depth Map Generation System oriented to the determination of distances. A process of exploration of Artificial Vision and Structured Light methods is performed to obtain the depth map of a given scene; by means of this map it is possible to extract the 3D location data of the objects present in the scene. The work addresses the problem of 3D distance estimation through the generation of the depth map, thus presenting a tool for the development of future applications, such as classification systems, surface reconstruction, 3D printing and recognition, among others. The algorithmic development is carried out in the Matlab software. It should be noted that the work focuses specifically on the process of determining distances over a fixed scene under controlled lighting conditions, and not on lighting variations of the scene or moving environments. With the proposed development, the system presents a good response in scenes with depth variations between 540 mm and 640 mm, presenting errors of less than 5.5% in that range.
Keywords: Depth maps, structured light, distance measurement, computer vision,
computational geometry.
1. INTRODUCTION
In the last decade, great strides have been made in capturing the three-dimensional structure of real scenes and objects. Growing computational capabilities make it possible today to process previously intractable volumes of data, improving techniques and making them faster and more accurate. This ability to model real-world scenes on a computer is very advantageous in areas such as industry, industrial design, and graphic arts. 3D reconstruction techniques are an essential tool in all those disciplines in which the recovery of the three-dimensional structure of a scene is necessary. For this reason, numerous reconstruction methods have been developed in recent years, including the structured light technique, laser telemetry, reconstruction from a moving camera, and the multiview vision technique, in which the stereoscopic vision or three-dimensional reconstruction of a scene is obtained from at least two 2D views, or stereo pair, of the same scene.
This project was inspired by research work carried out in the Department of Engineering and Computer Science at Brown University, which has focused its attention on topics such as Applied Computational Geometry, Computer Graphics, 3D Modeling, and Computer Vision (Akleman et al., 2015; Andaló, Taubin, & Goldenstein, 2015; Godin, Hébert, Masuda, & Taubin, 2009; Lanman, Crispell, & Taubin, 2009; Lanman & Taubin, 2009; Zhao & Taubin, 2011).
Likewise, this project serves as a reference for future implementations of 3D scanning systems using structured light, presenting the first steps toward creating distance-sensing devices based on image acquisition, which may be used in future projects involving the reconstruction of three-dimensional scenes.
The goal of this work is the development of a Depth Map Generation System aimed at distance determination. A process of exploration of Artificial Vision and Structured Light methods is carried out in order to obtain the depth map of a given scene, since by means of this map the 3D location data of the objects present in the scene can be extracted. The work focuses on the problem of estimating 3D distances by generating a depth map of a scene, presenting in this way a tool for the development of future applications, such as classification systems, surface reconstruction, and 3D recognition and printing, among others.
The scope proposed for the present project is the implementation of a Depth Map Generation System, oriented to distance determination, through the use of the structured light technique. The algorithmic development is carried out using the Matlab software. It should be noted that the work focuses specifically on the process of determining distances on a fixed scene under controlled lighting conditions, and not on lighting variations in the scene or on moving environments.
2. MATERIALS AND METHODS
2.1 Structured Light
The structured light technique is based on the relationship between a camera and a light source, which projects a known pattern onto the scene to be scanned. With this pattern, only the points illuminated by the light source are captured by a camera that is synchronized with the light-emitting source. By matching the captured image to the original pattern, it is possible to triangulate the position of each pixel and determine its depth with respect to the plane perpendicular to the camera. In this way, the camera captures the deformations of the patterns emitted by the light source (projector).
Sensors based on structured light rely on three elements: a light source projecting a pattern, a detector or camera system, and a processing system. The light source is responsible for emitting a light pattern onto the object to be scanned, which causes a deformation of the pattern that is later measured by a camera. The 3D information is obtained from the deformation of this pattern by means of trigonometric relations.
Fig. 1 presents a general diagram of a 3D capture system based on structured light.
Fig. 1. Scheme of a 3D capture system based on structured light
2.2 Line-Plane Intersection
A projected line creates a plane of light; the intersection of this light plane with the object usually contains many illuminated curved segments. Each of these segments is made up of many illuminated points. A single illuminated point, visible to the camera, defines a camera ray. The equations of the projected planes, as well as the equations of the camera rays corresponding to the illuminated points, are defined by parameters that can be measured. From these measurements, the location of the illuminated points can be recovered by intersecting the planes of light with the camera rays corresponding to the illuminated points. In this way, a 3D surface model can be recovered.
To calculate the intersection of a line and a plane, the line L is represented in parametric form (1) and the plane P in implicit form (2), where \(q_L\) is a point on the line, \(v\) its direction vector, \(q_P\) a point on the plane, and \(n\) its normal vector:
\[ L = \{ p = q_L + \lambda v ; \ \lambda \in \mathbb{R} \} \hspace{1cm} (1)\]
\[ P = \{ p \, | \, n^t (p - q_P) = 0 \} \hspace{1cm} (2)\]
If the line and the plane do not intersect, they are parallel. This is the case if the vectors v and n are orthogonal, that is, if \(n^t v = 0 \). The vectors v and n are also orthogonal when the line L is contained in the plane P. Neither of these two conditions applies to the structured light system proposed in this work.
If the vectors v and n are not orthogonal, that is, if \( n^t v \neq 0 \), then the intersection of the line and the plane contains exactly one point p. Since this point belongs to the line, it can be written as \( p = q_L + \lambda v \) for a value of \(\lambda\) that must be determined. Since the point also belongs to the plane, the value of \(\lambda\) must satisfy the linear equation (3):
\[ n^t (p - q_P) = n^t (q_L + \lambda v - q_P) = 0 \hspace{1cm} (3)\]
And when solving for \(\lambda\) we have (4):
\[ \lambda = \frac{{n^t (q_P - q_L)}}{{n^t v}} \hspace{1cm} (4)\]
A geometric interpretation of the line-plane intersection is presented in Fig. 2.
Fig. 2. Triangulation principle by Line-Plane intersection
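To illustrate this computation, a minimal Matlab sketch is given below; the function name and the assumption that all inputs are 3-by-1 vectors expressed in a common coordinate frame are ours, not part of the original development.

```matlab
% Minimal sketch: intersection of a camera ray with a projector light plane.
% qL: point on the line, v: line direction, qP: point on the plane,
% n: plane normal; all 3-by-1 vectors in the same coordinate frame.
function p = intersectLinePlane(qL, v, qP, n)
    denom = dot(n, v);                    % n^t v
    if abs(denom) < eps
        error('Line and plane are (near) parallel: no unique intersection.');
    end
    lambda = dot(n, qP - qL) / denom;     % Eq. (4)
    p = qL + lambda * v;                  % point on the line, Eq. (1)
end
```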
2.3 General Perspective
Fig. 3 presents the general scheme of the proposed system. In this scheme, the central part is a computer with the Matlab R2018a software installed; it is through this software that the execution of the system takes place and that the user can view the depth map finally generated by the reconstruction process. A projector and a camera are connected to the computer, responsible respectively for the emission and acquisition of each of the considered patterns. These devices are each mounted on a tripod and oriented toward the scene to be scanned.
Fig. 3. General Scheme of the System
The structured light scanning system can be explained from the block diagram of Fig. 4.
Fig. 4. General Block Diagram of the System
Each of these stages is described below:
- Camera-Projector Calibration: A flat checkerboard calibration method is used to minimize the complexity and cost of calibration (Falcao, Hurtos, & Massich, 2008). This method is based on considering the projector as an inverse camera, mapping 2D image intensities to 3D rays, so the calibration of a projector is the same as that of a camera. In this way, the system can be calibrated using a standard camera calibration method, as implemented in the Bouguet calibration toolbox (Bouguet, n.d.).
- Sequence Generation and Acquisition: In this stage a set of 42 structured light patterns is generated using Gray coding (a minimal generation sketch is given after this list); as these patterns are projected, an acquisition process is carried out for their subsequent processing.
- Processing and Reconstruction: With the images from the acquisition system, the necessary processing is carried out to reconstruct a 3D point cloud through triangulation. In this way, the depth map of the scene can be generated.
- Generation of the Depth Map: With the point cloud of the reconstructed scene, the depth map is obtained, thus determining the separation distance between the acquisition device (camera) and a given object being scanned.
- Data Collection, System Tests and Results Analysis: Once the implementation has been carried out, the behavior of the system is verified through data collection, system tests and analysis of the results obtained. From this stage, the advantages and disadvantages of the implemented system can be established, and future work that complements the proposed development can be proposed.
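As a hedged illustration of the pattern-generation step referenced in the list above, the following Matlab sketch builds Gray-coded vertical stripe bit planes; the projector width of 1024 pixels is an assumption, and the full 42-pattern sequence used in the work (including row-coded or inverse patterns) is not reproduced here.

```matlab
% Minimal sketch: Gray-coded vertical stripe bit planes (assumed width).
projWidth = 1024;                               % hypothetical projector width
nBits     = ceil(log2(projWidth));              % bit planes needed
cols      = 0:projWidth - 1;                    % projector column indices
grayCodes = bitxor(cols, bitshift(cols, -1));   % binary -> Gray code
patterns  = false(nBits, projWidth);            % one row per bit plane
for b = 1:nBits
    patterns(b, :) = bitget(grayCodes, nBits - b + 1) > 0;  % MSB first
end
% Replicating each row vertically yields one stripe image per bit plane.
```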
In Fig. 5, the placement of the camera and the projector can be seen, where both point to a scene arranged
on a table for scanning.
Fig. 5. Photograph of the placement of the Camera and Projector
3. RESULTS
3.1 Calibration Results
In this work, the method implemented for the calibration of the camera-projector system (Falcao, Hurtos, & Massich, 2008) is an extension of the method proposed by Bouguet in Matlab's camera calibration toolbox (Bouguet, n.d.). As a first step in the development process, the images were acquired with the camera. For this, a sequential image acquisition and storage program was developed in Matlab.
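A minimal sketch of such a sequential acquisition and storage routine, using the Image Acquisition Toolbox, is shown below; the camera adaptor name, video format string, and image count are assumptions, not the exact values used in the work.

```matlab
% Minimal sketch: grab and store a sequence of calibration images.
vid = videoinput('winvideo', 1, 'RGB24_1600x1200');  % hypothetical adaptor/format
nImages = 20;                                        % hypothetical image count
for k = 1:nImages
    img = getsnapshot(vid);                          % acquire one frame
    imwrite(img, sprintf('calib_%02d.bmp', k));      % store as .BMP bitmap
    pause(1);                                        % time to reposition the board
end
delete(vid);                                         % release the camera
```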
After performing the camera calibration process, the extrinsic and intrinsic parameters of the camera are obtained, which are then loaded to continue with the projector calibration process. The results of the calibration are recorded in a file called calib_cam_proj.mat, which is read by the Matlab program so that these parameters are taken into account during execution (a minimal loading sketch is shown after Fig. 6). Likewise, to inspect the calibration parameters, a graph of the extrinsic components of the calibration is presented in a view that includes both the camera and the projector. In this graph, the extrinsic components can be visually verified and compared against the actual physical configuration.
Fig. 6. Extrinsic parameters of the camera-projector calibration
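As a hedged sketch of how the stored calibration can be reloaded and inspected (the variable names inside the file follow Bouguet toolbox conventions and are assumptions here):

```matlab
% Minimal sketch: reload the stored camera-projector calibration.
S = load('calib_cam_proj.mat');   % file produced by the calibration stage
disp(fieldnames(S));              % list the stored calibration variables
% Bouguet-style calibrations typically store an intrinsic matrix (KK),
% distortion coefficients (kc), and the rotation/translation (R, T)
% relating the camera and projector frames.
```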
3.2 Results of the generation and acquisition of sequences
As a test scene, a set of blocks stacked to form three staircases is used, whose depth increases with each level. In total, there are 24 steps stacked on a common base in three columns of 8 steps each, with the steps on the left being the closest to the camera and those on the right the farthest.
For the system tests, a sequence of 42 pattern images was generated following a Gray code in the stripes projected on the selected scene. In this way, for each of the tests carried out, 42 images were acquired for subsequent processing. Each image was stored in .BMP bitmap format, with a resolution of 1600 × 1200 pixels and a size of 5.49 MB. Fig. 7 shows the images acquired with the Gray-coded structured light sequence.
Fig. 7. Images Acquired with Structured Light Sequence in Gray Code
3.3 Processing and Reconstruction Results
Decoded correspondences are used to reconstruct a 3D point cloud of the scene, taking as texture reference the first image acquired with a projected white pattern. By reconstructing the Stairs scene, the results presented below are obtained as a point cloud in X, Y, Z coordinates. Fig. 8 shows the distribution of the point cloud, taking the Z axis as the depth axis in millimeters. The image presents the scene including the camera (represented in red) and the projector (represented in green); a minimal plotting sketch is shown after Fig. 8.
Fig. 8. 3D reconstruction through a decoded point cloud of the scene
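A minimal Matlab sketch for displaying such a textured point cloud is given below; the coordinates and colors here are synthetic demo data, not the reconstructed scene.

```matlab
% Minimal sketch: display a textured 3D point cloud (synthetic demo data).
N = 5000;                                   % hypothetical number of points
X = rand(N, 1) * 300; Y = rand(N, 1) * 200; % demo coordinates (mm)
Z = 540 + 100 * rand(N, 1);                 % demo depths, 540-640 mm range
C = repmat(Z / max(Z), 1, 3);               % grayscale "texture" from depth
scatter3(X, Y, Z, 1, C, '.');               % one colored dot per point
axis equal; grid on;
xlabel('X (mm)'); ylabel('Y (mm)'); zlabel('Z, depth (mm)');
```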
3.4 Results of the Depth Map Generation
From the correspondence values of the previously presented point cloud, the depth map of the treated scene was generated. For this, the X, Y, Z coordinates of each point were taken, where Z represents the depth in millimeters of each point in the scene under consideration. Fig. 9 shows the depth map of the Stairs scene, where a depth equivalence is proposed, associated with a map of color intensities of 100 rows by 100 columns of data, together with an interpolation using the closest values, in order to obtain a more uniform behavior of the depth map; a minimal resampling sketch is shown after Fig. 9.
Fig. 9. Depth Map of Stairs Scene
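A hedged sketch of this resampling and nearest-value interpolation follows; x, y, z stand for the point-cloud coordinates of the previous stage (synthesized here so the sketch is self-contained).

```matlab
% Minimal sketch: resample scattered depths to a 100 x 100 depth map.
N = 5000;
x = rand(N, 1) * 300; y = rand(N, 1) * 200;    % hypothetical coordinates (mm)
z = 540 + 100 * rand(N, 1);                    % hypothetical depths (mm)
[xg, yg] = meshgrid(linspace(min(x), max(x), 100), ...
                    linspace(min(y), max(y), 100));
F = scatteredInterpolant(x, y, z, 'nearest');  % interpolate with closest values
depthMap = F(xg, yg);                          % 100 rows x 100 columns of depth
imagesc(depthMap); axis image; colorbar;       % color-intensity map (mm)
```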
To validate the data produced by the developed system, measurements of the scene were taken with a digital laser distance meter from the manufacturer Mileseey Technology, which has a measurement precision of ±1.5 mm. The distance from the camera to each of the steps in the scene was measured, in order to compare the system's behavior against a set of reference data from another measurement instrument.
Fig. 10 shows the validation procedure developed.
Fig. 10. Validation of depth with a digital laser distance meter
In total, 26 measurement points were taken, corresponding to the common base (Eb), the steps of each of the columns (left column: Ea1 to Ea8; central column: Eb1 to Eb8; right column: Ec1 to Ec8), and the distance to the wall (P). With these measurements, a graph was developed that shows the behavior of each of the measurements. In Fig. 11, the results of the structured light measurement system can be seen in blue and, in orange, the results obtained with the laser meter.
Likewise, from the data presented, a graph was made showing the error percentage for each of the samples taken. In Fig. 12, this error is shown in gray; according to the measurement scale on the right side of the graph, it reaches its minimum (0%) at step Ea1 and its maximum (11.7%) at the measurement made at the point corresponding to wall P. It can also be seen that the error increases gradually with depth, following a second-order polynomial trend curve, and that it remains below 5% in the range from 540 mm to 630 mm.
Fig. 11. Depth Measurement Data with the Structured Light system and the Laser
meter on the Stairs scene
Fig. 12. Error % for each data
3.5 Compute Time
The computational performance of the system is verified through the time spent in the execution of each of its stages. The execution times were measured with the Matlab commands “tic” (at the beginning of each section) and “toc” (at the end of each section), as sketched below, giving the results presented in Table 1.
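A hedged sketch of this timing instrumentation:

```matlab
% Minimal sketch: timing one stage of the pipeline with tic/toc.
tic;                                   % start of the stage
pause(0.1);                            % placeholder for the stage's actual work
elapsed = toc;                         % elapsed wall-clock time in seconds
fprintf('Stage executed in %.4f s\n', elapsed);
```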
Table 1: Measured Time for the Execution of the System

| Stage | Measured time for the execution of the stage |
| Camera Settings | 0.2976 seconds |
| Generation and Acquisition of a single Structured Light sequence | 0.7673 seconds |
| Generation and Acquisition of the 42 Structured Light sequences | 35.1516 seconds |
| Processing and Reconstruction Stage | 5.0931 seconds |
| Depth Map Generation Stage | 0.1412 seconds |
| Full System Execution | 41.8109 seconds |
From the table above, it can be seen that the total time required for the execution of the system is approximately 41.81 seconds.
3.6 Student's t Statistical Test
With the data presented, a Student's t-test was carried out to determine whether there are significant differences between the two groups of collected data (taken with the laser meter and with the proposed structured light system). The test was performed using the Excel data analysis tool, with a 95% confidence level (that is, with an alpha of 0.05). Two contexts were considered for the acquisition of the Stairs scene. The first consisted of evaluating the similarity between the data taken on the first line of steps (Eb to Ea8), yielding the results presented in Table 2. In this test, the resulting two-tailed p-value is 0.075225; since it is greater than the defined alpha, it can be concluded that there is no evidence of a significant difference between the samples taken on the first line of steps with the two measurement systems.
The same procedure was applied considering all the steps in the scene. The results are presented in Table 3; in this case the p-value is less than the defined alpha, which implies a significant difference between the values taken with the two measuring instruments.
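For reference, an equivalent paired two-tailed test can be run directly in Matlab with the Statistics and Machine Learning Toolbox; the vectors below are hypothetical demo values, not the measurements reported in the paper.

```matlab
% Minimal sketch: paired two-tailed t-test, equivalent to the Excel analysis.
depthSL    = [540 552 561 575 590 604 618 633 640];  % demo values (mm)
depthLaser = [542 551 563 578 592 607 621 636 644];  % demo values (mm)
[h, p] = ttest(depthSL, depthLaser, 'Alpha', 0.05);  % paired t-test
% h = 0 means no evidence of a significant difference at the 5% level;
% p is the two-tailed p-value compared against alpha = 0.05.
fprintf('h = %d, p = %.6f\n', h, p);
```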
Table 2: Results of the Student's t-test applied to the first line of the Stairs scene

| First Line of Stairs | Depth with Implemented System (mm) | Depth with Laser meter (mm) |
| Mean | 563.2667 | 566.4444 |
| Observations | 9 | 9 |
| Pearson's correlation coefficient | 0.9887 | |
| Degrees of freedom | 8 | |
| P(T<=t) two tails | 0.075225 | |
Table 3: Results of the Student's t-test applied to all samples of the Stairs scene

| All Samples of the Stairs Scene | Depth with Implemented System (mm) | Depth with Laser meter (mm) |
| Mean | 614.5346 | 640.7308 |
| Observations | 26 | 26 |
| Pearson's correlation coefficient | 0.9965 | |
| Degrees of freedom | 25 | |
| P(T<=t) two tails | 0.000009 | |
4. CONCLUSIONS
This project provides basic parameters for the development of a depth map generation system, capable of serving as an alternative for determining distances in a scene, generating a plane of Z values that can be used in subsequent applications where the distances of the mentioned map are required.
It can be concluded that the system presents a good response for scenes with depth variations between 540 mm and 640 mm, presenting errors of less than 5.5% in that range. This error grows gradually with depth, following a second-order polynomial trend curve.
The development of distance determination systems through depth maps offers a much more complete data alternative than measurements made with single-point digital laser measurement systems, since it generates information on the various depths at multiple coordinate points in the scene.
It was observed that the shadows cast by the scene itself create areas where the coded structured light stripes cannot be interpreted. For this reason, erroneous depth measurements can be produced over the shadowed areas.
The total processing time of the system is close to 42 seconds, of which approximately 35 seconds are spent in the sequence generation and acquisition process.
ACKNOWLEDGMENTS
Special thanks go to Dr. Gabriel Taubin, Professor and Researcher in the Department of Engineering and Computer Science at Brown University, who, thanks to his interest in topics such as Applied Computational Geometry, Computer Graphics, Digital Geometry Processing, and Computer Vision, kindly and selflessly provided a large amount of information and code that was used and adapted to carry out this work successfully.
REFERENCES
Akleman, E., Xing, Q., Garigipati, P., Taubin, G., Chen, J., & Hu, S. (2015). Hamiltonian cycle art: Surface covering wire sculptures and duotone surfaces. Computers and Graphics, 37, 316–332. https://doi.org/10.1016/j.cag.2013.01.004
Andaló, F. A., Taubin, G., & Goldenstein, S. (2015). Efficient height measurements in single images
based on the detection of vanishing points. Computer Vision and Image Understanding, 138, 51–60.
https://doi.org/10.1016/j.cviu.2015.03.017
Avendano, J., Ramos, P. J. J., & Prieto, F. A. A. (2017). A system for classifying vegetative
structures on coffee branches based on videos recorded in the field by a mobile device. Expert Systems with
Applications, 88, 178–192. https://doi.org/10.1016/j.eswa.2017.06.044
Bouguet, J.-Y. (n.d.). Camera Calibration Toolbox for Matlab. Retrieved October 4, 2018, from
http://www.vision.caltech.edu/bouguetj/calib_doc/
Falcao, G., Hurtos, N., & Massich, J. (2008). Plane-based calibration of a projector-camera system.
Godin, G., Hébert, P., Masuda, T., & Taubin, G. (2009). Special issue on new advances in 3D imaging and modeling. Computer Vision and Image Understanding, 113, 1105–1106. https://doi.org/10.1016/j.cviu.2009.09.007
Herrero-Huerta, M., González-Aguilera, D., Rodriguez-Gonzalvez, P., & Hernández-López, D. (2015).
Vineyard yield estimation by automatic 3D bunch modelling in field conditions. Computers and Electronics in
Agriculture, 110, 17–26. https://doi.org/10.1016/j.compag.2014.10.003
Ivorra Martínez, E. (2015). Desarrollo de técnicas de visión hiperespectral y tridimensional para el
sector agroalimentario. Universidad Politécnica de Valencia.
Lanman, D., Crispell, D., & Taubin, G. (2009). Surround structured lighting: 3-D scanning with
orthographic illumination. Computer Vision and Image Understanding, 113, 1107–1117.
https://doi.org/10.1016/j.cviu.2009.03.016
Lanman, D., & Taubin, G. (2009). Build Your Own 3D Scanner: 3D Photography for Beginners. Siggraph, 94. https://doi.org/10.1145/1665817.1665819
Maurice, X., Graebling, P., & Doignon, C. (2011). Epipolar Based Structured Light Pattern Design for
3-D Reconstruction of Moving Surfaces. IEEE International Conference on Robotics and Automation Shanghai
International Conference Center May 9-13, 2011, Shanghai, China, 5301–5308.
Mera, C., Orozco-Alzate, M., Branch, J., & Mery, D. (2016). Automatic visual inspection: An approach
with multi-instance learning. Computers in Industry, 83, 46–54.
https://doi.org/10.1016/j.compind.2016.09.002
Montalto, A., Graziosi, S., Bordegoni, M., & Di Landro, L. (2016). An inspection system to master
dimensional and technological variability of fashion-related products: A case study in the eyewear industry.
https://doi.org/10.1016/j.compind.2016.09.007
Oh, J., Lee, C., Lee, S., Jung, S., Kim, D., & Lee, S. (2010). Development of a Structured-light
Sensor Based Bin-Picking System Using ICP Algorithm. International Conference on Control, Automation and
Systems 2010, Oct. 27-30, 2010 in Kintex, Gyeonggi-Do, Korea, 1673–1677.
Pardo-Beainy, C., Gutiérrez-Cáceres, E., Pardo, D., Medina, M., & Jiménez, F. (2020). Sistema de
Interacción con Kinect Aplicado a Manipulación de Procesos. Revista Colombiana de Tecnologías de Avanzada
(RCTA), Ed. Especial, 11–16. https://doi.org/10.24054/16927257.VESPECIAL.NESPECIAL.2020.849
Parmehr, E. G., Fraser, C. S., Zhang, C., & Leach, J. (2014). Automatic registration of optical
imagery with 3D LiDAR data using statistical similarity. ISPRS Journal of Photogrammetry and Remote Sensing,
88, 28–40. https://doi.org/10.1016/j.isprsjprs.2013.11.015
Saiz Muñoz, M. (2010). Reconstrucción Tridimensional Mediante Visión Estéreo y Técnicas de
Optimización. Universidad Pontificia Comillas.
Verdú, S., Ivorra, E., Sánchez, A. J., Girón, J., Barat, J. M., & Grau, R. (2013). Comparison of TOF
and SL techniques for in-line measurement of food item volume using animal and vegetable tissues. Food
Control, 33(1), 221–226.
Young, M., Beeson, E., Davis, J., Rusinkiewicz, S., & Ramamoorthi, R. (2007). Viewpoint-Coded
Structured Light. 2007 IEEE Conference on Computer Vision and Pattern Recognition, 1–8.
https://doi.org/10.1109/CVPR.2007.383292
Zeng, Q., Martin, R. R., Wang, L., Quinn, J. A., Sun, Y., & Tu, C. (2014). Region-based bas-relief
generation from a single image. Graphical Models, 76, 140–151. https://doi.org/10.1016/j.gmod.2013.10.001
Zhao, Y., & Taubin, G. (2011). Chapter 31 - Real-Time Stereo on GPGPU Using Progressive
Multiresolution Adaptive Windows. GPU Computing Gems, 473–495.
https://doi.org/10.1016/B978-0-12-384988-5.00031-0