Computer Vision

Ever since the release of Kinect, I have been fascinated by the possibilities of computer vision. I always seize the opportunity to create with it, as long as it does not invade anyone's privacy. I have had the privilege of working on many projects, which I like to call "happy surveillance." Although I am able to see people, I never use this information in a harmful way. My goal is always to create delightful experiences, whether it be playing with computer-generated snow, dancing with robots, or walking through water.

Image descriptions generated by OpenAI's GPT-4.

GPT-4, OpenAI's latest language model, is capable of describing images, a technology with real potential to make content more accessible. Below you can see how the AI describes these images with minimal prompting. The only nudge I gave the model was to move past its limitations in describing humans, by emphasizing that the purpose was to aid the visually impaired.
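
For context, generating descriptions like the ones below takes a single vision request. Here is a minimal sketch using the OpenAI Python SDK; the model name, prompt wording, and image URL are illustrative assumptions, not the exact setup used for this page.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Model name, prompt, and image URL are illustrative placeholders
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this image in detail. The description will be "
                     "read by visually impaired visitors, so include any people "
                     "you see."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/project-screenshot.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```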

True Sight vision system
Gesture tracking tests

The image you've provided appears to be a screenshot of a computer interface with multiple windows and applications open. Here's a breakdown of its main elements:

  1. Main Grid View: The left two-thirds of the screenshot displays a grid with colored rectangular divisions and numbered sections. There are circles within the grid that might represent objects or markers. This could be a layout for a physical environment or a digital representation of some spatial data.
  2. Top Right Grid View: In the top right corner, there's another grid view, possibly showing an overhead view of an area with various objects represented by different icons. These icons might symbolize real-world items or entities, such as robots, people, or obstacles.
  3. Multiple Windows at the Bottom: There are several smaller windows displaying various images and information. Some notable aspects include:
     - Centroid Image: An image showing a highlighted region labeled "centroid."
     - Dilated Image: A binary image depicting a shape that appears to be processed using dilation, a morphological operation commonly used in image processing.
     - Contour Mask: Another binary image, likely showcasing contours or boundary detection.
  4. Console Window: On the bottom right, there's a console window displaying logs and command-line outputs. Some lines seem to be related to Python scripts, with mentions of particular modules or functionalities. This suggests that the user might be running or debugging some code.
  5. Other Interface Elements: There are various other interface elements, including icons for apps, a file transfer dialog, and more. Several Python scripts are visible in a list, hinting at a development or testing environment.

Overall, the screenshot appears to be from a technical setup, possibly related to robotics, image processing, or spatial analysis.
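
The centroid, dilated image, and contour mask called out above correspond to a standard OpenCV pipeline. A minimal sketch of how such debug frames are typically produced, assuming a grayscale input and a hypothetical file name:

```python
import cv2
import numpy as np

# Load a frame and threshold it to a binary mask (file name is hypothetical)
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(frame, 127, 255, cv2.THRESH_BINARY)

# Dilation grows the white regions, closing small gaps before contour detection
kernel = np.ones((5, 5), np.uint8)
dilated = cv2.dilate(mask, kernel, iterations=2)

# Find contours and compute the centroid of the largest blob (OpenCV 4.x API)
contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] > 0:
        cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
        print(f"centroid: ({cx}, {cy})")
```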

Description of a triptych of images. Notably, in each image there's a human figure overlaid with a skeletal structure, which is commonly used in computer vision and robotics for pose estimation. Pose estimation is a technique for detecting the position and orientation of an object, in this case a human, using algorithms.

Here's a breakdown:

  1. Laboratory Setting: Each image showcases a lab with various pieces of equipment, computers, and tools. This suggests that the setting might be a research lab, possibly related to robotics, electronics, or computer science.
  2. Pose Estimation: The colorful lines and points superimposed on the human figure represent the skeletal structure and joints of the person. Each color corresponds to a different part of the body. This is indicative of pose estimation, where the computer tries to understand the position and posture of the human figure in real-time. Such technology is useful in a wide range of applications, from human-computer interaction to sports analysis.
  3. Motion Sequence: The sequence of images seems to capture the motion of the individual, from standing upright to extending an arm. This might be a demonstration or test of the pose estimation algorithm's accuracy and responsiveness.
  4. Watermark: There's a watermark on each image reading "www.openpnprefactored.io - complex problem of robotics." This suggests that the images might be related to the OpenPnP project, which is an open-source platform for pick-and-place machines, or it could be associated with another project that uses the OpenPnP name.

Overall, these images seem to demonstrate the capabilities of a pose estimation algorithm in a laboratory setting.
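
The colored skeleton described above is typical real-time pose estimation output. The exact stack behind these tests isn't shown, but a minimal sketch with MediaPipe's Pose solution produces a similar overlay from a webcam feed:

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures in BGR
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # Draw the joints and bones, like the colored skeleton in the images
            mp_drawing.draw_landmarks(frame, results.pose_landmarks,
                                      mp_pose.POSE_CONNECTIONS)
        cv2.imshow("pose", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
cap.release()
cv2.destroyAllWindows()
```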

Perspective calibration work
Assembly of dual thermal and RGB cameras

The image showcases a computer screen displaying a computer vision processing task, which seems to be related to calibration or object detection using OpenCV (a popular open-source computer vision library). Here's a breakdown:

1. Detection of Circular Patterns: The image shows two software windows displaying a board with circular patterns (probably green dots). One window ('cam0_frame') displays the original or unprocessed image of the board, while the other ('cam0_output') displays the processed image where each green dot is identified and surrounded by a green square with corresponding coordinates.

2. Coordinates: The processed image labels each detected green dot with its x and y coordinates. This data can be useful for various purposes, such as camera calibration, where these reference points help in determining the intrinsic and extrinsic parameters of a camera.

3. Python Script with OpenCV: The bottom of the screen displays a segment of a Python script. This script utilizes the OpenCV library, as indicated by the 'cv2' prefix in the function calls. The script appears to perform actions like undistorting the image (`cv2.undistort`), cropping the image, and possibly drawing circles on the image (`cv2.circle`).

4. Error Message: On the right side of the Python script, there seems to be an error message or traceback in the terminal. This indicates that the script encountered an issue during execution. The details of the error might provide insights into what went wrong, and it's common for developers to debug and make corrections based on these messages.

5. Application/Environment: The visual interface and the terminal suggest that the user might be using an IDE or code editor (possibly Visual Studio Code or another popular code editor) on a Windows operating system.

Overall, the image seems to depict a computer vision task, likely related to camera calibration or object detection, being executed in a development environment using the OpenCV library in Python.
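
The circle-grid board, per-dot coordinates, and `cv2.undistort` call noted above are the usual ingredients of OpenCV camera calibration. A minimal sketch, assuming a symmetric circle grid and hypothetical frame file names; the actual board geometry and detection code may differ:

```python
import cv2
import numpy as np

# Board geometry is an assumption; adjust to the real grid
pattern_size = (7, 5)   # circles per row, per column
spacing = 0.03          # distance between circle centers, in meters

# Ideal 3D positions of the circle centers on a flat board (z = 0)
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * spacing

obj_points, img_points = [], []
for path in ["cam0_01.png", "cam0_02.png"]:  # hypothetical calibration frames
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, centers = cv2.findCirclesGrid(gray, pattern_size,
                                         flags=cv2.CALIB_CB_SYMMETRIC_GRID)
    if found:
        obj_points.append(objp)
        img_points.append(centers)

# Solve for the intrinsic matrix and lens distortion, then undistort a frame
_, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points,
                                       gray.shape[::-1], None, None)
undistorted = cv2.undistort(gray, K, dist)
```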

The image showcases an electronics workshop or lab environment with the following elements:

1. Array of Electronic Devices: In the foreground, there's a line of devices, possibly prototypes, each equipped with electronic components, ribbon cables, and boards. The boards appear to be microcontrollers or single-board computers, which might be used for various applications, from robotics to IoT (Internet of Things) projects.

2. Workspace Background: On the wall behind, there's a pegboard with various tools, components, and equipment organized on it. This kind of organization is common in workshops where quick access to frequently used tools is necessary.

3. Workspace Furniture: There's a tool chest or drawer unit, which is another common furniture piece in workshops for storing tools, components, and other necessary items. In the distance, you can also spot chairs, indicating a workspace area.

4. General Environment: The overall setting, with its neatly organized tools, components, and prototypes in progress, paints the picture of a workspace dedicated to electronics, hardware development, or prototyping.

In summary, the image depicts a workshop or lab focused on electronic projects, where devices are being developed or tested. The organized setup suggests a professional or semi-professional environment dedicated to electronics or related fields.

An overhead shot of a room with a concrete or stone-textured floor. In the center of the image, two individuals are seen from a bird's-eye view. One person is sitting next to an open laptop, while the other stands closely nearby. Both are focused on the laptop screen. The surrounding room features smooth walls, with a partial view of two walls meeting in a corner on the left. The lighting casts a soft reflection on the floor.

The image showcases two sections of software interfaces. On the left, there's a window titled 'Hand tracking' displaying a black background with the text 'FPS=27.70' at the top, indicating the frames per second. Below the text, there's a graphical representation of a hand in an orange outline with points marking various parts of the hand, suggesting some form of hand tracking functionality.

On the right, there's another window with the title 'HAL-9000-Default'. This window displays a 'PREVIEW' toggle button turned on. The preview itself showcases an underwater scene where a person is seen swimming amidst vibrant blue and pink colors, possibly from lights reflecting in the water. The bottom of this preview window has a play/pause control button.
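
The hand outline, landmark points, and FPS readout in the 'Hand tracking' window are characteristic of real-time hand tracking. A minimal sketch using MediaPipe Hands, which may not be the library behind this particular window, reproduces the same kind of overlay and FPS counter:

```python
import time
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
prev = time.time()
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        # FPS readout, similar to the "FPS=27.70" text in the window
        now = time.time()
        fps = 1.0 / max(now - prev, 1e-6)
        prev = now
        cv2.putText(frame, f"FPS={fps:.2f}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 165, 255), 2)
        cv2.imshow("Hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
cap.release()
cv2.destroyAllWindows()
```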