The symposium was a great success with 82 participants.
Thank you very much!
To help us estimate the number of participants, please register for the event (free of charge).
U.A. and Helen Whitaker Professor of Robotics
Interim Director, The Robotics Institute
Carnegie Mellon University
Abstract:
In recent years, computer vision has made remarkable strides, yet it remains largely confined to what our eyes can see. In this talk, we step into a world beyond the visible spectrum, where light, sound, and heat converge to unlock new possibilities in computer vision. By harnessing ultrasound to control light, we can image deeper and with higher resolution, overcoming tissue scattering and enabling ultra-fast optical scanning without moving parts. By examining the interplay between light and heat transport, we can reconstruct object shapes regardless of their visible reflectances and scene lighting, resolving long-standing ambiguities in physics-based vision. By seeing extremely tiny vibrations, we turn our visual cameras into audio and mechanical sensors. Our research has practical applications across various scientific domains, from detecting cancerous tumors to measuring plant photosynthesis to detecting material defects. These research projects are the result of collaborations with my fantastic faculty colleagues, postdocs, and students.
Bio:
Srinivasa Narasimhan is the Interim Director and the U.A. and Helen Whitaker Professor of the Robotics Institute at Carnegie Mellon University. He also served as Interim Director of the Robotics Institute from Aug 2019 to Dec 2021. He obtained his PhD from Columbia University in Dec 2003. His group focuses on novel techniques for imaging and illumination to enable applications in vision, graphics, robotics, agriculture, intelligent transportation, and medical imaging. His work has received over a dozen Best Paper, Best Demo, or Honorable Mention awards at major conferences [IV (2021), ICCV (2013), CVPR (2022, 2019, 2015, 2000), ICCP (2020, 2015, 2012), I3D (2013), CVPR/ICCV Workshops (2007, 2009)]. In addition, he has received the Ford URP Award (2013), the Okawa Research Grant (2009), and the NSF CAREER Award (2007). He is the co-inventor of programmable headlights, the Aqualux 3D display, Assorted-pixels, Motion-aware cameras, Episcan360, Episcan3D, EpiToF3D, and programmable triangulation light curtains. He co-chaired the International Symposium on Volumetric Scattering in Vision and Graphics in 2007, the IEEE Workshop on Projector-Camera Systems (PROCAMS) in 2010, and the IEEE International Conference on Computational Photography (ICCP) in 2011, co-edited a special journal issue on Computational Photography, served on the editorial board of the International Journal of Computer Vision (2009-2023), and frequently serves as Senior or Lead Area Chair of top computer vision conferences (CVPR, ICCV, ECCV, BMVC, ACCV, 3DV).
Professor
University of California, Merced
Google DeepMind
Abstract:
Recent advances in vision and language models have significantly improved 3D and 4D generation tasks. In this talk, I will present our latest research on 4D foundation models, view synthesis, real-time deformable 4D reconstruction from monocular video, reanimating deformable 3D reconstruction, and instant 4D scene inpainting. Time permitting, I will also present our most recent findings on feed-forward Gaussian splatting.
Bio:
Ming-Hsuan Yang is a Professor at the University of California, Merced, and a Research Scientist at Google DeepMind. His research has received numerous honors, including the Google Faculty Award (2009), the NSF CAREER Award (2012), NVIDIA Pioneer Research Awards (2017, 2018), and the Sony Faculty Award (2025). He has received Best Paper Honorable Mentions at UIST 2017 and CVPR 2018, the Best Student Paper Honorable Mention at ACCV 2018, the Longuet-Higgins Prize (Test-of-Time Award) at CVPR 2023, the Best Paper Award at ICML 2024, and the Test-of-Time Award from WACV 2025. He has been recognized as a Highly Cited Researcher from 2018 to 2025. Prof. Yang currently serves as Associate Editor-in-Chief of IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and as an Associate Editor of the International Journal of Computer Vision (IJCV) and Transactions on Machine Learning Research (TMLR). He is a Fellow of IEEE, ACM, AAAI, and AAAS.
Ph.D. student
Columbia University
Abstract:
Conventional cameras produce high-resolution images using millions of pixels. As a result, they make significantly more measurements than needed to solve lightweight vision tasks. I will present the minimalist camera, which uses a small number of “freeform pixels” whose shapes are automatically designed to be the most information-rich for the task at hand. We show that a minimalist camera can be used to monitor an indoor space with 6 pixels, estimate traffic flow with 8 pixels, and compute robot odometry with 4 pixels. Since a minimalist camera uses a very small number of measurements (freeform pixels), it preserves privacy and can be fully powered using just the light falling on it.
Next, I will present an “irradiance camera,” which, for any environmental illumination, measures the irradiance incident on every point on a sphere. We show that this irradiance function can be accurately estimated using just 49 detectors. Since the number of measurements is small, we show that the camera can produce video of the irradiance function while being entirely self-powered. We conclude with our plans to use the camera to compute egomotion, solve lightweight vision tasks, and estimate sky and weather conditions.
Bio:
Jeremy is a fourth-year Ph.D. student in the Computer Science Department at Columbia University, advised by Shree Nayar. Jeremy received his BS and MS from CMU in Electrical and Computer Engineering. His research explores visual sensing methods that capture the least information necessary to solve a task. He is supported by an NDSEG fellowship, and his work received the Best Paper Award at ECCV 2024.
Ph.D. student
Columbia University
Abstract:
Our ability as humans to recognize materials is critical to every action we take. Using vision alone, we can infer whether an object will be heavy or light, rough or smooth, and even rigid or soft -- each of which determines how we interact with the object. I will present an approach to material recognition that leverages a taxonomy of materials, which is arranged by shared mechanical properties. Our recognition model explicitly wires hierarchical relationships between materials to achieve higher performance. Due to the hierarchical nature of our approach, we can recognize materials and their properties at different levels of specificity depending on the context and confidence.
While appearance conveys class-level properties of a material, touch can reveal instance-level properties. In the second part of my talk, I will present how we enable tactile robotic systems to perceive materials in real time. We show that, through simple tactile signals, we can recover the mechanical properties of an object while grasping it and adjust the force we are using to grasp it. This allows us to use the minimum force required to grasp and lift the object, thereby mitigating the risk of damage. We conclude by showing how our approach can be used to differentiate and sort objects, for example, arranging avocados by their level of ripeness.
Bio:
Matt is a third-year Ph.D. student in the Computer Science Department at Columbia University, advised by Shree Nayar. Matt received his BS and MEng degrees from MIT in Electrical Engineering and Computer Science. His research focuses on understanding the material properties of our lived environment and developing autonomous systems that leverage this knowledge.
Sr. Principal Research Manager
Microsoft Research Asia – Tokyo
Abstract:
Recent progress in AI has been driven by large-scale data and models for vision and language. Compared to vision and language AI, Embodied AI - AI systems that interact with the physical world - remains constrained by limited sensing capabilities and data scarcity. This talk examines Embodied AI through the lens of "from sensing to acting," highlighting the current chief bottlenecks in sensing and action learning and discussing our path toward Embodied AI that can cooperate with us in the real world.
Bio:
Yasuyuki Matsushita is a founding member of Microsoft Research Asia - Tokyo and has headed it since 2024. He received his B.S., M.S., and Ph.D. degrees in EECS from the University of Tokyo in 1998, 2000, and 2003, respectively. From April 2003 to March 2015, he was with the Visual Computing group at Microsoft Research Asia. From April 2015 to September 2024, he was a Professor at Osaka University. He is interested in Computer Vision and Embodied AI. He is an Editor-in-Chief of the International Journal of Computer Vision. He served as a Program Co-Chair for ICCV 2017 and a General Co-Chair for ICCV 2021. He won the Osaka Science Prize in 2022 and received the Commendation for Science and Technology from the Minister of Education, Culture, Sports, Science and Technology in 2025. He is a Fellow of IEEE and a member of IPSJ.
Auditorium (講堂) on the ground floor of the Administration Building (Bldg A/管理棟) of SANKEN, Suita Campus, the University of Osaka.
Approximately a 20-minute walk from Kita-senri (北千里) Station (Hankyu Senri Line) and Handaibyouin-Mae (阪大病院前) Station (Osaka Monorail Saito Line).
How to get to SANKEN?
https://www.sanken.osaka-u.ac.jp/en/access.html
Closed.
This symposium is supported by JST ASPIRE JPMJAP2502.