This page lists my 30+ research publications, including IEEE/ACM/CVF/arXiv papers.
Tiny object detection is a challenging task. Many datasets for this task have been released in recent years, spanning from natural scenes to remote sensing images. However, wind turbines in satellite images, a significant category of tiny objects, have not been well covered. To help complete the landscape of tiny object datasets, we release TinyWT, a large-scale year-round tiny wind turbine dataset of satellite images. It contains 8k+ images and 700k+ annotations in total, with very tiny objects of 3-6 pixels, refined through extensive human correction. Unlike other tiny object datasets of aerial/satellite images that are limited to academic research only, our dataset is free for commercial use. Every pixel's geographic coordinates are also explicitly extracted for researchers without related domain knowledge. Meanwhile, we reposition tiny object detection as a localizing-and-counting problem, incorporate segmentation techniques, and propose a novel design that exploits the strengths of a contextual similarity constraint and supervised contrastive learning. Experimental results for both baseline models (CNN-based and Transformer-based) and our design are presented. Without bells and whistles, our design effectively improves the baseline models' performance, achieving a maximum mIoU gain of 4.94%, where 21.15% of false negatives are recalled and 22.02% of false positives are removed. TinyWT is available at https://github.com/MingyeZhu123/TinyWT-dataset.
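For readers who want a concrete starting point, below is a minimal sketch of the standard supervised contrastive loss that the design builds on. It is not the paper's full method (the contextual similarity constraint is omitted), and the function name and temperature value are illustrative.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Standard supervised contrastive loss (Khosla et al., 2020) sketch.
    features: (N, D) embeddings; labels: (N,) integer class labels."""
    features = F.normalize(features, dim=1)          # work in cosine space
    sim = features @ features.T / temperature        # (N, N) pairwise logits
    n = features.size(0)
    not_self = ~torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
    sim = sim.masked_fill(~not_self, float('-inf'))  # drop self-comparisons
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0                           # anchors with positives
    sum_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1)
    mean_log_prob_pos = sum_pos[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```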
Cropland segmentation of satellite images is an essential basis for crop area and yield estimation in the remote sensing and computer vision interdisciplinary community. Instead of common pixel-level segmentation results with salt-and-pepper effects, clients require a parcel-level output conforming to human recognition during model deployment. However, leveraging CNN-based models requires fine-grained parcel-level labels, an unacceptable annotation burden. To address these practical pain points, in this paper, we present PARCS, a holistic deployment-oriented AI system for PARcel-level Cropland Segmentation. By consolidating multi-disciplinary knowledge, PARCS has two algorithm branches. The first branch performs pixel-level crop segmentation by learning from limited labeled pixel samples with an active learning strategy, avoiding parcel-level annotation costs. The second branch generates the parcel regions without a learning procedure. The final parcel-level segmentation result is achieved by integrating the outputs of these two branches in tandem. The robust effectiveness of PARCS is demonstrated by its outstanding performance on public and in-house datasets (an overall accuracy of 85.3% and an mIoU of 61.7% on the public PASTIS dataset, and an mIoU of 65.16% on the in-house dataset). We also include subjective feedback from clients and discuss the lessons learned from deployment.
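PARCS's specific active learning strategy is detailed in the paper itself; as a generic illustration of the idea of spending the pixel-labeling budget where the model is least certain, here is a hedged margin-sampling sketch (the function name and margin criterion are assumptions, not PARCS internals).

```python
import numpy as np

def select_pixels_to_label(probs, budget):
    """Margin-based active learning: request labels for the candidate pixels
    whose top-1 and top-2 class probabilities are closest.
    probs: (N, C) softmax outputs for N unlabeled pixel samples."""
    top2 = np.sort(probs, axis=1)[:, -2:]      # two largest probabilities
    margin = top2[:, 1] - top2[:, 0]           # small margin = uncertain
    return np.argsort(margin)[:budget]         # indices to send for labeling
```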
Automated analysis of remote sensing (RS) imagery is the key to monitoring global issues. Hundreds of satellites collect plentiful RS data on a daily basis. However, most images remain unlabeled, so supervised learning algorithms are unable to make full use of the massive amounts of RS data. To address this issue, we leverage the benefits of generative methods and build a multispectral masked autoencoder (MAE) to learn RS representations from RGB and Near-infrared (RGBN) data. The results indicate that the features extracted from RS images are more effective than those from natural images for RS tasks. This domain gap means that self-supervised pre-training on RS imagery generalizes better to RS tasks. Moreover, the multispectral feature learned with a near-infrared signal increases the top-1 validation accuracy by 3.8%, showing that the multispectral feature is crucial in RS representation learning.
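As a sketch of the masked-autoencoder mechanics, the snippet below shows the standard random patch masking, assuming the RGBN image has already been tokenized into patch embeddings upstream; the 75% mask ratio follows common MAE practice rather than this paper's exact configuration.

```python
import torch

def random_masking(tokens, mask_ratio=0.75):
    """MAE-style random masking. tokens: (B, L, D) patch embeddings.
    Returns visible tokens, a binary mask (1 = masked), and restore indices."""
    B, L, D = tokens.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(B, L)                   # one random key per token
    ids_shuffle = noise.argsort(dim=1)         # random permutation per sample
    ids_restore = ids_shuffle.argsort(dim=1)   # inverse permutation
    ids_keep = ids_shuffle[:, :len_keep]
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, L)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)  # back to original token order
    return visible, mask, ids_restore
```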
Earth observation satellites have been continuously monitoring the earth environment for years at different locations and spectral bands with different modalities. Due to complex satellite sensing conditions (e.g., weather, cloud, atmosphere, orbit), some observations for certain modalities, bands, locations, and times may not be available. The MultiEarth Matrix Completion Challenge in CVPR 2022 [1] provides multimodal satellite data for addressing such data sparsity challenges, with the Amazon Rainforest as the region of interest. This work proposes an adaptive real-time multimodal regression and generation framework and achieves superior performance on unseen test queries in this challenge, with an LPIPS of 0.2226, a PSNR of 123.0372, and an SSIM of 0.6347.
The MultiEarth 2022 Image-to-Image Translation challenge provides a well-constrained test bed for generating the corresponding RGB Sentinel-2 imagery from given Sentinel-1 VV & VH imagery. In this challenge, we designed various generation models and found that the SPADE [1] and pix2pixHD [2] models produced our best results. In our self-evaluation, the SPADE-2 model with L1 loss achieves an MAE of 0.02194 and a PSNR of 31.092 dB. In our final submission, the best model achieves an MAE of 0.02795, ranking No. 1 on the leaderboard.
The Agriculture-Vision Challenge at CVPR is one of the most prominent and competitive challenges bridging the computer vision and agriculture sectors, aiming at agricultural pattern recognition from aerial images. In this paper, we propose our solution to the third Agriculture-Vision Challenge in CVPR 2022. We leverage a data pre-processing scheme, several Transformer-based models, and data augmentation techniques to achieve an mIoU of 0.582, securing 2nd place in this challenge.
Single image dehazing is a challenging vision problem aiming to provide clear images for downstream computer vision applications (e.g., semantic segmentation, object detection, and super-resolution). Most existing methods leverage either the physical scattering model or convolutional neural networks (CNNs) for haze removal, yet ignore the complementary advantages between the two. In particular, lacking marginal and visual prior instructions, CNN-based methods still fall short in detail and color recovery. To address this, we propose a prior-based dehazing GAN network with decoupling ability (PDD-GAN), which is built on PeleeNet with an attention module (CBAM). The prior-based decoupling approach consists of two parts: high- and low-frequency filtering and an HSV contrastive loss. We process the image via a band-stop filter and add the result as a fourth channel (RGBFHL) to decouple the hazy image at the structural level. Besides, a novel prior loss with contrastive regularization is proposed at the visual level. Extensive experiments demonstrate that PDD-GAN outperforms state-of-the-art methods by up to 0.86 dB in PSNR. In particular, RGBFHL improves PSNR by 0.99 dB over the original three-channel data (RGB), and the extra HSV prior loss adds another 2.0 dB. Above all, PDD-GAN indeed has the decoupling ability and improves dehazing results.
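A rough sketch of the band-stop filtering idea behind the fourth channel: suppress a mid-frequency band in the Fourier domain and keep the rest. The single-channel input convention and the cutoff fractions are illustrative assumptions; the paper's actual filter design may differ.

```python
import numpy as np

def bandstop_channel(gray, low=0.05, high=0.35):
    """Suppress normalized radial frequencies in [low, high].
    gray: (H, W) float image in [0, 1]; returns the filtered channel."""
    H, W = gray.shape
    spec = np.fft.fftshift(np.fft.fft2(gray))
    yy, xx = np.mgrid[0:H, 0:W]
    radius = np.hypot((yy - H / 2) / H, (xx - W / 2) / W)
    keep = ~((radius >= low) & (radius <= high))   # pass lows and highs only
    out = np.fft.ifft2(np.fft.ifftshift(spec * keep)).real
    return np.clip(out, 0.0, 1.0)

# Hypothetical usage: stack as the extra channel of an RGBFHL-style input.
# rgbf = np.dstack([rgb, bandstop_channel(rgb.mean(axis=2))])
```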
Accurate parcel segmentation of remote sensing images plays an important role in supporting various downstream tasks. Traditionally, parcel segmentation is based on supervised learning using precise parcel-level ground truth information, which is difficult to obtain. In this paper, we propose an end-to-end unsupervised Graph Convolutional Network (GCN)-based framework for superpixel-driven parcel segmentation of remote sensing images. The key component is a novel graph-based superpixel aggregation model, which effectively learns superpixels' latent affinities and better aggregates similar ones in the spatial and spectral spaces. We construct a multi-temporal multi-location testing dataset using Sentinel-2 images and ground truth annotations in four different regions. Extensive experiments demonstrate the efficacy and robustness of our proposed model, which achieves the best performance among the competing methods.
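The paper learns superpixel affinities with a GCN; as a much simpler stand-in that conveys the aggregation step, the sketch below greedily merges adjacent superpixels whose mean spectral features are similar. The function names and the cosine threshold are illustrative, not the paper's model.

```python
import numpy as np

def merge_superpixels(labels, features, threshold=0.95):
    """Greedy aggregation stand-in. labels: (H, W) superpixel id map;
    features: (n_superpixels, d) mean spectral feature per superpixel."""
    pairs = set()
    for a, b in ((labels[:, :-1], labels[:, 1:]),
                 (labels[:-1, :], labels[1:, :])):
        edge = a != b                          # horizontally/vertically adjacent
        pairs |= set(zip(a[edge].tolist(), b[edge].tolist()))
    parent = list(range(len(features)))
    def find(x):                               # union-find representative
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    for i, j in pairs:
        if f[i] @ f[j] > threshold:            # spectrally similar neighbors
            parent[find(i)] = find(j)          # merge the two parcels
    return np.vectorize(find)(labels)
```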
Fashion compatibility learning is important to many fashion markets, such as outfit composition and online fashion recommendation. Unlike previous work, we argue that fashion compatibility is not only a visual-appearance problem but also a theme-matters problem. An outfit, which consists of a set of fashion items (e.g., shirt, suit, shoes, etc.), may be considered compatible for a “dating” event, yet not for a “business” occasion. In this paper, we aim at solving the fashion compatibility problem given specific themes. To this end, we built the first real-world theme-aware fashion dataset, comprising around 14K outfits labeled with 32 themes. In this dataset, more than 40K fashion items are labeled with 152 fine-grained categories. We also propose an attention model that learns fashion compatibility given a specific theme. It starts with category-specific subspace learning, which projects compatible outfit items in certain categories to be close in the subspace. Thanks to strong connections between fashion themes and categories, we then build a theme-attention model over the category-specific embedding space. This model associates themes with the pairwise compatibilities through attention, and thus computes the outfit-wise compatibility. To the best of our knowledge, this is the first attempt to estimate outfit compatibility conditioned on a theme. We conduct extensive qualitative and quantitative experiments on our new dataset, and our method outperforms state-of-the-art approaches.
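A deliberately small, hypothetical sketch of the theme-attention idea: the theme attends over category pairs, and the outfit score is the attention-weighted sum of pairwise compatibilities. The tensor shapes and the dot-product attention are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def outfit_score(pair_scores, pair_category_emb, theme_emb):
    """pair_scores: (P,) pairwise compatibilities of item pairs in an outfit;
    pair_category_emb: (P, d) embedding of each category pair;
    theme_emb: (d,) embedding of the given theme."""
    attn = F.softmax(pair_category_emb @ theme_emb, dim=0)  # (P,) weights
    return (attn * pair_scores).sum()                       # outfit-wise score
```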
To reduce the significant redundancy in deep Convolutional Neural Networks (CNNs), most existing methods prune neurons by only considering the statistics of an individual layer or two consecutive layers (e.g., pruning one layer to minimize the reconstruction error of the next layer), ignoring the effect of error propagation in deep networks. In contrast, we argue that it is essential to prune neurons in the entire network jointly based on a unified goal: minimizing the reconstruction error of important responses in the “final response layer” (FRL), which is the second-to-last layer before classification, for a pruned network to retain its predictive power. Specifically, we apply feature ranking techniques to measure the importance of each neuron in the FRL, formulate network pruning as a binary integer optimization problem, and derive a closed-form solution to it for pruning neurons in earlier layers. Based on our theoretical analysis, we propose the Neuron Importance Score Propagation (NISP) algorithm to propagate the importance scores of final responses to every neuron in the network. The CNN is pruned by removing neurons with the least importance, and then fine-tuned to retain its predictive power. NISP is evaluated on several datasets with multiple CNN models and demonstrated to achieve significant acceleration and compression with negligible accuracy loss.
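The propagation rule has a compact closed form: a neuron's importance is the absolute-weight-weighted sum of the importance scores of the neurons it feeds. Below is a minimal NumPy sketch for fully connected layers (convolutional layers require unrolling, omitted here); variable names are illustrative.

```python
import numpy as np

def propagate_importance(weights, frl_scores):
    """NISP-style backward propagation of importance scores.
    weights: list of (out_dim, in_dim) matrices, first layer to FRL;
    frl_scores: (out_dim,) importance of the final response layer."""
    scores = [None] * (len(weights) + 1)
    scores[-1] = frl_scores
    for l in range(len(weights) - 1, -1, -1):
        # Each input neuron inherits importance through absolute weights.
        scores[l] = np.abs(weights[l]).T @ scores[l + 1]
    return scores

# Prune: per layer, keep the top-k neurons by score, then fine-tune.
```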
Modern brain mapping techniques are producing increasingly large datasets of anatomical or functional connection patterns. Recently, it became possible to record detailed live imaging videos of a mammalian brain while the subject engages in routine activity. We analyze videos recorded from ten mice to describe how to detect neurons, extract neuron signals, map the correlation of neuron signals to mouse activity, detect the network topology of active neurons, and analyze network topology characteristics. We propose a neuron position alignment method to compensate for the distortion and movement of the cerebral cortex in the live mouse brain, and background luminance compensation to extract and model neuron activity. To infer the network topology, a cross-correlation-based method and a causal Bayesian network method are proposed and used for analysis. Afterwards, we conduct a preliminary analysis of the resulting network topologies. The significance of this paper lies in extracting neuron activities from live mouse brain imaging videos and in a network analysis method for studying their topology.
Modern brain mapping techniques are producing increasingly large datasets of anatomical or functional connection patterns. Recently, it became possible to record detailed live imaging videos of a mammalian brain while the subject engages in routine activity. We analyze a dataset of videos recorded from ten mice to describe how to detect neurons, extract neuron signals, map the correlation of neuron signals to mouse activity, detect the network topology of active neurons, and analyze network topology characteristics. We propose neuron position alignment to compensate for the distortion and movement of the cerebral cortex in the live mouse brain, and background luminance compensation to extract and model neuron activity. To infer the network topology as an undirected graph model, a cross-correlation-based method is proposed and used for analysis. Afterwards, we conduct a preliminary analysis of the resulting network topologies. The significance of this paper lies in extracting neuron activities from live mouse brain imaging videos and in a network analysis method that can potentially provide insight into how neurons are actively connected under stimulus, rather than analyzing static neural networks.
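As a minimal illustration of the cross-correlation analysis (reduced here to the zero-lag case), the sketch below links two neurons when the correlation of their activity traces exceeds a threshold; the threshold value and the zero-lag simplification are assumptions.

```python
import numpy as np

def correlation_network(signals, thresh=0.6):
    """Build an undirected functional network from neuron activity traces.
    signals: (n_neurons, T) array of extracted signals."""
    z = (signals - signals.mean(1, keepdims=True)) \
        / (signals.std(1, keepdims=True) + 1e-8)
    corr = z @ z.T / signals.shape[1]          # zero-lag Pearson correlations
    adj = np.abs(corr) > thresh                # strong positive or negative links
    np.fill_diagonal(adj, False)               # no self-loops
    return adj
```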
The advances of mobile computing and sensor technology have turned mobile devices into powerful instruments. The integration of thermal and visual cameras extends the capability of computer vision, since the two modalities reveal different characteristics of a scene; however, image alignment is a challenge. This paper proposes an effective approach to align image pairs for event detection on mobile devices through image recognition. We leverage thermal and visual cameras as multi-modality sources for image recognition. By analyzing the heat pattern, the proposed app can identify heating sources and help users inspect their house heating system; on the other hand, by applying image recognition, the proposed app can further help field workers identify asset conditions and provide guidance to solve their issues.
The difficulty of vision-based posture estimation is greatly decreased with the aid of commercial depth cameras, such as the Microsoft Kinect. However, there is still much to do to bridge the results of human posture estimation and the understanding of human movements. Human movement assessment is an important technique for exercise learning in the field of healthcare. In this paper, we propose an action tutor system which enables the user to interactively retrieve a learning exemplar of the target action movement and to immediately acquire motion instructions while learning it in front of the Kinect. The proposed system is composed of two stages. In the retrieval stage, nonlinear time warping algorithms are designed to retrieve video segments similar to the query movement roughly performed by the user. In the learning stage, the user learns according to the selected video exemplar, and the motion assessment, including both static and dynamic differences, is presented to the user in an effective and organized way, helping him/her to perform the action movement correctly. Experiments are conducted on videos of ten action types, and the results show that the proposed human action descriptor is representative for action video retrieval and that the tutor system can effectively help the user learn action movements.
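To make the retrieval stage concrete, here is a minimal dynamic time warping sketch over pose-feature sequences, one common non-linear time warping formulation; the paper's exact warping algorithms and features may differ.

```python
import numpy as np

def dtw_distance(query, candidate):
    """DTW between two sequences of pose features.
    query: (n, d) array; candidate: (m, d) array."""
    n, m = len(query), len(candidate)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(query[i - 1] - candidate[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Retrieval: rank stored video segments by dtw_distance(query, segment).
```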
Graph technologies have been widely utilized for building big data analytics systems. Since those systems are typically wrapped as service providers in industry, it is critical to handle concurrent queries at runtime by incorporating a set of parallel processing units. In many cases, such queries result in local subgraph traversals, which essentially require an efficient scheduling scheme to explore the tradeoff between workload balance and task affinity. In this paper, we present an auction-based approach for allocating concurrent subgraph traversals onto processors. A dynamic weighted bipartite graph is built to model both the affinity between subgraph traversals and processors and the workload of processors. In particular, an edge between a task and a processor in the bipartite graph represents that the data needed by this task is likely cached by this processor. Task vertices and edges are dynamically added or removed, and a heavier edge weight represents a stronger belief in the affinity. The edge weight is also governed by the current workload of the corresponding processor. We perform a parallel auction algorithm to compute a near-optimal assignment of subgraph traversal tasks onto processors, which therefore addresses both workload balance and task affinity. The auction algorithm is performed incrementally, so as to capture changes in the bipartite graph structure. Our experiments show the superior performance of the proposed method for various real-world use cases based on concurrent subgraph traversals.
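For intuition, below is a serial sketch of the classic auction assignment algorithm on a static benefit matrix; the paper's incremental, parallel variant over a dynamic bipartite graph (with affinity- and workload-driven weights) is not shown, and the square-matrix setup is a simplifying assumption.

```python
import numpy as np

def auction_assignment(benefit, eps=0.01):
    """Bertsekas-style auction: tasks (rows) bid for processors (columns)
    to maximize total benefit. benefit: (n, n) array, n >= 2."""
    n = benefit.shape[0]
    prices = np.zeros(n)
    owner = -np.ones(n, dtype=int)             # owner[j]: task holding proc j
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        values = benefit[i] - prices           # net value of each processor
        j = int(values.argmax())
        top2 = np.partition(values, -2)[-2:]   # [second best, best]
        prices[j] += top2[1] - top2[0] + eps   # bid up the winning processor
        if owner[j] != -1:
            unassigned.append(owner[j])        # evict the previous holder
        owner[j] = i
    return owner                               # processor -> task assignment
```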
Filtering is widely used in image and video processing for various applications. Recently, the guided filter was proposed and has become one of the most popular filtering methods. In this paper, to meet the computational demand of guided filtering on Full-HD video, a double integral image architecture for a guided filter ASIC design is proposed. In addition, a reformulation of the guided filter formula is proposed, which prevents errors resulting from truncation of the fractional part and allows the regularization parameter ε to be modified on the user's demand. The hardware architecture of the guided image filter is then proposed and can be embedded in mobile devices to achieve real-time HD applications. To the best of our knowledge, this work is also the first ASIC design for the guided image filter. With the TSMC 90nm cell library, the design can operate at 100MHz and support Full-HD (1920x1080) video at 30 fps with 92.9K gate counts and 3.2KB of on-chip memory. Moreover, in terms of hardware efficiency, our architecture also surpasses previous bilateral filter designs.
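For reference, a software sketch of the grayscale guided filter computed with integral images, i.e., the arithmetic the double integral image architecture accelerates in hardware; the radius and ε below are illustrative.

```python
import numpy as np

def box(img, r):
    """Mean filter of radius r using an integral image (summed-area table)."""
    H, W = img.shape
    s = np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    y0 = np.clip(np.arange(H) - r, 0, H)
    y1 = np.clip(np.arange(H) + r + 1, 0, H)
    x0 = np.clip(np.arange(W) - r, 0, W)
    x1 = np.clip(np.arange(W) + r + 1, 0, W)
    area = (y1 - y0)[:, None] * (x1 - x0)[None, :]
    return (s[y1][:, x1] - s[y1][:, x0] - s[y0][:, x1] + s[y0][:, x0]) / area

def guided_filter(I, p, r=8, eps=1e-3):
    """He et al.'s guided filter: q = mean(a)*I + mean(b), a = cov/(var+eps)."""
    mean_I, mean_p = box(I, r), box(p, r)
    var_I = box(I * I, r) - mean_I ** 2
    cov_Ip = box(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return box(a, r) * I + box(b, r)
```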
Efficient image query is a fundamental challenge in many large-scale multimedia applications, especially when handling many queries concurrently. In this paper, we propose a novel approach called graph local random walk for high-performance concurrent image query. Specifically, we organize the massive image set into a large-scale graph using a graph database, according to the similarity between images. A heuristic method is used to map each query image to some vertex in the graph, followed by a local search that refines the query results using a variant of local random walk on the graph. The local random walk process is essentially a weighted partial traversal of the local subgraphs to find a better match for the query images. We organize the graph of the image set in a parallelization-amenable way, so that a set of partial graph traversals for local random walks can be performed concurrently, taking advantage of the multithreading capability of processors. We implemented the proposed method on state-of-the-art multicore platforms. The experimental results show that the graph local random walk based approach outperforms baseline methods in terms of both throughput and scalability.
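A toy sketch of the local random walk from the mapped vertex: mass spreads along similarity-weighted edges with a restart back to the query vertex, and high-scoring vertices become refined matches. The adjacency format, restart probability, and step count are illustrative, and the concurrent partial-traversal machinery is omitted.

```python
def local_random_walk(adj, seed, restart=0.15, steps=20):
    """Random walk with restart over a local subgraph.
    adj: dict vertex -> list of (neighbor, similarity weight)."""
    scores = {seed: 1.0}
    for _ in range(steps):
        nxt = {seed: restart}                  # restart mass at the query
        for v, mass in scores.items():
            nbrs = adj.get(v, [])
            if not nbrs:                       # dangling vertex: return to seed
                nxt[seed] = nxt.get(seed, 0.0) + (1 - restart) * mass
                continue
            total = sum(w for _, w in nbrs)
            for u, w in nbrs:                  # spread mass by similarity
                nxt[u] = nxt.get(u, 0.0) + (1 - restart) * mass * w / total
        scores = nxt
    return sorted(scores.items(), key=lambda kv: -kv[1])  # best matches first
```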
In this work, we propose a convenient system for trip planning, aiming to change the behavior of trip planners from exhaustively searching for information to receiving useful travel recommendations. Given the essential and optional user inputs, our system automatically recommends a route that suits the traveler based on a real-time route planning algorithm and allows the user to make adjustments according to their preferences. We construct a traveling database by collecting photos taken around famous attractions and analyzing these photos to extract each attraction's travel information, including popularity, typical stay time, available visiting time in a day, and visual scenes at different times. All the extracted travel information is presented to the user to help him/her efficiently learn about different attractions, so that he/she can modify the inputs to obtain a more favorable travel route. The experimental results show that our system can effectively help the user plan a journey.
Tennis Real Play (TRP) is an interactive tennis game system constructed with models extracted from videos of real matches. The key techniques proposed for TRP include player modeling and video-based player/court rendering. For player model creation, we propose a database normalization process and a behavioral transition model of tennis players, which might be a good alternative to motion capture in conventional video games. For player/court rendering, we propose a framework for rendering vivid game characters in real time; image-based rendering leads to a more interactive and realistic result. Experiments show that video games with vivid viewing effects and characteristic players can be generated from match videos without much user intervention. Because the player model can adequately record the ability and condition of a player in the real world, it can then be used to roughly predict the results of real tennis matches in the coming days. The results of a user study reveal that subjects like the increased interaction, immersive experience, and enjoyment of playing TRP.
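As a minimal illustration of a behavioral transition model, the sketch below treats player behavior as a first-order Markov chain over action states estimated from match videos; this representation is an assumption for illustration, not necessarily the paper's exact model.

```python
import numpy as np

def next_action(current, transition, rng=None):
    """Sample the player's next action from a row-stochastic transition
    matrix. current: action index; transition: (A, A) matrix whose rows
    are estimated from annotated match videos."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(len(transition), p=transition[current])
```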
Scalable video research aims to provide video bitstreams of different sizes under different transmission bandwidths. In this paper, semantic scalability is proposed to provide scalable videos in the semantic domain, with tennis videos used in the experiments. Instead of decreasing video quality to reduce the bitrate, a smaller bitstream is achieved by discarding video content of lower semantic importance. The experimental results show that the proposed semantic scalability provides four levels of scalable video and maintains visual quality when watching the game video. In a user study, evaluators found the visual quality of semantic scalability more acceptable and the game information clearer than with Scalable Video Coding. The proposed semantic-domain scalability provides a new perspective on scalable video.
With the aid of depth cameras, such as the Microsoft Kinect, the difficulty of vision-based posture estimation is greatly decreased, and human action analysis has achieved a wide range of applications. However, there is still much to do to develop effective movement assessment techniques that bridge the results of human posture estimation and the understanding of human action performance. In this work, we propose an action tutor system which enables the user to interactively retrieve a learning exemplar of the target action movement and to immediately acquire motion instructions while learning it in front of the Kinect. In the retrieval stage, non-linear time warping algorithms are designed to retrieve video segments similar to the query movement roughly performed by the user. In the learning stage, the user learns according to the selected video exemplar, and the motion assessment, including both static and dynamic differences, is presented to the user in an effective and organized way, helping him/her to perform the action movement correctly.
Estimating an object's pose from a camera is a well-studied topic in computer vision. In theory, the pose from a calibrated camera can be uniquely determined. In practice, however, most real-time pose estimation algorithms suffer from pose ambiguity due to the low measurement accuracy of the target object. We argue that pose ambiguity (two distinct local minima of the corresponding error function) exists because of the phenomenon of geometric illusions, and both ambiguous poses are plausible. After obtaining the two minima (pose candidates), we develop a real-time algorithm for stable pose estimation of a target object with a motion model. In the experimental results, the proposed algorithm effectively diminishes pose jumping and pose jittering. To the best of our knowledge, this is the first work to solve the pose ambiguity problem with a motion model in real-time applications.
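A minimal sketch of disambiguation with a motion model: extrapolate the recent pose trajectory and keep the candidate closest to the prediction. The rotation-vector pose representation and the constant-velocity model are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def pick_pose(candidates, history, alpha=1.0):
    """Choose between the two ambiguous pose candidates.
    candidates: list of pose vectors (e.g., rotation vectors);
    history: list of previously chosen pose vectors (at least two)."""
    prev, curr = history[-2], history[-1]
    predicted = curr + alpha * (curr - prev)   # constant-velocity extrapolation
    errors = [np.linalg.norm(c - predicted) for c in candidates]
    return candidates[int(np.argmin(errors))]  # the smoother continuation
```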
Spectral graph methods are widely employed in image segmentation and exhibit excellent performance. However, for high-resolution images, it is impractical to directly calculate the eigenvectors of the affinity matrix owing to the high computational requirements. The Nyström method provides an efficient way to approximate the large-scale affinity matrix by low-rank approximation. In the machine learning field, previous studies have mainly focused on fewer data points with high-dimensional features. To the best of our knowledge, this is the first study to discuss the performance of sampling methods for the Nyström approximation, focusing on the pixel-wise affinity matrix of a single image. In this paper, we propose a mean-shift segmentation-based Nyström sampling technique for image analysis. The experimental results show that for images with simple compositions and backgrounds, k-means sampling performs better, whereas for images with more complicated compositions and backgrounds, the proposed method performs better.
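For concreteness, a one-shot Nyström sketch that approximates the leading eigenvectors of a pixel-wise Gaussian affinity matrix from a sampled pixel subset (as the proposed mean-shift-based sampling would supply); the orthogonalization step used in some variants is omitted, and σ is illustrative.

```python
import numpy as np

def nystrom_eigenvectors(pixels, sample_idx, sigma=0.1, k=4):
    """pixels: (n, d) per-pixel features; sample_idx: m sampled indices.
    Returns (n, k) approximate leading eigenvectors of the affinity matrix."""
    S = pixels[sample_idx]
    d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))             # (m, m) sampled block
    d2_all = ((pixels[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    C = np.exp(-d2_all / (2 * sigma ** 2))         # (n, m) cross affinities
    evals, evecs = np.linalg.eigh(A)
    top = np.argsort(evals)[-k:]                   # largest eigenvalues of A
    U = C @ evecs[:, top] / evals[top]             # Nystrom extension u = Cv/lambda
    return U / np.linalg.norm(U, axis=0, keepdims=True)
```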
Tennis videos are used as an example for the implementation of a viewing program called Tennis Video 2.0. For video analysis, background generation that considers the temporal and spatial distribution of pixels is proposed, along with foreground segmentation combining automatic trimap generation and a matting model. To provide more functions for watching videos, the rendering flow of video contents and semantic scalability are proposed. With the new analysis and rendering tools, the presentation of sports videos has three properties: Structure, Interactivity, and Scalability. Several broadcast game videos are employed in experiments to evaluate the robustness and performance of the proposed system. In a user study, 20 evaluators strongly agreed that Tennis Video 2.0 is a new presentation of sports videos that gives people a better viewing experience.
Tennis Real Play (TRP) is an interactive tennis game system constructed with models extracted from videos of real matches. The key techniques proposed for TRP include player modeling and video-based player/court rendering. For player model creation, we propose a database normalization process and a behavioral transition model of tennis players, which might be a good alternative to motion capture in conventional video games. For player/court rendering, we propose a framework for rendering vivid game characters in real time; image-based rendering leads to a more interactive and realistic result. Experiments show that video games with vivid viewing effects and characteristic players can be generated from match videos without much user intervention. Because the player model can adequately record the ability and condition of a player in the real world, it can then be used to roughly predict the results of real tennis matches in the coming days. The results of a user study reveal that subjects like the increased interaction, immersive experience, and enjoyment of playing TRP.
Image-based rendering (IBR) is a technique to render video from images, offering users more interaction and an immersive experience when watching a video. In this paper, we integrate the computation of several IBR applications, analyze the memory access bandwidth, and design an architecture to process the IBR computation. Experimental results show that the proposed IBR Engine is able to render video at 720×480 resolution and 30 frames per second, which is 12.7 times faster than a Core 2 Duo 2.83 GHz CPU. As an extension, the IBR Engine can be embedded in television systems to let viewers enjoy IBR functions.
Image segmentation is a well-developed topic in image processing, and a number of previous works have achieved high performance. However, most previous works need user assistance to provide prior information about the target object for segmentation. In this paper, we propose an unsupervised scheme, combining salient object detection and a segmentation method, to segment the target object without any prior information from users. The experimental results show that the proposed salient color model, derived from salient features, can provide prior information with high confidence to generate precise segmentations automatically. The proposed color model of salient objects can not only be applied with the Min-Cut algorithm but also extended to other segmentation algorithms, such as matting or non-parametric models.
Tennis Real Play (TRP) is an interactive tennis game system constructed with models extracted from real game videos. The key techniques proposed for TRP include player modeling and video-based player/court rendering. Experiments show that vivid rendering results can be generated.
Image-based rendering has been highly developed for its wide applications, such as view synthesis and special effects in movies. In this paper, we propose a tennis player rendering system that synthesizes diverse player actions/motions based on a database extracted from broadcast game videos. The system builds the database by retrieving the player from videos and synthesizes various kinds of player actions/motions according to the user's instructions. The results show that the proposed rendering system can render smooth action/motion transitions with satisfactory visual effects. For further applications, the proposed system can be used in interactive tennis games with image textures.
Scalable video research aims to provide video bitstreams of different sizes under different transmission bandwidths. In this paper, semantic scalability is proposed to provide scalable videos in the semantic domain, with tennis videos used in the experiments. Instead of decreasing video quality to reduce the bitrate, a smaller bitstream is achieved by discarding video content of lower semantic importance. The experimental results show that the proposed semantic scalability provides four levels of scalable video and maintains visual quality when watching the game video. This study of scalability in the semantic domain provides a new perspective on scalable video.
A sprite is an image constructed from video clips and is also a medium for multimedia applications. An automatic sprite generation method with foreground removal and super-resolution is proposed in this paper. To remove foreground objects, each pixel value in the sprite is iteratively updated to the value with the maximum appearance probability over the temporal and spatial distribution. By storing half-pixel values, the super-resolution sprite suffers fewer blurring defects than the source video. As a result, the generated sprite preserves the complete background scene with higher image quality; it can be used to increase the visual quality of current sprite applications and also to facilitate video segmentation.
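A sketch of the maximum-appearance-probability update, reduced to a per-pixel temporal histogram mode over frames already warped into the sprite coordinate system; grayscale input and the bin count are simplifying assumptions.

```python
import numpy as np

def sprite_background(aligned_frames, bins=32):
    """aligned_frames: (T, H, W) grayscale stack in [0, 255].
    Returns the per-pixel most frequently observed value (foreground-free)."""
    q = aligned_frames.astype(np.int32) * bins // 256   # quantize to bins
    T, H, W = aligned_frames.shape
    mode = np.zeros((H, W), dtype=np.int32)
    best = np.zeros((H, W), dtype=np.int32)
    for b in range(bins):                               # per-pixel histogram mode
        count = (q == b).sum(axis=0)
        better = count > best
        best[better] = count[better]
        mode[better] = b
    return (mode * 256 + 128) // bins                   # back to bin-center value
```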
Sports video enrichment can provide viewers with more interaction and better user experiences. In this paper, with tennis videos as an example, two techniques are proposed for video enrichment: content layer separation and real-time rendering. The video content is decomposed into different layers, such as the field, players, and ball, and the enriched video is rendered by re-integrating the information of these layers. Both are executed in the sprite plane to avoid complex 3D model construction and rendering. Experiments show that the system can generate natural and seamlessly edited video according to viewers' requests, and a real-time processing speed of 30 frames per second at 720×480 resolution can be achieved on a 3 GHz CPU.
Sports video annotation can help viewers easily browse video content and quickly find the hot events and highlights in a game. Although many annotation algorithms have been proposed, they are not suitable for practical implementation due to their unacceptably high complexity and low precision rates. In this paper, a method of sports video temporal structure decomposition, which decomposes the video into many clips, is proposed. Score box information and additional semantic information then serve as important clues for event annotation. Experimental results show that the proposed algorithm can successfully and effectively decompose video into clips. The annotation results also have extremely high precision and recall rates for both baseball and tennis videos.
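As a simple stand-in for the temporal structure decomposition step, the sketch below detects clip boundaries as abrupt changes in frame color histograms; the bin count and threshold are illustrative, and the paper additionally exploits domain cues such as the score box.

```python
import numpy as np

def clip_boundaries(frames, bins=16, thresh=0.4):
    """frames: (T, H, W, 3) uint8 video. Returns indices starting new clips."""
    hists = []
    for f in frames:
        h, _ = np.histogramdd(f.reshape(-1, 3), bins=(bins,) * 3,
                              range=((0, 256),) * 3)
        hists.append(h.ravel() / h.sum())               # normalized color hist
    boundaries = [0]
    for t in range(1, len(hists)):
        if np.abs(hists[t] - hists[t - 1]).sum() > thresh:  # L1 distance in [0, 2]
            boundaries.append(t)                        # abrupt content change
    return boundaries
```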
This video demo presents a new framework of sports video applications called Tennis Video 2.0. The proposed information extraction scheme retrieves the temporal structure of a video and separates the video's foreground and background objects into different layers. With the structure and layer information, new multimedia content is generated. Unlike conventional video content, the proposed new multimedia enables users to generate their own content and send requests back to the video player for more interaction. Users can even share their created content with friends under different transmission bandwidths by considering the semantics.