Hi, I am Larry. Welcome to my selected research.  
In this talk, we introduce the AI+ Remote Sensing techniques developed at the Research Lab of Ping An Technology. The first is our deep learning haze removal model, which effectively removes haze interference from satellite images so that the true ground reflectance can be observed. Next, we introduce our super-resolution (SR) model, which enhances image detail by 4x. The SR model has been deployed on Sentinel-2 satellite imagery and greatly improves its image quality. Last, we introduce our crop recognition system. The system includes a user interface for labeling a few training samples, and the proposed crop recognition model can be trained on the fly and deployed immediately over a broad geographic area. Beyond the techniques themselves, our AI+ Remote Sensing technologies have been supporting carbon (CO2) emission analysis for the Environment, Society, and Government (ESG) Department, flooding and disaster analysis for the Smart City Department, and crop field forecasting for the Investment Department in Ping An Group. Watch Talk Video & Slide
Fashion compatibility learning is important to many fashion applications such as outfit composition and online fashion recommendation. Unlike previous work, we argue that fashion compatibility is not only a matter of visual appearance but also depends on the theme. An outfit, which consists of a set of fashion items (e.g., shirt, suit, shoes), may be considered compatible for a “dating” event, yet may not be for a “business” occasion. In this paper, we aim to solve the fashion compatibility problem given specific themes. To this end, we built the first real-world theme-aware fashion dataset, comprising around 14K outfits labeled with 32 themes. The dataset contains more than 40K fashion items labeled with 152 fine-grained categories. We also propose an attention model that learns fashion compatibility given a specific theme. It starts with category-specific subspace learning, which projects compatible outfit items in certain categories so that they are close in the subspace. Thanks to the strong connections between fashion themes and categories, we then build a theme-attention model over the category-specific embedding space. This model uses attention to associate themes with pairwise compatibility and thus computes the outfit-wise compatibility. To the best of our knowledge, this is the first attempt to estimate outfit compatibility conditional on a theme. We conduct extensive qualitative and quantitative experiments on our new dataset. Our method outperforms state-of-the-art approaches. More details.
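The idea of weighting pairwise compatibility by theme can be illustrated with a minimal sketch. It is not the model from the paper: the item embeddings are assumed to be already projected into a shared subspace, and the scoring function `pair_compatibility`, the per-theme attention logits `theme_logits`, and the item fields are all hypothetical names introduced here for illustration.

```python
from itertools import combinations
from math import dist, exp


def softmax(logits):
    """Turn raw attention logits into weights that sum to 1."""
    peak = max(logits)
    exps = [exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pair_compatibility(item_a, item_b):
    """Hypothetical pairwise score: closer embeddings score nearer to 1."""
    return 1.0 / (1.0 + dist(item_a["embedding"], item_b["embedding"]))


def outfit_compatibility(items, theme_logits):
    """Attention-weighted sum of pairwise scores for one theme.

    theme_logits maps a frozenset of two category names to an attention
    logit (assumed to be learned per theme); unseen pairs default to 0.
    """
    pairs = list(combinations(items, 2))
    logits = [
        theme_logits.get(frozenset((a["category"], b["category"])), 0.0)
        for a, b in pairs
    ]
    weights = softmax(logits)
    return sum(w * pair_compatibility(a, b) for w, (a, b) in zip(weights, pairs))


# Toy outfit: shirt and pants match closely, shoes are far from both.
outfit = [
    {"category": "shirt", "embedding": [0.0, 0.0]},
    {"category": "pants", "embedding": [0.0, 0.0]},
    {"category": "shoes", "embedding": [3.0, 4.0]},
]
uniform_score = outfit_compatibility(outfit, {})
business_score = outfit_compatibility(outfit, {frozenset(("shirt", "pants")): 5.0})
```

In this toy setting, a theme that attends mostly to the shirt and pants pair rates the outfit higher than uniform attention does, because the poorly matched shoes contribute less to the total.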
Front-facing cameras are pervasive on mobile devices, which makes face authentication a convenient way to authenticate access. One drawback, however, is that it can be defeated by placing a picture of a face in front of the camera. Differentiating a live face from a face picture therefore becomes a challenge for face authentication applications. Our findings indicate that the facial temperature distribution is unique to each person and can be captured by a thermal camera. Here, we propose an algorithm that extracts facial thermal signatures for identity recognition and apply this idea to live face recognition. More details.
Preventing information leaks from mobile devices is a big challenge for corporations and governments. In particular, smartphone cameras are pervasive, and confidential information can easily be captured with a single click. Here, we design an app that runs in the background and performs document image analysis, logo detection, and scene recognition on the edge, without cloud support, at a frame rate of 10 fps (on iPhone 7). Once it detects a confidential document or secured scene in the camera preview, the app sends a warning to the corporation or government and disables the smartphone immediately. For more details, please see the demo videos. More details.
Advances in mobile computing and sensor technology have turned mobile devices into powerful instruments. Integrating thermal and visual cameras extends the capability of computer vision because the two kinds of image reveal different characteristics of a scene; however, aligning them is a challenge. This paper proposes an effective approach to aligning image pairs for event detection on mobile devices through image recognition. We leverage thermal and visual cameras as multi-modality sources for image recognition. By analyzing the heat pattern, the proposed app can identify heat sources and help users inspect their home heating systems; in addition, by applying image recognition, the app can help field workers identify asset conditions and provide guidance for resolving their issues. More details.
A real-time object tracking algorithm is proposed to cope with appearance changes such as translation, zooming, rotation, panning/tilting, occlusion, luminance change, and blur. The proposed tracking scheme has three steps. First, a regional filter is employed to detect candidate regions of the target. Next, these candidate regions are scaled to a uniform size for feature extraction. Finally, feature matching is used to calculate the similarity between each instance and the target, and an instance is stored if it is recognized as the target. The instance database therefore accumulates different appearances of the object as tracking proceeds; in other words, recognition capability increases as the database grows. To maintain computational performance, a database reduction algorithm is proposed to limit the size of the database. In our experiments, the proposed tracking system achieves 15 FPS at a resolution of 1280x720 on a 1.8 GHz Intel i5 CPU. More details.
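The matching and database steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed algorithm: features are plain numeric vectors, similarity is cosine similarity, and the database is "reduced" simply by discarding the oldest instance once a size cap is reached. The class name `InstanceDatabase` and its parameters are hypothetical.

```python
from collections import deque
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InstanceDatabase:
    """Stores appearances of the target; the oldest is dropped when full."""

    def __init__(self, target_feature, max_size=50, threshold=0.8):
        # Assumed reduction rule: deque(maxlen=...) evicts the oldest entry.
        self.instances = deque([tuple(target_feature)], maxlen=max_size)
        self.threshold = threshold

    def similarity(self, feature):
        """Best similarity between a candidate and any stored appearance."""
        return max(cosine(feature, stored) for stored in self.instances)

    def update(self, candidate_features):
        """Return the index of the best matching candidate, or None.

        A candidate recognized as the target is added to the database.
        """
        best_index, best_score = None, self.threshold
        for index, feature in enumerate(candidate_features):
            score = self.similarity(feature)
            if score >= best_score:
                best_index, best_score = index, score
        if best_index is not None:
            self.instances.append(tuple(candidate_features[best_index]))
        return best_index


db = InstanceDatabase([1.0, 0.0], max_size=2, threshold=0.9)
first = db.update([[0.0, 1.0], [0.9, 0.1]])   # second candidate matches
second = db.update([[0.0, 1.0]])              # nothing resembles the target
third = db.update([[0.8, 0.2]])               # matches the stored appearance
```

Capping the database keeps the per-frame matching cost bounded, at the price of forgetting early appearances; a reduction rule that removes the most redundant instance instead would be another reasonable choice.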
Modern brain mapping techniques are producing increasingly large datasets of anatomical or functional connection patterns. This project uses a dataset of videos recorded from the brain neurons of ten mice and applies multiple network analysis methods to model the mouse brain neuron network. To recover the network topology as an undirected graph model, a cross-correlation-based method is proposed, and multiple thresholds are chosen to adjust the model. To account for causality in the brain neuron network, a score-based algorithm is implemented to learn the network’s structure as a causal Bayesian network. Then, network clustering is conducted to separate the mouse brain neurons into interconnected functional groups. Conclusions about mouse brain neuron network topology and functional segregation are drawn after several verifications and simulations of the proposed models and algorithms. More details.
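As a rough illustration of the cross-correlation step, the sketch below connects two neurons with an undirected edge when the peak absolute correlation of their activity traces, over a small range of time lags, reaches a threshold. The trace format, the lag range `max_lag`, and the function names are assumptions made for this example rather than details of the project.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length traces (0.0 if one is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0


def peak_cross_correlation(x, y, max_lag):
    """Largest absolute correlation between x and y over lags in [-max_lag, max_lag]."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        best = max(best, abs(pearson(a, b)))
    return best


def correlation_graph(traces, threshold, max_lag=1):
    """Undirected edge set {(i, j)} with i < j for strongly correlated neurons."""
    edges = set()
    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            if peak_cross_correlation(traces[i], traces[j], max_lag) >= threshold:
                edges.add((i, j))
    return edges


# Neuron 1 repeats neuron 0 one frame later; neuron 2 is constant.
traces = [
    [0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]
edges_with_lag = correlation_graph(traces, threshold=0.9, max_lag=1)
edges_zero_lag = correlation_graph(traces, threshold=0.9, max_lag=0)
```

Raising the threshold yields a sparser graph, which is one way to read "multiple thresholds are chosen to adjust the model"; allowing nonzero lags lets delayed responses count as connections.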
Tennis Real Play (TRP) is an interactive tennis game system constructed with models extracted from videos of real matches. The key techniques proposed for TRP include player modeling and video-based player/court rendering. For player model creation, we propose a database normalization process and a behavioral transition model of tennis players, which may be a good alternative to motion capture in conventional video games. For player/court rendering, we propose a framework that renders vivid game characters in real time. Image-based rendering leads to more interactive and realistic rendering. Experiments show that video games with vivid viewing effects and characteristic players can be generated from match videos without much user intervention. Because the player model can adequately record the ability and condition of a player in the real world, it can then be used to roughly predict the results of real tennis matches in the following days. The results of a user study reveal that subjects appreciate the increased interaction, immersive experience, and enjoyment of playing TRP. More details.
Image-based rendering (IBR) is a technique for rendering video from images, and it gives users a more interactive and immersive viewing experience. In this paper, we integrate the computation of several IBR applications, analyze the memory access bandwidth, and design an architecture to carry out IBR computation. Experimental results show that the proposed IBR Engine can render video at a resolution of 720×480 and 30 frames per second, which is 12.7 times faster than a 2.83 GHz Core 2 Duo CPU. As an extension, the IBR Engine can be embedded in a television system to let viewers enjoy IBR functions. More details.
Scalable video research aims to provide video bitstreams of different sizes for different transmission bandwidths. In this paper, we propose semantic scalability, which provides scalable video in the semantic domain, and use tennis videos for the experiments. Rather than lowering video quality to reduce the bitrate, the smaller bitstream is achieved by discarding video content of lesser semantic importance. The experimental results show that the proposed semantic scalability provides four levels of scalable video while maintaining visual quality for watching the game. In a user study, evaluators found the visual quality of semantic scalability more acceptable and the game information clearer than with Scalable Video Coding. Scalability in the semantic domain thus offers a new perspective on scalable video. More details.
Research on sports videos is interesting and full of challenges due to the increase in the number of game videos and the demand for video diversification. This paper proposes a new method for presenting sports videos. Tennis videos are used as an example for the implementation of a viewing program called Tennis Video 2.0. Through the video processes of structure analysis, content extraction, and enriched video rendering, the presentation of sports videos gains three properties: Structure, Interactivity, and Scalability. Structure allows people to browse game videos and watch highlights on demand. Furthermore, the proposed strategy search is a convenient way to find favorite hit patterns. Interactivity provides people with functions for watching enriched game video rendered in real time. These functions can provide more enjoyment to viewers watching games. Scalability enables the video to be scalable in the semantic domain. Four different levels of video content are transmitted to accommodate different bandwidth limitations. In conclusion, the proposed sports video viewer allows people to watch games in a different way than previously possible. More details.
A sprite is an image constructed from video clips and also serves as a medium for multimedia applications. This paper proposes automatic sprite generation with foreground removal and super-resolution. To remove foreground objects, each pixel value on the sprite is iteratively updated to the value with the maximum appearance probability over the temporal and spatial distributions. By storing half-pixel values, the super-resolution sprite has fewer blurring defects from the source video. As a result, the generated sprite preserves the complete background scene with higher image quality; it can be used to increase the visual quality of current sprite applications and to facilitate video segmentation. More details.
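The foreground removal idea can be illustrated with a simplified sketch that uses only the temporal distribution: for each pixel position, the most frequently observed value across frames is kept as the background. It assumes the frames are already aligned to the sprite coordinates and hold quantized intensity values, and it omits the spatial term, the iterative update, and the half-pixel super-resolution described above. The function name `background_sprite` is hypothetical.

```python
from collections import Counter


def background_sprite(frames):
    """Per-pixel temporal mode over aligned frames (lists of rows of ints)."""
    height, width = len(frames[0]), len(frames[0][0])
    sprite = []
    for row in range(height):
        sprite_row = []
        for col in range(width):
            # Count how often each value appears at this position over time.
            counts = Counter(frame[row][col] for frame in frames)
            sprite_row.append(counts.most_common(1)[0][0])
        sprite.append(sprite_row)
    return sprite


# A bright foreground object (99) moves across a uniform background (10).
frames = [
    [[99, 10], [10, 10]],
    [[10, 99], [10, 10]],
    [[10, 10], [99, 10]],
]
sprite = background_sprite(frames)
```

Because the moving object covers each position in only one of the three frames, the background value wins the vote everywhere and the object disappears from the sprite.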