Video Applications

Published: 09 Oct 2015 Category: deep_learning

Papers

You Lead, We Exceed: Labor-Free Video Concept Learningby Jointly Exploiting Web Videos and Images

intro: CVPR 2016
intro: Lead–Exceed Neural Network (LENN), LSTM
paper: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/CVPR16_webly_final.pdf

Video Fill in the Blank with Merging LSTMs

intro: for Large Scale Movie Description and Understanding Challenge (LSMDC) 2016, “Movie fill-in-the-blank” Challenge, UCF_CRCV
intro: Video-Fill-in-the-Blank (ViFitB)
arxiv: https://arxiv.org/abs/1610.04062

Video Pixel Networks

intro: Google DeepMind
arxiv: https://arxiv.org/abs/1610.00527

Robust Video Synchronization using Unsupervised Deep Learning

arxiv: https://arxiv.org/abs/1610.05985

Video Propagation Networks

intro: CVPR 2017. Max Planck Institute for Intelligent Systems & Bernstein Center for Computational Neuroscience
project page: https://varunjampani.github.io/vpn/
arxiv: https://arxiv.org/abs/1612.05478
github(Caffe): https://github.com/varunjampani/video_prop_networks

Video Frame Synthesis using Deep Voxel Flow

project page: https://liuziwei7.github.io/projects/VoxelFlow.html
arxiv: https://arxiv.org/abs/1702.02463

Optimizing Deep CNN-Based Queries over Video Streams at Scale

intro: Stanford InfoLab
keywords: NoScope. difference detectors, specialized models
arxiv: https://arxiv.org/abs/1703.02529
github: https://github.com/stanford-futuredata/noscope
github: https://github.com/stanford-futuredata/tensorflow-noscope

NoScope: 1000x Faster Deep Learning Queries over Video

http://dawn.cs.stanford.edu/2017/06/22/noscope/

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

intro: CVPR 2017. Stanford University & University of Southern California
arxiv: https://arxiv.org/abs/1703.02521

ProcNets: Learning to Segment Procedures in Untrimmed and Unconstrained Videos

https://arxiv.org/abs/1703.09788

Unsupervised Learning Layers for Video Analysis

intro: Baidu Research
intro: “The experiments demonstrated the potential applications of UL layers and online learning algorithm to head orientation estimation and moving object localization”
arxiv: https://arxiv.org/abs/1705.08918

Look, Listen and Learn

intro: DeepMind
intro: “Audio-Visual Correspondence” learning
arxiv: https://arxiv.org/abs/1705.08168

Video Imagination from a Single Image with Transformation Generation

intro: Peking University
arxiv: https://arxiv.org/abs/1706.04124
github: https://github.com/gitpub327/VideoImagination

Learning to Learn from Noisy Web Videos

intro: CVPR 2017. Stanford University & CMU & Simon Fraser University
arxiv: https://arxiv.org/abs/1706.02884

Convolutional Long Short-Term Memory Networks for Recognizing First Person Interactions

intro: Accepted on the second International Workshop on Egocentric Perception, Interaction and Computing(EPIC) at International Conference on Computer Vision(ICCV-17)
arxiv: https://arxiv.org/abs/1709.06495

Learning Binary Residual Representations for Domain-specific Video Streaming

intro: AAAI 2018
project page: http://research.nvidia.com/publication/2018-02_Learning-Binary-Residual
arxiv: https://arxiv.org/abs/1712.05087

Video Representation Learning Using Discriminative Pooling

intro: CVPR 2018
arxiv: https://arxiv.org/abs/1803.10628

Rethinking the Faster R-CNN Architecture for Temporal Action Localization

intro: CVPR 2018
arxiv: https://arxiv.org/abs/1804.07667

Deep Keyframe Detection in Human Action Videos

intro: two-stream ConvNet
arxiv: https://arxiv.org/abs/1804.10021

FFNet: Video Fast-Forwarding via Reinforcement Learning

intro: CVPR 2018
arxiv: https://arxiv.org/abs/1805.02792

Fast forwarding Egocentric Videos by Listening and Watching

https://arxiv.org/abs/1806.04620

Scanner: Efficient Video Analysis at Scale

intro: CMU
arxiv: https://arxiv.org/abs/1805.07339

Massively Parallel Video Networks

intro: DeepMind & University of Oxford
arxiv: https://arxiv.org/abs/1806.03863

Object Level Visual Reasoning in Videos

intro: LIRIS & Facebook AI Research
arxiv: https://arxiv.org/abs/1806.06157

Video Time: Properties, Encoders and Evaluation

intro: BMVC 2018
arxiv: https://arxiv.org/abs/1807.06980

Inserting Videos into Videos

intro: CVPR 2019
arxiv: https://arxiv.org/abs/1903.06571

Video Classification

Large-scale Video Classification with Convolutional Neural Networks

intro: CVPR 2014
project page: http://cs.stanford.edu/people/karpathy/deepvideo/
paper: www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Karpathy_Large-scale_Video_Classification_2014_CVPR_paper.pdf

Exploiting Image-trained CNN Architectures for Unconstrained Video Classification

intro: Video-level event detection. extracting deep features for each frame, averaging frame-level deep features
arxiv: http://arxiv.org/abs/1503.04144

Beyond Short Snippets: Deep Networks for Video Classification

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

intro: ACM Multimedia, 2015
arxiv: http://arxiv.org/abs/1504.01561

Video Content Recognition with Deep Learning

author: Zuxuan Wu, Fudan University
slides: http://vision.ouc.edu.cn/valse/slides/20160420/Zuxuan%20Wu%20-%20Video%20Content%20Recognition%20with%20Deep%20Learning-Zuxuan%20Wu.pdf

Video Content Recognition with Deep Learning

author: Yu-Gang Jiang, Lab for Big Video Data Analytics (BigVid), Fudan University
slides: http://www.yugangjiang.info/slides/DeepVideoTalk-2015.pdf

Efficient Large Scale Video Classification

intro: Google
arxiv: http://arxiv.org/abs/1505.06250

Fusing Multi-Stream Deep Networks for Video Classification

arxiv: http://arxiv.org/abs/1509.06086

Learning End-to-end Video Classification with Rank-Pooling

paper: http://jmlr.org/proceedings/papers/v48/fernando16.html
paper: http://jmlr.csail.mit.edu/proceedings/papers/v48/fernando16.pdf
summary(by Hugo Larochelle): http://www.shortscience.org/paper?bibtexKey=conf/icml/FernandoG16#hlarochelle

Deep Learning for Video Classification and Captioning

arxiv: http://arxiv.org/abs/1609.06782

Fast Video Classification via Adaptive Cascading of Deep Models

arxiv: https://arxiv.org/abs/1611.06453

Deep Feature Flow for Video Recognition

intro: CVPR 2017
intro: It provides a simple, fast, accurate, and end-to-end framework for video recognition (e.g., object detection and semantic segmentation in videos)
arxiv: https://arxiv.org/abs/1611.07715
github(official, MXNet): https://github.com/msracver/Deep-Feature-Flow
youtube: https://www.youtube.com/watch?v=J0rMHE6ehGw

Large-Scale YouTube-8M Video Understanding with Deep Neural Networks

https://arxiv.org/abs/1706.04488

Deep Learning Methods for Efficient Large Scale Video Labeling

intro: Solution to the Kaggle’s competition Google Cloud & YouTube-8M Video Understanding Challenge
arxiv: https://arxiv.org/abs/1706.04572
github: https://github.com/mpekalski/Y8M

Learnable pooling with Context Gating for video classification

intro: CVPR17 Youtube 8M workshop. Kaggle 1st place
arxiv: https://arxiv.org/abs/1706.06905
github: https://github.com/antoine77340/LOUPE

Aggregating Frame-level Features for Large-Scale Video Classification

intro: Youtube-8M Challenge, 4th place
arxiv: https://arxiv.org/abs/1707.00803

Tensor-Train Recurrent Neural Networks for Video Classification

https://arxiv.org/abs/1707.01786

Hierarchical Deep Recurrent Architecture for Video Understanding

intro: Classification Challenge Track paper in CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding
arxiv: https://arxiv.org/abs/1707.03296

Large-scale Video Classification guided by Batch Normalized LSTM Translator

intro: CVPR2017 Workshop on Youtube-8M Large-scale Video Understanding
arxiv: https://arxiv.org/abs/1707.04045

UTS submission to Google YouTube-8M Challenge 2017

intro: CVPR’17 Workshop on YouTube-8M
arxiv: https://arxiv.org/abs/1707.04143
github: https://github.com/ffmpbgrnn/yt8m

A spatiotemporal model with visual attention for video classification

https://arxiv.org/abs/1707.02069

Cultivating DNN Diversity for Large Scale Video Labelling

intro: CVPR 2017 Youtube-8M Workshop
arxiv: https://arxiv.org/abs/1707.04272

Attention Transfer from Web Images for Video Recognition

intro: ACM Multimedia, 2017
arxiv: https://arxiv.org/abs/1708.00973

Non-local Neural Networks

intro: CVPR 2018. CMU & Facebook AI Research
arxiv: https://arxiv.org/abs/1711.07971
github(Caffe2): https://github.com/facebookresearch/video-nonlocal-net

Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification

https://arxiv.org/abs/1711.08200

Appearance-and-Relation Networks for Video Classification

arxiv: https://arxiv.org/abs/1711.09125
github: https://github.com/wanglimin/ARTNet

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

intro: ECCV 2018. Google Research & University of California San Diego
arxiv: https://arxiv.org/abs/1712.04851

Long Activity Video Understanding using Functional Object-Oriented Network

https://arxiv.org/abs/1807.00983

Deep Architectures and Ensembles for Semantic Video Classification

https://arxiv.org/abs/1807.01026

Deep Discriminative Model for Video Classification

intro: ECCV 2018
arxiv: https://arxiv.org/abs/1807.08259

Deep Video Color Propagation

intro: BMVC 2018
arxuv: https://arxiv.org/abs/1808.03232

Non-local NetVLAD Encoding for Video Classification

intro: ECCV 2018 workshop on YouTube-8M Large-Scale Video Understanding
intro: Tencent AI Lab & Fudan University
arxiv: https://arxiv.org/abs/1810.00207

Learnable Pooling Methods for Video Classification

intro: Youtube 8M ECCV18 Workshop
arxiv: https://arxiv.org/abs/1810.00530
github: https://github.com/pomonam/LearnablePoolingMethods

NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification

intro: ECCV 2018 workshop
arxiv: https://arxiv.org/abs/1811.05014
github: https://github.com/linrongc/youtube-8m

High Order Neural Networks for Video Classification

intro: Fudan University, Carnegie Mellon University, Qiniu Inc., ByteDance AI Lab
arxiv: https://arxiv.org/abs/1811.07519

SlowFast Networks for Video Recognition

intro: Facebook AI Research (FAIR)
arxiv: https://arxiv.org/abs/1812.03982

Efficient Video Classification Using Fewer Frames

intro: CVPR 2019
arxiv: https://arxiv.org/abs/1902.10640

Video Classification with Channel-Separated Convolutional Networks

intro: Facebook AI
arxiv: https://arxiv.org/abs/1904.02811

Two-Stream Video Classification with Cross-Modality Attention

https://arxiv.org/abs/1908.00497

Action Detection / Activity Recognition

3d convolutional neural networks for human action recognition

paper: http://www.cs.odu.edu/~sji/papers/pdf/Ji_ICML10.pdf

Sequential Deep Learning for Human Action Recognition

paper: http://liris.cnrs.fr/Documents/Liris-5228.pdf

Two-stream convolutional networks for action recognition in videos

arxiv: http://arxiv.org/abs/1406.2199

Finding action tubes

intro: “built action models from shape and motion cues. They start from the image proposals and select the motion salient subset of them and extract saptio-temporal features to represent the video using the CNNs.”
arxiv: http://arxiv.org/abs/1411.6031

Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition

paper: http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Du_Hierarchical_Recurrent_Neural_2015_CVPR_paper.pdf

Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors

intro: CVPR 2015. TDD
paper: www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Wang_Action_Recognition_With_2015_CVPR_paper.pdf
ext: http://www.cv-foundation.org/openaccess/content_cvpr_2015/app/2B_105_ext.pdf
poster: https://wanglimin.github.io/papers/WangQT_CVPR15_Poster.pdf
github: https://github.com/wanglimin/TDD

Action Recognition by Hierarchical Mid-level Action Elements

paper: http://cvgl.stanford.edu/papers/tian2015.pdf

Contextual Action Recognition with R CNN

arxiv: http://arxiv.org/abs/1505.01197
github: https://github.com/gkioxari/RstarCNN

Towards Good Practices for Very Deep Two-Stream ConvNets

arxiv: http://arxiv.org/abs/1507.02159
github: https://github.com/yjxiong/caffe

Action Recognition using Visual Attention

intro: LSTM / RNN
arxiv: http://arxiv.org/abs/1511.04119
project page: http://shikharsharma.com/projects/action-recognition-attention/
github(Python/Theano): https://github.com/kracwarlock/action-recognition-visual-attention

End-to-end Learning of Action Detection from Frame Glimpses in Videos

intro: CVPR 2016
project page: http://ai.stanford.edu/~syyeung/frameglimpses.html
arxiv: http://arxiv.org/abs/1511.06984
paper: http://vision.stanford.edu/pdf/yeung2016cvpr.pdf

Multi-velocity neural networks for gesture recognition in videos

arxiv: http://arxiv.org/abs/1603.06829

Active Learning for Online Recognition of Human Activities from Streaming Videos

arxiv: http://arxiv.org/abs/1604.02855

Convolutional Two-Stream Network Fusion for Video Action Recognition

arxiv: http://arxiv.org/abs/1604.06573
github: https://github.com/feichtenhofer/twostreamfusion

Deep, Convolutional, and Recurrent Models for Human Activity Recognition using Wearables

arxiv: http://arxiv.org/abs/1604.08880

Unsupervised Semantic Action Discovery from Video Collections

arxiv: http://arxiv.org/abs/1605.03324

Anticipating Visual Representations from Unlabeled Video

paper: http://web.mit.edu/vondrick/prediction.pdf

VideoLSTM Convolves, Attends and Flows for Action Recognition

arxiv: http://arxiv.org/abs/1607.01794

Hierarchical Attention Network for Action Recognition in Videos (HAN)

arxiv: http://arxiv.org/abs/1607.06416

Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition

arxiv: http://arxiv.org/abs/1607.07043

Connectionist Temporal Modeling for Weakly Supervised Action Labeling

arxiv: http://arxiv.org/abs/1607.08584

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016

intro: won the 1st place in the untrimmed video classification task of ActivityNet Challenge 2016. TSN
arxiv: http://arxiv.org/abs/1608.00797
github: https://github.com/yjxiong/anet2016-cuhk

Actionness Estimation Using Hybrid FCNs

intro: CVPR 2016. H-FCN
project page: http://wanglimin.github.io/actionness_hfcn/index.html
paper: http://wanglimin.github.io/papers/WangQTV_CVPR16.pdf
github: https://github.com/wanglimin/actionness-estimation/

Real-time Action Recognition with Enhanced Motion Vector CNNs

intro: CVPR 2016
project page: http://zbwglory.github.io/MV-CNN/index.html
paper: http://wanglimin.github.io/papers/ZhangWWQW_CVPR16.pdf
github: https://github.com/zbwglory/MV-release

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

intro: ECCV 2016. HMDB51: 69.4%, UCF101: 94.2%
arxiv: http://arxiv.org/abs/1608.00859
paper: http://wanglimin.github.io/papers/WangXWQLTV_ECCV16.pdf
github: https://github.com/yjxiong/temporal-segment-networks

Temporal Segment Networks for Action Recognition in Videos

intro: An extension of submission http://arxiv.org/abs/1608.00859
arxiv: https://arxiv.org/abs/1705.02953

Hierarchical Attention Network for Action Recognition in Videos

arxiv: http://arxiv.org/abs/1607.06416

DeepCAMP: Deep Convolutional Action & Attribute Mid-Level Patterns

intro: CVPR 2016
arxiv: http://arxiv.org/abs/1608.03217

Depth2Action: Exploring Embedded Depth for Large-Scale Action Recognition

arxiv: http://arxiv.org/abs/1608.04339

Dynamic Image Networks for Action Recognition

intro: CVPR 2016
arxiv: http://users.cecs.anu.edu.au/~sgould/papers/cvpr16-dynamic_images.pdf
github: https://github.com/hbilen/dynamic-image-nets

Human Action Recognition without Human

arxiv: http://arxiv.org/abs/1608.07876

Temporal Convolutional Networks: A Unified Approach to Action Segmentation

arxiv: http://arxiv.org/abs/1608.08242
ECCV 2016 workshop: http://bravenewmotion.github.io/

Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

intro: Bachelor Thesis Report at ETSETB TelecomBCN
project page: https://imatge-upc.github.io/activitynet-2016-cvprw/
arxiv: http://arxiv.org/abs/1608.08128
github: https://github.com/imatge-upc/activitynet-2016-cvprw

Sequential Deep Trajectory Descriptor for Action Recognition with Three-stream CNN

arxiv: http://arxiv.org/abs/1609.03056

Semi-Coupled Two-Stream Fusion ConvNets for Action Recognition at Extremely Low Resolutions

arxiv: https://arxiv.org/abs/1610.03898

Spatiotemporal Residual Networks for Video Action Recognition

intro: NIPS 2016
arxiv: https://arxiv.org/abs/1611.02155

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

arxiv: https://arxiv.org/abs/1611.02447

Deep Recurrent Neural Network for Mobile Human Activity Recognition with High Throughput

arxiv: https://arxiv.org/abs/1611.03607

Joint Network based Attention for Action Recognition

arxiv: https://arxiv.org/abs/1611.05215

Temporal Convolutional Networks for Action Segmentation and Detection

arxiv: https://arxiv.org/abs/1611.05267

AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos

arxiv: https://arxiv.org/abs/1611.08240

ActionFlowNet: Learning Motion Representation for Action Recognition

arxiv: https://arxiv.org/abs/1612.03052

Higher-order Pooling of CNN Features via Kernel Linearization for Action Recognition

intro: Australian Center for Robotic Vision & Data61/CSIRO
arxiv: https://arxiv.org/abs/1701.05432

Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos

https://arxiv.org/abs/1703.10664

Temporal Action Detection with Structured Segment Networks

project page: http://yjxiong.me/others/ssn/
arxiv: https://arxiv.org/abs/1704.06228
github: https://github.com/yjxiong/action-detection

Recurrent Residual Learning for Action Recognition

https://arxiv.org/abs/1706.08807

Hierarchical Multi-scale Attention Networks for Action Recognition

https://arxiv.org/abs/1708.07590

Two-stream Flow-guided Convolutional Attention Networks for Action Recognition

intro: International Conference of Computer Vision Workshop (ICCVW), 2017
arxiv: https://arxiv.org/abs/1708.09268

Action Classification and Highlighting in Videos

https://arxiv.org/abs/1708.09522

Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN

https://arxiv.org/abs/1710.03383

End-to-end Video-level Representation Learning for Action Recognition

keywords: Deep networks with Temporal Pyramid Pooling (DTPP)
arxiv: https://arxiv.org/abs/1711.04161

Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition

intro: WACV 2018
arxiv: https://arxiv.org/abs/1801.03983

DiscrimNet: Semi-Supervised Action Recognition from Videos using Generative Adversarial Networks

https://arxiv.org/abs/1801.07230

A Fusion of Appearance based CNNs and Temporal evolution of Skeleton with LSTM for Daily Living Action Recognition

https://arxiv.org/abs/1802.00421

Real-Time End-to-End Action Detection with Two-Stream Networks

https://arxiv.org/abs/1802.08362

A Closer Look at Spatiotemporal Convolutions for Action Recognition

intro: CVPR 2018. Facebook Research
intro: R(2+1)D and Mixed-Convolutions for Action Recognition.
project page: https://dutran.github.io/R2Plus1D/
arxiv: https://arxiv.org/abs/1711.11248
github: https://github.com/facebookresearch/R2Plus1D

VideoCapsuleNet: A Simplified Network for Action Detection

https://arxiv.org/abs/1805.08162

Where and When to Look? Spatio-temporal Attention for Action Recognition in Videos

https://arxiv.org/abs/1810.04511

Relational Long Short-Term Memory for Video Action Recognition

https://arxiv.org/abs/1811.07059

Temporal Recurrent Networks for Online Action Detection

https://arxiv.org/abs/1811.073910

Video Action Transformer Network

intro: Carnegie Mellon University & DeepMind & University of Oxford
intro: Ranked first on the AVA (computer vision only) leaderboard of the ActivityNet Challenge 2018
project page: https://rohitgirdhar.github.io/ActionTransformer/
arxiv: https://arxiv.org/abs/1812.02707

D3D: Distilled 3D Networks for Video Action Recognition

intro: Google & University of Michigan & Princeton University
arxiv: https://arxiv.org/abs/1812.08249

TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection

intro: CVPR 2019
arxiv: https://arxiv.org/abs/1905.13417

Deformable Tube Network for Action Detection in Videos

https://arxiv.org/abs/1907.01847

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

https://arxiv.org/abs/1911.06644

TubeR: Tube-Transformer for Action Detection

intro: University of Amsterdam & Amazon Web Service
arxiv: https://arxiv.org/abs/2104.00969

Revisiting Skeleton-based Action Recognition

https://arxiv.org/abs/2104.13586

Projects

A Torch Library for Action Recognition and Detection Using CNNs and LSTMs

intro: CS231n student project report
paper: http://cs231n.stanford.edu/reports2016/221_Report.pdf
github: https://github.com/garythung/torch-lrcn

2016 ActivityNet action recognition challenge. CNN + LSTM approach. Multi-threaded loading.

github: https://github.com/jrbtaylor/ActivityNet

LSTM for Human Activity Recognition

Scanner: Efficient Video Analysis at Scale

intro: Locate and recognize faces in a video, Detect shots in a film, Search videos by image
github: https://github.com/scanner-research/scanner

Charades Starter Code for Activity Classification and Localization

intro: Activity Recognition Algorithms for the Charades Dataset
github: https://github.com/gsig/charades-algorithms

NonLocalNetwork and Sequeeze-Excitation Network

intro: MXNet implementation of Non-Local and Squeeze-Excitation network
github: https://github.com/WillSuen/NonLocalandSEnet

Event Recognition

TagBook: A Semantic Video Representation without Supervision for Event Detection

arxiv: http://arxiv.org/abs/1510.02899

AENet: Learning Deep Audio Features for Video Analysis

arxiv: https://arxiv.org/abs/1701.00599
github: https://github.com/znaoya/aenet

Event Detection

DevNet: A Deep Event Network for Multimedia Event Detection and Evidence Recounting

Detecting events and key actors in multi-person videos

intro: CVPR 2016
arxiv: http://arxiv.org/abs/1511.02917
paper: www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Ramanathan_Detecting_Events_and_CVPR_2016_paper.pdf
paper: http://vision.stanford.edu/pdf/johnson2016cvpr.pdf
blog: http://www.leiphone.com/news/201606/l1TKIRFLO3DUFNNu.html

Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection

intro: INTERSPEECH 2016
arxiv: https://arxiv.org/abs/1604.07160

Efficient Action Detection in Untrimmed Videos via Multi-Task Learning

arxiv: https://arxiv.org/abs/1612.07403

Joint Event Detection and Description in Continuous Video Streams

intro: Joint Event Detection and Description Network (JEDDi-Net)
arxiv: https://arxiv.org/abs/1802.10250

Abnormality / Anomaly Detection

Fully Convolutional Neural Network for Fast Anomaly Detection in Crowded Scenes

arxiv: http://arxiv.org/abs/1609.00866

Anomaly Detection in Video Using Predictive Convolutional Long Short-Term Memory Networks

intro: Rochester Institute of Technology
arxiv: https://arxiv.org/abs/1612.00390

Abnormal Event Detection in Videos using Spatiotemporal Autoencoder

arxiv: https://arxiv.org/abs/1701.01546
github: https://github.com/yshean/abnormal-spatiotemporal-ae

Abnormal Event Detection in Videos using Generative Adversarial Nets

intro: Best Paper / Student Paper Award Finalist, IEEE International Conference on Image Processing (ICIP), 2017
arxiv: https://arxiv.org/abs/1708.09644

Joint Detection and Recounting of Abnormal Events by Learning Deep Generic Knowledge

intro: ICCV 2017
arxiv: https://arxiv.org/abs/1709.09121

An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos

intro: Uncanny Vision Solutions
arxiv: https://arxiv.org/abs/1801.03149

STAN: Spatio-Temporal Adversarial Networks for Abnormal Event Detection

intro: ICASSP 2018
arxiv: https://arxiv.org/abs/1804.08381

Video Anomaly Detection and Localization via Gaussian Mixture Fully Convolutional Variational Autoencoder

https://arxiv.org/abs/1805.11223

Attentioned Convolutional LSTM InpaintingNetwork for Anomaly Detection in Videos

https://arxiv.org/abs/1811.10228

Video Prediction

Deep multi-scale video prediction beyond mean square error

intro: ICLR 2016
arxiv: http://arxiv.org/abs/1511.05440
github: https://github.com/coupriec/VideoPredictionICLR2016
github(TensorFlow): https://github.com/dyelax/Adversarial_Video_Generation
demo: http://cs.nyu.edu/~mathieu/iclr2016.html

Unsupervised Learning for Physical Interaction through Video Prediction

intro: NIPS 2016
arxiv: https://arxiv.org/abs/1605.07157
github: https://github.com/tensorflow/models/tree/master/video_prediction

Generating Videos with Scene Dynamics

intro: NIPS 2016
intro: The model learns to generate tiny videos using adversarial networks
project page: http://web.mit.edu/vondrick/tinyvideo/
paper: http://web.mit.edu/vondrick/tinyvideo/paper.pdf
github: https://github.com/cvondrick/videogan

PredNet

Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning

project page: https://coxlab.github.io/prednet/
arxiv: http://arxiv.org/abs/1605.08104
github: https://github.com/coxlab/prednet
github: https://github.com/e-lab/torch-prednet

Diversity encouraged learning of unsupervised LSTM ensemble for neural activity video prediction

arxiv: https://arxiv.org/abs/1611.04899

Video Ladder Networks

inro: NIPS 2016 workshop on ML for Spatiotemporal Forecasting
arxiv: https://arxiv.org/abs/1612.01756

Unsupervised Learning of Long-Term Motion Dynamics for Videos

intro: Stanford University
arxiv: https://arxiv.org/abs/1701.01821

One-Step Time-Dependent Future Video Frame Prediction with a Convolutional Encoder-Decoder Neural Network

intro: NCCV 2016
arxiv: https://arxiv.org/abs/1702.04125

Fully Context-Aware Video Prediction

intro: ETH Zurich & NNAISENSE
keywords: unsupervised learning through video prediction, Parallel Multi-Dimensional LSTM
project page: https://sites.google.com/view/contextvp
arxiv: https://arxiv.org/abs/1710.08518

Novel Video Prediction for Large-scale Scene using Optical Flow

intro: University of Victoria & Tongji University & Horizon Robotics
arxiv: https://arxiv.org/abs/1805.12243

Video Tagging

Automatic Image and Video Tagging

blog: http://scottge.net/2015/06/30/automatic-image-and-video-tagging/

Tagging YouTube music videos with deep learning - Alexandre Passant

keywords: Clarifai’s deep learning API
blog: http://apassant.net/2015/07/03/tagging-youtube-music-clarifai-deep-learning/

Shot Boundary Detection

Large-scale, Fast and Accurate Shot Boundary Detection through Spatio-temporal Convolutional Neural Networks

https://arxiv.org/abs/1705.03281

Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks

intro: obtains state-of-the-art results while running at an unprecedented speed of more than 120x real-time.
arxiv: https://arxiv.org/abs/1705.08214

Video Action Segmentation

TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation

intro: University of Rochester
arxiv: https://arxiv.org/abs/1705.07818

Video2GIF

Video2GIF: Automatic Generation of Animated GIFs from Video (Robust Deep RankNet)

intro: 3D CNN, ranking model, Huber loss, 100K GIFs/video sources dataset
arxiv: http://arxiv.org/abs/1605.04850
github(dataset): https://github.com/gyglim/video2gif_dataset
results: http://video2gif.info/
demo site: http://people.ee.ethz.ch/~gyglim/work_public/autogif/
review: http://motherboard.vice.com/read/these-fire-gifs-were-made-by-artificial-intelligence-yahoo

Creating Animated GIFs Automatically from Video

https://yahooresearch.tumblr.com/post/148009705216/creating-animated-gifs-automatically-from-video

GIF2Video: Color Dequantization and Temporal Interpolation of GIF images

intro: Stony Brook University & Megvii Research USA & UCLA
arxiv: https://arxiv.org/abs/1901.02840

Video2Speech

Vid2speech: Speech Reconstruction from Silent Video

intro: ICASSP 2017
project page: http://www.vision.huji.ac.il/vid2speech/
arxiv: https://arxiv.org/abs/1701.00495
github(official): https://github.com/arielephrat/vid2speech

Video Captioning

http://handong1587.github.io/deep_learning/2015/10/09/image-video-captioning.html#video-captioning

Video Summarization

Video summarization produces a short summary of a full-length video and ideally encapsulates its most informative parts, alleviates the problem of video browsing, editing and indexing.

Video Summarization with Long Short-term Memory

arxiv: http://arxiv.org/abs/1605.08110

DeepVideo: Video Summarization using Temporal Sequence Modelling

intro: CS231n student project report
paper: http://cs231n.stanford.edu/reports2016/216_Report.pdf

Semantic Video Trailers

arxiv: http://arxiv.org/abs/1609.01819

Video Summarization using Deep Semantic Features

inro: ACCV 2016
arxiv: http://arxiv.org/abs/1609.08758

CNN-Based Prediction of Frame-Level Shot Importance for Video Summarization

intro: International Conference on new Trends in Computer Sciences (ICTCS), Amman-Jordan, 2017
arxiv: https://arxiv.org/abs/1708.07023

Video Summarization with Attention-Based Encoder-Decoder Networks

https://arxiv.org/abs/1708.09545

Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward

intro: AAAI 2018. Chinese Academy of Sciences & Queen Mary University of London
project page: https://kaiyangzhou.github.io/project_vsumm_reinforce/index.html
arxiv: https://arxiv.org/abs/1801.00054
github: https://github.com//KaiyangZhou/vsumm-reinforce

Viewpoint-aware Video Summarization

intro: CVPR 2018
arxiv: https://arxiv.org/abs/1804.02843

DTR-GAN: Dilated Temporal Relational Adversarial Network for Video Summarization

https://arxiv.org/abs/1804.11228

Learning Video Summarization Using Unpaired Data

https://arxiv.org/abs/1805.12174

Video Summarization Using Fully Convolutional Sequence Networks

https://arxiv.org/abs/1805.10538

Video Summarisation by Classification with Deep Reinforcement Learning

intro: BMVC 2018
arxiv: https://arxiv.org/abs/1807.03089

Query-Conditioned Three-Player Adversarial Network for Video Summarization

intro: BMVC 2018
arxiv: https://arxiv.org/abs/1807.06677

Discriminative Feature Learning for Unsupervised Video Summarization

intro: AAAI 2019
arxiv: https://arxiv.org/abs/1811.09791

Rethinking the Evaluation of Video Summaries

intro: CVPR 2019 poster
arxiv: https://arxiv.org/abs/1903.11328

Video Highlight Detection

Unsupervised Extraction of Video Highlights Via Robust Recurrent Auto-encoders

intro: ICCV 2015
intro: rely on an assumption that highlights of an event category are more frequently captured in short videos than non-highlights
arxiv: http://arxiv.org/abs/1510.01442

Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization